A Max-Term Counting Based Knowledge Inconsistency Checking Strategy and Inconsistency Measure Calculation of Fuzzy Knowledge Based Systems

The task of finding all the minimal inconsistent subsets plays a vital role in many theoretical works, especially for large knowledge bases, and it has been proved to be an NP-complete problem. In this work, we first propose a max-term counting based knowledge inconsistency checking strategy. We then put forward an algorithm for finding all minimal inconsistent subsets, in which we establish a Boolean lattice to organize the subsets of the given knowledge base and use leaf pruning to improve the algorithm's efficiency. Comparative experiments and analysis show the algorithm's improvement over previous approaches. Finally, we give an application to inconsistency measure calculation for fuzzy knowledge based systems.


Introduction
A large knowledge system operating for a long time almost inevitably becomes polluted by wrong data that make the system inconsistent. Despite this fact, a sizeable part of the system remains unpolluted and retains useful information. It is widely accepted that a maximal consistent subset of a system contains a significant portion of unpolluted data [1]. So, simply characterizing a knowledge base as either consistent or inconsistent is of little practical value, and ensuring consistency therefore becomes an important issue [2][3][4].
In practice, there are two types of methods: one is based on minimal inconsistent subsets (inconsistent subsets every strict subset of which is consistent), and the other is directly based on maximal consistent subsets. The relationship between minimal inconsistent subsets and maximal consistent subsets was discovered independently in [1,5,6] and is known as the hitting set problem [7].
As finding minimal inconsistent subsets or maximal consistent subsets is NP-complete, the most efficient algorithm is not yet known, but a number of heuristic optimizations can substantially reduce the size of the search space. In practice, heuristic information [8,9], optimization [10,11], and hybrid techniques [12,13] have been used to reduce the time complexity. In the latest research, McAreavey et al. presented a computational approach to finding and measuring inconsistency in arbitrary knowledge bases [14], while Mu et al. gave a method for measuring the significance of inconsistency in the viewpoints framework [15]. In all the above-mentioned works, efficiently finding minimal inconsistent subsets is the critical step, which has a great impact on applications, especially for large knowledge bases. Its computational complexity clearly depends on the strategy used to check the consistency of subsets of the knowledge base, but this important issue has not yet been satisfactorily resolved.
In this paper, we first propose an efficient strategy to check the consistency of a given knowledge base. We then put forward an algorithm to find all of the minimal inconsistent subsets of the given knowledge base. To illustrate the algorithm's improvement, we conduct thorough comparative experiments and analysis against MARCO [16], one of the most recently proposed algorithms, and discuss the related algorithms DAA [17] and PDDS [18]. Finally, we give an application to inconsistency measure calculation for fuzzy knowledge based systems.

Theoretical Basis
Let L denote the propositional language built from a finite set of variables P using the logical connectives {∧, ∨, ¬, →} and the logical constants {⊤, ⊥}. Every variable p ∈ P is called an atomic formula, or an atom. A literal is an atom or its negation. A clause C is a formula restricted to a disjunction of literals, and var(C) denotes the set of variables in C. A knowledge base K ∈ 2^L is a finite set of arbitrary formulae.
As every formula can be converted into an equivalent conjunctive normal form (CNF) formula, a knowledge base can be normalized so that every formula it contains is a clause. If a normalized knowledge base contains no redundant clauses, we say it is an optimized knowledge base.
By the syntactic approach in proof theory, if both φ and ¬φ can be derived from a knowledge base K, then we say K is inconsistent. With the semantic approach in model theory, an interpretation, or world, w is a function w : P → {T, F} from P to the set of Boolean values {T, F}. Let 2^P denote the set of worlds of P. A world w is a model of φ, denoted w ⇒ φ, iff φ is true under w in the classical truth-functional manner. Let mod(φ) denote the set of models of φ; that is, mod(φ) = {w ∈ 2^P | w ⇒ φ}. We say that φ is satisfiable iff there exists a model of φ; conversely, φ is unsatisfiable iff it has no models. These two approaches coincide in propositional logic; that is, a knowledge base K is consistent iff K, read as the conjunction of its formulae, is satisfiable.
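For small knowledge bases the semantic definition can be checked directly by enumerating worlds. The Python sketch below only illustrates that definition, under assumptions of our own: the knowledge base is already normalized into clauses, each clause is a frozenset of signed integers (v for the atom p_v, -v for ¬p_v, a DIMACS-style encoding chosen here rather than taken from the paper), and all 2^n worlds are enumerated, so the cost is exponential in n. The names `models` and `is_consistent` are hypothetical.

```python
from itertools import product


def models(kb, n):
    """Return all worlds over variables 1..n that satisfy every clause in kb.

    A world is a tuple of booleans; world[v - 1] is the truth value of atom v.
    A clause is a frozenset of non-zero ints: v means atom v, -v means ¬v.
    """
    result = []
    for world in product([False, True], repeat=n):
        # A clause is true iff at least one of its literals is true.
        if all(any(world[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in kb):
            result.append(world)
    return result


def is_consistent(kb, n):
    """K is consistent iff it is satisfiable, i.e. iff mod(K) is non-empty."""
    return len(models(kb, n)) > 0


# {p1, ¬p1 ∨ p2} is consistent; {p1, ¬p1} is not.
print(is_consistent([frozenset({1}), frozenset({-1, 2})], 2))  # True
print(is_consistent([frozenset({1}), frozenset({-1})], 1))     # False
```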
In the following discussion, let Greek lower-case letters φ, ψ, … denote formulae from L and English lower-case letters p, q, … denote variables from P.
Definition 1. For a Boolean function of n variables p_1, …, p_n, a sum term in which each of the n variables appears exactly once (in either its complemented or uncomplemented form) is called a max-term.
Proposition 2. Let p_1, …, p_n be n variables and M_0, …, M_{2^n−1} the 2^n different max-terms built on these n variables.
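Definition 1 can be made concrete with a hypothetical signed-integer clause encoding (v for p_v, -v for ¬p_v). The sketch below lists the 2^n max-terms over p_1, …, p_n and checks a standard property of max-terms: each one is false in exactly one world, and different max-terms are false in different worlds, so every world falsifies exactly one max-term and the conjunction of all of them is unsatisfiable. The function names are illustrative, and the sketch is not a reconstruction of Proposition 2.

```python
from itertools import product


def max_terms(n):
    """All 2**n max-terms over variables 1..n.

    Each max-term is a clause (frozenset of signed ints) in which every
    variable appears exactly once, uncomplemented (v) or complemented (-v).
    """
    return [frozenset(sign * v for sign, v in zip(signs, range(1, n + 1)))
            for signs in product([1, -1], repeat=n)]


def falsifying_world(max_term, n):
    """The unique world in which `max_term` is false: every literal must be
    false, so atom v is False if v occurs and True if ¬v occurs."""
    lits = set(max_term)
    return tuple(-v in lits for v in range(1, n + 1))


terms = max_terms(3)
worlds = {falsifying_world(t, 3) for t in terms}
print(len(terms), len(worlds))  # 8 8: every world falsifies exactly one max-term
```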

An Algorithm for Finding All Minimal Inconsistent Subsets
In this section, we first propose an algorithm for finding all minimal inconsistent subsets via a Boolean lattice. We then give an illustrative example and a thorough comparative study against MARCO, using the number of visited subsets as the benchmark. We also discuss the related algorithms DAA and PDDS.

3.1. Algorithm. An algorithm that finds the minimal inconsistent subsets of a given knowledge system must check its subsets for inconsistency. One way to proceed is to construct a Boolean lattice of the subsets of the given knowledge system, an approach first used by Bird and Hinze to find maximal consistent subsets [19].
Figure 1 sketches the Boolean lattice of a three-element set, whose nodes are labeled by the elements of its power set.
In Algorithm 1, a Boolean lattice is also established, and leaf pruning is adopted to improve the algorithm's efficiency. Because the subsets at each level of the lattice are smaller than those on the level above it, a breadth-first search of the lattice considers all smaller sets before any larger ones. Leaf pruning is based on the fact that if a node denotes a minimal inconsistent subset, then all of its ancestors are inconsistent (and therefore not minimal); dually, if a node denotes a consistent subset, then all of its descendants are consistent. Since directly computing the quantity in Theorem 7 is time consuming, we store intermediate results. For example, in Algorithm 1, for each visited node whose corresponding formula set is {φ_i | i ∈ I}, we store the value of |⋂_{i∈I} ext(φ_i)|. When computing this value for a k-degree node, the stored values of the (k − 1)-degree nodes can then be reused to save time.
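The Python sketch below illustrates a level-wise lattice search with leaf pruning and cached intersections in the spirit of the description above; it is not the paper's Algorithm 1. Assumptions of our own: clauses use a signed-integer encoding over variables 1..n; a subset is declared inconsistent when the extensions ext(C_i) of its clauses, counted by inclusion-exclusion, cover all 2^n max-terms (a clause is false exactly in the worlds that falsify one of the max-terms in its extension); and each intersection size is 0 if the merged literals contain a complementary pair and 2^{n−#variables} otherwise. We read this as what the counting in Theorem 7 amounts to, but that is an interpretation. The cache stores, per index set, the merged literals from which |⋂ ext(C_i)| follows, building each k-set's entry from a (k − 1)-set's entry. The function name is hypothetical.

```python
from itertools import combinations


def minimal_inconsistent_subsets(kb, n):
    """Level-wise (breadth-first) search of the Boolean lattice of `kb`.

    kb: list of clauses, each a frozenset of signed ints over variables 1..n.
    Returns the minimal inconsistent subsets as tuples of clause indices,
    in order of increasing cardinality.
    """
    full = 2 ** n
    # Cache for each visited index tuple: the union of its clauses' literals,
    # or None if that union contains a complementary pair (empty intersection).
    merged = {(): frozenset()}

    def merge(idx):
        if idx not in merged:
            prev = merge(idx[:-1])  # reuse the entry one level lower
            lits = None if prev is None else prev | kb[idx[-1]]
            if lits is not None and any(-lit in lits for lit in lits):
                lits = None
            merged[idx] = lits
        return merged[idx]

    def intersection_size(idx):
        # |ext(C_i1) ∩ ... ∩ ext(C_ik)|
        lits = merge(idx)
        return 0 if lits is None else 2 ** (n - len({abs(lit) for lit in lits}))

    def inconsistent(subset):
        # Inclusion-exclusion: number of max-terms covered by the extensions.
        covered = sum((-1) ** (k + 1) * intersection_size(t)
                      for k in range(1, len(subset) + 1)
                      for t in combinations(subset, k))
        return covered == full

    mis = []
    consistent = {()}  # all consistent subsets of the previous level
    for k in range(1, len(kb) + 1):
        next_level = set()
        for s in combinations(range(len(kb)), k):
            # Leaf pruning: skip s unless every (k-1)-subset is consistent,
            # i.e. unless s contains no minimal inconsistent subset found so far.
            if not all(s[:i] + s[i + 1:] in consistent for i in range(k)):
                continue
            if inconsistent(s):
                mis.append(s)  # all proper subsets are consistent: s is minimal
            else:
                next_level.add(s)
        consistent = next_level
        if not consistent:
            break
    return mis


# K = {p1, ¬p1, p2, ¬p1 ∨ ¬p2}
kb = [frozenset({1}), frozenset({-1}), frozenset({2}), frozenset({-1, -2})]
print(minimal_inconsistent_subsets(kb, 2))  # [(0, 1), (0, 2, 3)]
```

On this example the search finds {p1, ¬p1} on level 2 and {p1, p2, ¬p1 ∨ ¬p2} on level 3; two of the four level-3 candidates and the single level-4 candidate are pruned without any consistency check.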
The algorithm for finding minimal inconsistent subsets proposed in [14] works indirectly via maximal consistent subsets and has the disadvantage that pseudo-maximal consistent subsets are generated while the maximal consistent subsets are computed [11,15]. Because our algorithm always checks smaller sets before larger ones, it avoids this problem.
If the maximal cost of checking the inconsistency of a subset is c, the worst-case complexity of this algorithm is O(c · 2^{|K|}). We now show experimentally how the number of subsets checked for inconsistency depends on the size of the knowledge base, for different probabilities that two formulas are consistent.
In the experiment, we use the generator GENBAL [20] to generate knowledge bases. The graphs in Figures 2 and 3 show the number of subsets checked for inconsistency as a function of |K|, the number of clauses in the given normalized knowledge base, and of the probability that two formulas are consistent. All counts are averaged over 100 randomly generated formulae produced by GENBAL.
Figures 2 and 3 show that larger values of the probability that two formulas are consistent lead to more subsets being checked. Moreover, it is easy to show that larger values of this probability generally also lead to fewer and smaller minimal inconsistent subsets.

An Illustrative Example and Comparative Study.
Apart from our proposed method, there are many other solvers for computing minimal inconsistent subsets. One of the most recently published algorithms is MARCO [16], which incorporates recent advances. We first give an illustrative example and then compare our method with MARCO.
We first establish a Boolean lattice, which is shown in Figure 4.
It is apparent that, to obtain all the minimal inconsistent subsets, we have to check the consistency of 6 sets.
Fundamentally, the MARCO algorithm repeatedly performs the following steps, where an MUS is a minimal unsatisfiable subset and an MSS is a maximal satisfiable subset:
(i) Selecting an unexplored point in the power set lattice, a subset of K that we call a seed.
(ii) Checking the satisfiability of the seed.
(iii) Growing or shrinking it to an MSS or an MUS as appropriate.
(iv) Marking a corresponding region of the lattice as explored.
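The four MARCO steps listed above can be sketched as follows for tiny inputs. This is a simplification under our own assumptions, not the MARCO implementation of [16]: the map of explored subsets is an explicit Python set rather than a solver-backed map formula, GetUnexplored() is replaced by scanning subsets in a fixed order, satisfiability is checked by brute force, and clauses use a hypothetical signed-integer encoding (v for p_v, -v for ¬p_v).

```python
from itertools import combinations, product


def satisfiable(clauses, n):
    """Brute-force satisfiability check over all worlds on variables 1..n."""
    return any(all(any(world[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)
               for world in product([False, True], repeat=n))


def marco(kb, n):
    """Enumerate all MUSes and MSSes of kb as frozensets of clause indices."""
    m = len(kb)
    lattice = [frozenset(s) for k in range(m + 1) for s in combinations(range(m), k)]
    explored = set()
    muses, msses = [], []

    def sat(indices):
        return satisfiable([kb[i] for i in indices], n)

    while True:
        # (i) select an unexplored seed (here: the first one in a fixed order)
        seed = next((s for s in lattice if s not in explored), None)
        if seed is None:
            break
        if sat(seed):  # (ii) check the satisfiability of the seed
            # (iii) grow it to a maximal satisfiable subset
            mss = set(seed)
            for i in range(m):
                if i not in mss and sat(mss | {i}):
                    mss.add(i)
            msses.append(frozenset(mss))
            # (iv) mark the MSS and all of its subsets as explored
            explored.update(s for s in lattice if s <= mss)
        else:
            # (iii) shrink it to a minimal unsatisfiable subset
            mus = set(seed)
            for i in sorted(seed):
                if not sat(mus - {i}):
                    mus.discard(i)
            muses.append(frozenset(mus))
            # (iv) mark the MUS and all of its supersets as explored
            explored.update(s for s in lattice if s >= mus)
    return muses, msses


# K = {p1, ¬p1, p2, ¬p1 ∨ ¬p2}
kb = [frozenset({1}), frozenset({-1}), frozenset({2}), frozenset({-1, -2})]
muses, msses = marco(kb, 2)
```

On this example the sketch returns the MUSes {0, 1} and {0, 2, 3} and the MSSes {0, 2}, {1, 2, 3} and {0, 3}. Because real MARCO obtains seeds from a solver over its map, the order in which it visits sets, and hence the number of sets it checks, will generally differ from this sketch.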
When MARCO is used, the consistency of 10 sets has to be checked one by one. Our algorithm therefore visits fewer sets than MARCO in this example. The difference is that our algorithm traverses the Boolean lattice incrementally, in order of set cardinality, whereas MARCO traverses it in an arbitrary order, since the function GetUnexplored() used in MARCO may return any unexplored set. As both algorithms aim to find all of the minimal inconsistent sets, the incremental traversal of our algorithm yields a higher efficiency.
In the comparative study, we also use the generator GENBAL [20] to generate knowledge bases. The number of visited subsets is used as the benchmark, as it is more objective than other benchmarks. For example, we do not use CPU time as the benchmark, as it is strongly affected by the running environment, including the state of both the hardware and the software.
The graphs in Figures 5, 6, and 7 show the number of subsets checked for inconsistency as a function of |K|, the number of clauses in the given normalized knowledge base, for different values of the probability that two formulas are consistent. All counts are averaged over 100 randomly generated formulae produced by GENBAL. Figures 5, 6, and 7 all show that our algorithm visits fewer sets than MARCO.
DAA is another algorithm that exploits the hitting set duality between minimal correction sets (MCSes) and minimal unsatisfiable subsets (MUSes) [17]. DAA uses the Grow subroutine on known-satisfiable subsets to produce maximal satisfiable subsets (MSSes) and their complementary MCSes, and then computes the minimal hitting sets of the MCSes found so far. PDDS, an approach closely related to DAA, was proposed later [18]. The main differences are that PDDS takes an initial set of either MUSes or MCSes as input, and that PDDS does not necessarily compute all hitting sets of the MCSes at each iteration, which avoids the memory scaling issues of DAA.
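The duality that DAA exploits can be illustrated directly: the MCSes are the complements of the MSSes, and the MUSes are exactly the minimal hitting sets of the MCSes. The brute-force Python sketch below is our own illustration, not DAA or PDDS (both of which compute hitting sets incrementally). It recovers the MUSes of the four-clause knowledge base {p1, ¬p1, p2, ¬p1 ∨ ¬p2} from its three MSSes, which are supplied by hand; the function name is hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(family, universe):
    """All minimal sets that intersect every member of `family`.

    Candidates are enumerated by increasing size, so a hitting set is minimal
    iff it contains no hitting set found earlier.
    """
    found = []
    for k in range(len(universe) + 1):
        for cand in combinations(sorted(universe), k):
            c = frozenset(cand)
            if all(c & s for s in family) and not any(h <= c for h in found):
                found.append(c)
    return found


universe = frozenset(range(4))
# MSSes of K = {p1, ¬p1, p2, ¬p1 ∨ ¬p2}, as sets of clause indices 0..3
msses = [frozenset({0, 2}), frozenset({0, 3}), frozenset({1, 2, 3})]
mcses = [universe - s for s in msses]  # {1, 3}, {1, 2}, {0}
print(minimal_hitting_sets(mcses, universe))
```

Here the function returns the sets {0, 1} and {0, 2, 3}, i.e. exactly the MUSes {p1, ¬p1} and {p1, p2, ¬p1 ∨ ¬p2}.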
The DAA and PDDS algorithms have the benefit of being decoupled from the choice of hitting set algorithm. It has been pointed out that choosing the incremental algorithm of Fredman and Khachiyan [21] for computing hitting sets yields a version of DAA whose worst-case runtime is subexponential in the size of the output [22]. Studies have also shown that MARCO performs better than DAA and PDDS [16].

Inconsistency Measure Calculation for Fuzzy Knowledge Based Systems
In this section, we show an application of Algorithm 1 to inconsistency measure calculation for fuzzy knowledge based systems. Fuzzy knowledge based systems are typical rule-based inference systems that provide expertise over a domain and are capable of drawing conclusions from given uncertain evidence [23]. In fuzzy knowledge based systems, knowledge is represented using possibilistic logic.
Let Δ = {(…
Definition 12 (see [24]). Let Δ be a fuzzy knowledge based system. If Δ is inconsistent, then its inconsistency measure is defined as …
According to Theorem 13, we know that Δ is inconsistent and its inconsistency measure is 0.3.
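As an illustration, and assuming the standard possibilistic-logic inconsistency degree Inc(Δ) = max{α | Δ_{≥α} is inconsistent}, where Δ_{≥α} keeps the formulas whose weight is at least α (we do not claim this is the exact form of Definition 12), the Python sketch below computes that degree in two ways. Under this assumption, Inc(Δ) equals the maximum, over all minimal inconsistent subsets of Δ, of the smallest weight in the subset, which is how the output of a minimal-inconsistent-subset search could be reused. The weighted clauses, the signed-integer encoding, and the function names are hypothetical; satisfiability is checked by brute force.

```python
from itertools import combinations, product


def satisfiable(clauses, n):
    """Brute-force satisfiability check over all worlds on variables 1..n."""
    return any(all(any(world[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)
               for world in product([False, True], repeat=n))


def inconsistency_degree(delta, n):
    """max{α | the α-cut of delta is inconsistent}, or 0.0 if delta is consistent."""
    for alpha in sorted({w for _, w in delta}, reverse=True):
        cut = [c for c, w in delta if w >= alpha]
        if not satisfiable(cut, n):
            return alpha  # cuts only grow as α decreases, so this is the maximum
    return 0.0


def degree_from_mis(delta, n):
    """The same value as the maximum, over minimal inconsistent subsets M,
    of the smallest weight in M."""
    mis = []
    for k in range(1, len(delta) + 1):
        for s in combinations(range(len(delta)), k):
            if any(set(m) <= set(s) for m in mis):
                continue  # s contains a smaller inconsistent subset
            if not satisfiable([delta[i][0] for i in s], n):
                mis.append(s)
    return max((min(delta[i][1] for i in m) for m in mis), default=0.0)


# Hypothetical weighted clauses (clause, weight): p1, ¬p1, p2, ¬p1 ∨ ¬p2
delta = [(frozenset({1}), 0.8), (frozenset({-1}), 0.3),
         (frozenset({2}), 0.6), (frozenset({-1, -2}), 0.5)]
print(inconsistency_degree(delta, 2), degree_from_mis(delta, 2))  # 0.5 0.5
```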

Conclusions
The purpose of this paper is to find all the minimal inconsistent subsets of a given knowledge system. We first propose a max-term counting based knowledge inconsistency checking strategy. We then put forward an algorithm for finding all minimal inconsistent subsets, in which we establish a Boolean lattice to organize the subsets of the given knowledge base and use leaf pruning to improve the algorithm's efficiency. Finally, we give a method for calculating the inconsistency measure of fuzzy knowledge based systems.
As a fuzzy knowledge based system may contain several statements that contradict each other, how to measure the significance of the inconsistency is a valuable problem for further study.

Figure 2: Number of subsets visited as a function of the probability that two formulas are consistent, for |K| = 15.

Figure 5: Number of subsets visited as a function of |K| when the probability that two formulas are consistent is 0.3.

Figure 6: Number of subsets visited as a function of |K| when the probability that two formulas are consistent is 0.6.

Figure 7: Number of subsets visited as a function of |K| when the probability that two formulas are consistent is 0.9.
4. It is easy to see that carrying out the extension of φ does not change the original meaning of φ. … Let φ1 and φ2 be two formulas built on a set of variables P (|P| = n), in which p_i^0 = p_i and p_i^1 = ¬p_i. Then the following propositions hold:
(i) If there exists a variable p_i such that one of p_i and ¬p_i appears in φ1 and the other appears in φ2, then |ext(φ1) ∩ ext(φ2)| = 0.
(ii) Otherwise, |ext(φ1) ∩ ext(φ2)| = 2^{n−|var(φ1) ∪ var(φ2)|}.
Proof. (i) Since there exists a variable p_i such that one of p_i and ¬p_i appears in φ1 and the other appears in φ2, one of these two literals appears in every max-term of ext(φ1) and the other appears in every max-term of ext(φ2). As a max-term contains each variable only once, no max-term belongs to both extensions. Therefore |ext(φ1) ∩ ext(φ2)| = 0.
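Both cases of the proposition can be checked mechanically for small n. In the Python sketch below (a hypothetical signed-integer encoding, v for p_v and -v for ¬p_v, with clauses assumed to contain no complementary pair themselves), `ext` explicitly lists the max-terms that contain every literal of a clause, and the closed-form count from cases (i) and (ii) is compared with the brute-force intersection size for every pair of sample clauses over n = 3 variables. The function names are illustrative.

```python
from itertools import product


def ext(clause, n):
    """Set of max-terms over variables 1..n that contain every literal of `clause`."""
    used = {abs(lit) for lit in clause}
    free = [v for v in range(1, n + 1) if v not in used]
    return {frozenset(clause) | {s * v for s, v in zip(signs, free)}
            for signs in product([1, -1], repeat=len(free))}


def predicted_size(c1, c2, n):
    """|ext(c1) ∩ ext(c2)| according to cases (i) and (ii)."""
    lits = set(c1) | set(c2)
    if any(-lit in lits for lit in lits):  # case (i): complementary literals
        return 0
    return 2 ** (n - len({abs(lit) for lit in lits}))  # case (ii)


n = 3
samples = [frozenset(), frozenset({1}), frozenset({-1}), frozenset({1, 2}),
           frozenset({-2, 3}), frozenset({2})]
for a in samples:
    for b in samples:
        assert len(ext(a, n) & ext(b, n)) == predicted_size(a, b, n)
print("proposition holds on all", len(samples) ** 2, "pairs")
```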