A Method to Identify the Incomplete Framework of Discernment in Evidence Theory

Dempster-Shafer evidence theory is a useful tool for decision-making under uncertain information. However, the classical evidence theory is no longer applicable when the frame of discernment (FOD) is incomplete. Moreover, an incomplete FOD is an important cause of conflict, so it is necessary to identify whether the FOD of a system is complete. In this paper, a method is proposed to identify an incomplete FOD under the framework of generalized evidence theory, which deals with incomplete information. Within the proposed method, pieces of evidence are first generated from the attributes of each sample; then three criteria are used to identify whether the FOD is incomplete according to this evidence. The main parameters of the criteria are the number of generated pieces of evidence in which the empty set $\emptyset$ is a focal element, the mass of $\emptyset$ in the weighted average of the generated evidence, and the mass of $\emptyset$ in the combination of the generated evidence. Experiments demonstrate the effectiveness of the proposed method.


Introduction
Dempster-Shafer evidence theory (D-S evidence theory) [1,2] is widely used in many fields such as decision-making [3][4][5][6], evidential reasoning [7,8], uncertainty measure [9,10], and others [11][12][13][14][15] because of its advantages in handling uncertain information. This theory is also widely used in practical applications, such as fault diagnosis [16,17], knowledge acquisition [18], risk and reliability analysis [19,20], and failure mode [21][22][23]. However, counterintuitive results can be obtained when the given pieces of evidence highly conflict with each other, and hundreds of methods have been proposed to address this issue [24]. Conflict management therefore remains an important issue in D-S evidence theory. In general, there are two main reasons that may lead to conflict: one is an incomplete frame of discernment (FOD), and the other is that the sensors are disturbed. In order to better combine conflicting evidence, it is necessary to identify whether the FOD is complete.
In previous studies, Lefevre et al. [25] proposed a unified belief function combination method to manage conflict, mainly considering the issue of conflict redistribution. Haenni's view is to preprocess the evidence and then use Dempster's combination rule to manage the conflict [26]. Murphy [27] presented an averaging method for combining belief functions that balances multiple pieces of evidence, but it does not offer convergence toward certainty. Based on this, an improved method is presented in [28]. However, these studies ignored the fact that an incomplete FOD is also an important cause of conflict. To address this, Smets and Kennes [29] proposed the transferable belief model (TBM) under the open world assumption. Recently, a generalized evidence theory was presented in [30], addressing the combination of conflicting evidence in an open world. It greatly expands the application of evidence theory, for example to fuzzy [31][32][33] and game theory [34], and goes beyond the original model in dealing with conflicts. However, the aforementioned research did not state under which conditions the system has an incomplete FOD. The identification of an incomplete FOD is thus still an open issue and has not been given the attention it deserves.
In this paper, a method is proposed to identify an incomplete FOD under the framework of the generalized evidence theory. Generalized evidence theory [30] is a novel theory which can express and deal with uncertain information in an incomplete FOD. Considering that the empty set $\emptyset$ can express the information of an incomplete FOD, three parameters of $\emptyset$ are used in the proposed method. They are the mass of $\emptyset$ in the weighted average evidence from the generated pieces of evidence, the number of generated pieces of evidence in which $\emptyset$ is a focal element, and the mass of $\emptyset$ in the combination of the generated pieces of evidence. Within the proposed method, pieces of evidence are first generated from the attributes of each sample; then three criteria are used to identify whether the FOD is incomplete according to this evidence.
The proposed method takes into consideration the information in both the evidence and the samples. It uses the correlation coefficient $r_{\mathrm{BPA}}$ [35], which performs better than other coefficients, to express the similarity of evidence. The method collects information about the FOD of the system from three aspects: the mass of $\emptyset$ in each piece of evidence, the mass of $\emptyset$ in the weighted average evidence, and the mass of $\emptyset$ in the combination result. Experiments demonstrate the effectiveness of the proposed method. They show that, for a collected sample, if the criteria are satisfied, the FOD of the system is regarded as incomplete; otherwise, it is regarded as complete.
The rest of this paper is organised as follows. In Section 2, the preliminaries of D-S evidence theory and generalized evidence theory are briefly introduced. Section 3 presents the proposed method with three criteria. In Section 4, experiments demonstrate the effectiveness of the method. An application to motor rotor fault diagnosis is shown in Section 5. Finally, a brief conclusion is given in Section 6.

Dempster-Shafer (D-S) Evidence Theory
D-S evidence theory was introduced by Dempster [1] and then developed by Shafer [2]. Owing to its outstanding performance in modeling and processing uncertainty, this theory is widely applied to decision-making, optimization and reliability, and risk analysis.
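Definition 1. Let $\Theta$ be a finite nonempty set of $N$ mutually exclusive hypotheses, indicated by

$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}, \tag{1}$$

where the set $\Theta$ is called a frame of discernment. The power set of $\Theta$, $2^{\Theta}$, is indicated as follows:

$$2^{\Theta} = \{\emptyset, \{\theta_1\}, \ldots, \{\theta_N\}, \{\theta_1, \theta_2\}, \ldots, \Theta\}. \tag{2}$$

Definition 2. A mass function $m$, also called a basic probability assignment (BPA), is a mapping from $2^{\Theta}$ to $[0, 1]$,

$$m: 2^{\Theta} \to [0, 1], \tag{3}$$

which satisfies the following condition:

$$m(\emptyset) = 0, \qquad \sum_{A \in 2^{\Theta}} m(A) = 1. \tag{4}$$

When $m(A) > 0$, $A$ is called a focal element of the mass function.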
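Definition 3. Evidence combination in D-S evidence theory is denoted by $\oplus$. Assume that there are two BPAs indicated by $m_1$ and $m_2$; the combination of the two BPAs with Dempster's combination rule [1] is formulated as follows:

$$m(A) = (m_1 \oplus m_2)(A) = \begin{cases} \dfrac{1}{1 - K} \displaystyle\sum_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset, \\ 0, & A = \emptyset, \end{cases}$$

with

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C),$$

where $K$ reflects the conflict between the two BPAs $m_1$ and $m_2$.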
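As a minimal sketch of Dempster's combination rule in Definition 3, the following Python function combines two BPAs stored as dictionaries that map focal elements (frozensets) to masses. The function name `dempster_combine`, the dictionary representation, and the example masses are illustrative assumptions and are not taken from the original paper.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two classical BPAs with Dempster's rule.

    Each BPA is a dict mapping a frozenset (focal element) to its mass.
    Returns the combined BPA and the conflict coefficient K.
    """
    combined = {}
    conflict = 0.0  # K: total mass of pairs whose intersection is empty
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if abs(1.0 - conflict) < 1e-12:
        raise ValueError("K = 1: the two BPAs are in total conflict")
    # Normalize the non-conflicting mass by 1 - K.
    fused = {a: v / (1.0 - conflict) for a, v in combined.items()}
    return fused, conflict


# Hypothetical BPAs on the frame {a, b}.
m1 = {frozenset("a"): 0.6, frozenset("ab"): 0.4}
m2 = {frozenset("b"): 0.5, frozenset("ab"): 0.5}
fused, k = dempster_combine(m1, m2)
print(k, fused)
```

Because the rule is commutative and associative, more than two BPAs can be fused by applying the function repeatedly to the running result.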
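When $n$ ($n > 2$) pieces of evidence are given, they can be fused with Dempster's combination rule as shown in (8), owing to the commutativity and associativity of the rule:

$$m = m_1 \oplus m_2 \oplus \cdots \oplus m_n = (((m_1 \oplus m_2) \oplus \cdots) \oplus m_n). \tag{8}$$

Recently, Jiang proposed a correlation coefficient [35] to measure the degree of correlation between pieces of evidence.

Definition 4. For a discernment frame $\Theta$ with $N$ elements, suppose the masses of two pieces of evidence are denoted by $m_1$ and $m_2$. The correlation coefficient is defined as follows:

$$r_{\mathrm{BPA}}(m_1, m_2) = \frac{c(m_1, m_2)}{\sqrt{c(m_1, m_1)\, c(m_2, m_2)}},$$

where $c(m_1, m_2)$ is the degree of correlation, denoted as

$$c(m_1, m_2) = \sum_{i=1}^{2^N} \sum_{j=1}^{2^N} m_1(A_i)\, m_2(A_j)\, \frac{|A_i \cap A_j|}{|A_i \cup A_j|}.$$

Generalized evidence theory (GET) [30] can express and deal with uncertain information in an incomplete FOD.

Definition 5. Suppose that $U$ is a FOD in an open world. If a function $m_G: 2^{U} \to [0, 1]$ satisfies

$$\sum_{A \in 2^{U}} m_G(A) = 1,$$

then $m_G$ is the generalized basic probability assignment (GBPA) of the FOD $U$.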
The difference between a GBPA and a classical BPA is the restriction $m(\emptyset) = 0$ in (4): in a GBPA, the empty set is also regarded as a focal element and represents the union of the focal elements outside the given FOD. If $m_G(\emptyset) = 0$, the GBPA degenerates to a classical BPA.
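The distinction above can be illustrated with a short check. This is only a sketch: it assumes a GBPA is stored as a dictionary from frozensets to masses, with `frozenset()` standing for $\emptyset$, and the helper names `is_gbpa` and `is_classical_bpa` are hypothetical.

```python
EMPTY = frozenset()  # stands for the empty set


def is_gbpa(m, tol=1e-9):
    """A GBPA has masses in [0, 1] that sum to 1; mass on the empty set is allowed."""
    in_range = all(0.0 <= v <= 1.0 for v in m.values())
    return in_range and abs(sum(m.values()) - 1.0) <= tol


def is_classical_bpa(m, tol=1e-9):
    """A classical BPA is a GBPA with no mass on the empty set."""
    return is_gbpa(m, tol) and m.get(EMPTY, 0.0) <= tol


# Hypothetical masses for illustration only.
open_world = {frozenset("a"): 0.6, EMPTY: 0.4}
closed_world = {frozenset("a"): 0.6, frozenset("ab"): 0.4}

print(is_gbpa(open_world), is_classical_bpa(open_world))
print(is_gbpa(closed_world), is_classical_bpa(closed_world))
```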
Like GET, the TBM also assigns mass to the empty set to represent unknown information. The difference lies in how the mass of the empty set is generated. The TBM simply removes the normalization step of Dempster's combination rule and assigns the value of the conflict coefficient $K$ to the empty set; when evidence is generated, the mass of the empty set is still 0. In GET, by contrast, mass can be assigned to the empty set when evidence is generated, which means that the restriction $m(\emptyset) = 0$ in (4) no longer applies.

Definition 6. Given two GBPAs $m_1$ and $m_2$, $m_1(\emptyset)$ and $m_2(\emptyset)$ are regarded as conflicting with each other, and the mass of $\emptyset$ is assigned to conflict. The generalized combination rule (GCR) is defined on this basis. Equation (13) defines the generalized conflict coefficient; when $m(\emptyset) = 0$, which means that the framework of discernment is complete, the generalized conflict coefficient degenerates to the classical conflict coefficient.
Jiang and Zhan pointed out two shortcomings of the GCR in [30]: first, the way $m(\emptyset)$ is obtained is unreasonable and lacks a specific physical meaning; second, the way the generalized conflict coefficient $K$ is obtained in (16) is not consistent with the GBPA. The modified generalized combination rule (mGCR) in GET was therefore proposed in [30].

Definition 7. In the mGCR, $m_1(\emptyset)$ and $m_2(\emptyset)$ are considered as support for $\emptyset$. The orthogonal sum of $m_1(\emptyset)$ and $m_2(\emptyset)$ should also be normalized like that of the other focal elements. Two GBPAs $m_1$ and $m_2$ are combined with the mGCR on this basis.

The distance between two bodies of evidence in GET is defined in the same way as in D-S evidence theory.

Definition 8. Let $m_1$ and $m_2$ be two GBPAs on the framework of discernment $U$; the distance between $m_1$ and $m_2$ can be defined as follows:

$$d(m_1, m_2) = \sqrt{\tfrac{1}{2}\, (\vec{m}_1 - \vec{m}_2)^{T} D\, (\vec{m}_1 - \vec{m}_2)}, \tag{19}$$

where $D$ is a $2^N \times 2^N$ matrix whose elements are

$$D(A, B) = \frac{|A \cap B|}{|A \cup B|}, \quad A, B \in 2^{U},$$

and it can be computed as

$$d(m_1, m_2) = \sqrt{\tfrac{1}{2} \left( \langle \vec{m}_1, \vec{m}_1 \rangle + \langle \vec{m}_2, \vec{m}_2 \rangle - 2 \langle \vec{m}_1, \vec{m}_2 \rangle \right)},$$

where

$$\langle \vec{m}_1, \vec{m}_2 \rangle = \sum_{i=1}^{2^N} \sum_{j=1}^{2^N} m_1(A_i)\, m_2(A_j)\, \frac{|A_i \cap A_j|}{|A_i \cup A_j|}.$$

Equation (19) can also be used when the frame of discernment is complete, and the result is then similar to the distance defined in [36].

The Proposed Method
Generally speaking, the empty set $\emptyset$ indicates that no elements are included. In the classical evidence theory, no mass is assigned to $\emptyset$. In GET [30], however, $\emptyset$ is considered to indicate the elements that are not in the framework, and it represents information that lies outside the FOD. Based on this idea, a method that mainly employs the mass of $\emptyset$ is proposed under the GET framework to identify an incomplete FOD.
An incomplete FOD means that there are targets, classes, or other elements that are not included in the current FOD. Let us consider a classification problem. Assume that all known samples belong to $N$ classes $\theta_1, \theta_2, \ldots, \theta_N$, which constitute a FOD $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$, and that each sample has $n$ attributes. Now a new sample is obtained. How can we identify the completeness of FOD $\Theta$ according to this sample? In this paper, $n$ GBPAs, which allow the empty set to have mass (i.e., $m(\emptyset) > 0$) and are denoted as $m_1, m_2, \ldots, m_n$, are first generated from the $n$ attributes of the sample. Then, the following three criteria are used.
Criterion 1. If more than half of the generated GBPAs satisfy $m_i(\emptyset) > 0.5$, FOD $\Theta$ is said to be incomplete. Otherwise, FOD $\Theta$ is said to be complete.

This criterion states that if the number of generated GBPAs whose mass on $\emptyset$ exceeds 0.5 is more than half of the total number of pieces of evidence, the FOD is considered incomplete. The criterion has a physical meaning. Firstly, the parameter $m_i(\emptyset)$ of a generated GBPA represents that piece of evidence's confidence in an incomplete FOD, because $\emptyset$ is treated as a focal element that expresses the elements not in the FOD. That is to say, the larger $m_i(\emptyset)$ is, the stronger the support for an incomplete FOD. Then, taking 0.5 as a threshold, if $m_i(\emptyset)$ exceeds the threshold, this piece of evidence is judged to support an incomplete FOD. Therefore, the number of pieces of evidence that satisfy $m_i(\emptyset) > 0.5$ is used to identify the FOD. If the criterion is satisfied, that is, more than half of the pieces of evidence support that the sample is outside the FOD, the FOD is incomplete.
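The counting rule of Criterion 1 can be sketched in a few lines of Python. The function name `criterion_1`, the dictionary representation of a GBPA, and the four example GBPAs are assumptions made for illustration; the masses are hypothetical and are not the values of the paper's example.

```python
EMPTY = frozenset()  # stands for the empty set


def criterion_1(gbpas, threshold=0.5):
    """Count the GBPAs with m_i(empty) > threshold and test for a strict majority."""
    count = sum(1 for m in gbpas if m.get(EMPTY, 0.0) > threshold)
    return count, count > len(gbpas) / 2


# Hypothetical GBPAs generated from four attributes of one sample.
A, B = frozenset("a"), frozenset("b")
gbpas = [
    {A: 0.8, EMPTY: 0.2},
    {A: 0.1, EMPTY: 0.9},
    {B: 0.3, EMPTY: 0.7},
    {A: 0.2, EMPTY: 0.8},
]
count, incomplete = criterion_1(gbpas)
print(count, incomplete)
```

Here three of the four GBPAs put more than 0.5 on the empty set, so the strict-majority test is met and the FOD would be judged incomplete under this criterion.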
Criterion 2. If the mass of $\emptyset$ in the weighted average evidence $\bar{m}$ is more than 0.5, the FOD is considered incomplete. In this criterion, a weighted average of $m_1, m_2, \ldots, m_n$, namely $\bar{m}$, is calculated, considering that the evidence generated from different attributes should have different importance. $\bar{m}(\emptyset)$ represents the total support for an incomplete FOD, taking the correlation and difference of the generated evidence into account. That is to say, the value of $\bar{m}(\emptyset)$ is also a piece of information for identifying the FOD: the larger $\bar{m}(\emptyset)$ is, the stronger the support for an incomplete FOD. Taking 0.5 as a threshold, if $\bar{m}(\emptyset) > 0.5$, the evidence is judged to support an incomplete FOD.
As shown above, this criterion is based on the weighted average evidence $\bar{m}$. In this paper, Deng's approach given in [28] is used to obtain $\bar{m}$. The process is given below.
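Step 1. For each pair of generated GBPAs $m_i$ and $m_j$, the similarity between $m_i$ and $m_j$ is denoted as $\mathrm{Sim}_{ij}$. Deng proposed to calculate $\mathrm{Sim}_{ij}$ based on the distance of evidence. However, Jiang showed in [35] that the correlation coefficient performs better than the distance of evidence, so in this paper the correlation coefficient is used as the similarity measure $\mathrm{Sim}_{ij}$.

Definition 9. Let $U$ be a frame of discernment (FOD) in an open world, containing $N$ mutually exclusive and exhaustive hypotheses. The similarity measure $\mathrm{Sim}_{ij}$ is expressed as

$$\mathrm{Sim}_{ij} = \frac{c(m_i, m_j)}{\sqrt{c(m_i, m_i)\, c(m_j, m_j)}}, \tag{22}$$

where

$$c(m_i, m_j) = \sum_{p=1}^{2^N} \sum_{q=1}^{2^N} m_i(A_p)\, m_j(A_q)\, \frac{|A_p \cap A_q|}{|A_p \cup A_q|}, \tag{23}$$

with $p, q = 1, 2, \ldots, 2^N$; $A_p$ and $A_q$ are the focal elements of $m_i$ and $m_j$, respectively; and $|\cdot|$ is the cardinality of a subset. In particular, $|\emptyset \cap \emptyset| / |\emptyset \cup \emptyset| = 0$.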
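Step 2. For the $n$ generated GBPAs, the similarity measure $\mathrm{Sim}_{ij}$ between $m_i$ and $m_j$ ($i, j = 1, \ldots, n$) can be calculated. A similarity measure matrix (SMM) can then be constructed to give insight into the agreement between the pieces of evidence:

$$\mathrm{SMM} = \begin{bmatrix} 1 & \mathrm{Sim}_{12} & \cdots & \mathrm{Sim}_{1n} \\ \mathrm{Sim}_{21} & 1 & \cdots & \mathrm{Sim}_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Sim}_{n1} & \mathrm{Sim}_{n2} & \cdots & 1 \end{bmatrix}. \tag{24}$$

Step 3. After the similarity measure matrix SMM is obtained, the support degree $\mathrm{sup}(m_i)$ of each piece of evidence $m_i$ ($i = 1, 2, \ldots, n$) is defined by

$$\mathrm{sup}(m_i) = \sum_{j=1,\, j \neq i}^{n} \mathrm{Sim}_{ij}. \tag{25}$$

Then, the credibility degree $\mathrm{Crd}_i$ of evidence $m_i$ ($i = 1, 2, \ldots, n$) is obtained as

$$\mathrm{Crd}_i = \frac{\mathrm{sup}(m_i)}{\sum_{j=1}^{n} \mathrm{sup}(m_j)}. \tag{26}$$

For each piece of evidence, its credibility degree is taken as its weight.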
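Step 4. Finally, the modified weighted average evidence $\bar{m}$ is given as

$$\bar{m} = \sum_{i=1}^{n} \mathrm{Crd}_i \times m_i. \tag{27}$$

Once $\bar{m}$ is obtained, $\bar{m}(\emptyset)$ is known as well. According to this criterion, if $\bar{m}(\emptyset) > 0.5$, the FOD can be judged to be incomplete.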
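Steps 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: GBPAs are dictionaries from frozensets to masses, the pair ($\emptyset$, $\emptyset$) contributes 0 to the correlation as stated in the text, a zero self-correlation is treated as similarity 0, and equal weights are used if no piece of evidence is supported by any other. The last two fallbacks, all function names, and the example masses are hypothetical.

```python
from math import sqrt

EMPTY = frozenset()  # stands for the empty set


def jaccard(a, b):
    """|A n B| / |A u B|, with the value 0 for the pair (empty, empty) as in the text."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlation(m1, m2):
    """Degree of correlation c(m1, m2) summed over all pairs of focal elements."""
    return sum(v1 * v2 * jaccard(a, b)
               for a, v1 in m1.items() for b, v2 in m2.items())


def similarity(m1, m2):
    """Correlation coefficient used as the similarity measure Sim_ij."""
    denom = sqrt(correlation(m1, m1) * correlation(m2, m2))
    # Assumption: similarity is 0 when a self-correlation vanishes.
    return correlation(m1, m2) / denom if denom > 0 else 0.0


def weighted_average_evidence(gbpas):
    """Return the SMM, the credibility degrees, and the weighted average evidence."""
    n = len(gbpas)
    smm = [[similarity(mi, mj) for mj in gbpas] for mi in gbpas]
    # Support degree: sum of the similarities to all other pieces of evidence.
    support = [sum(smm[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(support)
    # Assumption: equal weights if every support degree is 0.
    credibility = [s / total for s in support] if total > 0 else [1.0 / n] * n
    average = {}
    for weight, m in zip(credibility, gbpas):
        for focal, mass in m.items():
            average[focal] = average.get(focal, 0.0) + weight * mass
    return smm, credibility, average


# Hypothetical GBPAs generated from four attributes of one sample.
A, B = frozenset("a"), frozenset("b")
gbpas = [
    {A: 0.8, EMPTY: 0.2},
    {A: 0.1, EMPTY: 0.9},
    {B: 0.3, EMPTY: 0.7},
    {A: 0.2, EMPTY: 0.8},
]
smm, credibility, average = weighted_average_evidence(gbpas)
print(credibility)
print(average[EMPTY], average[EMPTY] > 0.5)  # Criterion 2 check
```

With these hypothetical inputs the third GBPA shares no focal element with the others, so it receives zero weight, and the weighted average puts about 0.633 on the empty set.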
Criterion 3. If the mass of $\emptyset$ in the combination of the generated pieces of evidence with the mGCR is more than 0.5, the FOD is considered incomplete. The parameter $m_{\mathrm{mGCR}}(\emptyset)$ represents the total confidence of the pieces of evidence in an incomplete FOD, because the combination rule takes all generated pieces of evidence into account to obtain a final result that identifies which set the sample belongs to. If $m_{\mathrm{mGCR}}(\emptyset)$ is large, the masses assigned to the other sets are correspondingly small, so the result supports the set $\emptyset$ and hence an incomplete FOD. Taking 0.5 as a threshold, if $m_{\mathrm{mGCR}}(\emptyset) > 0.5$, the system is judged to have an incomplete FOD.

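A Numerical Example for the Three Criteria
The proposed method can be used to identify an incomplete FOD as long as the three parameters are obtained: the number of generated GBPAs with $m_i(\emptyset) > 0.5$, the mass $\bar{m}(\emptyset)$ of $\emptyset$ in the weighted average evidence, and the mass $m_{\mathrm{mGCR}}(\emptyset)$ of $\emptyset$ in the mGCR combination. In this subsection, an illustrative example is given to show the identification result according to the three criteria.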
Steps 1 and 2. The similarity measure matrix (SMM) can be calculated with (22) and (24).

Step 3. Then the weights of the four pieces of evidence $m_1, m_2, m_3, m_4$ are calculated according to (25) and (26).

Step 4. Finally, the weighted average evidence $\bar{m}$ of the four pieces of evidence is obtained. According to Criterion 2, $\bar{m}(\emptyset) = 0.6313 > 0.5$, which shows that the FOD is still incomplete.

At last, let us use Criterion 3 to identify the completeness of $\Theta$. Combining the four pieces of evidence with the mGCR gives $m_{\mathrm{mGCR}}(\emptyset) = 1 > 0.5$. According to Criterion 3, the FOD $\Theta$ is incomplete. For clarity, these results are all shown in Table 1.

As shown in this example, evidence $m_1$ supports that the sample belongs to an element of $\Theta$ with a mass of 0.7988, while the other pieces of evidence $m_2, m_3, m_4$ support that the sample belongs to $\emptyset$, which represents the elements outside the FOD $\Theta$. If none of these pieces of evidence is disturbed, a human would intuitively consider this sample to be in an incomplete FOD. From Table 1, the number of GBPAs with $m_i(\emptyset) > 0.5$ is 3, which is more than half of the four pieces of evidence; $\bar{m}(\emptyset) = 0.6313 > 0.5$; and $m_{\mathrm{mGCR}}(\emptyset) = 1 > 0.5$. All three criteria are satisfied, so this sample is in an incomplete FOD, as the evidence implies.

Case Study
In this section, several experiments are given to demonstrate the effectiveness of the proposed method. The Iris data set, a popular data set for classification problems, is used for modeling and simulation in these experiments. In this data set, 150 samples belong to three categories, namely, setosa, versicolor, and virginica. Each category has 50 samples. Every sample in the data set has four attributes: sepal length (SL), sepal width (SW), petal length (PL), and petal width (PW). The data set is divided into two parts: the training set, which includes 90 samples randomly selected from the three categories in equal quantity, and the test set, which contains the remaining 60 samples from the three categories. In order to establish an incomplete FOD that contains only two of the three categories, only the samples belonging to those two categories are extracted from the training set to construct the training model. By using the method mentioned in [37], a triangular fuzzy number model for each attribute of the two categories is constituted according to the minimum, mean, and maximum value of each attribute in the training set. Then, for each attribute, the two triangular fuzzy numbers associated with the two categories are generated to form a membership function for that attribute. The relevant values are listed in Table 2, and the associated training models are shown in Figure 1.
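The triangular fuzzy number model described above can be sketched as follows, assuming that the minimum, mean, and maximum of an attribute serve as the left foot, peak, and right foot of the triangle. The function names and the sample values are hypothetical (they are not the entries of Table 2), and the later step that turns membership degrees into GBPAs following [38] is not shown.

```python
from statistics import mean


def fit_triangle(values):
    """Return (minimum, mean, maximum) of the training values of one attribute."""
    return min(values), mean(values), max(values)


def triangular_membership(x, low, peak, high):
    """Membership degree of x in the triangular fuzzy number (low, peak, high)."""
    if not low <= peak <= high:
        raise ValueError("expected low <= peak <= high")
    if x == peak:
        return 1.0
    if x <= low or x >= high:
        return 0.0
    if x < peak:
        return (x - low) / (peak - low)   # rising edge
    return (high - x) / (high - peak)     # falling edge


# Hypothetical training values of one attribute for one category.
low, peak, high = fit_triangle([4.3, 5.0, 5.1, 5.8])
print(low, peak, high)
print(triangular_membership(4.8, low, peak, high))
print(triangular_membership(6.2, low, peak, high))
```

A test value that falls outside every category's triangle has zero membership in all known classes, which is the kind of situation in which mass would be expected to move to the empty set.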

Mathematical Problems in Engineering
Then, based on the training models shown in Figure 1, four pieces of evidence can be generated for each sample in the test set because an Iris sample has four attributes. The method in [38] is used in this paper to generate GBPAs. In this way, the pieces of evidence (i.e., GBPAs) associated with the attributes can be obtained for each sample in the test set, as shown in Figures 2-4.
Now four experiments are carried out to verify the effectiveness of the proposed criteria for identifying an incomplete FOD. Three of them consider a single criterion, while the last one uses all three criteria simultaneously.
Experiment 2. Only consider Criterion 2. For the generated GBPAs of each sample in the test set, use the method proposed in Section 3.2 to derive the weighted average evidence $\bar{m}$. Then $\bar{m}(\emptyset)$ is obtained as well, and the completeness of FOD $\Theta$ is judged according to Criterion 2.
Experiment 3. Only consider Criterion 3. Use the mGCR to combine the generated evidence of each sample to derive $m_{\mathrm{mGCR}}$. Then $m_{\mathrm{mGCR}}(\emptyset)$ is obtained as well. According to Criterion 3, the completeness of FOD $\Theta$ can be judged.
Experiment 4. In this case, Criteria 1, 2, and 3 are considered simultaneously. The parameters of the three criteria are derived according to the method in Section 3, and whether FOD $\Theta$ is incomplete is then judged. If all three criteria are satisfied, FOD $\Theta$ is incomplete. If any one of the criteria is not satisfied, FOD $\Theta$ is complete.
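The decision rule of Experiment 4 amounts to a logical AND over the three criteria. The sketch below assumes the three parameters have already been computed; the function name `identify_fod` and its argument names are hypothetical, and the values passed in are those reported for the numerical example in Section 3.

```python
def identify_fod(count_above, n_evidence, avg_empty_mass, mgcr_empty_mass,
                 threshold=0.5):
    """Apply Criteria 1-3 jointly; the FOD is incomplete only if all three hold."""
    checks = {
        "criterion_1": count_above > n_evidence / 2,
        "criterion_2": avg_empty_mass > threshold,
        "criterion_3": mgcr_empty_mass > threshold,
    }
    verdict = "incomplete" if all(checks.values()) else "complete"
    return verdict, checks


# Parameters of the numerical example: 3 of 4 GBPAs above 0.5, 0.6313, and 1.
verdict, checks = identify_fod(3, 4, 0.6313, 1.0)
print(verdict, checks)
```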
In order to clearly show the results of these experiments, the confusion matrix [39] is used, which contains the information about the actual and identified situations. Based on the matrix, some indices, for instance, accuracy, sensitivity (also called recall), and precision, have been developed to evaluate the performance of each criterion. For each sample of the two categories in the FOD, if it does not meet the criteria, FOD $\Theta$ is identified as complete, which is correct; if it does, $\Theta$ is identified as incomplete, which is incorrect. For each sample of the remaining category, the identification is correct only when the result supports that $\Theta$ is incomplete.
As shown in the last column of Table 8, the fourth experiment achieves the highest accuracy for the identification, which shows that considering the three criteria simultaneously leads to better accuracy than considering a single criterion. As can also be found from Table 8, Criterion 2 has the best accuracy among the three individual criteria, because it takes the correlation and difference of the evidence into account.
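For reference, the three indices can be computed from a two-class confusion matrix as sketched below, treating "incomplete FOD" as the positive class. The function name and the counts are hypothetical and only chosen to match a test set of 60 samples; they are not the values of Tables 4-8.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, and precision for the positive class.

    tp: out-of-FOD samples identified as incomplete FOD
    fn: out-of-FOD samples identified as complete FOD
    fp: in-FOD samples identified as incomplete FOD
    tn: in-FOD samples identified as complete FOD
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical counts: 20 out-of-FOD and 40 in-FOD test samples.
metrics = binary_metrics(tp=18, fp=6, fn=2, tn=34)
print(metrics)
```

The indices for the complete-FOD situation follow by swapping the roles of the two classes, that is, by calling `binary_metrics(tp=34, fp=2, fn=6, tn=18)`.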
In order to give a clear comparison of the results, Figures 5-7 graphically show the recall, precision, and accuracy rates derived from the different experiments. In every experiment, the recall rate of the incomplete-FOD situation is greater than that of the complete-FOD situation, which means that samples outside the FOD are more likely to be judged correctly. The precision rate of the incomplete-FOD situation is smaller than that of the complete-FOD situation, which means that an identification of a complete FOD is more reliable than an identification of an incomplete FOD.
Figure 7 shows that Experiment 4, which considers the three criteria simultaneously, has the best accuracy. There are several reasons for this result. For example, the condition for identifying a complete FOD is weakened, since the three criteria do not need to be satisfied simultaneously. According to Table 8, the accuracy rate of Criterion 3 is lower than that of the other experiments, so this condition removes some of the influence of Criterion 3.
In summary, the FOD of a sample can be identified with a high degree of accuracy, which shows that the proposed method is effective. Moreover, if all three criteria are satisfied, the identification of the system as having an incomplete frame of discernment is correct with an accuracy of 88%.

Application
In this section, a case of motor rotor fault diagnosis is presented to demonstrate the effectiveness of the proposed method. In this case, the rotor acceleration spectrum and the average amplitude of the time-domain vibration displacement can be obtained from the sensor data. Then the kind of fault of the rotor can be judged based on the proposed method.
There are three kinds of faults for the motor rotor: imbalance, misalignment, and loose support base. We select the frequency to be at an amplitude of   are generated. As in the above experiments, four experiments are carried out that apply the criteria individually and simultaneously. The average identification results are then shown in Table 9, together with the recall, precision, and accuracy rates based on the confusion matrix.
In Table 9, the same conclusion can be observed: the fourth experiment achieves the highest accuracy for the identification. Moreover, the accuracy of each experiment is quite high, and in particular all samples outside the FOD are identified correctly in this application. The experimental results demonstrate the effectiveness of the proposed method.

Conclusions
As stressed in previous studies, a method to identify an incomplete frame of discernment has not yet been presented, although an incomplete framework is an important cause of conflict. In this paper, a new method is proposed under the framework of GET, making full use of the available information contained in the pieces of evidence generated from a sample, to identify an incomplete frame of discernment. The case study with four experiments demonstrates the effectiveness of the proposed method. In the experiments, the parameters of the three criteria have a great influence on identifying an incomplete FOD.
The proposed method can be applied to many applications, such as fault identification and infectious disease surveillance. However, an incorrect identification result may be obtained when the membership functions are inaccurate. In the future, the proposed criteria for identifying an incomplete FOD will be merged with the combination of highly conflicting pieces of evidence to obtain more reasonable combination results.

Figure 1: Training models associated with four attributes of Iris data set.

Experiment 1. Only consider Criterion 1. For each sample in the test set, count how many of its four generated GBPAs satisfy $m_i(\emptyset) > 0.5$. Then use Criterion 1 to judge whether FOD $\Theta$ is incomplete.

Figure 2: Generated GBPAs associated with different attributes for the samples belonging to category  in the test set.

Figure 4: Generated GBPAs associated with different attributes for the samples belonging to category  in the test set.

Figure 7: Accuracy rates of different situations in every experiment.

Figure 8: Training models associated with four attributes of motor rotor fault.

Figure 9: Generated GBPAs associated with different attributes for the samples belonging to fault  in the test set.

Figure 10: Generated GBPAs associated with different attributes for the samples belonging to category  in the test set.

Table 1: The criteria parameters for Example 10.

Table 2: Minimum, mean, and maximum value of each attribute in training models.

Table 3: The quantity of correct identification in the experiments sorted by categories.
Therefore, each sample either implies that FOD $\Theta$ is complete or supports that FOD $\Theta$ is incomplete. The simulation results of the four experiments are given in Table 3. According to Table 3, the identification of FOD $\Theta$ is divided into two situations: complete FOD and incomplete FOD.

Table 4: Confusion matrix for Experiment 1.
Based on the experiment results, four confusion matrices are derived, shown in Tables 4-7. The recall rate, precision rate, and overall accuracy rate can then be calculated, as shown in Table 8.

Table 8: Performance of criteria in the experiments.

Table 9: The average quantity of correct fault diagnosis in the experiments.