A Ranking Procedure by Incomplete Pairwise Comparisons Using Information Entropy and Dempster-Shafer Evidence Theory

Decision-making, as a way of discovering preference rankings, is used in many fields. However, owing to the uncertainty in group decision-making, how to rank alternatives from incomplete pairwise comparisons remains an open issue. In this paper, an improved method is proposed for ranking alternatives from incomplete pairwise comparisons using Dempster-Shafer evidence theory and information entropy. First, taking into account the probability assignment of the chosen preferences, the comparison of alternatives by each group of experts is addressed. Experiments show that the information entropy of the data itself can objectively determine the weight of each group's choices. Numerical examples in group decision-making environments are used to test the effectiveness of the proposed method. Moreover, the divergence of the ranking mechanism is briefly analyzed in the conclusion section.


Introduction
The growing use of decision-making in various fields requires computational methods for discovering preference rankings. Methods for finding and predicting preferences in a reasonable way are among the most active topics in recent research, with applications in information systems [1], control systems [2], social choice [3,4], and so on.
The term "pairwise comparison" generally refers to any process of comparing entities in pairs to judge which entity of each pair is preferred or has a greater amount of some quantitative property. The prominent psychometrician Thurstone first introduced a scientific approach to using pairwise comparisons for measurement in 1927, which he called the law of comparative judgment [5]. Thurstone demonstrated that the method can be used to order items along a dimension such as preference or importance using an interval-type scale. The Bradley-Terry-Luce (BTL) model was later applied to pairwise comparison data to scale preferences [6,7]; it is identical to Thurstone's model if the simple logistic function is used. Thurstone used the normal distribution in applications of his model, in which the method of pairwise comparisons served as an approach to measuring the perceived intensity of physical stimuli, attitudes, preferences, choices, and values. He also studied the implications of his theory for opinion polls and political voting [8]. If an individual or organization expresses a preference between two mutually distinct alternatives, this preference can be expressed as a pairwise comparison. If the pairwise comparisons are transitive, then the pairwise comparisons for a list of alternatives $(a_1, a_2, a_3, \ldots, a_{n-1}, a_n)$ can take the form $a_i \succeq a_j$, which means that alternative $a_i$ is preferred to $a_j$, where $i < j$. Alternatively, they can be expressed as $a_i \succ a_j$, which means that alternative $a_i$ is strictly preferred to $a_j$, where $i < j$.
Although pairwise comparison is a well-known technique in decision-making, in some cases we face the problem of incomplete preference judgments. For instance, if the number of alternatives is large, the experts may not compare every pair one by one. To overcome this problem, a decision support system (DSS) based on the fuzzy information axiom (FIA) was developed in [9]. However, the calculation procedure of the information axiom is not only cumbersome but also difficult for decision makers, and it is hard to apply to incomplete pairwise problems. To reduce the complexity of calculation and of the preference-eliciting process, [10] points out that some comparisons between alternatives can be skipped, and a method to derive the priorities of alternatives from an incomplete $n \times n$ pairwise comparison matrix is proposed in [11]. Furthermore, [12] introduces a fuzzy multiexpert multicriteria decision-making method based on possibility measures to handle the difficulty of the conflict aggregation process. The well-known eigenvector and geometric mean methods for ranking from pairwise comparisons are also introduced in [11,13]. However, the above references focus mainly on the computational complexity of complete comparisons. Some methods have been proposed specifically for the incomplete comparison problem. Shiraishi et al. proposed a heuristic method based on a property of a coefficient of the characteristic polynomial of the pairwise comparison matrix [14]. However, the solution depends mainly on this polynomial, which has infinitely many solutions, so it is difficult to select the best candidate. In [15], a least squares type method is proposed that calculates the priority vector directly as the solution of a constrained optimization problem instead of first estimating the missing entries of the pairwise comparison matrix.
In [16], a new centroid-index method for ranking fuzzy numbers in decision-making was proposed by Yong and Qi. Yong also proposed, in [17], a method for ranking decision alternatives based on preference judgments made over a number of criteria in multiple criteria decision-making (MCDM) problems. In fuzzy group decision-making, Liu et al. [18] proposed an optimal consensus method that takes into account the limit of each expert's compromise in the process of reaching group consensus. Similarly, to calculate the priority vector of incomplete preferences, Xu proposed a goal programming approach for fuzzy preference relations in [19] and then developed a method for incomplete fuzzy preference relations [20]. Subsequently, eigenvector and least squares methods were proposed for incomplete fuzzy preference relations in [21,22]. Fuzzy decision-making can not only reduce the complexity of calculation but can also be used for incomplete pairwise comparisons; however, a small variation in the preferences can change the result substantially. Moreover, Hüllermeier and Fürnkranz proposed a method for learning valued preference structures using a natural extension of so-called pairwise classification, which may have potential applications in fuzzy classification [23,24]. Hüllermeier's method uses a weighted voting procedure, but the problem is that this voting procedure can fall into an endless cycle.
In this paper, a discount rate is derived from the information entropy, which reflects the certainty or uncertainty of the preference. An improved method is then proposed to calculate the probability assignment of preferences using the index introduced in [25] together with this discount rate. Compared with existing methods, the incomplete comparison results of the proposed method depend entirely on the information itself rather than on human factors. The paper is organized as follows. Some fundamental principles of Dempster-Shafer evidence theory and measures for quantifying its uncertainty are given in Section 2. Then, the ranking procedures based on a single group of experts and on independent groups of experts are introduced in Section 3. In that section, the Dempster-Shafer evidence theory model and the imprecise Dirichlet model, which give a basic ranking, are discussed. Next, the proposed enhanced method for the case in which comparisons are supplied by independent groups of experts is presented in Section 4. There, a weight derived from the entropy of the BPA is used in Dempster-Shafer evidence theory to improve the existing ranking method described in the previous section. Finally, the conclusions are given in Section 5.

Dempster-Shafer Evidence Theory.
Dempster-Shafer evidence theory comprises the basic probability assignment function, the belief and plausibility functions, and Dempster's rule of combination [26,27]. It is flexible enough for many applied problems.
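Let $\Omega$ be the frame of discernment. A basic probability assignment (BPA) is a function $m: 2^{\Omega} \to [0, 1]$ such that $m(\emptyset) = 0$ and $\sum_{A \subseteq \Omega} m(A) = 1$. For $\forall A \subseteq \Omega$, $m(A)$ is the basic probability of $A$. If $A \subset \Omega$ and $A \neq \Omega$, then $m(A)$ is the exact degree of trust in $A$; if $A = \Omega$, then $m(A)$ is the degree of trust that cannot be allocated more precisely. The belief and plausibility functions are $\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B)$ and $\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B)$. $\mathrm{Bel}(A)$ and $\mathrm{Pl}(A)$ are, respectively, the lower and upper limit functions of the support for $A$, and $\mathrm{Pl}(A) \geq \mathrm{Bel}(A)$.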
Even for the same evidence, the probability assignments may differ when they come from different sources. The orthogonal sum is then used to combine these functions in Dempster-Shafer evidence theory. Assume $m_1, m_2, \ldots, m_n$ are basic probability assignment functions on $2^{\Omega}$; their orthogonal sum $m = m_1 \oplus m_2 \oplus \cdots \oplus m_n$ is
$$m(\emptyset) = 0, \qquad m(A) = \frac{1}{1 - K} \sum_{\cap_i A_i = A} \prod_{i=1}^{n} m_i(A_i) \quad (A \neq \emptyset),$$
where $K = \sum_{\cap_i A_i = \emptyset} \prod_{i=1}^{n} m_i(A_i)$ is the conflict weight.
Several of the basic algorithms of Dempster-Shafer evidence theory are as follows.
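The belief, plausibility, and combination formulas above can be checked numerically. The following Python sketch stores a BPA as a dict from frozensets (focal elements) to masses; the function names `bel`, `pl`, and `dempster` and the toy frame $\{h_1, h_2, h_3\}$ are illustrative choices of ours, not part of the original method.

```python
# Minimal sketch: a BPA is a dict {frozenset(focal element): mass}.
# Names and the toy frame are illustrative only.


def bel(m, a):
    """Bel(A): total mass of focal elements B with B a subset of A."""
    return sum(v for b, v in m.items() if b <= a)


def pl(m, a):
    """Pl(A): total mass of focal elements B that intersect A."""
    return sum(v for b, v in m.items() if b & a)


def dempster(m1, m2):
    """Dempster's rule for two BPAs; returns (combined BPA, conflict weight K)."""
    raw, conflict = {}, 0.0
    for b, v1 in m1.items():
        for c, v2 in m2.items():
            inter = b & c
            if inter:
                raw[inter] = raw.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2  # product mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the BPAs cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in raw.items()}, conflict


OMEGA = frozenset({"h1", "h2", "h3"})
H1, H2 = frozenset({"h1"}), frozenset({"h2"})

m1 = {H1: 0.6, OMEGA: 0.4}  # source 1 mainly supports h1
m2 = {H2: 0.5, OMEGA: 0.5}  # source 2 mainly supports h2
m, K = dempster(m1, m2)

print(f"K = {K:.2f}")  # H1 and H2 are disjoint, so K = 0.6 * 0.5 = 0.30
for name, a in [("h1", H1), ("h2", H2)]:
    print(name, round(bel(m, a), 4), round(pl(m, a), 4))
```

For more than two sources the rule can be applied pairwise, since Dempster's rule is commutative and associative.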
(1) Assume that the frame of discernment of some field is $\Omega = \{h_1, h_2, \ldots, h_n\}$ and that propositions $A, B, \ldots$ are subsets of $\Omega$. An inference rule then takes the form "If $E$ then $H$, CF", where $E$ and $H$ are logical combinations of propositions and CF is the certainty factor, which measures the credibility of the rule; for any proposition, CF is subject to the credibility constraints of the model.
(2) Evidence description: assume $m$ is the basic probability assignment function defined on $2^{\Omega}$. During calculation it must satisfy, among other conditions, $m(A) = 0$ whenever $A \subset \Omega$ and $|A| > 1$ or $|A| = 0$, where $|A|$ denotes the number of elements of proposition $A$.
(3) Inexact inference model: (a) suppose $A$ is a sub-proposition of the premise of a rule; given evidence $E$, the matching degree between proposition $A$ and evidence $E$ is defined first; (b) the certainty of a sub-proposition of the premise is then defined accordingly.

Quantifying the Uncertainty in the Dempster-Shafer Evidence Theory
The uncertainty considered in evidence theory includes both the uncertainty associated with randomness and the uncertainty associated with granularity. The measure of the granular uncertainty associated with a subset is the specificity measure introduced in [28].
Definition 4. Let $A$ be a nonempty subset of $X$. A measure of the specificity of $A$, $\mathrm{Sp}(A)$, is defined, using Card to denote the cardinality of a set, as $\mathrm{Sp}(A) = 1/\mathrm{Card}(A)$. It essentially measures the degree to which $A$ has exactly one element. The larger the specificity, the smaller the uncertainty.
In [29], this measure was extended to the case of a basic probability assignment function.
Definition 5. Assume $m$ has focal elements $A_j$, $j = 1$ to $q$. Then the specificity of $m$ is defined as the expected specificity of its focal elements, $\mathrm{Sp}(m) = \sum_{j=1}^{q} m(A_j)/\mathrm{Card}(A_j)$.
Note that $\mathrm{Sp}(m) \in [0, 1]$, and the larger $\mathrm{Sp}(m)$, the smaller the uncertainty. The specificity is smallest when $m$ is the vacuous belief function, $m(X) = 1$; in this case $\mathrm{Sp}(m) = 1/n$, where $n = \mathrm{Card}(X)$.
Klir [30] and Ayyub [31] introduced a related measure called nonspecificity.
Definition 6. If $m$ is a belief function with focal elements $A_j$, $j = 1$ to $q$, then its nonspecificity is defined as $N(m) = \sum_{j=1}^{q} m(A_j) \ln |A_j|$, where $|A_j| = \mathrm{Card}(A_j)$.
Clearly, $N(m)$ takes its largest value, $\ln(n)$, for the vacuous belief function, where $n$ is the cardinality of $X$. It takes its smallest value when $m$ is Bayesian, that is, $|A_j| = 1$ for every $j$; in this case $\ln(|A_j|) = 0$ and $N(m) = 0$.
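As an illustrative sketch, Definitions 5 and 6 can be computed directly from a BPA stored as a dict from frozensets to masses. The helper names and the three-element frame are hypothetical; the two examples are the extreme cases discussed above, the vacuous and the Bayesian belief functions.

```python
import math


def specificity(m):
    """Expected specificity (Definition 5): sum of m(A) / Card(A)."""
    return sum(v / len(a) for a, v in m.items())


def nonspecificity(m):
    """Nonspecificity (Definition 6): sum of m(A) * ln|A|."""
    return sum(v * math.log(len(a)) for a, v in m.items())


X = frozenset({"x1", "x2", "x3"})
vacuous = {X: 1.0}  # all mass on the whole frame: total ignorance
bayesian = {frozenset({"x1"}): 0.5, frozenset({"x2"}): 0.3, frozenset({"x3"}): 0.2}

for name, m in [("vacuous", vacuous), ("bayesian", bayesian)]:
    print(name, round(specificity(m), 4), round(nonspecificity(m), 4))
```

On a three-element frame the vacuous BPA gives the smallest specificity (1/3) and the largest nonspecificity ($\ln 3$), while any Bayesian BPA gives specificity 1 and nonspecificity 0.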
Yager [32] made this definition of nonspecificity cointensive with the preceding definition of specificity by normalization and negation; hence $\mathrm{Sp}_N(m) = 1 - N(m)/\ln(n)$. The standard measure of uncertainty associated with a probability distribution is the Shannon entropy.
Definition 7. If $X = \{x_1, \ldots, x_n\}$ and $p$ is a probability distribution on $X$ such that $p_i$ is the probability of $x_i$, then the Shannon entropy of $p$ is $H(p) = -\sum_{i=1}^{n} p_i \ln p_i$. Any extension of this to belief functions must reduce to the Shannon entropy when the belief structure is Bayesian. Yager [29] suggested an extension of the Shannon entropy to belief structures using the measure of dissonance.
Definition 8. If $m$ has focal elements $A_j$, then the extension of the Shannon entropy to belief structures is $E_Y(m) = -\sum_{j} m(A_j) \ln \mathrm{Pl}(A_j)$. For a given set of weights, the smaller the focal elements, the larger the entropy; in particular, the smallest a nonempty focal element can be is a single element. The larger the entropy, the larger the uncertainty. For a Bayesian belief structure, the entropy is smallest (equal to 0) when all the mass is on one element, because a single element represents completely certain information about the preference, and it is largest when all elements have equal probability. From this we conclude that the fewer the elements, the smaller the entropy, and the sparser the probability assignment over the elements, the smaller the entropy. Figure 1 shows the relationship between the assignment of elements and the entropy: the more balanced the assignment, the higher the entropy.
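A small sketch of Definitions 7 and 8, assuming the standard form of Yager's dissonance-based entropy $E_Y(m) = -\sum_j m(A_j) \ln \mathrm{Pl}(A_j)$; the helper names and example BPAs are hypothetical. It checks that the extension reduces to the Shannon entropy for a Bayesian BPA, and that larger, overlapping focal elements lower the entropy.

```python
import math


def shannon(p):
    """Shannon entropy H(p) = -sum p_i ln p_i, with 0 ln 0 taken as 0."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


def pl(m, a):
    """Plausibility: total mass of focal elements intersecting A."""
    return sum(v for b, v in m.items() if b & a)


def yager_entropy(m):
    """Dissonance-based entropy E_Y(m) = -sum m(A) ln Pl(A)."""
    return sum(-v * math.log(pl(m, a)) for a, v in m.items() if v > 0)


x1, x2, x3 = (frozenset({x}) for x in ("x1", "x2", "x3"))

uniform = {x1: 1 / 3, x2: 1 / 3, x3: 1 / 3}  # Bayesian, maximally balanced
certain = {x1: 1.0}                         # Bayesian, a single element
split = {x1: 0.5, x2: 0.5}                  # small, disjoint focal elements
overlap = {x1 | x2: 0.5, x2 | x3: 0.5}      # larger, overlapping focal elements

print(round(yager_entropy(uniform), 4), round(shannon([1 / 3] * 3), 4))
for name, m in [("certain", certain), ("split", split), ("overlap", overlap)]:
    print(name, round(yager_entropy(m), 4))
```

The `overlap` case has zero entropy because every focal element intersects every other one, so each plausibility is 1; this illustrates that $E_Y$ measures conflict among focal elements rather than their size.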

Ranking Procedure by Incomplete Pairwise Comparison
Suppose there is a set of alternatives $\Lambda = \{a_1, a_2, \ldots, a_n\}$ with $n$ elements. Then we can form $2^n - 1$ nonempty subsets of alternatives, $\Omega = \{A_1, A_2, \ldots, A_{2^n - 1}\}$. A pairwise comparison means that an expert chooses some subsets of alternatives from $\Omega$ and compares them pairwise. If pairwise comparisons have been made for all the subsets in $\Omega$, we call it a complete pairwise comparison; otherwise, we call it an incomplete pairwise comparison. For a complete pairwise comparison, we obtain a pairwise comparison matrix like Table 1, which uses two alternatives as an example. It is assumed that experts only compare subsets of alternatives, without providing preference values or weights of preference. If an expert chooses a comparison preference, the value 1 is added to the corresponding cell of the pairwise comparison matrix. For example, $n_{12}$ is the number of experts who choose the comparison preference $\{a_1\} \succeq \{a_2\}$. Many references [13,23,33] have studied the complete pairwise comparison matrix. However, in most situations with many alternatives, the experts cannot give their preferences one by one; they choose only a limited number of preferences among all the alternatives. Thus, we obtain an incomplete pairwise comparison matrix like Table 2.
The Ranking Method with One Group of Preference
Following Utkin's work [25], we assume that a single group of five experts chooses a set of preferences. Using the Dempster-Shafer evidence theory described in the previous section, we obtain the belief and plausibility functions of the alternatives, from which the best ranking of the alternatives can be concluded.
Table 3: The subsets of three alternatives and their short notations.
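To make the notation concrete, the sketch below enumerates the $2^n - 1$ nonempty subsets of $\Lambda$ and builds one group's BPA from expert counts as $m(A_{ij}) = n_{ij}/N$, where $N$ is the group size. We assume, for illustration, that Table 3 numbers subsets by size and then lexicographically ($\{a_1\} = 1$, ..., $\{a_1, a_2\} = 4$, ..., $\Lambda = 7$), which is consistent with the labels $A_{16}$, $A_{43}$, and $A_{77}$ used in the text. The counts mirror the 2:3 split of the first group used in the later examples; all helper names are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(alternatives):
    """All 2^n - 1 nonempty subsets, ordered by size, then lexicographically."""
    out = []
    for k in range(1, len(alternatives) + 1):
        out.extend(frozenset(c) for c in combinations(alternatives, k))
    return out


ALTS = ["a1", "a2", "a3"]
SUBSETS = nonempty_subsets(ALTS)  # SUBSETS[i - 1] holds subset number i


def preference(i, j):
    """A_ij: subset number i is preferred to subset number j (1-based)."""
    return (SUBSETS[i - 1], SUBSETS[j - 1])


def group_bpa(counts):
    """BPA of one group: m(A_ij) = n_ij / N, with N the number of judgements."""
    total = sum(counts.values())
    return {pref: n / total for pref, n in counts.items()}


for idx, subset in enumerate(SUBSETS, start=1):
    print(idx, sorted(subset))

# Two experts choose A_16 = {a1} >= {a2, a3}; three choose A_43 = {a1, a2} >= {a3}.
group1 = group_bpa({preference(1, 6): 2, preference(4, 3): 3})
for (lhs, rhs), mass in group1.items():
    print(sorted(lhs), ">=", sorted(rhs), mass)
```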

The Ranking Method with Two Independent Groups of Preference
In practice, to obtain a more consensual preference result, more than one independent group of experts is usually asked to give their preferences over the alternatives. The independent information can then be combined using the well-established Dempster-Shafer rule of combination.
Without loss of generality, suppose there are two groups of experts, and denote the preferences obtained from the first and second groups of experts by the superscripts (1) and (2), respectively. The combination rule is the one introduced in the preliminaries. Now assume that the first group (with five experts) provides the preferences $\{a_1\} \succeq \{a_2, a_3\} = A^{(1)}_{16}$ (two experts) and $\{a_1, a_2\} \succeq \{a_3\} = A^{(1)}_{43}$ (three experts), and that the second group (with ten experts) provides its own judgments, including $A^{(2)}_{13}$.
Then we obtain the BPAs of all preferences, listed in Table 4. From the table, we first calculate the conflict weight; then the probability assignments of the nonzero combined preferences are calculated. After that, we obtain the belief and plausibility functions of alternatives $a_1, a_2, a_3$, that is, of the preferences $\{a_i\} \succeq \Lambda$, $i = 1, 2, 3$. The belief and plausibility functions of $\{a_2\} \succeq \Lambda$ and $\{a_3\} \succeq \Lambda$ are 0. Hence the only conclusion is that the "best" alternative is $a_1$; no information about the preference between $a_2$ and $a_3$ can be obtained. In [25], the author attributed this situation to two causes: the small number of expert judgments and the assumption that the sources of evidence are absolutely reliable. He therefore improved the ranking method with the imprecise Dirichlet model.

The Ranking Method with Imprecise Dirichlet Model
As described at the end of the previous subsection, the main difficulty of the ranking method above is the possibly small number of experts. To overcome this difficulty, the imprecise Dirichlet model (IDM) introduced by Walley [34] was applied to extend the belief and plausibility functions so that a lack of sufficient statistical data can be taken into account [35,36]. With this method, the extended belief and plausibility functions are
$$\mathrm{Bel}^*(A) = \frac{N \cdot \mathrm{Bel}(A)}{N + s}, \qquad \mathrm{Pl}^*(A) = \frac{N \cdot \mathrm{Pl}(A) + s}{N + s}.$$
Here the hyperparameter $s$ determines how quickly the upper and lower probabilities of events converge as statistical data accumulate, and it should be taken to be 1 or 2; $N$ is the number of expert judgments. The main advantage of the IDM is that it produces cautious inferences. In particular, if $N = 0$, then $\mathrm{Bel}^*(A) = 0$ and $\mathrm{Pl}^*(A) = 1$. As $N \to \infty$, for any $A$, $\mathrm{Bel}^*(A) \to \mathrm{Bel}(A)$ and $\mathrm{Pl}^*(A) \to \mathrm{Pl}(A)$. If we denote $\gamma = N/(N + s)$, then $m^*(A) = \gamma \cdot m(A)$. This expression shows that $\gamma$ is a discount rate characterizing the reliability of a source of evidence, and that it depends on the number of estimates $N$. Because the total probability assignment must satisfy $\sum m^*(A) = 1$, the remaining probability $1 - \gamma$ is assigned to $m^*(A_{77})$; this means that the experts do not know which alternative is better than the others, which is expressed by the preference $\{a_1, a_2, a_3\} \succeq \{a_1, a_2, a_3\}$. Using the discount rate with $s = 1$, we get $\gamma_1 = 5/6 \simeq 0.83$ for the first group and $\gamma_2 = 10/11 \simeq 0.91$ for the second group. Hence the preference intersections for the Dempster-Shafer combination rule can be rewritten as in Table 5.
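The modified probability assignments and conflict weight are computed from these discounted BPAs, and the combined BPAs in Table 5 and the remaining belief and plausibility functions are obtained in the same way. It can be seen from the results that the "best" ranking is $a_1 \succeq a_3 \succeq a_2$.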
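The IDM discounting step can be sketched as follows, assuming each group's BPA is keyed by preference labels such as `"A16"` (the label strings and function names are hypothetical). The function scales every mass by $\gamma = N/(N+s)$ and moves the remaining $1 - \gamma$ to the vacuous preference $A_{77}$; with $s = 1$ it reproduces $\gamma_1 = 5/6$ and $\gamma_2 = 10/11$ from the text.

```python
def idm_gamma(n_judgements, s=1.0):
    """IDM discount rate gamma = N / (N + s)."""
    return n_judgements / (n_judgements + s)


def idm_bel_pl(bel, pl, n_judgements, s=1.0):
    """Extended (cautious) belief and plausibility under the IDM."""
    n = n_judgements
    return n * bel / (n + s), (n * pl + s) / (n + s)


def idm_discount(m, n_judgements, vacuous="A77", s=1.0):
    """m*(A) = gamma * m(A); the leftover 1 - gamma goes to the vacuous preference."""
    gamma = idm_gamma(n_judgements, s)
    out = {a: gamma * v for a, v in m.items()}
    out[vacuous] = out.get(vacuous, 0.0) + (1.0 - gamma)
    return out


group1 = {"A16": 0.4, "A43": 0.6}  # 2 and 3 of the 5 experts
m1 = idm_discount(group1, 5)

print(round(idm_gamma(5), 2), round(idm_gamma(10), 2))
print({k: round(v, 4) for k, v in m1.items()})
print(idm_bel_pl(0.7, 0.9, 0))  # no data at all: (0.0, 1.0)
```

Note that `idm_discount` only looks at $N$, not at how the masses are distributed; this is exactly the dependence questioned in the next section.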

Improved Method and Numerical Analysis
The main advantage of the IDM method is that it allows us to deal with comparisons of arbitrary groups of alternatives. It makes it possible to use the framework of Dempster-Shafer evidence theory to compute the belief and plausibility functions of alternatives or rankings, and it provides a way to make cautious decisions when the number of expert estimates is rather small. However, the method depends excessively on the number of experts rather than on the data itself: if we change the number of experts, we may obtain a different result. For instance, assume the first group has 50 experts, of whom 20 choose the preference $\{a_1\} \succeq \{a_2, a_3\} = A^{(1)}_{16}$ and 30 choose the preference $\{a_1, a_2\} \succeq \{a_3\} = A^{(1)}_{43}$. The probability assignment of the first group is then unchanged; the only change is that its discount rate rises from 0.83 to $\gamma_1 = 50/51 \simeq 0.98$. Recomputing the probability assignments, the conflict weight, and the combined BPAs in Table 5 in the same way, we find that the "best" ranking is $a_1 \succeq a_3 \succeq a_2$ by pessimistic decision-making (ranking by belief) and $a_1 \succeq a_2 \succeq a_3$ by optimistic decision-making (ranking by plausibility). Compared with the result in the previous subsection, changing only the number of experts leads to a conflicting result. We think the problem is that a discount rate that depends on the number of experts is a subjective parameter: it does not reflect the probability assignment itself, so different results are obtained from the same probability assignment of the incomplete pairwise comparison.
To overcome this problem, we introduce an improved method that obtains the discount rate from the entropy described in the Preliminaries section. Because entropy measures the uncertainty of the data, it can be used to extend the belief and plausibility functions by reflecting how sparse the probability assignment is and how few subsets are involved in each group's pairwise comparisons. Specifically, if the preference has the probability assignment function $m$ with focal elements $A_j$, we define the entropy of the preference as
$$E(m) = -\sum_{j=1}^{M} m(A_j) \ln m(A_j),$$
where $M = 2^n - 1$ and $n$ is the number of alternatives. Each group of experts gives its preferences over the alternatives by pairwise comparison, and we then calculate the entropy of each group. The more uncertain the probability assignment function, the greater the entropy and the smaller the discount rate of the preferences given by that group of experts; we therefore take $\alpha = 1/E(m)$ as the discount rate. This gives a sequence of discount rates, which is normalized by the formula $y = \arctan(x) \cdot 2/\pi$, where $x$ is the input before normalization (the discount rate) and $y$ is the output after normalization. Noting that $y$ has the dimension of $w^2$, we define the weight as $w = \sqrt{y}$. This gives the weight of each group's probability assignment in the incomplete pairwise comparison matrix. By convention, if $m(A_j) = 0$, we define $m(A_j) \ln m(A_j) = 0$. The flowchart of the improved method, which uses entropy to convert the conflict factor, is shown in Figure 2.
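The weighting procedure can be sketched in Python as follows. This is an illustration under stated assumptions: we read the formulas as $E(m) = -\sum m \ln m$, $\alpha = 1/E(m)$, $y = \arctan(\alpha) \cdot 2/\pi$, and $w = \sqrt{y}$, and we treat a zero-entropy BPA (a single focal element) as the limit $\alpha \to \infty$, that is, $w = 1$. The function names are hypothetical. The example contrasts the first group with 5 and with 50 experts: the entropy weight is identical because the proportions are identical, whereas the IDM discount rate changes.

```python
import math


def preference_entropy(m):
    """E(m) = -sum m(A) ln m(A), with m ln m taken as 0 when m = 0."""
    return sum(-v * math.log(v) for v in m.values() if v > 0)


def entropy_weight(m):
    """Weight w = sqrt(arctan(1 / E(m)) * 2 / pi) for one group's BPA."""
    e = preference_entropy(m)
    alpha = math.inf if e == 0 else 1.0 / e  # discount rate alpha = 1 / E(m)
    y = math.atan(alpha) * 2.0 / math.pi     # arctan normalization into [0, 1]
    return math.sqrt(y)


def bpa_from_counts(counts):
    """m(A_ij) = n_ij / N for one group of experts."""
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


small = bpa_from_counts({"A16": 2, "A43": 3})    # 5 experts
large = bpa_from_counts({"A16": 20, "A43": 30})  # 50 experts, same proportions

print(round(entropy_weight(small), 4), round(entropy_weight(large), 4))
print(round(5 / 6, 4), round(50 / 51, 4))  # IDM discount rates, for comparison
```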
Applying this procedure to the example, we first obtain the discount rates of the two groups, then the normalized discount rates, and finally the weight of each group. Next, we recompute the probability assignments, the conflict weight, and the combined BPAs in Table 5, and obtain the other belief and plausibility functions in the same way. From the result of the pairwise comparison, we conclude that the "best" ranking is $a_1 \succeq a_3 \succeq a_2$ by pessimistic decision-making and $a_1 \succeq a_2 \succeq a_3$ by optimistic decision-making. Because the method does not depend on the number of experts, the result does not change as the number of experts increases, provided the proportions of their choices stay the same.
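An end-to-end sketch of the improved procedure is given below. It rests on several assumptions that are ours, not the paper's: each group's BPA is expressed directly over subsets of $\Lambda$ believed to contain the best alternative (so $\{a_1\} \succeq \{a_2, a_3\}$ becomes $\{a_1\}$ and $\{a_1, a_2\} \succeq \{a_3\}$ becomes $\{a_1, a_2\}$); the entropy weight $w$ discounts a BPA in the same way $\gamma$ does in the IDM; and the second group's masses are hypothetical numbers chosen for illustration, not the data behind Table 5. Ranking by Bel corresponds to pessimistic decision-making and ranking by Pl to optimistic decision-making.

```python
import math

LAMBDA = frozenset({"a1", "a2", "a3"})


def fs(*xs):
    """Shorthand for a frozenset of alternatives."""
    return frozenset(xs)


def entropy_weight(m):
    """w = sqrt(arctan(1 / E(m)) * 2 / pi), E(m) = -sum m ln m (assumed reading)."""
    e = sum(-v * math.log(v) for v in m.values() if v > 0)
    alpha = math.inf if e == 0 else 1.0 / e
    return math.sqrt(math.atan(alpha) * 2.0 / math.pi)


def discount(m, w):
    """m*(A) = w * m(A); the remaining 1 - w goes to Lambda (total ignorance)."""
    out = {a: w * v for a, v in m.items()}
    out[LAMBDA] = out.get(LAMBDA, 0.0) + (1.0 - w)
    return out


def dempster(m1, m2):
    """Dempster's rule for two BPAs; returns (combined BPA, conflict K)."""
    raw, k = {}, 0.0
    for b, v1 in m1.items():
        for c, v2 in m2.items():
            inter = b & c
            if inter:
                raw[inter] = raw.get(inter, 0.0) + v1 * v2
            else:
                k += v1 * v2
    return {a: v / (1.0 - k) for a, v in raw.items()}, k


def bel(m, a):
    return sum(v for b, v in m.items() if b <= a)


def pl(m, a):
    return sum(v for b, v in m.items() if b & a)


group1 = {fs("a1"): 0.4, fs("a1", "a2"): 0.6}                  # 2 and 3 of 5 experts
group2 = {fs("a1"): 0.3, fs("a1", "a3"): 0.4, fs("a3"): 0.3}  # hypothetical masses

m1 = discount(group1, entropy_weight(group1))
m2 = discount(group2, entropy_weight(group2))
combined, conflict = dempster(m1, m2)

alts = sorted(LAMBDA)
pessimistic = sorted(alts, key=lambda x: bel(combined, fs(x)), reverse=True)
optimistic = sorted(alts, key=lambda x: pl(combined, fs(x)), reverse=True)
print("conflict K =", round(conflict, 4))
print("pessimistic (by Bel):", pessimistic)
print("optimistic  (by Pl): ", optimistic)
```

With these made-up inputs the sketch ranks $a_1$ first under both attitudes and orders $a_2$ and $a_3$ differently under Bel and Pl, the same kind of divergence discussed in the text; it should not be read as reproducing the paper's numbers.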
Moreover, we extend the analysis to lower and higher levels of conflict among the experts' choices. We consider two situations, one with lower conflict and one with higher conflict, and evaluate the results for each.
For the lower-conflict condition, we assume that the first group (with five experts) and the second group (with ten experts) provide the preferences whose BPAs are listed in Table 6. The conflict factors are 0.287 for the IDM method and 0.2101 for the improved method, respectively, and both methods give the ranking $a_1 \succeq a_2 \succeq a_3$ under both pessimistic and optimistic decision-making.
If we increase the number of experts in the first group to 50, where 20 experts choose $\{a_1\} \succeq \{a_2, a_3\}$ and the remaining experts choose $\{a_1, a_2\} \succeq \{a_3\}$, the conflict factor shifts to 0.3091 for the IDM method and stays at 0.2101 for the improved method, and the ranking is the same as before. This shows that both the IDM method and the improved method remain consistent when the conflict among the alternatives is low.
For the higher-conflict condition, we assume that the first group (with five experts, two of whom give the first preference, $n_{11} = 2$) and the second group (with ten experts, whose options include $A^{(2)}_{21}$) provide the preferences whose BPAs are listed in Table 7.
The conflict factors are 0.7134 for the IDM method and 0.4424 for the improved method, respectively, and both methods give the ranking $a_1 \succeq a_2 \succeq a_3$ under both pessimistic and optimistic decision-making.
In the same way, we increase the number of experts in the first group to 50, with 20 experts choosing $\{a_1\} \succeq \{a_2, a_3\}$ and 30 choosing $\{a_1, a_2\} \succeq \{a_3\}$. The conflict factor shifts to 0.6042 for the IDM method and stays at 0.4424 for the improved method. The IDM method now gives $a_1 \succeq a_2 \succeq a_3$ by pessimistic decision-making and $a_1 \succeq a_3 \succeq a_2$ by optimistic decision-making, whereas the improved method gives the same result as before. This indicates that the IDM method does not remain reliable in the higher-conflict situation, whereas the improved method remains effective and reliable in both lower- and higher-conflict situations. The comparison results are given in Table 8.
Analyzing the result of the improved method in the previous section, there is no doubt that the "best" alternative is $a_1$; the open question is the ranking between $a_2$ and $a_3$. Moreover, 60% of the experts in the first group choose the preference $\{a_1, a_2\} \succeq \{a_3\}$, and 70% of the experts in the second group choose the preferences $\{a_1, a_3\} \succeq \{a_2\}$ and $\{a_3\} \succeq \{a_1, a_2\}$, which carry information about $a_2$ and $a_3$. So the ranking between $a_2$ and $a_3$ depends on the weights of the rankings chosen by the first and second groups of experts. Because the weight in the imprecise Dirichlet method relies heavily on the number of experts, the weight of the second group with ten experts is greater than that of the first group with five experts. Thus the result favors the preferences $\{a_1, a_3\} \succeq \{a_2\}$ and $\{a_3\} \succeq \{a_1, a_2\}$.

Conclusion
Unlike the imprecise Dirichlet method, the proposed improved method handles incomplete pairwise comparisons for any groups of alternatives and experts. It assigns probabilities to the belief and/or plausibility of alternatives or rankings using a weighted Dempster-Shafer evidence method. Moreover, to treat different groups of experts fairly and objectively, the proposed method uses entropy, which reflects the value of information, to calculate the weights. The weights are therefore determined by the initial data, that is, the probability assignment itself, rather than by the number of experts. The method takes into account the number of elements and the sparsity of the probability assignment, and thus pays more attention to how the data are structured. Finally, a numerical analysis illustrates the method and shows how it differs from the imprecise Dirichlet method.