Kappa coefficients are standard tools for summarizing the information in cross-classifications of two categorical variables with identical categories, here called agreement tables. When two categories are combined the kappa value usually either increases or decreases. There is, however, a class of agreement tables for which the value of Cohen’s kappa remains constant when two categories are combined. It is shown that for this class of tables all special cases of symmetric kappa coincide and that the value of symmetric kappa is not affected by any partitioning of the categories.
1. Introduction
In behavioral and biomedical science researchers are often interested in measuring the intensity of a behavior or a disease. Examples are psychologists who assess how anxious a speech-anxious subject appears while giving a talk, pathologists who rate the severity of lesions from scans, or competing diagnostic devices that classify the extent of a disease in patients into categories. These phenomena are typically classified using a categorical rating system, for example, with categories (A) slight, (B) moderate, and (C) extreme. Because ratings usually entail a certain degree of subjective judgment, researchers frequently want to assess the reliability of the categorical rating system that is used. One way to do this is to have two observers independently rate the same set of subjects. The reliability of the rating system can then be assessed by analyzing the agreement between the observers. High agreement between the ratings can be seen as a good indication of consensus in the diagnosis and of interchangeability of the ratings of the observers.
Various statistical methodologies have been developed for analyzing agreement on a categorical rating system [1, 2]. For instance, loglinear models can be used for studying the patterns of agreement and the sources of disagreement [3, 4]. However, in practice researchers often want to express the agreement between the raters in a single number. In this context, the standard tools for summarizing agreement between observers are Cohen’s kappa in the case of nominal categories [5–7] and weighted kappa in the case of ordinal categories [8–11]. With ordinal categories one may expect more disagreement or confusion on adjacent categories than on categories that are further apart. Weighted kappa allows the user to specify weights that describe the closeness between categories [12]. Both Cohen’s kappa and weighted kappa are corrected for agreement due to chance. The coefficients were originally proposed in the context of agreement studies, but nowadays they are used for summarizing all kinds of cross-classifications of two variables with the same categories [11, 12].
In many practical applications the number of categories of the rating system ranges from the minimum of two to about five. It is sometimes desirable to combine some of the categories [7]. For example, when two categories are easily confused, combining them usually improves the reliability of the rating system [13]. Collapsing categories reduces the number of categories of the rating system. If there is a lot of disagreement between two categories, we expect the kappa value to increase when we combine them, and this is usually the case. However, Schouten [13] showed that there is a class of agreement tables for which the value of Cohen’s kappa remains constant when categories are merged. This is not what one expects from an agreement coefficient like Cohen’s kappa. The question then arises whether other (weighted) kappa coefficients exhibit the same property for these tables. If the answer is negative, it would make sense to replace Cohen’s kappa by a weighted kappa with more favorable properties with regard to these agreement tables.
In this paper we present several properties of kappa coefficients with symmetric weighting schemes with respect to this particular class of agreement tables. The paper is organized as follows. In the next section we introduce notation, define weighted kappa, and discuss some of its special cases, including Cohen’s kappa. The results are presented in Section 3. Section 4 contains a conclusion.
2. Kappa Coefficients
In this section we introduce notation and define the kappa coefficients. For notational convenience weighted kappa is here defined in terms of dissimilarity scaling [8]. If the weights are dissimilarities, pairs of categories that are further apart are assigned higher weights.
Suppose two fixed observers independently rate the same set of n subjects using the same set of c≥2 categories that are defined in advance. For a population of subjects, let πij denote the proportion classified in category i by the first observer and in category j by the second observer, where 1≤i, j≤c. The quantities
(1)  \pi_{i+} = \sum_{j=1}^{c} \pi_{ij}, \qquad \pi_{+i} = \sum_{j=1}^{c} \pi_{ji}
are the marginal probabilities. They reflect how often the observers used the categories. The cell probabilities of the square table {πij} are not directly observed. Let {nij} denote the contingency table of observed frequencies. Assuming a multinomial sampling model with the total number of subjects n fixed, the maximum likelihood estimate of πij is given by π^ij=nij/n [14, 15]. Since the rows and columns of {nij} have the same labels, the contingency table is usually called an agreement table. Table 1 presents two hypothetical agreement tables with three categories A, B, and C.
Table 1: Two hypothetical 3×3 agreement tables (rows: first observer; columns: second observer).

           First table                Second table
         A    B    C  Total         A    B    C  Total
A       22    2    0    24         16    4    0    20
B        4   10    0    14          0    2    1     3
C        4    2    6    12          4    0    2     6
Total   30   14    6    50         20    6    3    29
Let wij for 1≤i, j≤c be nonnegative real numbers with wii=0. The weighted kappa coefficient can be defined as [8, 12]
(2)  \kappa_w = 1 - \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} w_{ij} \pi_{ij}}{\sum_{i=1}^{c} \sum_{j=1}^{c} w_{ij} \pi_{i+} \pi_{+j}}.
The numerator of the fraction in (2) is the weighted observed disagreement, while the denominator of the fraction is the weighted chance-expected disagreement. The value of (2) is 1 when there is perfect agreement between the two observers, zero when the weighted observed disagreement is equal to the weighted chance-expected disagreement, and negative when the weighted observed disagreement is larger than the weighted chance-expected disagreement.
Under a multinomial sampling model with n fixed, the maximum likelihood estimate of (2) is
(3)  \hat{\kappa}_w = 1 - \frac{n \sum_{i=1}^{c} \sum_{j=1}^{c} w_{ij} n_{ij}}{\sum_{i=1}^{c} \sum_{j=1}^{c} w_{ij} n_{i+} n_{+j}}.
Estimate (3) is obtained by substituting π^ij=nij/n for the cell probabilities πij in (2). A large sample standard error of (3) can be found in [16].
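As a minimal illustrative sketch (not part of the original paper), estimate (3) can be computed directly from a table of observed counts. The function name is ours; the example uses the first table of Table 1 with identity (disagreement) weights:

```python
def weighted_kappa(counts, weights):
    """Estimate (3): 1 - n * sum_ij w_ij n_ij / sum_ij w_ij n_i+ n_+j."""
    c = len(counts)
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]                                 # n_i+
    col = [sum(counts[i][j] for i in range(c)) for j in range(c)]  # n_+j
    observed = sum(weights[i][j] * counts[i][j]
                   for i in range(c) for j in range(c))
    expected = sum(weights[i][j] * row[i] * col[j]
                   for i in range(c) for j in range(c))
    return 1.0 - n * observed / expected

# First table of Table 1, with identity weights w_ij = 1 for i != j:
table = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]
identity = [[0 if i == j else 1 for j in range(3)] for i in range(3)]
print(round(weighted_kappa(table, identity), 3))  # 0.603
```

With identity weights this is exactly Cohen's unweighted kappa, discussed next.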
In this paper we are interested in the following special case of (2). We may require that weighted kappa has a symmetric weighting scheme; that is, wij=wji for 1≤i, j≤c. Since wii=0 for 1≤i≤c, this symmetric kappa is given by
(4)  \kappa_s = 1 - \frac{\sum_{i=1}^{c-1} \sum_{j=i+1}^{c} w_{ij} (\pi_{ij} + \pi_{ji})}{\sum_{i=1}^{c-1} \sum_{j=i+1}^{c} w_{ij} (\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i})}.
Special cases of coefficient (4) that are used in practice are Cohen’s kappa [5, 7, 12] for nominal categories and linear kappa [10, 17] and quadratic kappa [9, 11, 18] for ordinal categories. Cohen’s kappa and quadratic kappa each have been used in thousands of applications [6, 11, 19]. The two coefficients are briefly discussed below.
The identity weights are defined as
(5)  w_{ij} = \mathbb{1}_{\{i \neq j\}} = \begin{cases} 0 & \text{for } i = j, \\ 1 & \text{for } i \neq j. \end{cases}
An example of weighting scheme (5) is presented in the left panel of Table 2. If we use weighting scheme (5) in (2), we obtain Cohen’s unweighted kappa [5]
(6)  \kappa = 1 - \frac{1 - \sum_{i=1}^{c} \pi_{ii}}{1 - \sum_{i=1}^{c} \pi_{i+} \pi_{+i}}.
Perhaps a more familiar definition of Cohen’s kappa is
(7)  \kappa = \frac{\sum_{i=1}^{c} \pi_{ii} - \sum_{i=1}^{c} \pi_{i+} \pi_{+i}}{1 - \sum_{i=1}^{c} \pi_{i+} \pi_{+i}}.
Formulas (6) and (7) are equivalent; definition (6) will be used in Section 3 below. Coefficient (6) has value 1 when the observers agree completely, value zero when agreement is equal to that expected under independence, and negative value when agreement is less than expected by chance.
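The equivalence of (6) and (7) is easy to check numerically. The following sketch (ours, with hypothetical function names) evaluates both definitions on the first table of Table 1, with observed proportions in place of the population probabilities:

```python
def kappa_def6(counts):
    """Definition (6): 1 - observed disagreement / chance-expected disagreement."""
    c = len(counts)
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(counts[i][j] for i in range(c)) for j in range(c)]
    po = sum(counts[i][i] for i in range(c)) / n          # sum of pi_ii
    pe = sum(row[i] * col[i] for i in range(c)) / n ** 2  # sum of pi_i+ pi_+i
    return 1 - (1 - po) / (1 - pe)

def kappa_def7(counts):
    """Definition (7): (po - pe) / (1 - pe)."""
    c = len(counts)
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(counts[i][j] for i in range(c)) for j in range(c)]
    po = sum(counts[i][i] for i in range(c)) / n
    pe = sum(row[i] * col[i] for i in range(c)) / n ** 2
    return (po - pe) / (1 - pe)

table = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]  # first table of Table 1
print(round(kappa_def6(table), 3), round(kappa_def7(table), 3))  # 0.603 0.603
```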
Table 2: Two weighting schemes for four categories A, B, C, and D.

          Identity             Quadratic
        A   B   C   D        A   B   C   D
A       0   1   1   1        0   1   4   9
B       1   0   1   1        1   0   1   4
C       1   1   0   1        4   1   0   1
D       1   1   1   0        9   4   1   0
The quadratic weights are defined as w_{ij} = (i-j)^2 for 1≤i, j≤c. An example of the weights is presented in the right panel of Table 2. If we use the quadratic weights in (2), we obtain the quadratic kappa [9, 18]
(8)  \kappa_q = 1 - \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} (i-j)^2 \pi_{ij}}{\sum_{i=1}^{c} \sum_{j=1}^{c} (i-j)^2 \pi_{i+} \pi_{+j}}.
Coefficient (8) is the most popular version of weighted kappa in the case that the categories of the rating system are ordinal [2, 11, 19]. The quadratic kappa can be interpreted as an intraclass correlation, which is a proportion of variance [9, 18]. However, the quadratic kappa is not always sensitive to differences in exact agreement [11], and high values of the quadratic kappa can be found even when the level of exact agreement is low [19].
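The last remark can be illustrated with a small sketch (ours, using a hypothetical table): when all disagreements sit on adjacent categories, the quadratic kappa is noticeably higher than the unweighted kappa on the same table, even though exact agreement is only 60%:

```python
def kappa_w(counts, w):
    """Weighted kappa estimate with weight function w(i, j)."""
    c = len(counts)
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(counts[i][j] for i in range(c)) for j in range(c)]
    num = sum(w(i, j) * counts[i][j] for i in range(c) for j in range(c))
    den = sum(w(i, j) * row[i] * col[j] for i in range(c) for j in range(c))
    return 1.0 - n * num / den

# Hypothetical table: 60% exact agreement, all disagreements adjacent.
t = [[10, 5, 0], [5, 10, 5], [0, 5, 10]]
print(round(kappa_w(t, lambda i, j: (i - j) ** 2), 3))  # quadratic:  0.667
print(round(kappa_w(t, lambda i, j: i != j), 3))        # unweighted: 0.394
```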
3. A Class of Agreement Tables
It is sometimes desirable to combine some of the categories [7]. For example, when two categories are frequently confused, combining the categories may improve the reliability of the rating system. Suppose we combine two categories i and j, and let d≥0 be a nonnegative real number. In this paper we focus on the class of agreement tables that satisfy the condition
(9)  \frac{\pi_{ij} + \pi_{ji}}{\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i}} = d \quad \text{for } i \neq j,\ 1 \leq i, j \leq c.
Condition (9) holds, for example, if there is perfect agreement between the raters. In this case we have ∑_{i=1}^{c} π_ii = 1 and π_ij = 0 for i ≠ j, 1 ≤ i, j ≤ c, so that d = 0. It turns out that there are many nonperfect agreement tables that also satisfy (9). Examples are the agreement tables in Table 1. For the two tables, the value of d is .397 and .644, respectively. The examples in Table 1 show that agreement tables that satisfy (9) are not necessarily symmetric. Furthermore, since the examples appear to be ordinary agreement tables that can be encountered in practice, it appears that the class of agreement tables satisfying (9) is not trivial.
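Condition (9) is easy to verify numerically. The following sketch (ours) computes the ratio in (9) for every pair of categories of the two tables in Table 1, using exact rational arithmetic; for each table all ratios coincide, giving d = 25/63 ≈ .397 and d = 29/45 ≈ .644, respectively:

```python
from fractions import Fraction

def pair_ratios(counts):
    """Exact ratios (pi_ij + pi_ji) / (pi_i+ pi_+j + pi_j+ pi_+i), i < j."""
    c = len(counts)
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(counts[i][j] for i in range(c)) for j in range(c)]
    # In terms of counts the ratio is n*(n_ij + n_ji) / (n_i+ n_+j + n_j+ n_+i).
    return [Fraction(n * (counts[i][j] + counts[j][i]),
                     row[i] * col[j] + row[j] * col[i])
            for i in range(c) for j in range(i + 1, c)]

first  = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]
second = [[16, 4, 0], [0, 2, 1], [4, 0, 2]]
print(pair_ratios(first))   # three equal fractions: d = 25/63
print(pair_ratios(second))  # three equal fractions: d = 29/45
```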
For Cohen’s kappa in (6) Schouten [13] showed that if (9) holds, then the kappa value cannot be increased or decreased by combining categories. In this section we present various additional results for other special cases of symmetric kappa in (4). Theorem 1 shows that all special cases of symmetric kappa coincide if (9) holds.
Theorem 1.
If (9) holds, then κs=1-d.
Proof.
If (9) holds, we have the particular case
(10)  1 - \frac{\pi_{12} + \pi_{21}}{\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1}} = 1 - d.
Furthermore, for two arbitrary categories i and j with i≠j we have
(11)  \frac{\pi_{ij} + \pi_{ji}}{\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i}} = \frac{a_{ij}(\pi_{12} + \pi_{21})}{a_{ij}(\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1})} = \frac{\pi_{12} + \pi_{21}}{\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1}}
for certain nonnegative real numbers aij≥0. Hence, using these aij and identity (10) we can write κs as
(12)  \kappa_s = 1 - \frac{\sum_{i=1}^{c-1} \sum_{j=i+1}^{c} w_{ij} a_{ij} (\pi_{12} + \pi_{21})}{\sum_{i=1}^{c-1} \sum_{j=i+1}^{c} w_{ij} a_{ij} (\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1})} = 1 - \frac{(\pi_{12} + \pi_{21}) \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} w_{ij} a_{ij}}{(\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1}) \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} w_{ij} a_{ij}} = 1 - \frac{\pi_{12} + \pi_{21}}{\pi_{1+}\pi_{+2} + \pi_{2+}\pi_{+1}} = 1 - d.
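Theorem 1 can be checked numerically (an illustration of ours, not a proof). For the first table of Table 1, which satisfies (9) with d = 25/63, the identity, linear, and quadratic weighting schemes all yield the same symmetric kappa value 1 - d ≈ 0.603:

```python
def kappa(counts, w):
    """Symmetric weighted kappa estimate with weight function w(i, j)."""
    c = len(counts)
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(counts[i][j] for i in range(c)) for j in range(c)]
    num = sum(w(i, j) * counts[i][j] for i in range(c) for j in range(c))
    den = sum(w(i, j) * row[i] * col[j] for i in range(c) for j in range(c))
    return 1.0 - n * num / den

table = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]  # satisfies condition (9)
values = [kappa(table, w) for w in (lambda i, j: i != j,         # identity
                                    lambda i, j: abs(i - j),     # linear
                                    lambda i, j: (i - j) ** 2)]  # quadratic
print([round(v, 3) for v in values])  # [0.603, 0.603, 0.603]
```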
A converse version of Theorem 1 also holds. Lemma 2 is used in the proof of Theorem 3.
Lemma 2.
Let a,b≥0 and c,d>0 be real numbers. One has
(13)  \frac{a}{c} = \frac{b}{d} \iff \frac{a}{c} = \frac{a+b}{c+d}.
Proof.
Since c and d are positive, the identity a/c = b/d is equivalent to ad = bc. Adding ac to both sides of ad = bc gives ac + ad = ac + bc, that is, a(c+d) = c(a+b), which is equivalent to a/c = (a+b)/(c+d). All steps are reversible, which proves the equivalence.
Theorem 3.
If all special cases of symmetric kappa are equal, then (9) holds.
Proof.
Let r,r′∈{1,2,…,c} with r≠r′ be arbitrary categories. Let κs* denote the value of the special case of symmetric kappa with wrr′=wr′r=2 and all other off-diagonal weights equal to 1. Since all special cases of symmetric kappa are equal, we have in particular κ=κs*=1-d for some real number d≥0. Using (6), the identity κ=κs* is equivalent to
(14)  \frac{1 - \sum_{i=1}^{c} \pi_{ii}}{1 - \sum_{i=1}^{c} \pi_{i+}\pi_{+i}} = \frac{1 - \sum_{i=1}^{c} \pi_{ii} + \pi_{rr'} + \pi_{r'r}}{1 - \sum_{i=1}^{c} \pi_{i+}\pi_{+i} + \pi_{r+}\pi_{+r'} + \pi_{r'+}\pi_{+r}}.
Since 1 - ∑_{i=1}^{c} π_{i+}π_{+i} > 0, it follows from applying Lemma 2 to identity (14), together with identity (6), that
(15)  \frac{\pi_{rr'} + \pi_{r'r}}{\pi_{r+}\pi_{+r'} + \pi_{r'+}\pi_{+r}} = \frac{1 - \sum_{i=1}^{c} \pi_{ii}}{1 - \sum_{i=1}^{c} \pi_{i+}\pi_{+i}} = 1 - \kappa = d.
Note that in the proof of Theorem 3 certain special cases of coefficient (4) are used. Condition (9) will not necessarily hold if two arbitrary special cases of symmetric kappa are equal. We have the following consequences of Theorems 1 and 3.
Corollary 4.
It holds that κs=1⇔πij=0 for i≠j and 1≤i, j≤c.
Corollary 5.
It holds that
(16)  \kappa_s = 0 \iff \frac{\pi_{ij} + \pi_{ji}}{\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i}} = 1 \quad \text{for } i \neq j,\ 1 \leq i, j \leq c.
Theorem 6 shows that if (9) holds, then the value of coefficient (4) remains constant when we combine two categories.
Theorem 6.
Let κs denote the value of symmetric kappa of an agreement table with c≥3 categories and κs* the value of the table that is obtained by combining categories r′ and r′′. If condition (9) holds, then one has κs=κs*.
Proof.
Since (9) holds, it follows from Theorem 1 that κs=1-d for some d≥0. Let r denote the category that is obtained by merging r′ and r′′. Let i with 1≤i≤c and i≠r′, r′′ be an arbitrary category. We have the four relations

(17a)  \pi_{ir} = \pi_{ir'} + \pi_{ir''},
(17b)  \pi_{ri} = \pi_{r'i} + \pi_{r''i},
(17c)  \pi_{r+} = \pi_{r'+} + \pi_{r''+},
(17d)  \pi_{+r} = \pi_{+r'} + \pi_{+r''}.

Furthermore, since (9) holds, we have the identities

(18a)  \frac{\pi_{ir'} + \pi_{r'i}}{\pi_{i+}\pi_{+r'} + \pi_{r'+}\pi_{+i}} = d,
(18b)  \frac{\pi_{ir''} + \pi_{r''i}}{\pi_{i+}\pi_{+r''} + \pi_{r''+}\pi_{+i}} = d.

Applying Lemma 2 to the identities in (18a) and (18b) we obtain
(19)  \frac{\pi_{ir'} + \pi_{r'i} + \pi_{ir''} + \pi_{r''i}}{\pi_{i+}\pi_{+r'} + \pi_{r'+}\pi_{+i} + \pi_{i+}\pi_{+r''} + \pi_{r''+}\pi_{+i}} = d.
Moreover, using (17a), (17b), (17c), (17d), and (19), we have
(20)  \frac{\pi_{ir} + \pi_{ri}}{\pi_{i+}\pi_{+r} + \pi_{r+}\pi_{+i}} = \frac{\pi_{ir'} + \pi_{ir''} + \pi_{r'i} + \pi_{r''i}}{\pi_{i+}(\pi_{+r'} + \pi_{+r''}) + (\pi_{r'+} + \pi_{r''+})\pi_{+i}} = \frac{\pi_{ir'} + \pi_{r'i} + \pi_{ir''} + \pi_{r''i}}{\pi_{i+}\pi_{+r'} + \pi_{r'+}\pi_{+i} + \pi_{i+}\pi_{+r''} + \pi_{r''+}\pi_{+i}} = d.
Identity (20) shows that condition (9) holds for every pair of categories involving the merged category r; for pairs of categories not involving r the condition carries over unchanged from the original table. Hence condition (9) holds for the collapsed (c-1)×(c-1) table. Application of Theorem 1 then yields κs*=1-d, from which we may conclude that κs=κs*.
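Theorem 6 can be illustrated on the first table of Table 1 (a sketch of ours, not from the paper): merging categories B and C leaves the value of Cohen’s kappa unchanged:

```python
def cohen_kappa(counts):
    """Cohen's unweighted kappa, definition (7), from observed counts."""
    c = len(counts)
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(counts[i][j] for i in range(c)) for j in range(c)]
    po = sum(counts[i][i] for i in range(c)) / n
    pe = sum(row[i] * col[i] for i in range(c)) / n ** 2
    return (po - pe) / (1 - pe)

table = [[22, 2, 0], [4, 10, 0], [4, 2, 6]]
merged = [[22, 2 + 0],
          [4 + 4, 10 + 0 + 2 + 6]]  # categories B and C combined
print(round(cohen_kappa(table), 3), round(cohen_kappa(merged), 3))  # 0.603 0.603
```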
Theorem 6 shows that if the value of Cohen’s kappa in (6) remains constant when categories are combined, then the value of symmetric kappa in (4) also remains constant when categories are combined. By repeatedly applying Theorem 6 we obtain the following consequence.
Corollary 7.
Let κs denote the value of symmetric kappa of an agreement table with c≥3 categories and κs* the value of the collapsed table corresponding to any partitioning of the categories. If (9) holds, then one has κs=κs*.
4. Conclusion
Kappa coefficients are standard tools for summarizing agreement between two observers on a categorical rating scale. The coefficients are nowadays used for summarizing the information in all types of cross-classifications of two variables with the same categories. In the case of nominal categories Cohen’s kappa is a standard tool. In this paper we considered a class of agreement tables for which the value of Cohen’s kappa remains constant when two categories are combined. It was shown that for this class of agreement tables all special cases of symmetric kappa, that is, all kappa coefficients with a symmetric weighting scheme, coincide (Theorem 1). Furthermore, for this class of agreement tables the value of symmetric kappa remains constant when categories are merged (Theorem 6 and Corollary 7).
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The author thanks an anonymous reviewer for several helpful comments and valuable suggestions on a previous version of the paper. The comments have improved the presentation of the paper. This research is part of Veni project 451-11-026 funded by the Netherlands Organisation for Scientific Research.
References
[1] U. Jakobsson and A. Westergren, “Statistical methods for assessing agreement for ordinal data.”
[2] M. Maclure and W. C. Willett, “Misinterpretation and misuse of the Kappa statistic.”
[3] A. Agresti, “Modelling patterns of agreement and disagreement.”
[4] A. Agresti.
[5] J. Cohen, “A coefficient of agreement for nominal scales.”
[6] L. M. Hsu and R. Field, “Interrater agreement measures: comments on Kappan, Cohen’s Kappa, Scott’s π, and Aickin’s α.”
[7] M. J. Warrens, “Cohen’s kappa can always be increased and decreased by combining categories.”
[8] J. Cohen, “Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit.”
[9] J. L. Fleiss and J. Cohen, “The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability.”
[10] S. Vanbelle and A. Albert, “A note on the linearly weighted kappa coefficient for ordinal scales.”
[11] M. J. Warrens, “Some paradoxical results for the quadratically weighted kappa.”
[12] M. J. Warrens, “Conditional inequalities between Cohen’s kappa and weighted kappas.”
[13] H. J. A. Schouten, “Nominal scale agreement among observers.”
[14] A. Agresti.
[15] Y. M. M. Bishop, S. E. Fienberg, and P. W. Holland.
[16] J. L. Fleiss, J. Cohen, and B. S. Everitt, “Large sample standard errors of kappa and weighted kappa.”
[17] D. Cicchetti and T. Allison, “A new procedure for assessing reliability of scoring EEG sleep recordings.”
[18] C. Schuster, “A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales.”
[19] P. Graham and R. Jackson, “The analysis of ordinal agreement data: beyond weighted kappa.”