On Agreement Tables with Constant Kappa Values

Kappa coefficients are standard tools for summarizing the information in cross-classifications of two categorical variables with identical categories, here called agreement tables. When two categories are combined the kappa value usually either increases or decreases. There is a class of agreement tables for which the value of Cohen’s kappa remains constant when two categories are combined. It is shown that for this class of tables all special cases of symmetric kappa coincide and that the value of symmetric kappa is not affected by any partitioning of the categories.


Introduction
In behavioral and biomedical science researchers are often interested in measuring the intensity of a behavior or a disease. Examples are psychologists who assess how anxious a speech-anxious subject appears while giving a talk, pathologists who rate the severity of lesions from scans, or competing diagnostic devices that classify the extent of a disease in patients into categories. These phenomena are typically classified using a categorical rating system, for example, with categories (A) slight, (B) moderate, and (C) extreme. Because ratings usually entail a certain degree of subjective judgment, researchers frequently want to assess the reliability of the categorical rating system that is used. One way to do this is to assign two observers to rate the same set of subjects independently. The reliability of the rating system can then be assessed by analyzing the agreement between the observers. High agreement between the ratings can be seen as a good indication of consensus in the diagnosis and interchangeability of the ratings of the observers.
Various statistical methodologies have been developed for analyzing agreement of a categorical rating system [ , ]. For instance, loglinear models can be used for studying the patterns of agreement and sources of disagreement [ , ]. However, in practice researchers often want to express the agreement between the raters in a single number. In this context, standard tools for summarizing agreement between observers are the coefficients Cohen's kappa in the case of nominal categories [ -] and weighted kappa in the case of ordinal categories [ -]. With ordinal categories one may expect more disagreement or confusion on adjacent categories than on categories that are further apart. Weighted kappa allows the user to specify weights to describe the closeness between categories [ ]. Both Cohen's kappa and weighted kappa are corrected for agreement due to chance. The coefficients were originally proposed in the context of agreement studies, but nowadays they are used for summarizing all kinds of cross-classifications of two variables with the same categories [ , ].
The number of categories used in various rating systems usually varies from the minimum number of two to five in many practical applications. It is sometimes desirable to combine some of the categories [ ]. For example, when two categories are easily confused, combining the categories usually improves the reliability of the rating system [ ]. By collapsing categories the number of categories of the rating system is reduced. If there is a lot of disagreement between two categories, we expect the kappa value to increase if we combine the categories. This is usually the case. However, Schouten [ ] showed that there is a class of agreement tables for which the value of Cohen's kappa remains constant when categories are merged. This is not what one expects from an agreement coefficient like Cohen's kappa. The question, then, arises: do other (weighted) kappa coefficients exhibit the same property for these tables? If the answer is negative, it would make sense to replace Cohen's kappa by a weighted kappa with more favorable properties with regard to these agreement tables.

Table: Two hypothetical 3 × 3 agreement tables. Rows: first observer; columns: second observer; categories A, B, C, and Total.
In this paper we present several properties of kappa coefficients with symmetric weighting schemes with respect to this particular class of agreement tables. The paper is organized as follows. In the next section we introduce notation, define weighted kappa, and discuss some of its special cases, including Cohen's kappa. The results are presented in the section that follows it, and the final section contains a conclusion.

Kappa Coefficients
In this section we introduce notation and define the kappa coefficients. For notational convenience weighted kappa is here defined in terms of dissimilarity scaling [ ]. If the weights are dissimilarities, pairs of categories that are further apart are assigned higher weights.
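Suppose two fixed observers independently rate the same set of subjects using the same set of $c \geq 2$ categories that are defined in advance. For a population of subjects, let $\pi_{ij}$ denote the proportion classified in category $i$ by the first observer and in category $j$ by the second observer, where $1 \leq i, j \leq c$. The quantities

$$\pi_{i+} = \sum_{j=1}^{c} \pi_{ij} \quad \text{and} \quad \pi_{+i} = \sum_{j=1}^{c} \pi_{ji}$$

are the marginal totals of the first and second observer, respectively. Let $w_{ij} \geq 0$ denote the weight assigned to the pair of categories $i$ and $j$, with $w_{ii} = 0$ for $1 \leq i \leq c$. Weighted kappa is defined as

$$\kappa_w = 1 - \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} w_{ij} \pi_{ij}}{\sum_{i=1}^{c} \sum_{j=1}^{c} w_{ij} \pi_{i+} \pi_{+j}}.$$

The numerator of the fraction in this definition is the weighted observed disagreement, while the denominator of the fraction is the weighted chance-expected disagreement. The value of $\kappa_w$ is 1 when there is perfect agreement between the two observers, zero when the weighted observed disagreement is equal to the weighted chance-expected disagreement, and negative when the weighted observed disagreement is larger than the weighted chance-expected disagreement.

Table: Two weighting schemes for four categories A, B, C, and D (left panel: identity weights; right panel: quadratic weights).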
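Under a multinomial sampling model with the total number of subjects $N$ fixed, the maximum likelihood estimate of $\kappa_w$ is

$$\hat{\kappa}_w = 1 - \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} w_{ij} \hat{\pi}_{ij}}{\sum_{i=1}^{c} \sum_{j=1}^{c} w_{ij} \hat{\pi}_{i+} \hat{\pi}_{+j}}.$$

The estimate $\hat{\kappa}_w$ is obtained by substituting $\hat{\pi}_{ij} = n_{ij}/N$, where $n_{ij}$ is the observed number of subjects classified in category $i$ by the first observer and in category $j$ by the second observer, for the cell probabilities in the definition of $\kappa_w$. A large sample standard error of $\hat{\kappa}_w$ can be found in [ ].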
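As an illustration of this estimate, the following minimal Python sketch computes weighted kappa from a table of counts by substituting the observed proportions for the cell probabilities. The function name `weighted_kappa` and the example counts are hypothetical and are not taken from the paper; exact rational arithmetic is used only to avoid rounding noise.

```python
from fractions import Fraction


def weighted_kappa(counts, weights):
    """Estimate weighted kappa (dissimilarity scaling) from a square table of counts.

    counts[i][j]: number of subjects placed in category i by the first observer
    and in category j by the second observer.
    weights[i][j]: dissimilarity weight, zero on the diagonal.
    """
    c = len(counts)
    total = sum(sum(row) for row in counts)
    # Observed proportions replace the cell probabilities.
    p = [[Fraction(counts[i][j], total) for j in range(c)] for i in range(c)]
    row = [sum(p[i]) for i in range(c)]                       # first observer's marginals
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]  # second observer's marginals
    observed = sum(weights[i][j] * p[i][j] for i in range(c) for j in range(c))
    expected = sum(weights[i][j] * row[i] * col[j] for i in range(c) for j in range(c))
    return 1 - observed / expected


# Hypothetical 3 x 3 table of counts (illustrative only).
counts = [[20, 5, 0],
          [5, 20, 5],
          [0, 5, 20]]
quadratic = [[(i - j) ** 2 for j in range(3)] for i in range(3)]
print(float(weighted_kappa(counts, quadratic)))  # 0.8
```

For this table the weighted observed disagreement is 1/4 and the weighted chance-expected disagreement is 5/4, so the estimate is 1 - 1/5 = 0.8.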
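In this paper we are interested in the following special case of weighted kappa. We may require that weighted kappa has a symmetric weighting scheme; that is, $w_{ij} = w_{ji}$ for $1 \leq i, j \leq c$. Since $w_{ii} = 0$ for $1 \leq i \leq c$, this symmetric kappa is given by

$$\kappa_s = 1 - \frac{\sum_{i<j} w_{ij} (\pi_{ij} + \pi_{ji})}{\sum_{i<j} w_{ij} (\pi_{i+} \pi_{+j} + \pi_{j+} \pi_{+i})}.$$

If all off-diagonal weights are equal, that is, $w_{ij} = 1$ for $i \neq j$, symmetric kappa reduces to Cohen's kappa,

$$\kappa = 1 - \frac{\sum_{i<j} (\pi_{ij} + \pi_{ji})}{\sum_{i<j} (\pi_{i+} \pi_{+j} + \pi_{j+} \pi_{+i})}.$$

Perhaps a more familiar definition of Cohen's kappa is

$$\kappa = \frac{\sum_{i=1}^{c} \pi_{ii} - \sum_{i=1}^{c} \pi_{i+} \pi_{+i}}{1 - \sum_{i=1}^{c} \pi_{i+} \pi_{+i}}.$$

The two formulas for Cohen's kappa are equivalent; definition ( ) will be used in Section below. Cohen's kappa has value 1 when the observers agree completely, value zero when agreement is equal to that expected under independence, and a negative value when agreement is less than expected by chance.

The quadratic weights are defined as $w_{ij} = (i - j)^2$ for $1 \leq i, j \leq c$. An example of the weights is presented in the right panel of Table . If we use the quadratic weights in weighted kappa, we obtain the quadratic kappa [ , ]

$$\kappa_2 = 1 - \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} (i - j)^2 \pi_{ij}}{\sum_{i=1}^{c} \sum_{j=1}^{c} (i - j)^2 \pi_{i+} \pi_{+j}}.$$

The quadratic kappa is the most popular version of weighted kappa in the case that the categories of the rating system are ordinal [ , , ]. It can be interpreted as an intraclass correlation, which is a proportion of variance [ , ]. However, the quadratic kappa is not always sensitive to differences in exact agreement [ ], and high values of the quadratic kappa can be found even when the level of exact agreement is low [ ].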
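Returning to the two definitions of Cohen's kappa given above, their equivalence can be checked numerically. The sketch below is only an illustration: the function names and the counts are hypothetical, and the disagreement form is written as one minus the ratio of the off-diagonal observed mass to the off-diagonal chance-expected mass.

```python
from fractions import Fraction


def proportions(counts):
    """Convert a square table of counts into exact cell proportions."""
    total = sum(sum(row) for row in counts)
    return [[Fraction(x, total) for x in row] for row in counts]


def marginals(p):
    """Return the row (first observer) and column (second observer) totals."""
    c = len(p)
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    return row, col


def kappa_disagreement_form(p):
    """One minus observed disagreement over chance-expected disagreement."""
    c = len(p)
    row, col = marginals(p)
    observed = sum(p[i][j] for i in range(c) for j in range(c) if i != j)
    expected = sum(row[i] * col[j] for i in range(c) for j in range(c) if i != j)
    return 1 - observed / expected


def kappa_agreement_form(p):
    """(Observed agreement - chance agreement) / (1 - chance agreement)."""
    c = len(p)
    row, col = marginals(p)
    observed = sum(p[i][i] for i in range(c))
    expected = sum(row[i] * col[i] for i in range(c))
    return (observed - expected) / (1 - expected)


# Hypothetical counts, deliberately asymmetric (illustrative only).
p = proportions([[10, 3, 1],
                 [2, 12, 4],
                 [0, 5, 13]])
print(kappa_disagreement_form(p) == kappa_agreement_form(p))  # True
```

The two forms agree because the off-diagonal proportions sum to one minus the observed agreement, and the off-diagonal products of marginals sum to one minus the chance-expected agreement.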

A Class of Agreement Tables
It is sometimes desirable to combine some of the categories [ ]. For example, when two categories are frequently confused, combining the categories may improve the reliability of the rating system. Suppose we combine two categories $i$ and $j$, and let $r \geq 0$ be a real number. In this paper we focus on the class of agreement tables that satisfy the condition

$$\frac{\pi_{ij} + \pi_{ji}}{\pi_{i+} \pi_{+j} + \pi_{j+} \pi_{+i}} = r \quad \text{for } i \neq j, \; 1 \leq i, j \leq c.$$

This condition holds, for example, if there is perfect agreement between the raters. In this case $r = 0$ and we have $\sum_{i=1}^{c} \pi_{ii} = 1$ and $\pi_{ij} = 0$ for $i \neq j$ and $1 \leq i, j \leq c$. It turns out that there are many nonperfect agreement tables that also satisfy the condition. Examples are the agreement tables in Table . For the two tables, the value of $r$ is . and . , respectively. The examples in Table show

A converse version of Theorem also holds. Lemma is used in the proof of Theorem .
Theorem . If all special cases of symmetric kappa are equal, then ( ) holds.
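Proof. Let $i, i' \in \{1, 2, \ldots, c\}$ with $i \neq i'$ be arbitrary categories. Let $\kappa^*$ denote the value of the special case of symmetric kappa with $w_{ii'} = w_{i'i} = 2$ and all other off-diagonal weights equal to 1. Since all special cases of symmetric kappa are equal, we have in particular $\kappa = \kappa^* = 1 - r$ for some real number $r \geq 0$. Using the definition of symmetric kappa, the identity $\kappa = \kappa^*$ is equivalent to

$$\frac{1 - \sum_{k=1}^{c} \pi_{kk}}{1 - \sum_{k=1}^{c} \pi_{k+} \pi_{+k}} = \frac{1 - \sum_{k=1}^{c} \pi_{kk} + \pi_{ii'} + \pi_{i'i}}{1 - \sum_{k=1}^{c} \pi_{k+} \pi_{+k} + \pi_{i+} \pi_{+i'} + \pi_{i'+} \pi_{+i}}.$$

Since $1 - \sum_{k=1}^{c} \pi_{k+} \pi_{+k} > 0$, it follows from application of Lemma to this identity and the use of the identity $\kappa = 1 - r$ that

$$\frac{\pi_{ii'} + \pi_{i'i}}{\pi_{i+} \pi_{+i'} + \pi_{i'+} \pi_{+i}} = r.$$

Note that in the proof of Theorem certain special cases of symmetric kappa are used. Condition ( ) will not necessarily hold if two arbitrary special cases of symmetric kappa are equal. We have the following consequences of Theorems and .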

Corollary . It holds that ( ).

Theorem shows that if ( ) holds, then the value of coefficient ( ) remains constant when we combine two categories.

Theorem . Let $\kappa_s$ denote the value of symmetric kappa of an agreement table with $c \geq 3$ categories and $\kappa_s^*$ the value of the table that is obtained by combining categories $i'$ and $i''$. If condition ( ) holds, then one has $\kappa_s = \kappa_s^*$.
Theorem shows that if the value of Cohen's kappa remains constant when categories are combined, then the value of symmetric kappa also remains constant when categories are combined. By repeatedly applying Theorem we obtain the following consequence.
Corollary . Let $\kappa_s$ denote the value of symmetric kappa of an agreement table with $c \geq 3$ categories and $\kappa_s^*$ the value of the collapsed table corresponding to any partitioning of the categories. If ( ) holds, then one has $\kappa_s = \kappa_s^*$.
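The behavior described in this section can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's method: the 4 × 4 table of counts is hypothetical, chosen so that the ratio of observed to chance-expected disagreement is the same for every pair of categories, and all function names are invented for this example. It computes that ratio for each pair, evaluates symmetric kappa with identity and quadratic weights, and repeats the computation after collapsing the table according to several partitions.

```python
from fractions import Fraction
from itertools import combinations


def proportions(counts):
    """Convert a square table of counts into exact cell proportions."""
    total = sum(sum(row) for row in counts)
    return [[Fraction(x, total) for x in row] for row in counts]


def marginals(p):
    """Return the row (first observer) and column (second observer) totals."""
    c = len(p)
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    return row, col


def pair_ratios(p):
    """Observed over chance-expected disagreement for every pair of categories."""
    row, col = marginals(p)
    return {(i, j): (p[i][j] + p[j][i]) / (row[i] * col[j] + row[j] * col[i])
            for i, j in combinations(range(len(p)), 2)}


def symmetric_kappa(p, weights):
    """Symmetric kappa for symmetric dissimilarity weights with zero diagonal."""
    row, col = marginals(p)
    pairs = list(combinations(range(len(p)), 2))
    observed = sum(weights[i][j] * (p[i][j] + p[j][i]) for i, j in pairs)
    expected = sum(weights[i][j] * (row[i] * col[j] + row[j] * col[i]) for i, j in pairs)
    return 1 - observed / expected


def collapse(p, partition):
    """Merge categories according to a partition given as a list of index lists."""
    return [[sum(p[i][j] for i in a for j in b) for b in partition] for a in partition]


def identity_weights(c):
    return [[0 if i == j else 1 for j in range(c)] for i in range(c)]


def quadratic_weights(c):
    return [[(i - j) ** 2 for j in range(c)] for i in range(c)]


# Hypothetical 4 x 4 table of counts (not from the paper).
counts = [[56, 12, 8, 4],
          [12, 39, 6, 3],
          [8, 6, 24, 2],
          [4, 3, 2, 11]]
p = proportions(counts)

print(set(pair_ratios(p).values()))  # a single ratio r shared by all pairs
partitions = ([[0], [1], [2], [3]], [[0, 1], [2], [3]], [[0, 3], [1, 2]], [[0], [1, 2, 3]])
for partition in partitions:
    q = collapse(p, partition)
    c = len(q)
    print(partition,
          symmetric_kappa(q, identity_weights(c)),
          symmetric_kappa(q, quadratic_weights(c)))
```

For this table the common ratio is r = 1/2, so every printed kappa value equals 1 - r = 1/2, both before and after collapsing and for both weighting schemes.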

Conclusion
Kappa coefficients are standard tools for summarizing agreement between two observers on a categorical rating scale. The coefficients are nowadays used for summarizing the information in all types of cross-classifications of two variables with the same categories. In the case of nominal categories Cohen's kappa is a standard tool. In this paper we considered a class of agreement tables for which the value of Cohen's kappa remains constant when two categories are combined. It was shown that for this class of agreement tables all special cases of symmetric kappa, that is, all kappa coefficients with a symmetric weighting scheme, coincide (Theorem ). Furthermore, for this class of agreement tables the value of symmetric kappa remains constant when categories are merged (Theorem and Corollary ).

Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.