Extensions to the Kruskal-Wallis Test and a Generalised Median Test with Extensions

The data for the tests considered here may be presented in two-way contingency tables with all marginal totals fixed. We show that Pearson's test statistic X_P^2 (P for Pearson) may be partitioned into useful and informative components. The first detects location differences between the treatments, and the subsequent components detect dispersion and higher order moment differences. For Kruskal-Wallis-type data when there are no ties, the location component is the Kruskal-Wallis test; the subsequent components are the extensions. Our approach enables us to generalise to when there are ties, and to when there is a fixed number of categories and a large number of observations. We also propose a generalisation of the well-known median test. In this situation the location-detecting first component of X_P^2 reduces to the usual median test statistic when there are only two categories. Subsequent components detect higher moment departures from the null hypothesis of equal treatment effects.


Introduction
The idea of decomposing a test into orthogonal contrasts, as in the analysis of variance, has long been appreciated by statisticians as a way of making hypothesis tests more informative. In the authors' smooth goodness of fit work (see Rayner and Best, 1989), a similar approach is pursued. Omnibus test statistics are partitioned into smooth components. We define the components of a test statistic to be asymptotically pairwise independent, with each asymptotically having the chi-squared distribution, and such that their sum gives the original test statistic. The components provide powerful directional tests and permit a convenient and informative scrutiny of the data. This approach is applied to Spearman's test in Best and Rayner (1996) and Rayner and Best (1996a). Rayner and Best (1996b) gave an overview of this approach applied to several commonly used nonparametric tests, including the Friedman and Durbin tests.
Data for a generalisation of the median test that we subsequently propose, and for the Kruskal-Wallis test both with and without ties, may be presented in the form of two-way tables with fixed marginal totals. We derive the covariance matrix of the entries in such tables and then partition a multiple of X_P^2 into components that detect location and higher moment differences between rows.
For Kruskal-Wallis-type data when there are no ties, the location component is the Kruskal-Wallis test. Our approach enables us to generalise to when there are ties, and to when there is a fixed number of categories and a large number of observations. We also propose a generalisation of the well-known median test. The location-detecting first component of X_P^2 reduces to the usual median test statistic when there are only two categories. Using more categories allows components other than this location component to be calculated. These additional components, which detect dispersion and higher moment effects, are not available when using the usual median test.
The structure of this paper is as follows. In the next section the model for two-way contingency tables with fixed marginal totals is given, and Pearson's X_P^2 is derived as a test statistic for the null hypothesis of like rows. In section three a multiple of X_P^2 is partitioned into components. The material in sections two and three will be familiar to many readers, but is necessary background for the new work. In section four it is shown that when there are no ties the first component is the usual Kruskal-Wallis statistic; the non-location-detecting components are our extensions. Section five generalises the treatment to when there are ties. Section six introduces a generalisation of the usual median chi-squared test, which is thus identified as a location-detecting test; the extensions permit dispersion and other effects to be detected.

A Model and Pearson's X_P^2 Test
Suppose we have a two-way table of counts N_{ij}, with i = 1, ..., r and j = 1, ..., c. The row and column totals, respectively n_{i.}, i = 1, ..., r, and n_{.j}, j = 1, ..., c, are known constants. Under the null hypothesis of simple random sampling, the likelihood was given by Roy and Mitra (1956). To find moments of the N_{ij}, expectations may be taken with respect to the distribution of the second row conditional on knowledge of the column sums of the first two rows, then conditional on the column sums of the first three rows, and so on. It suffices to know the moments of the extended hypergeometric distribution. Details are given in the Appendix. We find

E[N_{ij}] = n_{i.} n_{.j} / n_{..},   i = 1, ..., r and j = 1, ..., c.

Write N_i = (N_{i1}, ..., N_{ic})^T, i = 1, ..., r, and N^T = (N_1^T, ..., N_r^T), so that N is the vector of all the cell counts. The joint covariance matrix of N_i and N_j is, for i ≠ j,

cov(N_i, N_j) = -(n_{i.} n_{j.} / n_{..}^2) {diag(n_{.u} n_{..}) - (n_{.u} n_{.v})} / (n_{..} - 1).

Write f_i = n_{i.}/n_{..}, i = 1, ..., r. The covariance matrix of N is then

cov(N) = {diag(f_i) - (f_i f_j)} ⊗ {diag(n_{.u} n_{..}) - (n_{.u} n_{.v})} / (n_{..} - 1),

where ⊗ is the direct or Kronecker product. See Lancaster (1969) for details about direct or Kronecker sums and products. Now define the standardised cell counts

Z_{ij} = (N_{ij} - n_{i.} n_{.j}/n_{..}) / √(n_{i.} n_{.j}/n_{..}),   i = 1, ..., r and j = 1, ..., c,

and write Z = (Z_{11}, ..., Z_{1c}, ..., Z_{r1}, ..., Z_{rc})^T, I_a for the a by a identity matrix and 1_a for the a by 1 vector with every element one. Then

cov(Z) = {I_r - (√(f_i f_j))} ⊗ R,   with R = {n_{..} I_c - (√(n_{.u} n_{.v}))} / (n_{..} - 1).

The matrix {I_r - (√(f_i f_j))} has r - 1 latent roots one and one latent root zero. The latent roots of R are difficult to find in general, but their asymptotic limits follow from Lancaster (1969, Chapter V.3). Lancaster showed that the quadratic form with vector the standardised cell counts and matrix essentially R is the familiar Pearson goodness of fit statistic, with asymptotic distribution χ²_{c-1}. Hence the latent roots of R are asymptotically one, c - 1 times, and zero once. So under the null hypothesis of simple random sampling, Z has zero mean and covariance matrix cov(Z), which asymptotically has (r - 1)(c - 1) latent roots one, and the remaining r + c - 1 latent roots zero.
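The conditional moment E[N_{ij}] = n_{i.} n_{.j} / n_{..} can be checked by brute force for a tiny table: under the null of simple random sampling with all margins fixed, every allocation of the fixed multiset of column labels to the observations is equally likely. The sketch below is not from the paper; the 2 by 2 margins are hypothetical, and exact arithmetic is used so the check is not approximate.

```python
from itertools import permutations
from fractions import Fraction

# Hypothetical margins: r = c = 2, n_.. = 4.
rows = [0, 0, 1, 1]          # row label of each of the 4 observations
cols = [0, 1, 1, 1]          # multiset of column labels to be allocated

n = len(rows)
r, c = max(rows) + 1, max(cols) + 1
mean = [[Fraction(0)] * c for _ in range(r)]
perms = list(permutations(cols))     # duplicate arrangements counted equally often
for p in perms:
    for row, col in zip(rows, p):
        mean[row][col] += Fraction(1, len(perms))

ni = [rows.count(i) for i in range(r)]
nj = [cols.count(j) for j in range(c)]
for i in range(r):
    for j in range(c):
        # exact agreement with E[N_ij] = n_i. n_.j / n_..
        assert mean[i][j] == Fraction(ni[i] * nj[j], n)
print(mean)
```

The enumeration over all permutations realises the fixed-margins null exactly, so the agreement here is exact rather than asymptotic.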
In the well known and often used `classical' model, r and c are fixed and the total count n_{..} → ∞. The test statistic X_P^2 is given by

X_P^2 = Σ_{i=1}^r Σ_{j=1}^c (N_{ij} - n_{i.} n_{.j}/n_{..})^2 / (n_{i.} n_{.j}/n_{..}) = Z^T Z.

We now confirm that our model leads to this test statistic. Suppose H is orthogonal and diagonalises cov(Z). Asymptotically we then have H^T cov(Z) H = I_{(r-1)(c-1)} ⊕ O_{(r+c-1)}, where ⊕ means the direct or Kronecker sum. Define Y = H^T Z. Now Z^T Z = Y^T Y, in which Y, by the multivariate Central Limit Theorem, is asymptotically N_{rc}(0, I_{(r-1)(c-1)} ⊕ 0_{(r+c-1)}) under the null hypothesis of simple random sampling. It follows that under the null hypothesis, X_P^2 = Z^T Z = Y^T Y asymptotically has the χ²_{(r-1)(c-1)} distribution.
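For concreteness, a minimal sketch of computing X_P^2 for a table with the above expected frequencies; the 2 by 3 table of counts is hypothetical, chosen only to illustrate the arithmetic.

```python
def pearson_x2(table):
    """X_P^2 = sum_ij (N_ij - n_i. n_.j / n_..)^2 / (n_i. n_.j / n_..)."""
    r, c = len(table), len(table[0])
    row = [sum(t) for t in table]
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row)
    x2 = 0.0
    for i in range(r):
        for j in range(c):
            e = row[i] * col[j] / n      # expected count under the null
            x2 += (table[i][j] - e) ** 2 / e
    return x2

table = [[5, 3, 2], [2, 4, 4]]
print(round(pearson_x2(table), 4))
```

Under the null this statistic is referred to χ² with (r - 1)(c - 1) = 2 degrees of freedom.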

Partitioning Pearson's Statistic
We now show that X_P^2 may be partitioned into components, the sth of which detects sth moment departures from the null hypothesis of similarly distributed rows (treatments).

There is some choice in defining the elements Y_i of Y, as H is not yet fully specified. In doing so, our aim is to find Y_i that can be easily and usefully interpreted. To achieve one such partition, first suppose that {g_s(j)} is the set of polynomials orthonormal on {n_{.j}/n_{..}}. See the Appendix for the definitions of the first two polynomials and the derivation of subsequent polynomials. This approach results, when there are no ties, in the first component being the Kruskal-Wallis test. Write g_s for the c by 1 vector with elements g_s(j), and take V_c = 0 (all the V_s are r by 1), so that {(n_{..} - 1)/n_{..}} X_P^2 is partitioned into components V_s^T V_s, s = 1, ..., c - 1. The V_s are asymptotically mutually independent and asymptotically N_r(0, I_{r-1} ⊕ 0), so that the V_s^T V_s are asymptotically mutually independent χ²_{r-1}. Explicitly we have, for s = 1, ..., c - 1,

V_s = (V_{1s}, ..., V_{rs})^T with V_{is} = √{(n_{..} - 1)/n_{..}} Σ_{j=1}^c g_s(j) (N_{ij} - n_{i.} n_{.j}/n_{..}) / √(n_{i.}).

Because V_s involves, through g_s, a polynomial of order s, the elements of V_s are polynomials of order s in the elements of N. Under the null hypothesis E[Z] = 0, but when this is not true E[V_s] involves moments up to order s of Z. So for s = 1, ..., c - 1, the component V_s^T V_s detects sth moment departures from the null hypothesis.

Instructors Example. For the grading data of Conover (1980, p. 233), in which three instructors assign grades in five categories, Conover found the Kruskal-Wallis statistic adjusted for ties to be 0.3209, which is to be compared with the χ²_2 (5%) point of 5.991. We find the location-detecting component V_1^T V_1 to have P-value 0.85, confirming, as Conover reported, that "none of the instructors can be said to grade higher or lower than the others on the basis of the evidence presented". However, the dispersion-detecting component V_2^T V_2 has P-value 0.01, indicating a significant variability difference.
From the data it appears that the first instructor is less variable than the other two. In fact, 9.643 = (-2.113)^2 + (2.274)^2 + (-0.031)^2, with the elements of v_2 = (-2.113, 2.274, -0.031)^T being values of approximately standard normally distributed contributions from instructors 1, 2 and 3 respectively. The first instructor is less variable than the third, who is less variable than the second. This can be formalised by an LSD analysis. The residual, X_P^2 less the location and dispersion components, assesses higher moment departures.

The Kruskal-Wallis Test

We now consider models that lead to the Kruskal-Wallis test when there are no ties. The latent roots of cov(Z) will be found explicitly rather than asymptotically as in section 2. We show that X_P^2 is not an appropriate test statistic but that, nevertheless, its components are. The first component is the Kruskal-Wallis test statistic, and the subsequent components provide informative extensions.
Suppose we have distinct observations x_{ij}, being the jth of n_i observations on the ith of t treatments. All n = n_1 + ... + n_t observations are combined, ordered and ranked, and the sums R_i of the ranks obtained by the ith treatment calculated. The Kruskal-Wallis statistic is

H = {12 / (n(n + 1))} Σ_{i=1}^t R_i^2 / n_i - 3(n + 1).

See, for example, Conover (1980, section 5.2). The data may be presented as a t by n contingency table of counts {N_{ij}}, with N_{ij} = 1 if rank j is allotted to treatment i, and N_{ij} = 0 if rank j is allotted to some other treatment. The row and column totals are all fixed: the row totals are the treatment sample sizes, so that n_{i.} = n_i for i = 1, ..., t, while the column totals are all one: n_{.j} = 1 for j = 1, ..., n. With r replaced by t and c replaced by n, the results of sections two and three apply.
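A short sketch of H computed from the rank sums as above; the three samples are hypothetical and assumed tie-free.

```python
def kruskal_wallis(samples):
    """H = {12 / (n(n+1))} * sum_i R_i^2 / n_i - 3(n + 1); no ties assumed."""
    # pool, sort, and rank all observations, remembering their treatment
    pooled = sorted((x, i) for i, s in enumerate(samples) for x in s)
    rank_sums = [0.0] * len(samples)
    for rank, (_, i) in enumerate(pooled, start=1):
        rank_sums[i] += rank                     # R_i, the rank sum of treatment i
    n = len(pooled)
    return 12.0 / (n * (n + 1)) * sum(
        R * R / len(s) for R, s in zip(rank_sums, samples)) - 3 * (n + 1)

samples = [[1.1, 2.2, 3.3], [4.4, 5.5], [6.6, 7.7, 8.8]]
print(round(kruskal_wallis(samples), 6))
```

For these samples the rank sums are 6, 9 and 21, giving H = 6.25, to be referred to χ² with t - 1 = 2 degrees of freedom.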
As in section 2, we are interested in the distribution theory as n → ∞. However, there Z was an rc by 1 vector of fixed length; here Z is a tn by 1 vector. Fortunately, it is not the asymptotic distribution of Z that is required. First, recall that since every column total is one, each N_{ij} is zero or one, and X_P^2 has the fixed value (t - 1)n for all tables, and so is not available as a test statistic. Second, as in section three, the multivariate Central Limit Theorem shows that each V_s is asymptotically N_t(0, I_{t-1} ⊕ 0). Moreover, consideration of all pairs V_s, V_t shows that they are asymptotically jointly multivariate normal, and since their covariance matrix is zero, they are asymptotically pairwise independent. The V_s^T V_s still partition {(n - 1)/n} X_P^2. It is the pairwise independence and convenient χ²_{t-1} distribution of each V_s that makes data analysis so informative and convenient. What is lost by the unavailability of X_P^2 is demonstrated in the Employees Example below: there is no residual available to assess if there are higher moment differences between the treatments.
We now show that, provided there are no ties, V_1^T V_1 is the Kruskal-Wallis statistic, so that the subsequent V_s^T V_s provide extensions to the Kruskal-Wallis test.
First note that {g_s(j)} is now the set of polynomials orthonormal on the discrete uniform distribution, so that g_1(j) = aj + b, j = 1, ..., n, in which a = √{12/(n^2 - 1)} and b = -√{3(n + 1)/(n - 1)} = -{(n + 1)/2} a. The rank sum for treatment i is R_i = Σ_{j=1}^n j N_{ij}, i = 1, ..., t. Now since n_{.j} = 1 for j = 1, ..., n,

V_{i1} = √{(n - 1)/n} Σ_j g_1(j) (N_{ij} - n_i/n) / √(n_i) = √{(n - 1)/n} a (R_i - n_i(n + 1)/2) / √(n_i),

so that V_1^T V_1 = {12/(n(n + 1))} Σ_i (R_i - n_i(n + 1)/2)^2 / n_i = H after some manipulation. This is the Kruskal-Wallis statistic, well known to be sensitive to location departures from the null hypothesis. Since V_s assesses sth moment departures between treatments, we have partitioned the statistic {(n - 1)/n} X_P^2 into asymptotically pairwise independent components V_s^T V_s, s = 1, ..., n - 1, each with the χ²_{t-1} distribution, and such that the sth detects sth moment departures from the hypothesis of similarly distributed rows (treatments). Since the first of these is the Kruskal-Wallis statistic, the subsequent components provide extensions to the Kruskal-Wallis test.
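The partition can be verified numerically. The sketch below uses hypothetical tie-free samples and builds the orthonormal polynomials by Gram-Schmidt (the paper's Appendix recurrence is not reproduced here); it checks that the first component is H and that the components sum to {(n - 1)/n}(t - 1)n = (n - 1)(t - 1), since X_P^2 = (t - 1)n is fixed for rank tables.

```python
import math

def orthonormal_polys(f, max_order):
    """g_0..g_max_order with sum_j g_s(j) g_t(j) f[j] = delta_st, built by
    modified Gram-Schmidt on the monomials 1, j, j^2, ... at j = 1..c."""
    c = len(f)
    polys = []
    for p in range(max_order + 1):
        v = [float((j + 1) ** p) for j in range(c)]
        for g in polys:
            proj = sum(vj * gj * fj for vj, gj, fj in zip(v, g, f))
            v = [vj - proj * gj for vj, gj in zip(v, g)]
        norm = math.sqrt(sum(vj * vj * fj for vj, fj in zip(v, f)))
        polys.append([vj / norm for vj in v])
    return polys

# hypothetical rank data: N_ij = 1 iff rank j+1 goes to treatment i
samples = [[1.1, 2.2, 3.3], [4.4, 5.5], [6.6, 7.7, 8.8]]
pooled = sorted((x, i) for i, s in enumerate(samples) for x in s)
t, n = len(samples), len(pooled)
N = [[0] * n for _ in range(t)]
for j, (_, i) in enumerate(pooled):
    N[i][j] = 1
ni = [len(s) for s in samples]

g = orthonormal_polys([1.0 / n] * n, n - 1)   # discrete uniform weights
scale = math.sqrt((n - 1) / n)
V = [[scale * sum(g[s][j] * (N[i][j] - ni[i] / n) for j in range(n))
      / math.sqrt(ni[i]) for i in range(t)] for s in range(1, n)]

comp = [sum(v * v for v in Vs) for Vs in V]
print(round(comp[0], 6))        # the Kruskal-Wallis statistic H
print(round(sum(comp), 6))      # (n - 1)(t - 1), here 7 * 2
```

The first component agrees with H = 6.25 computed directly from the rank sums, and the n - 1 components sum to 14, confirming the partition of {(n - 1)/n} X_P^2.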
Employees Example. Conover (1980, p. 238, exercise 2) gave an exercise in which 20 new employees are randomly assigned to four different job training programmes. The value of the Kruskal-Wallis statistic is 9.72, with χ²_3 P-value 0.021, but Monte Carlo permutation test P-value 0.010. The latter is more likely to be accurate as the sample size is small. Further components are not significant. An LSD analysis can be used to show that programmes 1 and 2 and programmes 3 and 4 are equally effective, with 3 and 4 being superior.

The Kruskal-Wallis Test with Ties
If there are ties, the data may be presented as a t by n contingency table of counts {N_{ij}}, with the row totals fixed at the treatment sample sizes, so that again n_{i.} = n_i, i = 1, ..., t, while the column totals are no longer all one. The covariance matrix of Z is

cov(Z) = {I_t - (√(f_i f_j))} ⊗ R,   with R = {n_{..} I_n - (√(n_{.u} n_{.v}))} / (n_{..} - 1).

As in section 2, the latent roots of R are zero once and asymptotically one n - 1 times. It follows that cov(Z) has (t - 1)(n - 1) latent roots asymptotically one, and the remaining t + n - 1 latent roots zero. With suitable modifications the partitioning of section three holds: for s = 1, ..., n - 1,

V_{is} = √{(n_{..} - 1)/n_{..}} Σ_j g_s(j) (N_{ij} - n_{i.} n_{.j}/n_{..}) / √(n_{i.}).

Note that {g_s(j)} is the set of polynomials orthonormal on {n_{.j}/n_{..}}, not on the discrete uniform as in the previous section when there were no ties. This is the partition derived in section 3 for X_P^2. So the first component of X_P^2 in the Instructors Example is the Kruskal-Wallis statistic corrected for ties; the subsequent components are extensions to the Kruskal-Wallis test adjusted for ties. Note that for this example the model assumed in section 3, with fixed numbers of rows and columns, is more plausible than the model of this section, since n = 5 is hardly large.
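The tied-data partition can be checked on a small hypothetical table of grade counts (two treatments, three tied categories): the polynomials are now orthonormal on the column proportions {n_{.j}/n_{..}}, and the components still sum to {(n_{..} - 1)/n_{..}} X_P^2.

```python
import math

def orthonormal_polys(f, max_order):
    """As in the no-ties sketch: Gram-Schmidt on 1, j, j^2, ... with weights f."""
    polys = []
    for p in range(max_order + 1):
        v = [float((j + 1) ** p) for j in range(len(f))]
        for g in polys:
            proj = sum(vj * gj * fj for vj, gj, fj in zip(v, g, f))
            v = [vj - proj * gj for vj, gj in zip(v, g)]
        norm = math.sqrt(sum(vj * vj * fj for vj, fj in zip(v, f)))
        polys.append([vj / norm for vj in v])
    return polys

N = [[3, 1, 1], [1, 2, 2]]                 # hypothetical t = 2 by 3 tied table
t, c = len(N), len(N[0])
ni = [sum(row) for row in N]
nj = [sum(N[i][j] for i in range(t)) for j in range(c)]
n = sum(ni)

g = orthonormal_polys([x / n for x in nj], c - 1)   # weights n_.j / n_..
scale = math.sqrt((n - 1) / n)
V = [[scale * sum(g[s][j] * (N[i][j] - ni[i] * nj[j] / n) for j in range(c))
      / math.sqrt(ni[i]) for i in range(t)] for s in range(1, c)]

x2 = sum((N[i][j] - ni[i] * nj[j] / n) ** 2 / (ni[i] * nj[j] / n)
         for i in range(t) for j in range(c))
total = sum(v * v for Vs in V for v in Vs)
print(round(total, 6), round((n - 1) / n * x2, 6))   # the two agree
```

Here X_P^2 is no longer constant across tables, so the residual comparison of section three is again available.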

Generalised Median Tests
Conover (1980, section 4.3) described the median test, in which random samples are taken from each of r populations. Each observation is classified as above or below the grand median (the median of the combined random samples), forming an r by 2 contingency table with fixed marginal totals. The usual chi-squared test, based on X_P^2, is then applied to this contingency table. If instead of the grand median a `grand quantile' is used, the resulting test is described as a quantile test: see Conover (1980, p. 174). These tests can be generalised by choosing c instead of two categories for the combined random samples, and so forming an r by c contingency table of counts N_{ij} of the number of observations from the ith sample in the jth category. This table has all row and column totals fixed and can be tested for row consistency using the results of sections 2 and 3. The first three, say, components of X_P^2 are of particular interest, indicating location, dispersion and skewness differences between treatments.
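A sketch of the table construction just described, with hypothetical samples; distinct observations are assumed, so ties at the cut points are ignored, and the pooled sample is simply cut into c nearly equal groups.

```python
def quantile_table(samples, c):
    """r by c table of counts: category j holds the jth block of the
    pooled order statistics (distinct observations assumed)."""
    pooled = sorted(x for s in samples for x in s)
    n = len(pooled)
    cat = {x: k * c // n for k, x in enumerate(pooled)}   # pooled position -> category
    table = [[0] * c for _ in samples]
    for i, s in enumerate(samples):
        for x in s:
            table[i][cat[x]] += 1
    return table

samples = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
table = quantile_table(samples, 3)
print(table)          # fully separated samples give a diagonal table

r = len(table)
rowt = [sum(row) for row in table]
colt = [sum(table[i][j] for i in range(r)) for j in range(3)]
ntot = sum(rowt)
x2 = sum((table[i][j] - rowt[i] * colt[j] / ntot) ** 2 / (rowt[i] * colt[j] / ntot)
         for i in range(r) for j in range(3))
print(round(x2, 6))   # X_P^2 for the constructed table
```

All row and column totals are fixed by construction, so the partitioning of sections 2 and 3 applies directly to the resulting table; for these fully separated samples X_P^2 attains its maximum, n(c - 1) = 24.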

It is routine to show that the location-detecting first component of X_P^2 reduces to the median test statistic when observations are classified into just two categories. This is shown in the Appendix. The result identifies the median test as a location-detecting test. To detect up to sth moment differences between the populations requires categorisation into s + 1 categories and the use of the V_2, ..., V_s components. If there are as many categories as observations, and each category has one observation, the test based on the location component is the Kruskal-Wallis test, which is known to be more powerful than the median test. Using more than two categories will result in less loss of information due to categorisation compared to the median test, and will permit assessment of higher moment differences between the treatments.

Corn Example. Conover (1980, p. 172) gave the example of four different methods of growing corn. He classified the data as greater than 89 or at most 88, and applied the median test. In this form the data do not conform to the fixed margins model. If the objective were to divide the data into groups of the lowest 18 and highest 16 observations, they would conform to the fixed margins model. We now classify the data into four approximately equal groups.
Using the median test, Conover reported a P-value "slightly less than 0.001": the method median yields are clearly different. We calculate X_P^2 = 49.712 on 9 degrees of freedom. In addition V_1^T V_1 = 25.723, V_2^T V_2 = 19.972 and V_3^T V_3 = 2.574, all on 3 degrees of freedom. The location and dispersion components and X_P^2 are all significant, with P-values all zero to three decimal places. The residual or skewness component has χ²_3 P-value 0.45. The finer classification, compared to that employed by the median test, has uncovered a variability difference between the methods: methods 3 and 4 are significantly less variable than 1 and 2.

Instructors Example data. See Conover (1980, p. 233). Three instructors assign grades in five categories according to the following table.

Employees Example data. Conover (1980, p. 238, exercise 2) gave an exercise in which 20 new employees are randomly assigned to four different job training programmes. At the end of their training the employees are ranked, with a low ranking reflecting a low job ability.

Appendix

Since the {N_{ij}} are such that the row and column totals are known constants, cov(N_i, N_1 + ... + N_r) = 0 for i = 1, ..., r. So if we write f_j = n_{.j}/n_{..}, j = 1, ..., c, and note that n_{..} is the grand total of the observations, the covariance matrix of section 2 follows. The models for tables with just one set of marginal totals fixed, or with only the grand total fixed, are quite different from our model, in which all row and column totals are fixed. See Lancaster (1969, chapter XI, section 2, pp. 212-217). The likelihood can be expressed as a product of extended or multivariate hypergeometric probability functions.

Define G by G = [G_1, ..., G_c]/√c, in which G_s is the rc by r matrix. The elements of Y may be considered in blocks of r, the sth block corresponding to the polynomial of order s. These blocks are asymptotically mutually independent. Write Y^T = (V_1^T, ..., V_c^T), in which V_1 = (Y_1, ..., Y_r)^T, ..., V_{c-1} = (Y_{(c-2)r+1}, ..., Y_{(c-1)r})^T.
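The reduction claimed for two categories can be checked numerically: with c = 2 there is only one component, so V_1^T V_1 must equal {(n_{..} - 1)/n_{..}} X_P^2, the median-test-type statistic. The sketch below uses a hypothetical 2 by 2 table (two samples split at the grand median) and the first orthonormal polynomial on the column proportions.

```python
import math

N = [[4, 1], [1, 4]]          # hypothetical: rows = samples, cols = below/above median
r = len(N)
ni = [sum(row) for row in N]
nj = [N[0][j] + N[1][j] for j in range(2)]
n = sum(ni)
f = [x / n for x in nj]       # column proportions n_.j / n_..

# g_1, the first polynomial orthonormal on {f_j}: g_1(j) = (j - mu) / sigma
mu = sum((j + 1) * fj for j, fj in enumerate(f))
sigma = math.sqrt(sum(((j + 1) - mu) ** 2 * fj for j, fj in enumerate(f)))
g1 = [((j + 1) - mu) / sigma for j in range(2)]

scale = math.sqrt((n - 1) / n)
V1 = [scale * sum(g1[j] * (N[i][j] - ni[i] * nj[j] / n) for j in range(2))
      / math.sqrt(ni[i]) for i in range(r)]
x2 = sum((N[i][j] - ni[i] * nj[j] / n) ** 2 / (ni[i] * nj[j] / n)
         for i in range(r) for j in range(2))
print(round(sum(v * v for v in V1), 6), round((n - 1) / n * x2, 6))   # equal
```

Since the partition has a single term when c = 2, the agreement is an identity, not an approximation; this is the numerical counterpart of the Appendix argument.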