Generalised Score and Wald Tests

The generalised score and Wald tests are described and related to their nongeneralised versions. Two interesting applications are discussed. In the first, a new test for the Behrens-Fisher problem is derived. The second is testing homogeneity of variances from multiple univariate normal populations.


Introduction
This paper is intended to be a tutorial for those wishing to inform themselves about the generalised score and Wald tests. It extends the content of [1] and has similar objectives; that is, it focuses on the use of these tests rather than their properties. It is intended to be very accessible: readers need only some prior knowledge of partitioned matrices and of the score and Wald tests; see, for example, [1] and [2, Chapter 3].
The score test is particularly valuable when maximum likelihood (ML) estimation under the full model is not preferred but ML estimation under the null model is; the converse holds for the Wald test. Thus when ML estimation under just one of the null and full models is not preferred, the likelihood ratio test is problematic, but one of the score and Wald tests is not. Here by 'not preferred' we mean, for example, that estimates may be calculated by some iterative scheme with dubious convergence. Other possibilities are that estimates may have a particularly convoluted expression, or that finite-sample properties such as large bias may make them inappropriate for the problem of interest.
When ML estimation under both the null and full models is not preferred, we need another way forward. This is provided by the generalised score and Wald tests. These tests are especially valuable when the model may be misspecified, but that will not be the focus here.
In Section 2 the generalised score and Wald tests are described. In Section 3 this material is applied to derive a new test for the Behrens-Fisher problem, while Section 4 looks at testing equality of variances from multiple independent normal samples.

M-Estimators and Generalised Score Tests
The class of M-estimators includes both ML and method of moments estimators. An M-estimator $\tilde{\gamma}$ satisfies
$$\sum_{j=1}^{n} \Psi(X_j, \tilde{\gamma}) = 0_p,$$
in which $X_1, \ldots, X_n$ are independent but not necessarily identically distributed, $\Psi$ is a known $p \times 1$ function not depending on $j$ or $n$, $\gamma$ is a $p$-dimensional parameter, and in general $0_m$ denotes an $m \times 1$ vector of zeros. The estimating function $\Psi$ must be sufficiently 'smooth'. In particular, its derivatives up to second order, and their expectations, must exist; hence the matrices $A$ and $B$ defined subsequently are assumed to exist. Also, the expectation of the second-order derivatives must be bounded in probability. More technical details on M-estimators may be found in [3, Chapter 5].

In our setting we assume that $\gamma = (\theta^T, \beta^T)^T$ and that we wish to test $H_0: \theta = 0_k$ against the alternative $K: \theta \neq 0_k$, with $\theta$ being the $k \times 1$ vector of primary interest, with $\beta$ a $q \times 1$ vector of nuisance parameters, and with $p = k + q$. The generalised score test is based on the partial M-estimator $\tilde{\gamma}_0 = (0_k^T, \tilde{\beta}^T)^T$ that satisfies
$$\sum_{j=1}^{n} \Psi_\beta(X_j, \tilde{\gamma}_0) = 0_q,$$
where $\Psi$ is partitioned similarly to $\gamma$, so that $\Psi^T = (\Psi_\theta^T, \Psi_\beta^T)$. Define
$$A(\gamma) = E_0\left[-\frac{\partial \Psi(X_j, \gamma)}{\partial \gamma^T}\right], \qquad B(\gamma) = E_0\left[\Psi(X_j, \gamma)\,\Psi(X_j, \gamma)^T\right],$$
in which $E_0$ denotes expectation under the null hypothesis. Here $A(\gamma)$ and $B(\gamma)$ are $p \times p$, and $A_{\theta\theta}$ and $B_{\theta\theta}$ are $k \times k$. We note that $A(\gamma)$ is not necessarily symmetric while $B(\gamma)$ is. This means that the form of the generalised tests given by, for example, [4], needs to be slightly modified. The generalised score test statistic is given by
$$S_G = \left\{\sum_{j=1}^{n} \Psi_\theta\right\}^T \left[\left(I_k, -A_{\theta\beta} A_{\beta\beta}^{-1}\right) B \left(I_k, -A_{\theta\beta} A_{\beta\beta}^{-1}\right)^T\right]^{-1} \left\{\sum_{j=1}^{n} \Psi_\theta\right\},$$
in which all arguments are $\tilde{\gamma}_0$. In the exposition in [4] parameters are estimated by ML but the data do not come from the parametric model: this is ML under misspecification. In [5], Kent's definitions are given, but in place of ML estimators any M-estimators are permitted. It is also noted in [4] that $A$ and $B$ can in practice be replaced by any consistent estimates. An alternative form of $S_G$ that is more convenient for calculation is given in [2], where it is applied to the construction of generalised smooth tests of goodness of fit.
The equivalence of the two forms may be shown by routine but tedious matrix algebra, which is omitted here. The asymptotic distribution of both $S_G$ and the generalised Wald statistic $W_G$ under $H_0$ is $\chi^2_k$; here $W_G$ standardises the M-estimator $\tilde{\theta}$ of $\theta$ under the full model by the $\theta\theta$ block of the sandwich matrix $A^{-1} B A^{-T}$. If $\Psi(X, \gamma)$ is the derivative of the logarithm of the likelihood, which is the usual score function, then $A = B$ is the usual symmetric information matrix, and $S_G$ reduces to the usual score test statistic. Both are given in this form in [1]. For more information see [5, 6]. In [5, page 328], replacing the inverse of the asymptotic covariance matrix $\Sigma_{GS}(\gamma_0)$ in $S_G$ by a generalised inverse of a consistent estimate of $\Sigma_{GS}(\gamma_0)$ is recommended. Although it may sound trivial, when calculating any of the ordinary or generalised score or Wald test statistics, we are finding $(X - E[X])^T \Sigma^{-1} (X - E[X])$, where $X$ is at least asymptotically multivariate normal and $\Sigma$ is at least asymptotically the full-rank covariance matrix of $X$. Very occasionally it may be more convenient to find the exact covariance matrix rather than one that is asymptotically equivalent. If so, the exact covariance matrix can be used in the above expressions; similarly, when appropriate, a generalised inverse of the exact or an asymptotically equivalent covariance matrix can be used.
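As a small illustration of the estimating-equation definition above, the following sketch (with assumed illustrative data and the method of moments choice $\Psi(x, (\mu, \sigma^2)) = (x - \mu, (x - \mu)^2 - \sigma^2)^T$) finds an M-estimator numerically as the root of $\sum_j \Psi(X_j, \gamma) = 0$:

```python
# Hedged sketch: an M-estimator for (mu, sigma^2) from a normal sample,
# defined as the root of sum_j Psi(X_j, gamma) = 0 with the method of
# moments estimating function Psi(x, (mu, s2)) = (x - mu, (x - mu)^2 - s2).
import numpy as np
from scipy.optimize import fsolve

def psi_sum(gamma, x):
    mu, s2 = gamma
    return [np.sum(x - mu), np.sum((x - mu) ** 2 - s2)]

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)  # assumed illustrative data

mu_hat, s2_hat = fsolve(psi_sum, x0=[0.0, 1.0], args=(x,))
# For this Psi the root coincides with the sample mean and the ML variance.
assert np.isclose(mu_hat, x.mean())
assert np.isclose(s2_hat, np.mean((x - x.mean()) ** 2))
```

For this particular $\Psi$ the M-estimator happens to equal the ML estimator; other choices of $\Psi$ give genuinely different estimators, which is the point of the generalised tests.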

The Behrens-Fisher Problem
In the Behrens-Fisher problem, $Y_1, \ldots, Y_m$ is a random sample from an $N(\mu_Y, \sigma_Y^2)$ population, and $Z_1, \ldots, Z_n$ is an independent random sample from an $N(\mu_Z, \sigma_Z^2)$ population. It is desired to test $H: \mu_Y = \mu_Z$ against $K: \mu_Y \neq \mu_Z$, with the standard deviations $\sigma_Y$ and $\sigma_Z$ being nuisance parameters. In [2, Example 3.3.2] the likelihood ratio, score, and Wald tests are derived. The score test requires the solution of an inconvenient cubic equation, so this is one situation in which the Wald statistic looks distinctly more appealing than both the likelihood ratio and score test statistics.
When the estimating function $\sum_{j=1}^{n} \Psi(X_j, \gamma)$ is the usual score function, the generalised score test is the usual score test. To conform to our notation, put $(Y_1, \ldots, Y_m, Z_1, \ldots, Z_n) = X^T$, $\mu_Y - \mu_Z = 2\theta$, $\mu_Y + \mu_Z = 2\beta_1$, $\sigma_Y^2 = \beta_2$, and $\sigma_Z^2 = \beta_3$. We test $H: \theta = 0$ against $K: \theta \neq 0$, with nuisance parameters $\beta_1$, $\beta_2$, and $\beta_3$. With $\mu_Y = \beta_1 + \theta$ and $\mu_Z = \beta_1 - \theta$, the logarithm of the likelihood is
$$-\frac{m+n}{2}\log(2\pi) - \frac{m}{2}\log\beta_2 - \frac{n}{2}\log\beta_3 - \sum_{i=1}^{m}\frac{(Y_i - \beta_1 - \theta)^2}{2\beta_2} - \sum_{j=1}^{n}\frac{(Z_j - \beta_1 + \theta)^2}{2\beta_3}, \tag{3.1}$$
and therefore the score function has the following components:

$$S_\theta(\gamma) = \sum_{i=1}^{m}\frac{Y_i - \beta_1 - \theta}{\beta_2} - \sum_{j=1}^{n}\frac{Z_j - \beta_1 + \theta}{\beta_3}, \qquad S_{\beta_1}(\gamma) = \sum_{i=1}^{m}\frac{Y_i - \beta_1 - \theta}{\beta_2} + \sum_{j=1}^{n}\frac{Z_j - \beta_1 + \theta}{\beta_3},$$
$$S_{\beta_2}(\gamma) = -\frac{m}{2\beta_2} + \sum_{i=1}^{m}\frac{(Y_i - \beta_1 - \theta)^2}{2\beta_2^2}, \qquad S_{\beta_3}(\gamma) = -\frac{n}{2\beta_3} + \sum_{j=1}^{n}\frac{(Z_j - \beta_1 + \theta)^2}{2\beta_3^2}. \tag{3.2}$$

These are the partial derivatives of the logarithm of the likelihood. Under the null hypothesis the estimating equations are $S_{\beta_1}(\tilde{\gamma}_0) = S_{\beta_2}(\tilde{\gamma}_0) = S_{\beta_3}(\tilde{\gamma}_0) = 0$. This leads to the inconvenient cubic equation mentioned previously. If we proceed with this model, the cubic must be solved to find $\tilde{\beta}_{10}$, and hence $\tilde{\beta}_{20} = \sum_i (Y_i - \tilde{\beta}_{10})^2/m$ and $\tilde{\beta}_{30} = \sum_j (Z_j - \tilde{\beta}_{10})^2/n$. The generalised score test statistic is $S_G = (\bar{Y} - \bar{Z})^2/(\tilde{\beta}_{20}/m + \tilde{\beta}_{30}/n)$. This is just the ordinary score test statistic.
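The ordinary score test just described can be sketched numerically: profiling $\beta_2$ and $\beta_3$ out of $S_{\beta_1}(\tilde{\gamma}_0) = 0$ gives a single equation in the common mean (the inconvenient cubic), which a root-finder handles easily. The data below are assumed and purely illustrative.

```python
# Hedged sketch: solve the profile estimating equation for the common mean
# under H0 (the "inconvenient cubic") numerically, then form
# S_G = (Ybar - Zbar)^2 / (b20/m + b30/n) as in the text.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=30)   # assumed illustrative data
z = rng.normal(0.0, 2.0, size=40)
m, n = len(y), len(z)

def s_beta1(b):
    v_y = np.mean((y - b) ** 2)     # profiled beta_2 at common mean b
    v_z = np.mean((z - b) ** 2)     # profiled beta_3 at common mean b
    return m * (y.mean() - b) / v_y + n * (z.mean() - b) / v_z

# The root lies between the two sample means, where s_beta1 changes sign.
lo, hi = sorted([y.mean(), z.mean()])
b10 = brentq(s_beta1, lo, hi)
b20 = np.mean((y - b10) ** 2)
b30 = np.mean((z - b10) ** 2)

s_g = (y.mean() - z.mean()) ** 2 / (b20 / m + b30 / n)
p_value = chi2.sf(s_g, df=1)        # asymptotic chi-squared_1 reference
assert 0.0 <= p_value <= 1.0
```

Bracketing the root between $\bar{Y}$ and $\bar{Z}$ works because $S_{\beta_1}$ has opposite signs at the two sample means.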

Advances in Decision Sciences
While solving the cubic is not a great difficulty, if we modify $S_{\beta_1}(\gamma)$, a possibly less efficient but certainly more convenient estimator of the common mean under the null hypothesis may be found. This estimator is the solution of the modified equation $S^*_{\beta_1}(\tilde{\gamma}_0) = 0$, namely $\beta^*_{10} = (m\bar{Y} + n\bar{Z})/(m + n)$. If we also modify $S_\theta(\gamma)$ accordingly, while leaving the other two equations unchanged, a generalised score test results. The estimators of $\beta_2$ and $\beta_3$ are slightly different from those found previously, being $\beta^*_{20} = \sum_i (Y_i - \beta^*_{10})^2/m$ and $\beta^*_{30} = \sum_j (Z_j - \beta^*_{10})^2/n$, and the generalised score test statistic is
$$S_G = \frac{(\bar{Y} - \bar{Z})^2}{\beta^*_{20}/m + \beta^*_{30}/n}. \tag{3.10}$$
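A minimal sketch of this more convenient generalised score test, assuming (as described above) that the variance estimates are recentred at the pooled mean $\beta^*_{10}$; the data are assumed and purely illustrative:

```python
# Hedged sketch: generalised score statistic built from the convenient
# pooled-mean estimator beta*_10 = (m*Ybar + n*Zbar)/(m + n).
import numpy as np
from scipy.stats import chi2

y = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])   # assumed illustrative data
z = np.array([5.8, 6.4, 5.2, 7.1, 6.0])
m, n = len(y), len(z)

b10 = (m * y.mean() + n * z.mean()) / (m + n)  # convenient common-mean estimate
b20 = np.mean((y - b10) ** 2)                  # variance estimates recentred
b30 = np.mean((z - b10) ** 2)                  # at the pooled mean

s_g = (y.mean() - z.mean()) ** 2 / (b20 / m + b30 / n)
p_value = chi2.sf(s_g, df=1)                   # asymptotic chi-squared_1 reference
```

No cubic need be solved here: every quantity is a closed-form function of the two sample means and the recentred sums of squares.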
It may be shown that the Wald test statistic is a one-to-one function of this $S_G$, so that these two tests are equivalent. However, if the asymptotic $\chi^2_1$ critical values are used, the generalised score test has actual test sizes much closer to the nominal sizes than the Wald test. When using simulated critical values that are virtually exact, the generalised score test power is within 1% of that of the entrenched test due to Welch [7]. So on this criterion the Welch and generalised score tests are virtually indistinguishable.
The Welch test is very similar to the Wald test. Using Satterthwaite's approximation to the null distribution of the Welch statistic gives excellent agreement between the nominal and actual test sizes. However, Satterthwaite's approximation does not work nearly as well for $S_G$. Hence, in terms of agreement between nominal and actual test sizes using approximations and asymptotic critical values, the Welch test is to be preferred. Support for these assertions and more numerical details are available in [8].
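The closeness of the Welch and Wald constructions is easy to see numerically: with unbiased sample variances (divisors $m-1$ and $n-1$), the squared Welch $t$ statistic coincides with the Wald-type quadratic form. The data below are assumed and purely illustrative.

```python
# Hedged sketch: with unbiased variance divisors, the Wald-type statistic
# (Ybar - Zbar)^2 / (s_Y^2/m + s_Z^2/n) equals the square of Welch's t.
import numpy as np
from scipy import stats

y = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])   # assumed illustrative data
z = np.array([5.8, 6.4, 5.2, 7.1, 6.0])
m, n = len(y), len(z)

t, _ = stats.ttest_ind(y, z, equal_var=False)  # Welch's t statistic
w = (y.mean() - z.mean()) ** 2 / (y.var(ddof=1) / m + z.var(ddof=1) / n)
assert np.isclose(t ** 2, w)
```

The two procedures then differ only in their reference distributions: Satterthwaite's approximate $t$ for Welch versus the asymptotic $\chi^2_1$ for the Wald test.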

Testing Equality of Variances
Suppose that we have $m$ independent random samples, with the $j$th, $j = 1, \ldots, m$, being of size $n_j$ and from a normal $N(\mu_j, \sigma_j^2)$ population. The total sample size is $n = n_1 + \cdots + n_m$. We seek to test equality of variances, $H: \sigma_1^2 = \cdots = \sigma_m^2$ ($= \sigma^2$, say), against the alternative $K$: not $H$. Popular tests include the likelihood ratio test, frequently referred to as Bartlett's test, and Levene's test. The former is known to be nonrobust, while the latter is more robust in that its actual levels are closer to the nominal levels. Levene's test is less powerful than Bartlett's when the data are consistent with normality.
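Both of these popular competitors are readily available in standard software; the sketch below (with assumed illustrative samples) computes them with scipy.

```python
# Hedged sketch: Bartlett's and Levene's tests for equality of variances,
# computed with scipy on illustrative normal samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Three samples; the third has a larger standard deviation, violating H.
samples = [rng.normal(0.0, s, size=25) for s in (1.0, 1.0, 2.0)]

b_stat, b_p = stats.bartlett(*samples)  # likelihood ratio (Bartlett) test
l_stat, l_p = stats.levene(*samples)    # Levene's robust alternative
assert 0.0 <= b_p <= 1.0 and 0.0 <= l_p <= 1.0
```

Under normality Bartlett's test is the more powerful of the two, consistent with the remarks above.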
We now construct a Wald test of $H$ against $K$. We could use the generalised Wald test construction with $\Psi(X, \gamma)$ being the derivative of the logarithm of the likelihood, but we leave that as an exercise for the interested reader. We could also calculate one of the forms of the asymptotic covariance matrix, but this is a case where it is simpler to calculate the exact covariance matrix. Moreover, the exact covariance matrix involves an inconvenient inverse, so we instead use the Moore-Penrose inverse. This is defined in the appendix, along with some relevant useful results. This approach leads to a simpler test statistic.
Throughout this example, since we are calculating the Wald test statistic, all estimation is ML. As a consequence estimators are denoted by hats (ˆ) instead of tildes (˜). We also use unbiased versions of the sample variances, with divisors $n_j - 1$ instead of $n_j$. These are asymptotically equivalent to the usual ML estimators, and the corresponding test statistic is asymptotically equivalent to the usual Wald test statistic.
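To fix ideas, here is one way such a Wald-type statistic might look; this is a sketch under the stated normal-theory assumptions, not necessarily the statistic derived in the paper. It uses the unbiased sample variances, the exact result $\mathrm{Var}(s_j^2) = 2\sigma^4/(n_j - 1)$, and a Moore-Penrose inverse of the (singular) covariance matrix of the deviations from a pooled variance.

```python
# Hedged sketch (an illustration, not the paper's derived statistic): a
# Wald-type statistic for H: sigma_1^2 = ... = sigma_m^2 from unbiased
# sample variances, using Var(s_j^2) = 2*sigma^4/(n_j - 1) under normality
# and a Moore-Penrose inverse of the singular covariance of the deviations.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
sizes = (20, 25, 30)
samples = [rng.normal(0.0, 1.0, size=k) for k in sizes]  # assumed data

s2 = np.array([x.var(ddof=1) for x in samples])          # unbiased variances
nj = np.array(sizes, dtype=float)
sigma2_hat = np.sum((nj - 1) * s2) / np.sum(nj - 1)      # pooled variance under H

w_vec = (nj - 1) / np.sum(nj - 1)
C = np.eye(len(nj)) - np.outer(np.ones(len(nj)), w_vec)  # deviations: d = C @ s2
d = C @ s2
Sigma = C @ np.diag(2 * sigma2_hat ** 2 / (nj - 1)) @ C.T  # singular covariance
W = d @ np.linalg.pinv(Sigma) @ d                        # ~ chi2_{m-1} under H
p_value = chi2.sf(W, df=len(nj) - 1)
assert 0.0 <= p_value <= 1.0
```

The Moore-Penrose inverse handles the rank deficiency caused by the deviations summing (with weights) to zero, so no components need be dropped to make the covariance invertible.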