Comparison of Some Tests of Fit for the Inverse Gaussian Distribution

Copyright © 2012 D. J. Best et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper gives an empirical investigation of some tests of goodness of fit for the inverse Gaussian distribution.


Introduction
The inverse Gaussian (IG) distribution is an important statistical model for the analysis of positive data. See, for example, Seshadri [1]. In its standard form the distribution, denoted $IG(\lambda, \mu)$, depends on the shape parameter $\lambda > 0$ and the mean $\mu > 0$. Its probability density function is
$$f(x; \lambda, \mu) = \sqrt{\frac{\lambda}{2\pi x^3}}\,\exp\left\{-\frac{\lambda (x - \mu)^2}{2\mu^2 x}\right\}$$
for $x > 0$, and zero otherwise.
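As a minimal sketch, the standard $IG(\lambda, \mu)$ density, $\sqrt{\lambda/(2\pi x^3)}\exp\{-\lambda(x-\mu)^2/(2\mu^2 x)\}$ for $x > 0$, can be evaluated directly. The function names `ig_pdf` and `ig_pdf_integral` are illustrative and not from the paper; the crude trapezoid rule is included only as a sanity check that the density integrates to about one.

```python
import math


def ig_pdf(x: float, lam: float, mu: float) -> float:
    """Density of IG(lam, mu) at x: shape lam > 0, mean mu > 0; zero for x <= 0."""
    if lam <= 0 or mu <= 0:
        raise ValueError("lam and mu must be positive")
    if x <= 0:
        return 0.0
    scale = math.sqrt(lam / (2.0 * math.pi * x ** 3))
    return scale * math.exp(-lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))


def ig_pdf_integral(lam: float, mu: float, upper: float = 60.0, steps: int = 20000) -> float:
    """Trapezoid-rule integral of the density over (0, upper]; a rough check only."""
    h = upper / steps
    total = 0.5 * (ig_pdf(0.0, lam, mu) + ig_pdf(upper, lam, mu))
    for i in range(1, steps):
        total += ig_pdf(i * h, lam, mu)
    return total * h
```

For example, `ig_pdf(1.0, 1.0, 1.0)` equals $1/\sqrt{2\pi}$ because the exponent vanishes at $x = \mu$.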
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent observations. We wish to test $H_0$: $X$ is distributed as $IG(\lambda, \mu)$ for some $\lambda > 0$ and $\mu > 0$ against $H_A$: not $H_0$.

Advances in Decision Sciences
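We consider the following test of fit statistics: (i) the smooth test statistic $R_3$ of Ducharme [2], (ii) the statistic $V_0$ of Henze and Klar [3], (iii) the $\log TK_n$ statistic of Vexler et al. [4], (iv) the Anderson-Darling statistic $A^2$, and (v) the third and fourth components of the conventional smooth test.
We strongly suggest that P values for all tests be found using the parametric bootstrap. A link for smooth test statistics for many distributions and bootstrap P values for their tests of fit is http://www.biomath.ugent.be/~othas/smooth2.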
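The parametric bootstrap recommended here can be sketched generically. The following is a minimal illustration, not the authors' FORTRAN implementation: `bootstrap_p_value` and its arguments `statistic`, `fit`, and `sample` are hypothetical names, large statistic values are assumed to count against $H_0$, and the `(exceed + 1) / (n_boot + 1)` convention is one common choice among several. In the setting of this paper, `fit` would return estimates of $\lambda$ and $\mu$ and `sample` would draw IG deviates with those values.

```python
import random
from typing import Callable, List, Sequence, Tuple

Params = Tuple[float, ...]


def bootstrap_p_value(
    data: Sequence[float],
    statistic: Callable[[Sequence[float], Params], float],
    fit: Callable[[Sequence[float]], Params],
    sample: Callable[[Params, int, random.Random], List[float]],
    n_boot: int = 1000,
    seed: int = 0,
) -> float:
    """Parametric bootstrap P value for a test that rejects for large statistics."""
    rng = random.Random(seed)
    params = fit(data)                      # estimate parameters under H0
    observed = statistic(data, params)
    exceed = 0
    for _ in range(n_boot):
        boot = sample(params, len(data), rng)   # simulate from the fitted null model
        boot_params = fit(boot)                 # re-estimate on every bootstrap sample
        if statistic(boot, boot_params) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Re-estimating the parameters inside the loop matters: the null distribution of a test of fit statistic with estimated parameters differs from its distribution with known parameters.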

Ducharme's $R_3$
The usual approach to constructing a smooth test is outlined, for example, in Rayner et al. [5].
A positive feature of smooth tests is that their components can often shed light on how the data differ from the hypothesised distribution. This is somewhat less evident with Ducharme's test, given that he transforms the data. Another positive feature of smooth tests is that their components often give highly focused tests with good power. Ducharme's test has components that are likely to fulfil this role.

Henze and Klar's $V_0$
This is defined using the exponentially scaled complementary error function $\mathrm{erfce}(x) = e^{x^2}\,\mathrm{erfc}(x)$, where $\mathrm{erfc}(x) = 2\int_x^\infty e^{-t^2}\,dt/\sqrt{\pi}$. Note that in $\mathrm{erfc}(x)$ we divide by $\sqrt{\pi}$ and not $\pi$ as in Henze and Klar [3, page 428], as we believe a typographical error was made in their paper. Now let $Z_j = X_j/\hat{\mu}$ and $Z_{jk} = Z_j + Z_k$. The statistic $V_0$ is then built from the $Z_j$ and $Z_{jk}$ using $\mathrm{erfce}$; see Henze and Klar [3].
Tests based on the empirical Laplace transform, as is $V_0$, have produced powerful tests for other distributions, and so it is useful to compare $V_0$ with other recently suggested tests.
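The scaled function $\mathrm{erfce}(x) = e^{x^2}\mathrm{erfc}(x)$, with $\mathrm{erfc}$ normalised by $\sqrt{\pi}$, needs some care numerically because $e^{x^2}$ overflows while $\mathrm{erfc}(x)$ underflows for large $x$. A minimal sketch, assuming non-negative arguments, is given below. The switch point of 25 and the three-term asymptotic series are illustrative choices, not taken from the paper.

```python
import math


def erfce(x: float) -> float:
    """Exponentially scaled complementary error function exp(x^2) * erfc(x).

    Intended for x >= 0. Large negative x would overflow in the direct product.
    """
    if x < 25.0:
        # Direct product is safe here: exp(625) and erfc(25) are both representable.
        return math.exp(x * x) * math.erfc(x)
    # Asymptotic series: 1/(x sqrt(pi)) * (1 - 1/(2x^2) + 3/(4x^4) - ...)
    inv2 = 1.0 / (2.0 * x * x)
    return (1.0 - inv2 + 3.0 * inv2 * inv2) / (x * math.sqrt(math.pi))
```

Libraries such as SciPy provide a scaled complementary error function (`scipy.special.erfcx`) that could be used instead where available.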

The $\log TK_n$ Statistic of Vexler et al.
Order the observations so that $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, and let $Y_i = 1/X_{(n-i+1)}$. The statistic $\log TK_n$ is then computed as in Vexler et al. [4]; it involves an integer $m$ and a constant $\delta$ that can be taken to be 0.5. Observe that, for small $m$, such as $m = 1$, the statistic can take an infinite value when there are tied data. Vexler et al. [4] do not appear to note this. For the Poisson alternative in Table 1 and $\delta = 0.5$ the $\log TK_n$ statistic is often infinite. Choi and Lim [6] show that an entropy-type statistic like $\log TK_n$ has good power for the Laplace distribution, and so it is of interest to see how the entropy method works with a skewed distribution.

The Anderson-Darling Statistic
Again order the data from smallest to largest to obtain $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ and take $Z_i = F(X_{(i)}; \hat{\lambda}, \hat{\mu})$, where $F$ is the distribution function of the IG distribution. Then the Anderson-Darling statistic is
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[\ln Z_i + \ln(1 - Z_{n+1-i})\right].$$
The Anderson-Darling test has stood the test of time as a useful general option for tests of fit for many distributions. Have newer tests improved on its power performance?
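The computation can be sketched as follows, using the standard closed form of the IG distribution function, $F(x) = \Phi\{\sqrt{\lambda/x}\,(x/\mu - 1)\} + e^{2\lambda/\mu}\,\Phi\{-\sqrt{\lambda/x}\,(x/\mu + 1)\}$, and the usual computing formula $A^2 = -n - n^{-1}\sum_i (2i-1)[\ln Z_i + \ln(1 - Z_{n+1-i})]$. The function names are illustrative, and the parameter values passed in are assumed to be estimates obtained beforehand.

```python
import math
from typing import List, Sequence


def std_normal_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ig_cdf(x: float, lam: float, mu: float) -> float:
    """Distribution function of IG(lam, mu)."""
    if x <= 0:
        return 0.0
    r = math.sqrt(lam / x)
    first = std_normal_cdf(r * (x / mu - 1.0))
    tail = std_normal_cdf(-r * (x / mu + 1.0))
    # Combine exp(2*lam/mu) and the tiny normal tail on the log scale to avoid overflow.
    second = math.exp(2.0 * lam / mu + math.log(tail)) if tail > 0.0 else 0.0
    return min(1.0, first + second)


def anderson_darling(z_sorted: Sequence[float]) -> float:
    """A^2 from probability-integral-transformed values sorted in increasing order."""
    n = len(z_sorted)
    total = 0.0
    for i, z in enumerate(z_sorted, start=1):
        # z_sorted[n - i] is Z_{n+1-i} in 1-based notation.
        total += (2 * i - 1) * (math.log(z) + math.log(1.0 - z_sorted[n - i]))
    return -n - total / n


def ig_anderson_darling(data: Sequence[float], lam: float, mu: float) -> float:
    z: List[float] = [ig_cdf(x, lam, mu) for x in sorted(data)]
    return anderson_darling(z)
```

If any transformed value is numerically 0 or 1, `math.log` raises a `ValueError`; a production version would need to guard against that.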

Conventional Smooth Test Third and Fourth Components
Henze and Klar [3] consider the test based on the conventional second-order component $U_2^2$ and show it has poor power for some alternatives. Ducharme [2] notes that these conventional smooth tests, discussed, for example, in Rayner et al. [5], can be inconsistent. However, we decided to include tests based on $U_3^2$ and $U_4^2$ in our comparisons.
Table 1: Powers (in percentages) of goodness of fit tests for the inverse Gaussian distribution for $n = 20$ and (a) $\alpha = 0.10$, (b) $\alpha = 0.05$.
The parameters $\lambda$ and $\mu$ can be estimated by maximum likelihood (ML) using the previous formula for $\hat{\lambda}$ and $\hat{\mu}$, thereby giving ML versions of $U_3$ and $U_4$. We also looked at versions of $U_3$ and $U_4$ in which the parameters are estimated using the method of moments (MOM) estimators $\tilde{\lambda} = n\tilde{\mu}^3/\sum_{j=1}^{n}(X_j - \tilde{\mu})^2$ and $\tilde{\mu} = \hat{\mu} = \bar{X}$. As indicated previously, smooth tests can indicate in terms of moments how data and the hypothesised distribution differ. This feature, and good power in previous studies for testing for other distributions, prompted us to include conventional smooth tests in our comparisons.
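The two sets of estimators can be sketched as below. The MOM estimator of $\lambda$ is the sample mean cubed divided by the sample variance (with divisor $n$), as in the text; the ML estimator shown, $\hat{\lambda} = n/\sum_j (1/X_j - 1/\bar{X})$, is the standard one for the IG distribution and is assumed here to be the formula the text refers to. Function names are illustrative.

```python
from typing import Sequence, Tuple


def ig_ml_estimates(data: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood estimates (lambda_hat, mu_hat) for positive data."""
    n = len(data)
    mean = sum(data) / n
    denom = sum(1.0 / x - 1.0 / mean for x in data)
    if denom <= 0.0:
        raise ValueError("ML estimate of lambda undefined (all observations equal?)")
    return n / denom, mean


def ig_mom_estimates(data: Sequence[float]) -> Tuple[float, float]:
    """Method of moments estimates (lambda_tilde, mu_tilde)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)
    if ss <= 0.0:
        raise ValueError("MOM estimate of lambda undefined (all observations equal?)")
    return n * mean ** 3 / ss, mean
```

For the toy sample 1, 2, 3 the ML estimate of $\lambda$ is 9 and the MOM estimate is 12, with both estimates of $\mu$ equal to 2.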

Sizes and Powers
In this section, wherever possible, we have used IMSL routines to generate random deviates. Calculations were done using double precision arithmetic and FORTRAN code. Inverse Gaussian random deviates were generated as in Michael et al. [7].
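The transformation-with-multiple-roots method usually attributed to Michael, Schucany, and Haas can be sketched in a few lines. This is an illustrative Python version, not the FORTRAN/IMSL code used for the study, and the function name `ig_random` is hypothetical.

```python
import math
import random


def ig_random(lam: float, mu: float, rng: random.Random) -> float:
    """Draw one IG(lam, mu) deviate by the multiple-roots transformation method."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu                                   # chi-square with 1 df
    # Smaller root of the quadratic linking y to x.
    x = mu + mu * mu * y / (2.0 * lam) - (mu / (2.0 * lam)) * math.sqrt(
        4.0 * mu * lam * y + mu * mu * y * y
    )
    # Choose between the two roots x and mu^2 / x.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x
```

A quick check of such a generator is that the sample mean is close to $\mu$ and the sample variance is close to $\mu^3/\lambda$.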
We examine a similar range of alternatives to that given by Henze and Klar [3] so that comparisons can be made with the other statistics in their Table 1(a). We note that (i) the probability density function for the lognormal alternative $LN(\phi)$ requires a corrected form, and (ii) the $W(1)$ alternative is a standard exponential alternative. Note that in Henze and Klar [3, Table 1] $W(1)$, $G(1)$, and $\chi^2_2$ are equivalent. It appears that the tests based on $R_3$ and $V_0$ generally do well, while that based on $\log TK_n$ is only competitive for the symmetric uniform alternative. The smooth tests based on $U_3^2$ and $U_4^2$, like that based on $U_2^2$ in Henze and Klar [3], were not competitive. A second pair of tests based on $U_3^2$ and $U_4^2$, computed with the other set of parameter estimators, was generally even less competitive. This is unfortunate, as these components help describe the data and this facility is not available with the other tests. All powers were calculated using the parametric bootstrap.
The alternatives in Table 1 are defined in Henze and Klar [3]. However, note that $IG(\lambda)$ in Henze and Klar [3] is $IG(\lambda, 1)$ here. There is one exception: the Poisson-type alternative $POI(3)$, which has probability function $e^{-\theta}\theta^x/x!$ with $\theta = 3$ here; if a random $x$ value is zero we take this to be $x = 0.5$. This alternative was suggested by the comment in Henze and Klar [3] that, for the shelf life data they examine, $A^2$ and the other EDF statistics have a much smaller P value than $V_0$ and the other empirical Laplace transform statistics. A feature of the shelf life data was that there were tied observations, not something one would expect for an inverse Gaussian distribution. The $POI(3)$ alternative gives parametric bootstrap simulated samples with many ties and, as can be seen in Table 1, the power of the test based on $A^2$ is much greater than those for $R_3$ or $V_0$. The test based on $\log TK_n$ classifies infinite values as rejections of the null hypothesis. We have no explanation as to why the Anderson-Darling test is quite powerful for tied data when the null hypothesis specifies an inverse Gaussian distribution.
In Table 1(a) our powers for the tests based on $V_0$ and $A^2$ are very similar to those obtained by Henze and Klar [3]. Table 1(b) gives powers for the same alternatives as Table 1(a) but with $\alpha = 0.05$, as this choice of $\alpha$ is commonly used in practice. The relative performance of the tests in Tables 1(a) and 1(b) is similar. The traditional smooth tests, based on the $U$ statistics, are sometimes particularly poor in Table 1(b).
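To see why the Poisson-type alternative produces tied data, the sampling rule can be sketched as follows: draw a Poisson count with $\theta = 3$ and replace a zero by 0.5 so that every observation is positive. The Poisson generator below uses the classical product-of-uniforms method; all names are illustrative and the code is not taken from the study.

```python
import math
import random
from typing import List


def poisson_draw(theta: float, rng: random.Random) -> int:
    """Poisson(theta) deviate by multiplying uniforms until the product drops below exp(-theta)."""
    limit = math.exp(-theta)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poi_alternative_sample(n: int, rng: random.Random, theta: float = 3.0) -> List[float]:
    """Sample of size n from the Poisson-type alternative, with zeros replaced by 0.5."""
    return [float(k) if k > 0 else 0.5 for k in (poisson_draw(theta, rng) for _ in range(n))]


def count_tied(sample: List[float]) -> int:
    """Number of observations that duplicate an earlier value."""
    return len(sample) - len(set(sample))
```

With $n = 20$ and only a handful of likely values (0.5, 1, 2, 3, and so on), most observations in a simulated sample are tied with at least one other.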

The Approach to $\chi^2$
An advantage of the smooth test statistics and their components is that under the null hypothesis they have asymptotic $\chi^2$ distributions. Thus for larger sample sizes approximate P values can be found using the $\chi^2$ distribution. However, for $V_2^2$ and $R_3$, Table 2 indicates that, to give actual test sizes close to the nominal 5% for $\lambda = 2$ and $\mu = 1$, a sample size of 200 might be needed, while an even greater sample size might be needed for $V_3^2$. This ties in with the suggestion, made in Section 1, to use the parametric bootstrap.
We did not expect the conventional smooth test statistics to be asymptotically $\chi^2$ distributed; see, for example, Rayner et al. [5, Section 9.3]. As an illustration of this, Table 2 shows that the 95% points of $U_3^2$ do not converge to 3.84. If $m_3 = \sum_i (X_i - \bar{X})^3/n$, we observe that because of the MOM estimators used in $U_3^2$ we can write the numerator of $U_3$ as $\sqrt{n}\,(m_3 - 3\tilde{\mu}^5/\tilde{\lambda}^2) = \sqrt{n}\,T$, say. Then $T^2/v_T$, where $v_T$ is the variance of $T$, should be asymptotically $\chi^2$. Here $v_T = 6\mu^9(\lambda^2 + 12\lambda\mu + 36\mu^2)/(n\lambda^5)$. This formula can be found by the delta method. No powers for $T^2/v_T$ are shown in Table 1 as they are similar to those for $U_3^2$.
Table 2: Empirical null 95% points for $V_2^2$, $V_3^2$, $R_3$, $U_3^2$, and $T^2/v_T$ for $\lambda = 2$ and $\mu = 1$.
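For a single component, the large-sample approximation amounts to an upper-tail probability of a $\chi^2$ distribution with one degree of freedom, which is the distribution whose 95% point is 3.84. A minimal sketch follows; the function name is illustrative, and the one-degree-of-freedom case is the only one covered.

```python
import math


def chi2_upper_tail_1df(x: float) -> float:
    """P(chi-square with 1 df > x), using P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))
```

As the section notes, such approximate P values may be unreliable at moderate sample sizes, where the parametric bootstrap is preferable.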
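The variance formula can be checked numerically. The sketch below assumes the reading $T = m_3 - 3m_2^2/\bar{X}$, which equals $m_3 - 3\tilde{\mu}^5/\tilde{\lambda}^2$ under the MOM estimators, with $m_2$ the sample variance with divisor $n$. It applies the delta method through the influence function of $T$, using the IG cumulants $\kappa_r = (2r-3)!!\,\mu^{2r-1}/\lambda^{r-1}$, and compares the result with the closed form $6\mu^9(\lambda^2 + 12\lambda\mu + 36\mu^2)/(n\lambda^5)$. Function names are illustrative.

```python
import math
from typing import Tuple


def ig_central_moments(lam: float, mu: float) -> Tuple[float, float, float, float, float]:
    """Central moments mu_2..mu_6 of IG(lam, mu), obtained from its cumulants."""
    k2 = mu ** 3 / lam
    k3 = 3.0 * mu ** 5 / lam ** 2
    k4 = 15.0 * mu ** 7 / lam ** 3
    k5 = 105.0 * mu ** 9 / lam ** 4
    k6 = 945.0 * mu ** 11 / lam ** 5
    m2, m3 = k2, k3
    m4 = k4 + 3.0 * k2 ** 2
    m5 = k5 + 10.0 * k3 * k2
    m6 = k6 + 15.0 * k4 * k2 + 10.0 * k3 ** 2 + 15.0 * k2 ** 3
    return m2, m3, m4, m5, m6


def delta_method_var_T(lam: float, mu: float, n: int) -> float:
    """Delta-method variance of T = m3 - 3*m2^2/xbar for a sample of size n."""
    m2, m3, m4, m5, m6 = ig_central_moments(lam, mu)
    # Influence function of T is Y^3 + a*Y^2 + b*Y + const, with Y = X - mu.
    a = -6.0 * m2 / mu
    b = -3.0 * m2 + 3.0 * m2 ** 2 / mu ** 2
    var = (
        (m6 - m3 ** 2)
        + a * a * (m4 - m2 ** 2)
        + b * b * m2
        + 2.0 * a * (m5 - m3 * m2)
        + 2.0 * b * m4
        + 2.0 * a * b * m3
    )
    return var / n


def closed_form_var_T(lam: float, mu: float, n: int) -> float:
    return 6.0 * mu ** 9 * (lam ** 2 + 12.0 * lam * mu + 36.0 * mu ** 2) / (n * lam ** 5)
```

The two functions agree to rounding error, which supports the all-plus signs in the bracket; the bracket is also the perfect square $(\lambda + 6\mu)^2$.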

Examples
(i) Failure Times
Proschan 0.71 0.06. We see the tests based on $V_2^2$ and $U_4^2$ are significant at the 5% level; the latter suggests the lack of fit is due to kurtosis differences between the model and the data. See Figure 1. Observe that in Figures 1 and 2 the height of each histogram bar is (class frequency)/(number of observations)/(class width), and that this height is labelled "density" so as to be on the same scale as the probability density curve. Figure 1 uses MOM estimators for this curve.
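The bar-height rule described for the figures, class frequency divided by the number of observations and by the class width, can be sketched as follows. The function name, the toy data, and the class boundaries are illustrative only.

```python
from typing import List, Sequence


def histogram_density(data: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Bar heights = class frequency / number of observations / class width.

    Classes are [edges[j], edges[j+1]), except that the last class also
    includes its right end point.
    """
    n = len(data)
    heights = []
    last = len(edges) - 2
    for j in range(len(edges) - 1):
        lo, hi = edges[j], edges[j + 1]
        count = sum(1 for x in data if lo <= x < hi or (j == last and x == hi))
        heights.append(count / n / (hi - lo))
    return heights
```

With this scaling the bar areas sum to one when every observation falls in some class, so the histogram can be overlaid directly on a fitted density curve.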
Aside from that, we note that in Henze and Klar [3, Table 3] the value 3.7 should be 3.0. This does not affect the conclusions of Henze and Klar for this data set.

(ii) Precipitation at Jug Bridge
Figure 2 indicates the inverse Gaussian fit is marginal. We find $\hat{\lambda} = 8.08$, $\hat{\mu} = \tilde{\mu} = 2.16$, and $\tilde{\lambda} = 6.72$ with $V_0 = 0.33$, giving a P value of 0.09. Further test statistics with P values in parentheses are $R_3$ 0.09 0.72. As the test based on $\log TK_n$ is most critical of the IG hypothesis, the data may be more symmetric than the IG. The inverse Gaussian curve in Figure 2 uses ML estimators.
In passing we note that the exponential distribution with parameter 0.463 does not provide a good fit to the data. When testing for an exponential distribution, $A^2 = 3.226$ with an approximate P value of 0.03. Visual inspection of Figure 2 might have suggested that the exponential model was appropriate.

Conclusion
The tests based on $V_0$ and $R_3$ do well in the power comparisons, while that based on $U_4^2$ indicates possible kurtosis differences from the IG distribution in the failure time example. For the precipitation data the test based on $\log TK_n$ is most critical of the fit of the IG model. In fact, apart from the tests based on $U_3^2$ and $U_4^2$, all of the tests studied here had something to recommend them: reasonable power or interpretability. No test was uniformly superior to the others.