Statistical Test for Bivariate Uniformity

The purpose of a multidimensional uniformity test is to check whether the underlying probability distribution of a multidimensional population differs from the multidimensional uniform distribution. Such tests have applications in various fields such as biology, astronomy, and computer science, but they have received less attention in the literature than the univariate case. A new test statistic for checking multidimensional uniformity is proposed in this paper, and some of its important properties are discussed. As a special case, the bivariate test statistic is discussed in detail. Monte Carlo simulation is used to compare the power of the newly proposed test with that of the distance-to-boundary test, a recently published statistical test for multidimensional uniformity. The simulation shows that the test proposed in this paper is more powerful than the distance-to-boundary test in some cases.


Introduction
Testing uniformity in the univariate case has been studied by many researchers, whereas the multidimensional uniformity test seems to have received less attention in the literature. Testing whether a pattern of points in multidimensional space is distributed uniformly has applications in many fields such as biology, astronomy, and computer science. A commonly used goodness-of-fit test for uniformity is the chi-square test [1]. Theoretically, the chi-square test can be applied to any multivariate distribution. However, the choice of cell limits for the chi-square test is arbitrary, and the power of the test is usually low. Some other well-known methods for univariate goodness-of-fit tests are the Kolmogorov-Smirnov test [2,3], the Anderson-Darling test [4], and the Cramer-von Mises test [5]. Justel et al. [6] proposed a multivariate goodness-of-fit test based on the idea of the Kolmogorov-Smirnov test. By using Rosenblatt's transformation, they reduced the multivariate case to the univariate case. The test statistic they used is distribution-free and can be applied in any dimension. The problem with that method is that the computation of the test statistic is complicated, especially in more than two dimensions. Liang et al. [7] proposed several statistical tests for uniformity in the multivariate case. Those tests used number-theoretic and quasi-Monte Carlo methods for measuring the discrepancy of the points in the multidimensional unit cube. Berrendero et al. [8] proposed a test based on the idea of distance to the boundary. It was shown by Monte Carlo simulation that the distance-to-boundary test is more powerful than the tests proposed by Liang et al. [7]. Chen and Ye [9] developed an alternative test for uniformity in the univariate case. In that paper, the authors proposed a test statistic based on the order statistics of a sample with support set [0, 1].
Monte Carlo simulation results in that paper showed that the proposed test is more powerful than the commonly used Kolmogorov-Smirnov test when the alternative distribution is a V-shape distribution or when the sample size is small. By applying the probability integral transformation, the uniformity test can be used to check whether the underlying distribution follows any specified distribution. The idea is adopted in this paper to develop a test for the multidimensional case.

Advances in Statistics
The main purpose of this paper is to propose a new test statistic for testing multidimensional uniformity. It is expected that the newly proposed test may improve the power of multidimensional uniformity tests. Since the distance-to-boundary test is a recently published test for the multivariate case, the power of the test proposed in this paper will be compared with the power of the distance-to-boundary test. While the statistical test can be used in any dimension, the discussion will be mainly based on the bivariate case. Some techniques used in nonparametric statistics are adopted to modify the test statistic in order to raise the power of the test in the bivariate case.

New Test Statistic
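Suppose

$$X_i = (X_{i1}, X_{i2}, \ldots, X_{id})', \quad i = 1, 2, \ldots, n, \tag{2}$$

form a random sample from a $d$-dimensional population distribution with support set $[0, 1]^{(d)}$. Here $[0, 1]^{(d)}$ is the $d$-dimensional unit cube, which is the set defined as

$$[0, 1]^{(d)} = \{(x_1, x_2, \ldots, x_d)' : 0 \le x_j \le 1,\ j = 1, 2, \ldots, d\}.$$

Suppose also that $X_{(1)j}, X_{(2)j}, \ldots, X_{(n)j}$ are the ordered values of $X_{1j}, X_{2j}, \ldots, X_{nj}$ ($j = 1, 2, \ldots, d$). The purpose is to test the following:

$$H_0: \text{the population distribution is uniform on } [0, 1]^{(d)} \quad \text{versus} \quad H_1: \text{the population distribution is not uniform on } [0, 1]^{(d)}.$$

The test statistic proposed in this paper, $T_d(X_1, X_2, \ldots, X_n)$, is defined in (4), where it is assumed that $X_{(0)j} = 0$ and $X_{(n+1)j} = 1$ ($j = 1, 2, \ldots, d$). If the value of $T_d(X_1, X_2, \ldots, X_n)$ is too far away from zero, it could be an indication that the underlying distribution is not the uniform distribution on $[0, 1]^{(d)}$. This motivates the following test procedure. Under $H_0$, let $T_{d,1-\alpha}$ be a number such that

$$P\left(T_d(X_1, X_2, \ldots, X_n) > T_{d,1-\alpha}\right) = \alpha.$$

Then $H_0$ should be rejected at significance level $\alpha$ if $T_d(X_1, X_2, \ldots, X_n) > T_{d,1-\alpha}$. It can be shown that $T_d(X_1, X_2, \ldots, X_n)$ is always between 0 and 1, and the derivation of these bounds also gives an equivalent expression for $T_d(X_1, X_2, \ldots, X_n)$.

As mentioned above, this paper will mainly discuss the bivariate case. Suppose $X_i = (X_{i1}, X_{i2})'$, $i = 1, 2, \ldots, n$, form a random sample from a bivariate population distribution with support set $[0, 1]^{(2)}$. In order to raise the power of the test, the test statistic defined in (4) is adjusted by adopting Kendall's $\tau$ statistic. See, for example, Conover [10]. For any pair of points $(X_{i1}, X_{i2})'$ and $(X_{j1}, X_{j2})'$ in the two-dimensional space, the pair is called concordant if $(X_{i1} - X_{j1})(X_{i2} - X_{j2}) > 0$ and discordant if $(X_{i1} - X_{j1})(X_{i2} - X_{j2}) < 0$.
Let $N_c$ be the total number of concordant pairs and let $N_d$ be the total number of discordant pairs. The adjusted bivariate test statistic $T_2(X_1, X_2, \ldots, X_n)$ is defined in (10); it incorporates the term $(|N_c - N_d| + 1)/(N_c + N_d + 1)$ into the statistic in (4) with $d = 2$. Here it is assumed that $X_{(0)1} = X_{(0)2} = 0$ and $X_{(n+1)1} = X_{(n+1)2} = 1$. It will be shown below that the inclusion of the term $(|N_c - N_d| + 1)/(N_c + N_d + 1)$ can raise the power of the test significantly when the two variables of the alternative bivariate distribution are correlated.
It can be shown that $0 \le T_2(X_1, X_2, \ldots, X_n) \le 1$. It should be mentioned that the lower and upper bounds of this inequality cannot be improved. In fact, one may easily construct bivariate data sets for which the value of $T_2(X_1, X_2, \ldots, X_n)$ reaches 0 and 1, respectively.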
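As an illustration of the concordance term used in the adjustment, the following minimal Python sketch counts concordant and discordant pairs in a bivariate sample and evaluates the ratio (|concordant − discordant| + 1)/(concordant + discordant + 1). The function name `concordance_term` and the variable names `n_c` and `n_d` are chosen here for illustration only, and tied pairs are assumed to count as neither concordant nor discordant. The sketch covers only this term; it does not reproduce the full statistic in (10).

```python
from itertools import combinations


def concordance_term(points):
    """Return (|n_c - n_d| + 1) / (n_c + n_d + 1) for a list of (x1, x2) points.

    n_c and n_d are the numbers of concordant and discordant pairs.
    Assumption: a tied pair (product equal to zero) is counted as neither.
    """
    n_c = 0
    n_d = 0
    for (a1, a2), (b1, b2) in combinations(points, 2):
        product = (a1 - b1) * (a2 - b2)
        if product > 0:
            n_c += 1  # both coordinates move in the same direction
        elif product < 0:
            n_d += 1  # coordinates move in opposite directions
    return (abs(n_c - n_d) + 1) / (n_c + n_d + 1)
```

When every pair is concordant (or every pair is discordant) the term equals 1. When concordant and discordant pairs balance out, as expected for independent coordinates, the term is close to its minimum 1/(n_c + n_d + 1).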

Critical Values and Power Comparison
Monte Carlo simulation is used to find the critical values of the test statistic described in (10). To accomplish this, 10,000,000 pseudorandom samples of size $n$ are generated from the two-dimensional uniform distribution on $[0, 1] \times [0, 1]$ for each $n = 5, 6, \ldots, 50$. The critical values of the test statistic are tabulated in Table 1. The first column gives the sample sizes and the first row gives the significance levels. The entries of the table are the critical values corresponding to each sample size and significance level.
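The simulation procedure for critical values can be sketched in Python as follows. This is only an outline under stated assumptions: the test statistic is passed in as a caller-supplied function `statistic` (a hypothetical placeholder, since the statistic in (10) is not reproduced here), the default number of replications is far smaller than the 10,000,000 used in the paper, and the critical value is taken as the empirical (1 − alpha) quantile of the simulated values.

```python
import math
import random


def uniform_square_sample(n, rng):
    """Draw n points from the uniform distribution on [0, 1] x [0, 1]."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def critical_value(statistic, n, alpha, reps=10_000, seed=0):
    """Estimate the critical value of `statistic` at level alpha for sample size n.

    `statistic` maps a list of (x1, x2) points to a number; it stands in for
    the test statistic and is an assumption of this sketch.
    """
    rng = random.Random(seed)
    values = sorted(statistic(uniform_square_sample(n, rng)) for _ in range(reps))
    # Empirical (1 - alpha) quantile: smallest simulated value with at least
    # a (1 - alpha) fraction of the simulated values at or below it.
    k = math.ceil((1 - alpha) * reps) - 1
    k = min(max(k, 0), reps - 1)
    return values[k]
```

Repeating the call for each sample size and significance level gives a table with the same layout as the one described above.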
The power of the test proposed in this paper is compared with that of the recently published distance-to-boundary test by Berrendero et al. [8], because the distance-to-boundary test has been shown to perform well in many cases. For convenience, the test proposed in this paper is denoted as the $T_2$ test for the rest of the paper. Several alternative distributions are selected for the power comparison. The selected alternative distributions can be classified into two types: bivariate distributions formed by independent Beta marginal distributions, and metatype uniform distributions.

Alternative Bivariate Distributions Based on Independent Beta Distribution.
The probability density function of the univariate Beta distribution is

$$f(x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} x^{a - 1} (1 - x)^{b - 1}, \quad 0 < x < 1,$$

where $a > 0$ and $b > 0$. The Beta distribution family is quite flexible: different shapes are obtained by selecting the parameters $a$ and $b$. The bivariate alternative distributions used in the power comparison are formed by two independent univariate Beta distributions.
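A power estimate under such an alternative can be sketched in Python as follows. The sketch assumes that the test statistic and its critical value are supplied by the caller (`statistic` and `critical_value` are hypothetical placeholders, not the statistic of this paper), and it uses the parameter names `a` and `b` for the two Beta shape parameters purely for illustration. The power is estimated as the proportion of simulated samples whose statistic exceeds the critical value.

```python
import random


def beta_pair_sample(n, a, b, rng):
    """Draw n points whose two coordinates are independent Beta(a, b) variables."""
    return [(rng.betavariate(a, b), rng.betavariate(a, b)) for _ in range(n)]


def estimate_power(statistic, critical_value, n, a, b, reps=2000, seed=0):
    """Estimate the rejection probability when the data come from the alternative.

    The null hypothesis is rejected whenever statistic(sample) > critical_value.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = beta_pair_sample(n, a, b, rng)
        if statistic(sample) > critical_value:
            rejections += 1
    return rejections / reps
```

For example, calling `estimate_power(statistic, c, 20, 5, 2)` would approximate the power at sample size 20 against independent Beta(5, 2) marginals, given a statistic and a critical value `c` obtained under the null hypothesis.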

Alternative Distribution 1.
The bivariate Beta distribution is formed by two independent Beta(5, 2) marginal distributions. Figure 1 shows the power comparison between the $T_2$ test and the distance-to-boundary test. It can be seen that the $T_2$ test is more powerful than the distance-to-boundary test in this case.

Alternative Distribution 2.
The bivariate Beta distribution is formed by two independent Beta(5, 1) marginal distributions. Figure 2 shows the power comparison between the $T_2$ test and the distance-to-boundary test. It can be seen that the $T_2$ test is more powerful when the sample size is less than 25. As the sample size increases, the powers of the two tests become close, with the distance-to-boundary test slightly better.

Alternative Distribution 3.
The bivariate Beta distribution is formed by two independent Beta(0.5, 0.5) marginal distributions. Figure 3 shows the power comparison between the $T_2$ test and the distance-to-boundary test. It can be seen that the distance-to-boundary test performs better in this case. This is probably because Beta(0.5, 0.5) is a symmetric distribution. The symmetric situation is discussed in Berrendero et al. [8], where it is shown that the power of the distance-to-boundary test is higher in this case. When the symmetry condition is removed, the result appears to be different.

Metatype Uniform Distribution.
The metatype uniform distribution was mentioned in the papers of Liang et al. [7] and Berrendero et al. [8], who introduced it for power comparison. The basic idea for creating a metatype multivariate distribution is as follows. Let the random vector $X = (X_1, X_2)'$ have distribution function $F(x)$. Define $F_1(x_1)$ as the marginal distribution function of $X_1$ and $F_2(x_2)$ as the marginal distribution function of $X_2$. Then define a new random vector $(F_1(X_1), F_2(X_2))'$. As is well known, $F_1(X_1)$ and $F_2(X_2)$ are each uniformly distributed on [0, 1], but the joint distribution differs from the uniform distribution when $F_1(X_1)$ and $F_2(X_2)$ are not independent. This kind of multivariate distribution is easy to generate with standard software and is useful for checking tests of multivariate uniformity. Specifically, two metatype uniform distributions are considered in the power study. For consistency of comparison, the same parameters are chosen as in Berrendero et al. [8]. The power comparison result under such a metatype uniform distribution is shown in Figure 4. The $T_2$ test is more powerful than the distance-to-boundary test in this case. As the sample size increases, the power of the $T_2$ test increases, while the power of the distance-to-boundary test changes little.
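The metatype construction described above can be sketched in Python as follows. The parent distribution here is an assumption made only for illustration: a bivariate normal distribution with standard normal marginals and correlation `rho`, chosen because its marginal distribution function is available in the standard library. The function name `metatype_uniform_sample` is hypothetical, and the parameters used in the paper's power study are not reproduced.

```python
import math
import random
from statistics import NormalDist


def metatype_uniform_sample(n, rho, rng):
    """Draw n points (F1(X1), F2(X2)) from an assumed bivariate normal parent.

    X1 and X2 are standard normal with correlation rho, so F1 = F2 is the
    standard normal distribution function.
    """
    phi = NormalDist().cdf  # marginal distribution function of each coordinate
    points = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal variable with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        points.append((phi(z1), phi(z2)))
    return points
```

Each coordinate of the generated points is uniform on [0, 1], but for nonzero `rho` the two coordinates are dependent, so the joint distribution is not uniform on the unit square.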

Alternative Distribution 5.
The metatype uniform (MTU) distribution is obtained from the bivariate Student's $t$-distribution with $\mu = (0, 0)'$, $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, and 5 degrees of freedom. The power comparison result under such a metatype uniform distribution is shown in Figure 5. The $T_2$ test is more powerful than the distance-to-boundary test in this case. As the sample size increases, the power of the $T_2$ test increases, while the power of the distance-to-boundary test changes little.

Conclusion and Discussion
In this paper, a new multidimensional uniformity test is proposed. The basic idea comes from the univariate uniformity test in the paper of Chen and Ye [9]. The method is extended to the multidimensional case, and the bivariate case is discussed in detail. The new test can be used to test whether an underlying multivariate probability distribution differs from a uniform distribution. The critical values of the bivariate uniformity test are calculated, and a power study is performed by comparing the new test with a recently published multivariate uniformity test.
The distance-to-boundary test is a recently published multivariate uniformity test by Berrendero et al. [8]. The result of