On Concordance Measures for Discrete Data and Dependence Properties of Poisson Model

1 Department of Economics, McGill University, Leacock Building, 855 Sherbrooke Street West, C.P. 6128, succursale Centre-ville, Montreal, QC, Canada H3A 2T7
2 Département de Mathématiques et d’Informatique, Université du Québec à Trois-Rivières, Pavillon Ringuet, local 3060, C.P. 500, Trois-Rivières, QC, Canada G9A 5H7
3 ARC Epidemiology Unit, The University of Manchester, Oxford Road, Manchester M13 9PT, UK


Introduction
The best-known dependence property is "lack of dependence," that is, stochastic independence. In many applications, independence between two random variables is assumed; this can be a strong assumption in the analysis undertaken. Taking into account the dependence structure between the variables leads to appropriate modeling approaches and correct conclusions. To study stochastic dependence, the concepts of concordance and positive dependence are widely used tools. This is because many dependence properties can be described by means of the joint distribution of the variables, and these measures and properties are often margin-free. In this paper we study two concordance measures, Kendall's tau (Kruskal [1]) and Spearman's rho (Lehmann [2]). These measures have several properties known as Rényi's axioms; for more details see Rényi [3]. Among these axioms, we focus on the range of the association measure.
Much research has been concerned with the study of tau and rho in the case of continuous variables. Schweizer and Wolff [4], in a seminal paper, show that the study of concordance measures for continuous random variables can be characterized as the study of copulas [5]. However, for noncontinuous variables, this interrelationship generally does not hold. There are few papers concerning the discrete versions of Kendall's tau and Spearman's rho. Conti [6] gives definitions of two approaches to indifference and links them to concordance and discordance properties of the data. Tajar et al. [7] propose a copula-type representation for random couples with binary margins. They show that appropriate measures of association for binary random variables do not depend on the marginal distributions of the variables under study. Mesfioui and Tajar [8] and Denuit and Lambert [9] have shown independently that the range of tau and rho in the discrete case is not the interval $[-1, 1]$ as in the continuous case. Nešlehová [10] considers an alternative transformation of an arbitrary random variable to a uniformly distributed variable in order to study rank measures for noncontinuous random variables.
In this paper, we focus on the range of the concordance measures. Aside from identifying the best bounds of tau and rho in the case of discrete random variables, we present some dependence properties of the bivariate Poisson model and discuss their relationship with the concordance measures tau and rho. The paper is organized as follows. The next section provides a method of constructing the ranges of tau and rho for discrete data. Section 3 develops explicit expressions for the best bounds of tau and rho in the discrete Fréchet space with identical marginals. Section 4 provides a new estimator of the copula based on the so-called empirical copula. Section 5 discusses some dependence properties of the bivariate Poisson model.

Definitions and Properties
Following Hoeffding [11], Kruskal [1], and Lehmann [2], Schweizer and Wolff [4] express Kendall's tau and Spearman's rho for a continuous random vector $(X, Y)$ in terms of the joint distribution $H(x, y)$ of $(X, Y)$ and the margins $F(x)$ for $X$ and $G(y)$ for $Y$. A general representation for each of $\tau$ and $\rho$ was first proposed by Kowalczyk and Niewiadomska-Bugaj [12]; namely, the representations (2.1) and (2.2), where $H(x^-, y^-) = P(X < x, Y < y)$, $H(x, y^-) = P(X \le x, Y < y)$, $H(x^-, y) = P(X < x, Y \le y)$, and $\Pi(x, y) = F(x)G(y)$. Several results in this paper are based on the monotonicity property of Kendall's $\tau$ and Spearman's $\rho$. This property was first proposed for continuous variables by Yanagimoto and Okamoto [13]; see also [14]. Tchen [15] obtained a similar monotonicity property for $\tau$ and $\rho$ when the supports of the joint distributions consist of a finite number of atoms. Mesfioui and Tajar [8] extend various dependence relationships between Kendall's $\tau$ and Spearman's $\rho$ in Capéraà and Genest [16] and Nelsen [5] to the discrete case. One key result of their paper is the generalization to any kind of random variables, continuous and/or discrete.
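As a minimal illustration, and assuming that the general representation takes the standard four-term form $\tau = \int \{H(x,y) + H(x^-,y) + H(x,y^-) + H(x^-,y^-)\}\,dH(x,y) - 1$ and $\rho = 3\int \{H(x,y) + H(x^-,y) + H(x,y^-) + H(x^-,y^-)\}\,d\Pi(x,y) - 3$, both measures can be computed directly for a finitely supported joint distribution. The function name `concordance` and the dictionary encoding of the probability mass function are illustrative choices only.

```python
def concordance(pmf):
    """Return (tau, rho) for a joint pmf given as {(x, y): mass}.

    Assumes the four-term representation stated in the lead-in;
    this is a sketch for small supports, not an optimized routine.
    """
    xs = sorted({x for x, _ in pmf})
    ys = sorted({y for _, y in pmf})
    # Marginal masses of X and Y.
    fx = {x: sum(m for (a, _), m in pmf.items() if a == x) for x in xs}
    gy = {y: sum(m for (_, b), m in pmf.items() if b == y) for y in ys}

    def H(x, y, strict_x=False, strict_y=False):
        # P(X <= x, Y <= y), with "<" replacing "<=" where requested.
        total = 0.0
        for (a, b), m in pmf.items():
            ok_x = a < x if strict_x else a <= x
            ok_y = b < y if strict_y else b <= y
            if ok_x and ok_y:
                total += m
        return total

    def four_terms(x, y):
        # H(x, y) + H(x-, y) + H(x, y-) + H(x-, y-)
        return (H(x, y) + H(x, y, strict_x=True)
                + H(x, y, strict_y=True) + H(x, y, True, True))

    # Integrate against dH for tau and against dPi = dF dG for rho.
    tau = sum(four_terms(x, y) * m for (x, y), m in pmf.items()) - 1.0
    rho = 3.0 * sum(four_terms(x, y) * fx[x] * gy[y]
                    for x in xs for y in ys) - 3.0
    return tau, rho


comonotone = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
tau_co, rho_co = concordance(comonotone)
tau_ind, rho_ind = concordance(independent)
```

For the comonotone pair with Bernoulli(1/2) margins the sketch returns $\tau = 0.5$ and $\rho = 0.75$, which already shows that perfect positive dependence need not give the value 1 for discrete data.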
For the remainder of the paper, we recall the property of concordance orderings, defined as follows.
Let $(X_1, Y_1)$ and $(X_2, Y_2)$ be random vectors with identical marginals and respective cdf's $H_1$ and $H_2$. The random couple $(X_2, Y_2)$ is said to be more concordant than $(X_1, Y_1)$ if $H_1(x, y) \le H_2(x, y)$ for all $(x, y) \in \mathbb{R}^2$.
In the following proposition, we propose a flexible method to establish the monotonicity property given in Mesfioui and Tajar [8] for purely discrete random vectors. The proof is direct and easy to follow, and it extends the result to general random vectors.
Proposition 2.1. Let $(X_1, Y_1)$ and $(X_2, Y_2)$ be two random couples with respective distribution functions $H_1$ and $H_2$ in $\Gamma(F, G)$, the Fréchet space of all distribution functions with fixed marginals $F$ and $G$. Then,
Proof. Using Fubini's theorem, we note that where $\bar{H}_i$ denotes the survival function associated with $H_i$, $i = 1, 2$. Now, without loss of generality, if we assume that $H_1 \le H_2$, which is equivalent to

Journal of Probability and Statistics
Similarly, we obtain

(2.7)
Combining the latter inequalities with (2.1), we then obtain (2.3). It is easily seen that (2.4) is immediate from (2.2).
For any bivariate distribution function $H$ with univariate marginals $F$ and $G$, one has
$$\max\{0, F(x) + G(y) - 1\} \le H(x, y) \le \min\{F(x), G(y)\}. \tag{2.8}$$
The extreme distributions $H_{\min}(x, y) = \max\{0, F(x) + G(y) - 1\}$ and $H_{\max}(x, y) = \min\{F(x), G(y)\}$ are often referred to as the Fréchet bounds; see [17]. These bounds play a central role in constructing the optimal ranges of $\tau$ and $\rho$, as stated in the following corollary.

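Corollary 2.2. Let $(X, Y)$ be a random couple with distribution function $H \in \Gamma(F, G)$. Then
$$\tau_{\min} \le \tau(X, Y) \le \tau_{\max}, \qquad \rho_{\min} \le \rho(X, Y) \le \rho_{\max},$$
where $\tau_{\min}$, $\rho_{\min}$ and $\tau_{\max}$, $\rho_{\max}$ denote the values of Kendall's $\tau$ and Spearman's $\rho$ corresponding to the Fréchet lower and upper bounds in $\Gamma(F, G)$, respectively.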
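To make the role of the Fréchet bounds concrete, the following sketch builds the two extreme joint distributions from given discrete margins by differencing $H_{\min}$ and $H_{\max}$. The function name `frechet_pmfs` and the convention that the margins are lists of masses on $0, 1, 2, \ldots$ are assumptions made for illustration.

```python
from itertools import accumulate


def frechet_pmfs(p, q):
    """Joint pmfs (as nested lists) of the lower and upper Frechet bounds.

    p[i] = P(X = i) and q[j] = P(Y = j); both lists must sum to one.
    """
    F = list(accumulate(p))
    G = list(accumulate(q))
    F[-1] = G[-1] = 1.0  # guard against rounding in the last cumulative sum

    def h_min(i, j):
        # Lower Frechet bound max{F(i) + G(j) - 1, 0}; zero left of the support.
        if i < 0 or j < 0:
            return 0.0
        return max(0.0, F[i] + G[j] - 1.0)

    def h_max(i, j):
        # Upper Frechet bound min{F(i), G(j)}; zero left of the support.
        if i < 0 or j < 0:
            return 0.0
        return min(F[i], G[j])

    def masses(H):
        # Recover point masses from a cdf by second-order differencing.
        return [[H(i, j) - H(i - 1, j) - H(i, j - 1) + H(i - 1, j - 1)
                 for j in range(len(q))]
                for i in range(len(p))]

    return masses(h_min), masses(h_max)


pmf_min, pmf_max = frechet_pmfs([0.5, 0.5], [0.5, 0.5])
```

With symmetric Bernoulli margins, the upper bound puts all its mass on the diagonal and the lower bound puts all its mass on the antidiagonal, as expected for comonotone and countermonotone couples.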
As stated earlier, the main objective of this paper is to examine the bounds of $\tau$ and $\rho$ in the Fréchet space $\Gamma(F, G)$ when $F$ and $G$ are discrete. To do so, let $(X, Y)$ be a discrete random couple with cdf $H \in \Gamma(F, G)$. Since Kendall's $\tau$ and Spearman's $\rho$ are scale invariant, they remain unchanged under strictly increasing transformations of the margins. We can then suppose, without any loss of generality, that $X$ and $Y$ take values in $\mathbb{Z}$, the set of all integers. Therefore, we can see from (2.1) and (2.2) that $\tau$ and $\rho$ can be written as in (2.10) and (2.11), where the quantities involved are given in (2.12) and (2.13).
In order to obtain the best bounds $(\tau_{\min}, \rho_{\min})$ and $(\tau_{\max}, \rho_{\max})$, that is, the minimum and maximum values of $\tau$ and $\rho$ corresponding to the lower and upper Fréchet bounds, respectively, we replace $H$ in (2.10) and (2.11) by $H_{\min}(i, j) = \max\{F(i) + G(j) - 1, 0\}$ and $H_{\max}(i, j) = \min\{F(i), G(j)\}$, respectively. For discrete data, the ranges of $\tau$ and $\rho$ differ from the usual interval $[-1, 1]$. This is a violation of the monotone dependence properties of concordance measures, as stated in Nelsen [5]. To correct this problem, we propose the following corrections:

(2.14)
The main benefit of these corrections is that they allow the levels of the new measures, $\tau_c$ and $\rho_c$, to be interpreted as percentages. Illustrations of these transformations are given in Section 5 with the bivariate Poisson distribution.
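A sketch of how a correction of this kind can be applied in practice. It assumes, as one natural reading, that each measure is divided by the magnitude of its attainable bound of the same sign, so that the corrected value lies in $[-1, 1]$ and reaches $\pm 1$ exactly at the Fréchet bounds. The function name `rescale` and this exact form are assumptions for illustration and may differ from the corrections proposed above.

```python
def rescale(value, lower, upper):
    """Rescale a concordance measure by its attainable bounds.

    ASSUMED form: value / upper when value >= 0, and -value / lower
    when value < 0, where lower < 0 < upper are the sharp bounds.
    """
    if not lower < 0.0 < upper:
        raise ValueError("expected lower < 0 < upper")
    if value >= 0.0:
        return value / upper
    # lower is negative, so the corrected value keeps its negative sign.
    return -value / lower


# Hypothetical inputs: a measure equal to half of its upper bound.
half_way = rescale(0.4, -0.6, 0.8)
```

Under this assumed form, a corrected value of 0.5 reads as "half of the strongest positive dependence compatible with the margins", which is the percentage interpretation mentioned above.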

Explicit Bounds of Discrete $\tau$ and $\rho$ in $\Gamma(F, F)$
The aim of this section is to study the effect of the marginal distributions on the range of $\tau$ and $\rho$ for discrete data. Note that it is difficult to obtain explicit expressions for the extreme values of $\tau$ and $\rho$ in $\Gamma(F, G)$ for noncontinuous distributions $F$ and $G$; the problem is complicated and requires several assumptions on $F$ and $G$. In order to analyze the behavior of these bounds, we consider the particular space $\Gamma(F, F)$, where $F$ is a discrete distribution function. To this end, consider the integer function defined by This function plays an important role in making the lower bounds of $\tau$ and $\rho$ in the space $\Gamma(F, F)$ explicit. The next proposition presents explicit optimal bounds for Spearman's $\rho$.
Proposition 3.1. The best bounds for $\rho$ in the space $\Gamma(F, F)$ are given by where
Proof. Let $H(i, j) = \min\{F(i), F(j)\}$. From (2.12), we observe that and, writing $F(i) - F(i - 1) = p_i$, we get from (2.11) that which may be simplified as

(3.7)
The result then follows from the fact that $E[F(X) + F(X - 1)] = 1$. Now, choose $H(i, j) = \sup\{F(i) + F(j) - 1, 0\}$ and put $\tilde{H}(i, j) = F(i) + F(j) - 1$. From (2.11), we see that

(3.8)
It follows that which may be rewritten as where The result is therefore obtained from (3.11) and (3.10).
Using (2.10) with $H(i, j) = \min\{F(i), F(j)\}$, we notice that the upper bound of Kendall's $\tau$ in the space $\Gamma(F, F)$ can be expressed as in (3.12). Note that the sharp upper bound given in Denuit and Lambert [9] coincides with (3.12) in $\Gamma(F, F)$. However, the behavior of the lower bound of Kendall's tau in terms of the distribution $F$ is not evident. The following proposition gives an explicit form of this bound in $\Gamma(F, F)$.

Proposition 3.2. The best lower bounds of
where

(3.14)
Proof. From (2.12) and (2.13), we observe that

(3.15)
Consider now $H(i, j) = \sup\{F(i) + F(j) - 1, 0\}$ and write $\tilde{H}(i, j) = F(i) + F(j) - 1$. From (2.10), we get

(3.16)
Using the fact that (3.17) we have

(3.18)
which is equivalent to
In this section, we examine the symmetry of the ranges of $\tau$ and $\rho$ associated with discrete data. In the continuous case, it is well known that the ranges of these parameters are symmetric, that is, $\tau_{\max} = -\tau_{\min}$ and $\rho_{\max} = -\rho_{\min}$. This conclusion does not hold in general for noncontinuous data. In order to clarify this question, we consider again the space $\Gamma(F, F)$ with a discrete distribution $F$. We present below a situation which ensures that $\rho_{\max} = -\rho_{\min}$ and $\tau_{\max} = -\tau_{\min}$. As a consequence of Propositions 3.1 and 3.2 and (3.12), one can establish the following results.

Empirical Copulas Viewed as a Discrete Distribution
It is well recognized that copulas provide a flexible approach to modeling the joint behavior of random variables. In fact, this method allows a bivariate distribution to be represented as a function of its univariate marginals through a linking function called a copula. Specifically, if $H$ is the distribution function of a bivariate random vector $(X, Y)$ with continuous marginals $F$ and $G$, then Sklar [18] ensures that there exists a unique copula $C : [0, 1]^2 \to [0, 1]$ such that, for all $(x, y) \in \mathbb{R}^2$,
$$H(x, y) = C(F(x), G(y)). \tag{4.1}$$
Hence, $C$ is a bivariate distribution function with uniform marginals on $[0, 1]$ that captures all the information about the dependence among the components of $(X, Y)$. For a comprehensive introduction to copulas, the reader is referred to the monograph by Nelsen [5].
Suppose that a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ is given from some pair $(X, Y)$ of continuous variables with copula $C(u, v)$. To estimate the copula $C$, Deheuvels [19] proposes the so-called empirical copula defined by
$$C_n(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{F_n(X_i) \le u,\ G_n(Y_i) \le v\}, \tag{4.2}$$
where $F_n$ and $G_n$ are the empirical distribution functions of $X$ and $Y$ based on the samples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$, given by
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}, \qquad G_n(y) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{Y_i \le y\}.$$
Let $R_i$ be the rank of $X_i$ among the sample $X_1, \ldots, X_n$ and let $T_i$ stand for the rank of $Y_i$ among $Y_1, \ldots, Y_n$, so that $F_n(X_i) = R_i/n$, $G_n(Y_i) = T_i/n$, and
$$C_n(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{R_i/n \le u,\ T_i/n \le v\}.$$
From this representation, one can consider $C_n(u, v)$ as a discrete bivariate distribution with uniform marginals taking values in the set $\{1/n, 2/n, \ldots, 1\}$. Observe that Now, one can observe that $C_n$ is not a copula. Indeed, $C_n(u, 1) = [nu]/n \ne u$, where $[nu]$ denotes the integer part of $nu$.
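The rank-based form of the empirical copula is easy to evaluate. The sketch below assumes a sample without ties (as is the case almost surely for continuous variables); the helper names `ranks` and `empirical_copula` are illustrative.

```python
def ranks(values):
    """Ranks 1..n of the entries of `values` (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def empirical_copula(xs, ys):
    """Return the function C_n(u, v) built from the paired sample (xs, ys)."""
    n = len(xs)
    R, T = ranks(xs), ranks(ys)

    def C_n(u, v):
        # Count pairs with R_i / n <= u and T_i / n <= v; the small
        # tolerance only protects against rounding in n * u and n * v.
        eps = 1e-12
        return sum(1 for r, t in zip(R, T)
                   if r <= n * u + eps and t <= n * v + eps) / n

    return C_n


# Hypothetical sample of size 4, used only to exercise the function.
C_n = empirical_copula([0.3, 1.2, 0.7, 2.5], [1.0, 0.2, 0.9, 3.1])
margin_at_06 = C_n(0.6, 1.0)
```

For this sample of size $n = 4$, `C_n(0.6, 1.0)` equals $[4 \times 0.6]/4 = 0.5$ rather than $0.6$, which illustrates why $C_n$ does not have uniform margins on $[0, 1]$.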
Our goal in this section is to transform the empirical copula in order to obtain a new estimator $C_n^*$ which is a copula. To this end, let $(Z_n, W_n)$ be a discrete random vector with distribution function $C_n$ as defined in (4.2). The idea is to transform the discrete uniform random variables $Z_n$ and $W_n$ into continuous variables $Z_n^*$ and $W_n^*$ by defining the relations (4.6), where $U_n$ and $V_n$ are independent and uniformly distributed on $(0, 1/n)$. We also suppose that $Z_n$ and $U_n$ (resp., $W_n$ and $V_n$) are independent. The next result shows that the distribution function of the continuous version $(Z_n^*, W_n^*)$ is a copula.
Proposition 4.1. The distribution function $C_n^*$ of the random vector $(Z_n^*, W_n^*)$ is a copula which may be expressed in terms of the empirical copula as in (4.7), where $[x]$ is the integer part of $x$.
Proof. For any $u \in [i/n, (i+1)/n)$, $i = 0, \ldots, n-1$, one sees from the definition of $Z_n^*$ that and by using the fact that it follows that $P(Z_n^* \le u) = u$, which ensures that $Z_n^*$ is uniformly distributed on $(0, 1)$. Similar arguments imply that $W_n^*$ is also uniformly distributed on $(0, 1)$, so that $C_n^*$ is a copula. Now, we establish the expression of $C_n^*$ given in (4.7). Let $(u, v)$ be in the set $[i/n, (i+1)/n) \times [j/n, (j+1)/n)$, $i, j = 0, \ldots, n-1$. In view of relations (4.6), one has

(4.11)
which can be rewritten as and hence the result is obtained, since $i = [nu]$ and $j = [nv]$.
Finally, one concludes that it is convenient to estimate the theoretical copula $C$ by the proposed estimator $C_n^*$ instead of the empirical copula. The reason is that $C_n^*$ is a copula which uses all the points $(i/n, j/n)$, $(i/n, (j+1)/n)$, $((i+1)/n, j/n)$, and $((i+1)/n, (j+1)/n)$ in order to estimate $C$ on $[i/n, (i+1)/n) \times [j/n, (j+1)/n)$.
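The four-point structure described above can be sketched numerically. The code assumes that the continuous versions are obtained as $Z_n^* = Z_n - U_n$ and $W_n^* = W_n - V_n$, in which case a direct computation shows that the distribution function of $(Z_n^*, W_n^*)$ is the bilinear interpolation of $C_n$ between the four neighbouring grid points. Both this assumed definition and the name `smoothed_copula` are illustrative and may differ in form from the expression in Proposition 4.1.

```python
import math


def smoothed_copula(R, T):
    """Bilinear interpolation of the empirical copula built from ranks.

    R, T: rank vectors of the two samples (permutations of 1..n).
    ASSUMPTION: corresponds to Z* = Z_n - U_n and W* = W_n - V_n.
    """
    n = len(R)

    def grid(i, j):
        # Empirical copula at the grid point (i / n, j / n).
        return sum(1 for r, t in zip(R, T) if r <= i and t <= j) / n

    def C_star(u, v):
        # Cell indices i = [nu], j = [nv], capped so that u = 1 or v = 1
        # falls in the last cell.
        i = min(math.floor(n * u), n - 1)
        j = min(math.floor(n * v), n - 1)
        lam, mu = n * u - i, n * v - j  # relative position inside the cell
        return ((1 - lam) * (1 - mu) * grid(i, j)
                + lam * (1 - mu) * grid(i + 1, j)
                + (1 - lam) * mu * grid(i, j + 1)
                + lam * mu * grid(i + 1, j + 1))

    return C_star


# Hypothetical rank vectors for a sample of size 4.
C_star = smoothed_copula([1, 3, 2, 4], [3, 1, 2, 4])
margin = C_star(0.6, 1.0)
```

In contrast with the empirical copula, the margin of this smoothed version satisfies $C_n^*(u, 1) = u$ for every $u$, while the two functions coincide at the grid points $(i/n, j/n)$.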

Understanding Dependence Structure of the Bivariate Poisson Distribution
Our purpose in this section is to study dependence properties of the bivariate Poisson distribution $H$ of a random couple $(X, Y)$ and the relationship between $\tau$ and $\rho$ and the parameters of $H$. Several bivariate Poisson distributions have been proposed in the statistical literature; see, for example, S. Kocherlakota and K. Kocherlakota [20]. In applied statistics, however, the focus is on the trivariate reduction method described by Johnson et al. [21], who construct the bivariate Poisson distribution using three independent random variables $X_1$, $X_2$, and $Z$, all distributed as Poisson with parameters $\lambda_1$, $\lambda_2$, and $\alpha$, respectively:
$$X = X_1 + Z, \qquad Y = X_2 + Z. \tag{5.1}$$
The cumulative distribution of $(X, Y)$ is given by (5.2) and (5.3). To study further the relationships between $\alpha$ and each of $\tau$ and $\rho$ for the bivariate Poisson model, we propose an alternative parametrization which consists in fixing the marginal parameters $\alpha + \lambda_1 = m_1$ and $\alpha + \lambda_2 = m_2$. In this context, the cdf (5.2) becomes (5.4). As a consequence of the above representation, we can see $\{H_\alpha\}$ as a family of bivariate Poisson models with fixed marginals, which are univariate Poisson models with parameters $m_1$ and $m_2$, respectively. This means that the set $\{H_\alpha\}$ is contained in $\Gamma(F_{m_1}, F_{m_2})$, where $F_{m_i}$ denotes the cdf of a Poisson model with mean $m_i$, $i = 1, 2$. The advantage of the parametrization (5.4) over (5.2) is that the coefficient $\alpha$ may be interpreted as a dependence parameter in the family $\{H_\alpha\}$. Now, let $\tau_\alpha$ and $\rho_\alpha$ be Kendall's $\tau$ and Spearman's $\rho$ associated with the distribution $H_\alpha$. The result below provides the monotonicity of $\tau_\alpha$ and $\rho_\alpha$ as functions of $\alpha$.
Proposition 5.1. Let $H_{\alpha_1}$ and $H_{\alpha_2}$ be two cdf's of the set $\{H_\alpha\}$. Then, and consequently,
Proof. From (5.4), and using the fact that (5.7) becomes, upon simplifications, Therefore (5.9) together with Proposition 2.1 provides (5.5) and (5.6).
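The family with fixed margins can be explored numerically. The sketch below evaluates the joint probability mass function implied by the trivariate reduction $X = X_1 + Z$, $Y = X_2 + Z$ with $\lambda_i = m_i - \alpha$, by summing over the value of the common component $Z$, and tabulates the cdf on a finite grid. The function names and the grid size are illustrative assumptions.

```python
import math


def poisson_pmf(k, mean):
    """P(N = k) for N ~ Poisson(mean); a mean of 0 gives a point mass at 0."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def bivariate_poisson_pmf(i, j, m1, m2, alpha):
    """P(X = i, Y = j) for X = X1 + Z, Y = X2 + Z with fixed means m1, m2.

    X1, X2, Z are independent Poisson with means m1 - alpha, m2 - alpha, alpha,
    so 0 <= alpha <= min(m1, m2) is required.
    """
    lam1, lam2 = m1 - alpha, m2 - alpha
    # Condition on Z = k, which can be at most min(i, j).
    return sum(poisson_pmf(i - k, lam1) * poisson_pmf(j - k, lam2)
               * poisson_pmf(k, alpha)
               for k in range(min(i, j) + 1))


def cdf_table(m1, m2, alpha, size):
    """H[i][j] = P(X <= i, Y <= j) for 0 <= i, j < size."""
    H = [[0.0] * size for _ in range(size)]
    for i in range(size):
        running = 0.0  # P(X = i, Y <= j), accumulated over j
        for j in range(size):
            running += bivariate_poisson_pmf(i, j, m1, m2, alpha)
            H[i][j] = running + (H[i - 1][j] if i > 0 else 0.0)
    return H


size = 12
H_low = cdf_table(2.0, 3.0, 0.5, size)
H_high = cdf_table(2.0, 3.0, 1.5, size)
ordered = all(H_low[i][j] <= H_high[i][j] + 1e-12
              for i in range(size) for j in range(size))
```

On this grid the two tables are ordered pointwise, `H_low <= H_high`, in line with the interpretation of $\alpha$ as a dependence parameter, while the margins stay Poisson with means $m_1$ and $m_2$ for every admissible $\alpha$.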
Much statistical research has focused on concepts of positive dependence for bivariate distributions, for example, right tail increasing and positive quadrant dependence, which are widely used in the actuarial literature [22]. There are natural relationships between dependence properties and measures of concordance. An interesting property of positive dependence is the concept of positive quadrant dependence (PQD), defined as follows: let $(X, Y)$ be a random couple valued in $\mathbb{R} \times \mathbb{R}$ with joint cdf $H$ and marginals $F$ and $G$. These random variables are said to be positively quadrant dependent if, and only if, for all $(x, y) \in \mathbb{R}^2$,
$$H(x, y) \ge F(x) G(y). \tag{5.10}$$
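For a distribution on a finite grid, the PQD inequality can be verified directly, since $H$, $F$, and $G$ are step functions and it suffices to check the inequality at the support points. The following sketch does this for a joint probability table; the name `is_pqd` and the tolerance are illustrative choices.

```python
def is_pqd(pmf, tol=1e-12):
    """Check H(i, j) >= F(i) G(j) on the grid of a finite joint pmf.

    pmf[i][j] = P(X = i, Y = j); rows index X and columns index Y.
    """
    rows, cols = len(pmf), len(pmf[0])
    # Build the joint cdf by cumulative summation.
    H = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        running = 0.0
        for j in range(cols):
            running += pmf[i][j]
            H[i][j] = running + (H[i - 1][j] if i > 0 else 0.0)
    # Marginal cdfs are the last column and the last row of H.
    F = [H[i][cols - 1] for i in range(rows)]
    G = [H[rows - 1][j] for j in range(cols)]
    return all(H[i][j] >= F[i] * G[j] - tol
               for i in range(rows) for j in range(cols))


comonotone_ok = is_pqd([[0.5, 0.0], [0.0, 0.5]])
countermonotone_ok = is_pqd([[0.0, 0.5], [0.5, 0.0]])
```

The comonotone table satisfies the inequality and the countermonotone table violates it at the origin, where $H(0, 0) = 0 < F(0)G(0) = 0.25$.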
The following corollary is a direct consequence of the previous result.
Remark 5.3. When $m_1 = m_2 = m$, the upper bound of the family $\{H_\alpha\}$ is given by the cdf $H_m$, and using (5.4), we then obtain that $H_m(i, j) = F_m(i \wedge j) = \min\{F_m(i), F_m(j)\}$, for all $i, j$, which is the upper Fréchet bound.
In order to appreciate the corrections of $\tau$ and $\rho$ given by (2.14), we consider the family of Poisson models $\{H_\alpha\}$ with marginal parameters $m_1 = m_2 = 2$. Using (3.2) and (3.12) with $F_m$ instead of $F$, we obtain $\rho_{\max} = 0.951$ and $\tau_{\max} = 0.792$. Table 1 provides $\tau_\alpha$ and $\rho_\alpha$ with their corrections $\tau_{\alpha,c}$ and $\rho_{\alpha,c}$ for chosen values of $\alpha$.
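The two upper bounds quoted above can be checked numerically without the closed-form expressions. The sketch below evaluates $\tau$ and $\rho$ at the upper Fréchet bound $H(i, j) = \min\{F_m(i), F_m(j)\}$, assuming the four-term representation $\tau = \sum_{i,j} h(i,j)\,S(i,j) - 1$ and $\rho = 3\sum_{i,j} p_i p_j\,S(i,j) - 3$ with $S(i,j) = H(i,j) + H(i-1,j) + H(i,j-1) + H(i-1,j-1)$. The truncation of the Poisson support at 60 and the function name `upper_bounds` are assumptions for illustration.

```python
import math


def upper_bounds(mean, size=60):
    """(tau_max, rho_max) in Gamma(F, F) for F Poisson(mean), truncated at `size`."""
    p = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(size)]
    F, running = [], 0.0
    for mass in p:
        running += mass
        F.append(running)

    def cdf(k):
        return F[k] if k >= 0 else 0.0

    def H(i, j):
        # Upper Frechet bound with identical margins.
        return cdf(min(i, j))

    def S(i, j):
        # Four-term sum from the ASSUMED general representation.
        return H(i, j) + H(i - 1, j) + H(i, j - 1) + H(i - 1, j - 1)

    # Under the upper bound X = Y, so the joint mass sits on the diagonal.
    tau = sum(p[i] * S(i, i) for i in range(size)) - 1.0
    rho = 3.0 * sum(p[i] * p[j] * S(i, j)
                    for i in range(size) for j in range(size)) - 3.0
    return tau, rho


tau_max, rho_max = upper_bounds(2.0)
```

For mean 2 the computed values are close to the reported $\tau_{\max} = 0.792$ and $\rho_{\max} = 0.951$, differing by at most about 0.001, which is compatible with rounding or truncation in the third decimal.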
We notice that $X$ and $Y$ are Poisson distributed with means $\lambda_1 + \alpha$ and $\lambda_2 + \alpha$, respectively. Note that the covariance and the correlation between $X$ and $Y$