Accuracy of Approximation for Discrete Distributions

The paper contributes to the problem of estimating the deviation between two discrete probability distributions in terms of the supremum distance between their generating functions over the interval [0, 1]. The deviation can be measured by the difference of the kth terms or by the total variation distance. Our new bounds have a better order of magnitude than those proved previously, and they are even sharp in certain cases.


Introduction
Dealing with random combinatorial structures often requires estimating the deviation between two discrete probability distributions in terms of the maximal distance between their generating functions over [0, 1]. This is often the case when, given a collection of not necessarily independent random events, one needs to estimate the number of those that occur. Among the many popular methods of Poisson approximation, sieve methods with generalized Bonferroni bounds, such as the graph sieve [1], are at hand. They provide estimates not only for the probability that none of the events occur, but also for the difference between the generating function of the number of occurring events and that of the corresponding Poisson distribution over the interval [0, 1] (see [2] for more details). This raises the following problem.
The difficulty lies in the constraint that the difference of the generating functions is only available over the real interval [0, 1], and not over the whole complex unit disc, which would make it possible to apply standard methods of characteristic functions.
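The quantity in question, the maximal distance Δ between two generating functions over [0, 1], is easy to approximate numerically. The following sketch is an illustration of ours, not taken from the paper: the Binomial(100, 0.02) versus Poisson(2) pair, the truncation length, the grid size, and all function names are our own choices.

```python
import math

def poisson_pmf(lam, n):
    """First n terms p_k of the Poisson(lam) distribution."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n)]

def binomial_pmf(trials, prob, n):
    """First n terms q_k of the Binomial(trials, prob) distribution."""
    return [math.comb(trials, k) * prob**k * (1 - prob)**(trials - k)
            if k <= trials else 0.0
            for k in range(n)]

def gen_fn(pmf, x):
    """Generating function f(x) = sum_k pmf[k] * x**k."""
    return sum(pk * x**k for k, pk in enumerate(pmf))

def delta(p, q, grid=10001):
    """Grid approximation of Delta = max over [0, 1] of |f_p(x) - f_q(x)|."""
    return max(abs(gen_fn(p, i / (grid - 1)) - gen_fn(q, i / (grid - 1)))
               for i in range(grid))

p = poisson_pmf(2.0, 40)          # truncated far beyond the relevant mass
q = binomial_pmf(100, 0.02, 40)
print(delta(p, q))                # a few times 10**-3 for this pair
```

For this classical Poisson-approximation pair the supremum is attained near x = 0, where the two generating functions differ the most.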
Several positive and negative results were achieved in the last three decades, beginning with [3]; see Section 2.
Lower and upper estimates got closer and closer, but the final answer is still ahead. The aim of the present note is to provide new bounds that have a better order of magnitude than those proved previously. They are even sharp in certain cases.
The paper is organized as follows. In Section 2 we introduce the necessary notions and notation and cite some earlier results. Section 3 is devoted to the case where |p_k − q_k| is estimated in terms of Δ, while in Section 4 the total variation distance is treated.
For x ∈ [0, 1] define f(x) = ∑_{k=0}^∞ d_k x^k, and let Δ = Δ(f) = max_{0≤x≤1} |f(x)|. In [2] it was shown that, for every k = 0, 1, …, inequality (5) holds with a suitable positive constant c_k, if Δ is sufficiently small. In fact, the upper estimate is valid for every Δ ∈ (0, 1], but it becomes trivial for Δ ≥ exp(−32 4) by elementary calculus. Though in both the upper and lower bounds the multiplier of Δ is slowly varying as Δ → 0, they are not of the same order of magnitude.
It is easy to see that ∑_{k=0}^∞ |d_k| cannot be estimated in a nontrivial way, because for arbitrary Δ ∈ (0, 1] there exists a function f ∈ F such that Δ(f) = Δ and ∑_{k=0}^∞ |d_k| = 2, the maximal value. Indeed, let n > 1/Δ and define f accordingly; then 0 ≤ f(x) ≤ Δ for x ∈ [0, 1] by the choice of n, provided Δ < 1. For Δ = 1 the estimate obviously holds. Hence Δ(f) = f(0) = Δ. However, if f = f_p − f_q, and one of the distributions, say p, is fixed (as in the case of Poisson approximation), then the class of feasible functions is smaller; thus the bounds for |d_k| may decrease, and even the total variation distance can be estimated nontrivially. The following results can be found in [2].
Let p be a fixed discrete probability distribution such that p_k > 0 for every k = 0, 1, …, and the lim sup condition (8) holds. Let k be a positive integer and c a sufficiently small positive constant. Then for every sufficiently small positive Δ there exists a discrete probability distribution q such that Δ(f_p − f_q) = Δ, and the lower estimate (9) holds. If the tail of p is lighter than exponential, the lower estimate decreases.
Instead of (8), suppose that the lim sup in (10) is positive and finite, where h is a positive, continuous, and increasing function, regularly varying at infinity, with lim_{k→∞} h(k)/k = ∞. Let k be a positive integer and c a sufficiently small positive constant. Then for every sufficiently small positive Δ there exists a discrete probability distribution q such that Δ(f_p − f_q) = Δ, and the lower estimate (11) holds. Particularly, when p is the Poisson distribution with parameter λ, it follows that for every sufficiently small positive Δ there exists a discrete probability distribution q such that Δ(f_p − f_q) = Δ, and (12) holds. (The constant c_k does not depend on λ and q. The parameter λ only appears in the bounds implicit in the phrase "sufficiently small.")

Let us turn to the case of total variation. In [2], for every fixed p an increasing function φ : [0, 1] → R was constructed in such a way that lim_{Δ→0} φ(Δ) = 0 and ‖p − q‖ ≤ φ(Δ) for arbitrary q. However, apart from the case where the tail of p was extremely light, the function φ proved to be slowly varying at 0, which is just a little bit better than nothing. For example, in the case of Poissonian p, inequality (13) was obtained as q varies in such a way that Δ → 0. Since ‖p − q‖ ≥ |p_k − q_k|, every lower estimate obtained for fixed k works for the total variation as well. However, if p_k does not decrease faster than exponential, that is, condition (8) is fulfilled, there is a lower estimate of the form (14), with an exponent γ ∈ (0, 1) depending on p.
When the tail of p is lighter than exponential, namely, condition (10) holds, then for every sufficiently small positive Δ the following lower estimate is valid, with a constant c depending on p. Particularly, in the case where p is Poisson, (16) was proved.

Estimation for the Difference of 𝑘th Terms
The following important result can be traced back to Markoff, 1892 [4], who dealt with the extremal properties of Chebyshev polynomials over the interval [−1, 1]; see Chapter 2 of [5]. The proof can be found in [6] or in [7].
Theorem 1. Let P be a polynomial of degree less than or equal to n, and let 0 ≤ k ≤ n. Then the modulus of the kth coefficient of P is at most max_{−1≤x≤1} |P(x)| times the modulus of the kth coefficient of the Chebyshev polynomial T_n (of T_{n−1} when n − k is odd).

Using this result, an upper bound (Theorem 2) can be proved without any restriction on d which is of the same order as the lower bound on the left-hand side of (5).

Proof. Suppose first that Δ does not exceed the displayed threshold. Then the right-hand side there is less than Δ, and by Theorem 1 we obtain the claim.
If Δ lies between the displayed thresholds, then the upper bound, being greater than 1, is trivial. Indeed, in that interval the bound is decreasing in Δ; hence it attains its minimum at the right endpoint of the interval. There its value equals … for k = 0 and (1/8)e^{5/2} > 1 for k = 1. Stepping further from k to k + 1, the right-hand side gets multiplied by a factor which, using the fact that ((k + 1)/k)^k is increasing, is not less than 2.
Finally, it is easy to see that the factor …^{2k+1}/(2k)! is decreasing for k ≥ 1, from which the second inequality follows.
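Markoff's extremal role of the Chebyshev polynomials can be checked numerically. Below is a small sketch of ours (assuming NumPy is available; the degree 7 is an arbitrary choice): T_n has sup-norm 1 on [−1, 1], yet its coefficients are as large as the above bounds allow, the leading one growing like 2^{n−1}.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

n = 7
cheb_series = [0] * n + [1]          # the Chebyshev polynomial T_7
coeffs = C.cheb2poly(cheb_series)    # its coefficients in the power basis

# T_7(x) = 64 x^7 - 112 x^5 + 56 x^3 - 7 x: leading coefficient 2**(n-1)
print(coeffs[-1])                    # 64.0

# ...while the sup-norm of T_7 over [-1, 1] is only 1
xs = np.linspace(-1.0, 1.0, 2001)
print(np.max(np.abs(C.chebval(xs, cheb_series))))
```

So a polynomial bounded by Δ on the interval can still have individual coefficients of order 2^n Δ, which is exactly the tension the case analysis above resolves.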
If p satisfies (8), that is, p_k cannot decrease faster than exponential, then (9) implies that the estimate of Theorem 2 is sharp in order of magnitude.

Theorem 3. Let d_k = p_k − q_k, where p = (p_k)_{k≥0} and q = (q_k)_{k≥0} are discrete probability distributions. Suppose that ∑_{j≥k} p_j ≤ g(k), where g is a positive, continuous, and strictly decreasing function tending to zero. Then the upper estimate holds for every k < g^{−1}(Δ).
Remark 5. If (1/h(k)) log(1/p_k) is bounded away from zero and from infinity, then both the upper estimate of Theorem 3 and the lower estimate of (11) are applicable, and they are of the same order of magnitude. Use the fact that for regularly varying functions h of order ρ > 0 the inverse h^{−1} is regularly varying of order 1/ρ.

Remark 6. Reference [6] proved similar bounds with different constants, but of the same order of magnitude. However, they imposed conditions similar to (10) on the sequence d_k = p_k − q_k rather than on p, which is less useful for applications in probability. Besides, for the estimate of Theorem 2, which is true without any restriction on the coefficients, they needed exponential decay of the sequence (d_k).
Remark 7. If h is linear, that is, ∑_{j≥k} p_j ≤ e^{−vk}, and this holds with a sufficiently large v (v ≥ 2 will do), the upper bound of Theorem 3 is better for k > 0 than that of Theorem 2. (For k = 0 the bound |p_0 − q_0| ≤ Δ is obviously the best possible.) Particularly, let p be the Poisson distribution with mean λ. Then, for k > λ, the tail of p can be estimated explicitly (see Theorem A.15 in [8]); hence h(k) = k log k + O(k) as k → ∞. In addition, h^{−1}(y) > y/log y eventually. Let us plug this back into Theorem 3 to get the following estimate.
Corollary 8. If p is Poisson, then, uniformly in q, the upper bound holds for every k < h^{−1}(log(1/Δ)), if Δ is sufficiently small.
Note that the order of this upper bound is the same as that of the lower bound (12).
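As a crude numerical illustration (reusing the toy Binomial/Poisson pair from Section 1, which is our own choice, and not a check of the corollary's precise bound), the largest termwise difference |p_k − q_k| and the distance Δ can be compared directly; for a binomial q close to a Poisson p they turn out to be of the same order:

```python
import math

def poisson_pmf(lam, n):
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n)]

def binomial_pmf(trials, prob, n):
    return [math.comb(trials, k) * prob**k * (1 - prob)**(trials - k)
            if k <= trials else 0.0 for k in range(n)]

def delta(p, q, grid=5001):
    """Grid approximation of Delta = max over [0, 1] of |f_p(x) - f_q(x)|."""
    return max(abs(sum((a - b) * (i / (grid - 1))**k
                       for k, (a, b) in enumerate(zip(p, q))))
               for i in range(grid))

p = poisson_pmf(2.0, 40)
q = binomial_pmf(100, 0.02, 40)
worst = max(abs(a - b) for a, b in zip(p, q))   # largest |p_k - q_k|
print(worst, delta(p, q))                       # same order of magnitude here
```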

Estimation for the Total Variation Distance
Let again d_k = p_k − q_k, where p = (p_k)_{k≥0} and q = (q_k)_{k≥0} are discrete probability distributions. As we have seen, if nothing is known about p and q, it is impossible to give a nontrivial upper bound for the total variation distance ‖p − q‖ = (1/2) ∑_{k=0}^∞ |p_k − q_k|. However, when p is fixed, the situation is completely different.
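To see why no nontrivial bound is possible in general, here is a small numerical sketch (the perturbation of Poisson(2), the mass 10^{-6}, and all names are our own illustrative choices): a difference whose generating function vanishes at x = 1 and stays uniformly tiny on [0, 1], while its total variation is orders of magnitude larger.

```python
import math

def poisson_pmf(lam, n):
    """First n terms of the Poisson(lam) distribution."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n)]

def total_variation(p, q):
    """Total variation distance ||p - q|| = (1/2) * sum_k |p_k - q_k|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def delta(p, q, grid=2001):
    """Grid approximation of max over [0, 1] of |f_p(x) - f_q(x)|."""
    best = 0.0
    for i in range(grid):
        x = i / (grid - 1)
        best = max(best, abs(sum((a - b) * x**k
                                 for k, (a, b) in enumerate(zip(p, q)))))
    return best

# Perturb Poisson(2) by the signed measure with generating function
# eps * x^10 * (1 - x)^2 = eps * (x^10 - 2 x^11 + x^12):
# it vanishes at x = 1 and is uniformly tiny on all of [0, 1].
eps = 1e-6
p = poisson_pmf(2.0, 40)
q = list(p)
q[10] += eps
q[11] -= 2 * eps   # remains positive: p_11 is about 7e-6 for Poisson(2)
q[12] += eps

print(total_variation(p, q))   # 2 * eps
print(delta(p, q))             # about 0.0045 * eps, hundreds of times smaller
```

Scaling this cancellation up is exactly how the extremal examples of Section 2 push ∑|d_k| to its maximum while keeping Δ arbitrarily small.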
The right-hand side of the bound in Theorem 9 tends to 0 as Δ → 0 only if the tail of p is not too heavy.
The method of proof will be applied a couple of times in the sequel with different parameters. Therefore we formulate its essence in a separate lemma as a master inequality.
Lemma 10. Let a, b, and c be positive real numbers with c ≤ b, and suppose the displayed hypotheses hold.

Proof. Let m = ⌈c⌉; then m − 1 < c ≤ m. The first displayed estimate is clear; by the supposition the second one holds as well, and the claim follows.

Proof of Theorem 9. Starting from Theorem 3, let us apply Lemma 10 with a = 3Δ, c = h^{−1}(Δ), and b = (2c + 1)^{1/2} h^{−1}(Δ).

Particularly, let p be the Poisson distribution with mean λ. As we have seen in (35), h(k) = k log k + O(k). Let us plug this into Theorem 9. Writing 4 in place of the constant in the exponent we can get rid of the term 1 + o(1), and even the multiplier 8 is eventually absorbed. Thus we obtain the following estimate.
Corollary 11. Let p be Poisson; then, uniformly in q, the stated bound holds if Δ is sufficiently small.

This is already similar to the lower bound (16), and it is much better than (13).
If the tail of p is subexponential, that is, ∑_{j≥k} p_j = exp(−o(k)), then the estimate of Theorem 3 is useless: it tends to infinity as Δ → 0. However, with suitably chosen parameters in the master inequality, a reasonable upper bound can be obtained: not really sharp, but at least not trivial.
Theorem 12. Suppose ∑_{j≥k} p_j = exp(−o(k)).

Since (4e²)^{1/4} = 2.331⋅⋅⋅ < e, the first term on the right-hand side can be estimated by a positive power of Δ, while the second term decreases more slowly than any positive power of Δ as Δ → 0; thus the second term will eventually dominate.

Finally, let us deal with the case of exponentially decaying tails: ∑_{j≥k} p_j ≤ e^{−vk} with v ≥ 0. Then the total variation ‖p − q‖ can be estimated by a positive power of Δ. This follows from Theorem 9, but only for v above a certain constant. The following theorem is valid for all positive v.

Theorem 14. Suppose ∑_{j≥k} p_j ≤ e^{−vk} with v > 0. Then ‖p − q‖ ≤ 8Δ^{v/(v+7)}. (50)

As it is not sharp, the concrete form of the exponent holds no particular interest; what really matters is that it is positive and tends to 1 as v → ∞. Note that if p_k tends to 0 at an exponential rate, (14) provides a lower bound which is also a positive power of Δ (from Theorem 2.4 of [2] it follows that γ = (v + 1/4)/(v + 2) can be used as the exponent).
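The behaviour of the two exponents in the exponential-tail case can be tabulated by plain evaluation of the formulas above (the sample values of v are arbitrary choices of ours):

```python
# Upper-bound exponent v/(v+7) from Theorem 14 versus the lower-bound
# exponent (v + 1/4)/(v + 2) from Theorem 2.4 of [2]: both tend to 1
# as v grows, with the upper one always the smaller of the two.
for v in (1, 5, 50, 500):
    upper = v / (v + 7)
    lower = (v + 0.25) / (v + 2)
    print(v, round(upper, 3), round(lower, 3))
```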