International Journal of Mathematics and Mathematical Sciences, Volume 2008, Article ID 382948, doi:10.1155/2008/382948. Hindawi Publishing Corporation. ISSN 1687-0425 (online), 0161-1712 (print).

Research Article: Order Statistics and Benford's Law

Steven J. Miller, Department of Mathematics and Statistics, Williams College, Williamstown, MA 01267, USA. Mark J. Nigrini, Accounting and Information Systems, School of Business, The College of New Jersey, Ewing, NJ 08628, USA.

Academic Editor: Jewgeni Dshalalow

Copyright © 2008. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Fix a base B > 1 and let ζ have the standard exponential distribution; the distribution of digits of ζ base B is known to be very close to Benford's law. If there exists a C such that the distribution of digits of C times the elements of some set is the same as that of ζ, we say that set exhibits shifted exponential behavior base B. Let X_1, ..., X_N be independent, identically distributed random variables. If the X_i's are Unif[0, L], then as N → ∞ the distribution of the digits of the differences between adjacent order statistics converges to shifted exponential behavior. If instead the X_i's come from a compactly supported distribution with uniformly bounded first and second derivatives and a second-order Taylor series expansion at each point, then the distribution of digits of any N^δ consecutive differences (δ < 1) and of all N - 1 normalized differences of the order statistics exhibits shifted exponential behavior. We derive conditions on the probability density which determine whether the distribution of the digits of all the unnormalized differences converges to Benford's law, converges to shifted exponential behavior, or oscillates between the two, and show that the Pareto distribution leads to oscillating behavior.

1. Introduction

Benford's law gives the expected frequencies of the digits in many tabulated data. It was first observed by Newcomb in the 1880s, who noticed that the pages of logarithm tables corresponding to numbers beginning with a 1 were significantly more worn than those corresponding to numbers beginning with a 9. In 1938, Benford  observed the same digit bias in a variety of phenomena. From his observations, he postulated that in many datasets, more numbers began with a 1 than with a 9; his investigations (with 20,229 observations) supported his belief. See [2, 3] for a description and history, and  for an extensive bibliography.

For any base B > 1, we may uniquely write a positive x as x = M_B(x) · B^k, where k ∈ ℤ and M_B(x) (called the mantissa) is in [1, B). A sequence of positive numbers {a_n} is Benford base B if the probability of observing a mantissa of a_n base B of at most s is log_B s. More precisely, for s ∈ [1, B], we have

lim_{N→∞} #{n ≤ N : 1 ≤ M_B(a_n) ≤ s} / N = log_B s.

Benford behavior for continuous functions is defined analogously. (If the functions are not positive, we study the distribution of the digits of the absolute value of the function.) Thus, working base 10, we find that the probability of observing a first digit of d is log_10(d+1) - log_10(d), implying that about 30% of the time the first digit is a 1.
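As a quick numerical companion (our sketch in Python; not part of the original paper), the base-10 first-digit probabilities follow immediately from the mantissa law above:

```python
import math

def benford_leading_digit_prob(d, base=10):
    """Benford probability that the leading digit (in the given base) is d."""
    return math.log(d + 1, base) - math.log(d, base)

probs = {d: benford_leading_digit_prob(d) for d in range(1, 10)}
# probs[1] = log10(2), roughly 0.301; probs[9] = log10(10/9), roughly 0.046.
```

The nine probabilities telescope to log_10(10) = 1, and the digit 1 is indeed about six times as likely as the digit 9.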

Many mathematical systems can be proven to follow Benford's law, ranging from recurrence relations  to n! , to iterates of power, exponential, and rational maps, as well as Newton's method , to chains of random variables and hierarchical Bayesian models , to values of L-functions near the critical line, to characteristic polynomials of random matrix ensembles and iterates of the 3x+1 map [11, 12], as well as to products of random variables . We also see Benford's law in a variety of natural systems, such as atomic physics , biology , and geology . Applications of Benford's law range from rounding errors in computer calculations (see [17, page 255]) to detecting tax fraud (see [18, 19]) and voter fraud (see ).

This work is motivated by two observations (see Remark 1.9 for more details). First, since Benford's seminal paper, many investigations have shown that amalgamating data from different sources leads to Benford behavior; second, many standard probability distributions are close to Benford behavior. We investigate the distribution of digits of differences of adjacent ordered random variables. For any δ < 1, if we study at most N^δ consecutive differences of a dataset of size N, the resulting distribution of leading digits depends very weakly on the underlying distribution of the data, and closely approximates Benford's law. We then investigate whether or not studying all the differences leads to Benford behavior; this question is inspired by the first observation above, and has led to new tests for data integrity (see ). These tests are quick and easy to apply, and have successfully detected problems with some datasets, thus providing a practical application of our main results.

Proving our results requires analyzing the distribution of digits of independent random variables drawn from the standard exponential, and quantifying how close the distribution of digits of a random variable with the standard exponential distribution is to Benford's law. Leemis et al.  have observed that the standard exponential is quite close to Benford's law; this was proved by Engel and Leuenberger , who showed that the maximum difference in the cumulative distribution function from Benford's law (base 10) is at least .029 and at most .03. We provide an alternate proof of this result in the appendix using a different technique, as well as showing that there is no base B such that the standard exponential distribution is Benford base B (Corollary A.2).
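This deviation is easy to reproduce numerically. The following sketch (ours; the truncation limits and grid size are arbitrary choices) computes the cumulative distribution function of log_10 ζ mod 1 from the rapidly converging series derived in Appendix A and measures its maximum deviation from the uniform (Benford) cumulative distribution:

```python
import math

def F_B(b, base=10.0, kmin=-40, kmax=6):
    """CDF of log_base(zeta) mod 1 for zeta ~ Exp(1), computed from the
    series sum over k of exp(-base**k) - exp(-base**(b + k)), truncated."""
    return sum(math.exp(-base ** k) - math.exp(-base ** (b + k))
               for k in range(kmin, kmax + 1))

# Maximum deviation from the Benford (uniform) CDF over a fine grid of b;
# the deviation is small (on the order of 0.03) but provably nonzero.
max_dev = max(abs(F_B(i / 10000) - i / 10000) for i in range(10001))
```

The computed maximum is consistent with the Engel-Leuenberger bounds quoted above.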

Both proofs apply Fourier analysis to periodic functions. In [23, equation (5)], the main step is interchanging an integration and a limit. Our proof is based on applying Poisson summation to the derivative of the cumulative distribution function of the logarithms modulo 1, F_B'. Benford's law is equivalent to F_B(b) = b, which by calculus is the same as F_B'(b) = 1 and F_B(0) = 0. Thus, studying the deviation of F_B'(b) from 1 is a natural way to investigate the deviations from Benford behavior. We hope the details of these calculations may be of use to others in investigating related problems (Poisson summation has been fruitfully used by Kontorovich and Miller  and Jang et al.  in proving many systems are Benford; see also ).

1.1. Definitions

A sequence {a_n}_{n=1}^{∞} ⊂ [0,1] is equidistributed if, for all [a,b] ⊂ [0,1],

lim_{N→∞} #{n : n ≤ N, a_n ∈ [a,b]} / N = b - a.

Similarly, a continuous random variable on [0,∞), whose probability density function is p, is equidistributed modulo 1 if

lim_{T→∞} (∫_0^T χ_{a,b}(x) p(x) dx) / (∫_0^T p(x) dx) = b - a

for any [a,b] ⊂ [0,1], where χ_{a,b}(x) = 1 if x mod 1 ∈ [a,b], and 0 otherwise.

A positive sequence (or values of a function) is Benford base B if and only if its base B logarithms are equidistributed modulo 1; this equivalence is at the heart of many investigations of Benford's law (see [6, 25] for a proof).
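For example (an illustration we add here), the powers of 2 are Benford base 10 precisely because n · log_10 2 mod 1 is equidistributed; a direct count of leading digits agrees:

```python
import math

N = 10_000
# Count n <= N for which 2**n has leading digit 1.
leading_ones = sum(1 for n in range(1, N + 1) if str(2 ** n)[0] == "1")
freq = leading_ones / N
expected = math.log10(2)  # Benford's prediction for the digit 1, ~0.30103
```

The empirical frequency agrees with log_10 2 to within the discrepancy of the equidistributed sequence n · log_10 2 mod 1.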

We use the following notations for the various error terms.

Let (x) denote an error of at most x in absolute value; thus, f(b) = g(b) + (x) means |f(b) - g(b)| ≤ x.

Big-Oh notation: for g(x) a nonnegative function, we say f(x) = O(g(x)) if there exist an x_0 and a C > 0 such that, for all x ≥ x_0, |f(x)| ≤ C·g(x).

The following theorem is the starting point for investigating the distribution of digits of order statistics.

Theorem 1.1.

Let ζ have the standard (unit) exponential distribution,

Prob(ζ ∈ [α,β]) = ∫_α^β e^{-t} dt,  [α,β] ⊂ [0,∞).   (1.4)

For b ∈ [0,1], let F_B(b) be the cumulative distribution function of log_B ζ mod 1; thus F_B(b) := Prob(log_B ζ mod 1 ∈ [0,b]). Then, for all M ≥ 2,

F_B'(b) = 1 + 2 Σ_{m=1}^{∞} Re(e^{2πimb} Γ(1 + 2πim/log B))
        = 1 + 2 Σ_{m=1}^{M-1} Re(e^{2πimb} Γ(1 + 2πim/log B)) + (4√(2π) c_1(B) e^{-(π² - c_2(B))M/log B}),

where c_1(B), c_2(B) are constants such that for all m ≥ M ≥ 2, one has

e^{2π²m/log B} - e^{-2π²m/log B} ≥ e^{2π²m/log B}/c_1(B)²,  √(m/log B) ≤ e^{2c_2(B)m/log B},  e^{-(π² - c_2(B))M/log B} ≤ 1/2.

For B ∈ [e, 10], take c_1(B) = 2 and c_2(B) = 1/5, which give

Prob(log ζ mod 1 ∈ [a,b]) = (b - a) + (2r/π) sin(π(b+a) + θ) sin(π(b-a)) + (6.32 × 10^{-7}),

with r ≈ 0.000324986, θ ≈ 1.32427186, and

Prob(log_10 ζ mod 1 ∈ [a,b]) = (b - a) + (2r_1/π) sin(π(b+a) - θ_1) sin(π(b-a)) - (2r_2/π) sin(2π(b+a) + θ_2) sin(2π(b-a)) + (8.5 × 10^{-5}),   (1.5)

with r_1 ≈ 0.0569573, θ_1 ≈ 0.8055888, r_2 ≈ 0.0011080, θ_2 ≈ 0.1384410.
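The constants r_m above are the moduli |Γ(1 + 2πim/log B)|. As a numerical check (ours, using the classical identity |Γ(1+iy)|² = πy/sinh(πy)), they can be recomputed without any complex arithmetic:

```python
import math

def gamma_modulus(y):
    """|Gamma(1 + i*y)| via the identity |Gamma(1+iy)|^2 = pi*y / sinh(pi*y)."""
    return math.sqrt(math.pi * y / math.sinh(math.pi * y))

log10 = math.log(10.0)
r1 = gamma_modulus(2 * math.pi * 1 / log10)  # m = 1 term, base 10
r2 = gamma_modulus(2 * math.pi * 2 / log10)  # m = 2 term, base 10
```

Both values match the constants listed in Theorem 1.1, and the rapid decay of sinh in the denominator explains why only the first one or two terms of the Fourier series matter.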

The above theorem was proved in ; we provide an alternate proof in Appendix A. As remarked earlier, our technique consists of applying Poisson summation to the derivative of the cumulative distribution function of the logarithms modulo 1; it is then very natural and easy to compare deviations from the resulting distribution and the uniform distribution (if a dataset satisfies Benford's law, then the distribution of its logarithms is uniform). Our series expansions are obtained by applying properties of the Gamma function.

Definition 1.2 (exponential behavior, shifted exponential behavior).

Let ζ have the standard exponential distribution, and fix a base B. If the distribution of the digits of a set is the same as the distribution of the digits of ζ, then one says that the set exhibits exponential behavior (base B). If there is a constant C > 0 such that the distribution of the digits of all elements multiplied by C is exponential behavior, then one says that the system exhibits shifted exponential behavior (with a shift of log_B C mod 1).

We briefly describe the reasons behind this notation. One important property of Benford's law is that it is invariant under rescaling; many authors have used this property to characterize Benford behavior. Thus, if a dataset is Benford base B, and we fix a positive number C, so is the dataset obtained by multiplying each element by C. This is clear if, instead of looking at the distribution of the digits, we study the distribution of the base B logarithms modulo 1. Benford's law is equivalent to the logarithms modulo 1 being uniformly distributed (see, e.g., [6, 25]); the effect of multiplying all entries by a fixed constant simply translates the uniform distribution modulo 1, which is again the uniform distribution.

The situation is different for exponential behavior. Multiplying all elements by a fixed constant C (where C ≠ B^k for any integer k) does not preserve exponential behavior; however, the effect is easy to describe. Again looking at the logarithms, exponential behavior is equivalent to the base B logarithms modulo 1 having a specific distribution which is almost equal to the uniform distribution (at least if the base B is not too large). Multiplying by a fixed constant C ≠ B^k shifts the logarithm distribution by log_B C mod 1.

1.2. Results for Differences of Order Statistics

We consider a simple case first, and show how the more general case follows. Let X_1, ..., X_N be independent, identically distributed from the uniform distribution on [0, L]. We consider L fixed and study the limit as N → ∞. Let X_{1:N}, ..., X_{N:N} be the X_i's in increasing order. The X_{i:N} are called the order statistics, and satisfy 0 ≤ X_{1:N} ≤ X_{2:N} ≤ ... ≤ X_{N:N} ≤ L. We investigate the distribution of the leading digits of the differences between adjacent X_{i:N}'s, X_{i+1:N} - X_{i:N}. For convenience, we periodically continue the data and set X_{i+N:N} = X_{i:N} + L. As we have N differences in an interval of size L, the average value of X_{i+1:N} - X_{i:N} is of size L/N, and it is sometimes easier to study the normalized differences

Z_{i;N} = (X_{i+1:N} - X_{i:N}) / (L/N).   (1.10)

As the X_i's are drawn from a uniform distribution, it is a standard result that as N → ∞, the Z_{i;N}'s converge to independent random variables, each having the standard exponential distribution. Thus, as N → ∞, the probability that Z_{i;N} ∈ [a,b] tends to ∫_a^b e^{-t} dt (see [26, 27] for proofs).
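This limiting behavior is easy to observe numerically; the sketch below (ours, with illustrative values of N and L) forms the normalized differences Z_{i;N} from uniform samples and compares their mean and tail with those of a standard exponential:

```python
import math
import random

random.seed(42)
N, L = 200_000, 1.0
xs = sorted(random.uniform(0, L) for _ in range(N))
# Normalized differences Z = (X_{i+1:N} - X_{i:N}) / (L/N).
zs = [(xs[i + 1] - xs[i]) * N / L for i in range(N - 1)]

mean_z = sum(zs) / len(zs)                      # Exp(1) has mean 1
tail_z = sum(1 for z in zs if z > 1) / len(zs)  # P(Z > 1) = exp(-1)
e_inv = math.exp(-1)
```

Both statistics agree with the standard exponential to well within sampling error.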

For uniformly distributed random variables, if we know the distribution of log_B Z_{i;N} mod 1, then we can immediately determine the distribution of the digits of the X_{i+1:N} - X_{i:N} base B, because

log_B Z_{i;N} = log_B((X_{i+1:N} - X_{i:N}) / (L/N)) = log_B(X_{i+1:N} - X_{i:N}) - log_B(L/N).   (1.11)

As the Z_{i;N} converge to independent random variables with the standard exponential distribution as N → ∞, if the X_i are independent and uniformly distributed, the behavior of the digits of the differences X_{i+1:N} - X_{i:N} is an immediate consequence of Theorem 1.1.

Theorem 1.3 (Shifted exponential behavior of differences of independent uniformly distributed random variables).

Let X_1, ..., X_N be independently drawn from the uniform distribution on [0, L], and let X_{1:N}, ..., X_{N:N} be the X_i's in increasing order. As N → ∞, the distribution of the digits (base B) of the differences X_{i+1:N} - X_{i:N} converges to shifted exponential behavior, with a shift of log_B(L/N) mod 1.

A similar result holds for other distributions.

Theorem 1.4 (Shifted exponential behavior of subsets of differences of independent random variables).

Let X_1, ..., X_N be independent, identically distributed random variables whose density f(x) has a second-order Taylor series at each point with first and second derivatives uniformly bounded, and let the X_{i:N}'s be the X_i's in increasing order. Fix a δ ∈ (0,1). Then, as N → ∞, the distribution of the digits (base B) of N^δ consecutive differences X_{i+1:N} - X_{i:N} converges to shifted exponential behavior, provided that the X_{i:N}'s are from a region where f(x) is nonzero.

The key ingredient in this generalization is that the techniques, which show that the differences between uniformly distributed random variables become independent exponentially distributed random variables, can be modified to handle more general distributions.

We restricted ourselves to a subset of all consecutive spacings because the normalization factor changes throughout the domain. The shift in the shifted exponential behavior depends on which set of N^δ differences we study, coming from the variations in the normalizing factors. Within a bin of N^δ differences, the normalization factor is basically constant, and we may approximate our density with a uniform distribution. It is possible for these variations to cancel and yield Benford behavior for the digits of all the unnormalized differences. Such a result is consistent with the belief that amalgamation of data from many different distributions becomes Benford; however, this is not always the case (see Remark 1.6). From Theorems 1.1 and 1.4, we obtain the following theorem.

Theorem 1.5 (Benford behavior for all the differences of independent random variables).

Let X_1, ..., X_N be independent, identically distributed random variables whose density f(x) is compactly supported and has a second-order Taylor series at each point with first and second derivatives uniformly bounded. Let the X_{i:N}'s be the X_i's in increasing order, let F(x) be the cumulative distribution function for f(x), and fix a δ ∈ (0,1). Let I(ϵ,δ,N) = [ϵN^{1-δ}, N^{1-δ} - ϵN^{1-δ}]. For each fixed ϵ ∈ (0, 1/2), assume that:

(i) f(F^{-1}(kN^{δ-1})) is not too small for k ∈ I(ϵ,δ,N):

lim_{N→∞} max_{k∈I(ϵ,δ,N)} min(N^{-(ϵ+δ/2)}, N^{δ-1}) / f(F^{-1}(kN^{δ-1})) = 0;   (1.12)

(ii) log_B f(F^{-1}(kN^{δ-1})) mod 1 is equidistributed: for all [α,β] ⊂ [0,1],

lim_{N→∞} #{k ∈ I(ϵ,δ,N) : log_B f(F^{-1}(kN^{δ-1})) mod 1 ∈ [α,β]} / N^{1-δ} = β - α.   (1.13)

Then, if ϵ > max(0, 1/3 - δ/2) and ϵ < δ/2, the distribution of the digits of the N - 1 differences X_{i+1:N} - X_{i:N} converges to Benford's law (base B) as N → ∞.

Remark 1.6.

The conditions of Theorem 1.5 are usually not satisfied. We are unaware of any situation where (1.13) holds; we have included Theorem 1.5 to give a sufficient condition for Benford's law to be satisfied exactly, and not just approximately. Example 3.3 shows that the conditions fail for the Pareto distribution, and the limiting behavior oscillates between Benford and a sum of shifted exponential behaviors. (If several datasets each exhibit shifted exponential behavior but with distinct shifts, then the amalgamated dataset is closer to Benford's law than any of the original datasets. This is apparent by studying the logarithms modulo 1. The differences between these densities and Benford's law will look like Figure 1(b) (except, of course, that different shifts will result in shifting the plot modulo 1). The key observation is that the unequal shifts mean that we do not have reinforcement from the peaks of the modulo 1 densities being aligned, and thus the amalgamation will decrease the maximum deviations.) The arguments generalize to many densities whose cumulative distribution functions have tractable closed-form expressions (e.g., exponential, Weibull, or f(x) = e^x e^{-e^x}).

Figure 1: All 499,999 differences of adjacent order statistics from 500,000 independent random variables from the Pareto distribution with minimum value 1 and variance 1. (a) Observed digits of scaled differences of adjacent random variables versus Benford's law; (b) scaled observed minus Benford's law (cumulative distribution of base 10 logarithms).

The situation is very different if instead we study the normalized differences

Z̃_{i;N} = (X_{i+1:N} - X_{i:N}) / (1/(N f(X_{i:N})));   (1.14)

note that if f(x) = 1/L is the uniform distribution on [0, L], then (1.14) reduces to (1.10).

Theorem 1.7 (Shifted exponential behavior for all the normalized differences of independent random variables).

Assume the probability density f satisfies the smoothness conditions of Theorem 1.5 and condition (1.12), and let Z̃_{i;N} be as in (1.14). Then, as N → ∞, the distribution of the digits of Z̃_{i;N} converges to shifted exponential behavior.

Remark 1.8.

Appropriately scaled, the distribution of the digits of the differences is universal, and is the exponential behavior of Theorem 1.1. Thus, Theorem 1.7 implies that the natural quantity to study is the normalized differences of the order statistics, not the differences (see also Remark 3.5). With additional work, we could study densities with unbounded support and show that, through truncation, we can get arbitrarily close to shifted exponential behavior.

Remark 1.9.

The main motivation for this work is the need for improved ways of assessing the authenticity and integrity of scientific and corporate data. Benford's law has been successfully applied to detecting income tax, corporate, and voter fraud (see ); in , we use these results to derive new statistical tests to examine data authenticity and integrity. Early applications of these tests to financial data showed that they could detect errors in data downloads, rounded data, and inaccurate ordering of data. These attributes are not easily observable from an analysis of descriptive statistics, and detecting these errors can help managers avoid costly decisions based on erroneous data.

The paper is organized as follows. We prove Theorem 1.1 in Appendix A by using Poisson summation to analyze F_B'(b). Theorem 1.3 follows from results on the order statistics of independent uniform random variables. The proof of Theorem 1.4 is similar, and given in Section 2. In Section 3, we prove Theorems 1.5 and 1.7.

2. Proofs of Theorems 1.3 and 1.4

Theorem 1.3 is a consequence of the fact that the normalized differences between the order statistics drawn from the uniform distribution converge to being independent standard exponentials. The proof of Theorem 1.4 proceeds similarly. Specifically, over a short enough region, any distribution with a second-order Taylor series at each point with first and second derivatives uniformly bounded is well approximated by a uniform distribution.

To prove Theorem 1.4, it suffices to show that if X_1, ..., X_N are drawn from a sufficiently nice distribution, then for any fixed δ ∈ (0,1) the limiting behavior of the order statistics of N^δ adjacent X_i's becomes Poissonian (i.e., the N^δ - 1 normalized differences converge to being independently distributed from the standard exponential). We prove this below for compactly supported distributions f(x) that have a second-order Taylor series at each point with the first and second derivatives uniformly bounded, and when the N^δ adjacent X_i's are from a region where f(x) is bounded away from zero.

For each N, consider intervals [a_N, b_N] such that ∫_{a_N}^{b_N} f(x) dx = N^δ/N = N^{δ-1}; thus, the proportion of the total mass in such an interval is N^{δ-1}. We fix such an interval for our arguments. For each i ∈ {1, ..., N}, let

w_i = 1 if X_i ∈ [a_N, b_N], and w_i = 0 otherwise.

Note that w_i is 1 with probability N^{δ-1} and 0 with probability 1 - N^{δ-1}; w_i is a binary indicator random variable, telling us whether or not X_i ∈ [a_N, b_N]. Thus,

E[Σ_{i=1}^{N} w_i] = N^δ,  Var(Σ_{i=1}^{N} w_i) = N^δ(1 - N^{δ-1}).   (2.2)

Let M_N be the number of X_i in [a_N, b_N], and let β_N be any nondecreasing sequence tending to infinity (in the course of the proof, we will find that we may take any sequence with β_N = O(N^{δ/2})). By (2.2) and the central limit theorem (which we may use, as the w_i's satisfy the Lyapunov condition), with probability tending to 1, we have

M_N = N^δ + O(β_N N^{δ/2}).   (2.3)

We assume that in the interval [a_N, b_N] there exist constants c and C such that whenever x ∈ [a_N, b_N], 0 < c < f(x) < C < ∞; we assume that these constants hold for all regions investigated and for all N. (If our distribution has unbounded support, for any ϵ > 0, we can truncate it on both sides so that the omitted probability is at most ϵ. Our result is then trivially modified to be within ϵ of shifted exponential behavior.) Thus,

c(b_N - a_N) ≤ ∫_{a_N}^{b_N} f(x) dx = N^{δ-1} ≤ C(b_N - a_N),

implying that b_N - a_N is of size N^{δ-1}. If we assume that f(x) has at least a second-order Taylor expansion, then

f(x) = f(a_N) + f'(a_N)(x - a_N) + O((x - a_N)²) = f(a_N) + f'(a_N)(x - a_N) + O(N^{2δ-2}).   (2.5)

As we assume that the first and second derivatives are uniformly bounded, as well as f being bounded away from zero in the intervals under consideration, all Big-Oh constants below are independent of N. Thus,

b_N - a_N = N^{δ-1}/f(a_N) + O(N^{2δ-2}).   (2.6)

We now investigate the order statistics of the M_N of the X_i's that lie in [a_N, b_N]. We know ∫_{a_N}^{b_N} f(x) dx = N^{δ-1}; setting g_N(x) = f(x) N^{1-δ}, g_N(x) is the conditional density function for X_i, given that X_i ∈ [a_N, b_N]. Thus, g_N(x) integrates to 1, and for x ∈ [a_N, b_N], we have

g_N(x) = f(a_N)N^{1-δ} + f'(a_N)(x - a_N)N^{1-δ} + O(N^{δ-1}).   (2.7)

We have an interval of size N^{δ-1}/f(a_N) + O(N^{2δ-2}), and M_N = N^δ + O(β_N N^{δ/2}) of the X_i lying in the interval (remember that β_N is any nondecreasing sequence tending to infinity). Thus, with probability tending to 1, the average spacing between adjacent ordered X_i is

(N^{δ-1}/f(a_N) + O(N^{2δ-2})) / M_N = (f(a_N)N)^{-1} + N^{-1}·O(β_N N^{-δ/2} + N^{δ-1});   (2.8)

in particular, we see that we must choose β_N = O(N^{δ/2}). As δ ∈ (0,1), if we fix a k such that X_k ∈ [a_N, b_N], then we expect the next X_i to the right of X_k to be about t/(Nf(a_N)) units away, where t is of size 1. For a given X_k, we can compute the conditional probability that the next X_i is between t/(Nf(a_N)) and (t+Δt)/(Nf(a_N)) units to the right. It is simply the difference of the probability that all the other M_N - 1 of the X_i's in [a_N, b_N] are not in the interval [X_k, X_k + t/(Nf(a_N))] and the probability that all the other X_i in [a_N, b_N] are not in the interval [X_k, X_k + (t+Δt)/(Nf(a_N))]; note that we are using the wrapped interval [a_N, b_N].

Some care is required in these calculations. We have a conditional probability as we assume that Xk[aN,bN] and that exactly MN of the Xi are in [aN,bN]. Thus, these probabilities depend on two random variables, namely, Xk and MN. This is not a problem in practice, however (e.g., MN is tightly concentrated about its mean value).

Recalling our expansion for g_N(x) (and that b_N - a_N = N^{δ-1}/f(a_N) + O(N^{2δ-2}) and t is of size 1), after simple algebra, we find that with probability tending to 1, for a given X_k and M_N, the first probability is

(1 - ∫_{X_k}^{X_k + t/(Nf(a_N))} g_N(x) dx)^{M_N - 1}.

The above integral equals tN^{-δ} + O(N^{-1}) (use the Taylor series expansion in (2.7) and note that the interval [a_N, b_N] is of size O(N^{δ-1})). Using (2.3), it is easy to see that this is a.s. equal to

(1 - (t + O(N^{δ-1} + β_N N^{-δ/2}))/M_N)^{M_N - 1}.

We, therefore, find that as N → ∞, the probability that the M_N - 1 other X_i's (i ≠ k) are in [a_N, b_N] ∖ [X_k, X_k + t/(Nf(a_N))], conditioned on X_k and M_N, converges to e^{-t}. (Some care is required, as the exceptional set in our a.s. statement can depend on t. This can be surmounted by taking expectations with respect to our conditional probabilities and applying the dominated convergence theorem.)

The calculation of the second probability, the conditional probability that the M_N - 1 other X_i's that are in [a_N, b_N] are not in the interval [X_k, X_k + (t+Δt)/(Nf(a_N))], given X_k and M_N, follows analogously by replacing t with t + Δt in the previous argument. We thus find that this probability is e^{-(t+Δt)}. As

∫_t^{t+Δt} e^{-u} du = e^{-t} - e^{-(t+Δt)},

we find that the density of the difference between adjacent order statistics tends to the standard (unit) exponential density; the proof of Theorem 1.4 now follows from Theorem 1.3.

3. Proofs of Theorems 1.5 and 1.7

We generalize the notation from Section 2. Let f(x) be any distribution with a second-order Taylor series at each point with first and second derivatives uniformly bounded, and let X_{1:N}, ..., X_{N:N} be the order statistics. We fix a δ ∈ (0,1), and for k ∈ {1, ..., N^{1-δ}}, we consider bins [a_{k;N}, b_{k;N}] such that

∫_{a_{k;N}}^{b_{k;N}} f(x) dx = N^δ/N = N^{δ-1};

there are N^{1-δ} such bins. By the central limit theorem (see (2.3)), if M_{k;N} is the number of order statistics in [a_{k;N}, b_{k;N}], then, provided that ϵ > max(0, 1/3 - δ/2), with probability tending to 1, we have

M_{k;N} = N^δ + O(N^{ϵ+δ/2});

of course we also require ϵ < δ/2, as, otherwise, the error term is larger than the main term.

Remark 3.1.

Before, we considered just one fixed interval; as we are studying N^{1-δ} intervals simultaneously, we need the ϵ in the exponent so that, with high probability, all intervals have to first order N^δ order statistics. For the arguments below, it would have sufficed to have an error of size O(N^{δ-ϵ}). We thank the referee for pointing out that ϵ > 1/3 - δ/2, and provide his argument in Appendix B.

Similar to (2.8), the average spacing between adjacent order statistics in [a_{k;N}, b_{k;N}] is

(f(a_{k;N})N)^{-1} + N^{-1}·O(N^{-(ϵ+δ/2)} + N^{δ-1}).   (3.3)

Note that (3.3) is the generalization of (1.11); if f is the uniform distribution on [0, L], then f(a_{k;N}) = 1/L. By Theorem 1.4, as N → ∞, the distribution of digits of the differences in each bin converges to shifted exponential behavior; however, the variation in the average spacing between bins leads to bin-dependent shifts in the shifted exponential behavior.

Similar to (1.11), we can study the distribution of digits of the differences of the normalized order statistics. If X_{i:N} and X_{i+1:N} are in [a_{k;N}, b_{k;N}], then

Z_{i;N} = (X_{i+1:N} - X_{i:N}) / ((f(a_{k;N})N)^{-1} + N^{-1}·O(N^{-(ϵ+δ/2)} + N^{δ-1})),   (3.4)

log_B Z_{i;N} = log_B(X_{i+1:N} - X_{i:N}) + log_B N - log_B(f(a_{k;N})^{-1} + O(N^{-(ϵ+δ/2)} + N^{δ-1})).

Note that we are using the same normalization factor for all differences between adjacent order statistics in a bin. Later, we show that we may replace f(a_{k;N}) with f(X_{i:N}). As we study all X_{i+1:N} - X_{i:N} in the bin [a_{k;N}, b_{k;N}], it is useful to rewrite the above as

log_B(X_{i+1:N} - X_{i:N}) = log_B Z_{i;N} - log_B N + log_B(f(a_{k;N})^{-1} + O(N^{-(ϵ+δ/2)} + N^{δ-1})).   (3.5)

We have N^{1-δ} bins, so k ∈ {1, ..., N^{1-δ}}. As we only care about the limiting behavior, we may safely ignore the first and last bins. We may, therefore, assume that each a_{k;N} is finite, and a_{k+1;N} = b_{k;N}. (Of course, we know that both quantities are finite as we assumed that our distribution has compact support. We remove the last bins to simplify generalizations to noncompactly supported distributions.)

Let F(x) be the cumulative distribution function for f(x). Then,

F(a_{k;N}) = (k-1)N^δ/N = (k-1)N^{δ-1}.

For notational convenience, we relabel the bins so that k ∈ {0, ..., N^{1-δ} - 1}; thus F(a_{k;N}) = kN^{δ-1}.

We now prove our theorems, which determine when these bin-dependent shifts cancel (yielding Benford behavior) or reinforce (yielding sums of shifted exponential behavior).

Proof of Theorem 1.5.

There are approximately N^δ differences in each bin [a_{k;N}, b_{k;N}]. By Theorem 1.4, the distribution of the digits of the differences in each bin converges to shifted exponential behavior. As we assume that the first and second derivatives of f are uniformly bounded, the Big-Oh constants in Section 2 are independent of the bins. The shift in the shifted exponential behavior in each bin is controlled by the last two terms on the right-hand side of (3.5). The log_B N shifts the shifted exponential behavior in each bin equally. The bin-dependent shift is controlled by the final term,

log_B(f(a_{k;N})^{-1} + O(N^{-(ϵ+δ/2)} + N^{δ-1})) = -log_B f(a_{k;N}) + log_B(1 + min(N^{-(ϵ+δ/2)}, N^{δ-1})/f(a_{k;N})).   (3.7)

Thus, each of the N^{1-δ} bins exhibits shifted exponential behavior, with a bin-dependent shift composed of the two terms in (3.7). By (1.12), f(a_{k;N}) is not small compared to min(N^{-(ϵ+δ/2)}, N^{δ-1}), and hence the second term, log_B(1 + min(N^{-(ϵ+δ/2)}, N^{δ-1})/f(a_{k;N})), is negligible. In particular, this factor depends only very weakly on the bin, and tends to zero as N → ∞.

Thus, the bin-dependent shift in the shifted exponential behavior is approximately -log_B f(a_{k;N}) = -log_B f(F^{-1}(kN^{δ-1})). If these shifts are equidistributed modulo 1, then the deviations from Benford behavior cancel, and the shifted exponential behavior of each bin becomes Benford behavior for all the differences.

Remark 3.2.

Consider the case when the density is a uniform distribution on some interval. Then, all the f(F^{-1}(kN^{δ-1})) are equal, and each bin has the same shift in its shifted exponential behavior. These shifts, therefore, reinforce each other, and the distribution of all the differences is also shifted exponential behavior, with the same shift. This is observed in numerical experiments (see Theorem 1.3 for an alternate proof).

We now analyze the assumptions of Theorem 1.5. The condition from (1.12) is easy to check, and is often satisfied. For example, if the probability density is a finite union of monotonic pieces and is zero only finitely often, then (1.12) holds. This is because for k ∈ I(ϵ,δ,N), F^{-1}(kN^{δ-1}) ∈ [F^{-1}(ϵ), F^{-1}(1-ϵ)], and this interval is independent of N (if f vanishes finitely often, we need to remove small subintervals from I(ϵ,δ,N), but the analysis proceeds similarly). The only real difficulty arises for probability distributions with intervals of zero probability. Thus, (1.12) is a mild assumption.

If we choose any distribution other than a uniform distribution, then f(x) is not constant; however, (1.13) need not hold (i.e., log_B f(a_{k;N}) mod 1 need not be equidistributed as N → ∞). For example, consider a Pareto distribution with minimum value 1 and exponent a > 0. The density is

f(x) = a x^{-(a+1)} if x ≥ 1, and f(x) = 0 otherwise.

The Pareto distribution is known to be useful in modelling natural phenomena, and for appropriate choices of exponents, it yields approximately Benford behavior (see ).
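A Pareto sample is convenient to generate by inverting F(x) = 1 - x^{-a} (our sketch; the exponent a = 3 is an arbitrary illustrative choice):

```python
import random

random.seed(0)

def pareto_sample(a, n):
    """n draws from the density a*x**(-a-1) on [1, inf), by inverting
    the CDF F(x) = 1 - x**(-a):  F^{-1}(u) = (1 - u)**(-1/a)."""
    return [(1.0 - random.random()) ** (-1.0 / a) for _ in range(n)]

a = 3.0
xs = pareto_sample(a, 1_000_000)
sample_mean = sum(xs) / len(xs)  # theoretical mean is a/(a-1) = 1.5 for a = 3
```

Inverse-CDF sampling is exact here because F has a simple closed-form inverse, which is also what makes the bin-dependent shifts below explicitly computable.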

Example 3.3.

If f is a Pareto distribution with minimum value 1 and exponent a>0, then f does not satisfy the second condition of Theorem 1.5, (1.13).

To see this, note that the cumulative distribution function of f is F(x) = 1 - x^{-a}. As we only care about the limiting behavior, we need only study k ∈ I(ϵ,δ,N) = [ϵN^{1-δ}, N^{1-δ} - ϵN^{1-δ}]. Therefore, F(a_{k;N}) = kN^{δ-1} implies that

a_{k;N} = (1 - kN^{δ-1})^{-1/a},  f(a_{k;N}) = a(1 - kN^{δ-1})^{(a+1)/a}.

The condition from (1.12) is satisfied, namely,

lim_{N→∞} max_{k∈I(ϵ,δ,N)} min(N^{-(ϵ+δ/2)}, N^{δ-1}) / f(a_{k;N}) = lim_{N→∞} max_{k∈I(ϵ,δ,N)} min(N^{-(ϵ+δ/2)}, N^{δ-1}) / (a(1 - kN^{δ-1})^{(a+1)/a}) = 0,

as 1 - kN^{δ-1} ≥ ϵ for k ∈ I(ϵ,δ,N).

Let j = N^{1-δ} - k ∈ I(ϵ,δ,N). Then, the bin-dependent shifts are determined by

log_B f(a_{k;N}) = ((a+1)/a)·log_B(1 - kN^{δ-1}) + log_B a = ((a+1)/a)·log_B(j/N^{1-δ}) + log_B a = log_B(j^{(a+1)/a}) + log_B(a·N^{-(1-δ)(a+1)/a}).

Thus, for a Pareto distribution with exponent a, the distribution of all the differences becomes Benford if and only if the sequence j^{(a+1)/a} is Benford. This follows from the fact that a sequence is Benford if and only if its logarithms are equidistributed modulo 1. For fixed m, the sequence j^m is not Benford (e.g., ), and thus the condition from (1.13) fails.

Remark 3.4.

We chose to study a Pareto distribution because the distribution of digits of a random variable drawn from a Pareto distribution converges to Benford behavior (base 10) as a → 1; however, the digits of the differences do not tend to Benford (or shifted exponential) behavior. A similar analysis holds for many distributions with good closed-form expressions for the cumulative distribution function. In particular, if f is the density of an exponential or Weibull distribution (or f(x) = e^x e^{-e^x}), then f does not satisfy the second condition of Theorem 1.5, (1.13).

Modifying the proof of Theorem 1.5 yields our result on the distribution of digits of the normalized differences.

Proof of Theorem 1.7.

If f is the uniform distribution, there is nothing to prove. For general f, rescaling the differences eliminates the bin-dependent shifts. Let

Z̃_{i;N} = (X_{i+1:N} - X_{i:N}) / (1/(N f(X_{i:N}))).

In Theorem 1.5, we used the same scale factor for all differences in a bin (see (3.4)). As we assume that the first and second derivatives of f are uniformly bounded, (2.5) and (2.6) imply that for X_{i:N} ∈ [a_{k;N}, b_{k;N}],

f(X_{i:N}) = f(a_{k;N}) + O(b_{k;N} - a_{k;N}) = f(a_{k;N}) + O(N^{δ-1}/f(a_{k;N}) + N^{2δ-2}),

and the Big-Oh constants are independent of k. As we assume that f satisfies (1.12), the error term is negligible.

Thus, our assumptions on $f$ imply that $f$ is essentially constant on each bin, and we may replace the local rescaling factor $f(X_{i:N})$ with the bin rescaling factor $f(a_{k;N})$. Each bin of normalized differences therefore has the same shift in its shifted exponential behavior, so all the shifts reinforce, and the digits of all the normalized differences exhibit shifted exponential behavior as $N \to \infty$.
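The rescaling in the proof can be illustrated with a quick Monte Carlo sketch (ours, not from the paper, echoing the Figure 1 example below): for Pareto samples, the normalized differences $\widetilde{Z}_{i:N} = Nf(X_{i:N})(X_{i+1:N} - X_{i:N})$ should be approximately standard exponential, so in particular their sample mean should be near 1. The exponent $a = 2$ and sample size are illustrative choices:

```python
import random

# Sketch: normalized Pareto spacings look approximately Exp(1).
random.seed(0)
a, N = 2.0, 20000
# Pareto(a) on [1, inf) via inverse CDF: X = (1 - U)^(-1/a), U uniform on [0, 1)
xs = sorted((1 - random.random())**(-1 / a) for _ in range(N))

def f(x):
    return a * x**(-(a + 1))   # Pareto density

z = [N * f(xs[i]) * (xs[i + 1] - xs[i]) for i in range(N - 1)]
mean_z = sum(z) / len(z)
assert 0.9 < mean_z < 1.1      # loose check: Exp(1) has mean 1
assert min(z) > 0              # spacings of distinct order statistics are positive
```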

As an example of Theorem 1.7, in Figure 1 we consider 500,000 independent random variables drawn from the Pareto distribution with exponent
\[
a \;=\; \frac{4 + \sqrt[3]{19 - 3\sqrt{33}} + \sqrt[3]{19 + 3\sqrt{33}}}{3}
\]
(we chose $a$ to make the variance equal 1). We study the distribution of the digits of the differences in base 10. The amplitude is about $.018$, which is the amplitude of the shifted exponential behavior of Theorem 1.1 (see the equation in [23, Theorem 2] or (1.5) of Theorem 1.1).
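As a sanity check of the stated exponent (a sketch we added; the variance formula $a/((a-1)^2(a-2))$ for the Pareto density $a x^{-(a+1)}$ on $[1,\infty)$ is standard):

```python
# The closed-form exponent should make the Pareto variance equal 1,
# i.e., it should be the real root of t^3 - 4t^2 + 4t - 2 = 0.
a = (4 + (19 - 3 * 33**0.5)**(1 / 3) + (19 + 3 * 33**0.5)**(1 / 3)) / 3
variance = a / ((a - 1)**2 * (a - 2))
assert abs(variance - 1) < 1e-9
assert abs(a**3 - 4 * a**2 + 4 * a - 2) < 1e-12
```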

Remark 3.5.

The universal behavior of Theorem 1.7 suggests that if we are interested in the behavior of the digits of all the differences, the natural quantity to study is the normalized differences. For any distribution with uniformly bounded first and second derivatives and a second-order Taylor series expansion at each point, we obtain shifted exponential behavior.

Appendices

A. Proof of Theorem <xref ref-type="statement" rid="thm1.1">1.1</xref>

To prove Theorem 1.1, it suffices to study the distribution of $\log_B \zeta \bmod 1$ when $\zeta$ has the standard exponential distribution (see (1.4)). We have the following useful chain of equalities. Let $[a,b] \subset [0,1]$. Then,
\[
\mathrm{Prob}\left(\log_B \zeta \bmod 1 \in [a,b]\right) \;=\; \sum_{k=-\infty}^{\infty} \mathrm{Prob}\left(\log_B \zeta \in [a+k, b+k]\right) \;=\; \sum_{k=-\infty}^{\infty} \mathrm{Prob}\left(\zeta \in [B^{a+k}, B^{b+k}]\right) \;=\; \sum_{k=-\infty}^{\infty} \left(e^{-B^{a+k}} - e^{-B^{b+k}}\right). \tag{A.1}
\]
It suffices to investigate (A.1) in the special case $a = 0$, as the probability of any interval $[\alpha,\beta]$ can always be found by subtracting the probability of $[0,\alpha]$ from that of $[0,\beta]$. We are therefore led to study, for $b \in [0,1]$, the cumulative distribution function of $\log_B \zeta \bmod 1$,
\[
F_B(b) \;:=\; \mathrm{Prob}\left(\log_B \zeta \bmod 1 \in [0,b]\right) \;=\; \sum_{k=-\infty}^{\infty} \left(e^{-B^{k}} - e^{-B^{b+k}}\right). \tag{A.2}
\]
This series converges rapidly, and Benford behavior for $\zeta$ is equivalent to the rapidly converging series in (A.2) equalling $b$ for all $b$.
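The series in (A.2) is easy to evaluate numerically. The following sketch (ours) truncates the sum and confirms that $F_B(b)$ stays close to, but never exactly equals, $b$ for $B = 10$, with a maximal deviation consistent with the $\approx .018$ amplitude discussed in the text:

```python
import math

# Truncated evaluation of (A.2): F_B(b) = sum_k (exp(-B^k) - exp(-B^(b+k))).
B = 10.0

def F(b, K=30):
    # terms with k >> 0 die double-exponentially; terms with k << 0 tend to 0
    # geometrically, so truncating at |k| <= K is safe
    return sum(math.exp(-B**k) - math.exp(-B**(b + k)) for k in range(-K, K + 1))

# maximal deviation from exact Benford behavior over a grid of b in [0, 1]
dev = max(abs(F(i / 200) - i / 200) for i in range(201))
assert 0.01 < dev < 0.05   # small but nonzero: close to, yet not exactly, Benford
```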

As Benford behavior is equivalent to $F_B(b)$ equalling $b$ for all $b \in [0,1]$, it is natural to compare $F_B'(b)$ to 1. If the derivative were identically 1, then $F_B(b)$ would equal $b$ plus some constant; however, (A.2) vanishes at $b = 0$, which implies that this constant would be zero. It is hard to analyze the infinite sum for $F_B(b)$ directly. By studying the derivative $F_B'(b)$, we find a function with an easier Fourier transform than that of $e^{-B^{u}} - e^{-B^{b+u}}$, which we then analyze by applying Poisson summation.

We use the fact that the derivative of the infinite sum $F_B(b)$ is the sum of the derivatives of the individual summands; this is justified by the rapid decay of the summands (see, e.g., [28, Corollary 7.3]). We find
\[
F_B'(b) \;=\; \sum_{k=-\infty}^{\infty} e^{-B^{b+k}} B^{b+k} \log B \;=\; \sum_{k=-\infty}^{\infty} e^{-\beta B^{k}} \beta B^{k} \log B,
\]
where for $b \in [0,1]$ we set $\beta = B^{b}$.

Let $H(t) = e^{-\beta B^{t}} \beta B^{t} \log B$; note $\beta \ge 1$. As $H(t)$ is of rapid decay in $t$, we may apply Poisson summation. Thus,
\[
\sum_{k=-\infty}^{\infty} H(k) \;=\; \sum_{k=-\infty}^{\infty} \widehat{H}(k),
\]
where $\widehat{H}$ is the Fourier transform of $H$: $\widehat{H}(u) = \int_{-\infty}^{\infty} H(t) e^{-2\pi i t u}\,dt$. Therefore,
\[
F_B'(b) \;=\; \sum_{k=-\infty}^{\infty} H(k) \;=\; \sum_{k=-\infty}^{\infty} \widehat{H}(k) \;=\; \sum_{k=-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\beta B^{t}} \beta B^{t} \log B \, e^{-2\pi i t k}\,dt.
\]
We change variables by taking $w = B^{t}$, so $dw = B^{t}\log B\,dt$, or $dw/w = \log B\,dt$. As $e^{-2\pi i t k} = (B^{t})^{-2\pi i k/\log B} = w^{-2\pi i k/\log B}$, we have
\[
F_B'(b) \;=\; \sum_{k=-\infty}^{\infty} \int_{0}^{\infty} e^{-\beta w} \beta w \, w^{-2\pi i k/\log B}\,\frac{dw}{w} \;=\; \sum_{k=-\infty}^{\infty} \beta^{2\pi i k/\log B} \int_{0}^{\infty} e^{-u} u^{-2\pi i k/\log B}\,du \;=\; \sum_{k=-\infty}^{\infty} \beta^{2\pi i k/\log B}\, \Gamma\!\left(1 - \frac{2\pi i k}{\log B}\right),
\]
where we have used the definition of the $\Gamma$-function,
\[
\Gamma(s) \;=\; \int_{0}^{\infty} e^{-u} u^{s-1}\,du, \qquad \mathrm{Re}(s) > 0. \tag{A.7}
\]
As $\Gamma(1) = 1$, we have
\[
F_B'(b) \;=\; 1 + \sum_{m=1}^{\infty}\left[\beta^{2\pi i m/\log B}\,\Gamma\!\left(1 - \frac{2\pi i m}{\log B}\right) + \beta^{-2\pi i m/\log B}\,\Gamma\!\left(1 + \frac{2\pi i m}{\log B}\right)\right]. \tag{A.8}
\]

Remark A.1.

The above series expansion converges rapidly, and shows the deviation of $\log_B \zeta \bmod 1$ from being equidistributed as an infinite sum of special values of a standard function. As $\beta = B^{b}$, we have $\beta^{2\pi i m/\log B} = \cos(2\pi m b) + i\sin(2\pi m b)$, which gives a Fourier series expansion for $F_B'(b)$ with coefficients arising from special values of the $\Gamma$-function.

We can improve (A.8) by using additional properties of the $\Gamma$-function. If $y \in \mathbb{R}$, then from (A.7) we have $\Gamma(1-iy) = \overline{\Gamma(1+iy)}$ (where the bar denotes complex conjugation). Thus, the $m$th summand in (A.8) is the sum of a number and its complex conjugate, which is simply twice the real part. We have formulas for the absolute value of the $\Gamma$-function for large argument; we use (see [30, page 946, equation (8.332)])
\[
\left|\Gamma(1+ix)\right|^{2} \;=\; \frac{\pi x}{\sinh(\pi x)} \;=\; \frac{2\pi x}{e^{\pi x} - e^{-\pi x}}. \tag{A.9}
\]
Writing the summands in (A.8) as $2\,\mathrm{Re}\left(e^{-2\pi i m b}\,\Gamma(1 + 2\pi i m/\log B)\right)$, (A.8) becomes
\[
F_B'(b) \;=\; 1 + 2\sum_{m=1}^{M-1} \mathrm{Re}\left(e^{-2\pi i m b}\,\Gamma\!\left(1 + \frac{2\pi i m}{\log B}\right)\right) + 2\sum_{m=M}^{\infty} \mathrm{Re}\left(e^{-2\pi i m b}\,\Gamma\!\left(1 + \frac{2\pi i m}{\log B}\right)\right).
\]
The rest of the claims of Theorem 1.1 follow from simple estimation, algebra, and trigonometry.
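Both (A.9) and the error constants quoted below can be checked numerically. In this sketch (ours), $\Gamma(1+ix)$ is computed by direct quadrature of its integral definition after the substitution $u = e^{t}$, and the tail sums $2\sum_{m\ge M}\left|\Gamma(1 + 2\pi i m/\log B)\right|$ are compared against the stated constants; the grid endpoints and step counts are our choices:

```python
import math

# |Gamma(1+ix)|^2 via Gamma(1+ix) = int_0^inf e^{-u} u^{ix} du; substituting
# u = e^t gives a smooth, rapidly decaying integrand for the trapezoid rule.
def gamma_one_plus_ix_sq(x, lo=-40.0, hi=6.0, n=20000):
    h = (hi - lo) / n
    re = im = 0.0
    for i in range(n + 1):
        t = lo + i * h
        w = math.exp(t - math.exp(t)) * (0.5 if i in (0, n) else 1.0)
        re += w * math.cos(x * t)
        im += w * math.sin(x * t)
    return (re * h)**2 + (im * h)**2

x = 2 * math.pi / math.log(10)
assert abs(gamma_one_plus_ix_sq(x) - math.pi * x / math.sinh(math.pi * x)) < 1e-9

# Tail sums 2 * sum_{m >= M} |Gamma(1 + 2*pi*i*m/log B)|, using (A.9).
def tail(M, logB, mmax=50):
    s = 0.0
    for m in range(M, mmax):
        xm = 2 * math.pi * m / logB
        if math.pi * xm > 700:   # sinh would overflow; remaining terms negligible
            break
        s += 2 * math.sqrt(math.pi * xm / math.sinh(math.pi * xm))
    return s

# the computed tails sit within the error constants stated in the text
assert tail(1, 1.0) < .00499 and tail(1, math.log(10)) < .378
assert tail(2, 1.0) < 3.16e-7 and tail(2, math.log(10)) < .006
```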

With constants as in the theorem, if we take $M=1$ and $B=e$ (resp., $B=10$), the error is at most $.00499$ (resp., $.378$), while if $M=2$ and $B=e$ (resp., $B=10$), the error is at most $3.16\cdot 10^{-7}$ (resp., $.006$). Thus, just one term is enough to get approximately five digits of accuracy base $e$, and two terms give three digits of accuracy base 10. For many bases, we have reduced the problem to evaluating $\mathrm{Re}\left(e^{-2\pi i b}\,\Gamma(1 + 2\pi i/\log B)\right)$. This example illustrates the power of Poisson summation, taking a slowly convergent series expansion and replacing it with a rapidly converging one.

Corollary A.2.

Let ζ have the standard exponential distribution. There is no base B>1 such that ζ is Benford base B.

Proof.

Consider the infinite series expansion in (1.5). As $e^{-2\pi i m b}$ is a sum of a cosine and a sine term, (1.5) gives a rapidly convergent Fourier series expansion. If $\zeta$ were Benford base $B$, then $F_B'(b)$ would have to be identically 1; however, $\Gamma(1 + 2\pi i m/\log B)$ is never zero for $m$ a positive integer because its modulus is nonzero (see (A.9)). As the only rapidly convergent Fourier series identically equal to 1 is the constant function $g(b) = 1$, our $F_B'(b)$ cannot identically equal 1.

B. Analyzing <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M516"><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>δ</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> Intervals Simultaneously

We show why, in addition to $\epsilon > 0$, we also needed $\epsilon > 1/3 - \delta/2$ when we analyzed $N^{1-\delta}$ intervals simultaneously in (3.2); we thank one of the referees for providing this detailed argument.

Let $Y_1, \dots, Y_N$ be i.i.d.r.v. with $\mathbb{E}[Y_i] = 0$, $\mathrm{Var}(Y_i) = \sigma^2$, $\mathbb{E}[|Y_i|^3] < \infty$, and set $S_N = (Y_1 + \cdots + Y_N)/\sqrt{N\sigma^2}$. Let $\Phi(x)$ denote the cumulative distribution function of the standard normal. Using a (nonuniform) sharpening of the Berry-Esséen estimate, we find that for some constant $c > 0$,
\[
\left|\mathrm{Prob}(S_N \le x) - \Phi(x)\right| \;\le\; \frac{c\,\mathbb{E}[|Y_1|^3]}{\sigma^3 \sqrt{N}\,(1 + |x|)^3}, \qquad x \in \mathbb{R},\; N \ge 1. \tag{B.1}
\]
Taking $Y_i = w_i - N^{\delta-1}$, where $w_i$ is defined by (2.1), yields
\[
S_N \;=\; \frac{M_N - N^{\delta}}{\sqrt{N^{\delta}\left(1 - N^{\delta-1}\right)}}, \qquad \sigma^2 \;=\; N^{\delta-1}\left(1 - N^{\delta-1}\right), \qquad \mathbb{E}[|Y_i|^3] \;\le\; 2N^{\delta-1}.
\]
Thus, (B.1) becomes
\[
\left|\mathrm{Prob}\left(\frac{M_N - N^{\delta}}{\sqrt{N^{\delta}\left(1 - N^{\delta-1}\right)}} \le x\right) - \Phi(x)\right| \;\le\; \frac{3c\,N^{-\delta/2}}{(1 + |x|)^3} \tag{B.3}
\]
for all $N \ge N_0$ (for some $N_0$ sufficiently large, depending on $\delta$).

For each $N$, $k$, and $\epsilon$, consider the event
\[
A_{N,k,\epsilon} \;=\; \left\{\frac{M_{k;N} - N^{\delta}}{\sqrt{N^{\delta}\left(1 - N^{\delta-1}\right)}} \in [-N^{\epsilon}, N^{\epsilon}]\right\}.
\]
Then, as $N \to \infty$, we have
\[
\mathrm{Prob}\left(\bigcap_{k=1}^{N^{1-\delta}} A_{N,k,\epsilon}\right) \;\longrightarrow\; 1,
\]
provided that
\[
\sum_{k=1}^{N^{1-\delta}} \mathrm{Prob}\left(A_{N,k,\epsilon}^{c}\right) \;\longrightarrow\; 0 \tag{B.6}
\]
as $N \to \infty$. Using (B.3) gives
\[
\mathrm{Prob}\left(A_{N,k,\epsilon}^{c}\right) \;\le\; \frac{6c\,N^{-\delta/2}}{(1 + N^{\epsilon})^{3}} + 2\left(1 - \Phi(N^{\epsilon})\right) \;\le\; 6c\,N^{-\delta/2 - 3\epsilon} + \sqrt{\frac{2}{\pi}}\,N^{-\epsilon}\exp\left(-\frac{N^{2\epsilon}}{2}\right).
\]
Thus, the sum in (B.6) is at most
\[
6c\,N^{1 - 3\delta/2 - 3\epsilon} + \sqrt{\frac{2}{\pi}}\,N^{1 - \delta - \epsilon}\exp\left(-\frac{N^{2\epsilon}}{2}\right),
\]
and this is $o(1)$ provided that $\epsilon > 0$ and $\epsilon > 1/3 - \delta/2$.
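The two estimates used above, the Gaussian tail bound $2(1-\Phi(x)) \le \sqrt{2/\pi}\,x^{-1}e^{-x^2/2}$ and the third-moment bound $\mathbb{E}[|Y_i|^3] \le 2N^{\delta-1}$, can be checked numerically; this is a small sketch (ours) with illustrative values of $x$ and $p$ that are not from the paper:

```python
import math

# Gaussian tail bound: 2(1 - Phi(x)) = erfc(x / sqrt(2)) <= sqrt(2/pi) x^{-1} e^{-x^2/2}
for x in (1.0, 2.0, 5.0, 10.0):
    assert math.erfc(x / math.sqrt(2)) <= math.sqrt(2 / math.pi) / x * math.exp(-x * x / 2)

# Third-moment bound: for Y = w - p with w ~ Bernoulli(p),
# E|Y|^3 = p(1-p)((1-p)^2 + p^2) <= 2p, as used with p = N^(delta-1)
for p in (1e-6, 1e-3, 0.1, 0.5):
    third = p * (1 - p) * ((1 - p)**2 + p**2)
    assert third <= 2 * p
```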

Acknowledgments

The authors would like to thank Ted Hill, Christoph Leuenberger, Daniel Stone, and the referees for numerous helpful comments. S. J. Miller was partially supported by NSF (Grant no. DMS-0600848).

[1] F. Benford, "The law of anomalous numbers," Proceedings of the American Philosophical Society, vol. 78, no. 4, pp. 551–572, 1938.
[2] T. Hill, "The first-digit phenomenon," American Scientist, vol. 86, pp. 358–363, 1998.
[3] R. A. Raimi, "The first digit problem," The American Mathematical Monthly, vol. 83, no. 7, pp. 521–538, 1976.
[4] W. Hürlimann, "Benford's law from 1881 to 2006," preprint, http://arxiv.org/abs/math/0607168.
[5] J. L. Brown Jr. and R. L. Duncan, "Modulo one uniform distribution of the sequence of logarithms of certain recursive sequences," The Fibonacci Quarterly, vol. 8, no. 5, pp. 482–486, 1970.
[6] P. Diaconis, "The distribution of leading digits and uniform distribution mod 1," The Annals of Probability, vol. 5, no. 1, pp. 72–81, 1977.
[7] T. P. Hill, "A statistical derivation of the significant-digit law," Statistical Science, vol. 10, no. 4, pp. 354–363, 1995.
[8] A. Berger, L. A. Bunimovich, and T. P. Hill, "One-dimensional dynamical systems and Benford's law," Transactions of the American Mathematical Society, vol. 357, no. 1, pp. 197–219, 2005.
[9] A. Berger and T. P. Hill, "Newton's method obeys Benford's law," American Mathematical Monthly, vol. 114, no. 7, pp. 588–601, 2007.
[10] D. Jang, J. U. Kang, A. Kruckman, J. Kudo, and S. J. Miller, "Chains of distributions, hierarchical Bayesian models and Benford's law," preprint, http://arxiv.org/abs/0805.4226.
[11] A. V. Kontorovich and S. J. Miller, "Benford's law, values of L-functions and the 3x+1 problem," Acta Arithmetica, vol. 120, no. 3, pp. 269–297, 2005.
[12] J. C. Lagarias and K. Soundararajan, "Benford's law for the 3x+1 function," Journal of the London Mathematical Society, vol. 74, no. 2, pp. 289–303, 2006.
[13] S. J. Miller and M. J. Nigrini, "The modulo 1 central limit theorem and Benford's law for products," International Journal of Algebra, vol. 2, no. 1–4, pp. 119–130, 2008.
[14] J.-C. Pain, "Benford's law and complex atomic spectra," Physical Review E, vol. 77, no. 1, 012102, 2008.
[15] E. Costas, V. López-Rodas, F. J. Toro, and A. Flores-Moya, "The number of cells in colonies of the cyanobacterium Microcystis aeruginosa satisfies Benford's law," Aquatic Botany, vol. 89, no. 3, pp. 341–343, 2008.
[16] M. Nigrini and S. J. Miller, "Benford's law applied to hydrology data—results and relevance to other geophysical data," Mathematical Geology, vol. 39, no. 5, pp. 469–490, 2007.
[17] D. E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edition, Addison-Wesley, Reading, Mass, USA, 1997.
[18] M. Nigrini and M. Ettredge, "Digital analysis and the reduction of auditor litigation risk," in Proceedings of the Deloitte & Touche / University of Kansas Symposium on Auditing Problems, pp. 69–81, University of Kansas, Lawrence, Kan, USA, 1996.
[19] M. Nigrini, "The use of Benford's law as an aid in analytical procedures," Auditing: A Journal of Practice & Theory, vol. 16, no. 2, pp. 52–67, 1997.
[20] W. R. Mebane Jr., "Election forensics: the second-digit Benford's law test and recent American presidential elections," presented at the Election Fraud Conference, Salt Lake City, Utah, USA, September 2006.
[21] M. Nigrini and S. J. Miller, "Data diagnostics using second order tests of Benford's law," preprint.
[22] L. M. Leemis, B. W. Schmeiser, and D. L. Evans, "Survival distributions satisfying Benford's law," The American Statistician, vol. 54, no. 4, pp. 236–241, 2000.
[23] H.-A. Engel and C. Leuenberger, "Benford's law for exponential random variables," Statistics & Probability Letters, vol. 63, no. 4, pp. 361–365, 2003.
[24] R. S. Pinkham, "On the distribution of first significant digits," Annals of Mathematical Statistics, vol. 32, no. 4, pp. 1223–1230, 1961.
[25] S. J. Miller and R. Takloo-Bighash, An Invitation to Modern Number Theory, Princeton University Press, Princeton, NJ, USA, 2006.
[26] H. A. David and H. N. Nagaraja, Order Statistics, 3rd edition, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 2003.
[27] R.-D. Reiss, Approximate Distributions of Order Statistics. With Applications to Nonparametric Statistics, Springer Series in Statistics, Springer, New York, NY, USA, 1989.
[28] S. Lang, Undergraduate Analysis, 2nd edition, Undergraduate Texts in Mathematics, Springer, New York, NY, USA, 1997.
[29] E. M. Stein and R. Shakarchi, Fourier Analysis: An Introduction, Princeton Lectures in Analysis, Princeton University Press, Princeton, NJ, USA, 2003.
[30] I. Gradshteyn and I. Ryzhik, Tables of Integrals, Series, and Products, 5th edition, Academic Press, New York, NY, USA, 1965.
[31] V. V. Petrov, Limit Theorems of Probability Theory: Sequences of Independent Random Variables, Oxford Studies in Probability, The Clarendon Press, Oxford University Press, New York, NY, USA, 1995.
[32] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, 2nd edition, John Wiley & Sons, New York, NY, USA, 1962.