Generalizing Benford's Law Using Power Laws: Application to Integer Sequences

Werner Hürlimann
Feldstrasse 145, CH-8004 Zürich, Switzerland

Research Article. International Journal of Mathematics and Mathematical Sciences (IJMMS), Hindawi Publishing Corporation, 2009, Article ID 970284. ISSN 1687-0425, 0161-1712. doi:10.1155/2009/970284

Received 25 March 2009; revised 16 July 2009; accepted 19 July 2009; published 27 July 2009. Academic editor: Kenneth Berenhaut.

Copyright © 2009. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many distributions for first digits of integer sequences are not Benford. A simple method to derive parametric analytical extensions of Benford's law for first digits of numerical data is proposed. Two generalized Benford distributions are considered, namely, the two-sided power Benford (TSPB) distribution, which was introduced in Hürlimann (2003), and the new Pareto Benford (PB) distribution. Based on minimum chi-square estimators, the fitting capabilities of these generalized Benford distributions are illustrated and compared on some interesting and important integer sequences. In particular, many of the analyzed integer sequences follow the generalized Benford distributions with a high P-value. While the sequences of prime numbers less than 1000, respectively, 10 000 are not at all Benford or TSPB distributed, they are approximately PB distributed with high P-values of 93.3% and 99.9%, and a deeper analysis of longer sequences reveals a new interesting property. On the other hand, Benford's law is rejected at the 5% significance level for a mixing of data sets, while the PB law is accepted with a 93.6% P-value, which improves on the P-value of 25.2% obtained previously for the TSPB law.

1. Introduction

Since Newcomb and Benford it is known that many numerical data sets follow Benford’s law or are closely approximated by it. To be specific, if the random variable $X$, which describes the first significant digit in a numerical table, is Benford distributed, then

$$P(X=d)=\log\left(1+d^{-1}\right),\quad d\in\{1,\ldots,9\},$$

where $\log$ denotes the base-10 logarithm.
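As a quick numerical illustration of the formula above, the following minimal Python sketch tabulates the nine Benford probabilities. The function name `benford_pmf` is a hypothetical helper introduced here, not part of the paper.

```python
import math


def benford_pmf():
    """Return {d: P(X = d)} for d = 1..9 under Benford's law (base-10 log)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    # Print the probabilities as percentages, one line per digit.
    for d, p in benford_pmf().items():
        print(d, f"{100 * p:.1f}%")
```

The values obtained this way can be compared with the "Benford law" row of Table 1.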

Mathematical explanations of this law have been proposed by Pinkham, Cohen, Hill, Allaart, Janvresse and de la Rue, and Kossovsky. The latter author has raised some conjectures, which have been proved in some special cases by Jang et al. Other explanations of the prevalence of Benford’s law exist. For example, Miller and Nigrini obtain it through the study of products of random variables and Kafri through the maximum entropy principle. In recent years an upsurge of applications of Benford’s law has appeared, as can be seen from the bibliography compiled by Hürlimann and the recent online bibliography by Berg and Hill. Among them one might mention Judge and Schechter, Judge et al., and Nigrini and Miller. As in the present paper, the latter authors also consider power laws.

Hill also suggested shifting attention to probability distributions that follow or closely approximate Benford’s law. Papers along this path include Leemis et al. and Engel and Leuenberger. Some survival distributions that satisfy Benford’s law exactly are known. However, there are not many simple analytical distributions that include Benford’s law as a special case. Combining facts from Leemis et al. and van Dorp and Kotz, such a simple one-parameter family of distributions has been considered in Hürlimann. In a sequel to this, a further generalization of Benford’s law is considered.

It is important to note that many distributions for first digits of integer sequences are not Benford but are power laws or something close. Thus there is a need for statistical tests for analyzing such hypotheses. In this respect the interest of enlarged Benford laws is twofold. First, parametric extensions may provide a better fit of the data than Benford’s law itself. Second, they yield a simple statistical procedure to validate Benford’s law. If Benford’s model is sufficiently “close” to the one-parameter extended model, then it will be retained. These points will be illustrated through our application to integer sequences.

2. Generalizing Benford’s Distribution

If $T$ denotes a random lifetime with survival distribution $S(t)=P(T\ge t)$, then the value $Y$ of the first significant digit in the lifetime $T$ has the probability distribution

$$P(Y=y)=\sum_{i=-\infty}^{\infty}\left\{S\left(y\cdot 10^{i}\right)-S\left((y+1)\cdot 10^{i}\right)\right\},\quad y\in\{1,\ldots,9\}.$$

Alternatively, if $D$ denotes the integer-valued random variable satisfying

$$10^{D}\le T<10^{D+1},$$

then the first significant digit can be written in terms of $T$ and $D$ as

$$Y=\left[T\cdot 10^{-D}\right]=\left[10^{\log T-D}\right],$$

where $[x]$ denotes the greatest integer less than or equal to $x$. In particular, if the random variable $Z=\log T-D$ is uniformly distributed as $U(0,1)$, then the first significant digit $Y$ is exactly Benford distributed. Starting from the uniform random variable $W=U(0,2)$ or the triangular random variable $W=\mathrm{Triangular}(0,1,2)$ with probability density function $f_W(w)=w$ if $w\in(0,1)$ and $f_W(w)=2-w$ if $w\in[1,2)$, one shows that the random lifetime $T=10^{W}$ generates the first digit Benford distribution (Leemis et al. [21, Examples 1 and 2]).
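The construction above can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the helper names `first_digit` and `simulate_first_digits`, the sample size, and the seed are all hypothetical choices. It extracts the first significant digit via $D=[\log T]$ and estimates the first digit frequencies of $T=10^{W}$ with $W=U(0,2)$.

```python
import math
import random


def first_digit(t):
    """First significant digit Y = [T * 10^(-D)], where 10^D <= T < 10^(D+1)."""
    if t <= 0:
        raise ValueError("t must be positive")
    d_exp = math.floor(math.log10(t))
    y = math.floor(t / 10.0 ** d_exp)
    # Guard against floating-point rounding of log10 near exact powers of ten.
    if y < 1:
        y = 9
    elif y > 9:
        y = 1
    return y


def simulate_first_digits(n, seed=0):
    """Empirical first digit frequencies of T = 10^W with W ~ U(0, 2)."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n):
        w = rng.uniform(0.0, 2.0)
        counts[first_digit(10.0 ** w)] += 1
    return {d: counts[d] / n for d in range(1, 10)}


if __name__ == "__main__":
    freqs = simulate_first_digits(200_000)
    for d in range(1, 10):
        print(d, round(freqs[d], 4), round(math.log10(1 + 1 / d), 4))
```

With a large sample the simulated frequencies should lie close to the Benford probabilities, up to sampling noise.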

A simple parametric distribution, which includes as special cases both the above uniform and triangular distributions, is the two-sided power random variable $W=TSP(\alpha,c)$ considered in van Dorp and Kotz with probability density function

$$f_W(w)=\begin{cases}\dfrac{c}{2}\left(\dfrac{w}{\alpha}\right)^{c-1}, & 0<w\le\alpha,\\[2ex] \dfrac{c}{2}\left(\dfrac{2-w}{2-\alpha}\right)^{c-1}, & \alpha\le w<2.\end{cases}$$

If $c=1$ then $W=U(0,2)$, and if $c=2$, $\alpha=1$ then $W=\mathrm{Triangular}(0,1,2)$. This observation shows that the random lifetime $T=10^{TSP(1,c)}$ will generate first digit distributions closely related to Benford’s distribution, at least if $c$ is close to 1 or 2.
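The two special cases can be checked numerically. The following is a minimal sketch of the density above; `tsp_pdf` and `integrate` are hypothetical helper names, and the midpoint rule is used only as a rough check that the density integrates to one.

```python
def tsp_pdf(w, alpha, c):
    """Density of W = TSP(alpha, c) on (0, 2), as in the formula above."""
    if not (0.0 < alpha < 2.0) or c <= 0.0:
        raise ValueError("need 0 < alpha < 2 and c > 0")
    if 0.0 < w <= alpha:
        return 0.5 * c * (w / alpha) ** (c - 1.0)
    if alpha < w < 2.0:
        return 0.5 * c * ((2.0 - w) / (2.0 - alpha)) ** (c - 1.0)
    return 0.0


def integrate(f, a, b, steps=20000):
    """Midpoint rule on [a, b]; a rough numerical check only."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    print(tsp_pdf(0.5, 1.0, 1.0))   # uniform case c = 1
    print(tsp_pdf(0.25, 1.0, 2.0))  # triangular case c = 2, alpha = 1
    print(integrate(lambda w: tsp_pdf(w, 0.7, 3.0), 0.0, 2.0))
```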

Theorem 2.1.

Let $W=TSP(1,c)$ be the two-sided power random variable with probability density function

$$f_W(w)=\begin{cases}\dfrac{c}{2}\,w^{c-1}, & 0<w\le 1,\\[2ex] \dfrac{c}{2}\,(2-w)^{c-1}, & 1\le w<2,\end{cases}$$

and let the integer-valued random variable $D$ satisfy $D\le W<D+1$. Then the first digit random variable $Y=\left[10^{W-D}\right]$ has the one-parameter two-sided power Benford (TSPB) probability density function

$$f_Y(y)=\frac{1}{2}\left\{[\log(1+y)]^{c}-[\log y]^{c}-[1-\log(1+y)]^{c}+[1-\log y]^{c}\right\},\quad y\in\{1,\ldots,9\}.$$

Proof.

This has been shown in Hürlimann.
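The TSPB probabilities of Theorem 2.1 are easy to evaluate. The sketch below uses a hypothetical function name `tspb_pmf` and assumes $c>0$. It also allows a numerical check that $c=1$ and $c=2$ both reproduce Benford's law.

```python
import math


def tspb_pmf(c):
    """TSPB probabilities f_Y(y), y = 1..9, from Theorem 2.1 (base-10 logs)."""
    if c <= 0:
        raise ValueError("c must be positive")
    pmf = {}
    for y in range(1, 10):
        a = math.log10(y)
        b = min(math.log10(y + 1), 1.0)  # clamp to avoid a negative base below
        pmf[y] = 0.5 * (b ** c - a ** c - (1.0 - b) ** c + (1.0 - a) ** c)
    return pmf


if __name__ == "__main__":
    for c in (1.0, 2.0, 0.79837):
        print(c, [round(p, 4) for p in tspb_pmf(c).values()])
```

For $c=2$ the squares cancel pairwise, leaving $\log(1+y)-\log y$, which is why the triangular case is again exactly Benford.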

3. From the Geometric Brownian Motion to the Pareto Benford Law

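Another interesting distribution, which also takes the form of a two-sided power law, is the double Pareto random variable $W=DP(s,\alpha,\beta)$ considered in Reed with probability density function

$$f_W(w)=\begin{cases}\dfrac{\alpha\beta}{\alpha+\beta}\left(\dfrac{w}{s}\right)^{\beta-1}, & w\le s,\\[2ex] \dfrac{\alpha\beta}{\alpha+\beta}\left(\dfrac{w}{s}\right)^{-\alpha-1}, & w\ge s.\end{cases}\tag{3.1}$$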

Recall the stochastic mechanism and the natural motivation that generate this distribution. It is often assumed that the time evolution of a stochastic phenomenon $X_t$ involves a variable but size-independent proportional growth rate and can thus be modeled by a geometric Brownian motion (GBM) described by the stochastic differential equation

$$dX=\mu\cdot X\cdot dt+\sigma\cdot X\cdot dW,$$

where $dW$ is the increment of a Wiener process. Since the proportional increment of a GBM in time $dt$ has a systematic component $\mu\cdot dt$ and a random white noise component $\sigma\cdot dW$, GBM can be viewed as a stochastic version of a simple exponential growth model. The GBM has long been used to model the evolution of stock prices (Black-Scholes option pricing model), firm sizes, city sizes, and individual incomes. It is well known that empirical studies on such phenomena often exhibit power-law behavior. However, the state of a GBM after a fixed time $T$ follows a lognormal distribution, which does not exhibit power-law behavior.

Why does one observe power-law behavior for phenomena apparently evolving like a GBM? A simple mechanism that generates the power-law behavior in the tails consists in assuming that the time $T$ of observation itself is a random variable, whose distribution is an exponential distribution. The distribution of $X_T$ with fixed initial state $s$ is described by the double Pareto distribution $DP(s,\alpha,\beta)$ with density function (3.1), where $\alpha,\beta>0$, and $\alpha$ and $-\beta$ are the positive and negative roots, respectively, of the characteristic equation

$$\left(\mu-\tfrac{1}{2}\sigma^{2}\right)z+\tfrac{1}{2}\sigma^{2}z^{2}=\lambda,$$

where $\lambda$ is the parameter of the exponentially distributed random variable $T$. Setting $s=1$ one obtains the following generalized Benford distribution.
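To make the link between the GBM parameters and the double Pareto exponents concrete, the following sketch solves the characteristic equation with the quadratic formula. The function name `double_pareto_exponents` and the example values of $\mu$, $\sigma$, $\lambda$ are hypothetical and chosen only for illustration.

```python
import math


def double_pareto_exponents(mu, sigma, lam):
    """Return (alpha, beta) such that alpha and -beta solve
    (mu - sigma^2/2) z + (sigma^2/2) z^2 = lam, with sigma > 0 and lam > 0."""
    if sigma <= 0 or lam <= 0:
        raise ValueError("sigma and lam must be positive")
    a = 0.5 * sigma ** 2           # coefficient of z^2
    b = mu - 0.5 * sigma ** 2      # coefficient of z
    disc = math.sqrt(b * b + 4.0 * a * lam)
    alpha = (-b + disc) / (2.0 * a)   # positive root
    beta = (b + disc) / (2.0 * a)     # -beta is the negative root
    return alpha, beta


if __name__ == "__main__":
    # Illustrative (hypothetical) parameter values.
    print(double_pareto_exponents(mu=0.05, sigma=0.2, lam=0.1))
```

Because the product of the two roots equals $-2\lambda/\sigma^{2}<0$, one root is always positive and the other negative, so both $\alpha$ and $\beta$ come out positive.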

Theorem 3.1.

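Let $W=DP(1,\alpha,\beta)$ be the double Pareto random variable with probability density function

$$f_W(w)=\begin{cases}\dfrac{\alpha\beta}{\alpha+\beta}\,w^{\beta-1}, & w\le 1,\\[2ex] \dfrac{\alpha\beta}{\alpha+\beta}\,w^{-\alpha-1}, & w\ge 1.\end{cases}$$

Let the integer-valued random variable $D$ satisfy $D\le W<D+1$. Then the first digit random variable $Y=\left[10^{W-D}\right]$ has the two-parameter Pareto Benford (PB) probability density function

$$f_Y(y)=\frac{\alpha}{\alpha+\beta}\left\{[\log(1+y)]^{\beta}-[\log(y)]^{\beta}\right\}+\frac{\beta}{\alpha+\beta}\cdot\sum_{k=1}^{\infty}\left\{[k+\log(y)]^{-\alpha}-[k+\log(1+y)]^{-\alpha}\right\},\quad y\in\{1,\ldots,9\}.\tag{3.5}$$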

Proof.

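The probability density function of $T=10^{W}$ is given by

$$f_T(t)=\frac{1}{t\cdot\ln 10}\cdot f_W\!\left(\frac{\ln t}{\ln 10}\right)=\begin{cases}\dfrac{\alpha\beta}{\alpha+\beta}\,\dfrac{1}{t\cdot\ln 10}\left(\dfrac{\ln t}{\ln 10}\right)^{\beta-1}, & 1<t\le 10,\\[2ex] \dfrac{\alpha\beta}{\alpha+\beta}\,\dfrac{1}{t\cdot\ln 10}\left(\dfrac{\ln t}{\ln 10}\right)^{-\alpha-1}, & t>10.\end{cases}$$

It follows that the first significant digit of $T$, namely, $Y=\left[T\cdot 10^{-D}\right]$, has probability density

$$f_Y(y)=\sum_{k=0}^{\infty}\int_{10^{k}y}^{10^{k}(y+1)}f_T(t)\,dt.$$

Making the change of variable $u=\ln t/\ln 10$, one obtains (3.5) as follows:

$$\begin{aligned}f_Y(y)&=\frac{\alpha\beta}{\alpha+\beta}\left\{\int_{\log y}^{\log(y+1)}u^{\beta-1}\,du+\sum_{k=1}^{\infty}\int_{k+\log y}^{k+\log(y+1)}u^{-\alpha-1}\,du\right\}\\&=\frac{\alpha\beta}{\alpha+\beta}\left\{\frac{1}{\beta}\,u^{\beta}\Big|_{\log(y)}^{\log(1+y)}+\sum_{k=1}^{\infty}\left(-\frac{1}{\alpha}\right)u^{-\alpha}\Big|_{k+\log(y)}^{k+\log(1+y)}\right\}\\&=\frac{\alpha}{\alpha+\beta}\left\{[\log(1+y)]^{\beta}-[\log(y)]^{\beta}\right\}+\frac{\beta}{\alpha+\beta}\cdot\sum_{k=1}^{\infty}\left\{[k+\log(y)]^{-\alpha}-[k+\log(1+y)]^{-\alpha}\right\}.\end{aligned}$$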
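For numerical work the infinite series in (3.5) has to be truncated. The sketch below evaluates the PB probabilities with the series cut off at $k=m$. The function name `pb_pmf` and the default `m=100` are assumptions made here for illustration.

```python
import math


def pb_pmf(alpha, beta, m=100):
    """Pareto Benford probabilities from (3.5), series truncated at k = m."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    w_head = alpha / (alpha + beta)
    w_tail = beta / (alpha + beta)
    pmf = {}
    for y in range(1, 10):
        a, b = math.log10(y), math.log10(y + 1)
        head = b ** beta - a ** beta
        tail = sum((k + a) ** (-alpha) - (k + b) ** (-alpha)
                   for k in range(1, m + 1))
        pmf[y] = w_head * head + w_tail * tail
    return pmf


if __name__ == "__main__":
    p = pb_pmf(23.13952, 2.14449)
    print([round(100 * v, 1) for v in p.values()], sum(p.values()))
```

Summing the truncated series over $y=1,\ldots,9$ telescopes, so the probability mass lost by truncation is $\frac{\beta}{\alpha+\beta}(m+1)^{-\alpha}$. This is negligible for moderate $\alpha$ but decays slowly when $\alpha$ is small, in which case a larger $m$ is needed.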

One notes that, setting $\beta=1$ and letting $\alpha$ go to infinity, the Pareto Benford distribution converges to Benford’s law. Other important papers are Kontorovich and Miller, which links Benford’s law to GBMs, and Schürger, which links it to Black-Scholes processes. Another law, which includes the Benford law as a special case, is the Planck distribution of photons at a given frequency, as shown recently by Kafri [28, 29].
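The convergence to Benford's law can be verified term by term from (3.5). The following short derivation is a sketch, with $\log$ the base-10 logarithm as elsewhere in the paper. It bounds the series using $0\le\log y<\log(1+y)\le 1$ and a telescoping sum.

```latex
% Set beta = 1 in (3.5):
f_Y(y) = \frac{\alpha}{\alpha+1}\,\log\!\left(1+\frac{1}{y}\right)
       + \frac{1}{\alpha+1}\sum_{k=1}^{\infty}
         \left\{[k+\log y]^{-\alpha}-[k+\log(1+y)]^{-\alpha}\right\}.

% Each summand is nonnegative and bounded by k^{-alpha} - (k+1)^{-alpha}, hence
0 \le \sum_{k=1}^{\infty}\left\{[k+\log y]^{-\alpha}-[k+\log(1+y)]^{-\alpha}\right\}
  \le \sum_{k=1}^{\infty}\left\{k^{-\alpha}-(k+1)^{-\alpha}\right\} = 1 .

% Therefore the second term is at most 1/(alpha+1), and
\lim_{\alpha\to\infty} f_Y(y) = \log\!\left(1+\frac{1}{y}\right),
\qquad y\in\{1,\ldots,9\}.
```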

4. Fitting the First Digit Distributions of Integer Sequences

Minimum chi-square estimation of the generalized Benford distributions is straightforward with modern computer algebra systems. The fitting capabilities of the new distributions are illustrated on some interesting and important integer sequences. The first digit occurrences of the analyzed integer sequences are listed in Table 1. The minimum chi-square estimators of the generalized distributions, as well as the summation index $m$ at which the infinite series in (3.5) is truncated, are displayed in Table 2. Statistical results are summarized in Table 3, which lists the chi-square values and their corresponding P-values for comparison. The obtained results are then discussed.
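The estimation step can be sketched in a few lines of Python. This is not the computer algebra procedure used in the paper: it is a plain grid search over the TSPB parameter $c$, shown only to make the minimum chi-square idea concrete. All function names, the grid bounds, and the step size are assumptions. The example counts for the squares are obtained from the percentages in Table 1 with sample size 100.

```python
import math


def benford_pmf():
    """Benford probabilities for digits 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def tspb_pmf(c):
    """TSPB probabilities for digits 1..9 (Theorem 2.1), assuming c > 0."""
    out = []
    for y in range(1, 10):
        a, b = math.log10(y), min(math.log10(y + 1), 1.0)
        out.append(0.5 * (b ** c - a ** c - (1 - b) ** c + (1 - a) ** c))
    return out


def chi_square(counts, pmf):
    """Pearson chi-square statistic of observed counts against a model pmf."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, pmf))


def min_chi_square_tspb(counts, lo=0.1, hi=5.0, step=0.001):
    """Grid search for the c in [lo, hi] that minimizes the chi-square."""
    best_c, best_val = lo, float("inf")
    for i in range(int(round((hi - lo) / step)) + 1):
        c = lo + i * step
        val = chi_square(counts, tspb_pmf(c))
        if val < best_val:
            best_c, best_val = c, val
    return best_c, best_val


if __name__ == "__main__":
    square_counts = [21, 14, 12, 12, 9, 9, 8, 7, 8]  # Table 1, Square, n = 100
    print("Benford chi-square:", chi_square(square_counts, benford_pmf()))
    print("TSPB grid minimum :", min_chi_square_tspb(square_counts))
```

A grid search is slow but does not depend on a starting value, which matters here because the TSPB chi-square equals the Benford value at both $c=1$ and $c=2$ and so is not monotone in $c$. A two-parameter search for the PB distribution would follow the same pattern.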

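Table 1: First digit distributions of some integer sequences (percentage of first digit occurrences).

| Name of sequence | Sample size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Benford law | | 30.1 | 17.6 | 12.5 | 9.7 | 7.9 | 6.7 | 5.8 | 5.1 | 4.6 |
| Square | 100 | 21.0 | 14.0 | 12.0 | 12.0 | 9.0 | 9.0 | 8.0 | 7.0 | 8.0 |
| Cube | 500 | 28.2 | 14.8 | 11.4 | 9.8 | 8.8 | 7.8 | 6.6 | 6.8 | 5.8 |
| Cube | 1000 | 22.6 | 15.9 | 12.4 | 10.6 | 9.4 | 8.3 | 7.4 | 7.1 | 6.3 |
| Cube | 10000 | 22.5 | 15.8 | 12.6 | 10.6 | 9.3 | 8.3 | 7.5 | 7.0 | 6.4 |
| Square root | 99 | 19.2 | 17.2 | 15.2 | 13.1 | 11.1 | 9.1 | 7.1 | 5.1 | 3.0 |
| Prime < 100 | 25 | 16.0 | 12.0 | 12.0 | 12.0 | 12.0 | 8.0 | 16.0 | 8.0 | 4.0 |
| Prime < 1000 | 168 | 14.9 | 11.3 | 11.3 | 11.9 | 10.1 | 10.7 | 10.7 | 10.1 | 8.9 |
| Prime < 10000 | 1229 | 13.0 | 11.9 | 11.3 | 11.3 | 10.7 | 11.0 | 10.2 | 10.3 | 10.3 |
| Princeton number | 25 | 28.0 | 8.0 | 12.0 | 12.0 | 8.0 | 12.0 | 8.0 | 4.0 | 8.0 |
| Mixing sequence | 618 | 28.3 | 14.6 | 11.5 | 9.9 | 7.6 | 7.8 | 8.1 | 6.6 | 5.7 |
| Pentagonal number | 100 | 35.0 | 12.0 | 10.0 | 8.0 | 10.0 | 6.0 | 8.0 | 5.0 | 6.0 |
| Keith number | 71 | 32.4 | 14.1 | 14.1 | 7.0 | 4.2 | 7.0 | 12.7 | 2.8 | 5.6 |
| Bell number | 100 | 31.0 | 15.0 | 10.0 | 12.0 | 10.0 | 8.0 | 5.0 | 6.0 | 3.0 |
| Catalan number | 100 | 33.0 | 18.0 | 11.0 | 11.0 | 8.0 | 8.0 | 4.0 | 3.0 | 4.0 |
| Lucky number | 45 | 42.2 | 17.8 | 8.9 | 4.4 | 2.2 | 6.7 | 8.9 | 2.2 | 6.7 |
| Ulam number | 44 | 45.5 | 13.6 | 6.8 | 6.8 | 4.5 | 6.8 | 4.5 | 6.8 | 4.5 |
| Numeri idonei | 65 | 30.8 | 18.5 | 13.8 | 10.8 | 6.2 | 3.1 | 7.7 | 6.2 | 3.1 |
| Fibonacci number | 100 | 30.0 | 18.0 | 13.0 | 9.0 | 8.0 | 6.0 | 5.0 | 7.0 | 4.0 |
| Partition number | 94 | 28.7 | 17.0 | 14.9 | 9.6 | 7.4 | 6.4 | 7.4 | 5.3 | 3.2 |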

Table 2: Minimum chi-square estimators.

| Name of sequence | Sample size | TSPB parameter c | PB parameter alpha | PB parameter beta | m |
|---|---|---|---|---|---|
| Square | 100 | 0.79837 | 15.55957 | 1.74552 | 100 |
| Cube | 500 | 2.46519 | 5.55849 | 1.69860 | 100 |
| Cube | 1000 | 2.26798 | 20.56506 | 1.47082 | 100 |
| Cube | 10000 | 2.27054 | 20.53577 | 1.475760 | 100 |
| Square root | 99 | 1.40176 | 89491723 | 1.34334 | 100 |
| Prime < 100 | 25 | 2.68581 | 23.13952 | 2.14449 | 100 |
| Prime < 1000 | 168 | 2.95216 | 22.99754 | 2.28436 | 100 |
| Prime < 10000 | 1229 | 3.03542 | 29.76729 | 2.30760 | 100 |
| Princeton number | 25 | 2.76170 | 6.94595 | 2.36119 | 100 |
| Mixing sequence | 618 | 2.53958 | 4.78641 | 1.83119 | 100 |
| Pentagonal number | 100 | 2.94847 | 2.06797 | 3.31268 | 100 |
| Keith number | 71 | 2.73338 | 2.16107 | 2.63720 | 1000 |
| Bell number | 100 | 1.08191 | 10.14820 | 1.24828 | 100 |
| Catalan number | 100 | 1.13522 | 0.67095 | 1.15377 | 5000 |
| Lucky number | 45 | 3.15721 | 7.56962 | 0.94576 | 100 |
| Ulam number | 44 | 3.55375 | 9.99445 | 0.81215 | 100 |
| Numeri idonei | 65 | 1.12410 | 1297612.16 | 0.98591 | 100 |
| Fibonacci number | 100 | 2.05365 | 257000.42 | 1.00560 | 100 |
| Partition number | 94 | 1.23268 | 0.65651 | 1.71409 | 1000 |

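Table 3: Fitting integer sequences to the Benford and generalized Benford distributions (P-values in percent).

| Name of sequence | Sample size | Benford chi-square | Benford P-value | TSPB chi-square | TSPB P-value | PB chi-square | PB P-value |
|---|---|---|---|---|---|---|---|
| Square | 100 | 9.096 | 33.43 | 7.837 | 34.72 | 0.362 | 99.91 |
| Cube | 500 | 9.696 | 28.70 | 5.808 | 56.23 | 0.286 | 99.96 |
| Cube | 1000 | 46.459 | 0.00 | 43.725 | 0.00 | 0.48 | 99.81 |
| Cube | 10000 | 443.745 | 0.00 | 472.011 | 0.00 | 3.138 | 79.13 |
| Square root | 99 | 8.612 | 37.61 | 7.002 | 42.86 | 2.778 | 83.61 |
| Prime < 100 | 25 | 7.741 | 45.91 | 7.299 | 39.84 | 1.849 | 93.30 |
| Prime < 1000 | 168 | 45.016 | 0.00 | 36.651 | 0.00 | 0.333 | 99.93 |
| Prime < 10000 | 1229 | 387.194 | 0.00 | 307.322 | 0.00 | 3.297 | 77.07 |
| Princeton number | 25 | 3.452 | 90.29 | 2.762 | 89.72 | 1.302 | 97.16 |
| Mixing sequence | 618 | 15.550 | 4.93 | 9.014 | 25.17 | 1.819 | 93.55 |
| Pentagonal number | 100 | 5.277 | 72.76 | 2.127 | 95.24 | 1.968 | 92.26 |
| Keith number | 71 | 9.215 | 32.45 | 7.688 | 36.09 | 7.402 | 28.53 |
| Bell number | 100 | 3.069 | 93.00 | 3.014 | 88.37 | 2.607 | 85.63 |
| Catalan number | 100 | 2.404 | 96.61 | 2.304 | 94.11 | 1.934 | 92.57 |
| Lucky number | 45 | 7.693 | 46.40 | 5.165 | 63.98 | 5.564 | 47.37 |
| Ulam number | 44 | 6.350 | 60.81 | 2.520 | 92.56 | 2.526 | 86.56 |
| Numeri idonei | 65 | 2.594 | 95.72 | 2.522 | 92.54 | 2.584 | 85.89 |
| Fibonacci number | 100 | 1.029 | 99.81 | 1.021 | 99.45 | 1.027 | 98.46 |
| Partition number | 94 | 1.394 | 99.43 | 1.132 | 99.24 | 1.513 | 95.86 |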

The definition, origin, and comments on the mathematical interest of most of these integer sequences have been discussed in Hürlimann. Further details on all sequences can be retrieved from the considerable related literature. The mixing sequence represents the aggregate of the integer sequences considered in Hürlimann.

All of the 19 considered integer sequences are quite well fitted by the new PB distribution. For 14 sequences the minimum chi-square is the smallest among the three comparative values, and in the other 5 cases its value does not differ much from the chi-square of the TSPB distribution (bold cells in Table 3 and Table 5).
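The P-values in Table 3 can be recomputed from the chi-square values. The sketch below evaluates the upper tail of the chi-square distribution with finite sums, so no external library is needed. The function name `chi2_sf` is hypothetical. The degrees of freedom are an assumption: 8 for Benford, 7 for TSPB, and 6 for PB (nine digit cells minus one, minus the number of fitted parameters). This choice reproduces the three P-values in the Square row of Table 3.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(Chi2_df > x) for integer df >= 1, using finite sums."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = 0.5 * x
    if df % 2 == 0:
        # Q(k, h) = exp(-h) * sum_{j<k} h^j / j!  with k = df / 2
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: Q(k + 1/2, h) = erfc(sqrt(h)) + exp(-h) * sum_{j<k} h^(j+1/2) / Gamma(j+3/2)
    total = 0.0
    term = math.sqrt(h) / math.gamma(1.5)
    for j in range(df // 2):
        total += term
        term *= h / (j + 1.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


if __name__ == "__main__":
    # Square row of Table 3; degrees of freedom 8, 7, 6 are assumed as explained above.
    for chi2, df in ((9.096, 8), (7.837, 7), (0.362, 6)):
        print(df, f"{100 * chi2_sf(chi2, df):.2f}%")
```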

Strong numerical evidence for the Benford property of the Fibonacci, Bell, Catalan, and partition numbers is observed (corresponding italic cells in Tables 2 and 3). In particular, the values of the parameters $\alpha$, $\beta$ of the PB distribution for the Fibonacci sequence are close to $\infty$ and 1, respectively, which means that the PB distribution is almost Benford, as remarked after Theorem 3.1. It is well known that the Fibonacci sequence is Benford distributed (e.g., Brown and Duncan, Wlodarski, Sentance, Webb, Raimi (1976), Brady, and Kunoff). The same result for Bell numbers has been derived formally in Hürlimann [24, Theorem 4.1]. More generally, a proof that a generic solution of a generic difference equation is Benford is found in Miller and Takloo-Bighash (see also Jolissaint [38, 39]). Results for squares and cubes are also obtained. Recall that the exact probability distribution of the first digit of $m$th integer powers with at most $n$ digits is known and asymptotically related to Benford’s law (e.g., Hürlimann). The fit of the PB distribution is very good when restricted to finite sequences but breaks down for longer sequences. A further remarkable result is that Benford’s law is rejected for the mixing sequence at the 5% significance level, while the PB law is accepted with a 93.6% P-value, which improves on the P-value of 25.2% obtained for the TSPB law in Hürlimann.

The sequence of primes merits a deeper analysis. The Benford property for it has long been studied. Diaconis (1977) shows that primes are not Benford distributed. However, it is known that the sequence of primes is Benford distributed with respect to densities other than the usual natural density. According to Serre [45, page 76], Bombieri has noted that the analytical density of primes with first digit 1 is $\log_{10}2$, and this result can be easily generalized to Benford behavior for any first digit. Table 3 shows that the primes less than 1000, respectively, 10 000 are not at all Benford or TSPB distributed, but they are approximately PB distributed with high P-values of 93.3% and 99.9%. Does this statistical result reveal a new property of the prime number sequence? To answer this question it is necessary to take into account longer sequences and to look at cutoffs other than $10^{k}$ for an integer $k$. Our calculations show that among the prime sequences below $10^{k}$ for fixed $k$ there is exactly one sequence with minimum chi-square value, with an optimal cutoff at a prime with first digit 9. Tables 4 and 5 summarize our results for the primes up to $10^{8}$. Besides the PB best fit with minimum chi-square, we also list the PB “linear best” fit, obtained from the PB best fit by taking a linearly decreasing number of primes between those with first digit 1 and first digit 9, whose numbers are kept the same as in the PB best fit. Although the P-value goes to zero very rapidly, the ratio of the minimum chi-square value to the sample size is more stable. For the PB linear best fit this goodness-of-fit statistic, which is also considered in Leemis et al., even decreases and therefore indicates that the first digits of the prime number sequence might be distributed this way. It remains to test, using more powerful computing, whether the mentioned property still holds for even longer sequences of primes.

One observes that the best fit parameters are quite stable as the sample size increases and increase only slightly.
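The first digit counts of primes below a power of ten are easy to reproduce. The sketch below uses a basic sieve of Eratosthenes. The helper names `primes_below` and `first_digit_counts` are hypothetical, and the search for the optimal cutoff described above is not implemented here.

```python
def primes_below(n):
    """Sieve of Eratosthenes: list of all primes p < n."""
    if n < 3:
        return []
    sieve = bytearray([1]) * n
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Cross out multiples of i starting at i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]


def first_digit_counts(numbers):
    """Counts of leading decimal digits 1..9 for positive integers."""
    counts = [0] * 9
    for x in numbers:
        counts[int(str(x)[0]) - 1] += 1
    return counts


if __name__ == "__main__":
    for k in (2, 3, 4):
        ps = primes_below(10 ** k)
        print(10 ** k, len(ps), first_digit_counts(ps))
```

For cutoffs 100 and 1000 the counts can be compared with the first two rows of Table 4, whose sample sizes 25 and 168 equal the numbers of primes below 100 and 1000.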

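Table 4: First digit distributions of prime number sequences with optimal cutoff (first digit occurrences).

| Sample size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 25 | 4 | 3 | 3 | 3 | 3 | 2 | 4 | 2 | 1 |
| 168 | 25 | 19 | 19 | 20 | 17 | 18 | 18 | 17 | 15 |
| 1216 | 160 | 146 | 139 | 139 | 131 | 135 | 125 | 127 | 114 |
| 9486 | 1193 | 1129 | 1097 | 1069 | 1055 | 1013 | 1027 | 1003 | 900 |
| 77736 | 9585 | 9142 | 8960 | 8747 | 8615 | 8458 | 8435 | 8326 | 7468 |
| 657934 | 80020 | 77025 | 75290 | 74114 | 72951 | 72257 | 71564 | 71038 | 63675 |
| 5701502 | 686048 | 664277 | 651085 | 641594 | 633932 | 628206 | 622882 | 618610 | 554868 |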

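Table 5: Best and linear best Pareto Benford fit for prime number sequences (P-values in percent).

| Sample size | PB parameter alpha | PB parameter beta | PB best fit: chi-square / sample size | PB best fit: P-value | PB linear best fit: chi-square / sample size | PB linear best fit: P-value |
|---|---|---|---|---|---|---|
| 25 | 23.13952 | 2.14449 | 7.396% | 93.30 | 8.407% | 91.01 |
| 168 | 22.99754 | 2.28436 | 0.198% | 99.93 | 0.781% | 97.10 |
| 1216 | 30.15504 | 2.25800 | 0.175% | 90.76 | 0.152% | 93.34 |
| 9486 | 32.59544 | 2.28442 | 0.172% | 1.20 | 0.084% | 23.86 |
| 77736 | 33.26550 | 2.31262 | 0.175% | 0.00 | 0.075% | 0.00 |
| 657934 | 33.82622 | 2.32908 | 0.185% | 0.00 | 0.070% | 0.00 |
| 5701502 | 34.28132 | 2.34148 | 0.188% | 0.00 | 0.065% | 0.00 |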

Finally, it might be worthwhile to mention another recent intriguing result by Kafri, which shows that the digit distribution of prime numbers obeys the Planck distribution, another generalized Benford law, as already mentioned at the end of Section 3.

References

[1] S. Newcomb, "Note on the frequency of use of the different digits in natural numbers," American Journal of Mathematics, vol. 4, no. 1–4, pp. 39–40, 1881. doi:10.2307/2369148. MR1505286.
[2] F. Benford, "The law of anomalous numbers," Proceedings of the American Philosophical Society, vol. 78, pp. 551–572, 1938. Zbl 0018.26502.
[3] R. S. Pinkham, "On the distribution of first significant digits," Annals of Mathematical Statistics, vol. 32, pp. 1223–1230, 1961. doi:10.1214/aoms/1177704862. MR0131303. Zbl 0102.14205.
[4] D. I. A. Cohen, "An explanation of the first digit phenomenon," Journal of Combinatorial Theory. Series A, vol. 20, no. 3, pp. 367–370, 1976. doi:10.1016/0097-3165(76)90032-7. MR0406912. Zbl 0336.10052.
[5] T. P. Hill, "Base-invariance implies Benford's law," Proceedings of the American Mathematical Society, vol. 123, no. 3, pp. 887–895, 1995. doi:10.2307/2160815. MR1233974. Zbl 0813.60002.
[6] T. P. Hill, "The significant-digit phenomenon," The American Mathematical Monthly, vol. 102, no. 4, pp. 322–327, 1995. doi:10.2307/2974952. MR1328015. Zbl 0833.60003.
[7] T. P. Hill, "A statistical derivation of the significant-digit law," Statistical Science, vol. 10, no. 4, pp. 354–363, 1995. MR1421567. Zbl 0955.60509.
[8] T. P. Hill, "Benford's law," Encyclopedia of Mathematics Supplement, vol. 1, p. 102, 1997.
[9] T. P. Hill, "The first digit phenomenon," The American Scientist, vol. 86, no. 4, pp. 358–363, 1998. Scopus EID 2-s2.0-0001718673.
[10] P. C. Allaart, "An invariant-sum characterization of Benford's law," Journal of Applied Probability, vol. 34, no. 1, pp. 288–291, 1997. doi:10.2307/3215195. MR1429075. Zbl 0874.60016.
[11] É. Janvresse and T. de la Rue, "From uniform distributions to Benford's law," Journal of Applied Probability, vol. 41, no. 4, pp. 1203–1210, 2004. doi:10.1239/jap/1101840566. MR2122815. Zbl 1065.60095.
[12] A. E. Kossovsky, "Towards a better understanding of the leading digits phenomena," preprint, 2008, http://arxiv.org/abs/math/0612627.
[13] D. Jang, J. U. Kang, A. Kruckman, J. Kudo, and S. J. Miller, "Chains of distributions, hierarchical Bayesian models and Benford's law," Journal of Algebra, Number Theory: Advances and Applications, vol. 1, no. 1, pp. 37–60, 2009.
[14] S. J. Miller and M. J. Nigrini, "The modulo 1 central limit theorem and Benford's law for products," International Journal of Algebra, vol. 2, no. 1–4, pp. 119–130, 2008. MR2417189. Zbl 1148.60008.
[15] O. Kafri, "Entropy principle in direct derivation of Benford's law," preprint, 2009, http://arxiv.org/ftp/arxiv/papers/0901/0901.3047.pdf.
[16] W. Hürlimann, "Benford's law from 1881 to 2006: a bibliography," 2006, http://arxiv.org/abs/math/0607168.
[17] A. Berg and T. Hill, "Benford Online Bibliography," 2009, http://www.benfordonline.net.
[18] G. Judge and L. Schechter, "Detecting problems in survey data using Benford's law," Journal of Human Resources, vol. 44, no. 1, pp. 1–24, 2009. doi:10.3368/jhr.44.1.1. Scopus EID 2-s2.0-67649366072.
[19] G. Judge, L. Schechter, and M. Grendar, "An information theoretic family of data based Benford-like distributions," Physica A, vol. 97, pp. 201–207, 2007.
[20] M. J. Nigrini and S. J. Miller, "Benford's law applied to hydrology data – results and relevance to other geophysical data," Mathematical Geology, vol. 39, no. 5, pp. 469–490, 2007. doi:10.1007/s11004-007-9109-5. Scopus EID 2-s2.0-34548737045. Zbl 1155.86322.
[21] L. M. Leemis, B. W. Schmeiser, and D. L. Evans, "Survival distributions satisfying Benford's law," The American Statistician, vol. 54, no. 4, pp. 236–241, 2000. doi:10.2307/2685773. MR1803620.
[22] H.-A. Engel and C. Leuenberger, "Benford's law for exponential random variables," Statistics & Probability Letters, vol. 63, no. 4, pp. 361–365, 2003. MR1996184. Zbl 1116.60315.
[23] J. R. van Dorp and S. Kotz, "The standard two-sided power distribution and its properties: with applications in financial engineering," The American Statistician, vol. 56, no. 2, pp. 90–99, 2002. doi:10.1198/000313002317572745. MR1945867.
[24] W. Hürlimann, "A generalized Benford law and its application," Advances and Applications in Statistics, vol. 3, no. 3, pp. 217–228, 2003. MR2034405. Zbl 1045.62010.
[25] W. J. Reed, "The Pareto, Zipf and other power laws," Economics Letters, vol. 74, no. 1, pp. 15–19, 2001. doi:10.1016/S0165-1765(01)00524-9. Scopus EID 2-s2.0-0035619627. Zbl 1007.91046.
[26] A. V. Kontorovich and S. J. Miller, "Benford's law, values of L-functions and the 3x+1 problem," Acta Arithmetica, vol. 120, no. 3, pp. 269–297, 2005. doi:10.4064/aa120-3-4. MR2188844. Zbl 1139.11033.
[27] K. Schürger, "Extensions of Black-Scholes processes and Benford's law," Stochastic Processes and Their Applications, vol. 118, no. 7, pp. 1219–1243, 2008. doi:10.1016/j.spa.2007.07.017. MR2428715. Zbl 1152.60027.
[28] O. Kafri, "The second law as a cause of the evolution," preprint, 2007, http://arxiv.org/ftp/arxiv/papers/0711/0711.4507.pdf.
[29] O. Kafri, "Sociological inequality and the second law," preprint, 2008, http://arxiv.org/ftp/arxiv/papers/0805/0805.3206.pdf.
[30] J. L. Brown Jr. and R. L. Duncan, "Modulo one uniform distribution of the sequence of logarithms of certain recursive sequences," The Fibonacci Quarterly, vol. 8, no. 5, pp. 482–486, 1970. MR0360444. Zbl 0214.06802.
[31] J. Wlodarski, "Fibonacci and Lucas numbers tend to obey Benford's law," The Fibonacci Quarterly, vol. 9, pp. 87–88, 1971.
[32] W. A. Sentance, "A further analysis of Benford's law," The Fibonacci Quarterly, vol. 11, pp. 490–494, 1973. Zbl 0282.10032.
[33] W. Webb, "Distribution of the first digits of Fibonacci numbers," The Fibonacci Quarterly, vol. 13, no. 4, pp. 334–336, 1975. MR0382198. Zbl 0319.10012.
[34] R. A. Raimi, "The first digit problem," American Mathematical Monthly, vol. 83, pp. 521–538, 1976.
[35] W. G. Brady, "More on Benford's law," The Fibonacci Quarterly, vol. 16, pp. 51–52, 1978. Zbl 0374.10033.
[36] S. Kunoff, "N! has the first digit property," The Fibonacci Quarterly, vol. 25, no. 4, pp. 365–367, 1987. MR911988. Zbl 0627.10007.
[37] S. J. Miller and R. Takloo-Bighash, An Invitation to Modern Number Theory, Princeton University Press, Princeton, NJ, USA, 2006, xx+503 pp. MR2208019.
[38] P. Jolissaint, "Loi de Benford, relations de récurrence et suites équidistribuées," Elemente der Mathematik, vol. 60, no. 1, pp. 10–18, 2005. MR2188341. Zbl 1084.11005.
[39] P. Jolissaint, "Loi de Benford, relations de récurrence et suites équidistribuées. II," Elemente der Mathematik, vol. 64, no. 1, pp. 21–36, 2009. MR2471593.
[40] W. Hürlimann, "Integer powers and Benford's law," International Journal of Pure and Applied Mathematics, vol. 11, no. 1, pp. 39–46, 2004. MR2033394. Zbl 1068.11047.
[41] P. Diaconis, "The distribution of leading digits and uniform distribution mod 1," Annals of Probability, vol. 5, pp. 72–81, 1977.
[42] R. E. Whitney, "Initial digits for the sequence of primes," The American Mathematical Monthly, vol. 79, no. 2, pp. 150–152, 1972. doi:10.2307/2316536. MR0304337. Zbl 0227.10047.
[43] P. Schatte, "On H-summability and the uniform distribution of sequences," Mathematische Nachrichten, vol. 113, pp. 237–243, 1983. doi:10.1002/mana.19831130122. MR725491.
[44] D. I. A. Cohen and T. M. Katz, "Prime numbers and the first digit phenomenon," Journal of Number Theory, vol. 18, no. 3, pp. 261–268, 1984. doi:10.1016/0022-314X(84)90061-1. MR746863. Zbl 0549.10040.
[45] J.-P. Serre, A Course in Arithmetic, Springer, New York, NY, USA, 1996.