Many distributions for first digits of integer sequences are not Benford. A simple method to derive parametric analytical extensions of Benford's law for first digits of numerical data is proposed. Two generalized Benford distributions are considered, namely the two-sided power Benford (TSPB) distribution, introduced in Hürlimann (2003), and the new Pareto Benford (PB) distribution. Based on minimum chi-square estimators, the fitting capabilities of these generalized Benford distributions are illustrated and compared on some interesting and important integer sequences. In particular, it is significant that many of the analyzed integer sequences follow with a high

Since Newcomb [

Mathematical explanations of this law have been proposed by Pinkham [

Hill [

It is important to note that many distributions for first digits of integer sequences are not Benford but are power laws or something close. Thus there is a need for statistical tests for analyzing such hypotheses. In this respect the interest of enlarged Benford laws is twofold. First, parametric extensions may provide a better fit of the data than Benford’s law itself. Second, they yield a simple statistical procedure to validate Benford’s law. If Benford’s model is sufficiently “close” to the one-parameter extended model, then it will be retained. These points will be illustrated through our application to integer sequences.
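The validation procedure described above can be sketched in a few lines. The TSPB and PB formulas are not reproduced in this passage, so the sketch below uses a stand-in one-parameter power-law family, P(d) proportional to (d+1)^gamma - d^gamma, which is an assumption made for illustration only and is not the author's model. It reduces to Benford's law as gamma tends to 0. The function names and the golden-section minimizer are likewise hypothetical choices; the counts are the first digits of the first 100 squares.

```python
import math


def benford_probs():
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def power_law_probs(gamma):
    """Stand-in one-parameter family (NOT the TSPB or PB distribution).

    P(d) = ((d+1)^gamma - d^gamma) / (10^gamma - 1); gamma -> 0 gives Benford.
    """
    if abs(gamma) < 1e-9:
        return benford_probs()
    norm = 10 ** gamma - 1
    return [((d + 1) ** gamma - d ** gamma) / norm for d in range(1, 10)]


def chi_square(counts, probs):
    """Pearson chi-square statistic of observed counts against model probs."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def minimize_golden(f, lo, hi, tol=1e-8):
    """Golden-section search; assumes f is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# First-digit counts (digits 1..9) of n^2 for n = 1..100.
counts = [21, 14, 12, 12, 9, 9, 8, 7, 8]

gamma_hat = minimize_golden(
    lambda g: chi_square(counts, power_law_probs(g)), -1.0, 2.0
)
chi2_benford = chi_square(counts, benford_probs())
chi2_fit = chi_square(counts, power_law_probs(gamma_hat))
print(f"gamma_hat = {gamma_hat:.3f}")
print(f"chi-square: Benford = {chi2_benford:.3f}, fitted = {chi2_fit:.3f}")
```

In this toy setting, a fitted parameter close to the Benford value (here gamma near 0) together with a small reduction in chi-square would argue for retaining Benford's law; a large reduction argues for the extended model.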

If

A simple parametric distribution, which includes as special cases both the above uniform and triangular distributions, is the two-sided power random variable

Let

This has been shown in Hürlimann [

Another interesting distribution, which also takes the form of a two-sided power law, is the double Pareto random variable

Recall the stochastic mechanism and the natural motivation that generate this distribution. It is often assumed that the time evolution of a stochastic phenomenon

Why does one observe power-law behavior for phenomena apparently evolving like a GBM? A simple mechanism that generates power-law behavior in the tails consists in assuming that the time

Let

Let the integer-valued random variable

The probability density function of

One notes that setting

Minimum chi-square estimation of the generalized Benford distributions is straightforward with modern computer algebra systems. The fitting capabilities of the new distributions are illustrated on some interesting and important integer sequences. The first digit occurrences of the analyzed integer sequences are listed in Table

First digit distributions of some integer sequences (percentage of first digit occurrences).

Name of sequence | Sample size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
---|---|---|---|---|---|---|---|---|---|---
Benford law | | 30.1 | 17.6 | 12.5 | 9.7 | 7.9 | 6.7 | 5.8 | 5.1 | 4.6
Square | 100 | 21.0 | 14.0 | 12.0 | 12.0 | 9.0 | 9.0 | 8.0 | 7.0 | 8.0
Cube | 500 | 28.2 | 14.8 | 11.4 | 9.8 | 8.8 | 7.8 | 6.6 | 6.8 | 5.8
Cube | 1000 | 22.6 | 15.9 | 12.4 | 10.6 | 9.4 | 8.3 | 7.4 | 7.1 | 6.3
Cube | 10000 | 22.5 | 15.8 | 12.6 | 10.6 | 9.3 | 8.3 | 7.5 | 7.0 | 6.4
Square root | 99 | 19.2 | 17.2 | 15.2 | 13.1 | 11.1 | 9.1 | 7.1 | 5.1 | 3.0
Prime < 100 | 25 | 16.0 | 12.0 | 12.0 | 12.0 | 12.0 | 8.0 | 16.0 | 8.0 | 4.0
Prime < 1000 | 168 | 14.9 | 11.3 | 11.3 | 11.9 | 10.1 | 10.7 | 10.7 | 10.1 | 8.9
Prime < 10000 | 1229 | 13.0 | 11.9 | 11.3 | 11.3 | 10.7 | 11.0 | 10.2 | 10.3 | 10.3
Princeton number | 25 | 28.0 | 8.0 | 12.0 | 12.0 | 8.0 | 12.0 | 8.0 | 4.0 | 8.0
Mixing sequence | 618 | 28.3 | 14.6 | 11.5 | 9.9 | 7.6 | 7.8 | 8.1 | 6.6 | 5.7
Pentagonal number | 100 | 35.0 | 12.0 | 10.0 | 8.0 | 10.0 | 6.0 | 8.0 | 5.0 | 6.0
Keith number | 71 | 32.4 | 14.1 | 14.1 | 7.0 | 4.2 | 7.0 | 12.7 | 2.8 | 5.6
Bell number | 100 | 31.0 | 15.0 | 10.0 | 12.0 | 10.0 | 8.0 | 5.0 | 6.0 | 3.0
Catalan number | 100 | 33.0 | 18.0 | 11.0 | 11.0 | 8.0 | 8.0 | 4.0 | 3.0 | 4.0
Lucky number | 45 | 42.2 | 17.8 | 8.9 | 4.4 | 2.2 | 6.7 | 8.9 | 2.2 | 6.7
Ulam number | 44 | 45.5 | 13.6 | 6.8 | 6.8 | 4.5 | 6.8 | 4.5 | 6.8 | 4.5
Numeri ideoni | 65 | 30.8 | 18.5 | 13.8 | 10.8 | 6.2 | 3.1 | 7.7 | 6.2 | 3.1
Fibonacci number | 100 | 30.0 | 18.0 | 13.0 | 9.0 | 8.0 | 6.0 | 5.0 | 7.0 | 4.0
Partition number | 94 | 28.7 | 17.0 | 14.9 | 9.6 | 7.4 | 6.4 | 7.4 | 5.3 | 3.2
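As a minimal sketch of how such first-digit percentages can be computed, the snippet below tabulates the leading digits of the first 100 squares and prints them next to the Benford percentages 100·log10(1 + 1/d). The helper names are illustrative and not taken from the paper.

```python
import math


def first_digit(n: int) -> int:
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def first_digit_counts(values):
    """Counts of leading digits 1..9, returned as a list of length 9."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return counts


def benford_probs():
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


squares = [n * n for n in range(1, 101)]
counts = first_digit_counts(squares)
percent = [100 * c / len(squares) for c in counts]
benford_percent = [round(100 * p, 1) for p in benford_probs()]

print("Square :", percent)
print("Benford:", benford_percent)
```

With 100 terms, the percentages coincide with the raw counts, which makes the Square row easy to check by hand.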

Minimum chi-square estimators.

Name of sequence | Sample size | TSPB parameter c | PB parameter alpha | PB parameter beta | PB parameter m
---|---|---|---|---|---

Square | 100 | 0.79837 | 15.55957 | 1.74552 | 100 |

Cube | 500 | 2.46519 | 5.55849 | 1.69860 | 100 |

Cube | 1000 | 2.26798 | 20.56506 | 1.47082 | 100 |

Cube | 10000 | 2.27054 | 20.53577 | 1.475760 | 100 |

Square root | 99 | 1.40176 | 89491723 | 1.34334 | 100 |

Prime < 100 | 25 | 2.68581 | 23.13952 | 2.14449 | 100 |

Prime < 1000 | 168 | 2.95216 | 22.99754 | 2.28436 | 100 |

Prime < 10000 | 1229 | 3.03542 | 29.76729 | 2.30760 | 100 |

Princeton number | 25 | 2.76170 | 6.94595 | 2.36119 | 100 |

Mixing sequence | 618 | 2.53958 | 4.78641 | 1.83119 | 100 |

Pentagonal number | 100 | 2.94847 | 2.06797 | 3.31268 | 100 |

Keith number | 71 | 2.73338 | 2.16107 | 2.63720 | 1000 |

100 | 10.14820 | 100 | |||

100 | 0.67095 | 5000 | |||

Lucky number | 45 | 3.15721 | 7.56962 | 0.94576 | 100 |

Ulam number | 44 | 3.55375 | 9.99445 | 0.81215 | 100 |

65 | 1297612.16 | 100 | |||

100 | 257000.42 | 100 | |||

94 | 0.65651 | 1.71409 | 1000 |

Fitting integer sequences to the Benford and generalized Benford distributions

Name of sequence | Sample size | Benford | Two-sided Power Benford | Pareto Benford | |||
---|---|---|---|---|---|---|---|

chi-square | chi-square | chi-square | |||||

Square | 100 | 9.096 | 33.43 | 7.837 | 34.72 | ||

Cube | 500 | 9.696 | 28.70 | 5.808 | 56.23 | ||

Cube | 1000 | 46.459 | 0.00 | 43.725 | 0.00 | ||

Cube | 10000 | 443.745 | 0.00 | 472.011 | 0.00 | ||

Square root | 99 | 8.612 | 37.61 | 7.002 | 42.86 | ||

Prime < 100 | 25 | 7.741 | 45.91 | 7.299 | 39.84 | ||

Prime < 1000 | 168 | 45.016 | 0.00 | 36.651 | 0.00 | ||

Prime < 10000 | 1229 | 387.194 | 0.00 | 307.322 | 0.00 | ||

Princeton number | 25 | 3.452 | 90.29 | 2.762 | 89.72 | ||

Mixing sequence | 618 | 15.550 | 4.93 | 9.014 | 25.17 | ||

Pentagonal number | 100 | 5.277 | 72.76 | 2.127 | 92.26 | ||

Keith number | 71 | 9.215 | 32.45 | 7.688 | 28.53 | ||

Bell number | 100 | 3.069 | 3.014 | 88.37 | 85.63 | ||

Catalan number | 100 | 2.404 | 2.304 | 94.11 | 92.57 | ||

Lucky number | 45 | 7.693 | 46.40 | 5.564 | 47.37 | ||

Ulam number | 44 | 6.350 | 60.81 | 2.526 | 86.56 | ||

Numeri ideoni | 65 | 2.594 | 92.54 | 2.584 | 85.89 | ||

Fibonacci number | 100 | 1.029 | 99.45 | 1.027 | 98.46 | ||

Partition number | 94 | 1.394 | 99.24 | 1.513 | 95.86 |
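The Benford entries of the Square row can be checked with a short computation. The sketch below evaluates the Pearson chi-square statistic of the first-digit counts of the first 100 squares against Benford's law, and converts it to an upper-tail probability. It assumes 8 degrees of freedom (nine digit classes minus one), which is an assumption of this sketch rather than something stated in the text. For an even number of degrees of freedom the chi-square tail has a closed form, so no statistics library is needed. Under that assumption the statistic comes out close to the 9.096 listed for Square, and the tail probability close to 33.4%.

```python
import math

# Benford's law: P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square(counts, probs):
    """Pearson chi-square statistic of observed counts against model probs."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# First-digit counts (digits 1..9) of n^2 for n = 1..100.
square_counts = [21, 14, 12, 12, 9, 9, 8, 7, 8]

stat = chi_square(square_counts, BENFORD)
p_value = chi2_sf_even_df(stat, 8)  # df = 8 is an assumption of this sketch
print(f"chi-square = {stat:.3f}, P-value = {100 * p_value:.2f}%")
```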

The definition, origin, and mathematical interest of a large part of these integer sequences have been discussed in Hürlimann [

All 19 integer sequences considered are fitted quite well by the new PB distribution. For 14 sequences its minimum chi-square is the smallest of the three values compared, and in the other 5 cases it differs little from the chi-square of the TSPB distribution (bold cells in Table

Strong numerical evidence for the Benford property of the Fibonacci, Bell, Catalan, and partition numbers is observed (corresponding italic cells in Tables

The sequence of primes merits a deeper analysis. The Benford property for it has long been studied. Diaconis (1977) [

First digit distributions of prime number sequences with optimal cutoff (first digit occurrences).

Sample size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
---|---|---|---|---|---|---|---|---|---
25 | 4 | 3 | 3 | 3 | 3 | 2 | 4 | 2 | 1
168 | 25 | 19 | 19 | 20 | 17 | 18 | 18 | 17 | 15
1216 | 160 | 146 | 139 | 139 | 131 | 135 | 125 | 127 | 114
9486 | 1193 | 1129 | 1097 | 1069 | 1055 | 1013 | 1027 | 1003 | 900
77736 | 9585 | 9142 | 8960 | 8747 | 8615 | 8458 | 8435 | 8326 | 7468
657934 | 80020 | 77025 | 75290 | 74114 | 72951 | 72257 | 71564 | 71038 | 63675
5701502 | 686048 | 664277 | 651085 | 641594 | 633932 | 628206 | 622882 | 618610 | 554868
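The first two rows of this table coincide with the primes below 100 and below 1000, so they can be reproduced directly. The sketch below does this with a sieve of Eratosthenes. The larger rows depend on the cutoff used in the paper, which is not specified in this passage and is therefore not reproduced here. The function names are illustrative.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: list of all primes p < limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Strike out multiples of p, starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def first_digit_counts(values):
    """Counts of leading decimal digits 1..9, returned as a list of length 9."""
    counts = [0] * 9
    for v in values:
        counts[int(str(v)[0]) - 1] += 1
    return counts


for limit in (100, 1000):
    primes = primes_below(limit)
    print(limit, len(primes), first_digit_counts(primes))
```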

Best and linear best Pareto Benford fit for prime number sequences.

Sample size | PB Parameters | PB best fit | PB linear best fit | |||
---|---|---|---|---|---|---|

alpha | beta | chi-square/sample size | chi-square/sample size | |||

25 | 23.13952 | 2.14449 | 7.396% | 93.30 | 91.01 | |

168 | 22.99754 | 2.28436 | 0.198% | 99.93 | 97.10 | |

1216 | 30.15504 | 2.25800 | 0.175% | 90.76 | 93.34 | |

9486 | 32.59544 | 2.28442 | 0.172% | 1.20 | 23.86 | |

77736 | 33.26550 | 2.31262 | 0.175% | 0.00 | 0.00 | |

657934 | 33.82622 | 2.32908 | 0.185% | 0.00 | 0.00 | |

5701502 | 34.28132 | 2.34148 | 0.188% | 0.00 | 0.00 |

Finally, it might be worthwhile to mention another recent intriguing result by Kafri [