A Geometric Derivation of the Irwin-Hall Distribution

James E. Marengo, David L. Farnsworth (http://orcid.org/0000-0003-0251-8770), and Lucas Stefanic

School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA

Academic Editor: Shyam L. Kalla

Research Article. International Journal of Mathematics and Mathematical Sciences (IJMMS), Hindawi, vol. 2017, Article ID 3571419, ISSN 1687-0425, 0161-1712, DOI 10.1155/2017/3571419.

Received 23 May 2017; Revised 4 August 2017; Accepted 17 August 2017; Published 18 September 2017

Copyright © 2017 James E. Marengo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The Irwin-Hall distribution is the distribution of the sum of a finite number of independent identically distributed uniform random variables on the unit interval. Many applications arise since round-off errors have a transformed Irwin-Hall distribution and the distribution supplies spline approximations to normal distributions. We review some of the distribution’s history. The present derivation is very transparent, since it is geometric and explicitly uses the inclusion-exclusion principle. In certain special cases, the derivation can be extended to linear combinations of independent uniform random variables on other intervals of finite length. The derivation adds to the literature about methodologies for finding distributions of sums of random variables, especially distributions that have domains with boundaries so that the inclusion-exclusion principle might be employed.

1. Introduction

The simple continuous uniform or rectangular distribution Uniform(0, 1), with probability density function (PDF) $f(x)=1$ for $0<x<1$ and $f(x)=0$ otherwise, is very important. Two applications arise in numerical simulation and Bayesian analysis of proportions. If $F$ is the cumulative distribution function (CDF) of the continuous random variable $X$, then the random variable $Y=F(X)$ has a Uniform(0, 1) distribution. The random variable $X$ can be simulated by first simulating $Y$ and then letting $X=F^{-1}(Y)$. This is called the inversion method ([1, page 295], [2, pages 194–196]). The transformation is called the probability integral transformation ([3], [4, pages 203-204]). The uniform distribution is a Bayesian noninformative prior distribution for the distribution of a random variable defined on the unit interval, such as a beta distribution for a proportion ([2, page 33], [5, pages 82–90]). For other applications and generalizations of the uniform distribution, see [6–8].
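The inversion method can be illustrated with a short sketch. The exponential distribution is used here only because its CDF has a closed-form inverse; that choice, the function names, and the `rate` parameter are assumptions made for this illustration and do not come from the article.

```python
import math
import random


def exponential_inverse_cdf(y, rate=1.0):
    """Inverse of F(x) = 1 - exp(-rate * x), valid for 0 <= y < 1."""
    return -math.log(1.0 - y) / rate


def sample_by_inversion(inverse_cdf, size, seed=0):
    """Simulate X = F^{-1}(Y) with Y drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - y stays strictly positive.
    return [inverse_cdf(rng.random()) for _ in range(size)]


samples = sample_by_inversion(exponential_inverse_cdf, 100_000)
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be near the exponential mean 1 / rate
```

Any continuous distribution with a computable inverse CDF can be passed to `sample_by_inversion` in the same way.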

The present goal is to derive the CDF and the PDF of the sum $T=\sum_{i=1}^{n}X_i$, where the $X_i$ are independent identically distributed Uniform(0, 1) random variables for $i=1,2,\ldots,n$. The CDF and PDF are

$$F(t)=\sum_{i=0}^{n}(-1)^{i}\binom{n}{i}\frac{(t-i)^{n}}{n!}\,s_i(t),\tag{1}$$

$$f(t)=\sum_{i=0}^{n}(-1)^{i}\binom{n}{i}\frac{(t-i)^{n-1}}{(n-1)!}\,s_i(t),\tag{2}$$

respectively, where $s_a(t)$ is the unit step function

$$s_a(t)=\begin{cases}0, & t<a,\\ 1, & a\le t.\end{cases}\tag{3}$$

The derivation in Section 2 is geometric and explicitly uses the inclusion-exclusion principle.
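As a minimal sketch, the CDF and PDF of $T$ can be evaluated directly as alternating sums of binomial coefficients times powers of $t-i$, with the unit step function switching terms on. The function names below are hypothetical, and plain floating-point arithmetic is assumed, which is adequate for small $n$ but suffers from cancellation when $n$ is large.

```python
from math import comb, factorial


def unit_step(a, t):
    """Unit step s_a(t): 0 for t < a and 1 for a <= t."""
    return 1 if t >= a else 0


def irwin_hall_cdf(t, n):
    """CDF of the sum of n independent Uniform(0, 1) variables."""
    total = sum(
        (-1) ** i * comb(n, i) * (t - i) ** n * unit_step(i, t)
        for i in range(n + 1)
    )
    return total / factorial(n)


def irwin_hall_pdf(t, n):
    """PDF of the sum of n independent Uniform(0, 1) variables."""
    total = sum(
        (-1) ** i * comb(n, i) * (t - i) ** (n - 1) * unit_step(i, t)
        for i in range(n + 1)
    )
    return total / factorial(n - 1)


print(irwin_hall_cdf(1.5, 3))  # the median of T for n = 3 is 3/2
print(irwin_hall_pdf(1.5, 3))
```

For $n=3$ and $t=3/2$ the two sums reduce by hand to $(3.375-0.375)/6=0.5$ and $(2.25-0.75)/2=0.75$.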

Derivations of the distribution, which more recently acquired its name Irwin-Hall, go back to Lagrange and Laplace in the late 18th century and the early 19th century. Lagrange used generating functions based on $a^x$ to obtain the distribution of $T$ ([9, pages 603–612], [10, page 283]). Those generating functions are a predecessor of characteristic functions [10, page 286]. Laplace often revisited the problem of finding the distribution of $T$ and employed many methods ([9, pages 714-715], [10, pages 286–301]). The distribution is described in [1, pages 296–300], where it is called the Irwin-Hall distribution.

Some derivations employ characteristic functions in a variety of ways, since the characteristic function of a sum of independent random variables is the product of the summands' characteristic functions and the inverse transform is tractable ([11, pages 188-189], [12–14], [15, pages 362-363], [16, 17]). Others utilize the convolution integral for sums and mathematical induction ([4, page 225], [11, pages 190-191 and 244–246], [18]). The distribution of the sum of uniform random variables that may have differing domains is found in [19–21]. Sums of dependent uniform random variables are examined in [22, 23].

Direct integration techniques can be used to obtain the distribution of a linear combination of Uniform(0, 1) random variables ([15, pages 358–360], [24, 25]). Similar techniques are used in [26] for uniform distributions whose domains are intervals with zero as their left endpoints. The distribution of the mean is obtained when all the constants are $1/n$. In this case, the distribution is called the Bates distribution ([1, page 297], [27]), which can also be found by a simple transformation of the Irwin-Hall distribution ([15, page 359], [25, page 241]). Using moment generating functions, instead of characteristic functions, Gray and Odell [28] found the distribution of any linear combination of uniform random variables with different domains allowed. In Section 3, the present method or style of proof is extended to those cases, giving the same distributions.

Because $T$ is a sum, the Irwin-Hall distribution approximates a normal distribution with a spline, since the Irwin-Hall distribution in (2) is composed of polynomials. The support of $T$ is the interval $[0,n]$; the mean, mode, and median of $T$ are $n/2$; and its variance is $n/12$. By symmetry, all odd central moments are zero, including skewness. The kurtosis is $3-6/(5n)$ [1, page 300]. This is the measure of kurtosis that is 3 for a normal distribution, so Irwin-Hall distributions are platykurtic, and the kurtosis is close to 3 for large $n$. According to the Central Limit Theorem,

$$Z=\frac{T-n/2}{\sqrt{n/12}}\xrightarrow{\;D\;}\mathrm{Normal}(0,1)\quad\text{as } n\to\infty\tag{4}$$

([4, pages 280–283], [11, pages 213–218 and 245], [29, pages 220–222]). Figure 1 contains a normal distribution with mean $n/2=3/2$ and variance $n/12=3/12=1/4$ and its approximating Irwin-Hall distribution with $n=3$. The approximation is very good even for this small value of $n$ [30]. The uniform error bound for the normal(0, 1) CDF $\Phi(z)$ is

$$\left|F(z)-\Phi(z)\right|\le\frac{3}{20n}\tag{5}$$

([31], [32, page 51]). Approximations with spline fitting can be useful with or without complete information about the distributional shape [33, 34].
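The quality of the normal approximation can be probed numerically. The sketch below compares the CDF of $T$ with the CDF of the normal distribution having the same mean $n/2$ and variance $n/12$ on an evenly spaced grid over the support $[0,n]$ and reports the largest gap found. The grid size and all function names are assumptions of this illustration, and a grid maximum only approximates the true supremum.

```python
from math import comb, erf, factorial, sqrt


def irwin_hall_cdf(t, n):
    """CDF of the sum of n independent Uniform(0, 1) variables."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0
    # Only terms with i <= t are switched on by the unit step function.
    total = sum(
        (-1) ** i * comb(n, i) * (t - i) ** n for i in range(int(t) + 1)
    )
    return total / factorial(n)


def normal_cdf(x, mean, variance):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * variance)))


def max_gap(n, grid_points=2001):
    """Largest |Irwin-Hall CDF - matching normal CDF| on a grid over [0, n]."""
    grid = [n * j / (grid_points - 1) for j in range(grid_points)]
    return max(
        abs(irwin_hall_cdf(t, n) - normal_cdf(t, n / 2, n / 12)) for t in grid
    )


for n in (1, 3, 12):
    print(n, max_gap(n))
```

The printed gaps shrink as $n$ grows, in line with the Central Limit Theorem.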

Figure 1: Irwin-Hall distribution with $n=3$ and the matching normal distribution with mean 3/2 and variance 1/4.

Since round-off errors for random variables that are rounded to the nearest integer are distributed Uniform(−1/2, 1/2), the sum of round-off errors has a linearly transformed Irwin-Hall distribution. For large $n$, the sum of round-off errors is easily described with a normal distribution [29, page 222]. For small $n$, the Irwin-Hall distribution is also appropriate and not too complicated.
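A small sketch of the round-off application: if each of $n$ errors is Uniform(−1/2, 1/2), then adding 1/2 to each error gives a Uniform(0, 1) variable, so the probability that the summed error is at most $s$ equals the Irwin-Hall CDF at $s+n/2$. The simulation below assumes, purely for illustration, that the values being rounded are uniform on [0, 100); the function names, trial count, and seed are likewise hypothetical.

```python
import math
import random
from math import comb, factorial


def irwin_hall_cdf(t, n):
    """CDF of the sum of n independent Uniform(0, 1) variables."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0
    total = sum(
        (-1) ** i * comb(n, i) * (t - i) ** n for i in range(int(t) + 1)
    )
    return total / factorial(n)


def roundoff_sum_cdf(s, n):
    """P(sum of n round-off errors <= s), using the shift by n / 2."""
    return irwin_hall_cdf(s + n / 2, n)


def simulate_roundoff_sum_cdf(s, n, trials=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total_error = 0.0
        for _ in range(n):
            x = rng.uniform(0.0, 100.0)
            # Error from rounding x to the nearest integer, in [-1/2, 1/2).
            total_error += x - math.floor(x + 0.5)
        hits += total_error <= s
    return hits / trials


exact = roundoff_sum_cdf(0.3, 5)
estimate = simulate_roundoff_sum_cdf(0.3, 5)
print(exact, estimate)
```

The two printed numbers should agree to about two decimal places at this trial count.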

Lee et al. [35] use the Irwin-Hall distribution to examine the efficacy of goodness-of-fit tests. Heinrich et al. [36] adapt the Irwin-Hall distribution in consideration of the accumulated accuracy of round-off errors. Inequalities for linear combinations of independent random variables whose domains have an upper bound are given in [37].

2. Derivation of the Irwin-Hall Distribution

Theorem 1.

Let $X_i$ for $i=1,2,\ldots,n$ be independent random variables, each having the continuous uniform distribution on the unit interval, and let $T=\sum_{i=1}^{n}X_i$. Then, the CDF and PDF of $T$ are given by (1) and (2), respectively.

Proof.

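For $m\in\{0,1,2,\ldots,n-1\}$ and $t\in[m,m+1)$, let

$$\begin{aligned}
A_n(t)&=\Bigl\{(x_1,x_2,\ldots,x_n): x_i\ge 0 \text{ for } i\in\{1,2,\ldots,n\},\ \sum_{i=1}^{n}x_i\le t\Bigr\},\\
B_j(t)&=\bigl\{(x_1,x_2,\ldots,x_n)\in A_n(t): x_j>1\bigr\},\\
C_n&=\bigl\{(x_1,x_2,\ldots,x_n): 0\le x_i\le 1\bigr\},
\end{aligned}\tag{6}$$

where $C_n$ is the $n$-dimensional unit cube. The set complement of $C_n$ with respect to $\mathbb{R}^n$ is denoted by $\overline{C_n}$.

The hypervolume of the $n$-dimensional solid $A_n(t)$ has value

$$\operatorname{Vol}\bigl(A_n(t)\bigr)=\frac{t^{n}}{n!},\tag{7}$$

since the solid is a standard orthogonal simplex from the corner of an $n$-cube. Similarly, if $k\in\{1,2,\ldots,m\}$, then subtracting 1 from each of the first $k$ coordinates maps $\bigcap_{j=1}^{k}B_j(t)$ onto $A_n(t-k)$ up to a boundary of zero hypervolume, so the hypervolume of $\bigcap_{j=1}^{k}B_j(t)$ is

$$\operatorname{Vol}\Bigl(\bigcap_{j=1}^{k}B_j(t)\Bigr)=\frac{(t-k)^{n}}{n!}.\tag{8}$$

For $k\in\{m+1,m+2,\ldots,n\}$,

$$\bigcap_{j=1}^{k}B_j(t)=\varnothing,\qquad \operatorname{Vol}\Bigl(\bigcap_{j=1}^{k}B_j(t)\Bigr)=0,\tag{9}$$

since the sum of the nonnegative coordinates of a point exceeds the number of its coordinates that are greater than 1, so a point with at least $m+1$ such coordinates has a coordinate sum greater than $m+1>t$.

By the inclusion-exclusion principle,

$$\begin{aligned}
F(t)=P(T\le t)&=\operatorname{Vol}\bigl(A_n(t)\cap C_n\bigr)
=\operatorname{Vol}\bigl(A_n(t)\bigr)-\operatorname{Vol}\bigl(A_n(t)\cap\overline{C_n}\bigr)\\
&=\operatorname{Vol}\bigl(A_n(t)\bigr)-\operatorname{Vol}\Bigl(\bigcup_{j=1}^{n}B_j(t)\Bigr)\\
&=\frac{t^{n}}{n!}-\sum_{k=1}^{m}(-1)^{k-1}\sum_{1\le j_1<j_2<\cdots<j_k\le n}\operatorname{Vol}\bigl(B_{j_1}(t)\cap B_{j_2}(t)\cap\cdots\cap B_{j_k}(t)\bigr)\\
&=\frac{t^{n}}{n!}-\sum_{k=1}^{m}(-1)^{k-1}\binom{n}{k}\operatorname{Vol}\bigl(B_1(t)\cap B_2(t)\cap\cdots\cap B_k(t)\bigr)\\
&=\frac{t^{n}}{n!}-\sum_{k=1}^{m}(-1)^{k-1}\binom{n}{k}\frac{(t-k)^{n}}{n!}
=\sum_{k=0}^{m}(-1)^{k}\binom{n}{k}\frac{(t-k)^{n}}{n!}.
\end{aligned}\tag{10}$$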

In (1), $F(n)$ is the Stirling number of the second kind with both parameters equal to $n$ and has numerical value 1 [39, pages 38-39]. If $t\ge n$, then $C_n\subseteq A_n(t)$, so $F(t)=1$ in this case. Since $F$ is a polynomial, $\sum_{k=0}^{n}(-1)^{k}\binom{n}{k}(t-k)^{n}/n!=1$ for all real-valued $t$. Introducing the unit step function gives (1), and differentiation with respect to $t$ gives (2).
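Both closing claims of the proof can be checked numerically. The sketch below first evaluates the full alternating sum, with $k$ running from 0 to $n$, in exact rational arithmetic to confirm that it equals 1 at arbitrary $t$, and then compares the inclusion-exclusion value of $F(t)$ with a Monte Carlo estimate of $P(T\le t)$. The function names, test points, trial count, and seed are assumptions of this illustration.

```python
import random
from fractions import Fraction
from math import comb, factorial


def full_alternating_sum(t, n):
    """Sum over k = 0..n of (-1)^k C(n, k) (t - k)^n / n!, computed exactly."""
    t = Fraction(t)
    total = sum((-1) ** k * comb(n, k) * (t - k) ** n for k in range(n + 1))
    return total / factorial(n)


def cdf_by_inclusion_exclusion(t, n):
    """F(t) for 0 <= t < n: the same sum truncated at m = floor(t)."""
    m = int(t)
    total = sum((-1) ** k * comb(n, k) * (t - k) ** n for k in range(m + 1))
    return total / factorial(n)


def cdf_by_simulation(t, n, trials=200_000, seed=3):
    """Fraction of simulated sums of n Uniform(0, 1) draws that are <= t."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() for _ in range(n)) <= t for _ in range(trials)
    )
    return hits / trials


print(full_alternating_sum(Fraction(27, 10), 4))  # exactly 1
print(cdf_by_inclusion_exclusion(1.7, 4), cdf_by_simulation(1.7, 4))
```

Exact fractions are used for the identity so that floating-point cancellation in the alternating sum cannot mask a discrepancy.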

3. Discussion and a Generalization

Figures 2 and 3 reveal the structure of the CDF

$$F(t)=\tfrac{1}{2}t^{2}s_0(t)-(t-1)^{2}s_1(t)+\tfrac{1}{2}(t-2)^{2}s_2(t)\tag{11}$$

for $n=2$. Figure 2 demonstrates how the hyperplane (line), which is the line of a constant sum of the values of the random variables and is perpendicular to the $n$-cube's (square's) main diagonal, accrues volume (area) below it. Figure 3 illustrates the regions that are included and excluded for various positions of the hyperplane (line) and how vertices are met in sets. For $n=2$, the binomial coefficients, which provide the counts of the vertices, are 1 for (0,0), 2 for (1,0) and (0,1), and 1 for (1,1), as seen in Figures 2 and 3. In (11), the first term is the area of the large triangle in Figures 3(a), 3(b), and 3(c); the second term is the sum of the areas of the two hatched triangles in Figure 3(b), where exactly one of $\{x_1,x_2\}$ is greater than 1, and in Figure 3(c); and the third term is the area of the crosshatched triangle in Figure 3(c), where both $x_1$ and $x_2$ are greater than 1.

Figure 2: The CDF $F(t)$ increases as $t$ increases.

Figure 3: Computing the CDF for $n=2$ for increasing values of $t$. (a) $0\le t<1$. (b) $1\le t<2$. (c) $t\ge 2$.

Figure 4 shows the same geometric interpretation for $n=3$. In its CDF

$$F(t)=\tfrac{1}{6}t^{3}s_0(t)-\tfrac{1}{2}(t-1)^{3}s_1(t)+\tfrac{1}{2}(t-2)^{3}s_2(t)-\tfrac{1}{6}(t-3)^{3}s_3(t),\tag{12}$$

the first term is the volume using (7) of the large orthogonal simplex in Figures 4(a), 4(b), and 4(c) with edges of length $t$. The second term is the sum of the volumes using (8) of the three orthogonal simplexes, where exactly one of $\{x_1,x_2,x_3\}$ is greater than 1. In Figure 4(b), the vertices $P_1$, $P_2$, $P_3$, and $P_4$ of the simplex with $x_1>1$ are labeled. Their coordinates are $P_1:(t,0,0)$, $P_2:(1,0,t-1)$, $P_3:(1,t-1,0)$, and $P_4:(1,0,0)$. The lengths of the edges $P_1P_4$, $P_2P_4$, and $P_3P_4$ are $t-1$. The third term of (12) is the sum of the three volumes using (8), where exactly two of $\{x_1,x_2,x_3\}$ are greater than 1. In Figure 4(c), the vertices are labeled $P_3$, $P_5$, $P_6$, and $P_7$ in the region where both $x_1$ and $x_2$ are greater than 1. Their coordinates are $P_3:(1,t-1,0)$, $P_5:(t-1,1,0)$, $P_6:(1,1,t-2)$, and $P_7:(1,1,0)$. The lengths of the edges $P_3P_7$, $P_5P_7$, and $P_6P_7$ are $t-2$. The fourth term is the region that is shared by all the other regions, analogous to the crosshatched region in Figure 3(c).
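The simplex volumes for $n=3$ can be recomputed from the vertex coordinates listed above. The sketch below uses the standard fact that a tetrahedron's volume is one sixth of the absolute determinant of its three edge vectors at a common vertex, and then assembles $F(t)$ for $2\le t<3$ from one large simplex, three simplexes with one coordinate above 1, and three with two coordinates above 1. The helper names and the sample value $t=2.5$ are assumptions of this illustration.

```python
def det3(u, v, w):
    """Determinant of the 3 x 3 matrix with rows u, v, w."""
    return (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )


def simplex_volume(corner, p, q, r):
    """Volume of the tetrahedron with vertices corner, p, q, r."""
    edges = [tuple(a - c for a, c in zip(vertex, corner)) for vertex in (p, q, r)]
    return abs(det3(*edges)) / 6.0


def cdf_n3_from_simplexes(t):
    """F(t) for n = 3 and 2 <= t < 3, assembled by inclusion-exclusion."""
    origin = (0.0, 0.0, 0.0)
    large = simplex_volume(origin, (t, 0.0, 0.0), (0.0, t, 0.0), (0.0, 0.0, t))
    # Simplex with x1 > 1: vertices P1, P2, P3 and right-angle corner P4.
    p1, p2, p3, p4 = (t, 0.0, 0.0), (1.0, 0.0, t - 1), (1.0, t - 1, 0.0), (1.0, 0.0, 0.0)
    one_above = simplex_volume(p4, p1, p2, p3)
    # Simplex with x1 > 1 and x2 > 1: vertices P3, P5, P6 and corner P7.
    p5, p6, p7 = (t - 1, 1.0, 0.0), (1.0, 1.0, t - 2), (1.0, 1.0, 0.0)
    two_above = simplex_volume(p7, p3, p5, p6)
    return large - 3 * one_above + 3 * two_above


t = 2.5
print(cdf_n3_from_simplexes(t))
```

By the symmetry of the distribution about 3/2, the printed value should equal $1-F(0.5)=1-0.5^{3}/6$.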

Figure 4: Computing the CDF for $n=3$ for increasing values of $t$. (a) $0\le t<1$. (b) $1\le t<2$. (c) $2\le t<3$.

In the same way, for any $n$, the terms are the $n$-volumes of orthogonal $n$-simplexes, whose multiplicity is counted by binomial coefficients determined by the number of vertices of the $n$-cube in sets as the “moving” $(n-1)$-dimensional hyperplane “passes” them as $t$ increases. The hyperplane is perpendicular to the diagonal line $x_1=x_2=x_3=\cdots=x_n$. The volumes of the simplexes are computed using (7) and (8).

The website [40] has a free simulator for $T$, where selecting $n$ yields the PDF (2). Other calculators are at [41, 42].

The method of proof in Section 2 can be extended to linear combinations of uniform random variables on different intervals. Suppose that $X_1,X_2,\ldots,X_n$ are independent, that $X_k$ is uniformly distributed on the interval $[a_k,b_k]$, and that $c_1,c_2,\ldots,c_n$ are real constants. Then,

$$P\Bigl(\sum_{k=1}^{n}c_kX_k\le t\Bigr)=P\Bigl(\sum_{k=1}^{n}d_kY_k\le t'\Bigr),\tag{13}$$

where

$$Y_k=\frac{X_k-a_k}{b_k-a_k},\qquad d_k=c_k(b_k-a_k),\qquad t'=t-\sum_{k=1}^{n}a_kc_k.\tag{14}$$

Then, $Y_1,Y_2,\ldots,Y_n$ are independent uniform random variables on $[0,1]$, and $P\bigl(\sum_{k=1}^{n}c_kX_k\le t\bigr)$ can be interpreted as the hypervolume of the solid that consists of all points that lie inside the unit hypercube $[0,1]^n$ and on one side of the hyperplane $\sum_{k=1}^{n}d_ky_k=t'$. Now, proceed by inclusion-exclusion as in Section 2. In general, the formula for $P\bigl(\sum_{k=1}^{n}c_kX_k\le t\bigr)$ is complicated because of the lack of symmetry that is caused by the presence of $d_1,d_2,\ldots,d_n$. This increases the number of cases and removes the congruence of the solids of each size whose hypervolumes need to be added or subtracted at each stage of the inclusion-exclusion process. Nevertheless, the correct distribution is obtained in this manner. A special case in which these problems disappear is $d_1=d_2=\cdots=d_n=d$, so that

$$P\Bigl(\sum_{k=1}^{n}c_kX_k\le t\Bigr)=\begin{cases}F\!\left(\dfrac{t'}{d}\right) & \text{for } d>0,\\[2ex] 1-F\!\left(\dfrac{t'}{d}\right) & \text{for } d<0,\end{cases}\tag{15}$$

where $F$ is given in (1).
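The equal-scale special case lends itself to a short numerical sketch. The code rescales each variable to the unit interval, checks that every product $c_k(b_k-a_k)$ takes a common value $d$, shifts $t$ by the sum of the $a_kc_k$, and evaluates the Irwin-Hall CDF at the shifted value divided by $d$, taking the complement when $d<0$. The function names, the example intervals and coefficients, and the Monte Carlo settings are all assumptions of this illustration.

```python
import math
import random
from math import comb, factorial


def irwin_hall_cdf(t, n):
    """CDF of the sum of n independent Uniform(0, 1) variables."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0
    total = sum(
        (-1) ** i * comb(n, i) * (t - i) ** n for i in range(int(t) + 1)
    )
    return total / factorial(n)


def equal_scale_cdf(t, intervals, coeffs):
    """P(sum c_k X_k <= t) when every c_k * (b_k - a_k) equals a common d."""
    scales = [c * (b - a) for (a, b), c in zip(intervals, coeffs)]
    d = scales[0]
    if d == 0 or not all(math.isclose(s, d) for s in scales):
        raise ValueError("all c_k * (b_k - a_k) must share one nonzero value")
    shifted = t - sum(a * c for (a, _), c in zip(intervals, coeffs))
    base = irwin_hall_cdf(shifted / d, len(intervals))
    return base if d > 0 else 1.0 - base


def simulated_cdf(t, intervals, coeffs, trials=200_000, seed=2):
    """Monte Carlo estimate of P(sum c_k X_k <= t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        value = sum(c * rng.uniform(a, b) for (a, b), c in zip(intervals, coeffs))
        hits += value <= t
    return hits / trials


intervals = [(0.0, 2.0), (1.0, 5.0), (-1.0, 0.0)]
positive = [1.0, 0.5, 2.0]    # every product c_k * (b_k - a_k) equals 2
negative = [-1.0, -0.5, -2.0]  # every product equals -2
print(equal_scale_cdf(1.0, intervals, positive), simulated_cdf(1.0, intervals, positive))
print(equal_scale_cdf(-1.0, intervals, negative), simulated_cdf(-1.0, intervals, negative))
```

Because the second linear combination is the negative of the first, the two exact probabilities printed above add to 1.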

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, vol. 2, 2nd edition, John Wiley & Sons, New York, NY, USA, 1995.
[2] K. R. Koch, Introduction to Bayesian Statistics, 2nd edition, Springer, Berlin, Germany, 2010.
[3] C. P. Quesenberry, “Probability integral transformations,” in Encyclopedia of Statistical Sciences, S. Kotz, N. L. Johnson, and C. B. Read, Eds., vol. 7, pp. 225–231, Wiley, New York, NY, USA, 1986.
[4] V. K. Rohatgi, An Introduction to Probability Theory and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1976.
[5] J. O. Berger, Statistical Decision Theory and Bayesian Analysis, Springer Series in Statistics, 2nd edition, Springer, New York, NY, USA, 1985. doi:10.1007/978-1-4757-4286-2
[6] P. C. Silva, J. O. Cerdeira, M. J. Martins, and T. Monteiro-Henriques, “Data depth for the uniform distribution,” Environmental and Ecological Statistics, vol. 21, no. 1, pp. 27–39, 2014. doi:10.1007/s10651-013-0242-7
[7] K. Jayakumar and K. K. Sankaran, “On a generalization of uniform distribution and its properties,” Statistica, vol. 76, no. 1, pp. 83–91, 2016.
[8] C. P. Dettmann and M. K. Roychowdhury, “Quantization for uniform distributions on equilateral triangles,” Real Analysis Exchange, vol. 42, no. 1, pp. 149–166, 2017.
[9] E. S. Pearson, The History of Statistics in the 17th and 18th Centuries, Against the Changing Background of Intellectual, Scientific and Religious Thought, Lectures by Karl Pearson given at University College London during the Academic Sessions, Macmillan, New York, NY, USA, 1978.
[10] O. B. Sheynin, “Finite random sums (a historical essay),” Archive for History of Exact Sciences, vol. 9, no. 4-5, pp. 275–305, 1972/1973. doi:10.1007/BF00348365
[11] H. Cramer, Mathematical Methods of Statistics, Princeton, Princeton, NJ, USA, 1946.
[12] S. K. Mitra and S. N. Banerjee, “On the probability distribution of round-off errors propagated in tabular differences,” The Australian Computer Journal, vol. 3, no. 2, pp. 60–68, 1971.
[13] J. O. Irwin, “On the frequency distribution of the means of samples from a population having any law of frequency with finite moments, with special reference to Pearson's Type II,” Biometrika, vol. 19, no. 3/4, pp. 225–239, 1927. doi:10.2307/2331960
[14] A. N. Lowan and J. Laderman, “On the distribution of errors in Nth tabular differences,” The Annals of Mathematical Statistics, vol. 10, no. 4, pp. 360–364, 1939. doi:10.1214/aoms/1177732147
[15] A. Stuart and J. K. Ord, Kendall’s Advanced Theory of Statistics, vol. 1, 5th edition, Oxford, New York, NY, USA, 1987.
[16] V. M. Kruglov, “On one identity for distribution of sums of independent random variables,” Theory of Probability and its Applications, vol. 58, no. 2, pp. 329–331, 2014. doi:10.1137/S0040585X97986576
[17] H. Potuschak and W. G. Müller, “More on the distribution of the sum of uniform random variables,” Statistical Papers, vol. 50, no. 1, pp. 177–183, 2009. doi:10.1007/s00362-007-0050-y
[18] E. G. Olds, “A note on the convolution of uniform distributions,” Annals of Mathematical Statistics, vol. 23, no. 2, pp. 282–285, 1952. doi:10.1214/aoms/1177729446
[19] S. K. Mitra, “On the probability distribution of the sum of uniformly distributed random variables,” SIAM Journal on Applied Mathematics, vol. 20, no. 2, pp. 195–198, 1971. doi:10.1137/0120026
[20] D. M. Bradley and R. C. Gupta, “On the distribution of the sum of n non-identically distributed uniform random variables,” Annals of the Institute of Statistical Mathematics, vol. 54, no. 3, pp. 689–700, 2002. doi:10.1023/A:1022483715767
[21] S. M. Sadooghi-Alvandi, A. R. Nematollahi, and R. Habibi, “On the distribution of the sum of independent uniform random variables,” Statistical Papers, vol. 50, no. 1, pp. 171–175, 2009. doi:10.1007/s00362-007-0049-4
[22] H. Murakami, “A saddlepoint approximation to the distribution of the sum of independent non-identically uniform random variables,” Statistica Neerlandica, vol. 68, no. 4, pp. 267–275, 2014. doi:10.1111/stan.12032
[23] G. S. Lo, H. Sangare, and C. Ndiaye, “A review on asymptotic normality of sums of associated random variables,” Afrika Statistika, vol. 11, no. 1, pp. 855–867, 2016. doi:10.16929/as/2016.855.79
[24] D. L. Barrow and P. W. Smith, “Classroom Notes: spline notation applied to a volume problem,” American Mathematical Monthly, vol. 86, no. 1, pp. 50–51, 1979. doi:10.2307/2320304
[25] P. Hall, “The distribution of means for samples of size N drawn from a population in which the variate takes values between 0 and 1, all such values being equally probable,” Biometrika, vol. 19, no. 3/4, pp. 240–245, 1927. doi:10.2307/2331961
[26] S. A. Roach, “The frequency distribution of the sample mean where each member of the sample is drawn from a different rectangular distribution,” Biometrika, vol. 50, pp. 508–513, 1963. doi:10.1093/biomet/50.3-4.508
[27] G. E. Bates, “Joint distributions of time intervals for the occurrence of successive accidents in a generalized Polya scheme,” Annals of Mathematical Statistics, vol. 26, no. 4, pp. 705–720, 1955. doi:10.1214/aoms/1177728429
[28] H. L. Gray and P. L. Odell, “On sums and products of rectangular variates,” Biometrika, vol. 53, no. 3-4, pp. 615–617, 1966. doi:10.1093/biomet/53.3-4.615
[29] R. V. Hogg, J. W. McKean, and A. T. Craig, Introduction to Mathematical Statistics, 6th edition, Pearson-Prentice Hall, Upper Saddle River, NJ, USA, 2005.
[30] J. P. Hoyt, “The teacher's corner: a simple approximation to the standard normal probability density function,” American Statistician, vol. 22, no. 2, pp. 25–26, 1968. doi:10.1080/00031305.1968.10480455
[31] G. Allasia, “Approximation of the normal distribution function by means of a spline function,” Statistica, vol. 41, no. 2, pp. 325–332, 1981.
[32] J. K. Patel and C. B. Read, Handbook of the Normal Distribution, Statistics: Textbooks and Monographs, 2nd edition, Marcel Dekker, New York, NY, USA, 1982.
[33] M. S. Muminov and K. Soatov, “A note on spline estimator of unknown probability density function,” Open Journal of Statistics, vol. 1, no. 3, pp. 157–160, 2011. doi:10.4236/ojs.2011.13019
[34] M. S. Muminov and K. S. Soatov, “On the approximation of maximum deviation spline estimation of the probability density Gaussian process,” Open Journal of Statistics, vol. 5, no. 4, pp. 334–339, 2015. doi:10.4236/ojs.2015.54034
[35] C. Lee, S. Kim, and J. Jeong, “A view on the validity of central limit theorem: an empirical study using random samples from uniform distribution,” Communications for Statistical Applications and Methods, vol. 21, no. 6, pp. 539–559, 2014. doi:10.5351/CSAM.2014.21.6.539
[36] L. Heinrich, F. Pukelsheim, and V. Wachtel, “The variance of the discrepancy distribution of rounding procedures, and sums of uniform random variables,” Metrika, vol. 80, no. 3, pp. 363–375, 2017. doi:10.1007/s00184-017-0609-0
[37] E. Rio, “Exponential inequalities for weighted sums of bounded random variables,” Electronic Communications in Probability, vol. 20, no. 77, pp. 1–10, 2015. doi:10.1214/ECP.v20-4204
[38] P. Stein, “Classroom Notes: a note on the volume of a simplex,” American Mathematical Monthly, vol. 73, no. 3, pp. 299–301, 1966. doi:10.2307/2315353
[39] C. L. Liu, Introduction to Combinatorial Mathematics, McGraw-Hill, New York, NY, USA, 1968.
[40] 2017, http://www.math.uah.edu/stat/special/IrwinHall.html
[41] 2017, http://www.distributome.org/V3/calc/IrwinHallCalculator.html
[42] 2017, http://randomservices.org/distributions/IrwinHall/Calculator.html