We investigate the limiting distribution of the size of the binary interval tree. Our approach is based on the contraction method and differs substantially from the one used for one-sided binary interval trees. First, we establish a distributional recursive equation for the size. Then, we derive the expectation, the variance, and some higher-order moments. Finally, we show that the size, suitably standardized, converges to the standard normal random variable in the Zolotarev metric space.
1. Introduction
Random trees arise in combinatorics and in the analysis of algorithms in computer science. There are many kinds of random trees with different structures, such as recursive trees, search trees, binary trees, and interval trees. The asymptotic probabilistic behavior of random variables defined on random trees has attracted growing attention and has become an active research area. Drmota [1] introduced several labelled and unlabelled random trees in his book. Devroye and Janson [2] studied the protected nodes in several random trees. Feng and Hu [3] studied the phase changes of scale-free trees. The limiting laws for the height, size, and subtrees of binary search trees were also considered (see [4–6]). Other researchers investigated the Zagreb index and the nodes of random recursive trees (see [7–9]).
The binary interval tree is a random structure that underlies the random division of a line interval and parking problems. It has recently become a popular subject. Sibuya and Itoh [10] showed that the numbers of internal and external nodes at different levels of the binary interval tree are asymptotically normal; the asymptotic normality of the size of the tree, however, cannot be deduced directly from this result. Prodinger [11] looked into various parameters of the incomplete trie, a one-sided version of a random tree with a digital flavor. Fill et al. [12] followed with a study of the nonexistence of a limit distribution for the height of the incomplete trie. Itoh and Mahmoud [13] considered five incomplete one-sided variants of binary interval trees and proved that their sizes all converge to normal random variables. Janson [14] obtained the same result for a larger class of one-sided interval trees by renewal theory, and one kind of fragmentation tree was discussed by Janson and Neininger [15]. Javanian et al. [16] investigated the paths in m-ary interval trees. Su et al. [17] studied complete binary interval trees and obtained a law of large numbers. In addition, Pan et al. [18] considered a construction algorithm for binary interval trees.
The binary interval tree is a tree associated with repeated divisions of a line interval of length $x$. The division process is as follows. If $x<1$, no division takes place, and the associated interval tree consists of a single terminal node. If $x\ge 1$, we begin with the interval $(0,x)$ and divide it into two subintervals by choosing $U_x$, a point uniformly distributed over $(0,x)$. This gives the two intervals $(0,U_x)$ and $(U_x,x)$. Each subinterval of length at least 1 is in turn divided at a point chosen uniformly over it, which produces two smaller subintervals as before. A subinterval of length less than 1 is not divided further. The process is repeated until every interval (or subinterval) has length less than 1.
We take x=4, for instance. Figures 1(a) and 1(b) show how the above random division process of interval generates a binary interval tree.
Figure 1: (a) The division process. (b) The binary interval tree.
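The division process described above can be sketched in a few lines of Python. This is only an illustrative simulation, not code from the paper: the function name `interval_tree_size`, the explicit stack, and the seed are assumptions made for the example. Every interval, divided or not, is counted as one node.

```python
import random

def interval_tree_size(x, rng):
    """Count the nodes of one random binary interval tree built from an interval of length x."""
    size = 0
    stack = [x]                      # lengths of intervals that still have to be visited
    while stack:
        length = stack.pop()
        size += 1                    # each interval is one node of the tree
        if length >= 1:              # intervals shorter than 1 are terminal nodes
            cut = rng.uniform(0.0, length)
            stack.append(cut)
            stack.append(length - cut)
    return size

rng = random.Random(2024)            # hypothetical seed, chosen only for reproducibility
sizes = [interval_tree_size(4.0, rng) for _ in range(5)]
print(sizes)
```

Since every divided interval has exactly two children, each simulated size is odd. For x = 4 the terminal intervals have total length 4 and each is shorter than 1, so there are at least five of them and the size is at least 9.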
If additional conditions are imposed and the intervals satisfying them are not divided further (see [13, 14]), we obtain various incomplete interval trees. In particular, if only one subinterval of every interval is divided, the resulting tree is the so-called one-sided interval tree (see [13]).
The interval tree reflects many properties of random division, so it gives rise to many interesting probabilistic questions. For example, for $x>0$, the height of the interval tree, denoted by $H_x$, is the greatest level of all subintervals after the divisions, and the total number of nodes of the interval tree is the total number of intervals produced by the random division process. Let $S_x$ be the size of the binary interval tree, that is, its total number of nodes. Our aim is to investigate the random variable $S_x$.
In this paper, we investigate the central limit theorem for the size of binary interval trees. Because the moment generating function of $S_x$ is difficult to compute, our method is completely different from the one used for one-sided interval trees. In Section 2, we establish a distributional recursive equation for $S_x$ and give the expectation, the variance, and some higher-order moments of $S_x$. In Section 3, via the contraction method, the limit law of $S_x$ is shown to be the unique solution of a fixed-point distributional equation in the Zolotarev metric space. Finally, we show that $S_x$, suitably standardized, converges to a normal limiting random variable as $x\to\infty$.
2. The Moments of Sx
Compared with one-sided interval trees, binary interval trees are considerably more complex. In particular, it is difficult to obtain the moment generating function of $S_x$, so the method used for one-sided interval trees (see [13]) is no longer applicable. Instead, we establish a distributional recursive equation for $S_x$, from which the expectation and the variance of $S_x$ can be calculated. Furthermore, we find that the fourth central moment of $S_x$ is of order $O(x^2)$ as $x$ goes to infinity.
From the definition of the binary interval tree, it is easy to see that $S_1=3$ and $S_x=1$ for $x<1$. To investigate the case $x\ge 1$, let $U_x$ denote the point chosen uniformly from the interval $(0,x)$; hence, $U_x\sim U(0,x)$. For any fixed real number $0<u<x$, given $U_x=u$, we denote by $S_u^{(1)}$ the size of the left subtree associated with the interval $(0,u)$. Correspondingly, $S_{x-u}^{(2)}$ denotes the size of the right subtree associated with the interval $(u,x)$. According to the rule of division, $S_u^{(1)}$ and $S_{x-u}^{(2)}$ are mutually independent. Thus, we have
$$(S_x\mid U_x=u)\stackrel{d}{=}1+S_u^{(1)}+S_{x-u}^{(2)},\quad \forall\, 0<u<x. \tag{1}$$
This formula means that, given $U_x=u$, $S_x$ has the same distribution as $1+S_u^{(1)}+S_{x-u}^{(2)}$. Equivalently, we can rewrite the above formula as
$$S_x\stackrel{d}{=}1+S_{U_x}^{(1)}+S_{x-U_x}^{(2)}. \tag{2}$$
Define
$$m_1(x):=E[S_x];\qquad m_2(x):=E[S_x^2]. \tag{3}$$
It is easy to see that
$$m_1(1)=E[S_1]=3;\qquad m_2(1)=9;\qquad m_1(x)=m_2(x)=1,\quad 0<x<1. \tag{4}$$
From the distributional recursive equation (2) and the above boundary conditions, Su et al. [17] calculated the expectation $E[S_x]$ and the variance $\mathrm{Var}[S_x]$ for any $x\ge 0$.
Lemma 1.
Let $S_x$ be the size of a binary interval tree. Then
$$E[S_x]=m_1(x)=4x-1,\quad x\ge 1. \tag{5}$$
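As a plausibility check of Lemma 1, one can take expectations in (2) and condition on the first division point, which suggests the integral equation m1(x) = 1 + (2/x) ∫_0^x m1(u) du for x ≥ 1. This intermediate equation is not written out in the text and is an assumption of the sketch below, as are the helper names and the midpoint rule used for the integral. The code checks numerically that the closed form 4x − 1, together with m1 = 1 on (0, 1), satisfies this equation.

```python
def m1(x):
    """Closed form from Lemma 1 for x >= 1, and the boundary value 1 on (0, 1)."""
    return 4.0 * x - 1.0 if x >= 1.0 else 1.0

def integral_m1(x, steps=100_000):
    """Midpoint-rule approximation of the integral of m1 over (0, x)."""
    h = x / steps
    return h * sum(m1((i + 0.5) * h) for i in range(steps))

def recursion_rhs(x):
    """Right-hand side 1 + (2/x) * integral of m1 over (0, x)."""
    return 1.0 + 2.0 / x * integral_m1(x)

for x in (1.0, 2.5, 10.0):
    print(x, m1(x), recursion_rhs(x))
```

The two printed columns should agree up to the discretization error of the midpoint rule, which is concentrated in the single cell containing the jump of m1 at x = 1.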
Lemma 2.
Let $S_x$ be the size of a binary interval tree. Then
$$\mathrm{Var}[S_x]=\begin{cases}32x\ln x-16x^2+8x+8, & 1\le x\le 2,\\ (32\ln 2-20)\,x, & x\ge 2.\end{cases} \tag{6}$$
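Lemmas 1 and 2 can also be checked by simulation. The following Monte Carlo sketch is illustrative only: the simulator `interval_tree_size`, the sample size, the choice x = 6, and the seed are assumptions made for the example, not part of the paper.

```python
import math
import random

def interval_tree_size(x, rng):
    """Number of nodes of one random binary interval tree for an interval of length x."""
    size, stack = 0, [x]
    while stack:
        length = stack.pop()
        size += 1
        if length >= 1:                       # only intervals of length >= 1 are divided
            cut = rng.uniform(0.0, length)
            stack.extend((cut, length - cut))
    return size

def var_formula(x):
    """Variance of S_x from Lemma 2 (x >= 1)."""
    if x <= 2.0:
        return 32 * x * math.log(x) - 16 * x * x + 8 * x + 8
    return (32 * math.log(2) - 20) * x

rng = random.Random(7)                        # hypothetical seed
x, n = 6.0, 40_000
sample = [interval_tree_size(x, rng) for _ in range(n)]
mean = sum(sample) / n
var = sum(s * s for s in sample) / n - mean ** 2

print("mean    ", mean, "vs", 4 * x - 1)
print("variance", var, "vs", var_formula(x))
```

The two branches of the variance formula agree at x = 2, where both give 64 ln 2 − 40.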
In order to prove that the asymptotic distribution of $S_x$ is normal, we also need the order of $E(S_x-E[S_x])^4$ as $x\to\infty$. The following proposition gives the order of the fourth central moment of $S_x$.
Proposition 3.
Let $S_x$ be the size of a binary interval tree. Then
$$E\big(S_x-E[S_x]\big)^4=O(x^2),\quad x\to\infty. \tag{7}$$
Proof.
See the appendix.
3. The CLT for Sx
In this section, we prove the asymptotic normality of $S_x$ as $x\to\infty$. The main tool is the contraction method, which requires suitable probability metrics, in particular the Zolotarev metrics (see [19]).
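First we introduce the Zolotarev metrics. Denote the distribution of the random variable $X$ by $\mathcal{L}(X)$. Let $D$ be the set of the distributions of all real random variables, and define
$$D^*=\Big\{F: F\in D,\ \int_{\mathbb{R}}x\,dF(x)=0,\ \int_{\mathbb{R}}x^2\,dF(x)=1,\ \int_{\mathbb{R}}|x|^3\,dF(x)<\infty\Big\}. \tag{8}$$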
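It can be verified that a random variable $Z$ with $\mathcal{L}(Z)=N(0,\sigma^2)$ satisfies the following identity, where $\bar{Z}$ is an independent copy of $Z$. For any $u\in[0,1]$,
$$Z\stackrel{d}{=}Z\sqrt{u}+\bar{Z}\sqrt{1-u}, \tag{9}$$
and more generally, we have the following lemma.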
Lemma 4.
If $Z$ and $\bar{Z}$ are standard normal random variables, $U$ is uniformly distributed over the interval $[0,1]$, and $U$, $Z$, $\bar{Z}$ are mutually independent, then one has
$$Z\stackrel{d}{=}Z\sqrt{U}+\bar{Z}\sqrt{1-U}. \tag{10}$$
Proof.
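In fact, for any $u\in[0,1]$, we have
$$E\exp\big\{it\big(\sqrt{u}\,Z+\sqrt{1-u}\,\bar{Z}\big)\big\}=E\,e^{it\sqrt{u}Z}\,E\,e^{it\sqrt{1-u}\bar{Z}}=\exp\Big(-\frac{ut^2}{2}\Big)\exp\Big(-\frac{(1-u)t^2}{2}\Big)=\exp\Big(-\frac{t^2}{2}\Big). \tag{11}$$
Therefore,
$$E\exp\big\{it\big(\sqrt{U}\,Z+\sqrt{1-U}\,\bar{Z}\big)\big\}=\int_0^1E\exp\big\{it\big(\sqrt{u}\,Z+\sqrt{1-u}\,\bar{Z}\big)\big\}\,du=\int_0^1\exp\Big(-\frac{t^2}{2}\Big)du=\exp\Big(-\frac{t^2}{2}\Big). \tag{12}$$
Moreover, within the set $D^*$, the standard normal distribution $N(0,1)$ is the only distribution satisfying (10).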
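The identity (10) is easy to illustrate numerically. The sketch below is a Monte Carlo check under assumed parameters (sample size and seed are arbitrary choices for the example): it draws independent U, Z, and a second standard normal variable, forms the mixture on the right-hand side of (10), and compares its first, second, and fourth moments with those of the standard normal distribution (0, 1, and 3).

```python
import math
import random

rng = random.Random(11)              # hypothetical seed
n = 200_000

mix = []
for _ in range(n):
    u = rng.random()                 # U uniform on [0, 1)
    z = rng.gauss(0.0, 1.0)          # Z standard normal
    z_bar = rng.gauss(0.0, 1.0)      # independent standard normal copy
    mix.append(math.sqrt(u) * z + math.sqrt(1.0 - u) * z_bar)

mean = sum(mix) / n
var = sum(v * v for v in mix) / n - mean ** 2
fourth = sum(v ** 4 for v in mix) / n

print(mean, var, fourth)
```

Matching moments do not prove equality in distribution; the proof above does that through characteristic functions. The simulation only confirms that the mixture behaves as the lemma predicts.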
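Suppose that $m$ is a nonnegative integer. Denote by $\mathcal{F}^{(m)}$ the set of all real functions defined on the real line that are $m$ times continuously differentiable. Let
$$\mathcal{F}_\alpha^{(m)}:=\big\{f: f\in\mathcal{F}^{(m)},\ |f^{(m)}(x)-f^{(m)}(y)|\le|x-y|^\alpha\big\}, \tag{13}$$
where $0<\alpha\le 1$ is a fixed real number. Let $s=m+\alpha$ and
$$\zeta_s(X,Y):=\zeta_s\big(\mathcal{L}(X),\mathcal{L}(Y)\big)=\sup\big\{|E f(X)-E f(Y)|: f\in\mathcal{F}_\alpha^{(m)}\big\}; \tag{14}$$
then $\zeta_s$ is the Zolotarev metric of order $s$ on the set $D$. According to the properties of the Zolotarev metric, we know that
$$\zeta_s(X,Y)<\infty\iff E|X|^s+E|Y|^s<\infty,\quad E[X^k]=E[Y^k],\ k=1,\ldots,m. \tag{15}$$
Therefore, we can choose $\zeta_3$ as the metric on the subset $D^*$ (see [20, 21]); that is, $m=2$ and $\alpha=1$. This is because, for any $\mathcal{L}(X)\in D^*$ and $\mathcal{L}(Y)\notin D^*$, we have $\zeta_3(X,Y)=\infty$, whereas if $\mathcal{L}(X),\mathcal{L}(Y)\in D^*$, then $\zeta_3(X,Y)<\infty$.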
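The metric $\zeta_s(X,Y)$ has the following properties (see [20]):
For any constant $c>0$,
$$\zeta_s(cX,cY)=c^s\,\zeta_s(X,Y); \tag{16}$$
if the random variables $Y$ and $(X_1,X_2)$ are mutually independent, then
$$\zeta_s(X_1+Y,X_2+Y)\le\zeta_s(X_1,X_2); \tag{17}$$
for random variables $X$ and $\{X_n,\ n=1,2,3,\ldots\}$,
$$\zeta_s(X_n,X)\to 0\implies X_n\stackrel{d}{\to}X. \tag{18}$$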
Now, we begin to prove the main result in this paper.
Theorem 5.
Let $S_x$ be the size of a binary interval tree. Then, as $x\to\infty$,
$$\frac{S_x-E[S_x]}{\sqrt{\mathrm{Var}[S_x]}}\stackrel{d}{\to}N(0,1). \tag{19}$$
Proof.
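Denote
$$S_x^*:=\frac{S_x-(4x-1)}{\sqrt{(32\ln 2-20)x}},\quad x>0;\qquad h(x):=\sqrt{\frac{32x\ln x-16x^2+8x+8}{(32\ln 2-20)x}},\quad x>0. \tag{20}$$
Then from Lemmas 1 and 2, we know that
$$S_x^*=\begin{cases}\dfrac{S_x-E[S_x]}{\sqrt{\mathrm{Var}[S_x]}}, & x\ge 2,\\[2mm] \dfrac{S_x-E[S_x]}{\sqrt{\mathrm{Var}[S_x]}}\cdot h(x), & 1\le x<2,\\[2mm] \dfrac{2-4x}{\sqrt{(32\ln 2-20)x}}, & 0<x<1.\end{cases} \tag{21}$$
So, we have $\mathcal{L}(S_x^*)\in D^*$ for $x\ge 2$ and $\mathcal{L}(S_x^*)\notin D^*$ for $0<x<2$.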
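According to the corresponding inequality in [21], for any $\mathcal{L}(X)\in D^*$ and $\mathcal{L}(Y)\in D^*$,
$$\zeta_3(X,Y)\le\frac{\Gamma(2)}{\Gamma(4)}\int_{\mathbb{R}}|t|^3\,d\big|P(X<t)-P(Y<t)\big|, \tag{22}$$
where $\Gamma$ is the gamma function. Assume that the distribution of the random variable $Z$ is $N(0,1)$. It follows from Proposition 3 that
$$\sup_{x\ge 4}E\big[(S_x^*)^4\big]<\infty. \tag{23}$$
Therefore, there exists a constant $C>0$ such that
$$\sup_{x\ge 4}\zeta_3(S_x^*,Z)\le C\sup_{x\ge 4}\big(E|S_x^*|^3+E|Z|^3\big)<\infty. \tag{24}$$
Denote
$$a(x):=\zeta_3\big(\mathcal{L}(S_x^*),\Phi\big)=\zeta_3(S_x^*,Z), \tag{25}$$
where $\Phi$ is the standard normal distribution and $Z$ is a standard normal random variable; then we can see that
$$0\le b:=\limsup_{x\to\infty}a(x)<\infty. \tag{26}$$
Now, we only need to prove that $b=0$; the theorem then follows.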
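Suppose that $x\ge 4$; by (A.1) and (21), we have
$$\begin{aligned}(S_x^*\mid U_x=t)&=\left(\frac{S_x-(4x-1)}{\sqrt{(32\ln 2-20)x}}\,\Big|\,U_x=t\right)\\&\stackrel{d}{=}\begin{cases}\dfrac{S_t^{(1)}-E[S_t^{(1)}]}{\sqrt{(32\ln 2-20)x}}+\dfrac{S_{x-t}^{(2)}-E[S_{x-t}^{(2)}]}{\sqrt{(32\ln 2-20)x}}, & 2\le t\le x-2,\\[2mm]\dfrac{S_t^{(1)}-E[S_t^{(1)}]}{\sqrt{(32\ln 2-20)x}}+\dfrac{S_{x-t}^{(2)}-E[S_{x-t}^{(2)}]}{\sqrt{(32\ln 2-20)x}}, & 1\le t<2,\\[2mm]\dfrac{S_t^{(1)}-E[S_t^{(1)}]}{\sqrt{(32\ln 2-20)x}}+\dfrac{S_{x-t}^{(2)}-E[S_{x-t}^{(2)}]}{\sqrt{(32\ln 2-20)x}}, & x-2<t\le x-1,\\[2mm]\dfrac{S_{x-t}^{(2)}-E[S_{x-t}^{(2)}]-(4t-2)}{\sqrt{(32\ln 2-20)x}}, & 0<t<1,\\[2mm]\dfrac{S_t^{(1)}-E[S_t^{(1)}]-(4(x-t)-2)}{\sqrt{(32\ln 2-20)x}}, & x-1<t<x\end{cases}\\&\stackrel{d}{=}\begin{cases}S_t^*\sqrt{\dfrac{t}{x}}+\tilde{S}_{x-t}^*\sqrt{\dfrac{x-t}{x}}, & 2\le t\le x-2;\\[2mm]S_t^*\,h(t)\sqrt{\dfrac{t}{x}}+\tilde{S}_{x-t}^*\sqrt{\dfrac{x-t}{x}}, & 1\le t<2,\\[2mm]S_t^*\sqrt{\dfrac{t}{x}}+\tilde{S}_{x-t}^*\,h(x-t)\sqrt{\dfrac{x-t}{x}}, & x-2<t\le x-1,\\[2mm]\tilde{S}_{x-t}^*\sqrt{\dfrac{x-t}{x}}-\dfrac{4t-2}{\sqrt{(32\ln 2-20)x}}, & 0<t<1;\\[2mm]S_t^*\sqrt{\dfrac{t}{x}}-\dfrac{4(x-t)-2}{\sqrt{(32\ln 2-20)x}}, & x-1<t<x,\end{cases}\end{aligned} \tag{27}$$
where $U_x=t$ is the first point chosen from the interval $(0,x)$ and $\{\tilde{S}_x^*,\ x>0\}$ is an independent copy of $\{S_x^*,\ x>0\}$.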
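If we denote $U:=U_x/x$, then $U\sim U(0,1)$ and we can rewrite the above formula as
$$(S_x^*\mid U=u)=\left(\frac{S_x-(4x-1)}{\sqrt{(32\ln 2-20)x}}\,\Big|\,U=u\right) \tag{28}$$
$$\stackrel{d}{=}\begin{cases}S_{ux}^*\sqrt{u}+\tilde{S}_{(1-u)x}^*\sqrt{1-u}, & \dfrac{2}{x}\le u\le 1-\dfrac{2}{x};\\[2mm]S_{ux}^*\,h(ux)\sqrt{u}+\tilde{S}_{(1-u)x}^*\sqrt{1-u}, & \dfrac{1}{x}\le u<\dfrac{2}{x},\\[2mm]S_{ux}^*\sqrt{u}+\tilde{S}_{(1-u)x}^*\,h((1-u)x)\sqrt{1-u}, & 1-\dfrac{2}{x}<u\le 1-\dfrac{1}{x},\\[2mm]\tilde{S}_{(1-u)x}^*\sqrt{1-u}-\dfrac{4ux-2}{\sqrt{(32\ln 2-20)x}}, & 0<u<\dfrac{1}{x};\\[2mm]S_{ux}^*\sqrt{u}-\dfrac{4(1-u)x-2}{\sqrt{(32\ln 2-20)x}}, & 1-\dfrac{1}{x}<u<1.\end{cases} \tag{29}$$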
According to the definitions of $D^*$ and $\zeta_3$, one finds that $(S_x^*\mid U_x=t)\in D^*$ for $2<t<x-2$ and $S_x^*\in D^*$. If we define $S_x':=(S_x^*\mid U_x<2 \text{ or } U_x>x-2)$, then we can also see that $S_x'\in D^*$. Furthermore, $E\big[(S_x')^4\big]\le C_1$ for some positive constant $C_1$, by conditioning on $U_x$ and using a calculation similar to that in the appendix. Hence,
$$\zeta_3(S_x',Z)\le\beta \tag{30}$$
for some positive constant $\beta$.
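As pointed out above, the standard normal distribution is the only distribution in the set $D^*$ satisfying (10). From (25), (14), and Lemma 4, for $x>4$, we have
$$\begin{aligned}a(x)&=\zeta_3(S_x^*,Z)\le\zeta_3(S_x',Z)\cdot\frac{4}{x}+\int_{2/x}^{1-2/x}\zeta_3\big((S_x^*\mid U=u),Z\big)\,du\\&\le\frac{4\beta}{x}+\int_{2/x}^{1-2/x}\zeta_3\big(S_{xu}^*\sqrt{u}+\tilde{S}_{x(1-u)}^*\sqrt{1-u},\ Z\sqrt{u}+\bar{Z}\sqrt{1-u}\big)\,du\qquad(\text{by (15) and (29)})\\&\le\frac{4\beta}{x}+\int_{2/x}^{1-2/x}\zeta_3\big(S_{xu}^*\sqrt{u}+\tilde{S}_{x(1-u)}^*\sqrt{1-u},\ Z\sqrt{u}+\tilde{S}_{x(1-u)}^*\sqrt{1-u}\big)\,du\\&\quad+\int_{2/x}^{1-2/x}\zeta_3\big(Z\sqrt{u}+\tilde{S}_{x(1-u)}^*\sqrt{1-u},\ Z\sqrt{u}+\bar{Z}\sqrt{1-u}\big)\,du\\&\le\frac{4\beta}{x}+\int_{2/x}^{1-2/x}\zeta_3\big(S_{xu}^*\sqrt{u},Z\sqrt{u}\big)\,du+\int_{2/x}^{1-2/x}\zeta_3\big(\tilde{S}_{x(1-u)}^*\sqrt{1-u},\bar{Z}\sqrt{1-u}\big)\,du\\&=\frac{4\beta}{x}+2\int_{2/x}^{1-2/x}\zeta_3\big(S_{xu}^*\sqrt{u},Z\sqrt{u}\big)\,du\\&=\frac{4\beta}{x}+2\int_{2/x}^{1-2/x}u^{3/2}\,\zeta_3\big(S_{xu}^*,Z\big)\,du\\&=\frac{4\beta}{x}+2\int_{2/x}^{1-2/x}u^{3/2}\,a(xu)\,du.\end{aligned} \tag{31}$$
Given $\varepsilon>0$, let $\delta>0$ be small enough that $\beta\delta^{5/2}<\varepsilon/8$. For this fixed $\delta$, when $x$ is sufficiently large,
$$\frac{4\beta}{x}<\frac{\varepsilon}{10},\qquad\frac{2}{x}<\delta,\qquad\sup_{\delta\le u\le 1}a(xu)<b+\varepsilon. \tag{32}$$
Thus,
$$\begin{aligned}2\int_{2/x}^{\delta}u^{3/2}a(xu)\,du&\le 2\beta\int_{2/x}^{\delta}u^{3/2}\,du\le 2\beta\int_0^{\delta}u^{3/2}\,du=\frac{4\beta\delta^{5/2}}{5}<\frac{\varepsilon}{10};\\2\int_{\delta}^{1-2/x}u^{3/2}a(xu)\,du&\le 2(b+\varepsilon)\int_{\delta}^{1-2/x}u^{3/2}\,du\le 2(b+\varepsilon)\int_0^1u^{3/2}\,du<\frac{4(b+\varepsilon)}{5},\end{aligned} \tag{33}$$
where $\beta$ is the constant introduced before and $x$ is sufficiently large. It follows that
$$a(x)\le\frac{4\beta}{x}+2\int_{2/x}^{\delta}u^{3/2}a(xu)\,du+2\int_{\delta}^{1-2/x}u^{3/2}a(xu)\,du<\frac{\varepsilon}{10}+\frac{\varepsilon}{10}+\frac{4(b+\varepsilon)}{5}<\frac{4b}{5}+\varepsilon \tag{34}$$
when $x$ is sufficiently large. Therefore,
$$b:=\limsup_{x\to\infty}a(x)\le\frac{4b}{5}+\varepsilon. \tag{35}$$
Since $\varepsilon>0$ is arbitrary, this inequality gives $b=0$, and hence
$$\lim_{x\to\infty}\zeta_3(S_x^*,Z)=\lim_{x\to\infty}a(x)=0. \tag{36}$$
By (18), the theorem holds.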
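Theorem 5 can be illustrated empirically. The following sketch is not part of the proof: it simulates the tree size for a moderately large x, standardizes it with the mean and variance from Lemmas 1 and 2, and compares the empirical distribution function with the standard normal one at a few points. The simulator, the values x = 150 and n = 2000, and the seed are assumptions chosen for the example. Because the size only takes odd integer values, small discrepancies due to discreteness are expected at finite x.

```python
import math
import random

def interval_tree_size(x, rng):
    """Number of nodes of one random binary interval tree for an interval of length x."""
    size, stack = 0, [x]
    while stack:
        length = stack.pop()
        size += 1
        if length >= 1:
            cut = rng.uniform(0.0, length)
            stack.extend((cut, length - cut))
    return size

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

x, n = 150.0, 2000
rng = random.Random(5)                                  # hypothetical seed
scale = math.sqrt((32 * math.log(2) - 20) * x)          # standard deviation of S_x for x >= 2
standardized = [(interval_tree_size(x, rng) - (4 * x - 1)) / scale for _ in range(n)]

for z in (-1.0, 0.0, 1.0):
    empirical = sum(v <= z for v in standardized) / n
    print(z, round(empirical, 3), round(normal_cdf(z), 3))
```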
Appendix
Proof of Proposition 3
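From the process generating the binary interval tree, it is clear that, given $U_x=t$, $t\in(0,x)$,
$$(S_x-E[S_x]\mid U_x=t)\stackrel{d}{=}\begin{cases}S_t^{(1)}-E[S_t^{(1)}]+S_{x-t}^{(2)}-E[S_{x-t}^{(2)}], & 1\le t\le x-1;\\S_{x-t}^{(2)}-E[S_{x-t}^{(2)}]-(4t-2), & 0<t<1;\\S_t^{(1)}-E[S_t^{(1)}]-(4(x-t)-2), & x-1<t<x,\end{cases} \tag{A.1}$$
where $U_x=t$ is the first point chosen from the interval $(0,x)$. For $x\ge 1$, if we denote
$$T_x:=S_x^{(1)}-E[S_x^{(1)}],\qquad T_x^*:=S_x^{(2)}-E[S_x^{(2)}], \tag{A.2}$$
then we have
$$(T_x\mid U_x=t)\stackrel{d}{=}\begin{cases}T_t+T_{x-t}^*, & 1\le t\le x-1;\\T_{x-t}^*-(4t-2), & 0<t<1;\\T_t-(4(x-t)-2), & x-1<t<x.\end{cases} \tag{A.3}$$
We need to calculate $E[T_x^3]$ before we can obtain $E[T_x^4]$. For $x>3$, we have
$$\begin{aligned}E[T_x^3]&=E\big[E(T_x^3\mid U_x)\big]\\&=\frac{1}{x}\int_0^1E\big(T_{x-t}^*-(4t-2)\big)^3dt+\frac{1}{x}\int_{x-1}^xE\big(T_t-(4(x-t)-2)\big)^3dt+\frac{1}{x}\int_1^{x-1}E\big(T_t+T_{x-t}^*\big)^3dt\\&=\frac{2}{x}\int_{x-1}^xE\big(T_t-(4(x-t)-2)\big)^3dt+\frac{1}{x}\int_1^{x-1}E\big(T_t+T_{x-t}^*\big)^3dt\\&=-\frac{2}{x}\int_{x-1}^x(4(x-t)-2)^3dt+\frac{6}{x}\int_{x-1}^x(4(x-t)-2)^2E[T_t]\,dt-\frac{6}{x}\int_{x-1}^x(4(x-t)-2)E[T_t^2]\,dt+\frac{2}{x}\int_{x-1}^xE[T_t^3]\,dt\\&\quad+\frac{1}{x}\int_1^{x-1}\Big(E[T_t^3]+3E\big[T_t^2T_{x-t}^*\big]+3E\big[T_t(T_{x-t}^*)^2\big]+E\big[(T_{x-t}^*)^3\big]\Big)dt.\end{aligned} \tag{A.4}$$
Since $T_t$ and $T_{x-t}^*$ are independent and $E[T_t]=E[T_t^*]=0$ holds for any $1\le t\le x-1$, we have
$$\begin{aligned}E[T_x^3]&=-\frac{2}{x}\int_{x-1}^x(4(x-t)-2)^3dt-\frac{6}{x}\int_{x-1}^x(4(x-t)-2)E[T_t^2]\,dt+\frac{2}{x}\int_{x-1}^xE[T_t^3]\,dt+\frac{2}{x}\int_1^{x-1}E[T_t^3]\,dt\\&=-\frac{2}{x}\int_{x-1}^x(4(x-t)-2)^3dt-\frac{6}{x}\int_{x-1}^x(4(x-t)-2)E[T_t^2]\,dt+\frac{2}{x}\int_1^xE[T_t^3]\,dt\\&:=M_1+M_2+\frac{2}{x}\int_1^xE[T_t^3]\,dt.\end{aligned} \tag{A.5}$$
It is easy to see that
$$M_1=-\frac{1}{2x}\int_{-2}^{2}u^3\,du=0, \tag{A.6}$$
and when $x>3$, for the part $M_2$, we have
$$\begin{aligned}M_2&=-\frac{6}{x}\int_{x-1}^x(4(x-t)-2)\mathrm{Var}[S_t]\,dt=-\frac{6}{x}\int_{x-1}^x(4(x-t)-2)(32\ln 2-20)\,t\,dt\\&=-\frac{6(32\ln 2-20)}{x}\int_0^1(4t-2)(x-t)\,dt=\frac{2(32\ln 2-20)}{x}.\end{aligned} \tag{A.7}$$
Therefore,
$$E[T_x^3]=\frac{2}{x}\int_1^xE[T_t^3]\,dt+\frac{2(32\ln 2-20)}{x},\quad x>3. \tag{A.8}$$
That is,
$$xE[T_x^3]=2\int_1^xE[T_t^3]\,dt+2(32\ln 2-20),\quad x>3. \tag{A.9}$$
Differentiating with respect to $x$, we get the differential equation
$$\big(E[T_x^3]\big)'-\frac{1}{x}E[T_x^3]=0,\quad x>3. \tag{A.10}$$
The solution to this differential equation is
$$E[T_x^3]=k_0x,\quad x>3, \tag{A.11}$$
where $k_0$ is a real constant.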
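Similarly, for $E[T_x^4]$, when $x>4$, we have
$$E[T_x^4]=\frac{2}{x}\int_{x-1}^xE\big(T_t-(4(x-t)-2)\big)^4dt+\frac{1}{x}\int_1^{x-1}E\big(T_t+T_{x-t}^*\big)^4dt. \tag{A.12}$$
Because $T_t$ is independent of $T_{x-t}^*$, and $E[T_t]=0$ holds for any $1\le t\le x-1$, we get
$$\begin{aligned}E[T_x^4]&=\frac{2}{x}\int_{x-1}^x(4(x-t)-2)^4dt+\frac{12}{x}\int_{x-1}^x(4(x-t)-2)^2E[T_t^2]\,dt-\frac{8}{x}\int_{x-1}^x(4(x-t)-2)E[T_t^3]\,dt+\frac{2}{x}\int_{x-1}^xE[T_t^4]\,dt\\&\quad+\frac{1}{x}\int_1^{x-1}\Big(E[T_t^4]+E\big[(T_{x-t}^*)^4\big]+6E\big[(T_tT_{x-t}^*)^2\big]\Big)dt\\&=\frac{2}{x}\int_{x-1}^x(4(x-t)-2)^4dt+\frac{12}{x}\int_{x-1}^x(4(x-t)-2)^2E[T_t^2]\,dt-\frac{8}{x}\int_{x-1}^x(4(x-t)-2)E[T_t^3]\,dt\\&\quad+\frac{2}{x}\int_1^xE[T_t^4]\,dt+\frac{6}{x}\int_1^{x-1}E\big[(T_tT_{x-t}^*)^2\big]\,dt\\&:=I_1+I_2+I_3+I_4+I_5.\end{aligned} \tag{A.13}$$
In particular, for the part $I_1$, we have
$$I_1=\frac{32}{5x}. \tag{A.14}$$
When $x>3$, for the part $I_2$, we have
$$\begin{aligned}I_2&=\frac{12}{x}\int_{x-1}^x(4(x-t)-2)^2E[T_t^2]\,dt=\frac{12}{x}\int_{x-1}^x(4(x-t)-2)^2\mathrm{Var}[S_t]\,dt\\&=\frac{12}{x}\int_{x-1}^x(4(x-t)-2)^2(32\ln 2-20)\,t\,dt=\frac{12(32\ln 2-20)}{x}\int_0^1(4t-2)^2(x-t)\,dt\\&=\frac{48}{3}(32\ln 2-20)-\frac{24(32\ln 2-20)}{3x}:=12k_1-\frac{6k_1}{x},\end{aligned} \tag{A.15}$$
where $k_1:=4(32\ln 2-20)/3$ is a constant.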
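When $x>4$, for the part $I_3$, we have
$$I_3=-\frac{8}{x}\int_{x-1}^x(4(x-t)-2)E[T_t^3]\,dt=-\frac{8}{x}\int_{x-1}^x(4(x-t)-2)\,k_0t\,dt=-\frac{8k_0}{x}\int_0^1(4t-2)(x-t)\,dt=\frac{8k_0}{3x}, \tag{A.16}$$
where $k_0$ is the same constant as in (A.11).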
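The elementary integrals behind (A.6), (A.7), (A.14), (A.15), and (A.16) can be checked with exact rational arithmetic. The sketch below is an illustration with hypothetical helper names; it represents polynomials in t by their coefficient lists (lowest degree first) and integrates them over [0, 1], after the substitution t → x − t used in the text.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_pow(p, n):
    """n-th power of a polynomial."""
    out = [Fraction(1)]
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def integrate_01(p):
    """Exact integral of the polynomial over [0, 1]."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(p))

w = [Fraction(-2), Fraction(4)]                     # the factor 4t - 2

def weight_integral(power, x):
    """Exact integral over [0, 1] of (4t - 2)^power * (x - t) dt."""
    return integrate_01(poly_mul(poly_pow(w, power), [Fraction(x), Fraction(-1)]))

x = Fraction(7)                                     # arbitrary test value, x > 4
print(integrate_01(poly_pow(w, 3)))                 # vanishes, which gives M_1 = 0
print(2 * integrate_01(poly_pow(w, 4)) / x, Fraction(32, 5) / x)       # I_1
print(weight_integral(1, x))                        # factor used in M_2 and I_3
print(weight_integral(2, x), Fraction(4, 3) * x - Fraction(2, 3))      # factor used in I_2
```

The third value does not depend on x, which is why M_2 and I_3 are constant multiples of 1/x.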
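When $x>4$, for the part $I_5$, we have
$$\begin{aligned}I_5&=\frac{6}{x}\int_1^{x-1}E\big[(T_tT_{x-t}^*)^2\big]\,dt\\&=\frac{6}{x}\int_1^2E[T_t^2]E\big[(T_{x-t}^*)^2\big]\,dt+\frac{6}{x}\int_{x-2}^{x-1}E[T_t^2]E\big[(T_{x-t}^*)^2\big]\,dt+\frac{6}{x}\int_2^{x-2}E[T_t^2]E\big[(T_{x-t}^*)^2\big]\,dt\\&=\frac{12}{x}\int_1^2E[T_t^2]E\big[(T_{x-t}^*)^2\big]\,dt+\frac{6}{x}\int_2^{x-2}E[T_t^2]E\big[(T_{x-t}^*)^2\big]\,dt.\end{aligned} \tag{A.17}$$
Noting that $E[T_t^2]=\mathrm{Var}[S_t]$ and using (6), we can see that
$$\begin{aligned}\int_1^{x-1}E[T_t^2]E\big[(T_{x-t}^*)^2\big]\,dt&=2\int_1^2\big(32t\ln t-16t^2+8t+8\big)(32\ln 2-20)(x-t)\,dt+\int_2^{x-2}(32\ln 2-20)\,t\,(32\ln 2-20)(x-t)\,dt\\&=\frac{1}{6}(32\ln 2-20)^2x^3-\frac{4}{3}(32\ln 2-20)^2x^2+\frac{1}{3}(32\ln 2-20)(256\ln 2-168)\,x+\frac{16}{9}(32\ln 2-20)\\&:=a_3x^3+a_2x^2+a_1x+a_0,\end{aligned} \tag{A.18}$$
where
$$a_0=\frac{16}{9}(32\ln 2-20);\quad a_1=\frac{1}{3}(32\ln 2-20)(256\ln 2-168);\quad a_2=-\frac{4}{3}(32\ln 2-20)^2;\quad a_3=\frac{1}{6}(32\ln 2-20)^2. \tag{A.19}$$
Therefore,
$$E[T_x^4]=\frac{2}{x}\int_1^xE[T_t^4]\,dt+I_1+I_2+I_3+I_5,\quad x>4. \tag{A.20}$$
That is,
$$xE[T_x^4]=2\int_1^xE[T_t^4]\,dt+(I_1+I_2+I_3+I_5)\,x,\quad x>4. \tag{A.21}$$
Differentiating with respect to $x$, we get the differential equation
$$\big(E[T_x^4]\big)'-\frac{1}{x}E[T_x^4]=\frac{12k_1}{x}+6\Big(3a_3x+2a_2+\frac{a_1}{x}\Big),\quad x>4. \tag{A.22}$$
The solution to this differential equation is
$$E[T_x^4]=18a_3x^2+12a_2x\ln x+cx-6a_1-12k_1,\quad x>4, \tag{A.23}$$
where $c$ is a constant and the constants $k_1,a_1,a_2,a_3$ are the real numbers defined before. From this equation, Proposition 3 follows.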
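The last step, from the differential equation (A.22) to its solution (A.23), can be verified numerically. The sketch below is an illustrative check rather than part of the proof: it plugs (A.23) into (A.22) using a central finite difference for the derivative. The value c = 1 and the step size are arbitrary choices for the example, since the identity holds for every constant c.

```python
import math

C0 = 32 * math.log(2) - 20                  # the constant 32 ln 2 - 20
k1 = 4 * C0 / 3
a1 = C0 * (256 * math.log(2) - 168) / 3
a2 = -4 * C0 ** 2 / 3
a3 = C0 ** 2 / 6

def rhs(x):
    """Right-hand side of (A.22)."""
    return 12 * k1 / x + 6 * (3 * a3 * x + 2 * a2 + a1 / x)

def solution(x, c):
    """Candidate solution (A.23) with free constant c."""
    return 18 * a3 * x ** 2 + 12 * a2 * x * math.log(x) + c * x - 6 * a1 - 12 * k1

def residual(x, c, h=1e-4):
    """y'(x) - y(x)/x - rhs(x), with y' approximated by a central difference."""
    derivative = (solution(x + h, c) - solution(x - h, c)) / (2 * h)
    return derivative - solution(x, c) / x - rhs(x)

for x in (5.0, 20.0, 100.0):
    print(x, residual(x, c=1.0))
```

The residuals should be zero up to rounding. The leading coefficient 18·a3 equals 3(32 ln 2 − 20)², so the leading term of (A.23) is three times the square of the variance (32 ln 2 − 20)x, as for a normal random variable with that variance.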
Consent
Informed consent was obtained from all individual participants included in the study.
Conflict of Interests
The authors declare that they have no conflict of interests.
Acknowledgments
The authors are most grateful to the referee and the editor for their very thorough reading of the paper and valuable suggestions, which greatly improve the original results and presentation of this paper. Jie Liu’s work was supported by the National Natural Science Foundation of China (nos. 11101394, 71471168, and 71520107002), China Postdoctoral Science Foundation Funded Project (nos. 201104312 and 20100480688), Fund for the Doctoral Program of Higher Education Foundation (no. 20113402120005), and the Fundamental Research Funds for the Central Universities of China (no. WK2040160008). Yang Yang’s work was supported by the National Natural Science Foundation of China (no. 71471090), the Humanities and Social Sciences Foundation of the Ministry of Education of China (no. 14YJCZH182), China Postdoctoral Science Foundation (nos. 2014T70449 and 2012M520964), Natural Science Foundation of Jiangsu Province of China (no. BK20131339), the Major Research Plan of Natural Science Foundation of the Jiangsu Higher Education Institutions of China (no. 15KJA110001), Qing Lan Project, PAPD, Program of Excellent Science and Technology Innovation Team of the Jiangsu Higher Education Institutions of China, Project of Construction for Superior Subjects of Statistics of Jiangsu Higher Education Institutions, and Project of the Key Lab of Financial Engineering of Jiangsu Province.
References
[1] M. Drmota, Springer, New York, NY, USA, 2009. doi:10.1007/978-3-211-75357-6. MR2484382.
[2] L. Devroye and S. Janson, "Protected nodes and fringe subtrees in some random trees," vol. 19, article 6, 2014. doi:10.1214/ecp.v19-3048.
[3] Q. Feng and Z. Hu, "Phase changes in the topological indices of scale-free trees," vol. 50, no. 2, pp. 516–532, 2013. doi:10.1239/jap/1371648958. Zbl 1267.05065.
[4] N. Broutin and P. Flajolet, "The distribution of height and diameter in random non-plane binary trees," vol. 41, no. 2, pp. 215–252, 2012. doi:10.1002/rsa.20393. Zbl 1250.05095.
[5] F. Dennert and R. Grübel, "On the subtree size profile of binary search trees," vol. 19, no. 4, pp. 561–578, 2010. doi:10.1017/s0963548309990630. MR2647493.
[6] H. M. Mahmoud, "One-sided variations on binary search trees," vol. 55, no. 4, pp. 885–900, 2003. doi:10.1007/BF02523399. MR2028623. Zbl 1099.68601.
[7] Q. Feng and Z. Hu, "On the Zagreb index of random recursive trees," vol. 48, no. 4, pp. 1189–1196, 2011. doi:10.1239/jap/1324046027. MR2896676. Zbl 1234.05053.
[8] R. Grübel and I. Michailow, "Random recursive trees: a boundary theory approach," vol. 20, article 37, 2015. doi:10.1214/ejp.v20-3832.
[9] H. M. Mahmoud and M. D. Ward, "Asymptotic properties of protected nodes in random recursive trees," vol. 52, no. 1, pp. 290–297, 2015. doi:10.1239/jap/1429282623.
[10] M. Sibuya and Y. Itoh, "Random sequential bisection and its associated binary tree," vol. 39, no. 1, pp. 69–84, 1987. doi:10.1007/BF02491450. MR886507. Zbl 0622.60020.
[11] H. Prodinger, "How to select a loser," vol. 120, no. 1–3, pp. 149–159, 1993. doi:10.1016/0012-365x(93)90572-b.
[12] J. A. Fill, H. M. Mahmoud, and W. Szpankowski, "On the distribution for the duration of a randomized leader election algorithm," vol. 6, no. 4, pp. 1260–1283, 1996. doi:10.1214/aoap/1035463332. MR1422986. Zbl 0870.60018.
[13] Y. Itoh and H. M. Mahmoud, "One-sided variations on interval trees," vol. 40, no. 3, pp. 654–670, 2003. doi:10.1239/jap/1059060894. MR1993259. Zbl 1043.05036.
[14] S. Janson, "One-sided interval trees," vol. 3, pp. 149–164, 2004.
[15] S. Janson and R. Neininger, "The size of random fragmentation trees," vol. 142, no. 3-4, pp. 399–442, 2008. doi:10.1007/s00440-007-0110-1. Zbl 1158.60044.
[16] M. Javanian, H. M. Mahmoud, and M. Vahidi-Asl, "Paths in m-ary interval trees," vol. 287, no. 1–3, pp. 45–53, 2004. doi:10.1016/j.disc.2004.06.005.
[17] C. Su, J. Liu, and Z. S. Hu, "The Laws of large numbers for the size of complete interval trees," vol. 36, no. 2, pp. 181–188, 2007. MR2362731.
[18] W. Pan, S.-K. Li, X. Cai, and L. Zeng, "The construction algorithm of binary interval tree nodes based on lattice partition," vol. 23, pp. 1115–1122, 2011.
[19] V. M. Zolotarev, VSP, Utrecht, The Netherlands, 1997.
[20] V. M. Zolotarev, "Approximation of distributions of sums of independent random variables with values in infinite-dimensional spaces," vol. 21, no. 4, pp. 721–737, 1976. doi:10.1137/1121086.
[21] S. T. Rachev and L. Rüschendorf, "Probability metrics and recursive algorithms," vol. 27, no. 3, pp. 770–799, 1995. doi:10.2307/1428133. MR1341885.