ON THE BIRTHDAY PROBLEM: SOME GENERALIZATIONS AND APPLICATIONS

We study the birthday problem and some possible extensions. We discuss the unimodality of the corresponding exact probability distribution and express the moments and generating functions by means of confluent hypergeometric functions U(−; −; −), which are computable using the software Mathematica. The distribution is generalized in two possible directions, one of them consisting in considering a random graph with a single attracting center. Possible applications are also indicated.


Introduction.
In the present paper, we study the birthday problem, which may be formulated in various ways: consider, for example, a smoker who draws matches from a matchbox that initially contains n (unused) matches. Whenever he needs a match, he draws one at random and returns the used match into the box. He continues drawing matches until a used one is encountered for the first time. The probability p_k^{(n)} that the first used match is encountered in the kth draw is given by (see, e.g., [5, equation (7.1), page 47])

p_k^{(n)} = ((k − 1)/n) ∏_{i=1}^{k−2} (1 − i/n), k = 2, ..., n + 1. (1.1)

For n = 365, (1.1) represents the probability that, among k persons, the first birthday coincidence occurs with the kth person (in this case the matches correspond to the 365 days of the year).
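As a quick numerical check, distribution (1.1) can be evaluated directly. The sketch below (assuming the product form p_k^{(n)} = ((k − 1)/n) ∏_{i=1}^{k−2}(1 − i/n) stated above) verifies that the probabilities sum to 1 and recovers the classical fact that 23 people suffice for a shared birthday with probability greater than 1/2.

```python
from math import prod

def p_first_repeat(n, k):
    """Probability (1.1) that the first repeated item occurs at draw k
    (k = 2, ..., n + 1) when draws are uniform over n items."""
    return (k - 1) / n * prod(1 - i / n for i in range(1, k - 1))

n = 365
probs = {k: p_first_repeat(n, k) for k in range(2, n + 2)}
assert abs(sum(probs.values()) - 1.0) < 1e-9       # a proper distribution
# Classical check: 23 people share a birthday with probability > 1/2.
assert sum(probs[k] for k in range(2, 24)) > 0.5
```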
The distribution (1.1) also plays an important role in the study of characteristics of random mappings (random graphs): we consider a random mapping T of a finite set X = {1, 2, ..., n} into itself, assuming that T assigns independently to each x ∈ X its unique image y ∈ X with probability 1/n for all y ∈ X. Denote by G the directed graph with vertex set X containing an arrow from x to y if and only if T(x) = y. The set of successors of the element x ∈ X under T is defined as

S_T(x) = {x, T(x), T²(x), ..., T^{n−1}(x)}, (1.2)

where T⁰(x) = x and T^k(x) = T(T^{k−1}(x)) for k ≥ 1. Denoting by s_T(x) the number of elements of S_T(x), the probability that a vertex x has k successors is given by

P(s_T(x) = k) = (k/n) ∏_{i=1}^{k−1} (1 − i/n) = p_{k+1}^{(n)}, k = 1, ..., n, (1.3)

(see [8, equation (3.2), page 1047]).
In the next section, we study some basic properties of distribution (1.1) and indicate possible applications. Moments and generating functions are determined in Section 3, using the well-known U-function. In Section 4, we study a very general type of distribution defined by means of directed weighted trees, which generalizes (1.1). Among others, these results yield a recurrence relation for the moments of distribution (1.1). In Section 5, we generalize (1.1), considering the distribution of the number of successors in a random graph with a single attracting center.

Some basic properties
where k_0 is the smallest integer greater than or equal to (1 + √(4n + 1))/2; indeed, p_{k+1}/p_k = (k/(k − 1))(1 − (k − 1)/n), so that p_{k+1} > p_k if and only if k² − k − n < 0. Beyond the index k_0, it is interesting to determine the index k_1 (k_1 ≥ k_0) which maximizes the difference ∆_k = p_k − p_{k+1}. This index is the analogue of a turning point of a distribution in the continuous case. Similarly to the above considerations, it can be shown that ∆_k < ∆_{k+1} if (2.4) is positive and that ∆_k > ∆_{k+1} if (2.4) is negative. We get the following proposition.
Proposition 2.2. The index k_1 is the smallest integer k such that (2.4) is nonpositive.
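The location of the mode can be confirmed numerically; in the sketch below, the threshold (1 + √(4n + 1))/2 is the assumed reconstruction of the quantity defining k_0, following from the ratio p_{k+1}/p_k discussed above.

```python
from math import ceil, prod, sqrt

def p_first_repeat(n, k):
    # distribution (1.1)
    return (k - 1) / n * prod(1 - i / n for i in range(1, k - 1))

n = 365
# p_{k+1} > p_k iff k^2 - k - n < 0, so the mode should be the smallest
# integer k0 >= (1 + sqrt(4n + 1)) / 2 (assumed form of the threshold).
k0 = ceil((1 + sqrt(4 * n + 1)) / 2)
mode = max(range(2, n + 2), key=lambda k: p_first_repeat(n, k))
assert mode == k0 == 20
```

For n = 365 this confirms the well-known fact that the first coincidence is most likely to occur with the 20th person.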
For large n, the indices k_0 and k_1 satisfy k_0 ≈ √n and k_1 < √(3n) (see (2.4)); that is, there is an increasing degree of asymmetry. As n tends to ∞, p_{k_0} (and thus every p_k) tends to 0, since (2.5) holds for large n. The distribution (1.1) may have practical applications. Among others, it could be used to estimate the size of a population. Suppose, for example, that biologists want to estimate the number of fish of a certain species in a lake. They could proceed as follows: catch a fish, mark it, return it to the water, and repeat this until a marked fish is caught for the first time. The number k of fish (of the considered species) that had to be caught until the first marked fish was encountered is the test statistic. If n denotes the (unknown) size of the fish population, the observed value k occurs with probability p_k (see (1.1)). By the maximum likelihood method, the number of fish can be estimated by the value of n that maximizes p_k (with k fixed).
Example 2.3. We assume that k = 101; that is, when the 101st fish is caught, a previously marked fish is encountered for the first time. If n denotes the size of the fish population, this event occurs with probability p_{101}^{(n)}. Generally, p_k^{(n)} is a unimodal function of n for every fixed k, and p_{101}^{(n)} is maximized for n = 5016, which is the estimate of the size of the fish population.
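The maximum likelihood estimate can be reproduced by a direct scan over candidate population sizes; a minimal sketch, assuming the product form of (1.1) and working in log space to avoid underflow (the upper bound 20000 of the scan window is an arbitrary choice):

```python
from math import log

def log_p(n, k):
    # log of (1.1): log((k-1)/n) + sum of log(1 - i/n), i = 1, ..., k-2
    return log(k - 1) - log(n) + sum(log(1 - i / n) for i in range(1, k - 1))

k = 101
# Scan candidate population sizes; the approximation n = k^2/2 discussed
# below suggests the right order of magnitude (~5000).
nhat = max(range(k, 20001), key=lambda n: log_p(n, k))
```

The scan lands on the paper's estimate n = 5016.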
As the above calculations illustrate, the population size is approximately given by n = k²/2 for a sufficiently large population. In this case we have (2.9) (see [8, page 1047]); the right-hand side of this relation is maximized for n = k²/2 when k is fixed.
The above method represents an alternative to the estimation procedure for a fish population described in [9, Chapter 14].
Further applications of the probability distribution (1.1) are conceivable in quantitative linguistics, in the framework of testing the hypothesis that there is a tendency in language to repeat already used text elements (see [29]).

Moments and generating functions.
In order to express the moments, we make use of the well-known confluent hypergeometric function U(r; r + n; x), defined as in, e.g., [19, page 116] and [28, page 41], where (a)_k denotes the rising factorial. This function is easily computed using the software Mathematica. For the mth rising factorial moment (m = 0, 1, ...), we obtain an expression in terms of the U-function. For m = 0, we obtain a result which is not readily available in the literature on special functions. For m = 1, we get the mean E(X). Alternatively, the mean can be written in terms of the incomplete gamma function Γ(α, x) [28, page 97]. An alternative exact and asymptotic result for a more general case, namely for the expected number of repetitions necessary for one of the alternatives to occur a certain number of times, was given by Klamkin and Newman [17]. Thus, we get another result not easily available.
The dependence of the mean E(X) on n is illustrated in Figure 3.1. It may be noted that for n = 365, E(X) = 24.6166.
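The value E(X) = 24.6166 can be checked without special functions, using the tail-sum identity E(X) = Σ_{k≥0} P(X > k), where P(X > k) is the probability that the first k draws are pairwise distinct; a minimal sketch:

```python
from math import prod

def mean_first_repeat(n):
    # E[X] = sum_{k>=0} P(X > k), where P(X > k) = prod_{i<k} (n - i)/n
    # is the probability that the first k draws are pairwise distinct.
    return sum(prod((n - i) / n for i in range(k)) for k in range(n + 1))

assert abs(mean_first_repeat(365) - 24.6166) < 5e-4
```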
The variance of X follows from the first two moments. The moments about the origin can be calculated from the factorial moments using the Stirling numbers S(n, k) of the second kind (see [16, page 6]). We obtain the rth moment of X as a linear combination of values of the polynomial U(−; −; n): (3.9). Since U(−; −; n) can be expressed as a linear combination of incomplete gamma functions, so can µ_r, albeit in a complicated manner. Dwass [4] considered higher asymptotic moments of X. Holst [10] remarked that moments of order statistics from the gamma distribution can be used to obtain higher moments.
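The passage from factorial moments to moments about the origin can be illustrated numerically. The sketch below uses the standard identity x^r = Σ_j S(r, j) x(x−1)···(x−j+1) with Stirling numbers of the second kind (the paper's own displayed relation is not reproduced here) and checks it against the raw moment computed directly from the pmf.

```python
from functools import lru_cache
from math import prod

def p_first_repeat(n, k):
    # distribution (1.1)
    return (k - 1) / n * prod(1 - i / n for i in range(1, k - 1))

@lru_cache(maxsize=None)
def stirling2(r, j):
    # Stirling numbers of the second kind via the usual recurrence
    if r == j:
        return 1
    if r == 0 or j == 0:
        return 0
    return j * stirling2(r - 1, j) + stirling2(r - 1, j - 1)

n, r = 365, 3
pmf = {k: p_first_repeat(n, k) for k in range(2, n + 2)}
# falling-factorial moments E[X (X-1) ... (X-j+1)], j = 0, ..., r
ffm = [sum(p * prod(k - i for i in range(j)) for k, p in pmf.items())
       for j in range(r + 1)]
direct = sum(p * k**r for k, p in pmf.items())
via_stirling = sum(stirling2(r, j) * ffm[j] for j in range(r + 1))
assert abs(direct - via_stirling) < 1e-9 * direct
```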
The central moments µ_r are obtained from the moments about the origin by the usual relation. Finally, the probability generating function P(s) and the moment generating function M_X(t) can be expressed in terms of the U-function, and the characteristic function is obtained from (3.12) by substituting it for t, where i = √−1.

A general type of distribution.
Let T be a directed tree with root 0 such that all arcs are directed away from the root; that is, 0 is the unique source and δ⁻(v) = 1 for every vertex v ≠ 0 (δ⁻(v) denotes the indegree of v, that is, the number of arcs entering v). We associate a positive weight with every arc of T such that the weights of the arcs leaving a vertex sum to 1. Furthermore, we associate a weight w(v) with every vertex v of T such that w(0) = 1 and, for every vertex v ≠ 0, w(v) is the product of the weights of the arcs on the unique path from 0 to v. An example of a directed weighted tree is shown in Figure 4.1, where the vertex weights are indicated only when the vertex is a leaf (represented by shaded circles).
Proposition 4.1. For every directed weighted tree of the above type, the weights of the leaves sum up to 1.
The result can easily be proved by induction. Obviously, every tree of the above type defines a discrete probability distribution when the weights of the leaves are interpreted as probabilities. We now consider the tree in Figure 4.2 with the vertices 0, 1, ..., n, 2', ..., (n+1)', where i', i = 2, ..., n + 1, represent the leaves.
The weights of the leaves (probabilities) are given by (4.1) and obviously sum up to 1. For c_i = (n − i)/n, i = 1, ..., n − 1, we obtain the probability distribution (1.1) as a special case. In this case a vertical (horizontal) arc represents the selection of an unused (used) match; the vertex i can be interpreted as the stage of the system with i used matches in the box, while i' represents the encounter of a used match for the first time in the ith draw.
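Proposition 4.1 and the specialization to (1.1) can be checked numerically. The sketch below encodes an assumed reading of the chain tree of Figure 4.2: from stage i, a vertical arc to stage i + 1 carries weight c_i = (n − i)/n and a horizontal arc to a leaf carries weight 1 − c_i.

```python
from math import prod

n = 10
def leaf_weight(i):
    # weight of the leaf reached by leaving the chain after stage i:
    # product of vertical weights c_1 ... c_{i-1} times horizontal 1 - c_i
    return (i / n) * prod((n - j) / n for j in range(1, i))

weights = [leaf_weight(i) for i in range(1, n + 1)]
assert abs(sum(weights) - 1.0) < 1e-12             # Proposition 4.1

def p_first_repeat(n, k):                          # distribution (1.1)
    return (k - 1) / n * prod(1 - j / n for j in range(1, k - 1))

# The leaf weights reproduce (1.1) with k = i + 1.
assert all(abs(weights[i - 1] - p_first_repeat(n, i + 1)) < 1e-12
           for i in range(1, n + 1))
```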
Figure 4.2. A tree representing a probability distribution.
The moments about the origin of distribution (4.1) can be expressed as in (4.2). Using (4.2), we finally state some interrelations between the moments about the origin of distribution (1.1). Thus, we obtain (4.5). Alternatively, the µ_r can be written as in (4.6) (see (1.1)). Combining (4.5) and (4.6) yields the recurrence relations (4.7) for T_r and µ_r, respectively. For small values of r, (4.7) gives explicit relations in which T_0 can be computed as in (4.9) (see Section 3).

Generalization of the distribution by means of an attracting center.
We now generalize the random mapping considered in the introduction, assuming that the vertex 1 is an attracting center; that is, T assigns independently to each x ∈ X the image 1 with probability q and the image y with probability p = (1 − q)/(n − 1) for y = 2, ..., n (see, e.g., [14, page 191]; related models of random graphs are studied in [2, 15]). The probability that the attracting center 1 has k successors is given by (5.3). We now determine the probability P(s(x) = k), where x ∈ {2, ..., n}; that is, x is not the attracting center. Two cases have to be distinguished, depending on whether the attracting center is contained in the successor set or not. In the first case we get (5.4), assuming that T^i(x) = 1 for a fixed i ∈ {1, ..., k − 1}. The probability that x has k successors different from 1 is given by (5.5). Finally, we get, for x ≠ 1, (5.6); rewriting the last term, we obtain (5.7) for k = 1, ..., n. Obviously, both distributions (5.3) and (5.7) reduce to p_{k+1}^{(n)} (see (1.1)) for q = p = 1/n.
The dependence of the distributions (5.3) and (5.7) on the choice of q is illustrated in Figures 5.1 and 5.2 for n = 100. It is observed that (5.7) has the same unimodal form for all q (0 < q < 1), while (5.3) is unimodal for q < 0.1 and 1/2-modal for q ≥ 0.1.
The results of this section may be of interest in random-number generation. It may be assumed that in the generation of random numbers, a sequence x, T(x), T²(x), ... is constructed (see, e.g., [6] and [18, Section 3.1]), where x ∈ X = {1, 2, ..., n} and T is a mapping selected at random from all n^n mappings of X into itself.
If all n^n mappings are chosen with equal probability, p_{k+1}^{(n)} gives the probability that the process begins to cycle after k iterations; that is, the numbers x, T(x), ..., T^{k−1}(x) are pairwise distinct while T^k(x) = T^i(x) for some i ∈ {0, 1, ..., k − 1}. Formulas (5.3) and (5.7) give the probability of cycling after k iterations when mappings T that frequently assign the image 1 are preferred.
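For small n, the uniform case of this claim can be verified exhaustively: enumerating all n^n mappings, the fraction that begins to cycle after exactly k iterations should equal p_{k+1}^{(n)}. A sketch for n = 5:

```python
from itertools import product
from math import prod

n, x = 5, 0
counts = {k: 0 for k in range(1, n + 1)}
for T in product(range(n), repeat=n):   # all n^n mappings of X into itself
    seen, y = [], x
    while y not in seen:                # follow x, T(x), T^2(x), ... until a repeat
        seen.append(y)
        y = T[y]
    counts[len(seen)] += 1              # cycling begins after len(seen) iterations

def p_first_repeat(n, k):               # distribution (1.1)
    return (k - 1) / n * prod(1 - i / n for i in range(1, k - 1))

total = n**n
assert all(abs(counts[k] / total - p_first_repeat(n, k + 1)) < 1e-12
           for k in range(1, n + 1))
```

The same enumeration with non-uniform image probabilities (an attracting center) would have to be compared against (5.3) and (5.7), whose explicit forms are in the displayed equations of this section.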


Proposition 2.1. The probabilities p_k satisfy p_{k+1} > p_k for k < k_0 and p_{k+1} ≤ p_k for k ≥ k_0; in particular, the distribution in (1.1) is unimodal. (To simplify the notation, we write p_k instead of p_k^{(n)} in the following.)

Figure 3.1. The dependence of the mean E(X) on n.

Figure 4.1. Example of a directed weighted tree.