The Concepts of Pseudo Compound Poisson and Partition Representations in Discrete Probability

The mathematical and statistical concepts of pseudo compound Poisson and partition representations in discrete probability are reviewed and clarified. A combinatorial interpretation of the convolution of geometric distributions in terms of a variant of Newton's identities is obtained. The practical use of the twofold convolution leads to an improved goodness-of-fit for a data set from automobile insurance that had not previously been fitted satisfactorily.


Introduction
Consider the class of discrete arithmetic random variables $X$ with probability generating function (pgf) $P(z) = \sum_{k \ge 0} p(k) z^k$ and nonvanishing zero probability $p(0) > 0$. Suppose that the pgf can be written as $P(z) = p(0) \cdot \exp\{Q(z)\}$ for some generating function $Q(z) = \sum_{k \ge 1} q(k) z^k$. The pseudo compound Poisson representation concerns the duality between $P(z)$ and $Q(z)$, which is best expressed in terms of the identity $P'(z) = Q'(z) \cdot P(z)$, or equivalently the recurrence relations

$$k \, p(k) = \sum_{j=1}^{k} j \, q(j) \, p(k-j), \quad k \ge 1. \qquad (1)$$

As a consequence, it has been shown that $X$ is infinitely divisible if and only if one has $q(k) \ge 0$ for all $k \ge 1$ (e.g., [1]; [2, page 83]). Around the same time Feller [3, page 290] showed that $X$ is infinitely divisible if and only if it is a compound Poisson random variable with parameter $\lambda = -\ln p(0)$ and severity probabilities $f(k) = q(k)/\lambda$. If $X$ is not infinitely divisible, then (1) still holds for some $q(k)$, $k \ge 1$, with at least one negative value, and this property motivates the name pseudo compound Poisson representation. Section 2 summarizes its main properties. The power series identity $P(z) = p(0) \cdot \exp\{Q(z)\}$ also leads to the more complex and less known expression

$$p(n) = p(0) \cdot \sum_{\lambda \in P_n} \prod_{i \ge 1} \frac{q(i)^{m_i}}{m_i!}, \qquad (2)$$

where $P_n = \{\lambda = (\lambda_1, \dots, \lambda_\ell) \mid \sum_{i=1}^{\ell} \lambda_i = n\}$ is the set of partitions of weight $n$ and $m_i$ denotes the multiplicity of the part $i$ in $\lambda$; this is called the partition representation. Although (2) has been applied in Hürlimann [4, Theorem 3.2] to derive an existence criterion for the construction of confidence bounds for discrete sampling distributions, expression (2) is not stated correctly there and a proof of it is missing. Section 3 fills in these gaps and provides a combinatorial interpretation of the partition representation.
As a new illustration, Section 4 considers the convolution of $r$ geometric random variables, whose pseudo compound Poisson representation is specified by the $k$th power sum polynomial $k \, q(k) = p_k(\theta) = \sum_{j=1}^{r} \theta_j^k$, $k \ge 1$. The partition representation (2) identifies $p(n)$ with the $p(0)$ multiple of the $n$th complete symmetric function $h_n(\theta)$ in the variables $\theta_j$. The recursion (1) is equivalent to a variant of Newton's identities, which motivates us to call this convolution the Newton distribution. Applied to estimation theory, we show that this distribution satisfies Gauss's principle (the maximum likelihood estimator of the mean is the sample mean) and construct a parameter vector orthogonal to the mean. In the two-parameter case, we derive the maximum likelihood equations and illustrate their use in Section 5 on a specific data set from automobile insurance. Through regrouping of classes, we show that the Newton distribution beats both the negative binomial and the Poisson inverse Gaussian in goodness-of-fit and at the same time improves the unsatisfactory fit obtained in a previous case study.

Characterization through Pseudo Compound Poisson Representation
Recall the pseudo compound Poisson representation in discrete probability theory from Hürlimann [5, 6]. Let $X$ be a discrete arithmetic random variable defined on the natural numbers with probabilities $p(k) = \Pr(X = k)$, $k \ge 0$, such that $p(0) > 0$. Besides the probability generating function (pgf) $P(z) = \sum_{k \ge 0} p(k) z^k$ one considers the cumulant pgf defined by

$$Q(z) = \sum_{k \ge 1} q(k) z^k = \ln\{P(z)\} - \ln\{p(0)\}. \qquad (3)$$

Its name is motivated by the series expansion of the cumulant generating function (cgf)

$$\ln E[e^{tX}] = \ln\{P(e^t)\} = \ln\{p(0)\} + Q(e^t). \qquad (4)$$

Proposition 1 (pseudo compound Poisson representation).
Let $X$ be a discrete arithmetic random variable with pgf $P(z)$ such that $p(0) > 0$ and set $\lambda = -\ln p(0)$. Then the probabilities $p(k)$ satisfy Panjer's recursion

$$k \, p(k) = \sum_{j=1}^{k} j \, q(j) \, p(k-j), \quad k \ge 1, \qquad (5)$$

and the following pseudo compound Poisson representation $X \sim \mathrm{CP}(\lambda, f)$ holds:

$$P(z) = \exp\{\lambda (F(z) - 1)\}, \quad F(z) = \sum_{k \ge 1} f(k) z^k, \quad f(k) = \frac{q(k)}{\lambda}. \qquad (6)$$

Proof. The recursion (5) is an immediate consequence of (3). Indeed, the derivative of the equation $P(z) = p(0) \cdot \exp\{Q(z)\}$ satisfies the relation $P'(z) = Q'(z) \cdot P(z)$, which is equivalent to the identities (5). For further details consult Hürlimann [5].

If all the $q(k)$, $k \ge 1$, are nonnegative, then $X$ has a proper compound Poisson distribution (see [6] for a more general characterization). Otherwise, one says that $X$ has a pseudo compound Poisson distribution. In the terminology of Sundt [8], it belongs to the class $R_\infty[0, b]$ with $b(k) = k \, q(k)$ (see also [9]). The theoretical and practical usefulness of pseudo compound Poisson distributions has been demonstrated by the author in numerous publications. Sometimes, the sequence $\{k \, q(k)\}$ defined by (5) is called the De Pril transform of $\{p(k)\}$ after De Pril [10] (e.g., [11]). The interest of recurrence relation (5) extends beyond discrete probability to the general context of integer sequences. Let $\mathrm{Id}$ be the identity map such that $\mathrm{Id}(k) = k$ for all nonnegative integers $k$. Then the convolution equation $\mathrm{Id} \cdot a = b * a$ for integer sequences $\{a(k)\}$ and $\{b(k)\}$ occurs in many areas of mathematics. An important problem is the relationship between the asymptotic behaviors of the two sequences $\{a(k)\}$ and $\{b(k)\}$ (e.g., [12] and its references).
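To make the recursion concrete, here is a minimal Python sketch (our own illustration, not code from the paper; the function names are ours) that computes the De Pril transform $\{k \, q(k)\}$ from given probabilities and inverts it again via (5). As a sanity check, a Poisson distribution with parameter $\lambda$ must return $q(1) = \lambda$ and $q(k) = 0$ for $k \ge 2$, in line with Feller's characterization above.

import math

def de_pril_transform(p, m):
    # Solve k*p(k) = sum_{j=1..k} j*q(j)*p(k-j) for q(1), ..., q(m), given p(0) > 0.
    q = [0.0] * (m + 1)  # q[0] is unused
    for k in range(1, m + 1):
        s = sum(j * q[j] * p[k - j] for j in range(1, k))
        q[k] = (k * p[k] - s) / (k * p[0])
    return q

def probabilities_from_q(q, p0, m):
    # Reconstruct p(0), ..., p(m) from q(1), ..., q(m) via the same recursion (5).
    p = [p0] + [0.0] * m
    for k in range(1, m + 1):
        p[k] = sum(j * q[j] * p[k - j] for j in range(1, k + 1)) / k
    return p

# Sanity check with Poisson(lam): q(1) = lam and q(k) = 0 for k >= 2.
lam, m = 1.7, 10
p = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(m + 1)]
q = de_pril_transform(p, m)
assert abs(q[1] - lam) < 1e-12
assert all(abs(q[k]) < 1e-12 for k in range(2, m + 1))
assert all(abs(a - b) < 1e-12 for a, b in zip(p, probabilities_from_q(q, p[0], m)))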
The converse of Proposition 1, although used in applications (e.g., [6, 13-15]), has been less studied. In the case of negative $q(k)$'s, a necessary condition on the cumulant pgf such that (5) defines a true probability distribution was first identified by Lévy.

Partition Representation and Combinatorial Interpretation
Besides the pseudo compound Poisson representation, the power series identity $P(z) = p(0) \cdot \exp\{Q(z)\}$ has another important immediate consequence.
Proposition 3 (partition representation). Let $X$ be a discrete arithmetic random variable with $p(0) > 0$ and cumulant pgf (3). Then the probabilities satisfy

$$p(n) = p(0) \cdot \sum_{\lambda \in P_n} \prod_{i \ge 1} \frac{q(i)^{m_i}}{m_i!}, \quad n \ge 1, \qquad (7)$$

where $P_n$ is the set of partitions of weight $n$ and $m_i = m_i(\lambda)$ is the multiplicity of the part $i$ in $\lambda$.

Proof. This result follows from the calculation

$$P(z) = p(0) \cdot \exp\{Q(z)\} = p(0) \cdot \prod_{i \ge 1} \exp\{q(i) z^i\} = p(0) \cdot \prod_{i \ge 1} \sum_{m_i \ge 0} \frac{q(i)^{m_i}}{m_i!} \, z^{i m_i},$$

which, upon collecting the coefficient of $z^n$, implies identities (7).
Remarks 3. Although variants existed before (e.g., [22, Section 2]; [23, Equation (3.4)]), the general partition representation (7) for pseudo compound Poisson arithmetic distributions was first stated in Corollary 3.1 of Hürlimann [4]. Unfortunately, the formula given there contains a misprint in the multiplicity factorial $m_i!$. Note that this mistake has no influence on the validity of Theorem 3.2 derived from this representation. However, the claimed but missing proof by induction is best replaced by the present proof. The simple power series manipulation has already been used by Macdonald [24] in his proof on page 25 of a similar but more special result in the theory of symmetric functions (see Section 4.3 below). Section I.1 of the mentioned textbook is recommended reading for definitions and properties around partitions (in particular the weight, length, and multiplicity of a partition).
The partition representation has a nice combinatorial interpretation. For convenience, set $h(k) = k \, q(k)$, $k \ge 1$. Then the expression derived from (7),

$$\frac{p(n)}{p(0)} = \sum_{\lambda \in P_n} \prod_{i \ge 1} \frac{h(i)^{m_i}}{i^{m_i} \, m_i!}, \qquad (9)$$

identifies with the cycle index polynomial (or cycle indicator) of the symmetric group of order $n$ in the variables $h(k)$ (e.g., [24, Example 9.(a), page 29]). Indeed, the coefficient in (9) of an arbitrary monomial $h(1)^{m_1} \cdot h(2)^{m_2} \cdot \ldots \cdot h(n)^{m_n}$ is equal to the fraction of all permutations of $n$ elements that have $m_1$ fixed points, $m_2$ cycles of length $2$, \ldots, and $m_n$ cycles of length $n$. We note that this combinatorial interpretation is crucial to the novel example studied in the next two sections.
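Both the partition representation and its cycle index reading are easy to check numerically. The following Python sketch (again our own, under the notation of (5), (7), and (9)) enumerates the partitions of $n$ by multiplicities, confirms that (7) agrees with the recursion (5) for an arbitrary cumulant sequence (negative $q(k)$ allowed), and verifies that the cycle index coefficients sum to 1, since every permutation has exactly one cycle type.

from collections import Counter
from math import factorial

def partitions(n, max_part=None):
    # Yield the partitions of n as weakly decreasing lists of parts.
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for i in range(min(n, max_part), 0, -1):
        for rest in partitions(n - i, i):
            yield [i] + rest

def p_from_partitions(q, p0, n):
    # Partition representation (7): p(n) = p(0) * sum over partitions of prod q(i)^{m_i}/m_i!.
    total = 0.0
    for lam in partitions(n):
        m = Counter(lam)  # multiplicities m_i
        term = 1.0
        for i, mi in m.items():
            term *= q[i] ** mi / factorial(mi)
        total += term
    return p0 * total

def p_from_recursion(q, p0, n):
    # Recursion (5): k p(k) = sum_{j=1..k} j q(j) p(k-j).
    p = [p0]
    for k in range(1, n + 1):
        p.append(sum(j * q[j] * p[k - j] for j in range(1, k + 1)) / k)
    return p[n]

# Cycle index check: with h(i) = i*q(i) = 1, the coefficients 1/(prod i^{m_i} m_i!)
# in (9) sum to 1 over the partitions of n.
n = 6
coeff_sum = 0.0
for lam in partitions(n):
    m = Counter(lam)
    denom = 1
    for i, mi in m.items():
        denom *= i ** mi * factorial(mi)
    coeff_sum += 1.0 / denom
assert abs(coeff_sum - 1.0) < 1e-12

# Agreement of (7) with (5) for an arbitrary cumulant sequence q(1..n).
q = [0.0, 0.4, -0.05, 0.02, 0.0, 0.01, 0.0]  # q[0] unused; some q(k) may be negative
assert abs(p_from_partitions(q, 0.6, n) - p_from_recursion(q, 0.6, n)) < 1e-12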

The Convolution of Geometric Random Variables: A Newton Type Distribution
To illustrate further the discrete probability concepts of Sections 2 and 3, consider the power sum like specification of the cumulant pgf in (3); namely,

$$q(k) = \frac{1}{k} \sum_{j=1}^{r} \theta_j^k, \quad k \ge 1, \qquad (10)$$

where $\theta = (\theta_1, \theta_2, \dots, \theta_r)$, $0 < \theta_j < 1$, $\theta_i \ne \theta_j$ for $i \ne j$, $i, j = 1, \dots, r$, are $r$ unknown parameters. A calculation shows that the quantity $\lambda = \sum_{k \ge 1} q(k) = -\sum_{j=1}^{r} \ln(1 - \theta_j) > 0$ is finite. The condition (C1) of Proposition 2 is fulfilled, and therefore the specification (10) belongs to the random variable $X$ of some compound Poisson distribution, to be determined. Clearly, one has

$$Q(z) = \sum_{k \ge 1} q(k) z^k = -\sum_{j=1}^{r} \ln(1 - \theta_j z), \qquad (11)$$

from which one obtains

$$P(z) = p(0) \cdot \exp\{Q(z)\} = \prod_{j=1}^{r} \frac{1 - \theta_j}{1 - \theta_j z}, \quad p(0) = \prod_{j=1}^{r} (1 - \theta_j), \qquad (12)$$

which is the pgf of the convolution of $r$ geometric random variables; hence

$$X = X_1 \oplus \dots \oplus X_r, \quad X_j \sim \mathrm{Geom}(1 - \theta_j), \quad \Pr(X_j = n) = (1 - \theta_j) \theta_j^n. \qquad (13)$$

Consider the partition representation of Proposition 3. One observes that $h(k) = k \, q(k) = p_k(\theta) = \sum_{j=1}^{r} \theta_j^k$ is nothing else than the $k$th power sum polynomial in the variables $\theta_j$, $j = 1, \dots, r$. Identity (9) is herewith equal to

$$\frac{p(n)}{p(0)} = \sum_{\lambda \in P_n} \prod_{i \ge 1} \frac{p_i(\theta)^{m_i}}{i^{m_i} \, m_i!} = h_n(\theta) \qquad (14)$$

and shows with Macdonald [24, Equation (2.14′), page 25] that the $n$th probability of $X$, namely, $p(n) = p(0) \cdot h_n(\theta)$, is the $p(0)$ multiple of the $n$th complete symmetric function $h_n(\theta)$ in the variables $\theta_j$, $j = 1, \dots, r$. In this combinatorial context the Panjer recursion (5) is equivalent to the identities

$$n \, h_n(\theta) = \sum_{k=1}^{n} p_k(\theta) \, h_{n-k}(\theta), \quad n \ge 1, \qquad (15)$$

which are nothing else than a variant of Newton's identities (see [24, Equations (2.11) and (2.11′), page 23]). Therefore, the distribution (14) of the convolution of geometric random variables derived from the recursion (15) could also be called the Newton distribution. The distribution corresponding to the random variable (13) is abbreviated $X \sim N_r(\theta_1, \dots, \theta_r)$.

Now, we apply our concepts to study some estimation properties of the Newton distribution. In a first step, we construct a parameter vector $c = (c_2, c_3, \dots, c_r)$ orthogonal to the mean

$$\mu = E[X] = \sum_{j=1}^{r} \frac{\theta_j}{1 - \theta_j}. \qquad (16)$$

In general, given a random variable $X$, we suppose that the mean $\mu$ is functionally independent of a certain parameter vector $c = c(\theta)$, that is, $\partial\mu/\partial c_j = 0$, $j = 2, \dots, r$, and denote the log-likelihood of $X$ by $\ell(x; \mu, c)$. The mean $\mu$ is orthogonal to the parameter vector $c$, denoted by $\mu \perp c$, if one has $E[\partial^2\ell/\partial\mu \, \partial c_j] = E[(\partial\ell/\partial\mu)(\partial\ell/\partial c_j)] = 0$, $j = 2, \dots, r$. Further, let $G$ be the class of all discrete arithmetic distributions for which the maximum likelihood estimator of the mean is the sample mean, that is, such that $\hat{\mu} = \bar{x}$ (Gauss's principle). The subclass $G^\perp$ of $G$, called the mean orthogonal class, consists of all those distributions which, besides Gauss's principle, satisfy the mean orthogonal property $\mu \perp c$. It is known that the class $G^\perp$ is closed under convolution (see [15, Theorem 2.2]) and can be characterized as follows. Suppose there exist a parameter $\nu$ and a one-to-one coordinate transformation mapping $(\mu, c)$ to $(\nu, c)$, and set $\lambda = -\ln \Pr(X = 0)$. Then $X \in G^\perp$ with $\mu = \mu(\nu, c) \perp c$ is equivalent to a condition (17) (see the corresponding lemma in [15]), where $q(k)$, $k \ge 1$, is determined by the pseudo compound Poisson representation of $X$. Applied to the Newton distribution, one sees after some calculation that (17), with $\nu = \theta_1$ and $\theta_{j,1} = \partial\theta_j/\partial\theta_1$, $j = 2, \dots, r$, is fulfilled provided a certain system of partial differential equations for the $c_j(\theta)$ can be solved. A solution to these partial differential equations is $c_j(\theta) = \theta_j/\theta_1$, $j = 2, \dots, r$. It follows that the mean $\mu$ is orthogonal to the parameter vector $c = c(\theta) = (\theta_2/\theta_1, \dots, \theta_r/\theta_1)$.
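As a numerical cross-check of (12)-(15), the following Python sketch (our own; the names are illustrative) builds the Newton distribution by running the Newton identity recursion (15) for the complete symmetric functions $h_n(\theta)$ and compares the resulting probabilities $p(n) = p(0) \cdot h_n(\theta)$ with a direct convolution of geometric laws.

import math

def newton_probabilities(theta, nmax):
    # p(n) = p(0) * h_n(theta), with h_n from n*h_n = sum_{k=1..n} p_k * h_{n-k}.
    p0 = math.prod(1 - t for t in theta)
    power_sums = [0.0] + [sum(t ** k for t in theta) for k in range(1, nmax + 1)]
    h = [1.0] + [0.0] * nmax
    for n in range(1, nmax + 1):
        h[n] = sum(power_sums[k] * h[n - k] for k in range(1, n + 1)) / n
    return [p0 * hn for hn in h]

def geometric_convolution(theta, nmax):
    # Direct convolution of the laws Pr(X_j = n) = (1 - theta_j) * theta_j**n.
    dist = [1.0] + [0.0] * nmax
    for t in theta:
        geo = [(1 - t) * t ** n for n in range(nmax + 1)]
        dist = [sum(dist[i] * geo[n - i] for i in range(n + 1)) for n in range(nmax + 1)]
    return dist

theta = (0.3, 0.5, 0.7)
a = newton_probabilities(theta, 40)
b = geometric_convolution(theta, 40)
assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))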
Next, we derive the maximum likelihood equations for the two-parameter Newton distribution $N_2(a, b)$ obtained from the convolution $X = \mathrm{Geom}(1-a) \oplus \mathrm{Geom}(1-b)$ of two geometric distributions with parameters $0 < a, b < 1$, $a \ne b$. The probabilities in the above orthogonal parameterization $(\mu, c) = (\mu, b/a)$ are given by

$$p(n) = (1-a)(1-b) \cdot \frac{a^{n+1} - b^{n+1}}{a - b}, \quad n \ge 0. \qquad (21)$$

For each $n \ge 0$ let $k_n$ be the number of observations of the random variable $X$ equal to $n$, and let $K = \sum_{n=0}^{M} k_n$ be the total number of observations, where $k_n = 0$ for all $n > M$. From (21) one obtains the scaled log-likelihood function

$$K^{-1} \ell(x; a, b) = \ln(1-a) + \ln(1-b) - \ln(a-b) + K^{-1} \sum_{n=0}^{M} k_n \ln\left(a^{n+1} - b^{n+1}\right), \qquad (22)$$

where without loss of generality $a > b$. The maximum likelihood equations $K^{-1} \ell_a(x; a, b) = K^{-1} \ell_b(x; a, b) = 0$ form a system of two nonlinear equations in the two variables $(a, b)$, which can be solved by substitution, setting $t = a/(1-a)$. Inserting $a = t/(1+t)$ into the first equation yields a quadratic equation in $t$, whose only feasible solution takes the minus sign in the quadratic formula. Using that $\hat{\mu} = \bar{x}$ and inserting this solution into the second equation, one sees that the latter depends, besides the observations, only upon $c$ and determines the maximum likelihood estimate of $c$. The preceding simple derivation of maximum likelihood properties is possible by virtue of the pseudo compound Poisson representation (mainly (17)). Let us also illustrate the mathematical benefit of the partition representation. For this we rely on Theorem 3.2 in Hürlimann [4], which guarantees the existence of well-defined confidence bounds for the mean of a count distribution provided a certain regularity assumption, verified via the partition representation, is fulfilled.
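Returning to the two-parameter fit, the profile likelihood route suggested by Gauss's principle is straightforward to implement. The Python sketch below (our own; the data are simulated from the model itself, not the insurance data of Section 5) enforces the constraint $\hat{\mu} = \bar{x}$, that is, $a/(1-a) + b/(1-b) = \bar{x}$, and maximizes (22) by a one-dimensional scan, recovering the parameters up to the ordering of $(a, b)$.

import math, random

def log_likelihood(a, b, counts):
    # Scaled version of (22) with p(n) = (1-a)(1-b)(a**(n+1) - b**(n+1))/(a - b).
    ll = 0.0
    for n, k_n in enumerate(counts):
        if k_n:
            p_n = (1 - a) * (1 - b) * (a ** (n + 1) - b ** (n + 1)) / (a - b)
            ll += k_n * math.log(p_n)
    return ll

def mle_newton2(counts, grid=2000):
    # Profile MLE: enforce a/(1-a) + b/(1-b) = xbar (Gauss's principle), scan over a.
    total = sum(counts)
    xbar = sum(n * k for n, k in enumerate(counts)) / total
    best = (-math.inf, None, None)
    for i in range(1, grid):
        a = (xbar / (1 + xbar)) * i / grid   # ensures a/(1-a) < xbar
        t = xbar - a / (1 - a)               # mean left for the second component
        b = t / (1 + t)
        if abs(a - b) < 1e-9:
            continue
        ll = log_likelihood(a, b, counts)
        if ll > best[0]:
            best = (ll, a, b)
    return best

# Simulate from Geom(1 - a0) + Geom(1 - b0) and recover (a0, b0) up to ordering.
random.seed(1)
a0, b0, size = 0.25, 0.6, 10000
draw = lambda t: int(math.log(1.0 - random.random()) / math.log(t))
sample = [draw(a0) + draw(b0) for _ in range(size)]
counts = [sample.count(n) for n in range(max(sample) + 1)]
print(mle_newton2(counts))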

A Numerical Example from Automobile Insurance
The choice of an appropriate claim number model in automobile insurance is a prominent actuarial research topic, whose story over the last fifty years begins with Bichsel [25], who used the negative binomial (NB) distribution to construct a bonus-malus system. Among the other early good choices, the Poisson inverse Gaussian (PIG) has often been advocated (see Example 3.4 in [4] and the discussion there). Due to the penalty in the SBC (Schwarz Bayesian criterion) score, the best overall fit is in any case obtained for a two-parameter distribution (see [4, end of Section 4]). In this context, we ask whether the Newton distribution might compete with the NB and PIG. For a specific example, and in a very precise sense (regrouping of classes), it beats both of them and at the same time improves the unsatisfactory fit of data set 4 in Table 4.5 of Hürlimann [4]. Since the method of moments is inappropriate, as shown by Gossiaux and Lemaire [26], we use the maximum likelihood estimation (MLE) method, which is best in view of its asymptotic properties. The goodness-of-fit is established on the basis of both the SBC score and the $p$ value of the chi-square statistic with appropriate regrouping of the classes. The improved results are found in Tables 1 and 2. Although the minimum SBC score is attained by the PIG, the chi-square statistic and $p$ value (with the last two classes regrouped) are best for the Newton distribution. The decrease in goodness-of-fit of the PIG is due to the loss of fit caused by regrouping the last two classes and to its deficiency in fitting the class with two observed insurance claims.
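For reference, the two scoring devices used in Tables 1 and 2 can be sketched as follows in Python (our own helper functions; the data themselves are not reproduced here). We take the SBC score in the "smaller is better" convention $-2 \ln L + k \ln K$, an assumption consistent with the minimum SBC score mentioned above, and regroup the right tail of the chi-square classes until each expected count is large enough.

import math

def sbc_score(log_lik, n_params, n_obs):
    # Schwarz Bayesian criterion, smaller is better: -2 ln L + k ln K (an assumed convention).
    return -2.0 * log_lik + n_params * math.log(n_obs)

def chi_square_regrouped(observed, expected, min_expected=5.0):
    # Pearson chi-square after regrouping the right tail so that every
    # remaining class keeps an expected count of at least min_expected.
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < min_expected:
        exp[-2] += exp.pop()
        obs[-2] += obs.pop()
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return chi2, len(exp)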
Let us conclude. Our aim was to give a brief systematic overview of the concepts of pseudo compound Poisson and partition representations in discrete probability and to present a new application. Through the detailed study of a single new example, we have exemplified some typical features and the further potential of our methodology.

Table 1: Data set and MLE fit with the last two regrouped classes.

Table 2: Goodness-of-fit (chi-square and $p$ value with the last two regrouped classes).