Sum of Bernoulli Mixtures: Beyond Conditional Independence

We consider the distribution of the sum of Bernoulli mixtures under a general dependence structure. The level of dependence is measured in terms of a limiting conditional correlation between two of the Bernoulli random variables. The conditioning event is that the mixing random variable is larger than a threshold, and the limit is taken as the threshold tends to one. The large-sample distribution of the empirical frequency and its use in approximating the risk measures, value at risk and conditional tail expectation, are presented for a new class of models which we call double mixtures. Several illustrative examples with a Beta mixing distribution are given. In addition, some data from the area of credit risk are fit with the models, and comparisons are made between the new models and the classical Beta-binomial model.


Introduction
Since the U.S. subprime mortgage crisis, which led the world economy into the global recession of the late 2000s, there has been serious criticism of the use of inappropriate dependent default models for portfolio credit risk (see Donnelly and Embrechts [1]). Specifically, the Gaussian copula model (Basel II Accord [2] and Li [3]) had become the industry standard and has been widely used both for the pricing of portfolio credit derivatives, such as collateralized debt obligations (CDOs) or mortgage-backed securities, and for credit risk management. Its inability to incorporate extreme tail dependence, however, resulted in a serious lack of default clustering, especially under stressful economic situations.
In the literature, several alternative approaches were proposed to cope with the issue of an insufficient level of dependency among defaults. As a simple extension of the Gaussian copula model, Andersen and Sidenius [4] and Burtschell et al. [5] considered stochastic asset correlation models. A more popular way of incorporating further dependence beyond the Gaussian copula model is the use of heavy-tailed copulas such as Archimedean copulas. Amongst many others, we mention Burtschell et al. [6] and Schönbucher and Schubert [7] for applications of copula models under the structural and the intensity-based credit risk model frameworks, respectively.
It is important to note that factor copula models under the typical conditional independence assumption have a close link with Bernoulli mixture models. Specifically, by de Finetti's theorem, there exists a common mixing random variable on which the default indicator random variables, in an exchangeable credit portfolio, are dependent. The mixing random variable is essentially the random probability of default, which can be represented as a function of the common systematic factor in a factor copula model. Under the typical conditional independence assumption, however, the level of dependence between the default indicator random variables is controlled solely by the distributional property of the systematic factor, which can impose limitations for some applications where extreme levels of dependence are required. See Bluhm et al. [8], Cousin and Laurent [9], Frey and McNeil [10,11], KMV [12], McNeil et al. [13], McNeil and Wendin [14], Moraux [15], and RiskMetrics Group [16] for more references on credit risk applications of the Bernoulli mixture models under the conditional independence modeling framework.
Journal of Probability and Statistics
Formally, a Bernoulli mixture model is defined as follows. Let $(X_i)_{i=1}^{n}$ be a sequence of $n$ identically distributed indicator random variables. We assume that each $X_i$, $i = 1, \ldots, n$, follows the Bernoulli distribution with the common default probability $P$. The probability $P$ is randomly drawn from a distribution with cumulative distribution function (cdf) $F_P(p)$, $p \in (0, 1)$. At this point, the dependency structure for $(X_i)_{i=1}^{n}$ is still arbitrary. In credit applications, the statistical properties of the sum of the $n$ Bernoulli random variables are the main interest, and the following assumption plays an important role in alleviating computational difficulties: $\{X_i\}$ are conditionally independent given $P$. In this case, given $P = p$, the sum $S_n := \sum_{i=1}^{n} X_i$ follows the binomial distribution with mean $np$ and variance $np(1-p)$. Then, the unconditional probability mass function (pmf) is given as follows:
$$f^{I}_{S_n}(k) = \int_0^1 \binom{n}{k} p^{k} (1-p)^{n-k} \, dF_P(p), \quad k = 0, 1, \ldots, n, \tag{1}$$
where the superscript "$I$" on $f$ indicates the assumed conditional independence. The most commonly used mixing distribution in modeling the credit risk of a homogeneous portfolio such as a mortgage or credit card portfolio is the Beta distribution. This model is referred to as the Beta-binomial ($\beta$-I) model.
The present paper is concerned with the level of dependence incorporated in Bernoulli mixture models, especially under stress situations. Together with the asset correlation, the default correlation is used as the standard measure of dependence in portfolio credit risk models. In particular, we consider the behavior of the correlation between two arbitrary terms, as the random default probability $P$ is conditioned to be larger than a threshold $q$ tending to 1.
Specifically, we define the conditional default correlation between two terms, $X_i$ and $X_j$, given $P > q$, $0 < q < 1$, as follows:
$$\rho_{i,j}(q) := \mathrm{Corr}[X_i, X_j \mid P > q] = \frac{E[\pi_2(P) \mid P > q] - (E[P \mid P > q])^{2}}{E[P \mid P > q]\,(1 - E[P \mid P > q])}, \tag{2}$$
where
$$\pi_2(p) := \Pr[X_i = 1, X_j = 1 \mid P = p]. \tag{3}$$
Then, the limit of the conditional correlation as $q$ tends to 1 (referred to as the limiting correlation) is defined as
$$\rho := \lim_{q \to 1} \rho_{i,j}(q), \tag{4}$$
provided the limit exists. (Bae and Iscoe [17] used an analogous definition to study the asset correlations under stress for various single-factor credit risk models, some of them dynamic, while Kalkbrener and Packham [18] also studied asset correlations under stress, for static, normal variance mixture models. In both studies, in place of the conditioning on $\{P > q\}$, conditioning is done on an auxiliary risk factor which, for example, is related to a systematic factor.)
It is intuitive to expect that the correlations in (2) increase monotonically as the threshold $q$ approaches 1. Unfortunately, this is not the case for the Beta-binomial model. In fact, the conditional correlation always converges to zero regardless of the mixing distribution, as long as conditional independence is postulated (Corollary 2). This type of model failure can introduce a significant bias in the measurement and management of risk in stress situations.
The objectives of the present paper are threefold: (i) to derive the relationship between the limiting correlation and the probability structure; (ii) to construct general Bernoulli mixture models with nontrivial limiting correlations; (iii) to demonstrate implications of the constructed models, for tail risk measures such as value at risk and conditional tail expectation.
The theoretical results for these objectives are presented in Section 2. We first examine the limiting behavior of correlations for Bernoulli mixture models in Section 2.1. Then, Section 2.2 introduces a general model framework, the class of double mixture models, which allows further positive dependence between entities, beyond conditional independence (Theorem 4). Section 2.3 is devoted to the third objective: it considers the large-sample distribution of the sum of Bernoulli mixtures under the general model framework and discusses its application to the approximation of risk measures. As a specific case, Section 3 considers a Beta Bernoulli mixture, and Section 4 provides the results for an example with specific parameter values. Section 5 demonstrates the fitting results of the double mixture models to a dataset from the area of credit risk. Finally, Section 6 concludes this paper with a summary. Technical details of some proofs are given in the appendices.
The following hypothesis on the properties of $F_P$ in a deleted neighbourhood of $p = 1$ will be part of the assumptions in Theorem 1 and its Corollary 2 below.
Hypothesis H. $F_P$ is absolutely continuous in a neighbourhood of $p = 1$, with probability density function (pdf) $f_P$ thereon, such that (i) $f_P(p) > 0$; (ii) $f_P$ is continuous in a deleted neighbourhood of $p = 1$; (iii) $(1 - p) f_P(p)/(1 - F_P(p))$ converges to a finite limit as $p$ tends to 1.
The significance of (i) is clear: it is required for the conditioning on $\{P > q\}$ to be well defined. Parts (ii) and (iii) are technical assumptions which play a role in the proof of Theorem 1. Part (iii) of Hypothesis H is satisfied in many cases of practical interest, where $f_P$ has a simple asymptotic behaviour near $p = 1$. For example, (iii) is satisfied if
$$f_P(p) \sim c\,(1-p)^{a}\,|\log(1-p)|^{b} \tag{6}$$
for some positive (finite) constant $c$ and some (finite) constants $a > -1$ and $b$ (the relation "$f \sim g$" denotes asymptotic equivalence, in the sense that the ratio $f/g$ tends to 1, as $p \to 1$).
Here is the verification that (6) implies that (iii) holds, in the case that $b \neq 0$. (The case $b = 0$ is similar but easier, so the details will be omitted.) Applying (6) in the first step and L'Hospital's rule in the second step, as $p$ tends to 1, we have
$$\frac{(1-p)\,f_P(p)}{1 - F_P(p)} \sim \frac{c\,(1-p)^{a+1}\,|\log(1-p)|^{b}}{1 - F_P(p)} \longrightarrow a + 1,$$
so (iii) is satisfied with the value of the limit being $a + 1$. It will be clear that the proof of Theorem 1 is easily generalized to allow the limit in (iii) to be infinite. However, it is not apparent, given the previous example, whether such a situation can actually occur. For this reason, we have taken the limit to be finite in the formulation of (iii).
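As a quick numerical sanity check of part (iii), the following Python sketch evaluates the ratio $(1-p) f_P(p)/(1 - F_P(p))$ for a Beta(2, 3) density. The Beta(2, 3) choice, the helper names `simpson`, `beta23_pdf`, and `tail_ratio`, and the use of Simpson's rule are illustrative assumptions, not part of the paper. Near $p = 1$ this density behaves like a constant times $(1-p)^{2}$, so the ratio should approach 3.

```python
def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def beta23_pdf(p):
    # Beta(2, 3) density, 12 * p * (1 - p)^2 (hypothetical example choice).
    return 12.0 * p * (1.0 - p) ** 2


def tail_ratio(pdf, p):
    # The quantity in part (iii): (1 - p) * f_P(p) / (1 - F_P(p)).
    survival = simpson(pdf, p, 1.0)  # 1 - F_P(p)
    return (1.0 - p) * pdf(p) / survival


# Ratio evaluated at thresholds approaching 1.
ratios = [tail_ratio(beta23_pdf, p) for p in (0.9, 0.99, 0.999)]
print(ratios)
```

The printed values increase towards 3 as the threshold approaches 1, which is the behaviour required by part (iii) for this particular density.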
Based on the conditional joint probability $\pi_2(p)$, we obtain the following result for the limiting correlation.
Theorem 1. Let $X_i$ and $X_j$ be two Bernoulli mixture random variables with correlation, $\rho_{i,j}(q)$, as in (2). Suppose that Hypothesis H holds. One further assumes that $\lim_{p \to 1} \pi_2(p) = 1$, $\pi_2(p)$ is differentiable for $p$ in a deleted neighbourhood of 1, and $\lim_{p \to 1} \pi_2'(p)$ exists. Then, the limiting correlation in (4) exists and satisfies
$$\rho = \lim_{p \to 1} \frac{2p - \pi_2'(p)}{2p - 1} = 2 - \lim_{p \to 1} \pi_2'(p).$$
Proof. See the appendices.
In the context of a sequence $(X_i)$ of Bernoulli mixture random variables, the notation "$\pi_2$" suppresses the dependence on $i$ and $j$; more precisely, it should be written as $\pi_{2;i,j}$. In order to have a result which does not actually depend on $i$ or $j$, an extra hypothesis on the Bernoulli mixtures must be imposed. We will return to this point in Section 2.2.
An interesting example is the case that both terms take on identical values (comonotonicity); that is, $\pi_2(p) = p$. (Here, comonotonicity refers to the case that all the identically distributed components of a Bernoulli random vector coincide in value with probability one, conditional on $P$.) Under Hypothesis H, Theorem 1 implies that the limiting correlation for this case is 1. The following corollary, another important example, states that the limiting correlation between two arbitrary terms from an identically distributed Bernoulli mixture sequence is zero, under the assumption of conditional independence.
Corollary 2. Suppose that two Bernoulli mixture random variables $X_i$ and $X_j$ are conditionally independent given the common default probability $P$. Then, under Hypothesis H, the limiting correlation in (4) exists and satisfies
$$\rho = 0.$$
Proof. Under the conditional independence assumption, $\pi_2(p) = p^{2}$. The desired result follows immediately from Theorem 1:
$$\rho = \lim_{p \to 1} \frac{2p - \pi_2'(p)}{2p - 1} = \lim_{p \to 1} \frac{2p - 2p}{2p - 1} = 0.$$
Remark 3. Results similar to those of Theorem 1 and Corollary 2 can also be given for the other extreme, conditional on $P < q$ with $q$ tending to 0. One simply replaces, in Hypothesis H and Theorem 1, the phrase "neighbourhood of $p = 1$" with "neighbourhood of $p = 0$" and (iii) with "$p f_P(p)/F_P(p)$ converges to a finite limit as $p$ tends to 0;" then, in Theorem 1 and Corollary 2, one replaces "$\lim_{p \to 1}$" with "$\lim_{p \to 0}$." (Note that $\lim_{p \to 0} \pi_2(p) = 0$ holds automatically, since $\pi_2(p) \le p$.) However, in our applications, it is the case "$q \to 1$" which is important, so we will not dwell further on the case "$q \to 0$."
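The contrast between the comonotonic and the conditionally independent cases can be illustrated numerically. The Python sketch below evaluates the conditional correlation $\mathrm{Corr}[X_i, X_j \mid P > q]$ as $(m_2 - m_1^2)/(m_1(1 - m_1))$, with $m_1 = E[P \mid P > q]$ and $m_2 = E[\pi_2(P) \mid P > q]$. The Beta(2, 2) mixing density, the function names, and the Simpson-rule integration are assumptions made only for this illustration.

```python
def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def conditional_correlation(pi2, pdf, q):
    """Corr[X_i, X_j | P > q] for joint success probability pi2(p) and mixing pdf."""
    mass = simpson(pdf, q, 1.0)                               # Pr[P > q]
    m1 = simpson(lambda p: p * pdf(p), q, 1.0) / mass         # E[P | P > q]
    m2 = simpson(lambda p: pi2(p) * pdf(p), q, 1.0) / mass    # E[pi_2(P) | P > q]
    return (m2 - m1 ** 2) / (m1 * (1.0 - m1))


def beta22_pdf(p):
    # Beta(2, 2) density, 6 * p * (1 - p) (hypothetical mixing distribution).
    return 6.0 * p * (1.0 - p)


def independent(p):
    return p * p   # pi_2(p) under conditional independence


def comonotone(p):
    return p       # pi_2(p) under comonotonicity


thresholds = (0.5, 0.9, 0.999)
rho_indep = [conditional_correlation(independent, beta22_pdf, q) for q in thresholds]
rho_comon = [conditional_correlation(comonotone, beta22_pdf, q) for q in thresholds]
print(rho_indep)
print(rho_comon)
```

Under these assumptions the conditionally independent correlation shrinks towards 0 as the threshold rises, while the comonotonic correlation stays at 1 for every threshold.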

Double Mixture Models.
We now turn to the construction of a more general probability structure beyond conditional independence, a structure which reveals the fundamental role of the quantities $\pi_k$. We require a stronger assumption on the sequence $(X_i)_{i=1}^{n}$ than their being just conditionally identically distributed, which is a statement about the 1-dimensional marginals. We require that $(X_i)_{i=1}^{n}$ be conditionally exchangeable, meaning that, under each $\Pr[\,\cdot \mid P = p]$, all permutations of $(X_1, X_2, \ldots, X_n)$ have a common joint distribution which may depend on $p$. (Note that a sequence of $n$ conditionally iid random variables is conditionally exchangeable, as is a comonotonic sequence.) In particular, conditional exchangeability is sufficient to guarantee that $\pi_2(p)$, $\pi_2'(p)$, and hence the limiting correlation in Theorem 1, are independent of $i$ and $j$.
We assume, for the remainder of the paper, that our sequence of $n$ Bernoulli mixture random variables is conditionally exchangeable. The quantities $\pi_k(p)$, $1 \le k \le n$, then do not depend on the choice of subset of $k$ variables. The probability that any $k$ of the Bernoulli random variables take on the value 1 is given in the following theorem, which shows that the pmf of $S_n$ is essentially determined by the quantities $\pi_k(p)$, $k = 1, \ldots, n$, which in turn can be specified quite generally as input to the model.
Remark 5. For a general characterization of multivariate Bernoulli random variables, see Sharakhmetov and Ibragimov [19]. With the exchangeability assumption, (12) can be deduced from Theorem 3 of the aforementioned paper.
Remark 7. The converse part of the theorem is connected with the easy part of the classical Hausdorff moment problem and its application to the proof of de Finetti's theorem for sequences of exchangeable random variables (see, e.g., Chapter VII in Feller [20]).
In concrete examples, we will refer to a model by a meaningful worded description or an acronym, rather than through the formal notation of Definition 8. In addition, for the remainder of the paper, all considered models will be Bernoulli (or binomial) double mixture models; therefore, we may omit those words from the names of models, simply retaining the description of the building blocks, $\nu(\cdot; p)$ and $F_P$.
The following simple example illustrates the converse of the theorem.
Example 9 (ICM). We consider a weighted average of independence and comonotonicity. Specifically, for a specified weight $w \in [0, 1]$, the random $n$-vector $X \equiv (X_i)_{i=1}^{n}$ is a two-point mixture of a conditionally comonotonic $n$-vector of Bernoulli mixtures and a conditionally independent $n$-vector of Bernoulli mixtures (with common mixing distribution, $F_P$), with respective weights $w$ and $1 - w$. For such a model, the (conditional) joint probability, $\pi_k(p)$, $k = 2, \ldots, n$, is expressed as the weighted average of the joint probability of the comonotonic case and that of the independence case:
$$\pi_k(p) = w\,p + (1 - w)\,p^{k}. \tag{17}$$
In practice, the weight parameter $w \in [0, 1]$ must be estimated from data or be chosen exogenously. This model (with $F_P$ unspecified) will be denoted by the acronym ICM, standing for independent comonotonic mixture (with everything being implicitly conditional). It corresponds to the following choice for $\nu(\cdot; p)$, which is a double mixture of point masses:
$$\nu(\cdot; p) = w\,[(1 - p)\,\delta_0 + p\,\delta_1] + (1 - w)\,\delta_p. \tag{18}$$
Note that $\int_0^1 u^{0}\,\nu(du; p) = 1$ follows from the usual understanding that $0^{0} = 1$.
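Then, the limiting correlation is (cf. (4))
$$\rho = w, \tag{19}$$
and the unconditional pmf of the sum, $f_{S_n}(k) := \Pr[S_n = k]$, is
$$f_{S_n}(k) = w\,\{(1 - E[P])\,1_{\{k = 0\}} + E[P]\,1_{\{k = n\}}\} + (1 - w)\,f^{I}_{S_n}(k), \tag{20}$$
with $f^{I}_{S_n}(k)$ given in (1). The mean and variance of the sum $S_n$ can be easily obtained as follows:
$$E[S_n] = n\,E[P], \qquad \mathrm{Var}[S_n] = w\,n^{2} E[P](1 - E[P]) + (1 - w)\,\{n\,E[P(1 - P)] + n^{2}\,\mathrm{Var}[P]\}. \tag{21}$$
Note that both the mean and variance are the weighted averages of those for the independent and comonotone cases. (In general, the $k$th moment is the weighted average of the $k$th moments of the independent and comonotone cases.)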
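The structure of the ICM pmf and its moments can be checked with a small Python sketch. To keep the example exact and short, it assumes a hypothetical two-point mixing distribution for $P$ (the model leaves $F_P$ unspecified); the support, weights, and function names below are illustrative and not from the paper.

```python
from math import comb


def mixed_binomial_pmf(n, support, weights):
    """Conditionally independent case: average of Bin(n, p) pmfs over a discrete mixing law."""
    return [
        sum(wt * comb(n, k) * p ** k * (1 - p) ** (n - k)
            for p, wt in zip(support, weights))
        for k in range(n + 1)
    ]


def icm_pmf(n, w, support, weights):
    """ICM pmf: weight w on the comonotonic part, 1 - w on the independent part."""
    mean_p = sum(p * wt for p, wt in zip(support, weights))
    pmf = [(1 - w) * f for f in mixed_binomial_pmf(n, support, weights)]
    pmf[0] += w * (1 - mean_p)   # comonotonic part: all indicators equal 0
    pmf[n] += w * mean_p         # comonotonic part: all indicators equal 1
    return pmf


n, w = 20, 0.3
support, weights = (0.05, 0.4), (0.8, 0.2)   # hypothetical two-point law for P

pmf = icm_pmf(n, w, support, weights)
mean = sum(k * f for k, f in enumerate(pmf))
var = sum(k * k * f for k, f in enumerate(pmf)) - mean ** 2

# Moments of P and the two building-block variances.
mean_p = sum(p * wt for p, wt in zip(support, weights))
second_p = sum(p * p * wt for p, wt in zip(support, weights))
var_indep = n * (mean_p - second_p) + n * n * (second_p - mean_p ** 2)
var_comon = n * n * mean_p * (1 - mean_p)
print(mean, var, w * var_comon + (1 - w) * var_indep)
```

Because both building blocks share the mean $n E[P]$, the variance of the mixture is exactly the weighted average of the two variances, which is what the final printed pair compares.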

Large-Sample Distribution of Empirical Frequency.
The proportion of observed defaults to the total number of entities in a credit portfolio is of interest in practice. For example, the historical default rate for a homogeneous credit portfolio is used to estimate the probability of default for a generic counterparty in the portfolio. For a fixed $n$, let $Y_n = S_n/n$ denote the empirical frequency, which can be considered as the percentage gross loss of a portfolio of $n$ loans in equal dollar amounts.
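The probability distribution of $Y_n$ is
$$F_{Y_n}(x) := \Pr[Y_n \le x] = \sum_{k=0}^{[nx]} f_{S_n}(k), \quad 0 \le x \le 1.$$
For the general binomial double mixture model, based on a family of probability measures, $\nu(\cdot; p)$, and mixing distribution, $F_P$, we have the following result:
$$\lim_{n \to \infty} F_{Y_n}(x) = \begin{cases} \int_0^1 \{\nu([0, x); p) + \tfrac{1}{2}\,\nu(\{x\}; p)\}\, dF_P(p), & 0 < x < 1, \\ \int_0^1 \nu(\{0\}; p)\, dF_P(p), & x = 0, \\ 1, & x = 1. \end{cases} \tag{22}$$
Here is the proof of (22). The case $x = 1$ is trivial. For $0 \le x < 1$, first note that by (14)
$$F_{Y_n}(x) = \int_0^1 \int_0^1 \sum_{k=0}^{[nx]} \binom{n}{k} u^{k} (1-u)^{n-k}\, \nu(du; p)\, dF_P(p),$$
where, for a nonnegative number $a$, $[a]$ represents the greatest integer less than or equal to $a$. The integrand is the cdf, at $x$, of $S_n^{u}/n$ where $S_n^{u} \sim \mathrm{Bin}(n, u)$; so, by the LLN, as $n \to \infty$, $S_n^{u}/n \to u$, a.s. Therefore, for $0 < x < 1$,
$$\lim_{n \to \infty} \Pr[S_n^{u}/n \le x] = 1_{[0, x)}(u) + \tfrac{1}{2}\,1_{\{x\}}(u),$$
where $1_A$ denotes the indicator function of a set $A$. (The second term, with the factor "1/2," comes from an application of the CLT to $\Pr[S_n^{u} - nu \le 0]$, when $u = x$.) For $x = 0$,
$$\lim_{n \to \infty} \Pr[S_n^{u} = 0] = \lim_{n \to \infty} (1 - u)^{n} = 1_{\{0\}}(u).$$
Integrating these two limiting results with respect to $\nu(du; p)\,dF_P(p)$ yields the first two cases of (22). The interchange of limit and integrals is justified by the bounded convergence theorem.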
Example 10 (ICM reprise). For the $w$-weighted average of conditionally comonotonic and conditionally independent Bernoulli mixtures described in Example 9, we can easily obtain the following limiting distribution (this is the weighted average of the obvious limit for the conditionally comonotonic component and the convergence result in Vasicek [21] for the conditionally independent component of the mixture):
$$F_Y(x) := \lim_{n \to \infty} F_{Y_n}(x) = \begin{cases} w\,(1 - E[P]) + (1 - w)\,F_P(x), & 0 \le x < 1, \\ 1, & x = 1. \end{cases} \tag{27}$$
Then, the mean and variance of the limiting distribution are
$$E[Y] = E[P], \qquad \mathrm{Var}[Y] = w\,E[P](1 - E[P]) + (1 - w)\,\mathrm{Var}[P]. \tag{28}$$
In general, the $k$th moment of the limiting distribution is
$$E[Y^{k}] = w\,E[P] + (1 - w)\,E[P^{k}].$$
The limiting result (27) can be applied to the area of financial credit risk management. The tail risk measures of a large-sample portfolio credit loss distribution are of particular interest for financial risk managers. The two well-known risk measures, value at risk (VaR) and conditional tail expectation (CTE), can be approximated by using the limiting distribution, (27). Formally, for a loss random variable $L$ with the cumulative distribution function $F_L(x)$, VaR and CTE at confidence level $q \in [0, 1)$ are defined as follows:
$$\mathrm{VaR}_q(L) := \inf\{x : F_L(x) \ge q\}, \qquad \mathrm{CTE}_q(L) := \frac{1}{1 - q} \int_q^1 \mathrm{VaR}_u(L)\, du.$$
Note that this definition reduces to
$$\mathrm{CTE}_q(L) = E[L \mid L > \mathrm{VaR}_q(L)]$$
if $L$ is a continuous random variable (see Hardy [22] or Acerbi and Tasche [23]). Then, by the scaling property of these risk measures and the limiting result (27), for a large $n$, the VaR of the sum of $n$ Bernoulli mixtures at level $q$ can be approximated as follows:
$$\mathrm{VaR}_q(S_n) \approx n\,\mathrm{VaR}_q(Y),$$
where
$$\mathrm{VaR}_q(Y) = \inf\{x \in [0, 1] : F_Y(x) \ge q\}.$$
Similarly, the CTE at level $q$ can be approximated as
$$\mathrm{CTE}_q(S_n) \approx n\,\mathrm{CTE}_q(Y). \tag{33}$$

Beta Mixing Distribution
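We now specialize the results of the preceding section to the case where the mixing distribution, $F_P$, is the Beta distribution with parameters $\alpha > 0$ and $\beta > 0$. Specifically, the density $f_P(p)$ is given by
$$f_P(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1} = \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 < p < 1,$$
where $\Gamma(\cdot)$ and $B(\cdot, \cdot)$ are the Gamma and Beta functions, respectively. For a Beta Bernoulli mixture, several computationally convenient expressions are available. We first introduce a notation which is convenient for the cumulative distribution function:
$$F_{\beta}(x; \alpha, \beta) := \frac{1}{B(\alpha, \beta)} \int_0^x u^{\alpha - 1} (1 - u)^{\beta - 1}\, du, \quad 0 \le x \le 1.$$
Then (1), the pmf of the sum under the conditional independence assumption, can be evaluated as
$$f^{I}_{S_n}(k) = \binom{n}{k} \frac{B(\alpha + k, \beta + n - k)}{B(\alpha, \beta)}, \quad k = 0, 1, \ldots, n.$$
The $k$th moments for $P$ and $(1 - P)$ in this case are
$$E[P^{k}] = \frac{B(\alpha + k, \beta)}{B(\alpha, \beta)} = \prod_{j=0}^{k-1} \frac{\alpha + j}{\alpha + \beta + j}, \qquad E[(1 - P)^{k}] = \frac{B(\alpha, \beta + k)}{B(\alpha, \beta)} = \prod_{j=0}^{k-1} \frac{\beta + j}{\alpha + \beta + j}. \tag{37}$$
The following lists several results for the ICM model with the Beta mixing distribution ($\beta$-ICM model).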
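For readers who wish to evaluate the Beta-binomial expressions numerically, the following Python sketch computes the pmf through log-Gamma functions and the moments of $P$ through the product form $\prod_{j<k} (\alpha + j)/(\alpha + \beta + j)$. The function names and the parameter values $n = 20$, $\alpha = 2$, $\beta = 5$ are illustrative assumptions only.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(n, k, alpha, beta):
    """Pr[S_n = k] under conditional independence with P ~ Beta(alpha, beta)."""
    return comb(n, k) * exp(log_beta(alpha + k, beta + n - k) - log_beta(alpha, beta))


def beta_moment(k, alpha, beta):
    """E[P^k] for P ~ Beta(alpha, beta), using the product form."""
    out = 1.0
    for j in range(k):
        out *= (alpha + j) / (alpha + beta + j)
    return out


n, alpha, beta = 20, 2.0, 5.0   # hypothetical parameter values
pmf = [beta_binomial_pmf(n, k, alpha, beta) for k in range(n + 1)]

mean = sum(k * f for k, f in enumerate(pmf))
second_factorial = sum(k * (k - 1) * f for k, f in enumerate(pmf))
print(sum(pmf), mean, second_factorial)
```

The two summary statistics can be compared with $n\,E[P]$ and $n(n-1)\,E[P^{2}]$, which follow from conditioning on $P$ and using the binomial factorial moments.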
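Example 11 ($\beta$-ICM). The pmf of the binomial double mixture is given from (20) as
$$f_{S_n}(k) = w \left\{ \frac{\beta}{\alpha + \beta}\, 1_{\{k = 0\}} + \frac{\alpha}{\alpha + \beta}\, 1_{\{k = n\}} \right\} + (1 - w) \binom{n}{k} \frac{B(\alpha + k, \beta + n - k)}{B(\alpha, \beta)}.$$
The limiting distribution of the empirical frequency, (27), is
$$F_Y(x) = \begin{cases} w\,\dfrac{\beta}{\alpha + \beta} + (1 - w)\,F_{\beta}(x; \alpha, \beta), & 0 \le x < 1, \\ 1, & x = 1. \end{cases}$$
Since the inverse cumulative Beta distribution function does not admit a closed-form expression, a numerical method is required to approximate the VaR of a large-sample credit portfolio. Given the VaR at level $q$, the CTE can be written in a concise form. Specifically, for $0 < q^{*} < 1$,
$$\mathrm{CTE}_q(Y) = \frac{1}{1 - q} \left\{ w\,\frac{\alpha}{\alpha + \beta} + (1 - w)\,\frac{\alpha}{\alpha + \beta}\,[1 - F_{\beta}(\mathrm{VaR}_q(Y); \alpha + 1, \beta)] \right\},$$
where $q^{*} := \{q - w(\beta/(\alpha + \beta))\}/(1 - w)$, and where
$$\mathrm{VaR}_q(Y) = F_{\beta}^{-1}(q^{*}; \alpha, \beta).$$
As can be seen from (27), the limiting distribution of the empirical frequency under the $\beta$-ICM model has point masses at both end points, 0 and 1 (when $w \neq 0$). This may restrict the use of a certain parameter estimation method in the case that all observations are strictly inside the distributional range, which is often the case.
In the following example, we consider a Beta distribution for the measures $\nu(\cdot; p)$, $0 < p < 1$, as well as a Beta mixing distribution for $F_P$. We call the model a double Beta model with acronym $\beta$-$\beta$. The limiting empirical frequency for the $\beta$-$\beta$ model does not have point masses at the boundary points, 0 and 1.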
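One possible numerical treatment of the $\beta$-ICM approximation is sketched below in Python. It inverts the Beta cdf by bisection at the adjusted level $q^{*} = \{q - w\beta/(\alpha + \beta)\}/(1 - w)$ and computes the CTE as the average of the quantiles above $q$. Several choices here are assumptions for illustration and are not the authors' method: the Beta cdf is obtained by Simpson's rule (adequate only for $\alpha, \beta \ge 1$, where the density is bounded), $w < 1$ and $q < 1$ are required, and all function names are hypothetical.

```python
from math import exp, lgamma


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def beta_pdf(x, a, b):
    # Beta(a, b) density; bounded on [0, 1] because a, b >= 1 is assumed here.
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm) * x ** (a - 1) * (1 - x) ** (b - 1)


def beta_cdf(x, a, b):
    return simpson(lambda u: beta_pdf(u, a, b), 0.0, x)


def limiting_var(q, w, a, b):
    """VaR_q of the limiting empirical frequency under the Beta-ICM model."""
    mean_p = a / (a + b)
    q_star = (q - w * (1 - mean_p)) / (1 - w)
    if q_star <= 0.0:
        return 0.0          # level falls inside the point mass at 0
    if q_star >= 1.0:
        return 1.0          # level falls inside the point mass at 1
    lo, hi = 0.0, 1.0
    for _ in range(60):     # bisection on the Beta cdf
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def limiting_cte(q, w, a, b):
    """CTE_q of the limiting empirical frequency (average of quantiles above q)."""
    v = limiting_var(q, w, a, b)
    if v >= 1.0:
        return 1.0
    mean_p = a / (a + b)
    # E[Y; Y > v]: continuous Beta part plus the point mass at 1.
    tail = (1 - w) * simpson(lambda u: u * beta_pdf(u, a, b), v, 1.0) + w * mean_p
    cdf_at_v = w * (1 - mean_p) + (1 - w) * beta_cdf(v, a, b)
    return (tail + v * (cdf_at_v - q)) / (1 - q)


# Uniform mixing (alpha = beta = 1) gives values that can be checked by hand.
var_y = limiting_var(0.8, 0.2, 1.0, 1.0)
cte_y = limiting_cte(0.8, 0.2, 1.0, 1.0)
var_atom = limiting_var(0.95, 0.2, 1.0, 1.0)
print(var_y, cte_y, var_atom, 1000 * var_y)
```

With uniform mixing and $w = 0.2$, the adjusted level at $q = 0.8$ is $q^{*} = 0.875$, so the VaR of the frequency is 0.875, and multiplying by $n$ gives the approximation for the sum. At $q = 0.95$ the level lies in the point mass at 1, so the VaR equals 1.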
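Example 12 ($\beta$-$\beta$). In this example, the choice for $\nu(\cdot; p)$ is the Beta distribution with the two shape parameters $\theta p$ and $\theta(1 - p)$:
$$\nu(du; p) = \frac{u^{\theta p - 1} (1 - u)^{\theta(1 - p) - 1}}{B(\theta p, \theta(1 - p))}\, du, \quad 0 < u < 1,$$
where parameter $\theta$ satisfies $0 < \theta < \infty$. Then, using the recursive property of the Beta function ($B(a + 1, b) = [a/(a + b)]\,B(a, b)$),
$$\pi_k(p) = \frac{B(\theta p + k, \theta(1 - p))}{B(\theta p, \theta(1 - p))} = p \prod_{j=1}^{k-1} \frac{\theta p + j}{\theta + j},$$
with the usual convention that a product over an empty set of indices equals 1. In particular,
$$\pi_2(p) = \frac{p\,(\theta p + 1)}{\theta + 1},$$
and thus the limiting correlation for this model is, by Theorem 1,
$$\rho = \frac{1}{1 + \theta}.$$
(Note that this is identical to the result for an ICM model, with $w = 1/(1 + \theta)$; cf. (19) in Example 9.) The pmf of the binomial double mixture is given, by Theorem 4, as
$$f_{S_n}(k) = \binom{n}{k} \int_0^1 \frac{B(\theta p + k, \theta(1 - p) + n - k)}{B(\theta p, \theta(1 - p))}\, f_P(p)\, dp.$$
By (22), the limiting distribution of the empirical frequency is absolutely continuous, and its pdf is
$$f_Y(x) = \int_0^1 \frac{x^{\theta p - 1} (1 - x)^{\theta(1 - p) - 1}}{B(\theta p, \theta(1 - p))}\, f_P(p)\, dp, \quad 0 < x < 1. \tag{47}$$
Since (47) does not admit a closed-form expression, a numerical method is required to calculate the approximate VaR and CTE of a large-sample credit portfolio.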
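The limiting correlation of the double Beta model can be checked numerically with the short Python sketch below. It assumes that the $k$th moment of a Beta($\theta p$, $\theta(1-p)$) distribution is $\prod_{j<k} (\theta p + j)/(\theta + j)$, a standard Beta moment identity, and that Theorem 1 yields $\rho = 2 - \lim_{p \to 1} \pi_2'(p)$, a form that agrees with the comonotonic and conditionally independent cases discussed earlier. The function names and the backward-difference step are illustrative choices.

```python
def pi_k(k, p, theta):
    """k-th moment of a Beta(theta * p, theta * (1 - p)) law (equals p when k = 1)."""
    out = 1.0
    for j in range(k):
        out *= (theta * p + j) / (theta + j)
    return out


def limiting_correlation(theta, h=1e-6):
    """2 minus the left derivative of pi_2 at p = 1, via a backward difference."""
    slope = (pi_k(2, 1.0, theta) - pi_k(2, 1.0 - h, theta)) / h
    return 2.0 - slope


thetas = (0.5, 1.0, 4.0)   # hypothetical values of theta
numerical = [limiting_correlation(t) for t in thetas]
closed_form = [1.0 / (1.0 + t) for t in thetas]
print(numerical)
print(closed_form)
```

The two printed lists should agree to several decimal places, matching the closed-form value $1/(1 + \theta)$ stated in the example.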

𝛽-ICM Model.
In Figure 1, we plot the cumulative distribution functions of the $\beta$-ICM binomial double mixture, $S_{20}$, for several different values of $\rho$ (the limiting correlation). The plot shows that both the left and right tails get thicker as the level of limiting correlation, $\rho$, increases to 1.
In Figures 2 and 3, we display the approximate risk measures, VaR and CTE, for a scale of confidence levels $q \in \{0.8, 0.85, 0.90, 0.95, 0.99\}$ and by the level of the limiting correlation, $\rho$. The results show that both $\mathrm{VaR}_q$ and $\mathrm{CTE}_q$ are increasing in the level of limiting correlation, for $q \ge 0.90$. This is not the case for $\mathrm{VaR}_q$ when the confidence level is less than $1 - E[P] = \beta/(\alpha + \beta)$, because $q^{*} = \{q - w(\beta/(\alpha + \beta))\}/(1 - w)$ is decreasing in $w$ (which equals $\rho$ for this model) when $q < \beta/(\alpha + \beta)$.
More specifically, Table 1 compares the approximated VaR and CTE at the level $q = 0.95$.

𝛽-𝛽 Model.
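Recall that the limiting correlation of the $\beta$-$\beta$ model is $1/(1 + \theta)$. For the purpose of comparison with the $\beta$-ICM model, we reparametrize the $\beta$-$\beta$ mixture model as follows: $\theta = (1 - \rho)/\rho$, such that the limiting correlation becomes $\rho$. In Figure 4, we plot the cumulative distribution functions of the $\beta$-$\beta$ binomial double mixture, $S_{20}$, for several different values of $\rho$ (the limiting correlation). The plot shows that $S_{20}$ has a heavier left tail under the $\beta$-$\beta$ model than under the $\beta$-ICM model, for each limiting correlation level. Note that the $\beta$-$\beta$ model does not admit the two extreme cases: $\rho = 0$ and $\rho = 1$. Figures 5 and 6 show the approximate risk measures, VaR and CTE, for a scale of confidence levels $q \in \{0.8, 0.85, 0.90, 0.95, 0.99\}$ and by the level of the limiting correlation, $\rho$. The approximate risk measures are smaller than those of the $\beta$-ICM model for each confidence level and limiting correlation; compare, for example, Figures 2 and 3. This indicates that the right tail of the $\beta$-$\beta$ model is thinner than that of the $\beta$-ICM model for the same value of $\rho$.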

Models Fit to Real Data
In addition to the previous examples with prescribed parameter values, we illustrate the two binomial double mixture models with real data: the Bloomberg mortgage delinquency rate index. U.S. residential mortgage loans are segmented into three buckets, Prime, Alt-A, and Subprime, based upon ...
For comparison, we also estimate the two shape parameters of the Beta distribution, the limiting distribution of the classical Beta-binomial ($\beta$-I) model, by matching the first and second sample moments with the theoretical ones, using (28) and (37). Table 2 gives the parameter estimates. The table shows that the method of moments estimates (MMEs) for the two shape parameters of the Beta mixing distribution of the general model ($\rho \neq 0$) are larger than those of the reduced model ($\rho = 0$). As a result, the right tail of the fitted Beta distribution in the general model is slightly thinner than that of the fitted Beta distribution in the reduced model. This is explained as follows: in the general model, the tail thickness in the data is explained by the estimates of both the shape parameters and $\rho$. On the other hand, the reduced model is less flexible, and the two shape parameter estimates alone attempt to capture the observed tail.
In order to identify the implications of the two fitted models (with and without $\rho$) for the risk measures, we plot approximate VaRs and CTEs of $S_n$ with $n = 10^{6}$ at various confidence levels. (The chosen number of entities is purely nominal since we are using the large-sample results. According to the National Delinquency Survey issued by the Mortgage Bankers Association, the average number of conventional subprime mortgage loans during 2011 is over four million.) Figure 8 shows that the risk measures calculated under the conditional independence ($\beta$-I) model are larger than those of the more general model, the $\beta$-ICM model (17), at confidence levels up to 0.996 and 0.945 for VaR and CTE, respectively. Note that the estimated $\rho$ is small to the extent that the effect is not well recognized in VaR until the confidence level takes an extremely high value. The conditional tail expectation, however, of the more general model becomes larger than that under the conditional independence assumption at confidence levels which are typically used in the calculation of economic capital for a credit portfolio. The nonzero $\rho$ allows further dependence between entities, and thus this result suggests that the use of the conditional independence model may result in the underestimation of extreme tail risk.

𝛽-𝛽 Model.
Here, we illustrate the $\beta$-$\beta$ model of Example 12. Note that the large-sample distributions (of the empirical frequency) are continuous, for both the $\beta$-$\beta$ and $\beta$-I models, and none of the observations are either 0 or 1. Thus, the maximum likelihood estimation (MLE) method can be applied by using the limiting distribution of the empirical frequency, (47). However, the number of observations, 57 data points, may not be sufficient to yield statistically reliable MLE fitting results. For the sake of statistical precision and model comparison, we reduce the number of parameters in both the $\beta$-$\beta$ model and the $\beta$-I model, which corresponds to the case of conditional independence. Specifically, we use the following theoretical relationship between the first two moments of the Beta mixing distribution and the two shape parameters $(\alpha, \beta)$:
$$\mu = \frac{\alpha}{\alpha + \beta}, \qquad \sigma^{2} = \frac{\alpha \beta}{(\alpha + \beta)^{2} (\alpha + \beta + 1)} = \frac{\beta \mu^{3}}{\alpha (\alpha + \mu)},$$
where $\mu$ and $\sigma^{2}$ denote the mean and variance of the Beta mixing distribution, which we estimate with the sample mean and variance, $\hat{\mu} = 0.2171$ and $\hat{\sigma}^{2} = 0.0103$, respectively, of the delinquency rate. Thus, we use the following form for $\beta$ in the Beta density:
$$\beta = \left( \frac{0.0103}{0.2171^{3}} \right) \alpha\,(\alpha + 0.2171),$$
to obtain a single-parameter family of Beta distributions. Table 3 gives the maximum likelihood estimates for the parameter $\alpha$ of the Beta mixing distribution, the limiting correlation parameter $\rho = 1/(1 + \theta)$, and the values of the loglikelihood functions under the $\beta$-$\beta$ model and the $\beta$-I model, respectively.
Figure 8: The approximated $\mathrm{VaR}_q$ and $\mathrm{CTE}_q$ of $S_n$, $n = 10^{6}$, the number of delinquencies in a month, at confidence levels $0.9 \le q \le 0.999$, based on the $\beta$-ICM and $\beta$-I models.
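The single-parameter constraint $\beta = (\hat{\sigma}^{2}/\hat{\mu}^{3})\,\alpha\,(\alpha + \hat{\mu})$ can be verified against the usual method-of-moments solution for a Beta distribution. The Python sketch below uses the sample values quoted in the text; the function names are hypothetical, and the method-of-moments formulas are the standard ones for a Beta distribution rather than anything specific to this paper.

```python
def beta_from_alpha(alpha, mean, var):
    """Constraint used in the text: beta = (var / mean^3) * alpha * (alpha + mean)."""
    return (var / mean ** 3) * alpha * (alpha + mean)


def method_of_moments(mean, var):
    """Standard Beta method-of-moments estimates of (alpha, beta)."""
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


mean, var = 0.2171, 0.0103   # sample mean and variance quoted in the text
alpha_mm, beta_mm = method_of_moments(mean, var)

# At the method-of-moments alpha, the constraint reproduces the matching beta.
print(alpha_mm, beta_mm, beta_from_alpha(alpha_mm, mean, var))
```

Along this curve only $\alpha$ remains free, which is what reduces the Beta mixing distribution to a single-parameter family for the likelihood fit.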
Figure 9 displays the differences in estimates of each of the two risk measures, VaR and CTE, under the $\beta$-$\beta$ model and the $\beta$-I model. For illustration, the approximate risk measures are calculated assuming that there are one million entities in the mortgage portfolio.
The result shows that both the VaRs and the CTEs under the $\beta$-$\beta$ model are larger than those under the $\beta$-I model, especially at high confidence levels, and the differences increase rapidly as the confidence level tends to 1.
This result demonstrates the role of the limiting correlation parameter $\rho$ in explaining the tail risk resulting from an interdependency among names within a credit portfolio.

Conclusion
In this paper, we study the conditional correlation and the distribution of the sum of Bernoulli mixtures under a general dependence structure.We show, in particular, under the typical conditional independence assumption, that the conditional correlation between two Bernoulli mixtures converges to 0, given that the mixing random variable is larger than a threshold tending to 1.We propose a method to construct a general dependence structure in the form of a double mixture model, in which the conditional iid assumption is replaced by the more general assumption of conditional exchangeability.As a simple illustration, we consider a weighted average of two cases: conditional independence and comonotonicity, for which the limiting correlation is included as a model parameter.
The large-sample distribution of the empirical frequency and its use in approximating the risk measures VaR and CTE are presented. Several tractable results for two Bernoulli double mixture models with Beta mixing distribution, the $\beta$-ICM model and the $\beta$-$\beta$ model, are given as illustrative numerical examples and also applied to real data.
Note that there is a strong demand for a credit risk model with an appropriate level of dependence in a stressed economic environment. The most popular model, the Beta-binomial mixture, however, cannot properly explain the empirically observed default correlation under stress. On the other hand, the model framework presented in this paper is simple but flexible enough to accommodate the required level of limiting correlation and thus can be effectively applied to portfolio credit risk models.
Future directions of research include the application of double mixture models to pricing of credit derivatives such as CDOs, on a completely homogeneous pool (e.g., Bae et al. [24]).

Figure 1: The cumulative distribution functions of $S_{20}$, for a scale of limiting correlations, $\rho$, under the $\beta$-ICM model.

Figure 2: The approximated VaRs of $S_{1000}$, by confidence level $q$ and the limiting correlation $\rho$, under the $\beta$-ICM model.

Figure 4: The cumulative distribution functions of $S_{20}$, for a scale of limiting correlations, $\rho$, under the $\beta$-$\beta$ model.

Figure 6: The approximated CTEs of $S_{1000}$, by confidence level $q$ and the limiting correlation $\rho$, under the $\beta$-$\beta$ model.

Figure 9: The differences between approximated risk measures, under the $\beta$-$\beta$ model and the $\beta$-I model, of $S_n$, $n = 10^{6}$, the number of delinquencies in a month for the Alt-A residential mortgage segment with (nominally) one million entities, at confidence levels $0.9 \le q \le 0.995$.

Table 1: The approximated VaR and CTE of $S_{1000}$ at confidence level $q = 0.95$, under the $\beta$-ICM model.

Table 2: The method of moments estimates of the shape parameters, $\alpha$ and $\beta$, of the Beta mixing distribution and the limiting correlation $\rho$.

Table 3: The maximum likelihood estimates of the model parameters under the $\beta$-$\beta$ model and the $\beta$-I model.