Estimation of Log-Linear-Binomial Distribution with Applications

The log-linear-binomial distribution was introduced to describe the behavior of the sum of dependent Bernoulli random variables. The distribution is a generalization of the binomial distribution that allows construction of a broad class of distributions. In this paper, we consider the problem of estimating the two parameters of the log-linear-binomial distribution by the moment and maximum likelihood methods. The distribution is used to fit genetic data and to obtain the sampling distribution of the sign test under dependence among trials.


Introduction
During the last three decades, a growing body of literature has generalized the classical discrete distributions. The main idea was to apply the extended versions to modeling different kinds of dependent count or frequency structures in various fields; see Johnson et al. [1], Bowman and George [2], George and Bowman [3], and Yu and Zelterman [4, 5]. Failure to take account of correlation in the data reduces the precision of binomial-based estimates; see, for example, Kolev and Paiva [6].
As a generalization of the binomial distribution, Lovison [7] derived the distribution of the sum of dependent Bernoulli random variables, as an alternative to Altham's multiplicative-binomial distribution [8], from Cox's log-linear representation [9] for the joint distribution of $n$ binary dependent responses; it will be called the log-linear-binomial distribution. This distribution is characterized by two parameters and provides a wider range of distributions than the binomial distribution: it includes underdispersed and overdispersed models, with the binomial distribution as a special case.

Lovison's Log-Linear-Binomial Distribution
Consider the random vector $Z = (Z_1, \ldots, Z_n)$, $Z_i$ being a binary response which measures whether some event of interest is present, "1", or absent, "0", for a sample of $n$ units, and let $Y_n = \sum_{i=1}^{n} Z_i$ denote the sample frequency of successes. To accommodate the possible dependence among the $Z_i$, and under the assumption that the units are exchangeable, Lovison [7] has obtained the distribution of $Y_n$ as

$$P(Y_n = y) = \frac{\binom{n}{y}\,\psi^{y}(1-\psi)^{n-y}\,\omega^{y(n-y)}}{\sum_{t=0}^{n}\binom{n}{t}\,\psi^{t}(1-\psi)^{n-t}\,\omega^{t(n-t)}}, \qquad y = 0, 1, \ldots, n,$$

where $0 < \psi < 1$ and $\omega > 0$ are the parameters; for more details about this distribution, see Lovison [7]. This distribution provides a wider range of distributions than the binomial distribution; for example, Figures 1 and 2 show the distribution of $Y_n$ for $n = 10$, $\psi = 0.2, 0.5$, and different values of $\omega$. For values of $\omega > 1$, the distribution is sharper in the middle than the binomial.
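The shapes described above can be explored with a minimal Python sketch. It assumes the probability mass function has the form $P(Y_n = y) \propto \binom{n}{y}\psi^{y}(1-\psi)^{n-y}\omega^{y(n-y)}$ attributed to Lovison [7]; the function name `llb_pmf` is illustrative and does not come from the original paper.

```python
from math import comb

def llb_pmf(n, psi, omega):
    """Probabilities P(Y_n = y), y = 0..n, under the assumed log-linear-binomial form."""
    if not (0.0 < psi < 1.0) or omega <= 0.0:
        raise ValueError("need 0 < psi < 1 and omega > 0")
    weights = [comb(n, y) * psi**y * (1.0 - psi)**(n - y) * omega**(y * (n - y))
               for y in range(n + 1)]
    kappa = sum(weights)  # normalizing constant
    return [w / kappa for w in weights]

binomial = llb_pmf(10, 0.5, 1.0)   # omega = 1: ordinary binomial
peaked = llb_pmf(10, 0.5, 1.3)     # omega > 1: more mass in the middle
u_shaped = llb_pmf(10, 0.5, 0.7)   # omega < 1: mass pushed toward 0 and n
```

For larger $n$ the powers of $\omega$ can overflow, in which case the weights would be better evaluated on the log scale.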
As can be seen from the figures, for some values of $\omega < 1$ the distribution can be U-shaped, bimodal, or unimodal. The expected value and the variance of $Y_n$ are given by

2.2
Note that for $k \neq h$ and Cov stands for covariance. The variance of the binomial is and the covariance of where Therefore, the variance of the log-linear binomial is equal to the variance of the binomial when $\operatorname{Cov}(Z_k, Z_h) = 0$, greater than the variance of the binomial when $\operatorname{Cov}(Z_k, Z_h) > 0$, and less than the variance of the binomial when $\operatorname{Cov}(Z_k, Z_h) < 0$.
The expected value and the variance of $Y_n$ are nonlinear in both $\psi$ and $\omega$. For example, the nonlinearity in the variance of $Y_n$ is depicted in Figures 3 and 4.
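The relation between the variance and the pairwise covariance can be checked numerically. The sketch below computes the moments directly from the probability mass function, assumed to be $P(Y_n = y) \propto \binom{n}{y}\psi^{y}(1-\psi)^{n-y}\omega^{y(n-y)}$, rather than from the closed-form expressions of the paper. It relies only on exchangeability, under which $P(Z_k = 1) = E(Y_n)/n$ and $P(Z_k = 1, Z_h = 1) = E[Y_n(Y_n - 1)]/[n(n-1)]$. The names `llb_pmf` and `llb_summary` are illustrative.

```python
from math import comb

def llb_pmf(n, psi, omega):
    """Assumed log-linear-binomial probabilities for y = 0..n."""
    weights = [comb(n, y) * psi**y * (1.0 - psi)**(n - y) * omega**(y * (n - y))
               for y in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def llb_summary(n, psi, omega):
    """Mean and variance of Y_n, plus the common Cov(Z_k, Z_h) for k != h."""
    pmf = llb_pmf(n, psi, omega)
    mean = sum(y * p for y, p in enumerate(pmf))
    second_factorial = sum(y * (y - 1) * p for y, p in enumerate(pmf))
    variance = second_factorial + mean - mean**2
    tau = mean / n                                   # P(Z_i = 1) under exchangeability
    cov = second_factorial / (n * (n - 1)) - tau**2  # P(Z_k = 1, Z_h = 1) - tau^2
    return mean, variance, cov

for omega in (0.8, 1.0, 1.2):
    mean, variance, cov = llb_summary(10, 0.5, omega)
    print(f"omega={omega}: mean={mean:.3f} var={variance:.3f} cov={cov:+.4f}")
```

With these quantities the variance satisfies $n\tau(1-\tau) + n(n-1)\operatorname{Cov}(Z_k, Z_h)$, so the sign of the covariance decides whether the model is over- or underdispersed relative to a binomial with the same success probability $\tau$.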

Estimation of the Parameters
Let a random sample of $R$ sets of $n$ trials each be available, the number of sets with $y$ successes being $f_y$ ($y = 0, 1, 2, \ldots, n$), and $R = \sum_{y=0}^{n} f_y$.

Method of Moments
We can use the first two sample moments to find moment estimates for ψ and ω as follows.
The first sample moment is

3.1
Equating these sample moments to the corresponding population moments we obtain the estimates of ψ and ω by solving the two equations

3.2
The solution of these two equations requires numerical methods; it may be found using the nonlinear equation solver nleqslv in the statistical software R. An R program for finding the moment estimates is available from the author upon request.
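As a rough illustration of the numerical step, the following Python sketch matches the sample mean and the sample variance (with divisor $R$, an assumption about which second moment is used) to their population counterparts. It does not use nleqslv; instead it uses nested bisection, which works here because, under the assumed probability mass function $P(Y_n = y) \propto \binom{n}{y}\psi^{y}(1-\psi)^{n-y}\omega^{y(n-y)}$, the mean increases with $\psi$ for fixed $\omega$ and, at a fixed mean, the variance decreases as $\omega$ grows. All function names, the search range for $\omega$, and the example counts are hypothetical.

```python
from math import exp, lgamma, log

def llb_pmf(n, psi, omega):
    """Assumed pmf, evaluated on the log scale to avoid overflow."""
    logw = [lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
            + y * log(psi) + (n - y) * log(1.0 - psi) + y * (n - y) * log(omega)
            for y in range(n + 1)]
    top = max(logw)
    w = [exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def mean_var(pmf):
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

def psi_for_mean(n, omega, target_mean):
    """Bisection on psi: for fixed omega the mean increases with psi."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean_var(llb_pmf(n, mid, omega))[0] < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def moment_estimates(freq):
    """Match the sample mean and variance (divisor R) of grouped counts freq[y]."""
    n, R = len(freq) - 1, sum(freq)
    m1 = sum(y * f for y, f in enumerate(freq)) / R
    m2 = sum((y - m1) ** 2 * f for y, f in enumerate(freq)) / R
    lo, hi = log(0.05), log(20.0)   # assumed search range for log(omega)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        psi = psi_for_mean(n, exp(mid), m1)
        # at a fixed mean, the variance decreases as omega grows
        if mean_var(llb_pmf(n, psi, exp(mid)))[1] > m2:
            lo = mid
        else:
            hi = mid
    omega = exp(0.5 * (lo + hi))
    return psi_for_mean(n, omega, m1), omega

freq = [30, 120, 200, 110, 40]      # hypothetical counts for n = 4, not data from the paper
psi_hat, omega_hat = moment_estimates(freq)
```

Bisection is slower than a Newton-type solver but needs no starting values or derivatives, which makes it a convenient check on the output of a general nonlinear solver.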

Method of Maximum Likelihood
The method of maximum likelihood provides estimators that have a reasonable intuitive basis and many desirable statistical properties. The likelihood of the sample can be written as Take the logarithm of this function

Estimation of ψ and ω
After simplification, the first partial derivatives with respect to $\psi$ and $\omega$ can be written as
The numerical solution of these two equations may be obtained using the nonlinear equation solver nleqslv in R. An R program for finding the maximum likelihood estimates is available from the author upon request.
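A self-contained sketch of the maximization is given below. It is not the nleqslv-based procedure; it applies Newton steps with step halving to the grouped-data log-likelihood $\sum_y f_y \log P(Y_n = y)$, working in the transformed parameters $\theta = \log[\psi/(1-\psi)]$ and $\lambda = \log\omega$. Under the assumed probability mass function $P(Y_n = y) \propto \binom{n}{y}\psi^{y}(1-\psi)^{n-y}\omega^{y(n-y)}$, the score in these parameters is proportional to the difference between the sample and expected means of $y$ and $y(n-y)$. The function names and the example counts are hypothetical.

```python
from math import exp, lgamma, log

def llb_logpmf(n, theta, lam):
    """log P(Y_n = y) with theta = logit(psi), lam = log(omega) (assumed pmf form)."""
    logw = [lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
            + y * theta + y * (n - y) * lam for y in range(n + 1)]
    top = max(logw)
    log_norm = top + log(sum(exp(v - top) for v in logw))
    return [v - log_norm for v in logw]

def log_likelihood(freq, theta, lam):
    n = len(freq) - 1
    return sum(f * lp for f, lp in zip(freq, llb_logpmf(n, theta, lam)))

def mle(freq, tol=1e-10, max_iter=200):
    """Maximize the grouped-data log-likelihood by Newton steps with step halving."""
    n, R = len(freq) - 1, sum(freq)
    s1 = sum(y * f for y, f in enumerate(freq)) / R            # sample mean of y
    s2 = sum(y * (n - y) * f for y, f in enumerate(freq)) / R  # sample mean of y(n - y)
    theta, lam = log(s1 / (n - s1)), 0.0                        # binomial starting point
    for _ in range(max_iter):
        p = [exp(lp) for lp in llb_logpmf(n, theta, lam)]
        e1 = sum(y * q for y, q in enumerate(p))
        e2 = sum(y * (n - y) * q for y, q in enumerate(p))
        v11 = sum((y - e1) ** 2 * q for y, q in enumerate(p))
        v12 = sum((y - e1) * (y * (n - y) - e2) * q for y, q in enumerate(p))
        v22 = sum((y * (n - y) - e2) ** 2 * q for y, q in enumerate(p))
        g1, g2 = s1 - e1, s2 - e2                               # score divided by R
        det = v11 * v22 - v12 * v12
        d1 = (v22 * g1 - v12 * g2) / det                        # Newton direction
        d2 = (v11 * g2 - v12 * g1) / det
        step, base = 1.0, log_likelihood(freq, theta, lam)
        while step > 1e-8 and log_likelihood(freq, theta + step * d1, lam + step * d2) < base:
            step *= 0.5                                         # back off if likelihood drops
        theta, lam = theta + step * d1, lam + step * d2
        if abs(g1) + abs(g2) < tol:
            break
    return 1.0 / (1.0 + exp(-theta)), exp(lam)                  # (psi_hat, omega_hat)

freq = [30, 120, 200, 110, 40]   # hypothetical counts for n = 4, not data from the paper
psi_hat, omega_hat = mle(freq)
```

The sketch assumes $n \ge 2$ and a sample mean strictly between 0 and $n$; otherwise the starting value or the Newton direction is undefined.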

The Asymptotic Variances
Suppose that $\theta = (\psi, \omega)$ and under certain regularity conditions the information matrix is The asymptotic variance-covariance matrix can be obtained as

3.11
To find the information matrix for the log-linear binomial distribution, we note that where

3.13
Taking the expectation, the information matrix is obtained as The variance-covariance matrix will be $V(\theta) \approx I(\theta)^{-1}$.

3.15
The estimated variance-covariance matrix is .

3.16
An R program for finding the estimated variance-covariance matrix is available from the author upon request.
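One way to obtain such a matrix numerically is sketched below in Python. It assumes the probability mass function $P(Y_n = y) \propto \binom{n}{y}\psi^{y}(1-\psi)^{n-y}\omega^{y(n-y)}$, for which the per-observation scores are $(y - E[Y_n])/[\psi(1-\psi)]$ and $(y(n-y) - E[Y_n(n-Y_n)])/\omega$, so that the expected information is $R$ times the covariance matrix of these scores. This is a derivation under the stated assumption, not a transcription of the paper's expressions, and the function names are illustrative.

```python
from math import comb

def llb_pmf(n, psi, omega):
    """Assumed log-linear-binomial probabilities for y = 0..n."""
    w = [comb(n, y) * psi**y * (1.0 - psi)**(n - y) * omega**(y * (n - y))
         for y in range(n + 1)]
    total = sum(w)
    return [v / total for v in w]

def information_matrix(n, R, psi, omega):
    """Expected information for (psi, omega) from R independent sets of n trials."""
    p = llb_pmf(n, psi, omega)
    t2 = [y * (n - y) for y in range(n + 1)]
    e1 = sum(y * q for y, q in enumerate(p))
    e2 = sum(t * q for t, q in zip(t2, p))
    v11 = sum((y - e1) ** 2 * q for y, q in enumerate(p))
    v12 = sum((y - e1) * (t - e2) * q for (y, q), t in zip(enumerate(p), t2))
    v22 = sum((t - e2) ** 2 * q for t, q in zip(t2, p))
    a, b = 1.0 / (psi * (1.0 - psi)), 1.0 / omega   # score scale factors
    return [[R * v11 * a * a, R * v12 * a * b],
            [R * v12 * a * b, R * v22 * b * b]]

def invert_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Illustrative parameter values; in practice the estimates would be plugged in.
info = information_matrix(7, 100, 0.45, 1.1)
cov = invert_2x2(info)   # approximate variance-covariance matrix of the estimators
```

At $\omega = 1$ the first diagonal entry reduces to the familiar binomial information $Rn/[\psi(1-\psi)]$, which gives a quick sanity check.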

Special Case: Ungrouped Data
If the values of the Bernoulli random variables $Z_i$ are known, the parameters $\psi$ and $\omega$ can be estimated as follows. Note that in a vector of binary responses $z$ there are $n(n-1)/2$ pairs of responses, and if the order is irrelevant, three types of pairs are distinguishable:

4.1
The estimated cross-product ratio CPR is To obtain the maximum likelihood estimate of $\psi$, we need to solve The estimated variance of $\psi$ can be obtained when $R = 1$.

Sampling Distribution of the Sign Test for Comparing Paired Sample
The sign test is a nonparametric test which makes very few assumptions about the nature of the distributions under test. It is for use with two repeated or correlated measures, and measurement is assumed to be at least ordinal. The usual null hypothesis for this test is that there is no difference between the two treatment groups, $g_k$, $k = 1, 2$. Formally, let $\tau = P(X > Y)$, and then test the null hypothesis $H_0: \tau = 0.5$ for no differences against $H_1: \tau \neq 0.5$ for differences. The sign test can be written as where

5.2
Under the assumptions of two outcomes, fixed probability of success, and independent trials, the sampling distribution of $s_d$ is assumed to be binomial. The rejection region is and $y_d = \min(\#1, \#0)$. For a two-tailed test we reject $H_0$ if $2P_{\text{value}} \le \alpha$; otherwise we do not reject $H_0$, where $\alpha$ is a prespecified value.
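The binomial version of the test is simple to compute. The Python sketch below assumes the one-tailed P value is $P(S \le y_d)$ under a binomial distribution with success probability 0.5, which is the usual form of the sign test; the function name and the vector of signs are hypothetical, with $n = 14$ and $y_d = 3$ chosen to match the values used later in the example.

```python
from math import comb

def sign_test_p_value(signs):
    """One-tailed binomial P value P(S <= y_d), y_d = min(#1, #0); signs are 0/1."""
    n = len(signs)
    ones = sum(signs)
    y_d = min(ones, n - ones)
    return sum(comb(n, y) for y in range(y_d + 1)) / 2**n

signs = [1] * 3 + [0] * 11          # hypothetical outcomes with n = 14, y_d = 3
p_one_tail = sign_test_p_value(signs)
reject = 2 * p_one_tail <= 0.05     # two-tailed decision at alpha = 0.05
```

Ties (zero differences) are assumed to have been dropped before the signs are formed.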

Sign Test Under Dependence of the Trials
Suppose the assumption of mutual independence in the data is violated and the trials are dependent; see, for example, Tallis [10] and Luceño [11]. In this case, we suggest the log-linear binomial distribution as the sampling distribution of $s_d = \sum_{i=1}^{n} Z_i$ rather than the binomial distribution. Let $\tau = P(X > Y) = P(Z = 1) = \psi\,\kappa_{n-1}(\psi, \omega)/\kappa_n(\psi, \omega)$ represent the probability of success. Then the null hypothesis $H_0: \tau = 0.5$ is equivalent to $H_0: \psi = 0.5$. Therefore, under the null hypothesis, the rejection region can be obtained as where $y_d = \min(\#1, \#0)$ and $\omega = 1/\sqrt{\text{CPR}}$. For a two-tailed test of $H_0$ (there is no difference) against $H_1$ (there is a difference), we reject $H_0$ if $2P_{\text{value.lb}} \le \alpha$; otherwise we do not reject $H_0$, where $\alpha$ is a prespecified value.
Example 5.1. A physiologist wants to know if monkeys prefer stimulation of brain area A to stimulation of brain area B. In the experiment, 14 rhesus monkeys from the same family are taught to press two bars. When a light comes on, presses on Bar 1 result in stimulation of area A and presses on Bar 2 result in stimulation of area B. After learning to press the bars, the monkeys are tested for 15 minutes, during which time the frequencies for the two bars are recorded. The data are shown in Table 1.
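A minimal sketch of the dependent-trials P value follows. It assumes the one-tailed P value is $P(Y_n \le y_d)$ under the log-linear binomial distribution with $\psi = 0.5$, and that the probability mass function is $P(Y_n = y) \propto \binom{n}{y}\psi^{y}(1-\psi)^{n-y}\omega^{y(n-y)}$. The function names are illustrative, and $\omega = 1.1$ is used only as an example value; in practice $\omega$ would be estimated from the data.

```python
from math import comb

def llb_pmf(n, psi, omega):
    """Assumed log-linear-binomial probabilities for y = 0..n."""
    w = [comb(n, y) * psi**y * (1.0 - psi)**(n - y) * omega**(y * (n - y))
         for y in range(n + 1)]
    total = sum(w)
    return [v / total for v in w]

def sign_test_p_value_lb(n, y_d, omega):
    """P(Y_n <= y_d) under the assumed log-linear binomial with psi = 0.5."""
    return sum(llb_pmf(n, 0.5, omega)[: y_d + 1])

p_binomial = sign_test_p_value_lb(14, 3, 1.0)   # omega = 1 recovers the binomial test
p_lb = sign_test_p_value_lb(14, 3, 1.1)         # omega = 1.1 is an illustrative value
reject_lb = 2 * p_lb <= 0.05                    # two-tailed decision at alpha = 0.05
```

Because $\omega > 1$ moves probability toward the middle of the range, the tail probability is smaller than under the binomial, so the same data give stronger evidence against $H_0$.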

Power of the Test
Following the method given by Groebner et al. [14], we may use the normal approximation to study the power of the sign test at $n = 14$, $\omega = 1.1$, using a one-tailed test for simplicity, as follows. The power of the test is power $= 1 - \beta$, where $\beta = P(\text{accept } H_0 \mid H_0 \text{ is false})$.

5.11
The power of the test under the log-linear binomial distribution is

$$\text{power}_{lb} \approx 1 - P\left(Z < \frac{9.404 - 14\tau_1}{1.4616}\right), \qquad \tau_1 > 0.5. \qquad (5.12)$$

The power of the sign test is given in Table 2. In this case, the log-linear binomial distribution shows an improvement in the power of the sign test over the sign test under the binomial; for example, when $\tau_1 = 0.70$, the power increases from 0.44 to 0.61, about 1.40 times.
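The constants in (5.12) can be traced with a short Python sketch. It assumes that the critical value is $n/2 + z_{1-\alpha}\,\sigma_0$ and that the same null standard deviation $\sigma_0$ is used in the denominator, where $\sigma_0$ is computed from the assumed probability mass function $P(Y_n = y) \propto \binom{n}{y}\psi^{y}(1-\psi)^{n-y}\omega^{y(n-y)}$ at $\psi = 0.5$. The function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def llb_pmf(n, psi, omega):
    """Assumed log-linear-binomial probabilities for y = 0..n."""
    w = [comb(n, y) * psi**y * (1.0 - psi)**(n - y) * omega**(y * (n - y))
         for y in range(n + 1)]
    total = sum(w)
    return [v / total for v in w]

def null_sd(n, omega):
    """Standard deviation of Y_n under H0 (psi = 0.5)."""
    pmf = llb_pmf(n, 0.5, omega)
    mean = sum(y * p for y, p in enumerate(pmf))
    return sqrt(sum((y - mean) ** 2 * p for y, p in enumerate(pmf)))

def power_one_tail(tau1, n, omega, alpha=0.05):
    """Normal-approximation power of the upper-tail sign test at success probability tau1."""
    z = NormalDist()
    sd0 = null_sd(n, omega)
    critical = n / 2 + z.inv_cdf(1 - alpha) * sd0
    return 1.0 - z.cdf((critical - n * tau1) / sd0)

power_binomial = power_one_tail(0.70, 14, 1.0)   # omega = 1: ordinary sign test
power_lb = power_one_tail(0.70, 14, 1.1)         # omega = 1.1, as in the text
```

Under these assumptions the null standard deviation at $\omega = 1.1$ should come out close to the 1.4616 appearing in (5.12), and the two powers close to the values quoted for $\tau_1 = 0.70$.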

Fitting Genetic Data
The data are taken from Salmaniyaa hospital records in Bahrain for a genetic study on the gender ratio. Table 3 shows the number of male children in 3475 families with 7 children.
The first two sample moments are $m_1 = 3.12057$, $M_2 = 1.21567$. The expected frequencies based on these estimates are shown in Table 3. The value of $\chi^2 = 1.288$ with 6 degrees of freedom gives $p = 0.97 > 0.05$. Thus, the log-linear binomial distribution provides a good fit to the observed data.
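The goodness-of-fit computation can be sketched as follows. The observed counts and the parameter values below are hypothetical placeholders, not the entries of Table 3 or the paper's estimates, and the probability mass function is again assumed to be $P(Y_n = y) \propto \binom{n}{y}\psi^{y}(1-\psi)^{n-y}\omega^{y(n-y)}$. Only the Pearson statistic is computed; the P value would additionally require the chi-square distribution function.

```python
from math import comb

def llb_pmf(n, psi, omega):
    """Assumed log-linear-binomial probabilities for y = 0..n."""
    w = [comb(n, y) * psi**y * (1.0 - psi)**(n - y) * omega**(y * (n - y))
         for y in range(n + 1)]
    total = sum(w)
    return [v / total for v in w]

def expected_frequencies(R, n, psi, omega):
    """Expected number of sets with y successes, y = 0..n."""
    return [R * p for p in llb_pmf(n, psi, omega)]

def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [12, 60, 150, 190, 140, 70, 20, 3]   # hypothetical counts, not the paper's Table 3
expected = expected_frequencies(sum(observed), 7, 0.45, 1.05)  # illustrative parameter values
stat = chi_square_statistic(observed, expected)
```

Cells with very small expected frequencies are usually pooled before the statistic is computed; the sketch leaves that step out.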

Conclusion
The parameters of the log-linear binomial distribution were estimated by the moment and maximum likelihood methods. Both methods required solving nonlinear equations to obtain the estimators of the parameters. We used the nonlinear equation solver package nleqslv in the statistical software R to find the estimates of the parameters. The variance-covariance matrix for the maximum likelihood estimates was obtained. Moreover, the sampling distribution of the sign test was studied when trials are dependent. A set of genetic data from Salmaniyaa hospital in Bahrain has been fitted using the log-linear binomial distribution. The fit is preferable to that of the binomial distribution.

Figure 1: The distribution of $Y_n$ for different values of $\omega$, $\psi = 0.2$, and $n = 10$.

Figure 3: Variance of $Y_n$ for various values of $\omega$ at each value of $\psi$ and $n = 25$.

Figure 4: Variance of $Y_n$ for various values of $\omega$ at each value of $\psi$ and $n = 25$.

and $(n - y)y$ pairs of $(z_k = 0, z_h = 1)$ or $(z_k = 1, z_h = 0)$, where $y = \sum_{k=1}^{n} z_k$; see Lovison [7]. Therefore, the estimate of $\omega$ is $\omega = 1/\sqrt{\text{CPR}}$.

Table 1: Number of bar presses in brain stimulation experiment.

That is, we would conclude that monkeys prefer stimulation in brain area B to stimulation in area A. Note that the rejection of $H_0$ agrees with the Wilcoxon signed-ranks test for the same data; see Weaver [12] and Siegel and Castellan [13].
Using the binomial distribution, we have $n = 14$, $y_d = 3$, and the P value is

Table 2: Power of the sign test under the binomial and log-linear binomial distributions for a one-tailed test with $n = 14$, $\omega = 1.1$, and $\alpha = 0.05$, based on the normal approximation.

Table 3: Numbers of male children in 3475 families of size 7.

This value of $\tau$ was used to obtain the expected frequencies shown in Table 3. The value of $\chi^2 = 193.01$ with 7 degrees of freedom gives $p = 0 < 0.05$, so the simple binomial model has to be rejected. If we use the log-linear binomial distribution and fit the data by maximum likelihood, we find that