Confidence Intervals for the Mean Based on Exponential Type Inequalities and Empirical Likelihood

For independent observations, it has recently been proposed to construct confidence intervals for the mean using exponential type inequalities. Although this method requires much weaker assumptions than the classical methods, the resulting intervals are usually too wide. Still, in special cases, the bounded and unbounded Bernstein inequalities can offer some advantage. In this paper, we discuss the applicability of this approach to dependent data. Moreover, we propose to use the empirical likelihood method, both for independent and dependent observations, for inference regarding the mean. The advantages of empirical likelihood are its Bartlett correctability and a rather simple extension to the dependent case. Finally, we provide simulation results comparing these methods with respect to their empirical coverage accuracy and average interval length, and we apply the described methods to a serial analysis of gene expression (SAGE) data example.


Introduction
Although the classical t-test is based on the assumption that observations are normally distributed, it is well known to be a robust test that works well at least for symmetric distributions (see [1]). However, for skewed or heavy-tailed distributions, the confidence intervals for the mean based on the inversion of the t-test may give poor coverage. Using exponential type inequalities (such as Bernstein's inequality), Rosenblum and van der Laan [2] presented a simple approach for constructing confidence intervals for the population mean. Rosenblum and van der Laan [2] dealt with the bounded version of Bernstein's inequality, which is applicable only for a few distributions, such as the bounded uniform distribution. To use it for general distributions, we need the unbounded version of Bernstein's inequality, which was analyzed by Shilane et al. [3] for the negative binomial distribution. The confidence intervals based on exponential type inequalities have a guaranteed coverage probability under much weaker assumptions than required by the standard methods. Although the obtained confidence intervals are usually too wide, there are situations where they give better coverage accuracy than the classical methods.
In this paper, our goal is to use the empirical likelihood method introduced by Owen [4,5] for inference on the mean and to compare it with the Rosenblum and van der Laan [2] method. The empirical likelihood method is based on the nonparametric likelihood ratio statistic, which, similarly to the parametric case, has an asymptotic chi-square limiting distribution. The confidence intervals obtained by the empirical likelihood method have some very appealing characteristics. There are no prespecified parametric assumptions on the distribution of the observations and no constraints on the shape of the confidence intervals. Empirical likelihood intervals are Bartlett correctable in most cases. This means that a simple correction for the mean reduces the coverage error from order n^(-1) to order n^(-2), where n denotes the sample size (see, e.g., [6,7]). Owen [7] suggested (see Section 2 for more details) that for small sample sizes the F distribution may provide a better approximation for the limiting distribution of the test statistic. We found in our simulation study that this approach can provide a significant improvement (see Tables 1 and 3). Time series data are often correlated. For this situation, either modified or different procedures of statistical inference are required. There exist many exponential type inequalities for weakly dependent data, characterized by α-mixing or β-mixing sequences (e.g., see [8-10]). However, it is problematic to use them for practical data problems. First, such inequalities typically contain the α- or β-mixing coefficients, which have to be estimated from the data. Only recently did McDonald et al. [11] introduce an estimation procedure for β-mixing coefficients. Second, these inequalities are usually established for bounded distributions. The applicability of exponential type inequalities in the dependent case is therefore limited.
The blockwise empirical likelihood method introduced by Kitamura [12] provides a good alternative for inference regarding the mean. It seems that the Bartlett correction can also be applied in this case.
For our simulation study, we use the negative binomial distribution, which has been analyzed by several authors (e.g., see [3,13]) and has been widely used in applications involving insurance data. There are several ways in which the negative binomial distribution can be parametrized. We will use the parametrization described by Hilbe [13]; that is, for any μ ∈ R₊ and θ ∈ R₊, the probability mass function of the negative binomial random variable Y ∼ Negbinom(μ, θ) is

P(Y = y) = Γ(y + θ) / (Γ(θ) y!) (θ/(μ + θ))^θ (μ/(μ + θ))^y,  y = 0, 1, 2, . . . ,

with mean μ and variance μ + μ²/θ. We also simulate some autoregressive processes representing data with a marginal negative binomial distribution. For independent data, we compare the empirical coverage accuracy of the confidence intervals based on exponential type inequalities, the t-test, and several empirical likelihood and bootstrap methods. For dependent data, we use a simulation study to compare the blockwise empirical likelihood method with and without the Bartlett correction, and we also apply the empirical likelihood method to the serial analysis of gene expression (SAGE) data analyzed by Shilane et al. [3]. This paper is organized as follows. Section 2 deals with the independent case. It contains three subsections, which introduce Bernstein inequalities and the empirical likelihood method and include a simulation study. In Section 3, weakly dependent data are discussed. Its subsections introduce the blockwise empirical likelihood method, describe two exponential type inequalities, provide a simulation study, and analyze the SAGE data example. We finish with the main conclusions in Section 4.
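For reference, the probability mass function in the (μ, θ) parametrization (assuming Var(Y) = μ + μ²/θ, as used throughout this paper) can be evaluated numerically in log-space to avoid overflow of the Gamma functions; the function name below is illustrative only:

```python
import math

def negbin_pmf(y, mu, theta):
    """P(Y = y) for Negbinom(mu, theta):
    Gamma(y + theta) / (Gamma(theta) * y!) * (theta/(mu+theta))^theta * (mu/(mu+theta))^y,
    computed via log-Gamma for numerical stability."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (mu + theta))
             + y * math.log(mu / (mu + theta)))
    return math.exp(log_p)
```

For θ = 1 the distribution reduces to a geometric distribution with P(Y = 0) = θ/(μ + θ).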

Confidence Intervals Using Bernstein's Inequality.
Rosenblum and van der Laan [2] described a method for generating confidence intervals which bounds the tails of the distribution of the sample mean of independent, bounded random variables. This method is based on exponential type inequalities such as Bernstein's inequality: for all t > 0,

P(|X̄ − μ| > t) ≤ 2 exp(− n t² / (2v + (2/3) c t)),

where c and v are constants defined in Theorem 1.

Theorem 1 (Rosenblum and van der Laan [2]). Let X_1, . . . , X_n be independent and identically distributed (iid) random variables with mean μ, and let c be such that P(|X_i − μ| ≤ c) = 1. Let σ*² be a number satisfying σ*² ≥ Var(X_i). Then, for the function

t(c, v, n, α) = (c/(3n)) log(2/α) + sqrt( ((c/(3n)) log(2/α))² + (2v/n) log(2/α) ),  v = σ*²,

we have P(|X̄ − μ| > t(c, v, n, α)) ≤ α. In particular, [X̄ − t(c, v, n, α), X̄ + t(c, v, n, α)] is a confidence interval for μ with coverage probability at least 1 − α. The accuracy of the coverage probabilities depends on the estimation of c and v, where v = σ*². Rosenblum and van der Laan [2] suggested that the bound c should be set at least as large as max_i |X_i − X̄| and proposed to use for σ*² the estimator σ̂² + SE(σ̂²), where SE(σ̂²) is the standard error of σ̂² (for details, see [2]).
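As an illustration, the half-width below is obtained by solving the Bernstein-type bound 2 exp(−n t² / (2v + (2/3)c t)) = α for t; the plug-in choices of c and v follow the suggestions of Rosenblum and van der Laan [2] in spirit (a sketch with illustrative function names, not their published code):

```python
import math

def bernstein_halfwidth(n, c, v, alpha):
    """Half-width t solving 2*exp(-n*t^2 / (2*v + (2/3)*c*t)) = alpha,
    i.e. the positive root of t^2 - (2c*log(2/a)/(3n))*t - 2v*log(2/a)/n = 0."""
    log_term = math.log(2.0 / alpha)
    a = c * log_term / (3.0 * n)
    return a + math.sqrt(a * a + 2.0 * v * log_term / n)

def bernstein_ci(xs, alpha=0.05):
    """Bounded-Bernstein interval with heuristic plug-ins for c and v."""
    n = len(xs)
    mean = sum(xs) / n
    c = max(abs(x - mean) for x in xs)            # plug-in bound on |X_i - mu|
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance as v
    t = bernstein_halfwidth(n, c, var, alpha)
    return mean - t, mean + t
```

Note that the plug-in c and v make the interval approximate; the guaranteed coverage of Theorem 1 holds only for valid bounds c and σ*².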
To examine exponential type inequalities for distributions like the negative binomial distribution, we will need a version of Bernstein's inequality that does not require the assumption of boundedness.

Lemma 2 (Birgé and Massart [14]). Let X_1, . . . , X_n be independent random variables satisfying the moment conditions

Σ_{i=1}^n E[X_i²] ≤ V²  and  Σ_{i=1}^n E[(X_i⁺)^k] ≤ (k!/2) V² c^{k−2} for all k > 2,

for some positive constants V and c. Then, for any positive x and S_n = Σ_{i=1}^n X_i,

P(S_n ≥ t) ≤ exp(−x),

where t is defined by the equation t = V√(2x) + cx.
Suppose that X_i, 1 ≤ i ≤ n, are iid negative binomial with parameters μ and θ, and let ξ_i = X_i − μ. In this case, Shilane et al. [3] obtained a confidence interval for the mean of the form X̄ ± t/n, where t is the solution to the equation

t = V√(2 log(2/α)) + c log(2/α),

with V and c chosen so that ξ_1, . . . , ξ_n satisfy the moment conditions of Lemma 2. To estimate the parameter θ, we can use the method of moments; that is, θ̂ = X̄/((s²/X̄) − 1), where s² is the sample variance.
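The method-of-moments estimator θ̂ = X̄/((s²/X̄) − 1) follows directly from Var(X) = μ + μ²/θ and can be sketched as (function name illustrative):

```python
def theta_mom(xs):
    """Method-of-moments estimator of the negative binomial parameter theta,
    based on Var(X) = mu + mu^2/theta, i.e. theta = mean / (s^2/mean - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    if s2 <= mean:
        raise ValueError("no overdispersion: sample variance must exceed the mean")
    return mean / (s2 / mean - 1.0)
```

The estimator is only defined for overdispersed samples (s² > X̄), which is the relevant case for the negative binomial distribution.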

Empirical Likelihood for the Mean.
Let X_1, . . . , X_n be iid observations with distribution F and mean μ = E X_1. To obtain confidence intervals for μ, we define the profile empirical likelihood function

R(μ) = max { Π_{i=1}^n n p_i : Σ_{i=1}^n p_i X_i = μ, p_i ≥ 0, Σ_{i=1}^n p_i = 1 }.     (8)

Owen [4,5] showed that a unique value of the right-hand side of (8) exists when μ is inside the convex hull of the data points X_1, . . . , X_n. This maximization problem can be solved using the Lagrange multiplier method.
Theorem 3 (Owen [7]). If X_1, . . . , X_n are iid random variables with distribution function F_0, μ_0 = E(X_1), and 0 < Var(X_1) < ∞, then −2 log R(μ_0) converges in distribution to the χ²_1 distribution. The empirical likelihood (1 − α)100% confidence interval for μ is of the form

{ μ : −2 log R(μ) ≤ χ²_{1,1−α} },

where χ²_{1,1−α} is the 1 − α quantile of the χ²_1 distribution. Owen [7] stated that the proof of Theorem 3 and some simulations suggest that the quantiles χ²_{1,1−α} can be replaced by the quantiles F_{1,n−1,1−α} of the F distribution with 1 and n − 1 degrees of freedom. The F calibration usually gives better results for small sample sizes.
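The profile computation reduces to a one-dimensional root search: for a candidate μ strictly inside the convex hull of the data, the Lagrange multiplier λ solves Σ (X_i − μ)/(1 + λ(X_i − μ)) = 0, the optimal weights are p_i = 1/(n(1 + λ(X_i − μ))), and −2 log R(μ) = 2 Σ log(1 + λ(X_i − μ)). A minimal sketch using bisection (illustrative names; production code would use Newton's method):

```python
import math

def el_log_ratio(xs, mu, iters=200):
    """-2 log R(mu) for the mean, via bisection on the Lagrange multiplier."""
    z = [x - mu for x in xs]
    zmin, zmax = min(z), max(z)
    if not (zmin < 0.0 < zmax):
        raise ValueError("mu must lie strictly inside the convex hull of the data")
    # Feasibility 1 + lam*z_i > 0 for all i  <=>  lam in (-1/zmax, -1/zmin).
    lo = -1.0 / zmax + 1e-10
    hi = -1.0 / zmin - 1e-10
    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)
    for _ in range(iters):            # g is strictly decreasing in lam
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
```

A confidence interval is then obtained by collecting all μ with el_log_ratio(xs, mu) below the chosen χ²_1 (or F) quantile.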
The Bartlett correction replaces the 1 − α quantile of the limiting chi-square distribution by χ²_{1,1−α}(1 + a/n), where

a = (1/2) μ₄/μ₂² − (1/3) μ₃²/μ₂³

and μ_k is the kth central moment of X_1 (see [6]). The Bartlett correction improves the asymptotic error rate for all coverage levels 1 − α. A Bartlett-corrected empirical likelihood confidence interval has the form

{ μ : −2 log R(μ) ≤ χ²_{1,1−α}(1 + â/n) },

where â is a plug-in estimate of a.
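Assuming the constant a = μ₄/(2μ₂²) − μ₃²/(3μ₂³) for the mean (see [6]), a plug-in estimate from sample central moments can be sketched as (function name illustrative):

```python
def bartlett_constant(xs):
    """Plug-in estimate of the Bartlett constant
    a = mu4/(2*mu2^2) - mu3^2/(3*mu2^3), with mu_k the k-th central moment."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu3 = sum((x - mean) ** 3 for x in xs) / n
    mu4 = sum((x - mean) ** 4 for x in xs) / n
    return mu4 / (2.0 * mu2 ** 2) - mu3 ** 2 / (3.0 * mu2 ** 3)
```

For a normal population the constant equals 3/2; skewed distributions such as the negative binomial yield larger values, inflating the quantile accordingly.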

Simulation Study.
Our simulation study is based on 10,000 independent trials. In every trial, we generated random variables from the negative binomial distribution Negbinom(μ, θ) with sample sizes n ∈ {20, 50, 100}, μ = 5, and θ ∈ {0.1, 0.5, 1.0}. To simulate the data, we used the command rnegbin from the package MASS in the R statistical programming language.
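The simulations were run in R; for reference, the same (μ, θ) parametrization maps onto NumPy's (size, success probability) negative binomial sampler via size = θ and p = θ/(θ + μ), which reproduces mean μ and variance μ + μ²/θ (a sketch, not the code used in the paper):

```python
import numpy as np

def rnegbin(n_samples, mu, theta, rng=None):
    """Draw n_samples values from Negbinom(mu, theta), Var = mu + mu^2/theta,
    using NumPy's negative_binomial(size parameter, success probability)."""
    rng = rng or np.random.default_rng(0)
    p = theta / (theta + mu)
    return rng.negative_binomial(theta, p, size=n_samples)
```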
In Table 1, the empirical coverage accuracy of nominal 95% confidence intervals for the mean is evaluated for the following methods: the t-test (t), the empirical likelihood method (EL), the empirical likelihood method with Bartlett correction (EL_B), the empirical likelihood method with the F distribution calibration (EL_F), and the empirical likelihood method with the F distribution calibration and Bartlett correction (EL_FB). For comparison, we have also simulated the coverage accuracy of several bootstrap resampling procedures: the percentile (B_perc), normal (B_norm), basic (B_basic), and studentized (B_stud) bootstrap methods. The R package boot and the command boot.ci are used to construct the confidence intervals for all the above-mentioned bootstrap methods. Although only the unbounded version of the Bernstein inequality is valid for the negative binomial distribution, we also apply the bounded version in order to see the possible effects of the violation of this assumption. In Table 2, we present the average confidence interval lengths for the same methods as in Table 1.
For smaller θ values, the degree of skewness of the negative binomial distribution is higher; accordingly, the coverage accuracy increases along with the parameter θ. From the results in Tables 1 and 2, we can see that the coverage accuracy is higher for the confidence intervals based on the bounded Bernstein inequality than for those using the unbounded version. However, the resulting confidence intervals have a larger average length when the bounded version is used.
Comparing the empirical likelihood and the bootstrap methods, we can see that even the uncorrected EL method works better than the bootstrap methods (the only exception is the studentized bootstrap, which also has some of the widest intervals), and the average lengths of the EL-based confidence intervals are only negligibly larger than those of the bootstrap methods. When θ = 0.1 and the data become much more asymmetric, the exponential type inequalities give the best coverage. However, since these methods guarantee at least the nominal coverage, for greater θ values the empirical coverage becomes much larger than the nominal 95% confidence level. When the Bartlett correction and the F distribution calibration are used, the empirical likelihood method provides much better coverage accuracy than the t-test and the bootstrap methods.

Blockwise Empirical Likelihood for the Mean.
In practice, the assumption of independence may be invalid in many situations, and by relaxing this strong assumption, we extend the scope of our research. As first shown by Kitamura [12], the empirical likelihood method can be applied not only for independent but also for stationary weakly dependent data characterized by some mixing coefficients.
Let (X_t)_{t∈Z} be a real-valued strictly stationary process defined on a probability space (Ω, F, P). For any two σ-fields A ⊂ F and B ⊂ F, define the following coefficient of dependence:

α(A, B) = sup |P(A ∩ B) − P(A)P(B)|,

where the supremum is taken over all A ∈ A and B ∈ B.
Further, we will use the notation and results of two papers: Zhang et al. [16] and Kitamura [12]. Let g(X, μ) be a real-valued function such that E{g(X_t, μ)} = 0, where μ = E(X_t). Let M and L be integers depending on n; they will be used for the estimation. Note that M is the block length and L is the separation between the block starting points. Furthermore, we will use nonoverlapping blocks with L = M.
In order to address the dependence among the observations, Kitamura [12] proposed to apply the profile empirical likelihood method to the blockwise sample {Σ_{i=1}^M g(X_{(j−1)L+i}, μ)}_{j=1}^Q, where Q is the number of blocks, instead of {g(X_i, μ)}_{i=1}^n. More specifically, Kitamura [12] defined the blockwise empirical likelihood function for μ as the profile likelihood of these block sums, which generates the blockwise empirical likelihood ratio statistic W(μ). Under some regularity conditions, Kitamura [12] showed that W(μ_0) also has a χ²_1 limiting distribution. The coverage error of the confidence intervals can be improved up to the order O(n^(−5/6)) by using the Bartlett correction with constant b = (1/2)α₄ − (1/3)α₃², where α_k denotes the kth standardized moment of the blockwise sample. This rate is slower than the rate obtained for independent data due to the nonparametric treatment of dependence.
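With nonoverlapping blocks (L = M) and g(X_i, μ) = X_i − μ, the blockwise statistic simply applies the ordinary empirical likelihood computation to the block sums. A minimal sketch (the inner solver bisects on the Lagrange multiplier, as in the independent case; function names are illustrative):

```python
import math

def el_stat(z, iters=200):
    """-2 log of the EL ratio testing that the values z have mean zero."""
    zmin, zmax = min(z), max(z)
    if not (zmin < 0.0 < zmax):
        raise ValueError("zero must lie strictly inside the convex hull")
    lo, hi = -1.0 / zmax + 1e-10, -1.0 / zmin - 1e-10
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)

def blockwise_el_stat(xs, mu, m):
    """Blockwise EL statistic: apply EL to sums of X_i - mu over
    nonoverlapping blocks of length m (incomplete trailing block dropped)."""
    q = len(xs) // m                      # number of whole blocks
    blocks = [sum(xs[j * m + i] - mu for i in range(m)) for j in range(q)]
    return el_stat(blocks)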

Exponential Type Inequalities.
In this section, we introduce several exponential type inequalities for α- and β-mixing sequences and discuss their applicability in practice.
Theorem 4 (Doukhan [17]). Let (X_t)_{t∈Z} be a zero-mean real-valued process such that there exists σ² ∈ R₊ with (1/n) E(X_{i+1} + ⋅⋅⋅ + X_{i+n})² ≤ σ² for all i, n ∈ N, and |X_t| ≤ b for all t ∈ N. Then, for each ε > 0 and each q ≤ n/(1 + ε), an exponential bound on the tail probability of the normalized sum holds in terms of the mixing coefficients of the process; see [17] for the exact statement.

Theorem 5 (Bosq [18]). Let (X_t)_{t∈N} be a zero-mean real-valued process such that sup_{1≤t≤n} ‖X_t‖_∞ ≤ b. Then, for each integer q ∈ [1, n/2] and each ε > 0,

P(|X_1 + ⋅⋅⋅ + X_n| > nε) ≤ 4 exp(− ε² q / (8 b²)) + 22 (1 + 4b/ε)^{1/2} q α(⌊n/(2q)⌋).

Both inequalities involve mixing coefficients, which in practice need to be estimated. Recently, McDonald et al. [11] proposed a method for estimating the β-mixing coefficients that uses histogram estimators for stationary time series data. To our knowledge, this is the first result in this area that allows us to estimate such general mixing coefficients. The estimator has the following form:

β̂(a) = (1/2) Σ | f̂₂ − f̂ ⊗ f̂ |,     (21)

where f̂ is the histogram estimator of the joint density of a set of observations and f̂₂ is the 2-dimensional histogram estimator of the joint density of the two sets of observations separated by the lag a.

Table 4: Empirical average interval lengths for the 95% confidence intervals for the mean using the blockwise EL method (EL) and the blockwise EL method with the Bartlett correction (EL_B) for data generated from an AR(1) process with ρ = 0.5 and with a marginal negative binomial distribution Negbinom(μ, θ), based on 10,000 samples.

In a recent paper, Merlevède et al. [10] obtained an exponential type inequality under the following condition on the strong mixing coefficient:

α(n) ≤ exp(−2cn),     (22)

where c is some constant. Here, we present the inequality given in Corollary 12 of Merlevède et al. [10].
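To illustrate how a bound such as that of Theorem 5 would be used once mixing-coefficient estimates are available, the following sketch evaluates a Bosq-type bound of the form 4 exp(−ε²q/(8b²)) + 22(1 + 4b/ε)^{1/2} q α(⌊n/(2q)⌋), with the α-mixing coefficients supplied as a user function; the exact constants should be checked against [18]:

```python
import math

def bosq_bound(n, q, eps, b, alpha_mix):
    """Bosq-type tail bound for P(|S_n| > n*eps): an exponential term in the
    block count q plus a remainder weighted by the alpha-mixing coefficient."""
    exp_term = 4.0 * math.exp(-(eps ** 2) * q / (8.0 * b ** 2))
    mix_term = 22.0 * math.sqrt(1.0 + 4.0 * b / eps) * q * alpha_mix(n // (2 * q))
    return exp_term + mix_term
```

The block count q trades the two terms off against each other; in practice one minimizes the bound over q.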
Note that the inequality (23) depends only on the value of the constant c. Using the β-mixing estimator (21) from McDonald et al. [11], we propose to estimate c by the least-squares estimator of the slope in the regression equation

log β̂(a) = c₁ − 2ca,

where c₁ is some constant. The inequalities stated in Theorems 4 and 5 can then be used in practical applications after the estimation of the β-mixing coefficients, by exploiting the relation α(k) ≤ β(k).
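The slope fit can be sketched as follows, taking a list of lags and the corresponding (hypothetical) β̂ estimates as input; in practice the β̂ values would come from the histogram estimator (21):

```python
import math

def estimate_c(lags, beta_hat):
    """Least-squares fit of log beta_hat(a) = c1 - 2*c*a; returns the
    estimate of c (the beta_hat values are assumed to be positive)."""
    ys = [math.log(b) for b in beta_hat]
    n = len(lags)
    xbar = sum(lags) / n
    ybar = sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(lags, ys))
             / sum((x - xbar) ** 2 for x in lags))
    return -slope / 2.0
```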

Simulation Study.
We simulate X_1, . . . , X_n from a stationary AR(1) process (X_t)_{t∈Z} defined as

X_t = ρ X_{t−1} + ε_t,     (26)

where (ε_t)_{t∈Z} is a weakly stationary innovation process with mean 0 and autocovariance E(ε_t ε_{t+h}) = σ² if h = 0 and 0 otherwise, and |ρ| < 1 is the coefficient of the process. The data are generated with a marginal negative binomial distribution using the R package gsarima. We used the command garsim and selected the parameter link="identity". We used sample sizes n ∈ {50, 100, 200, 500, 1000} and different values of the parameter θ ∈ {0.1, 0.5, 1.0}. The minimal value of the series was set equal to 0.5 for the algorithm used by the command garsim.
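The recursion (26) can be sketched as follows with Gaussian innovations as a stand-in (the paper's simulations instead used R's gsarima to impose a negative binomial marginal; function name and burn-in length are illustrative):

```python
import numpy as np

def simulate_ar1(n, rho, sigma=1.0, burn=500, rng=None):
    """Simulate X_t = rho*X_{t-1} + eps_t with iid N(0, sigma^2) innovations;
    a burn-in period is discarded so the returned series is near-stationary."""
    rng = rng or np.random.default_rng(0)
    eps = rng.normal(0.0, sigma, size=n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = rho * x[t - 1] + eps[t]
    return x[burn:]
```

For |ρ| < 1 the stationary variance is σ²/(1 − ρ²) and the lag-1 autocorrelation is ρ.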
In Table 3, we present the empirical coverage accuracies of the 95% confidence intervals constructed using the blockwise empirical likelihood method with and without the Bartlett correction. We report the corresponding average lengths of the 95% confidence intervals in Table 4. Similar to the independent case, the Bartlett correction provides some improvement in all cases. Furthermore, the results strongly depend on the block length parameter M.
A general autoregressive process AR(1) has the mixing property if the process defined in (26) has a continuous marginal density (see [15]). Thus, it might not be appropriate to use exponential type inequalities in the case of the negative binomial distribution. We performed additional simulations using a continuous distribution, namely the normal distribution. In this case, the confidence intervals based on Bernstein type inequalities are typically too wide in comparison with those produced by the blockwise empirical likelihood method. Finally, we analyze the serial analysis of gene expression (SAGE) data from Shilane et al. [3]. Such data are used in molecular biology to estimate the relative abundance of messenger ribonucleic acid (mRNA) molecules based upon the frequency of the corresponding 14 base pair tag sequences extracted from a cell. Because the cost of sequencing can be quite high, the sample size is often quite small in such data situations.
Shilane et al. [3] picked the 20 most frequent tags from the whole data set and constructed confidence intervals for the mean of the corresponding counts using several methods: bounded and unbounded Bernstein type inequalities, χ², Gamma, Wald, t-test, and bias-corrected and accelerated bootstrap methods. Additionally, we report the confidence intervals obtained by the empirical likelihood method in Table 5.
Shilane et al. [3] outlined some strengths and weaknesses of the methods in Table 5. Both the intervals based on the bounded Bernstein inequality and the Wald method include negative numbers as possible values for the mean μ, which is unreasonable for the SAGE data. All the methods used in Table 5 require independence of the data. However, if we look at the whole data set, there are significant correlations (see Figure 1). Still, in this case we can use the blockwise empirical likelihood method for statistical inference (see Table 6).

Conclusions
We conclude that wide confidence intervals are typically obtained when Bernstein type inequalities are used. This method can still provide good results in highly skewed situations when the sample sizes are small. For independent observations, the empirical likelihood method is a good alternative not only to the classical t-test but also to the bootstrap methods. The Bartlett correction and the calibration using the F distribution can provide a significant improvement for small sample sizes. For dependent data, it is in principle possible to use Bernstein type inequalities for inference on the mean. However, for correlated time series data, bigger sample sizes are usually necessary; moreover, for the negative binomial distribution, these inequalities are not valid for ARMA type processes. A reasonable alternative is to use the blockwise empirical likelihood method.