FIXED-WIDTH CONFIDENCE INTERVAL FOR A LOGNORMAL MEAN

We consider the problem of constructing a fixed-width confidence interval for a lognormal mean. We give a Birnbaum-Healy type two-stage procedure to construct such a confidence interval. We discuss some asymptotic properties of the procedure. A three-stage procedure and an accelerated sequential procedure are also given for the comparison of efficiency among these three multistage methodologies.


1. Introduction.
Suppose there is a lognormal distribution with unknown parameters $\mu$ and $\sigma^2$. Let $X_1, X_2, \ldots$ be a random sequence from this distribution. Then $Y_1, Y_2, \ldots$ is a random sequence from $N(\mu, \sigma^2)$, where $Y_i = \ln X_i$, $i = 1, 2, \ldots$. The mean of the lognormal distribution is a nonlinear function of $(\mu, \sigma^2)$, namely $E(X) = \exp(\mu + \sigma^2/2)$ ($= \theta$, say), and it is unknown since $\mu$ and $\sigma^2$ are unknown. Also, note that $V(X) = \{\exp(\sigma^2) - 1\}\exp(2\mu + \sigma^2)$. Let $\hat\theta_n$ be an estimator of $\theta$ based on $Y_i = \ln X_i$, $i = 1, 2, \ldots, n$, defined by
$$\hat\theta_n = \exp\bigl(\bar Y_n + S^2_{Y,n}/2\bigr), \tag{1.1}$$
where
$$\bar Y_n = n^{-1}\sum_{i=1}^{n} Y_i, \qquad S^2_{Y,n} = (n-1)^{-1}\sum_{i=1}^{n}\bigl(Y_i - \bar Y_n\bigr)^2. \tag{1.2}$$
Then, the problem is to construct a confidence interval for $\theta$ such that
$$P\bigl(|\hat\theta_n - \theta| < d\bigr) \ge 1 - \alpha \tag{1.3}$$
for given $d$ ($> 0$) and $\alpha$ ($0 < \alpha < 1$) by choosing the size $n$. Since $\bar Y_n$ and $(n-1)S^2_{Y,n}/\sigma^2$ are independently distributed as $N(\mu, \sigma^2/n)$ and $\chi^2_{n-1}$, respectively, we have (1.4). Let $A = (\sigma^2/2)(1 + \sigma^2/2)$. Then, the coverage probability is given for sufficiently large $n$ as in (1.5), where $\Phi(\cdot)$ denotes the $N(0, 1)$ distribution function. Hence, it is easy to see that the requirement (1.3) is satisfied asymptotically when $d$ is small if $n$ is chosen as in (1.6), where $z$ is the upper $\alpha/2$ point of the $N(0, 1)$ distribution, that is, $\Phi(z) = 1 - \alpha/2$. That is, $n$ given by (1.6), denoted $n^*$ below, is an asymptotically optimal sample size, and the requirement (1.3) is satisfied only if $\theta^2 A^2/n^2 \to 0$ for large $n$. However, this fixed-sample size is unknown since $\mu$ and $\sigma^2$ are unknown. Takada [16] showed that there is no fixed-sample-size procedure that constructs a fixed-width confidence interval for the mean of a lognormal distribution with coverage probability at least the nominal value uniformly for all $\mu$ and $\sigma^2$. So, we resort to sequential methodology for this problem. Takada gave a solution to this problem by extending Nagao's [13] sequential procedure to a Birnbaum-Healy-type two-stage procedure. However, the procedure given there tends to require very large sample sizes in order to meet the requirement (1.3). In this paper, we give a two-stage procedure which is also of Birnbaum-Healy type but is more efficient than Takada's procedure. For other applications of Birnbaum-Healy-type two-stage procedures, see Graybill and Connell [8, 9] and Govindarajulu [5, 6, 7]. For the original idea of sequential sampling in two stages, see Stein [15] and Cox [4].
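For orientation, the moment calculation behind the large-sample statements above can be sketched as follows. This is a reconstruction under the usual normal and chi-square moment formulas and a delta-method argument, applied to the estimator $\hat\theta_n = \exp(\bar Y_n + S^2_{Y,n}/2)$; it is an assumption about the form of the results and is not a transcription of (1.4)-(1.6), whose exact statements may differ.

```latex
% Sketch only: assumes \hat\theta_n = \exp(\bar Y_n + S^2_{Y,n}/2), \sigma^2 < (n-1)/2,
% and A = (\sigma^2/2)(1 + \sigma^2/2) as defined in the text.
\begin{align*}
E(\hat\theta_n)
  &= \exp\!\Bigl(\mu + \frac{\sigma^2}{2n}\Bigr)
     \Bigl(1 - \frac{\sigma^2}{n-1}\Bigr)^{-(n-1)/2}
   = \theta\Bigl\{1 + \frac{A}{n} + O(n^{-2})\Bigr\}, \\
V(\hat\theta_n)
  &= \theta^2\Bigl\{\frac{2A}{n} + O(n^{-2})\Bigr\}, \\
P\bigl(|\hat\theta_n - \theta| < d\bigr)
  &\approx 2\,\Phi\!\Bigl(\frac{d\sqrt{n}}{\theta\sqrt{2A}}\Bigr) - 1, \\
2\,\Phi\!\Bigl(\frac{d\sqrt{n}}{\theta\sqrt{2A}}\Bigr) - 1 \ge 1 - \alpha
  &\iff n \ge \frac{2A\,z^2\theta^2}{d^2}.
\end{align*}
```

Under this sketch the squared bias of $\hat\theta_n$ is of order $\theta^2A^2/n^2$, which is consistent with the condition $\theta^2A^2/n^2 \to 0$ stated in the text.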
In Section 2, a two-stage procedure is given for constructing a confidence interval which satisfies (1.3) asymptotically. In Section 3, some asymptotic properties of the procedure are discussed and compared with those of Takada's procedure. In Sections 4 and 5, we give a three-stage procedure and an accelerated sequential procedure for this problem. Finally, in Section 6, a Monte-Carlo study is presented to compare the efficiency of these three multistage procedures for moderate sample sizes.
2. Two-stage procedure. We first take the preliminary sample $X_1, \ldots, X_m$ of size $m$ ($\ge 2$) and compute the quantities in (2.1), where $\bar Y_m$ and $S^2_{Y,m}$ are defined as in (1.2), and the constants $a$ ($\le 0$) and $b$ ($\ge 1$) are determined later so as to satisfy the requirement (1.3) asymptotically. Next, we take the second sample $X_{m+1}, \ldots, X_{m+N}$ of size $N$, defined from (2.1) by (2.2), where $[x]$ denotes the largest integer less than $x$. Then, a confidence interval for $\theta$ is constructed by using $\hat\theta_N$, defined as in (1.1), on the basis of the second sample of size $N$.
The coverage probability (CP) is given for sufficiently large $N$ by (2.3). We determine the design constants $a$ and $b$ so as to satisfy (2.4). In order to meet the requirement (1.3) asymptotically, we write (2.5), where $\delta = d/\theta$. Since it increases as $\delta$ increases, we have (2.6). We note (2.7) and (2.8) for sufficiently large $m$, where $Z$ and $\nu W$ are independently distributed as $N(0, 1)$ and $\chi^2_\nu$ with $\nu = m - 1$, respectively. Then, we write the right-hand side of (2.6) for sufficiently large $m$ as (2.9), where $g_\nu(\cdot)$ denotes the density function of the random variable $\chi^2_\nu/\nu$. We set h(σ So, we use the constants $a$ and $b$ such that (2.13). However, we can see that there is no solution to (2.12) when $m < \infty$, because Φ ze a b φ(Z)dZ (2.14) for sufficiently large $m$, where $\phi(\cdot)$ denotes the $N(0, 1)$ density function. So, we determine the constants $a$ and $b$ as follows: by using the normal approximation to $\nu W$ for sufficiently large $\nu$, the left-hand side of (2.11) is written as (2.15). Now we consider that $a$ and $b$ are sequences depending on $m$ as in (2.16), where k and (< $-3\sqrt{2}$) are constants, and expand (2.15) such that (2.17). In the second term of the above expansion, we let (2.18). Then, we have (2.19) and, from (2.16), (2.20).
To calculate the values of $a$ and $b$, we first substitute the constant $a$ given by (2.20) into (2.11). Then, we obtain the value of $b$ by solving (2.11) for given $(\alpha, m)$ with the bisection method. The value of $a$ is given by (2.20) with $b$ determined above. Here, the 128-point Gauss-Legendre numerical quadrature formula was used to evaluate the left-hand side of (2.11). Since the integral is over an infinite range, we instead calculate (2.21). We choose the constant $u$ such that (2.24), where $G_\nu(\cdot)$ denotes the $\chi^2_\nu$ distribution function, we have Note that $a \to 0$ and $b \to 1$ as $m \to \infty$.
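The numerical recipe described above (a 128-point Gauss-Legendre rule on a truncated range, followed by bisection in $b$) can be sketched in Python as below. Because the integrand of (2.11) is not reproduced here, the sketch solves a stand-in equation, the classical normal-mean analogue $\int_0^u \{2\Phi(z\sqrt{bw}) - 1\}\,g_\nu(w)\,dw = 1 - \alpha$, whose exact solution is $b = (t_{\nu,\alpha/2}/z)^2$. The function names, the truncation point `u = 6.0`, and the bracket `[1, 4]` are illustrative assumptions, not values from the paper.

```python
import math
from statistics import NormalDist

import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x):
    """Standard normal distribution function, elementwise."""
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def chi2_over_nu_pdf(w, nu):
    """Density g_nu(w) of W = chi2_nu / nu for w > 0."""
    log_pdf = (math.log(nu) + (nu / 2 - 1) * np.log(nu * w) - nu * w / 2
               - (nu / 2) * math.log(2.0) - math.lgamma(nu / 2))
    return np.exp(log_pdf)

def gauss_legendre(f, lo, hi, n_points=128):
    """Integrate f over [lo, hi] with an n_points Gauss-Legendre rule."""
    nodes, weights = np.polynomial.legendre.leggauss(n_points)
    t = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)
    return 0.5 * (hi - lo) * float(np.sum(weights * f(t)))

def stand_in_coverage(b, z, nu, u):
    """Stand-in for the left-hand side of (2.11): NOT the paper's integrand."""
    return gauss_legendre(
        lambda w: (2.0 * norm_cdf(z * np.sqrt(b * w)) - 1.0)
        * chi2_over_nu_pdf(w, nu),
        0.0, u)

def bisect(f, lo, hi, tol=1e-10):
    """Bisection for a root of f in [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

alpha, m = 0.05, 20
nu = m - 1
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
u = 6.0  # assumed truncation point; the mass of W beyond u is negligible here
total_mass = gauss_legendre(lambda w: chi2_over_nu_pdf(w, nu), 0.0, u)
b = bisect(lambda v: stand_in_coverage(v, z, nu, u) - (1.0 - alpha), 1.0, 4.0)
print(total_mass, b)
```

For $m = 20$ and $\alpha = 0.05$ the stand-in equation gives $b$ of about 1.14. Checking that the truncated density integrates to essentially 1 is a cheap way to confirm that the choice of `u` loses no visible mass.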
Then, the two-stage procedure given in Section 2 has the following asymptotic properties.
Theorem 3.1. Under condition (3.1), the two-stage procedure is asymptotically consistent, that is, (3.2) holds.
Proof. Since, for sufficiently large $N$, we recall (3.3), it yields (3.4). The left-hand side is bounded below as in (3.5). By using the dominated convergence theorem, the right-hand side is as in (3.6). Hence, from (3.5) and (3.6), we conclude (3.7). The proof is complete.
Theorem 3.2. Under condition (3.1) and $d^2 m(d) \to 0$ as $d \to 0$, the two-stage procedure is asymptotically first-order efficient, that is, (3.8) holds, where $n^*$ is defined by (1.6). The proof is complete.
Remark 3.3. It should be noted that the procedure is second-order inefficient, as is usual in two-stage sampling, that is, $\lim_{d \to 0}\,(m + E(N) - n^*) = \infty$, which leads to substantial oversampling in the second stage, especially if $m$ happens to be chosen much smaller than $n^*$.
Takada [16] proposed a different Birnbaum-Healy-type two-stage procedure for this problem. The procedure can be summarized as follows. First take the preliminary sample $X_1, \ldots, X_m$ of size $m$ ($\ge 2$) and compute $\hat\mu_m$ and $\hat\sigma_m^2$, the latter involving $S^2_{Y,m}/b$, where $\bar Y_m$ and $S^2_{Y,m}$ are defined by (1.2) and the constants $a$ and $b$ are chosen so as to satisfy (1.3). Then determine $K_m$ ($> 0$) such that $\hat\theta_m(e^{K_m} - 1) = d$ with $\hat\theta_m = \exp(\hat\mu_m + \hat\sigma_m^2/2)$. Next, take the second sample $X_{m+1}, \ldots, X_{m+N}$ of size $N$ defined as a solution to (3.11), where $\hat\theta_N = \exp(\bar Y_N + S^2_{Y,N}/2)$ and $0 < \gamma < \alpha$. The coverage probability is given by (3.12). Hence, the values of $a$ and $b$ are chosen so as to satisfy (3.13). However, the values of $a$ and $b$ are not given in Takada [16].
When we compare the efficiency of the procedure given in Section 2 with that of Takada's procedure, the following theorem holds.
Theorem 3.4. Under condition (3.1), the expected second sample size in the two-stage procedure is smaller than that in Takada's procedure for $0 < d < d_0$ with sufficiently small $d_0$ ($> 0$).

Proof. Let $N_0$ and $N_T$ represent the second sample sizes in the two-stage procedure and in Takada's procedure, respectively. Let $H(N)$ be the left-hand side of (3.11). When $N = N_T$, $H(N_T) = 1 - \gamma$ for sufficiently small $d$ ($> 0$). When $N = N_0$, for sufficiently small $d$ ($> 0$) under (3.1), where $c = z/2$ and $\eta = \sigma^2/2$. For $0 < \eta \le 1$ we have for sufficiently small $d$ ($> 0$). By noting that $H(N)$ is an increasing function of $N$, the proof is complete.
Thus, the two-stage procedure given in Section 2 is asymptotically more efficient than Takada's procedure. We have observed that Takada's procedure performs badly for practical values of $d$. We omit the details for brevity.

4. Three-stage procedure.
In Section 2, we gave a two-stage procedure for estimating the mean of a lognormal distribution with a specified coverage probability. We ensured this asymptotically by introducing constants $(a, b)$ and determining them suitably. Also, due to mathematical difficulties, the information in the preliminary sample of size $m$ is ignored in the final construction of the confidence interval. This is justifiable if $m$ is small relative to the size of the second sample. However, in the case of Stein's two-stage procedure for estimating the normal mean with specified width $2d$ and specified coverage probability $1 - \alpha$, Cox [4] pointed out that the difference between the expected value of the combined sample size and $n_0$ tends to infinity as $d \to 0$ if $m/n_0(d) \to 0$ as $d \to 0$, where $n_0(d)$ is the "optimal" fixed-sample size which would have been used had $\sigma^2$ been known. Thus, the two-stage procedure is likely to lead to considerable oversampling. To overcome this difficulty, Hall [10] proposed a three-stage procedure for estimating the normal mean which ensures that the difference between the expectation of the final sample size and $n_0$ is bounded as $d \to 0$. In the following we adapt Hall's method to our problem.
As before, let $m$ denote the size of an available preliminary sample. Let $\rho$ ($0 < \rho < 1$) be a number fixed in advance. Take the second sample of size $M - m$, where $M$ is given by (4.1) together with quantities based on $S^2_{Y,M}$. Now, take the third sample of size $N - M$, where $N$ is given by (4.2), in which $K$ is a non-negative integer. Let $\bar Y_N$ and $S^2_{Y,N}$ be the mean and variance of the pooled sample $Y_1, \ldots, Y_N$. Then, an approximate $1 - \alpha$ level confidence interval for $\theta$ is given by $(\hat\theta_N - d, \hat\theta_N + d)$.
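A Hall-type three-stage rule of this kind can be sketched in Python as follows. The displays (4.1) and (4.2) are not reproduced here, so the sketch makes its own assumptions: the optimal size is estimated by the plug-in quantity $2\hat A z^2\hat\theta^2/d^2$ (a delta-method form), the second stage uses $M = \max\{m, \lfloor\rho\,\hat n_m\rfloor + 1\}$, and the third stage uses $N = \max\{M, \lfloor\hat n_M\rfloor + K\}$. The names `plug_in_size`, `three_stage`, and `draw_logs` are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

def plug_in_size(y, d, z):
    """Assumed plug-in estimate 2*A*z^2*theta^2/d^2 from log-observations y."""
    s2 = np.var(y, ddof=1)
    theta_hat = math.exp(np.mean(y) + s2 / 2)
    a_hat = (s2 / 2) * (1 + s2 / 2)
    return 2 * a_hat * (z * theta_hat / d) ** 2

def three_stage(draw_logs, m, d, alpha=0.05, rho=0.5, K=3):
    """Sketch of a three-stage rule; draw_logs(k) returns k new values of ln X."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    y = draw_logs(m)                                   # first stage
    M = max(m, math.floor(rho * plug_in_size(y, d, z)) + 1)
    y = np.concatenate([y, draw_logs(M - m)])          # second stage
    N = max(M, math.floor(plug_in_size(y, d, z)) + K)
    y = np.concatenate([y, draw_logs(N - M)])          # third stage
    theta_hat = math.exp(np.mean(y) + np.var(y, ddof=1) / 2)  # pooled sample
    return M, N, (theta_hat - d, theta_hat + d)

# Usage with simulated data: mu = 0, sigma = 0.5, target optimal size 60.
rng = np.random.default_rng(0)
mu, sigma, n_star = 0.0, 0.5, 60
z = NormalDist().inv_cdf(0.975)
theta = math.exp(mu + sigma**2 / 2)
A = (sigma**2 / 2) * (1 + sigma**2 / 2)
d = z * theta * math.sqrt(2 * A / n_star)  # half-width implied by the assumed size formula
M, N, ci = three_stage(lambda k: rng.normal(mu, sigma, size=k), m=20, d=d)
print(M, N, ci)
```

The defaults `rho=0.5` and `K=3` mirror the values used later in the simulation study; everything else about the rule is an assumption of this sketch.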
5. Accelerated sequential procedure. For a $N(\mu, \sigma^2)$ population with both $\mu$ and $\sigma$ unknown, if one wishes to estimate $\mu$ with a given accuracy, there are two ways to go about it. In particular, suppose we wish to construct a confidence interval for $\mu$ with specified coverage probability and width. Stein's [15] two-stage procedure allows us to do so with only two sampling operations. However, Stein's method can be considerably less efficient than the fully sequential procedure proposed by Anscombe [1], Robbins [14], and Chow and Robbins [3], in the sense that it leads to substantial oversampling. In contrast, if $N$ is the size of the final sample in the Anscombe-Chow-Robbins (ACR) procedure, then $E(N) - n_0$ is bounded as $d \to 0$. However, the ACR procedure will have coverage probability very nearly $1 - \alpha$, especially for small $d$. In spite of its greater efficiency, the ACR procedure can be more expensive to carry out: the sample values must be taken one at a time, and a decision is made after each sampling operation. In many situations, significant savings can be achieved by collecting many sample values together, in which case it could be more economical to employ Stein's method. Hall [11] provided an accelerated method which combines the advantages of the two-stage procedure with those of the ACR procedure. In the following we adapt Hall's [11] accelerated procedure to the problem of estimating the lognormal mean. The accelerated sequential procedure may be viewed as robustifying the two-stage procedure against the possibility of a small initial sample size.
Fix $\rho$ ($0 < \rho < 1$). Stop the first-stage sampling as soon as (5.1) holds, where $z$ is a constant such that $\Phi(z) = 1 - \alpha/2$. According to (5.1), call the initial sample size $M_1$. Now set the quantity in (5.2) for any $\varepsilon \ge 0$. Define the total sample size $N$ as in (5.3). Draw $N - M_1$ additional observations. Construct $(\hat\theta_N - d, \hat\theta_N + d)$ as a confidence interval for $\theta$ of width $2d$ with coverage probability nearly $1 - \alpha$. According to Hall [11], a safe value for $\rho$ is 0.5 from the practical point of view.
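The general shape of such an accelerated rule can be sketched in Python as below. The displays (5.1)-(5.3) are not reproduced here, so the sketch rests on assumptions: sampling starts from $m$ observations, proceeds one at a time until the current size reaches $\rho$ times a plug-in estimate $2\hat A z^2\hat\theta^2/d^2$ of the optimal size, and the remainder is then drawn in one batch. The $\varepsilon$ adjustment of (5.2) is not modeled, and the names `plug_in_size`, `accelerated`, and `draw_logs` are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

def plug_in_size(y, d, z):
    """Assumed plug-in estimate 2*A*z^2*theta^2/d^2 from log-observations y."""
    s2 = np.var(y, ddof=1)
    theta_hat = math.exp(np.mean(y) + s2 / 2)
    a_hat = (s2 / 2) * (1 + s2 / 2)
    return 2 * a_hat * (z * theta_hat / d) ** 2

def accelerated(draw_logs, m, d, alpha=0.05, rho=0.5):
    """Sketch of an accelerated sequential rule; draw_logs(k) returns k values of ln X."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    y = list(draw_logs(m))
    # Sequential part: one observation at a time until the assumed stopping rule holds.
    while len(y) < rho * plug_in_size(np.asarray(y), d, z):
        y.extend(draw_logs(1))
    M1 = len(y)
    # Batch part: jump to the estimated full size in a single sampling operation.
    N = max(M1, math.floor(plug_in_size(np.asarray(y), d, z)) + 1)
    y.extend(draw_logs(N - M1))
    y = np.asarray(y)
    theta_hat = math.exp(np.mean(y) + np.var(y, ddof=1) / 2)
    return M1, N, (theta_hat - d, theta_hat + d)

# Usage with simulated data; d = 0.15 is an arbitrary illustrative half-width.
rng = np.random.default_rng(1)
d = 0.15
M1, N, ci = accelerated(lambda k: rng.normal(0.0, 0.5, size=k), m=20, d=d)
print(M1, N, ci)
```

With $\rho = 0.5$ roughly half of the observations are collected sequentially and the rest in one batch, which is the trade-off between sampling operations and oversampling described in the text.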

6. Monte-Carlo simulation.
In this section, we present findings on the moderate- and small-sample performance of the two-stage procedure given in Section 2, the three-stage procedure given in Section 4, and the accelerated sequential procedure given in Section 5. For all the methodologies, we consider the case $m = 20$, $\alpha = 0.05$, and $\mu = 0$. The scale parameter $\sigma$ is set equal to 0.3 and 0.5. Note that the coefficient of variation is $\mathrm{CV} = \{\exp(\sigma^2) - 1\}^{1/2} = 0.307$ and $0.533$ when $\sigma = 0.3$ and $0.5$, respectively. We fix the optimal fixed-sample size $n^*$ of (1.6) at $n^* = 20, 60, 100$, and then determine $d$ from (1.6) for each $n^*$ and $\sigma$. For the two-stage procedure, we let $(a, b) = (-0.302, 2.354)$ in (2.2) with (2.1) from Table 2.1. For the three-stage procedure, we let $(\rho, K) = (0.5, 3)$ in (4.1) and (4.2) as suggested in Hall [10, page 1230]. For the accelerated sequential procedure, we let $(\rho, \varepsilon) = (0.5, 0)$ in (5.1) and (5.2). The efficiency of each procedure is evaluated by the average sample size it requires and by its coverage probability, both estimated via Monte-Carlo simulation with 10,000 ($= R$, say) independent trials. The pseudo-random number generator used in the simulation is the one called URN30 in Karian and Dudewicz [12, page 118], which passed the TESTRAND set of tests. In further testing by Bernhofen, Dudewicz, Levendovszky, and van der Meulen [2], this generator was the best among all 12 generators that passed the tests in TESTRAND. Let $n_r$ be the observed value of $N$ and let $p_r = 1$ (or 0) according to whether the interval constructed by each procedure includes $\theta$ or not, for $r = 1, \ldots, R$. The sample means $\bar n$ and $\bar p$ of the $n_r$ and $p_r$, respectively, estimate $E(N)$ and the coverage probability, while $s(\bar n)$ and $s(\bar p)$ stand for their corresponding estimated standard errors. The last column gives the sample size ratio relative to $n^*$, that is, $(\bar n + m)/n^*$ for the two-stage procedure and $\bar n/n^*$ for the three-stage procedure and the accelerated sequential procedure.
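The summary quantities described above can be computed as in the following Python sketch. The formulas for the standard errors are the usual ones, $s(\bar n) = \{\sum_r (n_r - \bar n)^2 / (R(R-1))\}^{1/2}$ and $s(\bar p) = \{\bar p(1 - \bar p)/R\}^{1/2}$, and are assumed here rather than taken from the text; the function names `summarize` and `coefficient_of_variation` are hypothetical.

```python
import math

import numpy as np

def coefficient_of_variation(sigma):
    """CV of a lognormal distribution: sqrt(exp(sigma^2) - 1)."""
    return math.sqrt(math.exp(sigma**2) - 1)

def summarize(n_obs, covered):
    """Return (n_bar, s(n_bar), p_bar, s(p_bar)) from R simulated trials.

    n_obs   : observed sample sizes n_r
    covered : indicators p_r (1 if the interval includes theta, else 0)
    """
    n_obs = np.asarray(n_obs, dtype=float)
    covered = np.asarray(covered, dtype=float)
    R = len(n_obs)
    n_bar = float(n_obs.mean())
    s_n = math.sqrt(float(np.sum((n_obs - n_bar) ** 2)) / (R * (R - 1)))
    p_bar = float(covered.mean())
    s_p = math.sqrt(p_bar * (1 - p_bar) / R)  # assumed binomial standard error
    return n_bar, s_n, p_bar, s_p

print(round(coefficient_of_variation(0.3), 3),
      round(coefficient_of_variation(0.5), 3))   # 0.307 0.533
print(summarize([10, 12, 14], [1, 1, 0]))
```

The first printed line reproduces the two CV values quoted in the text; the second uses a toy input of three trials only to show the call signature.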
As observed from Table 6.2, the two-stage procedure oversamples compared with the other procedures, even though it reduces the number of sampling operations. The accelerated sequential procedure seems to perform better than the other procedures in terms of a smaller amount of oversampling. The accelerated sequential procedure as well as the three-stage procedure involves the choice of proper parameters. Naturally, one would expect that the choice of these parameters will very likely affect the performance of these methodologies for moderate values of $n^*$. Actually, no optimality criterion is yet available for choosing the best set of parameters. Hence, we tried various values of those parameters and examined any visible effects on such procedures. We report performances here only for the set of parameters given in Table 6.2 for brevity.

3. Asymptotic property. Throughout this section, we assume that $m = m(d)$, $a = a(d)$, and $b = b(d)$ are sequences satisfying condition (3.1).

By using the facts that $\hat\theta_m \to \theta$ and $\hat A_m \to A$ as $d \to 0$ and the dominated convergence theorem,
Table 2.1. The values of $a$ and $b$ for given $m$ and $\alpha$ (the upper entry is for $a$ and the lower entry is for $b$).