Increased Statistical Efficiency in a Lognormal Mean Model

Within the context of clinical and other scientific research, a substantial need exists for an accurate determination of the point estimate in a lognormal mean model, given that highly skewed data are often present. As such, logarithmic transformations are often advocated to achieve the assumptions of parametric statistical inference. Despite this, existing approaches that utilize only a sample’s mean and variance may not necessarily yield the most efficient estimator. The current investigation developed and tested an improved efficient point estimator for a lognormal mean by capturing more complete information via the sample’s coefficient of variation. Results of an empirical simulation study across varying sample sizes and population standard deviations indicated relative improvements in efficiency of up to 129.47 percent compared to the usual maximum likelihood estimator and up to 21.33 absolute percentage points above the efficient estimator presented by Shen and colleagues (2006). The relative efficiency of the proposed estimator increased particularly as a function of decreasing sample size and increasing population standard deviation.


Introduction
The presence of highly skewed data is commonplace across both basic and applied sciences [1,2]. In certain instances, the logarithmic transformation of such data may be undertaken with the primary purpose of establishing a normal distribution and stabilizing variance, which may include removing heteroskedasticity in the process, to achieve $N(\mu, \sigma^2)$. Patterson [3] provided seminal work concerning the statistical challenges involved in estimating the population mean following the transformation of data. More recently, Shen et al. [4] proposed an improved efficient minimum risk/relative mean squared error (RMSE) estimator of the lognormal mean, and numerous researchers have addressed the estimation of parameters within this distribution from both frequentist and Bayesian perspectives [5-9].
To present the lognormal distribution from a more fundamental perspective, if $X$ is a random variable with a lognormal distribution and a mean of $E(X) = \theta$, then $\ln(X)$ is normally distributed with a mean of $\mu$ and a variance of $\sigma^2$. Therefore, $X$ may also be expressed as $X \sim LN(\mu, \sigma^2)$ with a mean of $\theta$, observing that

$$\theta = \exp\left(\mu + \frac{\sigma^2}{2}\right). \quad (1)$$

As such, when considering a random sample $X_1, X_2, \ldots, X_n$ that is i.i.d. as $LN(\mu, \sigma^2)$ with a mean of $\theta$, then $Y_i = \ln(X_i)$ is i.i.d. as $N(\mu, \sigma^2)$ for $i = 1, \ldots, n$. The following may also be defined:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{y})^2, \quad (2)$$

noting that $\bar{y}$ and $\hat{\sigma}^2$ are the maximum likelihood estimators (MLE) for $\mu$ and $\sigma^2$, respectively [2]. By applying (1) to (2), the usual estimator (UER) for $\theta$ is

$$\hat{\theta}_{UER} = \exp\left(\bar{y} + \frac{\hat{\sigma}^2}{2}\right). \quad (3)$$

As addressed previously, Shen et al. [4] proposed a new estimator to improve upon $\hat{\theta}_{UER}$ by minimizing its relative mean squared error (RMSE) through an application of the delta method [10] to (3), which yields (4). Notably, the authors worked within the class of estimators defined by (5) and (6). By minimizing the RMSE of the estimators in the class up to the order of $1/n^2$, the optimal value of the class constant was obtained, which identified the optimal estimator in the class, (7). By applying the usual unbiased estimate $s^2$ of $\sigma^2$, a novel minimum risk/relative mean squared error estimator of $\theta$ was developed, given in (8) or, alternatively, in (9).

Given the prevalence of lognormal data across research disciplines and the importance of estimating these data efficiently, the purpose of the current study was to derive and assess an approach to obtain statistically efficient estimators of the lognormal mean. More specifically, the objectives involved incorporating more comprehensive information from the sample's coefficient of variation, $v = s/(\sqrt{n} \cdot \bar{y})$, following a logarithmic transformation of the sample data, in estimating the nontransformed population mean of the original distribution.
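The plug-in maximum likelihood estimate of the lognormal mean described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the function names `lognormal_mle_mean` and `sample_cv` are hypothetical, the variance in the plug-in estimate uses the MLE divisor $n$, and the coefficient of variation follows $v = s/(\sqrt{n} \cdot \bar{y})$ with $s$ assumed to be the unbiased (divisor $n - 1$) sample standard deviation of the log-transformed data.

```python
import math
import random
import statistics


def lognormal_mle_mean(xs):
    """Plug-in MLE of a lognormal mean: exp(mu_hat + sigma2_hat / 2)."""
    ys = [math.log(x) for x in xs]                  # log-transform the raw data
    mu_hat = statistics.fmean(ys)                   # MLE of mu
    sigma2_hat = statistics.pvariance(ys, mu_hat)   # MLE of sigma^2 (divisor n)
    return math.exp(mu_hat + sigma2_hat / 2.0)


def sample_cv(ys):
    """Coefficient of variation of the sample mean: s / (sqrt(n) * ybar)."""
    n = len(ys)
    ybar = statistics.fmean(ys)
    s = statistics.stdev(ys, ybar)                  # assumption: divisor n - 1
    return s / (math.sqrt(n) * ybar)


# Illustrative check on simulated data (parameter values are arbitrary).
rng = random.Random(2014)
mu, sigma = 1.0, 0.5
sample = [rng.lognormvariate(mu, sigma) for _ in range(5000)]
estimate = lognormal_mle_mean(sample)
true_mean = math.exp(mu + sigma ** 2 / 2.0)         # theta = exp(mu + sigma^2 / 2)
```

With a large simulated sample, `estimate` should land close to `true_mean`; in small samples the two can differ noticeably, which is the setting the efficient estimators discussed here are meant to address.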

Preparatory Improvement
By the Rao-Blackwell theorem, any function of the sample mean and sample variance that is an unbiased estimator, or a minimum mean squared error (MMSE) estimator, is correspondingly the uniformly minimum variance unbiased estimator (UMVUE) or the uniformly minimum mean squared error (UMMSE) estimator of the population quantity. Hence, the usual estimator (UER) presented in (3) is the UMVUE of $\ln(\theta)$, where $\theta$ denotes the population mean of the original lognormal distribution. Furthermore, it should be noted that $(n - 1) \cdot s^2/\sigma^2$ follows a $\chi^2$ distribution with $(n - 1)$ degrees of freedom, where $s^2$ is the sample variance of the log-transformed data.
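The chi-square property noted above can be checked numerically. The following is a small simulation sketch, not part of the original study: the helper name `scaled_variance` and the chosen values of $n$, $\mu$, and $\sigma$ are assumptions made purely for illustration. It relies on the standard facts that a $\chi^2$ variable with $(n - 1)$ degrees of freedom has mean $(n - 1)$ and variance $2(n - 1)$.

```python
import random
import statistics


def scaled_variance(n, mu, sigma, rng):
    """Draw one normal sample of size n and return (n - 1) * s^2 / sigma^2."""
    ys = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(ys)                # unbiased sample variance
    return (n - 1) * s2 / sigma ** 2


rng = random.Random(12345)
n, reps = 10, 5000
draws = [scaled_variance(n, 0.0, 2.0, rng) for _ in range(reps)]
mean_draw = statistics.fmean(draws)             # expected to be near n - 1
var_draw = statistics.variance(draws)           # expected to be near 2 * (n - 1)
```

The simulated mean and variance of the scaled statistic should sit near 9 and 18 for $n = 10$, regardless of the values chosen for $\mu$ and $\sigma$.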
Numerous empirical analyses involve conditions wherein small values of the sample estimate of the coefficient of variation are observed. Therein, an alternate estimator of the population mean, $\mu$, appearing in Lovric and Sahai [11] and denoted by $\bar{y}^{\otimes}$, is offered in (10) rather than the typical estimator, $\bar{y}$. Applying principles outlined in Nikulin [12], the relative efficiency (i.e., a key measure of an estimator's optimality) of $\bar{y}^{\otimes}$ versus $\bar{y}$ can be expressed as a percentage. The UMVUE of the efficiency ratio, $\eta$, as a function of $(\bar{y}, s^2)$ may then be determined as in (12), given that $(\bar{y}, s^2)$ is a complete sufficient statistic for $(\mu, \sigma^2)$.

To explain the aforementioned in more detail, particularly the development of the terms in (12), suppose $f(\bar{y}, s^2)$ is a function of the sample mean $\bar{y}$ and the sample variance $s^2$ for a random sample from a normal population. In that case, $\bar{y}$ and $s^2$ are known to have independent sampling distributions. Consequently, the expectation $E\{f(\bar{y}, s^2)\}$ may be regarded as a "two-phase" exercise. In the first phase, $E_{\bar{y}}\{\cdot\}$ is viewed as the expectation with respect to the random variable $\bar{y}$, treating the silenced (i.e., pseudo) random variable $s^2$ as a constant. In the second phase, the expectation $E_{s^2}\{\cdots\}$ is taken with respect to the random variable $s^2$.

Applied specifically to (12), the first phase may be detailed through integration by parts. Notably, the other term vanishes by the well-known properties of a definite integral over a symmetric interval, as the integrand is an odd function of the variable of integration. The remainder of the derivation also follows by way of integration by parts. Again, it is important to note that, independent of $\bar{y}$, $(n - 1) \cdot s^2/\sigma^2$ follows a $\chi^2$ distribution with $(n - 1)$ degrees of freedom. Applying this to (12), and then applying (19) within (18), yields a further expression that, again through integration by parts, may be written in an equivalent alternative form.

As such, the UMVUE relating to the statistical relative efficiency of $\bar{y}^{\otimes}$ versus $\bar{y}$ follows from the form of $\eta$ in (12) together with (16) and (19). Thus, $0 < \hat{\eta} < 1$ if $0 < \{(n \cdot \bar{y}^2/s^2)^2 \cdot (n - 3) + (n \cdot \bar{y}^2/s^2) \cdot (n - 3) - 2 \cdot (n - 1)\}$ per (17), as $(n \cdot \bar{y}^2/s^2) > 1$ for all $\bar{y}^2 > s^2/n$, with the coefficient of variation $v < 1$ per (16), or if $(n \cdot \bar{y}^2/s^2) > (n + 1)/(n - 3)$.
Given the aforementioned, the alternate estimator defined in (10), $\bar{y}^{\otimes}$, would be a more efficient estimator of the normal population mean, $\mu$. It is also important to note that this proposed estimator can be expressed as a function of the square of the sample coefficient of variation, $v^2 = s^2/(n \cdot \bar{y}^2)$.

The Improved Lognormal Mean Estimator
As previously noted, the purpose of this research investigation was to improve and test the estimator proposed by Shen et al. [4], presented initially in (8). The development of the proposed estimator draws upon (10) and (25) and, through substitution from (8) and (26), yields (27). In developing $\bar{y}^{\otimes}$ from (10) to the expression in (25) and further to (27), the term involving $1/n^2$ is treated as negligible. To illustrate, even if $1/n^2$ is only $1/(31)^2 \approx 0.001$, the term is negligible.
To derive an efficient estimator of the normal variance using the sample coefficient of variation, the sixth-iteration efficient estimator of the normal variance presented in Lovric and Sahai [11] may be applied, which leads to the proposed lognormal mean estimator from the current investigation, GS(2014).

As presented in Table 1, both the proposed estimator, GS(2014), and the existing efficient estimator, SHEN(2006), recorded improvements in relative efficiency over the usual maximum likelihood estimator, UER. In general, relative efficiencies for both efficient estimators increased as a function of lower sample size, though the proposed estimator, GS(2014), was also more efficient at lower population standard deviations. Across 74 of the 77 analytic categories (96%), GS(2014) showed higher relative efficiencies than SHEN(2006), with the absolute difference being most pronounced at lower sample sizes plus lower population standard deviations. To illustrate, the greatest absolute difference favoring GS(2014) over SHEN(2006) was 21.33 percentage points.

The logarithmic transformation of highly skewed data is often undertaken to satisfy the assumptions required for parametric statistical inference. Despite this, existing approaches that capture only a sample's mean and variance do not necessarily yield the most efficient estimator. The current investigation developed and tested more efficient point estimators for a lognormal mean model by capturing more complete information within the sample's coefficient of variation. Results of an empirical simulation study across varying sample sizes and population standard deviations indicated relative improvements in efficiency of up to 129.47 percent compared to the usual maximum likelihood estimator and up to 21.33 percentage points above the current efficient estimator. The relative efficiency of the proposed estimator increased particularly as a function of decreasing sample size and increasing population standard deviation.
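A simulation comparison of this kind can be sketched as follows. This is an illustrative outline only, and it makes several assumptions: relative efficiency is taken to be the percentage ratio of simulated mean squared errors, the function names `uer` and `relative_efficiency` are hypothetical, and the arithmetic mean of the untransformed data is used purely as a stand-in candidate. The GS(2014) and SHEN(2006) estimators are not reproduced here; any function mapping a sample to an estimate of the lognormal mean could be passed in their place.

```python
import math
import random
import statistics


def uer(xs):
    """Usual MLE-based estimator of a lognormal mean."""
    ys = [math.log(x) for x in xs]
    return math.exp(statistics.fmean(ys) + statistics.pvariance(ys) / 2.0)


def relative_efficiency(candidate, baseline, n, mu, sigma, reps, seed):
    """Percent ratio MSE(baseline) / MSE(candidate) over simulated samples."""
    rng = random.Random(seed)
    theta = math.exp(mu + sigma ** 2 / 2.0)     # true lognormal mean
    sq_err_cand = sq_err_base = 0.0
    for _ in range(reps):
        xs = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        sq_err_cand += (candidate(xs) - theta) ** 2
        sq_err_base += (baseline(xs) - theta) ** 2
    return 100.0 * sq_err_base / sq_err_cand


# Stand-in comparison: arithmetic mean of the raw data versus UER.
re_pct = relative_efficiency(statistics.fmean, uer, n=10, mu=0.0,
                             sigma=1.0, reps=2000, seed=7)
```

Under this convention, values above 100 indicate that the candidate has a lower simulated mean squared error than the baseline, and looping the call over a grid of sample sizes and population standard deviations would produce a table of the same shape as the one summarized above.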