COMPARISON OF TWO COMMON ESTIMATORS OF THE RATIO OF THE MEANS OF INDEPENDENT NORMAL VARIABLES IN AGRICULTURAL RESEARCH

This paper addresses the problem of estimating the ratio of the means of independent normal variables in agricultural research. The first part of the research examines the distributional properties of the ratio of independent normal variables, both theoretically and using simulation. The second part of the research evaluates the relative merits of two common estimators of the ratio of the means of independent normal variables in agricultural research, an arithmetic average and a weighted average, via simulation experiments using normal distributions. The results are then tested using research data from rice breeding multi-environment trials in Jilin Province, China, in 1994. These data are used to demonstrate the diagnostic approach developed for assessing the “safe” use of the arithmetic and the weighted average methods for estimating the ratio of the means of independent normal variables.


Introduction
A ratio $R = X/Y$ of independent normal variables is commonly used to capture the relative merits of two contrasting treatments, practices or methodologies in agricultural research. Examples include the ratio of grain yield of a new crop variety to that of the commercial control variety across a range of environments, harvest index (the ratio between economic and biological yields of plants), and relative efficiency (the ratio between error estimates of two biological models in agricultural research). It is important to know how the mean $E(X/Y)$ of the ratio of the two independent normal variables should be estimated when several such ratio estimates are available. Throughout, we assume that $X$ and $Y$ are uncorrelated and that $\mu_Y > 0$.
The motivation for this research lies in the study of relative performance of rice varieties in grain yield in Jilin Province of China in 1994 (see Jilin Provincial Seed Station [4]), where a series of ratio estimates needed to be pooled or averaged over different environments. For this rice breeding multi-environment trial (MET) conducted over eight locations, the grain yield data were analysed to quantify the percent increase in grain yield of three varieties over the control variety (Table 1.1). In such studies, a subset of rice varieties is added to or dropped from the regional variety testing program every year, based on overall performance (mainly yield) relative to the control. The field evaluation of rice varieties therefore proceeds in a roll-over pattern. The aim is to estimate the mean percent yield increase of each of the test varieties over the control variety across a range of environments. In Table 1.1, the percent grain yield of each test variety relative to the control variety (Jiyin 12) is used to assess the yield improvement of the new variety at these locations. The ratio of grain yield of each test variety to grain yield of the control (expressed as a percentage), over all possible trials in the MET, is to be estimated.

The mean $E(X/Y)$ of the ratio of two independent normal variables does not exist (Lukacs and Laha [10]; Lukacs [9]; Springer [16]; Johnson et al. [5]): because $Y$ can in theory assume values arbitrarily close to zero, $E(1/Y)$ does not exist. This causes a practical problem in estimation. Lai et al. [8] studied a punctured normal distribution, where a small neighbourhood ($|Y| \le \varepsilon$ with $\varepsilon$ a small positive number) is removed from consideration, through two left-truncated normal variables. They show that the mean of the inverse of the punctured normal variable exists, whence $E(X/Y \mid |Y| > \varepsilon)$ also exists although $E(X/Y)$ fails to exist. They also justify the estimation of $\mu_X/\mu_Y$ as a surrogate for $E(X/Y)$, because $\mu_X/\mu_Y$ is a satisfactory measure of centre for $X/Y$. Hence, as the maximum likelihood estimator of $\mu_X/\mu_Y$, $X/Y$ is naturally the best estimator of $\mu_X/\mu_Y$. The aim of this paper is to explore theoretical and numerical aspects of the estimation of this ratio, leading to the provision of useful advice for the practitioner.
Two methods are widely used for averaging different ratio estimates in agricultural research. The first is the arithmetic average approach, which divides the sum of all the ratio estimates by the total number of estimates (Kaeppler et al. [6]; Moreau et al. [11]; Qiao et al. [13]). The second is known as the weighted average approach, which estimates the true ratio by dividing the sum of all the numerators by the sum of all the denominators of the individual ratio estimates (Robinson et al. [15]; Haque et al. [2]; Witcombe et al. [17]). When used on the same set of data to estimate the mean of the ratio of two independent normal variables, these two approaches may give different results or even reach contradictory conclusions in some circumstances. We have not, however, found any report in the literature comparing these two methods. We note that related research was conducted in Qiao et al. [14], where the corresponding estimators of a binomial proportion using several independent samples in agricultural research were investigated. That work provided the impetus for the current study.
We pause now to describe the two estimators. Suppose a sample of $n$ observations $(X_i, Y_i)$, $i = 1, \dots, n$, is taken and, for each observation, the ratio $X_i/Y_i$ is calculated. There are two popular ways in agricultural research to estimate the ratio $\mu_X/\mu_Y$: the arithmetic average approach, with $R_A = \bigl(\sum_{i=1}^{n} X_i/Y_i\bigr)/n$, and the weighted average approach, with

$$R_W = \frac{\sum_{i=1}^{n} X_i}{\sum_{i=1}^{n} Y_i} = \frac{\bar{X}_n}{\bar{Y}_n}. \tag{1.1}$$

Intuition suggests that $R_A$ is a poor estimator of $\mu_X/\mu_Y$. This is because $Y_i$ can be small and positive, leading to large and positive $X_i/Y_i$, thus biasing the final average upwards. It averages after division. In contrast, $R_W = \bar{X}_n/\bar{Y}_n$ should be a better estimator of $\mu_X/\mu_Y$, as very small $\bar{Y}_n$ values are less likely to occur, thus lessening the upward bias. It averages before division. Hence, $R_W$ appears generally superior to $R_A$.
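The distinction between the two estimators can be sketched in a few lines of Python. The function names and the numbers below are illustrative assumptions (made-up values, not data from the trials); the single small denominator is chosen to show how it inflates $R_A$ but not $R_W$.

```python
def arithmetic_average_ratio(x, y):
    """R_A: divide first, then average the individual ratios x_i / y_i."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return sum(ratios) / len(ratios)


def weighted_average_ratio(x, y):
    """R_W: average first, then divide (ratio of sums = ratio of sample means)."""
    return sum(x) / sum(y)


# Hypothetical values: the last denominator is small relative to the others.
x = [6.0, 6.0, 6.0, 6.0]
y = [6.0, 8.0, 4.0, 2.0]

r_a = arithmetic_average_ratio(x, y)  # mean of 1, 0.75, 1.5 and 3
r_w = weighted_average_ratio(x, y)    # 24 / 20
print(f"R_A = {r_a:.4f}, R_W = {r_w:.4f}")
```

In this toy example the ratio 6/2 = 3 dominates the arithmetic average, whereas the weighted average is affected only through the sum of the denominators.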
For the motivating example, a ratio of means of independent normal variables (grain yield in this instance) is to be estimated. The arithmetic and weighted average ratio estimators produced different estimates in Table 1.1 and it is unclear which estimator should be used. This drives the investigation of the theoretical foundation of the difference between the two methods and their evaluation in a more general sense in agricultural research.
The paper is presented in five sections. Section 2 explores the distribution of the ratio of two independent normal variables; this is followed in Section 3 by an evaluation of the two estimators of the ratio of normal means, both theoretically and using simulation. Section 4 applies the findings to a data set from an agricultural experiment, while Section 5 contains general recommendations concerning the use of the two estimators in agricultural research.

The probability density function of the ratio of independent normal variables.
Springer (see [16, pages 139-148]) found the probability density function of $W = (X/\sigma_X)/(Y/\sigma_Y)$ and then of $R = X/Y$ through the use of the simple transformation $R = (\sigma_X/\sigma_Y)W$. This result is rather unwieldy for computational purposes. Kamerud [7] gave the probability density function of $R = X/Y$ explicitly. There is an error in her derivation of the density function of $W$ that we rectify in the following, making it necessary to adjust the density function.
Define $U = X/\sigma_X$ and $V = Y/\sigma_Y$, so that $U \sim N(\mu_X/\sigma_X, 1)$ and $V \sim N(\mu_Y/\sigma_Y, 1)$. Set $W = U/V$ and let $g$ be its density function. Replacing $\mu_1$ and $\mu_2$ in Kamerud [7] by $\mu_X/\sigma_X$ and $\mu_Y/\sigma_Y$, respectively, we obtain an explicit expression for $g(w)$ in terms of $\Phi$, the standard normal cumulative distribution function. Since $R = (\sigma_X/\sigma_Y)W$, the probability density of $R$ is then given by $f(r) = (\sigma_Y/\sigma_X)\, g(\sigma_Y r/\sigma_X)$.

In contrast to the method given in Springer [16], Kamerud's expressions are easy to compute numerically. Hence, Kamerud's probability density function is used to generate graphs of the density of $X/Y$, shown in Figure 2.1, to assess the distributional properties of the ratio of two independent normal variables. Some typical plots (Figures 2.1(a)-2.1(c)) are drawn using this density function, with varying coefficient of variation (CV) for the denominator variable. From the considerations of Section 2 and Qiao et al. [14], it is evident that the CV of the denominator is of critical importance. In Figure 2.1(a), the CV of both $X$ and $Y$ is small (0.1). Hence, the density function is fairly symmetric around $\mu_X/\mu_Y = 1$, having the bell shape of a normal distribution. The long tail in Figure 2.1(b) and multiple peaks in Figure 2.1(c), where the CV is small for the numerator but large for the denominator, indicate that the moments, especially the mean of the distribution, may not exist.
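As a hedged illustration of how such a density can be evaluated numerically, the sketch below codes a standard closed form for the density of $W = U/V$ with independent $U \sim N(a, 1)$ and $V \sim N(b, 1)$, namely $g(w) = \frac{e^{-(a^2+b^2)/2}}{\pi(1+w^2)}\bigl[1 + \sqrt{2\pi}\, q\, e^{q^2/2}(\Phi(q) - \tfrac12)\bigr]$ with $q = (b + aw)/\sqrt{1+w^2}$. This form follows from integrating the joint density of $(U, V)$ directly; it is not copied from [7] or [16], and the expression used in the paper may be arranged differently. The function names and the shorthand $a$, $b$ are introduced here for illustration only.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def density_w(w, a, b):
    """Density of W = U/V for independent U ~ N(a, 1) and V ~ N(b, 1)."""
    s = 1.0 + w * w
    q = (b + a * w) / math.sqrt(s)
    half_norm = 0.5 * (a * a + b * b)
    # Exponents are combined before exponentiating: q^2 <= a^2 + b^2,
    # so 0.5*q^2 - half_norm <= 0 and exp() cannot overflow.
    cauchy_part = math.exp(-half_norm)
    normal_part = (math.sqrt(2.0 * math.pi) * q * (std_normal_cdf(q) - 0.5)
                   * math.exp(0.5 * q * q - half_norm))
    return (cauchy_part + normal_part) / (math.pi * s)


def density_r(r, mu_x, sigma_x, mu_y, sigma_y):
    """Density of R = X/Y via R = (sigma_x / sigma_y) * W."""
    k = sigma_y / sigma_x
    return k * density_w(k * r, mu_x / sigma_x, mu_y / sigma_y)


# Density at r = 1 when both variables have mean 100 and CV 0.1.
print(density_r(1.0, 100.0, 10.0, 100.0, 10.0))
```

When both means are zero the expression reduces to the standard Cauchy density, which gives a quick sanity check on the implementation.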
For small coefficient of variation of $Y$ ($CV_Y$), the moments of the ratio appear to exist. This is due to the fact that very small $Y$ values were not sampled in the above graphical presentation and hence we were effectively sampling from $(X/Y) \mid |Y| > \varepsilon$, a punctured normal for the denominator variable (Lai et al. [8]). The moments of $X/Y$ appear to exist in this situation. Both the arithmetic and the weighted average methods involve ratios of independent normal variables. We will demonstrate later that, as far as estimation of $\mu_X/\mu_Y$ is concerned, both the arithmetic and the weighted average methods can be used when $CV_Y$ is sufficiently small. The circumstances under which the ratio of two independent normal values can be used to safely estimate the ratio of the means are now investigated using simulation.

Simulation of the distribution of the ratio
The population means of both variables were fixed at 100 and hence $\mu_X/\mu_Y = 1$. The population standard deviations of both variables took the values 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, and 500, leading to both $CV_X$ and $CV_Y$ taking values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, and 5. For each of these 144 combinations, 500 000 pairs $(X_i, Y_i)$ were sampled; the mean and standard deviation of the ratios $R_i = X_i/Y_i$ were examined.

Before considering the simulation results, we offer some theoretical reflections. The mean of $X/Y$ does not exist, but under sampling, $X/Y$ and $X/Y \mid |Y| > \varepsilon$ are essentially the same random variable for sufficiently small $\varepsilon$. For example, if $\mu_Y > 0$ and $CV_Y < 0.2$, it is possible to find an $\varepsilon$ such that $0 < \varepsilon < \mu_Y - 5\sigma_Y$. Hence, fewer than one in a million sample values of $Y$ will have absolute value less than $\varepsilon$. As argued in the introduction, $X/Y$ is the maximum likelihood estimator of $\mu_X/\mu_Y$, and hence the estimator of choice. In summary, as long as $CV_Y < 0.2$, theory tells us that $X/Y$ is a sound estimator for the centre of the distribution of $X/Y$. Our simulation results now confirm these findings.
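The "fewer than one in a million" statement can be checked with a short calculation. The sketch below is a minimal illustration; the function names, and the convention of expressing $\varepsilon$ as a fraction of $\mu_Y$, are assumptions made here for convenience.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def prob_denominator_near_zero(cv_y, eps_over_mu):
    """P(|Y| < eps) for Y ~ N(mu, sigma) with mu > 0.

    cv_y is sigma / mu and eps_over_mu is eps / mu, so the result does not
    depend on the scale of Y.
    """
    upper = (eps_over_mu - 1.0) / cv_y
    lower = (-eps_over_mu - 1.0) / cv_y
    return std_normal_cdf(upper) - std_normal_cdf(lower)


# Illustrative case: CV_Y = 0.15, so mu - 5*sigma = 0.25*mu; take eps at that bound.
p = prob_denominator_near_zero(cv_y=0.15, eps_over_mu=0.25)
print(f"P(|Y| < eps) = {p:.3e}")
```

Any $\varepsilon$ strictly below $\mu_Y - 5\sigma_Y$ gives an even smaller probability, since the upper limit then lies more than five standard deviations below the mean.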
The simulation results, listed in Table 2.1, indicate that the sample mean and standard deviation of the ratio estimates are all strongly influenced by $CV_Y$. This supports our earlier remark that $CV_Y$, not $CV_X$, is a critical parameter. When $CV_Y < 0.2$, the mean of $R$ remains close to $\mu_X/\mu_Y = 1$, while the standard deviation of $R$ increases approximately linearly as $CV_X$ increases (Table 2.1). It appears that the variation of $R$ is almost purely determined by the variation in the numerator variable when $CV_Y$ is small. Evidently, $CV_Y = 0.2$ is an appropriate cut-off point for the denominator; for larger $CV_Y$ values, the mean deviates substantially from $\mu_X/\mu_Y = 1$ and the standard deviation increases accordingly. In contrast, $CV_X$ has no influence on the mean of $R$, and a relatively small influence on the standard deviation. Hence, the deleterious effect of increasing $CV_Y$ is much stronger than that of increasing $CV_X$.
The sample mean of the ratios fails to estimate $\mu_X/\mu_Y$ when $CV_Y > 0.4$, while the standard deviation is extremely large, with erratic behaviour, when $CV_Y > 0.3$. For the sample means to serve as reasonable estimators of $\mu_X/\mu_Y$ for this sample size (500 000), $CV_Y$ apparently has to be kept sufficiently small ($CV_Y < 0.2$ appears to suffice). In practical applied research, it is rare for the CV of a normal variable to be larger than 5.0. Thus, as long as $CV_Y < 0.2$, it makes empirical sense to use the ratio estimator $X/Y$. This simulation was repeated first with $\mu_X/\mu_Y = 10/100 = 0.1$, and then with $\mu_X/\mu_Y = 100/10 = 10$. The mean and standard deviation of the ratio behave similarly to the case where $\mu_X/\mu_Y = 1$. This provides circumstantial evidence that the magnitude of $\mu_X/\mu_Y$ does not influence the manner in which the sample mean estimates $\mu_X/\mu_Y$.
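A scaled-down version of this simulation can be sketched as follows. This is not the code used to produce Table 2.1: the function name, the seed, and the much smaller number of pairs (20 000 rather than 500 000) are assumptions chosen so that the example runs quickly.

```python
import random
import statistics


def simulate_ratio(cv_x, cv_y, n_pairs=20_000, mu=100.0, seed=1):
    """Sample mean and standard deviation of X_i / Y_i.

    X_i ~ N(mu, cv_x * mu) and Y_i ~ N(mu, cv_y * mu) are drawn
    independently, so mu_X / mu_Y = 1 as in the simulation described above.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_pairs):
        x = rng.gauss(mu, cv_x * mu)
        y = rng.gauss(mu, cv_y * mu)
        ratios.append(x / y)
    return statistics.fmean(ratios), statistics.stdev(ratios)


# A small grid of denominator CVs, with the numerator CV held at 0.1.
for cv_y in (0.1, 0.2, 0.5, 1.0):
    mean_r, sd_r = simulate_ratio(cv_x=0.1, cv_y=cv_y)
    print(f"CV_Y = {cv_y:.1f}: mean = {mean_r:.3f}, sd = {sd_r:.3f}")
```

Because the moments of the ratio do not exist, the values printed for large $CV_Y$ depend strongly on the seed and the number of pairs; only the small-$CV_Y$ rows should be expected to be stable.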

Implications in applied research.
The non-existence of moments of the ratio of normal variables presents a problem. In practical applications, as long as we avoid sampling in an interval around $Y = 0$, moments of $X/Y$ will appear to exist. If we let $\varepsilon$ be a sufficiently small positive quantity, then $X_i/Y_i$ can be used to estimate the ratio $\mu_X/\mu_Y$, provided $|Y_i| > \varepsilon$. Hall [1] showed that if a positive random variable $Y$ has a normal distribution singly truncated from below, denoted by $N_a(\mu, \sigma)$, where $0 < a < Y$, then the inverse moments $E(Y^{-1})$ and $E(Y^{-2})$ can be approximated accurately by expressions involving Dawson's integral. The expressions are independent of the truncation point $a$, provided that $(\sigma/\mu)^2 < a/\mu < 1/25$. This will ensure the apparent existence of the expectation $E(X/Y)$ of the ratio of two independent normal variables when $(\sigma/\mu)^2 < 1/25$, or $CV_Y = \sigma/\mu < 1/5 = 0.2$. The central idea behind this and behind our investigations is similar, namely to make the denominator variable nonzero, a condition easily met in practical research.
The findings also suggest that if we want to use the sample mean of ratios $X_i/Y_i$ to estimate $\mu_X/\mu_Y$, then the larger the sample we use, the smaller the $CV_Y$ we will need to avoid sample points getting close to zero in the denominator. When $CV_Y$ is sufficiently small, there is almost no chance of a value of $Y$ very close to zero being sampled, thus ensuring the apparent existence of sample moments.
When $CV_Y$ is very small, $Y$ behaves as $Y \mid |Y| \ge \varepsilon$ for some $\varepsilon > 0$, and thus the moments of $1/Y$ can be accurately approximated (Hall [1]; Nahmias and Wang [12]). This leads to apparent existence of the sample moments of $X/Y$. From our simulations and the results of Hall [1], $CV_Y < 0.2$ can be used as a condition which determines the usefulness of $R_A$ and $R_W$.

Comparison of the two estimators
In this section, we examine the performance of the two estimators of μ X /μ Y , first in the light of the conclusion of Section 2, then theoretically, and finally using simulation.

Estimators and coefficient of variation.
From the previous section, it is evident that $X_i/Y_i$ is a reasonable estimate of $\mu_X/\mu_Y$ provided $CV_Y < 0.2$. This observation explains why $R_W$ improves as the sample size $n$ increases, while $R_A$ does not; hence, $R_W$ will be regarded as a superior estimator. We now examine $R_W$ and $R_A$ separately and conclude that $R_W$ can be used if $CV_{\bar{Y}_n} < 0.2$, while $R_A$ can be adopted if $CV_Y < 0.2$. The error in $R_A$ as an estimator of $\mu_X/\mu_Y$ does not decrease with sample size, whereas the error in $R_W$ as an estimator of $\mu_X/\mu_Y$ decreases to zero with sample size; hence $R_W$ is to be favoured.

Weighted average ratio estimator.
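Recall that $R_W$ is called the weighted average estimator, so named because it can be written as

$$R_W = \frac{\sum_{i=1}^{n} X_i}{\sum_{j=1}^{n} Y_j} = \sum_{i=1}^{n} \frac{Y_i}{\sum_{j=1}^{n} Y_j} \cdot \frac{X_i}{Y_i},$$

a weighted average of the individual ratios $X_i/Y_i$ with weights $Y_i/\sum_j Y_j$. The denominator $\bar{Y}_n$ of $R_W = \bar{X}_n/\bar{Y}_n$ has coefficient of variation $CV_{\bar{Y}_n} = CV_Y/\sqrt{n}$, so by the conclusion of Section 2, whenever $CV_{\bar{Y}_n} < 0.2$, $R_W$ is an acceptable estimator of $\mu_X/\mu_Y$. Thus, for practical purposes, we recommend that $R_W$ be used to estimate $\mu_X/\mu_Y$, since taking a sample of sufficiently large size $n$ will reduce the coefficient of variation of $\bar{Y}_n$.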
In designing a research experiment or survey, the sample size $n$ required to provide a reasonably good estimate of $\mu_X/\mu_Y$ can be determined in the following way. Take a sample of size $n$ from $X \sim N(\mu_X, \sigma_X)$ and $Y \sim N(\mu_Y, \sigma_Y)$. In order for $R_W = \bar{X}_n/\bar{Y}_n$ to estimate $\mu_X/\mu_Y$, the coefficient of variation for the denominator of $R_W$ has to satisfy

$$CV_{\bar{Y}_n} = \frac{\sigma_Y/\sqrt{n}}{\mu_Y} < 0.2, \quad\text{that is,}\quad n > 25\,\frac{\sigma_Y^2}{\mu_Y^2}.$$

Here, $CV_{\bar{Y}_n}$, rather than $CV_Y$, being small is the condition that needs to be fulfilled. In practical situations, the population means and standard deviations of interest are rarely known, but can be estimated by the relevant sample means and standard deviations. Hence, the above inequality can be approximated by $n > 25\,(s_Y^2/\bar{Y}_n^2)$. In practical terms, the sample size $n$ is always predetermined. Thus, sample results can be examined to see if they satisfy the requirement $(s_Y/\sqrt{n})/\bar{Y}_n < 0.2$. This will provide a general guideline for evaluating the suitability of the weighted average method in estimating the ratio of the means of two normal variables.
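The diagnostic described above can be sketched as a small helper. The function name, the returned dictionary keys, and the example values are hypothetical; only the 0.2 threshold and the inequality $n > 25\,s_Y^2/\bar{Y}_n^2$ come from the text.

```python
import math
import statistics


def ratio_estimator_diagnostic(y, threshold=0.2):
    """Check the denominator sample y against the CV threshold.

    Returns the sample CV of Y (relevant to R_A), the CV of the sample
    mean of Y (relevant to R_W), and the smallest n satisfying
    n > (s_Y / y_bar)^2 / threshold^2, i.e. n > 25 s_Y^2 / y_bar^2 when
    the threshold is 0.2.
    """
    n = len(y)
    y_bar = statistics.fmean(y)
    s_y = statistics.stdev(y)
    cv_y = s_y / y_bar
    cv_mean = cv_y / math.sqrt(n)
    min_n = math.floor(cv_y ** 2 / threshold ** 2) + 1
    return {
        "cv_y": cv_y,
        "cv_mean": cv_mean,
        "min_n": min_n,
        "r_a_ok": cv_y < threshold,
        "r_w_ok": cv_mean < threshold,
    }


# Hypothetical denominator values (not the control yields of Table 1.1).
result = ratio_estimator_diagnostic([4.0, 6.0, 8.0, 10.0, 12.0])
print(result)
```

In this made-up sample the CV of the individual values exceeds 0.2 while the CV of their mean does not, which is the situation in which the weighted average is usable but the arithmetic average is not.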

Arithmetic average ratio estimator.
Estimator $R_A = \bigl(\sum_{i=1}^{n} X_i/Y_i\bigr)/n$ is an equally weighted average of $n$ ratios $X_i/Y_i$. We can adopt the same methodology used in evaluating the weighted average method to assess the suitability of $R_A$. The coefficient of variation of $Y_i$, however, is $CV_Y = \sigma_Y/\mu_Y$, whatever the sample size. Taking a larger sample size $n$ is therefore of little use. Naturally, the sample value of $s_Y/\bar{Y}_n$ can be used as a diagnostic tool for the evaluation of the appropriateness of $R_A$. Thus, we recommend the use of $R_A$ only if the coefficient of variation of $Y_i$ is sufficiently small, that is, $CV_Y \approx s_Y/\bar{Y}_n < 0.2$. The simulation results, which follow, support our recommendation.

[...] $\mu_X/\mu_Y = 2$, leading to useful $R_W$ values. In conclusion, $R_W$ here is a better estimator of $\mu_X/\mu_Y$ than $R_A$.

Comparison as sample size changes, with coefficient of variation fixed.
Here we illustrate the effect of increasing sample size on the two estimators, when $CV_Y > 0.2$. We use $X$ and $Y$ independently drawn from two $N(100, 100)$ distributions, whence $CV_X = CV_Y = 1$; 200 random samples of 1, 4, 25, 100, and 400 pairs of observations $(x_i, y_i)$ were generated. Table 3.1 summarises the distributions of $R_A$ and $R_W$ for each sample size, where the means and standard deviations are based on 200 samples in each cell of the table and $R_A = R_W$ when $n = 1$. Results show that the weighted average settles down to the true ratio of one as the sample size increases. The arithmetic average $R_A$ always fails to estimate $\mu_X/\mu_Y$, whereas with increasing sample size, $CV_{\bar{Y}_n}$ falls to 0.2 or below and the weighted average $R_W$ becomes a useful estimator of $\mu_X/\mu_Y$. Note that $R_A$, even as the sample size increases, shows no tendency to approach the true ratio of one. In fact, the mean of $R_A$ took arbitrary values as sample size increased. On the other hand, the distribution of $R_W$ centres on the true ratio as the sample size increases. In particular, for sample sizes of 25 or more (whence $CV_{\bar{Y}_n} = 1/\sqrt{n} \le 0.2$), $R_W$ performs well.
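The experiment just described can be imitated with the following sketch. It is not the code behind Table 3.1: the function name and the seed are assumptions, and because $R_A$ has no finite moments its summary statistics will differ from run to run and from the published table.

```python
import random
import statistics


def compare_estimators(n, n_samples=200, mu=100.0, sigma=100.0, seed=0):
    """Draw n_samples samples of n pairs and return lists of R_A and R_W.

    X and Y are independent N(mu, sigma), so CV_X = CV_Y = sigma / mu and
    the true ratio mu_X / mu_Y is one.
    """
    rng = random.Random(seed)
    r_a_values, r_w_values = [], []
    for _ in range(n_samples):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        ys = [rng.gauss(mu, sigma) for _ in range(n)]
        ratios = [x / y for x, y in zip(xs, ys)]
        r_a_values.append(sum(ratios) / n)      # average after division
        r_w_values.append(sum(xs) / sum(ys))    # average before division
    return r_a_values, r_w_values


for n in (1, 4, 25, 100, 400):
    r_a, r_w = compare_estimators(n)
    print(f"n = {n:3d}: "
          f"R_A mean = {statistics.fmean(r_a):9.3f}, sd = {statistics.stdev(r_a):9.3f}; "
          f"R_W mean = {statistics.fmean(r_w):9.3f}, sd = {statistics.stdev(r_w):9.3f}")
```

The spread of $R_W$ should shrink roughly in proportion to $1/\sqrt{n}$ once $n$ is large enough for $CV_{\bar{Y}_n}$ to be small, while the spread of $R_A$ shows no such pattern.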
C. G. Qiao et al.

In summary, $R_W$, unlike $R_A$, improves as an estimator of $\mu_X/\mu_Y$ under moderate increases in sample size. The major difference between $R_A$ and $R_W$ arises mainly because the latter has a better theoretical basis as an estimator for $\mu_X/\mu_Y$. The advantage of $R_W$ over $R_A$ in reducing the estimation bias, however, depends on the sample size.

Application of the two estimators in rice trials
The grain yield data of the rice breeding MET are used in an attempt to evaluate the relative merits of the two estimators of the ratio of independent normal variables in agricultural research. Detailed results of the analyses using both estimators are listed in Table 1.1. An examination of the correlations between the numerator ($X$) and denominator ($Y$) variables shows that there was no significant correlation between the yield of each of the three test varieties ($X$) and that of the control variety ($Y$). Hence, the following analysis assuming independent normal variables is justified. (Under the assumption of $(X, Y)$ having a bivariate normal distribution, $\operatorname{corr}(X, Y) = 0$ implies that $X$ and $Y$ are independent.)

Estimation of the pooled percent yield improvement over control.
Here the ratio of averages $R_W$ represents the expected performance of the test variety across the whole region, while the average ratio $R_A$ could be regarded as an indicator of what might be expected at any particular location. The choice between the two estimators depends predominantly on the aims of the research, rather than purely on their statistical properties. Since the emphasis was on testing for broad adaptation of the crop varieties, or on summarising information on the overall performance of each cultivar, relative to the control, over the whole range of environments (region), $R_W$ is naturally a better option than $R_A$. As far as specific adaptation is concerned, $R_A$ may have its merit in that it has a better relationship with the expected performance of the variety at a particular location. This, however, is outside the scope of the present study.
The results show that there is a degree of variation in the difference between $R_A$ and $R_W$ for the three test varieties, ranging from 1.4% to 3.3% (Table 1.1). Estimators $R_A$ and $R_W$ differ more for the two test varieties 850011 and Yan 501 than for Chang 90-40. From the plant breeding point of view, there is reason (to be discussed in the next subsection) to believe that differences of such magnitude between $R_A$ and $R_W$ for rice varieties are sufficiently large to change the conclusions of the plant breeding METs.
It is regulated by the Jilin Provincial Crop Variety Evaluation Committee [3] that a new variety of a self-pollinated crop species such as rice has to exceed the control, in grain yield, by at least 5% over three consecutive years before it can be considered for release and commercialisation. The regulation imposed by the committee is most stringent, and it is usually difficult for a test variety to increase grain yield by an extra 1% against the control variety. Thus a 1% difference between the two ways of estimating the pooled ratio of the two rice varieties under comparison can make a real difference in deciding whether a particular variety should be released. Therefore, based on the observed difference between $R_A$ and $R_W$ for the three varieties, it is evident that the two ways of estimating the ratio of normal variables can influence plant breeding decisions on recommendation for release and commercialisation. The findings of this paper indicate that the weighted average ratio estimator $R_W$ should be used in practical agricultural research.

Application of the diagnostic approach in rice trials.
The difference between these two estimators ranges from 1.4% to 3.3%, depending on the coefficient of variation for the denominator variable, the grain yield of the control. When the CV of the control is larger than 0.2, as is the case for 850011 and Yan 501, the two estimators differ by a reasonably large amount, 2.6% and 3.3%, respectively. $R_A$ is unreliable in this case, while $R_W$ should be used to demonstrate the yield potential of the two varieties relative to the control. In comparison, in the case of Chang 90-40, the CV of the control is only 0.174 (below 0.2) and hence the difference between $R_W$ and $R_A$ is much smaller. Thus, the difference between $R_W$ and $R_A$ is dependent on the CV of the grain yields for the control (denominator variable) over the range of environments in which the test variety is being compared with the control. Furthermore, $CV_{\bar{Y}_n}$, the CV of the denominator of estimator $R_W$, is always much smaller than $CV_Y$, the CV of the denominator of estimator $R_A$, for each of the three comparisons between the test varieties and the control (Table 1.1). This clearly demonstrates the advantage of using the weighted average method in these situations.
Based on the $R_W$ estimates of all test rice varieties, only 850011 exceeded the control in grain yield by more than 5% in 1994. By standards commonly adopted in the province, a particular variety will qualify for possible release only if it has outperformed (exceeded) the control in grain yield by 5% or more for all three years of the Provincial Regional Test. Thus, if 850011 continued to outperform the control by 5% or more in grain yield for another two years in the Regional Test, it would be recommended for release, as long as its other agronomic traits have reached the relevant standards. The other two test varieties (Chang 90-40 and Yan 501) both failed to exceed the control in grain yield by the threshold of 5%. Hence, both varieties were regarded as having no potential for future release from this round of regional trials.
Further studies will focus on a comparison of weighted and arithmetic average estimators under the assumption of dependence. Another potential estimator of $\mu_X/\mu_Y$, the geometric mean of the $X/Y$ ratios, may prove useful under this circumstance, since it may possess some potentially valuable attributes. A comprehensive investigation of these estimators is thus justified.

Conclusions
The mean of the ratio $X/Y$ of two independent normal variables does not exist. The mean appears to exist, however, and is close to $\mu_X/\mu_Y$, if we avoid sampling points for which $|Y| \le \varepsilon$, with $\varepsilon$ being a small positive quantity. This favourable situation is approximated in practice when the coefficient of variation of the denominator variable is sufficiently small (less than 0.2). In such circumstances, the ratio of two independent normal variables can be used to estimate $\mu_X/\mu_Y$.
The coefficient of variation of the denominator should thus be considered when estimating a ratio of independent normal variables; the weighted average method automatically reduces the denominator coefficient of variation as sample size increases and so is better than the arithmetic average method. We recommend the use of the weighted average approach for estimating the true ratio from a series of ratio estimates in agricultural research. The arithmetic average approach, however, has to be adopted when only the individual ratios are recorded.
Using the weighted average estimates of all test rice varieties in the motivating example, we concluded that only rice variety 850011 exceeded the control in grain yield by more than 5% in 1994. If 850011 continued to outperform the control by 5% or more in grain yield for another two years in the three-year Provincial Regional Test, it would be recommended for release, as long as its other agronomic traits have reached the relevant standards.
The empirically determined critical coefficient of variation value (0.2) for the denominator of the ratio of independent normal variables can be used to evaluate the suitability of both estimators. A practical diagnostic has been proposed to assess the reliability of the weighted average ratio estimator, namely that the coefficient of variation of the denominator mean $\bar{Y}_n$ be smaller than 0.2. The arithmetic average ratio estimator is of less use and should be employed only when the coefficient of variation of the individual denominator values $Y_i$ is smaller than 0.2. The development of a satisfactory estimator of the ratio when $X$ and $Y$ are dependent remains an area for future research.
Figure 3.1. A comparison of the distributions of $R_A$ and $R_W$ using 200 ratio estimates. Note the different scales used in (a) and (b), where the mean and standard deviation over the 200 estimates are 1.430 and 18.465 for $R_A$ and 2.015 and 0.152 for $R_W$, respectively.

Table 2.1. Simulation of the ratio distribution: mean and standard deviation for 500 000 pairs of observations $X_i/Y_i$, where $X_i \sim N(\mu_X, \sigma_X)$ and $Y_i \sim N(\mu_Y, \sigma_Y)$, under varying coefficients of variation (CV), with $\mu_X/\mu_Y = 100/100 = 1$.

Table 3.1. Impact of sample size on the mean and standard deviation of $R_A$ and $R_W$.