The three-parameter lognormal distribution extends the two-parameter lognormal distribution to meet the needs of the biological, sociological, and other fields. Numerous research papers have been published on parameter estimation for the lognormal distributions. The inclusion of the location parameter brings some technical difficulties to parameter estimation, especially to interval estimation. This paper proposes a method for constructing exact confidence intervals and exact upper confidence limits for the location parameter of the three-parameter lognormal distribution. The point estimation problem is discussed as well. The performance of the point estimator is compared with that of the maximum likelihood estimator, which is widely used in practice. Simulation results show that the proposed method is less biased in estimating the location parameter. The large sample size case is also discussed.
1. Introduction
The two-parameter lognormal distribution and the three-parameter lognormal distribution have been used in many areas such as reliability, economics, ecology, biology, and atmospheric sciences. In the past twenty years, many research papers have been published on parameter estimation for the lognormal distributions. See, for example, Kanefuji and Iwase [1], Sweet [2], and Crow and Shimizu [3]. The three-parameter lognormal distribution extends the two-parameter lognormal distribution to meet the needs of the biological and sociological sciences and other fields. Some papers can be found in the literature on parameter estimation for this distribution. See, for example, Komori and Hirose [4], Singh et al. [5], Eastham et al. [6], Cohen et al. [7], Chieppa and Amato [8], Griffiths [9], and Cohen and Whitten [10]. Chen [11] analyzed an application data set containing 49 plastic laminate strength measurements using the locally maximum likelihood estimation method. When the locally maximum likelihood estimation method is used, the estimate is no longer chosen as the parameter value that maximizes the likelihood function globally. This is particularly true when the location parameter of the three-parameter lognormal distribution is estimated, because the likelihood function goes to infinity as the value of the location parameter approaches the smallest order statistic. Point estimation is discussed in Section 3, and the same data set is analyzed in Section 4 using the method presented in this paper.
It should be noted that the inclusion of the location parameter brings in some technical difficulties for the parameter estimation problems. The probability density function of the three-parameter lognormal distribution is
\[
f(x)=\frac{1}{(x-\gamma)\sigma\sqrt{2\pi}}\exp\left\{-\frac{(\ln(x-\gamma)-\mu)^{2}}{2\sigma^{2}}\right\},\quad x>\gamma, \tag{1}
\]
where the parameters γ≥0, -∞<μ<∞, and σ>0 are all assumed to be unknown in this paper. When γ=0, the distribution becomes the two-parameter lognormal distribution. Constructing confidence intervals for the parameters of the three-parameter lognormal distribution is a difficult problem because of the inclusion of the location parameter γ. So far, only some approximation methods can be found in the literature. This paper proposes a method for constructing exact confidence intervals and exact upper confidence limits for the location parameter γ of the three-parameter lognormal distribution. The point estimation problem is discussed as well. Statistical simulation is conducted to compare the performance of the method proposed in this paper with that of the maximum likelihood estimator, which is a commonly used method for estimating parameters.
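As a small illustration of the density in (1), the following Python sketch evaluates f(x) and draws pseudorandom values as γ+exp(μ+σZ) with Z standard normal. The function names `lognormal3_pdf` and `lognormal3_sample` are illustrative assumptions and do not come from the paper.

```python
import math
import random


def lognormal3_pdf(x, gamma, mu, sigma):
    """Density (1) of the three-parameter lognormal distribution."""
    if x <= gamma:
        return 0.0  # the support is (gamma, +infinity)
    z = (math.log(x - gamma) - mu) / sigma
    return math.exp(-0.5 * z * z) / ((x - gamma) * sigma * math.sqrt(2.0 * math.pi))


def lognormal3_sample(n, gamma, mu, sigma, rng):
    """Draw n values as gamma + exp(N(mu, sigma^2))."""
    return [gamma + math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = lognormal3_sample(5, gamma=10.0, mu=4.0, sigma=2.0, rng=rng)
print(sample)

# With gamma = 0, mu = 0, sigma = 1 the density at x = 1 equals 1/sqrt(2*pi).
print(lognormal3_pdf(1.0, gamma=0.0, mu=0.0, sigma=1.0))
```

Setting `gamma=0.0` recovers the two-parameter lognormal density, matching the remark above.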
2. Confidence Interval and Statistical Test
Let X1,X2,…,Xn be a random sample from the three-parameter lognormal distribution, and let X(1),X(2),…,X(n) be the corresponding order statistics. To find a 1-α confidence interval for the parameter γ, define
\[
\xi(\gamma)=\frac{\ln(X_{(j)}-\gamma)-\ln(X_{(i)}-\gamma)}{\ln(X_{(k)}-\gamma)-\ln(X_{(j)}-\gamma)}. \tag{2}
\]
As a mathematical function, ξ(γ) is a function of γ only. On the other hand, the distribution of ξ(γ) does not depend on any parameter. This is due to the fact that ξ(γ) can be expressed as
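\[
\xi(\gamma)=\frac{(\ln(X_{(j)}-\gamma)-\mu)/\sigma-(\ln(X_{(i)}-\gamma)-\mu)/\sigma}{(\ln(X_{(k)}-\gamma)-\mu)/\sigma-(\ln(X_{(j)}-\gamma)-\mu)/\sigma}, \tag{3}
\]
and the fact that
\[
\frac{\ln(X_{(1)}-\gamma)-\mu}{\sigma},\ldots,\frac{\ln(X_{(n)}-\gamma)-\mu}{\sigma} \tag{4}
\]
are the corresponding order statistics of
\[
\frac{\ln(X_{1}-\gamma)-\mu}{\sigma},\ldots,\frac{\ln(X_{n}-\gamma)-\mu}{\sigma}. \tag{5}
\]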
Therefore, for any fixed 0<α<1, there exists a number ξα such that
\[
P(\xi(\gamma)<\xi_{\alpha})=\alpha. \tag{6}
\]
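The parameter-free property can be checked numerically. The sketch below, which is only an illustration and not part of the original derivation, builds two samples from the same standard normal draws under different values of (γ, μ, σ) and evaluates ξ at the true γ in each case. The helper name `xi` and the chosen parameter values are assumptions made for this example.

```python
import math
import random


def xi(x_sorted, gamma, i, j, k):
    """Pivotal quantity (2); i, j, k are 1-based order-statistic indices."""
    a = math.log(x_sorted[i - 1] - gamma)
    b = math.log(x_sorted[j - 1] - gamma)
    c = math.log(x_sorted[k - 1] - gamma)
    return (b - a) / (c - b)


rng = random.Random(1)
z = sorted(rng.gauss(0.0, 1.0) for _ in range(10))

# The same standard normal draws under two different parameter settings.
x1 = [0.0 + math.exp(0.0 + 1.0 * v) for v in z]
x2 = [10.0 + math.exp(4.0 + 2.0 * v) for v in z]

xi1 = xi(x1, 0.0, 1, 3, 10)
xi2 = xi(x2, 10.0, 1, 3, 10)
xi_z = (z[2] - z[0]) / (z[9] - z[2])  # the same ratio written with the Z order statistics
print(xi1, xi2, xi_z)  # the three values agree up to floating-point rounding
```

Because μ and σ cancel in (3), only the standard normal order statistics matter, which is why a single set of critical values serves every parameter setting.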
It can be shown that ξ(γ) is a strictly increasing function of γ. Then a confidence interval of γ can be constructed based on ξ(γ). The lower and upper confidence limits are the solutions of γ for the equations
\[
\xi(\gamma)=\xi_{\alpha/2}, \tag{7}
\]
\[
\xi(\gamma)=\xi_{1-\alpha/2}, \tag{8}
\]
respectively. The values of ξα can be obtained by Monte Carlo simulation. The construction of the confidence interval of γ here is based on three order statistics X(i),X(j), and X(k). Therefore, the performance of the confidence interval of γ depends on the selection of the triplet (i,j,k). For a complete sample X1,…,Xn, it is natural to use the largest and smallest observations. In other words, one would choose i=1 and k=n. Then the selection of the triplet (i,j,k) can be focused on selecting j. Monte Carlo simulation is used to select the “optimal” value of j. Here the selection is based on two criteria. A traditional way to evaluate the performance of confidence intervals is to check the average width of the confidence intervals for a fixed level of confidence. This method is adopted here for selecting j. To discuss the second criterion for selecting j, note that
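\[
\lim_{\gamma\to 0^{+}}\xi(\gamma)=\lim_{\gamma\to 0^{+}}\frac{\ln(X_{(j)}-\gamma)-\ln(X_{(i)}-\gamma)}{\ln(X_{(k)}-\gamma)-\ln(X_{(j)}-\gamma)}=\frac{\ln X_{(j)}-\ln X_{(i)}}{\ln X_{(k)}-\ln X_{(j)}}, \tag{9}
\]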
and that
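\[
\lim_{\gamma\to X_{(i)}^{-}}\xi(\gamma)=\lim_{\gamma\to X_{(i)}^{-}}\frac{\ln(X_{(j)}-\gamma)-\ln(X_{(i)}-\gamma)}{\ln(X_{(k)}-\gamma)-\ln(X_{(j)}-\gamma)}=\infty. \tag{10}
\]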
It is possible that
\[
\frac{\ln X_{(j)}-\ln X_{(i)}}{\ln X_{(k)}-\ln X_{(j)}}>\xi_{\alpha/2} \tag{11}
\]
may occur. If that is the case, then the lower confidence limit of γ cannot be found, because ξ(γ) already exceeds ξα/2 at γ=0. Fortunately, Monte Carlo simulation results show that if the value of j is appropriately selected, the occurrence of this event is very unlikely for all commonly used confidence levels. The simulation results indicate that when the value of j is somewhere between 20% and 40% of the sample size, the average width of the confidence intervals of the location parameter is the shortest, and the probability of the occurrence of (11) is almost zero. Based on this result, it is recommended that the value of j be about 30% of the sample size.
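The construction above can be sketched in Python as a bisection search on [0, X(i)), relying on the monotonicity of ξ(γ) and on the limits (9) and (10). This is a minimal sketch under stated assumptions: the function names `xi` and `solve_gamma` and the ten-point sample are hypothetical, and bisection is one convenient root finder, not necessarily the one used by the authors. The critical values are the n=10, j=3 entries of Table 1.

```python
import math


def xi(x_sorted, gamma, i, j, k):
    """Pivotal quantity (2); i, j, k are 1-based indices into the sorted sample."""
    a = math.log(x_sorted[i - 1] - gamma)
    b = math.log(x_sorted[j - 1] - gamma)
    c = math.log(x_sorted[k - 1] - gamma)
    return (b - a) / (c - b)


def solve_gamma(x_sorted, target, i, j, k, max_iter=200):
    """Solve xi(gamma) = target for gamma in [0, X(i)) by bisection.

    Returns None when xi(0) > target, the situation described by (11),
    because xi is increasing and no root then exists in [0, X(i)).
    """
    lo, hi = 0.0, x_sorted[i - 1]
    if xi(x_sorted, lo, i, j, k) > target:
        return None
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:  # the interval can no longer be split
            break
        if xi(x_sorted, mid, i, j, k) < target:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical sorted sample of size 10, used only for illustration.
x = [11.2, 11.9, 12.8, 14.1, 15.5, 18.0, 21.3, 27.9, 40.2, 95.0]

lower90 = solve_gamma(x, 0.0774, 1, 3, 10)  # xi_0.05 for n = 10, j = 3
upper90 = solve_gamma(x, 1.1273, 1, 3, 10)  # xi_0.95 for n = 10, j = 3
lower95 = solve_gamma(x, 0.0535, 1, 3, 10)  # xi_0.025; event (11) occurs for this sample
print(lower90, upper90, lower95)
```

For this hypothetical sample the 90% limits exist, while the 95% lower limit does not, which illustrates how the event in (11) depends on the confidence level.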
To obtain the values of ξα, Monte Carlo simulation was used. For each combination of the selected values of n and j, 250,000 pseudorandom samples were generated from the three-parameter lognormal distribution. Since the distribution of ξ(γ) does not depend on any parameters, the simplest case with γ=0, μ=0, and σ=1 was used. Then the critical values of ξ(γ) were obtained for selected values of α. The values of ξα are listed in Table 1. The column in the middle of Table 1 (labeled ξ0.5) is used to obtain the point estimator of the location parameter γ.
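The simulation just described can be sketched as follows. With γ=0, μ=0, and σ=1, ln(X) is standard normal, so ξ reduces to a ratio of differences of normal order statistics. The function name `simulate_xi_quantiles`, the seed, the reduced number of replications (20,000 rather than 250,000), and the simple empirical quantile rule are all assumptions of this sketch, so the output only approximates the tabulated values.

```python
import random


def simulate_xi_quantiles(n, j, probs, reps, seed):
    """Estimate quantiles of xi with i = 1 and k = n from standard normal samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        z = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        values.append((z[j - 1] - z[0]) / (z[n - 1] - z[j - 1]))
    values.sort()
    # Simple empirical quantile: the sorted value at index floor(p * reps).
    return [values[min(int(p * reps), reps - 1)] for p in probs]


probs = [0.025, 0.05, 0.5, 0.95, 0.975]
estimates = simulate_xi_quantiles(n=10, j=3, probs=probs, reps=20000, seed=2024)
for p, q in zip(probs, estimates):
    print(p, round(q, 4))
```

The printed estimates can be compared with the row n=10, j=3 of Table 1; increasing `reps` reduces the Monte Carlo error at the cost of running time.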
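Table 1: Critical values of ξ(γ).

| n | i | j | k | ξ0.025 | ξ0.05 | ξ0.5 | ξ0.95 | ξ0.975 |
|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 2 | 10 | 0.0065 | 0.0129 | 0.1662 | 0.6988 | 0.8696 |
| 10 | 1 | 3 | 10 | 0.0535 | 0.0774 | 0.3677 | 1.1273 | 1.3762 |
| 10 | 1 | 4 | 10 | 0.1292 | 0.1709 | 0.5834 | 1.6351 | 1.9893 |
| 20 | 1 | 4 | 20 | 0.0734 | 0.0970 | 0.3165 | 0.7676 | 0.8935 |
| 20 | 1 | 5 | 20 | 0.1190 | 0.1496 | 0.4096 | 0.9242 | 1.0636 |
| 20 | 1 | 6 | 20 | 0.1666 | 0.2033 | 0.5027 | 1.0862 | 1.2466 |
| 20 | 1 | 7 | 20 | 0.2177 | 0.2590 | 0.5983 | 1.2608 | 1.4448 |
| 20 | 1 | 8 | 20 | 0.2695 | 0.3171 | 0.6998 | 1.4511 | 1.6667 |
| 30 | 1 | 6 | 30 | 0.1266 | 0.1538 | 0.3745 | 0.7761 | 0.8817 |
| 30 | 1 | 7 | 30 | 0.1604 | 0.1918 | 0.4339 | 0.8662 | 0.9801 |
| 30 | 1 | 8 | 30 | 0.1961 | 0.2299 | 0.4913 | 0.9581 | 1.0794 |
| 30 | 1 | 9 | 30 | 0.2328 | 0.2695 | 0.5514 | 1.0541 | 1.1870 |
| 30 | 1 | 10 | 30 | 0.2672 | 0.3073 | 0.6131 | 1.1526 | 1.2967 |
| 30 | 1 | 11 | 30 | 0.3023 | 0.3462 | 0.6753 | 1.2543 | 1.4072 |
| 30 | 1 | 12 | 30 | 0.3409 | 0.3877 | 0.7393 | 1.3651 | 1.5361 |
| 40 | 1 | 8 | 40 | 0.1629 | 0.1910 | 0.4069 | 0.8739 | 0.1629 |
| 40 | 1 | 9 | 40 | 0.1899 | 0.2209 | 0.4493 | 0.9391 | 0.1899 |
| 40 | 1 | 10 | 40 | 0.2182 | 0.2513 | 0.4931 | 1.0121 | 0.2182 |
| 40 | 1 | 11 | 40 | 0.2728 | 0.3093 | 0.5355 | 1.1488 | 0.2728 |
| 40 | 1 | 12 | 40 | 0.3001 | 0.3394 | 0.5792 | 1.2220 | 0.3001 |
| 40 | 1 | 13 | 40 | 0.3264 | 0.3683 | 0.6229 | 1.3028 | 0.3264 |
| 40 | 1 | 14 | 40 | 0.3556 | 0.3990 | 0.6673 | 1.3812 | 0.3556 |
| 40 | 1 | 15 | 40 | 0.3824 | 0.4282 | 0.7140 | 1.4645 | 0.3824 |
| 40 | 1 | 16 | 40 | 0.1629 | 0.1910 | 0.7604 | 0.8739 | 0.1629 |
| 50 | 1 | 10 | 50 | 0.1899 | 0.2186 | 0.4282 | 0.7749 | 0.8613 |
| 50 | 1 | 11 | 50 | 0.2135 | 0.2431 | 0.4614 | 0.8244 | 0.9143 |
| 50 | 1 | 12 | 50 | 0.2356 | 0.2670 | 0.4953 | 0.8731 | 0.9665 |
| 50 | 1 | 13 | 50 | 0.2573 | 0.2901 | 0.5285 | 0.9213 | 1.0203 |
| 50 | 1 | 14 | 50 | 0.2796 | 0.3150 | 0.5619 | 0.9724 | 1.0724 |
| 50 | 1 | 15 | 50 | 0.3009 | 0.3378 | 0.5964 | 1.0170 | 1.1230 |
| 50 | 1 | 16 | 50 | 0.3234 | 0.3624 | 0.6312 | 1.0724 | 1.1840 |
| 50 | 1 | 17 | 50 | 0.3447 | 0.3849 | 0.6651 | 1.1276 | 1.2443 |
| 50 | 1 | 18 | 50 | 0.3681 | 0.4093 | 0.6995 | 1.1755 | 1.2970 |
| 50 | 1 | 19 | 50 | 0.3896 | 0.4326 | 0.7352 | 1.2288 | 1.3516 |
| 50 | 1 | 20 | 50 | 0.4140 | 0.4588 | 0.7739 | 1.2938 | 1.4205 |
| 60 | 1 | 12 | 60 | 0.2121 | 0.2408 | 0.4428 | 0.7749 | 0.8588 |
| 60 | 1 | 13 | 60 | 0.2298 | 0.2602 | 0.4711 | 0.8141 | 0.8984 |
| 60 | 1 | 14 | 60 | 0.2500 | 0.2803 | 0.4994 | 0.8504 | 0.9369 |
| 60 | 1 | 15 | 60 | 0.2668 | 0.2999 | 0.5266 | 0.8941 | 0.9837 |
| 60 | 1 | 16 | 60 | 0.2854 | 0.3202 | 0.5542 | 0.9310 | 1.0240 |
| 60 | 1 | 17 | 60 | 0.3042 | 0.3388 | 0.5808 | 0.9669 | 1.0628 |
| 60 | 1 | 18 | 60 | 0.3232 | 0.3589 | 0.6090 | 1.0129 | 1.1127 |
| 60 | 1 | 19 | 60 | 0.3429 | 0.3784 | 0.6372 | 1.0479 | 1.1485 |
| 60 | 1 | 20 | 60 | 0.3607 | 0.3985 | 0.6653 | 1.0916 | 1.1964 |
| 60 | 1 | 21 | 60 | 0.3791 | 0.4195 | 0.6950 | 1.1342 | 1.2438 |
| 60 | 1 | 22 | 60 | 0.3970 | 0.4377 | 0.7236 | 1.1755 | 1.2900 |
| 60 | 1 | 23 | 60 | 0.4168 | 0.4589 | 0.7530 | 1.2193 | 1.3407 |
| 60 | 1 | 24 | 60 | 0.4349 | 0.4777 | 0.7825 | 1.2638 | 1.3892 |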
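The quantity ξ(γ) can also be used to test hypotheses about the location parameter γ. In practice, one may need to choose between the two-parameter lognormal distribution and the three-parameter lognormal distribution to fit the data. In that case, the test of H0:γ=0 versus Ha:γ>0 needs to be conducted. Under H0, the value ξ(0), that is, ξ(γ) computed at γ=0, follows the parameter-free distribution described above. It has been mentioned previously that ξ(γ) is strictly increasing in γ, so when the true γ is positive, ξ(0) is smaller than ξ evaluated at the true γ, and small values of ξ(0) are evidence against H0. When the calculated value of ξ(0) is less than ξα, it can be concluded, at level of significance α, that the three-parameter lognormal distribution should be used instead of the two-parameter lognormal distribution. Equivalently, H0 is rejected when the 1-α lower confidence limit of γ, the solution of ξ(γ)=ξα, is positive.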
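The monotonicity of ξ(γ) is stated in Section 2 without proof. One way to see it, offered here as a sketch and not necessarily the authors' argument, uses Cauchy's mean value theorem. The symbols s, t, u, N, D, c_1, and c_2 are introduced only for this sketch.

```latex
Write $s=X_{(i)}-\gamma$, $t=X_{(j)}-\gamma$, $u=X_{(k)}-\gamma$ with $0<s<t<u$, and
\[
N(\gamma)=\ln t-\ln s,\qquad D(\gamma)=\ln u-\ln t,\qquad \xi(\gamma)=\frac{N(\gamma)}{D(\gamma)} .
\]
Differentiating with respect to $\gamma$ gives
\[
N'(\gamma)=\frac{1}{s}-\frac{1}{t}>0,\qquad D'(\gamma)=\frac{1}{t}-\frac{1}{u}>0 .
\]
By Cauchy's mean value theorem applied to $x\mapsto -1/x$ and $x\mapsto\ln x$,
\[
\frac{N'}{N}=\frac{1/s-1/t}{\ln t-\ln s}=\frac{1}{c_1},\qquad
\frac{D'}{D}=\frac{1/t-1/u}{\ln u-\ln t}=\frac{1}{c_2},
\]
for some $c_1\in(s,t)$ and $c_2\in(t,u)$. Since $c_1<t<c_2$, we have $N'/N>D'/D$, and hence
\[
\xi'(\gamma)=\frac{N'D-ND'}{D^{2}}=\frac{N}{D}\left(\frac{N'}{N}-\frac{D'}{D}\right)>0 .
\]
```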
3. Point Estimation
A widely used method for estimating the parameters of the lognormal distributions in the literature is the maximum likelihood estimator. Certain problems in using the maximum likelihood estimation have been mentioned by some authors. With respect to the three-parameter lognormal distribution, note that the likelihood function of a random sample from the three-parameter lognormal distribution is
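\[
L(\mu,\sigma,\gamma\mid x)=\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\left(\prod_{i=1}^{n}(x_i-\gamma)\right)^{-1}\exp\left\{-\sum_{i=1}^{n}\frac{(\ln(x_i-\gamma)-\mu)^{2}}{2\sigma^{2}}\right\}\times I_{[\gamma,\infty)}(\min\{x_1,\ldots,x_n\}). \tag{12}
\]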
Here I[γ,∞)(min{x1,…,xn}) is an indicator function defined as
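\[
I_{[\gamma,\infty)}(\min\{x_1,\ldots,x_n\})=
\begin{cases}
1, & \min\{x_1,\ldots,x_n\}\ge\gamma,\\
0, & \min\{x_1,\ldots,x_n\}<\gamma.
\end{cases} \tag{13}
\]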
It can be seen from the above expression of L(μ,σ,γ|x) that the maximum likelihood estimator of γ is X(1)=min{X1,…,Xn}, because the likelihood function goes to infinity as γ approaches X(1) from below. Since the likelihood function is nonzero only when x(1)≥γ, and since the probability that X(1)>γ is 1, one would expect that the maximum likelihood estimator X(1) is a positively biased estimator of γ. This is verified by the Monte Carlo simulation results discussed below. Chen [11] used the locally maximum likelihood estimation method to estimate the parameter. As mentioned in that paper, the locally maximum likelihood estimation method has some problems. The locally maximum likelihood estimate may not exist. In some cases, there are multiple local maxima. The biggest problem for the locally maximum likelihood estimation method is that it gives up the principle of maximizing the likelihood function globally. The point estimator of γ can be obtained by squeezing the confidence interval of γ described in the previous section, that is, by letting the confidence level shrink to zero so that the lower and upper limits solve the same equation. In fact, a point estimator of γ is the solution of γ for the equation
\[
\xi(\gamma)=\xi_{0.5}. \tag{14}
\]
The value of ξ0.5 can also be found in Table 1. The above equation can be solved easily using a scientific calculator.
To compare the performance of the point estimator obtained by (14) with the maximum likelihood estimator, Monte Carlo simulation was used based on 250,000 pseudorandom samples from the three-parameter lognormal distribution with parameters γ=10, μ=4, and σ=2. Simulation results are listed in Table 2. The column γ^(new) gives the average of the point estimates using the method presented in this paper, and the column γ^(MLE) gives the average of the point estimates using the maximum likelihood estimator. It can be seen that the point estimator using the maximum likelihood estimator method is obviously biased. The columns MSE(new) and MSE(MLE) provide the mean squared error for the method in this paper and the maximum likelihood estimator method, respectively. It can be seen that the method presented in this paper has smaller mean squared error when the sample size is small. When the sample size becomes larger, the maximum likelihood estimator method has smaller mean squared error while the maximum likelihood estimator is still biased.
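A scaled-down version of this comparison can be sketched in Python. The sketch uses n=20, j=5, and ξ0.5=0.4096 from Table 1, with only 2,000 replications, so its output will not reproduce Table 2 exactly. The function names, the seed, and in particular the rule of returning 0 when ξ(0) already exceeds ξ0.5 are assumptions of this sketch; the paper does not state how such samples were handled.

```python
import math
import random
from statistics import mean


def xi(x_sorted, gamma, i, j, k):
    """Pivotal quantity (2); i, j, k are 1-based indices into the sorted sample."""
    a = math.log(x_sorted[i - 1] - gamma)
    b = math.log(x_sorted[j - 1] - gamma)
    c = math.log(x_sorted[k - 1] - gamma)
    return (b - a) / (c - b)


def point_estimate(x_sorted, target, i, j, k, max_iter=200):
    """Solve xi(gamma) = target on [0, X(i)) by bisection."""
    lo, hi = 0.0, x_sorted[i - 1]
    if xi(x_sorted, lo, i, j, k) >= target:
        return 0.0  # assumption of this sketch: truncate at 0 because gamma >= 0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if xi(x_sorted, mid, i, j, k) < target:
            lo = mid
        else:
            hi = mid
    return lo


rng = random.Random(7)
gamma, mu, sigma = 10.0, 4.0, 2.0
n, j, xi_median = 20, 5, 0.4096  # xi_0.5 from Table 1 for n = 20, j = 5
reps = 2000

new_est, mle_est = [], []
for _ in range(reps):
    x = sorted(gamma + math.exp(rng.gauss(mu, sigma)) for _ in range(n))
    new_est.append(point_estimate(x, xi_median, 1, j, n))
    mle_est.append(x[0])  # the MLE of gamma is the smallest observation

mse_new = mean((g - gamma) ** 2 for g in new_est)
mse_mle = mean((g - gamma) ** 2 for g in mle_est)
print("new:", mean(new_est), mse_new)
print("MLE:", mean(mle_est), mse_mle)
```

Since X(1) exceeds γ in every sample, the average of the MLE column is necessarily above the true value, which is the positive bias discussed above.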
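Table 2: Comparison with MLE. Columns i, j, k, γ^(new), and MSE(new) refer to the method in this paper; columns γ^(MLE) and MSE(MLE) refer to the MLE method.

| n | i | j | k | γ^(new) | MSE(new) | γ^(MLE) | MSE(MLE) |
|---|---|---|---|---|---|---|---|
| 5 | 1 | 2 | 5 | 21.65 | 370.01 | 21.87 | 380.08 |
| 10 | 1 | 3 | 10 | 11.14 | 32.88 | 14.60 | 34.58 |
| 15 | 1 | 4 | 15 | 10.22 | 14.25 | 12.86 | 10.38 |
| 20 | 1 | 5 | 20 | 10.01 | 7.94 | 12.10 | 4.95 |
| 25 | 1 | 6 | 25 | 9.97 | 4.93 | 11.69 | 2.78 |
| 30 | 1 | 8 | 30 | 10.08 | 2.28 | 11.40 | 1.77 |
| 40 | 1 | 10 | 40 | 10.02 | 1.74 | 11.08 | 0.96 |
| 50 | 1 | 9 | 50 | 10.03 | 1.03 | 10.88 | 0.59 |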
4. Examples
The following data set containing 20 observations was used in Cohen and Whitten [10]: 142.290, 144.328, 174.800, 168.554, 184.101, 166.475, 131.375, 145.788, 135.880, 137.338, 164.304, 155.369, 127.211, 132.971, 128.709, 201.415, 133.143, 155.680, 153.070, 157.238. This data set was considered as a sample from the three-parameter lognormal distribution with parameters γ=100, μ=ln50, and σ=0.4. To find a 90% confidence interval for the location parameter γ using the method described in Section 2, note that ξ0.05=0.2033 and ξ0.95=1.0862 when i=1, j=6, and k=n=20. Here the value of j is selected to be about 30% of the sample size, as recommended in Section 2. The solution of γ for the equation ξ(γ)=0.2033 is 69.06, and the solution of γ for the equation ξ(γ)=1.0862 is 126.16. Then (69.06,126.16) is a 90% confidence interval for the location parameter γ. To find a point estimate of γ, note that ξ0.5=0.5027. The solution of γ for the equation ξ(γ)=0.5027 is 120.59, which is the point estimate of γ.
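The calculations for this first example can be reproduced with a short Python sketch. Bisection on [0, X(1)) is an assumed choice of root finder, and the names `xi` and `solve_gamma` are illustrative; the critical values are the n=20, j=6 entries of Table 1.

```python
import math


def xi(x_sorted, gamma, i, j, k):
    """Pivotal quantity (2); i, j, k are 1-based indices into the sorted sample."""
    a = math.log(x_sorted[i - 1] - gamma)
    b = math.log(x_sorted[j - 1] - gamma)
    c = math.log(x_sorted[k - 1] - gamma)
    return (b - a) / (c - b)


def solve_gamma(x_sorted, target, i, j, k, max_iter=200):
    """Solve xi(gamma) = target on [0, X(i)); None if xi(0) > target."""
    lo, hi = 0.0, x_sorted[i - 1]
    if xi(x_sorted, lo, i, j, k) > target:
        return None
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if xi(x_sorted, mid, i, j, k) < target:
            lo = mid
        else:
            hi = mid
    return lo


data = [142.290, 144.328, 174.800, 168.554, 184.101, 166.475, 131.375,
        145.788, 135.880, 137.338, 164.304, 155.369, 127.211, 132.971,
        128.709, 201.415, 133.143, 155.680, 153.070, 157.238]
x = sorted(data)
i, j, k = 1, 6, 20

lower = solve_gamma(x, 0.2033, i, j, k)  # xi_0.05
upper = solve_gamma(x, 1.0862, i, j, k)  # xi_0.95
point = solve_gamma(x, 0.5027, i, j, k)  # xi_0.5
print(round(lower, 2), round(upper, 2), round(point, 2))
```

The three roots should agree with the limits and the point estimate quoted above up to rounding.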
Chen [11] analyzed a plastic laminate strength data set using the locally maximum likelihood estimation method. Forty-nine strength measurements (in psi) are listed below in ascending order: 21.87, 23.80, 24.83, 25.80, 29.95, 30.26, 31.23, 31.29, 31.86, 32.48, 33.38, 33.73, 33.88, 33.93, 34.03, 34.50, 34.90, 35.57, 35.66, 39.44, 41.76, 41.96, 42.21, 42.66, 43.27, 43.41, 44.06, 45.32, 47.39, 47.98, 48.81, 50.76, 51.54, 54.67, 54.92, 55.33, 57.24, 59.30, 60.41, 60.89, 61.63, 68.93, 71.96, 72.65, 73.51, 76.15, 78.48, 81.37, 99.43. To find a point estimate of the location parameter γ using the method presented in this paper, note that ξ0.5≈0.5964, the entry of Table 1 for i=1, j=15, and n=50, which is the tabulated sample size closest to 49. The solution of γ for the equation ξ(γ)=0.5964 is 12.14, which is the point estimate of γ. To find a 95% upper confidence limit for the location parameter γ, note that ξ0.95≈1.0170. The solution of γ for the equation ξ(γ)=1.0170 is 19.21. Then 19.21 is a 95% upper confidence limit for γ.
5. Conclusions and Discussion
Compared with the two-parameter lognormal distribution, the three-parameter lognormal distribution is more flexible because of the inclusion of the location parameter. However, the inclusion of the location parameter brings in a lot of technical difficulties to statistical inferences. Only some approximation methods can be found in the literature for constructing confidence intervals for the location parameter. The most commonly used method for finding point estimator is the maximum likelihood estimator. As discussed previously, the maximum likelihood estimator of the location parameter is positively biased.
A method for constructing exact confidence intervals and exact confidence limits for the location parameter is proposed in this paper. The method can also be used to conduct statistical test about the location parameter of the three-parameter lognormal distribution. The point estimator is obtained as well by squeezing the confidence interval of the location parameter.
While the discussion of the method introduced in this paper is for complete samples, the method can also be used for censored data. For example, suppose that only the first r order statistics X(1),X(2),…,X(r) are available for the statistical analysis. Then i=1 and k=r. The selection of j is similar to the complete sample case.
The selection of the triplet (i,j,k) can also be discussed for the large sample case. The pivotal quantity ξ(γ) possesses some asymptotic properties when the sample size is sufficiently large. Some of the following discussion uses the results in Bahadur [12] and Embrechts et al. [13]. Let X1,…,Xn be a random sample from the three-parameter lognormal distribution described in (1), and let X(1),…,X(n) be the corresponding order statistics. Let j=[np]+1(0<p<1). It can be shown that
\[
\xi(\gamma)=\frac{\ln(X_{(j)}-\gamma)-\ln(X_{(1)}-\gamma)}{\ln(X_{(n)}-\gamma)-\ln(X_{(j)}-\gamma)}\xrightarrow{\text{a.s.}}1. \tag{15}
\]
To show this, let Zi=(ln(Xi-γ)-μ)/σ (i=1,2,…,n). Then Z1,Z2,…,Zn are i.i.d. N(0,1), and their order statistics are Z(i)=(ln(X(i)-γ)-μ)/σ (i=1,2,…,n).
Letting n→+∞, we have
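\[
Z_{(j)}=Z_{([np]+1)}\xrightarrow{\text{a.s.}}z_p,\qquad
\frac{Z_{(n)}}{\sqrt{2\ln n}}\xrightarrow{\text{a.s.}}1,\qquad
\frac{Z_{(1)}}{\sqrt{2\ln n}}\xrightarrow{\text{a.s.}}-1, \tag{16}
\]
where z_p is the p quantile of N(0,1).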
So, when n→+∞, we have
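\[
\xi(\gamma)=\frac{\ln(X_{(j)}-\gamma)-\ln(X_{(1)}-\gamma)}{\ln(X_{(n)}-\gamma)-\ln(X_{(j)}-\gamma)}
=\frac{(\ln(X_{(j)}-\gamma)-\mu)/\sigma-(\ln(X_{(1)}-\gamma)-\mu)/\sigma}{(\ln(X_{(n)}-\gamma)-\mu)/\sigma-(\ln(X_{(j)}-\gamma)-\mu)/\sigma}
=\frac{Z_{(j)}-Z_{(1)}}{Z_{(n)}-Z_{(j)}}\xrightarrow{\text{a.s.}}1. \tag{17}
\]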
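The last limit in (17) follows from (16) after dividing the numerator and the denominator by √(2 ln n). The step is spelled out below as a short worked equation, added here for clarity; it uses only the limits already stated in (16).

```latex
\[
\frac{Z_{(j)}-Z_{(1)}}{Z_{(n)}-Z_{(j)}}
=\frac{Z_{(j)}/\sqrt{2\ln n}-Z_{(1)}/\sqrt{2\ln n}}{Z_{(n)}/\sqrt{2\ln n}-Z_{(j)}/\sqrt{2\ln n}}
\xrightarrow{\text{a.s.}}\frac{0-(-1)}{1-0}=1,
\]
because $Z_{(j)}\to z_p$ almost surely with $z_p$ finite, so that $Z_{(j)}/\sqrt{2\ln n}\to 0$ almost surely.
```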
Furthermore, if γ^>0 is the solution of the equation
\[
\frac{\ln(X_{(j)}-\gamma)-\ln(X_{(1)}-\gamma)}{\ln(X_{(n)}-\gamma)-\ln(X_{(j)}-\gamma)}=\xi_{\alpha}, \tag{18}
\]
where ξα is the α quantile of ξ(γ), then the following statements hold.
(i) γ^<X(1).
(ii) Let n, X(1), X(n), and α be fixed. When X(j) decreasingly tends to X(1), γ^ increasingly tends to X(1).
(iii) Let n, X(1), X(j), and X(n) be fixed. When α increasingly tends to 1, γ^ increasingly tends to X(1).
(iv) Let j=[np]+1 (0<p<1), and n→∞. Then X(1)-γ^→0 almost surely and γ^→γ almost surely.
The proof of (i) is obvious. To prove (iv), if γ^>0 is the solution of (18), then
\[
X_{(1)}-\hat{\gamma}=(X_{(j)}-\hat{\gamma})\left(\frac{1}{1+(X_{(n)}-X_{(j)})/(X_{(j)}-\hat{\gamma})}\right)^{\xi_{\alpha}}. \tag{19}
\]
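For completeness, (19) can be obtained from (18) in two steps. The short derivation below is added as a worked equation and uses only the quantities already defined.

```latex
Evaluating (18) at $\gamma=\hat{\gamma}$ and multiplying by the denominator gives
\[
\ln\frac{X_{(j)}-\hat{\gamma}}{X_{(1)}-\hat{\gamma}}
=\xi_{\alpha}\,\ln\frac{X_{(n)}-\hat{\gamma}}{X_{(j)}-\hat{\gamma}} .
\]
Exponentiating both sides and solving for $X_{(1)}-\hat{\gamma}$ yields
\[
X_{(1)}-\hat{\gamma}
=(X_{(j)}-\hat{\gamma})\left(\frac{X_{(j)}-\hat{\gamma}}{X_{(n)}-\hat{\gamma}}\right)^{\xi_{\alpha}},
\]
and writing $X_{(n)}-\hat{\gamma}=(X_{(j)}-\hat{\gamma})+(X_{(n)}-X_{(j)})$ in the denominator gives (19).
```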
Note that the support of the three-parameter lognormal distribution is (γ,+∞). So
\[
X_{(1)}\xrightarrow{\text{a.s.}}\gamma,\qquad X_{(n)}\xrightarrow{\text{a.s.}}+\infty. \tag{20}
\]
Let xp be the p quantile of the three-parameter lognormal distribution (0<p<1). Then
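\[
X_{(j)}=X_{([np]+1)}\xrightarrow{\text{a.s.}}x_p. \tag{21}
\]
Since 0<γ^<X(1), it follows from (20) and (21) that X(j)-γ^ stays bounded while X(n)-X(j) tends to infinity, and by (15) the exponent ξα stays close to 1 for large n. Hence the right-hand side of (19) converges to 0 almost surely as n→+∞, and so does the left-hand side. That is,
\[
X_{(1)}-\hat{\gamma}\xrightarrow{\text{a.s.}}0. \tag{22}
\]
Since X(1)→γ almost surely, we have γ^→γ almost surely.
Statements (ii) and (iii) are immediate consequences of (19).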
Based on previously mentioned properties, we can draw the following conclusions.
About the selection of the triplet (i,j,k): if we choose i=1 and k=n, then, when n→∞, the width of the confidence intervals of the parameter γ tends to zero almost surely. This is the reason for selecting i=1 and k=n.
About the selection of j: according to (17), the “optimal” value of j is j=[0.5n]+1.
To obtain the values of ξα, we can use the standard normal distribution for Monte Carlo simulation. This was actually used when statistical simulation was conducted to obtain quantiles of the pivotal quantity ξ(γ).
Acknowledgment
The authors thank an anonymous referee for his/her detailed comments and suggestions, especially for the suggestions on adding discussion on large sample size case, that greatly improved the original paper.
References
[1] K. Kanefuji and K. Iwase, "Estimation for a scale parameter with known coefficient of variation."
[2] A. L. Sweet, "On the hazard rate of the lognormal distribution."
[3] E. L. Crow and K. Shimizu.
[4] Y. Komori and H. Hirose, "Easy estimation by a new parameterization for the three-parameter lognormal distribution."
[5] V. P. Singh, J. F. Cruise, and M. Ma, "A comparative evaluation of the estimators of the three-parameter lognormal distribution by Monte Carlo simulation."
[6] J. F. Eastham, V. N. LaRiccia, and J. H. Schuenemeyer, "Small sample properties of the maximum likelihood estimators for an alternative parameterization of the three-parameter lognormal distribution."
[7] A. C. Cohen, B. J. Whitten, and Y. Ding, "Modified moment estimation for the three-parameter lognormal distribution."
[8] M. Chieppa and P. Amato, "A new estimation procedure for the three-parameter lognormal distribution."
[9] D. A. Griffiths, "Interval estimation for the three-parameter lognormal distribution via the likelihood function."
[10] A. C. Cohen and B. J. Whitten, "Estimation in the three-parameter lognormal distribution."
[11] C. Chen, "Tests of fit for the three-parameter lognormal distribution."
[12] R. R. Bahadur, "A note on quantiles in large samples."
[13] P. Embrechts, C. Kluppelberg, and T. Mikosh.