We investigate the statistical inference and applications of the half exponential power distribution for the first time. The proposed model, defined on the nonnegative reals, extends the half normal distribution and is more flexible. Characterizations and properties of this distribution involving moments and some moment-based measures are derived. Inference using the method of moments and maximum likelihood is presented. We also study the performance of the estimators using Monte Carlo simulation. Finally, we illustrate the model with two real applications.
1. Introduction
The well-known exponential power (EP) distribution or the generalized normal distribution has the following density function:
\[ f(x) = \frac{p^{1-1/p}}{2\Gamma(1/p)}\, e^{-|x|^{p}/p}, \qquad -\infty < x < \infty, \tag{1} \]
where p>0 is the shape parameter. This family consists of a wide range of symmetric distributions and allows continuous variation from normality to nonnormality. It includes the normal distribution Z~N(0,1) as the special case when p=2 and the Laplace distribution when p=1. Nadarajah [1] provided a comprehensive treatment of its mathematical properties.
Its tails can be more platykurtic (p>2) or more leptokurtic (p<2) than those of the normal distribution (p=2). The distribution has been widely used in Bayesian analysis and robustness studies (see Box and Tiao [2], Genc [3], Goodman and Kotz [4], and Tiao and Lund [5]).
On the other hand, the most popular models used to describe lifetime processes are defined on nonnegative measurements. This motivates us to take a positive truncation of model (1) and develop a half exponential power (HEP) distribution. As far as we know, this model has not been previously studied, although we believe it plays an important role in data analysis. The resulting nonnegative half exponential power distribution generalizes the half normal (HN) distribution and is more flexible. In this work, we investigate the statistical features of the nonnegative model and apply it to lifetime data.
The rest of this paper is organized as follows: in Section 2, we present the new distribution and study its properties. Section 3 discusses the inference, moments, and maximum likelihood estimation for the parameters. In Section 4, we discuss a useful technique, a half normal plot with a simulated envelope, to assess the model adequacy. Simulation studies are performed in Section 5. Section 6 gives two illustrative examples and reports the results. Section 7 concludes our work.
2. The Half Exponential Power Distribution
2.1. The Density and Hazard Function
Definition 1.
A random variable X has a half exponential power distribution with scale parameter σ>0 if its density function is
\[ f(x) = \frac{p^{1-1/p}}{\sigma\,\Gamma(1/p)}\, e^{-x^{p}/(p\sigma^{p})}, \qquad x \ge 0, \tag{2} \]
where σ>0 and p>0. We denote it as X~HEP(σ,p).
Figure 1(a) displays some plots of the density function of the half exponential power distribution with various parameters.
The cumulative distribution function of the half exponential power distribution X~HEP(σ,p) is given as follows. For x≥0,
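\[ F(x) = \int_0^x f_X(u)\,du = \int_0^x \frac{p^{1-1/p}}{\sigma\,\Gamma(1/p)}\, e^{-u^{p}/(p\sigma^{p})}\,du = \frac{\gamma\!\left(1/p,\; x^{p}/(p\sigma^{p})\right)}{\Gamma(1/p)}, \tag{3} \]
where $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function, defined as $\gamma(s,x) = \int_0^x t^{s-1}e^{-t}\,dt$.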
The hazard rate function (also known as the failure rate function) of the half exponential power distribution is given by, for x≥0,
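\[ h(x) = \frac{f(x)}{1-F(x)} = \frac{p^{1-1/p}\, e^{-x^{p}/(p\sigma^{p})}}{\sigma\left[\Gamma(1/p) - \gamma\!\left(1/p,\; x^{p}/(p\sigma^{p})\right)\right]}. \tag{4} \]
Since $\Gamma(s)-\gamma(s,x) \sim x^{s-1}e^{-x}$ as $x\to\infty$, we obtain $h(x) \sim x^{p-1}/\sigma^{p}$. The hazard rate function is increasing for p>1, constant and equal to 1/σ for p=1 (the exponential case), and decreasing for 0<p<1. Figure 1(b) displays some plots of the hazard rate function of the half exponential power distribution with various parameters.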
Figure 1: The density and hazard rate functions of HEP(σ,p) for σ=1. (a) Density function. (b) Hazard function.
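The formulas (2)-(4) can be evaluated directly. The following is a minimal Python sketch, not the authors' code: the function names are ours, and the regularized incomplete gamma function is approximated by its power series, which is an assumption that is adequate only for moderate arguments.

```python
import math

def hep_pdf(x, sigma, p):
    """Density (2) of HEP(sigma, p); returns 0 for x < 0."""
    if x < 0:
        return 0.0
    log_c = (1 - 1 / p) * math.log(p) - math.log(sigma) - math.lgamma(1 / p)
    return math.exp(log_c - x ** p / (p * sigma ** p))

def reg_lower_gamma(s, t, tol=1e-15, max_terms=100000):
    """gamma(s, t) / Gamma(s) via the series t^s e^-t sum_n t^n / Gamma(s+n+1)."""
    if t <= 0:
        return 0.0
    if t > 700.0:
        # Assumption: for the small shapes s = 1/p used here the tail mass
        # beyond t = 700 is below double precision.
        return 1.0
    term = math.exp(-t + s * math.log(t) - math.lgamma(s + 1))
    total = term
    n = 1
    while term > tol * total and n < max_terms:
        term *= t / (s + n)
        total += term
        n += 1
    return min(total, 1.0)

def hep_cdf(x, sigma, p):
    """Distribution function (3)."""
    if x <= 0:
        return 0.0
    return reg_lower_gamma(1 / p, x ** p / (p * sigma ** p))

def hep_hazard(x, sigma, p):
    """Hazard rate (4); loses accuracy far in the tail where 1 - F is tiny."""
    return hep_pdf(x, sigma, p) / (1.0 - hep_cdf(x, sigma, p))
```

For p=1 the sketch reproduces the exponential distribution with mean σ, and for p=2 the half normal distribution, which gives two easy sanity checks.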
2.2. Moments and Measures Based on Moments
Proposition 2.
Let X~HEP(σ,p). For k=1,2,3,…, the kth noncentral moment is given by
\[ \mu_k = \mathbb{E}X^{k} = \frac{p^{k/p}\sigma^{k}}{\Gamma(1/p)}\,\Gamma\!\left(\frac{k+1}{p}\right). \tag{5} \]
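A short derivation of (5) can be sketched with the substitution $t = x^{p}/(p\sigma^{p})$; the variable t is our notation and does not appear in the text.

```latex
\begin{aligned}
\mathbb{E}X^{k}
  &= \int_0^\infty x^{k}\,\frac{p^{1-1/p}}{\sigma\,\Gamma(1/p)}\,
     e^{-x^{p}/(p\sigma^{p})}\,dx,
  \qquad x = \sigma (pt)^{1/p},\quad dx = \sigma p^{1/p-1}\, t^{1/p-1}\,dt, \\
  &= \frac{p^{1-1/p}}{\sigma\,\Gamma(1/p)}\;\sigma^{k+1}\, p^{k/p}\, p^{1/p-1}
     \int_0^\infty t^{(k+1)/p-1} e^{-t}\,dt \\
  &= \frac{p^{k/p}\sigma^{k}}{\Gamma(1/p)}\,\Gamma\!\left(\frac{k+1}{p}\right).
\end{aligned}
```

The same substitution shows that $X^{p}/(p\sigma^{p})$ follows a gamma distribution with shape 1/p and unit scale.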
The following results are immediate consequences of (5).
Corollary 3.
Let X~HEP(σ,p). The mean and variance of X are given by
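\[ \mathbb{E}X = \frac{p^{1/p}\sigma}{\Gamma(1/p)}\,\Gamma\!\left(\frac{2}{p}\right), \qquad \operatorname{Var}(X) = \frac{p^{2/p}\sigma^{2}\left[\Gamma(1/p)\Gamma(3/p) - [\Gamma(2/p)]^{2}\right]}{[\Gamma(1/p)]^{2}}. \tag{6} \]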
Corollary 4.
Let X~HEP(σ,p). The skewness and kurtosis coefficients of X are given by
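\[
\begin{aligned}
\beta_1 &= \frac{2[\Gamma(2/p)]^{3} - 3\Gamma(1/p)\Gamma(2/p)\Gamma(3/p) + [\Gamma(1/p)]^{2}\Gamma(4/p)}{\left(\Gamma(1/p)\Gamma(3/p) - [\Gamma(2/p)]^{2}\right)^{3/2}}, \\
\beta_2 &= \frac{-3[\Gamma(2/p)]^{4} + 6\Gamma(1/p)[\Gamma(2/p)]^{2}\Gamma(3/p) - 4[\Gamma(1/p)]^{2}\Gamma(2/p)\Gamma(4/p) + [\Gamma(1/p)]^{3}\Gamma(5/p)}{\left(\Gamma(1/p)\Gamma(3/p) - [\Gamma(2/p)]^{2}\right)^{2}}.
\end{aligned}
\tag{7}
\]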
Figure 2 shows the skewness and kurtosis coefficients with various parameters for the HEP model.
Figure 2: The skewness and kurtosis coefficients with various parameters. (a) Skewness coefficient. (b) Skewness coefficient in log scale. (c) Kurtosis coefficient. (d) Kurtosis coefficient in log scale.
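The moment formula (5) and the derived measures in (6) and (7) are straightforward to evaluate numerically. The sketch below is illustrative only; the function names are ours, and the skewness and kurtosis are computed from the raw moments rather than from the closed forms in (7), to which they are algebraically equivalent.

```python
import math

def hep_moment(k, sigma, p):
    """k-th raw moment (5), evaluated on the log scale for numerical stability."""
    return math.exp((k / p) * math.log(p) + k * math.log(sigma)
                    + math.lgamma((k + 1) / p) - math.lgamma(1 / p))

def hep_summary(sigma, p):
    """Return (mean, variance, skewness, kurtosis) of HEP(sigma, p)."""
    m1, m2, m3, m4 = (hep_moment(k, sigma, p) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt
```

As a check, `hep_summary(sigma, 1.0)` should return the exponential values (skewness 2, kurtosis 9), and `hep_summary(1.0, 2.0)` the half normal values.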
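3. Inference
3.1. Moment Estimation
Let X1,X2,…,Xn be a random sample from the distribution HEP(σ,p). From (5), we have $\mathbb{E}X = p^{1/p}\sigma\,\Gamma(2/p)/\Gamma(1/p)$ and $\mathbb{E}X^{2} = p^{2/p}\sigma^{2}\,\Gamma(3/p)/\Gamma(1/p)$. Replacing $\mathbb{E}X$ and $\mathbb{E}X^{2}$ with the corresponding sample estimators, we obtain the moment equations
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{p^{1/p}\sigma}{\Gamma(1/p)}\,\Gamma\!\left(\frac{2}{p}\right), \qquad \overline{X^{2}} = \frac{1}{n}\sum_{i=1}^{n} X_i^{2} = \frac{p^{2/p}\sigma^{2}}{\Gamma(1/p)}\,\Gamma\!\left(\frac{3}{p}\right). \tag{8} \]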
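The estimate $\hat{p}$ is the solution to
\[ \frac{\Gamma(1/p)\,\Gamma(3/p)}{[\Gamma(2/p)]^{2}} = \frac{\overline{X^{2}}}{\bar{X}^{2}}, \tag{9} \]
which can be solved numerically. The estimate $\hat{\sigma}$ is then given by
\[ \hat{\sigma} = \frac{\bar{X}\,\Gamma(1/\hat{p})}{\hat{p}^{1/\hat{p}}\,\Gamma(2/\hat{p})}. \tag{10} \]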
It is clear that, for the special case when p is known, estimator σ^ is unbiased and its mean squared error (MSE) is given by
\[ \operatorname{MSE}(\hat{\sigma}) = \frac{\sigma^{2}\left[\Gamma(1/p)\Gamma(3/p) - [\Gamma(2/p)]^{2}\right]}{n\,[\Gamma(2/p)]^{2}}. \tag{11} \]
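Equation (9) has no closed-form solution, but it is a one-dimensional root-finding problem. The sketch below solves it by bisection and then applies (10). It is an illustration under stated assumptions, not the authors' implementation: the function names and the bracket [0.05, 50] for p are ours, and the code assumes that the left side of (9) is decreasing in p, which is consistent with its values 2 at p=1 and π/2 at p=2 and its limit 4/3 as p grows.

```python
import math

def gamma_ratio(p):
    """Left side of (9): Gamma(1/p) Gamma(3/p) / Gamma(2/p)^2."""
    return math.exp(math.lgamma(1 / p) + math.lgamma(3 / p) - 2 * math.lgamma(2 / p))

def solve_from_moments(m1, m2, lo=0.05, hi=50.0, tol=1e-10):
    """Solve (9) for p by bisection, then compute sigma from (10)."""
    target = m2 / m1 ** 2
    if not gamma_ratio(hi) < target < gamma_ratio(lo):
        raise ValueError("moment ratio outside the bracketed range of p")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_ratio(mid) > target:
            lo = mid   # ratio too large, so p must be larger (assumed monotone)
        else:
            hi = mid
    p_hat = 0.5 * (lo + hi)
    sigma_hat = m1 * math.exp(math.lgamma(1 / p_hat) - math.lgamma(2 / p_hat)
                              - math.log(p_hat) / p_hat)
    return sigma_hat, p_hat

def hep_moment_estimates(xs):
    """Moment estimates (sigma_hat, p_hat) from a sample xs of nonnegative values."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return solve_from_moments(m1, m2)
```

Feeding the exact population moments of a given HEP(σ,p) into `solve_from_moments` should return that σ and p up to the bisection tolerance. A sample whose moment ratio is at or below 4/3 has no solution in this family, which the sketch reports as an error.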
In the following proposition, we present the asymptotic property of the moment estimators.
Proposition 5.
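Let X1,X2,…,Xn be a random sample of size n from the distribution HEP(σ,p), and let θ=(σ,p). If $\mu_6 = \mathbb{E}X^{6} < \infty$ and $\hat{\theta}$ is the moment estimator of θ, then
\[ \sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N_2\!\left(0,\; H^{-1}\Sigma\,[H^{-1}]^{T}\right) \tag{12} \]
as n→∞, where $\Sigma = (\mu_{i+j} - \mu_i\mu_j)_{ij}$ and H is given by
\[ H = H(\theta) = \begin{pmatrix} \dfrac{\partial\mu_1}{\partial\sigma} & \dfrac{\partial\mu_1}{\partial p} \\[2ex] \dfrac{\partial\mu_2}{\partial\sigma} & \dfrac{\partial\mu_2}{\partial p} \end{pmatrix}, \tag{13} \]
whose entries are given by
\[
\begin{aligned}
\frac{\partial\mu_1}{\partial\sigma} &= \frac{p^{1/p}\,\Gamma(2/p)}{\Gamma(1/p)}, &
\frac{\partial\mu_1}{\partial p} &= -\frac{p^{-2+1/p}\,\sigma\,\Gamma(2/p)\left[-1 + \log p - \psi(1/p) + 2\psi(2/p)\right]}{\Gamma(1/p)}, \\
\frac{\partial\mu_2}{\partial\sigma} &= \frac{2p^{2/p}\,\sigma\,\Gamma(3/p)}{\Gamma(1/p)}, &
\frac{\partial\mu_2}{\partial p} &= -\frac{p^{-2+2/p}\,\sigma^{2}\,\Gamma(3/p)\left[-2 + 2\log p - \psi(1/p) + 3\psi(3/p)\right]}{\Gamma(1/p)},
\end{aligned}
\tag{14}
\]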
where ψ() is the digamma function defined as the logarithmic derivative of the gamma function, ψ(x)=(d/dx)logΓ(x)=Γ′(x)/Γ(x).
Remark 6.
A consistent estimator for the asymptotic covariance matrix H-1Σ[H-1]T can be obtained by replacing parameters with their corresponding moment estimators.
3.2. Maximum Likelihood Estimation
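In this section, we consider maximum likelihood estimation of the parameter θ=(σ,p) of the HEP model defined in (2). The log likelihood for a random sample x1,x2,…,xn is
\[ l(\theta) = \log\prod_{i=1}^{n} f(x_i) = n\left(1 - \frac{1}{p}\right)\log p - n\log\sigma - n\log\Gamma\!\left(\frac{1}{p}\right) - \frac{1}{p\sigma^{p}}\sum_{i=1}^{n} x_i^{p}. \tag{15} \]
Taking the partial derivatives of the log-likelihood function with respect to σ and p, respectively, and setting the resulting expressions equal to zero, we obtain the maximum likelihood estimating equations:
\[
\begin{aligned}
l_\sigma &= -\frac{n}{\sigma} + \frac{1}{\sigma^{p+1}}\sum_{i=1}^{n} x_i^{p} = 0, \\
l_p &= \frac{n(\log p + p - 1)}{p^{2}} + \frac{n\,\psi(1/p)}{p^{2}} + \frac{1 + p\log\sigma}{\sigma^{p}p^{2}}\sum_{i=1}^{n} x_i^{p} - \frac{1}{p\sigma^{p}}\sum_{i=1}^{n} x_i^{p}\log x_i = 0.
\end{aligned}
\tag{16}
\]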
In general, there are no explicit solutions for the above maximum likelihood estimating equations. The estimates can be obtained by means of numerical procedures such as the Newton-Raphson method. The program R provides the nonlinear optimization routine optim for solving such problems.
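As one possible numerical route (the text itself points to R's optim), note that the first equation in (16) gives $\sigma^{p} = n^{-1}\sum_i x_i^{p}$ for any fixed p, so σ can be profiled out and only a one-dimensional search over p remains. The Python sketch below uses a golden-section search for that step. It is a swapped-in technique for illustration, not Newton-Raphson and not the authors' code; the function names and the search interval [0.1, 20] are ours, and the search assumes the profile log-likelihood is unimodal on that interval.

```python
import math

def hep_loglik(xs, sigma, p):
    """Log-likelihood (15) for a sample xs of nonnegative values."""
    n = len(xs)
    s = sum((x / sigma) ** p for x in xs)
    return n * ((1 - 1 / p) * math.log(p) - math.log(sigma)
                - math.lgamma(1 / p)) - s / p

def sigma_given_p(xs, p):
    """Solve l_sigma = 0 in (16) for fixed p: sigma^p = mean(x_i^p)."""
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)

def hep_mle(xs, lo=0.1, hi=20.0, tol=1e-8):
    """Maximize the profile log-likelihood over p by golden-section search."""
    def profile(p):
        return hep_loglik(xs, sigma_given_p(xs, p), p)

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile(c)
        else:                  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile(d)
    p_hat = 0.5 * (a + b)
    return sigma_given_p(xs, p_hat), p_hat
```

If the search ends at an endpoint of the interval, the bracket should be widened before the result is trusted.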
For asymptotic inference of θ=(σ,p), we need the Fisher information matrix I(θ). It is known that its inverse is the asymptotic variance matrix of the maximum likelihood estimators. For the case of a single observation (n=1), we take the second-order derivatives of the log-likelihood function in (15).
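Consider
\[
\begin{aligned}
l_{\sigma\sigma} &= \frac{1}{\sigma^{2}} - \frac{p+1}{\sigma^{p+2}}\,x^{p}, \qquad
l_{\sigma p} = \frac{1}{\sigma^{p+1}}\,x^{p}\,(\log x - \log\sigma), \\
l_{pp} &= -\frac{1}{p^{4}\sigma^{p}}\Big[ -3p\sigma^{p} + p^{2}\sigma^{p} + 2px^{p} + 2p\sigma^{p}\log p + 2p^{2}x^{p}\log\sigma + p^{3}x^{p}[\log\sigma]^{2} \\
&\qquad\qquad - 2p^{2}x^{p}\log x - 2p^{3}x^{p}\log\sigma\,\log x + p^{3}x^{p}[\log x]^{2} + 2p\sigma^{p}\psi(1/p) + \sigma^{p}\psi'(1/p) \Big].
\end{aligned}
\tag{17}
\]
Using the facts
\[
\begin{aligned}
\mathbb{E}\,x^{p} &= \sigma^{p}, \qquad
\mathbb{E}(x^{p}\log x) = \frac{\sigma^{p}\left[p\log\sigma + \log p + \psi(1+1/p)\right]}{p}, \\
\mathbb{E}(x^{p}[\log x]^{2}) &= \frac{\sigma^{p}\left[\left(p\log\sigma + \log p + \psi(1+1/p)\right)^{2} + \psi'(1+1/p)\right]}{p^{2}},
\end{aligned}
\tag{18}
\]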
we can obtain the elements of the Fisher information matrix:
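\[
\begin{aligned}
I_{11} &= -\mathbb{E}\,l_{\sigma\sigma} = \frac{p}{\sigma^{2}}, \qquad
I_{12} = I_{21} = -\mathbb{E}\,l_{\sigma p} = -\frac{\log p + \psi(1+1/p)}{\sigma p}, \\
I_{22} &= -\mathbb{E}\,l_{pp} = \frac{-p - p^{2} + p\left[\log p + \psi(1+1/p)\right]^{2} + p\,\psi'(1+1/p) + \psi'(1/p)}{p^{4}}.
\end{aligned}
\tag{19}
\]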
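To illustrate how the Fisher information yields approximate standard errors, the sketch below evaluates the entries in (19) and inverts the 2×2 matrix. The standard errors depend on the off-diagonal entry only through its square, so only that square is computed. Python's standard library has no digamma or trigamma function, so the two helpers here are our own approximations (upward recurrence followed by an asymptotic series); all names are hypothetical and the code is not taken from the paper.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x up to >= 6, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """psi'(x) for x > 0: shift x up to >= 6, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    tail = (1.0 / x) * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return acc + 1.0 / x + 0.5 * inv2 + tail

def hep_asymptotic_se(sigma, p, n):
    """Approximate standard errors of (sigma_hat, p_hat) from I(theta)^{-1} / n."""
    a = math.log(p) + digamma(1 + 1 / p)
    i11 = p / sigma ** 2
    i12_sq = (a / (sigma * p)) ** 2            # square of the off-diagonal entry
    i22 = (-p - p * p + p * a * a + p * trigamma(1 + 1 / p) + trigamma(1 / p)) / p ** 4
    det = i11 * i22 - i12_sq
    return math.sqrt(i22 / (n * det)), math.sqrt(i11 / (n * det))
```

For example, `hep_asymptotic_se(1.0, 2.0, 100)` gives roughly 0.108 for σ and 0.446 for p under this large-sample approximation.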
Proposition 7.
Let X1,X2,…,Xn be a random sample of size n from the distribution HEP(σ,p), let θ=(σ,p), and let $\hat{\theta}$ be the maximum likelihood estimator of θ. Then
\[ \sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N_2\!\left(0,\; I(\theta)^{-1}\right). \tag{20} \]
4. Assessment of Model Adequacy
In this section, we introduce a useful tool, a half normal plot with a simulated envelope which will be used to evaluate the HEP model in Section 6. The advantage of this technique is its ease of interpretation without knowing the distribution of the residuals.
Atkinson [6] proposed this diagnostic plot to detect potential outliers and influential observations in linear regression models. A simulated envelope is added to the plot to aid overall assessment, whereby the observed residuals are expected to lie within the boundary of the envelope if the presumed model has been correctly specified.
The method of simulated envelopes and related transformations have been widely applied (see, e.g., Flack and Flores [7], Ferrari and Cribari-Neto [8], and da Silva Ferreira et al. [9]). The simulated envelope technique compares the observed statistics with those of data generated from the proposed model. Any sizable departure of the observed residuals from the simulated quantities may be regarded as evidence against the adequacy of the proposed model. The procedure to produce the half normal plot with simulated envelopes is as follows.
(i) Fit the model to the observed data (sample size = n).
(ii) Generate a sample of n observations based on the fitted model.
(iii) Fit the model to the generated sample and compute the ordered absolute values of the standard residuals.
(iv) Repeat steps (ii) and (iii) k times.
(v) Consider the n sets of k order statistics; calculate the average, minimum, and maximum values across each set.
(vi) Plot these values together with the ordered residuals from the original data against the half normal scores $\Phi^{-1}\!\left((i + n - 1/8)/(2n + 1/2)\right)$, i=1,…,n.
The minimum and maximum values of the k order statistics constitute a simulated envelope to guide assessment of model adequacy. Atkinson [6] suggested using k=19, since there is then roughly a 1 in 20 (5%) chance that the largest observed residual falls outside the simulated envelope when the model is correct. Moreover, other types of residuals, such as deviance or score residuals, may be used in the procedure. For example, da Silva Ferreira et al. [9] used the Mahalanobis distance to assess their models. The horizontal axis can also show other variables, such as the observation index.
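The procedure above can be sketched generically in Python. Everything model-specific is passed in as a function, because the text does not fix the definition of the residuals; the half normal example at the end, including the choice of x/σ̂ as the residual, is purely a hypothetical illustration, and all names are ours.

```python
import math
import random
from statistics import NormalDist

def half_normal_scores(n):
    """Half normal scores Phi^{-1}((i + n - 1/8) / (2n + 1/2)), i = 1..n."""
    inv = NormalDist().inv_cdf
    return [inv((i + n - 0.125) / (2 * n + 0.5)) for i in range(1, n + 1)]

def simulated_envelope(data, fit, simulate, residuals, k=19, seed=0):
    """Return (scores, observed, lower, mean, upper) for the envelope plot."""
    rng = random.Random(seed)
    n = len(data)
    theta = fit(data)                                    # fit to observed data
    observed = sorted(abs(r) for r in residuals(data, theta))
    sims = []
    for _ in range(k):                                   # k simulated samples
        fake = simulate(theta, n, rng)                   # generate from fitted model
        refit = fit(fake)                                # refit to simulated data
        sims.append(sorted(abs(r) for r in residuals(fake, refit)))
    columns = list(zip(*sims))                           # n sets of k order statistics
    lower = [min(c) for c in columns]
    mean = [sum(c) / k for c in columns]
    upper = [max(c) for c in columns]
    return half_normal_scores(n), observed, lower, mean, upper

# Hypothetical illustration with a half normal model (assumed residual: x / sigma_hat).
def fit_hn(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def simulate_hn(sigma, n, rng):
    return [abs(rng.gauss(0.0, sigma)) for _ in range(n)]

def residuals_hn(xs, sigma):
    return [x / sigma for x in xs]
```

Plotting `observed`, `lower`, `mean`, and `upper` against the returned scores gives the half normal plot with its envelope; observed points outside the band formed by `lower` and `upper` suggest lack of fit.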
5. Simulation Study
In this section, we conduct some simulations and study the properties of the estimators numerically.
We perform a simulation to illustrate the behavior of the moment and maximum likelihood estimators of the parameters θ=(σ,p). The simulation is conducted in R. We generate 1000 samples of size n=100, n=150, and n=200 from the HEP(σ,p) distribution for fixed parameters σ and p.
The random numbers can be generated as follows. We first generate random numbers Y from an exponential power distribution with location μ=0, scale σ, and shape p; the procedures can be found in Chiodi [10]. We then take the absolute value of the random numbers, X=|Y|. It follows that X~HEP(σ,p).
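An alternative generator, which is not the procedure of Chiodi [10], follows from the distribution function (3): $X^{p}/(p\sigma^{p})$ has a gamma distribution with shape 1/p and unit scale, so a gamma variate T can be transformed as $X = \sigma (pT)^{1/p}$. The Python sketch below uses this swapped-in technique with the standard library's gamma generator; the function name `rhep` is ours.

```python
import random

def rhep(n, sigma, p, rng=None):
    """Draw n variates from HEP(sigma, p) via T ~ Gamma(1/p, 1), X = sigma (pT)^(1/p)."""
    rng = rng or random.Random()
    return [sigma * (p * rng.gammavariate(1.0 / p, 1.0)) ** (1.0 / p)
            for _ in range(n)]
```

Passing a seeded `random.Random` instance makes the draws reproducible, for example `rhep(100, 1.0, 2.0, random.Random(42))`.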
The estimators are computed using the results in Section 3. The empirical means and standard deviations of the moment and maximum likelihood estimators are presented in Tables 1 and 2, respectively. The simulation studies show that the parameters are well estimated and that the estimates are asymptotically unbiased. As expected, the empirical standard deviations decrease as the sample size increases. Further, MLEs are more efficient than moment estimators.
Table 1: Empirical means and SD for the moment estimators of σ and p.

| σ | p | n=100: σ^ (SD) | n=100: p^ (SD) | n=150: σ^ (SD) | n=150: p^ (SD) | n=200: σ^ (SD) | n=200: p^ (SD) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.0116 (0.1274) | 1.0643 (0.1949) | 1.0099 (0.1077) | 1.0450 (0.1675) | 1.0084 (0.0935) | 1.0380 (0.1426) |
| 1 | 2 | 1.0046 (0.1014) | 2.0544 (0.3443) | 0.9989 (0.0816) | 2.0369 (0.3167) | 1.0034 (0.0745) | 2.0484 (0.2869) |
| 1 | 3 | 0.9972 (0.0844) | 3.0454 (0.4233) | 0.9998 (0.0714) | 3.0375 (0.4089) | 1.0044 (0.0640) | 3.0547 (0.3970) |
| 2 | 1 | 2.0365 (0.2499) | 1.0660 (0.1959) | 2.0390 (0.2099) | 1.0559 (0.1635) | 2.0233 (0.1872) | 1.0443 (0.1505) |
| 2 | 2 | 2.0090 (0.1983) | 2.0726 (0.3453) | 2.0111 (0.1710) | 2.0541 (0.3117) | 2.0014 (0.1424) | 2.0372 (0.2814) |
| 2 | 3 | 2.0033 (0.1660) | 3.0516 (0.4338) | 2.0013 (0.1392) | 3.0344 (0.4054) | 2.0116 (0.1275) | 3.0607 (0.3974) |
Table 2: Empirical means and SD for the maximum likelihood estimators of σ and p.

| σ | p | n=100: σ^ (SD) | n=100: p^ (SD) | n=150: σ^ (SD) | n=150: p^ (SD) | n=200: σ^ (SD) | n=200: p^ (SD) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.0119 (0.1272) | 1.0515 (0.2055) | 1.0134 (0.1079) | 1.0397 (0.1695) | 1.0026 (0.0890) | 1.0270 (0.1401) |
| 1 | 2 | 1.0153 (0.1106) | 2.2028 (0.6168) | 1.0048 (0.0883) | 2.0995 (0.4420) | 1.0063 (0.0770) | 2.0876 (0.3644) |
| 1 | 3 | 1.0193 (0.1102) | 3.4735 (1.3164) | 1.0099 (0.0816) | 3.2477 (0.7742) | 1.0068 (0.0736) | 3.1542 (0.6405) |
| 2 | 1 | 2.0202 (0.2631) | 1.0566 (0.2107) | 2.0309 (0.2178) | 1.0409 (0.1697) | 2.0153 (0.1766) | 1.0242 (0.1372) |
| 2 | 2 | 2.0250 (0.2266) | 2.1944 (0.6224) | 2.0136 (0.1798) | 2.1194 (0.4469) | 2.0031 (0.1531) | 2.0695 (0.3449) |
| 2 | 3 | 2.0332 (0.2235) | 3.4523 (1.4561) | 2.0241 (0.1682) | 3.2700 (0.8226) | 2.0218 (0.1432) | 3.2229 (0.7221) |
6. Real Data Illustration
In this section, we fit the proposed model to two real datasets. The applications demonstrate that the HEP model fits the data better than the HN model.
6.1. Application 1
The data are the plasma ferritin concentration measurements of 202 athletes collected at the Australian Institute of Sport. This dataset has been studied by several authors (see Azzalini and Dalla Valle [11], Cook and Weisberg [12], and Elal-Olivero et al. [13]).
The descriptive statistics for the dataset are shown in Table 3, where b1 and b2 are the sample skewness and kurtosis coefficients. Notice that the dataset presents nonnegative measurements.
Table 3: Summary of the plasma ferritin concentration measurements.

| Sample size | Mean | Standard deviation | b1 | b2 |
|---|---|---|---|---|
| 202 | 76.88 | 47.50 | 1.28 | 4.42 |
We fit the dataset with the half normal and the half exponential power distributions, respectively, using the maximum likelihood method. The MLEs are computed using R, and the results are reported in Table 4. The usual Akaike information criterion (AIC) and Bayesian information criterion (BIC), used as measures of goodness of fit, are also computed: AIC=2k-2logL and BIC=klogn-2logL, where k is the number of parameters in the distribution, n is the sample size, and L is the maximized value of the likelihood function. The results indicate that the HEP model has lower AIC and BIC values, and thus it is the better model. Figures 3(a) and 3(b) display the fitted models using the MLEs.
Table 4: Maximum likelihood parameter estimates (with SD) of the HN and HEP models for the plasma ferritin concentration data.

| Model | σ^ | p^ | Log lik. | AIC | BIC |
|---|---|---|---|---|---|
| HN | 76.9436 (3.0588) | - | −1062.037 | 2126.074 | 2129.382 |
| HEP | 97.1311 (6.1496) | 2.5109 (0.3318) | −1054.739 | 2113.478 | 2120.095 |
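The AIC and BIC values in Table 4 follow directly from the reported log-likelihoods, with k=1 for the HN model, k=2 for the HEP model, and n=202. A small Python sketch of that arithmetic (the function names are ours):

```python
import math

def aic(loglik, k):
    """AIC = 2k - 2 log L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """BIC = k log n - 2 log L."""
    return k * math.log(n) - 2 * loglik

# Log-likelihoods reported in Table 4 (n = 202).
hn_aic, hn_bic = aic(-1062.037, 1), bic(-1062.037, 1, 202)
hep_aic, hep_bic = aic(-1054.739, 2), bic(-1054.739, 2, 202)
```

Both criteria penalize the extra shape parameter of the HEP model, yet the HEP values remain smaller than the HN values.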
Figure 3: Models fitted for the plasma ferritin concentration dataset. (a) Histogram and fitted curves. (b) Empirical and fitted CDF.
The diagnostic procedure introduced in Section 4 is implemented for both models. The simulated envelope plots are shown in Figures 4(a) and 4(b). In Figure 4(a), most of the observed residuals are either near or outside the boundary of the envelope, indicating inadequacy of the fitted HN model. On the other hand, the observed residuals corresponding to the HEP model in Figure 4(b) are well within the simulated envelope, indicating that the HEP model provides a better fit to the data.
Figure 4: Simulated envelopes for the HN and HEP models. (a) Half normal. (b) Half exponential power.
6.2. Application 2
We consider a stress-rupture dataset: the fatigue fracture life of Kevlar 49/epoxy subjected to sustained pressure at the 90% stress level. The dataset has been previously studied by Andrews and Herzberg [14], Barlow et al. [15], and Olmos et al. [16].
Table 5 summarizes the dataset, which is also nonnegative and asymmetric. As before, we fit the dataset with the half normal and the half exponential power distributions, respectively, using the maximum likelihood method. The results are reported in Table 6. The AIC and BIC are presented as well, and they show that the HEP model fits better. Figures 5(a) and 5(b) display the fitted models using the MLEs.
Table 5: Summary of the life of fatigue fracture.

| Sample size | Mean | Standard deviation | b1 | b2 |
|---|---|---|---|---|
| 101 | 1.025 | 1.119 | 3.001 | 16.709 |
Table 6: Maximum likelihood parameter estimates (with SD) of the HN and HEP models for the life of fatigue fracture data.

| Model | σ^ | p^ | Log lik. | AIC | BIC |
|---|---|---|---|---|---|
| HN | 1.5135 (0.1064) | - | −115.1666 | 232.3332 | 234.9483 |
| HEP | 0.9689 (0.1298) | 0.8815 (0.1677) | −103.2537 | 210.5074 | 215.7376 |
Figure 5: Models fitted for the life of fatigue fracture dataset. (a) Histogram and fitted curves. (b) Empirical and fitted CDF.
The diagnostic procedure introduced in Section 4 is implemented for both models. The simulated envelope plots are shown in Figures 6(a) and 6(b). The observed residuals corresponding to the HEP model in Figure 6(b) are well within the simulated envelope, indicating that the HEP model provides a better fit to the data.
Figure 6: Simulated envelopes for the HN and HEP models. (a) Half normal. (b) Half exponential power.
7. Concluding Remarks
In this paper, we have studied the half exponential power distribution HEP(σ,p) in detail. This nonnegative distribution contains the half normal distribution as a special case. Probabilistic and inferential properties are studied. A simulation study demonstrates the good performance of the moment and maximum likelihood estimators. We apply the model to two real datasets, illustrating that the proposed model is appropriate and flexible in real applications. There are a number of possible extensions of the current work. Mixture modeling using the proposed distribution is the most natural extension. Other extensions include a generalization of the distribution to multivariate settings.
Appendix
Proofs of Propositions
Proof of Proposition 5.
This result follows directly by using standard large sample theory for moment estimators, as discussed in Sen and Singer [17].
Proof of Proposition 7.
It follows directly by using the large sample theory for maximum likelihood estimators and the Fisher information matrix given above.
References
[1] Nadarajah S. A generalized normal distribution. 2005;32(7):685-694. doi:10.1080/02664760500079464
[2] Box G., Tiao G. A further look at robustness via Bayes's theorem. 1962;49(3-4):419-432.
[3] Genç A. I. A generalization of the univariate slash by a scale-mixtured exponential power distribution. 2007;36(5):937-947. doi:10.1080/03610910701539161
[4] Goodman I. R., Kotz S. Multivariate θ-generalized normal distributions. 1973;3(2):204-219.
[5] Tiao G., Lund D. The use of OLUMV estimators in inference robustness studies of the location parameter of a class of symmetric distributions. 1970;65:370-386.
[6] Atkinson A. 1985. Clarendon Press, Oxford.
[7] Flack V. F., Flores R. A. Using simulated envelopes in the evaluation of normal probability plots of regression residuals. 1989;31(2):219-225.
[8] Ferrari S. L. P., Cribari-Neto F. Beta regression for modelling rates and proportions. 2004;31(7):799-815. doi:10.1080/0266476042000214501
[9] da Silva Ferreira C., Bolfarine H., Lachos V. H. Skew scale mixtures of normal distributions: properties and estimation. 2011;8(2):154-171. doi:10.1016/j.stamet.2010.09.001
[10] Chiodi M. Procedures for generating pseudo-random numbers from a normal distribution of order p (p>1). 1986;1:7-26.
[11] Azzalini A., Dalla Valle A. The multivariate skew-normal distribution. 1996;83(4):715-726.
[12] Cook R., Weisberg S. An introduction to regression graphics. 1994;17, article 640.
[13] Elal-Olivero D., Olivares-Pacheco J. F., Gómez H. W., Bolfarine H. A new class of non negative distributions generated by symmetric distributions. 2009;38(7):993-1008. doi:10.1080/03610920802361381
[14] Andrews D., Herzberg A. 1985;18. Springer, New York, NY, USA.
[15] Barlow R., Toland R., Freeman T., Clarotti C. A., Lindley D. V. A Bayesian analysis of the stress-rupture life of Kevlar/epoxy spherical pressure vessels. 1988.
[16] Olmos N. M., Varela H., Gómez H. W., Bolfarine H. An extension of the half-normal distribution. 2011:1-12. doi:10.1007/s00362-011-0391-4
[17] Sen P., Singer J. M. 1993. Chapman and Hall/CRC.