This paper derives closed-form solutions for the g-and-h shape parameters associated with the Tukey family of distributions based on the method of percentiles (MOP). A proposed MOP univariate procedure is described and compared with the method of moments (MOM) in the context of distribution fitting and estimating skew and kurtosis functions. The MOP methodology is also extended from univariate to multivariate data generation. A procedure is described for simulating nonnormal distributions with specified Spearman correlations. The MOP procedure has an advantage over the MOM because it does not require numerical integration to compute intermediate correlations. Simulation results demonstrate that the proposed MOP procedure is superior to the MOM in terms of distribution fitting, estimation, relative bias, and relative error.
1. Introduction
The Tukey g-and-h families of univariate and multivariate nonnormal distributions are commonly used for distribution fitting, modeling events, random variable generation, and other applied mathematical contexts such as operational risk, extreme oceanic wind speeds, common stock returns, and solar flare data. See [1–17].
The family of univariate g-and-h distributions can be summarized as follows:
(1) Y = q(Z) = qg,h(Z) = g^(-1)(exp{gZ} - 1)exp{hZ^2/2},
(2) Y = q(Z) = qg,0(Z) = lim_{h→0} qg,h(Z) = g^(-1)(exp{gZ} - 1),
(3) Y = q(Z) = q0,h(Z) = lim_{g→0} qg,h(Z) = Z exp{hZ^2/2},
where Z is a standard normal random variable with probability density function (pdf) ϕ(z) and cumulative distribution function (cdf) Φ(z). The transformations in (1)–(3) are strictly monotone increasing functions with real-valued constants g and h that produce distributions defined as (i) asymmetric g-and-h (g≠0, h>0), (ii) log-normal (g≠0, h=0), and (iii) symmetric h (h≥0), respectively. The constant g controls the skew of a distribution in terms of both direction and magnitude. Taking the negative of g changes the direction of the skew but not its magnitude; that is, q-g,h(Z) = -qg,h(-Z). The constant h controls the tail-weight of a distribution, where the function exp{hZ^2/2} (i) preserves symmetry, (ii) is increasing for Z≥0 and h≥0, and (iii) produces increased tail-weight as the value of h becomes larger. In summary, (1)–(3) are computationally efficient for the purpose of generating nonnormal distributions because they only require the specification of one or two shape parameters (g, h) and an algorithm that produces standard normal random deviates.
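The transformations in (1)–(3) can be sketched in a few lines. The following Python snippet is an illustration (not the authors' code; the function name q_gh is ours) of qg,h together with its g → 0 limit, applied to standard normal deviates:

```python
import numpy as np

def q_gh(z, g, h):
    """Tukey g-and-h quantile transformation of standard normal z.
    Handles the g -> 0 (symmetric h) limit of Eq. (3); h = 0 gives Eq. (2)."""
    z = np.asarray(z, dtype=float)
    if g == 0.0:
        return z * np.exp(h * z**2 / 2.0)                       # Eq. (3)
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)   # Eqs. (1)-(2)

# skewed, heavy-tailed nonnormal deviates from standard normal input
rng = np.random.default_rng(1)
y = q_gh(rng.standard_normal(100_000), g=0.5, h=0.1)
```

The sign-reversal identity q-g,h(Z) = -qg,h(-Z) and the strict monotonicity noted above are easy to verify numerically with this function.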
The values of g and h associated with (1)–(3) can be determined from either the method of moments (MOM), for example, [8, 10, 13], or the method of percentiles (MOP), for example, [9, 18]. However, these two methods have disadvantages. Specifically, in the context of the MOM, the estimates of conventional skew and kurtosis associated with heavy-tailed or heavy-skewed distributions can be substantially biased, have high variance, or be influenced by outliers; see, for example, [19–24]. In terms of the MOP, the primary disadvantage associated with the g-and-h procedure is the laborious “multiplicative iterative” approach needed to determine the h constant for symmetric and asymmetric nonnormal g-and-h distributions; see [9, pp. 484–489].
On the other hand, the MOP approach described by Karian and Dudewicz [24–27] in the context of the generalized lambda distribution (GLD) has been demonstrated to be an attractive and computationally efficient alternative to the MOM in terms of distribution fitting and computing the GLD shape parameters. Further, it has been demonstrated that the MOP is superior to the MOM over a broad range of combinations of skew and kurtosis for fitting GLDs to theoretical or empirical distributions (see [24]).
A procedure for simulating correlated g-and-h distributions with a specified correlation structure based on (1)–(3) is described by Kowalchuk and Headrick [10] and Headrick [28]. Specifically, the g-and-h procedure makes use of the popular NORTA (NORmal To Anything) approach. That is, the procedure begins with generating multivariate standard normal deviates prior to transformation. However, one limitation arises because the Pearson correlation is not invariant under nonlinear strictly increasing transformations. This is a concern because the transformations in (1)–(3) have this characteristic, for example, [8, 28]. Thus, the initial multivariate normal correlation structure used in the NORTA approach will not be maintained subsequent to any of the transformations in (1)–(3). As such, the NORTA procedure must begin with the computation of an intermediate correlation (IC) matrix, which is different from the specified correlation matrix between the nonnormal distributions. The purpose of the IC matrix is to adjust for the effect of the nonnormalizing transformation such that the g-and-h procedure can generate nonnormal distributions with the specified correlation matrix. Further, in the context of the g-and-h procedure, NORTA also requires numerical integration techniques to solve for the IC matrix, see [28, p. 143], which can be computationally expensive.
In contradistinction, the Spearman correlation has comparative advantages over the Pearson correlation in the context of the g-and-h procedure for computing ICs. Specifically, the transformations in (1)–(3) must be strictly monotone increasing functions to produce valid pdfs. Thus, subsequent to transformation, the rank order of Y, R(Y), remains the same as the rank order of Z, R(Z); that is, the Spearman correlation remains unchanged. Moreover, there is a straightforward equation that can be used to directly solve for all pairwise ICs, for example, [28, p. 114], and thus the numerical integration techniques associated with the conventional product-moment (MOM) g-and-h procedure are not required.
In view of the above, the present aim is to obviate the problems associated with the MOM in the context of the family of g-and-h distributions in (1)–(3) by characterizing these distributions through the MOP as described in Karian and Dudewicz [27, pp. 172-173]. Specifically, the purpose of this paper is to obviate the laborious “multiplicative iterative” technique given in [9] and develop the methodology and a procedure for simulating g-and-h distributions with specified medians (γ1), interdecile ranges (γ2), left-right tail-weight ratios (γ3, a skew function), and tail-weight factors (γ4, a kurtosis function). In terms of simulating multivariate g-and-h distributions, the Spearman correlation will be used in lieu of the Pearson correlation. In summary, the advantages of the proposed MOP procedure are that (i) the MOP parameters (γ3, γ4) exist for any distribution, whether the mean exists or not, for example, [25]; (ii) there is less relative bias and smaller relative standard error when juxtaposed with the MOM procedure; (iii) there are closed-form solutions for the g and h constants; (iv) there is a straightforward equation to obtain pairwise ICs for the purpose of simulating correlated nonnormal g-and-h distributions in (1)–(3).
The remainder of the paper is outlined as follows. In Section 2, a summary of the univariate g-and-h distributions based on the MOM is provided. In Section 3, the requisite information associated with the MOP is provided. Further, the systems of equations for determining the closed-form solutions of the g and h constants are subsequently derived for simulating univariate nonnormal distributions with specified values of γ1–γ4. In Section 4, a comparison of the MOM and the MOP is provided by fitting the SPSS data from [29]. In Section 5, the methodology for simulating correlated nonnormal g-and-h distributions with specified Spearman correlation matrices is provided. In Section 6, the steps for implementing the proposed MOP procedure are described. A numerical example and results of a simulation are also provided to confirm the derivations and compare the proposed procedure with the MOM procedure. In Section 7, the results of the simulation are discussed.
2. Preliminaries for the Tukey Family of g-and-h Distributions
2.1. The Tukey Family of g-and-h Distributions Based on the MOM
The requirement that q(Z) in (1)–(3) be a strictly monotone increasing function implies that an inverse function (q^(-1)) exists and thus Fq(z)(z) = Φ(z), where Fq(z)(z) is the general form of the cdf for both the MOM and the MOP. Differentiating both sides with respect to q(z) yields dFq(z)(z)/dq(z) = fq(z)(z), where fq(z)(z) is the general form of the pdf for both the MOM and the MOP. Hence, dFq(z)(z)/dq(z) = (dFq(z)(z)/dz)/(dq(z)/dz) = ϕ(z)/q′(z). The pdf therefore integrates to one because ϕ(z) is the standard normal pdf, and it is nonnegative on its support q(z) for z∈(-∞,+∞) given that h≥0, where lim_{z→±∞} ϕ(z)/q′(z) = 0 for the transformations in (1)–(3).
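The parametric form of the pdf can be checked numerically: evaluating the pairs (q(z), ϕ(z)/q′(z)) on a fine z grid and integrating over the support should return one. A minimal sketch (assuming NumPy/SciPy; the function names and the values g = 0.5, h = 0.1 are illustrative):

```python
import numpy as np
from scipy.stats import norm

def q_gh(z, g, h):
    # quantile function (1)
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

def q_gh_prime(z, g, h):
    # derivative of (1): e^{h z^2/2} (e^{g z} + h z (e^{g z} - 1)/g), positive for h >= 0
    return np.exp(h * z**2 / 2.0) * (np.exp(g * z) + h * z * (np.exp(g * z) - 1.0) / g)

g, h = 0.5, 0.1
z = np.linspace(-8, 8, 20001)
y = q_gh(z, g, h)                        # support points q(z)
f = norm.pdf(z) / q_gh_prime(z, g, h)    # pdf values phi(z)/q'(z)
area = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y))   # trapezoid rule, ~1
```

The truncation at |z| = 8 discards only a negligible amount of probability, so the computed area is one to several decimal places.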
The g and/or h constants associated with (1)–(3) that determine the shape of a distribution are computed using a moment-matching approach that involves the conventional measures of the mean (α1), variance (α2), skew (α3), and kurtosis (α4). For the g-and-h distribution in (1), the nth raw moment is (e.g., [8, 28])

μn′ = E[Y^n] = (g^n (1 - nh)^(1/2))^(-1) Σ_{i=0}^{n} (-1)^i C(n, i) exp{(n - i)^2 g^2/(2(1 - nh))},  0 ≤ h < 1/n,

where C(n, i) denotes the binomial coefficient, so that

(4) α1 = μ1′,
(5) α2 = μ2′ - μ1′^2,
(6) α3 = (μ3′ - 3μ1′μ2′ + 2μ1′^3)/α2^(3/2),
(7) α4 = (μ4′ - 4μ1′μ3′ + 6μ1′^2 μ2′ - 3μ1′^4)/α2^2 - 3,

where α4 is scaled such that α4 = 0 for the standard normal distribution. Specifically, the values of g and h in (1) are determined by simultaneously solving (6) and (7) for specified values of skew and kurtosis (e.g., [28, p. 125, Eqs. 5.16-5.17] and [10, (A1)–(A4)]).
The mean and the variance for a g-and-h distribution can be determined by evaluating (4) and (5) using the solutions of g and h obtained from (6) and (7). For any g-and-h distribution, the kth moment will exist if 0≤h<1/k (see [8]). As such, (4)–(7) exist if the first four moments are defined. That is, 0≤h<1/4. Note that the equations of α1–α4 for g distributions in (2) and h distributions in (3) can be obtained from [10, Eqs. (A5)–(A8)] and [10, Eqs. (A9)–(A12)], respectively.
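The quantities in (4)–(7) can be evaluated directly from the raw-moment expression. A minimal sketch (our function names; standard library only); the h = 0 case reduces to an affine function of a lognormal variable, which provides a convenient closed-form check:

```python
import math

def raw_moment(n, g, h):
    """E[Y^n] for the g-and-h variable in (1); exists only for 0 <= h < 1/n."""
    assert 0.0 <= h < 1.0 / n, "the nth moment requires h < 1/n"
    s = sum((-1)**i * math.comb(n, i)
            * math.exp((n - i)**2 * g**2 / (2.0 * (1.0 - n * h)))
            for i in range(n + 1))
    return s / (g**n * math.sqrt(1.0 - n * h))

def mom_parameters(g, h):
    """Mean, variance, skew, and (excess) kurtosis, Eqs. (4)-(7)."""
    m1, m2, m3, m4 = (raw_moment(n, g, h) for n in (1, 2, 3, 4))
    a1 = m1
    a2 = m2 - m1**2
    a3 = (m3 - 3*m1*m2 + 2*m1**3) / a2**1.5
    a4 = (m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4) / a2**2 - 3.0
    return a1, a2, a3, a4
```

Solving (6) and (7) for g and h given target values of α3 and α4 would, in contrast, require a bivariate numerical root-finder; there is no closed form on the MOM side.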
3. The Percentile Based g-and-h Family of Distributions
3.1. General Considerations
The percentiles (θp) associated with a conventional g-and-h pdf can be obtained by making use of the standard normal cdf, Φ(z). As such, we will define the following location, scale, and shape parameters as in [27, pp. 172-173]:

(8) γ1 = θ0.50,
    γ2 = θ0.90 - θ0.10,
    γ3 = (θ0.50 - θ0.10)/(θ0.90 - θ0.50),
    γ4 = (θ0.75 - θ0.25)/γ2,
where (8) are the (i) median, (ii) interdecile range, (iii) left-right tail-weight ratio (a skew function), and (iv) tail-weight factor (a kurtosis function), respectively. The parameters in (8) are defined to have the restrictions
(9) -∞ < γ1 < +∞,  γ2 ≥ 0,  γ3 ≥ 0,  0 ≤ γ4 ≤ 1,
and where a symmetric distribution implies that γ3=1.
3.2. The Tukey Family of g-and-h Distributions Based on the MOP
The derivation of a percentile based system of g-and-h pdfs begins by substituting the standard normal distribution percentiles (zp) into the quantile functions in (1)–(3) and making use of (8), which gives

(10) γ1 = q(z0.50),
     γ2 = q(z0.90) - q(z0.10),
     γ3 = (q(z0.50) - q(z0.10))/(q(z0.90) - q(z0.50)),
     γ4 = (q(z0.75) - q(z0.25))/γ2,
where z0.50=0, z0.90=1.281⋯, z0.75=0.6744⋯ from the standard normal distribution. Note from symmetry that z0.10=-z0.90 and z0.25=-z0.75. The explicit forms of (10) are
(11) γ1 = 0,
(12) γ2 = (2/g) exp{h z0.90^2/2} sinh(g z0.90),
(13) γ3 = exp{-g z0.90},
(14) γ4 = exp{h(z0.75^2 - z0.90^2)/2} sinh(g z0.75)/sinh(g z0.90).
Simultaneously solving (13) and (14) for g and h gives the convenient closed-form expressions
(15) g = -ln(γ3)/z0.90,
     h = 2 ln{[γ3^(1 - z0.75/z0.90)(γ3^(2z0.75/z0.90) - 1)]/[(γ3^2 - 1)γ4]}/(z0.90^2 - z0.75^2).
For a symmetric distribution (g=0), the closed-form solution for h can be expressed as
(16) h = 2 ln(z0.75/(γ4 z0.90))/(z0.90^2 - z0.75^2).
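The closed-form solutions (15)-(16) and the forward map (13)-(14) invert each other exactly, which is easy to confirm numerically. A sketch (SciPy assumed for the normal percentiles; function names are ours, and (14) is written in its equivalent sinh form):

```python
import math
from scipy.stats import norm

z90 = norm.ppf(0.90)   # z_{0.90} = 1.2816...
z75 = norm.ppf(0.75)   # z_{0.75} = 0.6745...

def solve_g_h(gamma3, gamma4):
    """Closed-form MOP solutions, Eqs. (15)-(16)."""
    if gamma3 == 1.0:                                    # symmetric case, Eq. (16)
        g = 0.0
        h = 2.0 * math.log(z75 / (gamma4 * z90)) / (z90**2 - z75**2)
    else:                                                # Eq. (15)
        g = -math.log(gamma3) / z90
        num = gamma3**(1.0 - z75/z90) * (gamma3**(2.0*z75/z90) - 1.0)
        h = 2.0 * math.log(num / ((gamma3**2 - 1.0) * gamma4)) / (z90**2 - z75**2)
    return g, h

def gamma34_from_gh(g, h):
    """Forward map, Eqs. (13)-(14), for g != 0."""
    gamma3 = math.exp(-g * z90)
    gamma4 = (math.exp(h * (z75**2 - z90**2) / 2.0)
              * math.sinh(g * z75) / math.sinh(g * z90))
    return gamma3, gamma4
```

For example, solve_g_h(0.387801, 0.440929) recovers the (g, h) pair for the asymmetric values of γ3 and γ4 used for Distribution 3 in Section 6.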
Estimates of γ1–γ4 based on the percentiles in (10) for a sample of size n can be determined by finding the integer values j and j+1 and the corresponding expected values of the order statistics, E[q(Z)j:n] and E[q(Z)j+1:n], using the following equation (see [30, 31]):
(17) E[q(Z)j:n] = (n!/((j - 1)!(n - j)!)) ∫_{-∞}^{+∞} q(z)ϕ(z){Φ(z)}^(j-1){1 - Φ(z)}^(n-j) dz
such that
(18) E[q(Z)j:n] ≤ q(zp) ≤ E[q(Z)j+1:n]
and subsequently solve the equation
(19) q(zp) = (u)E[q(Z)j:n] + (1 - u)E[q(Z)j+1:n]
for 0 ≤ u ≤ 1. Thus, an empirical estimate of q(zp) can then be obtained based on the order statistics of a sample of size n as q(zp) ≃ q(Zp) = (u)q(Z)j:n + (1 - u)q(Z)j+1:n.
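For a quick empirical estimate of γ1–γ4, the expected-order-statistic interpolation in (17)–(19) can be approximated by ordinary interpolated sample percentiles. A sketch (NumPy's linearly interpolated np.quantile is used here as a stand-in for the order-statistic scheme, not as the authors' exact estimator):

```python
import numpy as np

def mop_estimates(y):
    """Sample estimates of (gamma1, ..., gamma4) from Eq. (8)."""
    t10, t25, t50, t75, t90 = np.quantile(y, [0.10, 0.25, 0.50, 0.75, 0.90])
    g1 = t50                      # median
    g2 = t90 - t10                # interdecile range
    g3 = (t50 - t10) / (t90 - t50)  # left-right tail-weight ratio
    g4 = (t75 - t25) / g2           # tail-weight factor
    return g1, g2, g3, g4
```

For standard normal data the estimates should approach γ1 = 0, γ2 = 2(1.2816) = 2.5631, γ3 = 1, and γ4 = 0.6745/1.2816 = 0.526307, the parameter values listed for Distribution 1 in Section 6.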
4. A Comparison of the MOM and the MOP on Distribution Fitting
Presented in Figure 1 are the MOM and the MOP pdfs superimposed on the histogram of the SPSS customer_dbase data from [29]. This data file concerns a company’s efforts to use the information in its data warehouse to make special offers to the customers who are most likely to reply. Specifically, these data are the amount each customer spent on their primary credit card in the last month. The parameters (α3,4, γ3,4) associated with Figure 1 were based on a sample size of n = 5,000 customers. The bootstrap estimates, confidence intervals, and relative standard errors of (α̂3,4, γ̂3,4) are also provided in Figure 1. Note that to fit the g-and-h distributions to the data, a linear transformation of the form Aq(z) + B has to be imposed on q(z) for both the MOM and the MOP procedures. Specifically, in the context of the MOM, A = s/σ and B = m - Aμ. In the context of the MOP, A = IDR/γ2 and B = med. The values of the sample and the theoretical means (m, μ), standard deviations (s, σ), medians (med, γ1), and interdecile ranges (IDR, γ2) for the data and the g-and-h pdfs are given in Figure 1.
Histograms and estimates based on the MOM and the MOP for the SPSS credit card customer_dbase data from [29].
Visual inspection of the approximation in Figure 1 indicates that the MOP pdf provides a more accurate fit to the actual data than the MOM pdf. Further, γ3,4 have more precision than α3,4 because the relative standard errors (RSEs) of γ̂3,4 are much smaller than those of α̂3,4. To compare the accuracy of the data fitting for the MOP and the MOM, the Euclidean distances are calculated as d = (Σ(O - E)^2)^(1/2), where O is the observed proportion and E is the expected proportion in each interval for both the MOP and the MOM. From Table 1, the MOP has the more accurate data fit because the Euclidean distance (d) of the MOP is less than that of the MOM.
Percentiles, expected proportions, observed proportions, and the Euclidean distances (d) for the MOP and the MOM approximations associated with the SPSS customer_dbase data (n = 5,000) from [29].
Percentile    Expected prop.    Obs. prop. (MOM)    Obs. prop. (MOP)
10            0.10              0.1240              0.1000
25            0.15              0.1208              0.1658
50            0.25              0.2392              0.2340
75            0.25              0.2564              0.2354
90            0.15              0.1632              0.1646
                                d = 0.04196         d = 0.03053
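The distances reported in Table 1 can be reproduced directly from the tabled proportions; a minimal check (values transcribed from Table 1):

```python
import math

expected = [0.10, 0.15, 0.25, 0.25, 0.15]
obs_mom = [0.1240, 0.1208, 0.2392, 0.2564, 0.1632]
obs_mop = [0.1000, 0.1658, 0.2340, 0.2354, 0.1646]

def euclid_distance(obs, exp_props):
    # d = sqrt(sum_i (O_i - E_i)^2) over the five percentile intervals
    return math.sqrt(sum((o - e)**2 for o, e in zip(obs, exp_props)))

d_mom = euclid_distance(obs_mom, expected)   # ~0.04196
d_mop = euclid_distance(obs_mop, expected)   # ~0.03053
```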
5. The Spearman Correlations for the System of the MOP
We assume that the variates Yi = q(Zi) and Yj = q(Zj) in (1)–(3) produce valid pdfs and are thus increasing monotonic transformations in Zi and Zj. This implies that the rank order of Yi, R(Yi), is identical to the rank order of Zi, R(Zi), and likewise for Yj and Zj; thus, the rank correlations are ρR(Yi),R(Zi) = ρR(Yj),R(Zj) = 1.
Given these assumptions, suppose it is desired to simulate a T-variate distribution from the quantile functions in (1)–(3) with a specified T×T Spearman correlation matrix and where each distribution has specified γ3 and γ4. Let Zi and Zj have univariate normal pdfs ϕ(zi) and ϕ(zj), the Pearson correlation ρzi,zj, and standard normal bivariate density of
(20) fij := fZiZj(zi, zj, ρzi,zj) = (2π(1 - ρzi,zj^2)^(1/2))^(-1) exp{-(2(1 - ρzi,zj^2))^(-1)(zi^2 - 2ρzi,zj zi zj + zj^2)}.
The correlation ρR(Yi),R(Yj) can be obtained from the derivation of ρR(Zi),R(Zj) given in [32]. That is, because (1)–(3) are strictly increasing monotonic transformations, ρR(Yi),R(Yj)=ρR(Zi),R(Zj) and thus we have from Headrick [28, Eq. 4.34]
(21) ρR(Yi),R(Yj) = (6/π)[((n - 2)/(n + 1)) sin^(-1)(ρzi,zj/2) + (1/(n + 1)) sin^(-1)(ρzi,zj)],
where ρR(Yi),R(Yj) is the specified Spearman correlation and ρzi,zj is the intermediate correlation (IC). For a specified value of ρR(Yi),R(Yj) and a finite sample size n, the value of the IC, ρzi,zj, can be obtained by numerically solving (21).
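Because (21) is a scalar equation that is monotone in ρzi,zj, a bracketed root-finder suffices for the inversion. A sketch (SciPy's brentq assumed; function names are ours):

```python
import math
from scipy.optimize import brentq

def spearman_from_ic(rho_z, n):
    """Right-hand side of Eq. (21): the Spearman correlation induced by the
    intermediate (bivariate normal) correlation rho_z at sample size n."""
    return (6.0 / math.pi) * (((n - 2.0) / (n + 1.0)) * math.asin(rho_z / 2.0)
                              + (1.0 / (n + 1.0)) * math.asin(rho_z))

def ic_for_spearman(rho_s, n):
    """Numerically invert Eq. (21) for the intermediate correlation."""
    return brentq(lambda r: spearman_from_ic(r, n) - rho_s, -0.999999, 0.999999)
```

For example, ic_for_spearman(0.40, 25) reproduces the entry 0.431321 in Table 4, and ic_for_spearman(0.40, 750) reproduces 0.416344 in Table 5.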
6. The Procedure for Simulation and Monte Carlo Study
To implement the method for simulating g-and-h, g, and h distributions with specified values of γ3, γ4 and specified Spearman correlations, we suggest the following steps.

Step 1. Specify the values of γ3 and γ4 for the T transformations of the forms in (1)–(3), that is, q1(Z1),…,qT(ZT), and obtain the constants g and/or h by solving (15)-(16) using the specified values of γ3 and γ4 for each distribution. Specify a T×T matrix of Spearman correlations between qj(Zj) and qk(Zk), where j < k ∈ {1, 2,…,T}.

Step 2. Compute the ICs ρzi,zj by substituting the solutions of the constants from Step 1 into (21) and then solving for ρzi,zj. Repeat this step separately for all T(T - 1)/2 pairwise combinations of correlations.

Step 3. Assemble the ICs into a T×T matrix and decompose this matrix using a Cholesky factorization. Note that this step requires the IC matrix to be positive definite.

Step 4. Use the results of the Cholesky factorization from Step 3 to generate T standard normal variables (Z1,…,ZT) correlated at the intermediate levels as follows:

(22) Z1 = a11V1
     Z2 = a12V1 + a22V2
     ⋮
     Zj = a1jV1 + a2jV2 + ⋯ + aijVi + ⋯ + ajjVj
     ⋮
     ZT = a1TV1 + a2TV2 + ⋯ + aiTVi + ⋯ + ajTVj + ⋯ + aTTVT,

where V1,…,VT are independent standard normal random variables and aij represents the element in the ith row and jth column of the matrix associated with the Cholesky factorization performed in Step 3.

Step 5. Substitute Z1,…,ZT from Step 4 into T equations of the form in (1)–(3), as noted in Step 1, to generate the g and/or h distributions with the specified values of γ3 and γ4 and the specified Spearman correlations.
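The steps above can be sketched end to end. The following is an illustrative Python version (not the authors' Fortran implementation; function names are ours) for T nonnormal variates with a specified Spearman matrix:

```python
import numpy as np
from scipy.optimize import brentq

def q_gh(z, g, h):
    # quantile functions (1)-(3), with the g -> 0 limit
    return z*np.exp(h*z**2/2) if g == 0 else (np.exp(g*z) - 1)/g*np.exp(h*z**2/2)

def ic(rho_s, n):
    # Step 2: invert Eq. (21) for the intermediate correlation
    f = lambda r: (6/np.pi)*(((n - 2)/(n + 1))*np.arcsin(r/2)
                             + (1/(n + 1))*np.arcsin(r)) - rho_s
    return brentq(f, -0.999999, 0.999999)

def simulate_gh(spearman, gh_params, n, rng):
    """Steps 1-5: ICs -> Cholesky -> correlated normals -> g-and-h transforms."""
    T = len(gh_params)
    R = np.eye(T)
    for i in range(T):
        for j in range(i + 1, T):
            R[i, j] = R[j, i] = ic(spearman[i][j], n)
    a = np.linalg.cholesky(R)               # Step 3: requires R positive definite
    Z = rng.standard_normal((n, T)) @ a.T   # Step 4: matrix form of Eq. (22)
    return np.column_stack([q_gh(Z[:, t], g, h)
                            for t, (g, h) in enumerate(gh_params)])
```

For example, simulate_gh([[1, 0.6], [0.6, 1]], [(0.5, 0.1), (0.0, 0.1)], 25, np.random.default_rng(0)) returns a 25 × 2 sample of one asymmetric g-and-h and one symmetric h variate whose population Spearman correlation is the specified 0.6.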
To demonstrate the previous steps and evaluate the proposed procedure, a comparison between the MOP and the MOM procedures is subsequently described. Specifically, the distributions in Figure 2 are used as a basis for a comparison using the specified correlation matrix in Table 2. Tables 3, 4, and 5 give the solved IC matrices for the MOM procedure and for the MOP procedure with samples of sizes n = 25 and n = 750, respectively. Note that the ICs for the MOM were computed by using the Mathematica source code as in [10, Table 1], and the ICs for the MOP are based on (21). Tables 6, 7, and 8 give the results of the Cholesky decompositions on the IC matrices, which are then used to create Z1,…,Z4 with the specified ICs by making use of the formulae given in (22) of Step 4 with T = 4. The values of Z1,…,Z4 are subsequently substituted into equations of the form in (1)–(3) to produce q1(Z1),…,qT(ZT) for both procedures.
Specified correlation matrix for the distributions in Figures 2(a)–2(d).
      1       2       3       4
1     1
2     0.40    1
3     0.60    0.50    1
4     0.65    0.70    0.60    1
Intermediate correlations for the MOM procedure.
      1          2          3          4
1     1
2     0.418072   1
3     0.786146   0.646711   1
4     0.906185   0.796341   0.702012   1
Intermediate correlations for the MOP procedure (n=25).
      1          2          3          4
1     1
2     0.431321   1
3     0.638650   0.536062   1
4     0.688961   0.738501   0.638650   1
Intermediate correlations for the MOP procedure (n=750).
      1          2          3          4
1     1
2     0.416344   1
3     0.618734   0.518259   1
4     0.668342   0.717483   0.618734   1
The Cholesky decompositions for the MOM procedure.
a11 = 1    a12 = 0.418072    a13 = 0.786146    a14 = 0.736275
0          a22 = 0.908414    a23 = 0.350111    a24 = 0.537778
0          0                 a33 = 0.509310    a34 = -0.127801
0          0                 0                 a44 = 0.390334
The Cholesky decompositions for the MOP procedure (n=25).
a11 = 1    a12 = 0.431321    a13 = 0.638650    a14 = 0.688961
0          a22 = 0.902198    a23 = 0.288848    a24 = 0.489180
0          0                 a33 = 0.713227    a34 = 0.080404
0          0                 0                 a44 = 0.528745
The Cholesky decompositions for the MOP procedure (n=750).
a11 = 1    a12 = 0.416343    a13 = 0.618734    a14 = 0.668342
0          a22 = 0.909207    a23 = 0.286682    a24 = 0.483084
0          0                 a33 = 0.731424    a34 = 0.091215
0          0                 0                 a44 = 0.558237
Two symmetric h distributions (a)-(b) and two asymmetric g-and-h distributions (c)-(d) with their MOM and MOP parameters of skew (α3) and left-right tail-weight ratio (γ3), kurtosis (α4) and tail-weight factor (γ4), and corresponding shape parameters for (3) and (1).
In terms of the simulation, a Fortran algorithm was written for both methods to generate 25,000 independent sample estimates for the specified parameters of (i) conventional skew (α3), kurtosis (α4), and the Pearson correlation and (ii) left-right tail-weight ratio (γ3), tail-weight factor (γ4), and the Spearman correlation. All estimates were based on sample sizes of n = 25 and n = 750. The formulae used for computing estimates of α3,4 were based on Fisher’s k-statistics, that is, the formulae currently used by most commercial software packages, such as SAS, SPSS, and Minitab, for computing indices of skew and kurtosis (where α3,4 = 0 for the standard normal distribution). The formulae used for computing estimates of γ3,4 were based on (10), where the estimates of the percentiles were based on (17). The estimates for the Pearson correlations were based on the usual formula for the Pearson product-moment correlation statistic, and the estimates for the Spearman correlations were computed based on the usual formula for the Spearman rank correlation statistic. The estimates for the Pearson and Spearman correlations were both transformed using Fisher’s z transformation. Bias-corrected accelerated bootstrapped average (median) estimates, confidence intervals (CIs), and standard errors were subsequently obtained for the estimates associated with the parameters using 10,000 resamples via the commercial software package Spotfire S+ [33]. The bootstrap results for the estimates of the medians and CIs associated with the Pearson and Spearman correlations were transformed back to their original metrics. Further, if a parameter (P) was outside its associated bootstrap CI, then an index of relative bias (RB) was computed for the estimate (E) as RB = ((E - P)/P) × 100. Note that the small amount of bias associated with any bootstrap CI containing a parameter was considered negligible and thus not reported.
The results of the simulation are reported in Tables 9, 10, 11, and 12 and are discussed in the next section.
Skew (α3) and kurtosis (α4) results for the MOM procedure (n=25).
Dist.   Parameter    Estimate   95% bootstrap CI      Standard error   Relative bias %
1       α3 = 0       0.0034     −0.0035, 0.01065      0.0036           —
        α4 = 0       −0.1785    −0.1896, −0.1673      0.0056           −5.93
2       α3 = 0       −0.0009    −0.0158, 0.0142       0.0075           —
        α4 = 25      1.297      1.2668, 1.3246        0.0151           −94.81
3       α3 = 10      1.763      1.7466, 1.7778        0.0082           −82.37
        α4 = 1000    3.471      3.3925, 3.5469        0.0392           −99.65
4       α3 = 3       1.436      1.4227, 1.4487        0.0066           −52.13
        α4 = 21      2.068      2.0184, 2.1203        0.0270           −90.15
Skew (α3) and kurtosis (α4) results for the MOM procedure (n=750).
Dist.   Parameter    Estimate   95% bootstrap CI      Standard error   Relative bias %
1       α3 = 0       −0.0003    −0.0017, 0.0013       0.0007           —
        α4 = 0       −0.0149    −0.0175, −0.0123      0.0013           —
2       α3 = 0       0.0043     −0.0043, 0.0141       0.0045           —
        α4 = 25      6.231      6.1678, 6.2958        0.0313           −75.08
3       α3 = 10      4.325      4.3022, 4.3491        0.0121           −56.75
        α4 = 1000    30.82      30.4145, 31.1905      0.2079           −96.92
4       α3 = 3       2.524      2.5145, 2.5335        0.0051           −15.87
        α4 = 21      10.58      10.4758, 10.6732      0.0506           −49.62
Left-right tail-weight ratio (γ3) and tail-weight factor (γ4) results for the MOP procedure (n=25).
Dist.   Parameter        Estimate   95% bootstrap CI    Standard error   Relative bias %
1       γ3 = 1           0.9978     0.9908, 1.0040      0.0034           —
        γ4 = 0.526307    0.5294     0.5279, 0.5310      0.0008           0.5877
2       γ3 = 1           1.001      0.9940, 1.0087      0.0039           —
        γ4 = 0.469319    0.4762     0.4745, 0.4778      0.0008           1.4662
3       γ3 = 0.387801    0.3995     0.3967, 0.4024      0.0014           3.02
        γ4 = 0.440929    0.4895     0.4876, 0.4913      0.0010           11.06
4       γ3 = 0.432409    0.4452     0.4418, 0.4484      0.0017           2.96
        γ4 = 0.477822    0.4894     0.4876, 0.4913      0.0009           2.42
Left-right tail-weight ratio (γ3) and tail-weight factor (γ4) results for the MOP procedure (n=750).
Dist.   Parameter        Estimate   95% bootstrap CI    Standard error   Relative bias %
1       γ3 = 1           1          0.9992, 1.0014      0.0005           —
        γ4 = 0.526307    0.5264     0.5261, 0.5267      0.0002           —
2       γ3 = 1           0.9994     0.9980, 1.0009      0.0007           —
        γ4 = 0.469319    0.4696     0.4693, 0.4699      0.0002           —
3       γ3 = 0.387802    0.3882     0.3876, 0.3887      0.0003           —
        γ4 = 0.440929    0.4746     0.4742, 0.4749      0.0002           7.64
4       γ3 = 0.432409    0.4326     0.4321, 0.4332      0.0003           —
        γ4 = 0.477822    0.4784     0.4780, 0.4788      0.0002           —
7. Discussion and Conclusion
One of the advantages that the MOP has over the MOM is that the MOP estimates can be much less biased when sampling is from distributions with more severe departures from normality. Inspection of the simulation results in Tables 9 and 12 clearly demonstrates the superiority that the MOP estimates (γ3, γ4) have over their corresponding MOM counterparts (α3, α4). For example, with a sample size of n = 25, the estimates of skew and kurtosis for Distribution 3 (heavy-skewed and heavy-tailed) were, on average, 82.37% and 99.65% below their associated population parameters, whereas the estimates of γ3 and γ4 were only 3.02% and 11.06% above their respective parameters. It is also evident from comparing Tables 9 and 12 that γ3 and γ4 are more efficient estimators, as their relative standard errors, RSE = (standard error/estimate) × 100, are considerably smaller than those of the MOM estimators of skew and kurtosis. For example, with a sample size of n = 25, in terms of Distribution 3, inspection of Tables 9 and 12 indicates RSE measures of RSE(α̂3) = 0.47% and RSE(α̂4) = 1.13% compared with RSE(γ̂3) = 0.36% and RSE(γ̂4) = 0.20%. This demonstrates that γ̂3 and γ̂4 have more precision because they have less variance around the parameters they estimate.
Presented in Tables 13, 14, 15, and 16 are the results associated with the conventional Pearson and the Spearman correlations. Overall inspection of these tables indicates that the Spearman correlation is superior to the Pearson correlation in terms of RB. For example, with samples of size n=25, the RB for the two heavy-tailed distributions (i.e., distributions 3 and 4) was 9.66% for the Pearson correlation compared with only 2.29% for the Spearman correlation. Further, for large sample sizes (n=750), the Spearman correlation bootstrap CIs contained most of the population parameters, whereas the Pearson correlation CIs contained none of the parameters. It is also noted that the variability of the Spearman correlation appears to be more stable than that of the Pearson correlation both within and across the different conditions.
The Pearson correlation (corr.) results for the MOM procedure (n=25).
Corr. parameter   Estimate   95% bootstrap CI    Standard error   Relative bias %
0.40              0.4188     0.4158, 0.4212      0.0016           4.69
0.60              0.7092     0.7077, 0.7108      0.0016           18.19
0.65              0.6879     0.6862, 0.6893      0.0015           5.83
0.50              0.5738     0.5718, 0.5760      0.0016           14.76
0.70              0.7347     0.7331, 0.7364      0.0018           4.96
0.60              0.6580     0.6558, 0.6605      0.0021           9.66
The Pearson correlation (corr.) results for the MOM procedure (n=750).
Corr. parameter   Estimate   95% bootstrap CI    Standard error   Relative bias %
0.40              0.4017     0.4012, 0.4022      0.0003           0.43
0.60              0.6337     0.6330, 0.6342      0.0006           5.61
0.65              0.6544     0.6541, 0.6548      0.0003           0.68
0.50              0.5193     0.5187, 0.5200      0.0005           3.86
0.70              0.7047     0.7042, 0.7051      0.0005           0.67
0.60              0.6173     0.6164, 0.6181      0.0007           2.88
The Spearman correlation (corr.) results for the MOP procedure (n=25).
Corr. parameter   Estimate   95% bootstrap CI    Standard error   Relative bias %
0.40              0.4138     0.4115, 0.4169      0.0017           3.45
0.60              0.6139     0.6115, 0.6162      0.0020           2.31
0.65              0.6660     0.6646, 0.6692      0.0019           2.46
0.50              0.5129     0.5108, 0.5161      0.0017           2.57
0.70              0.7156     0.7131, 0.7169      0.0019           2.23
0.60              0.6137     0.6115, 0.6169      0.0020           2.29
The Spearman correlation (corr.) results for the MOP procedure (n=750).
Corr. parameter   Estimate   95% bootstrap CI    Standard error   Relative bias %
0.40              0.39995    0.3996, 0.4005      0.0003           —
0.60              0.6002     0.5998, 0.6006      0.0003           —
0.65              0.6502     0.6499, 0.6506      0.0003           —
0.50              0.5004     0.4999, 0.5008      0.0003           —
0.70              0.7005     0.7002, 0.7008      0.0003           0.072
0.60              0.6000     0.5997, 0.6004      0.0003           —
In summary, the proposed MOP procedure is an attractive alternative to the traditional MOM procedure. In particular, the MOP procedure has distinct advantages when distributions with large departures from normality are used.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] Algina, J., Keselman, H. J., Penfield, R. D., “Confidence interval coverage for Cohen's effect size statistic,” Educational and Psychological Measurement, vol. 66, no. 6, pp. 945–960, 2006.
[2] Algina, J., Keselman, H. J., Penfield, R. D., “Confidence intervals for an effect size when variances are not equal,” 2006.
[3] Badrinath, S. G., Chatterjee, S., “A data-analytic look at skewness and elongation in common-stock return distributions,” vol. 9, pp. 223–233, 1991.
[4] Dupuis, D. J., Field, C. A., “Large wind speeds: modeling and outlier detection,” vol. 9, pp. 105–121, 2004.
[5] Field, C., Genton, M. G., “The multivariate g-and-h distribution,” Technometrics, vol. 48, no. 1, pp. 104–111, 2006.
[6] Goerg, G. M., Cornell University, 2011.
[7] Guegan, D., Hassani, B., “A modified Panjer algorithm for operational risk capital calculations,” 2009.
[8] Headrick, T. C., Kowalchuk, R. K., Sheng, Y., “Parametric probability densities and distribution functions for Tukey g-and-h transformations and their use for fitting data,” vol. 2, no. 9–12, pp. 449–462, 2008.
[9] Hoaglin, D. C., Mosteller, F., Tukey, J. W. (Eds.), Exploring Data Tables, Trends, and Shapes, Wiley, New York, NY, USA, 1985.
[10] Kowalchuk, R. K., Headrick, T. C., “Simulating multivariate g-and-h distributions,” British Journal of Mathematical and Statistical Psychology, vol. 63, no. 1, pp. 63–74, 2010.
[11] Keselman, H. J., Kowalchuk, R. K., Lix, L. M., “Robust nonorthogonal analyses revisited: an update based on trimmed means,” vol. 63, pp. 145–163, 1998.
[12] Keselman, H. J., Lix, L. M., Kowalchuk, R. K., “Multiple comparison procedures for trimmed means,” vol. 3, pp. 123–141, 1998.
[13] Martinez, J., Iglewicz, B., “Some properties of the Tukey g and h family of distributions,” Communications in Statistics—Theory and Methods, vol. 13, pp. 353–369, 1984.
[14] Morgenthaler, S., Tukey, J. W., “Fitting quantiles: doubling, HR, HQ, and HHH distributions,” Journal of Computational and Graphical Statistics, vol. 9, no. 1, pp. 180–195, 2000.
[15] Tukey, J. W., “Modern techniques in data analysis,” NSF Sponsored Regional Research Conference, Southern Massachusetts University, North Dartmouth, MA, USA, 1977.
[16] Wilcox, R. R., “Comparing the slopes of two independent regression lines when there is complete heteroscedasticity,” vol. 50, pp. 309–317, 1997.
[17] Wilcox, R. R., “Detecting nonlinear associations, plus comments on testing hypotheses about the correlation coefficient,” vol. 26, pp. 73–83, 2001.
[18] Hoaglin, D. C., Wiley, New York, NY, USA, 1983.
[19] Hodis, F. A., Headrick, T. C., Sheng, Y., “Power method distributions through conventional moments and L-moments,” vol. 6, no. 41–44, pp. 2159–2193, 2012.
[20] Headrick, T. C., “A characterization of power method transformations through L-moments,” Journal of Probability and Statistics, vol. 2011, Article ID 497463, 2011.
[21] Headrick, T. C., Pant, M. D., “Characterizing Tukey h and hh-distributions through L-moments and the L-correlation,” vol. 2012, Article ID 980153, 2012.
[22] Headrick, T. C., Pant, M. D., “Simulating non-normal distributions with specified L-moments and L-correlations,” Statistica Neerlandica, vol. 66, no. 4, pp. 422–441, 2012.
[23] Pant, M. D., Headrick, T. C., “A method for simulating Burr Type III and Type XII distributions through L-moments and L-correlations,” vol. 2013, Article ID 191604, 2013.
[24] Karian, Z. A., Dudewicz, E. J., “Comparison of GLD fitting methods: superiority of percentile fits to moments in L2 norm,” vol. 2, pp. 171–187, 2003.
[25] Dudewicz, E. J., Karian, Z. A., “Fitting the generalized lambda distribution (GLD) system by a method of percentiles, II: tables,” vol. 19, 1999.
[26] Karian, Z. A., Dudewicz, E. J., “Fitting the generalized lambda distribution to data: a method based on percentiles,” vol. 28, pp. 793–819, 1999.
[27] Karian, Z. A., Dudewicz, E. J., Handbook of Fitting Statistical Distributions with R, Chapman & Hall/CRC, 2011.
[28] Headrick, T. C., Statistical Simulation: Power Method Polynomials and Other Transformations, Chapman & Hall/CRC, 2010.
[29] IBM Corp., IBM SPSS Statistics for Windows, Version 20.0, IBM Corp., Armonk, NY, USA, 2011.
[30] Headrick, T. C., Pant, M. D., “On the order statistics of standard normal-based power method distributions,” vol. 2012, Article ID 945627, 2012.
[31] Johnson, N. L., Kotz, S., Balakrishnan, N., Continuous Univariate Distributions, John Wiley, New York, NY, USA, 1994.
[32] Moran, P. A. P., “Rank correlation and product-moment correlation,” Biometrika, vol. 35, pp. 203–206, 1948.
[33] TIBCO Software, Spotfire S+, Palo Alto, CA, USA, 2008.