We define a new four-parameter model called the odd log-logistic generalized inverse Gaussian distribution, which extends the generalized inverse Gaussian and inverse Gaussian distributions. We obtain some structural properties of the new distribution. We construct an extended regression model based on this distribution with two systematic structures, which can provide more realistic fits to real data than other special regression models. We adopt the method of maximum likelihood to estimate the model parameters. In addition, various simulations are performed for different parameter settings and sample sizes to check the accuracy of the maximum likelihood estimators. We provide a diagnostic analysis based on case-deletion and quantile residuals. Finally, the potential of the new regression model to predict the price of urban property is illustrated by means of real data.
Conselho Nacional de Desenvolvimento Científico e Tecnológico; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
1. Introduction
The inverse Gaussian (IG) distribution is widely used in several research areas, such as lifetime analysis, reliability, meteorology and hydrology, engineering, and medicine. Some extensions of the IG distribution have appeared in the literature. For example, the generalized inverse Gaussian (GIG) distribution with positive support was introduced by Good [1] in a study of population frequencies. Several papers have investigated the structural properties of the GIG distribution. Sichel [2] used this distribution to construct mixtures of Poisson distributions. Statistical properties and distributional behavior of the GIG distribution were discussed by Jørgensen [3] and Atkinson [4]. Dagpunar [5] provided algorithms for simulating this distribution. Nguyen et al. [6] showed that it has positive skewness. More recently, Madan et al. [7] proved that the Black-Scholes formula in finance can be expressed in terms of the GIG distribution function. Koudou [8] presented a survey of its characterizations, and Lemonte and Cordeiro [9] obtained some mathematical properties of the exponentiated generalized inverse Gaussian (EGIG) distribution.
In this paper, we study a new four-parameter model named the odd log-logistic generalized inverse Gaussian (OLLGIG) distribution which contains as special cases the GIG and IG distributions, among others. Its major advantage is the flexibility in accommodating several forms of the density function, for instance, bimodal and unimodal shapes. It is also suitable for testing goodness-of-fit of some submodels.
Our main objective is to study a new regression model with two systematic structures based on the OLLGIG distribution. We obtain some mathematical properties and discuss maximum likelihood estimation of the parameters. For this model, we present some ways to assess global influence (case-deletion) and, additionally, we develop a residual analysis based on the quantile residual. For different parameter settings and sample sizes, various simulation studies are performed, and the empirical distribution of the quantile residual is displayed and compared with the standard normal distribution. These studies suggest that the empirical distribution of the quantile residual for the OLLGIG regression model with two regression structures shows a high agreement with the standard normal distribution.
This paper is organized as follows. In Section 2, we define the OLLGIG distribution. In Section 3, we obtain some of its structural properties. We define the OLLGIG regression model in Section 4 and evaluate the performance of the maximum likelihood estimators (MLEs) of the model parameters by means of a simulation study. In Section 5, we adopt the case-deletion diagnostic measure and define quantile residuals for the fitted model. Further, we perform various simulations for these residuals. In Section 6, we provide two applications to real data to illustrate the flexibility of the OLLGIG regression model. Finally, some concluding remarks are offered in Section 7.
2. The OLLGIG Distribution
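The GIG distribution [3] has been applied in several areas of statistical research. The cumulative distribution function (cdf) and probability density function (pdf) of the GIG distribution are given by (for $y>0$)
$$G_{\mu,\sigma,\nu}(y)=\int_{0}^{y}\left(\frac{b}{\mu}\right)^{\nu}\frac{t^{\nu-1}}{2K_{\nu}(\sigma^{-2})}\exp\left[-\frac{1}{2\sigma^{2}}\left(\frac{bt}{\mu}+\frac{\mu}{bt}\right)\right]dt \tag{1}$$
and
$$g_{\mu,\sigma,\nu}(y)=C\,y^{\nu-1}\exp\left[-\frac{1}{2\sigma^{2}}\left(\frac{by}{\mu}+\frac{\mu}{by}\right)\right], \tag{2}$$
where $\mu>0$ is the location parameter, $\sigma>0$ is the scale parameter, $\nu\in\mathbb{R}$ is the shape parameter, $K_{\nu}(t)=\frac{1}{2}\int_{0}^{\infty}u^{\nu-1}\exp\left[-\frac{1}{2}t\,(u+u^{-1})\right]du$ is the modified Bessel function of the third kind and index $\nu$, $b=K_{\nu+1}(\sigma^{-2})/K_{\nu}(\sigma^{-2})$, and $C=C(\mu,\sigma,\nu)=(b/\mu)^{\nu}/[2K_{\nu}(\sigma^{-2})]$.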
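We denote by $W\sim\mathrm{GIG}(\mu,\sigma,\nu)$ a random variable having density function (2). The mean and variance of $W$ are
$$E(W)=\mu,\qquad V(W)=\mu^{2}\left[\frac{2\sigma^{2}}{b}(\nu+1)+\frac{1}{b^{2}}-1\right], \tag{3}$$
respectively.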
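The moment generating function (mgf) of $W$ reduces to
$$M(t)=\left(1-\frac{2\mu\sigma^{2}t}{b}\right)^{-\nu/2}\left[K_{\nu}(\sigma^{-2})\right]^{-1}K_{\nu}\!\left[\frac{1}{\sigma^{2}}\left(1-\frac{2\mu\sigma^{2}t}{b}\right)^{1/2}\right]. \tag{4}$$
We use the reparameterized GIG distribution as implemented in the GAMLSS package of the R software. For example, we have $\mathrm{GIG}(\mu,\sigma\mu^{1/2},-0.5)=\mathrm{IG}(\mu,\sigma)$. Other properties of the GIG distribution are investigated by Jørgensen [3].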
The statistical literature is filled with hundreds of continuous univariate distributions. Recently, several methods of introducing one or more parameters to generate new distributions have been proposed. Based on the odd log-logistic generator (OLL-G) [10], we define the OLLGIG cdf, say $F(y)=F(y;\mu,\sigma,\nu,\tau)$, by integrating the log-logistic density function as follows:
$$F(y)=\int_{0}^{G_{\mu,\sigma,\nu}(y)/\bar{G}_{\mu,\sigma,\nu}(y)}\frac{\tau x^{\tau-1}}{(1+x^{\tau})^{2}}\,dx=\frac{G_{\mu,\sigma,\nu}(y)^{\tau}}{G_{\mu,\sigma,\nu}(y)^{\tau}+\bar{G}_{\mu,\sigma,\nu}(y)^{\tau}}, \tag{5}$$
where $\bar{G}_{\mu,\sigma,\nu}(y)=1-G_{\mu,\sigma,\nu}(y)$, $\mu>0$ is a position parameter, $\sigma>0$ is a scale parameter, and $\nu\in\mathbb{R}$ and $\tau>0$ are shape parameters. Clearly, $G_{\mu,\sigma,\nu}(y)$ is a special case of (5) when $\tau=1$.
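Henceforth, we write $\eta(y)=G_{\mu,\sigma,\nu}(y)$ to simplify the notation. The OLLGIG density function can be expressed as
$$f(y)=f(y;\mu,\sigma,\nu,\tau)=\left(\frac{b}{\mu}\right)^{\nu}\frac{\tau\,y^{\nu-1}}{2K_{\nu}(\sigma^{-2})}\exp\left[-\frac{1}{2\sigma^{2}}\left(\frac{by}{\mu}+\frac{\mu}{by}\right)\right]\times\left\{\eta(y)[1-\eta(y)]\right\}^{\tau-1}\left\{\eta(y)^{\tau}+[1-\eta(y)]^{\tau}\right\}^{-2}. \tag{6}$$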
The main motivations for the OLLGIG distribution are to make its skewness and kurtosis more flexible (compared to the GIG model) and also to allow bimodality. We have $\tau=\log[F(y)/\bar{F}(y)]/\log[\eta(y)/\bar{\eta}(y)]$, where $\bar{F}(y)=1-F(y)$ and $\bar{\eta}(y)=1-\eta(y)$. Thus, the parameter $\tau$ represents the quotient of the log odds ratio for the new and baseline distributions. Note that the pdf and cdf of the OLLGIG distribution depend on integrals, which are calculated numerically in the same way as those of the Birnbaum-Saunders distribution.
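As a minimal illustration of (5) and of the log-odds interpretation of τ, the following Python sketch applies the odd log-logistic transform to a given baseline cdf value. The function names `oll_cdf` and `log_odds` are hypothetical, and the value η(y)=0.3 is an arbitrary assumption used in place of a numerically integrated GIG cdf.

```python
import math


def oll_cdf(eta, tau):
    """Odd log-logistic transform (5) of a baseline cdf value eta = G(y)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    num = eta ** tau
    return num / (num + (1.0 - eta) ** tau)


def log_odds(p):
    """Log odds log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


eta, tau = 0.3, 0.4          # arbitrary illustrative values (assumption)
F = oll_cdf(eta, tau)
# The ratio of log odds recovers tau, up to floating-point rounding.
print(F, log_odds(F) / log_odds(eta))
```

Setting `tau = 1.0` returns the baseline value unchanged, which corresponds to the GIG special case.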
Hereafter, we assume that the random variable $Y$ follows the OLLGIG cdf (5) with parameters $(\mu,\sigma,\nu,\tau)^{T}$, say $Y\sim\mathrm{OLLGIG}(\mu,\sigma,\nu,\tau)$. The OLLGIG distribution contains as special cases the GIG distribution when $\tau=1$ and the IG distribution when $\tau=1$ and $\nu=-0.5$.
Some plots of the OLLGIG density for selected parameter values are displayed in Figure 1. It is evident that the proposed distribution is much more flexible, especially in relation to bi-modality (for 0<τ<1), than the GIG and IG distributions.
Figure 1: Plots of the OLLGIG density for some parameter values.
Equation (5) has tractable properties, especially for simulations, since its quantile function (qf) takes the simple form
$$y=Q_{\mathrm{GIG}}\!\left(\frac{u^{1/\tau}}{u^{1/\tau}+(1-u)^{1/\tau}}\right), \tag{7}$$
where $Q_{\mathrm{GIG}}(u)=G_{\mu,\sigma,\nu}^{-1}(u)$ is the qf of the GIG distribution. This scheme is useful because of the existence of fast generators for GIG random variables in some statistical packages. For example, the GIG distribution is available in the package for generalized additive models for location, scale, and shape (GAMLSS) in R.
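The inversion scheme in (7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, and a unit exponential baseline (with quantile $-\log(1-p)$) is used as a stand-in for $Q_{\mathrm{GIG}}$, which in practice would be supplied by a GIG implementation.

```python
import math
import random


def oll_quantile(u, tau, baseline_quantile):
    """Quantile function (7): baseline qf at the transformed probability."""
    a = u ** (1.0 / tau)
    b = (1.0 - u) ** (1.0 / tau)
    return baseline_quantile(a / (a + b))


def oll_sample(n, tau, baseline_quantile, rng):
    """Draw n values by applying (7) to uniform variates."""
    return [oll_quantile(rng.random(), tau, baseline_quantile) for _ in range(n)]


# Stand-in baseline (assumption, NOT the GIG): unit exponential distribution.
def exp_quantile(p):
    return -math.log1p(-p)


def exp_cdf(y):
    return -math.expm1(-y)


# Round trip: applying the cdf form of the generator to Q(u) gives back u.
u, tau = 0.25, 0.5
y = oll_quantile(u, tau, exp_quantile)
eta = exp_cdf(y)
u_back = eta ** tau / (eta ** tau + (1.0 - eta) ** tau)
print(y, u_back)

sample = oll_sample(1000, tau, exp_quantile, random.Random(2024))
```

Passing the baseline quantile as a callable keeps the transform itself independent of the baseline distribution.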
We use the GAMLSS package to simulate data from this nonlinear equation. The plots comparing the exact OLLGIG densities and the histograms from two simulated data sets with 100,000 replications for selected parameter values are displayed in Figure 2. These plots (and several others not shown here) indicate that the simulated values are consistent with the OLLGIG distribution.
Figure 2: Histograms and plots of the OLLGIG density.
3. Properties of the OLLGIG Model
3.1. Linear Representation
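By defining the sets $I_{i}=\{(k,j);\,k-j=i\}$ for $i=0,1,\ldots$, and following the results of Lemonte and Cordeiro [9, Section 3], we can expand $\eta(y)=G_{\mu,\sigma,\nu}(y)$ as
$$\eta(y)=1-\rho-y^{\nu}\sum_{i=0}^{\infty}d_{i}\,y^{i}, \tag{8}$$
where $\rho=\rho(\mu,\sigma,\nu)=C\,[b/(2\mu\sigma^{2})]^{-\nu}\sum_{j=0}^{\infty}\Gamma(\nu-j)\,[-(4\sigma^{2})^{-1}]^{j}/j!$, $d_{i}=\sum_{(k,j)\in I_{i}}a_{j,k}$, and
$$a_{j,k}=a_{j,k}(\mu,\sigma,\nu)=\frac{(-1)^{k+j+1}\,C}{(k-j+\nu)\,j!\,k!}\,\frac{b^{k-j}\mu^{j-k}}{2^{k+j}\sigma^{2(k+j)}}. \tag{9}$$
To calculate $\rho$, the sum over $j$ can be truncated after a large number of summands.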
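Further, we can rewrite $\eta(y)$ after some algebra as
$$\eta(y)=1-\rho-c_{0}-\sum_{i=1}^{\infty}c_{i}\,y^{i}, \tag{10}$$
where $c_{i}=\sum_{k=0}^{i}f_{k}\,d_{i-k}$ (for $i=0,1,\ldots$), $(\nu)_{r}=\nu(\nu-1)\cdots(\nu-r+1)$ is the descending factorial, and $f_{j}=\sum_{r=j}^{\infty}(-1)^{r-j}\binom{r}{j}(\nu)_{r}/r!$.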
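We obtain an expansion for $F(y)$ in (5). First, we use a power series for $\eta(y)^{\tau}$ ($\tau$ real)
$$\eta(y)^{\tau}=\sum_{k=0}^{\infty}p_{k}\,\eta(y)^{k}, \tag{11}$$
where
$$p_{k}=p_{k}(\tau)=\sum_{j=k}^{\infty}(-1)^{k+j}\binom{\tau}{j}\binom{j}{k}. \tag{12}$$
For any real $\tau$, we consider the generalized binomial expansion
$$[1-\eta(y)]^{\tau}=\sum_{k=0}^{\infty}(-1)^{k}\binom{\tau}{k}\eta(y)^{k}. \tag{13}$$
Inserting (11) and (13) in (5) gives
$$F(y)=\frac{\sum_{k=0}^{\infty}p_{k}\,\eta(y)^{k}}{\sum_{k=0}^{\infty}q_{k}\,\eta(y)^{k}}, \tag{14}$$
where $q_{k}=q_{k}(\tau)=p_{k}(\tau)+(-1)^{k}\binom{\tau}{k}$ (for $k\geq 0$). The ratio of the two power series in the last equation can be reduced to
$$F(y)=\sum_{k=0}^{\infty}w_{k}\,\eta(y)^{k}, \tag{15}$$
where the coefficients $w_{k}$ (for $k\geq 0$) are determined from the recurrence equation
$$w_{k}=w_{k}(\tau)=q_{0}^{-1}\left(p_{k}-\sum_{r=1}^{k}q_{r}\,w_{k-r}\right). \tag{16}$$
By differentiating (15), the pdf $f(y)$ reduces to
$$f(y)=\sum_{k=0}^{\infty}w_{k+1}\,h_{k+1}(y), \tag{17}$$
where $h_{k+1}(y)=(k+1)\,\eta(y)^{k}\,g_{\mu,\sigma,\nu}(y)$ is the exponentiated generalized inverse Gaussian (EGIG) density function with power parameter $k+1$ (for $k\geq 0$).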
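The recurrence (16) is the usual rule for dividing one power series by another. The following Python sketch, with the hypothetical name `series_ratio`, computes the first coefficients $w_k$ from given coefficient lists. Coefficients beyond the ends of the supplied lists are treated as zero, which is a truncation assumption when the series in (14) are infinite.

```python
def series_ratio(p, q, n_terms):
    """First n_terms coefficients w_k of (sum_k p_k x^k) / (sum_k q_k x^k).

    Uses w_k = (p_k - sum_{r=1}^{k} q_r * w_{k-r}) / q_0, as in (16).
    Coefficients beyond the ends of p and q are taken to be zero.
    """
    if q[0] == 0:
        raise ValueError("q[0] must be nonzero")
    w = []
    for k in range(n_terms):
        p_k = p[k] if k < len(p) else 0.0
        acc = sum(q[r] * w[k - r] for r in range(1, min(k, len(q) - 1) + 1))
        w.append((p_k - acc) / q[0])
    return w


# Check on a ratio with a known expansion: 1 / (1 - x) = 1 + x + x^2 + ...
print(series_ratio([1.0], [1.0, -1.0], 5))   # [1.0, 1.0, 1.0, 1.0, 1.0]
```

In the setting of (14), `p` and `q` would hold truncated versions of $p_k(\tau)$ and $q_k(\tau)$, and `x` plays the role of $\eta(y)$.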
We can derive a linear representation for $f(y)$ in terms of GIG densities based on the previous results and following the expansions of Lemonte and Cordeiro [9] that lead to their equation (24). First, we can express $h_{k+1}(y)$ as
$$h_{k+1}(y)=\sum_{j=0}^{k}m_{j}(k)\,\pi_{j}(y). \tag{18}$$
Here, $\pi_{j}(y)$ represents the $\mathrm{GIG}(\mu,\sigma,\nu+j)$ density function and the coefficients are given by $m_{j}(k)=(k+1)\,v_{j,k}\,C(\mu,\sigma,\nu)/C(\mu,\sigma,\nu+j)$, where $v_{j,k}=\sum_{i=0}^{k}(-1)^{i}\binom{k}{i}\sum_{r=0}^{i}\binom{i}{r}\rho^{\,i-r}\,t_{j,r}$ and the quantities $t_{j,r}$ are determined from the recurrence relation $t_{j,r}=j^{-1}\sum_{m=1}^{j}[(r+1)m-j]\,c_{m}\,t_{j-m,r}$ (for $j\geq 1$) and $t_{0,r}=1$, with the $c_{m}$'s given in (10).
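By combining (17) and (18) and replacing $\sum_{k=0}^{\infty}\sum_{j=0}^{k}$ with $\sum_{j=0}^{\infty}\sum_{k=j}^{\infty}$, we obtain
$$f(y)=\sum_{j=0}^{\infty}s_{j}\,\pi_{j}(y), \tag{19}$$
where $s_{j}=\sum_{k=j}^{\infty}w_{k+1}\,m_{j}(k)$.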
Equation (19) reveals that the OLLGIG density function is an infinite linear combination of GIG densities.
3.2. Two Properties
Equation (19) is useful for deriving several mathematical properties of the proposed distribution from well-known properties of the GIG distribution. We provide only two examples. The $r$th moment about zero of the $\mathrm{GIG}(\mu,\sigma,\nu)$ random variable defined by (2) is
$$E(W^{r})=\left(\frac{\mu}{b}\right)^{r}\frac{K_{\nu+r}(\sigma^{-2})}{K_{\nu}(\sigma^{-2})}. \tag{20}$$
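Then, the ordinary moments of the OLLGIG random variable $Y$ follow from (19) as
$$E(Y^{r})=\frac{\mu^{r}}{K_{\nu}(\sigma^{-2})}\sum_{j=0}^{\infty}\frac{s_{j}\,K_{\nu+j+r}(\sigma^{-2})}{b_{j}^{\,r}}, \tag{21}$$
where $b_{j}=K_{\nu+j+1}(\sigma^{-2})/K_{\nu}(\sigma^{-2})$.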
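By combining (19) and (4), the generating function of $Y$ takes the form
$$M_{Y}(t)=\frac{1}{K_{\nu}(\sigma^{-2})}\sum_{j=0}^{\infty}s_{j}\left(1-\frac{2\mu\sigma^{2}t}{b_{j}}\right)^{-\nu/2}K_{\nu+j}\!\left[\frac{1}{\sigma^{2}}\left(1-\frac{2\mu\sigma^{2}t}{b_{j}}\right)^{1/2}\right]. \tag{22}$$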
4. The OLLGIG Regression Model
In many practical applications, the lifetimes are affected by explanatory variables such as sex, smoking, diet, blood pressure, cholesterol level and several others. So, it is important to explore the relationship between the response variable and the explanatory variables. Regression models can be proposed in different forms in statistical analysis. In this section, we define the OLLGIG regression model with two systematic structures based on the new distribution. It is a feasible alternative to the GIG and IG regression models for data analysis.
Regression analysis involves specifications of the distribution of $Y$ given a vector $x=(x_{1},\ldots,x_{p})^{T}$ of covariates. We relate the parameters $\mu$ and $\sigma$ to the covariates by the logarithm link functions
$$\mu_{i}=\exp(x_{i}^{T}\beta_{1}),\qquad \sigma_{i}=\exp(x_{i}^{T}\beta_{2}),\qquad i=1,\ldots,n, \tag{23}$$
respectively, where $\beta_{1}=(\beta_{11},\ldots,\beta_{1p})^{T}$ and $\beta_{2}=(\beta_{21},\ldots,\beta_{2p})^{T}$ denote the vectors of regression coefficients and $x_{i}^{T}=(x_{i1},\ldots,x_{ip})$. The most important feature of this parametric regression model is that the covariates in $x$ model both $\mu$ and $\sigma$.
Consider a sample $(y_{1},x_{1}),\ldots,(y_{n},x_{n})$ of $n$ independent observations. Conventional likelihood estimation techniques can be applied here. The total log-likelihood function for the vector of parameters $\theta=(\beta_{1}^{T},\beta_{2}^{T},\nu,\tau)^{T}$ from model (23) is given by
$$l(\theta)=n\log\tau+\nu\sum_{i=1}^{n}\log\!\left(\frac{b}{\mu_{i}}\right)+(\nu-1)\sum_{i=1}^{n}\log y_{i}-\sum_{i=1}^{n}\log\!\left[2K_{\nu}\!\left(\frac{1}{\sigma_{i}^{2}}\right)\right]-\frac{1}{2}\sum_{i=1}^{n}\frac{1}{\sigma_{i}^{2}}\left(\frac{by_{i}}{\mu_{i}}+\frac{\mu_{i}}{by_{i}}\right)+(\tau-1)\sum_{i=1}^{n}\log\left\{\eta(y_{i})[1-\eta(y_{i})]\right\}-2\sum_{i=1}^{n}\log\left\{\eta(y_{i})^{\tau}+[1-\eta(y_{i})]^{\tau}\right\}, \tag{24}$$
where $K_{\nu}(\cdot)$ and $\eta(\cdot)$ are defined in Section 2. The MLE $\hat{\theta}$ of $\theta$ can be calculated by maximizing the log-likelihood (24) numerically in the GAMLSS package of the R software. The advantage of this package is that we can adopt many maximization methods, which will depend only on the current fitted model. Initial values for $\beta_{1}$ and $\beta_{2}$ are taken from the fit of the GIG regression model with $\tau=1$. We did not encounter problems in maximizing this log-likelihood function. This fact is shown in Section 4.1, where some simulations of the proposed regression model are given under different scenarios.
Under general regularity conditions, the asymptotic distribution of $(\hat{\theta}-\theta)$ is multivariate normal $N_{2p+2}(0,K(\theta)^{-1})$, where $K(\theta)$ is the expected information matrix. The asymptotic covariance matrix $K(\theta)^{-1}$ of $\hat{\theta}$ can be approximated by the inverse of the $(2p+2)\times(2p+2)$ observed information matrix $-\ddot{L}(\theta)$. The elements of this matrix are calculated numerically. The approximate multivariate normal distribution $N_{2p+2}(0,\{-\ddot{L}(\hat{\theta})\}^{-1})$ for $(\hat{\theta}-\theta)$ can be used in the classical way to construct approximate confidence intervals for the parameters in $\theta$.
We can use the likelihood ratio (LR) statistic for comparing some special submodels with the OLLGIG regression model. We consider the partition $\theta=(\theta_{1}^{T},\theta_{2}^{T})^{T}$, where $\theta_{1}$ is a subset of parameters of interest and $\theta_{2}$ is a subset of remaining parameters. The LR statistic for testing the null hypothesis $H_{0}:\theta_{1}=\theta_{1}^{(0)}$ versus the alternative hypothesis $H_{1}:\theta_{1}\neq\theta_{1}^{(0)}$ is given by $w=2\{l(\hat{\theta})-l(\tilde{\theta})\}$, where $\tilde{\theta}$ and $\hat{\theta}$ are the estimates under the null and alternative hypotheses, respectively. The statistic $w$ is asymptotically (as $n\to\infty$) distributed as $\chi_{k}^{2}$, where $k$ is the dimension of the subset of parameters $\theta_{1}$ of interest. For example, the test of $H_{0}:\tau=1$ versus $H_{1}:\tau\neq 1$ is equivalent to comparing the OLLGIG regression model with the GIG regression model, and the LR statistic reduces to $w=2\{l(\hat{\beta}_{1},\hat{\beta}_{2},\hat{\nu},\hat{\tau})-l(\tilde{\beta}_{1},\tilde{\beta}_{2},\tilde{\nu},1)\}$, where $\hat{\beta}_{1}$, $\hat{\beta}_{2}$, $\hat{\nu}$, and $\hat{\tau}$ are the MLEs under $H_{1}$ and $\tilde{\beta}_{1}$, $\tilde{\beta}_{2}$, and $\tilde{\nu}$ are the estimates under $H_{0}$.
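For the two tests used later in Section 6 ($k=1$ for $H_0:\tau=1$ and $k=2$ for $H_0:\tau=1,\ \nu=-0.5$), the chi-square upper tail has a closed form, so the p-values can be checked with the Python standard library alone. The function names below are hypothetical, and the sketch deliberately covers only one and two degrees of freedom.

```python
import math


def chi2_sf(w, df):
    """Upper-tail probability P(chi2_df > w), closed forms for df = 1 or 2."""
    if w < 0.0:
        raise ValueError("w must be nonnegative")
    if df == 1:
        return math.erfc(math.sqrt(w / 2.0))
    if df == 2:
        return math.exp(-w / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def lr_test(loglik_full, loglik_null, df):
    """LR statistic w = 2 * (l_full - l_null) and its asymptotic p-value."""
    w = 2.0 * (loglik_full - loglik_null)
    return w, chi2_sf(w, df)


# Statistics reported for the iris data in Section 6.1.
p_gig = chi2_sf(6.7532, 1)   # OLLGIG vs GIG
p_ig = chi2_sf(6.7496, 2)    # OLLGIG vs IG
print(round(p_gig, 4), round(p_ig, 4))
```

These values agree, after rounding, with the p-values 0.0094 and 0.0342 reported for the iris data.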
4.1. Simulation Study
We approach the simulation study in two ways. First, we perform a simulation to study the behavior of the MLEs of the parameters of the OLLGIG distribution without systematic structures. Second, we evaluate the behavior of the parameter estimates considering two systematic structures.
The OLLGIG Distribution. Some properties of the MLEs are evaluated using a classical analysis by means of a simulation study. We simulate the OLLGIG distribution as follows:
Compute the inverse function $F^{-1}(\cdot)$ of the cumulative distribution (5).
Generate $u\sim U(0,1)$.
Apply $u$ in $F^{-1}(u)=Q(u)$ from (7).
The values $t=Q(u)$ are generated from the OLLGIG distribution, where $Q(u)$ is the inverse of (5).
We take n=20, 50, 150, and 350 for each replication and then evaluate the estimates μ^, σ^, ν^, and τ^. We repeat this process 1,000 times and then calculate the average estimates (AEs), biases, and mean squared errors (MSEs). In the first scenario, we take τ=0.3662, μ=5.7915, σ=0.0658, and ν=12.7216, which are the values fitted to the iris data set in Section 6. The estimates of the model parameters are computed using the GAMLSS package of the R software. The results of the Monte Carlo study under maximum likelihood are given in Table 1. They indicate that the accuracy of the MLEs improves as n increases. Further, the MSEs of the MLEs of the model parameters decay toward zero when n increases, in agreement with first-order asymptotic theory.
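The three summaries reported for each parameter can be computed from the replicated estimates as in the following Python sketch. The function name `mc_summary` and the toy input are illustrative assumptions, not output from the study.

```python
def mc_summary(estimates, true_value):
    """Average estimate (AE), bias, and mean squared error (MSE)."""
    n = len(estimates)
    ae = sum(estimates) / n
    bias = ae - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return ae, bias, mse


# Toy input: two replicated estimates of a parameter whose true value is 2.
print(mc_summary([1.0, 3.0], 2.0))   # (2.0, 0.0, 1.0)
```

Here the bias is the AE minus the true value, so a positive bias means that the estimator overestimates the parameter on average.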
Table 1: AEs, biases, and MSEs for the parameters of the OLLGIG distribution.

| Scenario | n | Parameter | AE | Bias | MSE |
|---|---|---|---|---|---|
| 1 | 20 | μ^ | 5.9023 | 0.1109 | 0.0895 |
| 1 | 20 | σ^ | 0.7468 | 0.6810 | 2.6764 |
| 1 | 20 | ν^ | 13.4759 | 0.7544 | 66.2484 |
| 1 | 20 | τ^ | 1.0945 | 0.7283 | 2.2952 |
| 2 | 50 | μ^ | 5.8618 | 0.0704 | 0.0231 |
| 2 | 50 | σ^ | 0.2119 | 0.1461 | 0.5443 |
| 2 | 50 | ν^ | 12.6530 | -0.0685 | 7.2487 |
| 2 | 50 | τ^ | 0.7115 | 0.3452 | 0.4276 |
| 3 | 150 | μ^ | 5.8354 | 0.0439 | 0.0039 |
| 3 | 150 | σ^ | 0.0822 | 0.0165 | 0.0004 |
| 3 | 150 | ν^ | 12.7241 | 0.0026 | 0.0097 |
| 3 | 150 | τ^ | 0.4822 | 0.1160 | 0.0242 |
| 4 | 350 | μ^ | 5.8195 | 0.0281 | 0.0014 |
| 4 | 350 | σ^ | 0.0757 | 0.0099 | 0.0001 |
| 4 | 350 | ν^ | 12.7131 | -0.0085 | 0.0080 |
| 4 | 350 | τ^ | 0.4363 | 0.0700 | 0.0075 |
The OLLGIG Regression Model. We examine the performance of the MLEs in the OLLGIG regression model by means of some simulations with sample sizes n=100,300 and 500. We simulate 1,000 samples from two scenarios (τ=0.5 and τ=1.5) by considering μi=β10+β11xi and σi=β20+β21xi. For both cases, we take ν=0.53. The explanatory variable is generated by xi~U(0,1) and the response variable is generated by yi~OLLGIG(μi,σi,ν,τ). For each fitted model, we compute the AEs, biases, and MSEs. Based on the results given in Table 2, we note that the MSEs of the MLEs of β10, β11, β20, β21, and τ decay toward zero when the sample size n increases, as usually expected under first-order asymptotic theory. Further, the AEs of the parameters tend to be closer to the true parameter values when n increases. These facts support that the asymptotic normal distribution provides an adequate approximation to the finite sample distribution of the estimates.
Table 2: AEs, biases, and MSEs for the OLLGIG regression model under scenarios 1 and 2.

| Scenario | Parameter | AE (n=100) | Bias (n=100) | MSE (n=100) | AE (n=300) | Bias (n=300) | MSE (n=300) | AE (n=500) | Bias (n=500) | MSE (n=500) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | β10 | 1.5044 | 0.0044 | 0.0028 | 1.5011 | 0.0011 | 0.0008 | 1.5014 | 0.0014 | 0.0005 |
| 1 | β11 | -0.6979 | 0.0021 | 0.0100 | -0.6981 | 0.0019 | 0.0027 | -0.6992 | 0.0008 | 0.0017 |
| 1 | β20 | -1.9426 | 0.0574 | 0.1223 | -1.9606 | 0.0394 | 0.0401 | -1.9637 | 0.0363 | 0.0280 |
| 1 | β21 | 0.3636 | 0.0136 | 0.0508 | 0.3522 | 0.0022 | 0.0162 | 0.3516 | 0.0016 | 0.0093 |
| 1 | τ | 0.5998 | 0.0998 | 0.0763 | 0.5448 | 0.0448 | 0.0228 | 0.5368 | 0.0368 | 0.0149 |
| 2 | β10 | 1.4975 | -0.0025 | 0.0005 | 1.4982 | -0.0018 | 0.0002 | 1.4986 | -0.0014 | 0.0001 |
| 2 | β11 | -0.7017 | -0.0017 | 0.0018 | -0.7024 | -0.0024 | 0.0005 | -0.7025 | -0.0025 | 0.0003 |
| 2 | β20 | -2.2627 | -0.2627 | 0.1839 | -2.1766 | -0.1766 | 0.0872 | -2.1548 | -0.1548 | 0.0659 |
| 2 | β21 | 0.3502 | 0.0002 | 0.0717 | 0.3432 | -0.0068 | 0.0247 | 0.3454 | -0.0046 | 0.0148 |
| 2 | τ | 1.2052 | -0.2948 | 0.3226 | 1.2720 | -0.2280 | 0.1720 | 1.2932 | -0.2068 | 0.1389 |
5. Checking Model: Diagnostic and Residual Analysis
A first tool to perform sensitivity analysis, as stated before, is global influence based on case-deletion [11, 12]. Case-deletion is a common approach to study the effect of dropping the $i$th observation from the data set. The case-deletion model with systematic structures (23) is given by
$$\mu_{l}=\exp(x_{l}^{T}\beta_{1}),\qquad \sigma_{l}=\exp(x_{l}^{T}\beta_{2}),\qquad l=1,2,\ldots,n,\ l\neq i. \tag{25}$$
In the following, a quantity with subscript "$(i)$" means the original quantity with the $i$th observation deleted. For model (25), the log-likelihood function of $\theta$ is denoted by $l_{(i)}(\theta)$. Let $\hat{\theta}_{(i)}=(\hat{\beta}_{1(i)}^{T},\hat{\beta}_{2(i)}^{T},\hat{\nu}_{(i)},\hat{\tau}_{(i)})^{T}$ be the MLE of $\theta$ from $l_{(i)}(\theta)$. To assess the influence of the $i$th observation on the MLEs $\hat{\theta}=(\hat{\beta}_{1}^{T},\hat{\beta}_{2}^{T},\hat{\nu},\hat{\tau})^{T}$, we can compare $\hat{\theta}_{(i)}$ with $\hat{\theta}$. If deletion of an observation seriously influences the estimates, more attention should be paid to that observation. Hence, if $\hat{\theta}_{(i)}$ is far from $\hat{\theta}$, then the $i$th observation can be regarded as influential. A first measure of the global influence is defined as the standardized norm of $\hat{\theta}_{(i)}-\hat{\theta}$ (generalized Cook distance) given by
$$GD_{i}(\theta)=(\hat{\theta}_{(i)}-\hat{\theta})^{T}\,[\ddot{L}(\theta)]\,(\hat{\theta}_{(i)}-\hat{\theta}). \tag{26}$$
Another alternative is to assess the values of $GD_{i}(\beta_{1})$, $GD_{i}(\beta_{2})$, and $GD_{i}(\nu,\tau)$, since these values reveal the impact of the $i$th observation on the estimates of $\beta_{1}$, $\beta_{2}$, and $(\nu,\tau)$, respectively. Another popular measure of the difference between $\hat{\theta}_{(i)}$ and $\hat{\theta}$ is the likelihood distance given by
$$LD_{i}(\theta)=2\{l(\hat{\theta})-l(\hat{\theta}_{(i)})\}. \tag{27}$$
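Both case-deletion measures are simple to evaluate once the refitted estimates are available. The Python sketch below uses hypothetical function names and toy numbers. It assumes that the weighting matrix passed to `generalized_cook` is positive definite (for instance, the observed information matrix evaluated at the full-data MLE), and that both log-likelihood values passed to `likelihood_distance` come from the full-data log-likelihood.

```python
def generalized_cook(theta_deleted, theta_hat, weight):
    """Quadratic form d^T W d with d = theta_deleted - theta_hat.

    `weight` is a square matrix given as a list of rows; it is assumed
    here to be positive definite.
    """
    d = [a - b for a, b in zip(theta_deleted, theta_hat)]
    m = len(d)
    return sum(d[r] * weight[r][c] * d[c] for r in range(m) for c in range(m))


def likelihood_distance(loglik_at_mle, loglik_at_deleted_mle):
    """LD_i = 2 * {l(theta_hat) - l(theta_hat_(i))}, as in (27)."""
    return 2.0 * (loglik_at_mle - loglik_at_deleted_mle)


# Toy numbers (assumptions, not values from the fitted model).
gd = generalized_cook([1.0, 2.0], [0.0, 0.0], [[2.0, 0.0], [0.0, 1.0]])
ld = likelihood_distance(-100.0, -101.5)
print(gd, ld)   # 6.0 3.0
```

In practice, the two functions would be applied once per deleted case and the results displayed as index plots.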
Once the model is chosen and fitted, the analysis of the residuals is an efficient way to check the model adequacy. The residuals also serve to identify the relevance of an additional factor omitted from the model and verify if there are indications of serious deviance from the distribution considered for the random error. Further, since the residuals are used to identify discrepancies between the fitted model and the data set, it is convenient to define residuals that take into account the contribution of each observation to the goodness-of-fit measure.
In summary, the residuals allow measuring the model fit for each observation and enable studying whether the differences between the observed and fitted values are due to chance or to a systematic behavior that can be modeled. The quantile residuals (qrs) [13] for the OLLGIG regression model with two systematic structures are defined by
$$qr_{i}=\Phi^{-1}\!\left(\frac{\eta(y_{i})^{\tau}}{\eta(y_{i})^{\tau}+[1-\eta(y_{i})]^{\tau}}\right), \tag{28}$$
where $\eta(\cdot)$ is given in (1) and $\Phi^{-1}(\cdot)$ is the inverse of the standard normal cumulative distribution function.
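Equation (28) amounts to pushing the fitted OLLGIG cdf value through the standard normal quantile function. A minimal Python sketch follows, assuming that the baseline value $\eta(y_i)$ has already been evaluated at the fitted parameters. The function name is hypothetical and the inputs are arbitrary.

```python
from statistics import NormalDist


def quantile_residual(eta, tau):
    """Quantile residual (28) from a baseline cdf value eta = eta(y_i).

    eta must lie strictly between 0 and 1 so that the normal quantile
    is finite.
    """
    num = eta ** tau
    fitted_cdf = num / (num + (1.0 - eta) ** tau)
    return NormalDist().inv_cdf(fitted_cdf)


# Arbitrary illustrative inputs (assumptions, not fitted values).
for eta in (0.1, 0.5, 0.9):
    print(eta, quantile_residual(eta, tau=0.5))
```

An observation at the fitted median (η = 0.5) gives a residual of zero, and values of η that are symmetric about 0.5 give residuals of equal magnitude and opposite sign.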
Atkinson [14] suggested the construction of an envelope to have a better interpretation of the normal probability plot of the residuals. The simulated confidence bands of the envelope should contain the residuals. If the model is well-fitted, the majority of points will be within these bands and randomly distributed. The construction of the confidence bands follows the steps:
Fit the proposed model and calculate the residuals $qr_{i}$;
Simulate $k$ samples of the response variable using the fitted model;
Fit the model to each sample and calculate the residuals $qr_{ij}$ ($j=1,\ldots,k$ and $i=1,\ldots,n$);
Arrange each group of $n$ residuals in ascending order to obtain $qr_{(i)j}$ for $j=1,\ldots,k$ and $i=1,\ldots,n$;
For each $i$, calculate the mean, minimum, and maximum of $qr_{(i)j}$, namely,
$$qr_{(i)M}=\sum_{j=1}^{k}\frac{qr_{(i)j}}{k},\qquad qr_{(i)I}=\min\{qr_{(i)j}:1\leq j\leq k\},\qquad qr_{(i)S}=\max\{qr_{(i)j}:1\leq j\leq k\}; \tag{29}$$
Plot the means, minima, and maxima together with the values of $qr_{i}$ against the expected percentiles of the standard normal distribution.
The minimum and maximum values of the $qr_{(i)j}$'s form the envelope. If the model under study is correct, the observed values should be inside the bands and distributed randomly.
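The steps above can be sketched in Python as follows. The function `envelope` and the stand-in simulator are hypothetical. In a real analysis, `simulate_residuals` would simulate a response sample from the fitted model, refit the model, and return its quantile residuals, whereas here standard normal draws are used purely as an assumption so that the sketch is self-contained.

```python
import random
from statistics import mean


def envelope(simulate_residuals, k, rng):
    """Lower, mean, and upper bands from k simulated sets of residuals.

    simulate_residuals(rng) must return one list of n residuals; each set
    is sorted, and the i-th order statistics are summarized across sets.
    """
    sorted_sets = [sorted(simulate_residuals(rng)) for _ in range(k)]
    n = len(sorted_sets[0])
    lower = [min(s[i] for s in sorted_sets) for i in range(n)]
    center = [mean(s[i] for s in sorted_sets) for i in range(n)]
    upper = [max(s[i] for s in sorted_sets) for i in range(n)]
    return lower, center, upper


# Stand-in simulator (assumption): residuals drawn directly from N(0, 1).
def fake_residuals(rng, n=30):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


lower, center, upper = envelope(fake_residuals, k=19, rng=random.Random(1))
print(lower[0], center[0], upper[0])
```

The sorted observed residuals would then be plotted, together with the three bands, against the expected standard normal percentiles.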
Simulation Study. A simulation study is conducted to investigate the behavior of the empirical distribution of the qrs for the OLLGIG regression model. We generate 1,000 samples based on the algorithm presented in Section 4.1. We also give the normal probability plots to assess the degree of deviation from the normality assumption of the residuals. Based on the plots in Figures 3 and 4 representing the first and second scenarios, respectively, we conclude that the empirical distribution of the qrs agrees with the standard normal distribution in both scenarios. This empirical distribution becomes closer to the standard normal distribution when n increases in both scenarios.
Figure 3: Normal probability plots for qri in the OLLGIG regression model under scenario 1 (τ=0.5). (a) n=100. (b) n=300. (c) n=500.
Figure 4: Normal probability plots for qri in the OLLGIG regression model under scenario 2 (τ=1.5). (a) n=100. (b) n=300. (c) n=500.
6. Applications
In this section, we provide two applications to real data to demonstrate empirically the flexibility of the OLLGIG model. The calculations are performed with the R software.
6.1. Application 1: Iris Data
In the first application, the OLLGIG distribution is compared with the nested GIG and IG distributions. The data set is iris, which provides measurements in centimeters of the length and width of the sepal and the length and width of the petal, respectively, for 50 flowers of each of the 3 iris species (setosa, versicolor, and virginica). In this application, the variable sepal length (Sepal.Length) is used. This data set has been analyzed by several authors in multivariate analysis, for example, Anderson (1935) and Fisher [15]. We show that the distribution of these data presents bimodality.
Table 3 provides a descriptive summary of these data and indicates a slightly positively skewed distribution.
Table 3: Descriptive statistics for iris flower data.

| Mean | Median | SD | Skewness | Kurtosis | Min. | Max. |
|---|---|---|---|---|---|---|
| 5.843 | 5.800 | 0.828 | 0.3086 | -0.6058 | 4.300 | 7.900 |
A brief descriptive analysis of the data in Table 3 reveals that the mean of the variable sepal length is 5.843 and the median is 5.800, thus indicating that the distribution of the data is approximately symmetric.
In Table 4, we report the MLEs of the model parameters and their standard errors (SEs) in parentheses. We give in Table 5 the following goodness-of-fit measures: Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Cramér-von Mises (W∗), Anderson-Darling (A∗), and Kolmogorov-Smirnov (KS) statistics. The smaller the values of these measures, the better the fit. The figures in Table 5 indicate that the OLLGIG distribution has the lowest values of AIC, CAIC, HQIC, A∗, W∗, and KS among the fitted models (the IG model has the lowest BIC), and therefore it could be chosen as the best model.
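The likelihood-based criteria can be reproduced from the maximized log-likelihood, the number of parameters $k$, and the sample size $n$. The Python sketch below is a check under stated assumptions: the value $-2\,l(\hat\theta)=357.0638$ is backed out from the reported AIC of the OLLGIG fit with $k=4$, and the CAIC formula used is the small-sample correction $\mathrm{AIC}+2k(k+1)/(n-k-1)$, which is an inference from the reported values rather than a formula stated in the text.

```python
import math


def information_criteria(neg2loglik, k, n):
    """AIC, CAIC (small-sample corrected form, an assumption), BIC, HQIC."""
    aic = neg2loglik + 2 * k
    return {
        "AIC": aic,
        "CAIC": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": neg2loglik + k * math.log(n),
        "HQIC": neg2loglik + 2 * k * math.log(math.log(n)),
    }


# OLLGIG fit to the iris data: k = 4 parameters, n = 150 observations.
ic = information_criteria(357.0638, k=4, n=150)
for name, value in ic.items():
    print(name, round(value, 4))
```

Up to rounding in the last reported digit, these values agree with the OLLGIG figures reported for the iris data.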
Table 4: MLEs and SEs (in parentheses) of the model parameters for the iris data.

| Model | τ | μ | σ | ν |
|---|---|---|---|---|
| OLLGIG | 0.3662 (0.0685) | 5.7915 (0.0091) | 0.0658 (0.0079) | 12.7216 (0.0130) |
| GIG | 1 (-) | 5.8433 (0.0674) | 0.1413 (0.0082) | 0.1000 (72.9562) |
| IG | 1 (-) | 5.8433 (0.0674) | 0.0585 (0.0034) | -0.5 (-) |
Table 5: Goodness-of-fit measures for the iris data.

| Model | AIC | CAIC | BIC | HQIC | A∗ | W∗ | KS |
|---|---|---|---|---|---|---|---|
| OLLGIG | 365.0638 | 365.3397 | 377.1064 | 369.9563 | 0.3474 | 0.0486 | 0.0578 |
| GIG | 369.8170 | 369.9814 | 378.8489 | 373.4864 | 0.7242 | 0.1164 | 0.0881 |
| IG | 367.8134 | 367.8951 | 373.8347 | 370.2597 | 0.7244 | 0.1165 | 0.0881 |
We consider LR statistics to compare nested models. The OLLGIG distribution includes some submodels as mentioned above, thus allowing their evaluation relative to each other and to a more general model. The values of the LR statistics are listed in Table 6. Both null hypotheses are rejected at the 5% significance level, which indicates that the OLLGIG model provides a better fit to these data than its submodels.
Table 6: LR tests for the iris data.

| Models | Hypotheses | Statistic w | p-value |
|---|---|---|---|
| OLLGIG vs GIG | H0: τ=1 vs H1: H0 is false | 6.7532 | 0.0094 |
| OLLGIG vs IG | H0: τ=1 and ν=-0.5 vs H1: H0 is false | 6.7496 | 0.0342 |
More information is provided by a visual comparison of the histogram of the data and the fitted density functions and cumulative functions. The plots of the fitted OLLGIG, GIG, and IG densities are displayed in Figure 5(a). The estimated OLLGIG density provides the closest fit to the histogram of the data. In order to assess if the model is appropriate, the plots of the fitted OLLGIG, GIG, and IG cumulative distributions and the empirical cdf are displayed in Figure 5(b). They indicate that the OLLGIG distribution provides a good fit to these data.
Figure 5: (a) Estimated densities of the OLLGIG, GIG, and IG models for iris data. (b) Estimated cumulative functions of the OLLGIG, GIG, and IG models and the empirical cdf for iris data.
In Figure 6, we note that the iris data are bimodal, a shape that the GIG and IG distributions cannot accommodate (see Figure 5(a)).
Figure 6: Estimated density of the OLLGIG distribution for iris data.
6.2. Application 2: Price of Urban Property Data
Here, we provide a second application of the OLLGIG regression model to evaluate the price of urban residential properties for sale in the municipality of Paranaíba in the State of Mato Grosso do Sul (MS) in Brazil. These data, collected in 2017, refer to n=45 houses for sale in the municipality. In the context of real estate appraisal, it is necessary to develop scientifically sound statistical methodologies for residential property prices. Nevertheless, such methodologies are rarely used by the real estate market. We construct an OLLGIG regression model with two systematic components to describe the relationship between real estate prices and explanatory variables, thus allowing an understanding of the behavior of the price variable [16, 17]. The following variables are considered:
price of the property yi; this variable was divided by 10,000;
area xi1 of land in square meters;
number of parking spaces xi2 in the residence (0=no parking space, 1=one parking space, and 2=more than one parking space); in this case, two dummy variables, xi21 and xi22, are created;
number of rooms with suites xi3 in the residence (0=no suites, 1=one suite, 2=more than one suite); in this case, two dummy variables, xi31 and xi32, are created;
whether the residence has a swimming pool xi4 (0=no, 1=yes);
whether the residence is located in the center of the city xi5 (0=no, 1=yes); i=1,…,45.
In the descriptive analysis of the data in Table 7, the mean of the price variable is 24.98, which is not close to the median 17.00, thus indicating that the data have an asymmetric distribution.
Table 7: Descriptive analysis of the price of urban property data.

| Mean | Median | SD | Skewness | Kurtosis | Min. | Max. |
|---|---|---|---|---|---|---|
| 24.98 | 17.00 | 23.9180 | 3.3330 | 14.0134 | 5.50 | 150.00 |
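We define the OLLGIG regression model by two systematic structures for $\mu$ and $\sigma$:
$$\mu_{i}=\exp(\beta_{10}+\beta_{11}x_{i1}+\beta_{121}x_{i21}+\beta_{122}x_{i22}+\beta_{131}x_{i31}+\beta_{132}x_{i32}+\beta_{14}x_{i4}+\beta_{15}x_{i5}) \tag{30}$$
and
$$\sigma_{i}=\exp(\beta_{20}+\beta_{21}x_{i1}+\beta_{221}x_{i21}+\beta_{222}x_{i22}+\beta_{231}x_{i31}+\beta_{232}x_{i32}+\beta_{24}x_{i4}+\beta_{25}x_{i5}),\qquad i=1,\ldots,45. \tag{31}$$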
We now consider the test of homogeneity of the scale parameter for the price of urban property data. The LR statistic (see Section 4) for testing the null hypothesis $H_{0}:\beta_{21}=\beta_{221}=\beta_{222}=\beta_{231}=\beta_{232}=\beta_{24}=\beta_{25}=0$ is $w=31.98$ (p-value $<0.0001$), which indicates that the dispersion is not constant.
In Table 8, we present the MLEs, SEs, and p-values. The covariates x2, x3, and x5 are significant at the 5% level in the regression structure for the location parameter μ, whereas the covariates x1, x3, x4, and x5 are significant (at the same level) for the parameter σ. The figures in this table reveal that the covariate x1 is not significant with respect to the parameter μ, but it is significant with respect to the parameter σ. This is due to a strong dispersion in the response variable. The covariate x2 is also significant for the number of parking spaces in the structure of μ. The covariate x3 is significant in the location and scale structure; i.e., there is a significant difference between the residences that do not have a suite, have a suite, or more. The covariate x4 is not significant in relation to the location, but it is significant in the structure of σ. There is a significant difference in the residence with or without swimming pool for the dispersion parameter. This fact can also be noted in Figure 7(a). The covariate x5 is significant in relation to both parameters μ and σ; i.e., there is a significant difference between the residences being in the center of the city and outside the center. This fact can also be noted in Figure 7(b).
Table 8: MLEs, standard errors (SEs), and p-values for the OLLGIG regression model fitted to the price of urban property data.

| Parameter | Estimate | SE | p-Value |
|---|---|---|---|
| β^10 | 7.0690 | 0.4428 | <0.001 |
| β^11 | -0.0005 | 0.0002 | 0.0679 |
| β^121 | 0.8069 | 0.2689 | 0.0057 |
| β^122 | 0.8407 | 0.2677 | 0.0041 |
| β^131 | -0.8976 | 0.1945 | <0.001 |
| β^132 | 0.4326 | 0.1872 | 0.0287 |
| β^14 | 0.5794 | 0.6941 | 0.4111 |
| β^15 | -0.5323 | 0.1008 | <0.001 |
| β^20 | 2.614 | 0.5982 | <0.001 |
| β^21 | 0.0013 | 9.961e-05 | <0.001 |
| β^221 | -0.2054 | 0.1316 | 0.1303 |
| β^222 | -0.1741 | 0.1223 | 0.1660 |
| β^231 | 0.5139 | 0.0739 | <0.001 |
| β^232 | 0.2585 | 0.1077 | 0.0235 |
| β^24 | -2.135 | 0.4818 | <0.001 |
| β^25 | 0.2575 | 0.0481 | <0.001 |
| ν^ | -0.4942 | 0.1231 | <0.001 |
| τ^ | 12.764 | 2.436 | |
Figure 7: Estimated cdf from the fitted OLLGIG regression model and the empirical cdf for the price of urban property data. (a) For covariate x4, and (b) for covariate x5.
The AIC, BIC, and global deviance (GD) statistics are listed in Table 9. We note that the OLLGIG regression model presents the lowest AIC, BIC, and GD values among the fitted models. So, there are indications that the OLLGIG model provides a better fit to these data.
Table 9: Goodness-of-fit measures for the price of urban property data.

| Model | AIC | BIC | GD |
|---|---|---|---|
| OLLGIG | 322.0612 | 354.5811 | 286.0612 |
| GIG | 348.8190 | 379.5323 | 314.8190 |
| IG | 333.3241 | 362.2307 | 301.3241 |
We again adopt the LR statistics to compare the fitted models in Table 10. We reject the null hypotheses in the two tests in favor of the wider OLLGIG regression model. Rejection is significant at the 5% level and provides clear evidence of the need for the shape parameter τ when modeling these data.
Table 10: LR tests for the price of urban property data.

| Models | Hypotheses | Statistic w | p-value |
|---|---|---|---|
| OLLGIG vs GIG | H0: τ=1 vs H1: H0 is false | 28.7579 | <0.001 |
| OLLGIG vs IG | H0: τ=1 and ν=-0.5 vs H1: H0 is false | 15.2629 | <0.001 |
We use the R software to compute the LDi(θ) and GDi(θ) measures in the diagnostic analysis presented in Section 5. The results of such influence measures index plots are displayed in Figure 8. These plots indicate that the cases ♯7, ♯43, and ♯45 are possible influential observations.
Index plots for θ: (a) $LD_i(\theta)$ (likelihood distance) and (b) $GD_i(\theta)$ (generalized Cook's distance).
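To make the case-deletion idea concrete, the sketch below computes the likelihood distance in its usual form, $LD_i = 2\{l(\hat{\theta}) - l(\hat{\theta}_{(i)})\}$, where $\hat{\theta}_{(i)}$ is the MLE obtained without case i and both log-likelihoods are evaluated on the full data. It is only an illustration: a one-parameter exponential model stands in for the OLLGIG regression model, whose likelihood is not reproduced here, and the data are hypothetical.

```python
import math

def exp_loglik(rate, ys):
    """Log-likelihood of an exponential model with the given rate."""
    return len(ys) * math.log(rate) - rate * sum(ys)

def exp_mle(ys):
    """Closed-form MLE of the exponential rate."""
    return len(ys) / sum(ys)

def likelihood_distances(ys):
    """LD_i = 2 * [l(theta_hat) - l(theta_hat_(i))], evaluated on the full data."""
    full_loglik = exp_loglik(exp_mle(ys), ys)
    distances = []
    for i in range(len(ys)):
        reduced = ys[:i] + ys[i + 1:]    # delete case i
        rate_without_i = exp_mle(reduced)  # refit without it
        distances.append(2.0 * (full_loglik - exp_loglik(rate_without_i, ys)))
    return distances

ys = [0.8, 1.1, 0.9, 1.3, 1.0, 9.5]  # hypothetical data; the last case is extreme
ld = likelihood_distances(ys)
most_influential = max(range(len(ys)), key=ld.__getitem__) + 1  # 1-based case number
print([round(d, 3) for d in ld], "most influential case:", most_influential)
```

Each distance is non-negative because the full-data MLE maximizes the full-data log-likelihood. Plotting the distances against the case index gives an index plot of the kind shown in panel (a).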
In addition, Figure 9(a) shows the index plot of the quantile residuals (qrs) for the fitted model: all observations lie in the interval (-3, 3) and the residuals behave randomly. Hence, there is no evidence against the assumptions of the fitted model. To detect possible departures from the assumed error distribution, as well as outliers, we present the normal probability plot of the qrs with a generated envelope in Figure 9(b). This plot indicates that the OLLGIG regression model is very suitable for these data, since no observation falls outside the envelope and none appears to be a possible outlier.
(a) Index plot of the qrs and (b) normal probability plot with envelope for the qrs from the OLLGIG regression model fitted to the urban property data.
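For uncensored continuous responses, a quantile residual is obtained by passing each observation through the fitted cdf and then through the standard normal quantile function, $qr_i = \Phi^{-1}(F(y_i; \hat{\theta}))$, so that the residuals are approximately standard normal when the model is adequate. The Python sketch below illustrates the transformation and the (-3, 3) check. The exponential cdf is a stand-in assumption, since the OLLGIG cdf is not implemented here, and the data are hypothetical.

```python
import math
from statistics import NormalDist

def quantile_residuals(ys, cdf):
    """qr_i = Phi^{-1}(F(y_i)) for a continuous fitted cdf F."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(cdf(y)) for y in ys]

ys = [0.3, 0.7, 1.2, 2.5, 0.9]  # hypothetical positive responses
rate = len(ys) / sum(ys)        # exponential MLE, used only as a stand-in fit

def fitted_cdf(y):
    """Stand-in fitted cdf; the OLLGIG cdf would take its place."""
    return 1.0 - math.exp(-rate * y)

qrs = quantile_residuals(ys, fitted_cdf)
all_inside = all(-3.0 < r < 3.0 for r in qrs)
print([round(r, 3) for r in qrs], "all in (-3, 3):", all_inside)
```

An index plot of `qrs` corresponds to panel (a), and comparing the sorted residuals with normal quantiles gives the normal probability plot of panel (b); the simulated envelope is not reproduced in this sketch.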
7. Concluding Remarks
We present a four-parameter distribution called the odd log-logistic generalized inverse Gaussian (OLLGIG) distribution, which includes as special cases the generalized inverse Gaussian (GIG) and inverse Gaussian (IG) distributions. We provide some of its mathematical properties. Further, we define the OLLGIG regression model with two systematic structures based on this new distribution, which is very suitable for modeling censored and uncensored data. The proposed model extends several existing regression models and could be a valuable addition to the literature. The maximum likelihood method is described for estimating the model parameters, and simulations are performed for different parameter settings and sample sizes. Diagnostic analysis is presented to assess global influence, and the adequacy of the fitted model is examined by means of quantile residuals. The utility of the proposed OLLGIG regression model is demonstrated on a real data set of prices of urban residential properties in the municipality of Paranaíba in the State of Mato Grosso do Sul, Brazil.
Data Availability
The data used to support the findings of this study were supplied by Universidade Federal de Mato Grosso do Sul under license and so cannot be made freely available.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil.