A New Ridge-Type Estimator for the Gamma Regression Model

The linear regression model (LRM) is mostly used to model the quantitative structure-activity relationship (QSAR) between the response variable (biological activity) and one or more physicochemical or structural properties, which serve as the explanatory variables, mainly when the distribution of the response variable is normal. The gamma regression model is often employed for a skewed dependent variable. The parameters in both models are estimated using the maximum likelihood estimator (MLE). However, the MLE becomes unstable in the presence of multicollinearity in both models. In this study, we propose a new estimator and suggest some biasing parameters to estimate the regression parameters of the gamma regression model when there is multicollinearity. A simulation study and a real-life application were carried out to evaluate the estimators' performance via the mean squared error (MSE) criterion. The results from the simulation and the real-life application revealed that the proposed gamma estimator produced lower MSE values than the other estimators considered.


Introduction
The gamma regression model (GRM) is generally adopted to model a skewed response variable that follows a gamma distribution with one or more independent variables. It is used to model real-life data in several fields, such as the medical sciences, health care economics, and automobile insurance claims [1]. When the positively skewed response variable follows a gamma distribution with a given set of independent variables, the gamma regression model is preferred [2][3][4]. As in linear regression models, the assumption that the explanatory variables are independent rarely holds in practice, so the multicollinearity problem also arises in gamma regression models, which means that the maximum likelihood estimator (MLE) is unstable and has high variance [5]. Consequently, constructing confidence intervals or testing the regression parameters of the model becomes difficult [6]. Many authors have proposed different estimators for handling multicollinearity. The ridge estimator given by Hoerl and Kennard [7] is an alternative to the MLE for overcoming multicollinearity in the linear regression model. The estimator has been extended to the generalized linear models (GLM) (see [8,9]). Also, Månsson and Shukur [10] and Månsson [11] introduced the ridge estimator to the Poisson regression model and the negative binomial regression model, respectively. Kurtoglu and Ozkale [12] extended the Liu estimator of Liu [13] to the gamma regression model. Batah et al. [14] proposed a modified jackknife ridge estimator by combining the ideas of the generalized ridge estimator and the jackknifed ridge estimator. Also, Algamal [3] developed the modified jackknifed ridge gamma regression estimator. Recently, a modified version of the ridge regression estimator with two biasing parameters was proposed for both the LRM and the GRM [15,16]. Kibria and Lukman [17] proposed a new estimator, called the ridge-type estimator, and applied it to the linear regression model.
The main objective of this article is to extend the new ridge-type estimator of Kibria and Lukman [17] to the GRM. The article is organized as follows. In Section 2, we propose the new ridge-type gamma estimator, derive its properties, compare it theoretically with existing estimators, and explain the estimation of the biasing parameter. A simulation study is conducted in Section 3 to investigate and compare the performance of the new gamma estimator and some existing estimators. We analyze a real-life dataset in Section 4. Finally, we provide some concluding remarks in Section 5.

The Statistical Methodology
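Consider the response variable $y_i$, which follows a gamma distribution with nonnegative shape parameter $a$ and nonnegative scale parameter $b$, with probability density function
$$f(y_i) = \frac{1}{\Gamma(a)\, b^{a}}\, y_i^{a-1} e^{-y_i/b}, \quad y_i \ge 0,$$
where $E(y_i) = ab = \theta_i$ and $\mathrm{Var}(y_i) = ab^2 = \theta_i^2/a$. Writing $b = \theta_i/a$, the log-likelihood is
$$l(\beta) = \sum_{i=1}^{n} \left[ -a\left(\frac{y_i}{\theta_i} + \ln\theta_i\right) + (a-1)\ln y_i + a\ln a - \ln\Gamma(a) \right],$$
where $\theta_i$ depends on $\beta$ through the link function. The resulting likelihood equations are nonlinear in $\beta$ and are therefore solved iteratively using the Fisher scoring method as follows:
$$\beta^{(t+1)} = \beta^{(t)} + I^{-1}(\beta^{(t)})\, S(\beta^{(t)}),$$
where $t$ is the iteration index, $S(\beta) = \partial l(\beta)/\partial\beta$, and $I^{-1}(\beta) = (-E(\partial^2 l(\beta)/\partial\beta\,\partial\beta'))^{-1}$. At the last step, the estimated coefficients are
$$\hat\beta_{MLE} = D^{-1} X' W z,$$
where $D = X'WX$, $W$ is the weight matrix, and $z$ is the vector with $i$th element $z_i = \theta_i + (y_i - \theta_i)/\theta_i^2$. $W$ and $z$ are obtained from the Fisher scoring iterations (see [12,18]). The covariance matrix, the matrix mean squared error (MMSE), and the mean squared error (MSE) of the MLE were obtained by Algamal and Asar [19] and are written, respectively, as follows:
$$\mathrm{Cov}(\hat\beta_{MLE}) = \phi D^{-1},$$
$$\mathrm{MMSE}(\hat\beta_{MLE}) = \phi D^{-1},$$
$$\mathrm{MSE}(\hat\beta_{MLE}) = \phi \sum_{j=1}^{p} \frac{1}{c_j},$$
where $\phi$ is the dispersion parameter, $c_j$ is the $j$th eigenvalue of the matrix $D = X'WX$, and $X'$ is the transpose of $X$. The gamma ridge estimator (GRE) is
$$\hat\beta_{GRE} = (D + kI)^{-1} D\, \hat\beta_{MLE} = D_k^{-1}\hat\beta_{MLE},$$
where $D_k = (I + kD^{-1})$ and $k$ is the biasing parameter. The MMSE and MSE of the GRE are given by
$$\mathrm{MMSE}(\hat\beta_{GRE}) = \phi D_k^{-1} D^{-1} (D_k^{-1})' + k^2 (D + kI)^{-1}\beta\beta'(D + kI)^{-1},$$
$$\mathrm{MSE}(\hat\beta_{GRE}) = \phi \sum_{j=1}^{p} \frac{c_j}{(c_j + k)^2} + k^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(c_j + k)^2},$$
where $\alpha = P'\beta$ and $P$ is the matrix of eigenvectors of $D$. The gamma Liu estimator (GLE) is given by
$$\hat\beta_{GLE} = F_d\, \hat\beta_{MLE},$$
where $F_d = (D + I)^{-1}(D + dI)$ and $d$ is the biasing parameter. The MMSE and MSE of the GLE are given by
$$\mathrm{MMSE}(\hat\beta_{GLE}) = \phi F_d D^{-1} F_d' + (d - 1)^2 (D + I)^{-1}\beta\beta'(D + I)^{-1},$$
$$\mathrm{MSE}(\hat\beta_{GLE}) = \phi \sum_{j=1}^{p} \frac{(c_j + d)^2}{c_j (c_j + 1)^2} + (d - 1)^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(c_j + 1)^2}.$$

The New Gamma Estimator.
For the linear regression model, Kibria and Lukman [17] proposed a new ridge-type estimator, called the Kibria-Lukman (KL) estimator, which is defined as
$$\hat\beta_{KL} = (X'X + kI)^{-1}(X'X - kI)\, \hat\beta_{OLS},$$
where $\hat\beta_{OLS} = (X'X)^{-1}X'y$ is the ordinary least squares estimator. In this study, we extend the KL estimator to the GRM and refer to the resulting estimator as the gamma KL (GKL) estimator, which is written as follows:
$$\hat\beta_{GKL} = (D + kI)^{-1}(D - kI)\, \hat\beta_{MLE} = D_k^{-1} R_k\, \hat\beta_{MLE},$$
where $R_k = (I - kD^{-1})$. The bias and covariance matrix of the GKL estimator are, respectively,
$$\mathrm{Bias}(\hat\beta_{GKL}) = (D_k^{-1} R_k - I)\beta = -2k (D + kI)^{-1}\beta,$$
where $E(\hat\beta_{MLE}) = \beta$, and
$$\mathrm{Cov}(\hat\beta_{GKL}) = \phi D_k^{-1} R_k D^{-1} R_k' (D_k^{-1})'.$$
So, the MMSE and the MSE in terms of eigenvalues are, respectively,
$$\mathrm{MMSE}(\hat\beta_{GKL}) = \phi D_k^{-1} R_k D^{-1} R_k' (D_k^{-1})' + (D_k^{-1} R_k - I)\beta\beta'(D_k^{-1} R_k - I)',$$
$$\mathrm{MSE}(\hat\beta_{GKL}) = \phi \sum_{j=1}^{p} \frac{(c_j - k)^2}{c_j (c_j + k)^2} + 4k^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(c_j + k)^2}.$$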
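As an illustration, the scalar MSE of each estimator can be evaluated directly from the eigenvalues $c_j$ of $D$ and the canonical coefficients $\alpha_j$. The following is a minimal Python sketch under stated assumptions: the function names and the numerical values of `c`, `alpha`, and `phi` are hypothetical, and the closed forms are the eigenvalue expressions implied by the definitions of $D_k$, $F_d$, and $R_k$, not code from the study itself (whose simulations were run in R).

```python
# Assumed eigenvalue forms (c_j: eigenvalues of D, alpha = P' beta, phi: dispersion):
#   MLE: phi * sum 1/c_j
#   GRE: sum (phi*c_j + k^2*alpha_j^2) / (c_j + k)^2
#   GLE: sum (phi*(c_j + d)^2/c_j + (d - 1)^2*alpha_j^2) / (c_j + 1)^2
#   GKL: sum (phi*(c_j - k)^2/c_j + 4*k^2*alpha_j^2) / (c_j + k)^2


def mse_mle(c, phi):
    """Scalar MSE of the maximum likelihood estimator."""
    return phi * sum(1.0 / cj for cj in c)


def mse_gre(c, alpha, phi, k):
    """Scalar MSE of the gamma ridge estimator with biasing parameter k."""
    return sum((phi * cj + k ** 2 * aj ** 2) / (cj + k) ** 2
               for cj, aj in zip(c, alpha))


def mse_gle(c, alpha, phi, d):
    """Scalar MSE of the gamma Liu estimator with biasing parameter d."""
    return sum((phi * (cj + d) ** 2 / cj + (d - 1.0) ** 2 * aj ** 2) / (cj + 1.0) ** 2
               for cj, aj in zip(c, alpha))


def mse_gkl(c, alpha, phi, k):
    """Scalar MSE of the gamma KL estimator with biasing parameter k."""
    return sum((phi * (cj - k) ** 2 / cj + 4.0 * k ** 2 * aj ** 2) / (cj + k) ** 2
               for cj, aj in zip(c, alpha))


# Hypothetical ill-conditioned example: one eigenvalue is close to zero.
c = [100.0, 1.0, 0.01]
alpha = [0.6, 0.6, 0.5]
phi = 0.5
print("MLE:", mse_mle(c, phi))
print("GKL (k = 0.05):", mse_gkl(c, alpha, phi, 0.05))
```

Setting $k = 0$ in the GRE and GKL expressions, or $d = 1$ in the GLE expression, recovers the MSE of the MLE, which is a convenient sanity check on the formulas.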

The Theoretical Comparison for the Estimators.
The following lemmas are needed for the theoretical comparison of the estimators.

Lemma 2.
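Suppose $R$ is an $n \times n$ positive definite matrix and $\alpha$ is a vector; then, $R - \alpha\alpha' \ge 0$ if and only if $\alpha' R^{-1} \alpha \le 1$.

Lemma 3.
Suppose that $\hat\alpha_i = L_i y$, $i = 1, 2$, are two linear estimators of $\alpha$. Also, suppose that $\mathrm{Cov}(\hat\alpha_1) - \mathrm{Cov}(\hat\alpha_2) > 0$. Then, $\mathrm{MMSE}(\hat\alpha_1) - \mathrm{MMSE}(\hat\alpha_2) > 0$ if and only if $b_2' [\mathrm{Cov}(\hat\alpha_1) - \mathrm{Cov}(\hat\alpha_2) + b_1 b_1']^{-1} b_2 < 1$, where $b_i$ denotes the bias of $\hat\alpha_i$.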

Comparison of GKL and MLE
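Proof. The difference of the dispersion matrices is
$$\mathrm{Cov}(\hat\beta_{MLE}) - \mathrm{Cov}(\hat\beta_{GKL}) = \phi P\, \mathrm{diag}\left\{ \frac{1}{c_j} - \frac{(c_j - k)^2}{c_j (c_j + k)^2} \right\}_{j=1}^{p} P'.$$
We observe that $(c_j + k)^2 - (c_j - k)^2 = 4kc_j > 0$ for $k > 0$, so this difference is positive definite. By Lemma 3, the proof is done.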
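The variance comparison between the MLE and the GKL estimator can also be checked numerically. The short Python sketch below assumes the eigenvalue forms $\phi/c_j$ for the MLE and $\phi (c_j - k)^2 / (c_j (c_j + k)^2)$ for the GKL estimator; the function names and the eigenvalues used are hypothetical. Under these assumptions the per-eigenvalue difference simplifies to $4k\phi/(c_j + k)^2$, which is positive for every $k > 0$.

```python
def variance_gain(c, k, phi):
    """Per-eigenvalue variance of the MLE minus that of the GKL estimator."""
    return [phi / cj - phi * (cj - k) ** 2 / (cj * (cj + k) ** 2) for cj in c]


def closed_form_gain(c, k, phi):
    """The same difference after simplification: 4*k*phi / (c_j + k)^2."""
    return [4.0 * k * phi / (cj + k) ** 2 for cj in c]


# Hypothetical eigenvalues spanning well-conditioned to near-singular directions.
c = [50.0, 2.0, 0.05]
for direct, simplified in zip(variance_gain(c, 0.3, 1.0),
                              closed_form_gain(c, 0.3, 1.0)):
    print(direct, simplified)
```

The gain is largest for the smallest eigenvalues, which are the directions in which multicollinearity inflates the variance of the MLE.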

Comparison of GKL and GRE
where Proof. where Clearly, for the biasing parameters $k > 0$ and $0 < d < 1$, is the maximum eigenvalue of the matrix $AF^{-1}$. By Lemma 1, the proof is done.

Comparison of GKL and GLE
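Proof. The difference of the dispersion matrices is
$$\mathrm{Cov}(\hat\beta_{GLE}) - \mathrm{Cov}(\hat\beta_{GKL}) = \phi P\, \mathrm{diag}\left\{ \frac{(c_j + d)^2}{c_j (c_j + 1)^2} - \frac{(c_j - k)^2}{c_j (c_j + k)^2} \right\}_{j=1}^{p} P'.$$
This difference is positive definite when $(c_j + k)^2 (c_j + d)^2 - (c_j - k)^2 (c_j + 1)^2 > 0$ for $k > 0$ and $0 < d < 1$. By Lemma 3, the proof is done.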

Estimation of Parameter k.
The optimal value of $k$ in $\hat\beta_{GKL}$ is adopted from the KL estimator of Kibria and Lukman [17] as follows: The optimal value of $k$ given in (24) depends on the unknown parameters $\phi$ and $\beta_j^2$. Therefore, we replace them with their corresponding unbiased estimators. Consequently,
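To make the choice of $k$ concrete, the sketch below works with a single term of the GKL mean squared error in its eigenvalue form, $\phi (c_j - k)^2 / (c_j (c_j + k)^2) + 4k^2 \alpha_j^2 / (c_j + k)^2$. Setting the derivative of this term with respect to $k$ to zero gives $k_j = \phi / (2\alpha_j^2 + \phi / c_j)$. This is a hedged Python illustration rather than the study's own code: the function names are hypothetical, the use of the canonical coefficients $\alpha_j = (P'\beta)_j$ is an assumption, and taking the minimum over $j$ is shown only as one plausible single-value rule. In practice $\phi$ and $\alpha_j$ would be replaced by their estimates.

```python
def gkl_component_mse(c_j, alpha_j, phi, k):
    """One term of the GKL MSE: variance part plus squared-bias part."""
    return (phi * (c_j - k) ** 2 / c_j + 4.0 * k ** 2 * alpha_j ** 2) / (c_j + k) ** 2


def k_component(c_j, alpha_j, phi):
    """Stationary point of gkl_component_mse in k: phi / (2*alpha_j^2 + phi/c_j)."""
    return phi / (2.0 * alpha_j ** 2 + phi / c_j)


def k_min(c, alpha, phi):
    """Smallest component-wise value, used here as a single biasing parameter."""
    return min(k_component(cj, aj, phi) for cj, aj in zip(c, alpha))


# Hypothetical inputs: two eigenvalues, equal canonical coefficients.
print(k_component(0.5, 0.7, 1.0))
print(k_min([10.0, 0.5], [0.7, 0.7], 1.0))
```

Because each component-wise value is bounded above by $c_j$, a near-zero eigenvalue forces a small $k_j$, so the minimum rule is driven by the most ill-conditioned direction.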

Simulation Design
The R 3.4.1 programming language is adopted for the simulation design of this study. Following Algamal [19], the response variable is generated as follows: where $\theta = \exp(X'\beta)$ and $\upsilon$ denotes $\theta^2$. The parameter vector, $\beta$, is chosen such that $\sum_{j=1}^{p} \beta_j^2 = 1$ [1,23,24]. Following Kibria [25] and Kibria and Banik [26], the explanatory variables are obtained as follows:
$$x_{ij} = (1 - \rho^2)^{1/2} w_{ij} + \rho\, w_{i,p+1}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$$
where $w_{ij}$ are generated from the standard normal distribution and $\rho^2$ is the correlation between the explanatory variables. The values of $\rho$ in this study are chosen to be 0.95, 0.99, and 0.999. We obtained the mean function for $p = 4$ and $p = 7$ explanatory variables, respectively, for the following sample sizes: 20, 50, and 200. For each replicate, we compute the mean square error (MSE) of the estimators by using the following equation: where $\beta^*_i$ would be any of the following estimators (MLE, GRE, GLE, and GKL). The smaller the mean square error value is, the better the estimator is. The biasing parameters for GRE and GLE are obtained as follows: We examined two shrinkage parameters for the proposed estimator. They are defined as follows: The simulation results for different values of $n$, $\phi$, and $\rho$ are presented in Tables 1 and 2 for $p = 4$ and $p = 7$, respectively. For a graphical representation, we also plotted MSE vs $n$, $\rho$, $\phi$, and $p$ in Figure 1.
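The generation of the correlated explanatory variables can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the R code used in the study. It assumes the commonly used scheme $x_{ij} = (1 - \rho^2)^{1/2} w_{ij} + \rho\, w_{i,p+1}$ with independent standard normal $w_{ij}$, and the function names `make_design` and `pearson` are hypothetical.

```python
import math
import random


def make_design(n, p, rho, rng):
    """Return n rows of p explanatory variables whose pairwise correlation is rho**2."""
    scale = math.sqrt(1.0 - rho ** 2)
    rows = []
    for _ in range(n):
        # p independent draws plus one shared draw that induces the correlation.
        w = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]
        shared = w[p]
        rows.append([scale * w[j] + rho * shared for j in range(p)])
    return rows


def pearson(u, v):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


rng = random.Random(2021)  # fixed seed so the run is reproducible
X = make_design(5000, 4, 0.95, rng)
print(pearson([row[0] for row in X], [row[1] for row in X]))
```

Each column has unit variance and every pair of columns has covariance $\rho^2$, so with $\rho = 0.95$ the sample correlation should be close to 0.9025 for a large sample.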
Tables 1 and 2 and Figure 1 show that the MSE increases as the level of multicollinearity increases, keeping other variables constant. For instance, when $n = 50$, the MSE of the MLE increases from 1.265 to 38.172 as the level of multicollinearity, $\rho$, rises from 0.95 to 0.999 for $\phi = 0.5$ and $p = 4$. We also observed that, as the number of explanatory variables increases from $p = 4$ to $p = 7$, the MSE increases, provided other variables are kept constant. For instance, when $n = 20$, $\rho = 0.99$, and $\phi = 1$, the MSE of GRE-k rises from 6.753 to 19.071. Also, when other variables are fixed, increasing the sample size $n$ results in a decrease in the MSE for all the estimators; for example, the MSE value of GLE-d for $n = 200$, $\phi = 0.5$, $p = 7$, and $\rho = 0.95$ reduces from 1.282 to 1.549. Furthermore, the MSE increases as the dispersion parameter $\phi$ increases from 0.5 to 1. The maximum likelihood estimator performs worst, as expected, because of the effect of multicollinearity on the estimator. The results in Tables 1 and 2 and Figure 1 show that the GKL estimator outperforms the other estimators. Since the performance of the proposed GKL estimator depends on its biasing parameter, we examined two different biasing parameters for the GKL estimator and observed that it performs best with the biasing parameter $k_2$. The simulation results further support the theoretical result that the GKL estimator performs best. The GRE and the GLE perform better than the MLE. Furthermore, we explored the performance of the proposed estimator and the existing estimators by analyzing a real-life dataset in Section 4.

Real-Life Data: Algamal Data
The chemical dataset adopted in this study was employed by Algamal [3,19], who used the quantitative structure-activity relationship (QSAR) model to study the relationship between the biological activities ($\mathrm{IC}_{50}$) of 65 imidazo[4,5-b]pyridine derivatives, an anticancer compound, and 15 molecular descriptors. The QSAR model is widely used in the chemical sciences, the biological sciences, and engineering. The linear regression model is popularly used to model the QSAR relationship between the response variable (biological activity) and one or more physicochemical or structural properties, which serve as the explanatory variables, especially when the response variable is normally distributed [27]. However, the gamma regression model is employed when the response variable is skewed [3,19,24,28]. In this study, following Algamal [3,19], the variables of interest are described in Table 3.
According to Algamal [3,19], the response variable, $y$, follows a gamma distribution. Using the chi-square goodness-of-fit test, the author verified that the response variable is well fitted by the gamma distribution, with a test statistic of 9.3657 (p value 0.07521). Algamal [19] reported that the correlation coefficients between Mor21v and Mor21e, SpMax3_Bh(s) and ATS8v, SpMaxA_D and MW, and MW and ATS8v are greater than 0.9, which indicates high correlation. The eigenvalues of $X'WX$ are 7.6687E+8, ... [19]. The results of the gamma regression model and the mean square error are presented in Table 4. The result in Table 4 agrees with the simulation results. The MLE performs worst, possessing the highest MSE. The proposed estimator with the biasing parameter $k_2$ has the smallest mean square error, followed, in this order, by the proposed estimator with $k_{\min}$, GRE-k, and GLE-d. Recall that, in the simulation study, GKL with $k_2$ as the shrinkage parameter also performed the best.

Table 3 (recovered fragment): description of the variables.
(descriptor name not recovered): Moran autocorrelation of lag 7 weighted by van der Waals volume.
MATS2s: Moran autocorrelation of lag 2 weighted by I-state.
HATS6v: Leverage-weighted autocorrelation of lag 6/weighted by van der Waals volume.

Some Concluding Remarks
The Kibria-Lukman [17] estimator was developed to circumvent the problem of multicollinearity in the linear regression model. This estimator belongs to the class of ridge regression and Liu-type regression estimators, and it has a single biasing parameter. In the gamma regression model, multicollinearity also threatens the performance of the maximum likelihood estimator (MLE) in the estimation of the regression coefficients. The gamma ridge estimator (GRE) and the gamma Liu estimator (GLE) have been introduced in previous studies to mitigate the problem of multicollinearity. Because Kibria and Lukman [17] reported that the KL estimator outperforms the ridge and Liu estimators in the linear regression model, we were motivated to develop the gamma KL (GKL) estimator for effective estimation in the GRM. We derived the statistical properties of the GKL estimator and compared it theoretically with the MLE, GRE, and GLE. Furthermore, a simulation study and a chemical data analysis were conducted in support of the theoretical study.
The simulation and application results show that the GKL estimator with $k_2$ as the shrinkage parameter performed the best. In conclusion, the use of the GKL estimator is preferred when multicollinearity exists in the gamma regression model.

Data Availability
The data used to support the findings of this study are available upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.