Quantile-Based Estimation of Liu Parameter in the Linear Regression Model: Applications to Portland Cement and US Crime Data

In multiple linear regression models, multicollinearity occurs when the explanatory variables are correlated with each other. It is well known that, when multicollinearity exists, the ordinary least squares estimator is unstable because its variance is inflated. As a remedy, Liu in [1] developed a new method of estimation with biasing parameter d. In this paper, we introduce a new method to estimate the biasing parameter in order to mitigate the problem of multicollinearity. The proposed method provides a class of estimators that are based on quantiles of the regression coefficients. The performance of the new estimators is compared with the existing estimators through Monte Carlo simulation, where mean squared error and mean absolute error are the evaluation criteria. The Portland cement and US Crime data are used as applications to illustrate the benefit of the new estimators. Based on the simulation and numerical study, it is concluded that the new estimators outperform the existing estimators in certain situations, including high and severe multicollinearity. The 95% mean prediction interval of all the estimators is also computed for the Portland cement data. We recommend the new method to practitioners when high multicollinearity exists among the explanatory variables.


Introduction
The commonly used method of estimation in multiple linear regression models (MLRM) is the method of ordinary least squares (OLS) [1]. The results obtained through this method might be misleading when multicollinearity is present among the explanatory variables [2]. To overcome this problem, Ridge regression (RR) and Liu regression, suggested by [1,3], respectively, are the two widely used alternative methods. The Liu estimator (LE) is usually preferred over RR because it is a linear function of its biasing parameter d [4]. The optimal value of the biasing parameter d in Liu regression plays an important role in minimizing the variance. Many researchers have suggested estimators for d; a few of them are given in [4][5][6]. In this paper, the performance of some existing LEs is investigated, and a new method, called quantile-based estimation of the Liu or biasing parameter d, is proposed. The new estimators are also compared with the existing ones through a Monte Carlo simulation based on the mean squared error (MSE) and mean absolute error (MAE) performance criteria. The rest of the article is organized as follows. The model estimation and the newly proposed and existing LEs are discussed in Section 2. The simulation design and results are discussed in Section 3. Section 4 includes two empirical applications to demonstrate the benefits of the new method. The conclusion of the paper is given in Section 5.

Statistical Methodology
Consider the following MLRM:

$$y = X\beta + \varepsilon, \qquad (1)$$

where $y$ is the $(n \times 1)$ vector of the random response variable, $X$ is the fixed design matrix of explanatory variables of order $(n \times p)$, and $\beta$ is the $(p + 1) \times 1$ vector of population regression coefficients. $\varepsilon$ is the $(n \times 1)$ vector of stochastic or random errors and is normally distributed with mean $E(\varepsilon) = 0$ and variance-covariance matrix $E(\varepsilon\varepsilon') = \sigma^2 I_n$, where $I_n$ is the $(n \times n)$ identity matrix. The vector of OLS estimators for $\beta$ is given below:

$$\hat{\beta} = (X'X)^{-1}X'y. \qquad (2)$$

The OLS estimator is unbiased and more efficient than all other unbiased estimators [2]. However, in the presence of multicollinearity, the variance of the OLS estimator becomes large and the estimator is no longer reliable [4]. To circumvent this situation, numerous biased estimation methods are available that provide a smaller MSE than OLS, and the LE is one of them. The Liu estimator defined by [1] is given as

$$\hat{\beta}_{LIU} = (X'X + I)^{-1}(X'X + dI)\hat{\beta}, \qquad 0 < d < 1. \qquad (3)$$

In the presence of multicollinearity, $\hat{\beta}_{LIU}$ provides a smaller MSE than OLS [7]. The optimal choice of the Liu parameter $d$ plays a vital role in minimizing the MSE of $\hat{\beta}_{LIU}$ [8]. Some existing LEs for the biasing parameter $d$ are given in the following subsection.
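The two estimators discussed above can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it assumes the usual textbook forms of the OLS estimator, (X'X)^{-1}X'y, and of the Liu estimator, (X'X + I)^{-1}(X'X + dI) times the OLS estimator, and it uses small synthetic data. The function names `ols_estimator` and `liu_estimator`, the toy design, and the value d = 0.5 are all hypothetical choices made for illustration.

```python
import numpy as np

def ols_estimator(X, y):
    """OLS estimate obtained by solving the normal equations (X'X) b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def liu_estimator(X, y, d):
    """Liu estimate (X'X + I)^{-1} (X'X + d I) b_ols for a given d (assumed form)."""
    p = X.shape[1]
    XtX = X.T @ X
    b_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + np.eye(p), (XtX + d * np.eye(p)) @ b_ols)

# Synthetic, highly collinear toy data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
n, p = 50, 4
common = rng.standard_normal((n, 1))                      # shared component
X = np.sqrt(1 - 0.99**2) * rng.standard_normal((n, p)) + 0.99 * common
beta = np.full(p, 0.5)
y = X @ beta + rng.standard_normal(n)

b_ols = ols_estimator(X, y)
b_liu = liu_estimator(X, y, d=0.5)                        # d = 0.5 is arbitrary
print("OLS:", b_ols)
print("Liu:", b_liu)
```

Setting d = 1 in this sketch reproduces the OLS estimate, while values of d below one shrink the coefficient vector, which is the mechanism that trades a small bias for a reduction in variance.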

Some Existing Liu Estimators.
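Consider the canonical form of model (1):

$$y = Z\alpha + \varepsilon, \qquad (4)$$

where $Z = XD$ and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)' = D'\beta$. Here $D$ is the orthogonal matrix whose columns are the eigenvectors of $X'X$, and $\Lambda = Z'Z = D'X'XD = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ consists of the eigenvalues of the $X'X$ matrix. Note here that $\mathrm{MSE}(\hat{\alpha}) = \mathrm{MSE}(\hat{\beta})$, so it suffices to consider the canonical form only. The OLS estimator can be defined in canonical form as follows:

$$\hat{\alpha} = \Lambda^{-1}Z'y. \qquad (5)$$

The LE is defined as

$$\hat{\alpha}_{LIU} = (\Lambda + I)^{-1}(\Lambda + dI)\hat{\alpha}. \qquad (6)$$

The first estimator for $d$ was suggested by [1] and is referred to as equation (7); in it, $\hat{\alpha}_j$ is the $j$th element of $\hat{\alpha}$, the OLS estimator of $\alpha$, $\hat{\sigma}^2$ is the unbiased estimator of the population error variance $\sigma^2$, and $\lambda_j$ is the $j$th eigenvalue of the matrix $X'X$. Liu in [1] also suggested a second estimator, and Shukur et al. [6], building on the idea of [5,9], suggested four further estimators. Based on the work of [10,11], we propose five new quantile-based LEs, which are defined in the following section.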
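For orientation, the following is a sketch of the standard argument that leads to a component-wise value of d in the canonical form. It is offered under the assumption that equation (7) is of this type; the exact expressions used by the authors are not reproduced in the text above, so the result below should be read as the usual textbook form rather than as a quotation.

```latex
% Component j of the canonical Liu estimator is
%   \hat{\alpha}_{LIU,j} = \frac{\lambda_j + d}{\lambda_j + 1}\,\hat{\alpha}_j ,
% with Var(\hat{\alpha}_j) = \sigma^2 / \lambda_j, so that
\[
\mathrm{MSE}(\hat{\alpha}_{LIU})
  = \sigma^2 \sum_{j=1}^{p} \frac{(\lambda_j + d)^2}{\lambda_j(\lambda_j + 1)^2}
  + (d - 1)^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + 1)^2}.
\]
% Differentiating the j-th term with respect to d and setting it to zero:
\[
\frac{2\sigma^2(\lambda_j + d)}{\lambda_j(\lambda_j + 1)^2}
  + \frac{2(d - 1)\alpha_j^2}{(\lambda_j + 1)^2} = 0
\quad\Longrightarrow\quad
d_j = \frac{\alpha_j^2 - \sigma^2}{\sigma^2/\lambda_j + \alpha_j^2}.
\]
% Replacing the unknowns by their estimates gives the plug-in version
\[
\hat{d}_j = \frac{\hat{\alpha}_j^2 - \hat{\sigma}^2}
                 {\hat{\sigma}^2/\lambda_j + \hat{\alpha}_j^2},
\qquad j = 1, 2, \ldots, p.
\]
```

Because the numerator is always smaller than the denominator, each such value is below one, but it can be negative when the squared coefficient estimate is smaller than the estimated error variance, which is why a lower bound of zero is usually imposed in practice.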

Proposed Method.
Let $(d_1, d_2, \ldots, d_p)$ be the realized values of equation (7), written in ascending order of magnitude as

$$d_{(1)} \le d_{(2)} \le \cdots \le d_{(p)},$$

where $d_{(1)} = \min(d_1, d_2, \ldots, d_p)$ and $d_{(p)} = \max(d_1, d_2, \ldots, d_p)$. The set $d_{(1)}, d_{(2)}, \ldots, d_{(p)}$ is the order statistics for $(d_1, d_2, \ldots, d_p)$, and $d_{(j)}$, $j = 1, 2, \ldots, p$, is the $j$th ordered observation. Now let $d_c$, $0 \le c \le 1$, be the $100c$th quantile of $d_{(1)}, d_{(2)}, \ldots, d_{(p)}$. The new proposed quantile estimator, equation (11), is built on this quantile; in it, $c$ is the quantile probability, $\hat{\alpha}_{(j)}$ is the $j$th ordered element of $\hat{\alpha}$, the OLS estimator of $\alpha$, and $\lambda_{(j)}$ is the $j$th ordered eigenvalue of the matrix $X'X$.

The estimator $d_c$ depends on the quantile probability, whose value is selected according to the degree of multicollinearity so as to obtain the minimum MSE or MAE. For this reason, the proposed estimator is expected to be more robust to high or severe degrees of multicollinearity. Since the Liu parameter must lie between zero and one, equation (11) is rewritten in a bounded form, equation (13), which satisfies the interval condition for $d$ suggested by [1]. To show the role of the quantile probability, we choose some specific values for $c$: 0 (minimum), 0.25 (first quartile), 0.50 (median), 0.75 (third quartile), and 1 (maximum). The resulting new LEs are denoted here by D6, D7, D8, D9, and D10.
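The procedure described above (compute component-wise values, order them, take a quantile, and restrict the result to the unit interval) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the component-wise values use the plug-in form (alpha_j^2 - sigma^2) / (sigma^2 / lambda_j + alpha_j^2), the error variance is estimated with n - p degrees of freedom, the quantile uses NumPy's default linear interpolation, and the bounding step is implemented by clipping to [0, 1]. The function name `quantile_liu_parameter` and the toy data are hypothetical.

```python
import numpy as np

def quantile_liu_parameter(X, y, c):
    """Sketch of a quantile-based Liu parameter: c-th quantile of d_j, clipped to [0, 1]."""
    n, p = X.shape
    lam, D = np.linalg.eigh(X.T @ X)        # eigenvalues (ascending) and eigenvectors of X'X
    Z = X @ D                               # canonical regressors, Z'Z = diag(lam)
    alpha_hat = (Z.T @ y) / lam             # canonical OLS estimate
    resid = y - Z @ alpha_hat
    sigma2_hat = resid @ resid / (n - p)    # assumed unbiased variance estimate
    # Assumed component-wise values of d (plug-in form).
    d_j = (alpha_hat**2 - sigma2_hat) / (sigma2_hat / lam + alpha_hat**2)
    d_c = np.quantile(d_j, c)               # c = 0 gives the minimum, c = 1 the maximum
    return float(np.clip(d_c, 0.0, 1.0))    # keep the parameter inside [0, 1]

# Synthetic collinear toy data (illustrative only).
rng = np.random.default_rng(0)
n, p, rho = 50, 4, 0.99
z = rng.standard_normal((n, p + 1))
X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
y = X @ np.full(p, 0.5) + rng.standard_normal(n)

d_values = {c: quantile_liu_parameter(X, y, c) for c in (0.0, 0.25, 0.50, 0.75, 1.0)}
for c, d in d_values.items():
    print(f"c = {c:.2f} -> d = {d:.4f}")
```

Because a quantile is non-decreasing in c and clipping preserves order, the five values produced by this sketch are ordered from smallest to largest, so the choice of c directly controls how strongly the estimator shrinks.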

The Design of an Experiment
This section covers the Monte Carlo simulation experiment, the performance criteria, and the results and discussion.

The Monte Carlo Simulation.
The performance of the LEs is compared in this section through a simulation study. Following [12][13][14], the explanatory variables are generated as

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, p, \qquad (15)$$

where $\rho$ is the degree or level of multicollinearity between the explanatory variables, taken as 0.80, 0.90, 0.99, and 0.999, and $z_{ij}$ are random numbers obtained from the standard normal distribution. The $n$ observations on the response variable are computed as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

where $\varepsilon_i \sim N(0, \sigma^2)$ and $\sigma^2$ is the error variance. $\beta_0$ is considered to be identically zero. Following [9], the eigenvector corresponding to the maximum eigenvalue of the $X'X$ matrix is taken as the vector of regression coefficients. Following [4][5][6], the different factors we choose to vary in our study are given below: Error variance: σ 2 = 0.
where $\hat{\beta}_i$ is the estimated value of $\beta$ in the $i$th replication and $M$ is the number of simulation runs. In this study, we choose $M = 5000$. The EMSE simulation results are presented in Tables 1-6 and the EMAE results in Tables 7 and 8. The results are discussed in the following section.
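The simulation design described in this section can be sketched in a few lines. The snippet is a hedged illustration rather than the authors' research code: it assumes the common generating scheme x_ij = sqrt(1 - rho^2) z_ij + rho z_i,p+1, holds the design matrix fixed across replications, takes EMSE as the average squared Euclidean distance and EMAE as the average sum of absolute deviations between the estimate and the true coefficient vector, and uses a fixed d = 0.5 instead of any of the estimators D1-D10. The function names, the default settings, and the reduced number of runs (M = 500 rather than 5000) are hypothetical choices made to keep the example fast.

```python
import numpy as np

def generate_X(n, p, rho, rng):
    """Collinear design: sqrt(1 - rho^2) * z_ij + rho * z_{i,p+1} (assumed scheme)."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

def liu(X, y, d):
    """Liu estimate (X'X + I)^{-1} (X'X + d I) b_ols; d = 1 reproduces OLS."""
    p = X.shape[1]
    XtX = X.T @ X
    b_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + np.eye(p), (XtX + d * np.eye(p)) @ b_ols)

def simulate(n=50, p=4, rho=0.99, sigma=1.0, d=0.5, M=500, seed=1):
    """Return estimated MSE and MAE of OLS and of a Liu estimator with fixed d."""
    rng = np.random.default_rng(seed)
    X = generate_X(n, p, rho, rng)              # design held fixed over replications
    _, vecs = np.linalg.eigh(X.T @ X)
    beta = vecs[:, -1]                          # eigenvector of the largest eigenvalue
    errors = {"OLS": [], "LIU": []}
    for _ in range(M):
        y = X @ beta + sigma * rng.standard_normal(n)   # beta_0 = 0
        errors["OLS"].append(liu(X, y, 1.0) - beta)
        errors["LIU"].append(liu(X, y, d) - beta)
    summary = {}
    for name, e in errors.items():
        e = np.asarray(e)
        summary[name] = {
            "EMSE": float(np.mean(np.sum(e**2, axis=1))),      # assumed definition
            "EMAE": float(np.mean(np.sum(np.abs(e), axis=1))), # assumed definition
        }
    return summary

results = simulate()
for name, crit in results.items():
    print(name, crit)
```

In this sketch the same response vectors are used for both estimators within each replication, so differences in the criteria reflect the estimators rather than simulation noise.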

Results and Discussion.
The EMSE and EMAE values of the new and existing Liu estimators are presented in Tables 1-6 and Tables 7 and 8, respectively. The performance of the LEs is evaluated by varying factors such as multicollinearity, error variance, sample size, and the number of explanatory variables. The literature generally reports that these factors have a significant effect on simulation results. The effect of each factor on the EMSE and EMAE of the estimators is discussed below. The first factor we considered is the effect of multicollinearity on the EMSE and EMAE of the estimators. The EMSE and EMAE of all the estimators generally increase with the degree of multicollinearity, except for D5-D8. For estimators D5-D8, they first increase when multicollinearity increases from mild to high and then decrease for severe multicollinearity. It is evident from these tables that the LE is always superior to the OLS estimator. For mild multicollinearity (ρ = 0.80), D5 outperforms the others, and estimators D6 and D7 are its close competitors. When the degree of multicollinearity is high (ρ = 0.99) or severe (ρ = 0.999), the proposed estimators D6 and D7 generally outperform the others. Among these, D6 performs best, because it yields the lowest EMSE and EMAE when the degree of multicollinearity is high or severe.
Secondly, the effect of sample size on the EMSE and EMAE of the estimators is considered. In general, the EMSE and EMAE of OLS and the LE decrease as the sample size increases. The proposed estimators D6 and D7 exhibit the lowest EMSE in most cases, while D5 remains the closest competitor to the proposed estimators. In the third case, we varied the number of explanatory variables from 4 to 8. It is found that the EMSE and EMAE of all the estimators increase, but the performance pattern remains the same as in the cases of multicollinearity and sample size. The increase in the EMSE of the OLS estimator is relatively higher than that of all LEs. LE with Liu

Applications
Simulation evidence alone is not enough to judge the performance of the proposed estimators, because simulation studies are usually conducted under ideal conditions that may not be met in practice. Therefore, in contrast to the previous section, we use two real applications taken from the books of [2,15] to compare the performance of our proposed estimators in practical situations.

Portland Cement Data.
The first numerical example used in this study is the Portland cement dataset taken from [15], used to compare the performance of the new estimators in an applied scenario. Some authors call it Hald's or the chemical dataset, and it has been widely used in the literature; see [16][17][18][19][20][21][22]. The dataset consists of 13 observations and is given in Table 9. The response variable (y) is the heat evolved after 180 days of curing, measured in calories per gram of cement with 40% water at 35°C (95°F). The four explanatory variables considered are the clinker compounds (CALCD), defined as
(i) X1: Tricalcium aluminate (3CaO·Al2O3)
(ii) X2: Tricalcium silicate (3CaO·SiO2)
(iii) X3: Tetracalcium aluminoferrite (4CaO·Al2O3·Fe2O3)
(iv) X4: Dicalcium silicate (2CaO·SiO2)
The model is defined as

$$y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon.$$

The condition number (CN) is used to measure the severity of multicollinearity among the explanatory variables [23]. It is defined in terms of $\lambda_{\max}$ and $\lambda_{\min}$, the maximum and minimum eigenvalues of the matrix $X'X$, respectively. Following [2], a rule of thumb is that multicollinearity is moderate if the CN is between 10 and 30, high if it is between 30 and 100, and severe when it is greater than 100. The eigenvalues and CN for this dataset are computed and presented in Table 10. The estimated values for d, the coefficients, and the MSE of the estimators are presented in Table 11. This table shows that the LEs have a smaller MSE than OLS. Among the LEs, estimator D5 and the new estimators D6 and D7 outperform the others and are therefore the most efficient.
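The eigenvalue diagnostics used above can be sketched as follows. Since the exact formula for the CN is not reproduced in the text, the snippet reports both the eigenvalue ratio lambda_max / lambda_min and its square root, and leaves the choice to the reader; this is an assumption, not the authors' definition. The function names `eigen_diagnostics` and `classify_cn`, the label "low" for values below 10, and the synthetic 13-row design are hypothetical; the thresholds 10, 30, and 100 are the rule of thumb quoted in the text.

```python
import numpy as np

def eigen_diagnostics(X):
    """Eigenvalues of X'X plus two common condition measures (ratio and its square root)."""
    lam = np.linalg.eigvalsh(X.T @ X)         # ascending eigenvalues of X'X
    ratio = lam[-1] / lam[0]                  # lambda_max / lambda_min
    return lam, float(ratio), float(np.sqrt(ratio))

def classify_cn(cn):
    """Rule of thumb from the text: 10-30 moderate, 30-100 high, above 100 severe."""
    if cn > 100:
        return "severe"
    if cn >= 30:
        return "high"
    if cn >= 10:
        return "moderate"
    return "low"                              # label for CN < 10 is a hypothetical addition

# Synthetic collinear design with 13 rows and 4 columns (not the Table 9 data).
rng = np.random.default_rng(2)
z = rng.standard_normal((13, 5))
X = np.sqrt(1 - 0.99**2) * z[:, :4] + 0.99 * z[:, [4]]

lam, ratio, root_ratio = eigen_diagnostics(X)
print("eigenvalues:", lam)
print("lambda_max / lambda_min:", ratio, "->", classify_cn(ratio))
print("sqrt of the ratio:", root_ratio, "->", classify_cn(root_ratio))
```

The square root of the eigenvalue ratio coincides with the 2-norm condition number of X itself, so the two measures can classify the same dataset differently; whichever definition is used should be stated alongside the thresholds.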
The $100(1 - \alpha)\%$ mean prediction interval for the response variable is defined in terms of the predicted values $\hat{y}_0 = \hat{\beta}_1 X_{10} + \hat{\beta}_2 X_{20} + \hat{\beta}_3 X_{30} + \hat{\beta}_4 X_{40}$ and $\hat{y}_{0,LIU} = \hat{\beta}_{LIU,1} X_{10} + \hat{\beta}_{LIU,2} X_{20} + \hat{\beta}_{LIU,3} X_{30} + \hat{\beta}_{LIU,4} X_{40}$, where $\hat{\beta}$ and $\hat{\beta}_{LIU}$ are the OLS and Liu estimators, respectively. For details, see [2,4]. The 95% mean prediction intervals are computed for all the estimators.

Mathematical Problems in Engineering

(v) X4 = number of males per 1000 females
(vi) X5 = state population size in hundred thousands
(vii) X6 = unemployment rate of urban males per 1000 of age 35-39
(viii) X7 = median value of transferable goods and assets or family income in tens of dollars
(ix) X8 = the number of families per 1000 earning 1/2 the median income

Table 13 gives the eigenvalues and condition number for these data, which show that moderately high multicollinearity is present. The Shapiro-Wilk normality test gives W = 0.99379 with p-value = 0.9967, so normality of the response variable is not rejected at the 5% level of significance. The MSE results are tabulated in Table 14. From Table 14, we see that the LEs have a smaller MSE than OLS. Among all LEs, the proposed estimator D6 shows the smallest MSE, which supports the simulation results given in Section 3.
Therefore, based on the simulation results and illustrative examples, we recommend the use of the LE with Liu parameter D6 or D7 to practitioners in the presence of high or severe multicollinearity.

Concluding Remarks
In this paper, we introduced a new quantile-based method to estimate the Liu parameter in order to minimize the variance and circumvent the problem of multicollinearity. Extensive Monte Carlo simulations were carried out to evaluate the performance of the estimators under the MSE and MAE criteria by varying factors such as sample size, number of explanatory variables, error variance, and multicollinearity. Results from the simulation study revealed that the LE generally performed better than the traditional OLS estimator. The LE is a more robust choice than OLS when multicollinearity is present. Also, the proposed estimator D6 performs better than the other considered estimators in many evaluated instances, particularly when multicollinearity is extremely high. Estimator D5 is the closest competitor to D6. Furthermore, the benefits of the new estimators are confirmed in the two empirical applications. Based on the simulation results along with the real applications, we conclude that the Liu method with the proposed estimator D6 is the best choice for practitioners to overcome the problem of multicollinearity.

Data Availability
Data used in this research are taken from Portland cement limited, available online at: H. Woods, H. H. Steinour, and H. R. Starke, "Effect of composition of Portland cement on heat evolved during hardening," Ind. Eng. Chem., vol. 24, no. 11, pp. 1207-1214, 1932. Research codes will be provided on personal request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.