Two-Parameter Modified Ridge-Type M-Estimator for Linear Regression Model

The general linear regression model has long been one of the most frequently used models, with the ordinary least squares (OLS) estimator used to estimate its parameters. The OLS estimator suffers from the problems of multicollinearity and outliers, which lead to unfavourable results. This study proposes a two-parameter ridge-type modified M-estimator (RTMME), based on the M-estimator, to deal with the combined problem of multicollinearity and outliers. Through theoretical proofs, a Monte Carlo simulation, and a numerical example, the proposed estimator is shown to outperform the modified ridge-type estimator and the other existing estimators considered.


Introduction
A multiple linear regression model can be defined mathematically as

y = Xβ + ε,

where y is an n × 1 vector of observations on the dependent variable; X is a known n × p standardized and centered explanatory variable matrix of full column rank; β is a p × 1 vector of unknown parameters; and ε is an n × 1 vector of disturbances with E(ε) = 0 and dispersion matrix Cov(ε) = σ²I, where I is the n × n identity matrix. The ordinary least squares estimator (OLSE) of β is given as

β̂ = (X′X)⁻¹X′y.

According to the Gauss-Markov theorem, the OLS estimator is the best linear unbiased estimator (BLUE), possessing minimum variance in the class of all linear unbiased estimators [1,2]. However, the estimator performs poorly in the presence of multicollinearity [3]. Biased estimators such as the ridge regression estimator [4], the Liu estimator [5], the Stein estimator [6], the principal component estimator [7], the modified ridge regression estimator [8], and others are often employed to tackle this problem. Another factor whose presence can negatively influence the regression coefficients of the OLS estimator is the outlier. The general practice in the literature is to adopt robust estimators as an alternative to the OLS estimator. The M-estimator is popularly used to handle outliers in the y-direction [9].
Hoerl and Kennard [4] defined the ridge estimator (RE) as

β̂(k) = (X′X + kI)⁻¹X′y, k > 0,

which generalizes to the two-parameter form β̂(k, d) = R_kd1 β̂, where R_kd1 = (X′X + kdI)⁻¹X′X and d is introduced as an additional biasing parameter. Following Dorugade [12], Lukman et al. [13] modified the ridge estimator in (3) and called it the modified ridge-type estimator (MRT). The estimator is defined as

β̂_MRT(k, d) = R_kd β̂,

where R_kd = (X′X + k(1 + d)I)⁻¹X′X. The organization of the paper is as follows. We propose the new estimator in Section 2 and provide a theoretical comparison among the estimators in Section 3. We discuss the robust choice of the biasing parameters in Section 4 and conduct simulation studies in Section 5 to evaluate the performance of the proposed estimator. A real-life data set is analyzed in Section 6 to illustrate the findings of the paper, and Section 7 ends with some concluding remarks.
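The closed-form estimators above can be sketched directly in numpy. This is a minimal illustration, not the authors' code; the function names are illustrative, and each shrinkage estimator is computed from the normal equations with the stated penalty.

```python
import numpy as np

def ols(X, y):
    # Ordinary least squares: (X'X)^{-1} X'y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # Hoerl-Kennard ridge estimator: (X'X + kI)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def mrt(X, y, k, d):
    # Modified ridge-type estimator (Lukman et al.):
    # (X'X + k(1 + d)I)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * (1 + d) * np.eye(p), X.T @ y)
```

Note that the MRT estimator reduces to the ridge estimator when d = 0 and to OLS when k = 0, which is a convenient sanity check for any implementation.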

A New Estimator
The presence of outliers in the y-direction affects the performance of the MRT estimator. Therefore, we suggest a ridge-type modified M-estimator (RTMME). This is defined as

β̂_RTMME(k, d) = (X′X + k(1 + d)I)⁻¹X′X β̂_M,

where k > 0 and 0 < d < 1. It follows that β̂_RTMME(k, d) is a general estimator that includes β̂_M and β̂_M(k) as special cases: β̂_RTMME(0, 0) = β̂_M and β̂_RTMME(k, 0) = β̂_M(k). (8)

The canonical form of model (1) is written as

y = Zα + ε,

where Z = XT, α = T′β, and T is the orthogonal matrix whose columns are the eigenvectors of X′X. Then Z′Z = T′X′XT = Λ = diag(λ_1, λ_2, ..., λ_p), where λ_1 ≥ λ_2 ≥ ... ≥ λ_p > 0 are the ordered eigenvalues of X′X. Let α̂_M be the M-estimator defined by the solution of the M-estimating equations Σ_i φ(e_i/s)z_i = 0, where e_i = y_i − z_i′α̂_M, s is an estimator of scale for the errors, and φ(·) is a suitably chosen function [14]. The estimators presented in equations (2)-(7) can be written in canonical form; in particular,

α̂_RTMME(k, d) = R_kd α̂_M,

where R_kd = Λ(Λ + k(1 + d)I)⁻¹, R*_k = Λ(Λ + kI)⁻¹, and k > 0.
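A minimal sketch of the proposed estimator follows, assuming a Huber M-estimator fitted by iteratively reweighted least squares (IRLS) with a MAD scale estimate; the paper leaves φ generic, so the Huber choice, the tuning constant c = 1.345, and the function names here are illustrative assumptions.

```python
import numpy as np

def huber_m(X, y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator via IRLS (illustrative choice of phi)."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)  # OLS starting value
    for _ in range(max_iter):
        e = y - X @ beta
        s = np.median(np.abs(e - np.median(e))) / 0.6745  # MAD scale
        if s == 0:
            s = 1.0
        u = e / s
        # Huber weights: 1 inside [-c, c], c/|u| outside
        w = np.where(np.abs(u) <= c, 1.0, c / np.maximum(np.abs(u), 1e-12))
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def rtmme(X, y, k, d):
    """Ridge-type modified M-estimator:
    (X'X + k(1 + d)I)^{-1} X'X beta_M."""
    p = X.shape[1]
    beta_m = huber_m(X, y)
    G = X.T @ X
    return np.linalg.solve(G + k * (1 + d) * np.eye(p), G @ beta_m)
```

Setting k = 0 recovers the M-estimator and setting d = 0 recovers the M-ridge estimator, matching the special cases stated above.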

Superiority of the New Estimator
The mean square error (MSE) criterion is used to compare the performance of the estimators. The following conditions are imposed to present our main theorems:
(i) φ is skew-symmetric and nondecreasing
(ii) The errors are symmetric
(iii) Ω is finite
Note that any estimator of α has a corresponding estimator of β through the relation β = Tα, with MSE(β̂) = MSE(α̂). Thus, it is sufficient to consider the canonical form only. The MSEs of the aforementioned estimators are derived accordingly; in particular, for the proposed estimator,

MSE(α̂_RTMME(k, d)) = Σ_i λ_i²Ω_ii/(λ_i + k(1 + d))² + k²(1 + d)² Σ_i α_i²/(λ_i + k(1 + d))²,

where Ω_ii is the ith diagonal element of Ω = Cov(α̂_M).
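The canonical-form MSE above is a scalar sum of a variance term and a squared-bias term, so it is straightforward to evaluate numerically. The sketch below assumes the diagonal Ω_ii and the canonical coefficients α_i are available as arrays; the function name is illustrative.

```python
import numpy as np

def mse_rtmme(lam, omega, alpha, k, d):
    """Scalar MSE of the RTMME in canonical form:
    variance term + squared-bias term, with omega holding
    the diagonal elements Omega_ii = Var(alpha_M,i)."""
    c = k * (1 + d)
    var = np.sum(lam**2 * omega / (lam + c) ** 2)
    bias2 = c**2 * np.sum(alpha**2 / (lam + c) ** 2)
    return var + bias2
```

At k = 0 the bias term vanishes and the MSE reduces to ΣΩ_ii, the MSE of the M-estimator, which is the baseline against which the shrinkage gain is measured.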

Robust Choice of k and d for α̂_RTMME(k, d)
For the robust biasing parameters k and d of the modified two-parameter estimator, the optimal values can be determined by minimizing equation (23) with respect to each parameter. This is achieved by solving ∂f(k, d)/∂k = 0 and ∂f(k, d)/∂d = 0. By doing this, we have

k_i = λ_iΩ_ii/((1 + d)α_i²),  d_i = λ_iΩ_ii/(kα_i²) − 1. (26)

We substitute Ω_ii and α_i² in equations (24) and (25) with their corresponding estimates. We assume that α̂_M is normally distributed with mean α and covariance matrix A²Λ⁻¹. This assumption holds since n^{1/2}(α̂_M − α) is asymptotically normal under the stated conditions, with the scale estimate s_0. Thus, the estimate of α_i² is α̂²_Mi, and the unbiased estimator of Ω_ii is asymptotically A²/λ_i, where A² is the robust variance factor given by Huber [9]. We obtain the optimal estimators of d and k as

k̂_i = A²/((1 + d)α̂²_Mi),  d̂_i = A²/(kα̂²_Mi) − 1.

Following Kibria [15], arithmetic and geometric mean versions of k can be obtained. The harmonic mean version is generally preferred to the other versions [3]. Hence, the robust harmonic mean versions of the proposed k and d, denoted k̂_HMR and d̂_HMR, are obtained in (33) and (34) as

k̂_HMR = pA²/((1 + d)Σ_i α̂²_Mi), (33)

with d̂_HMR the corresponding harmonic mean version of the individual d̂_i. (34)

The estimators of the parameters d and k can then be obtained iteratively as follows:
Step 1: use d̂ = min(A²/α̂²_Mi) to obtain an initial estimate of d
Step 2: from (33), compute k̂_HMR using d̂ from Step 1
Step 3: calculate d̂_HMR in (34) using k̂_HMR from Step 2
Step 4: use d̂ from Step 1 if d̂_HMR < 0
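The four steps above can be sketched as a small driver routine. Since only the iteration itself is specified here, the harmonic-mean formulas (33) and (34) are passed in as callables rather than hard-coded; the function and parameter names are illustrative.

```python
import numpy as np

def select_k_d(alpha_m, A2, k_hmr, d_hmr):
    """Iterative selection of (k, d) following Steps 1-4.

    alpha_m : array of canonical M-estimates alpha_Mi
    A2      : Huber's robust variance factor
    k_hmr   : callable d -> k implementing the harmonic-mean k in (33)
    d_hmr   : callable k -> d implementing the harmonic-mean d in (34)
    """
    d0 = float(np.min(A2 / alpha_m**2))  # Step 1: initial d
    k = k_hmr(d0)                        # Step 2: k_HMR from (33) using d0
    d = d_hmr(k)                         # Step 3: d_HMR from (34) using k
    if d < 0:                            # Step 4: fall back to the initial d
        d = d0
    return k, d
```

The Step 4 fallback guarantees a nonnegative d even when the harmonic-mean update in (34) turns negative.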

Monte Carlo Simulation Study
We adopted the simulation design of McDonald and Galarneau [16], Kibria [15], and Lukman et al. [17]. The explanatory variables are generated using

x_ij = (1 − ρ²)^{1/2} z_ij + ρ z_{i,p+1}, i = 1, 2, ..., n, j = 1, 2, ..., p,

where ρ² denotes the correlation between any two explanatory variables and the z_ij are pseudo-random numbers from the standard normal distribution. The coefficients β_1, β_2, ..., β_p are selected as the normalized eigenvector corresponding to the largest eigenvalue of X′X, so that β′β = 1, a common restriction in simulation studies of this type [3]. The dependent variable is then determined using

y_i = β_1x_i1 + β_2x_i2 + ... + β_px_ip + ε_i, (36)

where the error terms ε_i are generated with mean 0 and variance σ². We fixed the number of explanatory variables at three and seven (p = 3, 7) and varied the other parameters ρ, σ, and n as follows:
ρ = 0.7, 0.8, 0.9, 0.99
σ = 1, 5, 10
n = 20, 50, 100
We considered three different cases in this study:
Case I: no outlier
Case II: one outlier
Case III: two outliers
In the case of no outliers, equation (36) is used as is. For the case of one outlier, the tenth observation is changed to y*_10 = y_10 + 20σ. For the case of two outliers, the fifth and tenth observations are changed to y*_5 = y_5 + 20σ and y*_10 = y_10 − 20σ, respectively. The experiment is replicated 2,000 times by generating new pseudo-random numbers, and the estimated MSE is calculated as

MSE(α̂) = (1/2000) Σ_{j=1}^{2000} (α̂_j − α)′(α̂_j − α).

The results of the simulation are presented in Tables 1-18. As expected, the OLSE shows the poorest performance. The following observations are also made:
(i) As the error standard deviation (σ) and the degree of multicollinearity (ρ) increase, the MSEs of the estimators (α̂, α̂_M, α̂(k), α̂_M(k), α̂_MRT, and α̂_RTMME) increase.
(ii) As the biasing parameters k and d increase, the MSEs of the estimators decrease.
(iii) The MSEs of the estimators decrease as the sample size increases. However, as the number of outliers increases, the MSEs also increase.
(iv) Finally, as in Lukman et al. [13], the MRT estimator outperforms the other estimators considered in the case of no outliers. However, when outliers are introduced, the RTMME outperforms the other considered estimators.
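The data-generating design described above can be sketched as follows, assuming the McDonald-Galarneau scheme for the predictors and ±20σ contamination of selected rows for the outlier cases; the function names are illustrative.

```python
import numpy as np

def make_design(n, p, rho, rng):
    # McDonald-Galarneau collinear predictors:
    # x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1},
    # giving pairwise correlation rho^2 between columns
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]

def make_response(X, beta, sigma, rng, outliers=()):
    # y_i = x_i' beta + eps_i, eps_i ~ N(0, sigma^2);
    # contaminate the listed rows by +/- 20 sigma,
    # e.g. outliers=[(9, +1), (4, -1)] shifts rows 10 and 5
    y = X @ beta + sigma * rng.standard_normal(X.shape[0])
    for idx, sign in outliers:
        y[idx] += sign * 20 * sigma
    return y
```

Looping this generator over the (ρ, σ, n) grid and the three outlier cases, refitting each estimator per replicate, reproduces the structure of the Monte Carlo experiment.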

Conclusion
We proposed a two-parameter ridge-type modified M-estimator to jointly handle the problems of multicollinearity and outliers in a linear regression model. Theoretically, the new estimator outperforms the existing estimators under certain conditions. The results of the simulation study and the numerical example agree with the theoretical findings. An appropriate choice of k and d also produces better estimates with the proposed estimator.
Thus, in the presence of multicollinearity and outliers, this estimator can effectively replace the following estimators: the ordinary least squares estimator, the M-estimator, the ridge estimator, the M-ridge estimator, and the modified ridge-type estimator.

Data Availability
The data used to support the findings of this study are available on page 7 of [19].

Conflicts of Interest
The authors declare that they have no conflicts of interest.