Modified Robust Ridge M-Estimators in Two-Parameter Ridge Regression Model

The methods of two-parameter ridge and ordinary ridge regression are very sensitive to the joint presence of multicollinearity and outliers in the y-direction. To overcome this problem, modified robust ridge M-estimators are proposed. The new estimators are then compared with existing ones by means of extensive Monte Carlo simulations. According to the mean squared error (MSE) criterion, the new estimators outperform the least squares estimator, the ridge regression estimator, and the two-parameter ridge estimator in many of the considered scenarios. Two numerical examples are also presented to illustrate the simulation results.


Introduction

The matrix form of the multiple linear regression model is

$$Y = X\beta + \varepsilon, \tag{1}$$

where $Y$ ($n\times 1$) is the vector of the response variable, $X$ ($n\times p$) is the matrix of predictor variables, $\beta$ ($p\times 1$) is the vector of unknown regression coefficients, and $\varepsilon$ ($n\times 1$) is the vector of disturbances, such that $\varepsilon \sim N(0, \sigma^2 I_n)$. The ordinary least squares (OLS) estimator of $\beta$ is defined as

$$\hat\beta = (X'X)^{-1}X'Y. \tag{2}$$

The estimator $\hat\beta$ is unbiased and has minimum variance among all linear unbiased estimators. However, it performs poorly in the presence of multicollinearity: its variance becomes large, so that the estimated coefficients may appear statistically insignificant [1]. To cope with this issue, several alternatives have been developed. The first method was proposed by Ref. [2] and is defined as

$$\hat\beta(k) = (X'X + kI)^{-1}X'Y = T_k\hat\beta, \tag{3}$$

where $I$ is the identity matrix, $k \ge 0$, and $T_k = (X'X + kI)^{-1}X'X$. To handle the problem of outliers, Ref. [3] derived a new estimator known as the M-estimator (ME). The M-estimator $\hat\beta_M$ is defined as the solution of the equations

$$\sum_{i=1}^{n}\psi\!\left(\frac{e_i}{s}\right) = 0, \qquad \sum_{i=1}^{n}\psi\!\left(\frac{e_i}{s}\right) z_i = 0,$$

with $e_i = y_i - z_i'\hat\beta_M$, where $z_i'$ is the $i$th row of the design matrix, $s$ is a scale estimator of the errors, and $\psi(\cdot)$ is a suitably chosen function.
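The OLS and ridge estimators defined above can be checked numerically. The following is a minimal NumPy sketch on illustrative synthetic data (two nearly collinear predictors, not data from this study); the helper names `ols`, `ridge`, and `shrink_matrix` are hypothetical. It verifies that the ridge estimator equals $T_k$ applied to the OLS estimator and that it shrinks the coefficient vector.

```python
import numpy as np

def ols(X, y):
    """OLS estimator: solves (X'X) b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y for a biasing parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def shrink_matrix(X, k):
    """T_k = (X'X + kI)^{-1} X'X."""
    XtX = X.T @ X
    return np.linalg.solve(XtX + k * np.eye(XtX.shape[0]), XtX)

# Synthetic, nearly collinear design (illustrative only).
rng = np.random.default_rng(0)
n = 50
z = rng.standard_normal(n)
X = np.column_stack([z + 0.01 * rng.standard_normal(n),
                     z + 0.01 * rng.standard_normal(n)])
y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

b_ols = ols(X, y)
b_rr = ridge(X, y, k=0.5)
print(np.allclose(b_rr, shrink_matrix(X, 0.5) @ b_ols))  # True
print(np.linalg.norm(b_rr) < np.linalg.norm(b_ols))      # True: ridge shrinks
```

Solving linear systems with `np.linalg.solve` avoids forming an explicit inverse, which matters when $X'X$ is close to singular.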
Ref. [4] showed that ridge regression (RR) is sensitive to outliers in the y-direction and hence developed a robust ridge M-estimator (MRE), defined as

$$\hat\beta_M(k) = T_k\hat\beta_M, \tag{4}$$

where $\hat\beta_M$ is the M-estimator.
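The robust ridge M-estimator above can be sketched by first computing a Huber M-estimate and then applying $T_k$. This is a minimal sketch under stated assumptions: Huber's $\psi$ with the common tuning constant $c = 1.345$, the scale $s$ re-estimated at each step as MAD/0.6745, and iteratively reweighted least squares (IRLS) as the solver. None of these choices is taken from the paper, and the helper names `huber_m` and `robust_ridge` are hypothetical. The design includes an intercept column, which corresponds to the first estimating equation $\sum_i \psi(e_i/s) = 0$.

```python
import numpy as np

def huber_m(X, y, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate via IRLS (scale = MAD/0.6745 is an assumed choice)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(max_iter):
        e = y - X @ beta
        s = np.median(np.abs(e - np.median(e))) / 0.6745
        # Huber weights psi(u)/u = min(1, c/|u|); guard against division by zero.
        w = np.minimum(1.0, c / np.maximum(np.abs(e / s), 1e-12))
        XtW = X.T * w                               # X'W with W = diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def robust_ridge(X, y, k):
    """MRE: beta_M(k) = T_k beta_M with T_k = (X'X + kI)^{-1} X'X."""
    XtX = X.T @ X
    T_k = np.linalg.solve(XtX + k * np.eye(XtX.shape[0]), XtX)
    return T_k @ huber_m(X, y)

# Illustrative data: intercept + 2 predictors, 10% of responses shifted by +50.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([0.0, 1.0, -0.5])
y = X @ beta_true + rng.standard_normal(n)
out = rng.choice(n, size=n // 10, replace=False)
y[out] += 50.0                                      # y-direction outliers

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_m = huber_m(X, y)
b_mre = robust_ridge(X, y, k=0.5)
print(np.round(b_ols, 2), np.round(b_m, 2), np.round(b_mre, 2))
```

With these outliers, the OLS intercept is pulled toward the mean shift, while the Huber estimate stays close to the true coefficients; the ridge step then shrinks the robust estimate.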
According to Ref. [5], the fit of RR is poorer than that of OLS. To overcome this deficiency, they developed a two-parameter ridge estimator (TPR) that always provides a better fit than ordinary RR. TPR also has good orthogonality properties between the residuals and the predicted values of the dependent variable. They defined TPR as

$$\hat\beta(q, k) = q\,(X'X + kI)^{-1}X'Y,$$

where

$$q = \frac{(X'Y)'(X'X + kI)^{-1}X'Y}{(X'Y)'(X'X + kI)^{-1}X'X(X'X + kI)^{-1}X'Y}.$$

Later on, many researchers worked on TPR; see, e.g., [6–13]. The selection of the ridge M-estimator plays an important role in reducing the MSE of TPR in the presence of multicollinearity and outliers, and different ridge M-estimators have been proposed by various researchers, for example Refs. [4, 8, 14–17] and, more recently, Ref. [18]. In the case of near singularity and a large number of outliers, the existing estimators do not perform well in terms of MSE. Therefore, the aim of this article is to continue the series of work on the selection of the ridge M-estimator in TPR. Motivated by the work of Ref. [8] and following the idea of Ref. [1], we propose modified ridge M-estimators in TPR. The developed M-estimators attain a smaller MSE than OLS, RR, and the existing TPR estimators for different levels of correlation, sample size, error variance, and outliers.

The organization of this article is as follows. Section 2 reviews the estimators included in this study, the newly developed estimators for the selection of k, and their comparison criterion. Section 3 presents the simulation design adopted in this article, together with a discussion of the simulation results and numerical examples. Concluding remarks are given in Section 4.
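The two-parameter ridge estimator of Ref. [5] can be sketched as follows, assuming the form $\hat\beta(q,k) = q(X'X + kI)^{-1}X'Y$ in which $q$ is the least-squares slope of $Y$ on the ridge fit $\hat Y_k = X(X'X + kI)^{-1}X'Y$; this slope is the same ratio of quadratic forms as in the definition of $q$. The function name `two_parameter_ridge` and the data are hypothetical. The checks illustrate the orthogonality of residuals and fitted values and why TPR never fits worse than RR for the same $k$.

```python
import numpy as np

def two_parameter_ridge(X, y, k):
    """TPR sketch: returns (q, q * b_k, b_k) with b_k the ordinary ridge estimate."""
    p = X.shape[1]
    b_k = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)  # ordinary ridge
    yhat_k = X @ b_k
    q = (yhat_k @ y) / (yhat_k @ yhat_k)   # least-squares scaling of the ridge fit
    return q, q * b_k, b_k

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.standard_normal(40)

q, b_tpr, b_rr = two_parameter_ridge(X, y, k=5.0)
fitted, resid = X @ b_tpr, y - X @ b_tpr
print(q > 1)                       # True for any k > 0
print(abs(fitted @ resid) < 1e-8)  # residuals orthogonal to fitted values
```

Because $q$ is chosen by least squares along the direction of $\hat Y_k$, the TPR residual sum of squares can never exceed that of RR for the same $k$, which is one way to read the fit improvement described above.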
In general, the ridge M-estimators available in the literature may not fully address the simultaneous occurrence of high multicollinearity and outliers in the data. To resolve this issue, we propose new ridge M-estimators in TPR that generally perform better than the existing estimators in most of the considered situations.

New Estimators.
According to Ref. [8], TPR is, like RR, sensitive to outliers in the y-direction. Thus, we suggest modified ridge M-estimators in TPR (MTPM). As in TPR, the primary focus in MTPM is to find a suitable value of the biasing parameter that minimizes the MSE. Adopting the idea of Ref. [1], we multiply the quantity $V_{Mj} = \lambda_j/|\hat\alpha_{Mj}|$ by the $K_j$ suggested by Ref. [8]. Hence, the modified biasing parameter is

$$k_{Mj} = V_{Mj}K_j = \frac{\lambda_j}{|\hat\alpha_{Mj}|}\cdot\frac{qA^2\lambda_j + (q-1)\lambda_j^2\hat\alpha_{Mj}^2}{\lambda_j\hat\alpha_{Mj}^2},$$

where $K_j = \left[qA^2\lambda_j + (q-1)\lambda_j^2\hat\alpha_{Mj}^2\right]/\left(\lambda_j\hat\alpha_{Mj}^2\right)$ and $q$ is as defined in TRME1.
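The form of $K_j$ can be motivated by a componentwise MSE argument. The derivation below is a sketch under stated assumptions: the canonical TPR component is $\hat\alpha_j(q,k) = \frac{q\lambda_j}{\lambda_j + k}\hat\alpha_j$ with $\operatorname{Var}(\hat\alpha_j) = \sigma^2/\lambda_j$, and $q$ is treated as fixed while minimizing over $k$.

$$
\begin{aligned}
\mathrm{MSE}_j(k) &= \frac{q^2\lambda_j\sigma^2}{(\lambda_j + k)^2} + \left(\frac{q\lambda_j}{\lambda_j + k} - 1\right)^2\alpha_j^2,\\
\frac{\partial\,\mathrm{MSE}_j}{\partial k} &= -\frac{2q\lambda_j}{(\lambda_j + k)^3}\left[q\sigma^2 + \alpha_j^2\,(q\lambda_j - \lambda_j - k)\right] = 0\\
\Longrightarrow\quad k_j &= (q - 1)\lambda_j + \frac{q\sigma^2}{\alpha_j^2} = \frac{q\sigma^2\lambda_j + (q - 1)\lambda_j^2\alpha_j^2}{\lambda_j\alpha_j^2}.
\end{aligned}
$$

The derivative changes sign from negative to positive at $k_j$, so $k_j$ is a minimizer whenever $k_j \ge 0$. Replacing $\sigma^2$ by $A^2$ and $\alpha_j$ by $\hat\alpha_{Mj}$ gives $K_j$.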
Because $\lambda_j$ depends on the correlation among the predictors, a higher degree of correlation increases $V_{Mj}$, which in turn leads to a larger value of $k_{Mj}$. Since many existing estimators do not provide a sufficiently large value of the biasing parameter, this increase is required to obtain a suitable value of $k_{Mj}$ under near singularity. The term $\hat\alpha_{Mj}$ is used to deal with the outliers; here, we use Huber's M-estimator.
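The modified biasing parameter can be computed componentwise. The sketch below assumes $K_j = [qA^2\lambda_j + (q-1)\lambda_j^2\hat\alpha_{Mj}^2]/(\lambda_j\hat\alpha_{Mj}^2)$ and $k_{Mj} = (\lambda_j/|\hat\alpha_{Mj}|)\,K_j$, with $A^2$ and $q$ treated as given inputs. The eigenvalues are those of Example 1; the values of $\hat\alpha_M$, $A^2$, and $q$ are hypothetical, and the function names are illustrative only.

```python
import numpy as np

def k_tpr(lam, alpha_m, A2, q):
    """K_j = (q A^2 lam_j + (q - 1) lam_j^2 alpha_Mj^2) / (lam_j alpha_Mj^2)."""
    return (q * A2 * lam + (q - 1) * lam**2 * alpha_m**2) / (lam * alpha_m**2)

def k_modified(lam, alpha_m, A2, q):
    """k_Mj = V_Mj * K_j with V_Mj = lam_j / |alpha_Mj|."""
    V = lam / np.abs(alpha_m)
    return V * k_tpr(lam, alpha_m, A2, q)

# Eigenvalues from Example 1; alpha_M, A^2 and q are hypothetical values.
lam = np.array([3.9739, 0.0176, 0.0064, 0.0021])
alpha_m = np.array([0.95, -0.20, 0.15, 0.05])
A2, q = 0.223, 1.05

K = k_tpr(lam, alpha_m, A2, q)
kM = k_modified(lam, alpha_m, A2, q)
print(np.round(K, 4))
print(np.round(kM, 4))
```

Algebraically, $K_j = qA^2/\hat\alpha_{Mj}^2 + (q-1)\lambda_j$, which is a convenient check on any implementation.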
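We propose three new methods by taking the arithmetic mean (AM), geometric mean (GM), and harmonic mean (HM) of $k_{Mj}$, denoted by MTPM1, MTPM2, and MTPM3, respectively, and defined as

$$k^*_{AM} = \frac{1}{p}\sum_{j=1}^{p}k_{Mj}, \qquad k^*_{GM} = \left(\prod_{j=1}^{p}k_{Mj}\right)^{1/p}, \qquad k^*_{HM} = \frac{p}{\sum_{j=1}^{p}1/k_{Mj}}.$$

Hence, the new modified two-parameter ridge M-estimator is defined in canonical form as

$$\hat\alpha_{MTPM} = q\,(\Lambda + k^*I)^{-1}\Lambda\hat\alpha_M,$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, $\hat\alpha_M$ is the M-estimator in canonical form, and $k^* = k^*_{AM}$, $k^*_{GM}$, or $k^*_{HM}$.

Furthermore, through Algorithm 1, we propose modified iterative two-parameter ridge estimators. The new modified iterative biasing parameter $k_{(I)Mj}$ is obtained from Algorithm 1, where $k$ is taken from the algorithm of TRME2. By taking the AM, GM, and HM of $k_{(I)Mj}$, three new estimators, denoted by MTPM4, MTPM5, and MTPM6, are obtained and defined as

$$k^*_{(I)AM} = \frac{1}{p}\sum_{j=1}^{p}k_{(I)Mj}, \qquad k^*_{(I)GM} = \left(\prod_{j=1}^{p}k_{(I)Mj}\right)^{1/p}, \qquad k^*_{(I)HM} = \frac{p}{\sum_{j=1}^{p}1/k_{(I)Mj}}.$$

The new modified iterative two-parameter ridge M-estimator is defined in canonical form as

$$\hat\alpha_{MTPM(I)} = q\,(\Lambda + k^*_I\,I)^{-1}\Lambda\hat\alpha_M,$$

where $k^*_I = k^*_{(I)AM}$, $k^*_{(I)GM}$, or $k^*_{(I)HM}$.

ALGORITHM 1: Iterative algorithm for modified two-parameter ridge estimators.

Mathematical Problems in Engineering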
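The averaging step and the canonical estimator can be sketched in a few lines. This assumes the usual canonical convention $Z = XP$, $\alpha = P'\beta$ with $X'X = P\Lambda P'$ (so estimates map back via $\hat\beta = P\hat\alpha$), and that the canonical estimator has the form $q(\Lambda + k^*I)^{-1}\Lambda\hat\alpha_M$; because $\Lambda$ is diagonal, the matrix product reduces to elementwise operations. The function names are hypothetical.

```python
import numpy as np

def mean_k(k):
    """Return (AM, GM, HM) of positive biasing parameters k_j."""
    k = np.asarray(k, dtype=float)
    if np.any(k <= 0):
        raise ValueError("GM and HM require all k_j > 0")
    am = k.mean()
    gm = np.exp(np.log(k).mean())   # geometric mean via logs, avoids overflow
    hm = k.size / np.sum(1.0 / k)
    return am, gm, hm

def mtpm_canonical(lam, alpha_m, q, k_star):
    """q (Lambda + k* I)^{-1} Lambda alpha_M; Lambda diagonal, so elementwise."""
    lam = np.asarray(lam, dtype=float)
    return q * lam / (lam + k_star) * np.asarray(alpha_m, dtype=float)

am, gm, hm = mean_k([1.0, 2.0, 4.0])
print(am, gm, hm)   # AM >= GM >= HM always holds for positive k_j
```

Since $\mathrm{AM} \ge \mathrm{GM} \ge \mathrm{HM}$, the three variants apply progressively less shrinkage for the same set of $k_{Mj}$ values.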

Simulation Study
In this section, a simulation study is conducted to examine the performance of the new and existing estimators.
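3.1. Simulation Design.

Following the simulation design of Refs. [8, 15], the predictors are generated as

$$x_{ij} = (1 - \delta^2)^{1/2}z_{ij} + \delta z_{i,p+1}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, p,$$

where $\delta^2$ is the correlation between any two predictor variables and the $z_{ij}$ are pseudo-random numbers generated from the standard normal distribution. The response variable is generated as

$$y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_px_{ip} + u_i, \qquad i = 1, \ldots, n,$$

where $\beta_0$ is set to zero and $u_i \sim N(0, \sigma^2)$. This simulation experiment is carried out by randomly generating data under the different factors that we consider in this study. The details are given below:

To check the robustness of the newly proposed estimators against outliers, different percentages of outliers (10%, 20%, and 30%) in the y-direction are generated using the error term $u_i \sim N(50, \sigma^2)$; see Refs. [19, 20]. The simulation results are based on 5000 replications, and the estimated MSE is calculated as

$$\widehat{\mathrm{MSE}}(\hat\beta) = \frac{1}{5000}\sum_{r=1}^{5000}(\hat\beta_r - \beta)'(\hat\beta_r - \beta),$$

where $\hat\beta_r$ is the estimate obtained in the $r$th replication.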
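The data-generating scheme and the MSE estimate can be sketched as below. Assumptions not stated in the text are labeled in the code: the true coefficients are set to a hypothetical unit-norm vector, $X$ is held fixed across replications, and only 200 replications are used (instead of 5000) to keep the sketch fast. The helper names are hypothetical, and OLS is used as a stand-in estimator; any function of `(X, y)` returning a coefficient vector can be passed instead.

```python
import numpy as np

def make_predictors(n, p, delta, rng):
    """x_ij = sqrt(1 - delta^2) z_ij + delta z_{i,p+1}, so corr(x_j, x_k) = delta^2."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - delta**2) * z[:, :p] + delta * z[:, [p]]

def make_response(X, beta, sigma, outlier_frac, rng):
    """y_i = x_i' beta + u_i (beta_0 = 0); a fraction of u_i is drawn from N(50, sigma^2)."""
    n = X.shape[0]
    u = rng.normal(0.0, sigma, n)
    m = int(round(outlier_frac * n))
    idx = rng.choice(n, size=m, replace=False)
    u[idx] = rng.normal(50.0, sigma, m)
    return X @ beta + u, idx

def estimated_mse(estimator, n, p, delta, sigma, outlier_frac, reps, seed=0):
    """Average of (b_r - beta)'(b_r - beta) over `reps` replications."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p) / np.sqrt(p)          # hypothetical true coefficients
    X = make_predictors(n, p, delta, rng)   # X held fixed (assumption)
    total = 0.0
    for _ in range(reps):
        y, _ = make_response(X, beta, sigma, outlier_frac, rng)
        b = estimator(X, y)
        total += float((b - beta) @ (b - beta))
    return total / reps

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

mse_clean = estimated_mse(ols, n=50, p=4, delta=0.95, sigma=1.0, outlier_frac=0.0, reps=200)
mse_out = estimated_mse(ols, n=50, p=4, delta=0.95, sigma=1.0, outlier_frac=0.1, reps=200)
print(mse_clean, mse_out)
```

Comparing `mse_clean` with `mse_out` shows how strongly y-direction outliers inflate the MSE of a non-robust estimator under high collinearity.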

Real-Life Applications
Example 1. We consider the Tobacco data of Ref. [21] to show the performance of the newly modified estimators. The data contain four predictor variables with 30 observations. The condition number is 1892.33, which indicates severe multicollinearity. We consider the linear model (1) with these four predictors. The eigenvalues are $\lambda_1 = 3.9739$, $\lambda_2 = 0.0176$, $\lambda_3 = 0.0064$, and $\lambda_4 = 0.0021$. The calculated value of the error variance is 0.223. The correlations among the predictor variables are shown in Table 19. The data contain two outliers in the y-direction. The estimated MSE and regression coefficients for the Tobacco data are presented in Table 20. The results show that MTPM3 has the smallest MSE among all the considered estimators.
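The reported condition number agrees with the ratio $\lambda_{\max}/\lambda_{\min}$ of the listed eigenvalues, and the eigenvalues sum to 4, as expected when $X'X$ is in correlation form with $p = 4$ predictors. A quick arithmetic check:

```python
# Eigenvalues reported for the Tobacco data (Example 1).
eigenvalues = [3.9739, 0.0176, 0.0064, 0.0021]

kappa = max(eigenvalues) / min(eigenvalues)  # condition number lambda_max / lambda_min
print(round(kappa, 2))             # 1892.33
print(round(sum(eigenvalues), 4))  # 4.0 (trace of a 4 x 4 correlation matrix)
```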

Example 2.
The second example uses water quality data taken from the Pakistan Council of Research in Water Resources (PCRWR) for the year 2014–2015. We consider four predictors, each with 31 observations. The predictor variables are HCO3, SO4, Na, and EC, while the response variable is TDS. The estimated error variance is 0.111, and the eigenvalues are $\lambda_1 = 3.3024$, $\lambda_2 = 0.6599$, $\lambda_3 = 0.0210$, and $\lambda_4 = 0.0166$. The condition number is 157.257, which indicates strong multicollinearity. Table 21 shows the correlations among the predictors. Outliers are present in the y-direction. The estimated MSE and regression coefficients are shown in the corresponding table.

Concluding Remarks
In this article, modified robust ridge M-estimators for the two-parameter ridge regression model are proposed to overcome the joint problem of multicollinearity and outliers in the y-direction. We propose six new estimators as alternatives to TRME. A simulation study is conducted to investigate the performance of the new estimators on the basis of MSE. The simulation results indicate that the new modified robust ridge M-estimators perform better than the other considered estimators. In particular, the proposed estimators MTPM1 and MTPM4, and in some cases MTPM3, perform well in the presence of multicollinearity and outliers. The benefits of the new estimators are also demonstrated through two numerical examples. Therefore, on the basis of these results, we recommend the use of the proposed estimators in the considered scenarios.

Data Availability
Data used in this research were taken from the website cited in [21]. All results reported in this research were obtained in the R environment, a user-friendly statistical analysis tool. Furthermore, the research code will be available from the corresponding author on request upon acceptance of this research.

Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.