Stochastic Restricted LASSO-Type Estimator in the Linear Regression Model

Among the many variable selection methods, LASSO is a desirable estimation procedure for handling regularization and variable selection simultaneously in high-dimensional linear regression models when multicollinearity exists among the predictor variables. Since LASSO is unstable under high multicollinearity, the elastic-net (Enet) estimator has been used to overcome this issue. According to the literature, the estimation of regression parameters can be improved by adding prior information about the regression coefficients to the model, which is available in the form of exact or stochastic linear restrictions. In this article, we propose a stochastic restricted LASSO-type estimator (SRLASSO) by incorporating stochastic linear restrictions. Furthermore, we compare the performance of SRLASSO with LASSO and Enet under the root mean square error (RMSE) and mean absolute prediction error (MAPE) criteria based on a Monte Carlo simulation study. Finally, a real-world example is used to demonstrate the performance of SRLASSO.


Introduction
Let us consider the linear regression model

y = Xβ + ϵ, (1)

where y is the n × 1 vector of observations on the dependent variable, X is the n × p matrix of observations on the nonstochastic predictor variables, β is a p × 1 vector of unknown coefficients, and ϵ is the n × 1 vector of random error terms, which are independent and identically normally distributed with mean zero and common variance σ², that is, E(ϵ) = 0 and E(ϵϵ′) = Ω = σ²I. It is well known that the ordinary least squares estimator (OLSE) is the best linear unbiased estimator for model (1), and it is defined as

β̂_OLSE = (X′X)⁻¹X′y. (2)

Furthermore, researchers [1, 2] have shown that parameter estimation is improved when prior information on the regression coefficients is available, which can take the form of exact linear restrictions or stochastic linear restrictions. Let us assume that there exists prior information on β in the form of the stochastic linear restriction

φ = Rβ + v, (3)

where φ is the q × 1 vector, R is the q × p matrix with rank q, and v is the q × 1 vector of disturbances, such that E(v) = 0, D(v) = E(vv′) = Ψ = σ²W (W is positive definite), and E(vϵ′) = 0. Note that equation (3) becomes an exact linear restriction when v = 0. Theil and Goldberger [2] proposed the mixed regression estimator (MRE) by combining models (1) and (3), and it is defined as

β̂_MRE = (X′X + R′W⁻¹R)⁻¹(X′y + R′W⁻¹φ). (4)

When the model contains many candidate predictors, variable selection methods such as forward selection, backward selection, and step-wise selection have been used. However, these methods are also unstable when multicollinearity exists among the predictor variables. As a remedial solution to this problem, Tibshirani [3] proposed the least absolute shrinkage and selection operator (LASSO), based on model (1), to handle both multicollinearity and variable selection simultaneously in the high-dimensional linear regression model. The LASSO estimator is defined as

β̂_LASSO = arg min_β (y − Xβ)′(y − Xβ) subject to Σ_{j=1}^{p} |β_j| ≤ t, (5)

where t ≥ 0 is a tuning parameter.
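The mixed regression estimator defined above is straightforward to compute. The following Python sketch is our own illustration, not code from the paper; the function and argument names are chosen here for clarity. Note that σ² cancels when the sample and restriction information are weighted by their covariances, so only W is needed:

```python
import numpy as np

def mixed_regression_estimator(X, y, R, phi, W):
    """Theil-Goldberger mixed regression estimator (MRE).

    Combines the sample information (y = X beta + eps) with the
    stochastic linear restriction (phi = R beta + v, D(v) = sigma^2 W).
    Hypothetical helper written for illustration.
    """
    W_inv = np.linalg.inv(W)
    A = X.T @ X + R.T @ W_inv @ R
    b = X.T @ y + R.T @ W_inv @ phi
    return np.linalg.solve(A, b)   # (X'X + R'W^-1 R)^-1 (X'y + R'W^-1 phi)
```

When the data are noiseless and the restriction is consistent with the true coefficients, the MRE recovers them exactly, which provides a simple sanity check on the formula.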
The solutions of LASSO can be obtained using either a standard quadratic programming technique or the least angle regression (LARS) algorithm [4]. According to Zou et al. [5, 6], LASSO is unstable when high multicollinearity exists among the predictor variables. Therefore, they proposed the elastic-net (Enet) estimator as an alternative to LASSO to handle this issue. The Enet estimator is defined as

β̂_Enet = arg min_β {(y − Xβ)′(y − Xβ) + λ₂‖β‖² + λ₁‖β‖₁}, (6)

where λ₁, λ₂ ≥ 0 are tuning parameters. The Enet solutions can be obtained using the LARS-EN algorithm, which is a modified version of the LARS algorithm.
Norouzirad et al. [7] and Tuaç and Arslan [8] attempted to combine LASSO with exact linear restrictions, but their work did not clearly specify how the exact restriction is incorporated, since LASSO has no closed-form analytical solution. In this article, we propose a stochastic restricted LASSO-type estimator (SRLASSO) by combining LASSO with stochastic restrictions. Furthermore, we compare the performance of SRLASSO with LASSO and Enet under the root mean square error (RMSE) and mean absolute prediction error (MAPE) criteria using a Monte Carlo simulation study and a real-world example. The structure of the rest of the article is as follows: Section 2 describes SRLASSO and the algorithm used to find SRLASSO solutions, Section 3 shows the performance of SRLASSO, and Section 4 concludes the article; references are provided at the end of the paper.

Stochastic Restricted LASSO-Type Estimator (SRLASSO)
By considering equation (3) as an additional constraint, we define the stochastic restricted LASSO-type estimator (SRLASSO) as

β̂_SRLASSO = arg min_β {(y − Xβ)′(y − Xβ) + (φ − Rβ)′W⁻¹(φ − Rβ)} subject to Σ_{j=1}^{p} |β_j| ≤ t. (7)

We can view this as a quadratic optimization problem with 2^p + q constraints. However, this formulation is not practical when p is large. We therefore propose a stochastic restricted LARS (SRLARS) algorithm, a modified version of the LARS algorithm, to find the SRLASSO solutions. In SRLARS, we consolidate the MRE with LARS.

Stochastic Restricted LARS (SRLARS).
Step 1. Standardize the predictor variables X to have mean zero and standard deviation one, and center the response variable y to have mean zero. Let the residuals of models (1) and (3) be r = ϵ and τ = v, respectively.
Step 2. Find the predictor X_j1 most correlated with the initial residual r_0, that is, j_1 = arg max_j |X_j′r_0|. Let β_ji be the regression coefficient of X_ji. Then, increase the estimate of β_j1 from 0 toward its MRE value until some other predictor X_j2 has as much correlation with the current residual as X_j1 does. At this point, SRLARS proceeds in the equiangular direction between the two predictors X_j1 and X_j2 instead of continuing in the direction based on X_j1 alone.
In a similar way, the i-th variable X_ji eventually earns its way into the active set, and then SRLARS proceeds in the equiangular direction between X_j1, X_j2, . . ., X_ji. Variables continue to enter the active set in this way, with the algorithm always moving in the least angle direction. In the intermediate steps, the coefficient estimates are updated as

β_i = β_{i−1} + α_i u_i, (8)

where α_i is a value between 0 and 1 that represents how far the estimate moves in the current direction before another variable enters the model and the direction changes again, and u_i is the equiangular vector. The direction u_i is computed from the MRE restricted to the active set, using E_i, the matrix with columns (e_j1, e_j2, . . ., e_ji), where e_j is the j-th standard unit vector in R^p, so that E_i records the indices of the variables selected up to the current step. The step size α_i is then taken as the smallest positive value at which either a new variable j with β_{j(i−1)} = 0 becomes as correlated with the residual as the active variables, or an active variable j with β_{j(i−1)} ≠ 0 has its coefficient cross zero.
Step 3. If α_i = α*_ji, then E_i is the matrix formed by removing the column e_j from E_{i−1}. The residuals related to the current step are then calculated as r_i = y − Xβ_i and τ_i = φ − Rβ_i, and the algorithm moves to the next step, where j_{i+1} is the index j whose predictor is most correlated with the current residual. Step 4. Repeat Step 2 until α_i = 1.
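For reference, the classical equiangular direction of Efron et al.'s LARS, for standardized active predictors, can be computed as below. SRLARS replaces this with an MRE-based direction whose exact formula is given in the paper, so this Python sketch (function name ours) shows only the plain-LARS construction: a unit vector making equal angles with every active predictor.

```python
import numpy as np

def equiangular_direction(X_active):
    """Unit vector making equal angles with every active predictor.

    Standard LARS construction (Efron et al.). Illustrative only:
    SRLARS uses a modified, MRE-based direction instead.
    """
    G = X_active.T @ X_active
    ones = np.ones(X_active.shape[1])
    w = np.linalg.solve(G, ones)
    A = 1.0 / np.sqrt(ones @ w)      # normalizing constant
    return X_active @ (A * w)        # unit-norm equiangular vector
```

By construction, X_j′u is the same positive constant for every active predictor X_j, which is what "equiangular" means here.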

Properties of SRLARS. The SRLARS algorithm sequentially updates the SRLASSO estimates. It requires O(m³ + pm²) operations, where m is the number of steps. The prediction performance of SRLARS is evaluated using the RMSE and MAPE criteria, which are described in Section 3. Following Efron et al. [4], the conventional bound on the tuning parameter is t = Σ_{j=1}^{p} |β̂_j|, and the suitable value of the tuning parameter t for a particular problem is selected using K-fold cross-validation.

Selection of Prior Information.
According to Nagar and Kakwani [9], we can define the prior information as follows: let β₁ be a vector of some selected q elements of β and let β₂ be the rest of the elements. Assume that b is a known unbiased estimate of β₁. By the "two sigma rule," the range of β₁ can be written as b ± 2SE(b). Based on this range, we can set the quantities φ, R, and W of the stochastic restriction in equation (3).
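As an illustration of this construction, the sketch below builds a stochastic restriction from OLSE estimates of a chosen subset of coefficients. The function and variable names are ours, and taking W as the corresponding block of (X′X)⁻¹ (so that D(v) = σ²W equals the covariance of b) is our reading of the setup, not a formula quoted from the paper.

```python
import numpy as np

def stochastic_restriction_from_olse(X, y, idx):
    """Build (phi, R, W) for the restriction phi = R beta + v from
    assumed-unbiased OLSE estimates b of the coefficients in `idx`,
    following the two-sigma rule b +/- 2 SE(b). Illustrative sketch.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - p)               # unbiased estimate of sigma^2
    R = np.eye(p)[idx]                             # picks out the restricted coefficients
    phi = beta_ols[idx]                            # b: the assumed-unbiased estimates
    W = XtX_inv[np.ix_(idx, idx)]                  # D(v) = sigma^2 W = Cov(b)
    halfwidth = 2 * np.sqrt(sigma2 * np.diag(W))   # two-sigma range: b +/- halfwidth
    return phi, R, W, halfwidth
```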

Performance of SRLASSO
SRLASSO is compared with LASSO and Enet using the RMSE and MAPE criteria, which are the expected prediction errors of the algorithms, defined as

RMSE = sqrt((1/n) Σ_{i=1}^{n} (y_new,i − x_new,i′β̂)²) and MAPE = (1/n) Σ_{i=1}^{n} |y_new,i − x_new,i′β̂|,

where (y_new, X_new) denotes the new data not used to obtain the parameter estimates, n is the number of new observations, and β̂ is the estimate of β produced by the respective algorithm. A Monte Carlo simulation study and a real-world example are used for the comparison.
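The two criteria translate directly into code; the following sketch (our illustration) computes them on held-out data:

```python
import numpy as np

def rmse(y_new, X_new, beta_hat):
    """Root mean square prediction error on held-out data."""
    e = y_new - X_new @ beta_hat
    return np.sqrt(np.mean(e ** 2))

def mape(y_new, X_new, beta_hat):
    """Mean absolute prediction error on held-out data
    (absolute error, not the percentage variant of MAPE)."""
    return np.mean(np.abs(y_new - X_new @ beta_hat))
```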

Simulation Study.
Following McDonald and Galarneau [10], we first generate the predictor variables using

x_{i,j} = (1 − ρ²)^{1/2} z_{i,j} + ρ z_{i,p+1}, i = 1, 2, . . ., n, j = 1, 2, . . ., p,

where the z_{i,j} are independent standard normal pseudo-random numbers and ρ determines the theoretical correlation between any two explanatory variables.
In this study, we use a linear regression model with 100 observations and 20 predictors. The dependent variable is generated by

y_i = β₁x_{i,1} + β₂x_{i,2} + · · · + β₂₀x_{i,20} + ϵ_i,

where ϵ_i is a normal pseudo-random number with mean zero and common variance σ².
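The simulation design can be sketched as follows (our illustration; argument names are ours). Note that under the McDonald-Galarneau construction the pairwise correlation between predictors works out to ρ²:

```python
import numpy as np

def simulate(n, p, rho, beta, sigma=1.0, seed=0):
    """Generate predictors and response as in McDonald & Galarneau:
    x_ij = sqrt(1 - rho^2) * z_ij + rho * z_{i,p+1}, z_ij iid N(0,1),
    so any two predictors have correlation rho^2. Illustrative sketch.
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, p + 1))
    X = np.sqrt(1 - rho ** 2) * Z[:, :p] + rho * Z[:, [p]]   # shared z_{i,p+1} column
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y
```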
We choose β = (β₁, β₂, . . ., β₂₀)′ as the normalized eigenvector corresponding to the largest eigenvalue of X′X, so that β′β = 1. To define the prior information according to Section 2.3, we assume that the OLSE estimates of the first four elements of β are unbiased and use them as b. To investigate the effects of different degrees of multicollinearity on the estimators, we choose ρ = (0.5, 0.7, 0.9), representing weak, moderate, and high multicollinearity. For the analysis, we simulated 50 data sets, each consisting of 50 observations to fit the model and 50 observations to calculate the RMSE and MAPE. The cross-validated RMSE and MAPE of the estimators are displayed in Figures 1 and 2, respectively, and their median values are given in Table 1.
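The choice of the true coefficient vector can be computed as below (our sketch; the function name is ours):

```python
import numpy as np

def beta_from_largest_eigvec(X):
    """Normalized eigenvector of X'X for its largest eigenvalue,
    so that beta'beta = 1 (the simulation's choice of true beta)."""
    vals, vecs = np.linalg.eigh(X.T @ X)   # eigh: ascending eigenvalues, unit-norm vectors
    return vecs[:, -1]
```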
From Figures 1 and 2 and Table 1, we can observe that SRLASSO consistently outperforms LASSO and Enet under both the RMSE and MAPE criteria for all degrees of multicollinearity.

Real-World Example.
As a numerical example, the well-known Prostate Cancer Data [11] were used to compare the performance of SRLASSO.
This data set is available in the "lasso2" R package. In the Prostate Cancer Data, the predictors are the following eight clinical measures: log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log capsular penetration (lcp), Gleason score (gleason), and percentage of Gleason scores 4 or 5 (pgg45). The response is the log of prostate-specific antigen (lpsa), and the data set has 97 observations. The variance inflation factor (VIF) values of the predictor variables are 3.09, 2.97, 2.47, 2.05, 1.95, 1.37, 1.36, and 1.32, and the condition number is 243, which shows evidence of multicollinearity among the predictor variables. Stamey et al. [11] examined the correlation between the level of prostate-specific antigen and these eight clinical measures. Furthermore, Tibshirani [3] and Efron et al. [4] used these data to examine the performance of the LASSO and LARS algorithms, respectively. We used 67 observations to fit the model and 30 observations to calculate the RMSE and MAPE. We assume that the OLSE estimates of the first three regression coefficients are unbiased, and we defined the prior information for these data based on Section 2.3. The cross-validated RMSE and MAPE of the estimators are displayed in Table 2, and the coefficient paths of each estimator are displayed in Figure 3. From these results, SRLASSO again shows better performance than LASSO and Enet.

Conclusions
This study clearly showed that SRLASSO performs better than LASSO and Enet under both the RMSE and MAPE criteria when multicollinearity exists among the predictor variables. Therefore, SRLASSO can be used as an alternative to LASSO and Enet when prior information on the regression coefficients is available. The proposed SRLARS algorithm can be used to obtain the SRLASSO solutions.

Data Availability
The data used to support the findings of this study are included within the article.