Comparison of Weighted Lag Adaptive LASSO with Autometrics for Covariate Selection and Forecasting Using Time-Series Data



Introduction
Since the beginning of time-series analysis, modeling and forecasting have been its central concerns. The accuracy of a time-series model is never known with certainty, and few, if any, candidate models are exactly correct: "essentially, all models are wrong, but some are useful" [1]. However, the massive availability of data in the current era has opened a new phase of time-series analysis for model selection and forecasting. Including many financial and economic covariates in a time-series model may yield considerable gains in prediction. At the same time, parsimonious models tend to forecast better, and failure to reduce dimensionality may lead to poor performance because estimation errors from redundant or insignificant variables accumulate.
Traditional time-series modeling selects covariates and lags in Autoregressive Distributed Lag (ARDL) models using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) [2,3]. This technique requires that the number of covariates and their lags not exceed the number of observations. The traditional Ordinary Least Squares (OLS) method cannot estimate models with many regressors and few observations because the degrees of freedom are inadequate. Several statistical techniques exist in the literature for model selection and forecasting when the covariates and their lags outnumber the observations. In particular, the classical approach (Autometrics, general-to-specific) and regularization techniques (machine learning) are frequently used in this setting. Besides these techniques, complex network theories provide an efficient and reliable way of handling time-series problems, and in recent years complex networks have been used extensively to study socioeconomic phenomena [4][5][6]. However, because this study aims to identify the true covariates and to evaluate the model's forecasting performance, we concentrate only on regularization techniques and the classical approach. The use of sparse modeling has grown widely in time-series analysis, as it can efficiently handle big macroeconomic data sets and substitute for factor models [7][8][9][10][11][12][13]. Medeiros and Mendes [14] find that the adaptive LASSO (AdaLASSO) consistently chooses the essential covariates as the number of observations grows (model selection consistency), even when the errors are non-Gaussian and conditionally heteroscedastic. Audrino and Camponovo [15] illustrate the theoretical and empirical efficiency of AdaLASSO in time-series regression models, covering both its asymptotic covariate selection and its finite-sample behavior.
Covariate and lag selection is challenging in time-series modeling, mainly when serial correlation is present [16]. To address this gap, Konzen and Ziegelmann [17] introduce the Weighted Lag Adaptive LASSO (WLAdaLASSO), which applies a distinct weight to each coefficient and penalizes the coefficients of higher-lagged variables more heavily. WLAdaLASSO outperforms LASSO and AdaLASSO in forecasting and covariate selection, even under stronger linear dependence between predictors and with many candidate lags. Uematsu and Tanaka [18], in turn, use folded concave penalties for ultra-high-dimensional time-series forecasting and covariate selection. They establish oracle inequalities for the folded concave penalties (SCAD and MCP) for macroeconomic time series under appropriate conditions, with both theoretical and empirical contributions.
Very few studies use the classical technique (Autometrics) in the context of macroeconomic forecasting [19][20][21][22]. In cross-sectional modeling, Epprecht et al. [23] compare the LASSO and AdaLASSO estimates with the classical technique (Autometrics) in forecasting and covariate selection. Their results indicate that the LASSO and AdaLASSO estimates outperform Autometrics in prediction. For time-series modeling with a dynamic structure, in contrast, WLAdaLASSO outperforms LASSO and AdaLASSO in forecasting and covariate selection. However, we have not come across a work that compares the computational efficiency of regularization techniques, particularly WLAdaLASSO, SCAD, and MCP, with the classical technique (Autometrics) in dynamic time-series modeling. For this purpose, we implement recent regularization techniques for dynamic time-series modeling and assess their performance against the classical approach (Autometrics) for covariate selection and forecasting, both theoretically and empirically. Furthermore, we assess the efficiency of these techniques in simulation experiments where the true data generating process (DGP) has a dynamic structure. In summary, our main contribution is a comparison of WLAdaLASSO and Autometrics for covariate selection and forecasting under different scenarios, with various autocorrelation coefficients of the regressors (0.1, 0.5, and 0.8) and sample sizes T (50, 100, and 500), together with an application to macroeconomic data. The computational efficiency of these techniques is assessed in terms of gauge, potency, and out-of-sample Root Mean Square Error (RMSE). For the real data analysis, we construct a General Unrestricted Model (GUM) that considers all the possible macroeconomic determinants of the trade balance. The techniques are not restricted to the balance of trade and are valid for any time series.
The rest of the paper is organized as follows. Section 2 briefly describes the model selection techniques. Section 3 presents the simulation experiment and its results. Section 4 discusses the real data analysis. Finally, Section 5 concludes and remarks on the efficacy of the considered techniques.

Model Selection Techniques
Two broad classes of model selection techniques exist in the literature for the case where the number of regressors P exceeds the number of observations N: regularization techniques and the classical approach (Autometrics, general-to-specific). The classical approach starts with a fully saturated model and applies backward elimination with a multipath search; the selected model depends mainly on the predefined significance level. Regularization techniques, in contrast, impose sparsity on the p-dimensional parameter vector, which forces many of its components to be zero. This is how they combat the issues posed by high dimensionality. We describe each of these techniques in more detail below, but we consider only orthogonal regularization techniques.

Autometrics Algorithms for Covariate and Lag Selection.
Autometrics is a third-generation algorithm built on the same concepts as PcGets. Hoover et al. [24] proposed the general-to-specific model selection technique, which brings together many elements of the "Hendry" or "London School of Economics (LSE)" methodology. Doornik [25] proposed PcGets as a second-generation method, extended by Krolzig and Hendry [26], which extends and refines Hoover and Perez's algorithm [26,27]. The concept of general-to-specific (gets) modeling is the cornerstone of the Autometrics approach: (i) Initially, the GUM includes all candidate covariates and is estimated by OLS; statistically irrelevant covariates are then removed, and the reduced model is checked at each stage with diagnostic tests to confirm its congruence. (ii) Autometrics uses a tree-path search with multistep simplifications along numerous paths. Terminal models are reached through the tree-path search and confirmed using diagnostic tests; if the coefficient estimates are statistically insignificant, the model is discarded. When several terminal models are identified, Autometrics tests them against their union. The union of the "surviving" terminal models forms a new GUM, which allows one more tree-path search. The entire search is then repeated, with the terminal models and their union examined once again. If several models pass the encompassing tests, the final choice is based on a predetermined information criterion.
Diagnostic tests are used to check the simplified models, while encompassing tests resolve the choice among multiple terminal models. For the diagnostic tests, Autometrics uses the Jarque and Bera [28] residual normality test, the Breusch and Pagan [29] and Godfrey [30] tests for second-order residual autocorrelation, a test for autoregressive conditional heteroscedasticity (ARCH) up to second order [31], and an in-sample stability test [32]. In some respects, Autometrics is partially a black box [23]. However, it allows the user to choose between a "nominal significance level" and a "1-cut and tight significance level" when setting up the modeling strategy. The multipath approach avoids path dependency by using a tree structure together with stepwise backward elimination, and it is available as a built-in function of the gets package in the R environment [33].
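The backward-elimination idea at the core of gets modeling can be illustrated with a minimal single-path sketch. This is not Autometrics: it follows one deletion path only and omits the tree search, the diagnostic tests, and the encompassing step. The function names, the synthetic data, and the critical value `t_crit = 2.0` (roughly a 5% two-sided level) are assumptions made for illustration.

```python
import math
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols_t_stats(X, y):
    """Return the OLS t-statistics for the columns of X (a list of rows)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[t] - sum(X[t][j] * beta[j] for j in range(k)) for t in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    return [beta[j] / math.sqrt(s2 * inv[j][j]) for j in range(k)]

def backward_eliminate(X, y, t_crit=2.0):
    """Drop the least significant column until every retained |t| >= t_crit."""
    kept = list(range(len(X[0])))
    while kept:
        sub = [[row[j] for j in kept] for row in X]
        t = ols_t_stats(sub, y)
        weakest = min(range(len(kept)), key=lambda i: abs(t[i]))
        if abs(t[weakest]) >= t_crit:
            break
        kept.pop(weakest)
    return kept

# Synthetic illustration: only column 0 is relevant.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [3.0 * row[0] + 0.1 * random.gauss(0, 1) for row in X]
kept = backward_eliminate(X, y)
```

Because the result of a single path depends on the order of deletions, a multipath search such as the one described above explores several deletion orders and compares the resulting terminal models.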

Regularization Techniques.
Regularization techniques handle saturated models with irrelevant regressors, even when the regressors outnumber the observations, by shrinking the irrelevant coefficients to zero at the cost of some bias. Several regularization techniques exist in the literature; we consider only orthogonal regularization techniques for dynamic covariate selection and forecasting: LASSO, AdaLASSO, WLAdaLASSO, SCAD, and MCP.

LASSO and AdaLASSO Estimate.
Due to its low computational cost, the Least Absolute Shrinkage and Selection Operator (LASSO), introduced by Tibshirani (1996), is a popular estimation method in the linear regression framework. The LASSO method is similar to ridge regression; however, it sets some coefficients exactly to zero, at the cost of some bias. The resulting model is easy to interpret and can achieve a low forecast error. Consider a linear regression model in which $y_t$ is the continuous response, $x_t = (x_{1t}, x_{1t-1}, \ldots, x_{pt-1})$ collects the covariates and their lags, and $c_j$ are the coefficients to be estimated.
The estimator can be defined as
$$\hat{c} = \arg\min_{c} \sum_{t=1}^{T} \left( y_t - x_t' c \right)^2 + \sum_{j=1}^{p} p_{\lambda_j}(|c_j|),$$
where $p_{\lambda_j}(\cdot)$ is a penalty function and $\lambda_j$ is the hyperparameter; for LASSO, $p_{\lambda_j}(|c_j|) = \lambda |c_j|$. The second term in the above equation is the "L1 penalty"; it leads to a sparse solution by shrinking a subset of the coefficients exactly to zero, with a certain amount of bias. The amount of shrinkage depends on the choice of λ, which ranges over 0 < λ < ∞.
Zou [34] demonstrated that the LASSO estimator lacks the oracle property and introduced the adaptive LASSO as a simple and effective solution. In LASSO, all coefficients are penalized equally in the L1 penalty, whereas in AdaLASSO each coefficient is given a distinct weight. Zou [34] showed that AdaLASSO possesses the oracle property if the weights are data-dependent and carefully chosen. The AdaLASSO estimator is
$$\hat{c} = \arg\min_{c} \sum_{t=1}^{T} \left( y_t - x_t' c \right)^2 + \lambda \sum_{j=1}^{p} w_j |c_j|,$$
where $w_j = 1/|c_j^{*}|^{\tau}$, $\tau > 0$, and $c_j^{*}$ is an initial parameter estimate. As the sample size grows, the weights of the zero coefficients diverge (to infinity), while the weights of the nonzero coefficients converge to a finite constant. To estimate $c_j^{*}$, Zou [34] recommended the OLS method. However, when the number of candidate variables exceeds the number of observations, the OLS method does not work, and a ridge estimate can be employed as the initial estimator instead.
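The effect of the adaptive weights is easiest to see in a special case that is assumed here purely for illustration: an orthonormal design, where the penalized least-squares problem separates by coefficient and each solution is a soft-thresholded OLS estimate. With the objective written as the residual sum of squares plus λ Σ w_j |c_j|, the threshold for coefficient j is λ w_j / 2. The function names and the numbers below are hypothetical.

```python
def soft_threshold(b, threshold):
    """Shrink b toward zero by `threshold`; return exactly 0.0 if |b| <= threshold."""
    if b > threshold:
        return b - threshold
    if b < -threshold:
        return b + threshold
    return 0.0

def lasso_orthonormal(ols, lam):
    """LASSO solution under an orthonormal design: the same threshold for every coefficient."""
    return [soft_threshold(b, lam / 2.0) for b in ols]

def adalasso_orthonormal(ols, initial, lam, tau=1.0):
    """AdaLASSO under an orthonormal design: threshold scaled by w_j = 1 / |c*_j|**tau.

    Assumes every initial estimate is nonzero.
    """
    weights = [1.0 / abs(c) ** tau for c in initial]
    return [soft_threshold(b, lam * w / 2.0) for b, w in zip(ols, weights)]

# One large and one small OLS estimate, used as their own initial estimates.
ols = [2.0, 0.1]
lasso_fit = lasso_orthonormal(ols, lam=0.5)
ada_fit = adalasso_orthonormal(ols, initial=ols, lam=0.5)
```

In this toy case both estimators set the small coefficient to zero, but AdaLASSO shrinks the large coefficient less than LASSO does, which is the mechanism behind its reduced bias on relevant covariates.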

Weighted Lag Adaptive LASSO (WLAdaLASSO).
The Weighted Lag Adaptive LASSO (WLAdaLASSO) was introduced by Konzen and Ziegelmann [17], building on the work of Park and Sakaori [35]. It is another type of LASSO estimate, designed specifically for time-series modeling with a lag structure. The idea is similar to AdaLASSO but is built for the time-series ARDL framework: because more distant lags have a smaller effect in predicting the dependent variable, larger penalties are imposed on them. The WLAdaLASSO estimator is
$$\hat{c} = \arg\min_{c} \sum_{t=1}^{T} \left( y_t - x_t' c \right)^2 + \lambda \sum_{j=1}^{p} w_j |c_j|.$$
Here, $w_j = \left( |\hat{c}_j^{\mathrm{ridge}}| \, e^{-\alpha l} \right)^{-\tau}$, where $l$ is the lag length, $\tau > 0$ and $\alpha \geq 0$ are tuning parameters, and the ridge estimate $\hat{c}_j^{\mathrm{ridge}}$ serves as the initial parameter estimate. We set τ = 1, as in AdaLASSO. To pick α, Konzen and Ziegelmann [17] suggest estimating the model for a given λ over the grid (0, 0.5, 1, ..., 10) and choosing the value with the lowest BIC; the parameter λ is selected by the same lowest-BIC criterion.
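As a small illustration of the weight formula, the sketch below computes the weights for two coefficients that share the same initial ridge estimate but sit at different lags. The helper name `wlada_weights` and the input values are hypothetical; the ridge estimates are taken as given rather than computed.

```python
import math

def wlada_weights(ridge, lags, alpha, tau=1.0):
    """w_j = (|ridge_j| * exp(-alpha * lag_j)) ** (-tau) for each coefficient j."""
    return [(abs(r) * math.exp(-alpha * l)) ** (-tau) for r, l in zip(ridge, lags)]

# Candidate grid for alpha: 0, 0.5, 1, ..., 10.
alpha_grid = [0.5 * k for k in range(21)]

# Two coefficients with the same initial ridge estimate at lag 1 and lag 5.
ridge = [0.5, 0.5]
lags = [1, 5]
w_flat = wlada_weights(ridge, lags, alpha=0.0)  # alpha = 0 gives the AdaLASSO weights
w_lag = wlada_weights(ridge, lags, alpha=0.5)   # the more distant lag gets the larger weight
```

With α = 0 the lag plays no role and the weights reduce to the AdaLASSO weights; with α > 0 the weight grows exponentially in the lag, so distant lags need a larger initial estimate to survive the penalty.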

SCAD and MCP Estimate.
The Smoothly Clipped Absolute Deviation (SCAD) penalty proposed by Fan and Li [36] yields an estimator that is unbiased, sparse (i.e., small estimated coefficients are automatically set to zero), and continuous. The SCAD estimator for the selection of covariates and their lags is defined as
$$\hat{c} = \arg\min_{c} \lVert y - xc \rVert_2^2 + \sum_{j=1}^{d} p_j(|c_j|; \lambda, \alpha),$$
where x is the matrix of covariates and their lags, and the second term, $\sum_{j=1}^{d} p_j(|c_j|; \lambda, \alpha)$, is a penalty designed to meet all three requirements (unbiasedness, sparsity, and continuity). SCAD has proven to be effective in many statistical settings, such as cross-sectional regression and time-series modeling [18]. $P(c \mid \lambda, \alpha)$ is a folded concave penalty; unlike LASSO, it depends on two tuning parameters, and it depends on λ in a nonmultiplicative way, so that $P(c \mid \lambda) \neq \lambda P(c)$. The tuning parameter α controls the concavity of the penalty. The solution depends on both α and λ; α is set to 3.7, and λ is selected via cross-validation [36].
Zhang [37] introduced the Minimax Concave Penalty (MCP), a nonconvex penalization strategy that yields a nearly unbiased estimate while keeping the solution sparse up to a particular variable selection threshold.
MCP uses the penalty $\sum_{j=1}^{d} p_j(|c_j|; \lambda, \alpha)$, with a regularization path based on a family of nonconvex penalty functions with two tuning parameters, α and λ, where α is fixed and λ is selected via cross-validation. The tuning parameter λ controls the amount of shrinkage, and α controls the concavity of the penalty. MCP preserves sparse convexity to the greatest extent by minimizing the maximum concavity [37]. Larger values of α give less unbiasedness and more convexity [37]. The SCAD and MCP estimates belong to the family of folded concave penalties, as the penalty function P(·) is neither convex nor concave.
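For reference, the sketch below writes out the per-coefficient SCAD and MCP penalties in their standard closed forms from Fan and Li and from Zhang, next to the LASSO penalty. The function names are hypothetical, and the MCP default of α = 3.0 is an assumption for illustration only; the text above fixes α = 3.7 for SCAD.

```python
def lasso_penalty(c, lam):
    """L1 penalty: grows linearly in |c| without bound."""
    return lam * abs(c)

def scad_penalty(c, lam, alpha=3.7):
    """SCAD: linear near zero, quadratic transition, then constant beyond alpha * lam."""
    t = abs(c)
    if t <= lam:
        return lam * t
    if t <= alpha * lam:
        return (2 * alpha * lam * t - t * t - lam * lam) / (2 * (alpha - 1))
    return lam * lam * (alpha + 1) / 2

def mcp_penalty(c, lam, alpha=3.0):
    """MCP: concave quadratic up to alpha * lam, then constant."""
    t = abs(c)
    if t <= alpha * lam:
        return lam * t - t * t / (2 * alpha)
    return alpha * lam * lam / 2

# Large coefficients: LASSO keeps penalizing, SCAD and MCP level off.
flat_scad = (scad_penalty(10.0, 1.0), scad_penalty(100.0, 1.0))
flat_mcp = (mcp_penalty(10.0, 1.0), mcp_penalty(100.0, 1.0))

# Nonmultiplicative dependence on lam: P(c | 2) differs from 2 * P(c | 1).
nonmultiplicative = scad_penalty(2.0, 2.0) != 2 * scad_penalty(2.0, 1.0)
```

The flat region is what removes the shrinkage bias on large coefficients: once |c| exceeds α λ, increasing the coefficient further costs nothing in penalty.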

Selection of Tuning Parameters for Regularization Techniques.
The selection of the tuning parameter λ is crucial, as it governs the complexity of the selected model. An optimal tuning parameter provides a parsimonious model with precise prediction performance. In practice, the tuning parameter is frequently selected by cross-validation to achieve prediction optimality. Such prediction optimality is often at odds with covariate selection, where the objective is to recover the underlying set of sparse variables: covariate selection frequently requires a larger penalty parameter than optimal prediction does [38]. The BIC criterion performs better than cross-validation for covariate selection, although it lacks a theoretical justification. Moreover, WLAdaLASSO with a BIC-based tuning parameter outperforms the alternatives in covariate selection and out-of-sample forecasting [17]. Hence, we use BIC-based tuning parameters for covariate selection and out-of-sample forecasting in both the simulation exercises and the real data analysis.
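A BIC-based choice of λ can be sketched as follows, assuming one common form of the criterion for penalized regression, BIC = T log(RSS / T) + k log(T), where k is the number of nonzero coefficients. The exact BIC variant used in the study is not stated in the text, so this form, the function names, and the illustrative path values are all assumptions.

```python
import math

def bic(rss, n_obs, n_nonzero):
    """BIC = T * log(RSS / T) + k * log(T), with k the number of nonzero coefficients."""
    return n_obs * math.log(rss / n_obs) + n_nonzero * math.log(n_obs)

def select_lambda(path, n_obs):
    """Return the lambda whose fitted model has the lowest BIC along a regularization path."""
    best = min(path, key=lambda fit: bic(fit["rss"], n_obs, fit["df"]))
    return best["lam"]

# Hypothetical path: a smaller lambda fits better in-sample but keeps more coefficients.
path = [
    {"lam": 0.1, "rss": 50.0, "df": 10},
    {"lam": 0.5, "rss": 55.0, "df": 3},
    {"lam": 1.0, "rss": 90.0, "df": 1},
]
best_lambda = select_lambda(path, n_obs=100)
```

In this toy path the middle value wins: dropping seven coefficients costs little in fit, while the log(T) term rewards the sparser model, which matches the observation above that covariate selection favors a larger penalty than pure prediction.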

Theoretical Comparison.
To compare these techniques, we use gauge, potency, and out-of-sample RMSE. Gauge is the empirical null retention frequency, that is, the frequency with which irrelevant covariates are retained, whereas potency is the frequency with which relevant covariates are correctly retained [39]. The regularization techniques and Autometrics are compared on both measures. We use the out-of-sample RMSE to evaluate the forecasting performance of the techniques in the simulation study and the real data analysis. If an approach correctly identifies the true model, we expect the following:
(1) Gauge approaches the nominal significance level α or a tight significance level (0.01 or 0.001).
(2) Potency approaches 1.
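For a single selected model, the two measures can be computed as in the sketch below; in a simulation they would then be averaged over replications. The function name and the example index sets are hypothetical.

```python
def gauge_potency(selected, relevant, n_candidates):
    """Gauge: share of irrelevant candidates retained. Potency: share of relevant ones retained."""
    selected, relevant = set(selected), set(relevant)
    n_irrelevant = n_candidates - len(relevant)
    gauge = len(selected - relevant) / n_irrelevant
    potency = len(selected & relevant) / len(relevant)
    return gauge, potency

# 10 candidates, 3 truly relevant; the procedure keeps two of them plus one irrelevant covariate.
gauge, potency = gauge_potency(selected={0, 1, 5}, relevant={0, 1, 2}, n_candidates=10)
```

Here one of the seven irrelevant candidates is retained and two of the three relevant ones are found, so gauge is 1/7 and potency is 2/3.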

Simulation Experiments and Results
The simulation study is performed in R, a free statistical software environment. For Autometrics, we use the freely available gets package; for the regularization techniques, we use glmnet for LASSO, AdaLASSO, and WLAdaLASSO, and ncvreg for SCAD and MCP. The performance of Autometrics in covariate selection and forecasting is assessed at two significance levels, 0.05 and 0.01.

Data Generating Process (DGP).
We use the DGP of Konzen and Ziegelmann [17] for the statistical comparison. Regarding covariate and lag selection, our goal is to compare the gauge ("size") and potency ("power") of the selected models; we also examine the out-of-sample forecasting performance of the considered techniques. The DGP has 10 independent time-series covariates, each following an AR(1) process, $x_{it} = \phi x_{i,t-1} + u_{it}$ with $u_{it} \sim N(0, 1)$ and i = 1, 2, ..., 10. We assess the performance of the considered techniques under different scenarios based on the same linear model, with the AR(1) autocorrelation coefficient ϕ equal to 0.1, 0.5, and 0.8 and the number of observations T equal to 50, 100, and 500. We employ WLAdaLASSO, Autometrics, and the other regularization techniques to estimate the model. The lag length of the dependent and independent variables is 5 throughout the simulation study. We hold out the last ten observations of each simulated series to compute the out-of-sample RMSE, and each simulation is repeated 1000 times.

Simulation Results.
Tables 1-3 report the simulation findings for the considered techniques in terms of average gauge and potency, and Figures 1-3 report the out-of-sample RMSE. The results for ϕ equal to 0.1 (Table 1) indicate that, with T equal to 50, WLAdaLASSO achieves the highest potency among all techniques, 63.6%, whereas Autometrics at the 0.01 significance level has the lowest potency, 16.1% on average. As the sample size increases, the performance of the considered techniques improves: the average potency increases and the average gauge decreases. Table 1 also indicates that Autometrics keeps its average gauge close to the nominal significance levels (0.05 and 0.01), at the cost of the lowest average potency.
However, when the sample size T increases to 500, Autometrics at the 0.05 significance level performs close to WLAdaLASSO in both potency and gauge. Among the regularization techniques, LASSO, AdaLASSO, SCAD, and MCP are inferior to WLAdaLASSO in gauge and potency. Tables 2 and 3 report the simulation results with the autocorrelation coefficient ϕ of the regressors equal to 0.5 and 0.8. With ϕ equal to 0.5 and T equal to 50, the WLAdaLASSO estimate outperforms the others with an average potency of 64.5%. As T increases, the average gauge of WLAdaLASSO approaches the nominal significance level and its average potency approaches 1. The simulation results indicate that the WLAdaLASSO estimate is not sensitive to the autocorrelation coefficient: with T equal to 50, its average potency is 63.6% for ϕ equal to 0.1 and 64.5% for ϕ equal to 0.5. Autometrics, however, performs worse as the autocorrelation coefficient increases from 0.1 to 0.5.
Meanwhile, Autometrics with ϕ equal to 0.8 and T equal to 50 has a gauge of 11.5%, which is higher than the 5% significance level. With ϕ equal to 0.8, the performance of Autometrics does not improve (gauge ⟶ α and potency ⟶ 1) as the sample size increases. WLAdaLASSO, in contrast, is superior in average potency and gauge to all other techniques. The simulation experiment indicates that WLAdaLASSO remains robust even under stronger linear dependence between predictors. As the sample size increases, the performance of Autometrics, LASSO, AdaLASSO, SCAD, and MCP does not improve as much as that of WLAdaLASSO.
WLAdaLASSO is superior to the other regularization techniques and to Autometrics in average gauge and potency, under both strong and weak linear dependence between predictors and with a small sample size. Figures 1-3 show the RMSE of the considered techniques with T equal to 50, 100, and 500 and ϕ equal to 0.1, 0.5, and 0.8. The results show that WLAdaLASSO outperforms the other techniques in out-of-sample forecasting.
The WLAdaLASSO estimate is insensitive to the autocorrelation coefficient: its forecast performance and average potency do not deteriorate when ϕ equals 0.8, even with a small sample. With ϕ equal to 0.8, all other techniques perform poorly in out-of-sample forecasting, whereas WLAdaLASSO has the lowest RMSE. Autometrics with ϕ equal to 0.1 and T equal to 50 performs poorly in RMSE compared to WLAdaLASSO, but its RMSE decreases as the sample size grows because its average potency increases. Autometrics with ϕ equal to 0.8 and T equal to 50, however, performs the worst of all techniques. Overall, the simulation results indicate that WLAdaLASSO outperforms Autometrics and the other regularization techniques in potency and out-of-sample forecasting.
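The out-of-sample evaluation used in these comparisons can be sketched as a holdout of the last ten observations followed by an RMSE computation. The helper names and the toy numbers are hypothetical, and the forecasting model itself is left out.

```python
import math

def split_holdout(series, horizon=10):
    """Split a series into an estimation sample and the last `horizon` observations."""
    return series[:-horizon], series[-horizon:]

def rmse(actual, forecast):
    """Root mean square error between realized values and forecasts."""
    errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(errors) / len(errors))

# A series of length T = 50 leaves 40 observations for estimation and 10 for evaluation.
train, test = split_holdout(list(range(50)))

# Toy forecast with a single error of size 2 among three points.
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In a simulation, this RMSE would be computed once per replication and per technique and then summarized across the replications.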

Real Data Analysis
For the real data analysis, we probe the determinants of the trade balance of Pakistan, apply the considered techniques, and assess their performance. Trade has played an important role in developing countries as a growth engine in various eras. Trade imbalances are described as trade deficits or surpluses. Since independence, Pakistan has been in a trade deficit, except for three years: 1947-1948, 1950-1951, and 1972-1973 [40]. According to the economic literature, a variety of factors are thought to be responsible for long-term trade deficits in various economies, including ineffective public policies, shocks in major trading countries, oil price hikes if the economy is heavily reliant on oil imports, residents' socioeconomic conditions, and increased urbanization [41,42]. The existing studies on Pakistan consider only a few macroeconomic variables, such as GDP, the exchange rate, broad money supply, inflation, and Foreign Direct Investment [40, 43-47]. This study uses a GUM that includes every possible determinant of the trade balance, with 11 regressors: Domestic Investment (log), Domestic Consumption (log), FDI (log), GDP (log), Inflation (log), Budget Deficit (log), Remittances (log), Exchange Rate (log), Population (log), Urban Population (log), and Government Expenditure (log).
We use annual data from 1980 to 2020, compiled from the World Development Indicators. The model contains the 11 regressors in first differences, each with its current value and 5 lags, together with 5 lags of the dependent variable, that is, 11 × 6 + 5 = 71 candidate covariates.
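The count of candidate covariates can be checked with a small sketch that builds the lagged design from differenced series. The data here are synthetic placeholders rather than the Pakistani series, and the function and variable names are hypothetical; 41 annual observations are assumed, which leaves 40 first differences and 35 usable rows once 5 lags are formed.

```python
def difference(series):
    """First difference of a series."""
    return [b - a for a, b in zip(series, series[1:])]

def build_gum(y, regressors, max_lag=5):
    """Build a design with lags 0..max_lag of each regressor and lags 1..max_lag of y."""
    n = len(y)
    names, columns = [], []
    for name, x in regressors.items():
        for lag in range(max_lag + 1):
            names.append(f"{name}(-{lag})" if lag else name)
            columns.append(x[max_lag - lag : n - lag])  # x[t - lag] for t = max_lag..n-1
    for lag in range(1, max_lag + 1):
        names.append(f"y(-{lag})")
        columns.append(y[max_lag - lag : n - lag])
    rows = [list(r) for r in zip(*columns)]
    return names, rows, y[max_lag:]

# Synthetic placeholder levels: 41 annual observations for the target and 11 regressors.
levels_y = [float(t * t % 7) for t in range(41)]
levels_x = {f"x{k}": [float((t * (k + 2)) % 11) for t in range(41)] for k in range(1, 12)}

dy = difference(levels_y)
dx = {name: difference(x) for name, x in levels_x.items()}
names, rows, target = build_gum(dy, dx)
```

With these assumptions the design has 71 columns but only 35 rows, which is the situation in which OLS on the full model is infeasible and a selection technique is needed.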
The GUM includes 71 covariates, more than the 41 annual observations available from 1980 to 2020. Figure 4 confirms our simulation findings: WLAdaLASSO outperforms all other techniques with the lowest out-of-sample RMSE, 0.0069, followed by Autometrics at the 0.01 significance level with an RMSE of 0.018. Autometrics at the 0.05 significance level has a higher RMSE, 0.111, than Autometrics at the 0.01 significance level. This finding is in line with the simulation experiment, where Autometrics at the 0.05 level of significance has a slightly higher average gauge and a higher RMSE than at the 0.01 level. SCAD, MCP, LASSO, and AdaLASSO have higher RMSE because they select more irrelevant covariates and lags than WLAdaLASSO and Autometrics. WLAdaLASSO selects three covariates: the difference of urban population (dupop), the first lag of the difference of log GDP (dlnGDP(−1)), and the fourth lag of the difference of log population (dpop(−4)). Autometrics selects five covariates and lags at the 0.05 significance level and three covariates at the 0.01 significance level. dlnGDP(−1) is common to WLAdaLASSO and to Autometrics at both the 0.05 and 0.01 significance levels. SCAD, MCP, and LASSO select too many covariates and lags, which is why these techniques have higher RMSE than WLAdaLASSO. AdaLASSO selects three covariates and lags, but with an RMSE of 0.16, which is higher than that of WLAdaLASSO and Autometrics.

Conclusion
Regularization techniques have become extremely popular in time-series modeling in recent years due to the availability of massive data. This study compares the performance of WLAdaLASSO with that of Autometrics for covariate selection and forecasting. The simulation study shows that WLAdaLASSO outperforms Autometrics and the other regularization techniques under stronger linear dependence between predictors. For Autometrics with ϕ equal to 0.1, the gauge ⟶ α (0.05 or 0.01 level of significance), the potency ⟶ 1, and the average RMSE decreases as the sample size increases. This improvement, however, is limited to ϕ equal to 0.1; with ϕ equal to 0.8, increasing the sample size does not significantly enhance the performance of Autometrics relative to WLAdaLASSO. Autometrics at the 0.05 significance level includes irrelevant covariates that increase the RMSE compared to the 0.01 significance level, and this finding is in line with the real data analysis. Apart from WLAdaLASSO, all considered regularization techniques perform poorly in covariate selection and forecasting, even with ϕ equal to 0.1 and T equal to 50. Their performance improves as the sample size increases, yet WLAdaLASSO outperforms the others in all simulation experiments. The simulation experiment and the real data analysis provide evidence that WLAdaLASSO is more robust than all other considered regularization techniques and Autometrics in out-of-sample forecasting and covariate selection, even under stronger linear dependence between predictors and with a small sample size.

Limitations.
One of the study's constraints is that it considers only linear models and annual data. In addition, the simulation experiments are limited to Gaussian distributed errors.

Data Availability
The code for the simulation study and the real data analysis, and the data used for the analysis, can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.