Regularized Weighted Nonparametric Likelihood Approach for a High-Dimensional Sparse Subdistribution Hazards Model for Competing Risks Data

Variable selection and penalized regression models in high-dimensional settings have become an increasingly important topic in many disciplines. For instance, omics data generated in biomedical research may be associated with patients' survival, offer insights into disease dynamics, and help identify patients with worse prognosis so that therapy can be improved. Analysis of high-dimensional time-to-event data in the presence of competing risks requires special modeling techniques. So far, variable selection in low- and high-dimensional competing risks settings has mostly relied on partial likelihood-based procedures. In this paper, a weighted likelihood-based penalized approach is extended for direct variable selection under the subdistribution hazards model for high-dimensional competing risks data. The proposed method, which considers a larger class of semiparametric regression models for the subdistribution, accommodates time-varying effects and is of particular importance because the proportional hazards assumption may not hold in general, especially in the high-dimensional setting. The model also relaxes the Fine and Gray approach's constraint on simultaneously modeling multiple cumulative incidence functions. The performance of several penalties, including the minimax concave penalty (MCP), adaptive LASSO, and smoothly clipped absolute deviation (SCAD), as well as their L2 counterparts, was investigated through simulation studies in terms of sensitivity and specificity. The results revealed that the sensitivities of all penalties were comparable, but the MCP and MCP-L2 penalties outperformed the other methods in selecting fewer noninformative variables. The practical use of the model was illustrated through the analysis of genomic competing risks data from patients with bladder cancer; six genes, CDC20, NCF2, SMARCAD1, RTN4, ETFDH, and SON, were identified by all the methods and were significantly associated with the subdistribution.


Introduction
The recent development of high-throughput biology provides powerful information about various phenotypic data, including patients' survival times. One important task is to select a small subset of genes that are most relevant to survival outcomes [1,2]. By uncovering the relationship between time to an event such as cancer and expression profiles, one hopes to achieve more accurate prognoses and improved treatment strategies [3]. This issue is challenging for two main reasons. First, the number of covariates in microarray gene expression analysis or DNA sequencing data obtained from next-generation sequencing technology commonly far exceeds the sample size (p >> n). Second, the availability and feasibility of standard analyses are severely affected by the high possibility of collinearity among different gene levels [2].
Variable selection techniques are powerful tools for sparse modeling in high-dimensional regression problems and for finding the transcripts most strongly associated with the survival outcome; they aim to improve the predictive power and interpretability of the resulting model [4]. They are well developed in linear regression settings, and in recent years, many of them have been extended to the context of censored survival data [5]. For example, Cox-based methods utilizing the LASSO penalization [6][7][8][9], the elastic net (ENET) [1,10], the nonconcave penalized likelihood approach [11], and the smoothly clipped absolute deviation (SCAD) [12] have been proposed.
When data on patient survival time contain competing events, such as 'progression' versus 'death from noncancer cause,' the standard analysis often involves modeling the cause-specific hazards functions of the different failure types [13][14][15]. Nevertheless, "while the cause specific hazards is useful for investigating the disease dynamics to get insights in disease mechanisms and biological processes, it is less appropriate for clinical decision support for which it is preferable to consider the cumulative incidence probability, the marginal probability of failure for a specific cause" [16]. Moreover, the effect of a gene signature on the cause-specific hazards function of a particular failure type may be very different from its effect on the corresponding cumulative incidence function [13,17,18]. Synthesizing the interpretation of two cause-specific model fits is even more difficult in a high-dimensional setting, as the lists of selected genes obtained from high-dimensional models are usually rather unstable [19]. So, under the cause-specific hazards formulation, it is not feasible to test the gene effects on the subdistribution, and the issues of model selection and efficient prediction cannot be directly addressed [13]. Some approaches have been proposed to deal with this situation. Fine and Gray [13] proposed a methodological framework for a formal direct synthesis model, based on the hazard attached to the cumulative incidence function. Their model adapts the semiparametric Cox proportional hazards model for the subdistribution hazard. "The method accommodates time-varying covariate effects on the subdistribution hazards and yields the usual nonparametric estimators in the absence of z" [20]. As the subdistribution hazard relates directly to the cumulative incidence function, only one model has to be fitted to describe the cumulative incidence function of the event of interest [19]. The estimation procedure in the proportional subdistribution hazards regression proposed by Fine and Gray [13] is based on a weighted partial likelihood function. Scheike et al. [21] introduced another approach to predict and model the cumulative incidence probability via the direct binomial regression technique; they showed that this model is comparable with the Fine and Gray approach and can be fitted by standard packages. Other approaches include Andersen and Klein [22], Klein and Andersen [23], Fine [24], and Gerds et al. [25]. "None of the above methods adapt easily to time-varying covariates, which are most naturally accommodated in models for the hazards function, as with survival data without competing risks. Moreover, these methods do not reduce to the usual nonparametric estimators without covariates" [20].
Recently, some efforts have been made toward variable selection and direct modeling of the cumulative incidence function for high-dimensional competing risks data, including [19], Ambrogi and Scheike [16], Hou et al. [26], Hou et al. [27], Tapak et al. [28], Tapak et al. [29], Saadati et al. [30], Gilhodes et al. [31], and Fu et al. [32], each based on different settings. None of these approaches is a likelihood-based procedure. In this regard, Bellach et al. [20] introduced "a weighted likelihood function that allows for a direct extension of the Fine and Gray model to a broad class of semiparametric regression models." Considering this larger class of semiparametric regression models for the subdistribution is of particular importance because the proportional hazards assumption may not be valid in general [20], especially in the high-dimensional setting. Also, by considering this class of semiparametric regression models, the Fine and Gray approach's constraint on simultaneously modeling multiple cumulative incidence functions is relaxed [20]. This model allows for time-dependent covariate effects on the subdistribution hazards as well [20]. Moreover, likelihood-based inference is permitted [20]. On the other hand, the available packages include "crrp" and "glmnet." The current version 1.0 of "crrp" is designed for low-dimensional competing risks data, and "glmnet" provides only LASSO and elastic net penalties; it is not possible to use other sparse penalties like the SCAD and the minimax concave penalty (MCP). Recently, Kawaguchi et al. [33] provided an R package named "fastcmprsk" for penalized variable selection with MCP, SCAD, LASSO, and elastic net penalties for competing risks based on the Fine and Gray subdistribution hazards model. They studied the performance of their model with p = 100 covariates and sample sizes of n = 1000 to 4000. The aim of the present study is to propose a penalized weighted nonparametric likelihood approach for regularization-based variable selection for competing risks data with high-dimensional covariates. This extends the approach of [20] to the high-dimensional setting. We consider three popular penalties for individual variable selection: adaptive LASSO (ALASSO), SCAD, and the minimax concave penalty (MCP). We also propose group variable selection via the elastic net (ENET), SCAD-L2, and MCP-L2. The proposed method, including the model, penalized likelihood approach, and estimation procedure, is described in Section 2. In Section 3, simulation studies are presented. An illustrative example using bladder cancer data is provided in Section 4. Some discussions are provided in Section 5.

Proposed Method
2.1. General Subdistribution Hazards Model. Following the notation of Fine and Gray [13], let $T_i$ and $C_i$ denote the failure time and the censoring time of the $i$th individual, respectively, with $X_i = T_i \wedge C_i$ the observed time and $\Delta_i = I\{T_i \le C_i \wedge \tau\}$ the noncensoring indicator ($\tau$ is the maximum time of the study). Furthermore, $\varepsilon_i \in \{1, \dots, K\}$ specifies the potential type of failure, and $Z_i(t)$ is a $d \times 1$, possibly time-dependent, covariate vector of bounded variation [13].
The focus of the present study is on modeling the cumulative incidence function for failure from cause 1, $F_1(t; z) = P(T \le t, \varepsilon = 1 \mid z)$. To estimate $F_1$, Fine and Gray [13] proposed modeling the subdistribution hazard of the event of interest, which leads to direct estimation of $F_1$ without simultaneously estimating the subdistribution functions corresponding to the other failure types [34]. Specifically, the subdistribution hazard of the first event type is defined as
$$\lambda_1(t; z) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Pr\{t \le T \le t + \Delta t, \varepsilon = 1 \mid T \ge t \cup (T \le t \wedge \varepsilon \ne 1), z\}. \tag{1}$$
Considering $A(t)$ as the cumulative subdistribution hazard, Bellach et al. [20] proposed the following general model for it:
$$A(t; Z_i) = g\left(\int_0^t \exp\{\beta^\top Z_i(s)\}\, dA_0(s)\right), \tag{2}$$
where $\beta \in \mathbb{R}^d$ stands for the regression parameters and $A_0$ is an unspecified increasing function. Also, $g$ is a thrice differentiable, strictly increasing, continuous function with $g(0) = 0$, $g'(0) > 0$, and $g(\infty) = \infty$. For other regularity conditions, see [20]. These conditions guarantee the existence of the weighted nonparametric maximum likelihood estimates. The link $g(\cdot)$ can take different forms, including $g(x) = \{(1 + x)^\rho - 1\}/\rho$ for $\rho \ge 0$ (the class of Box-Cox transformation models) and $g(x) = \log(1 + rx)/r$ for $r \ge 0$ (the class of logarithmic transformation models) [20]. Both links yield the Fine and Gray model as a special case (let $\rho = 1$ in the first and $r \to 0$ in the second).
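As a quick illustration of these two link families, the following R sketch (function names are ours, not from the paper) verifies numerically that the Box-Cox link with $\rho = 1$ and the logarithmic link with $r \to 0$ both reduce to the identity and hence recover the Fine and Gray model:

# Box-Cox transformation family: g(x) = ((1 + x)^rho - 1) / rho, rho >= 0
g_boxcox <- function(x, rho) {
  if (rho == 0) log1p(x) else ((1 + x)^rho - 1) / rho
}

# Logarithmic transformation family: g(x) = log(1 + r * x) / r, r >= 0
# (log1p keeps the computation accurate for very small r)
g_log <- function(x, r) {
  if (r == 0) x else log1p(r * x) / r
}

x <- seq(0, 5, by = 0.5)
all.equal(g_boxcox(x, rho = 1), x)   # TRUE: Fine and Gray special case
all.equal(g_log(x, r = 1e-10), x)    # TRUE up to numerical tolerance as r -> 0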

2.2. Penalized Weighted Nonparametric Maximum Likelihood Estimation. Assume that there are no tied event times. With $N(t) = \sum_{i=1}^n N_i(t)$ (where $N_i(t) = I\{T_i \le t, \varepsilon_i = 1\}$) the counting process of the event of interest and $Y(t) = \sum_{i=1}^n Y_i(t)$ the at-risk process, the weighted log-likelihood function under the general semiparametric regression model is
$$l_n(\beta, A_0) = \sum_{i=1}^n \left[\int_0^\tau w_i(t)\, \log\{dA(t; Z_i)\}\, dN_i(t) - \int_0^\tau w_i(t)\, Y_i(t)\, dA(t; Z_i)\right], \tag{3}$$
where $w_i(t)$ is obtained by the inverse probability of censoring weighting (IPCW) technique, $w_i(t) = I\{C_i \ge T_i \wedge t\}\, \hat{G}_C(t)/\hat{G}_C(T_i \wedge t)$, with $\hat{G}_C$ the product-limit estimator of $G_C(t) = P(C > t)$.
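The IPCW weights can be computed from the product-limit (Kaplan-Meier) estimator of the censoring distribution. Below is a minimal R sketch, assuming right-censored data with status = 0 coding censoring and ignoring ties and left limits; the function name ipcw_weights is ours:

library(survival)

# Kaplan-Meier estimate of the censoring survival G_C(t) = P(C > t):
# treat censoring as the "event" by flipping the status indicator.
ipcw_weights <- function(time, status, t) {
  km_cens <- survfit(Surv(time, status == 0) ~ 1)
  G <- stepfun(km_cens$time, c(1, km_cens$surv))   # G_C as a step function
  # w_i(t) = I(C_i >= T_i ^ t) * G_C(t) / G_C(T_i ^ t):
  # subjects observed beyond t, or with an observed event, keep a positive
  # weight; subjects censored before t drop out (weight 0).
  ifelse(time >= t | status != 0, G(t) / G(pmin(time, t)), 0)
}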
We now define the regularized estimator $\hat{\beta}$ as a solution to the regularization problem
$$\hat{\beta} = \arg\max_{\beta}\left\{l_n(\beta, A_0) - n \sum_{j=1}^d p_\lambda(|\beta_j|)\right\}, \tag{4}$$
where $p_\lambda(\cdot)$ is a penalty function that depends on the regularization parameter $\lambda \ge 0$. The cumulative baseline hazard $A_0$ is approximated by a sequence of step functions ($A_{0n}$) with jumps at the observed events of interest. Considering $0 < \tilde{T}_1 < \cdots < \tilde{T}_{k(n)} < \tau$ as the ordered event times, with $k(n)$ the number of events of interest, and replacing $A_0$ by $A_{0n}$, a modified penalized likelihood function, $l_{\text{pen},n}(\beta, A_{0n})$, is obtained and maximized to yield the regularized estimators of the regression coefficients; the maximization also produces the estimator of $A_{0n}$. In the absence of covariates, a Nelson-Aalen-type estimator of the subdistribution hazard is obtained from the weighted likelihood function [20]; it derives from the weighted Doob decomposition, in which the weighted risk set is the expected number of subjects in the pseudo-risk set [20]. Also, treating the jump sizes $\Delta A_{0n}(\tilde{T}_j)$ of the baseline as parameters and maximizing the resulting discretized log-likelihood $l_{\text{pen},n}(\beta, A_{0n})$ yields the estimators of the parameters.
In this study, we only considered $g(x) = x$ and $g(x) = \log(1 + x)$, which correspond to the proportional subdistribution hazards model and the proportional odds model, respectively (the method can nevertheless be extended to other link functions). For $g(x) = x$, the weighted log-likelihood function takes the form
$$l_n(\beta, A_0) = \sum_{i=1}^n \left[\int_0^\tau w_i(t)\{\beta^\top Z_i(t) + \log \Delta A_0(t)\}\, dN_i(t) - \int_0^\tau w_i(t)\, Y_i(t)\, e^{\beta^\top Z_i(t)}\, dA_0(t)\right].$$
This can be factorized into two parts: the Fine and Gray weighted partial likelihood function and a second term involving only the baseline. Without the penalty term, for $g(x) = x$, the parameter estimates derived from the weighted log-likelihood function are identical to those derived from the Fine and Gray model [20].
In this study, we considered the following penalties (a short R rendering appears after the theoretical remarks below):

(1) The adaptive LASSO (ALASSO) (Zou 2006): $p_\lambda(|\beta_j|) = \lambda \upsilon_j |\beta_j|$, where the $\upsilon_j$ are adaptive weights.

(2) The SCAD [11], defined through its derivative $p_\lambda'(\beta) = \lambda\{I(\beta \le \lambda) + (a\lambda - \beta)_+/((a - 1)\lambda)\, I(\beta > \lambda)\}$ for $\beta > 0$, where $a > 2$ is a tuning parameter.

(3) The MCP [35]: $p_\lambda'(|\beta_j|) = (\lambda - |\beta_j|/\gamma)_+$, where $\gamma > 1$ is a tuning parameter.

(4) The adaptive elastic net (AENET) (Zou 2006).

(5) The SCAD-$L_2$ (Zeng and Xie 2020 [36]) and MCP-$L_2$ penalties, where an $L_2$ penalty is appended to the SCAD and MCP penalties to induce a grouping effect in variable selection.

Asymptotic properties of penalized estimators in different contexts have been investigated in several studies, and all of the above penalties have been shown to enjoy the oracle property [26,27,32]; i.e., these penalties are consistent in variable selection, and their estimators are asymptotically normal and unbiased. More explicitly, they work as well as if the true model were known in advance. Fan and Li [11] established the oracle property and the asymptotic normality of a general class of nonconcave penalized maximum likelihood estimators with a diverging number of parameters and increasing sample size and provided conditions for the oracle property. In the framework of the subdistribution hazards model, Fu et al. [32] showed that the ALASSO, SCAD, and MCP penalized estimators obtained from the Fine and Gray model (a special case of model (2)) possess the oracle property and asymptotic normality. They established a theorem stating that if a penalty term $p_{\lambda_n}(|\beta|)$ simultaneously satisfies the following two conditions, with $a_n = \max\{p_{\lambda_n}'(|\beta_{j0}|): \beta_{j0} \ne 0\}$ and $b_n = \max\{p_{\lambda_n}''(|\beta_{j0}|): \beta_{j0} \ne 0\}$:

(1) $a_n = O_p(n^{-1/2})$ and $b_n \to 0$, and

(2) for any $C > 0$, $\lim_{n \to \infty} \sqrt{n}\, \inf_{|\beta| \le C n^{-1/2}} p_{\lambda_n}'(|\beta|) = \infty$,

then the estimator enjoys consistency in variable selection as well as asymptotic normality and unbiasedness. As Bellach et al. [20] showed that the Fine and Gray model is a special case of the model in equation (2) (the weighted NPMLE method), the same results hold here under regularity conditions for the weighted likelihood.
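For reference, the penalties above (and the derivatives through which SCAD and MCP are defined) can be coded in a few lines of R; these are the textbook forms, not necessarily the exact implementation used in this study:

# Adaptive LASSO: p_lambda(|b|) = lambda * v_j * |b| with adaptive weights v_j
p_alasso <- function(b, lambda, v) lambda * v * abs(b)

# SCAD derivative (Fan and Li), tuning parameter a > 2:
scad_deriv <- function(b, lambda, a = 3.7) {
  b <- abs(b)
  lambda * (b <= lambda) +
    pmax(a * lambda - b, 0) / (a - 1) * (b > lambda)
}

# MCP derivative, tuning parameter gamma > 1:
mcp_deriv <- function(b, lambda, gamma = 3) pmax(lambda - abs(b) / gamma, 0)

# SCAD-L2 / MCP-L2: append a ridge term lambda2 * b^2 (derivative 2*lambda2*|b|)
# to induce the grouping effect.
scad_l2_deriv <- function(b, lambda, lambda2, a = 3.7)
  scad_deriv(b, lambda, a) + 2 * lambda2 * abs(b)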

2.3. Computational Algorithm. To compute the coefficients, several algorithms have been suggested for optimizing equation (3), including the path algorithm [7] and LARS [37]. Here, however, the maximization was carried out with the efficient algorithm proposed by Goeman [38], which combines gradient ascent optimization with the Newton-Raphson algorithm. This full-gradient algorithm follows the gradient of the likelihood from a given starting value of $\beta$; unlike a coordinatewise gradient approach, it uses the full gradient at each step instead of updating a single coordinate at a time. Moreover, the algorithm automatically switches to Newton-Raphson steps when it gets close to the optimum, to avoid slow convergence.
The weighted log-likelihood function in equation (3) and the $\ell_2$ penalty term in equation (4) are highly regular functions: both are concave and at least twice differentiable everywhere. The $L_1$ penalty is less well behaved; it is concave and continuous but differentiable only at points with $\beta_i \ne 0$ for all $i$. Therefore, the conditions needed to apply the gradient ascent algorithm with Newton-Raphson steps must be verified. Let
$$l_{\text{pen}}'(\beta; \nu) = \lim_{t \downarrow 0} \frac{1}{t}\{l_{\text{pen}}(\beta + t\nu) - l_{\text{pen}}(\beta)\}, \qquad l_{\text{pen}}''(\beta; \nu) = \lim_{t \downarrow 0} \frac{1}{t}\{l_{\text{pen}}'(\beta + t\nu; \nu) - l_{\text{pen}}'(\beta; \nu)\}$$
be the directional derivative and directional second derivative of the penalized likelihood defined in equation (3), for every $\beta$ and every direction $\nu \in \mathbb{R}^d$. The gradient can then be defined for any $\beta$ as the scaled direction of steepest ascent: let $\nu_{\text{opt}}$ (opt stands for optimum) be the direction that maximizes $l_{\text{pen}}'(\beta; \nu)$ among all $\nu$ with $\|\nu\| = 1$.
We used the following algorithm, proposed by Goeman [38], to compute the coefficients (a sketch of one step appears after this paragraph): (1) start with some $\beta^0$ (e.g., obtained from fitting univariate Fine and Gray models) and an initial baseline estimate $A_0(t)$; (2) at each iteration, move along the direction of steepest ascent $\nu_{\text{opt}}$ with step length $t = \min(t_{\text{opt}}, t_{\text{edge}})$, where $t_{\text{opt}}$ is the optimal step along the current direction and $t_{\text{edge}}$ is the largest step for which no coefficient changes sign; if $t_{\text{opt}} \le t_{\text{edge}}$, the full step is taken, and otherwise the crossing coefficient is set to zero; (3) switch to Newton-Raphson steps near the optimum, and iterate until convergence. The same scheme applies to the general form of the objective in equation (3).
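To make the step-length logic concrete, here is a minimal R sketch of one steepest-ascent step for an $L_1$-penalized objective; loglik and grad are hypothetical stand-ins for the weighted log-likelihood and its gradient, and the crude bounded line search replaces Goeman's exact step-size computation:

# One step of a steepest-ascent update for l(beta) - lambda * sum(|beta|)
l1_ascent_step <- function(beta, loglik, grad, lambda) {
  g <- grad(beta)
  # Penalized (sub)gradient: at zero coordinates, soft-threshold the gradient
  pg <- ifelse(beta != 0,
               g - lambda * sign(beta),
               sign(g) * pmax(abs(g) - lambda, 0))
  if (all(pg == 0)) return(beta)               # stationary point reached
  dir <- pg / sqrt(sum(pg^2))                  # steepest-ascent direction
  # t_edge: largest step before an active coefficient would cross zero
  crossing <- ifelse(beta != 0 & sign(dir) == -sign(beta), -beta / dir, Inf)
  t_edge <- min(crossing)
  # t_opt: best step along 'dir' for the penalized objective (crude search)
  obj <- function(t) loglik(beta + t * dir) - lambda * sum(abs(beta + t * dir))
  upper <- if (is.finite(t_edge)) t_edge else 1
  t_opt <- optimize(obj, c(0, upper), maximum = TRUE)$maximum
  beta_new <- beta + min(t_opt, t_edge) * dir
  beta_new[beta != 0 & sign(beta_new) != sign(beta)] <- 0  # clamp crossings to 0
  beta_new
}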

2.4. Tuning Parameters. There are various ways to choose optimal penalized estimators, including cross-validation (CV; which requires randomly splitting the data), generalized cross-validation (GCV), and the Bayesian information criterion (BIC). As the random nature of data splitting in CV makes the tuning parameters unstable [39], and the GCV may lead to overfitting in the resulting model [35], we used the BIC, which has been shown to be consistent in identifying the true model [39]:
$$\mathrm{BIC}(\lambda) = -2\, l(\hat{\beta}) + s(\lambda)\, \log(n),$$
where $l$ is the weighted log-likelihood function, $\hat{\beta}$ maximizes the penalized weighted log-likelihood (the penalized estimator), and $s(\lambda)$ is the size of the model (the number of nonzero coefficients) [35].
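As a sketch, BIC tuning then reduces to one evaluation per candidate $\lambda$; fit_at and loglik_at below are hypothetical stand-ins for the penalized fit and the weighted log-likelihood at a given $\lambda$:

# BIC(lambda) = -2 * l(beta_hat_lambda) + s(lambda) * log(n)
bic_path <- function(lambdas, fit_at, loglik_at, n) {
  sapply(lambdas, function(lam) {
    beta_hat <- fit_at(lam)            # penalized estimator at this lambda
    s <- sum(beta_hat != 0)            # model size: nonzero coefficients
    -2 * loglik_at(beta_hat) + s * log(n)
  })
}
# lambda_opt <- lambdas[which.min(bic_path(lambdas, fit_at, loglik_at, n))]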

Simulation Study
The proposed variable selection method was investigated through different simulation scenarios. Competing risks data with two possible events (causes of failure) were simulated, where the event of type 1 was the event of interest (I) and the event of type 2 was the competing risk (C). To construct a high-dimensional setting, $d = 5000$ covariates were considered with $n = \{200, 400\}$ observations. Among the covariates, similar to other studies, 5 informative covariates with effects on the subdistribution hazards for events of type 1 and/or 2 were considered; the vector of regression parameters for cause 1 was $\beta_1 = (\underbrace{0.5, 0.5, -0.5, 0.5, -0.5}_{5}, \underbrace{0, \cdots, 0}_{4995})$, and for cause 2 it was $\beta_2 = -\beta_1$. The values 0.5 and -0.5 represent increasing and decreasing effects, and covariates with no direct effect on the hazard were assigned the value 0. In all scenarios, covariates were generated from a multivariate normal distribution with mean zero and covariance matrix $(\rho^{|i-j|})_{i,j=1}^d$. We considered $\rho = \{0.1, 0.5\}$, as in Lin and Lv [4].
Following the strategy of Fine and Gray [13], event times were generated under proportional subdistribution hazards. To this end, after generating the covariates and considering $k$ as the ratio I/C, the subdistribution for the first event type was generated as
$$F_1(t \mid z_i) = 1 - \left[1 - k\{1 - \exp(-t)\}\right]^{\exp(z_i^\top \beta_1)},$$
which is a unit exponential mixture with mass $1 - k$ at $\infty$ when all covariates are zero. The subdistribution for the second event type was generated using an exponential distribution with rate $\exp(\sum_{i=1}^d \beta_{i1} z_i)$, taking $\Pr(\varepsilon_i = 2 \mid z_i) = 1 - \Pr(\varepsilon_i = 1 \mid z_i)$. We also considered the proportional odds model, in which the subdistribution for the event of interest was defined through the corresponding link $g(x) = \log(1 + x)$. Censoring times were generated from a uniform $U(0, a)$ distribution; the value $a = 3$ was selected to yield an average censoring rate of 40%. We considered $k = \{0.2, 0.5, 0.8\}$. Because the coefficient estimates were biased toward zero, we focused on the accuracy of variable selection (identifying the relevant covariates). Variable selection was therefore assessed in terms of the sensitivity or true positive rate (TPR; the number of correctly identified informative variables, i.e., variables associated with the cumulative incidence function of the event of interest, divided by the total number of informative variables) and the false positive rate (FPR; the number of irrelevant variables chosen divided by the total number of irrelevant variables), with respect to the event of interest (event 1). For comparison, the boosted subdistribution hazards regression model [19], implemented in the R package CoxBoost [40], was considered. The constant $a$ in the SCAD and SCAD-$L_2$ penalty functions was fixed at $a = 3.7$.
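The proportional subdistribution hazards scenarios can be reproduced with a short R function; this is our sketch of the generation scheme described above (the inverse-transform step, function name, and reduced-dimension usage example are ours):

simulate_crdata <- function(n, beta1, k, a = 3, rho = 0.1) {
  d <- length(beta1)
  # AR(1)-type covariates: Sigma[i, j] = rho^|i - j|
  Sigma <- rho^abs(outer(1:d, 1:d, "-"))
  Z <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)
  eta <- drop(Z %*% beta1)
  # P(eps = 1 | z) = F1(Inf | z) = 1 - (1 - k)^exp(eta)
  p1 <- 1 - (1 - k)^exp(eta)
  eps <- ifelse(runif(n) < p1, 1, 2)
  T <- numeric(n)
  # Type 1: invert F1(t | z) = 1 - [1 - k(1 - exp(-t))]^exp(eta)
  u <- runif(sum(eps == 1)) * p1[eps == 1]
  T[eps == 1] <- -log(1 - (1 - (1 - u)^exp(-eta[eps == 1])) / k)
  # Type 2: exponential with rate exp(z' beta1), as specified in the text
  T[eps == 2] <- rexp(sum(eps == 2), rate = exp(eta[eps == 2]))
  C <- runif(n, 0, a)                       # uniform censoring on (0, a)
  data.frame(time = pmin(T, C), status = ifelse(T <= C, eps, 0))
}

# Example (reduced dimension d = 100 for illustration):
# dat <- simulate_crdata(n = 200,
#                        beta1 = c(0.5, 0.5, -0.5, 0.5, -0.5, rep(0, 95)),
#                        k = 0.5)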

The means and standard deviations over 500 replicates for the twelve scenarios are summarized in Tables 1 and 2. Table 1 shows the results for the independent-covariate scenarios. ALASSO and AENET performed very similarly in this setting, and MCP and MCP-L2 outperformed ALASSO and AENET in that they selected sparser models with better sensitivity and greater specificity, or a lower FPR (a better ability to eliminate irrelevant variables). Moreover, as expected given their similarity, SCAD, SCAD-L2, MCP, and MCP-L2 performed comparably, with MCP and MCP-L2 selecting slightly sparser models than SCAD and SCAD-L2. Generally, all penalized estimators recovered at least three of the five relevant nonzero variables. For k = 0.2 and n = 200, the sensitivities were the lowest among the scenarios. In terms of TPR, Boosting outperformed the penalized methods, especially in the k = 0.2 settings; as k increased from 0.2 to 0.8, the TPR of the penalized methods approached that of Boosting. In terms of FPR, Boosting performed similarly to the penalized methods for k = 0.2; however, as k increased from 0.2 to 0.8, the FPR of the penalized methods decreased while that of Boosting did not change. Simulation studies elsewhere [16] showed that the Boosting results are in line with those of the LASSO penalty, which tends to include more irrelevant variables. Moreover, the CoxBoost package uses cross-validation to choose tuning parameters (a prediction-based criterion), which predisposes the method to include too many irrelevant variables, as in LASSO-type procedures [41,42]. In general, for both n = 200 and n = 400, MCP and MCP-L2 performed best, with performance very close to that of the oracle estimator, especially when the ratio I/C was 0.8 (the lower rate of the competing event). This finding is in line with [4], where penalized additive hazards models were proposed for survival data with a single failure cause. Table 2 shows the results for moderate correlation between covariates (ρ = 0.5). Again, MCP and MCP-L2 outperformed the other methods in almost all scenarios, with performance closest to the oracle estimator, especially when n = 400 and k = 0.8. Under moderate correlation, the AENET, SCAD-L2, and MCP-L2 penalties showed greater TPR than the L1 penalties ALASSO, SCAD, and MCP. For all methods, sensitivities increased and FPRs decreased as k increased from 0.2 to 0.5 and 0.8. For n = 400, the average number of selected variables decreased slightly and better sensitivities were obtained than for n = 200. Comparing Tables 1 and 2 reveals that the TPRs diminish for all penalized models under moderate correlation relative to low correlation, whereas the Boosting method is almost robust to moderate correlation.
We also conducted simulations with various values of the regression coefficients $\beta_1$ (Table S1). According to the results, the selection results were robust, in accordance with the findings of Fu et al. [32] and Zhang and Lu [43]. We also designed scenarios for the proportional odds model with $g(x) = \log(1 + x)$. Table 3 shows the results of these simulations with 5 informative variables, for $\rho = 0.1$ and $\rho = 0.5$, about 40% average censoring, and $k = I/C = 0.5$. The performance of the different penalties under the proportional odds model was similar to that under proportional hazards; again, MCP and MCP-L2 outperformed ALASSO and AENET in terms of greater TPR and lower FPR.
To investigate the grouping effect, i.e., the performance of the models in the presence of high correlations in high-dimensional settings, we also considered highly correlated covariates. Following the strategy of Zeng and Xie [36], two groups of informative variables were considered, each including 3 highly correlated variables constructed with noise terms $e_i \sim \text{i.i.d. } N(0, 0.01)$, $i = 1, \cdots, 6$. The results are reported in terms of selection accuracy, measured by the number of truly nonzero coefficients correctly identified (TPR) and the number of zero coefficients misspecified as nonzero (FPR). Table 4 summarizes the variable selection accuracy of the different methods; for noninformative variables, we report the average FPR over 500 repetitions. From Table 4, we see that in all settings the variables $X_1$, $X_2$, and $X_5$ were selected by all methods. In the presence of high correlations, the rate of model misspecification was high; this reflects the fact that the MCP, SCAD, and ALASSO penalties and Boosting tend to select only one variable from a group of highly correlated variables, with no preference for which one. In contrast, the three penalties SCAD-$L_2$, MCP-$L_2$, and AENET selected $X_3$, $X_4$, and $X_6$ in addition to $X_1$, $X_2$, and $X_5$ in all settings, indicating that they enjoy the grouping effect discussed by Zou and Hastie [44] and Huang et al. [45]. These findings are in concordance with studies using other response types [1].
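Scoring selection accuracy from an estimated coefficient vector is a short computation; the helper below is a sketch of the TPR/FPR calculation used throughout the simulations (the function name is ours):

# TPR: fraction of truly nonzero coefficients that were selected;
# FPR: fraction of truly zero coefficients that were selected.
selection_accuracy <- function(beta_hat, beta_true) {
  sel <- beta_hat != 0
  rel <- beta_true != 0
  c(TPR = sum(sel & rel) / sum(rel),
    FPR = sum(sel & !rel) / sum(!rel))
}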

Application to Bladder Cancer Data
We used a publicly available time-to-event dataset with competing risks, comprising 1381 preprocessed custom-platform microarray features (GEO series accession no. GSE5479) from patients with bladder cancer, to illustrate the proposed techniques. Bladder cancer is a common malignant disease with two different forms: non-muscle-invasive tumors (stages Ta and T1) and muscle-invasive cancers (stages T2-T4) [46]. The dataset includes a sample of n = 404 patients with pTa and pT1 tumors and no previous or synchronous muscle-invasive tumors. In addition to gene expression measurements, it contains potentially important clinical covariates, including age, sex, stage (pTa versus pT1), grade (PUNLMP/low versus high), and treatment. Complete information was available for only n = 301 patients, and we limited our analysis to this subset. There were two competing events: time to progression or death from bladder cancer (the event of interest) and death from other or unknown causes. Progression or death from bladder cancer and competing events were observed in 74 and 33 patients, respectively, and 194 patients were censored [46].
The proposed method was applied to these microarray bladder cancer data with 'progression or death from bladder cancer' as the event of interest. Table 5 shows the gene signatures selected by each method. ALASSO, AENET, SCAD, SCAD-L2, MCP, MCP-L2, and Boosting selected 18, 19, 26, 22, 20, 21, and 12 genes, respectively. Several of the selected genes are biologically related to bladder cancer. Among all selected genes, six (CDC20, NCF2, SMARCAD1, RTN4, ETFDH, and SON) were selected by every method. Table 6 shows the regression coefficients of these six common genes with respect to the patients' subdistribution hazard. According to the results, increasing the expression of CDC20 increases the incidence of death from bladder cancer (or progression of the disease) 2.68-fold. Moreover, increased expression of the NCF2, ETFDH, and SON genes is positively associated with the incidence of death from bladder cancer, whereas increased expression of SMARCAD1 and RTN4 decreases it.
Electron transfer flavoprotein dehydrogenase (ETFDH), a mitochondrial inner membrane protein, plays an essential role in the electron transfer chain [47]. The expression level of ETFDH correlates with overall survival in hepatocellular carcinoma patients [48]. Reticulon-4 (RTN4) has an essential role in cancer development and progression, and its expression level has been associated with patient survival in several cancers [49,50]. Neutrophil cytosolic factor 2 (NCF2), a novel target of p53, has a critical role in cancer progression [51]. SON DNA-binding protein (SON) plays a role in mRNA transcription and pre-mRNA splicing; moreover, SON can control macrophage activities and cell cycle progression [52]. A recent study by Furukawa et al. indicated that SON has an essential role in pancreatic cancer proliferation and tumorigenesis [53]. SMARCAD1 has a critical role in chromatin remodeling and the control of gene expression; it also plays an essential role in the homologous recombination (HR) process for DNA double-strand break (DSB) repair. Recent studies show that SMARCAD1 is involved in the proliferation and progression of pancreatic and breast cancers [54,55]. CDC20 (Cell Division Cycle 20) encodes an essential cell cycle regulatory protein. Recent studies indicate that CDC20 dysregulation correlates with tumor progression and prognosis in several cancers [56].

Discussion and Conclusions
Rapid accumulation of massive patient information in medical research creates unique challenges for statistics. In this regard, the need to select informative variables and eliminate noise (i.e., noninformative) variables highlights the necessity for novel, robust data analysis methods. This study proposed a penalized weighted nonparametric likelihood-based approach for sparse variable selection in the high-dimensional competing risks setting. The proportional hazards assumption may not be satisfied for some covariates and cannot be assessed in the high-dimensional setting; the proposed model allows for time-varying effects. The model also relaxes the Fine and Gray approach's constraint on simultaneously modeling multiple cumulative incidence functions. As in the nonpenalized setting [20], the regularized weighted nonparametric likelihood approach extends to a general class of semiparametric transformation models, including nonproportional subdistribution hazards settings.
We empirically evaluated the performance of several penalties, including ALASSO, AENET, SCAD, and MCP, and their L2 counterparts SCAD-L2 and MCP-L2, through comprehensive simulations in high-dimensional settings with different covariate structures, in terms of TPR and FPR. Although penalized proportional subdistribution hazards models have been proposed in previous studies, this study considered more scenarios with more penalties; other works considered only low dimensions with different penalties [32] or high dimensions with a few L1 penalties and no more than 1000 variables [26,27]. Our findings revealed that the sensitivities of all penalties were comparable, but the MCP and MCP-L2 penalties outperformed the other methods in selecting fewer noninformative variables; the results of MCP and MCP-L2 were also closest to the oracle estimator. For correlated structures, the penalties with an L2 term, including SCAD-L2, MCP-L2, and AENET, enjoyed the grouping effect and showed better performance, in concordance with similar studies for other response types such as counts [57]. Moreover, Fu et al. [32] established asymptotic properties of penalized estimators obtained from the Fine and Gray model, and, in the framework of the Cox model, Fan and Li [58] extended these properties to the Cox proportional hazards model, a special case of the (nonweighted) NPMLE. While the oracle property of the penalized estimators has been established for special cases of the weighted NPMLE, the conditions for the general class of models in equation (2) still need to be investigated theoretically, especially in the time-varying covariate framework; this is a subject for future studies.
One useful feature of the penalized weighted nonparametric maximum likelihood approach is that the AIC and BIC can easily be calculated as simple tools for model selection, as was done here. This resulted in a more stable variable selection approach compared with models that use cross-validation.
Variable selection in the survival setting, in general, is a difficult issue and is even more challenging in the competing risk setting. As a result, a relatively large sample size is required to make reliable inference [4]. Strategies that combine the strengths of a variety of approaches and regularization methods, in situations where the proposed methods may fail, could be used as building blocks in developing more powerful procedures [4].
The proposed approaches were applied to a bladder cancer dataset with gene signature survival data and competing risks. The fitted model based on the subdistribution hazards was shown to identify genes related to cancer events. Most of the genes found here have known functions in cancer-related pathways, especially in bladder cancer [59][60][61][62][63][64]. Although we applied the proposed method to gene expression data, it can easily be applied to other types of high-dimensional data, such as single-nucleotide polymorphisms.
The main objective of the present study was to explore variable selection methods for high-dimensional competing risks data based on the subdistribution hazards. Although we have focused on the multiplicative hazards model, the techniques here can be adapted to other survival models, such as additive hazards approaches, which have promising characteristics; this is an interesting topic for future research. In the simulations and data analysis, we considered only the identity link from the Box-Cox transformation class and one link function from the logarithmic transformation class for the proportional odds model; comparing other types of link functions $g(\cdot)$ under more scenarios is another potential topic for future studies. Extension of the proposed model to cure mixture models is another possible line of future work.

Additional Points
Key Points. (i) Analysis of high-dimensional competing risks requires models that consider the event of interest and the competing events simultaneously, while also handling censoring. (ii) A likelihood-based penalized approach is extended for direct variable selection under the subdistribution hazards model for high-dimensional competing risks data. (iii) Several widely used penalties, including ALASSO, AENET, SCAD, and MCP, and their L2 counterparts SCAD-L2 and MCP-L2, were considered. (iv) Simulation studies showed that the proposed methods were effective in identifying important variables in high-dimensional competing risks data. (v) Analysis of a real genomic competing risks dataset from patients with bladder cancer revealed a set of genes associated with the incidence of death.