A 5-lncRNA Signature Associated with Smoking Predicts the Overall Survival of Patients with Muscle-Invasive Bladder Cancer

Increasing evidence demonstrated that noncoding RNA is abnormally expressed in cancer tissues and serves a vital role in tumorigenesis, tumor development, and metastasis. The aim of the present study was to determine an lncRNA signature in order to predict the overall survival (OS) of patients with muscle-invasive bladder cancer (MIBC). A total of 246 patients with pathologically confirmed MIBC in The Cancer Genome Atlas (TCGA) dataset were recruited and included in the present study. We choose patients who have smoked less (including never smoking) or more than 15 years. A total of 44 differentially expressed lncRNAs were identified with a fold change larger than 1.5 and a P value < 0.05 through the limma package. Subsequently, a comparison between patients with no tobacco smoke exposure for <15 years and patients who had been exposed to tobacco smoke for >15 years was performed by using the matchIt package. Among the 44 differentially expressed lncRNAs, 5 lncRNAs were identified to be significantly associated with OS. Based on the characteristic risk scores of these 5 lncRNAs, patients were divided into low-risk and high-risk groups and exhibited significant differences in OS. Multivariate Cox regression analysis demonstrated that the 5-lncRNA signature was independent of age, tumor-node metastasis (TNM) staging, lymphatic node status, and adjuvant postoperative radiotherapy. In the present study, a novel 5-lncRNA signature was developed and was demonstrated to be useful in predicting the survival of patients with MIBC. If validated, this lncRNA signature may assist in the selection of a high-risk subpopulation that requires more aggressive therapeutic intervention. The risk scores involved in several associated pathways were identified using gene set enrichment analysis (GSEA). However, the clinical implications and mechanism of these 5 lncRNAs require further investigation.


Introduction
Bladder cancer is diagnosed in~74,000 patients in the USA and in >430,000 patients worldwide, making it the 4th most common cancer in men and the 11th most common cancer in women [1]. In China, there were~80,500 new cancer cases including 62,100 in men and 18,400 in women in 2015 [2]. The standard treatment for bladder cancer depends on the tumor (T) stage [3]. The preferred treatment for nonmuscle-invasive bladder cancer (NMIBC) is transurethral resection of the bladder tumor (TURBT) combined with intravesical chemotherapy, when the T stage is <T2 [4]. Depending on the pathological grade and other risk factors, 24-49% of patients undergo repeat transurethral resection of the bladder tumor (Re-TURBT) [5]. Approximately 25% of patients are diagnosed with muscle-invasive bladder cancer (MIBC) and have a less favorable prognosis, with a 5year survival rate < 40% [6]. Although there are improvements in surgical techniques and adjuvant therapy, the complexity and high cost of the procedure remain extremely challenging, and the treatment has not been advanced in decades [7]. Therefore, in order to further comprehend and predict patient prognosis and to develop novel biological therapies, there is an urgent requirement to identify reliable biomarkers for MIBC.
Although the etiology of bladder cancer has not been completely elucidated, bladder cancer has been linked to tobacco smoke (TS) [8], exposure to radiation or chemicals, and other risk factors, such as environmental exposure. Studies have demonstrated that TS is closely associated with the occurrence and development of bladder cancer. It is reported that TS is one of the most significant single risk factors for bladder cancer with a causal relationship of 40-60% [9]. The risk of bladder cancer among smokers has been estimated to be 5 times higher compared with that among nonsmokers [10]. A positive smoking history was identified as one of the independent risk factors for bladder tumor recurrence after transurethral resection of the bladder tumor. Furthermore, refraining from smoking for 15 years or more reduced the risk of tumor recurrence in former smokers with NMIBC regardless of the intensity or duration of smoking [11].
There have been many studies that have demonstrated that the expression of a single gene is associated with bladder cancer, and this includes not only RNA encoding but also noncoding RNA [12]; the present study has no way to clearly explain its OS mechanisms. However, owing to the complex pathological processes of bladder cancer, the use of a single gene to clarify the appearance of OS would more readily introduce a false-positive result. For MIBC, the understanding of long noncoding RNAs (lncRNAs) remains to be limited, and to the best of our knowledge, at present, there is no relevant study regarding lncRNA characteristics for the prediction of OS using a publicly available dataset. Therefore, the present study was aimed at acquiring more information from the available datasets. By using The Cancer Genomic Atlas (TCGA) dataset, the present study attempted to establish whether there was a set of lncRNAs that could distinguish between more aggressive phenotypes and poor survival outcomes in patients. Following this, an lncRNA novel signature was highlighted, which was demonstrated to accurately predict the survival of patients with MIBC.  [13] and the UCSC Xena website (http://xena.ucsc.edu/), respectively. The exclusion criteria were set as follows: (1) if the histological diagnosis was not MIBC (n = 1), (2) samples with clinical data but without T stage data (n = 1), and (3) missing important clinical data, including diagnosis subtype (n = 5 cases). Overall, the present study recruited 246 patients with lncRNA expression data and pertinent clinical information available. Furthermore, we also excluded the lncRNA expression information of the adjacent normal tissues from 23 patients from TCGA dataset.

Statistical and Data
Mining Analyses of TCGA BLCA lncRNA Profiles. The neoplasm histological grade, pathological stage, and diagnostic subtype were used to confirm the clinicopathological features of the two groups by using the matchIt package (ratio = 1, caliper = 0:05). Finally, 13 vs. 13 patients (no relapse or metastasis with smoking more than 15 years vs. relapse or metastasis with smoking less than 15 years including no smoking) were generated from the matched clinicopathological features of the patient [11].
The present study utilized the "limma" package to identify differentially expressed lncRNA between the two previously mentioned groups [14]. The fold change and P values were FC > 1:5 and P value < 0.05. The lncRNA "Pheatmap" package was used to plot the heat map of differentially expressed lncRNAs. The different lncRNAs were analyzed in a large sample using two items of logistic regression and a single-factor Cox proportional hazards regression, using the median expression value as the cut-off point, to determine potential lncRNA associated with OS by using the "osgeneral" package. The coefficient of each lncRNA was measured with a multivariable Cox regression hazard model with all selected lncRNAs.
The risk score was calculated as the following formula [15]: where β i indicates the coefficient and x i refers to the relative expression value of the corresponding gene, dividing the patients into a high-risk group or a low-risk group with the median as a cut-off point. In this formula, value m refers to the total number of genes in the panel, and the coefficient n is the regression coefficient in all selected lncRNAs in the multi-Cox regression analysis. Pearson's chi-square test was used to examine the association between clinicopathological features of patients and the 5-lncRNA signature risk score. The unpaired t-test was used to compare the risk score between two different subtypes of patients. The Kaplan-Meier survival was used to assess the survival distribution between the classification sets. The log-rank test was used to assess the statistical significance between stratified survival sets. The area under the receiver operating characteristic (ROC) curve (AUROC) was calculated from the ROC curve using the "survivalROC" package. P value < 0.05 was considered significant. All of the above data were processed by using packages in R 2.15.3 and 3.5.3 and SPSS for Windows, version 22 (IBM Corp., Armonk, NY, USA).

GSEA KEGG Pathway Analysis Settings.
Gene set enrichment analysis (GSEA) was implemented using Java software (http://software.broadinstitute.org/gsea/index.jsp), and the gene expression file and phenotype file (high/low-risk group) were prepared according to the guideline of GSEA. The parameters were set as 1000 permutations, at least 5 genes in a single pathway, and used the KEGG pathway.  Table 1). In addition, a total of 23 patients with adjacent nontumor tissue were not involved in the screening for differential expression of lncRNAs among bladder cancer tissue. During a follow-up, 104 patients died.

Differentially Expressed lncRNAs of MIBC between
Patients with/without Relapse-or Metastasis-Related TS.

Establishment of lncRNA Signatures Associated with
Survival in Patients with MIBC. In order to determine candidate lncRNAs with prognostic features, differentially expressed lncRNAs were further analyzed using binomial logistic regression and univariate Cox proportional hazards regression. A total of five lncRNAs were identified to be significantly associated with OS (P < 0:05). Of these five lncRNAs, one (AC090587.5) was positively correlated with OS and the remaining four lncRNAs (RP5-827C21.4, AL365277.1, RPARP-AS1, and CTD-2576F9.2) were negatively correlated with OS. According to the risk score, the median risk was taken as the critical value, and the patients were divided into a high-risk group (n = 123) and a lowrisk group (n = 123). Compared with the low-risk group, the high-risk group of patients exhibited a poorer OS ( Figure 1). There were significant differences in the distribution of age, lymph node status, and American Joint Committee on Cancer (AJCC) T stage, but not in the pathological stage (Table 3). In addition, univariate and multivariate Cox regression analyses were performed to examine the prognostic impact of the 5-lncRNA signature on OS. As summarized in Table 4, univariate analysis demonstrated that the 5-lncRNA signature, age, tumor subtype, AJCC T stage, lymphatic status, lymphovascular invasion, and pathological stage were significantly associated with patient OS; however, sex, histological grade, extracapsular extension, family history, and smoking status were not. In a subsequent multivariate analysis, the 5-lncRNA signature continued to indicate significance by a two-sided log-rank test (hazard ratio (HR), 2.210; 95% confidence interval (CI), 0.845-2.430), revealing that the 5-lncRNA signature was an independent prognostic factor.
Although by multivariate analysis, this 5-lncRNA-based risk score was not associated with the AJCC T stage, a positive correlation was found for this signature when patients were stratified according to the T stage (T ≤ 2 vs. T > 2; Figure 2, P = 0:0067, unpaired t-test). Patients with a high T stage (T > 2) exhibited higher risk scores. Therefore, a subgroup analysis was performed to determine whether the prognostic value of the 5-lncRNA signature is suitable for all patients, irrespective of the AJCC T stage. As illustrated in Figure 3, it was identified that patients with T stage < 2 were statistically significant (P = 0:0072 vs. P = 0:0169) from patients with T stage ≥ 2.
In addition, its validity in the prediction of relapse-free survival (RFS) was examined between the low-and highrisk score groups ( Figure S1). A clear statistical difference between the two groups (P < 0:001) was identified. Additionally, in a multivariate Cox regression analysis, the 5-lncRNA signature remained a prognostic factor for RFS independent of the AJCC T stage, lymphatic vessel status, pathological stage, and lymphovascular invasion (Table 5). Finally, the ROC package was implemented in order to compare the accuracy of the predictions calculated by the multivariate logistic model with the 5-lncRNA signature or with the TNM stage. It was observed that the addition of the 5-lncRNA signature led to a 3.6% and 10.7% increase in accuracy in predicting the 2-year and 5-year OS (AUROC  Figure 4). Similar results were obtained by predicting the RFS. When the 5-lncRNA signature was added to the multivariate logistic model, the accuracy levels increased by~5.5% and 3.7%, respectively, during the prediction of disease-free survival for 2 and 5 years, respectively. These results indicate that the 5-lncRNA signature may have promising survival prediction capabilities in patients with bladder cancer.
3.4. lncRNA Signature-Associated Signaling Pathways. The present study performed gene set enrichment analysis (GSEA) (http://software.broadinstitute.org/gsea/index.jsp) to identify potential relevant signaling pathways and biological processes in MIBC. As illustrated in Figure 5, certain cancer-associated pathways such as systemic lupus erythematosus, glycine serine and threonine metabolism, and glycylglycine biosynthesis of keratin sulfate are enriched in the high-risk patient group, suggesting that the 5-lncRNA signature may be involved in cancer metabolism.

Discussion
Urothelial carcinoma is the most common type of bladder cancer [16]. Certain novel drugs, such as PD-L1 inhibitors, propose novel therapeutic options and an improved prognosis [17]. However, the heterogeneity of tumors necessitates the exploration of individualized treatments and prognostic biomarkers. Noncoding RNAs have been the focus of a plethora of studies; however, the oncological value of lncRNAs, as a novel type of noncoding RNA, is currently unclear in the clinical setting. Consequently, the present study focused on the clinical application of lncRNAs in bladder cancer and explored the underlying complex biological function involved in various cancer types, including MIBC, such as cell migration and invasion [18]. In addition, certain lncRNAs including UCA1 [19], lncRNA-n336928 [20], HNF1A-AS1 [21], and ZEB2-AS1 [22] were associated with higher grade, higher TNM staging, progression, metastasis, or unfavorable survival results. It has been established previously that the expression of lncRNAs is relatively low, and using a single lncRNA alone may easily introduce bias into the results obtained. The combination of candidate lncRNAs not only improves the accuracy of results but also reduces this difference. In recent years, signatures based on RNA sequencing analysis [23][24][25][26] have been developed to identify subgroups that have more aggressive phenotypes or poor survival outcomes in several types of cancer, such as liver cancer [27]. However, the majorities of constructed signatures have failed to describe clinical features and are unable to work in tandem with the clinical needs of patients. The selection of an lncRNA signature was detected between the tumor tissue

Disease Markers
and adjacent tissues of study participants, which did not accurately reflect the clinical features observed. The present study, to the best of our knowledge, is the first to demonstrate that some nontobacco smokers and smokers who smoked for a shorter period of time exhibited recurrence or metastasis. Conversely, patients who had smoked for a longer period had a good prognosis. Between these two groups, a total of 44 differential expressed lncRNAs were identified. Among these lncRNAs and through excavating the 246 RNA sequencing data from patients with MIBC, a 5-lncRNA signature associated with the AJCC T stage and worse patient outcome was identified. Further multivariate analysis demonstrated that the 5-lncRNA signature was an independent predictor of RFS and OS in patients with MIBC.

Disease Markers
To examine the independence of our the 5-lncRNA signature in predicting OS, a multivariate Cox regression analysis was conducted, including age at the time of diagnosis, AJCC T stage, AJCC N stage, lymphovascular invasion, tumor subtype, and pathological stage as covariates. Patients with MIBC with a high AJCC T stage are associated with MIBC-related poor survival prognosis. Similarly, other parameters of malignancy including the AJCC T stage and lymph node status also have a prognostic impact on patient survival. In the univariate analysis, all the examined covariates revealed significant relevance with OS. However, the risk score for 5-lncRNA signature maintains its prognostic impact on OS. Therefore, the present study can conclude that the 5-lncRNA signature identified may serve as an independent prognostic factor for OS. In addition, a positive correlation between the risk score and AJCC T stage was identified. As illustrated in Figure 3, the patients with a higher AJCC T stage typically exhibited higher risk scores (P = 0:01).

Disease Markers
Subgroup analysis based on the AJCC T stage was subsequently performed. Surprisingly, the statistical significance of the survival curve between patients with both the AJCC T stage equal to 2 and AJCC T stages higher than 2 indicates the suitability of this signature for low and high AJCC T stages.
Furthermore, ROC analysis demonstrated that the 5-lncRNA-signature AUROC was equal to 0.668 in 2-year OS predictions, 0.714 in 5-year OS predictions, 0.647 in pre-dicted 2-year RFS, and 0.610 in 5-year RFS predictions. This information exceeds the level of detail previously provided by pathological stages. At present, pathological staging is an element that contributes to risk scores and has been demonstrated to correlate with patient survival [28]. Based on standard treatment, high-risk score patients may require more aggressive treatment or additional adjuvant therapy.
Regarding the characteristics of the five lncRNAs, overexpression of AC090587.5 was correlated with a lower survival  2) were significantly higher in the high-risk group when compared with the low-risk group. However, the functions of these four lncRNAs in MIBC have not yet been reported. At present, a number of studies have demonstrated that the proliferation of MIBC cell lines may be regulated by specific lncRNAs [29,30]. For example, upregulated lncRNAs in MIBC tissues was demonstrated to affect the proliferation of MIBC cell lines through the regulation of ZIC2 and the PI3K/AKT signaling pathway [31]. Additionally, the lncRNA XIST that interacted with miR-124 may largely affect the growth, invasion, and migration of MIBC cells [32]. In order to obtain an improved understanding of these lncRNAs in MIBC, additional functional studies should be conducted.
Several limitations of the present study should be considered. Firstly, partial lncRNAs we include were not annotated in the dataset but were still included in the present study. Secondly, the primers of these lncRNAs were not designed; therefore, the signatures in the Fudan University Cancer Center (Shanghai, China) cohort could not be verified. Furthermore, there is a lack of an appropriate mechanism to investigate the prognostic value of lncRNAs in MIBC. In addition, the specific roles of 5 lncRNAs on MIBC phenotypes are unclear. Finally, these findings were summarized in a published dataset without a prospective testing in clinical trials; although the large sample size provides support for these findings, in vitro and in vivo studies should also be performed.

Conclusion
It is well known that prognosis monitoring is a key issue through the treatment process of MIBC [33]. In the present study, five lncRNA signatures, which correlated with the AJCC N stage, lymph vascular invasion, tumor subtype, and pathological stage, were demonstrated to be independent predictors of OS and RFS. Although not significantly different, the predictive worth of the 5-lncRNA signature was more accurate than that of the pathological stage. Clinically, the 5-lncRNA signature has the potential to be developed into an easy-to-use prognostic model to facilitate further stratification in patients. In addition, the 5-lncRNA signature may have clinical value as a therapeutic target; therefore, the clinical significance and mechanism of action of these five lncRNAs require further study.

Data Availability
The data used to support this study can be available from http://xena.ucsc.edu/ and http://bioinformatics.mdanderson .org/main/TANRIC:Overview.