A Three Protein-Coding Gene Prognostic Model Predicts Overall Survival in Bladder Cancer Patients

Bladder cancer (BLCA) is the most common urinary tract tumor and is the 11th most malignant cancer worldwide. With the development of in-depth multisystem sequencing, an increasing number of prognostic molecular markers have been identified. In this study, we focused on the role of protein-coding gene methylation in the prognosis of BLCA. We downloaded BLCA clinical and methylation data from The Cancer Genome Atlas (TCGA) database and used this information to identify differentially methylated genes and construct a survival model using lasso regression. We assessed 365 cases, with complete information regarding survival status, survival time longer than 30 days, age, gender, and tumor characteristics (grade, stage, T, M, N), in our study. We identified 353 differentially methylated genes, including 50 hypomethylated genes and 303 hypermethylated genes. After annotation, a total of 227 genes were differentially expressed. Of these, 165 were protein-coding genes. Three genes (zinc finger protein 382 (ZNF382), galanin receptor 1 (GALR1), and structural maintenance of chromosomes flexible hinge domain containing 1 (SMCHD1)) were selected for the final risk model. Patients with higher-risk scores represent poorer survival than patients with lower-risk scores in the training set (HR = 2.37, 95% CI 1.43-3.94, p = 0.001), in the testing group (HR = 1.85, 95% CI 1.16-2.94, p = 0.01), and in the total cohort (HR = 2.06, 95% CI 1.46-2.90, p < 0.001). Further univariate and multivariate analyses using the Cox regression method were conducted in these three groups, respectively. All the results indicated that risk score was an independent risk factor for BLCA. Our study screened the different methylation protein-coding genes in the BLCA tissues and constructed a robust risk model for predicting the outcome of BLCA patients. Moreover, these three genes may function in the mechanism of development and progression of BLCA, which should be fully clarified in the future.


Introduction
Bladder cancer (BLCA) is the most common urinary tract tumor and the 11th most malignant tumor worldwide. It is estimated that 80,470 new cases and 17,670 new deaths occurred in America in 2019 [1]. Muscle invasive bladder cancer (MIBC) is a more aggressive form of the disease, with a 5-year survival rate of approximately 60% in patients with local disease and less than 5% in patients with distant metastasis [2]. The classical factors that are closely associated with the prognosis of BLCA include tumor clinical stage, tumor grade, carcinoma in situ, and distant metastasis. In the past decade, the study of molecular markers to predict the prognosis of BLCA has expanded and has focused mainly on DNA alterations, such as DNA polymorphisms, mutations, and mRNA and protein expression levels [3]. In recent years, with the development of in-depth multisystem sequencing, an increasing number of prognostic molecular markers have been identified [4]. Molecular biomarkers can effectively improve the accuracy of prognosis prediction and may also highlight tumorigenic characteristics that are helpful for the development of novel therapies, as well as identifying mechanisms underpinning the development and progression of diseases.
Epigenetic alteration can affect the development and progression of tumors. The main epigenetic processes and modifications associated with cancer biology can be divided into three categories: abnormal DNA methylation, chromatin remodeling (including histone modification), and regulation by noncoding RNA [5]. Of these biological processes, DNA methylation is the most studied epigenetic alteration in BLCA. DNA hypermethylation occurs in approximately 50-90% of BLCAs. The hypermethylated genes can be further clustered into tumor suppressor genes, DNA repair genes, cell cycle control genes, and cell invasion protein-coding genes (e.g., RNA-binding fox-1 homolog 1 (RBFOX1), telomerase reverse transcriptase (TERT), neuronal pentraxin 2 (NPTX2), SRY-box transcription factor 11 (SOX11), and homeobox A9 (HOXA9)). Changes in the methylation status of these genes may promote the development or progression of BLCA. In addition, DNA hypomethylation is also seen in BLCA, but its effect is still unclear [5].
Based on the above evidence, investigations into the prognosis of BLCA have now extended into other forms of epigenetic regulation. For example, miRNA, lncRNA, other noncoding RNAs, and DNA methylation are also used to predict the prognosis of BLCA.
For instance, abnormal expression of lncRNAs such as HOX transcript antisense RNA (HOTAIR) and growth arrest specific 5 (GAS5) is associated with disease-free survival and disease-specific survival of patients with BLCA [6]. It has also been reported that abnormal DNA methylation of different genes such as aldehyde dehydrogenase 1 family member A3 (ALDH1A3), protocadherin 8 (PCDH8), Ras association domain family member 1 (RASSF1), and RUNX family transcription factor 3 (RUNX3) is independently associated with the prognosis of BLCA [7]. Indeed, prognosis-predicting models based on gene methylation have been developed in recent years. Six genes, namely, Rho GDP dissociation inhibitor beta (ARHGDIB), long intergenic non-proteincoding RNA 526 (LINC00526), isocitrate dehydrogenase (NADP(+)) 2 (IDH2), ADP ribosylation factor like GTPase 14 (ARL14), glutathione S-transferase mu 2 (GSTM2), and leucine-rich adaptor protein 1 (LURAP1), have been identified for the development of risk models to predict BLCA prognosis. However, this methylation risk model includes LINC00526, which is a noncoding RNA, and the role of lncRNA gene methylation in cancer has not been well studied [8].
Therefore, our study is focused on the role of proteincoding gene methylation in the prognosis of BLCA. We describe a new risk model including protein-coding gene methylation which may not only improve the methods of predicting BLCA prognosis but can also screen genes with clear biological effects, which will help in identifying the mechanisms involved in BLCA development and progression.

Materials and Methods
2.1. BLCA Clinical and Methylation Data Acquisition. BLCA clinical data and methylation data from the HumanMethyla-tion450 BeadChip were downloaded from The Cancer Genome Atlas (TCGA) database according to the publication guidelines (https://portal.gdc.cancer.gov/). Patients whose data fulfilled the enrollment criteria were included in our study. The enrollment criteria were as follows: survival status, age, gender, and clear information regarding tumor characteristics (tumor grade, clinical stage, and Tumor/Node/Metastasis (TNM) stage), with a survival time of longer than one month. All the datasets used throughout this study were publicly available (https://portal.gdc.cancer.gov/).

Differential Methylation Analysis and Construction of
Survival Model. The Wilcox test was used to compare the differential methylation status of genes in tumor and normal tissues, and genes with p < 0:05 and log fold change > 1 were considered to be differentially methylated. These genes were annotated using an annotation program obtained from GENCODE (https://www.gencodegenes.org/). Total tumor samples were randomly divided into training and testing sets BioMed Research International using the set seed method. Then, lasso (least absolute shrinkage and selection operator) regression was used to identify the eligible protein-coding genes for construction of a BLCA prognostic signature using the screened methylation proteincoding genes, which generated corresponding coefficients for each sample [9]. Lasso regression analysis was conducted using the "glmnet" package in the R software suite [10].
And the optimal value of the penalty parameter λ was determined by tenfold cross-validations. The best λ value (lambda.min) was used to identify the proper genes and their coefficients to construct the gene risk model. In the training set, the risk score for each patient was calculated and the data were divided into high-risk and low-risk groups according to the median risk score. In the testing set and the total set, the same procedure was performed to validate the model.

Statistical Analyses.
Overall survival was assessed using the Kaplan-Meier method and the log-rank test. The hazard ratio (HR) was calculated using a Cox regression model, and the result was provided as HR value with a 95%   Figure 1). We annotated a total of 227 differentially expressed genes, of which 165 genes were protein-coding genes.

Three Methylated Protein-Coding Genes Are Related to
Overall Survival of Patients with BLCA. Based on the training set, we used a multivariate lasso Cox regression model to establish a prognosis risk model based on the methylation status of the 165 protein-coding genes. Throughout the lasso regression, the minimal λ was 0.07802942 and three genes (zinc finger protein 382 (ZNF382), galanin receptor 1 (GALR1), and structural maintenance of chromosomes   (Figure 1). Utilizing the lasso Cox regression model, we also calculated the risk score of each patient based on the methylation values of these three genes, as follows: Risk score = ð0:2337 * ZNF382Þ + ð0:4722 * GALR1Þ + ð4:0402 * SMCHD1Þ (Figure 2). Patients with higher-risk scores presented with poorer survival than patients with lower-risk scores (HR = 2:37, 95% CI 1.43-3.94, p = 0:001; Figure 3). To validate these risk models, we conducted analyses in the testing and the total data sets. The validation results showed that patients with higher-risk scores had better survival than patients with a lower-risk score in the testing group (HR = 1:85, 95% CI 1.16-2.94, p = 0:01; Figure 3), as well as in the total cohort (HR = 2:06, 95% CI 1.46-2.90, p < 0:001; Figure 3). Further univariate and multivariate analyses using the Cox regression method were conducted in all three groups. All the results indicated that risk score was an independent risk factor for BLCA ( Table 2). The results developed using the total set were used to make the forest map and the nomogram (Figure 4).

Discussion
Our study proposes a novel survival model, based on methylation of protein-coding genes, for predicting prognosis of patients with BLCA. Three genes (ZNF382, GALR1, and SMCHD1) were selected for the risk model. Galanin receptor 1 (GALR1) is a protein-coding gene that is widely expressed in the brain, spinal cord, around the small intestine, and heart. Many studies have demonstrated that GALR1 plays a tumor suppressor role in different types of cancers. GALR1 has been intensively studied in head and neck squamous cell carcinoma (HNSCC). These investigations revealed that the GALR1 promoter was widely hypermethylated in HNSCC cell lines and primary tumor specimens, and its methylation was closely related to reduce expression of GALR1. In addition, the expression of GALR1 could be recovered by treatment with the histone deacetylase inhibitor, trichostatin A, and the methyltransferase inhibitor, 5-azacytodine [11]. The status of GALR1 methylation may be an important site-specific biomarker for predicting clinical outcomes in HNSCC patients, and assessing methylation of its promoter can play a role in risk stratification for individualized treatment [12,13]. GALR1 has also been used in the formation of a risk model to diagnose non-small-cell lung cancer (NSCLC) [14]. Furthermore, DNA methylation of GALR1 is the most frequent epigenetic change in endometrial cancer, and the detection of GALR1 methylation in vaginal swabs can accurately identify female endometriosis and malignant changes [15]. Our study also suggests that GALR1 gene methylation is involved in the prognosis of BLCA, but its role in this cancer is still unknown.    The zinc finger protein 382 (ZNF382) gene encodes the KRAB domain zinc finger transcription factor (KZNF). KZNF is widely involved in development and tumorigenesis. It plays a key role in the regulation of various cellular processes, including differentiation, proliferation, and apoptosis. It acts as a tumor suppressor and is often methylated in many cancers. ZNF382 is frequently downregulated by promoter methylation in HBV-associated hepatocellular carcinoma (HCC), and decreased expression of ZNF382 is closely linked to poor survival in patients with early HCC. ZNF382 is an effective tumor suppressor in HCC cells. Functional studies have shown that it inhibits cell proliferation, colony formation, migration, invasion, and tumorigenic potential in nude mice and induces apoptosis [16]. ZNF382 is also methylated and exhibits reduced expression in gastric cancer (GC) tissues. Pei et al. showed that ZNF382 can reverse the process of epithelial to mesenchymal transition in GC cells through NOTCH signaling, further supporting its role as a tumor suppressor [17]. In esophageal squamous cell carcinoma (ESCC), the expression of ZNF382 was inhibited due to aber-rant promoter methylation and ZNF382 methylation correlated with the level of ESCC differentiation. ZNF382 also inhibited the proliferation and metastasis of ESCC cells by inhibiting the Wnt/beta-catenin signaling pathway. These data suggest that ZNF382 plays a role in inhibiting the formation of ESCCs [18]. ZNF382 is also methylated in a range of primary tumors (nasopharyngeal, esophageal, colon, stomach, and breast). Ectopic expression of ZNF382 in silenced tumor cells significantly inhibited their cloning and proliferation and induced apoptosis. Cheng et al. showed that ZNF382 can inhibit nuclear factor kappa-B and AP-1 signaling and that it downregulates the expression of several oncogenes [19]. However, the role of ZNF382 in BLCA has not been well established and is worthy of further investigation.

BioMed Research International
Structural maintenance of chromosomes flexible hinge domain containing 1 (SMCHD1) is a protein-coding gene. Mutations in SMCHD1 have been associated with mastoid microocular syndrome and facial-shoulder-brachial dystrophy 2. The SMCHD1 protein is involved in DNA methylation, and a number of studies have focused on its contribution to X chromosome inactivation. SMCHD1 plays a role in the normal development of the nose, eyes, and other structures of the head and face and appears to be involved in repairing damaged DNA. In response to DNA damage, SMCHD1 is recruited to sites of DNA double-strand breakage, where it promotes repair of the breakage by nonhomologous end joining (NHEJ), while inhibiting repair by homologous recombination [20,21]. One study suggested that SMCHD1 may be a candidate tumor suppressor gene in prostate cancer [22]. However, there has been comparatively little research into the role of SMCHD1 in different cancer types, so whether it functions as a tumor suppressor or an oncogene is not yet clear. In this study, we found that the SMCHD1 gene was hypomethylated in BLCA tumors. DNA hypomethylation has been shown to cause the abnormal activation of a few genes. The mechanistic links between the loss of DNA methylation and cancer development, including induction of chromosomal instability, reactivation and transposition of reversed loci, loss of imprinting, and activation of normally silenced genes, are directly related to DNA methylation patterns in mammalian genomes [23]. Future studies should focus on identifying the role of SMCHD1 in BLCA.
In this study, we construct a three-gene model for predicting the prognosis of BLCA using TCGA dataset. However, one limitation of our study is the lack of an independent dataset for validating our results. In addition, our study is an observational study, and the gene model should be verified by prospective studies in the future.

Conclusion
Our study screened differentially methylated protein-coding genes in BLCA tissues and constructed a robust risk model for predicting the survival of BLCA patients. Moreover, the three genes which form the basis of our risk model may function in the development and progression of BLCA and represent worthwhile avenues for future investigations.  The composite nomogram consists of the three protein-coding gene methylation risk score, tumor stage, and age. Each variable can generate a point according to the "Points" line. Add these three points together and get the total points on the "Total Points" line. Then, draw a vertical line from the "Total Points" line to the three lines below which correspond to the predicted 1-year, 3-year, and 5-year OS rates by the nomogram.