A Potential Four-Gene Signature and Nomogram for Predicting the Overall Survival of Papillary Thyroid Cancer

Background Although the prognosis of papillary thyroid cancer (PTC) is relatively good, some patients experience recurrence or distant metastasis after thyroidectomy and progress to radioactive iodine refractory stage. Therefore, accurate prediction of clinical outlook can aid to screen out the minority of patients with poorer prognosis and avoid excessive treatment in low-risk patients. Methods The RNA-seq and clinical data of PTC patients was downloaded from the Gene Expression Omnibus (GEO) and the Cancer Genome Atlas (TCGA) databases. Multivariate and Lasso Cox regression analyses were used to construct a prognostic nomogram to predict overall survival (OS). Thereafter, quantitative RT-PCR and Human Protein Atlas (HPA) database were employed to verify the expression of key genes. Results A four-gene risk score comprising ABI3BP, DPT, MRO, and TENM1 was exhibited strong prognostic value. Moreover, an integrated nomogram was established based on the risk score, age, AJCC (American Joint Commission on Cancer) stage, tumor size, extrathyroidal extension, and history of neoadjuvant treatment, which exhibited significantly better predictive performance than TNM stage system (P < 0.05). GSEA (Gene Set Enrichment Analysis) and GSVA (Gene Set Variation Analysis) revealed that the different tumor-associated hallmarks, biological processes, and pathways were substantially enriched in the poor-prognosis group. In addition, a ceRNA network was constructed by including the four genes (ABI3BP, DPT, MRO, and TENM1), 54 lncRNAs, and 10 miRNAs. Finally, both the relative mRNA and protein expression of ABI3BP, DPT, MRO, and TENM1 were validated. Conclusion The present study identified a four-gene risk signature and developed a novel nomogram, which could be regarded as a reliable prognostic model for PTC patients. The findings also revealed preliminary potential mechanisms that may influence the prognosis outcome. These results can be conducive to design personalized treatment and prognosis management in affected patients.


Introduction
In the past few decades, the incidence of thyroid cancer has increased rapidly worldwide [1]. Papillary thyroid carcinoma (PTC) is the most common pathological subtype, which has accounted for more than 85% of cases [2]. PTCs usually presents an excellent prognosis, and 10-year disease-specific survival rates have been reported to be over 90% via management through the common therapeutic approaches such as thyroidectomy, RAI therapy, and thyroid-stimulating hormone (TSH) suppressive therapy [3]. However, local recurrence and distant metastasis inevitably occur in up to 20% and 10% in PTC patients [4]. Moreover, two-thirds of these patients exhibit loss of iodine-131 (131I) uptake initially or gradually, thereby indicating dedifferentiation of the PTC termed as RAI-refractory PTC [5]. Thus, accurate assessment of the prognosis of PTC is critical to ensure that the high-risk patients receive appropriate treatment and to prevent excessive treatment of the lowrisk patients.
The 8 th edition of the AJCC/TNM (tumor node metastasis) manual was released in 2017 [6,7]. TNM staging has   Disease Markers been identified to be useful in predicting the disease mortality, and it is recommended for all PTC patients [7,8]. However, it has been difficult to accurately distinguish the difference between the survival outcomes in PTCs with similar clinicopathological features [9,10]. A number of prior studies have proposed that combining BRAF, TERT, and RAS mutations with TNM staging can lead to better prediction of the prognosis of PTC patients [11][12][13]. Our team has previously constructed an integrated nomogram based on clinicopathological factors and the related risk scores, which showed significantly better predictive performance than AJCC stage [14][15][16].
In this study, we aimed to develop a prognosispredicting model based on nomogram as a prognostic evaluation method. We have obtained data of PTC patients from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Univariate and multivariate Cox regression analysis and Lasso regression analysis was performed to establish a four-gene signature in the training cohort. Time-dependent receiver-operating characteristic (ROC) and Kaplan-Meier (KM) curve was used to assess the validity of the four-gene signature and the nomogram in the entire patient cohort.

Materials and Methods
2.1. Acquisition and Processing of Data. ALL RNA-seq data and the clinical characteristics were extracted from three GEO datasets (GSE33630, GSE3678, and GSE60542) and TCGA-THCA dataset (Table S1). A total of 494 PTC samples were selected from the TCGA-THCA dataset and were then enrolled for the subsequent analysis. The flowchart of the designed study has been illustrated in Figure 1.

2.2.
Screening of the Differentially Expressed Genes and Gene Enrichment Analysis. Differentially expressed genes (DEGs) between the normal and PTC samples with absolute log2 fold change ðFCÞ > 1:5 and P < 0:05 were screened from three GEO datasets and the TCGA-THCA dataset by using the "limma" R package. Enrichment analyses of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway for DEGs were performed by using the "clusterProfiler" R package and Webgestalt website (http:// www.webgestalt.org/).

Establishment and Validation of the Four-Gene Risk
Signature. In total, 344 samples were randomly selected from the 494 PTC samples in the TCGA-THCA dataset as the training cohort. The remaining 150 samples were used as the testing cohort. In the training cohort, 35 OS-related genes were selected based on univariate cox analysis. Thereafter, least absolute shrinkage and selection operator (LASSO) regression was used to screen 12 potential prognostic genes. Finally, only 4 genes (P < 0:05) were included in the risk signature based on the results of multivariate Cox regression. Finally, the risk score for each patient was calculated as follows: ∑ n i=1 coef ðiÞ * exp ðiÞ, where exp ðiÞ represents the expression of genes, and coef ðiÞ is the coefficient of multivariate Cox regression. ROC and KM curves for risk score were drawn with the "survival" and "time-ROC" R packages to assess the potential predictive capacity of the risk signature in the training cohort, testing cohort, and the entire cohort.

Construction and Evaluation of the Nomogram.
Univariate and multivariate Cox regression analyses were utilized to screen the various essential clinical characteristics related to OS. A nomogram was established based on the risk score, age, AJCC stage, tumor size, extrathyroidal extension, and history of neoadjuvant treatment with the "rms" R package. The C-index, Akaike information criterion (AIC), and Bayesian Information Criterion (BIC) of the nomogram were calculated, and the ROC, KM, calibration, and decision curves were drawn.
2.5. Immune Analysis. The median nomogram point was used to divide the entire cohort into two distinct groups of   was then employed to analyze the immune infiltration of 22 types of immune cells in good-prognosis and poor-prognosis groups. The stroma and immune scores were measured by Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE) analysis using "estimate" R package.
2.6. Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA). GSEA and GSVA were performed with the GSEA software and the "GSVA" R package. The reference gene sets in GSEA were the c2.cp.kegg.v7.5.1.symbols.gmt, and if the normal P value < 0.05 and FDR (false discovery rate) q value < 0.25, the gene set was considered as significantly enriched.

Construction of a Competing
Endogenous RNA Regulatory Network. The (differentially expressed miRNAs) DEmiRNAs and (differentially expressed lncRNAs) DElncR-NAs between the normal and PTC samples in the TCGA-THCA dataset with absolute log2 fold change ðFCÞ > 1:5 and P < 0:05 were screened by using the "limma" R package. The miRcode database was used to match DElncRNAs and DEmiRNAs. ABI3BP, DPT, MRO, and TENM1 target miR-NAs were predicted based on the three distinct databases: miRWalk, TargetScan, and miRmap. Subsequently, a ceRNA regulatory network was constructed according to the results obtained above. Cytoscape (version 3.8.2) was used to visualize the competing endogenous RNA (ceRNA) network. OS-related DEmiRNAs and DElncRNAs were screened by the KM curves.  Table S2.
2.9. Patients. Ten paired PTC tumors and adjacent normal thyroid tissues were obtained from patients in the Thyroid Surgery Department of Xiangya Hospital from March 2020 to June 2020. An informed consent was obtained from all the participants, and the study was approved by the Ethics Committee of Xiangya Hospital of Central South University (No. 202004192).
2.10. Quantitative RT-PCR. Total RNA was extracted from the normal and PTC tissues by using Trizol Reagent (Accurate Biology, China). cDNA synthesis was performed using Reverse Transcription Kit (Accurate Biology, China). Quantitative RT-PCR was then carried out with the RT-PCR Kit (Accurate Biology, China). The sequence of all the primers used have been described in Table S3. 2.11. Statistical Analysis. Statistical analysis was performed based on R software (version 4.1.0). Chi-square test or Fisher's exact test were used to analyze the categorical variables. T-test and one-way ANOVA were used to analyze the continuous variables. Univariate and multivariate Cox regression and log-rank test were performed to evaluate OS. Unless otherwise stated, P < 0:05 indicated that the difference was statistically significant.  genes. GO analysis demonstrated that these 176 DEGs were primarily enriched in cell junction assembly, synapse organization, and positive regulation of protein serine in the biological process, collagen-containing extracellular matrix, secretory granule lumen, and cytoplasmic vesicle lumen in the cellular components, serine-type peptidase activity, serine hydrolase activity, and serine-type endopeptidase activity in the molecular functions (Figures 2(a)-2(c)). Additionally, KEGG pathway enrichment analysis indicated that DEGs were significantly enriched in tyrosine metabolism ( Figure 2(d)). These results suggested that the genesis and development of PTC might be closely related to these genes or pathways, including pathways that play an important role in development of cancers, such as serine activity, tyrosine metabolism, extracellular matrix, and cellular junction. Therefore, these results can guide our subsequent study related to understanding the molecular mechanisms of PTC.     (a-f) Risk score distribution and survival overview in the training cohort, testing cohort, and entire cohort. The horizontal axis depicts the ranking of patients in the cohort in terms of the risk score from lowest to highest. (g-i) The KM survival curves of risk score in the training cohort, testing cohort, and entire cohort. Patients were divided into high-risk and low-risk groups based on the median risk score. (j-l) The ROC curve of the risk score in the training cohort, testing cohort, and entire cohort.   13 Disease Markers testing cohort, and the entire cohort has been presented in Table 1. 35 genes (P < 0:05) related to OS were screened by univariate Cox regression in the training cohort (Table S4). After performing Lasso regression analysis on these 35 genes, 12 genes were obtained (Figures 3(a) and 3(b)). Finally, 12 genes were analyzed by using multivariate Cox regression, and ABI3BP, DPT, MRO, and TENM1 were selected to construct the four-gene risk signature (Figure 3(c)). The risk score = −0:56124 × ABI3BP + ð− 0:25805 × DPTÞ + ð−0:38946 × MROÞ + 0:55421 × TENM1.

Building and Validating a Predictive
Nomogram. Thereafter, in the entire cohort, univariate and multivariate Cox regression analyses were performed on the risk score and the clinical characteristics ( Table 2). The risk score, age, AJCC stage, tumor size, extrathyroidal extension, and history of neoadjuvant treatment were selected to establish a nomogram ( Figure 5(a)). The C-index, AIC, and BIC of this nomogram were 0.970, 85.743, and 86.451, respectively (Table 3). Moreover, a calibration curve revealed that the nomogram was excellent at predicting OS ( Figure 5(b)). Moreover, the AUCs of the nomogram corresponding to 1-year, 3-year, and 5-year OS were 0.985, 0.962, and 0.973, respectively ( Figure 5(c)). They were found to be significantly better than the traditional (Age, Grade, Extrathyroidal extension, Size) AGES score and (Metastases, Age, Completeness of resection, Invasion, Size) MACIS score ( Figures 5(d) and 5(e)). Moreover, the decision curves of 3-year OS and 5-year OS of these three models indicated that in terms of predicting OS of PTC patients, the net benefit [17] of the nomogram model was substantially higher than that of the traditional AGES and MACIS score (Figures 5(f) and 5(g)). Taken together, these results revealed that this nomogram performed well in predicting the prognosis of PTC patients.
3.5. Immune Analysis. The nomogram could effectively stratify patients into a good-prognosis group and a poorprognosis group based on the median total point. Surpris-ingly, immune analysis demonstrated that there was no significant difference in the abundance of 22 types of immune cells between the good-prognosis and poor-prognosis groups (Figures 6(a) and 6(b)). Consistent with this, the stroma scores, immune scores, and ESTIMATE scores were observed to be not significantly higher in the poorprognosis group compared with the good-prognosis group (Figures 6(c)-6(e)), thus suggesting that the proportion of the stromal cells to immune cells in these two groups may be similar, but there was no difference in the tumor purity. These results implied that immune infiltration and immune microenvironment had minimal effect on the OS of PTC patients.

Gene Set Enrichment Analysis (GSEA) and Gene Set
Variation Analysis (GSVA). Since the immune infiltration analysis did not yield significant results, GSEA and GSVA were performed to further explore the potential differences in the molecular mechanisms between the good-prognosis and poor-prognosis groups. As shown in Figures 7(a)-7(h), the "ribosome," "intestinal immune network for IgA production," "systemic lupus erythematosus," "asthma," "nod like receptor signaling," "glycosaminoglycan degradation," "viral myocarditis," and "cell adhesion molecules cams" pathways were found to be significantly enriched in the poor-prognosis group. Moreover, GSVA demonstrated that "peroxisome," "myc targets V2," "uv response up," and "cholesterol homeostasis" were the hallmark pathways modulated in the poor-prognosis group (t value > 2) (Figure 7(i)). This implied that these pathways or the genes regulating them could play an important role in promoting the progression of PTC.

Validation of the Expression of the Four Genes in PTC
Tissues. Among the four genes in four-gene risk signature, the levels of ABI3BP, DPT, and MRO were downregulated, whereas that of TENM1 was upregulated in PTC samples (Figures 9(a)-9(d)). Thereafter, we used quantitative RT-PCR to verify the expression of these four genes in 10 pairs of PTC tissues (Figures 9(e)-9(h)). Furthermore, immunohistochemical staining data of TENM1 in the normal and tumor tissues were obtained by searching the HPA database,

Discussion
It has been established that even when the patients receive the standardized treatment, about 5%-23% of PTC patients display a poor prognosis [18]. Therefore, prediction of the prognosis of PTC patients can not only promote the active implementation of individualized treatment but also aid to avoid the various negative effects associated with excessive medical treatment. A number of prognostic markers obtained from gene expression profiles can accurately predict the prognosis of a single patient at the molecular level and can be complementary to the traditional clinical staging prediction system such as TNM staging [19].
In the present study, a novel four-gene risk signature comprising ABI3BP, DPT, MRO, and TENM1 to predict the OS of PTC was identified. The efficacy of this signature was validated in the study cohort. Among these four genes, the levels of ABI3BP, DPT, and MRO were downregulated whereas that of TENM1 was upregulated in PTC. ABI3BP is an ArgBP/E3B1/Avi2/NESH family protein, which can participate in the negative regulation of the cell movement and metastasis through its influence on membrane folding and layer formation [20,21]. It has been proven to be an src-homologous 3(SH3) adapter molecule and can exhibit a tumor-suppressive effect in thyroid cancer [22,23]. ABI3BP, which is reexpressed in the thyroid cells, has been reported to trigger cellular senescence through affecting the p21 pathway, resulting in a reduction in transformation activity, cell growth, viability, migration, invasion, and tumor growth in nude mice [22]. Moreover, the loss of ABI3BP expression may be functionally involved in the pathogenesis of several types of cancer such as gallbladder cancer [24] and esophageal cancer [25]. DPT is a tyrosine-rich noncollagenous extracellular matrix component, and the depletion of DPT has been associated with hyperproliferation of scars, skin fibrosis, systemic sclerosis as well as some cancers [26]. DPT has been reported to regulate cell proliferation and invasiveness of a variety of tumors like endometrial cancer [27], prostate cancer [28], hepatocellular carcinoma [29], and oral cancer [30]. Moreover, low DPT expression in PTC has been related to higher T classification. DPT can regulate CDK4, CDK6, and p21 through modulating MEK-ERK-MYC signaling to inhibit PTC proliferation [31]. The expression of TENM1 as a cell signal sensor in neurons has been positively correlated with the growth and invasion of PTC. TENM1 has been identified as the direct functional target of miR-486 in PTC cells. The restoration of miR-486 can significantly inhibit the growth of PTC in vivo [32,33]. Similar to the previous two genes, TENM1 can also play an important role in different types of cancer especially in the brain tumors such as prolactin pituitary tumor [34] and glioblastoma [35]. MRO belongs to a novel gene family, named "maestro heat-like repeat family (MROH)" [36]. Expression of MRO in lean-type polycystic ovarian syndrome has been found to be increased [37]. Nevertheless, MRO has been poorly studied in the tumors, and the role of MRO in thyroid cancer has not been previously reported. We speculate that MRO, as a sex-determining gene affecting the prognosis of PTC, might be related to the sex differences between PTC patients, but this observation needs further analysis.

22
Disease Markers The 8th edition of the AJCC/TNM staging system (TNM-8th) was released in 2017. A number of studies have proved that it was more appropriate for the prediction of the survival and recurrence than TNM-7th [7,38]. However, studies also have shown that approximately 30%-40% [39,40] of patients were downstaged upon reclassification. In addition, TNM-8th oversimplifies the survival deterioration that primarily occurs with increasing age at the time of diagnosis and underestimates the prognosis for the younger patients, particularly those aged 45-55 years [9]. What is more, the molecular profile, which is important for the precision medicine, has not been contained in TNM-8th. Consequently, as knowledge of cancer biology evolves, innovative diagnostic tools and treatment modalities need to be developed and improved. Novel nomograms for predicting the survival of thyroid cancer patients have been previously formulated in several studies. For instance, Pathak and colleagues developed nomograms to predict the likelihood of the relapse and death from thyroid cancer in an individual patient (C-indices were 0.92 and 0.76, respec-tively) based on patients' characteristics but without identification of any specific gene signature [41]. A nomogram that first included a gene signature to predict 1-year, 3-year, and 5year DFS of DTC patients was established (C-index = 0:801) by Pan Ruchong [42]. However, five-gene signature appears to cause significant expenditure in the healthcare compared to the four-gene signature. In addition, to the best of our knowledge, our study is the first study to confirm that the developed nomogram performed better than the traditional AGES score and MACIS score for prediction of OS. Besides, our study demonstrated that BRAF mutations had minimal effect on OS of PTC. Consistent with this, unlike previous studies suggesting that BRAF mutations can imply poor prognosis [43], some recent studies have also indicated that BRAF mutations cannot be used as an independent prognostic and predictive factor in PTC [44]. Moreover, Wang et al. found that compared to male PTC patients, BRAF mutations cannot be considered as robust independent risk factor for female patients [45]. In general, the nomogram constructed in this study exhibited better predictive efficacy for the OS of PTC patients and clinical applicability.

Disease Markers
To understand the potential mechanisms affecting the prognosis of patients with PTC, immune analysis, GSEA, and GSVA were performed between good-prognosis and poor-prognosis groups. Surprisingly, in contrast to some other prior studies [46], immune infiltration and immune microenvironment displayed little effect on the OS of PTC patients. These results can be explained, in part, by the fact that the four key genes we identified in this study are not directly involved in the process of tumor immunity. Therefore, no significant biological difference in immune infiltra-tion was observed when PTC samples were divided into good and poor prognosis groups. The "cell adhesion molecules cams" [47], "Myc" pathways [48], and glycosaminoglycans (GAGs) [49], which can play an essential role in the tumor pathogenesis and distant metastasis were found to be enriched in poor-prognosis groups. The effects of intestinal immunity caused by intestinal flora on the various cancers are currently being elucidated, including gastrointestinal tumors, liver cancer [50], and breast cancer [51]. In addition, intestinal immunity can also affect the thyroid 27 Disease Markers function [52]. However, the effect of intestinal immunity on PTC remains to be further studied. Moreover, a number of studies have reported that systemic lupus erythematosus (SLE) and asthma were associated with an increased risk of overall cancers including non-Hodgkin's lymphoma, Hodgkin's lymphoma, leukemia, multiple myeloma, and thyroid cancer. Therefore, the increased expression of genes associated with lupus and asthma in the poor-prognosis group

28
Disease Markers is understandable. What is more, the key molecules were presented in the ceRNA network including MIR181A2HG and hsa-mir-375, which need to be studied in detail in PTC. Although this study revealed numerous important findings, but there are several limitations associated with it. First, selection bias and confounding bias were inevitable due to the retrospective design of this study. Second, the clinical characteristics were mainly derived from TCGA database, and thus, caution should be exercised when expanding our results to patients of other ethnicities. Besides, in future studies, the nomogram should be validated in different external datasets. Finally, additional in vitro and in vivo functional experiments need to be performed to further elucidate the detailed molecular mechanisms affecting the prognosis of PTC.

Conclusion
To conclude, our study established a four-gene risk signature and developed a novel prognostic nomogram in combination with prognosis-related clinical characteristics to predict the OS of PTC. The four DEGs were found closely related to the prognosis of PTC and thus can act as potential therapeutic targets. These results might be beneficial for individualized treatment and medical decision-making during the management of PTC patients.

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.