Epithelial to Mesenchymal Transition Relevant Subtypes with Distinct Prognosis and Responses to Chemo- or Immunotherapies in Osteosarcoma

Objective Currently, clinical classification of osteosarcoma cannot accurately predict the survival outcomes and responses to chemo- or immunotherapies. Our goal was to classify osteosarcoma patients into clinical/biological subtypes based on EMT molecules. Methods This study retrospectively curated the RNA expression profiling of osteosarcoma patients from the TARGET and GSE21257 cohorts. Consensus clustering analyses were conducted in accordance with the expression profiling of prognostic EMT genes derived from univariate analyses. Immunological features were evaluated through immune cell infiltration, immune checkpoint expression, and activity of cancer immunity cycle. Drug sensitivity was estimated with the GDSC database. WGCNA approach was adopted to determine the EMT-derived genes. Following univariate analyses, a multivariate cox regression model was developed and externally verified. Predictive independency was evaluated with uni- and multivariate analyses. GSEA was presented to uncover relevant molecular mechanisms. Results Prognostic EMT genes across osteosarcoma patients were stratified into distinct subtypes, namely, subtypes A and B. Patients in subtype B presented desirable prognosis, high immune activation, and enhanced sensitivity to cisplatin. Meanwhile, patients in subtype A were more sensitive to gemcitabine. In total, 86 EMT-derived hub genes were determined, and an EMT score was conducted for osteosarcoma prognosis. Following external verification, this EMT score was reliably and independently predictive of patients' survival outcomes. Additionally, it was positively linked to steroid biosynthesis. Conclusion Overall, our findings proposed EMT-relevant molecular subtypes and signatures for predicting prognosis and therapeutic responses, contributing to personalized treatment and clinical implication for osteosarcoma.


Introduction
Osteosarcoma represents the most prevalent primary bone sarcomas, which originates from mesenchymal stem cell populations, with an annual incidence of 3-5 cases per million individuals [1][2][3]. It most often affects children, adolescents, and young adults, with aberrant bone growth areas [4]. The difficulty in biology research is in relation to the complexity of the osteosarcoma genome as well as remarkable biological discrepancy among osteosarcoma subtypes [5]. Hence, an in-depth understanding of osteosarcoma biology highlights the heterogeneity as well as uncovers the molecular abnormality that defines patient subpopulations. The standard treatment of osteosarcoma comprises surgical resection and chemotherapy [6]. Patients are usually resistant to conventional chemotherapy; meanwhile, high-dose chemotherapeutic agents cause severe side effects [7]. Hence, to improve the survival duration of osteosarcoma patients has been proven to be challenging.
Epithelial-mesenchymal transition (EMT) is a key process by which epithelial cells acquire mesenchymal characteristics [8]. Growing evidences demonstrate that EMT exerts a crucial role in stemness [9], metabolic reprogramming [10], immune evasion [11], and therapeutic resistance [12] for cancer cells. Osteosarcoma cells that have experienced EMT process will lose the cellular polarity and acquire aggressive and metastatic features. This process has been regarded as a crucial event for osteosarcoma metastases [13]. Previously, Yiqi et al. proposed an EMT-relevant model of osteosarcoma as a prognostic indicator through integrated multicohorts [14], indicative of the crucial prognostic implication of EMT during osteosarcoma progression. Limited evidences demonstrate that chemotherapy (cisplatin, etc.) resistance contributes to EMT activation in osteosarcoma cells [15,16]. Recent research has presented that EMT genes are remarkably linked to immunity in osteosarcoma [17]. Tumor-associated macrophages trigger EMT in osteosarcoma through activation of COX-2/STAT3 signaling [18]. Thus, extensive research of EMT on clinical outcomes and therapeutic responses of osteosarcoma is urgently required. In our research, integrated genomic analyses were conducted for characterizing distinct EMT-relevant subtypes with prognostic and prediction implications in osteosarcoma.

Acquisition of Gene Expression
Profiling. This study acquired the TARGET data matrix (https://ocg.cancer.gov/ programs/target/data-matrix) comprising gene expression data and clinical data of 84 osteosarcoma patients, as the discovery cohort. Genome-wide gene expression profiling and prognostic information of 53 osteosarcoma patients were curated from the GSE21257 cohort in the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/ gds/) [19], as the testing cohort. This cohort was on the basis of the platform of GPL10295 Illumina human-6 v2.0 expression beadchip. The gene expression series matrix files were directly retrieved, and the probe IDs were mapped to the gene symbols in accordance with the matched annotation files. Thereafter, expression values of multiple probes mapping to the same gene were averaged while probes mapping to multiple genes were eliminated. The gene set of EMT was curated from the Molecular Signatures Database (MSigDB), which was listed in Supplementary Table 1 [20].

Identification of Molecular Subtypes.
Univariate analyses were presented for assessment of the association of EMT genes with osteosarcoma prognosis. Genes with P < 0:05 were determined for the subsequent analyses. K-meansbased consensus clustering was carried out utilizing the ConsensusClusterPlus package on the basis of the transcriptome files of prognostic EMT genes [21]. Cumulative distribution function (CDF), delta area, and consensus matrix diagrams were conducted in accordance with default parameters. The number of clusters was determined in accordance with the following criteria: (i) the consistency within the clusters was relatively high; (ii) the coefficients of variation were relatively low; (iii) the area under the CDF curve was not remarkably elevated. The area under the CDF curves was utilized for defining the clustering number. Principal component analyses (PCA) were conducted for verifying the accuracy of this classification.
2.3. Gene Set Variation Analyses (GSVA). Fifty hallmark pathways were acquired from the MSigDB, and the activity of above pathways was estimated with GSVA package in osteosarcoma specimens [22]. Limma package was adopted to compare the discrepancy in activity of hallmark pathways between subtypes [23].

Analyses of Immunological
Features. The infiltration of tumor-infiltrating immune cells was estimated across osteosarcoma tissues utilizing the single-sample gene set enrichment analyses (ssGSEA) [24] and TIMER2 database [25]. The RNA expression of immune checkpoint molecules and HLA molecules was determined across osteosarcoma specimens. Cancer immunity cycle comprises release of cancer cell antigens (step 1), cancer antigen presentation (step 2), priming and activation (step 3), trafficking of immune cells to tumors (step 4), infiltration of immune cells into tumors (step 5), recognition of cancer cells by T cells (step 6), and killing of cancer cells (step 7) [26]. The activity of all steps was determined with ssGSEA approach in accordance with the transcriptome profiling [27]. The gene set of each step within the cancer immunity cycle was listed in Supplementary Table 2. 2.5. Estimation of Drug Sensitivity. Through adopting the pRRophetic algorithm [28], the half-maximal inhibitory concentration (IC50) values of cisplatin and gemcitabine were determined in osteosarcoma in accordance with the Genomics of Drug Sensitivity in Cancer (GDSC; http:// www.cancerrxgene.org/) [29].
2.6. Weighted Gene Coexpression Network Analyses (WGNCA). The data matrix of mRNA expression profiling in the TARGET cohort was analyzed utilizing the WGCNA package [30]. The first 25% genes with the largest variance were determined. For selecting a standard scale-free network, the sample hierarchical clustering approach was utilized for detecting and removing outlier specimens, followed by selection of the appropriate soft thresholding analyses. Thereafter, the adjacency matrix and topological overlap matrix (TOM) were established as well as the matched dissimilarity (1-TOM) was determined. Dynamic tree cutting methods were utilized for completing the gene tree and module clustering. Thereafter, the module characteristic genes were clustered as well as the highly similar modules were merged. The dissimilarity of the module eigengenes (ME) was determined with the moduleEigengenes function. The interactions of ME with EMT subtypes were analyzed with Pearson's correlation. Module membership (MM) represents the relevance of the expression profile to ME, while gene significance (GS) represents the correlation between the expression profiles and clinical traits. In this study, MM was calculated with Pearson's correlation of the expression profiling and ME. Meanwhile, GS was measured for evaluating the genes with EMT subtypes. Thereafter, the correlation between MM and gene signature was evaluated, and EMT-derived genes were determined.

Analysis of Hub
Genes. Protein-protein interactions (PPIs) of genes in the lightcyan module were assessed through the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) online tool (http://string-db.org/ ) [31]. Hub genes in this PPI network were determined utilizing the Molecular Complex Detection (MCODE), a plugin of Cytoscape software [32]. 2 Journal of Immunology Research 2.8. Functional Enrichment Analyses. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were presented utilizing the cluster-Profiler package [33]. The parameter utilized in this package was default as well as the threshold for identifying the GO functions and KEGG pathways was set as P value < 0.05.
2.9. Gene Set Enrichment Analyses (GSEA). For uncovering the difference in functional phenotypes between high-and low-risk subpopulations, GSEA was carried out. The "c5.all.v7.0.symbols.gmt" was utilized as a reference gene set. GSEA was implemented utilizing GSEA software. A nominal P < 0:05 was regarded as a significant enrichment.

Characterization of Two EMT Molecular Subtypes in
Osteosarcoma. Figure 1 illustrates the flowchart of this study.
We curated 200 EMT genes from the MSigDB and analyzed their prognostic implication in osteosarcoma through univariate cox regression analyses. As a result, 40 EMT genes were remarkably linked to osteosarcoma prognosis ( Figure 2(a); Table 1), in which 28 served as protective factors of osteosarcoma prognosis while the others served as risk factors. The consensus clustering of prognostic EMT genes was conducted to determine distinct EMT subtypes for osteosarcoma. For two categories, the area under the CDF curves began to stabilize (Figures 2(b) and 2(c)). Meanwhile, the consensus matrix was conducted for identifying the optimal number of subtypes. In Figure 2(d), the consensus matrix displayed a well-defined block structure when k = 2. Overall, two EMT molecular subtypes were conducted, namely, subtypes A and B. PCA results also fit into two clusters ( Figure 2(e)). Survival analyses showed that subtype A presented poorer prognosis in comparison to subtype B (Figure 2(f)), indicative of the discrepancy in survival outcomes between subtypes.     Journal of Immunology Research metabolism, xenobiotic metabolism, etc.) in EMT subtype B than subtype A (Figure 3(a) and Supplementary Table 3). In Figure 3(b), ssGSEA approach showed that EMT subtype B presented remarkably enhanced immune cell infiltrations containing central memory CD4 and CD8 T cells, effector memory CD4 T cells, memory B cells, regulatory T cells, type 1 and 2 helper cells, CD55bright natural killer cells, macrophages, MDSCs, natural killer cells, and natural killer T cells than subtype A. Meanwhile, TIMER2 approach showed the lower infiltrations of B cells and the higher infiltration of CD4 T cells compared with subtype A (Supplementary Figure 1). Additionally, we noted that immune checkpoint molecules ADORA2A, CD44, PDCD1LG2, TNFRSF14, and TNFRSF8 were prominently activated in subtype B than A (Figure 3(c)). Nevertheless, we did not investigate the significant discrepancy in HLA molecule expression between subtypes ( Figure 3(d)). The activity of cancer immunity cycle was also estimated in each osteosarcoma specimen. Compared with subtype A, release of cancer cell antigen, B cell/monocyte/Th17 cell recruiting, recognition of tumor cells through T cells, and killer of tumor cells presented remarkably enhanced activity in subtype B (Figure 3(e)). Overall, EMT subtype B possessed the features of immune activation. Also, subtype B displayed increased sensitivity to cisplatin while subtype A was more sensitive to gemcitabine (Figure 3(f)).

Establishment of EMT Subtype-Relevant Coexpression
Modules. Considering the sensitivity of WGCNA to batch effects, this study firstly preprocessed the data of osteosarcoma individuals in EMT molecular subtypes A and B from the TARGET cohort. The first 25% genes with the largest variance were determined for subsequent analyses. Thereafter, the hclust function was adopted for confirming the batch effect removal as well as investigating whether there was any outlier. In Figure 4(a), no outlier sample was found. Because of the premise of WGCNA approach required to hypothesize that the network met the scale-free criteria, this study further screened the optimal soft thresholding value for making a coexpression network was more subject to a scale-free network. Through calculation of the scale-free topology fitting index, R 2 value was up to 0.85 ( Figure 4(b)), which validated the feasibility of WGCNA. A hierarchical clustering tree was retrieved through adopting hierarchical clustering analyses. Thereafter, in accordance with the dynamic tree cut approach, 29 distinct coexpression modules were assigned (Figure 4(c)). Figure 4(d) demonstrated that the lightcyan module presented the strongest correlation to EMT molecular subtypes. The genes in this module deserved more exploration.

Development of an EMT-Derived Prognostic Model for
Osteosarcoma. Our univariate cox regression analyses were indicative that 33 EMT-derived genes displayed significant interactions with osteosarcoma prognosis (Table 2). Above genes were input the multivariate cox regression model. In accordance with the expression and coefficient of candidate genes (Table 3), we developed an EMT-derived prognostic model for osteosarcoma, comprising RPS9, RPS23, EIF4A1, RPL12, RPL36, RPL37A, RPL34, EEF1B2, RPS8, RPS28, RPL10, RPS24, RPL35A, RPL11, RPL21, RPS27A, RPS12, and RPL13A. We noted the remarkable discrepancy in their expressions in high-than low-risk subpopulations (Figure 6(a)). Figure 6(b) displayed the EMT score distribution across osteosarcoma patients. As EMT score elevated,  (Figure 6(c)). Survival analyses demonstrated that high-risk cases were indicative of more unfavorable clinical outcomes in comparison to low-risk cases (Figure 6(d)). ROC curves were presented to assess the predictive performance of EMT score in osteosarcoma prognosis. In Figure 6(e), area under the curves (AUCs) at one-, three-, and five-year survival were separately 0.723, 0.858, and 0.860, proving that EMT score was        Journal of Immunology Research capable of prediction of osteosarcoma prognosis. Compared with previously published two EMT models from Yiqi et al. [14] and Peng et al. [17], our EMT-derived prognostic model had higher C-index (0.828), indicating the prediction superiority of this model (Figure 6(f)).

External Verification of the Prognostic Value of EMT
Score. The prognostic value of EMT score was further verified in the GSE21257 cohort. Figure 7(a) depicted the expression patterns of each gene from EMT score across osteosarcoma specimens. We also visualized the distribution of EMT score in the GSE21257 cohort (Figure 7(b)). In accordance with the median value of EMT score, we stratified osteosarcoma individuals into high-as well as low-risk subpopulations. Additionally, we noted that there were relatively increased dead cases in high-risk subgroup (Figure 7(c)). As expected, high EMT score was remarkably linked to more unfavorable survival outcomes (Figure 7(d)). The AUC at five-year survival was 0.818, proving the excellent predictive performance of EMT score (Figure 7(e)).

Genes in EMT Score as Prognostic Indicators of
Osteosarcoma. Further analyses were conducted for assessing the survival implication of each gene in EMT score in the TARGET cohort. As a result, high expression of EEF1B2, EIF4A1, RPL10, RPL11, RPL12, RPL13A, RPL21, RPL34, RPL35A, RPL36, RPL37A, RPS8, RPS9, RPS12, RPS23, RPS24, RPS27A, and RPS28 was prominently linked to more undesirable survival outcomes in comparison to their low expression (Figures 8(a)-8(r)).  9 Journal of Immunology Research this EMT score was prominently linked to osteosarcoma survival outcomes (Figure 9(a)). Further multivariate analyses demonstrated that the EMT score was independently predictive of patients' prognosis (Figure 9(b)). For uncovering the underlying biological phenotypes between subpopulations, this research carried out GSEA. As a result, steroid biosynthesis was remarkably activated in high-risk subpopulation (Figure 9(c)), and hypertrophic cardiomyopathy displayed prominent activation in low-risk subpopulation (Figure 9(d)).

Discussion
Osteosarcoma presents diverse clinical courses and biological heterogeneity [34]. Hence, to reliably predict prognosis and therapeutic responses, it is critical for comprehensively investigating the molecular mechanisms. Our study determined forty prognostic EMT genes in osteosarcoma. Through consensus clustering approach, two EMT molecular subtypes were conducted. Especially, EMT subtype A presented poorer prognosis in comparison to subtype B, indicating that there was a remarkable discrepancy in survival outcomes between subtypes. Recently, immunotherapy is a promising treatment option against osteosarcoma, and it is crucial to improve the comprehending about the immune responses [3]. We evaluated the immunological features from multiple perspectives. GSVA demonstrated the activation of immune pathways (like IL6 JAK STAT3 signaling) in subtype B. Our ssGSEA revealed that subtype B displayed remarkably enhanced immune cell infiltrations containing central memory CD4 and CD8 T cells, effector memory CD4 T cells, memory B cells, regulatory T cells, type 1 and  [35]. Cisplatin, an alkylating drug, can form irreversible covalent bonds with DNA, which causes DNA strands to cross-link and break and missense mutation [36]. It is widely applied for osteosarcoma chemotherapy as well as cisplatin resistance is frequent across osteosarcoma individuals [37]. Our data demonstrated that EMT subtype B presented enhanced sensitivity to cisplatin. Gemcitabine represents the second cytidine analog developed following cytosine arabinoside [38]. Intriguingly, subtype A was more sensitive to gemcitabine. Through WGCNA approach, we constructed 29 EMT subtype-relevant coexpression modules in the TARGET cohort. Especially, lightcyan module presented the strongest interactions with EMT molecular subtypes. Further, MCODE analyses determined 86 EMT-derived hub genes. Functional enrichment analyses unraveled the remarkable interactions of these EMT-derived hub genes with endoplasmic reticulum (ER), ribosome, oxidative phosphorylation, chemical carcinogenesis-reactive oxygen species, thermogenesis, and HIF-1 signaling pathway. For instance, the ER represents the central intracellular organelle of diverse cellular functions and endoplasmic reticulum stress response participates in osteosarcoma pathogenesis [39]. SENP1/ HIF-1alpha regulates hypoxia-mediated EMT in osteosarcoma cells [40].
This study conducted an EMT-derived prognostic model for osteosarcoma in the TARGET cohort, comprising EEF1B2, EIF4A1, RPL10, RPL11, RPL12, RPL13A, RPL21, RPL34, RPL35A, RPL36, RPL37A, RPS8, RPS9, RPS12, RPS23, RPS24, RPS27A, and RPS28. ROC curves proved that the EMT score was capable of accurately predicting osteosarcoma individuals' clinical outcomes. Additionally, external verification proved the feasibility of this model in the GSE21257 cohort. Each gene in the EMT score was linked to undesirable prognosis of osteosarcoma individuals. Previously, mTOR inhibitor blunts the p53 response to nucleolar stress through modulating RPL11 and MDM2 expressions [41]. RPL34 upregulation is indicative of undesirable clinical outcomes in osteosarcoma as well as silencing RPL34 weakens osteosarcoma proliferation [42]. Suppressing RPS9 blunts osteosarcoma cell growth via inactivating MAPK signaling [43]. Nevertheless, the functions of most genes in osteosarcoma progression are lack of experimental evidences. Multivariate analyses suggested the independency of the EMT score in prediction of osteosarcoma outcomes. Further, GSEA results presented that steroid biosynthesis was remarkably activated in high-risk osteosarcoma individuals, indicating that steroid biosynthesis might affect osteosarcoma progression.
A few limitations of our study should be taken into consideration. First, further analyses are required for elucidating the molecular mechanisms underlying the impact of genes in the EMT-derived model on osteosarcoma through in-depth experiments. Second, more osteosarcoma patients can produce more convincing and accurate findings. Hence,   RPS9  RPS23  EIF4A1  RPL12  RPL36  RPL37A  RPL34  EEF1B2  RPS8  RPS28  RPL10  RPS24  RPL35A  RPL11  RPL21  RPS27A  RPS12  RPL13A   Type   Type High Low       (d) Figure 9: Evaluation of the independency of EMT score in osteosarcoma prognosis and its relevant signaling pathways. (a, b) Uni-and multivariate cox regression models for determining the independency of EMT score in estimating osteosarcoma survival outcomes in the TARGET cohort. (c, d) Signaling pathways that were remarkably linked to EMT score via GSEA.