The molecular mechanisms of osteosarcoma (OS) involving protein-coding genes have been extensively studied over the past decades. However, much remains to be explored regarding the roles that long noncoding RNAs (lncRNAs) play in the pathogenesis and progression of OS and how they are associated with OS metastasis. In the present study, we collected RNA-seq-based gene expression data of 82 OS samples from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) database, along with their clinical information. We found that 50 lncRNAs were significantly associated with patients’ survival by univariable Cox regression. Moreover, we built a multivariable Cox regression model based on 7 lncRNAs and successfully stratified patients into two risk groups with significantly different prognostic outcomes. The Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways significantly enriched among the differentially expressed genes (DEGs) between the two risk groups were both immune-related, suggesting that these immune-related processes are critical for OS survival. Among the seven lncRNA signatures,
Osteosarcoma (OS) is among the most prevalent malignancies in children and adolescents [
Identifying metastasis-related biomarkers and the underlying mechanisms in OS is crucial for developing molecularly targeted therapies and for enabling more accurate prognosis prediction and therapeutic decisions [
In this study, we processed RNA-seq data and clinical information of osteosarcoma patients from the TARGET database with univariable Cox regression and a random forest algorithm and selected seven long noncoding RNAs (lncRNAs), all of which may affect the survival of osteosarcoma patients, to construct a prognostic risk model. Based on the stratification provided by our model, we further explored the biological differences between the resulting patient groups and how these characteristics might lead to different prognostic outcomes.
We downloaded RNA-seq-based gene expression data (TPM, transcripts per million), somatic copy number alteration (SCNA) data, and clinical data for the 82 corresponding osteosarcoma patients from the TARGET (Therapeutically Applicable Research to Generate Effective Treatments) database [
First, using nine lncRNA biotypes defined in Ensembl (3prime_overlapping_ncRNA, antisense, lincRNA, macro_lncRNA, non_coding, sense_intronic, sense_overlapping, bidirectional_promoter_lncRNA, and retained_intron), we obtained a total of 3,159 lncRNAs that exhibited
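The biotype step above amounts to a set-membership filter over a gene annotation. The sketch below covers only that step; the `(gene_id, gene_name, biotype)` record layout and the toy IDs are hypothetical, and it assumes the annotation has already been parsed (for example from an Ensembl GTF file). Ensembl spells the last biotype `retained_intron`.

```python
# The nine Ensembl biotypes treated as lncRNAs in this study.
LNCRNA_BIOTYPES = {
    "3prime_overlapping_ncRNA", "antisense", "lincRNA", "macro_lncRNA",
    "non_coding", "sense_intronic", "sense_overlapping",
    "bidirectional_promoter_lncRNA", "retained_intron",
}


def select_lncrnas(annotation):
    """Return the gene IDs whose biotype is one of the lncRNA biotypes.

    `annotation` is a hypothetical list of (gene_id, gene_name, biotype)
    tuples, e.g. parsed beforehand from an Ensembl GTF file.
    """
    return [gene_id for gene_id, _name, biotype in annotation
            if biotype in LNCRNA_BIOTYPES]


# Toy records with made-up IDs, for illustration only.
toy_annotation = [
    ("G1", "GENE_A", "protein_coding"),
    ("G2", "GENE_B", "lincRNA"),
    ("G3", "GENE_C", "antisense"),
    ("G4", "GENE_D", "miRNA"),
]
print(select_lncrnas(toy_annotation))  # ['G2', 'G3']
```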
Taking into consideration the expression of qualified lncRNAs in each patient and the patient’s survival status, we applied multivariable Cox regression with the survival package in R v3.6.3 to build our osteosarcoma prognostic risk model and selected the lncRNAs that contributed significantly to the model. These lncRNAs were used to construct a risk-scoring method that assigned each osteosarcoma patient a score reflecting the risk of death. The patients were then divided into high-risk and low-risk groups by the median score. We visualized the survival curves of the two groups by the Kaplan-Meier method and assessed the difference between them with the log-rank test.
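The study used R for these steps; purely to make the two statistics concrete, the following is a minimal pure-Python sketch of a Kaplan-Meier estimator and a two-group log-rank test. It assumes survival times with a 0/1 event indicator (1 = death, 0 = censored) and is not the authors' code; in practice a tested library such as R's survival package should be used.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(event_time, survival_probability), ...].

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for s, _, g in pooled if g == 0 and s >= t)  # group A at risk
        n = sum(1 for s, _, _ in pooled if s >= t)               # total at risk
        d_a = sum(1 for s, e, g in pooled if g == 0 and s == t and e)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of deaths in group A
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For two identical groups the statistic is 0 and the p-value is 1; for a group whose deaths all precede the other group's, the p-value becomes small.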
After the osteosarcoma patients were stratified, their gene expression profiles were divided into two groups accordingly. Using the screening criteria of
The infiltration levels of immune cells were estimated from the gene expression profiles and the marker genes of immune cells using single-sample gene set enrichment analysis (ssGSEA), implemented with the R package GSVA v1.32.0 [
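To illustrate what an ssGSEA score measures, the sketch below computes the unnormalized enrichment score of one marker-gene set in one sample: genes are ranked by expression, a running sum of weighted hits minus misses is accumulated, and the score is the sum of that walk. It follows the general ssGSEA scheme (rank weights raised to `alpha = 0.25`) but is a simplified, hypothetical stand-in for GSVA: ties are broken by sort order rather than as GSVA does, so values will not match GSVA output exactly.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Unnormalized ssGSEA enrichment score for one sample.

    expression: dict mapping gene -> expression value in this sample.
    gene_set: marker genes of one immune cell type.
    """
    gene_set = set(gene_set)
    # Rank genes by decreasing expression; the top gene gets weight N.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    weights = {g: n - i for i, g in enumerate(ordered)}
    hits = [g for g in ordered if g in gene_set]
    if not hits or len(hits) == n:
        raise ValueError("gene set must partially overlap the expressed genes")
    hit_norm = sum(weights[g] ** alpha for g in hits)
    miss_step = 1.0 / (n - len(hits))
    p_hit = p_miss = score = 0.0
    for g in ordered:
        if g in gene_set:
            p_hit += weights[g] ** alpha / hit_norm
        else:
            p_miss += miss_step
        score += p_hit - p_miss  # ssGSEA sums the walk instead of taking its max
    return score


def normalize_across_samples(scores):
    """Scale scores by their range across samples (as ssgsea.norm does in GSVA)."""
    span = max(scores) - min(scores)
    return [s / span for s in scores]


sample = {"A": 10.0, "B": 9.0, "C": 1.0, "D": 0.5, "E": 0.1}
print(ssgsea_score(sample, {"A", "B"}) > ssgsea_score(sample, {"D", "E"}))  # True
```

A marker set concentrated among highly expressed genes yields a positive score, while one concentrated among lowly expressed genes yields a negative score.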
As shown in Figure
The study design and the expression profiles of the 50 prognostic lncRNAs in OS. (a) The workflow of the present study. (b) The lncRNAs were clustered using a hierarchical clustering algorithm, and the samples were ordered by survival status and survival time.
To build a lncRNA-based Cox regression model for OS risk prediction, we first ranked the prognostic lncRNAs using a random forest algorithm and considered the top 20 lncRNAs as candidates for constructing an OS prognostic risk model. We then built the model with multivariable Cox regression on the samples with both clinical information and expression data for these lncRNAs and obtained seven lncRNAs that contributed significantly to the model (Table
Summary of the seven prognostic lncRNAs in the univariable and multivariable Cox regression models.
Features | Univariable coefficient | Univariable hazard ratio | Multivariable coefficient | Multivariable hazard ratio
---|---|---|---|---
USP30-AS1 | -1.25 | 0.29 | -2.10 | 0.122
AC113383.1 | -0.09 | 0.91 | -0.11 | 0.89
LINC01549 | 0.02 | 1.02 | 0.02 | 1.02
AC093627.3 | 0.12 | 1.13 | 0.16 | 1.18
DDN-AS1 | 0.35 | 1.42 | 0.22 | 1.25
GNAS-AS1 | 0.46 | 1.58 | 0.68 | 1.98
AC011442.1 | 0.39 | 1.48 | 0.72 | 2.06
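The paper does not spell out the risk-score formula. A common choice, and the assumption made here, is the Cox linear predictor: the sum of each multivariable coefficient multiplied by the patient's expression of that lncRNA, with the median score as the high/low cutoff. The sketch uses the rounded coefficients from the table above; the expression scale (raw or log-transformed TPM) is not stated, so inputs must be on whatever scale the model was fit on. Sending patients exactly at the median to the low-risk group is also an assumption.

```python
import math
import statistics

# Multivariable Cox coefficients (rounded) from the table above.
COEFFICIENTS = {
    "USP30-AS1": -2.10,
    "AC113383.1": -0.11,
    "LINC01549": 0.02,
    "AC093627.3": 0.16,
    "DDN-AS1": 0.22,
    "GNAS-AS1": 0.68,
    "AC011442.1": 0.72,
}

# Hazard ratio = exp(coefficient); matches the table up to coefficient rounding.
HAZARD_RATIOS = {gene: math.exp(c) for gene, c in COEFFICIENTS.items()}


def risk_score(expression):
    """Assumed linear predictor: sum of coefficient x expression over the 7 lncRNAs."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' if its score exceeds the median, else 'low'."""
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


print(split_by_median({"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0}))
# {'p1': 'low', 'p2': 'low', 'p3': 'high', 'p4': 'high'}
```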
The performance of the seven lncRNAs in OS survival prediction. (a) Risk scores of individual patients in the two groups; in the top panel, blue points represent low-risk patients and red points represent high-risk patients. In the middle panel, the distribution of survival time and survival status of the two groups of patients, of which the
In addition, to assess whether the risk score predicts patients’ prognosis independently of other factors, we performed both univariable and multivariable Cox regression using the precalculated risk scores and clinical information such as gender, race, and age. We found that the risk score was an independent indicator of OS patients’ survival (Table
Comparison of the risk score with other clinical factors in the univariable and multivariable Cox regression models.
Features | Univariable P value | Univariable HR | Univariable lower 95% CI | Univariable upper 95% CI | Multivariable P value | Multivariable HR | Multivariable lower 95% CI | Multivariable upper 95% CI
---|---|---|---|---|---|---|---|---
Risk score | | 19.7 | 8.41 | 46.2 | | 19.7 | 8.41 | 46.4
Gender (female/male) | 0.30 | 0.68 | 0.33 | 1.41 | 0.22 | 0.60 | 0.27 | 1.35
Race (white/other) | 0.23 | 0.64 | 0.30 | 1.34 | 0.47 | 0.75 | 0.35 | 1.64
Age | 0.82 | 1 | 1 | 1 | 0.72 | 1 | 1 | 1
To identify genes dysregulated between the two risk groups, we compared their gene expression profiles. With thresholds of |log2(fold change)| > 1 and
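The DEG screen combines a fold-change threshold with a significance threshold. The sketch below shows one simple way to apply both, under stated assumptions: fold change is computed from group mean TPM with a pseudocount of 1, and p-values (from whatever per-gene test was used, which is not shown here) are adjusted with the Benjamini-Hochberg procedure. Dedicated tools such as DESeq2 or limma estimate fold changes differently. The p-value cutoff is a required argument because its value is not given here.

```python
import math


def log2_fold_change(high, low, pseudocount=1.0):
    """log2 ratio of group mean TPM (+ pseudocount), high-risk over low-risk."""
    mean_high = sum(high) / len(high)
    mean_low = sum(low) / len(low)
    return math.log2((mean_high + pseudocount) / (mean_low + pseudocount))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(log2fc, padj, padj_cutoff, lfc_cutoff=1.0):
    """Genes with |log2FC| > lfc_cutoff and adjusted p below padj_cutoff."""
    return [g for g in log2fc
            if abs(log2fc[g]) > lfc_cutoff and padj[g] < padj_cutoff]
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.2])` returns approximately `[0.04, 0.0533, 0.0533, 0.2]`.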
The biological differences between the high-risk and low-risk groups stratified by the multivariable Cox regression model. (a) Overview of the differentially expressed genes between the two risk groups. Red and blue points represent genes upregulated and downregulated, respectively, in the high-risk group compared with the low-risk group. The differentially expressed genes were significantly enriched in GO terms (b) and KEGG pathways (c).
The GO and KEGG pathway enrichment analyses suggested that the immune microenvironment of osteosarcoma patients plays a crucial role in OS progression. Most of the top 10 GO terms were closely associated with immunity, including inflammatory response, T cell activation, humoral immune response, lymphocyte-mediated immunity, positive regulation of T cell activation, and regulation of leukocyte cell-cell adhesion; the top 10 terms also included axonemal dynein complex assembly (Figure
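The enrichment tool is not named here. Many over-representation tools test each GO term or KEGG pathway with a one-sided hypergeometric test, and the sketch below shows that generic test as an assumption, not as the study's exact procedure: given `N` background genes, `K` of which belong to the term, and `n_deg` DEGs, of which `k` fall in the term, it returns P(X ≥ k). The resulting p-values would then be corrected for multiple testing across terms.

```python
from math import comb


def hypergeom_enrichment_p(k, n_deg, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n_deg).

    N: background genes; K: background genes annotated to the term;
    n_deg: number of DEGs; k: DEGs annotated to the term.
    """
    upper = min(K, n_deg)
    favorable = sum(comb(K, i) * comb(N - K, n_deg - i)
                    for i in range(k, upper + 1))
    return favorable / comb(N, n_deg)


# All 5 DEGs falling into a 5-gene term out of 10 background genes: p = 1/252.
p = hypergeom_enrichment_p(5, 5, 5, 10)
```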
As lncRNAs whose expression is up- or downregulated by copy number alterations (CNAs) may act as driver lncRNAs in cancer, we analyzed the correlation between the expression level and the corresponding copy number status of the seven prognostic lncRNAs in the multivariable Cox model. We observed that
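The correlation method is not stated here. Because copy number status is ordinal and heavily tied, a rank correlation is a natural choice, and the sketch below computes a Spearman coefficient (Pearson correlation of average ranks) between copy number status and expression. The integer coding of copy number status (for example -1 = loss, 0 = neutral, 1 = gain, 2 = amplification) is a hypothetical example, and no p-value is computed.

```python
import math


def average_ranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman rank correlation with average ranks for ties."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical copy number status and expression of one lncRNA in 5 patients.
copy_status = [-1, 0, 0, 1, 2]
expression = [0.5, 1.0, 1.2, 3.0, 4.0]
rho = spearman(copy_status, expression)  # about 0.97
```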
The oncogenic driver lncRNA AC011442.1 and its functionality. (a) The expression patterns of AC011442.1 in OS patients with CNAs and in wild-type patients. (b) The predicted pathways in which AC011442.1 might participate.
To further investigate the biological functions of the four lncRNAs, we conducted gene set enrichment analysis on the protein-coding genes that were highly correlated with the identified lncRNAs. We found that
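One possible reading of this guilt-by-association step is sketched below. It is an assumption, not the documented procedure. Protein-coding genes are ranked by their Pearson correlation with an lncRNA across patients, and a gene set is scored with the classic GSEA running-sum enrichment score: the maximum deviation of weighted hits minus misses, using weight 1. Permutation-based significance, which a full GSEA would add, is omitted. The gene names in the example are placeholders.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rank_by_correlation(lnc_expr, coding_expr):
    """Sort protein-coding genes by Pearson r with the lncRNA, descending."""
    r = {gene: pearson(lnc_expr, values) for gene, values in coding_expr.items()}
    return sorted(r.items(), key=lambda kv: kv[1], reverse=True)


def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA running-sum ES: the walk's maximum deviation from zero."""
    gene_set = set(gene_set)
    hits = [(g, s) for g, s in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    hit_norm = sum(abs(s) ** weight for _, s in hits)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked genes")
    running = best = 0.0
    for g, s in ranked:
        if g in gene_set:
            running += abs(s) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("a", 0.9), ("b", 0.8), ("c", 0.1), ("d", -0.5)]
es = enrichment_score(ranked, {"a", "b"})  # 1.0: both genes sit at the top
```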
To further explore the immune cells and related markers associated with OS prognosis, we first examined the expression patterns of the immune markers. Specifically, the immune inhibitory genes such as
OS prognosis-related immune signatures. (a) The expression patterns of immune inhibitory receptors in the two risk groups. (b) The differential infiltration levels of CD8 T cells and activated NK cells. The high-risk and low-risk groups are labeled “high” and “low” and colored green and orange, respectively. (c) The expression profiles of the marker genes of CD8 T cells and NK cells. The samples are ordered by risk score.
The molecular mechanisms of OS involving protein-coding genes have been extensively studied over the past decades. Despite this extensive research, the roles of lncRNAs in OS tumorigenesis, progression, and metastasis remain poorly understood. Meanwhile, identifying prognostic lncRNAs involved in OS can facilitate the development of new diagnostic or therapeutic biomarkers.
In the present study, we collected 82 OS samples with RNA-seq-based gene expression data and their clinical information from the TARGET database. We found that 50 lncRNAs were significantly associated with patients’ survival by a univariable Cox regression model (
As lncRNAs dysregulated by copy number alterations (CNAs) may act as driver lncRNAs in cancer, we analyzed the correlation between the expression level and the corresponding copy number status of the seven prognostic lncRNAs to identify driver lncRNAs. Notably,
Because the comparison of molecular patterns between the two risk groups revealed that the DEGs were enriched in immune-related pathways, we then examined whether the abundance of immune cells and the expression of immune markers were associated with OS prognosis. Specifically, the immune inhibitors such as
In addition, the limitations of this study should be noted. First, the robustness of the multivariable Cox regression model needs to be validated on independent gene expression data. Second, although a list of dysregulated lncRNAs associated with OS survival was identified, experimental verification is still needed. Moreover, the detailed molecular functions of the identified dysregulated lncRNAs were not thoroughly examined in this study. We hope that, when validation datasets become available, we can further confirm our findings and perform experimental validation. In summary, the identification of novel prognostic lncRNAs in OS may not only improve our understanding of the lncRNAs involved in OS tumorigenesis and progression but also, to some extent, assist in predicting OS survival and developing molecularly targeted therapies, which may in turn benefit patients’ survival.
Previously reported gene expression and clinical data were used to support this study and are available from the TARGET (Therapeutically Applicable Research To Generate Effective Treatments) database (
The authors declare that they have no conflicts of interest.
Y.G. and Z.Y. designed this study. H.G., Y.G., and M.Z. conducted the experiments. H.G., Y.G., and Y.Z. contributed to writing the paper and preparing the figures. Hua Gao and Yuanyuan Guo contributed equally to this work.
Supplementary Table S1: the 50 prognostic long noncoding RNAs and statistical significance (