Nomogram Models Based on the Gene Expression in Prediction of Breast Cancer Bone Metastasis

Objective The aim of this study is to design a weighted co-expression network and build gene expression signature-based nomogram (GESBN) models for predicting the likelihood of bone metastasis in breast cancer (BC) patients. Methods Dataset GSE124647 was used as a training set, while GSE16446, GSE45255, and GSE14020 were taken as validation sets. In the training cohort, the limma package in R was adopted to obtain differentially expressed genes (DEGs) between BC nonbone metastasis and bone metastasis patients, which were used for functional enrichment analysis. After weighted co-expression network analysis (WGCNA), univariate Cox regression and Kaplan–Meier plotter analyses were performed to screen potential prognosis-related genes. Then, GESBN models were constructed and evaluated. The prognostic value of the GESBN models was investigated in the GSE124647 dataset, which was validated in GSE16446 and GSE45255 datasets. Further, the expression levels of genes in the models were explored in the training set, which was validated in GSE14020. Finally, the expression and prognostic value of hub genes in BC were explored. Results A total of 1858 DEGs were obtained. The WGCNA result showed that the blue module was most significantly related to bone metastasis and prognosis. After survival analyses, GAJ1, SLC24A3, ITGBL1, and SLC44A1 were subjected to construct a GESBN model for overall survival (OS). While GJA1, IGFBP6, MDFI, TGFBI, ANXA2, and SLC24A3 were subjected to build a GESBN model for progression-free survival (PFS). Kaplan–Meier plotter and receiver operating characteristic analyses presented the reliable prediction ability of the models. Cox regression analysis further revealed that GESBN models were independent prognostic predictors for OS and PFS in BC patients. Besides, GJA1, IGFBP6, ITGBL1, SLC44A1, and TGFBI expressions were significantly different between the two groups in GSE124647 and GSE14020. The hub genes had a significant impact on patient prognosis. Conclusion Both the four-gene signature and six-gene signature could accurately predict patient prognosis, which may provide novel treatment insights for BC bone metastasis.


Introduction
Breast cancer (BC) is one of the most prevalent malignancies and the major cause of cancer-associated deaths of women worldwide [1]. BC is considered to have the highest diagnostic rate in cancer, with more than 1.6 million new cases detected a year, accounting for approximately one-third of all new cancers in women [2]. Substantial improvements in prognosis have been achieved due to better therapeutic approaches over the past 20 years, the overall survival (OS) of BC has increased whereby metastases have become the major cause of death [3]. According to statistics, 627,000 individuals died from BC in 2018, while 684,996 deaths occurred with BC in 2020 [4,5]. e median OS of patients with metastatic BC ranges from 2 to 3 years, with a 5-year OS rate of 27% [6].
Bone is the common site of metastases, and nearly, 70% of BC patients developed bone metastasis, leading to osteolytic and osteoblastic cancers [7]. Tumor cells secreted factors including the parathyroid hormone in the bone to create an environment conducive to osteolysis instead of the direct destruction of bone [8]. In addition, bone metastasis often contributes to adverse skeletal-related events (SLEs) such as hypercalcemia, nerve root or spinal cord compression, fractures, and pain, which severely affect the quality of life in BC patients [9]. e biggest obstacle to good outcomes in bone metastases is the lack of appropriate therapeutic strategies in the management of tumor-induced SREs [10]. Bone metastases are often challenging because therapies effectively developed against the primary tumor are not satisfactory when used in patients with bone metastases [3]. Currently, magnetic resonance imaging, computed tomography, and X-ray are conventional imaging methods to detect bone metastasis but fail to sense tiny tumor masses and negligible tumor-induced osteolysis [11]. Besides, various metastatic bone lesions which resulted from BC are hard to eradicate by adjuvant localized radiotherapy or surgical intervention [12]. erefore, it is imperative to identify risk factors and develop predictive models of bone metastases and patient survival to improve the diagnosis and prognosis of bone metastatic BC patients.
High-throughput microarrays have emerged as a promising and efficient tool for studying the complex pathogenesis of human diseases, including cancer [3]. e gene expression also represents an essential role in the prognosis of patients, thus providing clinically relevant information and targeted therapies [13,14]. Meng et al. constructed a four-long noncoding RNA signature in predicting BC survival based on microarray datasets [15]. Zhao et al. built a gene-expression signature-based nomogram model for the prediction of BC bone metastases, but they did not evaluate the prognostic value of the model [16]. Additionally, a clinical nomogram was constructed to predict bone-only metastasis in patients with early BC [17]. However, the predictive value and independent prognostic significance of the gene expression signature-based nomogram (GESBN) model in bone metastatic BC patients have not been fully elucidated.
On this basis, we constructed a weighted co-expression network using the whole gene expression profile and performed survival analysis to construct predictive nomogram models for bone metastasis that can be used to predict patient OS and progression-free survival (PFS). Besides, the hub genes combined with clinicopathological characteristics were integrated into the nomogram for predicting the occurrence of bone metastases. In addition, we assessed the clinical benefits of the GESBN models and explored their prognostic value in training and validation cohorts. Finally, the expression of the genes in GESBN models and the expression and prognostic value of hub genes were initially explored and validated.

Data Mining from the Gene Expression Omnibus (GEO)
Database.
e GEO database (https://www.ncbi.nlm.nih. gov/geo/) was used to obtain the BC microarray dataset by setting the following filters: (1) more than 50 samples with BC or bone metastasis information; (2) with survival data; and (3) with expression profiling data. Finally, the GSE124647 dataset was chosen as a training set to identify the DEGs between non-bone metastasis and bone metastasis samples.
e platform was Affymetrix Human Genome U133A Array (GPL96). In total, there were 140 samples containing clinical and RNA-seq expression data in the GSE124647 was selected as training cohort. Besides, 107 samples containing OS and RNA-seq expression data in GSE16446, 94 samples containing PFS and RNA-seq expression data in GSE45255 were taken as validation cohorts to verify the prognostic value of GESBN models. Sixty-five samples containing expression data in the GSE14020 dataset were used as validation cohorts to verify the expression levels of key genes. Normalized gene expression was measured as log2-based transformation.

Identification and Functional Enrichment Analysis of DEGs.
e R package limma was used to screen the DEGs between BC non-bone metastasis and BC bone metastasis groups in the training cohort. |log 2 FC| > 1 and P-value < 0.05 were set as the filtering parameters. en, Gene Ontology (GO) including biological process (BP), cellular component (CC), and molecular function (MF), and Kyoto Encyclopedia of Genes and Genome (KEGG) were carried out to determine the major biological functions of these DEGs in the database for annotation, visualization, and integrated discovery (DAVID) (https://david.ncifcrf. gov/summary.jsp). P < 0.05 was considered statistically significant.

WGCNA.
WGCNA is a systemic method that uses gene expression data to build a scale-free network [18]. A weighted co-expression network with the expression profile data of the DEGs was built using the WGCNA package of R. Following this, we screened the key module related to BC bone metastasis and prognosis, and then extracted the genes for further analysis.

Nomogram Model Construction and Model Effectiveness
Evaluation. In the training cohort, using the "survival" package in R, univariate Cox regression analysis was performed to obtain the potential prognostic genes related to OS or PFS. Only genes that had a significant impact on OS or PFS were considered to pass univariate Cox regression analysis screening. In addition, the prognostic value of the significant genes obtained in the univariate Cox regression analysis was evaluated by the Kaplan-Meier plotter analysis. Only genes with statistical significance in OS or PFS analyses were considered to pass the screening. e intersected genes generated in univariate Cox regression and Kaplan-Meier plotter analyses were then entered into the construction of GESBN models in terms of OS and PFS using the "rms" package in R. e calibration curves were drawn to measure the performance of the models. e genes which had the greatest contribution were selected as hub genes. Besides, a decision curve analysis (DCA) was performed to assess the clinical net benefit of different models. Further, the hub genes combined with clinicopathological factors were included in the construction of a nomogram for predicting the occurrence of bone metastasis in BC.
After that, the patients were divided into high-risk or low-risk groups using the optimal cut-off value of risk score, which was calculated by the "MaxStat" package in R.
e Kaplan-Meier plotter analyses were adopted to assess the survival difference between the two groups using "survfit" function of "survival" package in R. Moreover, the Cox and ROC analyses were conducted to further evaluate the prognostic value of the GESBN models in training cohort. Subsequently, we verified the prognostic significance of the GESBN models in the validation cohorts. e same method was conducted to compute risk scores like that in the training cohort. e Kaplan-Meier, Cox and ROC analyses were implemented as described earlier. P < 0.05 was considered as significantly different. e area under curve (AUC) was used as an indicator of prognostic accuracy.

e Expression Levels of Prognostic Genes in Nomogram
Models. e expression levels of key genes between BC nonbone metastasis and BC bone metastasis groups in GSE124647 were first explored using t test. en, GSE14020 as a validation dataset was used to assess the differential expression of the key genes in two groups.

Validation of the Expression and Prognostic Value of the Hub Genes.
e protein levels of the hub genes in BC and normal tissues were evaluated using the immunohistochemistry according to the manufacturer's instructions. e Kaplan-Meier plotter (http://kmplot.com/analysis/index. php?p�background) is capable to assess the effect of 54,000 genes on survival in 21 cancer types. We used this database to verify the prognostic significance of the hub genes in BC. Survival curves were generated by the Kaplan-Meier method using the log-rank test. A log-rank P value less than 0.05 was statistically significant. Figure 1 shows the flowchart of this study.

Identification and Functional Enrichment of DEGs.
Taking BC nonbone metastasis samples as a control group, 1858 DEGs in the training set including 992 upregulated and 866 downregulated genes were generated according to the selection criteria. e volcano plot and heat map of the DEGs are presented in Figures 2(a), and 2(b), respectively.
To have a biological understanding of these DEGs, they were subjected to the DAVID database for GO annotation and KEGG pathway enrichment analysis. e top enriched GO terms in BPs were signal transduction, positive regulation of transcription from RNA polymerase II promoter, and immune response, and those in CCs were cytoplasm, cytosol, and extracellular exosome (Figures 3(a) and 3(b)). e major MFs were protein binding, Poly (A) RNA binding, and identical protein binding (Figure 3(c)). In the KEGG pathway enrichment analysis, these genes were mainly involved in the MAPK signaling pathway, proteoglycans in cancer, and focal adhesion (Figure 3(d)). e detailed information for enrichment of GO and KEGG is shown in Table 1.

WGCNA.
We incorporated the expression profile of integrated DEGs with clinical traits of the BC samples to construct a gene co-expression network. Clinical characteristics including sample group, PFS time, OS time, OS status, and PFS status were clustered with an expression matrix (Figure 4(a)). en, we chose the optimal β � 6 to ensure that network was scale-free (β was a soft-thresholding parameter that could emphasize strong correlations between genes and penalize weak correlations). After choosing the power of 2, the adjacency was transformed into a topological overlap matrix (TOM), which could measure the network connectivity of a gene defined as the sum of its adjacency with all other genes for the network gene ration, and the corresponding dissimilarity (1-TOM) was calculated ( Figure 4(b)). Based on TOM, the average linkage hierarchical clustering was conducted to cluster genes by setting the minimum number of genes for each gene network module to 30. To further analyze the module, we calculated the eigen genes of each module and merged the modules by setting a height of 0.25. Finally, 4 modules were acquired (Figures 4(c) and 4(d)). e genes in the grey module could not be incorporated into any other module. Next, Pearson's correlation coefficients of the module eigen gene of each module and the sample characteristics were calculated. e blue module with 76 genes was closely related to bone metastasis and survival status (Figure 4(e)). us, the genes in the blue module were chosen for further analysis.

Construction of the GESBN Model.
Univariate Cox regression and Kaplan-Meier plotter analyses were carried out on 140 patients in the GSE124647 to evaluate the association of 76 gene expression profiles in the blue module with patient OS and PFS. In univariate Cox regression analysis, significant genes related to OS were SLC44A1, SLC24A3, PDGFC, ITGBL1, and GJA1 ( Figure S1A) (all P < 0.05). Ten genes including MDFI, IGFBP6, GJA1, ANXA2, SLC24A3, TGFBI, CELA2A, CLEC11 A, PPEF2, and SLC44A1 were notably linked to PFS (Figure S1B) (all P < 0.05). However, only four genes related to OS, and six genes related to PFS with statistical differences were extracted in the Kaplan-Meier plotter analysis (Figures S2 and S3) (all P < 0.05). Taken together, GAJ1, SLC24A3, ITGBL1, and SLC44A1 were defined as potential prognostic genes for OS. GJA1, IGFBP6, MDFI, TGFBI, ANXA2, and SLC24A3 were potential genes correlated with PFS. ese prognostic genes were then subjected to the construction of nomogram models based on OS and PFS (Figures 5(a) and 5(b)). For OS, SLC44A1 had the greatest contribution, which could reach 100 points, while MDFI contributed most to PFS. erefore, SLC44A1 and MDFI were considered as hub genes. To ensure the accuracy of the GESBN models, the calibration curves for OS and PFS are shown in Figure S4. e DCA results showed that using the nomogram models to predict OS and PFS of patients could increase net benefit compared with other models (Figures 5(c) and 5(d)). For further exploration, we integrate the hub genes and clinicopathological characteristics into the nomogram for predicting the occurrence of bone metastases in BC. As shown in Figure 5(e), progesterone receptor positive (PR+) and early stage increased the bone metastasis risk in BC, indicating that bone metastases more frequently occurred in less aggressive and earlier stage BC patients.

Evaluation of the GESBN Models.
e GESBN score was calculated for each patient in the training set. Patients were ranked based on their risk scores and assigned into two groups as high-risk and low-risk of bone metastases. Kaplan-Meier survival analysis in the training cohort showed that the OS rate of patients in the high-risk group was low, and the difference between the two groups was statistically significant (Figure 6(a)) (P < 0.001). Similarly, an unfavorable PFS was observed in the high-risk group patients ( Figure 6(b)) (P < 0.001), suggesting that two GESBN models could predict survival well. Further, the time-dependent ROC curves were drawn using the pROC package in R. In terms of OS, the AUCs of the 3-and 5year survival rates were 0.62, and 0.72, respectively ( Figure 6(c)). For PFS, the AUCs of the 3-and 5-year survival rates were 0.88, and 0.94, respectively (Figure 6(d)), indicating that GESBN models had a good predictive ability. We further evaluate the efficacy of the GESBN models in predicting OS and PFS in the validation cohorts. Consistent with the previous results, patients in    Journal of Healthcare Engineering the high-risk group had significantly shorter OS and PFS time than that in the low-risk group (Figures 7(a) and  7(b)). e time-dependent ROC analyses also showed that the GESBN models had favorable performance in predicting OS and PFS (Figures 7(c) and 7(d)).
Moreover, to investigatealyses were carried out in the training and validation cohorts. In the training cohort, univariate analysis exhibited that progesterone receptor (PR) status and four-gene risk score were significantly related to OS, while PR status, prior endocrine sensitivity, and six-gene risk score had close relationship with PFS (all P < 0.05). In the multivariate analysis, GESBN models were independent predictors for OS (HR � 2.289, 95% CI: 1.253-4.180, P < 0.01) and PFS (HR � 2.624, 95% CI: 1.757-3.919, P < 0.001) ( Table 2). Consistently, the GESBN models displayed pronounced capability in predicting OS and PFS in the validation cohorts (all P < 0.05) ( Table 3). ese results suggested that the GESBN models were independent variables.   Journal of Healthcare Engineering Following this, ROC analyses were conducted to evaluate how the GESBN models behaved in predicting prognosis. e results showed that the AUC of the fourgene risk score model performed on OS in the training cohort was 0.691, which was better than that of PR status, prior endocrine sensitivity, and stage (0.524, 0.540, and 0.503, respectively) (Figure 8(a)). In the prediction model of PFS predicted in the training cohort, the six-gene risk score also exhibited a powerful ability with AUC � 0.758, which was superior to other variables (Figure 8(b)). e same results were observed in the validation cohorts (Figures 8(c) and 8(d)).

e Expression Levels of Genes in GESBN Models.
Due to the predictive ability of GESBN models for both OS and PFS, we explored the expression levels of these key genes. In the training dataset of GSE124647, the expression levels of all the prognosis-related genes were significantly different between control and bone-     metastasis groups (Figure 9(a)) (all P < 0.05). In the validation dataset of GSE14020, GJA1, IGFBP6, ITGBL1, SLC44A1, and TGFBI expressions in the bone-metastasis group were different from those in the control group. e differences were statistically significant (Figure 9(b)) (P < 0.05).

Validation of the Expression and Prognostic Value of Hub
Genes. Based on the GESBN result, SLC44A1 and MDFI were the hub genes. e immunohistochemistry images showed that the protein levels of SLC44A1 were higher in BC tissue than that in normal breast tissue (Figures 10(a) and  10(b)). Similarly, an elevated MDFI expression was observed in the BC tissue compared with the normal breast tissue (Figures 10(c) and 10(d)). e Kaplan-Meier plotter was performed to verify the effect of SLC44A1 and MDFI on OS, PFS, and DMFS in BC. Patients in the high SLC44A1 expression group tended to have favorable OS, PFS, and DMFS (Figures 11(a)-11(c)) (P < 0.05). Although the MDFI expression was not significantly linked to OS and DMFS of the BC patients (Figures 11(d) and 11(f )) (P > 0.05), its high expression predicted worse PFS (Figure 11(e)) (P < 0.01). ese results indicated that SLC44A1 and MDFI might be potential biomarkers for BC.

Discussion
BC is a heterogenous tumor driven by various molecular progression pathways [19]. Analyses of BC progression showed that bone is the first metastatic site of this disease possibly due to the favorable chemokine milieu or microenvironment in the bone, as well as the intrinsic molecular characteristics of cancer cells [17,20]. Although these hypotheses are promising, biological information and anatomical characteristics are still the basis for clinicians to determine prognosis; however, the predictors of bone metastasis remain uncertain clinically [17,21]. Some gene signature-based prognostic prediction models for BC patients have been reported via repurposing and analysis of microarray data [22,23]. ese models were built for predicting OS for BC patients but lack the prediction of bone metastasis. By using GEO accession number GSE124647, we obtained 1858 DEGs between BC nonbone metastasis and bone metastasis groups. After screening the prognosis-related genes, we constructed a four-gene expression signature-based nomogram model and a six-gene expression signature-based nomogram model. We first conducted a differential analysis of the GSE124647 dataset in relation to BC bone metastasis and employed functional enrichment analysis to these DEGs, which were found to be mainly related to signal transduction, and positive regulation of transcription in terms of BP.
CCs were mainly enriched in cytoplasm and cytosol. MFs were mainly protein binding, and Poly (A) RNA binding. e potential pathways that they were involved in were MAPK signaling pathway and proteoglycans in cancer. Based on WGCNA, 76 genes in the blue module were initially selected for the following prognostic analysis. After univariate Cox regression and Kaplan-Meier plotter  analyses, OS nomogram including GJA1, SLC24A3, ITGBL1, and SLC44A1, and PFS nomogram including GJA1, IGFBP6, MDFI, TGFBI, ANXA2, and SLC24A3 were constructed. Among them, SLC44A1 and MDFI were considered as hub genes. Since BC is a molecularly heterogenous disease which similar tumors form various clinical outcomes and metastases patterns [24]. It has been demonstrated that BC luminal subtype and patients at lowgrade are more prone to develop metastases [16]. Considering the organ-specific tendency of metastasis regarding molecular subtypes, we constructed a model containing hub genes and clinicopathological characteristics for predicting the occurrence of BC bone metastases. Consistently, we found that PR+ and early stage increased the bone metastatic    risk of BC patients. us, the authors speculated that despite the development of bone metastases, the patient's tumor is still of low malignancy and the clinical outcome might be improved if the bone lesions are well controlled. en, Kaplan-Meier plotter, Cox and ROC analyses confirmed the reliable and superior prognostic ability of GESBN models. AUC can be used to assess the accuracy and predictive capacity of biomarkers in diagnostic tests [25].
After confirming the predictive value of nomograms, we validated the expression and prognostic significance of hub genes in the models. Expectedly, SLC44A1 and MDFI protein levels were higher in BC tissues than those in normal breast tissues. We found that the high SLC44A1 expression was significantly related to favorable OS, PFS, and DMFS. However, patients in the high MDFI group predicted worse PFS. e solute carrier (SLC) superfamily contains various  membrane-bound transporters which are required to transport a wide variety of substrates over biological membranes, and the dysregulated expression of these transporters may be related to cancer metastasis. SLCO1B1 was found to be highly expressed in colon cancer, and its expression level was significantly associated with the degree of differentiation in this type of cancer [26]. SLCO1B3 overexpression may be linked to hormone-dependent growth mechanisms, and the expression of this transporter could serve as a valid prognostic factor for BC [27]. As a member of the SLC superfamily, SLC44A1 is a mitochondrial protein mediating choline transport and is preferentially expressed in neurons and oligodendrocytes [28]. Besides, high activity of the SLC44A1 promoter has been proved to participate in the occurrence of papillary glioneuronal tumors [29]. Our study revealed an important finding that high SLCO4A1 expression contributed to the favorable clinical outcome for BC metastasis patients. MDFI is a transcription factor that negatively regulates myogenic family proteins [30]. A previous study has demonstrated that the loss of MDFI was related to human BC and myeloid neoplasm via negative regulation of the Wnt pathway [31]. In this study, high MDFI expression led to poor PFS for patients with metastatic BC.

Conclusions
Based on the construction of a weighted co-expression network for DEGs between BC nonbone metastasis and bone metastasis, we screened the key module and related genes to investigate a prognostic nomogram model for bone metastatic BC. e study provided some potent biomarkers of BC bone metastasis and enabled the prediction of patient survival. We also found that SLC44A1 and MDFI were the hub genes in BC bone metastasis, which might be the therapeutic targets for this disease.
Data Availability e dataset used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Disclosure
A preprint has previously been published [32].