Development and Validation of a Novel Clinical Prediction Model to Predict the Risk of Lung Metastasis from Ewing Sarcoma for Medical Human-Computer Interface

Background This study aimed at establishing and validating a quantitative and visual prognosis model of Ewing Sarcoma (E.S.) via a nomogram. This model was developed to predict the risk of lung metastasis (L.M.) in patients with E.S. to provide a practical tool and help in clinical diagnosis and treatment. Methods Data of all patients diagnosed with Ewing sarcoma between 2010 and 2016 were retrospectively retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. A training dataset from the enrolled cohorts was built (n = 929). Predictive factors for L.M. were identified based on the results of multivariable logistic regression analyses. A nomogram model and a web calculator were constructed based on those key predictors. A multicenter dataset from four medical institutions was established for model validation (n = 51). The predictive ability of the nomogram model was evaluated by the receiver operating characteristic (ROC) curve and calibration plot. Decision curve analysis (DCA) was applied to explain the accuracy of the nomogram model in clinical practice. Results Five independent factors, including survival time, surgery, tumor (T) stage, node (N) stage, and bone metastasis, were identified to develop a nomogram model. Internal and external validation indicated significant predictive discrimination: the area under the ROC curve (AUC) value was 0.769 (95% CI: 0.740 to 0.795) in the training cohort and 0.841 (95% CI: 0.712 to 0.929) in the validation cohort, respectively. Calibration plots and DCA presented excellent performance of the nomogram model with great clinical utility. Conclusions In this study, a nomogram model was constructed and validated to predict L.M. in patients with E.S. for medical human-computer interface—a web calculator (https://drliwenle.shinyapps.io/LMESapp/). This practical tool could help clinicians make better decisions to provide precision prognosis and treatment for patients with E.S.


Introduction
Ewing sarcoma (E.S.) is the second most frequent primary malignant tumor among children and adolescents, especially in the age group of 4 to 15 years; it was first reported by James Ewing in 1921 [1-4]. E.S. is usually caused by a chimeric fusion oncogene; the most common one (80-85%) is t(11;22)(q24;q12) [5,6]. As a result, the multimodal therapeutic approaches involving a combination of chemotherapy, surgery, and radiotherapy for local control were

Materials and Methods
The SEER database is publicly available, and the patients are anonymous. Hence, the informed consent of patients was not required for this study [21,34].

Patients and Data Collection.
This retrospective study extracted data on E.S. patients who were diagnosed and treated between 2010 and 2016 from the SEER database as the training cohort, using SEER*Stat (8.3.5) software. The inclusion criteria were as follows: (1) diagnosis of E.S. with ICD-O-3/WHO 2008 morphology code 60 and (2) complete clinical information. The exclusion criteria were as follows: (1) clinicopathological information or survival time was missing or unavailable and (2) cases with other primary tumor diseases and unknown metastatic status. The demographic and clinical variables of age, race, survival time, primary site, laterality, T stage, N stage, surgery, radiation, chemotherapy, and distant metastasis were recorded from the SEER database using SEER*Stat (8.3.5) software.
Four medical institutions, including the Second Affiliated Hospital of Jilin University, the Second Affiliated Hospital of Dalian Medical University, Liuzhou People's Hospital, and Xianyang Central Hospital, provided external validation. In each institution, three investigators were responsible for the acquisition and processing of data for the external validation. Two of them extracted data, and the third conducted the accuracy checks. All data were checked for consistency and sorted by date using Microsoft Excel (Microsoft Excel, 2013, Microsoft, Redmond, USA).

Construction, Validation, and Clinical Utility of a Nomogram.
Patients from the SEER database for 2010-2016 were taken as the training cohort (n = 929), and patients from the multicenter dataset were taken as the validation cohort (n = 51).
We compared clinicopathological characteristics of the training cohort and the validation cohort using the chi-square test. We assessed variables that predicted L.M. in E.S. patients by univariate logistic regression analysis. Subsequently, multivariate logistic regression analysis was used to evaluate each variable at a 0.05 significance level, and the independent factors associated with L.M. were obtained. Based on the multivariable logistic regression analysis, a nomogram was constructed in the training set. We plotted the receiver operating characteristic (ROC) curves and calculated the area under the ROC curve (AUC) to evaluate the prediction accuracy of the nomogram. The agreement between the actual probability and the predicted probability was verified by calibration curves. Moreover, decision curve analysis (DCA) was used to evaluate the clinical utility and value of the nomogram.
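As an illustration of the discrimination measure described above, the AUC can be read as the probability that a randomly chosen patient with L.M. receives a higher predicted risk than a randomly chosen patient without L.M. The following is a minimal sketch of that pairwise definition in Python. The function name and the six example cases are hypothetical and are not taken from the study, which used R for these analyses.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for an event (e.g. lung metastasis), 0 otherwise.
    scores: predicted risks; tied pairs count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical illustration data, not from the SEER or validation cohorts.
example_labels = [1, 0, 1, 0, 0, 1]
example_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
example_auc = auc_from_scores(example_labels, example_scores)
print(f"AUC = {example_auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of this size; rank-based formulas give the same value more efficiently.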

Statistical Analysis
Continuous and categorical variables are expressed as mean ± SD and frequency, respectively. All statistical tests, including the t-test, chi-square test, Kaplan-Meier analysis, and logistic regression analysis, were conducted in SPSS Statistics software (version 26.0, SPSS Inc., Chicago, USA). R software (version 4.0.5) was used to produce the nomogram, receiver operating characteristic (ROC) curves, calibration plots, and DCA curves. Results with a P value less than 0.05 were considered statistically significant, and 95% confidence intervals (CIs) were applied for all analyses.
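To make the chi-square comparison of categorical variables concrete, the sketch below computes the uncorrected Pearson statistic for a 2x2 table and its P value with one degree of freedom. It is an illustration only: the counts are hypothetical, the function name is an assumption, and no continuity correction is applied, so the result may differ from corrected values reported by statistical packages.

```python
import math


def pearson_chi_square_2x2(table):
    """Uncorrected Pearson chi-square test for a 2x2 contingency table.

    table: ((a, b), (c, d)) of observed counts.
    Returns (statistic, p_value) with one degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a nonzero total")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts (rows: cohort, columns: category), not study data.
chi2_stat, chi2_p = pearson_chi_square_2x2(((10, 20), (30, 40)))
print(f"chi-square = {chi2_stat:.3f}, P = {chi2_p:.3f}")
```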

Demographic Baseline Characteristics.
A total of 980 patients were enrolled in this study. Of these, 929 patients from the SEER dataset were assigned to the training cohort and 51 patients from multiple centers were assigned to the validation cohort. Results of the t-test and the chi-square test indicated that there was no statistically significant difference between the training and validation groups in L.M., age, survival time, sex, primary site, laterality, N stage, surgery, chemotherapy, and bone metastasis at the 0.05 significance level (Table 1, P > 0.05), but there was a significant difference in race, radiation, and T stage ( Figure 1(a)). Meanwhile, we designed a medical human-computer interface, an online web calculator (https://drliwenle.shinyapps.io/LMESapp/), to evaluate the risk of L.M. for each patient. We found that the N stage had the greatest impact on L.M., and surgery had the smallest impact (Figure 1(a)). The AUC in internal validation and external validation was 0.769 (95% CI: 0.740 to 0.795) and 0.841 (95% CI: 0.712 to 0.929), respectively, indicating that the nomogram has a good discriminative ability to assess the status of L.M. (Figures 1(b) and 1(c)). The calibration curve of the nomogram revealed good consistency in the training and validation cohorts (Figures 2(a) and 2(b)). The results of the training and validation cohorts consistently showed that the prediction ability of the nomogram was higher than that of a single factor (Figures 2(a) and 2(b)). The results of the validation set suggested that the new model had significantly improved accuracy and reliability for cancer prediction compared with the single factor, as shown in Table 4.
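The way a nomogram turns a logistic model into a point scale can be sketched as follows: each category's coefficient is rescaled so that the predictor with the widest coefficient range spans 0 to 100 points, and the predicted risk is the logistic transform of the summed coefficients. All coefficients and the intercept below are hypothetical placeholders chosen for illustration (they are not the fitted values from Table 3), and only three of the five predictors are shown.

```python
import math

# Hypothetical log-odds coefficients; the reference level of each predictor is 0.0.
# These are NOT the study's fitted values.
COEFS = {
    "n_stage": {"N0": 0.0, "N1": 1.6, "NX": 0.35},
    "surgery": {"yes": 0.0, "no": 0.5},
    "bone_metastasis": {"no": 0.0, "yes": 1.0},
}
INTERCEPT = -2.0  # hypothetical


def nomogram_points(coefs):
    """Rescale coefficients so the widest predictor spans 0 to 100 points."""
    ranges = {
        var: max(levels.values()) - min(levels.values())
        for var, levels in coefs.items()
    }
    widest = max(ranges.values())
    return {
        var: {
            level: 100.0 * (beta - min(levels.values())) / widest
            for level, beta in levels.items()
        }
        for var, levels in coefs.items()
    }


def predicted_probability(coefs, intercept, patient):
    """Logistic transform of the linear predictor for one patient."""
    linear_predictor = intercept + sum(coefs[var][patient[var]] for var in coefs)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


points = nomogram_points(COEFS)
patient = {"n_stage": "N1", "surgery": "no", "bone_metastasis": "yes"}
total_points = sum(points[var][patient[var]] for var in COEFS)
risk = predicted_probability(COEFS, INTERCEPT, patient)
print(f"total points = {total_points:.1f}, predicted risk = {risk:.3f}")
```

With these placeholder values the N stage axis is the longest and the surgery axis the shortest, which mirrors the qualitative ordering reported above without reproducing the actual scale.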

Clinical Utility of the Nomogram.
The Kaplan-Meier survival curves of the overall survival (O.S.) of the total 980 patients were plotted (Figure 3). The results unveiled that Meanwhile, we observed that the model had good clinical utility in predicting lung metastasis in E.S. patients in both the training and the validation cohorts. The net benefit in the training cohort was slightly higher than that in the validation cohort, which might be due to the limited size of the validation cohort (Figures 4(a) and 4(b)).
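The net benefit plotted in a decision curve can be illustrated with its standard definition: at a threshold probability pt, net benefit equals the true-positive rate per patient minus the false-positive rate per patient weighted by pt / (1 - pt). The sketch below is a minimal Python version with hypothetical labels and predicted risks; function names are assumptions, and the study itself produced its DCA curves in R.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical illustration data, not from the study cohorts.
dca_labels = [1, 0, 1, 0, 0, 1, 0, 0]
dca_probs = [0.8, 0.3, 0.6, 0.55, 0.1, 0.2, 0.05, 0.4]
nb_model = net_benefit(dca_labels, dca_probs, 0.5)
nb_all = net_benefit_treat_all(dca_labels, 0.5)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}, treat none: 0.000")
```

A decision curve repeats this calculation over a range of thresholds and compares the model against the treat-all and treat-none strategies.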

Discussion
Ewing sarcoma (E.S.) is a rare high-cell malignant round-cell tumor of bone, which occasionally occurs in soft tissue and extra-bone tissue. E.S. is characterized by dissemination and micro-metastasis that cannot be detected by clinical imaging such as CT, PET-CT, or MRI [30]. The most common metastatic site is the lung, followed by distant bone [14]. Previous researchers [10,12,14,15,17,20,21,35-37] constructed and validated nomograms to predict metastasis, overall survival, and cancer-specific survival in patients with E.S. However, it is innovative to establish a nomogram model that combines data from the SEER database and four independent medical centers to estimate the risk of L.M. in E.S. from key predictors. Moreover, we designed a medical human-computer interface (web calculator) as a practical tool for clinicians, using an ML algorithm to predict the risk outcomes of patients. ML has the advantage of being highly capable, objective, and repeatable in processing large datasets and reliable data [38-41]. This artificial intelligence-based strategy can be exploited by clinicians to help them select more rational treatment responses [42-45].
In this study, five independent factors (survival time, T stage, N stage, surgery, and bone metastasis) were identified as being associated with L.M. In addition, a nomogram model was built and validated to accurately predict metastasis in patients with E.S.
According to the results of logistic regression, we found that survival time was negatively associated with L.M. as an independent protective factor for the incidence of L.M. in patients with E.S. A study by Leavey et al. indicated that 79% of patients experienced the first recurrence within two years of initial diagnosis. Approximately 30% of these recurrences were in the lungs, based on 262 cases [46]. Of the independent risk factors for poor prognosis, metastasis appears to be the most common [46], and it could cause more deaths in a short time. This suggests that once the tumor is well controlled and less likely to metastasize, patients have longer survival.
Furthermore, this paper showed that T stage had a negative effect on the occurrence of L.M. in E.S. patients. Table 3 demonstrates that the size of the tumor contributed most to the nomogram, and in the 1990s, researchers had already shown that tumor size is an independent factor for primary metastasis [47], which is consistent with subsequent studies [14,17,23,27,35,48,49]. Regarding tumor size, Ramkumar et al. showed that tumors with a diameter greater than 118 mm increased the incidence of L.M. by nearly threefold [12]. Ye et al. explained that tumors of 80 mm are prone to metastasis [13]. The relationship between tumor size and metastasis is worth further study. The rationale can be explained by the fact that large tumors have invaded into surrounding soft tissues, where lymphatics and blood vessels are abundant, promoting the occurrence of lung metastasis [42]. In addition, it is difficult to conduct sufficient surgical resection and acquire proper margins [13,36], which highlights the significance of early detection of E.S. Unfortunately, early diagnosis remains a huge challenge for both patients and doctors as many tumors are painless [50,51].
Our study indicated that N stage is the most significant predictor for L.M. in patients with E.S. Approximately 30.8% (57/185) of patients with L.M. had N1 and NX status in this study, and the rate of lymph node involvement is 8.2% (80 patients) which is higher than 6.3% in previous studies [52]. According to the results of Table 3, the risk ratios for L.M. in N1 and Nx patients were 4.953 and 1.41, respectively, compared with patients without lymph node metastasis. Because lymphatic vessels are not present in the bone [50], lymph node metastasis and L.M. were more likely to occur when E.S. had invaded into surrounding soft tissues. Given that regional lymph node involvement can be an independent adverse prognostic factor and it is more likely to metastasize [52], FDG-PET scan [53] and even biopsies are recommended for suspected patients with lymph node metastasis.
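If the ratios quoted above from Table 3 are read as odds ratios from the multivariable logistic model (an assumption; the text calls them risk ratios), each one corresponds to a regression coefficient through the natural logarithm. The symbols below are introduced here only for illustration:

```latex
\begin{align*}
\mathrm{OR}_{k} &= e^{\beta_{k}}
  \quad\Longleftrightarrow\quad \beta_{k} = \ln \mathrm{OR}_{k}, \\
\beta_{\mathrm{N1}} &= \ln 4.953 \approx 1.600, \\
\beta_{\mathrm{NX}} &= \ln 1.41 \approx 0.344.
\end{align*}
```

Under this reading, N1 status multiplies the odds of L.M. by about 4.95 relative to N0 with the other predictors held fixed, which is a statement about odds rather than about probability itself.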
In addition, the results of logistic regression also revealed that surgery was an independent protective factor. We found that 89% of patients who underwent surgery had no L.M., and only 11% of them had L.M. For patients with E.S., distant metastasis is the main cause of relapse. Modern treatments with more aggressive surgical approaches can prevent distant metastases and improve local control, where the disease-free survival for patients with localized disease may be close to 70% [54,55]. Surgery has been shown to be significantly associated with O.S. [15,32]. There is no doubt that surgery is one of the most successful and vital strategies for the treatment of E.S.
In the present study, patients with bone metastasis had a higher tendency to develop L.M., which was consistent with the statistical results of logistic regression. Approximately 30.8% (57/185) of patients with L.M. had bone metastasis in this study. The most common site of metastasis is the lung, followed by bones [2,4]. Patients with bone metastasis alone had a worse prognosis than those with L.M. exclusively [3,4,13,56]. Owing to the aggressiveness of extra-pulmonary metastasis, once bone metastasis occurs, the tumor is prone to metastasize to the lungs [3,13,48]. Therefore, bone metastasis is a key manifestation leading to L.M.
Several limitations of this study should be considered. First, as a retrospective study, potential bias cannot be ignored. Second, a host of factors probably related to L.M. should be included, such as carcinoembryonic antigen (CEA), surgical margin status, detailed plan of radiotherapy and chemotherapy, and vascular invasion. Third, the sample size of the external validation cohort was too small.

Conclusion
We comprehensively assessed the predictors of L.M. in E.S. based on a dataset from the SEER database and four independent medical institutions. A novel nomogram model was constructed to improve prediction of the risk of L.M. and to guide clinicians in individualized precision treatment, which is helpful for follow-up management.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Ethical Approval
The study was conducted in accordance with the guidelines of the Declaration of Helsinki and Good Clinical Practice as well as Chinese regulations. Meanwhile, the study was approved by the Institutional Ethics Committee of Xianyang Central Hospital (Ethics Committee number: 20210022).
Considering that this work was a retrospective study, the ethics committee waived the requirement for informed patient consent.
The study was registered in the ClinicalTrials.gov database with the following identification number: researchregistry 6111.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Chengliang Yin and Zhaohui Hu designed the project. Wei Kang and Tao Hong drafted the manuscript. Wenle Li, Chan Xu, Bing Wang, Qiang Liu, Haosheng Wang, and Shengtao Dong analyzed the data. Xin Huang proofread and revised the manuscript. All authors read and approved the final version of the manuscript.