A Novel Prognostic Model for Patients with Primary Gastric Diffuse Large B-Cell Lymphoma

Objectives. Primary gastric diffuse large B-cell lymphoma (PG-DLBCL) is a common phenotype of extranodal non-Hodgkin's lymphoma (NHL). This research aims to identify a model for predicting overall survival (OS) and cancer-specific survival (CSS) in PG-DLBCL. Methods. Data on a total of 1716 patients diagnosed with PG-DLBCL between 1975 and 2017 were obtained from the SEER database, and the patients were randomly divided into training and validation cohorts at a ratio of 7:3. Univariate and multivariate Cox analyses were conducted to determine significant variables for the construction of the nomogram. The performance of the model was then assessed by the concordance index (C-index), the calibration plot, and the area under the receiver operating characteristic (ROC) curve (AUC). Results. Multivariate analysis revealed that age, race, insurance status, Ann Arbor stage, marital status, chemotherapy, and radiation therapy all showed a significant association with OS and CSS. These characteristics were used to build a nomogram. In the training cohort, the discrimination of the nomogram for OS and CSS prediction was excellent (C-index = 0.764, 95% CI, 0.744–0.784 and C-index = 0.756, 95% CI, 0.732–0.780). The AUC of the nomogram for predicting 3- and 5-year OS was 0.779 and 0.784, and for CSS it was 0.765 and 0.772. Similar results were also observed in the internal validation set. Conclusions. We have established a novel nomogram for predicting OS and CSS in PG-DLBCL patients with good accuracy, which can help physicians quickly and accurately evaluate survival probability, risk stratification, and therapeutic strategy at diagnosis.


Introduction
The stomach is the most commonly involved organ in extranodal non-Hodgkin lymphoma (NHL), and diffuse large B-cell lymphoma (DLBCL) is the most common histological type, with a prevalence estimated at 40-70% [1][2][3]. The International Prognostic Index (IPI) and Lugano staging systems are widely used to stage gastrointestinal lymphomas and to plan therapy and surveillance. However, primary gastric DLBCL (PG-DLBCL) is usually diagnosed with a low or intermediate IPI, and its prognosis is not consistent with that of patients with nodal or other extranodal lesions [4]. Moreover, since rituximab was approved by the FDA in 1997, the outcomes of DLBCL have improved significantly. Therefore, as the present predictive scoring systems are limited, detailed information on each patient, including age at diagnosis, sex, race, marital status, Ann Arbor stage, primary site, surgery, molecular subclassification, genetic abnormality, and insurance status, should be collected and analyzed to build a novel predictive nomogram.
Using the Surveillance, Epidemiology, and End Results (SEER) database of the National Cancer Institute, population-level factors can be analyzed and developed into a predictive nomogram for prognosis among patients with newly diagnosed lymphoma. SEER is one of the most representative large tumor databases in North America and provides a broad basis for the study of malignant and rare tumors. Prognostic risk parameters are extracted from the SEER database through a series of steps, and a prognostic nomogram comprising those parameters generally performs more accurately than traditional or current survival-analysis tools [5,6]. Here, we integrated various types of prognostic parameters, including well-established demographic and baseline clinical characteristics, primary sites, race, surgery, and chemotherapy, to develop a new model predicting the overall survival (OS) and cancer-specific survival (CSS) of PG-DLBCL patients.

Data Source.
The data of PG-DLBCL patients (diagnosed between 1975 and 2017) were screened from the SEER registry database of the National Cancer Institute using SEER*Stat software (version 8.3.5). As all of the data in this study were obtained from the publicly available SEER database, no local ethical approval or declaration was required. Information on a total of 7200 patients was collected for the following SEER variables: age at diagnosis, sex, race, marital status, insurance type, Ann Arbor stage, surgery, chemotherapy, radiation therapy, and survival time.
The exclusion criteria were as follows: (1) cases with incomplete Ann Arbor staging at diagnosis or with other multiple primary tumors, (2) missing or incomplete follow-up information, and (3) unclear data for any of the characteristics above. A total of 1716 gastric-DLBCL patients were randomly divided into a training set and a validation set at a ratio of 7:3, comprising 1204 and 512 cases, respectively (Figure 1).
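The random division described above can be illustrated with a minimal sketch. This is not the authors' code (the study used R); the function name `split_cohort`, the seed, and the use of integer patient IDs are assumptions made for illustration. Only the cohort sizes (1716, 1204, 512) come from the text.

```python
import random


def split_cohort(patient_ids, n_train, seed=0):
    """Randomly partition patient IDs into a training and a validation set.

    `seed` is a hypothetical choice that makes the shuffle reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder IDs standing in for the 1716 eligible patients.
patient_ids = list(range(1716))
training, validation = split_cohort(patient_ids, n_train=1204)
print(len(training), len(validation))  # prints "1204 512"
```

Fixing the training-set size rather than passing a fraction reproduces the reported 1204/512 counts exactly, since 70% of 1716 is not a whole number.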

Construction and Validation of the Nomogram.
We used the characteristics of the training cohort to establish the nomogram. The endpoints were OS and CSS, measured from the date of first diagnosis to the date of death from any cause (OS) or from the lymphoma (CSS). Survival was estimated using the Kaplan-Meier method and Cox regression analysis. Univariate and multivariate analyses were performed to determine independent prognostic variables, and the factors with significant associations with OS or CSS were used to construct the nomogram. Next, internal validation was performed. The performance of the nomogram was measured by Harrell's concordance index (C-index) and the area under the receiver operating characteristic (ROC) curve (AUC) [7]. Finally, the nomogram and the Ann Arbor stage system were compared by C-index and AUC.
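To make the discrimination measure concrete, the following is a minimal sketch of Harrell's C-index in plain Python. It is illustrative only: the function name and the toy data are hypothetical, pairs with tied survival times are simply skipped, and the study itself computed the C-index in R.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's observed time. The pair is
    concordant when patient i also has the higher predicted risk; ties in
    predicted risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, event flag (1 = death), risk score.
times = [5, 10, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.5, 0.1]
print(harrell_c_index(times, events, risk))  # prints 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.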

Statistical Analysis.
All data were analyzed using R version 3.4.2 (the R Foundation for Statistical Computing, Vienna, Austria; http://www.r-project.org). A two-sided P < 0.05 was regarded as significant.

Clinical Characteristics.
Clinical characteristics of the training and validation cohorts are shown in Table 1. In the training cohort, the majority of patients were over 60 years old (70.0%), male (58.1%), White (78.5%), married (56%), and insured (78.7%). Furthermore, Ann Arbor stages I, II, III, and IV accounted for 40.9%, 23.5%, 8.9%, and 26.7% of all cases, respectively. Most patients (75.8%) received chemotherapy, while just 9.7% and 15.7% of patients received surgery and radiation therapy, respectively. Overall, patients in the two sets shared similar clinical characteristics (P > 0.05).

Prognostic Factors in the Training Cohort.
The results of the univariate and multivariate analyses are listed in Table 2. In the multivariate analysis, age, race, marital status, insurance status, Ann Arbor stage, chemotherapy, and radiation therapy were significantly associated with OS. However, surgery was not a significant factor (P > 0.05). In addition, we analyzed the association of each parameter with the CSS of patients in the training cohort and found significant prognostic factors that were generally consistent with those for OS.

Construction of Nomogram.
The prognostic nomogram for 3- and 5-year OS is shown in Figure 2. By adding up the scores for each selected variable, an individual patient's survival probability can be easily calculated.
OS was better for younger patients (particularly those under 60 years old), patients with early Ann Arbor stage, uninsured patients, and married patients. Furthermore, patients who received chemotherapy or radiation therapy also had a better OS probability. We found that Black patients had the worst OS compared with White patients and patients of other races. In addition, the prognostic nomogram for 3- and 5-year CSS of gastric-DLBCL patients was generally similar to that for OS.
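The idea of adding up per-variable scores can be sketched as follows. This is a generic illustration of how a nomogram is commonly derived from Cox coefficients, not the authors' model: every coefficient, the variable subset, and the reference survival of 0.90 are hypothetical values chosen for illustration and are not taken from Table 2 or Figure 2.

```python
import math

# HYPOTHETICAL log hazard ratios per category (reference level = 0.0).
# These are NOT the coefficients reported in the paper.
COEFS = {
    "age": {"<60": 0.0, ">=60": 0.9},
    "stage": {"I": 0.0, "II": 0.2, "III": 0.4, "IV": 0.6},
    "chemotherapy": {"yes": 0.0, "no": 0.8},
}


def build_points_table(coefs):
    """Rescale coefficients so the variable with the widest effect spans 0-100 points."""
    max_spread = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    table = {}
    for var, levels in coefs.items():
        lowest = min(levels.values())
        table[var] = {
            lvl: 100.0 * (beta - lowest) / max_spread for lvl, beta in levels.items()
        }
    return table, max_spread


def total_points(patient, table):
    """Sum the points of the patient's category for every variable."""
    return sum(table[var][patient[var]] for var in table)


def survival_probability(points, max_spread, reference_survival):
    """Cox model: S(t | x) = S_ref(t) ** exp(lp), with lp relative to the lowest-risk profile.

    `reference_survival` is the (hypothetical) survival probability at time t
    of a patient scoring 0 points.
    """
    relative_lp = points * max_spread / 100.0
    return reference_survival ** math.exp(relative_lp)


table, max_spread = build_points_table(COEFS)
patient = {"age": ">=60", "stage": "II", "chemotherapy": "yes"}
pts = total_points(patient, table)
prob = survival_probability(pts, max_spread, reference_survival=0.90)
print(round(pts, 1), round(prob, 3))
```

Rescaling to a 0-100 axis does not change the model; it only makes the linear predictor easy to read off and sum by hand, which is the practical appeal of a nomogram.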

Validation of Nomogram.
The C-index of the nomogram was 0.764 (95% CI, 0.744-0.784) for the prediction of OS and 0.756 (95% CI, 0.732-0.780) for CSS in the training cohort (Table 3). In comparison, the C-indexes of the Ann Arbor stage system for OS and CSS were just 0.564 and 0.589. The nomogram was then validated in the internal gastric-DLBCL validation cohort (512 cases). The model also showed a good level of discriminative ability to predict OS (C-index 0.745) and CSS (C-index 0.751). The nomogram was well calibrated, as revealed by the calibration curves (Figures 3 and 4). It also performed well in predicting OS and CSS of patients with gastric DLBCL (Figures 5 and 6). The AUC of the nomogram for predicting 3- and 5-year OS was 0.779 and 0.784 in the training set and 0.774 and 0.740 in the internal validation set. For 3- and 5-year CSS, the AUC was 0.765 and 0.772 in the training set and 0.762 and 0.774 in the internal validation set.
Figure 1: Patient selection flowchart. Total cases of gastric diffuse large B-cell lymphoma from SEER between 1975 and 2017 (n = 7200); exclude patients with incomplete Ann Arbor stage (n = 3214); exclude patients with multiple primary tumors (n = 2620); exclude patients with incomplete survival data, missing data in SEER cause-specific death classification, unknown surgery, unknown race, unknown marital status, or unknown insurance (n = 1716); training set (n = 1204); validation set (n = 512).

Comparison of the Values.
The calibration curves for the internal validation cohort showed good agreement between nomogram prediction and observation for the probability of 3- and 5-year survival. As shown in Tables 3 and 4, we further compared the C-index and AUC of the nomogram with those of the Ann Arbor stage system. The C-index was much lower for the Ann Arbor stage system, just 0.562 for OS and 0.552 for CSS in the validation set. In addition, the AUC values of the nomogram for OS and CSS were much better than those of the Ann Arbor stage system, particularly for 5-year OS in the training cohort (0.784 for the nomogram versus 0.578 for the Ann Arbor stage system).
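For readers unfamiliar with AUC at a fixed horizon, the sketch below shows a simplified calculation. It is an assumption-laden illustration, not the method used in the study: patients censored before the horizon are simply dropped, whereas dedicated time-dependent ROC estimators handle censoring more carefully. The function name and toy data are hypothetical.

```python
def auc_at_horizon(times, events, risk_scores, horizon):
    """Simplified AUC for predicting death by `horizon`.

    Cases: patients with an observed event at or before the horizon.
    Controls: patients still under observation after the horizon.
    Patients censored before the horizon have unknown status and are skipped
    (a simplification that can bias the estimate).
    """
    cases, controls = [], []
    for t, e, r in zip(times, events, risk_scores):
        if e == 1 and t <= horizon:
            cases.append(r)
        elif t > horizon:
            controls.append(r)
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Probability that a random case has a higher risk score than a random control.
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: follow-up in months, event flag (1 = death), risk score.
times = [10, 20, 30, 40, 50, 70]
events = [1, 1, 0, 1, 0, 0]
risk = [0.9, 0.5, 0.8, 0.3, 0.6, 0.2]
auc_3yr = auc_at_horizon(times, events, risk, horizon=36)
print(round(auc_3yr, 3))  # prints 0.833
```

In this toy example the patient censored at 30 months is excluded, leaving two cases and three controls, of which five of the six case-control pairs are correctly ordered.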

Discussion
Primary gastrointestinal lymphoma (PGIL) is relatively rare, constituting less than 5% of gastrointestinal (GI) tract tumors. The stomach is the most common site of DLBCL in the gastrointestinal tract [8]. Several staging systems have been developed over the past decades to improve the prognostic stratification of gastrointestinal lymphoma; unfortunately, no standard has yet been accepted [9][10][11]. In this study, we screened the records of 7200 patients diagnosed with PG-DLBCL from the large SEER dataset, used 1204 cases to construct a novel predictive nomogram, and validated it in a 512-patient internal validation cohort. We found that patient characteristics including age, race, marital status, insurance status, Ann Arbor stage, chemotherapy, and radiation therapy were associated with prognosis. Additionally, this nomogram performed with excellent accuracy as assessed by C-index and AUC. Compared with the Ann Arbor stage system, the nomogram had a higher C-index for OS and CSS prediction in both the training and validation sets, and its AUC values for predicting 3- and 5-year OS and CSS were also higher, which can help clinicians accurately predict the survival of individual patients. In this predictive model, PG-DLBCL patients who were unmarried or single, insured, or Black showed worse outcomes. Better financial and psychological support may benefit treatment, which may explain why married patients had a better prognosis. Previous research has described that marital status was independently associated with the 5-year relative survival of patients with DLBCL [12]. According to the majority of previous results, Black patients had worse outcomes, and lower socioeconomic status among Black patients might have contributed to the worse survival [13][14][15].
Based on this evidence, we concluded that the worse survival of Black patients with DLBCL may be associated with religion, habits, and living environment. Although patients with relatively good financial support would be expected to find it easier to adhere to treatment, the insured PG-DLBCL patients (any Medicaid or insured) showed worse results than the uninsured patients, a finding that confused us. We thought there might be a complex interaction among socioeconomic factors, demographic factors, and cancer outcomes. In general, these results need to be confirmed by a large amount of real-world evidence.
Since the arrival of the rituximab era, the outcomes of DLBCL have improved. Previous research recommended chemotherapy as the front-line treatment for PGI-DLBCL, while surgery was performed to relieve tumor-related complications or to make a diagnosis [16]. Several case reports found that surgical intervention for gastric DLBCL was associated with a better prognosis [17,18]. Here, our analysis indicated that surgical intervention had no significant association with the prognosis of PG-DLBCL patients. The role of radiation therapy in combined modality therapy for DLBCL has been limited. However, recent evidence demonstrated that selected patients with DLBCL had significantly better outcomes when radiation treatment was added to immunochemotherapy, and Koiwai et al. reported that a decreased radiation dose might be effective for localized DLBCL patients who showed a good response to chemotherapy [19,20]. Our research also found that radiation therapy might affect the prognosis of patients with PG-DLBCL, but it played only a limited part. Although chemotherapy was key to improving the prognosis, the treatment regimens were unclear in this retrospective study.
To our knowledge, this is currently the largest retrospective case series of PG-DLBCL aimed at building a prognostic model to predict OS and CSS. However, the model needs further validation by way of large randomized controlled trials. As the data source, SEER, did not provide the IPI scores of the patients, we cannot compare this nomogram with the IPI scoring system. In addition, this study did not account for genetic characteristics, which have now been shown to be important in the diagnosis and prognosis of the disease [21,22]. Even so, our study provides an instructive and efficient model of PG-DLBCL prognosis.

Conclusions
We have developed and validated a novel nomogram for predicting OS and CSS in patients with PG-DLBCL, which has never been investigated before. The parameters in the model are routinely evaluated and easily adopted in the clinic, assisting clinicians in making predictions about individual patient survival and providing improved treatment strategies.

Data Availability
Publicly available datasets were analyzed in this study. These data are available in the Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/). The datasets generated in this study are available from the corresponding authors upon request.

Additional Points
All the statistical analyses were performed using R version 3.4.2 software (the R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org).

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Jiafeng Lin and Guangrong Lu conceptualized and designed the study. Jialin Pan was responsible for conception, design, and quality control of this study. He Huang performed data extraction and statistical analyses and edited the manuscript. Zijian Lin and Yejiao Ruan participated in data extraction and statistical analyses. All authors have read and approved the final version of the manuscript. Jiafeng Lin and Jialin Pan contributed equally to this work.