Establishment and Validation of a Nomogram for Predicting Lymph Node Metastasis in Early Gastric Adenocarcinoma

Lymph node metastasis (LNM) is considered an important factor in determining the optimal treatment for early gastric cancer (EGC). This study aimed to develop and validate a nomogram to predict LNM in patients with EGC. A total of 842 cases from the Surveillance, Epidemiology, and End Results (SEER) database were divided into training and testing sets at a ratio of 6 : 4 for model development. Clinical data from 494 patients at the hospital were used for external validation. Univariate and multivariate logistic regression analyses were used to identify predictors in the training set. Logistic regression, LASSO regression, ridge regression, and elastic-net regression were used to construct the models. Model performance was quantified by the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals (CIs). T stage, tumor size, and tumor grade were independent predictors of LNM in EGC patients. The AUC of the logistic regression model was 0.766 (95% CI, 0.709–0.823), slightly higher than those of the other models. However, the AUC of the logistic regression model in external validation was 0.625 (95% CI, 0.537–0.678). A nomogram was drawn based on the logistic regression model to predict LNM in EGC patients. Further validation by gender, age, and grade indicated that the logistic regression model performed well in patients with grade III tumors, with an AUC of 0.803 (95% CI, 0.606–0.999). Our nomogram showed good predictive ability and may provide clinicians with a tool to predict LNM in EGC patients.


Introduction
Gastric cancer, the third leading cause of cancer death worldwide, accounts for more than 1 million new cases each year [1]. The incidence and mortality of gastric cancer are higher in East Asia, Eastern Europe, and South America [2][3][4][5]. In addition, approximately half of the estimated deaths from gastric cancer in 2018 occurred in China [1]. Early gastric cancer (EGC) is defined as gastric cancer confined to the lamina propria, mucosa, or submucosa, regardless of tumor size or the presence of regional lymph node metastasis (LNM) [6]. LNM is the most common form of gastric cancer metastasis and a major contributor to its high mortality. In the TNM staging system for gastric cancer, LNM is used to guide the treatment plan, and prognosis is predicted from the number of pathologically positive lymph nodes and the exact stage of the disease [7]. The main treatment options for EGC include endoscopic mucosal resection (EMR), endoscopic submucosal dissection (ESD), wedge resection, laparoscopically assisted gastrectomy, and open gastrectomy [8,9]. Compared with the other options, EMR and ESD preserve gastric function and maintain quality of life [10,11]. However, the absence of LNM is a prerequisite for EMR and ESD [12]. Therefore, a tool that can predict LNM in EGC patients is of great significance for selecting the surgical approach and for patient prognosis. Several studies have established nomograms for LNM in patients with EGC [6,13,14]. However, these studies had limitations such as small sample sizes, single-center designs, and a lack of external validation. In addition, few studies have examined how well LNM prediction models perform in different populations of EGC patients.
Herein, we selected predictor variables for LNM in EGC patients using the Surveillance, Epidemiology, and End Results (SEER) database. We then developed a nomogram to predict LNM in EGC patients and performed external validation to assess the fit of the model.

Study Design and Population.
Data were extracted from the SEER database, a population-based cancer registry database of the National Cancer Institute. The SEER database covers approximately 28% of the US population. All patients with gastric adenocarcinoma recorded in the SEER database from 2015 to 2020 were extracted. For external validation, 494 patients diagnosed with EGC at Xiangya Hospital, Central South University, between January 2012 and December 2019 were included. Tumors were staged according to the American Joint Committee on Cancer (AJCC) Staging Manual (7th edition), and EGC in this study included Tis, T1a, and T1b [15]. This study was approved by the Institutional Review Board of Xiangya Hospital, Central South University (approval number: 2019030510), and all patients provided written informed consent.

Inclusion and Exclusion Criteria.
Patients who met all of the following criteria were eligible: (1) age ≥18 years; (2) histopathologically confirmed stage Tis, T1a, or T1b gastric adenocarcinoma; (3) complete baseline and pathological data. The exclusion criteria were as follows: (1) no surgical resection or no microscopic evaluation of lymph nodes; (2) radiotherapy or chemotherapy before surgery; (3) metastasis at the time of diagnosis; (4) other gastric tumors (neuroendocrine tumors, gastrointestinal stromal tumors, or metastatic disease); (5) a history of other malignancies.

Data Collection.
Demographic and clinical data included age, gender, T stage, primary site, tumor size, tumor grade, and LNM. T stage was categorized as Tis, T1a, or T1b. Tumor size was categorized as <1 cm, 1–2 cm, 2–3 cm, 3–4 cm, or ≥4 cm. LNM was the outcome variable.

Statistical Analysis.
Data were extracted from the SEER database using SEER*Stat software (version 8.3.2). The SEER data were divided into a training set and a testing set in a 6 : 4 ratio, and the clinical practice data were used for external validation. Continuous variables with a normal or approximately normal distribution were expressed as mean ± standard deviation (SD) and compared between groups with the t-test. Non-normally distributed variables were expressed as median (Q1, Q3) and compared between groups with the Wilcoxon rank-sum test. Categorical variables were expressed as numbers and percentages and compared between groups with the chi-square (χ²) test or Fisher's exact test.
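The paper does not state how the 6 : 4 split was drawn. The Python sketch below (the study itself used R) shows one simple, reproducible way to do it, assuming an unstratified random split; the function name `split_train_test` and the seed are hypothetical.

```python
import random

def split_train_test(n_cases, train_fraction=0.6, seed=2022):
    """Randomly assign case indices 0..n_cases-1 to training and testing sets.

    Hypothetical helper: a simple (unstratified) random split with a fixed seed.
    """
    rng = random.Random(seed)                  # fixed seed makes the split reproducible
    indices = list(range(n_cases))
    rng.shuffle(indices)
    n_train = round(n_cases * train_fraction)  # 842 * 0.6 = 505.2, rounded to 505
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, test_idx = split_train_test(842)
print(len(train_idx), len(test_idx))  # 505 337
```

A stratified split (sampling separately within LNM-positive and LNM-negative cases) would keep the LNM rate similar in both sets; whether the authors stratified is not reported.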
Univariate and multivariate logistic regression analyses were used to select predictor variables and establish the prediction model. Logistic regression, LASSO regression, ridge regression, and elastic-net regression were used to construct candidate models. A nomogram was then drawn for the prediction model, and the Hosmer-Lemeshow goodness-of-fit test was applied to assess its calibration.
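A minimal sketch of the Hosmer-Lemeshow test, assuming cases are sorted by predicted probability, cut into g groups of roughly equal size, and compared with a chi-square distribution on g − 2 degrees of freedom. It is plain Python for illustration only; the study used the ResourceSelection package in R, whose `hoslem.test` forms groups from quantiles of the fitted values, so results can differ slightly when predictions are tied. The helper names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Evaluates the regularized upper incomplete gamma Q(df/2, x/2): a power series
    when x/2 < df/2 + 1 and a continued fraction (Lentz's method) otherwise.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    eps, tiny = 1e-15, 1e-300
    if z < a + 1:
        # Series for the lower incomplete gamma P(a, z); the answer is 1 - P.
        term = total = 1.0 / a
        n = 1
        while abs(term) > eps * abs(total):
            term *= z / (a + n)
            total += term
            n += 1
        return max(0.0, 1.0 - math.exp(log_prefactor) * total)
    # Continued fraction for the upper incomplete gamma Q(a, z).
    b = z + 1 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return math.exp(log_prefactor) * h


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom, p-value) for outcomes y and predictions p."""
    if groups < 3:
        raise ValueError("need at least 3 groups so that groups - 2 > 0")
    pairs = sorted(zip(p, y))                 # order cases by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(prob for prob, _ in chunk)        # expected LNM cases in group
        observed = sum(outcome for _, outcome in chunk)  # observed LNM cases in group
        mean_p = expected / n_g
        if 0 < mean_p < 1:
            stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf(stat, df)
```

A large p-value means the observed and expected counts do not differ detectably; it is evidence against poor calibration rather than proof of good calibration.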
The performance of each model was quantified by the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals (CIs), as well as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
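Sensitivity, specificity, PPV, and NPV all come from a 2 × 2 confusion matrix at a chosen probability cutoff. The cutoff used in the paper is not reported; the sketch below takes the cutoff as an input and, as one common option, picks it by maximizing Youden's index (sensitivity + specificity − 1). All names are hypothetical.

```python
def _safe_ratio(num, den):
    """Ratio that returns NaN instead of failing when the denominator is zero."""
    return num / den if den else float("nan")

def classification_metrics(y_true, y_prob, cutoff):
    """Sensitivity, specificity, PPV, and NPV when p >= cutoff is called LNM-positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _safe_ratio(tp, tp + fn),
        "specificity": _safe_ratio(tn, tn + fp),
        "ppv": _safe_ratio(tp, tp + fp),
        "npv": _safe_ratio(tn, tn + fn),
    }

def youden_cutoff(y_true, y_prob):
    """Observed probability that maximizes Youden's index (first one wins on ties)."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(y_prob)):
        m = classification_metrics(y_true, y_prob, cutoff)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff
```

Unlike the AUC, these four metrics change with the cutoff, and PPV and NPV also depend on how common LNM is in the cohort being scored.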
All statistical analyses and plots were produced in R (version 4.0.2). The caret package was used to normalize the data, and the penalty grid for modeling was lambdas <- seq(0.0001, 0.01, length.out = 200). The glmnet package was used to construct the LASSO, ridge, and elastic-net regression models, with threefold cross-validation. Other R packages, including compareGroups, ResourceSelection, rms, and pROC, were also used. All tests were two-sided, with a significance level of α = 0.05.
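The `seq()` call above produces 200 evenly spaced penalty values. The sketch below mirrors that grid in Python and writes out the penalty that glmnet adds to the negative log-likelihood, λ[(1 − α)/2 · Σβ² + α · Σ|β|], where α = 1 gives the LASSO, α = 0 gives ridge regression, and intermediate α gives the elastic net; the α used for the elastic-net model is not reported. The fold assignment for threefold cross-validation is an assumption modeled on how `cv.glmnet` assigns random folds, and all function names are hypothetical.

```python
import random

def seq_length_out(start, stop, length_out):
    """Evenly spaced values from start to stop inclusive, like R's seq(start, stop, length.out = n)."""
    step = (stop - start) / (length_out - 1)
    return [start + i * step for i in range(length_out)]

def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style penalty lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).

    alpha = 1 -> LASSO, alpha = 0 -> ridge, 0 < alpha < 1 -> elastic net.
    """
    l2 = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    return lam * ((1 - alpha) / 2 * l2 + alpha * l1)

def fold_ids(n, k=3, seed=1):
    """Randomly assign n cases to folds 1..k of near-equal size (k = 3 for threefold CV)."""
    ids = [i % k + 1 for i in range(n)]
    random.Random(seed).shuffle(ids)
    return ids

lambdas = seq_length_out(0.0001, 0.01, 200)  # candidate penalty strengths
folds = fold_ids(505)                        # e.g. one fold label per training case
```

Cross-validation then fits the model on two folds for each λ, scores the held-out fold, and keeps the λ with the best average held-out performance.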

Baseline Characteristics.
In total, 842 cases from the SEER database and 494 cases from clinical practice were included in this study (Figure 1). Among the 842 SEER patients, the mean age was 69.4 ± 11.3 years, and 485 (57.60%) were male. The tumors were located mostly in the lower (54.04%) and middle (35.99%) parts of the stomach. The numbers of patients with LNM in the SEER and clinical datasets were 176 (20.9%) and 133 (26.92%), respectively. More detailed characteristics are shown in Table 1. Table 2 compares patients with and without LNM. There were significant differences between the two groups in T stage, tumor size, and tumor grade (all P < 0.001). The incidence of LNM was higher in patients with T1b tumors than in those with T1a tumors (P < 0.001). LNM was more likely in tumors larger than 2 cm than in smaller tumors (P < 0.001). A tumor grade higher than II was associated with a higher rate of LNM (P < 0.001).

Factors Associated with LNM in EGC Patients.
The univariate and multivariate logistic regression analyses are shown in Table 3. The multivariate logistic regression analysis indicated that T stage, tumor size, and tumor grade were independent predictors of LNM in EGC patients.
[Figure 1. Patient selection flowchart: (1) without T stage information (n = 20); (2) without grade information (n = 9). Exclusion: (1) no examination of lymph nodes (n = 366); (2) radiotherapy or chemotherapy given before surgical resection (n = 8).]
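In a multivariate logistic model like the one in Table 3, a coefficient β with standard error SE corresponds to an odds ratio exp(β) with a Wald 95% CI of exp(β ± 1.96 · SE). The sketch below shows that conversion; the inputs are hypothetical, with β set to ln 5.75 only to echo the tumor-size effect reported in the Discussion and the SE invented.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution

def odds_ratio_with_ci(beta, se, z=Z_975):
    """Odds ratio exp(beta) with a Wald confidence interval exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical inputs: beta = ln(5.75); the SE of 0.40 is invented for illustration.
or_, lower, upper = odds_ratio_with_ci(math.log(5.75), 0.40)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the odds ratio itself. When the outcome is not rare (LNM was present in 20.9% of the SEER cohort), an odds ratio above 1 exceeds the corresponding risk ratio, so exp(β) should be read as a ratio of odds.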

Model Comparison and Selection.
Logistic regression, LASSO regression, ridge regression, and elastic-net regression models were established. Table 4 presents the AUCs of these models in the training and testing sets. In the testing set, the AUCs of the logistic regression, LASSO regression, ridge regression, and elastic-net regression models were 0.766 (95% CI, 0.709–0.823), 0.740 (95% CI, 0.681–0.799), 0.737 (95% CI, 0.676–0.797), and 0.749 (95% CI, 0.691–0.807), respectively. There was no significant difference between the AUCs of these models (P > 0.05). Because the AUC of the logistic regression model was slightly higher than those of the other models and its results are easier to interpret clinically, the logistic regression model was chosen. The Hosmer-Lemeshow goodness-of-fit test showed no significant lack of fit for this prediction model (χ² = 3.916, P = 0.917), indicating acceptable calibration. However, in external validation using the clinical practice data, the AUC of the model was 0.625 (95% CI, 0.537–0.678), indicating that the model did not generalize well to the external validation cohort (Figure 2, Table 5).
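pROC, which the study used, computes AUC confidence intervals by DeLong's method by default. The sketch below is a minimal pure-Python illustration of the AUC (as the Mann-Whitney statistic) with a DeLong 95% CI clipped to [0, 1]; the function names are hypothetical, and it is not pROC's implementation.

```python
import math

def _psi(case_score, control_score):
    """Mann-Whitney kernel: 1 if the LNM case outranks the control, 0.5 for ties, else 0."""
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0

def auc_delong_ci(y_true, scores, z=1.959963984540054):
    """Return (AUC, lower, upper) using DeLong's variance estimate for a single AUC."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    m, n = len(cases), len(controls)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases and two controls")
    # Structural components: each case's (or control's) average kernel over the other group.
    v10 = [sum(_psi(x, c) for c in controls) / n for x in cases]
    v01 = [sum(_psi(x, c) for x in cases) / m for c in controls]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)
```

The loop is O(m · n), which is fast enough for cohorts of this size. Comparing two models on the same patients, as in the P > 0.05 result above, needs the paired form of DeLong's test, which also accounts for the covariance between the two AUCs.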

Nomogram for Prediction of LNM in EGC Patients.
A nomogram to predict LNM in EGC patients was then drawn based on the logistic regression model (Figure 3(a)). The nomogram estimates the probability of LNM in an EGC patient from the total score, obtained by summing the points assigned to each variable on the point scale. As an example, a patient was randomly selected from the SEER database. This patient had a grade III, stage T1b tumor ≥4 cm in size. The total score calculated from the nomogram was 243 points, corresponding to a predicted probability of LNM of 0.472. On verification, the patient did have LNM (Figure 3(b)).
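A nomogram of the kind drawn with rms rescales each predictor's contribution to the linear predictor into points, so that the predictor with the widest coefficient range spans 0 to 100 points, and maps the total points back to a probability through the logistic function. The sketch below illustrates that mechanism; the intercept and coefficients are invented (they are not the fitted values behind Figure 3), so it does not reproduce the 243-point, 0.472 example.

```python
import math

# Hypothetical log-odds coefficients, for illustration only (NOT the fitted model).
INTERCEPT = -3.0
COEFS = {
    "t_stage": {"Tis": 0.0, "T1a": 0.3, "T1b": 1.2},
    "size": {"<1 cm": 0.0, "1-2 cm": 0.4, "2-3 cm": 0.9, "3-4 cm": 1.2, ">=4 cm": 1.75},
    "grade": {"I": 0.0, "II": 0.5, "III": 1.1},
}

# rms-style scaling: the predictor whose coefficients span the widest range gets 0-100 points.
MAX_RANGE = max(max(lv.values()) - min(lv.values()) for lv in COEFS.values())

def points(variable, level):
    """Points for one level: its log-odds contribution above the predictor's minimum, rescaled."""
    levels = COEFS[variable]
    return 100 * (levels[level] - min(levels.values())) / MAX_RANGE

def probability_from_total_points(total):
    """Convert total points back to the linear predictor, then apply the logistic function."""
    baseline = INTERCEPT + sum(min(lv.values()) for lv in COEFS.values())
    linear_predictor = baseline + total * MAX_RANGE / 100
    return 1 / (1 + math.exp(-linear_predictor))

patient = {"t_stage": "T1b", "size": ">=4 cm", "grade": "III"}
total = sum(points(var, level) for var, level in patient.items())
risk = probability_from_total_points(total)
print(f"total points {total:.0f}, predicted LNM probability {risk:.3f}")
```

Reading the nomogram by hand performs the same two steps graphically: look up each variable's points, add them, and read the probability off the bottom axis.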

Further Validation Based on Different Populations.
Further validation was performed based on gender, age, and tumor grade (Table 6). In the test set, this logistic regression prediction model had a good prediction effect on males, females, patients with age ≥65 years, age <65 years, grade I

Discussion
In this study, a nomogram for LNM in EGC patients was established based on the SEER database, and external validation was performed using clinical practice data. Factors associated with LNM in EGC patients, namely T stage, tumor size, and tumor grade, were included in the nomogram. The AUC, sensitivity, and NPV of the prediction model were 0.766, 0.899, and 0.951, respectively. However, the AUC in the external validation data was 0.625, implying a poor fit for the external population. In addition, further validation in different subpopulations showed that the prediction model performed well in patients with grade III tumors, with an AUC of 0.803. Predicting LNM is of great significance in EGC patients, especially for the choice of treatment. Several models have been developed to predict the possibility of LNM in gastric cancer [6,16]. Chen et al. established a nomogram to predict LNM in patients with gastric cancer using variables such as Borrmann type, preoperative CA19-9 level, T stage, and N stage, with an AUC of 0.786 [17]. Eom et al. showed that the predictive performance of conventional models based on tumor size, histological type, lymphovascular invasion, and depth of invasion was insufficient, and that it could be significantly improved by adding biomarkers such as CD44v6 and α1-catenin [18]. However, most existing prediction models were developed in small samples, lacked external validation, or included patients with advanced gastric cancer. Our prediction model was established using the SEER database, and clinical practice data were used for external validation.
Our results showed that LNM was associated with T stage, tumor size, and tumor grade. Similar results were reported by Pokala et al., who found that tumor stage, grade, and size were independent predictors of LNM [13].
Previous studies have proposed that T stage is an independent risk factor for LNM [19][20][21]. Many studies have also shown that tumor size is a risk factor for LNM in gastric cancer, with larger tumors associated with a higher likelihood of LNM [16,22,23]. In our results, the odds of LNM in patients with tumors ≥4 cm were 5.75 times those in patients with tumors <1 cm. Furthermore, T stage, tumor size, and grade can be used to estimate the incidence of LNM in patients with early gastric adenocarcinoma and to help discuss the risks of different treatment modalities [13,24]. Previous studies have shown that the prevalence of LNM in EGC patients ranges from 7.7% to 19.4% [21,25,26], so most patients undergo unnecessarily extensive surgery and suffer its associated morbidity [27]. In this setting, pretreatment assessment of LNM status is very helpful for avoiding the high morbidity and mortality of lymphadenectomy caused by overtreatment [28]. Therefore, a nomogram that can predict LNM in patients with EGC has important clinical significance. Pokala et al. indicated that patients with early gastric adenocarcinoma should be counseled on appropriate treatment options, weighing the adverse oncological outcomes that may follow endoscopic treatment against the surgical morbidity and quality-of-life impact of major organ resection [13].
We developed a nomogram to predict LNM in patients with EGC based on the SEER database and externally validated the model using clinical practice data. When the external validation data did not fit the nomogram, we conducted further validation based on different populations.
This tool for predicting the likelihood of LNM in EGC patients may help clinicians make surgical decisions. However, this study has some limitations. First, the external validation data did not fit the nomogram, which may reflect differences between the populations, such as race. Second, tumor ulceration [6,29], lymphovascular invasion [6,29], and lymph node involvement on endoscopic ultrasound [30] have been reported to be associated with LNM in some studies, but these variables are not available in the SEER database.
Journal of Healthcare Engineering

Conclusion
A nomogram to predict LNM in patients with EGC was developed based on the SEER database. Patients with higher T stage, higher tumor grade, and larger tumor size were more likely to develop LNM. This tool can predict the likelihood of LNM in EGC patients and may help clinicians make surgical decisions.
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.