New Personal Model for Forecasting the Outcome of Patients with Histological Grade III-IV Colorectal Cancer Based on Regional Lymph Nodes

Background. Metastases at regional lymph nodes occur readily in patients with high-histological-grade colorectal cancer (CRC). However, few models have been built on the basis of lymph nodes to predict the outcome of patients with histological grade III-IV CRC. Methods. Data from the Surveillance, Epidemiology, and End Results database were used. Univariate and multivariate analyses were performed. A personalized prediction model was built in accordance with the results of the analyses. A nomogram was tested in two datasets and assessed using a calibration curve, a consistency index (C-index), and an area under the curve (AUC). Results. A total of 14,039 cases were obtained from the database. They were separated into two groups (9828 cases for constructing the model and 4211 cases for validation). Logistic and Cox regression analyses were then conducted. Factors such as log odds of positive lymph nodes (LODDS) were utilized. Then, a personalized prediction model was established. The C-index in both the construction and validation groups was 0.770. The 1-, 3-, and 5-year AUCs were 0.793, 0.828, and 0.830 in the construction group, respectively, and 0.796, 0.833, and 0.832 in the validation group, respectively. The calibration curves showed good consistency between predicted and observed 1-, 3-, and 5-year overall survival (OS) in both groups. Conclusion. The nomogram built on LODDS exhibited considerable reliability and accuracy.


Introduction
Colorectal cancer (CRC) has the third highest incidence among cancers, regardless of gender [1], and the predominant type is adenocarcinoma [2]. The status of the cancer influences the outcome of patients with CRC [3][4][5]. Several studies reported that patients with CRC of different histological grades showed different outcomes [6][7][8][9][10]. High histological grades increased the likelihood of bowel obstruction before surgery [11]. Moreover, metastases at regional lymph nodes could more easily occur in patients with CRC of a higher histological grade [12]. The prognosis could also depend on the condition of regional lymph nodes [5]. Therefore, to ensure that lymph node dissection has the desired effect, guidelines recommend that at least 12 regional lymph nodes be obtained during surgery [13].
In the tumor (T), node (N), and metastasis (M) staging system of the American Joint Committee on Cancer (AJCC), pathologically positive regional lymph nodes are used as a criterion for stratifying patients [5]. The indicators involved in this system are intuitive and easy to obtain. However, the N stage does not consider the number of examined regional lymph nodes (ELNs). Hence, the lymph node ratio (LNR) was proposed as a supplement to the N staging system [14].
LNR refers to the proportion of positive regional lymph nodes (PLNs) among ELNs. It was reported to be a good predictor of outcomes in several kinds of cancer [15][16][17][18]. Moreover, a study focused on right colon cancer pointed out that LNR was a potentially valuable factor in predicting the chance of tumor recurrence [15]. However, when the number of PLNs is 0, the value of LNR does not change regardless of the number of ELNs. Log odds of positive lymph nodes (LODDS), which refers to the logarithm of the ratio of PLNs to negative lymph nodes (NLNs), showed better prediction ability than LNR in several studies [19][20][21]. LODDS could make up for this deficiency in LNR. To date, few tools built on LNR or LODDS are available to evaluate the overall survival (OS) of patients with poorly differentiated or undifferentiated CRC, even though metastases at regional lymph nodes could easily occur in these patients [12].
By utilizing the Surveillance, Epidemiology, and End Results (SEER) database [22], the prognostic factors for patients with histological grades III-IV CRC were explored on the basis of clinical factors, including LNR and LODDS. A personalized prediction model that could be used to make clinical decisions was further constructed and validated.

Data Sources.
Cases from the SEER database (November 2020 submission) were used in this study; details can be found at https://seer.cancer.gov/data-software/documentation/seerstat/nov2020/. In this database, the sources of the cases cover 18 states in the USA [22].

Included Participants.
The inclusion criteria were as follows: (1) patients aged 18-80 years; (2) diagnosis confirmed by positive histology; (3) complete and available clinical and follow-up data; (4) poorly differentiated or undifferentiated CRC with histological grade III-IV; (5) patients with one primary cancer only. The exclusion criteria were as follows: (1) patients identified by autopsy or death certificate only; (2) patients whose overall survival times were less than 1 month; (3) patients with two or more primary cancers. Multiple primary cancer refers to cancer with a site and histological type different from those of the first primary cancer, according to a previous study [23].

Variates and Definitions.
In this study, demographic information (age and gender) and the characteristics of cancer (primary location, histological type, histological grade, AJCC TNM stage, LNR, and LODDS) were considered. Age was categorized into three levels following a previous study [24]: <45, 45-60, and >60 years. The primary site was recoded on the basis of the second edition of the International Classification of Diseases for Oncology (ICD-O-2). The primary site was divided into the right colon (from the cecum to the transverse colon, excluding the appendix), the left colon (from the splenic flexure through the descending colon to the sigmoid colon), and the rectum (rectosigmoid junction and rectum). Histologic codes 8140-8389 were identified as adenocarcinoma, 8480-8481 were defined as mucinous adenocarcinoma/mucin-producing adenocarcinoma (AM/MPA), and 8490 was defined as signet ring cell carcinoma (SRCC). The histologic codes were also based on ICD-O-2. Poorly differentiated cancer was defined as histological grade III, and undifferentiated cancer was defined as histological grade IV. NLNs were calculated using the following formula: NLNs = ELNs − PLNs. The value of LNR in every case was calculated in accordance with the formula LNR = PLNs/ELNs [15][16][17][18]. The value of LODDS in every case was calculated as follows: LODDS = log((PLNs + 0.5)/(NLNs + 0.5)) [20]. The cutoff values of LNR, ELNs, and NLNs were decided on the basis of the Kaplan-Meier method. On the basis of these cutoff values, LNR, ELNs, and NLNs were each divided into two subgroups. LODDS was divided into three levels following Lee et al. [25]: <−1.3222, from −1.3222 to −0.5863, and >−0.5863. Survival months were calculated as survival months = FLOOR((endpoint − date)/(days in a month)), as defined in the SEER database (details can be found at https://seer.cancer.gov/survivaltime). OS refers to the time from the day of diagnosis to the day of death.
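The lymph node definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code (the study itself used R). The function names `lymph_node_metrics` and `lodds_level` are hypothetical, the base-10 logarithm is an assumption (the text writes only "log", although log10(0.5/10.5) ≈ −1.3222 coincides with the lower cutoff), and assigning the two boundary values to the middle level is likewise an assumption.

```python
import math


def lymph_node_metrics(eln: int, pln: int) -> dict:
    """Compute NLN, LNR, and LODDS from examined (ELN) and positive (PLN) node counts."""
    if eln <= 0 or pln < 0 or pln > eln:
        raise ValueError("require ELN > 0 and 0 <= PLN <= ELN")
    nln = eln - pln                                # NLNs = ELNs - PLNs
    lnr = pln / eln                                # LNR = PLNs / ELNs
    # Base 10 is an assumption; the source formula only says "log".
    lodds = math.log10((pln + 0.5) / (nln + 0.5))
    return {"NLN": nln, "LNR": lnr, "LODDS": lodds}


def lodds_level(lodds: float) -> int:
    """Map LODDS to the three levels (1 = lowest).

    Placing the boundary values in the middle level is an assumption.
    """
    if lodds < -1.3222:
        return 1
    if lodds <= -0.5863:
        return 2
    return 3


if __name__ == "__main__":
    for eln, pln in [(20, 0), (12, 1), (15, 3)]:
        m = lymph_node_metrics(eln, pln)
        print(eln, pln, m, lodds_level(m["LODDS"]))
```

The 0.5 added to both counts keeps the ratio finite when either PLNs or NLNs is 0, which is why LODDS remains defined for node-negative patients.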

Risk Factors.
A seven-to-three ratio was used to randomly divide all cases into construction and validation groups. The cases in the two groups were then compared. The mean and standard deviation (SD) were used to describe the continuous variables. Logistic regression analyses were conducted sequentially for the initial screening of risk factors associated with patients' OS [26], and the least absolute shrinkage and selection operator (LASSO) regression algorithm was utilized. Cross-validation was also performed to find the optimal tuning parameter (λ), and the most significant variables were screened out. Moreover, the odds ratio (OR) and its 95% confidence interval (CI) were used to quantify the effect of features on OS. Then, a generalized linear model was constructed. A forest plot was drawn to display the model visually. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were obtained in the construction and validation groups to evaluate the model's predictive accuracy. The AUC values ranged from 0.5 to 1.0; the larger the AUC, the more reliable the model. Cox regression analyses were performed subsequently [26]. The hazard ratio (HR) and its 95% CI were applied to quantify the results. Schoenfeld's global test [27] was used to verify whether the variables conformed to the proportional hazard (PH) assumption. Deviance residual diagrams were used to evaluate the distribution of data in each variable.
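Two of the steps above, the random 7:3 split and the AUC, can be illustrated with a small standard-library Python sketch. It is not the authors' implementation (the study used R packages). The names `split_indices` and `auc`, the seed value, and rounding the construction group size up are all assumptions made for illustration, although rounding up does happen to give the reported group sizes of 9828 and 4211 for 14,039 cases.

```python
import math
import random


def split_indices(n: int, train_fraction: float = 0.7, seed: int = 2022):
    """Randomly split row indices 0..n-1 into construction and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # seed is arbitrary (assumption)
    n_train = math.ceil(n * train_fraction)   # rounding up is an assumption
    return idx[:n_train], idx[n_train:]


def auc(labels, scores) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, valid = split_indices(14039)
    print(len(train), len(valid))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Under this pairwise definition, a model that ranks cases at random scores about 0.5 and a model that ranks every positive case above every negative case scores 1.0, which is the range quoted in the text.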

Nomogram Construction and Validation.
By referring to the results of the above analyses, a nomogram was developed. A nomogram is a reliable tool for predicting prognosis, and it displays risk factors visually. The concordance index (C-index) was calculated separately in the two datasets. Furthermore, 1-, 3-, and 5-year ROC analyses were performed, and AUCs were calculated to assess the nomogram's predictive accuracy. The calibration curves in the two groups were obtained via bootstrapping with 1000 resamples to test the consistency between the prediction of the established model and reality.
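To make the C-index concrete, the following is a minimal Python sketch of Harrell's concordance index for right-censored data. It is an illustration under simplifying assumptions rather than the routine used in the study: the function name `concordance_index` is hypothetical, pairs with tied survival times are ignored, and tied risk scores count as one half.

```python
def concordance_index(times, events, risks) -> float:
    """Harrell's C: among comparable pairs, the share in which the patient
    with the higher predicted risk has the shorter observed survival.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    risks  : predicted risk score (higher means worse expected outcome)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is a death
        for j in range(n):
            if times[i] < times[j]:      # patient i died before j's last follow-up
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5    # tied risk scores count as one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.6, 0.7, 0.2]
    print(concordance_index(times, events, risks))
```

In this toy example five pairs are comparable and four of them are ordered correctly, so the function returns 0.8; a value of 0.5 would correspond to random ordering.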

Statistical Analysis.
SEER*Stat (version 8.4.0) was used to collect data. Categorical variables were coded numerically and tested using the chi-square test or Fisher exact test, while continuous variables were tested using ANOVA to describe the characteristics between the two groups. Logistic and Cox regression analyses were conducted for variable selection [26]. The C-index, ROC, AUC, and calibration curves in the two groups were calculated or plotted. All the analyses and figures were performed or plotted using R software (version 4.1.2, https://www.r-project.org/). Packages such as "survival," "survminer," "caret," "tableone," "glmnet," "forestplot," "pROC," "ezcox," and "timeROC" were used in this study. P values (two-sided) < 0.05 were considered statistically significant.

Figure 2: The cutoff values of examined and negative regional lymph nodes and lymph node ratios. (a) The cutoff value of examined regional lymph nodes; (b) the cutoff value of negative regional lymph nodes; (c) the cutoff value of the lymph node ratio.

Characteristics of Patients Identified.
A total of 14,039 cases were downloaded from the SEER database and divided randomly into construction (9828 cases) and validation (4211 cases) groups. The process of patient selection is shown in Figure 1.
The patients were separated into two subgroups in accordance with their LNR status as low (≤0.24) and high (>0.24), their ELN status as low (≤11) and high (>11), and their NLN status as low (≤9) and high (>9). The characteristics of the cases in the two groups are listed in Table 1. More than half of the patients were older than 60 years (7839 cases, 55.8%), male (7167 cases, 51.1%), at the AJCC T3 stage (7775 cases, 55.4%), and had tumors located in the right colon (7645 cases, 54.5%). Most of them were white (11,219 cases, 79.9%), at the AJCC M0 stage (10,761 cases, 76.7%), and grade III (11,721 cases, 84.5%), and had LNR ≤0.24 (9507 cases, 67.7%), no bone metastasis (13,938 cases, 99.3%), no brain metastasis (14,004 cases, 99.8%), no liver metastasis (11,953 cases, 85.1%), no lung metastasis (13,913 cases, 97.0%), high ELNs (12,563 cases, 89.5%), and high NLNs (10,810 cases, 77.0%). The pathological tissue type with the largest proportion was adenocarcinoma (12,112 cases, 84.5%). A total of 5418 patients (38.6%) died, while 8621 (61.4%) were alive. The mean survival time was 33.28 months (SD = 22.82 months) overall, 33.49 months (SD = 22.95 months) in the construction group, and 32.80 months (SD = 22.51 months) in the validation group. No statistically significant difference was found in any variable between the two groups.
The detailed results of the logistic regression analyses are shown in Table 2. The following factors were preliminarily identified (P < 0.05): age 45-60 years and age beyond 60 years, female sex, black race, right colon, T3, T4, M1, AM/MPA, SRCC, LODDS from −1.3222 to −0.5863, LODDS ≥−0.5863, high NLN, bone metastasis, and liver metastasis. Subsequently, a generalized linear model was built, as shown in Figure 3(c). The ROC curves were drawn, and the corresponding AUC values were calculated to assess the reliability of the established model, as shown in Figures 3(d) and 3(e).
The AUC values in the construction and validation groups were 0.821 and 0.818, respectively, indicating that the established model had a high degree of predictive capacity. Cox regression analyses were performed for further exploration (Table 3). Schoenfeld's global test was also conducted, and the results are shown in Figures 4(a) and 4(b).

Exploration of Factors for Patients with Histological
Age, sex, primary site, and NLN did not conform to the PH assumption (P < 0.05) and were thus excluded from the following analyses. The remaining variables, including race, T, M, histological type, LODDS, liver metastasis, and bone metastasis, conformed to the PH assumption (P > 0.05). The deviance residual diagram in Figure 4(c) indicated that the residuals of all variables involved in the nomogram were in a symmetric pattern and had a constant, uniform spread throughout the fit. The results of multivariate Cox regression analysis showed that black race, T3, T4, M1, SRCC, LODDS from −1.3222 to −0.5863, LODDS ≥−0.5863, NLN, bone metastasis, and liver metastasis were associated with a worse outcome, whereas other races were associated with a better outcome (P < 0.05).

Construction and Verification of Nomogram.
A nomogram was constructed, as shown in Figure 5.
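The C-index of this nomogram in both the construction and validation groups was 0.770. The 1-, 3-, and 5-year AUCs were 0.793, 0.828, and 0.830 in the construction group and 0.796, 0.833, and 0.832 in the validation group, respectively.
Journal of Oncology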

Discussion
CRC has the third highest incidence among cancers [1]. Even though nearly 75% of patients with CRC could potentially be treated by surgery [28], CRC still has the third highest mortality among cancers, and it continues to seriously endanger human health. Therefore, clinicians need to estimate the outcome and decide on subsequent treatment.
The TNM system is a common staging system in the diagnosis and treatment of patients with CRC [5]. This system stages cancer based on three aspects: the degree of cancer invasion, metastasis at regional lymph nodes, and spread to distant organs. This system is simple and easy to use, but it still has its shortcomings. Several studies reported that the number of ELNs obtained during surgery could influence the prognosis of patients with CRC [29,30]. Le Voyer et al. reported that the outcome of patients with more than 40 ELNs was clearly better than that of patients with fewer than 10 ELNs [29]. One explanation is that insufficient ELNs obtained during surgery could directly impair the accuracy of tumor staging [29], thus influencing the choice of subsequent treatment options. One study reported that the more ELNs obtained, the better the prognosis of patients with CRC, and that at least 20 ELNs should be obtained during surgery [31]. Guidelines also recommend that at least 12 ELNs be obtained during surgery [13]. However, the AJCC TNM staging system does not take ELNs into consideration. Thus, LNR and LODDS should be introduced [14,32,33].
LNR is the proportion of PLNs among ELNs and has been reported as a good predictor of outcomes in several kinds of cancer [15][16][17][18]. One study that focused on right colon cancer pointed out that LNR is a potentially valuable factor in predicting the probability of tumor recurrence [15]. However, LNR also has its inherent shortcomings. When the number of PLNs is 0, the value of LNR does not change regardless of the number of ELNs. LODDS, which refers to the logarithm of the ratio of PLNs to NLNs, showed better prediction ability than LNR in several studies [19][20][21]. Even when the number of PLNs is 0, LODDS can differentiate patients according to their number of ELNs. One study reported that LODDS could be a potential factor in predicting the outcome of patients with CRC [34]. Arslan et al. further indicated that the LODDS classification showed better prediction ability in patients with fewer than 12 ELNs obtained during surgery [35]. Additional studies are needed to explore which of the two is better.
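The difference between the two measures for node-negative patients can be seen in a short worked example. The Python sketch below is illustrative only: the function name `compare_node_negative` and the ELN counts are hypothetical, the LODDS formula is the one given in the methods, and the base-10 logarithm is an assumption.

```python
import math


def compare_node_negative(eln_counts):
    """For patients with PLNs = 0, return (ELNs, LNR, LODDS) for each ELN count."""
    rows = []
    for eln in eln_counts:
        pln = 0
        nln = eln - pln
        lnr = pln / eln                                   # always 0 when PLNs = 0
        lodds = math.log10((pln + 0.5) / (nln + 0.5))     # base 10 is an assumption
        rows.append((eln, lnr, lodds))
    return rows


if __name__ == "__main__":
    for eln, lnr, lodds in compare_node_negative([5, 12, 30]):
        print(f"ELNs={eln:2d}  LNR={lnr:.2f}  LODDS={lodds:.4f}")
```

For these three hypothetical patients LNR is 0 in every case, whereas LODDS becomes steadily more negative as more nodes are examined, so only LODDS separates a node-negative patient with 5 examined nodes from one with 30.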
A total of 14,039 cases of histological grade III-IV CRC were downloaded from the SEER database and randomly divided into two groups for model construction and validation. LODDS was identified in the logistic and Cox regression analyses. Meanwhile, LNR and the AJCC N staging system did not show a significant association with patient OS. Finally, a nomogram was created to visualize the results. This nomogram was built on the basis of LODDS, and it showed good predictive performance. This result is consistent with the studies discussed above.
This study has limitations. First, the cutoff values of LODDS were decided on the basis of its tertiles, as in the research conducted by Lee et al. [25]. Therefore, optimal cutoff values should be further explored through follow-up studies to improve the reliability of this predictive model. Second, all cases involved in this study were downloaded from the SEER database. The model should be validated with cases from additional sources to improve its accuracy.

Conclusion
LODDS was found to be a valuable predictive factor, and it showed better predictive ability than LNR for the OS of patients with histological grade III-IV CRC. Race, AJCC T stage, AJCC M stage, LODDS, histological type, bone metastasis, and liver metastasis were selected as independent factors to construct a nomogram. The nomogram performed well in both groups. All variables involved in the nomogram are easily obtained in the clinical diagnosis and treatment of patients with CRC. The nomogram could serve as a reference for doctors assessing the outcome of patients with histological grade III-IV CRC and choosing subsequent treatment.

Data Availability
The primary data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.