Preoperative Risk Assessment of Lymph Node Metastasis in cT1 Lung Cancer: A Retrospective Study from Eastern China

Background Lymph node status in clinical T1 (diameter ≤ 3 cm) lung cancer largely determines the treatment strategy. To assess lymph node status before operation, we aimed to develop a noninvasive predictive model using preoperative clinical information. Methods We retrospectively reviewed 924 patients (development group) and 380 patients (validation group) with clinical T1 lung cancer. Univariate analysis followed by polytomous logistic regression was performed to estimate the risk factors for N1 and N2 lymph node metastasis separately. A predictive model of N2 metastasis was established with dichotomous logistic regression, externally validated, and compared with previous models. Results Consolidation size and clinical N stage based on CT were independent risk factors common to both N1 and N2 metastases, with different odds ratios. For N2 metastasis, we identified five independent predictors by dichotomous logistic regression: peripheral location, larger consolidation size, lymph node enlargement on CT, no smoking history, and higher levels of serum CEA. The model showed good calibration and discrimination in the development data, with a nonsignificant Hosmer-Lemeshow test (p = 0.839) and an area under the ROC curve (AUC) of 0.931 (95% CI: 0.906-0.955). When externally validated, the model showed a high negative predictive value of 97.6%, and its AUC was higher than those of other models. Conclusion In this study, we analyzed risk factors for both N1 and N2 metastases and built a predictive model to estimate the probability of N2 metastasis in clinical T1 lung cancers before surgery. Our model will help to select patients with a low probability of N2 metastasis and assist clinical decisions on further management.


Introduction
Preoperative staging of patients with lung cancer informs prognosis and subsequent quality of life. Accurate clinical staging guides physicians to choose a proper treatment according to established guidelines and therefore standardizes the management procedure. Especially for those with positive mediastinal lymph nodes (N2 disease), preoperative chemotherapy is reported to reduce tumor size by 25% [1], downstage nearly half of the N2-positive patients [2-6], and increase the 5-year survival rate by 5-20% compared with surgery alone [7-11]. The accuracy of TNM staging before surgery is therefore of paramount importance.
The European Society of Thoracic Surgeons (ESTS) guidelines compared the diagnostic accuracy of different preoperative examinations for lymph node evaluation. Computed tomography is common and available in most countries, despite its low sensitivity (55%) and specificity (81%) [12,13]. PET-CT scan is reported to be superior to CT in mediastinal lymph node staging and exhibits a high negative predictive value (NPV) for peripheral tumors. The sensitivity of PET-CT is 80-90%, and the specificity is 85-95% [12,13]. However, PET-CT requires more expensive facilities and is not as popularized as CT. Besides, the negative predictive value of PET-CT decreases in patients with central tumors, tumors > 3 cm, and suspected N1 metastasis [12].
Reported data show that the prevalence of occult N2 disease in patients with clinical stage I NSCLC is about 5.0-6.5% [14,15]. To avoid missing these patients, a predictive model used in combination with auxiliary examinations is needed, and researchers have made previous efforts in this direction. In this study, we aimed to analyze the clinical features of patients with lymph node metastasis and create a predictive formula for N2 metastasis in clinical T1 lung cancers.

Patients.
We retrospectively reviewed patients who were diagnosed with lung cancer and underwent radical surgical resection at the Second Affiliated Hospital of Zhejiang University (SAHZU) during 2011-2016. Patients with a malignant nodule within 3 centimeters on CT (staged as cT1) were selected, all of whom underwent surgical lymph node evaluation. The exclusion criteria were as follows: (1) patients with multiple pulmonary cancers or metastatic pulmonary nodules, (2) patients with a history of preoperative therapy, and (3) patients without CT scan images before surgery. Patients from 2011 to 2015 were enrolled in the development group (n = 924), while patients from 2016 were included in the validation group (n = 380), as shown in Figure 1. This study was approved by the Institutional Ethics Committee of SAHZU (2017-031).

Clinicopathological Variables.
All the clinicopathological information was collected from the hospital information system (HIS). Information included gender, age, symptoms at presentation, smoking history, smoking index, chronic pulmonary diseases, cancer history, family history of cancer, levels of tumor markers within one month before surgery, histological type of lung cancer, pathological report of resected lymph nodes, tumor location (upper/middle/lower lobe, central/peripheral location), tumor size, consolidation size, C/T ratio (consolidation size/tumor size), and clinical N stage based on CT. Chronic pulmonary diseases included chronic bronchitis, emphysema, and chronic obstructive pulmonary disease (COPD). Tumor size was measured as the largest dimension on the CT section in the pulmonary window, while consolidation size was measured in the mediastinal window. Tumors were defined as peripherally located if the center of the tumor mass was in the outer one-third of the pulmonary parenchyma and otherwise as centrally located. A lymph node was considered enlarged when its short axis exceeded 1 cm. The seventh edition of the TNM classification was used in this study.

Data Analysis.
All the continuous variables were described with means and standard deviations, while categorical variables were described with frequencies. In univariate analysis, we performed one-way analysis of variance for continuous variables and Pearson's chi-square tests (adjusted p values using Bonferroni method) for categorical variables. Significant variables in the univariate analysis were further analyzed in multivariate analysis using polytomous logistic regression, in order to estimate different risk factors and odds ratios for each N stage (pN0, pN1, and pN2).
Dichotomous logistic regression was performed to build a predictive model for N2 metastasis, since N2 metastasis carries a worse TNM stage and requires different preoperative treatment strategies. All variables collected from the HIS were analyzed with forward stepwise selection based on a conditional likelihood ratio test. The p value threshold for entering variables was 0.05, and the threshold for removing variables was 0.10. The optimal cutoff point of the model was set according to the highest Youden's index. A nomogram was developed from the logistic regression using the rms package. In addition, calibration of the model was assessed with the Hosmer-Lemeshow goodness-of-fit test and a calibration curve, and the discrimination ability of the model was assessed by receiver operating characteristic (ROC) analysis. The DeLong test was used to compare different ROC curves.
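The cutoff selection by Youden's index described above can be illustrated with a minimal sketch. Youden's index is J = sensitivity + specificity − 1, and the chosen cutoff is the threshold that maximizes J. The function name `youden_cutoff` and the toy data are hypothetical and are not taken from the study; a case is counted as test-positive when its predicted probability is at or above the threshold, which is an assumption of this sketch.

```python
def youden_cutoff(probs, labels):
    """Return (cutoff, J) that maximizes Youden's J = sensitivity + specificity - 1.

    probs  : predicted probabilities from a fitted model
    labels : 1 for an observed event (e.g., pN2), 0 otherwise
    Assumes both classes are present in `labels`.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    # Every distinct predicted probability is a candidate threshold.
    for cutoff in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data for illustration only (not study data).
toy_probs = [0.02, 0.05, 0.10, 0.30, 0.60, 0.80]
toy_labels = [0, 0, 0, 1, 0, 1]
cutoff, j = youden_cutoff(toy_probs, toy_labels)
print(cutoff, j)
```

Ties are resolved in favor of the lowest threshold here; statistical packages may resolve them differently.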

Clinicopathological Characteristics for Patients in the
In univariate analysis (Table 1), lymph node metastasis was more likely to be found in smoking males who suffered from chronic pulmonary diseases and were hospitalized with respiratory- or cancer-related symptoms (RCRS) and higher levels of carcinoembryonic antigen (CEA). Tumors with larger size (or consolidation size), central location, and lymph node enlargement on CT images were associated with a higher likelihood of lymph node metastasis. Besides, patients with squamous carcinoma were more likely to have N1 metastasis, while N2 metastasis in patients with adenocarcinoma was three times more likely to occur than N1 metastasis.

Odds Ratios of N1 and N2 Metastases versus N0 Status.
In polytomous logistic regression (Table 2), significant variables in univariate analysis were further analyzed to estimate the risk factors and odds ratios of nodal metastasis stratified by the 7th TNM staging. Significantly elevated odds ratios were seen in tumors with larger consolidation size and lymph node enlargement on CT for N1 metastasis (OR for consolidation size = 5.449, 95% CI: 2.817-10.541; OR for lymph node enlargement on CT = 11.424, 95% CI: 3.316-39.360) and N2 metastasis (OR for consolidation size = 8.640, 95% CI: 5.002-14.923; OR for lymph node enlargement on CT = 8.703, 95% CI: 4.326-17.509) compared to N0 status. A significantly decreased odds ratio was seen in smokers for N2 metastasis (OR for smoking history = 0.217, 95% CI: 0.080-0.590) compared to N0 status in nonsmokers. Tumors with a central location seemed to have a negative correlation with N2 metastasis, though the difference was not significant.

Logistic Regression Model and Predictors of N2 Metastasis.
Dichotomous logistic regression identified five independent predictors for N2 metastasis: peripheral location, consolidation size, lymph node enlargement on CT, no smoking history, and levels of serum CEA (Table 3). Gender, histological type, and C/T ratio were not retained as significant factors. The formula predicting N2 metastasis for small tumor nodules was established as follows: predicted probability $= e^{x}/(1 + e^{x})$, where $x = -0.756 \times \text{central location} + 1.921 \times \text{consolidation size} + 2.145 \times \text{lymph node enlargement on CT} - 1.065 \times \text{smoking history} + 0.064 \times \text{CEA level} - 6.165$. The unit for "consolidation size" is cm and for "CEA level" is ng/ml. The values of "lymph node enlargement on CT," "central location," and "smoking history" should be 1 for yes and otherwise 0. A nomogram predicting the probability of N2 metastasis in cT1 patients was developed on the basis of multivariate logistic analysis (Figure 2).
The Hosmer-Lemeshow goodness-of-fit test, which was not statistically significant (p = 0.839), indicated that the predicted probability was in high concordance with the observed probability. A calibration curve is shown in Figure 3. The area under the receiver operating characteristic curve was 0.931, with a 95% confidence interval between 0.906 and 0.955 (Figure 4(a)). We selected the value with the highest Youden's index as our cutoff point for the predicted probability (cutoff for probability = 7.43%).
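As a worked illustration of the formula and cutoff reported in this section, the following sketch evaluates the logistic model for a single patient. The coefficients, intercept, and 7.43% cutoff are those stated in the text; the function name `n2_probability`, the argument names, and the example patient are hypothetical and are introduced only for illustration.

```python
import math

CUTOFF = 0.0743  # cutoff for predicted probability reported in the text (7.43%)


def n2_probability(central_location, consolidation_size_cm,
                   ln_enlargement_on_ct, smoking_history, cea_ng_ml):
    """Predicted probability of N2 metastasis, e^x / (1 + e^x).

    Binary inputs are coded 1 for yes and 0 otherwise, as stated in the text;
    consolidation size is in cm and CEA level in ng/ml.
    """
    x = (-0.756 * int(central_location)
         + 1.921 * consolidation_size_cm
         + 2.145 * int(ln_enlargement_on_ct)
         - 1.065 * int(smoking_history)
         + 0.064 * cea_ng_ml
         - 6.165)
    # 1 / (1 + e^-x) is algebraically identical to e^x / (1 + e^x).
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical patient: peripheral tumor, 2.0 cm consolidation,
# no enlarged nodes on CT, never-smoker, CEA 5.0 ng/ml.
p = n2_probability(False, 2.0, False, False, 5.0)
print(f"predicted probability = {p:.3f}, above cutoff: {p >= CUTOFF}")
```

For this hypothetical patient, x = 1.921 × 2.0 + 0.064 × 5.0 − 6.165 = −2.003, which gives a predicted probability of roughly 12% and therefore falls above the 7.43% cutoff.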

Validation of the Model and Comparison with Previous Models.
The characteristics of patients in the validation group are shown in Supplementary Table 1. In the external validation, the AUC of our model was 0.906 (95% CI: 0.857-0.956, Figure 4(b)). With the cutoff point set above (cutoff = 7.43%), we tested our model in the validation group. The sensitivity and specificity were 60.0% and 90.3%, respectively. The negative and positive predictive values (NPV and PPV) were 97.6% and 25.5%, respectively. In a subgroup analysis of adenocarcinoma (ADC) and squamous cell carcinoma (SCC), the validated AUC of ADC patients was 0.856 (95% CI: 0.790-0.922) and the validated AUC of SCC patients was 0.864 (95% CI: 0.777-0.952) (p = 0.885, DeLong test).
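The relationship between the four validation metrics above can be made concrete with a small sketch that derives them from a 2 × 2 confusion matrix. The cell counts used here (12 true positives, 35 false positives, 8 false negatives, 325 true negatives) are not reported in the text; they are an assumption, back-calculated so that they sum to the 380 validation patients and reproduce the reported percentages. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all non-diseased
        "ppv": tp / (tp + fp),          # diseased among all test-positive
        "npv": tn / (tn + fn),          # non-diseased among all test-negative
    }


# ASSUMED counts (not from the paper), chosen to match the reported percentages.
metrics = diagnostic_metrics(tp=12, fp=35, fn=8, tn=325)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Under these assumed counts only 20 of 380 patients are N2-positive, which shows why a low event rate can yield a high NPV alongside a modest PPV even when specificity is about 90%.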
We also compared our model with the Fudan model [16] and Beijing model [17], as all three studies included clinical T1 NSCLC. Analyzed with all the data from our validation group, the validated AUC of the Beijing model was 0.

Discussion
Lymph node status, especially the assessment of N2 metastasis, largely affects the treatment strategy. Therefore, it is of great significance to make an accurate and noninvasive assessment of lymph nodes before operation. In this study, we established a five-variable formula predicting N2 metastasis for malignant nodules within 3 cm. Our model showed a high negative predictive value of 97.6% and a specificity of 90.3%, so it can select patients with a low risk of N2 metastasis and help with clinical decision-making. As a truly multidisciplinary process, preoperative lymph node evaluation has challenged clinicians for many years. An algorithm that integrates imaging, endoscopic, and surgical techniques recommended by the ESTS guidelines has been widely practiced and prospectively validated, with a negative predictive value as high as 0.94 [18]. However, some researchers are more interested in creating a predictive model ahead of the biopsy strategy [16,17,19-21], because the accuracy of preoperative invasive staging such as TBNA may largely depend on the experience of operators.
Shafazand and Gould reported the first quantitative model to estimate the pretest probability of N2 metastasis in NSCLC of all stages [20]. The formula consisted of six independent predictors, which were age, tumor size, central location, adenocarcinoma histology, onset of primary symptoms, and abnormal mediastinum on chest X-ray. However, their data were directly collected from a previous randomized controlled trial, and no CT images were included at that time. After that, Zhang and colleagues reported a four-predictor model for N2 metastasis in CT-defined T1N0M0 NSCLC in 2012 [16]. Younger patients with a centrally located and larger-sized lung adenocarcinoma had higher risks of N2 disease. However, patients with a histology of AIS (adenocarcinoma in situ) and MIA (minimally invasive adenocarcinoma) were excluded from their study, despite the fact that the pathology of AIS or MIA could only be confirmed from a resected specimen. In that case, the percentage of adenocarcinoma might be underestimated in their model because there will be AIS and MIA patients in reality. More recently, there were predictive models evaluating N2 metastasis for NSCLC of all stages [21] and models estimating nodal metastasis in clinical T1a stages [17].
No models above have referred to the different risk factors of N1 and N2 metastases. Analyzed by polytomous logistic regression, we found that consolidation tumor size and lymph node enlargement on CT scan were the most related factors to both N1 and N2 metastases in patients with early malignant nodules (diameter ≤ 3 cm, stage T1). Though it is difficult to differentiate benign lymphadenectasis from lymph node metastasis on CT, our results showed that lymphadenectasis in N1 station was of higher correlation to N2 metastasis. This could be explained by the lymphatic drainage, and the rate for skip N2 metastasis was only 29% [22]. This result was partly in accordance with the previous literature and the ESTS recommendation [13,23].
In both polytomous and dichotomous logistic analyses, consolidation tumor size and lymph node enlargement on CT and CEA levels were correlated to N2 metastasis, which is consistent with previous studies [16,17,24,25]. Smoking history seemed to be negatively associated with N2 disease, as the odds ratio was less than 1 in both analyses. Despite the lack of molecular mechanisms, nonsmokers are more prone to a delayed or incidental detection of lung cancer than smokers and thus are more likely to progress into nodal metastasis, as supported by data from Lee et al. [26]. Apart from that, tumors with peripheral location were found with a higher likelihood of N2 metastasis in this study. The inconsistency between different research studies [27,28] could result from the different criteria of the definition as "central location" and the different target population. Takeda et al. also found that peripheral tumors are more likely to have N2 metastasis by subpleural lymph drainage pathways [29].
Compared with previous logistic analyses, this study had a larger sample size and reduced selection bias by enrolling patients with all pathological types, including AIS and MIA, which constituted 8.2% and 16.7% of ground-glass nodules in the development group. Our data suggested that consolidation size was a stronger predictive factor of nodal metastasis than tumor size and C/T ratio in the multivariate analysis. Squamous cell carcinoma also fit this model, though it was a minority histological type. Besides, pathological type was not an independent factor in this multivariate model, suggesting that preoperative histology might not be necessary for predicting N2 metastasis.
Nevertheless, this study also had several limitations. Firstly, it was a retrospective study and there was no standard for the number of resected lymph nodes. In 2014, the American College of Surgeons Commission on Cancer recommended that at least 10 regional lymph nodes be removed and pathologically examined for resectable NSCLC [30]. Thus, a diagnostic bias might have occurred in our study. Secondly, we only collected data from a single center, so the data reflect patient characteristics in the local area. Finally, to ensure the general applicability of the model, the proportion of lymph node metastasis in this study was kept consistent with the real-world prevalence; the number of positive cases was therefore small, which lowered the positive predictive value of the model. Therefore, a larger study with more positive cases from multiple medical centers will be needed to develop a more practical model for clinical use.

Conclusions
In this study, we analyzed the clinical features of patients with lymph node metastasis and produced a model predicting the possibility of N2 nodal metastasis for early lung cancers (tumor ≤ 3 cm). Stratified by the cutoff point, a low predicted probability may suggest an operation directly without neoadjuvant therapies, while a relatively high predicted probability needs support from further invasive and expensive examinations. Our model will provide some clues for clinical decision-making.

Data Availability
All relevant data are within the article and the supplementary materials.

Ethical Approval
This study was approved by the Institutional Ethics Committee of SAHZU (2017-031).