A Prediction Rule for Overall Survival in Non-Small-Cell Lung Cancer Patients with a Pathological Tumor Size Less Than 30 mm

We sought to develop and validate a clinical nomogram model for predicting overall survival (OS) in non-small-cell lung cancer (NSCLC) patients with resected tumors that were 30 mm or smaller, using clinical data and molecular marker findings. We retrospectively analyzed 786 NSCLC patients with a pathological tumor size less than 30 mm who underwent surgery between 2007 and 2017 at our institution. Significant prognostic factors were identified and integrated to build the nomogram model in the training set, and the model was then subjected to internal validation. Prognostic performance was evaluated by calibration, the concordance index (C-index), and risk group stratification. Multivariable analysis identified pathological tumor size, lymph node metastasis, and Ki-67 expression as independent prognostic factors, which were entered into the nomogram model. The nomogram-predicted probabilities of OS at 1 year, 3 years, and 5 years posttreatment agreed well with the actual observations. Harrell's C-index of the constructed nomogram in the training set was 0.856 (95% CI: 0.804-0.908), whereas that of TNM staging was 0.814 (95% CI: 0.742-0.886, P = 5.28 × 10⁻¹³). Survival analysis demonstrated that the NSCLC risk subgroups differed significantly in both the training and validation sets (P < 0.001). A nomogram model was established for predicting survival in NSCLC patients with a pathological tumor size less than 30 mm, which should be further validated using demographic and clinicopathological data. In the future, this prognostic model may assist clinicians during treatment planning and clinical studies.


Introduction
Despite significant treatment advancements, lung cancer remains the leading cause of cancer-related mortality worldwide, with non-small-cell lung cancer (NSCLC) accounting for 85% of all lung cancer cases [1,2]. Currently, lung adenocarcinoma and squamous cell lung cancer (SCC) are the two most commonly diagnosed forms of NSCLC. Due to the use of low-dose computed tomography (LDCT) in high-risk and some healthy subjects, it has become easier to detect the disease during its early stages, when treatment is most effective [3]. Despite dramatic improvements in diagnosing lung cancer, the 5-year cumulative survival rate for NSCLC has remained unchanged at 18.5%. However, most studies have assessed overall survival (OS) in patients with advanced-stage NSCLC, as only a limited number of patients were diagnosed with early-stage disease in the past [4]. Nevertheless, some patients with early-stage NSCLC present with aggressive characteristics, and there is limited information on how to estimate the survival of these patients. Currently, only a limited number of studies have used mathematical models to predict the survival outcomes of patients with early-stage NSCLC [5,6]. The development of prognostic models may aid clinicians during treatment planning and patient stratification in the future.
While several prognostic biomarkers have been investigated in lung cancer, there have been limited imaging agents that have advanced to clinical trials. For example, preoperative or initial peripheral blood carcinoembryonic antigen (CEA) levels were previously shown to be useful prognostic biomarkers for NSCLC patients [7,8]. In addition, some immunohistochemical (IHC) markers, such as p53 and Ki-67, have been successfully used for predicting the prognosis of NSCLC patients [9,10]. Patients with a mutated epidermal growth factor receptor (EGFR) were also shown to benefit from specific molecular-targeted therapies [11]. However, the prognostic role of EGFR-targeted agents in NSCLC patients with a pathological tumor size less than 30 mm remains unclear. The new substaging system defined in the 8th edition of the American Joint Committee on Cancer (AJCC) divides stage IA into IA1, IA2, and IA3, which has shown a significant prognostic value for patients with NSCLC [12]. In addition, other prognostic factors may be used in NSCLC patients with a pathological tumor size less than 30 mm, such as smoking status, histopathology subtype, and lymph node metastasis [13]. Combining prognostic factors within a single cohort may aid in the precise assessment of disease prognosis in NSCLC patients. Recently, several studies have shown that nomogram models can be superior to the traditional TNM staging system for the prediction of patient outcomes in several types of cancer [14-16]. A nomogram presents the results of a statistical predictive model as an intuitive graph, which makes it possible to quantify the probability of a clinical event individually for each patient.
Therefore, the goal of this study was to develop and validate a practical nomogram model by combining clinicopathological variables and molecular biomarkers based on the data obtained from NSCLC patients with a pathological tumor size less than 30 mm from the eastern islands of China. We also sought to compare the prognostic value of the nomogram model with that of the newest TNM staging system. [17]. The staging was determined following the new substage guidelines found in the 8th edition of the AJCC [12].

Material and Methods
Demographic data (age, sex, and history of tobacco exposure) and pathology data (histological type, pathological tumor size, lymph node metastasis, tumor location, and pleural invasion) were obtained. In addition, the type of surgical intervention and the pathological TNM stage were included. Other factors, such as preoperative peripheral CEA, IHC markers for p53 and Ki-67 expression, and EGFR mutations, were also included. Follow-up data were obtained from the death registration system of the Zhoushan Center for Disease Control and Prevention, along with the medical review of all patients on an outpatient basis with computed tomography (CT) imaging at 3-month intervals for the first year after treatment and then at 6-month intervals.
The exclusion criteria for this study included patients with tumor sizes greater than 30 mm in diameter, small-cell lung cancer (SCLC) cases, large-cell lung cancer (LCLC) cases, lymphoepithelioma-like carcinoma, and neuroendocrine tumors that were 30 mm in diameter or less. Finally, 786 patients were included in the study cohort and separated into the training or validation sets according to their date of surgery. The 457 patients who underwent surgery from 2007 to 2014 were assigned to the training group and used to develop the nomogram prognostic model, while the other 329 patients who underwent surgery from 2015 to 2017 were used to validate the nomogram model. The last follow-up was the date of death or April 30, 2018, for patients who were still alive. The OS was calculated from the time of surgery until the time of death or the final follow-up. This study was approved by the Ethical Review Committee of Zhoushan Hospital, Zhejiang Province, China. Written consent was waived by the Institutional Review Board due to the retrospective nature of this study.
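The cohort assignment and OS definition described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the example dates are invented, and a month is approximated as 30.4375 days, which the text does not specify.

```python
from datetime import date

# Last follow-up date stated in the text for patients still alive.
CENSOR_DATE = date(2018, 4, 30)
# Assumption: average month length in days (not specified in the study).
DAYS_PER_MONTH = 30.4375


def assign_cohort(surgery_date):
    """Temporal split: surgery in 2007-2014 -> training, 2015-2017 -> validation."""
    if 2007 <= surgery_date.year <= 2014:
        return "training"
    if 2015 <= surgery_date.year <= 2017:
        return "validation"
    raise ValueError("surgery date outside the 2007-2017 study window")


def overall_survival(surgery_date, death_date=None):
    """Return (months, event); event is 1 for death and 0 for censoring."""
    if death_date is not None and death_date <= CENSOR_DATE:
        end, event = death_date, 1
    else:
        end, event = CENSOR_DATE, 0
    months = (end - surgery_date).days / DAYS_PER_MONTH
    return months, event


# Hypothetical patient: surgery in 2016, alive at the last follow-up.
cohort = assign_cohort(date(2016, 4, 30))
months, event = overall_survival(date(2016, 4, 30))
```

A patient without a recorded death is censored at the last follow-up date, so the pair `(months, event)` is the form expected by the survival methods used later.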

Statistical Analysis
Continuous data were reported as median values with interquartile range. Cumulative survival curves were depicted using the Kaplan-Meier method with a calculated median survival time and a 95% confidence interval (CI). In the univariate analysis, the log-rank test was used to compare survival between the levels of each prognostic factor and to calculate the P values. Factors with P values of ≤0.05 were considered statistically significant and were entered into the multivariate analysis using the Cox proportional hazards regression model. The nomogram was constructed in the training set based on the results of the multivariate analysis, using the backward stepwise selection method with the Akaike information criterion (AIC) [18]. The nomogram model was subjected to internal validation, and the concordance index (C-index) was calculated to evaluate the predictive accuracy for OS. A larger C-index indicates a better ability of the model to discriminate between patient outcomes. Calibration was assessed using calibration curves for 1-year, 3-year, and 5-year OS after bias correction.
All statistical analyses were performed using SPSS 22.0 (IBM, Chicago, IL, USA) and R version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria). P values that were ≤0.05 were considered statistically significant with a two-sided test.
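For readers unfamiliar with the Kaplan-Meier method mentioned above, the product-limit estimate can be sketched in plain Python. This is a minimal illustration, not the SPSS or R routine used in the study; the function name and the five example observations are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up times in months (1 = death, 0 = censored).
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 0, 1, 1, 0])
```

In this toy example the estimated survival steps down to about 0.8, 0.6, and 0.3 at 6, 12, and 20 months; censored patients reduce the number at risk without lowering the curve.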

Characteristics of Patients in the Training and Validation Cohorts
A total of 786 cases were identified as having NSCLC with a pathological tumor size less than 30 mm and were separated into the training set (n = 457) and validation set (n = 329) based on the surgical date. The demographic data and clinicopathological characteristics of the two groups are summarized in Table 1. Among the variables, preoperative serum CEA levels, Ki-67 expression levels, p53 expression levels, and EGFR mutation information were missing in 0.2%, 37.2%, 37.2%, and 41.4% of the cases in the training set and 0.3%, 14.6%, 14.0%, and 0.6% of the cases in the validation set, respectively. The median follow-up intervals were 49 months (range 3-136 months) for the training set and 31 months (range 6-39 months) for the validation set.

Univariate and Multivariate Analyses of the OS in the Training Set
The findings from the univariate and multivariate analyses of OS are described in Table 2. The univariate analysis indicated that patients who were female (vs. male, P < 0.001), less than 60 years of age (vs. ≥60 years of age, P = 0.004), and nonsmokers (vs. current smokers or those with a history of smoking, P < 0.001) and who had preoperative CEA levels of less than 5.0 ng/mL (vs. ≥5.0 ng/mL, P < 0.001) showed better OS.
By histology, patients with adenocarcinoma in situ (AIS) or minimally invasive adenocarcinoma (MIA) had more favorable OS than patients with invasive adenocarcinoma (IAC) or SCC (P < 0.001). Patients with a pathological tumor size ≤ 10 mm had the most favorable OS, followed by those with tumors of 10-20 mm, while patients with larger tumors (20-30 mm) had the least favorable OS (P < 0.001). Patients without lymph node metastasis showed better survival than those with lymph node metastasis (N1 or N2, P < 0.001). For the pathological TNM stage, patients with early-stage disease showed better OS than those with advanced-stage disease (P < 0.001). Moreover, patients positive for p53 or Ki-67 expression experienced less favorable OS than patients negative for p53 (P < 0.001) or Ki-67 (P < 0.001). However, the surgical procedure had no significant impact on OS (P = 0.850). Patients with a mutant EGFR showed worse OS than patients with wild-type EGFR, but this difference was not statistically significant (P = 0.083). All factors identified as significant predictors of OS in the univariate analysis were entered into the multivariate analysis based on Cox proportional hazards regression. The results showed that pathological tumor size (P < 0.001), lymph node metastasis (P < 0.001), and Ki-67 expression (P < 0.001) were independent prognostic factors in the Cox model.

Development of a Nomogram Model for OS
The nomogram model was established using the independent significant prognostic factors (Figure 1). The nomogram illustrated the points of each predictor ranging from 0 to 100. The results showed that pathological tumor size was the most significant contributor to the prognosis, followed by Ki-67 expression levels and lymph node metastasis. The total scores were calculated and located on the total point scale. The probabilities of OS at 1 year, 3 years, and 5 years posttreatment were individually estimated by drawing a straight line and ranged from 0.80 to 0.98, 0.50 to 0.95, and 0.35 to 0.95, respectively.
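The point scale of a nomogram is typically derived by rescaling the Cox regression coefficients so that the predictor with the largest effect spans 0 to 100 points. The sketch below illustrates that rescaling only. The coefficients are hypothetical placeholders chosen to mirror the reported ordering of contributions (tumor size, then Ki-67, then lymph node metastasis); they are not the study's estimates, and the function names are invented for illustration.

```python
# Hypothetical log hazard ratios versus each reference level.
# These are NOT the coefficients estimated in the study.
COEFS = {
    "tumor_size": {"<=10 mm": 0.0, "10-20 mm": 0.9, "20-30 mm": 1.8},
    "lymph_node": {"N0": 0.0, "N1/N2": 1.0},
    "ki67": {"negative": 0.0, "positive": 1.2},
}


def build_point_table(coefs):
    """Rescale coefficients so the widest-ranging predictor spans 0-100 points."""
    max_range = max(
        max(levels.values()) - min(levels.values()) for levels in coefs.values()
    )
    table = {}
    for variable, levels in coefs.items():
        lowest = min(levels.values())
        table[variable] = {
            level: 100.0 * (beta - lowest) / max_range
            for level, beta in levels.items()
        }
    return table


def total_points(table, patient):
    """Sum the points of a patient's level for every predictor."""
    return sum(table[variable][patient[variable]] for variable in table)


point_table = build_point_table(COEFS)
# Hypothetical patient: 10-20 mm tumor, no nodal metastasis, Ki-67 positive.
patient = {"tumor_size": "10-20 mm", "lymph_node": "N0", "ki67": "positive"}
score = total_points(point_table, patient)
```

Reading survival probabilities from the total score additionally requires the baseline survival estimate from the fitted Cox model, which is not reproduced here.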

Calibration and Validation of the Nomogram in the Validation Set
The calibration plot showed good agreement between the nomogram predictions and the actual observations for 1-year, 3-year, and 5-year OS (Figure 2). In the validation cohort, the calibration curve also showed good agreement for 1-year and 3-year OS (Figure 3). Harrell's C-index, which was used to evaluate the performance of the constructed nomogram, was 0.856 (95% CI: 0.804-0.908) in the training set and 0.820 (95% CI: 0.647-0.993) in the validation set. The C-index of TNM staging was 0.814 (95% CI: 0.742-0.886, P = 5.28 × 10⁻¹³) in the training set and 0.812 (95% CI: 0.711-0.913, P = 0.675) in the validation set.
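Harrell's C-index reported above is the proportion of comparable patient pairs in which the patient with the higher predicted risk died first. A minimal sketch follows, assuming that pairs with tied follow-up times are simply skipped and that ties in the risk score count as half-concordant; the function name and the four example patients are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs ordered correctly by the risk score.

    A pair (i, j) is comparable when patient i died (event 1) strictly
    before the follow-up time of patient j. Tied times are skipped
    (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: follow-up in months, event flags, and risk scores.
c_index = harrell_c_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.5, 0.1],
)
```

In this toy example four of the five comparable pairs are ordered correctly, giving a C-index of 0.8. A value of 0.5 corresponds to random ordering and 1.0 to perfect discrimination.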

Stratifying the Risk Ability of the Prognostic Nomogram Model
We divided patients into four risk groups (scores: 0-9.72, 9.72-17.67, 17.67-22.67, and ≥22.67) with the optimal cut-off values for total points in the training set (Table 3). A survival analysis demonstrated that the subgroups showed significant distinctions within the training cohort (P < 0.001, Figure 4(a) and Table 4). The same cut-off values were also applied to the validation set, and survival differences were represented among the subgroups (P < 0.001, Figure 4(a) and Table 4).
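Assigning a patient to one of the four risk groups is a simple lookup against the reported cut-off values. The sketch below assumes that a score exactly on a boundary falls into the higher group, consistent with the "≥22.67" definition of the top group; the group labels and function name are hypothetical.

```python
from bisect import bisect_right

# Total-point cut-offs reported in the text for the training set.
CUTOFFS = [9.72, 17.67, 22.67]
# Hypothetical labels; the study does not name the groups in this section.
LABELS = ["group 1", "group 2", "group 3", "group 4"]


def risk_group(total_points):
    """Map a nomogram total score to one of the four risk groups.

    Assumption: boundary values belong to the higher-risk group.
    """
    if total_points < 0:
        raise ValueError("total points cannot be negative")
    return LABELS[bisect_right(CUTOFFS, total_points)]


# Hypothetical total scores for four patients.
groups = [risk_group(score) for score in (5.0, 12.3, 20.0, 30.0)]
```

The same function can be applied unchanged to the validation set, since the text states that the training-set cut-offs were reused there.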

Discussion
In recent years, the increased usage of CT screening has led to an increase in the number of lung cancers with a pathological tumor size less than 30 mm being detected in the clinic [19,20]. However, a nomogram model for prognostic prediction has not yet been constructed for NSCLC patients with a pathological tumor size less than 30 mm. In this study, we developed a nomogram model and internally validated it to predict the prognosis of NSCLC patients from a single institution in the eastern islands of China. This nomogram was based not only on demographic data and clinicopathological characteristics but also on molecular factors and IHC markers. We propose that the nomogram could allow for better treatment planning in the future. A nomogram model established on data from multiple institutions often yields higher accuracy and less bias. However, the nomogram model in this study used data derived from a single institution. In addition, molecular marker data were included in this study. As a vital tumor suppressor, p53 expression is often lost in tumors, which can be directly correlated with the prognosis of patients [21]. Ki-67 is a marker of cell proliferation in NSCLC, and elevated Ki-67 levels have been correlated with poor outcomes in NSCLC patients [9]. Moreover, an EGFR mutation was previously discovered to have a prognostic role in NSCLC patients. Our univariate analysis revealed that patients positive for p53 or Ki-67 expression showed less favorable outcomes than patients who were negative for p53 or Ki-67 in the training cohort. However, the EGFR mutation status had no significant effect on the outcome of NSCLC patients. In general, these findings agreed with the results from other studies [10,22]. In addition, sex, age, smoking status, preoperative CEA levels, histology, pathological T categories, lymph node metastasis, and pathological TNM stage were also found to be prognostic factors in the univariable analysis.
Through the subsequent multivariable analysis, the pathological T stage and N category, as well as Ki-67 expression, were identified as independent prognostic predictors. Previous studies also demonstrated that tumor size and lymph node metastasis were risk factors for NSCLC [4,23].
Notably, Ki-67 expression was found to be associated with poorer survival outcomes in NSCLC patients. To our knowledge, this is the first attempt to include an IHC marker into a nomogram model.
The nomogram model showed a clear capacity to discriminate between patient outcomes in the training cohort. The C-index of this model was 0.856, higher than those previously reported in published studies [23,24]. Moreover, the nomogram model performed better than the AJCC TNM staging classification system in the individual evaluation of patient prognosis, which may be attributed to the inclusion of Ki-67 expression data. IHC markers can provide valuable insight into the pathology of NSCLC, and molecular markers are commonly used in the pathological diagnosis of lung cancer in the clinic [25]. We used molecular markers to build our nomogram model, aiming to increase its overall accuracy. In addition, relatively good calibration was observed for the nomogram in the training cohort. To validate the nomogram model, we used internal validation data to evaluate its accuracy and calibration. The C-index reflected good discrimination power in the validation data, but it was lower than that of the training set. This might be due to the shorter follow-up times for patients in the validation cohort. Moreover, using the optimal cut-off values, the nomogram clearly separated OS across the different risk subgroups. The proposed nomogram may have a potential role in clinically evaluating the OS probability of patients with NSCLC [26].
There were several limitations to this study. The first limitation was the amount of missing data in the training set, such as Ki-67 and p53 expression levels and EGFR mutation status. This might introduce selection bias into the nomogram model. The second limitation was that this study was conducted at a single institution and the established nomogram was validated using an internal cohort. External validation based on a larger number of patients at multiple institutions should be performed in the future. The third limitation was the retrospective nature of this study, with relatively short follow-up times, especially in the validation set. Lastly, the fourth limitation was our inability to include some recognized prognostic parameters, such as comorbidity and postoperative complications, in the nomogram. Other parameters were not assessed in this study, such as treatment efficacy, disease-free interval, or progression-free survival. In future studies, we will improve the model by using multi-institutional data with longer follow-up times, less missing data, and additional predictive factors.
In the present study, we developed a prognostic nomogram model for NSCLC patients with a pathological tumor size less than 30 mm and validated the model using an internal cohort. We also built proportional OS subgroups in the model to discriminate between different patient outcomes. We developed a high-performance nomogram model that includes molecular marker data and displays a C-index of 0.856. This nomogram could be used as a convenient and precise outcome predictive tool for clinicians in the future, yet further external validations using data from multiple institutions should be considered.

Data Availability
The data used to support the findings of this study are available from the corresponding authors upon request.

Disclosure
The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.