Characteristics and Prognostic Nomogram for Primary Lung Lepidic Adenocarcinoma

Background. Lepidic adenocarcinoma (LPA) is an infrequent subtype of invasive pulmonary adenocarcinoma (ADC). However, its clinicopathological features and prognostic factors have not been fully elucidated.
Methods. Data on 4191 LPA patients from the Surveillance, Epidemiology, and End Results (SEER) database were retrospectively analyzed and compared with data on patients with non-LPA pulmonary ADC to explore the clinicopathological and prognostic features of LPA. Univariate and multivariate Cox proportional hazards models were used to identify independent survival predictors for nomogram development. The nomograms were validated in both the training and validation cohorts using the concordance index, receiver operating characteristic curves, calibration plots, and decision curve analysis.
Results. Compared with non-LPA pulmonary ADC, LPA showed distinct clinicopathological features, including a higher proportion of elderly and female patients, smaller tumor size, less pleural invasion, and lower histological grade and stage. Multivariate analyses showed that age, sex, race, tumor location, primary tumor size, pleural invasion, histological grade, stage, primary tumor surgery, and chemotherapy were independently associated with overall survival (OS) and cancer-specific survival (CSS) in patients with LPA. The nomogram predictions agreed well with the observed outcomes, and the nomograms showed better prognostic capacity than TNM stage.
Conclusions. LPA is more frequently diagnosed in older people and women. LPA tends to present with smaller tumors and lower grade and stage, which may indicate a favorable prognosis. The constructed nomograms accurately predict the long-term survival of LPA patients.


Introduction
Lung cancer is the leading cause of cancer-related death and one of the most commonly diagnosed cancers worldwide [1]. Lepidic adenocarcinoma (LPA), also known as lepidic predominant adenocarcinoma and formerly included under nonmucinous bronchioloalveolar carcinoma [2], is an infrequent subtype of lung adenocarcinoma (ADC) for which precise incidence data are lacking. LPA is defined as a nonmucinous ADC with a lepidic-predominant growth pattern that is >3 cm in size and/or has an invasive focus >5 mm or lymphatic, vascular, or pleural invasion [3]. The definition was proposed by the International Association for the Study of Lung Cancer in 2011 and subsequently adopted by the World Health Organization (WHO) in 2015 [4]. LPA exhibits distinct clinicopathological features, specific gene mutation profiles, and more favorable survival outcomes than lung adenocarcinoma, not otherwise specified (NOS) [4][5][6]. However, few population-based studies have analyzed the demographic and clinicopathological features of LPA or the factors influencing its prognosis. Meanwhile, it is challenging for clinicians to predict prognosis accurately on the basis of tumor-node-metastasis (TNM) stage alone. Therefore, tools for estimating the probability of long-term survival in patients with LPA are needed.
The Surveillance, Epidemiology, and End Results (SEER) database, established in 1973 and covering approximately 28% of the United States population, provides a wide range of demographic, clinical, and follow-up information on cancer patients [7]. Using the SEER database, we retrospectively analyzed the clinicopathological features and survival data of 4191 LPA patients to characterize their clinicopathological features and identify prognostic indicators. We then developed nomograms estimating the overall survival (OS) and cancer-specific survival (CSS) of LPA patients and evaluated their accuracy by validation in both the training and validation cohorts and by decision curve analysis (DCA). In addition, we estimated the incidence of LPA and explored the risk factors associated with distant and lymph node metastases of LPA.

Data Source and Selection.
Patient data were obtained from the SEER database using SEER*Stat software, version 8.4.0.1 (https://seer.cancer.gov/seerstat/). Lung adenocarcinoma was classified according to the 2021 WHO classification system, and patients were identified using the International Classification of Diseases for Oncology, third edition (ICD-O-3) histology codes. The inclusion criteria were as follows: (1) primary lung cancer; (2) ICD-O-3 histology code 8250/3 (lepidic adenocarcinoma), 8260/3 (papillary adenocarcinoma), 8230/3 (solid adenocarcinoma), or 8140/3 (adenocarcinoma-NOS); (3) positive histological confirmation; and (4) diagnosis between 2005 and 2016, to ensure a minimum follow-up period of three years. The exclusion criteria were as follows: (1) multiple primary tumors during the patient's lifetime; (2) unknown survival data or TNM stage; and (3) missing information that is important and readily available in clinical practice, including age at diagnosis, race, and marital status. To construct and validate the nomograms, patients with LPA diagnosed in 2010 and 2011 (n = 833) were assigned to the validation cohort, and those diagnosed in the remaining years between 2005 and 2016 (n = 3358) were assigned to the training cohort.
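The temporal split described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each patient record carries its year of diagnosis; the `Patient` class and `split_by_year` function are hypothetical names, not part of SEER*Stat or of the authors' R code.

```python
from dataclasses import dataclass


# Hypothetical record type; field names are illustrative, not SEER variable names.
@dataclass
class Patient:
    patient_id: int
    diagnosis_year: int


STUDY_YEARS = range(2005, 2017)   # diagnoses 2005-2016 inclusive
VALIDATION_YEARS = {2010, 2011}   # years held out as the validation cohort


def split_by_year(patients):
    """Return (training, validation) lists using a temporal hold-out."""
    training, validation = [], []
    for p in patients:
        if p.diagnosis_year not in STUDY_YEARS:
            continue  # outside the inclusion window, so excluded
        if p.diagnosis_year in VALIDATION_YEARS:
            validation.append(p)
        else:
            training.append(p)
    return training, validation


# Example: a 2004 diagnosis falls outside the window and is dropped.
cohort = [Patient(i, y) for i, y in enumerate([2005, 2010, 2011, 2016, 2004])]
train, valid = split_by_year(cohort)
print(len(train), len(valid))  # 2 2
```

Holding out whole calendar years, rather than sampling patients at random, gives a validation cohort that is separated from the training data in time.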

Study Variables.
Demographic and clinicopathological variables of the included patients were extracted, including age, sex, race, marital status, tumor location, primary tumor size, separate tumor nodules, pleural invasion, histological grade, 6th edition TNM stage, treatment, vital status, survival time, cause of death, and the education and income levels of the county of residence. Races other than White and Black were grouped as "Others." "Married (including common law)" was recorded as "Married," and all other marital statuses were recorded as "Single." Education and income levels were defined as "Low" or "High" according to whether the patient's county of residence was below or above the median level. Because survival time in the SEER database is recorded in whole months, a survival time of 0 months was recoded as 0.5 months. OS was defined as the time from diagnosis to death from any cause or last follow-up, while CSS was defined as the time from diagnosis to death from lung cancer.
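The recoding of survival time and the derivation of the two endpoints can be sketched as follows. This is a hypothetical helper, assuming simplified string codes ("dead", "lung cancer") instead of the actual SEER vital-status and cause-of-death variables; it also assumes, as is usual for CSS, that deaths from other causes are treated as censored.

```python
from typing import NamedTuple, Optional


class Endpoints(NamedTuple):
    time_months: float
    os_event: int   # 1 = death from any cause, 0 = censored
    css_event: int  # 1 = death from lung cancer, 0 = censored


def derive_endpoints(survival_months: int, vital_status: str,
                     cause_of_death: Optional[str]) -> Endpoints:
    """Hypothetical recoding of one patient's follow-up into OS/CSS endpoints."""
    # SEER stores survival in whole months; 0 months is recoded to 0.5 months.
    time = 0.5 if survival_months == 0 else float(survival_months)
    dead = vital_status == "dead"
    os_event = 1 if dead else 0
    # Assumption: deaths from causes other than lung cancer are censored for CSS.
    css_event = 1 if dead and cause_of_death == "lung cancer" else 0
    return Endpoints(time, os_event, css_event)


print(derive_endpoints(0, "alive", None))
# Endpoints(time_months=0.5, os_event=0, css_event=0)
```

Recoding 0 to 0.5 months keeps patients who died in the month of diagnosis in the analysis with a small positive survival time.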

Statistical Analysis.
For descriptive statistics, variables were summarized as absolute numbers and percentages. Chi-square tests were used to compare demographic and clinicopathological characteristics among groups. Propensity score matching (PSM) was used to minimize the impact of confounding factors. The propensity score for each patient with ADC-NOS or LPA was calculated with a logistic regression model that included the following variables: age, sex, race, marital status, income and education levels, primary tumor location and size, separate tumor nodules, pleural invasion, histological grade, TNM stage, and treatment. Caliper matching with a caliper width of 0.02 was performed between the two groups. After PSM, 4165 pairs of patients were successfully matched. OS and CSS were compared between matched patients with ADC-NOS and LPA using Kaplan-Meier curves and log-rank tests.
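A minimal sketch of greedy 1:1 nearest-neighbor matching within a caliper is shown below, assuming propensity scores have already been estimated by the logistic regression model and that the caliper of 0.02 is applied on the probability scale. The function `caliper_match` is illustrative only: the authors' R implementation is not described, and matching software differs in matching order and in whether the caliper is expressed on the logit scale.

```python
def caliper_match(treated, control, caliper=0.02):
    """Greedy 1:1 nearest-neighbor matching on propensity score.

    treated, control: dicts mapping patient id -> propensity score.
    Returns (treated_id, control_id) pairs whose scores differ by at most
    `caliper`. Matching is without replacement: each control is used once.
    """
    available = dict(control)
    pairs = []
    # Process treated patients in ascending score order for reproducibility.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        best_id, best_dist = None, None
        for c_id, c_score in available.items():
            dist = abs(t_score - c_score)
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best_id, best_dist = c_id, dist
        if best_id is not None:  # unmatched treated patients are discarded
            pairs.append((t_id, best_id))
            del available[best_id]
    return pairs


# Example: "c" has no control within the caliper and stays unmatched.
lpa = {"a": 0.30, "b": 0.50, "c": 0.90}
adc_nos = {"x": 0.31, "y": 0.515, "z": 0.70}
print(caliper_match(lpa, adc_nos))  # [('a', 'x'), ('b', 'y')]
```

Because unmatched patients are dropped, the number of matched pairs (here 4165) can be smaller than the size of the smaller group.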
Then, the LPA patient data were used for further analyses. Multivariate binary logistic regression analyses were performed to identify risk factors for distant and lymph node metastases in all LPA patients. Univariate and multivariate Cox proportional hazards models with backward stepwise selection were used to calculate the hazard ratios (HRs) with 95% confidence intervals (CIs) of variables associated with OS and CSS in the training cohort of LPA patients. Based on the multivariate Cox analyses, nomograms were constructed and evaluated using the concordance index (C-index), receiver operating characteristic (ROC) curves, and calibration curves, the last of which compared the observed and nomogram-predicted survival outcomes. Finally, decision curve analysis (DCA) was performed to compare the prognostic capacity of the nomogram model with that of TNM stage. To verify the applicability of the nomogram model, the nomograms were validated in both the training and validation cohorts. Age was stratified using the X-tile program (Yale University, USA) [8]. Based on the optimal age cutoffs for OS determined by X-tile analysis (Supplementary Figures S1A-S1C), patients were divided into three groups (0-69, 70-79, and 80+ years old). All statistical analyses were performed using R version 3.6.1 (https://www.r-project.org/). A two-tailed P value < 0.05 was considered statistically significant.
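Harrell's C-index, used above to evaluate the nomograms, is the proportion of usable patient pairs in which the patient with the higher predicted risk dies first. The following is a simplified sketch that ignores tied survival times; `harrell_c_index` is a hypothetical name, and published analyses would normally rely on an established survival package.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified).

    A pair (i, j) is comparable when patient i has the shorter time and
    an observed event. It is concordant when i also has the higher risk
    score; ties in risk score count as 0.5. Tied times are ignored here.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Higher score = higher predicted risk; here risk ordering matches death order.
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [4, 3, 2, 1]))  # 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.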

Patients and Tumor Characteristics.
Among 1,244,493 patients diagnosed with a primary lung or bronchus malignancy in the SEER database between 1975 and 2016, 27,142 were diagnosed with LPA, accounting for 2.18% of all lung cancer patients. After the inclusion and exclusion criteria were applied, 95,004 patients with lung ADC-NOS, 4191 with LPA, 1545 with papillary adenocarcinoma, and 163 with solid adenocarcinoma were enrolled in our study. The demographic and clinicopathological characteristics of the eligible patients are shown in Table 1 and Supplementary Tables S1 and S2. Among the eligible patients, LPA was more common in older patients, women, and patients in the "Others" race category. In addition, patients with LPA tended to have smaller tumors, fewer separate tumor nodules, less pleural invasion, and lower histological grade and stage. After PSM, Kaplan-Meier curves and log-rank tests showed that patients with LPA had better survival outcomes than those with ADC-NOS (Supplementary Figures S2A-S2B). Survival outcomes were also compared between patients with LPA and those with papillary adenocarcinoma (Supplementary Figures S3A-S3B) or solid adenocarcinoma (Supplementary Figures S4A-S4B).
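The Kaplan-Meier estimator behind these survival comparisons, and the usual rule for reading off a median survival, can be sketched as follows. This is a minimal illustration with hypothetical function names; it computes neither confidence intervals nor the log-rank test.

```python
def kaplan_meier(times, events):
    """Return (time, survival) steps at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations (events and censorings) sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # censored patients leave the risk set too
    return curve


def median_survival(curve):
    """First time survival drops to 0.5 or below; None if not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
print(median_survival(curve))  # 5
```

When fewer than half of the patients in a group die during follow-up, the curve never reaches 0.5 and the median is reported as "not reached".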

Factors Associated with Distant and Lymph Node Metastases.
As shown in Supplementary Table S3, the factors significantly associated with distant metastasis in univariate logistic regression were further examined by multivariate analysis, which showed that race in the "Others" category, larger tumor size, the presence of separate tumor nodules, and higher histological grade were independent risk factors for distant metastasis. Moreover, age, sex, race, tumor size, pleural invasion, and histological grade were significantly associated with lymph node metastasis in the multivariate analysis (Supplementary Table S4).

Establishment of the Nomograms Predicting OS and CSS of LPA Patients.
As described above, patients with LPA were divided into a training cohort and a validation cohort. Demographic and clinicopathological characteristics were broadly comparable between the two cohorts (Supplementary Table S5). In the training cohort, univariate analysis showed that age, sex, race, marital status, education, tumor location, primary tumor size, separate tumor nodules, pleural invasion, histological grade, TNM stage, primary tumor surgery, radiotherapy, and chemotherapy were significantly associated with OS (Table 2). Multivariate analysis showed that age, sex, race, tumor location, primary tumor size, pleural invasion, histological grade, TNM stage, primary tumor surgery, and chemotherapy were independently associated with OS. Likewise, multivariate analysis identified the same variables (age, sex, race, tumor location, primary tumor size, pleural invasion, histological grade, TNM stage, primary tumor surgery, and chemotherapy) as independently associated with CSS (Table 3). Based on the multivariate results, two nomograms predicting the 1- and 5-year OS and CSS probabilities were constructed.
The sensitivity and specificity of the nomograms in predicting the prognosis of LPA were assessed with ROC curves. As shown in Figure 3, the area under the curve (AUC) values of the nomogram predicting 1- and 5-year OS were 0.841 and 0.856, respectively, in the training cohort (Figure 3(a)) and 0.821 and 0.859, respectively, in the validation cohort (Figure 3(b)). The AUC values of the nomogram predicting 1- and 5-year CSS were 0.862 and 0.891, respectively, in the training cohort (Figure 3(c)) and 0.852 and 0.891, respectively, in the validation cohort (Figure 3(d)).
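For intuition about AUC values at 1 year (12 months) and 5 years (60 months), the sketch below computes a naive time-dependent AUC: patients who die by the horizon are cases, patients still under follow-up beyond it are controls, and patients censored earlier are dropped. This is a simplifying assumption, not necessarily the method used in the study; time-dependent ROC methods used in practice typically reweight for censoring rather than discarding patients. The function name is hypothetical.

```python
def auc_at_horizon(times, events, risk_scores, horizon):
    """Naive time-dependent AUC at `horizon` (same units as `times`).

    Cases: event at or before the horizon. Controls: follow-up beyond it.
    Patients censored before the horizon are excluded (no weighting).
    """
    cases, controls = [], []
    for t, e, r in zip(times, events, risk_scores):
        if t <= horizon and e == 1:
            cases.append(r)
        elif t > horizon:
            controls.append(r)
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Mann-Whitney form: share of case-control pairs ranked correctly.
    score = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5
    return score / (len(cases) * len(controls))


times = [6, 10, 14, 30, 40, 8]
events = [1, 1, 0, 0, 1, 0]
risk = [0.9, 0.6, 0.7, 0.2, 0.4, 0.7]
print(round(auc_at_horizon(times, events, risk, horizon=12), 3))  # 0.833
```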
Furthermore, calibration plots for both the training and validation cohorts indicated excellent agreement between the predicted and observed OS and CSS (Figures 4(a)-4(h)). In addition, DCA showed that the nomograms had better prognostic capacity than TNM stage (Figures 5(a)-5(d)). As expected, the nomograms constructed in this study predicted the survival of LPA patients more accurately than a classic nomogram [9] previously established for the overall NSCLC population (Supplementary Figures S5A-S5B).
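Decision curve analysis compares prediction strategies by their net benefit at a threshold probability p_t, defined as TP/n - FP/n x p_t/(1 - p_t), where TP and FP are the numbers of true- and false-positive patients among n. The sketch below assumes a binary outcome at a fixed time horizon (censoring ignored) and uses hypothetical function names; it includes the "treat all" reference strategy, while "treat none" has a net benefit of zero by definition.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(predicted_risk, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted_risk, outcomes) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm of a false positive vs. a true positive
    return tp / n - fp / n * odds


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


risk = [0.8, 0.6, 0.3, 0.1]
died = [1, 0, 1, 0]
print(round(net_benefit(risk, died, 0.25), 3))        # 0.417
print(round(net_benefit_treat_all(died, 0.25), 3))    # 0.333
```

A decision curve plots these values over a range of thresholds; a model is clinically useful where its curve lies above both "treat all" and "treat none".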
Furthermore, LPA patients were divided into two groups ("low risk" or "high risk") based on the median total scores calculated by the nomograms. As shown in Figure 6(a), the Kaplan-Meier curves and log-rank tests suggested that the median OS of LPA patients in the high-risk group (17.0 months; 95% CI, 16.0-18.0 months) was significantly shorter than that in the low-risk group (not reached) (P < 0.001). Likewise, as shown in Figure 6

Discussion
Concise and accurate prognostic prediction models for patients with malignancy are essential for clinical decision-making and scientific research. TNM stage is the most widely used survival predictor for cancer patients. However, identifying additional prognostic factors and building more individualized models may improve the accuracy of outcome prediction. In this study, we used the SEER database, a large-scale population-based cancer registry program, to explore the clinical characteristics of 4191 patients with LPA and to identify factors associated with distant and lymph node metastases. We then developed and validated accurate, personalized prognostic nomograms predicting the 1- and 5-year OS and CSS of patients with LPA. LPA patients with poor prognostic factors have unfavorable survival outcomes, and the median OS of patients with advanced LPA has been reported to be 20.1 months [10]. However, the prognosis of advanced LPA can be improved by appropriate treatments, including chemotherapy and EGFR tyrosine kinase inhibitors (TKIs) [10]. The 5-year disease-free survival of LPA patients after complete surgical resection is approximately 90% [11]. Based on the risk estimated by the nomograms generated in our study, high-risk patients with LPA may benefit from more aggressive treatment and a shorter follow-up interval, so that endpoint events are detected as early as possible. For example, older Black men with large tumors and advanced TNM stage may warrant more frequent follow-up and more aggressive treatment, including primary tumor resection when they meet the surgical criteria.
Our results suggested that the incidence of LPA was much higher than that of other rare histologic subtypes of lung cancer, such as papillary adenocarcinoma [12] and carcinosarcoma [13]. Our results also indicated that LPA was more common in older patients and women, which is consistent with previous studies [14,15]. In addition, several clinicopathological features of LPA patients indicated a good prognosis, including smaller tumor size, fewer separate tumor nodules, less pleural invasion, and lower histological grade and stage. This is consistent with previous studies [16] and with the good prognosis of LPA [3,14,16]. Moreover, LPA has some characteristics that differ from other histologic subtypes of invasive pulmonary ADC, such as being more common in nonsmokers or light smokers, a preference for a peripheral pulmonary location, and a tendency to be false-negative on positron emission tomography scans [14,17]. As in patients with NSCLC [18], we found that variables including race, tumor size, separate tumor nodules, and histological grade were associated with distant and lymph node metastases in patients with LPA. Furthermore, absence of symptoms at presentation and excessive airway secretion were more common in patients with LPA [19]. Regarding genetic alteration profiles, EGFR mutations occur in approximately 50% of patients with LPA, a significantly higher rate than in other subtypes [5], especially mutations in exon 21 [19,20]. In contrast, KRAS mutations are much less common, occurring in approximately 10% of LPA patients [5]. Compared with other histologic subtypes, a lower rate of ALK rearrangement and a higher rate of RET rearrangement have been reported [6,21,22].

Canadian Respiratory Journal
Most studies have shown that patients with LPA have more favorable survival outcomes than patients with other subtypes of invasive pulmonary ADC. Surgery remains the preferred option for LPA patients, whereas adjuvant chemotherapy, including oral fluoropyrimidines and platinum-based regimens, conferred no survival benefit on patients with LPA, regardless of tumor stage [23,24]. In patients with advanced LPA, studies have suggested that taxane-based chemotherapy and pemetrexed might be effective and well tolerated [25,26]. Given the higher frequency of EGFR mutations, EGFR-TKI therapy for advanced LPA has demonstrated encouraging efficacy [10]. Nevertheless, because of the lower expression of programmed cell death ligand 1, immune checkpoint inhibitors may be less effective in patients with LPA [27][28][29]. Moreover, multiple studies have suggested that a higher percentage of lepidic growth is associated with a lower risk of recurrence and that invasive component size predicts survival better than overall tumor diameter [17,19,30,31]. Furthermore, no recurrence was observed in any of 18 LPA patients with a maximum tumor diameter >3 cm but a maximum invasive diameter <5 mm [32]. Therefore, Suzuki et al. [32] proposed that LPA with invasion of 5 mm or less be regarded as minimally invasive ADC even if the tumor is larger than 3 cm in diameter. Unsurprisingly, our results suggested that primary tumor surgery was a major prognostic factor for LPA patients, second only to TNM stage. In contrast, chemotherapy was far less important to the prognosis of LPA patients, and radiotherapy had no significant effect on survival outcomes. Regrettably, we could not explore the prognostic significance of chemotherapy regimens, targeted therapy, immunotherapy, or the diameter of the invasive area.
In this study, we identified that age, sex, race, tumor location, primary tumor size, pleural invasion, histological grade, TNM stage, primary tumor surgery, and chemotherapy were independently associated with OS and CSS in patients with LPA. Notably, few patients with histological grade IV LPA were included in this study; therefore, the nomograms we constructed are not suitable for predicting the survival of patients with histological grade IV LPA. Similar to previous studies, our results suggested that treatment, tumor size, and some demographic characteristics also affect the prognosis of LPA patients, and we provide a statistical prediction tool that incorporates and quantifies the selected prognostic factors to estimate the survival outcome of an individual patient. Moreover, evaluation with the C-index, ROC curves, calibration plots, and DCA demonstrated excellent agreement between nomogram-predicted and actual survival outcomes of patients with LPA, as well as better prognostic capacity than TNM stage.
To our knowledge, this is the first study to describe the demographic and clinicopathological features and the incidence of LPA using a large-scale population-based database, and these are the first nomograms predicting the survival outcomes of LPA patients, which could aid personalized prognostic evaluation and clinical decision-making. However, although the nomograms demonstrated good accuracy and applicability, our study has several limitations. First, the nomograms were constructed from retrospective data, and prospective external validation is needed. Second, some critical information, such as the diameter of the invasive area in LPA, tumor biomarkers, chemotherapy regimens, targeted therapy, molecular pathology, and genetic testing, was not available in the database. Moreover, TNM stage in the database is based on the 6th edition rather than the latest edition of the staging system. Therefore, we could not analyze these variables or use them to improve the prognostic nomograms. Third, almost all patients were from the United States, and the results may differ in other populations. Such drawbacks are inherent to almost all retrospective population-based studies. However, the large sample size and long follow-up of this study compensate for them to a great extent and provide comprehensive knowledge of LPA. Further prospective studies with more detailed information are needed for model improvement and independent validation.
Figure 6: Kaplan-Meier curves of overall survival (a) and cancer-specific survival (b) for all patients with lepidic adenocarcinoma divided into two risk stratifications based on the scores calculated by the nomograms.

Conclusion
In summary, we explored the clinical characteristics of LPA patients and developed nomograms predicting the OS and CSS of LPA patients individually. The nomograms showed good accuracy and applicability, which may aid in individualized prognostic prediction for LPA patients and clinical decision-making.
Data Availability
The datasets generated and/or analyzed during this study are available in the Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/). R code is available upon request.
Ethical Approval
The authors received permission to access the research data file in the SEER program from the National Cancer Institute, USA (reference number 13355-Nov2021). Approval was waived by the local ethics committee, as SEER data are publicly available and deidentified.

Consent
Not applicable.

Disclosure
A preprint of this study has previously been published [33].

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Supplementary Materials
Table S1. Characteristics of patients with LPA compared with lung papillary adenocarcinoma.
Table S2. Characteristics of patients with LPA compared with lung solid adenocarcinoma.
Table S3. Factors associated with distant metastasis in patients with lepidic adenocarcinoma (n = 4191).
Table S4. Factors associated with lymph node metastasis in patients with lepidic adenocarcinoma (n = 4191).
Table S5. Characteristics of patients with LPA in the training and validation cohorts.
Figure S1. (A, B) Optimal cutoff values of age identified by X-tile; (C) Kaplan-Meier curves of overall survival in three age subgroups (0-69, 70-79, and 80+ years old).
Figure S2. Kaplan-Meier curves of overall survival (A) and cancer-specific survival (B) in patients with lung ADC-NOS and LPA after propensity score matching. Abbreviations: ADC, adenocarcinoma; LPA, lepidic adenocarcinoma; NOS, not otherwise specified.
Figure S3. Kaplan-Meier curves of overall survival (A) and cancer-specific survival (B) in patients with lung papillary adenocarcinoma and LPA. Abbreviations: LPA, lepidic adenocarcinoma.
Figure S4. Kaplan-Meier curves of overall survival (A) and cancer-specific survival (B) in patients with lung solid adenocarcinoma and LPA. Abbreviations: LPA, lepidic adenocarcinoma.
Figure S5. Receiver operating characteristic curves of the nomograms predicting OS and CSS in the training and validation cohorts using a classic nomogram previously established for overall NSCLC patients. Receiver operating characteristic curves of 1- and 5-year OS in the training cohort (A) and the validation cohort (B).