Establishment and Evaluation of EGFR Mutation Prediction Model Based on Tumor Markers and CT Features in NSCLC

Background. Lung cancer is one of the leading causes of cancer death worldwide. Epidermal growth factor receptor (EGFR) gene mutations have been reported in up to 60% of non-small cell lung cancer (NSCLC) cases in Asian populations, and EGFR is currently one of the main targets for genotype-targeted therapy in NSCLC.
Objective. To determine whether a combined model of serum tumor markers and computed tomographic (CT) features can predict EGFR mutation with higher accuracy than either data source alone.
Materials and Methods. We retrospectively analyzed data from patients diagnosed with NSCLC who underwent EGFR gene testing in the Department of Thoracic Surgery, Jinan Central Hospital. Multivariate logistic regression analysis was used to determine the independent predictors of EGFR mutation, and logistic regression prediction models were developed. The receiver operating characteristic (ROC) curve was plotted, and the area under the curve (AUC) was calculated to assess the accuracy and clinical applicability of the EGFR mutation prediction model.
Results. Logistic regression analysis identified the following predictive factors of EGFR mutation: nonsmoking, high expression level of carbohydrate antigen 199 (CA199), low expression level of cytokeratin 19 fragments (CYFRA21-1), and subsolid density containing a ground-glass opacity (GGO) component. Using the results of the multivariate logistic regression analysis, we built a clinical prediction model. The AUC of the combined model increased significantly from 0.735 to 0.813 (p = 0.014) when CT features were added and from 0.612 to 0.813 (p < 0.001) when serum variables were added. At a predicted-probability cutoff of 0.441, the sensitivity was 86.7% and the specificity was 65.8%.
Conclusion. A combined model of serum tumor markers and CT features predicts EGFR mutation status in NSCLC patients more accurately than serum variables or imaging features alone. Such a prediction model is much needed and would be helpful in clinical practice.


Introduction
Lung cancer has become one of the leading causes of cancer death worldwide [1]. EGFR gene mutations have been reported in up to 60% of NSCLC cases in Asian populations, and EGFR is currently one of the main targets for genotype-targeted therapy in NSCLC [2]. The discovery of activating mutations in the tyrosine kinase domain of EGFR promoted the concept of targeted therapy [3]. EGFR tyrosine kinase inhibitors (TKIs) were the first targeted agents for the treatment of NSCLC [4]. Currently, EGFR mutation status is mainly determined from genomic DNA obtained from tumor tissue. However, tissue samples are difficult to obtain for EGFR mutation analysis, and tumor heterogeneity affects the accuracy of EGFR mutation detection [5-7]. Circulating tumor DNA (ctDNA) in plasma samples is an alternative for detecting EGFR mutations, but the results do not always agree with those of biopsy samples because of tumor heterogeneity, the false-negative rate is relatively high, and the cost is very high [4,7,8].
Serum tumor markers are mainly used in clinical practice to screen high-risk groups, to evaluate the effect of tumor treatment, and to monitor tumor progression and recurrence. Reports on the value of individual serum tumor markers for predicting EGFR mutation are inconsistent; findings include significantly higher expression levels of CEA and carbohydrate antigen 199 (CA199) and a lower expression level of CYFRA21-1 [4,9-11].
In the diagnosis of lung cancer, CT is a routinely used and relatively cost-effective method that shows a variety of imaging features and can provide data for genomic studies at no additional cost. Some studies have shown that tumors presenting with GGO are correlated with EGFR mutation [12,13], while others found an inverse relationship or no correlation [14]. Other CT features that have been related to EGFR mutation include the maximum tumor diameter, spiculated margins, and the air bronchogram sign [15,16].
Although serum tumor markers and CT features show value in evaluating EGFR mutation status, the accuracy of either method on its own is low. To our knowledge, it remains unclear whether combining the two methods can better predict EGFR mutation, and few studies have combined these features to build predictive models. Therefore, we aimed to construct a comprehensive model based on serum tumor markers and CT imaging features in NSCLC patients and to evaluate its clinical applicability.

Patient Selection.
We included patients with pathologically diagnosed NSCLC between January 2018 and June 2020 at Jinan Central Hospital. Inclusion criteria were (1) pathological diagnosis of NSCLC after surgical resection, (2) serum tumor marker testing for lung cancer performed at our hospital, (3) preoperative thin-section CT images, and (4) complete clinical information. Exclusion criteria were (1) no documented EGFR mutation testing, (2) history of previous antitumor therapy, (3) difficulty in outlining tumor margins, and (4) incomplete clinical data. The clinical data collected included patient gender, age, smoking history, pathological type, and clinical stage.

EGFR Mutation Detection.
EGFR mutation was detected by experienced clinicians in the Department of Pathology of Jinan Central Hospital using surgically resected specimens. If a mutation was detected in any of EGFR exons 18-21, the tumor was considered EGFR mutant.

Image Acquisition and Feature Extraction.
Images were viewed and analyzed by 2 imaging physicians in a double-blinded fashion on a PACS system. All examinations were acquired in the craniocaudal direction, with or without contrast media. All images were archived in digital format. The following data were recorded: (1) maximum diameter (mm) of the lesion; (2) margins; (3) lesion density; and (4) lesion site. When the 2 physicians disagreed, a more senior physician was asked to review the images and reach a consensus result.

Statistical Analysis.
The association of clinical characteristics, serum tumor marker levels, and CT image features with EGFR mutation was investigated by univariate analysis. The predictive factors of EGFR mutation were identified by logistic regression analysis and were then used to build a clinical prediction model. The ROC curve was plotted, and the AUC was calculated to assess the accuracy of the prediction model. A value of p < 0.05 was considered statistically significant.
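As a minimal sketch of the AUC calculation mentioned above, the following pure-Python function computes the area under the ROC curve as the proportion of mutant/wild-type pairs in which the mutant case receives the higher predicted probability (ties count one half). The function name `roc_auc` and the toy scores are illustrative assumptions; they are not the study's software or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = EGFR mutant, 0 = wild type.
    scores: predicted probabilities from any model.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # mutant case ranked above wild-type case
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based implementation would be preferable for large datasets.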

Relationship between EGFR Mutation Status and Clinical Characteristics.
A total of 148 NSCLC patients, 70 men (47.3%) and 78 women (52.7%), with a mean age of 62.9 ± 10.28 years, were included in this study. The EGFR mutation rate was 50.6%. The mutation rate was significantly higher in women, in nonsmoking patients, and in adenocarcinoma patients (p < 0.05), whereas age and clinical stage did not differ significantly between the mutant and wild-type groups (p > 0.05).
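The text does not name the univariate test used for categorical characteristics. A common choice for a 2 × 2 comparison, such as smoking status against mutation status, is Pearson's chi-square test, and the sketch below assumes that choice. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Table layout:      mutant  wild type
        group 1          a        b
        group 2          c        d
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, not the study's data.
stat, p_value = chi_square_2x2(30, 10, 20, 40)
```

With small expected cell counts, Fisher's exact test would usually be preferred over this approximation.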

Correlation of EGFR Mutation with Serum Tumor Markers.
The results showed that CYFRA21-1 levels were significantly higher in the wild-type group (p < 0.001) and that CA199 levels were significantly higher in the mutant group (p < 0.05), while CA125, CEA, and NSE levels did not differ significantly between the mutant and wild-type groups (p > 0.05).

Correlation of EGFR Mutation with CT Features.
The results showed that the maximum tumor diameter was significantly larger in the wild-type group (p < 0.05) and that the proportion of semisolid lesions (with ground-glass density) was significantly higher in the mutant group (p < 0.001). Lesion location, lobulated sign, and spiculated margins did not differ significantly between the mutant and wild-type groups (p > 0.05).

Possible Predictors and Prediction Model.
Univariate logistic regression analysis identified candidate predictors with statistical significance (Tables 1-3). Multivariate analysis was then performed using binary logistic regression, and the results showed that nonsmoking (p = 0.003), high CA199 expression (p = 0.001), low CYFRA21-1 expression (p < 0.001), and semisolid density containing a GGO component (p = 0.003) were independent predictors of EGFR gene mutation, as detailed in Table 4.

Discussion
Although TKIs have significant efficacy in patients with EGFR mutations and can improve their prognosis, the detection rate of EGFR mutation is lower than expected [17]. Biopsy, the gold standard for EGFR mutation detection, may be limited by the lack of available tissue, because biopsy and cytology specimens are first used for histological testing to confirm the cancer type. In addition, patient refusal to undergo invasive biopsy, the location or size of the tumor, difficulty in biopsy sampling, and the potential risk of cancer metastasis also limit detection rates [18]. In this study, we evaluated the effectiveness of a combined prediction model based on serum markers and CT features. Previous demographic analyses have shown that a high prevalence of EGFR mutation is associated with female sex, nonsmoking status, adenocarcinoma histology, and East Asian ethnicity [4,19], which is consistent with our results. Furthermore, we found by multivariate analysis that smoking history was an independent predictor of EGFR mutation, consistent with the study by Sabri et al. [20].
Serum tumor markers can be tested quickly and accurately in the hospital at a low cost [11]. Preoperative serum tumor markers have been shown to correlate with EGFR mutation and with the efficacy of EGFR-TKI therapy [21]. Therefore, it is practical to use serum tumor markers to predict EGFR mutation [22]. Our study demonstrated that the serum CA199 level was significantly higher in the EGFR mutation group, while the CYFRA21-1 level was significantly higher in the wild-type group. Zhang et al. found that serum CEA levels could be used as a predictive marker for the efficacy of EGFR-TKI therapy [23]. Our study did not identify a significant difference in CEA levels; it is worth noting that over half (77 of 148) of our NSCLC patients were at an early stage of the disease. The CA125 and NSE levels did not differ significantly by EGFR mutation status, consistent with a previous report [20].
CT is a routinely used and relatively economical modality for diagnosing lung cancer, and it presents a variety of imaging features that may be used to identify NSCLC patients who are likely to carry EGFR mutations. We found that EGFR mutation was associated with a smaller maximum diameter and subsolid density. Although univariate analysis found tumor size to be associated with EGFR mutation, multivariate analysis showed that tumor size was not a strong independent predictor, consistent with Rizzo et al. [24]. In addition, our study showed no significant differences in lesion location, lobulated sign, or spiculated margins, which is consistent with some previous studies [20,25] but contrary to the findings of Zhou et al. [26].
Although serum tumor markers and CT features can each help assess EGFR mutation status in NSCLC, the accuracy of either method alone is not sufficient. To our knowledge, it remains unclear whether combining the two methods can better predict EGFR mutation status, and there are few clinical prediction models for EGFR mutation that combine these features [20,27]. We identified the following independent predictors: nonsmoking, high CA199 expression, low CYFRA21-1 expression, and semisolid density containing a ground-glass opacity (GGO) component. The model was $p = e^{x}/(1 + e^{x})$, where $p$ is the predicted probability of EGFR mutation and $x$ is the linear combination of these predictors estimated by the multivariate logistic regression.
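To illustrate how a logistic model of this form turns predictor values into a probability and then into a classification, the sketch below applies the logistic transform and the 0.441 cutoff reported in the abstract. The intercept, coefficients, feature names, and example patient are placeholders, since the fitted values are not given in this section, and treating a probability at or above the cutoff as "mutant" is an assumption.

```python
import math

CUTOFF = 0.441  # probability cutoff reported in the abstract


def predicted_probability(x):
    """Logistic transform p = e^x / (1 + e^x), in a numerically stable form."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def linear_predictor(intercept, coefs, features):
    """x = intercept + sum(coef * feature); all names are illustrative."""
    return intercept + sum(coefs[k] * features[k] for k in coefs)


def sensitivity_specificity(labels, probs, cutoff=CUTOFF):
    """Sensitivity and specificity when p >= cutoff is called 'mutant' (assumed)."""
    pairs = list(zip(labels, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in pairs if y == 1 and p < cutoff)
    tn = sum(1 for y, p in pairs if y == 0 and p < cutoff)
    fp = sum(1 for y, p in pairs if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Placeholder coefficients, NOT the fitted values from the study.
example_coefs = {"nonsmoker": 1.0, "ca199": 0.05, "cyfra21_1": -0.3, "ggo": 1.2}
example_patient = {"nonsmoker": 1, "ca199": 20.0, "cyfra21_1": 3.0, "ggo": 1}
x = linear_predictor(-2.0, example_coefs, example_patient)
p = predicted_probability(x)
is_predicted_mutant = p >= CUTOFF
```

Moving the cutoff trades sensitivity against specificity, so the reported pair of 86.7% and 65.8% applies only to the 0.441 threshold.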

Conclusion
In conclusion, EGFR mutation models constructed from serum tumor markers and CT features have good predictive efficacy. When properly combined, the complex model can have better predictive performance and higher diagnostic accuracy, facilitating clinical practice in identifying candidates for targeted therapy.
Our study has several limitations. First, it was a single-center retrospective study with a relatively small sample size and no external validation, which may compromise the generalizability, sensitivity, and specificity of our model. Therefore, uniform standards for multicenter studies need to be developed, and the model should be built and tested on multicenter data. Second, squamous cell carcinoma (SCC) antigen was not included in this clinical prediction model because this tumor marker was incompletely documented in the HIS system. Therefore, we recommend refining our model before further validation. Finally, although this study demonstrated that the integrated model has good predictive performance, its accuracy is limited by the logistic regression method. Models built with other methods, such as random forest and elastic net regression, should be explored to further develop the model.

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.