A Novel Inflammatory and Nutritional Prognostic Scoring System for Nonpathological Complete Response Breast Cancer Patients Undergoing Neoadjuvant Chemotherapy

Background. Inflammatory and nutritional variables have been shown to be associated with poor breast cancer survival. However, some studies do not include these variables because of missing data. We constructed a novel inflammatory-nutritional prognostic scoring (INPS) system with machine learning and investigated its predictive potential.

Methods. This retrospective analysis included 249 patients with malignant breast tumors undergoing neoadjuvant chemotherapy (NAC). After comparing seven machine learning models, the best model, XGBoost, was applied to construct the INPS system. Kaplan-Meier (K-M) survival curves and the log-rank test were used to estimate and compare overall survival (OS) and disease-free survival (DFS). Univariate and multivariate analyses were carried out with the Cox regression model. Additionally, we compared the predictive power of the INPS with that of standard inflammatory and nutritional variables using the Z test.

Results. Among the seven machine learning models, XGBoost had the best performance for OS and DFS (AUC = 0.865 and 0.771, respectively). All patients were divided into two groups by the INPS for OS (cutoff value = 0.3917) and DFS (cutoff value = 0.4896). Patients with low INPS had higher 5-year OS and DFS rates than patients with high INPS (77.2% vs. 50.0%, P < 0.0001; and 59.6% vs. 32.1%, P < 0.0001, respectively). For OS and DFS, the INPS exhibited the highest AUC among the inflammatory and nutritional variables (AUC = 0.615, P = 0.0003; AUC = 0.596, P = 0.0003, respectively).

Conclusion. The INPS was an independent predictor of OS and DFS and exhibited better predictive ability than BMI, PNI, and MLR. For patients undergoing NAC for nonpCR breast cancer, INPS was a crucial and comprehensive biomarker. It could also forecast individual survival in breast cancer patients with low HER-2 expression.


Introduction
Breast cancer, now more prevalent than lung cancer worldwide, is the most common malignancy and the primary cause of cancer-related deaths in women globally [1]. As breast cancer treatment continues to evolve, neoadjuvant chemotherapy (NAC) plays an increasingly important role in determining patient prognosis [2]. For patients with locally advanced breast cancer, NAC is an excellent option because it can lower the clinical stage and increase the likelihood of breast-conserving surgery. Additionally, physicians may now be able to adjust treatments based on drug-sensitivity information [3].
A pathological complete response (pCR) is defined as the absence of invasive cancer in the breast and lymph nodes on postoperative pathology, although carcinoma in situ of the breast is allowed [4]. For the triple-negative breast cancer (TNBC) and HER-2-positive subtypes in particular, achieving a pCR with neoadjuvant therapy predicts an excellent outcome and long-term survival [4,5]. In contrast, patients with nonpCR breast malignant tumors have a poor prognosis [6]. The CREATE-X trial demonstrated that adjuvant capecitabine therapy could considerably increase overall survival (OS) and disease-free survival (DFS) in HER-2-negative breast cancer patients who did not achieve a pCR following NAC, with the TNBC group benefiting the most [7]. The KATHERINE study revealed that the 3-year invasive disease-free survival (iDFS) rate of the T-DM1 group was considerably higher than that of the trastuzumab group among patients who did not achieve a pCR following 6-8 cycles of neoadjuvant therapy [8]. Although these drugs have improved the prognosis of breast cancer patients with a nonpCR, it is worth identifying patients who respond poorly to NAC and the standard adjuvant agents so that different treatment strategies can be implemented.
Traditional biomarkers found to be closely connected to a pCR include tumor-infiltrating lymphocytes (TILs), p53, human epidermal growth factor receptor-2 (HER-2), Ki-67 index, estrogen receptor (ER), and progesterone receptor (PR) [9,10]. However, few biomarkers are specifically designed to predict the outcome of breast cancer patients with a nonpCR. Thus, it is essential and meaningful to construct a novel and convenient biomarker for patients with a nonpCR.
Several studies have recently examined the relationships among inflammation, nutrition, and malignant tumors [11]. Different inflammatory and nutritional parameters, such as body mass index (BMI), prognostic nutrition index (PNI), albumin to globulin ratio (AGR), neutrophil to lymphocyte ratio (NLR), platelet to lymphocyte ratio (PLR), monocyte to lymphocyte ratio (MLR), systemic immune-inflammation index (SII), and systemic inflammation response index (SIRI), as well as their combinations, have all been shown to be vital predictors for breast cancer patients [7][8][9][10][12][13][14][15]. However, a single variable can only provide limited information. Compared to models based on one or a few inflammatory indices, prognostic models combining multiple indicators can offer improved prediction accuracy [16,17].
In contrast to conventional statistical methods, machine learning techniques have been embraced in biomedicine for predictive modeling and decision-making, since they can produce prediction models by conducting extensive searches across the parameter space [18]. Machine learning methods have been reported to be more accurate than traditional logistic regression across various subject areas [19].
The above inflammatory and nutritional parameters have attracted extensive attention. However, few studies have comprehensively explored the relationship between these variables and prognosis, particularly for breast cancer patients receiving neoadjuvant chemotherapy. Therefore, our study aimed to construct a novel inflammatory-nutritional prognostic scoring (INPS) system based on machine learning models and to investigate its relationship with the outcomes of breast cancer patients with a nonpCR. We then compared its predictive ability with that of commonly used inflammatory and nutritional variables. Additionally, we conducted an exploratory analysis of the relationship between INPS and the HER-2 low expression subtype, as an increasing number of studies suggest that patients with HER-2 low expression breast cancer may have a different prognosis than those with HER-2 negative and positive breast cancer.

Materials and Methods
The inclusion criteria were (1) a pathological diagnosis of an invasive malignant breast tumor by core needle biopsy before NAC; (2) neoadjuvant chemotherapy and surgery performed at our hospital; (3) available clinical and pathological data, as well as follow-up data; and (4) a postoperative pathology report indicating that the patient did not achieve a pCR.
The exclusion criteria were (1) achieving a pCR according to the postoperative pathology report; (2) a diagnosis of bilateral breast cancer or other particular types of breast cancer; (3) distant metastasis; and (4) an acute or chronic inflammatory disease, such as dermatomyositis.

Classification of Variables.
Peripheral venous blood samples were collected seven days before the first cycle of NAC, and the electronic medical records provided all of the patients' clinical and pathological data. The status of nonpCR was evaluated based on the postoperative pathological report.
Patients were divided into groups based on their median age and BMI (according to Chinese standards) [20]. This study used the eighth edition of the TNM staging system from the American Joint Committee on Cancer [21]. Breast cancer is classified into four main subtypes: luminal A, luminal B, HER-2 overexpression (HER2-OE), and triple-negative breast cancer (TNBC) [22]. A HER-2 IHC score of 0 is considered negative, a score of 1+ or 2+ with negative in situ hybridization (ISH) is considered low expression, and a score of 3+ or 2+ with positive ISH is considered HER-2 positive [16].
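The HER-2 grouping rule above can be written as a small lookup. The following is a minimal sketch, not the authors' code: the function name `her2_status` and the string encoding of IHC scores are hypothetical, and it assumes that an IHC score of 1+ counts as low expression without an ISH result, which the text does not state explicitly.

```python
from typing import Optional


def her2_status(ihc_score: str, ish_positive: Optional[bool] = None) -> str:
    """Map an IHC score (plus ISH result for 2+) to a HER-2 category.

    Returns "negative", "low", or "positive".
    """
    if ihc_score == "0":
        return "negative"
    if ihc_score == "1+":
        # Assumption: 1+ is classified as low expression without ISH.
        return "low"
    if ihc_score == "2+":
        if ish_positive is None:
            raise ValueError("IHC 2+ needs an ISH result to be classified")
        return "positive" if ish_positive else "low"
    if ihc_score == "3+":
        return "positive"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")


print(her2_status("2+", ish_positive=False))
```

Raising an error for IHC 2+ without an ISH result keeps equivocal cases from being silently assigned to a group.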
With OS and DFS as the state variables, the maximally selected rank statistics were used to determine the best cutoff values for PNI, AGR, NLR, PLR, MLR, SII, SIRI, lymphocytes (L), neutrophils (N), monocytes (M), hemoglobin (Hb), platelets (P), albumin (ALB), and globulin (GLOB). The patients were then divided into low and high groups according to the following cutoff values: OS.PNI (60.4), OS.AGR [...]

Follow-Up.
Patients were followed up every three months after surgery for the first two years and then every six months for the following three years. Follow-up lasted until five years after surgery or the date of death from any cause. OS was defined as the time from the date of surgery to the date of death from any cause or the last follow-up, and DFS was defined as the time from the date of surgery to the date of metastasis to distant organs, local recurrence, or death from any cause.
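The OS and DFS definitions above translate directly into a time-to-event calculation. The sketch below is illustrative only: the function names, the month conversion (30.4375 days per month), and the choice to censor event-free patients at the last follow-up are assumptions that the text does not specify.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumption: average month length


def months_between(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_MONTH


def overall_survival(surgery: date, last_follow_up: date,
                     death: Optional[date] = None) -> Tuple[float, int]:
    """Return (time in months, event flag); event = death from any cause."""
    end = death if death is not None else last_follow_up
    return months_between(surgery, end), int(death is not None)


def disease_free_survival(surgery: date, last_follow_up: date,
                          recurrence: Optional[date] = None,
                          metastasis: Optional[date] = None,
                          death: Optional[date] = None) -> Tuple[float, int]:
    """Return (time in months, event flag); the event is the earliest of
    local recurrence, distant metastasis, or death from any cause."""
    events = [d for d in (recurrence, metastasis, death) if d is not None]
    if events:
        return months_between(surgery, min(events)), 1
    # No event observed: censored at the last follow-up.
    return months_between(surgery, last_follow_up), 0


print(disease_free_survival(date(2015, 1, 1), date(2018, 1, 1),
                            recurrence=date(2016, 1, 1)))
```

The five-year cap on follow-up described above is not applied here; it would amount to additionally censoring any time longer than 60 months.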

Machine Learning, Inflammatory and Nutritional Variables.
Seven robust machine learning models were used to predict OS and DFS, including logistic regression (LR), support vector classification (SVC), k-nearest neighbor classification (KNN), extreme gradient boosting (XGBoost), random forests (RF), light gradient boosting machine (LightGBM), and adaptive boosting (AdaBoost). This study adopted the hold-out method (simple cross-validation) to address the overfitting issue brought on by the small sample size. The performance of each model was compared through the area under the curve (AUC) of the receiver operating characteristic (ROC). The most effective machine learning model was used to determine the importance of the inflammatory and nutritional variables as features.
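To make the evaluation procedure concrete, the sketch below shows a hold-out split and a rank-based AUC calculation in plain Python. It is a simplified illustration rather than the pipeline used in the study: the 30% test fraction, the random seed, and the function names are assumptions, and the seven models themselves are not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(n_samples: int, test_fraction: float = 0.3,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices once and return (train, test) index lists.

    Assumption: the test fraction is illustrative; the text does not give it.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train_idx, test_idx = holdout_split(249)
print(len(train_idx), len(test_idx))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Each candidate model would be fitted on the training indices only, and its predicted probabilities on the held-out indices would be passed to `roc_auc` for comparison.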

Statistical Analysis.
Statistical analyses were conducted with Python (version 3.9), R software (version 3.6.1), and MedCalc software (version 19.0.7). The cutoff values of the INPS and hematological variables were determined by the maximally selected rank statistics through the maxstat.test function of the "maxstat" package in R software [17], with an initial score of 1 assigned to variables above the cutoff value and an initial score of 0 to variables below it. Categorical variables are described as frequencies and percentages (%), and differences were assessed with the chi-squared test or Fisher's exact test. Continuous variables are presented as the median with the interquartile range (IQR). Multicollinearity among the INPS and the inflammatory and nutritional variables was tested by multiple linear regression analysis via the variance inflation factor (VIF), with a VIF ≤ 2 considered noncollinear [23]. The Kaplan-Meier method was employed to estimate the survival curves, which were then compared by the log-rank test. The independent prognostic factors were determined with the Cox proportional hazards (PH) model, and the PH assumptions were checked by the log minus log (LML) survival function.
Abbreviations: INPS, inflammation and nutrition prognostic score; IQR, interquartile range; ER, estrogen receptor; PR, progesterone receptor; HER-2, human epidermal growth factor receptor 2; HER-2 OE, HER-2 overexpression; L, lymphocyte; N, neutrophil; M, monocyte; Hb, hemoglobin; P, platelet; ALB, albumin; GLOB, globulin; BMI, body mass index; PNI, prognostic nutritional index; AGR, albumin-globulin ratio; NLR, neutrophil-lymphocyte ratio; PLR, platelet-lymphocyte ratio; MLR, monocyte-lymphocyte ratio; SII, systemic immune inflammation index; SIRI, systemic inflammation response index.
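As an illustration of the Kaplan-Meier method named in the statistical analysis, the product-limit estimate can be computed in a few lines. This is a minimal sketch with a hypothetical function name and toy data; it omits confidence intervals and the log-rank comparison, and it is not the R or MedCalc routine used in the study.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        at_risk -= removed
    return curve


# Toy data: four patients, the second one censored at month 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```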

Disease Markers
The Z test was used to compare the predictive performance (AUC) of the different variables, with a P value < 0.05 indicating statistical significance.
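A Z test on two AUCs can be sketched as follows. The text does not state which variant was applied, so this example is an assumption: it uses the Hanley-McNeil standard error and a correlation term `r` that defaults to 0 (independent curves), whereas curves derived from the same patients are correlated and dedicated software accounts for this. All function names and the example numbers are illustrative only.

```python
import math
from typing import Tuple


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of an AUC (Hanley-McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def auc_z_test(auc_a: float, se_a: float, auc_b: float, se_b: float,
               r: float = 0.0) -> Tuple[float, float]:
    """Return (z, two-sided P value) for the difference between two AUCs.

    r is the correlation between the two AUC estimates; 0 is an assumption.
    """
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p


# Illustrative numbers only, not values from the study.
z, p = auc_z_test(0.80, 0.03, 0.70, 0.04)
print(round(z, 3), round(p, 4))
```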

Construction of INPS.
Multiple linear regression analysis was conducted to test for multicollinearity among the inflammatory and nutritional variables, and all of the variables had a VIF ≤ 2. Eight inflammatory and nutritional variables were included in the seven machine learning models to predict OS and DFS. The XGBoost model exhibited the highest AUC among the models for predicting OS and DFS (AUC = 0.865 and 0.771, respectively; Figures 1(a) and 1(b)). Then, the relative importance of the inflammatory and nutritional variables for predicting OS and DFS was calculated using the XGBoost model (Figures 1(c) and 1(d)). Variables below the respective cutoff value were scored 0, and those above the cutoff value were scored 1. [...] Additionally, the INPS was constructed based on these inflammatory and nutritional variables. Therefore, the Cox regression analysis excluded BMI, PNI, AGR, NLR, PLR, MLR, SII, and SIRI. The relationship between the inflammatory and nutritional variables and OS and DFS is illustrated in Table S1. Meanwhile, the PH assumptions were checked using the log minus log (LML) [...] Figures 4(a) and 4(b)). In the clinical T1 + T2 subgroup, patients with low INPS had significantly higher 5-year OS and DFS rates than those with high INPS (77.4% vs. 57.9%, χ² = 6.9, P = 0.0087; 59.1% vs. 39.5%, χ² = 5.3, P = 0.021, respectively; Figures 4(c) and 4(d)). In the [...] Figures 4(e) and 4(f)).
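The 0/1 scoring step and the low/high INPS grouping described in this section can be sketched as below. Only the OS cutoffs for PNI (60.4) and INPS (0.3917) come from the text; the function names are hypothetical, values exactly at a cutoff are assumed to fall in the lower group, and the way the binary scores are combined into the INPS is not recoverable from the text and is deliberately not shown.

```python
from typing import Dict


def dichotomize(values: Dict[str, float],
                cutoffs: Dict[str, float]) -> Dict[str, int]:
    """Score each variable 1 if above its cutoff, otherwise 0.

    Assumption: a value exactly equal to the cutoff is scored 0.
    """
    return {name: int(values[name] > cutoffs[name]) for name in cutoffs}


def inps_group(inps: float, cutoff: float) -> str:
    """Assign a patient to the low or high INPS group."""
    return "high" if inps > cutoff else "low"


os_cutoffs = {"PNI": 60.4}           # OS.PNI cutoff reported in the text
patient = {"PNI": 62.0}              # hypothetical patient value
print(dichotomize(patient, os_cutoffs))
print(inps_group(0.30, 0.3917))      # OS cutoff for INPS from the abstract
```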

Discussion
This study investigated the clinical significance of a novel inflammatory-nutritional prognostic scoring ( [...] [24,25]. Cancer-related inflammation occurs when cancer and inflammatory responses are entangled, resulting in a markedly poor prognosis and a failure to respond to cancer therapy [11]. Among the inflammatory parameters, neutrophils may promote proliferation and metastasis by releasing inflammatory mediators [26]. Monocytes are also correlated with the metastasis and progression of malignant tumors [27]. In contrast, lymphocytes are essential for the antitumor effect [28]. Additionally, malnutrition is associated with cancer progression, as it may cause a poor immune response [29]. Low serum albumin levels, a manifestation of malnutrition, are associated with poor survival [30]. As a holistic variable that incorporates many common inflammatory and nutritional variables, the utility of the INPS has been explored in other malignant tumors. Wang et al. found that preoperative INPS is an independent predictor of outcomes for stage III gastric cancer (GC) patients [31]. Hua et al. demonstrated that patients with high INPS had significantly worse survival than those with low INPS [32]. In that research, the authors chose the LASSO regression model to [...] Tables 3 and 4, respectively). Many studies have demonstrated that inflammatory and nutritional parameters are associated with survival; however, some of their results are inconsistent. According to a meta-analysis, a high NLR was significantly correlated with a poor pathological response in breast malignant tumor patients, with no association found with DFS or OS [33]. In contrast, another meta-analysis found that patients with high NLR and PLR had short OS and an increased risk of recurrence [34]. In addition, our results indicated that SII, an inflammatory parameter composed of neutrophils, platelets, and lymphocytes, was a better predictor of OS than NLR, which offers only limited clinical information [9].
Therefore, we hypothesized that a biomarker integrating various inflammatory and nutritional parameters should be more accurate than an individual biomarker. Our results showed that the INPS had a higher AUC for OS and DFS than the other inflammatory and nutritional variables. Pairwise comparisons between the INPS and the inflammatory and nutritional variables with the Z test revealed that OS.INPS had a significantly larger AUC than OS.BMI, OS.PNI, and OS.MLR, and DFS.INPS had a significantly larger AUC than DFS.PNI.
We also conducted an exploratory analysis in patients with different clinical T stages, HER-2 statuses, Ki-67 indices, and P53 levels. Although significant survival differences could not be found among the above subgroups, patients with different INPSs showed considerable differences in OS and DFS. In the distinct HER-2 status subgroups in particular, patients with low INPS had better OS and DFS in the HER-2 negative and low expression subgroups, with no difference observed in the HER-2 positive group. More recent studies have shown that breast cancer patients with low HER-2 expression have improved 3-year OS and DFS compared to HER-2-negative patients [35]. However, it is unclear whether low HER-2 expression is correlated with the long-term prognosis in breast cancer patients. Thus, the INPS may be a promising biomarker for HER-2 low breast cancer patients.
Although comprehensive and novel, this study had some limitations. First, it was a retrospective analysis conducted in a single center, and validation with data from additional centers may be necessary. Second, a more extended follow-up period is necessary to identify the long-term clinical significance of INPS. Last, the dynamic changes in INPS should be explored to identify its predictive ability more fully.

Conclusions
For nonpCR breast cancer patients receiving NAC, the INPS based on eight common inflammatory and nutritional variables is an independent predictor of survival. As a comprehensive parameter, it is superior to BMI, PNI, and MLR in predicting survival time. Additionally, it may be a promising biomarker for breast cancer with low HER-2 expression.

Data Availability
Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.

Ethical Approval
The ethics committee of Harbin Medical University Cancer Hospital approved this research. It complies with the World Medical Association Declaration of Helsinki of 1964 and its later amendments. All patients signed informed consent before each treatment.

Conflicts of Interest
The authors declare that there are no conflicts of interest.

Authors' Contributions
Cong Jiang and Yuanxi Huang conceptualized and designed the work. Yuting Xiu, Shiyuan Zhang, and Xiao Yu collected the data. Cong Jiang and Kun Qiao drafted and analyzed the manuscript. All authors contributed to the article and approved the submitted version. Cong Jiang and Yuting Xiu contributed equally to this work.