Construct and Validate a Predictive Model for Surgical Site Infection after Posterior Lumbar Interbody Fusion Based on Machine Learning Algorithm

Purpose. Surgical site infection is one of the serious complications after lumbar fusion. Early prediction and timely intervention can reduce the harm to patients. The aims of this study were to construct and validate a machine learning model for predicting surgical site infection after posterior lumbar interbody fusion, to identify the most important risk factors for surgical site infection, and to explore whether the synthetic minority oversampling technique could improve model performance. Method. This study reviewed 584 patients who underwent posterior lumbar interbody fusion for degenerative lumbar disease at our center from January 2019 to August 2021. Clinical information and laboratory test data were collected from the electronic medical records. The original dataset was divided into a training set and a validation set in a 1 : 1 ratio. Seven machine learning algorithms were used to develop predictive models; the training set of each model was resampled using the synthetic minority oversampling technique. Finally, model performance was assessed in the validation set. Results. Of the 584 patients, 33 (5.65%) developed surgical site infection. Stepwise logistic regression showed that preoperative albumin level (OR 0.659, 95% CI 0.563-0.756), diabetes (OR 9.129, 95% CI 3.816-23.126), intraoperative dural tear (OR 8.436, 95% CI 2.729-25.334), and rheumatic disease (OR 8.471, 95% CI 1.743-39.567) were significant predictors of surgical site infection. The AdaBoost Classification Trees model performed best among the seven machine learning models, and the synthetic minority oversampling technique improved the performance of all models. Conclusion. The prediction model we constructed based on machine learning and the synthetic minority oversampling technique can accurately predict surgical site infection, which is conducive to clinical decision-making and optimization of perioperative management.


Background
Posterior lumbar interbody fusion (PLIF) is a classic operation for the treatment of lumbar degenerative diseases such as lumbar disc herniation, lumbar spinal stenosis, and lumbar spondylolisthesis. Surgical site infection (SSI) is a serious and costly complication, and the reported incidence varies from 0.2% to 16.1% [1]. Surgical site infections can lead to catastrophic consequences such as instrumentation failure, osteomyelitis, pseudoarthrosis, prolonged hospitalization, increased hospital costs, readmissions, and even sepsis or death, increasing patient suffering and placing a heavy burden on families [2]. As the population ages, the number of patients with degenerative lumbar diseases has gradually increased, and with it the number of patients who need this procedure; surgical site infections are increasing at the same time, which poses a serious challenge to families and to social health systems [3]. Therefore, developing an accurate predictive model for early identification of patients at high risk of surgical site infection, followed by targeted intervention, is the most cost-effective approach.
In recent years, artificial intelligence has played an important role in the medical field, for example in coronavirus disease 2019 (COVID-19) diagnosis [4], detection of gastrointestinal polyps [5], retinal vessel segmentation [6], image diagnosis of lung cancer [7], diagnosis of atrophic gastritis [8], and confidentiality management of electronic medical records on the cloud [9]. Machine learning, a form of artificial intelligence, combined with medical big data can create algorithms that rival human doctors [10]. Unfortunately, few studies have applied machine learning algorithms to predict surgical site infection after posterior lumbar interbody fusion. Therefore, we trained seven machine learning models for early prediction of the risk of surgical site infection after PLIF using easily available preoperative and intraoperative factors. However, because most patients do not develop surgical site infection after posterior lumbar interbody fusion, such data suffer from class imbalance (an unequal number of samples between categories in a classification problem), and the effectiveness of machine learning algorithms is reduced in this situation [11]. The synthetic minority oversampling technique (SMOTE) is a common method for dealing with unbalanced data [12]. Therefore, this study uses SMOTE to optimize our machine learning prediction models.
Our study aims to develop and validate a machine learning prediction model for surgical site infection after PLIF. To the best of our knowledge, our study is the first to combine SMOTE with multiple machine learning algorithms to develop and validate a predictive model for surgical site infection after PLIF. Clinicians can use this prediction model to identify patients at high risk of surgical site infection early, which helps to optimize patient selection and perioperative management. Early preventive intervention in this population can reduce the occurrence of serious complications and may prevent the occurrence and development of surgical site infection.

2.1. Patients.
This study was approved by the ethics committee of The First Affiliated Hospital of Chongqing Medical University, and informed consent was waived because of the retrospective design. From January 2019 to August 2021, a total of 584 patients underwent posterior lumbar interbody fusion (PLIF) at our center for degenerative lumbar disease. Inclusion criteria were as follows: (1) age ≥ 18 years; (2) diagnosis of lumbar degenerative disease, including lumbar disc herniation, lumbar spinal stenosis, spondylolisthesis, and lumbar instability, based on lumbar magnetic resonance imaging (MRI) and clinical manifestations; (3) primary single-level or multilevel PLIF surgery. The exclusion criteria were as follows: (1) a previous history of open lumbar surgery; (2) preoperative concurrent active infection of the spine or other parts of the body, spinal deformity, or tumors.
All operations and perioperative management were performed by the same experienced spine surgical team. All procedures were performed in a standard vertical laminar-flow operating room. Antibiotic prophylaxis was started 30 minutes before surgery and continued for 72 hours after surgery, and uniform criteria were adopted for the type, timing, and dose of perioperative antibiotics. All patients were asked to follow the same wound care and functional exercise protocol.
SSI was defined according to the Centers for Disease Control and Prevention (CDC) criteria [13,14]. Patients who met one of the following conditions within 90 days after surgery were diagnosed with SSI: (1) clinical manifestations such as redness, swelling, heat, pain, tenderness, and/or purulent drainage at the wound; (2) a positive culture of fluid aspirated from an abscess at the wound; (3) a positive culture of fluid or tissue collected during revision surgery; (4) evidence of SSI confirmed by histopathological or radiological examination; (5) SSI diagnosed by the surgeon and clearly recorded in the medical record. SSI can be divided into superficial infection and deep infection according to its location.

Data Collection.
The following clinical information was retrospectively collected through the electronic medical record system, surgical anesthesia system, and mobile nursing system: age, the American Society of Anesthesiologists (ASA) classification, New York Heart Association (NYHA) classification, body mass index (BMI), smoking, drinking, history of rheumatic disease, osteoporosis, hypertension, diabetes, diagnosis, and whether discharge occurred in the cold or warm season. Routine laboratory tests, including routine blood tests, liver function tests, and renal function tests, were collected. We also recorded surgery-related parameters, including operation time, estimated intraoperative blood loss, number of fused levels, and whether a dural tear occurred during the operation. Patients were classified as smokers only if they were current smokers, regardless of the amount or type of tobacco smoked; passive smokers and former smokers were considered nonsmokers. The American Society of Anesthesiologists physical status is a classification that evaluates a patient's physical status before surgery [15] and has also been commonly used in preoperative risk prediction in recent years [16]. It was coded according to the 1963 American Society of Anesthesiologists five-level classification of physical status (1 = a healthy individual, 2 = mild systemic disease, 3 = severe systemic disease, 4 = severe systemic disease that is a constant threat to life, and 5 = a dying person who is not expected to survive with or without operation). Patients with rheumatoid arthritis, ankylosing spondylitis, psoriatic arthritis, or systemic lupus erythematosus were considered to have a history of rheumatic disease.
We divided the original dataset into a training set and a validation set in a 1 : 1 ratio and resampled the training set of each model using the synthetic minority oversampling technique [12]. It should be pointed out that we did not resample the validation set.
2 Computational and Mathematical Methods in Medicine

Development and Validation of Machine Learning Models.
Univariate and multivariate logistic regression were performed first; factors that were significant in both analyses were then entered into stepwise logistic regression to determine the important factors associated with SSI after PLIF. Next, the dataset was randomly divided into a training set and a validation set, each accounting for 50% of the study cohort. To address the data imbalance, the SMOTE algorithm was used to preprocess the training set. Seven machine learning algorithms (Boosted Classification Trees [17], Boosted Logistic Regression [18], Extreme Gradient Boosting [19], Stochastic Gradient Boosting [20], Generalized Linear Model [21], AdaBoost Classification Trees [22], and Random Forest [23]) were trained on the training set, and the performance of each model was then assessed in the validation set.
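The random 1 : 1 split described above can be sketched as follows. This is a minimal illustration only: the source does not state which software performed the split, so the function `split_half`, the seed, and the placeholder cohort records are all hypothetical.

```python
import random


def split_half(records, seed=0):
    """Randomly split records into two halves: (training, validation)."""
    shuffled = list(records)               # copy so the input order is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical cohort: 584 (patient_id, label) pairs, 33 labelled 1 (SSI).
cohort = [(i, 1 if i < 33 else 0) for i in range(584)]
train, valid = split_half(cohort, seed=42)
print(len(train), len(valid))
```

In a pipeline of this shape, any resampling step would be applied to `train` only, leaving `valid` at its original class ratio, as the text specifies.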

Model Evaluation.
To evaluate the performance of the machine learning models, the confusion matrix, accuracy, precision, recall, F1 score, F3 score, and the area under the receiver operating characteristic curve (AUC) were calculated in the validation set. Accuracy, precision, recall, and the Fα score were determined by the following formulas, given here in their standard forms:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F_{\alpha} = \frac{(1 + \alpha^{2}) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\alpha^{2} \cdot \mathrm{Precision} + \mathrm{Recall}}$$
In these formulas, TP: true positive; TN: true negative; FP: false positive; FN: false negative. The confusion matrix summarizes the prediction results of a classification model. Its rows represent predicted values, and its columns represent true values. The Fα score combines precision and recall, with recall weighted α times as heavily as precision. The Fα score was calculated for α = 1, 2, and 3, and F3 was finally chosen as the evaluation index of the model. A good model should have high Fα scores and AUC values. We compared the predictive performance of the seven machine learning models before and after preprocessing the training sets with the synthetic minority oversampling technique. The algorithm with the best performance was taken as the final prediction model, and the importance of its variables was ranked.
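As a worked illustration of the metrics named above, the sketch below computes the confusion-matrix counts, accuracy, precision, recall, and the Fα score from binary labels. It assumes the standard definitions of these quantities; the function names and the ten toy labels are hypothetical and are not taken from the study data. AUC is not covered here.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 1 = SSI, 0 = no SSI."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f_alpha(precision, recall, alpha):
    """F-alpha score: recall is weighted alpha times as heavily as precision."""
    denom = alpha ** 2 * precision + recall
    return (1 + alpha ** 2) * precision * recall / denom if denom else 0.0


# Toy example: 3 true SSI cases among 10 patients, 5 flagged by the model.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = f_alpha(precision, recall, 1)
f3 = f_alpha(precision, recall, 3)
print(accuracy, precision, recall, f1, f3)
```

In this toy case the model catches every true case at the cost of two false alarms, so F3 (0.9375) rewards it more than F1 (0.75), which is the behaviour the choice of α = 3 is meant to capture.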

Logistic Regression.
Univariate analysis showed that the factors with statistical significance (P < 0.05) were age, number of fusion levels, intraoperative dural tear, diabetes, history of rheumatic disease, preoperative red blood cell count, preoperative albumin level, and ASA grade (Figure 1). Multivariate analysis showed that factors with statistical significance (P < 0.05) included intraoperative dural tear, diabetes, history of rheumatic disease, and preoperative albumin level (Figure 2(a)).
Figure 1: Univariate logistic regression. LSS: lumbar spinal stenosis; LSO: lumbar spondylolisthesis; EBL: estimated blood loss; DT: dural tear; RD: rheumatic disease; CHD: coronary heart disease; Pre WBC: preoperative white blood cell count; Pre RBC: preoperative red blood cell count; Pre Hb: preoperative hemoglobin; Pre erythrocyte volume: preoperative erythrocyte volume; Pre PLT: preoperative platelets; Pre neutrophil percentage: preoperative neutrophil percentage; Pre lymphocyte percentage: preoperative lymphocyte percentage; Pre Alb: preoperative albumin; Pre globulin: preoperative globulin; Pre ALT: preoperative alanine aminotransferase; Pre AST: preoperative aspartate aminotransferase; Pre creatinine: preoperative creatinine; ASA: American Society of Anesthesiologists physical status; NYHA: New York Heart Association Class; BMI: body mass index. * P value < 0.05; ** P value < 0.01; *** P value < 0.001.
The prediction performance of the remaining models for patients at high risk of infection was also significantly improved by the synthetic minority oversampling technique, as shown in Figure 4.

Variable Importance.
In the AdaBoost Classification Trees model, the relative importance of variables is shown in Figure 6, in descending order of importance as follows: preoperative albumin level, diabetes, intraoperative dural tear, and history of rheumatic disease.

Discussion
In this study, we developed and validated a predictive model for surgical site infection after posterior lumbar interbody fusion using multiple machine learning algorithms and SMOTE. We found that SMOTE used in the training set improved the performance of the prediction model, and the AdaBoost Classification Trees model combined with SMOTE provided the best performance compared to other models. This predictive model based on SMOTE and machine learning can help early identify patients at high risk for surgical site infection, optimize perioperative management, and facilitate clinical decision-making.
Surgical site infection has always been a concern for spinal surgeons. In our cohort, the surgical site infection rate after posterior lumbar interbody fusion was 5.65%, which was consistent with previous studies [1]. Although studies have reported risk factors for surgical site infection after lumbar interbody fusion [24], they described these risk factors only as relative risks (RR) or odds ratios (OR), which is not sufficient to comprehensively assess the risk of surgical site infection after PLIF for individual patients. Therefore, we used machine learning algorithms to develop a predictive model for surgical site infection after posterior lumbar interbody fusion, which, to the best of our knowledge, is the first model to predict surgical site infection after PLIF using the synthetic minority oversampling technique and machine learning algorithms on an imbalanced dataset.
The synthetic minority oversampling technique is an algorithm that combines oversampling of the minority class with undersampling of the majority class. It is a common method for dealing with data imbalance. It constructs new minority samples rather than directly copying existing ones; that is, the samples generated by the algorithm are new and do not exist in the original dataset [25]. Based on a distance measure, it identifies similar samples within the minority class, selects one of them, and randomly picks a number of its neighboring samples; new data are then constructed by perturbing the attributes of the selected sample toward those neighbors [12].
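The oversampling step described above can be sketched in a few lines. This is a simplified illustration of the interpolation idea in SMOTE, not the implementation used in the study: the function `smote`, its parameters, and the toy feature values (a hypothetical albumin level and age per patient) are assumptions made for the example, and the majority-class undersampling step is omitted.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class samples: (albumin in g/L, age in years).
minority = [(35.0, 61.0), (33.5, 70.0), (36.2, 58.0), (34.1, 66.0)]
new_points = smote(minority, n_new=6, k=2, seed=1)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region already occupied by the minority class instead of duplicating existing rows.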
Unbalanced data refers to an unequal number of samples between categories in a classification problem [25]. In our study, patients with SSI accounted for 5.65%, while patients without SSI accounted for 94.35%. Therefore, data imbalance existed in this study. When classifying imbalanced data, machine learning prediction models tend to assign all cases to the majority class to achieve high accuracy [26]. However, when the minority category is more important (in this case, identifying patients at high risk for surgical site infection), imbalanced data often lead to poor predictive performance. The synthetic minority oversampling technique and ensemble learning methods are commonly used to deal with data imbalance [27]. Therefore, we used SMOTE to oversample the minority (abnormal) class and undersample the majority (normal) class in the training set to overcome class imbalance and optimize our machine learning algorithms [12]. Previous studies have also shown that the synthetic minority oversampling technique helps improve model accuracy without compromising research results [26,28]. Our study confirmed that applying the synthetic minority oversampling technique to the training set can improve the performance of machine learning prediction models when the sensitivity of the model needs to be improved without losing too much specificity.
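The effect of imbalance on accuracy can be made concrete with the cohort figures reported in this study (584 patients, 33 with SSI). The sketch below evaluates a hypothetical trivial classifier that labels every patient as uninfected; it illustrates the arithmetic only and is not one of the study's models.

```python
n_total, n_ssi = 584, 33  # cohort size and SSI cases reported in the study

y_true = [1] * n_ssi + [0] * (n_total - n_ssi)
y_pred = [0] * n_total    # hypothetical classifier: always predicts "no SSI"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / n_ssi
print(f"accuracy={accuracy:.4f}, recall={recall:.1f}")
```

Such a classifier reaches an accuracy of about 94.35% while identifying none of the infected patients, which is why accuracy alone is a poor guide on data of this kind.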
Since the purpose of the prediction model proposed in this study is to identify patients at risk for surgical site infection, the sensitivity of the model is of primary importance. The Fα score is a model evaluation index that combines precision and recall, with recall weighted α times as heavily as precision. α < 1 indicates that the precision of the model is more important; α > 1 indicates that the recall of the model is more important. In clinical practice, early identification and stratification of patients at high risk of infection may be beneficial for better prevention of surgical site infection. We did not want to miss any patients at high risk of postoperative infection; that is, we wanted to emphasize the recall of the model over its precision. We calculated Fα scores for α = 1, 2, and 3 and finally chose F3 as the most important index of model performance. The results of this study also suggest that it is appropriate to increase the α value when studying severe and infrequent complications.
Our study further confirms that low preoperative albumin levels, diabetes, history of rheumatic disease, and intraoperative dural tear are risk factors for surgical site infection after posterior lumbar interbody fusion, which is consistent with previous findings [24,29]. Therefore, preoperative optimization of nutritional status, perioperative monitoring of albumin, and careful intraoperative operation to avoid dural injury may help prevent surgical site infection.
To the best of our knowledge, this study is the first to use SMOTE combined with machine learning algorithms to develop and validate a predictive model for surgical site infection after PLIF. Clinicians can use this prediction model to preliminarily identify the population at high risk of SSI and conduct early preventive intervention to reduce the incidence of serious complications. Examples include correction of hypoproteinemia and careful intraoperative technique to avoid dural tears. In addition, since most surgeries for lumbar degenerative diseases are elective, clinicians can use the prediction model proposed in this study to make a preliminary judgment of the risk of surgical site infection, choose the timing of surgery, weigh the advantages and disadvantages of surgery, and answer patients' questions about infectious complications. The synthetic minority oversampling technique is an effective method to improve the prediction performance of machine learning models on unbalanced datasets. Similar imbalanced data exist for many diseases and postoperative complications [30], so the synthetic minority oversampling technique used in this study can be applied to the study of other diseases. As a data preprocessing step, it may be a feasible way to improve the performance of machine learning prediction models.

Conclusion
Our prediction model based on machine learning and SMOTE can successfully predict patients at high risk of infection. It can help clinicians optimize patient selection and the timing of surgery (such as elective surgery after correcting hypoalbuminemia in high-risk patients) and answer patients' questions about infectious complications; early identification and early intervention can reduce the occurrence of serious complications and may prevent the occurrence of surgical site infections. The method adopted in this study also provides a reference for the study of other diseases and complications. The limitation of this study is that its single-center retrospective design may introduce selection bias and limit its generalizability; the model needs to be verified in larger and broader populations in the future. Meanwhile, in addition to the methods used in this study, we look forward to future research using some of the most representative computational intelligence algorithms that can be applied to such problems, like monarch butterfly optimization (MBO) [31], earthworm optimization algorithm (EWA) [32], elephant herding optimization (EHO) [33], moth search (MS) algorithm [34], slime mould algorithm (SMA) [35], hunger games search (HGS) [36], Runge Kutta optimizer (RUN) [37], colony predation algorithm (CPA) [38], and Harris hawks optimization (HHO) [39].

Data Availability
The datasets generated and/or analyzed during the current study are not publicly available because they contain confidential patient data, but they are available from the corresponding author on reasonable request.

Ethical Approval
This study was approved by the ethics committee of The First Affiliated Hospital of Chongqing Medical University.

Consent
The informed consent was waived for the retrospective study.