Artificial Intelligence and Big Data Technologies in the Construction of Surgical Risk Prediction Model for Patients with Coronary Artery Bypass Grafting

The objective of this work was to predict the risk of mortality in patients undergoing coronary artery bypass grafting (CABG) using a CABG risk prediction model built on artificial intelligence (AI) and big data technologies. The clinical data of 2,364 patients who underwent CABG in our hospital from January 2019 to August 2021 were collected. Based on AI and big data technology, business requirement analysis, system requirement analysis, design of the complication prediction module, big data mining, and model building were carried out in turn; the resulting CABG risk prediction system includes a case feature analysis service, a risk warning service, and a case retrieval service. Precision, recall, and F1-score were adopted to evaluate the quality of the gradient-boosted tree (GBT) model. The analysis showed that the GBT model was the best in terms of precision, F1-score, and area under the receiver operating characteristic (ROC) curve (AUC). According to the CABG risk prediction model, 1,382 patients had a score <0, 463 patients had a score of 0 ≤ score ≤ 2, 252 patients had a score of 2 < score ≤ 5, and 267 patients had a score >5; these were stratified into four groups: A, B, C, and D. The actual number of in-hospital deaths was 25, and the in-hospital mortality rate was 1.05%. The mortality rate predicted by the CABG risk prediction model was 2.67 ± 1.82% (95% confidence interval (CI) (2.87–2.98)), which was higher than the actual value. The CABG risk prediction model showed credible results only in group B, with AUC = 0.763 > 0.7. In group B, 3 patients actually died, the actual mortality rate was 0.33%, and the predicted mortality rate was 0.96 ± 0.78 (95% CI (0.82–0.87)), so the model overestimated the mortality rate of patients in group B.
In conclusion, a CABG risk prediction model based on AI and big data technologies was successfully constructed. It overestimates the mortality of patients at intermediate risk; with continued research, development, and innovation, it could be adapted to different types of heart disease and provide clinical guidance.


Introduction
Coronary artery bypass grafting (CABG) is one of the surgical methods commonly used in the clinical treatment of coronary heart disease (CHD). It can significantly relieve the symptoms of chest tightness, shortness of breath, and exercise limitation caused by coronary ischemia in patients with CHD, and reduce the incidence of angina pectoris and myocardial infarction [1][2][3]. As the number of patients undergoing CABG increases year by year, some patients may even need to undergo bypass surgery to relieve their condition, which confronts cardiac surgeons with severe challenges in patient condition assessment and surgical risk prediction [4][5][6]. If a patient's surgical risk can be evaluated in advance and an individualized treatment plan can be prepared, the success rate of CABG and the patient's prognosis will both be greatly improved. Therefore, establishing a disease risk prediction model based on artificial intelligence and big data analysis technology is of great significance [7,8]. Preoperative risk prediction is a method to quantitatively evaluate the risk level according to the patient's physiological status, underlying diseases, and the effects of anesthesia and surgery on the body. According to the risk level of patients, anesthesiologists formulate and revise anesthesia and perioperative medical plans and take preventive measures, so as to reduce potential risks and improve safety and the quality of medical care.
With the development and wide application of electronic technology, large-scale multicenter information databases have been established and improved [9][10][11]. Scholars have statistically analyzed large samples of data from multiple centers, so that research on CABG risk assessment has gradually developed from single-center studies to multicenter joint research, and the research scale has expanded from thousands of cases to hundreds of thousands. At present, there are more than ten cardiac surgery risk scoring systems, among which EuroSCORE, the Cleveland risk-scoring system, and the Ontario risk-scoring (OPR) system are commonly used internationally [12][13][14]. Compared with patients in Western developed countries, Chinese patients differ in demographics, comorbidities, preoperative risk factors, disease course, and even medication compliance. Moreover, unlike in Western countries, most bypass grafting in China is off-pump. Therefore, cardiac surgery risk assessment systems based on patient data from Western countries cannot adequately assess the surgical risk of Chinese patients [15][16][17][18], and doctors urgently need a breakthrough in surgical risk prediction with the help of machine learning, big data analysis, and other technologies.
Therefore, this work collected the clinical data of patients undergoing CABG in the Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University from January 2019 to August 2021, carried out business demand analysis, system demand analysis, design of the complication prediction module, big data mining, and model establishment, and constructed a CABG risk prediction model based on AI and big data technology. The aim was to test the performance of this model for patients undergoing CABG in our center and to provide a reference for the clinical selection of CABG risk prediction models.

Research Objects.
This work collected the clinical data of 2,364 routine CABG patients in the Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University from January 2019 to August 2021. The data mainly included basic demographic data, preoperative medical history, preoperative laboratory examination data, coronary angiography data, cardiac ultrasound data, surgical data, postoperative complications, and discharge follow-up data. Death was the primary study endpoint. The CABG risk prediction model based on AI and big data technologies constructed in this work was applied to calculate the score of each patient.

Business Need Analysis.
The CABG risk prediction model needed to implement three services, namely, a case feature analysis service, a risk early-warning service, and a case retrieval service. The service process of the CABG risk prediction model is shown in Figure 1.
The model included an efficient and intuitive case feature analysis service that assisted clinicians in assessing the risk of complications in a case. First, the data acquisition module automatically collected the relevant data of the cases, and the data processing module filtered and cleaned the characteristic data. Then, the feature engineering of the myocardial infarction prediction module was applied to mine and analyze the key factors. Finally, the output results were displayed with the visualization module. Based on the visualization module, clinicians can evaluate the distribution of key pathogenic factors and risk factors, such as the level of factor scores and the degree of fit of factor values.
The model included a risk early-warning service, which can assist clinicians in monitoring the risk of complications in cases. The service passed the data through the data acquisition module and the data processing module step by step, used the risk prediction module to perform binary classification of the occurrence of complications, and finally displayed the output as a probability. Different colors represented different probability results: the larger the probability value, the darker the color. If the probability value exceeded the configured threshold, an alert message was pushed. The message push module can be configured according to the administrator's rules and can send multichannel messages such as phone calls, text messages, and emails.
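The early-warning rule described above can be illustrated with a minimal sketch. It assumes that the displayed color intensity is simply proportional to the predicted probability and that an alert is pushed only when the probability strictly exceeds the configured threshold; the names `RiskAlert` and `evaluate_case` and the default threshold of 0.5 are hypothetical and are not taken from the system itself.

```python
from dataclasses import dataclass


@dataclass
class RiskAlert:
    case_id: str
    probability: float  # predicted probability of a complication
    shade: float        # 0.0 (lightest) to 1.0 (darkest); assumed proportional to probability
    push_alert: bool    # True when the configured threshold is exceeded


def evaluate_case(case_id: str, probability: float, threshold: float = 0.5) -> RiskAlert:
    """Map a predicted probability to a display shade and an alert decision."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return RiskAlert(
        case_id=case_id,
        probability=probability,
        shade=probability,                  # larger probability -> darker color
        push_alert=probability > threshold  # alert only above the threshold
    )
```

In a deployment, the `push_alert` flag would be handed to the message push module, which decides the channel (phone call, text message, or email) from the administrator's rules.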
The case retrieval service needed to support rapid screening of in-hospital case data and retrieval by combined conditions, provide case diagnosis and treatment data and medical record details, and help doctors locate cases quickly, so that they could select the most appropriate diagnosis and treatment methods and management plans based on the analysis and prediction results and the medical records.

System Requirement Analysis.
The users of the model may be algorithm engineers, scientific research analysts, system administrators, medical staff, log reviewers, etc.
The system database needed to provide both massive data retrieval and business data storage. The data table structure had 4 parts: basic data, case information, management information, and log security. The basic data included users of different roles and basic data dictionaries. Case information included case visit information, medical history, and diagnosis and treatment data. Management information included task configuration, permission configuration, and scheduling policy data. Log security included system operation, operation, and data processing.
Computational Intelligence and Neuroscience
The overall system was deployed in the intranet environment. The platform was connected with the large dataset of the hospital, and the integrated platform service gateway was adopted for external network access and message push, so that security authentication was managed through the unified portal.

Overall System Design.
The model constructed in this work adopted a browser/server (B/S) architecture. The front-end view, business application logic, and back-end data were separated using SpringBoot layered technology. The overall system architecture is shown in Figure 2. Vue + ElementUI was used so that the front end was fully compatible with different browsers such as IE, Chrome, and Firefox. Using database access object technology, the back-end server assembled business logic from the underlying basic units according to different businesses and scenarios, including modules such as data collection and processing, complication prediction, visualization, and message push. The database was composed of MySQL and Elasticsearch. MySQL stored business data, which was highly feasible, and Elasticsearch, as the storage layer for the massive clinical diagnosis and treatment data of cases [19], used a high-performance distributed architecture to improve data query efficiency.

Complication Prediction Module.
The complication prediction model was established using supervised machine learning algorithms. Medical big data are complex and specific, and most medical datasets are unbalanced. This dataset contained 2,364 positive cases and 5,387 negative cases. Therefore, when a model was trained, optimization lay not only in changing the algorithm and iterating the hyperparameters but especially in constructing the dataset, which needed to be incorporated into model optimization as a key adjustment method. First, 2,364 cases were randomly sampled from the negative data by downsampling to balance the positive data and avoid learning bias. Then, the data were divided into a training set and a test set in a ratio of 6 : 4. Finally, multiple models were constructed with the random forest (RF), GBT, logistic regression (LR), and K-nearest neighbor (KNN) algorithms. The F1-score was used as the evaluation index to select the best candidate of each model as the final model. The F1-score is one of the indicators used in machine learning to measure the accuracy of a binary classification model. Compared with the accuracy rate, it takes both the precision and the recall of the classification model into account, so it reflects the real predictive ability of the model more accurately and objectively. The route diagram for building a complication prediction model is shown in Figure 3.
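The dataset construction steps described above (downsampling the negative cases to match the positive cases, then a 6 : 4 split) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `downsample_and_split`, the fixed random seed, and the use of plain shuffling rather than stratified splitting are assumptions, not details reported in this work.

```python
import random


def downsample_and_split(positives, negatives, train_ratio=0.6, seed=0):
    """Balance the classes by downsampling negatives, then split into train/test.

    Returns two lists of (case, label) pairs, with label 1 for positive
    cases and 0 for negative cases.
    """
    if len(negatives) < len(positives):
        raise ValueError("downsampling needs at least as many negatives as positives")
    rng = random.Random(seed)
    # Randomly keep as many negative cases as there are positive cases.
    sampled_negatives = rng.sample(negatives, k=len(positives))
    data = [(case, 1) for case in positives] + [(case, 0) for case in sampled_negatives]
    rng.shuffle(data)
    # 6 : 4 split between training set and test set.
    cut = round(len(data) * train_ratio)
    return data[:cut], data[cut:]
```

With 2,364 positive and 5,387 negative cases, the balanced dataset holds 4,728 cases, of which roughly 60% go to the training set. Each candidate algorithm (RF, GBT, LR, KNN) would then be fitted on the training set and compared by F1-score on the test set.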

Big Data Mining Technology and Model Building.
Big data mining technology discovers the logical relationship between dependent variables and outcome variables through various classification models. The main steps are data preparation, data cleaning, data mining, result expression, and analysis. Data mining methods include the regression model, classification model, association model, clustering model, deviation model, and so on. The project team used five data mining methods for analysis: rotating random forest, random forest, Bayesian network, naive Bayesian network, and support-vector machine.

Model Training for Incremental Learning GBT.
The incremental learning GBT batch processing method was adopted to predict the target value with the preset tree structure and obtain the classification label or predicted value. In this work, $l(\cdot)$ represents the loss function of the model. The loss function of the incremental learning GBT model is given by equation (1), in which $G(a) = \sum_{t=0}^{|T|} g_t(a)$ and $t$ indexes the $t$-th tree. Similar to a neural network in which only one layer is updated at a time, the incremental learning GBT modified only the weights of the leaf nodes of one tree at a time during backward propagation. Therefore, the minimum value of the loss function of the $t$-th tree, $l_t^{\min}$, had to be calculated with equation (2), in which $T/t$ represents all decision trees in the model except the $t$-th tree. Since the model had to minimize $l_t^{\min}$ by updating $g_t(x)$, the partial derivative of $l_t^{\min}$ had to be calculated during training. With $f = \partial l_t / \partial g_t(x)$ denoting the first derivative of the loss function, the new $t$-th tree is calculated with equation (3), in which $\delta$ represents the learning rate of the model. When the loss function was the root mean square error, the first derivative $f$ of the loss function belonging to the $t$-th tree is given by equation (4), in which $b_t$ was the ideal cumulative value of the remaining trees except for the $t$-th tree. The calculation of the first derivative can be generalized to other differentiable loss functions.
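For orientation, the quantities named in equations (1) to (4) can be written out in one plausible form. This is a hedged reconstruction from the definitions given in the text and the standard gradient-descent update, not the printed equations: the target value $y$, the squared-error form with the factor $\tfrac{1}{2}$ (used here in place of the root mean square error), and the reading of $b_t$ as the residual target left for the $t$-th tree are all assumptions.

```latex
\begin{align}
G(x) &= \sum_{t=0}^{|T|} g_t(x), \\
l_t^{\min} &= \min_{g_t}\; l\Big(y,\; g_t(x) + \sum_{j \in T/t} g_j(x)\Big), \\
f &= \frac{\partial l_t}{\partial g_t(x)}, \qquad g_t(x) \leftarrow g_t(x) - \delta f, \\
l_t &= \tfrac{1}{2}\big(g_t(x) - b_t\big)^2
  \;\Longrightarrow\; f = g_t(x) - b_t,
  \qquad b_t = y - \sum_{j \in T/t} g_j(x).
\end{align}
```

Under these assumptions, one update step moves the output of the $t$-th tree a fraction $\delta$ of the way toward $b_t$, while all other trees are held fixed.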

Evaluation System Application Interface.
The data acquisition module of the system was connected with the big data integration platform of the hospital. It collected the business data on risk factors in real time, and the data processing module then summarized the data and saved them to the complication database [20][21][22]. The patient management interface includes name, gender, treatment type, treatment department, treatment date, and risk assessment results, as shown in Figure 4.
The model provided interactive access in the form of web pages, offering data query, visual data analysis, and configuration services for scientific researchers, and system maintenance, authority allocation, and real-time monitoring functions for system administrators. Based on the clinical data of patients, machine learning algorithms were employed to predict the occurrence of complications, providing doctors with useful information and assisting them in formulating treatment plans for patients. The model management interface includes the risk prediction results and risk prediction trends of different models, as shown in Figure 5.

Grouping Methods.
The CABG risk prediction model was used to score the patients. The total score was stratified for cardiovascular risk by the quartile method, and each risk stratum was matched to the corresponding group. The mortality rate of all patients and of each group was predicted using the CABG risk prediction model.
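The quartile method can be sketched with the Python standard library. The example below assumes that the cut points are the three quartiles of the score distribution as computed by `statistics.quantiles` with its default (exclusive) method; how ties at a cut point were assigned in this work is not stated, so the convention in the hypothetical helper `stratum` (a score equal to a cut point stays in the lower stratum) is an assumption.

```python
from statistics import quantiles


def quartile_cut_points(scores):
    """Return the three cut points that split the scores into four strata."""
    return quantiles(scores, n=4)


def stratum(score, cut_points):
    """Return the stratum index (0 to 3) as the number of cut points the score exceeds."""
    return sum(score > cut for cut in cut_points)
```

For example, `quartile_cut_points([1, 2, 3, 4, 5, 6, 7])` yields three cut points, and `stratum` then maps any score to one of four risk strata.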

Predictive Efficiency Evaluation Methods.
The performance of the CABG risk prediction model was analyzed in terms of discrimination and calibration. Discrimination referred to the ability of the model to distinguish in-hospital death from postoperative survival. Model discrimination was expressed as the area under the receiver operating characteristic curve (AUC), which ranged from 0.5 to 1.0. An AUC ≥0.8 indicated high reliability, and an AUC of 0.7–0.8 indicated credibility. The Hosmer-Lemeshow (H-L) goodness-of-fit test was applied to assess the calibration of the model by comparing expected and actual results. An H-L P value >0.05 indicated that the model had good calibration. The predicted mortality rate was compared with the actual mortality rate, and the ratio of the two was used as the calibration point [23,24].
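As a concrete illustration of discrimination, the AUC can be computed as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor. The sketch below uses this pairwise definition with ties counted as one half; the function names and the label text returned for values below 0.7 are hypothetical, while the 0.7 and 0.8 cut-offs are those stated above.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def interpret_auc(value):
    """Apply the reliability cut-offs used in this work."""
    if value >= 0.8:
        return "high reliability"
    if value >= 0.7:
        return "credible"
    return "below the credibility cut-off"
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few thousand patients; a rank-based formula gives the same value more efficiently.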
The commonly used precision, recall, and F1-score metrics were selected to evaluate the quality of the GBT model. For category $i$, they were calculated as follows:
$$\mathrm{Precision}_i = \frac{TP_i}{TP_i + FP_i}, \qquad \mathrm{Recall}_i = \frac{TP_i}{TP_i + FN_i}, \qquad F1_i = \frac{2 \cdot \mathrm{Precision}_i \cdot \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}$$
In the above equations, $TP_i$ denotes the number of samples that belonged to category $i$ and were classified as category $i$ by the classifier, $FP_i$ denotes the number of samples that did not belong to category $i$ but were classified as category $i$, and $FN_i$ denotes the number of samples that belonged to category $i$ but were classified as another category.
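The three metrics follow directly from the counts $TP_i$, $FP_i$, and $FN_i$ defined above. The following sketch computes them for one category from lists of true and predicted labels; the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not one stated in this work.

```python
def per_class_metrics(y_true, y_pred, category):
    """Return (precision, recall, F1) for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, it is high only when both are high, which is why it was preferred over plain accuracy for model selection.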

System Model Test Results.
The test results of GBT, RF, LR, and KNN are shown in Table 1. Compared with the other three models, the GBT was the best model in terms of precision, F1-score, and AUC. The confusion matrix of the GBT reflected the stability of the model in all aspects, as shown in Table 2. The ROC curve reflected the strong generalization ability of the system model, as shown in Figure 6.

Analysis Results of the Convergence of the GBT Model.
To further verify the convergence of the GBT model, comparative experiments were performed on three datasets: CASP, superconductor, and year prediction. The depth of the trees was 10, the number of trees was 100, and the number of model iteration rounds was 2000. These three datasets and different tree structure initialization methods were used to train the incremental learning GBT model, and the training error was recorded after every model update. The three datasets were trained using random initialization, forgetting feature initialization, median initialization, and information gain-based initialization, respectively. As illustrated in Figure 7, whichever of the proposed built-in tree structure initialization methods was used to train the incremental learning GBT, the model converged as iterative training proceeded.

Analysis Results of the Capacity Improvement of the GBT Model.
A special feature of the GBT model is that when the user increases the depth of the decision trees or the number of decision trees, the fitting ability of the model increases significantly. Therefore, this work tested whether the incremental learning GBT model had similar properties. When the depth of the decision trees was fixed at 10 and the number of decision trees was increased, the training error results of the GBT model were as shown in Figure 8. The results showed that as the number of decision trees increased, the fitting ability of the GBT model improved greatly.
Similarly, the number of decision trees was fixed at 40 and the depth of the decision trees was gradually increased. The training error results of the GBT model are shown in Figure 9. The results suggested that as the depth of the decision trees increased, the fitting ability of the GBT model also improved greatly. These two experiments showed that the capacity of the GBT model could be increased by increasing the depth or the number of trees.

Statistics of Basic Information of Cases.
A total of 2,364 patients undergoing CABG were included in this work. The average age of the included patients was 61.21 ± 11.24 years; the proportion of men was 53.34%, and the proportion of women was 46.65%. There were 18 patients with a history of previous cardiac surgery, accounting for 0.76%; 255 patients with moderate renal impairment, accounting for 10.78%; 1,765 patients with grade II cardiac function, accounting for 74.28%; and 1,962 patients undergoing single coronary artery surgery, accounting for 82.99%. The basic information of the cases is given in Table 3.

Grouped Based on CABG Risk Prediction System.
For stratification with the CABG risk prediction model, the stratification points selected by the quartile method were 0, 2, and 5, which divided the patients into 4 risk strata. A score ≥6 corresponded to the high-risk stratum (Table 4).
After stratification, there were 1,382 patients with a score <0, 463 patients with a score of 0 ≤ score ≤ 2, 252 patients with a score of 2 < score ≤ 5, and 267 patients with a score >5. The stratified results of the CABG risk prediction model were used for grouping, and the corresponding strata formed four groups: A, B, C, and D. Groups A, B, and C were the low-intermediate risk groups, and D was the high-risk group. The grouping results are shown in Table 5. There were 25 actual in-hospital deaths, and the in-hospital mortality rate was 1.05%. The mortality rate predicted by the CABG risk prediction model was 2.67 ± 1.82% (95% CI (2.87, 2.98)), which was higher than the actual value. Among the different risk stratification subgroups, the assessment of the CABG risk prediction system was credible only in group B, with AUC = 0.763 > 0.7. In group B, 3 patients actually died, the actual mortality rate was 0.33%, and the model-predicted mortality rate was 0.96 ± 0.78 (95% CI (0.82, 0.87)), so the model overestimated the mortality rate of patients in group B.
In this work, the selected patients were randomly divided into a modeling group and a validation group in a ratio of 6 : 4. After the model was established, the calibration and discrimination of the model predictions were verified using the data of the patients in the validation group. According to the CABG risk prediction model, a score ≤1 defined the low-risk group, a score of 2-5 defined the intermediate-risk group, and a score ≥6 defined the high-risk group. The ability of the CABG risk prediction model to predict the mortality risk of these three groups of patients was verified separately. The 95% CIs of the CABG risk prediction system for the low-risk group, intermediate-risk group, and high-risk group were 0.82-0.87, 3.16-3.89, and 8.63-8.98, respectively. This suggests that the CABG risk prediction system has a better ability to predict surgical risk in high-risk subgroups [25][26][27][28].
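The four-group assignment reported above can be expressed as a small lookup. The sketch below assumes the boundaries given for the score strata (score <0, 0 ≤ score ≤ 2, 2 < score ≤ 5, score >5) map to groups A, B, C, and D in that order; the function names are hypothetical.

```python
from collections import Counter


def risk_group(score):
    """Assign a patient score to group A, B, C, or D."""
    if score < 0:
        return "A"
    if score <= 2:
        return "B"
    if score <= 5:
        return "C"
    return "D"


def group_sizes(scores):
    """Count how many patients fall into each group."""
    return Counter(risk_group(s) for s in scores)
```

Applied to the scores of the whole cohort, `group_sizes` would reproduce the group counts listed in Table 5.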
All patients included in this work were risk stratified according to the ESC/EACTS clinical guidelines for myocardial revascularization [29,30]. The results showed that the CABG risk prediction system overestimated the mortality rate in subgroups. Hung et al. [31] developed and verified a simple risk score based on clinical variables, which can accurately predict the risk of complications in patients before cardiac surgery and has a predictive function similar to that of this study.
Future disease prediction will be based on polymorphic data, that is, structured data such as text, data center data, images, ECG center data, and time-series data, together with unstructured data. Integrating such polymorphic data for prediction is an important technical challenge. Building on the results of this project, the team will cooperate with many doctors to optimize the model of the risk assessment system and improve the intelligence and accuracy of prediction.

Model Calibration Verification.
The degree of calibration of the model was verified using the H-L goodness-of-fit test. An H-L P value >0.05 indicated that the model had a good degree of calibration. In this work, the overall H-L goodness-of-fit test P value was 0.06, which was greater than 0.05, showing that the CABG risk prediction system model fit well. The H-L P values in subgroups A, B, and C of the CABG risk prediction system model were 0, 0.47, and 0.025, respectively, all less than 0.05, suggesting that the CABG risk prediction system was poorly calibrated in subgroups A, B, and C. The H-L P value in group D was 0.153, which was greater than 0.05, indicating that the CABG risk prediction system was well calibrated in group D. In the entire patient cohort, the CABG risk prediction system had good discrimination (AUC = 0.727 > 0.70). Among subgroups with different risk levels, the CABG risk prediction system achieved good discrimination only in group B (AUC = 0.763 > 0.7). In the operation type grouping, the AUC values of the CABG risk prediction model in the single bypass surgery group and the bypass combined with other surgery group were 0.778 and 0.782, respectively, both greater than 0.7, indicating that the CABG risk prediction system was able to discriminate within the operation type grouping (Figure 10).
Gunertem et al. [32] collected the perioperative data of 550 CABG patients and used a surgical risk assessment system to predict the incidence of patient death. The results showed that the mortality rate predicted by the system was 2.69%, with H-L test P = 0.612 and AUC = 0.796. Ultimately, 8 patients died in hospital, a mortality rate of 1.50%. The system was able to predict the center's patient mortality rate well, with results similar to those of this work.
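The H-L goodness-of-fit statistic used above can be sketched as follows. This is a minimal illustration assuming the common grouping into equal-sized bins of sorted predicted risk (ten bins by default); the exact grouping used in this work is not stated. Only the test statistic is computed here. The P value would be obtained from a chi-square distribution with (number of groups − 2) degrees of freedom, for example with `scipy.stats.chi2.sf`, which is not part of the standard library.

```python
def hosmer_lemeshow_statistic(labels, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-sized risk groups."""
    # Sort cases by predicted probability only (stable sort keeps tie order).
    pairs = sorted(zip(probs, labels), key=lambda pair: pair[0])
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one case per group")
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of events
        observed = sum(y for _, y in chunk)  # observed number of events
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic
```

A small statistic (and hence a P value above 0.05) means that observed and expected deaths agree within each risk group, which is the sense in which the model is described as well calibrated.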

Conclusions
This work collected the clinical data of 2,364 routine CABG patients in our hospital from January 2019 to August 2021. AI and big data technologies were adopted to build a CABG risk prediction model, which was used to predict the risk of complications in CABG patients. The resulting CABG risk prediction system overestimated the mortality rate of intermediate-risk patients. The limitation of this work was that it was a single-center retrospective study with a limited number of patients; a multicenter study with a large number of samples is required to confirm the conclusions. The CABG risk prediction system needs to be improved and should be continuously developed and innovated so that it becomes suitable for different types of heart disease. In summary, this retrospective study of coronary artery bypass grafting cases provides a theoretical basis for risk prediction of perioperative complications.

Data Availability
All data included in this study are available upon request by contact with the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.