Deep Learning Model for Predicting Rhythm Outcomes after Radiofrequency Catheter Ablation in Patients with Atrial Fibrillation

Current guidelines on atrial fibrillation (AF) emphasize that radiofrequency catheter ablation (RFCA) should be decided after fully considering its prognosis. However, a robust prediction model reflecting the complex interactions between the features affecting prognosis remains to be developed. In this paper, we propose a deep learning model for predicting the late recurrence after RFCA in patients with AF. Aiming to predict the late recurrence (LR) of AF within 1 year after pulmonary vein isolation, we designed a multimodal model based on the multilayer perceptron architecture. For quantitative evaluation, we conducted 4-fold cross-validation on data from 177 AF patients including 47 LR patients. The proposed model (area under the receiver operating characteristic curve [AUROC], 0.766) outperformed the acute patient physiologic and laboratory evaluation (APPLE) score (AUROC, 0.605), CHA2DS2-VASc score (AUROC, 0.595), linear regression (AUROC, 0.541), logistic regression (AUROC, 0.546), extreme gradient boosting (AUROC, 0.608), and support vector machine (AUROC, 0.638). The proposed model exhibited better performance than clinical indicators (APPLE and CHA2DS2-VASc scores) and machine learning techniques (linear regression, logistic regression, extreme gradient boosting, and support vector machine). The model will support clinical decision-making for selecting good responders to the RFCA intervention.


Introduction
Radiofrequency catheter ablation (RFCA) has been accepted as the first-line therapy for patients with symptomatic atrial fibrillation (AF) refractory to antiarrhythmic drugs [1] since Haïssaguerre et al. suggested it as a treatment modality [2]. However, the benefits of RFCA in patients with AF are frequently offset by late recurrence (LR) after the procedure [3]. Moreover, various attempts to modify the atrial substrate, including linear ablation lesion sets, rotor-targeted ablation, and ablation of complex fractionated atrial electrograms, have not demonstrated superiority to pulmonary vein isolation [4,5]. Therefore, the most recent treatment guidelines for AF recommend assessing the benefit to patients to ensure a high probability of success after RFCA [1].
Although observational studies have suggested the duration of AF, age, left atrium (LA) size, renal function, and other factors as predictors of LR [6-9], no single factor can accurately predict recurrence after AF ablation [1]. To improve prediction, various models providing indicators such as the CHA2DS2-VASc score [10,11] and acute patient physiologic and laboratory evaluation (APPLE) score [12] have been developed. However, these models have shown modest performance [13], as they are based on simple linear regression where each risk factor is assigned one or two points and the sum represents the final score.
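To make the additive point scheme concrete, the following minimal Python sketch sums the points of the CHA2DS2-VASc score for one patient. The point assignments follow the commonly published definition of the score and are not taken from this study; the function name and argument names are hypothetical.

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 stroke_tia, vascular_disease):
    """Sum the CHA2DS2-VASc points for one patient (illustrative sketch).

    Each risk factor contributes a fixed one or two points, so the score
    cannot express interactions between risk factors.
    """
    score = 0
    score += 1 if chf else 0               # C: congestive heart failure
    score += 1 if hypertension else 0      # H: hypertension
    if age >= 75:                          # A2: age >= 75 years
        score += 2
    elif age >= 65:                        # A: age 65-74 years
        score += 1
    score += 1 if diabetes else 0          # D: diabetes mellitus
    score += 2 if stroke_tia else 0        # S2: prior stroke or TIA
    score += 1 if vascular_disease else 0  # VA: vascular disease
    score += 1 if female else 0            # Sc: sex category (female)
    return score
```

Because every factor adds a constant regardless of the other factors, two patients with very different risk profiles can receive the same total, which is one reason such scores show only modest discrimination.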
Recently, machine learning (ML) methods have been proposed to analyze high-order interactions between different features [14,15]. For instance, a predictive model based on a support vector machine (SVM) showed an area under the receiver operating characteristic curve (AUROC) of 0.75 for predicting LR within 1 year after RFCA, by considering the AF type (paroxysmal vs. persistent), previous ablation procedure, LA volume, and epicardial fat volume as inputs [16]. In addition, deep learning methods that automatically extract hierarchical features have outperformed traditional ML methods. A recent study [35] employed convolutional neural networks to predict LR from the N-terminal pro-brain natriuretic peptide, paroxysmal AF, LA appendage volume, and LA volume. The multilayer perceptron (MLP) technique, which analyzes complex nonlinear relations between input features, has demonstrated promising performance in various medical applications [18-20]. Therefore, we propose an MLP-based model for predicting rhythm outcomes after RFCA in patients with AF and compare our model with conventional prediction models and other ML approaches.

Study Population and Ethical Statement.
We analyzed consecutive patients with AF who underwent RFCA at Chungbuk National University Hospital (CBNUH) from February 2017 to October 2020. All the patients were over 18 years old and underwent their first RFCA. We excluded patients with repeated RFCA, with substrate modification lesion sets (e.g., ablation of complex fractionated atrial electrogram or linear ablation), with a follow-up period below 1 year, or with missing values in study features. This study was approved by the Institutional Review Board of CBNUH (approval no. 2021-12-009-001). As this was a retrospective observational study, the requirement for informed consent was waived. This study was conducted in accordance with the Declaration of Helsinki.

Preprocedural Preparation and Evaluations.
Class I or III antiarrhythmic drugs were discontinued for at least five half-lives before RFCA. Direct oral anticoagulants were not interrupted during the periprocedural period. One day before the procedure, transthoracic echocardiography and transesophageal echocardiography were performed. In addition, the following echocardiographic parameters were collected for this study: left ventricular ejection fraction (LVEF), left ventricle mass index, and LA anterior-posterior diameter. The estimated glomerular filtration rate (eGFR) was also evaluated 1 day before the procedure.

Radiofrequency Catheter Ablation.
RFCA was performed under sinus rhythm at our institution, except when AF recurred immediately after cardioversion. Three-dimensional mapping of the LA was performed using the EnSite NavX/Velocity system (St. Jude Medical, St. Paul, MN, USA). Circumferential pulmonary vein isolation around the antrum of the ipsilateral pulmonary veins was performed using an irrigated TactiCath Quartz or TactiCath Contact Force ablation catheter (St. Jude Medical) with a maximum power of 25-40 W. Radiofrequency energy with contact force above 10-20 g was applied at each ablation lesion point until the force-time integral exceeded 400 gs. After verifying the electrical isolation of the four pulmonary veins with a bidirectional block, the existence of a nonpulmonary vein trigger was assessed by cardioversion for AF evoked by rapid atrial pacing under isoproterenol infusion.

Clinical Follow-Up.
Intake of class I or III antiarrhythmic drugs continued until 3 months after catheter ablation. The rhythm status was assessed by surface electrocardiography (EKG) and Holter monitoring at 2 weeks and 1, 2, 3, 6, 9, and 12 months after discharge. In addition, whenever a patient felt symptoms, EKG and Holter monitoring were performed in our institution. Anticoagulants were prescribed to all patients up to 3 months after discharge, and they were selectively prescribed according to the CHA2DS2-VASc score afterward.

Definitions.
The endpoint of this study was the LR of sustained atrial tachyarrhythmia within 1 year after RFCA. Sustained atrial tachyarrhythmia was defined as atrial flutter, atrial tachycardia, or AF lasting for more than 30 s in Holter monitoring or more than 10 s in 12-lead EKG. LR was defined as sustained atrial tachyarrhythmia occurring within 1 year of RFCA; early recurrence during the 3-month blanking period after ablation was not regarded as LR.
The AF duration was defined as the difference between the date of the first AF documented on EKG and the date of index RFCA. The eGFR was calculated using the CKD-EPI (chronic kidney disease-epidemiology collaboration) equation as follows:

$$\mathrm{eGFR} = 141 \times \min(\mathrm{Scr}/\kappa, 1)^{\alpha} \times \max(\mathrm{Scr}/\kappa, 1)^{-1.209} \times 0.993^{\mathrm{Age}} \times 1.018\ [\text{if female}] \times 1.159\ [\text{if Black}],$$

where Scr is the serum creatinine level, κ is 0.7 for women and 0.9 for men, α is −0.329 for women and −0.411 for men, and min and max denote the minimum and maximum between their arguments, respectively [21].
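As an illustration of the eGFR definition, the sketch below evaluates the 2009 CKD-EPI creatinine equation in Python. It uses the κ and α values stated above; the exponent −1.209 and the age, sex, and race factors come from the published 2009 equation. The function name is hypothetical, and serum creatinine in mg/dL and age in years are assumed.

```python
def egfr_ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """Estimated GFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    Assumes serum creatinine in mg/dL and age in years.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha     # active when creatinine is below kappa
            * max(ratio, 1.0) ** -1.209    # active when creatinine is above kappa
            * 0.993 ** age_years)          # decline with age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr
```

Only one of the min and max terms differs from 1 for a given patient, so the equation is piecewise in the creatinine level, with a steeper slope once creatinine exceeds κ.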

Dataset Preparation.
For a small dataset, selecting informative features is essential to ensure training stability and convergence. Thus, extreme gradient boosting (XGBoost) was utilized to select informative features while omitting the irrelevant ones according to the weight of each feature during analysis (Figure 1).
In our dataset, only 47 patients (26.5%) experienced LR. Because such class imbalance can substantially degrade ML performance, we adopted the synthetic minority oversampling technique (SMOTE) [22], which augments the data by generating synthetic samples between neighboring samples of the minority class. Note that this technique was applied to the training set but not to the test set.
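The core idea of the oversampling step can be sketched in a few lines of standard-library Python. This is a simplified illustration of SMOTE-style interpolation, not the implementation used in the study; the function name, the neighbor count `k`, and the seed are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between neighbors.

    `minority` is a list of feature vectors (lists of floats) and must hold
    at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the base sample (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic
```

In a cross-validation setting, such a function would be called on the minority samples of the training folds only, so that the held-out fold contains real patients exclusively.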
We performed 4-fold cross-validation on our dataset. The dataset was split into four folds of approximately equal size. Specifically, in each iteration, one fold was used for testing (25%) and the others were used for model training (75%).

Figure 2 shows the proposed deep learning model based on the MLP architecture. The model consists of three MLP blocks and one output layer. Each MLP block comprises a dense layer, a batch normalization layer, rectified linear unit (ReLU) activation, and a dropout layer with a rate of 0.2 to avoid overfitting. For the output layer, a dense layer followed by sigmoid activation is used to calculate the LR probability. We use weighted binary cross-entropy as the loss function to handle data imbalance. In this study, the model was trained for 1,000 epochs using the Adam optimizer with a learning rate of 10^-4.
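The weighted binary cross-entropy mentioned above can be written out explicitly. The sketch below is a plain-Python illustration under stated assumptions: the study does not report the class weights, so `pos_weight` and `neg_weight` are hypothetical parameters (a common choice is the ratio of negative to positive training samples), and the function name is illustrative.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight, neg_weight=1.0, eps=1e-7):
    """Mean weighted binary cross-entropy over a batch.

    y_true: 0/1 labels (1 = late recurrence); y_prob: sigmoid outputs.
    Errors on the positive class are scaled by pos_weight, errors on the
    negative class by neg_weight.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= (pos_weight * y * math.log(p)
                  + neg_weight * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

With `pos_weight` greater than 1, a missed recurrence is penalized more heavily than a false alarm of the same confidence, which counteracts the tendency of an imbalanced dataset to bias the model toward the majority class.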

Evaluation of Model Performance.
We compared the performance of the proposed model with that of various ML techniques: linear regression, logistic regression, XGBoost algorithm, and SVM. The proposed model and ML techniques were implemented and evaluated using the Keras and TensorFlow 2 platforms in Python 3.8. The proposed model was trained on the CUDA 11.0.3 toolkit using an NVIDIA GeForce RTX 3090 graphics processor.
For the quantitative evaluation, we determined the AUROC, F1 score, accuracy, sensitivity, and specificity. The receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate across classification thresholds. The AUROC ranges between 0 and 1, with 0.5 indicating random guessing and 1 indicating perfect classification. The F1 score ranges from 0 to 1 and is the harmonic mean of the precision and recall. The accuracy is the proportion of all patients who are classified correctly and is an intuitive indicator of model performance. The sensitivity is the true positive rate, and the specificity is the true negative rate.
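The metrics defined above can be computed directly from confusion-matrix counts and predicted scores. The following standard-library sketch is illustrative only; the function names are hypothetical, and the AUROC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half).

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision, F1 score, and accuracy from counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


def auroc(y_true, y_score):
    """AUROC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike the other four metrics, the AUROC does not depend on a single decision threshold, which is why it is the primary measure for comparing the models.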

Statistical Analysis.
Categorical features were compared using Pearson's χ2 test, or Fisher's exact test when cell counts were below five. The normality of continuous features was evaluated using the Shapiro-Wilk test. Continuous features with a normal distribution were compared using Student's t-test, and those with a skewed distribution were compared using the Mann-Whitney test. The ROC curves were plotted with the AUROC to evaluate the diagnostic accuracy of the APPLE and CHA2DS2-VASc scores for LR after the procedure. All the statistical analyses were performed using SPSS version 28.0 (IBM, Armonk, NY, USA). We compared the performance of the proposed model with that of the conventional APPLE and CHA2DS2-VASc scores.

Figure 3 shows the relative importance of risk factors for LR following RFCA. We selected representative features with an F score higher than 100 as inputs of the proposed model: age, sex, height, weight, hypertension, AF type, AF duration, LA diameter, left ventricular mass index, LVEF, and eGFR.

Performance Evaluation.
To evaluate the model performance, we performed 4-fold cross-validation on our dataset. The resulting AUROC, F1 score, sensitivity, specificity, and accuracy per iteration of the proposed model are listed in Table 2. Each fold showed AUROC ≥ 0.73. Figure 4 shows the confusion matrix for every fold, and Figure 5 shows the ROC curves for the 4-fold average and all folds. Table 3 shows the quantitative evaluation results for the evaluated models. The APPLE and CHA2DS2-VASc scores underperformed the ML techniques (i.e., linear regression, logistic regression, XGBoost algorithm, and SVM). Figure 6 shows the confusion matrices for all the evaluated models, and Figure 7 shows the ROC curves. Our model achieved an average AUROC of 0.766, F1 score of 0.632, sensitivity of 0.745, and specificity of 0.777. The proposed model showed the highest performance among the conventional prediction models and ML approaches evaluated.
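The fold-wise evaluation can be sketched as follows. This is an illustrative standard-library example, not the study's code: the function name is hypothetical, and the random shuffling and the seed are assumptions, since the text does not state how patients were assigned to folds.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once and then dealt into k folds whose sizes
    differ by at most one sample.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

For 177 patients and four folds, each patient appears in exactly one test fold, and a metric such as the AUROC is computed once per fold and then averaged.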

Discussion
Our results demonstrate that an MLP-based model using easily accessible clinical and echocardiographic features obtained during the preprocedural stage can suitably predict LR after RFCA. Our model outperformed conventional prediction models and ML approaches. This approach may support decision-making for selecting patients with AF considering the LR probability after RFCA.

Interpretation of the Feature Importance.
To observe the impact of each feature on the MLP model training, we utilized Shapley additive explanations (SHAP) [23], which is one of the most widespread methods to explain model predictions and provide visualization charts. More specifically, the SHAP algorithm calculates the relative importance of the features on the prediction. The magnitude of the SHAP value indicates the degree of influence on the prediction. A positive SHAP value indicates that the feature increases the predicted probability of AF recurrence, and a negative SHAP value indicates that the feature decreases it.
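To illustrate what a SHAP value measures, the sketch below computes exact Shapley values for a small model by brute force over all feature orderings. It is not the SHAP library, which uses approximations to scale to larger models; the function name is hypothetical, and representing an "absent" feature by a fixed baseline value is an assumption of this example.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    predict: function mapping a feature list to a number.
    x: feature values of the patient being explained.
    baseline: reference values used for features not yet "switched on".
    Cost grows factorially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]              # switch feature i to the patient's value
            updated = predict(current)
            phi[i] += updated - previous   # marginal contribution in this ordering
            previous = updated
    return [v / len(orders) for v in phi]
```

The resulting values sum to the difference between the prediction for the patient and the prediction for the baseline, so a positive value means the feature pushed the predicted recurrence probability up and a negative value means it pushed it down.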
As illustrated in Figure 8, the LA diameter was the most influential feature, followed by AF duration, weight, eGFR, and LV mass index. In addition, the SHAP results showed that a larger LA diameter, longer AF duration, heart failure, hypertension, older age, greater height, female sex (coded as 0: female, 1: male), diabetes mellitus, and stroke were associated with an increased risk of AF recurrence. Interestingly, some features showed unexpected contributions to AF recurrence in the SHAP results. The AF type had a lower influence than AF duration, LV mass index, and LA diameter, although the AF type is widely accepted as the primary risk factor for LR after RFCA [24]. This finding may be explained by the fact that the rhythm outcome after RFCA can differ according to the AF burden, especially in paroxysmal AF [38].

Conventional Prediction Models for Radiofrequency Catheter Ablation Prognosis in Patients with Atrial Fibrillation.
Clinical risk factors including the AF type and duration [16,17,25], obesity [26], sleep apnea [27], and hypertension [28] are associated with the development of abnormal atrial substrate that leads to AF recurrence after RFCA. Moreover, the LA diameter and volume [29], the volume of epicardial fat [30], and the severity of atrial tissue fibrosis [31] are structural predictors of the RFCA outcome in patients with AF. However, no single factor has shown superiority over others in predicting the outcome.
To overcome this limitation, various prediction models combining well-known risk factors have been developed, with models including the ALARMEc, HATCH, CHA2DS2-VASc, and APPLE scores showing a moderate performance with AUROC ranging from 0.44 to 0.74 [32]. However, these scoring models may not reflect high-order interactions between various features because they rely on simple linear equations, in which one or two points are arbitrarily assigned to the corresponding risk factors. Although the concordance statistics of the BASE-AF2 score have shown good to excellent discrimination ability of 0.61-0.94 [33], postprocedural features such as early recurrence after RFCA must be included to achieve such performance. Similarly, the MB-LATER score, with an AUROC ranging from 0.57 to 0.83, requires early recurrence after RFCA as an input to achieve its performance [34].

Machine Learning Models for Radiofrequency Catheter Ablation Prognosis in Patients with Atrial Fibrillation.
A deep learning model has shown a good prediction performance (C-index of 0.76) by simply using four features: N-terminal pro-brain natriuretic peptide, AF type, LA appendage volume, and LA volume [35]. The easily obtainable input data used in this model may enable practical application. However, this model excludes accepted clinical features including AF duration and comorbidities related to LR after RFCA. More recently, Baalman et al. [36] proposed an ML prediction model using input data selected from 166 clinical features. Although the model suitably predicts LR (AUROC of 0.73, 95% confidence interval of 0.68-0.77), the continuous input features, such as age, LA volume index, and CHA2DS2-VASc score, are expressed as discrete values, leading to information loss.

Unlike existing ML models, the proposed MLP-based deep learning prediction model uses continuous and multiple features without information loss. Moreover, it can achieve a superior discriminative ability compared with established prediction models such as the APPLE and CHA2DS2-VASc scores. The promising performance of our model may be attributed to the ability of the MLP to learn high-order interactions between accepted risk factors and the rhythm outcomes after RFCA.

Limitations of the Study.
Various limitations of this study should be noted. First, LR may have been underestimated because the rhythm outcome after RFCA was evaluated by intermittent EKG and Holter monitoring.
Second, the study population was limited. Nevertheless, to prevent overfitting due to the small sample size and to evaluate the robustness of the model, we employed 4-fold cross-validation, consistently achieving an AUROC of at least 0.73 in each fold. Third, the patients' characteristics may deviate from those of other cohorts owing to the relatively short AF duration, the smaller proportion of heart failure with reduced ejection fraction, and the use of a pulmonary vein isolation lesion set at the index procedure. Therefore, the model performance should be confirmed with an external validation cohort in future work. Finally, we did not consider a recent deep learning model that uses LA fibrosis findings in magnetic resonance images as an input feature [37].

Conclusion
We proposed an MLP-based model that outperforms conventional prediction models and state-of-the-art ML methods in predicting rhythm outcomes after RFCA in patients with AF. The model may support clinical decision-making for selecting good responders to the RFCA intervention. In future work, we will further improve the proposed model by considering imaging data related to the atrial substrate.

Data Availability
The data underlying this article cannot be shared publicly due to ethical issues.

Ethical Approval
This study was approved by the Institutional Review Board of Chungbuk National University Hospital (approval no. 2021-12-009-001). This study was conducted in accordance with the Declaration of Helsinki.

Consent
This was a retrospective observational study, so the requirement for informed consent was waived by the institutional review board.

Conflicts of Interest
The authors declare that they have no conflicts of interest.