All-Cause Death Prediction Method for CHD Based on Graph Convolutional Networks

Coronary heart disease (CHD) has become one of the most serious public health issues because of its high morbidity and mortality rates. Most existing CHD risk prediction models rely on manually extracted features and shallow machine learning methods. They focus only on the differences between the features of individual patients and ignore the interactions among patients across the whole population, so their accuracy is still insufficient for individualized patient management strategies. In this paper, we formulate CHD prediction as a graph node classification task for the first time, where nodes represent individuals in a potentially diseased population and the graph intuitively represents associations between individuals. We use an adaptive multi-channel graph convolutional network (AM-GCN) model to extract graph embeddings from topology, node features, and their combinations through graph convolution. The adaptive importance weights of the extracted embeddings are then learned with an attention mechanism. For different situations, we model the relationships within the CHD population with the population graph and the K-nearest neighbor graph methods. Our experimental evaluation explores the impact of the independent components of the model on CHD prediction performance and compares the model with different baselines. The experimental results show that the model achieves the best results on the CHD dataset, with a 1.3% improvement in accuracy, a 5.1% improvement in AUC, and a 4.6% improvement in F1-score compared to the nongraph models.


Introduction
According to the World Health Organization, more than 12 million people die each year as a result of cardiovascular disease. Cardiovascular disease (CVD) is widespread in China. Relevant studies show that the number of people suffering from CVD in China is about 290 million [1], and the mortality rate is higher than that of cancer and other diseases. Coronary heart disease (CHD) is a kind of CVD with a high mortality rate as well as a significant likelihood of recurrence after treatment and discharge from the hospital, both of which are indicators of a bad prognosis. The pathogenesis of coronary heart disease [2] is the continuous accumulation of fat or harmful cholesterol in the arterial wall, which eventually leads to narrowing and blockage of the artery. The common clinical manifestations of coronary heart disease are arrhythmia, myocardial infarction, and angina pectoris. The main risk factors associated with CHD are a combination of controllable factors (such as lifestyle habits) and uncontrollable factors (such as age, gender, and family history) [3]. The current clinical methods for coronary heart disease detection mainly include [4] ECG, ECG stress test, echocardiography, Holter monitoring, hematology, CT angiography, and other technologies. These inspection methods are limited to a certain extent by the personal subjective judgment and long-term experience of doctors.
Establishing an appropriate disease risk assessment model is a critical step in CHD risk assessment and subsequent management decisions. In the past ten years, some medical organizations and institutions have studied disease prediction models based on machine learning (ML) methods [5,6]. In [7], the authors propose a common collaboration framework (CSHCP), which aims to evaluate people's health through ML technology and provide the best medical plan in a timely manner. Clinically, various physiological indicators of patients with coronary heart disease, such as blood pressure, blood sugar, and cholesterol, are abnormal. ML methods can uncover hidden factors in such data and predict CHD. Giri et al. [8] used the discrete wavelet transform to decompose the heart rate signal and four ML classifiers to detect coronary heart disease. The advantage of their approach is that principal component analysis is applied to the wavelet coefficient set to reduce the data dimension. Alickovic et al. [9] extracted features from ECG data using an autoregressive model and used K-nearest neighbors, support vector machines, and other classifiers to distinguish arrhythmia patients from healthy people. Tayefi et al. [10] found that serum hs-CRP level is the important variable in CHD and built a prediction model based on a decision tree algorithm; however, this limits the expressiveness of the model. D'Ascenzo et al. [11] developed a risk stratification model (PRAISE) for predicting all-cause mortality, myocardial infarction, and postdischarge major bleeding in patients with ACS. However, these previous methods only focus on the differences between clinical features while ignoring the interaction modeling between individual features and global features. Therefore, their ability to capture high-dimensional nonlinear relationships between features is very limited.
Graph neural networks (GNNs) are a class of deep learning methods for data in the graph domain. They aim to learn low-dimensional vector representations of graphs and nodes by mapping them to a low-dimensional space with artificial neural networks. Scarselli et al. [12] first introduced graph neural networks; however, their model does not consider the convolution operation. Bruna et al. [13] introduced convolution on graphs and developed spectral graph convolutional networks (GCNs). Defferrard et al. [14] introduced the Chebyshev network (ChebNet). Kipf et al. [15] simplified the previous methods by using only a first-order approximation of the convolution kernel. This enables GCN to define convolutions directly on graphs, providing an end-to-end framework for learning-related tasks.
Recently, graph convolutional neural networks (GCNs) have helped to solve important problems in medicine, especially in application scenarios involving medical images and nonimage information. Many methodological advances have been reported in the literature, such as autism and Alzheimer's prediction [16,17], brain shape analysis [18], pulmonary artery-vein separation [19], mammogram analysis [20], and brain imaging [21]. Graphs provide a powerful and intuitive way to model individuals (nodes) and the relationships or similarities (edges) between individuals. In this scenario, a node can represent the data acquired from a subject in a specific modality or at a specific point in time, and edge weights capture the similarity between each pair of nodes. However, existing approaches have the following deficiencies: (1) they focus too much on pairwise similarity between subjects and rely on a single way to construct graphs or edges; (2) a GNN may be incapable of learning some deep correlation information between topology and node features. This makes such tasks more challenging and limits performance, since the models are harder to generalize.
In response to the above challenges, this paper investigates different machine learning techniques to predict the level of uncertainty in CHD based on the risk attributes. In this work, we use a graph neural network approach for the first time to tackle the CHD prediction problem. We use the graph convolution method described by Kipf et al. [15] because of its excellent performance in the node classification task. The main contributions of this paper are as follows: (1) In contrast to past methods of coronary heart disease prediction (naive Bayes, random forest, support vector machine, etc.), we use a graph convolutional neural network to predict coronary heart disease. (2) We evaluate two graph construction methods for patients with coronary heart disease, which automatically construct similarity networks between patients instead of relying on a single graph structure. (3) We build a new GCN composite framework that combines the results of different graph channels with an attention mechanism and performs better than the ordinary GCN method. The rest of this paper is organized as follows: In Section 2, we introduce related methods and models. In Section 3, we introduce the dataset and conduct experiments. In Section 4, the experimental results are discussed. Finally, Section 5 gives the conclusion. Figure 1 depicts the entire process, from raw data collection to predictive model development and evaluation. The risk probabilities of patients with coronary heart disease are determined at the end of the process. The pipeline's three operating steps are data mining and modeling, model construction, and model evaluation.

Graph Convolutional Neural Networks.
Graph convolutional network (GCN) is a typical GNN model that processes a graph by aggregating each node's representation from its neighbors and iteratively updating the representation of each node [15]. Therefore, it is widely used in supervised and semi-supervised tasks on undirected graphs. Any undirected graph can be expressed as $G = (A, X)$, where $A \in \mathbb{R}^{N \times N}$ is the symmetric adjacency matrix of a graph with $N$ nodes, $X \in \mathbb{R}^{N \times D}$ is the input node feature matrix, and $D$ is the dimension of the node features. If there is an edge between nodes $i$ and $j$ in the graph, $A_{i,j} = 1$; otherwise, $A_{i,j} = 0$. The $(l+1)$th layer in GCN can be expressed as

$$H^{(l+1)} = \mathrm{ReLU}\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right), \tag{1}$$

where $H^{(l+1)}$ is the output of the $(l+1)$th graph convolutional layer, and initially $H^{(0)} = X$. Here, $\tilde{A} = A + I$, $A$ is the adjacency matrix of the undirected graph, and $I$ is the identity matrix. $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$. $W^{(l)}$ is the trainable weight matrix of the $l$th layer, and ReLU is the activation function. The normalized matrix $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ depends only on the graph, so it can usually be computed once before training.

Computational Intelligence and Neuroscience

For the supervised node classification task, given an arbitrary original graph structure $G_{ori}$, the final output embedding $Z$ of a two-layer GCN is expressed as

$$Z = \mathrm{softmax}\left(\hat{A}\, \mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right). \tag{2}$$

In (2), $W^{(0)} \in \mathbb{R}^{d \times nhid}$ is the weight matrix that maps the $d$-dimensional input features to the hidden layer, and $W^{(1)} \in \mathbb{R}^{nhid \times C}$ is the weight matrix from the hidden layer to the $C$ class outputs. The softmax function, defined as $\mathrm{softmax}(x_i) = \exp(x_i) / \sum_{j} \exp(x_j)$, normalizes the outputs over all classes. Given input features $X$ and the topological graph $A$, the output labels $Y$ are obtained after GCN model training.
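To make the propagation rule concrete, the following is a minimal sketch of a two-layer GCN forward pass in plain Python. It is illustrative only: the helper names (`matmul`, `normalize_adjacency`, `gcn_forward`) and the toy graph and weights are assumptions for this example, not the implementation used in the experiments, and no training is performed.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj):
    """Add self-loops, then scale symmetrically by the inverse square root of the degrees."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(adj, x, w0, w1):
    """Two-layer GCN: softmax(A_hat * ReLU(A_hat * X * W0) * W1)."""
    a_hat = normalize_adjacency(adj)  # depends only on the graph
    hidden = relu(matmul(a_hat, matmul(x, w0)))
    return softmax_rows(matmul(a_hat, matmul(hidden, w1)))


# Toy example (hypothetical values): a path graph with 3 nodes, 2 features, 2 classes.
ADJ = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W0 = [[0.5, -0.2], [0.1, 0.3]]
W1 = [[1.0, -1.0], [0.5, 0.5]]
Z = gcn_forward(ADJ, X, W0, W1)  # one row of class probabilities per node
```

Each row of `Z` is a probability distribution over the classes for one node. In practice the weight matrices would be learned by gradient descent rather than fixed by hand.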

Population Graph Construction.
The concept of a population graph was first proposed by Parisot et al. for the prediction of Alzheimer's disease (AD) and autism spectrum disorder (ASD) [22]. Population graphs take advantage of phenotypic information to represent a population as a sparse graph. The feature vector of each subject is defined as its vectorized connectivity matrix. Because of the high dimensionality of the connectivity matrix, a classifier is used to select the most discriminative features from the training set. Figure 2 illustrates the process of constructing a population graph from CHD data. The population graph $G = (V, E, W)$ is constructed on the entire population of patients, where $|V| = N$ is the number of nodes, $E \in \mathbb{R}^{N \times N}$ is the edge connection of the graph, and $W$ is the weight of the edges. Consider each patient as a node $n_i$ in the graph, consider a set of $H$ important phenotypic measures $M = \{M_h\}$, and define the adjacency matrix $A_p$ of the population graph as

$$A_p(v, w) = \exp\left(-\frac{[\rho(x_v, x_w)]^2}{2\sigma^2}\right) \sum_{h=1}^{H} \gamma\left(M_h(v), M_h(w)\right), \tag{3}$$

where the exponential term weights the edges according to the relative distance between the features $x_v$ and $x_w$ of each pair of nodes. If the graph is not fully formed, it will increase the edge weight between the most similar graph nodes. $\rho$ is the correlation distance between the feature vectors; $\sigma$ is the mean of the correlation matrix, which determines the width of the kernel. The implication of the similarity measure in (3) is that patients who belong to the same category (low or high risk) tend to have more similar networks than patients from different categories.
In (3), $\gamma$ measures the distance between phenotypic measures. For a quantitative measure $M_h$, following the population graph formulation of [22], it is defined with respect to the threshold $\beta$:

$$\gamma\left(M_h(v), M_h(w)\right) = \begin{cases} 1, & \text{if } \lvert M_h(v) - M_h(w) \rvert < \beta, \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

The size of $\beta$ is determined empirically.
To optimize the structure of our graphs, we evaluate the impact of each phenotypic and similarity measure in the experimental section.
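The population graph construction can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the toy patients, and the threshold values are hypothetical. It also assumes that the correlation distance is one minus the Pearson correlation, that $\sigma$ is the mean of the off-diagonal distances, and that categorical measures are compared by equality while quantitative measures use the threshold $\beta$.

```python
import math


def correlation_distance(u, v):
    """1 - Pearson correlation (assumed form of rho); inputs must not be constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return 1.0 - cov / (norm_u * norm_v)


def gamma(a, b, beta=None):
    """Phenotypic agreement: equality if beta is None, else |a - b| < beta."""
    if beta is None:
        return 1.0 if a == b else 0.0
    return 1.0 if abs(a - b) < beta else 0.0


def population_graph(features, phenotypes, betas):
    """Weighted adjacency: feature similarity times the number of agreeing measures."""
    n = len(features)
    dist = [[correlation_distance(features[v], features[w]) for w in range(n)]
            for v in range(n)]
    off_diag = [dist[v][w] for v in range(n) for w in range(n) if v != w]
    sigma = sum(off_diag) / len(off_diag)  # kernel width (assumption: mean distance)
    adj = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for w in range(n):
            if v == w:
                continue
            agreement = sum(gamma(phenotypes[v][name], phenotypes[w][name], betas[name])
                            for name in betas)
            adj[v][w] = math.exp(-dist[v][w] ** 2 / (2 * sigma ** 2)) * agreement
    return adj


# Toy example (hypothetical patients): age uses a threshold, smoking uses equality.
FEATURES = [[1.0, 2.0, 3.0], [1.0, 2.0, 4.0], [3.0, 1.0, 2.0]]
PHENOTYPES = [{"age": 60, "smoking": 1}, {"age": 63, "smoking": 1}, {"age": 75, "smoking": 0}]
BETAS = {"age": 5, "smoking": None}
A_P = population_graph(FEATURES, PHENOTYPES, BETAS)
```

In this toy example, patients 0 and 1 agree on both measures and have similar features, so they are connected by a strong edge, while patient 2 shares no phenotypic measure with the others and stays disconnected.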

KNN Graph Construction.
In the field of graph neural networks, the KNN method is often used for data whose graph structure is missing or absent. In this case, we first need to build a K-nearest neighbor graph so that we can apply GCN to extract feature embeddings. We construct the graph according to the idea of K-nearest neighbors (KNN) [23]. Nodes in the same neighborhood have the most similar features in this graph. K-nearest neighbor classification is capable of performing discriminant analysis when reliable parameters of the probability density are uncertain. For each sample, we find its top $K$ most similar neighbors and connect them with edges. First, based on the key features of the patients, the similarity between node features is calculated, mainly using cosine similarity. Specifically, the cosine value between node features $x_i$ and $x_j$ is expressed as

$$S_{ij} = \frac{x_i \cdot x_j}{\lvert x_i \rvert \, \lvert x_j \rvert}. \tag{5}$$

By calculating the similarity between different vectors, we get the similarity matrix $S \in \mathbb{R}^{N \times N}$ in (5). In this similarity matrix, the cosine similarity between each node $n_i$ and the remaining nodes $n_{j \neq i}$ is recorded. The angle between two vectors approaches 0° as the cosine value approaches 1, indicating that the two vectors are more similar. Then, for each node, we select the top $k$ most similar nodes according to the cosine value and connect them to it with edges. In this way, each node has $k$ neighbors that are most similar to it. We aggregate this edge information into an undirected K-nearest neighbor graph to get the adjacency matrix $A_f$.
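The K-nearest neighbor graph construction described above can be sketched in a few lines of plain Python. The function names and toy feature vectors are assumptions for illustration; the sketch symmetrizes the edges so that the resulting graph is undirected, which means a node can end up with more than `k` neighbors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_graph(features, k):
    """Connect each node to its k most cosine-similar nodes; return a 0/1 adjacency matrix."""
    n = len(features)
    sim = [[cosine_similarity(features[i], features[j]) for j in range(n)]
           for i in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        candidates = [j for j in range(n) if j != i]  # exclude self-similarity
        neighbors = sorted(candidates, key=lambda j: sim[i][j], reverse=True)[:k]
        for j in neighbors:
            adj[i][j] = 1
            adj[j][i] = 1  # symmetrize: the graph is undirected
    return adj


# Toy example (hypothetical features): two pairs of near-parallel vectors.
FEATURES = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
A_F = knn_graph(FEATURES, k=1)
```

With `k=1`, nodes 0 and 1 are linked to each other, as are nodes 2 and 3, because each pair points in nearly the same direction.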

2.3. Model.
GCN has some weaknesses in fusing node features and topology. To better learn graph embedding information, we refer to and extend the adaptive multichannel graph convolutional neural network (AM-GCN) [24] architecture.
There are three main channels in this model. The first trains a GCN in the original graph channel using the population graph. The second uses the K-nearest neighbor graph as the input of the feature graph channel to train a GCN. The third trains a GCN on a common channel using the information shared by the original graph and the feature graph. Then, the embeddings trained on the three channels are concatenated, an attention mechanism is used to assign input-specific weight coefficients, and the final embedding is used to predict node classification. The framework of AM-GCN is shown in Figure 3. Algorithm 1 gives the specific process steps.

Multi-Channel Graph Input.
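With the graph construction methods described in Section 2.2, we obtain two graph structures, namely the population graph and the K-nearest neighbor graph. We take them as the graph branches of the AM-GCN input channels. For the population graph, the input is the graph $(A_p, X)$ with the adjacency matrix $A_p$ obtained in the topological space, and the node embedding extracted by the $(l+1)$th layer is $Z_p^{(l+1)}$. For the K-nearest neighbor graph, the input is the graph $(A_f, X)$ with the adjacency matrix $A_f$ obtained in the feature space, and the node embedding extracted by the $(l+1)$th layer is $Z_f^{(l+1)}$. Topological space and feature space are not completely unrelated, and the node classification task may be related to the respective information in these two spaces. Therefore, a common channel with parameter sharing is added to the model to share the embedding in the two spaces, and this common embedding is denoted as $Z_c^{(l+1)}$. The specific calculation method is as follows:

$$Z_p^{(l+1)} = \mathrm{ReLU}\left(\tilde{D}_p^{-1/2} \tilde{A}_p \tilde{D}_p^{-1/2} Z_p^{(l)} W_p^{(l)}\right), \tag{6}$$

$$Z_f^{(l+1)} = \mathrm{ReLU}\left(\tilde{D}_f^{-1/2} \tilde{A}_f \tilde{D}_f^{-1/2} Z_f^{(l)} W_f^{(l)}\right), \tag{7}$$

$$Z_c^{(l+1)} = \frac{1}{2}\left[\mathrm{ReLU}\left(\tilde{D}_p^{-1/2} \tilde{A}_p \tilde{D}_p^{-1/2} Z_{cp}^{(l)} W_c^{(l)}\right) + \mathrm{ReLU}\left(\tilde{D}_f^{-1/2} \tilde{A}_f \tilde{D}_f^{-1/2} Z_{cf}^{(l)} W_c^{(l)}\right)\right], \tag{8}$$

where $\tilde{A}_p = A_p + I$ and $\tilde{A}_f = A_f + I$ with degree matrices $\tilde{D}_p$ and $\tilde{D}_f$, $W_p^{(l)}$ and $W_f^{(l)}$ are the channel-specific weight matrices, and, as in AM-GCN [24], $W_c^{(l)}$ is the weight matrix shared by both graphs; $Z_{cp}^{(l)}$ and $Z_{cf}^{(l)}$ denote the common-channel embeddings propagated on the population graph and the K-nearest neighbor graph, respectively. The model now has three specific embeddings: $Z_p$, $Z_f$, and $Z_c$ (see (6)-(8)). Considering that node labels may be related to one of them or to their combination, we use the attention mechanism to learn their corresponding importance $(a_p, a_f, a_c)$, as follows:

$$(a_p, a_f, a_c) = \mathrm{att}(Z_p, Z_f, Z_c), \tag{9}$$

where $a_p, a_f, a_c \in \mathbb{R}^{n \times 1}$ represent the attention weights of the $n$ nodes, and the value range of $a$ is $(0, 1)$. For any node $i$, its embedding in the $i$th row of $Z_p$ is $Z_p^i \in \mathbb{R}^{1 \times h}$. The embedding is passed through a nonlinear transformation, and then a shared attention vector $q \in \mathbb{R}^{h' \times 1}$ is used to obtain the attention value $\omega_p^i$ as follows:

$$\omega_p^i = q^{T} \cdot \tanh\left(W_p \cdot \left(Z_p^i\right)^{T} + b_p\right). \tag{10}$$

In equation (10), $W_p \in \mathbb{R}^{h' \times h}$ is the weight matrix trained by the linear layer, and $b_p \in \mathbb{R}^{h' \times 1}$ is the bias vector of the embedding matrix $Z_p$. Similarly, we can get the attention weight matrices $W_f$, $W_c$ and the attention values $\omega_f^i$, $\omega_c^i$ of the embedding matrices $Z_f$, $Z_c$ for any node $i$.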
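After that, we normalize the attention values $\omega^i$ with the softmax function to get the final weights $a^i$. For the population graph channel,

$$a_p^i = \mathrm{softmax}\left(\omega_p^i\right) = \frac{\exp\left(\omega_p^i\right)}{\exp\left(\omega_p^i\right) + \exp\left(\omega_f^i\right) + \exp\left(\omega_c^i\right)},$$

and $a_f^i$ and $a_c^i$ are obtained in the same way. Note that the larger the value of $a$, the more important the current embedding is and the higher its proportion in the final result. For all $n$ nodes, the learned weights are $a_p = [a_p^i]$, $a_f = [a_f^i]$, $a_c = [a_c^i] \in \mathbb{R}^{n \times 1}$, which are diagonalized as $\alpha_p = \mathrm{diag}(a_p)$, $\alpha_f = \mathrm{diag}(a_f)$, and $\alpha_c = \mathrm{diag}(a_c)$. Finally, we combine the embeddings and attention weights to obtain the output $Z_a$ of the attention layer as follows:

$$Z_a = \alpha_p \cdot Z_p + \alpha_f \cdot Z_f + \alpha_c \cdot Z_c. \tag{11}$$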
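The attention step can be illustrated for a single node with the sketch below. It assumes that each channel has its own linear map (weight matrix and bias), that the attention vector `q` is shared across channels, and that the fused embedding is the attention-weighted sum of the three channel embeddings. All names and numeric values are hypothetical and untrained.

```python
import math


def attention_score(z, weight, bias, q):
    """Scalar attention value: q^T tanh(W z + b) for one node embedding z."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, z)) + b)
              for row, b in zip(weight, bias)]
    return sum(qi * hi for qi, hi in zip(q, hidden))


def fuse_node(embeddings, params, q):
    """Softmax the per-channel scores and return (weights, weighted-sum embedding)."""
    scores = [attention_score(z, w, b, q) for z, (w, b) in zip(embeddings, params)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shifted for numerical stability
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(embeddings[0])
    fused = [sum(a * z[d] for a, z in zip(alphas, embeddings)) for d in range(dim)]
    return alphas, fused


# Toy example (hypothetical values): embeddings of one node from the three channels.
Z_P, Z_F, Z_C = [0.8, 0.1], [0.2, 0.7], [0.5, 0.4]
IDENTITY = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # (weight, bias) per channel
ALPHAS, Z_A = fuse_node([Z_P, Z_F, Z_C], [IDENTITY, IDENTITY, IDENTITY], q=[1.0, -1.0])
```

The three weights in `ALPHAS` sum to 1, so `Z_A` is a convex combination of the channel embeddings; a channel with a larger attention value contributes more to the final representation of that node.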

Objective Function.
Here, the output $Z_a$ obtained through the attention layer in (11) is used for a supervised binary classification task with a linear transformation and a softmax function. The task of the model is to predict the classification label; after the transformation, $\hat{Y}_{ic}$ is the probability that node $i$ belongs to class $c$, and $\hat{Y}$ can be calculated as

$$\hat{Y} = \mathrm{softmax}\left(W \cdot Z_a + b\right).$$

Assume that the training set is $L$, the true label corresponding to each sample $l \in L$ is $Y_l$, and the label predicted by the model is $\hat{Y}_l$. For the AM-GCN model, the cross-entropy error of node classification over all training nodes, denoted by $\mathcal{L}$, is

$$\mathcal{L} = -\sum_{l \in L} \sum_{c=1}^{C} Y_{lc} \ln \hat{Y}_{lc}.$$

Our research on graphs focuses mostly on binary node classification (CHD risk prediction). As a result, we have $C = 2$ in the equation above (Algorithm 1).
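As a small worked example of the objective, the sketch below evaluates the cross-entropy error over a set of training nodes, given one-hot labels and predicted class probabilities. The function name, the small constant added inside the logarithm, and the toy values are assumptions for illustration.

```python
import math


def cross_entropy(y_true, y_prob, train_idx, eps=1e-12):
    """Sum of -Y_lc * ln(Y_hat_lc) over training nodes l and classes c.

    eps guards against log(0); it is an implementation convenience,
    not part of the objective itself.
    """
    return -sum(y_true[l][c] * math.log(y_prob[l][c] + eps)
                for l in train_idx
                for c in range(len(y_true[l])))


# Toy example (hypothetical values): two training nodes, C = 2 classes.
Y_TRUE = [[1, 0], [0, 1], [1, 0]]              # one-hot labels
Y_PROB = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # softmax outputs
LOSS = cross_entropy(Y_TRUE, Y_PROB, train_idx=[0, 1])  # node 2 is not in the training set
```

Only the probability assigned to the true class of each training node contributes, so confident correct predictions drive the loss toward zero.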

Algorithm.
The specific algorithm flow is given in Algorithm 1.
Time Complexity. Let the batch size of model training be $T$, the amount of data be $N$, the number of edges be $\varepsilon$, the feature length be $D$, the number of input channels be $M$, the number of output channels be $C$, and the number of hidden layers of the model be $n$, $m$, $k$. The number of two-layer GCN channels is $F = n^2 m$, and the time complexity of GCN is $O(\varepsilon M D F)$; the time complexity of AM-GCN is $O(T(4\varepsilon M D F + 3N^2 m k^2 + NMC))$ over all training batches.

Data.
The dataset used in this study is real hospital patient data, partly provided by the Department of Cardiovascular Medicine of a tertiary hospital in Fujian Province, China. To protect the privacy and safety of patients, we have removed private data such as the patient's real name, ID number, and mobile phone number.

Dataset Description.
The dataset includes data on patients with coronary heart disease collected through follow-up visits during the five-year period from 2016 to 2021. It includes 5,850 patients who were discharged from the hospital after surgery; each patient has 430 records of various indicators, for about 2,515,500 records in total. However, the actual dataset contains a large number of missing patient records, the data are noisy and irregular, and the number of valid records is much lower than this. The dataset consists mainly of structured data and unstructured text data. Structured data include basic information such as the patient's age, gender, and living habits. Unstructured text data include patients' ECG examinations, doctors' diagnostic records, and surgical operation records. In general, the content of the dataset can be divided into seven categories: basic patient information, past medical history, electrocardiogram indicators, cardiac color Doppler ultrasound indicators, blood test indicators, medication status, and coronary vascular lesions. Table 1 shows the clinical and treatment features of the study cohort. Table 2 shows the patient outcomes of the study data, including the healthy and death groups. The data are expressed as n (%), n/n (%), or median (IQR). Qualitative indexes are reported as proportions of the data, and quantitative indexes are reported as the median with the first and third quartiles (25%-75%).

Statistical Analysis.
The main predictors varied by study results. We drew a correlation heat map to observe the correlation between multiple features in the data table. The darker the color, the higher the correlation coefficient. Figure 4 illustrates the associated heat map for the top 14 features. Figures 5(a) and 5(b) show the sex distribution histograms of low-risk and high-risk patients with coronary heart disease, respectively. There were 2127 (82.4%) males and 455 (17.6%) females among the normal (low-risk) patients after the operation, and 80 (66.7%) males and 40 (33.3%) females among the death (high-risk) patients. Figure 5(c) shows a boxplot of BMI, where 1 means death (high risk) and 0 means low risk. In the low-risk group, the median BMI was 24.0, the upper quartile (Q3) was 31.3, the lower quartile (Q1) was 17.5, and the number of outliers was 48; in the high-risk group, the median BMI was 24.0, Q3 was 27.3, Q1 was 20.1, and the number of outliers was 10.
In the age distribution of patients with coronary heart disease, the median age of the low-risk group is 65, and the number of patients aged 57-71 is the largest, showing a dense distribution; the median age of the high-risk group is 72, and ages 63-78 are the peak of all-cause death. Figures 5(d) and 5(e) show histograms of the lifestyle habits and medical history of patients with coronary heart disease (smoking, diabetes history, and hypertension history), where 0 means no, 1 means yes, and 0.5 means missing data. The number of smokers in low-risk patients was 1129 (43.9%), and the number of smokers in high-risk patients was 42 (38.5%); the number of diabetics in low-risk patients was 748 (29.4%), while that in the high-risk group was 46 (42.6%); the number of patients with hypertension in low-risk patients was 69 (65.1%), and the number of smokers in high-risk patients was 42 (38.5%).

Algorithm 1.
(1) Prepare: after preprocessing the CHD dataset, construct the adjacency matrices $A_p$ of the population graph and $A_k$ of the K-nearest neighbor graph (Section 2.2). Normalize the graph matrices. One-hot encode $Y = \{y_1, y_2, \ldots, y_n\}$. Then, randomly shuffle the order and divide the dataset into a training set and a test set at a ratio of 6 : 4.
(2) Input: training set $S = \{(x_i, a_{ij}, y_i)\}_{i=1, j=1}^{N}$, $x_i \in \mathbb{R}^{N \times D}$, $a_{ij} \in \mathbb{R}^{N \times N}$, $y_i \in \mathbb{R}^{N \times 2}$, where $N$ is the number of nodes and $D$ is the feature dimension.
(3) Initialization: initialize the network training parameters $W$, $b$, dropout = 0.5, early-stopping = 40.
(4) for epoch = 1 : epochs do
(5) while (the model is not converging) or (convergence rounds < early-stopping) do
(6) Input the specific features $X$ and topology graph $A$ on the different GCN channels of the model. The output of channel 1 is $Z_p \leftarrow \mathrm{GCN}(X, A_p)$, the output of channel 2 is $Z_k \leftarrow \mathrm{GCN}(X, A_k)$, and the output of channel 3 is $Z_c \leftarrow (Z_p + Z_k)/2$.
(7) From the graph embedding outputs $Z_p$, $Z_k$, $Z_c$, calculate their respective attention weights $(a_p, a_k, a_c) \leftarrow \mathrm{Attention}(Z_p, Z_k, Z_c)$ and multiply them with the graph embeddings to get the attention layer output $Z_a$.
(8) The output $Z \leftarrow \mathrm{MLP}(Z_a)$ is obtained through the linear classification layer, where the main operation of the MLP layer is to convert the space vector into a probability output through the softmax activation function.
(9) Calculate the cross-entropy error $\mathcal{L}$, and update the network parameters $W$, $b$ by gradient descent.

Before data processing, 139 (2.4%) of 5764 patients died, and 5625 (97.6%) were healthy within one year of follow-up; after data processing, 120 (4.4%) of the 2702 records died, and 2582 (95.6%) of the patients were healthy within one year of follow-up.

Data Preprocessing.
Data preprocessing ensures the quality of the data used for prediction by cleaning and transforming the original data, so that data analysis yields accurate results and large deviations in the prediction are avoided. In the process shown in Figure 1, we perform the following four operations on the CHD dataset. (1) Data cleaning. There is a lot of redundant and confusing data in the original phenotype data. We manually screened important factors and eliminated characteristic factors that had little impact on the classification results. We excluded records that were not helpful to the study results, including name, hospital number, and date of surgery, as well as data records of some surgical operations. We initially selected 88 relatively important features. We also included basic variables in this dataset, such as age, gender, BMI, smoking status, diabetes history, and hypertension history, based on relevant research on high-risk factors for coronary heart disease [3]. (2) Data deduplication. We select the patient ID number as the unique attribute, delete the data whose ID number does not exist, and keep 5764 valid records. Then, the ID number attribute is deduplicated, a large amount of redundant data is removed, and the patient data record at the latest time point is retained. (3) Missing-value and outlier handling. First, all patients whose information loss rate exceeds 80% are filtered out, and the patient data with relatively complete information are retained, for a total of 2702 records. Then, we process the 88 feature columns in turn, using the interquartile range to detect abnormal values, treating default values as abnormal values, and setting standard upper and lower limits for each feature to restrict abnormal values. Any abnormal value beyond the limits is replaced by the upper or lower bound of the current column. (4) Data conversion.
The multi-dimensional features of the dataset are discretely distributed, including both qualitative data and quantitative data in different ranges, so the data need to be standardized. We use the Z-score normalization method, $Z = (X - \bar{X})/\sigma$, so that each feature has a mean of 0 and a variance of 1, which reduces the differences in scale between features.
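The outlier handling and standardization steps can be sketched for a single feature column as follows. This is an illustration under stated assumptions: the fences are placed at 1.5 times the interquartile range beyond the quartiles (a common convention; the per-feature limits actually used are not specified here), quartiles are computed with the standard library's inclusive method, and the sample BMI values are made up.

```python
import statistics


def clip_outliers_iqr(values, whisker=1.5):
    """Replace values outside [Q1 - w*IQR, Q3 + w*IQR] with the nearest bound."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [min(max(v, low), high) for v in values]


def z_score(values):
    """Standardize to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # must be nonzero, i.e. the column is not constant
    return [(v - mean) / std for v in values]


# Toy example (hypothetical BMI column with one extreme value).
BMI = [22.0, 24.0, 23.0, 25.0, 60.0]
BMI_CLIPPED = clip_outliers_iqr(BMI)  # the extreme value is pulled back to the upper fence
BMI_STANDARDIZED = z_score(BMI_CLIPPED)
```

Note that Z-score standardization centers and rescales a feature but does not bound it to a fixed interval; values several standard deviations from the mean remain possible, which is one reason to limit outliers first.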

Parameter.
The parameters recommended in the respective studies are used to initialize all baseline procedures (see Table 3). We then tune their parameters to get the best performance. For each GCN, we train all channels with the same hidden layer size nhid1 and output embedding layer size nhid2, where nhid1 ∈ {16, 32, 64} and nhid2 ∈ {8, 16, 32}. For all neural network models, the dropout rate is set to 0.5. Furthermore, for the construction of K-nearest neighbor graphs, we set k ∈ {2, ..., 9}. We employ the Adam optimizer with a learning rate of 0.001 to 0.01 and weight decay ∈ {1e-4, 5e-4, 1e-3, 5e-3} throughout training. All experiments use the same split of the CHD dataset, that is, a 60% training set and a 40% test set, and the same random number seed is set in the experimental process to ensure fairness.
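The search space and the fixed split can be made explicit with a short sketch. The dictionary keys, function names, and seed value are hypothetical; the learning rate is left out of the grid because only a range is given for it. The sketch enumerates the discrete settings and produces a reproducible 60/40 split of node indices.

```python
import itertools
import random

# Discrete hyperparameter values taken from the text; key names are illustrative.
GRID = {
    "nhid1": [16, 32, 64],
    "nhid2": [8, 16, 32],
    "k": list(range(2, 10)),
    "weight_decay": [1e-4, 5e-4, 1e-3, 5e-3],
}


def iter_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for combo in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, combo))


def split_indices(n, train_ratio=0.6, seed=0):
    """Shuffle node indices with a fixed seed and split them into train and test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_ratio))
    return indices[:cut], indices[cut:]


CONFIGS = list(iter_configs(GRID))
TRAIN_IDX, TEST_IDX = split_indices(2702)  # 2702 records remain after preprocessing
```

Using a dedicated `random.Random(seed)` instance keeps the split independent of any other random number use in the program, so every model sees the same training and test nodes.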

Baseline.
We compare AM-GCN with other state-of-the-art methods, covering five common non-neural network models in classification and three neural network models. To evaluate the effectiveness of machine learning models on the coronary heart disease dataset, we chose the following representative models as benchmarks for performance comparison, which are implemented through libraries provided by scikit-learn [25]:
AdaBoost: The adaptive boosting algorithm (AdaBoost) is an algorithm that iteratively builds strong classifiers [26].
Bayes: The naive Bayes classifier (NBC) is a widely used classifier algorithm. Here, we choose Bernoulli Bayes [27].
DT: Decision tree (DT) algorithms use a tree model to identify possible outcomes [28].
SGD: Stochastic gradient descent (SGD) is a streamlined method for fitting linear classifiers under a convex loss function [29].
SVC: Support vector machines (SVMs) classify data by finding the hyperplane that separates the classes in the training data [30].
DNN: A deep neural network (DNN) is a neural network with multiple hidden layers whose parameters are updated through backpropagation. Here, we use a multilayer perceptron implementation [31].
Population-GCN: The graph convolutional neural network (GCN) is a supervised classification model [15] that learns node representations by aggregating adjacent nodes. Here, the graph topology is the population graph, hereinafter referred to as p-GCN.
KNN-GCN: The graph topology is a K-nearest neighbor graph, hereinafter referred to as K-GCN.
AM-GCN: The model introduced in Section 2.3 of this paper.

Performance Metrics.
After introducing the different machine learning algorithms, we compare the performance of the models for predicting CHD under different indicators. The evaluation of binary classification models in medicine (cases vs noncases) is based on performance statistics in terms of sensitivity, TP/(TP + FN), and specificity, TN/(TN + FP), where TP, FP, TN, and FN denote the number of true positives, false positives, true negatives, and false negatives, respectively. We compute and evaluate performance metrics commonly used in classification models [32], such as accuracy, F1-score, AUC (area under the ROC curve), macro-precision, and macro-recall. The receiver operating characteristic (ROC) curve was plotted to understand the relationship between the false-positive rate (FPR) and the true-positive rate (TPR).
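For reference, the confusion-matrix metrics named above can be computed as in the following sketch. The function name and toy labels are assumptions; the sketch treats the death (high-risk) class as the positive class and assumes that both classes occur and that at least one positive prediction is made, so no denominator is zero.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, precision, F1, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn)  # recall of the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / len(pairs)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Toy example (hypothetical labels): 1 = death (high risk), 0 = low risk.
Y_TRUE = [1, 1, 0, 0, 0, 1]
Y_PRED = [1, 0, 0, 0, 1, 1]
METRICS = binary_metrics(Y_TRUE, Y_PRED)
```

Because deaths are rare in this kind of cohort, accuracy alone can look high even when few high-risk patients are detected, which is why sensitivity and F1 are reported alongside it.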

Influence of the Phenotypic(P) Measures.
According to the theoretical research in Section 2.2.1, we conduct experiments on different combinations of phenotypic measures P (sex, age, BMI, smoking) to study the effect of the choice of P on GCN stability, as shown in Figure 6. The experimental results show that accuracy does not change significantly with the choice of P; it fluctuates between 96.4% and 96.7% (±0.3%). In the single-factor study, graphs constructed from the "gender" measure alone achieved decent performance, with an AUC of 85.9% and an F1 of 75.7%. Among the two-measure combinations, "gender + age" had an AUC of 86.8% and an F1 of 74.7%; "gender + BMI" had an AUC of 86.2% and an F1 of 75.5%; "age + BMI" had an AUC of 86.0% and an F1 of 73.7%, the lowest F1 among all combinations; and "age + smoking" had an AUC of 87.7% and an F1 of 75.7%, the best AUC among all combinations. Among the multivariate (H > 2) combinations, "sex + age + BMI" had an AUC of 86.2% and an F1 of 75.7%, with an AUC close to that of "sex + BMI"; "sex + age + smoking" had an AUC of 86.9% and an F1 of 76.1%; and "sex + age + smoking + BMI" had an AUC of 86.4% and an F1 of 76.1%. In terms of AUC and F1, the two-measure combination "age + smoking" is the most stable and the best overall. For the coronary heart disease population graph, we therefore used H = 2 measures (patient age and smoking similarity) to construct the weighted adjacency matrix and selected these measures to build the population graph for our final prediction task.

Influence of K-Nearest Neighbors.
According to the theoretical research in Section 2.2.2, we conduct experiments with different K values (2, ..., 9) to study the influence of K on GCN, as shown in Figure 7. The [...] is 80.0%. Overall, when K < 4, the F1 performance dropped significantly compared with the other values, which is undesirable because we want to detect more high-risk patients. As K increases, AUC shows a slight downward trend, while F1 has some room for improvement. At K = 9, both AUC and F1 are good, but the model training time also increases. The value of K represents the number of neighbors of each patient. As K grows, the number of neighbors allocated to each node increases exponentially ($2^K$); that is, the graph becomes more complex. For the stability of subsequent experiments, we chose K = 6 as the standard setting for K-nearest neighbor graphs.

Comparison to Other Methods.
Experimental results show that AM-GCN has the best performance (see Table 4) in terms of accuracy (97.3%), AUC (90.4%), and F1-score (80.9%). After calculating the F1-score and the area under the ROC curve, it can be observed that AM-GCN performs much better than other machine learning models. A graphical comparison of each model's accuracy, precision, recall, F1-score, and AUC is shown in Figure 8.
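For reference, accuracy, precision, recall, and F1-score can be computed from the confusion counts as in the minimal sketch below. It uses the standard definitions with death as the positive class; the function name and the toy labels are assumptions and do not reproduce the values in Table 4.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels
    (1 = death, 0 = survival)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical): 3 deaths among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On an imbalanced dataset, accuracy can stay high even when recall is modest, which is why F1 and AUC are reported alongside it.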

Discussion
The results on the CHD dataset are shown in Table 4. Among the non-neural network models (AdaBoost, Bayes, DT, SGD, SVC), adaptive boosting achieves the best accuracy (96.0%), and AM-GCN is 1.3 percentage points higher. The support vector machine achieves the best AUC (85.3%), and AM-GCN is 5.1 points higher. Adaptive boosting achieves the best F1-score (76.3%), and AM-GCN is 4.6 points higher. The best recall is obtained by the naive SVM, with 75.2%. Adaptive boosting achieves the best precision (78.3%), and AM-GCN is 15.1 points higher. Among the neural network models (DNN, p-GCN, and K-GCN), K-GCN is the strongest baseline on every metric: its accuracy is 97.2% (AM-GCN is 0.1 points higher), its AUC is 88.6% (AM-GCN is 1.8 points higher), its F1-score is 79.0% (AM-GCN is 1.9 points higher), and its recall is 71.5% (AM-GCN is 2.9 points higher). K-GCN also achieves the best precision among these models, reaching 96.5%.
Compared with the nongraph neural network model, a GCN with an appropriate graph structure performs better on accuracy, AUC, and F1 because it aggregates node and edge information, which nongraph models cannot do. On AUC and F1-score, K-GCN performs better than p-GCN. The new model, which combines p-GCN and K-GCN, compensates for the shortcomings of each single-graph GCN and integrates the advantages of both. Therefore, its overall performance is better than that of either model without fusion.
Finally, ROC (receiver operating characteristic) curves were drawn for each machine learning model on the test set, as shown in Figure 9, to visually demonstrate that the proposed algorithm (AM-GCN) achieves a higher AUC (90.4%) than the other algorithms. The abscissa of the ROC curve is the false-positive rate (1 - specificity), i.e., the proportion of actual CHD nondeaths identified as deaths by the model, and the ordinate is the true-positive rate (sensitivity), i.e., the proportion of actual CHD deaths correctly identified by the model. The closer the curve is to the upper left corner, the better the classifier. For a clearer comparison of the differences, we add AM-GCN to the curves.
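The relationship between the ROC curve and the AUC can be sketched as follows. This is a minimal illustration using a threshold sweep and the trapezoidal rule; the function names and the toy scores are assumptions, and both classes must be present in the labels.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision
    threshold from high to low. Assumes both classes are present."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Process tied scores together so they yield a single point.
        while i < len(order) and scores[order[i]] == threshold:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy predictions (hypothetical): 1 = death, 0 = survival.
roc_labels = [1, 0, 1, 0, 0]
roc_scores = [0.9, 0.8, 0.7, 0.3, 0.2]
curve = roc_points(roc_labels, roc_scores)
area = auc(curve)
print(curve, area)
```

The area equals the fraction of (death, survival) pairs in which the death case receives the higher score, here 5 of 6 pairs.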

Conclusion
In this work, we built a novel neural network model to predict CHD. The model simultaneously extracts specific and common embeddings from topology, node features, and their combination, and it uses an attention mechanism to learn adaptive importance weights for these embeddings. In our experiments on the CHD dataset, AM-GCN improved on the best nongraph baselines by 1.3 percentage points in accuracy, 5.1 in AUC, and 4.6 in F1-score. Experiments show that the proposed method exhibits better results in various performance metrics compared to several existing baselines.
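The attention step summarized above can be sketched for a single node as follows. This is a minimal illustration of softmax attention over channel embeddings; the scoring form (a tanh projection followed by a scoring vector), the parameter names W, b, and q, and all toy values are assumptions, and in practice the parameters would be learned during training.

```python
import math


def attention_fuse(embeddings, W, b, q):
    """Fuse several channel embeddings of one node with softmax attention.

    embeddings: list of channel embeddings (each a list of floats).
    W, b, q: projection matrix, bias, and scoring vector (hypothetical
    fixed values here; learned in a real model).
    """
    scores = []
    for z in embeddings:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, z)) + bias)
                  for row, bias in zip(W, b)]
        scores.append(sum(qi * hi for qi, hi in zip(q, hidden)))

    # Softmax over channels, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the channel embeddings.
    dim = len(embeddings[0])
    fused = [sum(wt * z[d] for wt, z in zip(weights, embeddings))
             for d in range(dim)]
    return weights, fused


# Toy embeddings for one node: topology, common, and feature channels.
channels = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, -1.0]
weights, fused = attention_fuse(channels, W, b, q)
print(weights, fused)
```

The weights sum to 1 for each node, so the fused embedding is a convex combination of the channel embeddings.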
Therefore, the prediction model proposed in this paper is more effective in distinguishing high-risk from low-risk CHD patients. We found that it can significantly improve prediction performance, and this predictive ability could support its application in the diagnosis and treatment of postoperative recurrence while simplifying the diagnostic process. In the future, we will use more computational techniques to improve the model so that it can predict CHD risk more accurately and effectively.

Data Availability
The CHD dataset (chd.csv) used to support the findings of this study is restricted by the Institutional Review Board of Fujian Medical University Union Hospital in order to protect patient privacy. Data are available from 569284142@qq.com (HL) for researchers who meet the criteria for access to confidential data.

Ethical Approval
This research was approved by the Institutional Review Board (IRB) of Fujian Medical University Union Hospital (Approval number: 2021KJCH082). Interviews were conducted following confirmation of informed consent, which was recorded verbally prior to the interview questions. This consent process was approved by the ethics committee.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
YX and KC contributed to conception and design of the study. YX, KC, and HL organized the database. YX and HL performed the statistical analysis. YX and KC wrote the first draft of the manuscript. HL and SZ wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.