A Complex Heterogeneous Network Model of Disease Regulated by Noncoding RNAs: A Case Study of Unstable Angina Pectoris

MicroRNAs (miRNAs) are important types of noncoding RNAs, and there is a lack of holistic and systematic understanding of the functions they play in disease. We proposed a research strategy, including two parts network analysis and network modelling, to analyze, model, and predict the regulatory network of miRNAs from a network perspective, using unstable angina pectoris as an example. In the network analysis section, we proposed the WGCNA & SimCluster method using both correlation and similarity to find hub miRNAs, and validation on two datasets showed better results than the methods using correlation or similarity alone. In the network modelling section, we used six knowledge graph or graph neural network models for link prediction of three types of edges and multilabel classification of two types of nodes. Comparative experiments showed that the RotatE model was a good model for link prediction, while the RGCN model was the best model for multilabel classification. Potential target genes were predicted for hub miRNAs and validation of hub miRNA-target gene interactions, target genes as biomarkers and target gene functions were performed using a three-step validation approach. In conclusion, our study provides a new strategy to analyze and model miRNA regulatory networks.


Introduction
Noncoding RNAs play an important role in the development of complex diseases, and their functions can be elucidated to help us understand the complex processes of disease and develop appropriate drugs [1]. MicroRNAs (miRNAs) are important types of noncoding RNAs that are key regulators of a variety of biological pathways, both in disease and normal states of the body [2]. Tey mostly play a negative regulatory role in promoting the degradation of mRNAs or inhibiting translation [3]. An increasing number of studies have reported the role of miRNA-mRNA regulatory networks in disease development [4,5], suggesting that miRNAs may systematically perform gene regulation through a number of regulatory networks.
Unstable angina (UA) is one of the acute coronary syndromes in which the frequency and duration of attacks are unstable and may lead to myocardial infarction in severe cases [6]. Unstable angina is a complex cardiovascular disease associated with multiple causative factors [7]. Current studies suggest that the disease is caused by myocardial ischemia and hypoxia following the formation of a thrombus in the coronary arteries [8], but the exact etiology and pathogenesis remain to be further elucidated. Many studies have shown that some miRNAs and some of the genes they regulate may be diagnostic markers or therapeutic targets for unstable angina [9,10]. Exploring the role played by miRNAs is an efective means of understanding the mechanisms of the disease and developing relevant drugs.
WGCNA (weighted gene coexpression network analysis) allows the analysis of experimentally measured genes or RNA expression data [11]. After calculating the correlation of genes or RNAs using expression data, a weighted At present, the functions of many miRNAs are still not well understood, and the regulatory roles that miRNAs play in complex diseases are not systematically elucidated [23]. To fnd better hub miRNAs, it is also worth investigating whether correlations and similarities in complex networks can be combined to obtain a better network analysis method than WGCNA [24]. We used miRNAs as a representative of noncoding RNAs and constructed a miRNA regulatory network, the MTP network, using unstable angina as an example. For this regulatory network of noncoding RNAs, we proposed a research method that contains a network analysis part based on multiple network analysis methods and a network modelling part based on knowledge graph algorithms. Hub miRNAs were obtained by an improved network analysis method, while the MTP network was modelled using a knowledge graph algorithm, and interactions in this regulatory network were predicted as a prediction task for edges, while the class of nodes was predicted as a prediction task for node labels [25]. Finally, the potential targets and functions of hub miRNAs were predicted. Te overall study fowchart is shown in Figure 1, and the fowchart of the network analysis part is shown in Figure 2. Te acronym table for this study is Table S1 in Supplementary Materials.

miRNAs and Teir Target Genes.
Expression data for noncoding RNAs analyzed by an array were obtained from the GEO database (GSE94605), specifcally miRNA expression data in plasma from healthy subjects and patients with unstable angina, with 7 and 6 sample pools in the control and case groups, respectively [26]. Diferential expression analysis by using the GEO2R tool was used to obtain diferentially expressed plasma miRNAs, and then, we set |logFC| > 2 and adjusted p value <0.05 to screen for signifcantly diferentially expressed miRNAs [27]. Te miRecords, miRTarBase, and TarBase databases were used to fnd experimentally validated miRNA-target gene interactions [28], and the top 10% of target genes were taken as true target genes.
Te GeneCards database and the DisGeNET database were used to fnd genes for unstable angina, and the target genes of miRNAs were intersected with disease genes. Te intersecting genes were used as target genes regulated by miRNAs that were diferentially expressed in the disease condition [29].

Tissue Localization of Genes and Protein-Protein
Interactions. Diferent genes are expressed diferently in diferent tissues, and gene expression is tissue specifc [30]. We used the expression of genes in diferent tissues to indicate the tissue localization of genes. Te TISSUES database was used to fnd the expression of miRNA target genes in diferent tissues, with gene expression data for up to 21 tissues [31].

Computational Intelligence and Neuroscience
Protein-protein interaction data were retrieved from the String database, keeping default parameter settings [32].

Functional Enrichment and Pathway Categorization.
KEGG enrichment analysis of target genes was performed using the clusterProfler package in R software [33], with p values set to less than 0.05. Pathways were categorized in the KEGG database, with the top level containing seven broad categories, and we used the KEGG database to classify pathways.

Datasets for Validation.
Te dataset for miRNA expression in plasma used to construct the complex network is referred to as the original dataset. To make the results more convincing, two additional independent external datasets were used to validate miRNAs and target genes, respectively, which are referred to as new datasets.
In the new GSE49823 dataset [34], plasma miRNA expression data were recorded for the unstable angina patient group and the control group, with 13 samples for the disease group and 13 samples for the control group, making a total of 26 miRNA expression samples. Tis miRNA expression dataset was used to test the performance of the network analysis algorithm and the reliability of hub miRNA.
Te new GSE60993 dataset [35] contains gene expression data from the peripheral blood of patients with unstable angina and normal controls, with 9 samples from the disease group and 7 samples from the control group. Tis gene expression dataset was used to test the performance of the network modelling algorithm and the reliability of hub miRNA target genes.

Network Analysis.
Te WGCNA analysis method is based on the expression correlation between genes or RNAs for network analysis, however, using only correlation networks to fnd hub miRNAs does not fully utilize the information of complex networks. Similarity is diferent from correlation, so we proposed a method to calculate the similarity of miRNAs based on MTP heterogeneous networks to construct similarity networks and fnd hub miRNAs, which is called the SimCluster analysis method. Te fowchart of the network analysis part is shown in Figure 2.

WGCNA.
We used the WGCNA method to analyze the expression data of diferentially expressed miRNAs and fnd hub miRNAs for unstable angina for subsequent studies [12]. Te WGCNA method can transform the coexpression network into a scale-free network by setting the β parameter, with fewer nodes of a high degree and more nodes of a low degree [36].
Network analysis of miRNA expression data was performed using the ImageGP website [37] based on the WGCNA method, with parameters set to the signed network and Pearson correlation and R-squared set to 0.9. After calculation, the β parameter value was fnally chosen as 18.
Te obtained weighted miRNA coexpression network was hierarchically clustered to classify modules, and the top 10 most important miRNAs in each module were taken as hub miRNAs.

2.2.2.
SimCluster. Te SimCluster algorithm consists of two main parts, similarity network construction and hub miRNA screening, and the fow of the algorithm consists of the following 3 steps: (1) We used the MTP network described in Section 3.1 to calculate the frst-order similarity and secondorder similarity of miRNAs, where frst-order similarity refers to the similarity of miRNAs at the target gene network level (M-T), while second-order similarity refers to the similarity of miRNAs at the enriched pathway level (M-P). In the MTP network, each miRNA has its target set and pathway set, and the Jaccard similarity formula [38] is used to calculate the frst-order similarity value and secondorder similarity value of any two miRNAs. (2) We defned similarity values as thresholds and obtained diferent similarity matrices by setting different threshold lower limits, which were then converted into corresponding similarity networks. Drawing on the idea of the WGCNA method to construct a scale-free network [39], we calculated the distribution of degree values and degree frequencies of all miRNAs in the similarity network to determine whether the network was a scale-free network or not. Specifcally, we used Pearson's correlation coefcient and R 2 of linear regression to calculate whether the logarithm of degree values (lg (K)) and the logarithm of degree frequencies (lg (pK)) were highly correlated and linearly correlated. (3) For scale-free similarity networks, we used several network clustering algorithms to divide modules and selected the optimal network clustering algorithm using modularity values [40]. For each divided module, miRNA with the largest degree value was selected as hub miRNA.

WGCNA & SimCluster.
Te WGCNA method uses correlation between miRNAs, while the SimCluster method uses similarity between miRNAs. Te WGCNA & SimCluster method combines these two methods, containing both correlation and similarity information. Specifcally, the results of the WGCNA method and the SimCluster method were intersected to obtain fnal hub miRNAs.
We compared the performance of hub miRNAs obtained by the WGCNA method, the SimCluster method, and the WGCNA & SimCluster method on original and new datasets, respectively, to evaluate their potential as biomarkers, using the AUC (area under the ROC curve) and AUPR (area under the PR curve) as metrics. A comparison of the above three network analysis methods is shown in Table 1, including data, methods, and some results. 4 Computational Intelligence and Neuroscience

Network Modelling.
We used the data from the data preparation section to construct a miRNA-target genepathway heterogeneous network (MTP) containing three types of nodes and edges: M, T, and P and M-T, T-T, and T-P. Next, we modelled this complex heterogeneous network regulated by miRNAs using a series of knowledge graph models (including graph neural network models).

Models for Knowledge Graphs.
A knowledge graph is a set of many triples (h, r, and t) [41], each (h, r, and t) representing the head entity h, the tail entity t, and the relationship between them r. Knowledge graph models are advantageous in dealing with complex heterogeneous graphs consisting of diferent types of nodes and edges [42], and a number of models have been successively published.
We have modelled the constructed MTP network using a series of knowledge graph models or graph neural network models that have been published in recent years. After much experimentation, we selected the RotatE model [43] for the link prediction task and the RGCN model [44] for the multilabel classifcation task.
(1) Te RotatE model maps entities and relationships to a complex vector space and defnes each relationhip as a rotation between the head entity and the tail entity. It allows the modelling and inference of relationships such as symmetry, antisymmetry, inversion, and composition, which are difcult to accomplish with other models [43]. Te core formulations of the RotatE model are shown as follows: f(h, r, t) � − ‖h○r − t‖.
In the above four formulas, h, r, and t are the embedding representations of the head entity, relationship, and tail entity, respectively. Te Hadamard product is denoted by the symbol ○, and d r (h, t) is the distance calculated for each triple. Once the distance values are obtained, they can be optimized using Equation (2). d r (h, t) is the distance value for positive samples, while d r (h i ′ , t i ′ ) is the distance value for negative samples. p(h i ′ , r, t i ′ ) represents a new negative sampling method proposed by the authors of the model, where they designed a distribution function and sampled negative triples, calculating a probability value to be the weight value of the negative sample. Equation (3) is the scoring function for triples, with a higher score indicating a more realistic triple.
(2) Te RGCN model (relational graph convolutional network) is a graph neural network model for heterogeneous graphs. By following the idea of message passing network calculation [45], the formulas are shown as follows: N r i denotes the neighboring nodes of i under the r relation, c i,r is a regularization factor, W denotes the parameters of the layer, and σ is the activation function. Te implication of Equation (4) is that, for the node i, the representation of the neighboring nodes under each relationship connected to it is aggregated and added to the representation of the node i itself as the fnal representation. Te node update based on the heterogeneous network graph is thus completed by Equation (4), and triples are then scored by Equation (5). h L , R, and t L represent the fnal embedded representations of the head entity, relationship, and tail entity, respectively, and a higher score in Equation (5) indicates a more realistic triple.
(3) In addition to the RotatE model and the RGCN model, other advanced models were selected for comparison experiments to fnd the best model in order to perform the subsequent prediction task. RotatE is essentially a distance transformation model, RGCN is a classical heterogeneous graph neural network model, and we also selected TransE [46], which is also a distance transformation model, the Gaussian embedding model KG2E [47], the semantic matching model DistMult [48], and CompGCN [49], a model that combines knowledge graph algorithms with graph neural network algorithms.

Link Prediction Model.
Based on the constructed MTP network, we used knowledge graph-or graph neural network models to perform link prediction, i.e., to predict whether there is an edge between two nodes or whether it is a fact triple [46]. In the prediction task, for each triple, there are both fxed head entity and relationship to predict tail entity, and fxed tail entity and relationship to predict head entity. We refer to the model that performs the link prediction task as the link prediction model. Te constructed MTP network for unstable angina as a dataset contains 573 nodes and 12629 edges. We used the above model to complete ten-fold cross-validation, training on the training set while validating on the testing set [50]. Te performance of the model is evaluated using three metrics, Hits@k, MR, and MRR [46]: In Equation (6), the average of the number of correct predictions among the top k predictions for each triple is calculated. I is the indicator function, which is 1 if the condition is true and 0 otherwise. rank i is the rank of the correct triple among the predicted triples. N denotes the number of all triples to be predicted. MR and MRR are the mean values of the correct triple rank and the inverse of the correct triple rank, respectively.

Multilabel Classifcation Model.
We used the six models described above to obtain the embeddings of nodes and then used a two-layer fully connected neural network (MLP) to predict multiple labels for target genes or pathways , with the sigmoid function as the fnal activation function. We refer to the model that performs the multilabel classifcation task as the multi-label classifcation model. For the multilabel classifcation task, we still used the MTP network as the dataset for ten-fold cross-validation to assess the performance of the model using accuracy as a metric. Accuracy was calculated by equations (8)-(10).
Tere are 21 labels for targets (T) and 5 labels for pathways (P), representing the tissue localization of targets and classifcation of pathways, respectively. We frst converted the label values to 0 or 1 using Equation (8), and then, we used Equations (9) and (10) to calculate accuracy values.
Label i and Pred i are the label value and predicted value, respectively. I is the indicator function, and if the label value exists and is >0, then the label value is converted to 1; otherwise, it is converted to 0. K represents the number of label values, while N represents the total number of T or P.
Equations (9) and (10) show that the accuracy of each T or P is calculated and then averaged.

Comparison Experiments.
Te aim of the comparison experiments is to fnd the best link prediction model and the best multilabel classifcation model [52]. We used RotatE, TransE, KG2E, DistMult, RGCN, and CompGCN models to perform M-T, T-T, and T-P link prediction tasks, respectively, as well as the multilabel classifcation of T and the multilabel classifcation of P, respectively. Te embedding dimension of nodes was 64, the learning rate was 0.001, and the epoch was 50.

Parameter Optimization Experiments.
Te aim of the parameter optimization experiments is to fnd the optimal parameters for each model [53]. Tree parameters were selected for the parameter optimization experiments, namely, the embedding dimension, epoch, and learning rate.

Case Studies.
Elucidating the function of miRNAs can help us understand a disease more accurately. Tere is still a lack of systematic studies on miRNAs in unstable angina [4]. Terefore, to obtain hub miRNAs by the best network analysis method, we conducted a case study using the best link prediction model. Specifcally, the potential target genes of these hub miRNAs were predicted, i.e., the M-T link prediction task. For the prediction results, we performed a three-step validation method.

Tree-
Step Validation Method. Te importance and reliability of hub miRNAs have been validated in the network analysis section. In this section, we validate the results of the network modelling section using a designed three-step validation method.

Validation of Hub miRNA-Target Gene Interactions.
First, we assessed the reliability of the predicted hub miRNA-target gene interactions by searching the literature or other databases [54]. Te assessment was performed using the TopK metric, meaning the proportion of predictions that was correct in the top K rankings [55]. Te TopK results of all hub miRNAs were then averaged.

Validation of the Potential of Target Genes as
Biomarkers. In the second step, we validated the predicted new and existing target genes for hub miRNAs, which comprised two validation methods.
(1) Te target genes of hub miRNAs were validated for a new dataset (GSE60993) containing gene expression from unstable angina and healthy controls, using AUC and AUPR as evaluation metrics. (2) Transcription factors (TFs) are important proteins that regulate gene expression [56], and their dysregulation will cause abnormal gene expression, which is closely associated with the development and progression of complex diseases [57]. Te TF-Marker database [58] provides cell-and tissue-specifc TFs and related markers. We searched the TF-Marker database to verify whether the target genes of hub miRNAs are TFs or related markers in tissues such as the heart, blood vessels, and arteries, which are closely associated with the development of unstable angina.

Validation of the Function of Target
Genes. Finally, KEGG functional enrichment analysis was performed on these target genes [59]. Te reliability of the predictions was further assessed to know whether enriched pathways were classical and critical pathways in unstable angina. Trough these three steps, the reliability of miRNA target genes, the reliability of target genes as biomarkers, and the reliability of the functions performed by the target genes were successively validated.

MTP Network.
Tere were 386 diferentially expressed miRNAs in unstable angina, obtained by diferential expression analysis and after screening. By searching the target genes of miRNAs and the genes of unstable angina and taking the intersection, 232 intersecting genes were obtained, corresponding to 238 miRNAs, with a total of 2706 miRNA-target gene interactions.
For these intersecting genes, after setting the species to Homo sapiens and the minimum interaction score to 0.4, a total of 8696 protein-protein interactions were found. Next, KEGG functional enrichment was performed, and a total of 103 pathways were screened, resulting in a total of 1361 gene-pathway interactions. Te MTP network is summarized in Table 2, and the detailed data are available in Table S2 in Supplementary Materials. M, T, and P refer to miRNAs, target genes, and pathways, respectively. In the knowledge graph model, M-T, T-T, and T-P also denote the (M, MT, and T), (T, TT, and T), and (T, TP, and P) triples, respectively.

Modules and Hub miRNAs Based on the WGCNA
Method. Diferentially expressed miRNAs may play an important role in disease states [60], and either overexpressed or underexpressed miRNAs were included in our study. We used the expression data from these miRNAs to construct a weighted miRNA coexpression network, divided modules using the dynamic tree cutting method, and merged the modules to fnally obtain four modules. Te results are shown in Figure 3(a).
A total of four colored modules, yellow, turquoise, brown, and blue, were generated, from which hub miRNAs were searched, respectively. Te correlation between miRNAs was used as the weight of edges, and the importance of the nodes in each module was ranked according to the connectivity [61]. Te top 10 nodes in each module are shown in Figure 3(b), and a total of 40 hub miRNAs were fltered out. Te thickness of edges is proportional to the correlation value.

Similarity Network and the Scale-Free Network
Based on the SimCluster Method. As can be seen from Table 3, the results of frst-order similarity are better than those of second-order similarity. Figure 4(a) shows the variation of Pearson's correlation coefcient and linear regression (ordinary least squares) R 2 when diferent frst-order similarity thresholds are set. Te similarity value at which both Cor    Computational Intelligence and Neuroscience and R 2 are greater than 0.9 is used as the threshold (0.25), and links greater than this threshold are retained, while links less than this threshold are excluded. Figures 4(b) and 4(c) show the evaluation of whether a similarity network greater than the threshold is a scale-free network. It can be seen that lg (K) is highly correlated with lg (pK) and that the distribution of the two is linear. Tere are more nodes with a small degree K and fewer nodes with a large degree K. Fewer nodes connect most of the nodes, which is a characteristic of scale-free networks. Te scalefree frst-order similarity network is shown in Figure 5.
Te linear regression equations for the SimCluster_1 method (SimCluster based on frst-order similarity) and the SimCluster_2 method (SimCluster based on second-order similarity) can be found in Table S3 in Supplementary Materials.

Hub miRNAs Obtained Based on the SimCluster
Method. Modules can be obtained after network clustering of the scale-free similarity network, and thus, hub miRNAs afecting each module can be obtained. We have demonstrated in a previous study that the fast greedy algorithm is a good network clustering algorithm [29], and in this study, we have also demonstrated that the fast greedy algorithm is the best among the four network clustering algorithms by calculating modularity values. Te results of the comparison of network clustering algorithms are presented in Supplementary Materials.
We used the fast greedy algorithm to perform network clustering of the scale-free similarity network, fnding the nodes with the largest degree value from each module as hub miRNAs. In some modules, there were many nodes sharing the largest degree value, and these were included in subsequent studies.
Final hub miRNAs were obtained by taking the intersection of the results of the SimCluster method and the WGCNA method (see the nodes marked with red borders in Figure 3(b) or the red nodes in Figure 5). Hub miRNAs obtained by the six network analysis methods, including the SimCluster method, are listed in Supplementary Materials.

Performance Comparison of Network Analysis
Methods. We compared the performance of these network analysis methods using the mean AUC values and mean AUPR values of hub miRNAs obtained from each method when distinguishing between the disease and control groups. Table 3 shows the performance of hub miRNAs from the six methods. It can be seen that SimCluster_1 has better mean AUC values than the WGCNA method for both datasets, and the WGCNA & SimCluster_1 method has signifcantly improved over the WGCNA method in terms of both mean AUC values and mean AUPR values. Te important point is that these results reveal that the network analysis approach combining correlation and similarity (WGCNA & SimCluster) gives better results than the approach using correlation (WGCNA) or similarity (SimCluster) alone, which is a new approach to fnding hub nodes in the network. In addition, the WGCNA & SimCluster_2 method and the FC_hub method performed well for the original dataset and mediocrely for the new dataset, suggesting that neither the second-order similarity results nor the Fold-Change results generalize well.
Hub miRNAs obtained by the WGCNA & SimCluster_1 method achieved the best results for the new dataset. We next used hub miRNAs obtained based on the WGCNA & SimCluster_1 method for our subsequent study. Figure 6 shows two of these hub miRNAs, hsa-miR-30a-5p and hsa-miR-502-3p. Both of these hub miRNAs had an AUC and AUPR above 0.97 for the original dataset, and both had an AUC above 0.67 and an AUPR above 0.71 for the new dataset. Tis shows the potential of these hub miRNAs as biomarkers, and it would be meaningful to conduct in-depth studies.

Link Prediction.
Te MTP network constructed is a small complex heterogeneous network, as shown in Table 2, containing three types of nodes and edges. A knowledge  Figure 4: Generation of the scale-free frst-order similarity network. (a) Pearson's correlation coefcient Cor and linear regression R 2 between lg (K) and lg (pK) when setting diferent frst-order similarity thresholds. K refers to the degree value, while pK refers to the frequency of the degree value. (b) Te distribution of lg (K) and lg (pK) and the Pearson correlation coefcient. (c) R 2 of the linear regression equations for lg (K) and lg (pK), and the relationship between the predicted and true values. Te threshold value was set to 0.25.   graph model or a graph neural network model can complete node-level, edge-level, or even graph-level modelling of complex heterogeneous networks, using existing data to predict unknown data [42]. We frst performed link prediction for M-T, T-T, and T-P, which are predictions for the triples (M, MT, and T), (T, TT, and T), and (T, TP, and P) in the knowledge graph model. Comparison experiments and parameter optimization experiments were carried out using six evaluation metrics, Hits@5, Hits@10, Hits@,20, Hits@50, MR, and MRR. We refer to the models performing the link prediction task collectively as link prediction models.

Comparison Experiments.
Comparison experiments were conducted using six models, RotatE, TransE, KG2E, DistMult, RGCN, and CompGCN, to fnd the best link prediction model. Figures 7(a)-7(c) show the link prediction results for M-T, T-T, and T-P calculated by Hits@k, respectively. Table 4 shows the link prediction results calculated by MR and MRR. Te data in the fgure and table are the means and standard deviations of the results after tenfold cross-validation.
It can be seen that for M-T link prediction, the RotatE model achieved the best results for all six metrics. For T-P link prediction, the RotatE model achieved the best results in four of the six metrics, namely, Hits@5, Hits@10, MR, and MRR. Finally, in T-T link prediction, the RotatE model was slightly inferior to the best performing CompGCN model, but the run time of the RotatE model was only one-twentieth of that of the CompGCN model. Terefore, the RotatE model showed strong predictive power for M-T, T-T, and T-P predictions, and in particular, it showed the best prediction results for M-T and T-P predictions.
In addition, we can fnd that all these knowledge graph models or graph neural network models performed better for T-T link prediction and T-P link prediction, while they performed worse for M-T link prediction. In the M-T prediction task, Hits@5 of the best model was only 0.1789 ± 0.0167, while in the T-T and T-P prediction tasks, Hits@5 of the best model was 0.5751 ± 0.0227 and 0.3220 ± 0.0377, respectively. Tis may be due to the limitations of the model or insufcient information contained in the MTP network.
We fnally selected the RotatE model as the link prediction model to perform subsequent link prediction tasks. Te results of this section are also presented in Table S4 in Supplementary Materials.

Parameter Optimization Experiments.
We demonstrated through comparison experiments that the RotatE model has good performance on all three types of edge prediction and is therefore a good link prediction model. Next, we chose the RotatE model for parameter optimization experiments to fnd the most suitable parameters. Te three parameters optimized were the embedding dimension, learning rate, and epoch. In this study, we focused more on the function played by miRNAs, so we only performed parameter optimization experiments for M-T link prediction.  When the parameters were changed, the RotatE model had the best results for Hits@5 and Hits@10 at an epoch of 50, for Hits@5, Hits@10, Hits@20, and Hits@50 at a learning rate of 0.001, and for Hits@5 at an embedding dimension of 64. Table 5 shows the results calculated by MR and MRR. In the range of parameter variations, the RotatE model gave the best results for the metrics at an epoch of 50 and a learning rate of 0.001. Te results were very close for both MR and MRR at 64 and 128 embedding dimensions, and again, since the best results were obtained for Hits@5 at an embedding dimension of 64, we still set the embedding dimension to 64.
Terefore, by optimizing the three parameters, we fnally chose an epoch of 50, an embedding dimension of 64, and a learning rate of 0.001 as the fnal parameter settings for the M-T link prediction task.

Multilabel Classifcation.
Te multilabel classifcation task that we have accomplished is to classify targets and pathways into multiple categories [25]. Specifcally, the multilabel classifcation of targets refers to the classifcation of the tissue localization of targets. From the data obtained, targets can be distributed to as many as 21 tissues, with the top fve being the blood, liver, nervous system, lungs, and heart. Figure 9(a) shows the distribution of targets in different tissues, with the thickness of the lines proportional to the amount of gene expression.
Te multilabel classifcation of pathways refers to the division of pathways into diferent broad categories. Te KEGG database classifes all human pathways into 7 broad categories, and the pathways enriched by the target genes of these miRNAs can be classifed into 5 of these 7 broad categories. Figure 9(b) shows the classifcation of the enriched pathways into the fve categories of organismal systems, cellular processes, human diseases, environmental information processing, and metabolism.
We frst converted the label value of the target or pathway to 0 or 1, and if the target or pathway existed in a category, then the label value was 1; otherwise, it was 0. We then used a knowledge graph or graph neural network model to obtain the embedding representation of nodes and a 2layer MLP to predict the multilabel classifcation of nodes. Te means and standard deviations of the results after tenfold cross-validation are shown in Table 6.
It can be seen that the RGCN model achieved the best performance in the multi-label classifcation prediction task for both T and P, with accuracy rates as high as 0.8312 ± 0.0220 and 0.8356 ± 0.0404, respectively. Terefore, the RGCN model is the best multilabel classifcation model for both types of nodes for the prediction task.

Case Studies.
Trough comparison experiments, we have demonstrated that the RotatE model is the best link prediction model, and through parameter optimization experiments, we have found the optimal parameters for the  RotatE model. In this section, we conduct a case study of hub miRNAs obtained by the WGCNA & SimCluster_1 method. Te role of miRNAs in regulating gene expression is well worth investigating in depth. Unstable angina is a relatively complex disease, and the functions played by miRNAs in this disease are still not fully elucidated. Studying the function of hub miRNAs is a convenient way to understand the function of all miRNAs [62]. We used the RotatE model to predict the potential target genes of hub miRNAs, the unknown M-T link prediction task. Tis is also referred to as a complementary task for the knowledge graph.
Eleven hub miRNAs were obtained by the WGCNA & SimCluster_1 method. We performed M-T link prediction for these 11 hub miRNAs, and the detailed prediction results are shown in Table S5 in Supplementary Materials. When predicting potential M-T links, for each hub miRNA, the top 10 target genes in terms of predicted scores were taken, so there were 110 hub miRNA-target gene interactions in total. After de-duplication, only 38 genes of the predicted target genes remained.

Validation of the Results.
In the network analysis section, we have used two datasets to validate the best network analysis method and hub miRNAs obtained by the method. In the network modelling section, the results for the  Computational Intelligence and Neuroscience 13 potential target genes of hub miRNAs also need to be validated to demonstrate the reliability of model predictions [63]. Using the designed three-step validation method, we frst validated the reliability of hub miRNA-target gene interactions. As can be seen in Figure 10(a), 50%, 53%, 48%, and 40% of the top 1, 3, 5, and 10 predictions, respectively, were validated by the literature or other databases. Tat is, after discarding those interactions that had already appeared in the dataset, a large proportion of the predicted unknown hub miRNA-target gene interactions were validated. Second, the predicted target genes of hub miRNAs were validated using two diferent methods. On the one hand, based on a new gene expression dataset, we tested whether these target genes could distinguish unstable angina samples from control samples, i.e., whether they had potential as biomarkers. Figures 10(b) and 10(c) show the ROC curves and PR curves for 5 of the 38 target genes. It can be seen that these target genes can distinguish well between disease and control samples, and their diferential expression between samples perhaps gives them the function of biomarkers. On the other hand, we searched the TF-Marker database to determine whether these target genes were transcription factors or related markers, or whether they were transcription factors or related markers that were closely associated with unstable angina. Table 7 shows the search results of the TF-Marker database. It can be found that six of the ten newly predicted target genes are transcription factors or related markers, and three of them are directly associated with unstable angina, with the proportions of 60% and 30%, respectively, which are larger than the proportions in the training or validation sets. Terefore, we validated the potential of these target genes as biomarkers in two ways.
Finally, we performed KEGG functional enrichment on the predicted target genes of hub miRNAs, and the top 20 pathways at a p value are shown in bubble plots (see Figure 10(d)). Te bubble size indicates the number of genes, with red colors indicating smaller p values, and GeneRatio is the ratio of the number of genes enriched into the pathway to the total number of genes used, with larger ratios indicating a greater number of genes involved in the pathway. As shown in Figure 10(d), these target genes were enriched in pathways such as lipid and atherosclerosis (gene number � 7, p < 0.0001), fuid shear stress and atherosclerosis (gene number � 7, p < 0.0001), and the NF-kappa B signaling pathway (gene number � 7, p < 0.0001) [64][65][66], which were associated with the development and progression of unstable angina. Terefore, the predicted target genes were validated from a functional enrichment perspective, and many of the enriched pathways are involved in the pathology of unstable angina. In conclusion, we validated the veracity and reliability of the model's results through a cascade of three steps.

Discussion
Many noncoding RNAs are regulators and play an important role in the process of gene expression in a regulatory manner [67]. Currently, miRNAs are considered important types of noncoding RNAs and mostly play a role in regulating gene expression in a negative regulatory manner [3].
Te expression of some miRNAs and their regulated target genes (mRNAs) varies in diferent diseases and is diseasespecifc [68]. Terefore, it is theoretically feasible to fnd miRNAs or genes that are specifc to a disease and act as biomarkers or diagnostic markers [69]. Unstable angina is a complex disease with multifactorial and multisystemic involvement, and the pathogenesis and treatment of this disease still require in-depth investigation. Previous studies on unstable angina have mostly focused on genes and function while neglecting regulatory functions played by regulatory factors such as miRNAs and are therefore inadequate and incomplete [70,71]. In this study, we designed a research strategy using miRNA expression data and miRNA regulatory networks to frst fnd the best performing WGCNA & SimCluster_1 method by comparing six network analysis methods for original and new datasets, and then, we used this method to obtain hub miRNAs that could be used as biomarkers. Te best model from the network modelling section was then used to predict unknown functions based on the existing functions of hub miRNAs.
Te miRNA regulatory network is also a complex network and the WGCNA approach only analyzes the correlation network constructed based on the expression levels of miRNAs and does not take full advantage of the other information in the complex network. Te similarity of nodes in a network is also important information that can be exploited [72]. We constructed a frst-order similarity network and a second-order similarity network based on the similarity of miRNA actions and designed a network analysis algorithm to fnd hub miRNAs using the similarity network. Notably, we used the formation of scale-free networks as the judgment criterion when screening similarity thresholds, which coincided with the WGCNA method when screening soft thresholds [61]. A comparison of multiple network analysis methods showed that the WGCNA & SimCluster_1 method, which combines correlation and similarity, achieved the best results for the new dataset. Tis suggests that it is feasible to use correlation and similarity between miRNAs to screen for hub miRNAs, with better results than using similarity or correlation alone.
Knowledge graph models are mostly used for large, complex heterogeneous graphs, often containing hundreds or thousands of types of nodes and edges [73]. Because various heterogeneous graphs are so diferent from each other, researchers have developed a variety of dedicated knowledge graph models, so that each model has its own advantages [74]. In fact, the MTP network that we have constructed is a small, complex heterogeneous network, and the knowledge graph model performs equally well on small heterogeneous networks. Of the six models we used, RotatE, TransE, KG2E, and DistMult are knowledge graph models, RGCN is a classical graph neural network model, and CompGCN is a model combining graph neural networks with knowledge graphs, all of which are trained in the same way as knowledge graph modelling. Te knowledge graph model uses a set of many triples as a dataset and evaluates the performance of the model by predicting missing head or tail entities and using a scoring function. Te task of predicting missing head or tail entities can also be considered a link prediction task, and the discovery of unknown links is of practical importance [46].
In this study, we performed node-level and edge-level predictions for the MTP network using a knowledge graph or graph neural network model. In edge-level modelling, we made predictions for all three types of edges in the network. For the Hits@5 metric, the RotatE model yielded results of 0.1789 ± 0.0167, 0.5600 ± 0.0185, and 0.3220 ± 0.0377 for M-T, T-T, and T-P link prediction, respectively, with the best performance for M-T and T-P link predictions and the second-best performance for T-T link prediction after the CompGCN model. In terms of MR and MRR metrics, the RotatE model also performed best for M-T and T-P link predictions and worse than the CompGCN and RGCN models for T-T link prediction. Tis shows that the RotatE model has good predictive ability and performs better than    other advanced models. In node-level modelling, for all six models, the RGCN model achieved the best performance for multilabel classifcation prediction for both T and P, with accuracy rates of 0.8312 ± 0.0220 and 0.8356 ± 0.0404, respectively. Tis suggests that the graph neural network model has an advantage in node-level prediction [75]. We also performed parameter optimization experiments on the RotatE model when performing the M-T link prediction task, and the best performance was achieved when setting the epoch, embedding dimension, and learning rate to 50, 64, and 0.001, respectively. Te hub miRNAs that we have selected as key nodes in each module may play a vital regulatory role in the disease [76]. Te functions of these hub miRNAs are well worth investigating in depth. We performed an M-T link prediction task on 11 hub miRNAs using the best link prediction model and optimal parameters to predict the target genes of these hub miRNAs and perform functional enrichment. Te reliability of the results can only be demonstrated after the model has been validated [77], so we developed a three-step validation method for the designed model and the content of this study to perform a three-part validation. In the frst part, we validated a total of 110 predicted hub miRNA-target gene interactions by searching the literature or other databases. Te percentage of correct predictions in the top 1, top 3, and top 5 was 50%, 53%, and 48%, respectively. In the second part, the target genes of hub miRNAs had the ability to distinguish well between diferent groups of samples for the gene expression dataset. Moreover, 60% of the predicted novel target genes were transcription factors or markers, and 30% were directly associated with the development of unstable angina. In the third part, many of the enriched pathways were associated with unstable angina, which, on the other hand, proved the reliability of predicted target genes. In terms of specifc mechanisms, lipid levels in blood and fuid shear stress of local blood are important for the pathogenesis of coronary atherosclerosis [65,66], which is an important pathological feature of unstable angina. NF-kappa B is a key transcription factor involved in many physiological and pathological processes, including immune response, apoptosis, and infammation [64]. Studies [78,79] have shown that in the pathology of atherosclerosis, NF-kappa B is critical for the crosstalk between cytokines, adhesion molecules, and growth factors, leading to the formation, growth, and eventual rupture of atherosclerotic plaques. Tis three-step validation method contains three parts in sequential order, validating the results of our model in a cascading manner by verifying miRNA, target gene, and function in succession.
It can be seen that although the WGCNA & SimCluster_1 method performs best for the new dataset, the AUC and AUPPR are still low, and further improvement on this basis is necessary. In addition, none of the six knowledge graphs or graph neural network models that we used performed very well for M-T link prediction, which may be due to limitations in the models themselves or insufcient information in the MTP networks used. In the future, we will seek to develop superior models or use larger heterogeneous networks and we will also conduct generalization ability experiments in order to generalize the present modelling strategy to other diseases.

Conclusions
We constructed a complex heterogeneous network regulated by miRNAs in unstable angina and then analyzed and modelled it, which explored the functions played by noncoding RNAs in complex diseases from the miRNA perspective. Among the six network analysis methods for fnding hub miRNAs, the WGCNA & SimCluster_1 method yielded the best results for the new dataset, identifying hub miRNAs that could act as biomarkers. Comparative experiments with six knowledge graphs or graph neural network models demonstrated that RotatE is a good link prediction model and that RGCN is the best multilabel classifcation model for the miRNA regulatory network. Optimal parameters for the M-T link prediction task were obtained by parameter optimization experiments. Te results of the predicted target genes of hub miRNAs based on the best model and the best parameters were validated by three methods. Our modelling strategy can be used as a reference for other disease and noncoding RNA studies.

Data Availability
All data can be found in this article or in Supplementary Materials. Codes can be requested from the corresponding author.

Conflicts of Interest
Te authors declare no conficts of interest.