Most existing studies assume that drug-target pairs with unknown interactions do not interact. However, an unknown interaction only means that the relationship between the drug and the target has not yet been confirmed. In this paper, drug-target pairs whose relationship has not been determined are treated as unlabeled samples, and a weighted fusion method of multisource information is proposed to screen drug-target interactions. First, drug-target pairs that are likely to interact are selected. Second, the selected pairs are added to the positive samples, that is, the samples known to interact, and the original interaction matrix is revised accordingly. Finally, the revised datasets are used to predict interactions with the bipartite local model with neighbor-based interaction profile inferring (BLM-NII). Experiments show that the proposed method improves specificity, sensitivity, precision, and accuracy over BLM-NII on every dataset. In addition, compared with several state-of-the-art methods, the proposed method achieves the highest area under the receiver operating characteristic curve (AUC) on all four benchmark datasets and the highest area under the precision-recall curve (AUPR) on three of them.
Targets are biological macromolecules in an organism that bind specifically to small-molecule compounds and produce specific physiological or pharmacological effects. They play roles in physiological regulation and in the prevention and treatment of disease. The most common targets are ion channels, enzymes, and receptors. Drug-target interaction prediction is now widely used and is important for elucidating the mechanisms of drug action, which supports the development of new drugs [
Traditional drug-target interaction prediction methods are roughly split into docking simulation methods [
In recent years, researchers worldwide have proposed a variety of methods for studying drug-target interactions (DTIs), which have greatly improved prediction efficiency and accuracy. Compared with traditional methods, these methods make full use of computational techniques, which helps shorten the development cycle of new drugs and reduce research costs. They fall mainly into four categories: prediction methods based on matrix decomposition, on network inference, on drug and target features, and on the bipartite graph model. These categories are introduced below.
Matrix decomposition-based methods predict the relationships between drugs and targets by decomposing the interaction matrix. Gonen et al. [
Network inference methods mainly construct a heterogeneous network from drug-drug similarity and target-target similarity and then predict DTIs from this network. These methods can be divided into three types: supervised, semisupervised, and unsupervised. Cheng et al. [
Most machine learning-based DTI prediction methods require features of drugs and targets to predict interactions [
Bleakley et al. [
In existing research, true negative samples are generally unavailable, that is, it is not known which drug-target pairs truly do not interact. Moreover, some drug-target pairs among the unlabeled samples may interact but have not yet been verified experimentally. In this paper, these pairs with unknown interactions are regarded as unlabeled samples. The unlabeled samples are screened by three methods: the drug similarity method, the random walk with restart method, and the WNN-GIP method. Then, the weighted fusion method of multisource information is used to fuse the screening results of the three methods. Finally, the interaction matrix of the training set is revised according to the fusion results, and the BLM-NII model is used to predict interactions. Experiments show that the proposed method achieves better prediction performance.
A graph is a data structure that can express complex interactive relationships in the real world. Each graph has two basic components, nodes and edges, and nodes are connected by edges. In drug-target interaction prediction, drugs and targets are represented as nodes and their relationships as edges. A random walk can then be performed on the graph composed of drugs and targets to predict interactions.
Random walk is a common model of information propagation on graphs: a walker starts from a vertex and traverses the graph. At each vertex, the random walker has two choices: one is to move to a neighbor of the current vertex with probability
For drug-target relationship prediction, the heterogeneous network is shown in Figure
Drug-target interaction heterogeneous network.
Random walk with restart can effectively integrate the abovementioned networks into one framework. The constructed heterogeneous network does not depend on three-dimensional structural information of drugs and targets. However, known drug-target interactions account for only a small fraction of all pairs, so the interactions in the heterogeneous network are sparse. In sparse networks, new drugs or new targets are often isolated, which makes their interactions difficult to predict and limits the predictive power of the random walk. To improve it, the multisource information fusion method can be used to select drug-target pairs with a high probability of interaction and add them to the positive samples. This yields more reliable drug-target interaction relations, reduces the sparsity of the network, and reduces the number of isolated subnetworks.
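The random walk with restart on the heterogeneous network can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the heterogeneous adjacency is assumed to be the block matrix [[S_d, Y], [Y^T, S_t]] built from a drug similarity matrix S_d, the drug-target interaction matrix Y, and a target similarity matrix S_t; the transition matrix is assumed to be its column-normalized form; and the restart probability `restart=0.3` is an arbitrary illustrative value. All function names are hypothetical.

```python
def build_heterogeneous(drug_sim, interactions, target_sim):
    """Assumed block adjacency [[S_d, Y], [Y^T, S_t]]: drugs first, then targets."""
    n_d, n_t = len(drug_sim), len(target_sim)
    adj = [[0.0] * (n_d + n_t) for _ in range(n_d + n_t)]
    for i in range(n_d):
        for j in range(n_d):
            adj[i][j] = float(drug_sim[i][j])
        for t in range(n_t):
            # Drug-target edges are undirected, so Y fills both off-diagonal blocks.
            adj[i][n_d + t] = float(interactions[i][t])
            adj[n_d + t][i] = float(interactions[i][t])
    for s in range(n_t):
        for t in range(n_t):
            adj[n_d + s][n_d + t] = float(target_sim[s][t])
    return adj


def column_normalize(adj):
    """Column-stochastic transition matrix; all-zero columns stay zero."""
    size = len(adj)
    col_sums = [sum(adj[r][c] for r in range(size)) for c in range(size)]
    return [[adj[r][c] / col_sums[c] if col_sums[c] > 0 else 0.0
             for c in range(size)]
            for r in range(size)]


def random_walk_with_restart(transition, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol."""
    size = len(transition)
    total = float(sum(seed))
    p0 = [x / total for x in seed]  # restart distribution over the seed node(s)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(transition[r][c] * p[c] for c in range(size))
               + restart * p0[r]
               for r in range(size)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy example: 2 drugs, 2 targets; drug 1 has no known target but resembles drug 0.
drug_sim = [[1.0, 0.8], [0.8, 1.0]]
target_sim = [[1.0, 0.1], [0.1, 1.0]]
Y = [[1, 0], [0, 0]]
W = column_normalize(build_heterogeneous(drug_sim, Y, target_sim))
scores = random_walk_with_restart(W, seed=[0, 1, 0, 0])
print(scores[2:])  # target scores for drug 1; target 0 ranks above target 1
```

Even though drug 1 has no known interaction, the walk reaches target 0 through its similar neighbor drug 0. This illustrates why adding reliable positive pairs, which creates more such paths, helps nodes that would otherwise be isolated.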
In this paper, we assume that
Similarly, we can combine
Combining
GIP can only handle drugs that have at least one known interaction. For new drugs, weighted nearest neighbor (WNN) information is used to infer interaction relationships, as shown in the following formula:
WNN infers the interactions of a new drug from the interactions in the dataset: the prediction score is a weighted sum of the interaction profiles of all known drugs, where each weight is determined by how similar that drug is to the new drug. Drugs highly similar to the new drug receive high weights, whereas drugs with low similarity receive low weights and contribute little to the final prediction.
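The GIP kernel and the WNN step can be sketched as follows. This is a hypothetical illustration that assumes the formulations commonly used with WNN-GIP rather than the exact formula of this paper: the GIP kernel is exp(-γ‖y_i − y_j‖²) over drug interaction profiles, with γ set to a base bandwidth divided by the mean squared profile norm, and WNN gives the k-th most similar known drug (k = 0, 1, ...) the decay weight η^k, here with an arbitrary `eta=0.7`, then normalizes by the sum of weights.

```python
import math


def gip_kernel(profiles, gamma_base=1.0):
    """GIP kernel K[i][j] = exp(-gamma * ||y_i - y_j||^2) over interaction profiles.

    Assumed bandwidth rule: gamma = gamma_base / (mean squared profile norm).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_base / mean_sq_norm if mean_sq_norm > 0 else gamma_base
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


def wnn_profile(sim_to_new, profiles, eta=0.7):
    """Infer a profile for a new drug as a weighted average of known drug profiles.

    Known drugs are ranked by similarity to the new drug; the k-th most similar
    (k = 0, 1, ...) gets weight eta ** k, so close neighbours dominate.
    """
    order = sorted(range(len(profiles)), key=lambda i: sim_to_new[i], reverse=True)
    weights = {i: eta ** rank for rank, i in enumerate(order)}
    total = sum(weights.values())
    n_targets = len(profiles[0])
    return [sum(weights[i] * profiles[i][t] for i in order) / total
            for t in range(n_targets)]


# Toy example: 3 known drugs, 3 targets, and one new drug with no known interactions.
Y = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
K = gip_kernel(Y)
new_profile = wnn_profile([0.9, 0.1, 0.5], Y)  # target 0 scores highest
```

The inferred `new_profile` can then be used in place of the new drug's all-zero row, for example when computing the GIP kernel, which is one way the two steps can be combined so that GIP also covers drugs without known interactions.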
GIP is used for drugs with at least one known interaction, and WNN is used for new drugs. Combining the advantages of these two algorithms yields WNN-GIP for predicting drug-target interactions. However, WNN-GIP [
Prediction methods based on drug similarity, random walk with restart, and WNN-GIP each have their own advantages. Drug similarity-based methods exploit the structural similarity between drugs to predict interactions. Random walk with restart can integrate multiple networks and makes full use of the correlations between nodes. WNN-GIP can predict interactions of new drugs with low computational complexity. To combine the advantages of these three methods, reduce computational complexity, and improve prediction accuracy, a drug-target prediction method based on multisource information weighted fusion is proposed. The flow chart is shown in Figure
Flow chart of multisource information weighted fusion.
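The weighted fusion and matrix revision steps of the flow chart can be sketched as below. The sketch assumes that each method's weight is its validation performance score (for example, AUPR) divided by the sum of all scores, that the fused score of a pair is the weighted sum of the three methods' score matrices, and that the `top_k` highest-scoring unlabeled pairs are relabeled as positives. These choices and the function names are illustrative assumptions; the paper's exact weighting formula is not reproduced here.

```python
def fuse_weighted(score_matrices, performance):
    """Fuse per-method score matrices with weights proportional to performance."""
    total = sum(performance)
    weights = [p / total for p in performance]  # assumed: normalized to sum to 1
    rows, cols = len(score_matrices[0]), len(score_matrices[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, score_matrices))
             for j in range(cols)]
            for i in range(rows)]


def revise_interactions(interactions, fused, top_k):
    """Return a copy of the 0/1 matrix with the top_k best unlabeled pairs set to 1."""
    candidates = [(fused[i][j], i, j)
                  for i in range(len(interactions))
                  for j in range(len(interactions[0]))
                  if interactions[i][j] == 0]  # only unlabeled pairs are eligible
    candidates.sort(reverse=True)
    revised = [row[:] for row in interactions]
    for _, i, j in candidates[:top_k]:
        revised[i][j] = 1
    return revised


# Toy example: hypothetical scores from the similarity, random-walk, and WNN-GIP screens.
Y = [[1, 0], [0, 0]]
s_sim = [[0.9, 0.2], [0.8, 0.1]]
s_rwr = [[0.9, 0.4], [0.6, 0.1]]
s_wnn = [[0.9, 0.3], [0.7, 0.0]]
fused = fuse_weighted([s_sim, s_rwr, s_wnn], performance=[0.5, 0.3, 0.2])
Y_revised = revise_interactions(Y, fused, top_k=1)  # pair (1, 0) becomes positive
```

The original matrix is left untouched, so the revised copy can be used only for training while evaluation still compares against the known interactions.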
In this paper, based on the chemical structure information of the KEGG LIGAND database [
Based on the selected highly similar drugs and the abovementioned hypothesis, pairs that may interact are selected, and the existing interaction matrix is revised to obtain a new interaction matrix. Predicting on the revised datasets can reduce the false-negative errors caused by treating unlabeled samples as negative samples. The process of revising the interaction matrix based on drug similarity is shown in Figure
Screening based on drug similarity.
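Screening based on drug similarity can be illustrated with the following sketch. It assumes the rule that a drug highly similar to a drug known to interact with a target is also likely to interact with that target, and it uses a hypothetical similarity cutoff `threshold=0.8`; how the paper selects the highly similar drugs is not specified in this section.

```python
def screen_by_drug_similarity(drug_sim, interactions, threshold=0.8):
    """Return unlabeled (drug, target) pairs suggested by highly similar drugs.

    Assumed rule: if drug i interacts with target t and sim(i, j) >= threshold,
    the unlabeled pair (j, t) is treated as a likely interaction.
    """
    n_drugs, n_targets = len(interactions), len(interactions[0])
    candidates = set()
    for i in range(n_drugs):
        for j in range(n_drugs):
            if i == j or drug_sim[i][j] < threshold:
                continue
            for t in range(n_targets):
                if interactions[i][t] == 1 and interactions[j][t] == 0:
                    candidates.add((j, t))
    return sorted(candidates)


# Drugs 0 and 1 are highly similar; drug 2 is dissimilar to both.
drug_sim = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.1], [0.2, 0.1, 1.0]]
Y = [[1, 0], [0, 1], [1, 0]]
print(screen_by_drug_similarity(drug_sim, Y))  # [(0, 1), (1, 0)]
```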
In Figure
The abovementioned selected pairs are added to the positive samples in the training set, and the transition matrix of the random walk is represented by
The random walk process in a heterogeneous network can be written as follows:
In the training set, the abovementioned selected pairs are added to the positive samples, and
The weight of each method represents its contribution to the result and is determined by its prediction performance: a method that performs well contributes more to the result and receives a larger weight. The final matrix
In this paper, we adopt datasets summarized in the literature [
Summary of the datasets.
Dataset | Drugs | Targets | Drug-target interactions | Unknown interactions |
---|---|---|---|---|
NR | 54 | 26 | 90 | 1314 |
GPCR | 223 | 95 | 635 | 20550 |
IC | 210 | 204 | 1476 | 41364 |
E | 445 | 664 | 2926 | 292554 |
Pairs with known interactions account for only a small fraction of the available data and most relationships are unknown, so the current datasets contain few positive samples; for example, the E dataset contains 2926 known interactions but 292554 unknown pairs. Because unlabeled samples dominate, the datasets are highly imbalanced, and a single evaluation index cannot evaluate our method comprehensively. Therefore, four basic indexes, accuracy, sensitivity, specificity, and precision, are used to assess the model.
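The four indexes follow from the confusion matrix of predicted and known labels. The sketch below is a generic illustration with hypothetical function names; by convention here, a zero denominator yields 0.0.

```python
def safe_div(num, den):
    """Division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def basic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision from 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),  # true-positive rate (recall)
        "specificity": safe_div(tn, tn + fp),  # true-negative rate
        "precision": safe_div(tp, tp + fp),
    }


print(basic_metrics([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]))
# {'accuracy': 0.6, 'sensitivity': 0.5, 'specificity': 0.6666666666666666, 'precision': 0.5}
```

With many more negatives than positives, even a small false-positive rate produces many false positives relative to true positives, so precision can stay low while accuracy and specificity are high.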
To better demonstrate the superiority of the proposed method, the receiver operating characteristic (ROC) curve is also used to assess DTI prediction performance. The ROC curve plots the true-positive rate (ordinate) against the false-positive rate (abscissa). The closer the ROC curve is to the top-left corner, the more accurate the DTI prediction method. The ROC curve combines sensitivity and specificity graphically, which allows a simple and intuitive analysis of prediction accuracy. The values of AUC and AUPR are also reported. AUC is the area under the ROC curve and ranges from 0 to 1 [
To verify the validity of the dataset revised by multisource information fusion, we compared the proposed method with BLM-NII in terms of accuracy, sensitivity, specificity, and precision, using 10-fold cross-validation. All prediction results are sorted, and the top 1% of pairs are taken as predicted positive samples. The accuracy, sensitivity, specificity, and precision are then obtained by comparing these predictions with the known datasets. Table
Comparison of accuracy, sensitivity, specificity, and precision between our method and BLM-NII.
Dataset | Method | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) |
---|---|---|---|---|---|
NR | BLM-NII | 91.66 | 71.43 | 92.70 | 33.33 |
NR | Ours | 93.06 | 85.71 | 93.43 | 40.0 |
GPCR | BLM-NII | 92.28 | 88.89 | 92.38 | 26.29 |
GPCR | Ours | 92.75 | 96.83 | 92.62 | 28.64 |
IC | BLM-NII | 93.28 | 92.22 | 93.32 | 35.90 |
IC | Ours | 93.56 | 95.81 | 93.47 | 37.30 |
E | BLM-NII | 90.81 | 92.83 | 90.79 | 8.76 |
E | Ours | 90.86 | 95.70 | 90.82 | 9.04 |
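The evaluation protocol (10-fold cross-validation, then labeling the top 1% of ranked pairs as positive) can be sketched as follows. The sketch assumes that folds are drawn over all drug-target pairs, that known interactions falling in the test fold are hidden from the training matrix, and that the top-1% cutoff is applied to the list of scores being evaluated; whether the paper ranks per fold or over all folds is not stated here, and the function names are hypothetical.

```python
import random


def kfold_pair_splits(n_drugs, n_targets, n_folds=10, seed=0):
    """Shuffle all (drug, target) pairs and split them into n_folds disjoint folds."""
    pairs = [(d, t) for d in range(n_drugs) for t in range(n_targets)]
    random.Random(seed).shuffle(pairs)
    return [pairs[k::n_folds] for k in range(n_folds)]


def mask_fold(interactions, test_pairs):
    """Training copy of the interaction matrix with test-fold pairs hidden (set to 0)."""
    train = [row[:] for row in interactions]
    for d, t in test_pairs:
        train[d][t] = 0
    return train


def label_top_fraction(scores, fraction=0.01):
    """Mark the highest-scoring `fraction` of entries as positive (at least one)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = [0] * len(scores)
    for i in ranked[:k]:
        predicted[i] = 1
    return predicted


folds = kfold_pair_splits(n_drugs=5, n_targets=4)  # 20 pairs -> 10 folds of 2
train = mask_fold([[1, 1], [0, 1]], [(0, 1)])      # [[1, 0], [0, 1]]
labels = label_top_fraction([i / 200 for i in range(200)])  # top 1% = 2 pairs
```

The binary labels produced this way are what the four indexes in the table are computed from; because the cutoff is a fixed fraction, precision depends strongly on how many true positives a dataset contains.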
From Table
To analyze the capability of our method more intuitively, Figure
ROC of the proposed method in each dataset. (a) ROC in NR. (b) ROC in GPCR. (c) ROC in IC. (d) ROC in E.
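AUC and AUPR can be computed directly from ranked scores. In this sketch, AUC is computed as the probability that a randomly chosen positive pair outscores a randomly chosen negative pair (ties count as one half), which equals the area under the ROC curve, and AUPR is approximated by average precision, a common step-wise estimate. The paper may use a different AUPR interpolation, and the function names are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise AUPR estimate: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0


y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(y, s))                        # 0.75
print(round(average_precision(y, s), 4))    # 0.8333
```

The pairwise AUC loop is quadratic and only suited to small examples; for datasets the size of E, a rank-based formula or an existing metrics library would be preferable.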
To demonstrate the validity of the multisource information fusion method, we compare the proposed method with the results obtained when the drug-target dataset is revised by a single method: (1) SIM, pairs selected based on drug similarity; (2) RS, pairs selected based on random walk with restart; and (3) WS, pairs selected based on WNN-GIP. AUC and AUPR are used as the objective evaluation indicators. The experimental results are shown in Table
Comparison of AUC and AUPR values between the proposed method and other single screening methods.
AUC/AUPR | NR | GPCR | IC | E |
---|---|---|---|---|
SIM | 0.922/0.586 | 0.960/0.547 | 0.978/0.777 | 0.982/0.686 |
RS | 0.909/0.567 | 0.936/0.483 | 0.976/0.830 | 0.972/0.687 |
WS | 0.908/0.582 | 0.943/0.518 | 0.984/0.718 | 0.971/0.569 |
Ours | 0.925/0.717 | 0.963/0.707 | 0.986/0.914 | 0.985/0.898 |
Influence of different fusion methods on the prediction of drug-target interactions.
AUC/AUPR | NR | GPCR | IC | E |
---|---|---|---|---|
AVE | 0.903/0.655 | 0.883/0.351 | 0.964/0.718 | 0.966/0.436 |
VOTE | 0.897/0.616 | 0.894/0.400 | 0.970/0.759 | 0.964/0.437 |
Ours | 0.925/0.717 | 0.963/0.707 | 0.986/0.914 | 0.985/0.898 |
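The two baseline fusion strategies can be sketched for comparison. AVE is assumed here to average the three methods' score matrices with equal weights, which is the weighted fusion with every weight equal to 1/3. VOTE is assumed to accept a pair when at least `min_votes=2` of the three methods select it; the paper's exact voting rule is not given in this section, and the function names are hypothetical.

```python
from collections import Counter


def fuse_average(score_matrices):
    """AVE: element-wise mean of the methods' score matrices (equal weights)."""
    n = len(score_matrices)
    rows, cols = len(score_matrices[0]), len(score_matrices[0][0])
    return [[sum(m[i][j] for m in score_matrices) / n for j in range(cols)]
            for i in range(rows)]


def fuse_vote(selections, min_votes=2):
    """VOTE: keep (drug, target) pairs selected by at least min_votes methods."""
    counts = Counter(pair for chosen in selections for pair in set(chosen))
    return sorted(pair for pair, votes in counts.items() if votes >= min_votes)


avg = fuse_average([[[0.2, 0.4]], [[0.4, 0.0]]])  # approximately [[0.3, 0.2]]
voted = fuse_vote([{(0, 1), (1, 0)}, {(0, 1)}, {(1, 0), (2, 2)}])
print(voted)  # [(0, 1), (1, 0)]
```

Neither baseline lets a better-performing screen count for more than a weaker one, which may help explain the lower AVE and VOTE scores in the table above.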
AUC and AUPR values of our method and several state-of-the-art methods in the NR dataset.
NR | AUC | AUPR |
---|---|---|
NetLapRLS | 0.808 | 0.457 |
BLM-NII | 0.903 | 0.655 |
WNN-GIP | 0.871 | 0.584 |
ALADIN | 0.664 | 0.310 |
MOLIER | 0.911 | 0.683 |
Ours | 0.925 | 0.717 |
AUC and AUPR values of our method and several state-of-the-art methods in the IC dataset.
IC | AUC | AUPR |
---|---|---|
NetLapRLS | 0.967 | 0.827 |
BLM-NII | 0.964 | 0.718 |
WNN-GIP | 0.953 | 0.653 |
ALADIN | 0.980 | 0.875 |
MOLIER | 0.983 | 0.912 |
Ours | 0.987 | 0.914 |
AUC and AUPR values of our method and several state-of-the-art methods in the GPCR dataset.
GPCR | AUC | AUPR |
---|---|---|
NetLapRLS | 0.913 | 0.590 |
BLM-NII | 0.882 | 0.350 |
WNN-GIP | 0.930 | 0.498 |
ALADIN | 0.946 | 0.680 |
MOLIER | 0.952 | 0.753 |
Ours | 0.963 | 0.707 |
AUC and AUPR values of our method and several state-of-the-art methods in the E dataset.
E | AUC | AUPR |
---|---|---|
NetLapRLS | 0.964 | 0.784 |
BLM-NII | 0.966 | 0.436 |
WNN-GIP | 0.957 | 0.748 |
ALADIN | 0.966 | 0.822 |
MOLIER | 0.985 | 0.897 |
Ours | 0.986 | 0.898 |
In Table
To demonstrate the validity of the weighted fusion method, we replace it with the average fusion method (AVE) and the voting fusion method (VOTE). The results are displayed in Table
AVE stands for the experimental results obtained by averaging DTI matrices
To verify the effectiveness of our method, we compared it with several state-of-the-art methods, which are as follows: (1) NetLapRLS [
Tables
In Table
In Table
Table
Table
In this paper, a DTI prediction method based on the weighted fusion of multisource information is proposed. In this method, samples with unknown interaction relationships are regarded as unlabeled samples. Samples that may interact but have not been verified experimentally are screened out, and the original dataset is revised according to the screening results. The experimental results show that the proposed weighted fusion method outperforms the averaging and voting methods and increases the effectiveness and reliability of the screening results. Compared with the state-of-the-art methods considered, the proposed method achieves the highest AUC on all four datasets and the highest AUPR on three of them. However, the method also has limitations. It performs better on datasets with more samples, and its generalization ability degrades on datasets with fewer samples; in particular, prediction accuracy on datasets with few positive samples needs to be improved. This may be partly due to restrictions introduced by the fusion model, and AUPR should be improved further. In the future, more biological information can be incorporated so that more drug-target pairs with known interactions are introduced; more known relationships reduce isolated nodes in the network, which makes edge relationships easier to predict. We will also further explore fusion methods, with the goal of finding a fusion model that can adapt flexibly to achieve a better fusion effect while imposing fewer constraints on the model.
The DTI prediction test data used to support the findings of this study were supplied by
The authors declare no conflicts of interest regarding the publication of this paper.
This work was supported in part by the National Natural Science Foundation of China under Grant nos. 62172139 and 61401308, Natural Science Foundation of Hebei Province under Grant nos. F2020201025, F2019201151, and F2018210148, Science Research Project of Hebei Province under Grant nos. BJ2020030 and QN2017306, Foundation of President of Hebei University under Grant no. XZJJ201909, Natural Science Foundation of Hebei University under Grant nos. 2014-303 and 8012605, and Open Foundation of Guangdong Key Laboratory of Digital Signal and Image Processing Technology (2020GDDSIPL-04). This work was also supported by the High-Performance Computing Center of Hebei University.