Network inference and local classification models have been shown to be useful for predicting new potential drug-target interactions (DTIs), thereby assisting drug discovery and drug repositioning. The idea is to represent drugs, targets, and their interactions as a bipartite network or an adjacency matrix. However, existing methods have not yet adequately addressed several issues: the failure of inference across isolated subnetworks, the biased classifiers that result from insufficient positive samples, the need to train many local classifiers, and the lack of an observable relationship between known DTIs and unapproved drug-target pairs (DTPs). More effective approaches to these issues are therefore desirable. In this paper, after presenting improved drug similarities and target similarities, we characterize each DTP as a feature vector of within-scores and between-scores, which offers the following advantages: (1) a uniform vector representation for all types of DTPs, (2) a single global classifier that is less biased because it is trained on adequate positive samples, and (3) more importantly, a visualized relationship between known DTIs and unapproved DTPs. We demonstrate the effectiveness of our approach by comparing it with other popular methods under cross validation and by predicting potential interactions among DTPs, which we validate against existing databases.
Since experimental determination of compound-protein interactions or potential drug-target interactions remains very challenging (e.g., requiring a huge amount of money and taking a very long period) [
In terms of the DTI network, predicting new potential DTIs is equivalent to predicting new edges in the network. Researchers developed the network-based inference (NBI) model to deduce potential interactions among unapproved DTPs in given DTI networks and further confirmed them from
With a different idea of regarding similarity matrices of drugs and targets as kernel matrices, kernel-based techniques of classification, such as bipartite local model (BLM) [
To summarize, three issues remain unsolved in existing predictive models. (1) Predicting interactions between drugs and targets that lie in isolated subnetworks of the DTI network is difficult. (2) Inadequate positive samples usually cause biased local classifiers, and the local classification approach requires many classifiers. (3) The global relationship between approved DTIs and unapproved DTPs cannot be investigated in a consistent space.
Except for the predictive model, similarity measuring is another crucial factor in DTI prediction because similar drugs tend to interact with similar targets [
In this paper, we believe that the difference between the similarities of drugs/targets sharing targets/drugs and the similarities of drugs/target sharing no target/drug in DTI network should be statistically significant. To address abovementioned issues, we first characterized each drug-target pair from the views of both drugs and targets, respectively. Under the publicly acceptable assumption that similar drugs tend to target similar protein receptors [
Subsequently, we represented each drug-target pair as a feature vector that uniformly consists of four scores, regardless of whether a path exists between the drug and the target. Each drug-target pair was labeled as a positive or negative sample, depending on whether it is an approved DTI or an unapproved DTP. Using all DTIs guarantees that enough positive samples are available to train a single global classifier. After performing principal component analysis on the feature vectors, we generated a drug-target pair space that provides a visual way to investigate the relationship between known DTIs and unapproved DTPs.
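The four-score representation can be illustrated with a minimal sketch. It assumes that a within-score is the highest similarity between the drug (or target) of a pair and the other drugs (or targets) that share its partner, and that a between-score is the highest similarity to those that do not. The names `pair_features`, `A`, `S_d`, and `S_t` are hypothetical, and the use of the maximum is an assumption of this sketch rather than the definition used in this paper.

```python
import numpy as np

def pair_features(A, S_d, S_t, i, j):
    """Four-score vector for drug i and target j (assumed definition)."""
    A = np.asarray(A, dtype=bool)
    S_d = np.asarray(S_d, dtype=float)
    S_t = np.asarray(S_t, dtype=float)

    drugs_of_j = A[:, j].copy()       # drugs known to interact with target j
    other_drugs = ~A[:, j]            # drugs with no known link to target j
    targets_of_i = A[i, :].copy()     # targets known to interact with drug i
    other_targets = ~A[i, :]          # targets with no known link to drug i

    # The pair itself must not contribute to its own scores.
    drugs_of_j[i] = False
    other_drugs[i] = False
    targets_of_i[j] = False
    other_targets[j] = False

    def best(sim_row, mask):
        # Highest similarity inside the group, or 0.0 for an empty group.
        return float(sim_row[mask].max()) if mask.any() else 0.0

    return np.array([
        best(S_d[i], drugs_of_j),     # drug within-score
        best(S_d[i], other_drugs),    # drug between-score
        best(S_t[j], targets_of_i),   # target within-score
        best(S_t[j], other_targets),  # target between-score
    ])

# Toy example: 3 drugs, 2 targets.
A = np.array([[1, 0], [1, 1], [0, 1]])
S_d = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.5], [0.2, 0.5, 1.0]])
S_t = np.array([[1.0, 0.6], [0.6, 1.0]])

# One row per drug-target pair, in row-major order; labels mark approved DTIs.
X = np.array([pair_features(A, S_d, S_t, i, j)
              for i in range(A.shape[0]) for j in range(A.shape[1])])
y = A.ravel().astype(bool)
```

Because every pair receives a vector of the same length, pairs whose drug and target lie in different subnetworks are handled in exactly the same way as connected ones.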
In addition, to better combine topological similarity with chemical/sequence similarity, we proposed an adaptive combination rule in place of the former linear combination, and we introduced a complete metric of topological similarity of drugs/targets that considers both the targets/drugs shared by two drugs/targets and the targets/drugs interacting with neither of them.
Finally, on four benchmark datasets, we demonstrated the effectiveness of our approach by comparing it with NBI, BLM, and BLM’s extensions under cross validation and by predicting potential interactions among unapproved DTPs, which we checked against existing databases.
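As an illustration of a topological similarity that counts both shared partners and partners interacting with neither node, the sketch below uses the simple matching coefficient over interaction profiles. This coefficient is a stand-in chosen for illustration; it is not claimed to be the exact metric of this paper, and the names `matching_similarity` and `A` are hypothetical.

```python
import numpy as np

def matching_similarity(A):
    """Simple matching coefficient between the rows of a binary matrix.

    For two rows, the score is (number of columns where both are 1
    plus number of columns where both are 0) / number of columns.
    """
    A = np.asarray(A, dtype=float)
    both = A @ A.T                   # partners shared by the two nodes
    neither = (1 - A) @ (1 - A).T    # partners interacting with neither node
    return (both + neither) / A.shape[1]

# Toy interaction matrix: rows are drugs, columns are targets.
A = np.array([[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1]])

S_drug_topo = matching_similarity(A)      # drug-drug similarity
S_target_topo = matching_similarity(A.T)  # target-target similarity
```

In this toy matrix the first two drugs share one target and jointly miss two targets, so their similarity is 3/4, whereas the first and third drugs have complementary profiles and score 0.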
In this paper, the adopted datasets, involving targets of ENZYME, ION CHANNEL, GPCR, and NUCLEAR RECEPTOR, were originally from [
Four datasets used in this work.
Dataset name | #Drugs | #Targets | #Interactions | Proportion of unreachable paths between drugs | Proportion of unreachable paths between targets |
---|---|---|---|---|---|
EN | 445 | 664 | 2926 | 0.479 | 0.479 |
IC | 210 | 204 | 1476 | 0.019 | 0.029 |
GPCR | 223 | 95 | 635 | 0.345 | 0.593 |
NR | 54 | 26 | 90 | 0.615 | 0.778 |
# denotes the number of drugs, targets, or drug-target interactions in the dataset.
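The last two columns of the table can be understood through a small sketch that computes the fraction of node pairs with no connecting path. It assumes that two drugs (or two targets) are unreachable from each other when they fall in different connected components of the bipartite DTI network; the function names are hypothetical, and the exact counting convention behind the table may differ.

```python
import numpy as np
from itertools import combinations

def component_labels(A):
    """Label every drug and target by its connected component (union-find)."""
    A = np.asarray(A, dtype=bool)
    n_d, n_t = A.shape
    parent = list(range(n_d + n_t))   # drugs first, then targets

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Every known interaction merges the drug's and the target's components.
    for i, j in zip(*np.nonzero(A)):
        parent[find(int(i))] = find(n_d + int(j))

    labels = [find(x) for x in range(n_d + n_t)]
    return labels[:n_d], labels[n_d:]

def unreachable_fraction(labels):
    """Fraction of node pairs that lie in different components."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)

# Toy network: 4 drugs, 3 targets, three isolated subnetworks.
A = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
drug_labels, target_labels = component_labels(A)
drug_fraction = unreachable_fraction(drug_labels)      # 5 of 6 drug pairs
target_fraction = unreachable_fraction(target_labels)  # 3 of 3 target pairs
```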
The metrics of drug similarity and target similarity popularly adopted in former methods are chemical structure-based similarity and protein sequence-based similarity, respectively [
In order to capture the real similarity between drugs/targets sharing common targets/drugs in a better way, former methods tried to propose new similarities and integrate them into abovementioned similarities. Under the framework of BLM, Gaussian interaction profile (GIP) was introduced to measure topological similarity between drugs/targets by considering DTI matrix as the adjacent matrix of DTI network [
In former work [
We observed that topological similarity always works better when the drugs link to a target node of small degree, whereas chemical similarity always works better when the drugs link to a target node of large degree. Consequently, we designed an adaptive combination rule that is expected to achieve better prediction for MI. For target
A publicly acceptable assumption is that similar drugs tend to target similar protein receptors [
Given
For drug
For target
Totally, we group all interactions into four types according to DTI network (Figure
Topological motifs in drug-target network. (a) Multiple, (b) drug-centered, (c) target-centered, and (d) single pairs. Drugs and targets are denoted by circle nodes and rounded squares nodes, respectively. The pairs between concerned drugs (blue) and concerned targets (pink) are denoted by thick lines. The interactions between concerned nodes (filled by colors) and other nodes (hollow) are represented by dotted lines.
Either the target or the drug of a multiple interaction has >1 links to drugs or targets, respectively. The target of a drug-centered interaction has only one link to the drug interacting with >1 targets. The drug of a target-centered interaction has only one link to the target interacting with >1 drugs. Both the target and the drug of a single interaction only link to each other. A single interaction is usually newly approved [
With the representation of feature vector, we can map all drug-target pairs, including the pairs between new drugs and new targets, into the same space regardless of whether the drug and the target are in the same subnetwork or not.
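The four interaction types described above can be told apart by node degree alone. The sketch below is one reading of those definitions, in which a multiple interaction is taken to be one whose drug and target both have more than one link, so that the four types do not overlap. The function name `motif_type` is hypothetical.

```python
import numpy as np

def motif_type(A, i, j):
    """Classify the known interaction between drug i and target j."""
    A = np.asarray(A, dtype=bool)
    if not A[i, j]:
        raise ValueError("pair (i, j) is not a known interaction")
    drug_degree = int(A[i, :].sum())     # number of targets of drug i
    target_degree = int(A[:, j].sum())   # number of drugs of target j
    if drug_degree > 1 and target_degree > 1:
        return "multiple"
    if drug_degree > 1:
        return "drug-centered"    # the target links only to this drug
    if target_degree > 1:
        return "target-centered"  # the drug links only to this target
    return "single"

# Toy network: rows are drugs, columns are targets.
A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
types = {(int(i), int(j)): motif_type(A, i, j) for i, j in zip(*np.nonzero(A))}
```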
To check whether or not known interactions and unapproved pairs can be classified well in certain dimensions, we made the distributions of
The distributions between known interactions (four types of motifs) and unapproved drug-target pairs. All histograms were generated by sorting scores into specific bins from 0.35 to 1.1. The
Panels: multiple motifs, drug-centered motifs, target-centered motifs, and single motifs.
Known DTIs and unapproved DTPs show separations in terms of distributions of four scores. That is to say, they can be classified in certain dimensions (scores). In detail, (1) for multiple motifs (Figure
In terms of
On the other hand, both
Therefore, integrating all four scores together by combination, such as principal component analysis (PCA), can hopefully generate a better separation because known DTIs and unapproved DTPs can be classified in individual dimensions. After performing PCA on these four scores, we showed a space of drug-target pairs on the first three principal components (in Figure
Drug-target pair space. Unapproved DTPs are marked by cyan crosses. Approved DTPs of drug-centered, target-centered, single, and multiple motifs are marked by red circles, green triangles, yellow diamonds, and purple squares, respectively.
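A drug-target pair space of this kind can be produced with a few lines of linear algebra. The sketch below is a minimal PCA via singular value decomposition, assuming the four-score vectors are stacked as the rows of a matrix `X` (a hypothetical name) and centered before projection; the random matrix only stands in for real feature vectors.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Project the rows of X onto their first principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each score
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                  # placeholder for 50 four-score vectors
scores = pca_scores(X, n_components=3)        # coordinates in the pair space
distance_to_origin = np.linalg.norm(scores, axis=1)
```

Because the data are centered, the origin of the projected space corresponds to the mean feature vector, so the distance to the origin measures how atypical a pair is relative to the bulk of all pairs.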
In this section, we shall first demonstrate the effectiveness of our topological similarity metric and our adaptive combination of similarities, compare our approach with other popular methods, including NBI [
By applying PCA on feature vectors of all drug-target pairs, we used the distances of both known interactions and unapproved pairs to the origin as the confidence scores for both validating the performance of our approach and predicting potential drug-target interactions (more details in Section
To illustrate why our approach achieved better results, we first compared GIP similarity and our MI similarity under BLM framework and our approach, respectively. Using the topological similarities only, we selected the sparsest DTI network (NR dataset) from the work [
Comparison between topological similarities.
Method | GIP (AUC/AUPR) | MI (AUC/AUPR) |
---|---|---|
BLM | | |
Ours | | |
Then, we also applied a linearly weighted combination to integrate MI with chemical structure similarity and sequence similarity, respectively, in our approach. The linear combination achieved an AUC of 0.977 and an AUPR of 0.826, while the adaptive combination achieved 0.982 and 0.949. Again, the adaptive combination outperforms the linear combination.
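The difference between a fixed linear weighting and a degree-dependent weighting can be sketched as follows. The function `degree_weighted_combination`, its decay form, and the `scale` parameter are purely hypothetical; they only illustrate the idea that topological similarity should count for more when the partner node has a small degree, and they are not the adaptive rule used in this paper.

```python
import numpy as np

def linear_combination(S_topo, S_chem, alpha=0.5):
    """Fixed-weight mix of topological and chemical/sequence similarity."""
    return alpha * S_topo + (1 - alpha) * S_chem

def degree_weighted_combination(S_topo, S_chem, degree, scale=5.0):
    """Hypothetical adaptive mix: the weight on topology shrinks with degree."""
    w = 1.0 / (1.0 + degree / scale)   # w = 1 at degree 0, tends to 0 as degree grows
    return w * S_topo + (1 - w) * S_chem

S_topo = np.array([[1.0, 0.9], [0.9, 1.0]])
S_chem = np.array([[1.0, 0.3], [0.3, 1.0]])

low_degree = degree_weighted_combination(S_topo, S_chem, degree=1)
high_degree = degree_weighted_combination(S_topo, S_chem, degree=50)
```

With these toy matrices the combined similarity stays close to the topological value for the low-degree case and moves toward the chemical value for the high-degree case.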
To validate the effectiveness of our approach, we made a comparison with other approaches [
Comparison with three other methods under LOOCV (AUC/AUPR).
Dataset | BLM* | BLM-GIP* | Our* | NBI# | BLM-GIP# | BLM-NII# | Our# |
---|---|---|---|---|---|---|---|
EN | 0.976/0.833 | 0.966/0.845 | | 0.975 | 0.978/0.915 | 0.988/0.929 | |
IC | 0.973/0.781 | 0.971/0.807 | | 0.976 | 0.984/0.943 | 0.990/ | |
GPCR | 0.955/0.667 | 0.947/0.660 | | 0.946 | 0.954/0.790 | 0.984/0.865 | |
NR | 0.881/0.612 | 0.864/0.547 | | 0.838 | 0.922/0.684 | 0.981/0.866 | |
#Combining topological similarity (MI) with chemical similarity and sequence similarity, respectively. NBI reports only AUC values, obtained under 5-fold cross validation (5CV), which is statistically the same as LOOCV when the number of samples is sufficient.
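For reference, the two measures reported in the table can be computed from confidence scores as in the sketch below. It assumes AUC is taken as the fraction of positive-negative pairs ranked correctly and AUPR is estimated by average precision, which is one common estimator among several; the function names are hypothetical.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]     # every positive vs every negative
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())  # ties count half

def average_precision(y_true, scores):
    """AUPR estimated as the mean precision at the rank of each positive."""
    y = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())

y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = auc_score(y_true, scores)            # 5 of 6 pairs ordered correctly
aupr = average_precision(y_true, scores)   # mean of 1/1 and 2/3
```

The pairwise comparison in `auc_score` is quadratic in memory and is meant for small examples; for datasets of the size used here a rank-based formulation would be preferable.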
Moreover, our approach has other advantages. First, it retains a sufficient number of positive samples (all known DTIs) even when the number of negative samples is large, whereas BLM may suffer from biased classifier models because each of its local models is trained on few positive samples (sometimes only 0 or 1). Second, our approach needs to train only one classifier, whereas BLM and its extensions need to build many classifiers, one for each target and each drug. Last but most importantly, with the feature vector representation, we are able to place all drug-target pairs, including the pairs between new drugs and new targets, in the same space, regardless of whether the drug and the target of a pair are in the same subnetwork. Consequently, our approach is generally superior to the former approaches.
After performing PCA on feature vectors, we represented all DTPs as points shown by their first three principal components (denoted by
The significant distribution of DTPs in the space allows us to visually investigate the relationship between known DTIs and unapproved DTPs. Therefore, after calculating the distances of all pairs to the origin, we are not only able to build classifiers by training a specific threshold of the distances when testing the performance of our proposed method (refer to Sections
According to the distribution in DTP space, the farther a pair is from the origin, the more likely it is to be a potential interaction. Thus, we focused only on the unapproved drug-target pairs remarkably far from the origin. To validate them, for each dataset we selected the five pairs farthest from the origin as interaction candidates and checked them in popular drug/compound databases: ChEMBL (C), DrugBank (D), and KEGG (K). Since ChEMBL provides predicted interactions (not yet approved), we only selected the most confident interactions with the score of 1 under the cut-off of 1
The top five predicted interactions for each dataset.
Rank | EN validation | EN pair | IC validation | IC pair | GPCR validation | GPCR pair | NR validation | NR pair |
---|---|---|---|---|---|---|---|---|
1 | D | D05458 | D, K | D00438 | C | D03966 | C, K | D00348 |
2 | D | D00947 | - | D00619 | - | D03966 | C, K | D00348 |
3 | - | D00039 | - | D00816 | - | D01346 | - | D01132 |
4 | - | D00437 | D | D00619 | K | D00442 | C | D00348 |
5 | - | D03365 | - | D00619 | - | D00049 | C | D00348 |
C, D, and K label the validated interactions in ChEMBL, DrugBank, and KEGG, respectively.
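The selection of candidates by distance to the origin can be sketched as below. The sketch assumes that `scores` holds the principal component coordinates of all pairs in row-major order of the interaction matrix `A`; both names and the function `top_candidates` are hypothetical.

```python
import numpy as np

def top_candidates(scores, A, k=5):
    """Return the k unapproved pairs farthest from the origin."""
    A = np.asarray(A, dtype=bool)
    dist = np.linalg.norm(np.asarray(scores, dtype=float), axis=1).reshape(A.shape)
    dist = np.where(A, -np.inf, dist)          # mask out known interactions
    flat = np.argsort(-dist, axis=None, kind="stable")[:k]
    rows, cols = np.unravel_index(flat, A.shape)
    return [(int(i), int(j), float(dist[i, j]))
            for i, j in zip(rows, cols) if np.isfinite(dist[i, j])]

# Toy example: 2 drugs, 2 targets, one known interaction at (0, 0).
A = np.array([[1, 0],
              [0, 0]])
scores = np.array([[3.0, 0.0, 0.0],    # pair (0, 0), known
                   [0.0, 1.0, 0.0],    # pair (0, 1)
                   [0.0, 0.0, 2.0],    # pair (1, 0)
                   [0.5, 0.0, 0.0]])   # pair (1, 1)
candidates = top_candidates(scores, A, k=2)
```

Each returned tuple holds the drug index, the target index, and the distance that serves as the confidence score; the known interaction is excluded even though it lies farthest from the origin.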
In this paper, we have addressed crucial issues in predicting drug-target interactions that have not yet been solved well by former methods. These issues include the failure of inference across isolated subnetworks, the biased classifiers that result from few positive samples, the need to train many classifiers, and the lack of an observable relationship between known DTIs and unapproved DTPs.
By characterizing each drug-target pair as a feature vector of within-scores and between-scores, our approach has the following advantages: (1) all types of drug-target pairs are treated in the same form, regardless of whether a path exists between the drug and the target; (2) enough positive samples are available to reduce the bias of the trained model, and only one classifier needs to be trained; (3) more importantly, the relationship between known DTIs and unapproved DTPs can be investigated in the same visualized space.
In addition, to capture similarity better, we have introduced a complete metric of topological similarity of drugs/targets that considers both the targets/drugs shared by two drugs/targets and the targets/drugs interacting with neither of them. We have also proposed an adaptive combination rule, in place of the former linear combination of topological similarity and chemical/sequence similarity, by considering that the degrees of drug/target nodes follow a power-law distribution.
Finally, the effectiveness of our approach is demonstrated by comparing it with existing popular methods under cross validation and by predicting potential interactions among DTPs, which are validated against existing databases.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by Hong Kong Scholars Program (no. XJ2011028) and China Postdoctoral Science Foundation (no. 2012M521803) and was partially supported by NWPU Foundation for Fundamental Research (no. JCY20130137).