Drug-Target Interaction Prediction Based on Adversarial Bayesian Personalized Ranking

The prediction of drug-target interactions (DTIs) is a key step in drug repositioning. In recent years, many studies have used matrix factorization to predict DTIs, but they rely only on known DTIs and ignore the features of drug and target expression profiles, which limits prediction performance. In this study, we propose a new DTI prediction model named AdvB-DTI. In this model, the features of drug and target expression profiles are combined with Adversarial Bayesian Personalized Ranking through matrix factorization. First, a set of ternary partial order relationships is generated from the known drug-target relationships. Next, these partial order relationships are used to train the latent factor matrices of drugs and targets with the Adversarial Bayesian Personalized Ranking method, and the matrix factorization is improved by the features of drug and target expression profiles. Finally, the scores of drug-target pairs are obtained as the inner product of latent factors, and DTI prediction is performed based on the score ranking. The proposed model uses the idea of learning to rank to overcome the problem of data sparsity, and perturbation factors are introduced to make the model more robust. Experimental results show that our model achieves better DTI prediction performance than the compared methods.


Introduction
Drug repositioning aims to discover new indications for existing drugs. Because drug development based on approved drugs does not need to reconsider the safety and effectiveness of the original drug, it effectively reduces the time and cost of the drug development process. Prediction of drug-target interactions (DTIs), which refers to the recognition of interactions between chemical compounds and protein targets in the human body, has become a key step in drug repositioning [1].
Due to the high cost of conducting animal experiments and clinical trials for a new drug [2], a large number of machine learning-based methods have been widely used in DTI prediction in recent years, and the cost of drug development has been greatly reduced through rapid screening of potential drug-target combinations [3,4].
Existing machine learning-based methods often use the features of drugs and targets for prediction [5,6]. They treat the prediction problem as a binary classification problem [7]. Drug-target pairs with interaction are considered positive samples, while pairs without interaction are treated as negative samples. The output of the binary classification is the label with the higher prediction probability [8][9][10]. Bleakley and Yamanishi used a support vector machine (SVM) framework based on bipartite local models (BLM) to predict DTIs [11]. Mei et al. improved the original DTI prediction framework by integrating neighbor-based interaction-profile inferring (NII) into the existing BLM method [12]. Buza and Peška extended the BLM method to predict DTIs by using a hubness-aware regression technique [13]. Laarhoven et al. proposed a Gaussian interaction profile (GIP) kernel to represent the interactions between drugs and targets [14] and then integrated the weighted nearest neighbor method into it to predict DTIs [15]. Chen et al. proposed a Random Walk with Restart-based method on a heterogeneous network to infer potential DTIs [16]. Some studies constructed a heterogeneous network that integrates diverse drug-related information to predict DTIs [17,18]. Thafar et al. utilized graph embedding for DTI prediction [19]. Zhao et al. integrated a graph convolutional network and a deep neural network to predict DTIs [20]. Since the number of positive samples is small, machine learning-based methods can easily learn to predict unknown samples as negative to reduce the training penalty [3]. A recommendation system aims to obtain accurate predictions for unknown data even with a small amount of observed data. In particular, learning to rank (LTR), as used in recommendation systems, can predict accurately even when the known data are sparse. Therefore, in this study, we define the DTI prediction problem as a ranking problem.
The following paragraph introduces how we define the DTI prediction problem as a ranking problem.
LTR implies a scoring mechanism in which interacting drug-target pairs should have a higher score than those without interaction. In this way, samples with higher scores are treated as interacting drug-target pairs [21,22]. Recently, some studies have applied the idea of LTR to predict DTIs [23,24]. Bagherian et al. showed that matrix factorization algorithms have outperformed other methods in DTI prediction [25]. Thus, we use matrix factorization-based LTR to predict DTIs in this study. Bayesian Personalized Ranking (BPR), a matrix factorization-based LTR approach, has been shown to be an excellent approach for various preference learning tasks even when data are sparse [26,27].
However, the existing methods do not effectively combine the features of drugs and targets with the matrix factorization method. Thus, in this study, we propose a DTI prediction model that has BPR at its core and incorporates gene expression data to improve the prediction performance. In the proposed model, the principle of ordering is that interacting drug-target pairs (i.e., positive samples) should be ranked before noninteracting drug-target pairs (i.e., negative samples). First, a set of ternary partial orders is generated based on the positive samples and the negative samples. The set is divided into a training set and a test set. Next, the Adversarial Bayesian Personalized Ranking (ABPR) method is used to train the latent factors of drugs and targets, and the drug-drug similarity and target-target similarity are calculated based on their features, respectively, to improve the training of the latent factors. Finally, for each drug, the inner product of the drug's latent factor and the target's latent factor is used as the score for ranking. The top-ranked drug-target pairs are predicted to interact, and the bottom-ranked drug-target pairs are predicted not to interact. This study makes the following three contributions: (i) to address the existing problem of DTI prediction, the idea of matrix factorization-based LTR is introduced to process a sparse matrix; (ii) because BPR is not robust and is vulnerable to adversarial perturbations on its parameters [28], perturbation factors are introduced to make the model more robust; (iii) the drug and target expression profiles are used to calculate the drug-drug and target-target similarity, respectively, to improve the training of latent factors.
Experimental results show that our method is significantly better than the traditional DTI prediction methods, such as Deep Neural Network (DNN) [8,29] and Generalized Matrix Factorization (GMF) [30], and other state-of-the-art LTR methods, such as Neural Matrix Factorization (NeuMF) [30] and Adversarial Matrix Factorization (AMF) [28].

Data and Definition
2.1. Data Source. The Library of Integrated Network-Based Cellular Signatures (LINCS) project is a Common Fund program administered by the National Institutes of Health (NIH). This project uses L1000 technology to generate approximately one million gene expression profiles [31]. The L1000 technology uses the correlation between gene expressions to drastically reduce the number of genes whose expression needs to be measured, from more than 20,000 to 978. In this study, we use the drug perturbation and gene knockout transcriptome data from seven cell lines: A375, A549, HA1E, HCC515, HEPG2, PC3, and VCAP. There are three reasons to choose drug perturbation and gene knockout transcriptome data as the feature data of drugs and targets: (1) Both drug perturbation and gene knockout transcriptome data come from the LINCS project and are processed using L1000 technology, so they are naturally suited to be combined as feature data. (2) There is a correlation between drug perturbation transcriptome data and the transcriptome data of the drug's target gene knockout. Pabon et al. have verified that the drug perturbation-induced mRNA expression profile correlates with the knockout-induced mRNA expression profile of the drug's target gene and/or genes on the same pathway(s) [32]. This correlation reveals drug-target interactions and suggests that the expression profiles can be treated as feature data for dual similarity regularization. (3) Transcriptome data can capture the complexity of drug activity in cells, so the information obtained from transcriptional profiling studies has a large impact on multiple areas of drug discovery, including target identification, validation, compound selection, pharmacogenomics, biomarker development, clinical trial evaluation, and toxicology [33].
DrugBank is a comprehensive, freely available web resource containing detailed drug, drug-target, drug action, and drug interaction information about FDA-approved drugs as well as experimental drugs going through the FDA approval process [34]. To obtain complete DTI data, the PubChem ID is used as the identifier of drugs in the DrugBank and LINCS databases.
The data volume for the seven cell lines is listed in Table 1. The positive drug-target interactions from DrugBank are used to generate interacting drug-target pairs. To avoid treating unknown drug-target interactions in DrugBank as negative interactions, we constructed a nontarget set in which no member has an interaction record in DrugBank with any drug from the same cell line. Thus, a pair of a nontarget and a drug from the same cell line is more likely to be a true negative sample.
BioMed Research International

Problem Definition.
In this study, DTI prediction is defined as a ranking problem of drug-target scores.
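Definition 1. $D^{\alpha} = \{d^{\alpha}_1, d^{\alpha}_2, \cdots, d^{\alpha}_m\}$ represents the set of $m$ drugs in cell line $\alpha$, where $d^{\alpha}_i = \{d^{\alpha}_{i,1}, d^{\alpha}_{i,2}, \cdots, d^{\alpha}_{i,978}\}$ represents the expression profile of the $i$-th drug.
Definition 2. $T^{\alpha} = \{t^{\alpha}_1, t^{\alpha}_2, \cdots, t^{\alpha}_n\}$ represents the set of $n$ targets and nontargets in cell line $\alpha$, where $t^{\alpha}_j = \{t^{\alpha}_{j,1}, t^{\alpha}_{j,2}, \cdots, t^{\alpha}_{j,978}\}$ represents the expression profile of the $j$-th target or nontarget.
Definition 3. $Y^{\alpha}$ represents the interaction relationship, and $y^{\alpha}_{i,j} \in \{0, 1\}$. If $y^{\alpha}_{i,j} = 1$, the pair of the drug $d^{\alpha}_i$ and target $t^{\alpha}_j$ is a positive sample; otherwise, $y^{\alpha}_{i,j} = 0$, and the pair of $d^{\alpha}_i$ and $t^{\alpha}_j$ is a negative sample.
As shown in Table 1, the numbers of drugs, targets, and interacting drug-target pairs in this study are all limited (for each cell line). Therefore, $Y^{\alpha}$ is a small-sized sparse matrix.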
All combinations of drug and target with interactions in each cell line are used as positive samples; all drug and nontarget combinations are used to construct a negative sample candidate set. Since the number of negative samples is much larger than the number of positive samples in each cell line, we randomly sampled some negative samples from the negative sample candidate set to ensure that the number of selected negative samples is consistent with the number of positive samples within the same cell line.
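The balanced negative sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_balanced_negatives`, the fixed seed, and the toy drug and nontarget identifiers are all assumptions made for the example.

```python
import random

def sample_balanced_negatives(positives, candidates, seed=0):
    """Draw as many negatives as there are positives, without replacement."""
    if len(candidates) < len(positives):
        raise ValueError("not enough negative candidates")
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    return rng.sample(candidates, len(positives))

# Toy data for one cell line; identifiers are made up for illustration.
positives = [("d1", "t1"), ("d1", "t2"), ("d2", "t1")]
# Candidate set: every combination of a drug and a nontarget.
candidates = [(d, n) for d in ("d1", "d2") for n in ("n1", "n2", "n3")]
negatives = sample_balanced_negatives(positives, candidates)
```

Sampling without replacement keeps the selected negatives distinct, so the positive and negative sets have the same size within the cell line.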
Based on the known relationships of drug-target pairs, the score of drug-target pairs is sorted. The drug-target pairs with higher scores are more likely to interact. Conversely, the drug-target pairs with lower scores are more likely not to interact. Therefore, we transformed the DTI prediction problem into a problem that finds out a reasonable ranking strategy for a drug-target pair. In this paper, the methods are discussed in the same cell line, so the superscript α is omitted.

Methods
The proposed method (AdvB-DTI) is based on BPR. First, according to the interaction relationship $Y$, a ternary partial order set $H$ is generated. $H_i$ combines the target $t_j$ of one positive sample and the target $t_k$ of the corresponding negative sample with the same drug $d_i$ into a partially ordered triple $(d_i, t_j, t_k)$, which means that $(d_i, t_j)$ should be ranked before $(d_i, t_k)$. Then, $H$ is divided into two parts, a training set and a test set. Next, based on the training set, BPR is used to train the latent factor matrices of drugs and targets (nontargets). $F^D \in \mathbb{R}^{m \times f}$ represents the latent factor matrix of the drugs, and $F^T \in \mathbb{R}^{n \times f}$ represents the latent factor matrix of the targets (nontargets), where $f$ is the size of the latent factor. Among them, $F^D_i \in \mathbb{R}^{1 \times f}$ represents the latent factor of drug $d_i$, and $F^T_j \in \mathbb{R}^{1 \times f}$ represents the latent factor of target (nontarget) $t_j$. $r_{i,j} = F^D_i \cdot F^T_j$ is the predicted score for ranking the interaction of $d_i$ and $t_j$.
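As a rough sketch of the two ingredients just introduced, the snippet below builds partially ordered triples and computes the inner-product score. It assumes that each positive pair is combined with every sampled negative of the same drug; the text does not state whether the pairing is one-to-one or exhaustive, so this choice, the function names, and the toy indices are illustrative only.

```python
def build_triples(positives, negatives):
    """Combine each positive (drug, target) with the negatives of the same drug."""
    neg_by_drug = {}
    for drug, nontarget in negatives:
        neg_by_drug.setdefault(drug, []).append(nontarget)
    return [(drug, target, nontarget)
            for drug, target in positives
            for nontarget in neg_by_drug.get(drug, [])]

def score(F_D, F_T, i, j):
    """r_ij: inner product of the latent factors of drug i and target j."""
    return sum(a * b for a, b in zip(F_D[i], F_T[j]))

# Toy example: drugs 0..1, targets 0..1, nontargets 2..3 (indices are made up).
positives = [(0, 0), (0, 1), (1, 0)]
negatives = [(0, 2), (1, 2), (1, 3)]
triples = build_triples(positives, negatives)

F_D = [[1.0, 0.5], [0.0, 1.0]]                             # m x f
F_T = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0], [0.0, -1.0]]   # n x f
```

With these toy factors, drug 0 scores target 0 above nontarget 2, which is the ordering that the triple (0, 0, 2) asks for.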
To improve the training of latent factors, we use a dual similarity regularization method based on similarity theory, which increases the latent distance between latent factors and thereby widens the gap between the scores of different drug-target pairs.
Finally, gene expression data from the LINCS project are treated as the features of drugs and targets to calculate drug-drug similarity and target-target similarity, which improves the training of latent factors that represent key features of gene expression. Because the gene expression data are observed values obtained from experiments, an error exists between the observed value and the true value. Therefore, the latent factors of drugs and targets (i.e., the model parameters) learned in this study can fluctuate within a certain range, but the model's prediction results should remain stable. Consequently, the perturbation factor $\Delta$ is introduced into the training process of $F^D$ and $F^T$ to make the trained model more robust. The overall process of model training is shown in Figure 1.
After the model is trained, the value of $r_{i,j}$ is calculated for all drug-target pairs, and the pairs are sorted in descending order. The top-ranked drug-target pairs are predicted as interacting, and the bottom-ranked drug-target pairs are predicted as noninteracting. The prediction process is shown in Figure 2. Next, we introduce the related methods in detail.
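The prediction step can be illustrated with a short sketch that scores every drug-target pair and sorts the pairs in descending order. The function name `rank_pairs` and the toy latent factors are assumptions for this example; where to cut the ranking into predicted interactions and noninteractions is left open here.

```python
def rank_pairs(F_D, F_T):
    """Return (score, drug_index, target_index) tuples sorted by descending score."""
    scored = [(sum(a * b for a, b in zip(fd, ft)), i, j)
              for i, fd in enumerate(F_D)
              for j, ft in enumerate(F_T)]
    return sorted(scored, key=lambda s: s[0], reverse=True)

# Toy latent factors (made up): 2 drugs, 3 targets/nontargets, f = 2.
F_D = [[1.0, 0.0], [0.0, 1.0]]
F_T = [[2.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
ranking = rank_pairs(F_D, F_T)
top_pair = ranking[0][1:]   # (drug_index, target_index) with the highest score
```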

3.1. Bayesian Personalized Ranking. BPR is a pairwise LTR method. It learns in an implicit feedback manner through personalized ranking and is widely used in recommendation systems [26].
As shown in Table 1, the numbers of drugs, targets, and interacting drug-target pairs in this study are all limited (for each cell line). Since one partially ordered triple is generated from one positive sample and the corresponding negative sample, the number of partially ordered triples is also limited. Therefore, we faced not only a small number of partially ordered triples but also high-dimensional data. BPR is able to predict accurately even with a small amount of known data [26]. In addition, BPR can map both drugs and targets into a shared low-dimensional latent feature space and use this representation to calculate the probability of drug-target interactions, which overcomes the problem of high dimensionality [27].
According to [26], BPR was derived for the personalized ranking task in which only positive observations are available. In DTI prediction, only positive drug-target interactions can be obtained directly from the DrugBank database, which is a key challenge of the problem. Hence, these advantages make BPR suitable for the DTI prediction problem.
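In this study, we use this method to rank the scores of drug-target pairs. For drug $d_i$, the Bayesian formulation maximizes the posterior probability

$$p(\theta \mid t_j >_{d_i} t_k) \propto p(t_j >_{d_i} t_k \mid \theta)\, p(\theta), \tag{1}$$

where $\theta$ denotes the parameters of the model and $t_j >_{d_i} t_k$ denotes that, for $d_i$, the possibility of interacting with $t_j$ is greater than the possibility of interacting with $t_k$. Since the interaction of $d_i$ and $t_j$ does not interfere with the interaction of $d_i$ and $t_k$, all drug-target interactions are independent, so the likelihood of the whole set $H$ is the product over its triples:

$$\prod_{(d_i, t_j, t_k) \in H} p(t_j >_{d_i} t_k \mid \theta). \tag{2}$$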

[Figure residue. Recoverable diagram labels: "Targets & Non-targets latent factors", "Drug-target interactions", "Top ranking pairs".]
In order to calculate $p(t_j >_{d_i} t_k \mid \theta)$, we use the logistic sigmoid function [26]:

$$p(t_j >_{d_i} t_k \mid \theta) = \sigma(r_{i,j} - r_{i,k}), \tag{3}$$

where $\sigma(\cdot)$ is the logistic sigmoid function, $\sigma(x) = 1/(1 + e^{-x})$, and $(r_{i,j} - r_{i,k})$ captures the ranking relation between $t_j$ and $t_k$ for the given $d_i$. If $t_j$ is more likely to interact with $d_i$ than $t_k$, then $r_{i,j} - r_{i,k} > 0$. Any standard collaborative filtering model can be applied to predict the value of $(r_{i,j} - r_{i,k})$. Matrix factorization has been successfully applied in many studies [35][36][37]. Thus, the matrix factorization model is used in this study.
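To make the pairwise probability concrete, the following sketch evaluates the sigmoid of the score difference under a matrix factorization model. The helper names and the toy latent factors are assumptions for illustration; the numerically stable branch in `sigmoid` is a common implementation detail and is not taken from the paper.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def pairwise_prob(F_D, F_T, i, j, k):
    """Probability that drug i prefers target j over target k: sigmoid(r_ij - r_ik)."""
    r_ij = sum(a * b for a, b in zip(F_D[i], F_T[j]))
    r_ik = sum(a * b for a, b in zip(F_D[i], F_T[k]))
    return sigmoid(r_ij - r_ik)

# Toy latent factors (made up): one drug, two targets, f = 2.
F_D = [[1.0, 0.5]]
F_T = [[1.0, 1.0], [0.0, 0.0]]
p = pairwise_prob(F_D, F_T, 0, 0, 1)       # r_00 - r_01 = 1.5
p_rev = pairwise_prob(F_D, F_T, 0, 1, 0)   # reversed order of the two targets
```

Swapping the two targets gives the complementary probability, since sigmoid(x) + sigmoid(-x) = 1.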
Next, consider $p(\theta)$ in formula (1). It is a Gaussian distribution with zero mean and variance-covariance matrix $\lambda_{\theta} I$ [26], where $\lambda_{\theta}$ is a model-specific regularization parameter and $I$ is an identity matrix, so

$$p(\theta) \sim N(0, \lambda_{\theta} I). \tag{4}$$

According to formulas (2)-(4), the maximum posterior probability of the BPR method can now be rewritten as

$$\hat{\theta} = \arg\max_{\theta} \sum_{(d_i, t_j, t_k) \in H} \ln \sigma(r_{i,j} - r_{i,k}) - \lambda_{\theta} \|\theta\|^2, \tag{5}$$

where $\|\cdot\|^2$ is an L2 regularization term. From the maximum likelihood estimation for parameter $\theta$ in formula (5), an equivalent optimization objective, to be minimized, can be obtained:

$$L_{BPR}(H \mid \theta) = \sum_{(d_i, t_j, t_k) \in H} -\ln \sigma(r_{i,j} - r_{i,k}) + \lambda_{\theta} \|\theta\|^2. \tag{6}$$

3.2. Adversarial Bayesian Personalized Ranking. As mentioned, since an error exists between the observed value and the true value, it is necessary to consider gene perturbations in order to enhance the robustness of the model. It is unreasonable to add noise (such as changing the labels of training data) at the input layer. For example, modifying the training data $(d_i, t_j, t_k)$ to $(d_i, t_k, t_j)$ means that the noninteracting drug-target pair $(d_i, t_k)$ is ranked higher than the interacting drug-target pair $(d_i, t_j)$. Obviously, the latent factors obtained from such training data are unreasonable. Therefore, it is necessary to add perturbations to the latent factors instead. For drug and target gene perturbations, we define a perturbation factor that is added to the parameters of Bayesian Personalized Ranking:

$$L_{BPR}(H \mid \theta + \Delta), \quad \|\Delta\|_2 \le \varepsilon, \tag{7}$$

where $\Delta$ is the gene perturbation on the model parameters, $\varepsilon$ controls the magnitude of the adversarial perturbations, $\|\cdot\|_2$ denotes the L2 norm, and $\theta$ denotes the current model parameters (i.e., latent factors).
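The optimal $\Delta$ is given by the adversarial perturbation $\Delta_{adv}$ as follows [28]:

$$\Delta_{adv} = \arg\max_{\Delta,\ \|\Delta\|_2 \le \varepsilon} L_{BPR}(H \mid \theta + \Delta), \tag{8}$$

where $L_{BPR}$ is the BPR objective of formula (6). Finally, we define the objective function of ABPR as follows:

$$L_{ABPR}(H \mid \theta) = L_{BPR}(H \mid \theta) + \lambda L_{BPR}(H \mid \theta + \Delta_{adv}), \tag{9}$$

where $\lambda$ controls the adversarial strength. The training process of AdvB-DTI can be expressed as playing a minimax game:

$$\theta^{*}, \Delta^{*} = \arg\min_{\theta} \max_{\Delta,\ \|\Delta\|_2 \le \varepsilon} L_{BPR}(H \mid \theta) + \lambda L_{BPR}(H \mid \theta + \Delta), \tag{10}$$

where the learning algorithm for the model parameters (latent factors) $\theta$ is the minimizing player, which aims to obtain accurate prediction results, and the perturbation factor $\Delta$ acts as the maximizing player, which aims to identify the worst-case perturbations against the current model. By playing this minimax game, the model simulates the error during training and becomes robust to it.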
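The inner maximization is usually not solved exactly. A minimal sketch, assuming the fast-gradient approximation (the perturbation is the loss gradient rescaled to norm ε, a common choice in adversarial training), is shown below for a single triple. Whether AdvB-DTI uses exactly this approximation is an assumption, the regularization term is omitted for brevity, and all function names and values are illustrative.

```python
import math

def bpr_loss(p, qj, qk):
    """-ln sigmoid(r_ij - r_ik) for one triple; p, qj, qk are latent factors."""
    x = sum(a * (b - c) for a, b, c in zip(p, qj, qk))
    return math.log1p(math.exp(-x))

def adversarial_perturbation(p, qj, qk, eps):
    """Fast-gradient step (assumed): Delta = eps * g / ||g||, g = loss gradient."""
    x = sum(a * (b - c) for a, b, c in zip(p, qj, qk))
    coeff = -1.0 / (1.0 + math.exp(x))       # d loss / d x = sigmoid(x) - 1
    g_p = [coeff * (b - c) for b, c in zip(qj, qk)]
    g_qj = [coeff * a for a in p]
    g_qk = [-coeff * a for a in p]
    norm = math.sqrt(sum(v * v for v in g_p + g_qj + g_qk))
    if norm == 0.0:
        zero = [0.0] * len(p)
        return zero, list(zero), list(zero)
    s = eps / norm
    return [s * v for v in g_p], [s * v for v in g_qj], [s * v for v in g_qk]

def _add(u, v):
    return [a + b for a, b in zip(u, v)]

def abpr_loss(p, qj, qk, eps=0.5, lam=1.0):
    """Clean loss plus lam times the loss at the adversarially perturbed factors."""
    d_p, d_qj, d_qk = adversarial_perturbation(p, qj, qk, eps)
    adv = bpr_loss(_add(p, d_p), _add(qj, d_qj), _add(qk, d_qk))
    return bpr_loss(p, qj, qk) + lam * adv

# Toy latent factors for one triple (values are made up).
p, qj, qk = [1.0, 0.5], [1.0, 1.0], [0.0, 0.0]
clean = bpr_loss(p, qj, qk)
total = abpr_loss(p, qj, qk, eps=0.5, lam=1.0)
```

In this toy case the perturbed loss is larger than the clean loss, which is what the maximizing player is meant to achieve; the minimizing player then updates the latent factors against the sum of both terms.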

Dual Similarity Regularization.
In the process of latent factors training, when drugs or targets are similar, their latent distance should be small. Conversely, when drugs or targets are different, their latent distance should be large. In order to meet this requirement, dual similarity regularization was introduced into this process.
In order to effectively combine the features of drugs and targets with matrix factorization methods, a Gaussian function is introduced. Through this function, the features of drugs and targets can effectively influence the training of latent factors. Zheng et al. pointed out that this function is sensitive to the latent distance between different drugs or targets [38]. The similarity between drugs (or targets) is negatively related to their latent distance. In the definition of this function for drugs, $S^D \in \mathbb{R}^{m \times m}$ denotes the drug-drug similarity matrix, $\|\cdot\|^2$ denotes the latent distance, and $\mathrm{Sim}(\cdot)$ is a similarity calculation method.
The corresponding function for targets is obtained in the same way, where $S^T \in \mathbb{R}^{n \times n}$ denotes the target-target similarity matrix.
Commonly used similarity calculation methods include cosine similarity, Tanimoto coefficient, structural similarity index, and Spearman's rank correlation coefficient.
The Tanimoto coefficient is an extension of Intersection over Union. It can be used to measure the similarity of nonbinary features. It calculates the degree of correlation based on the magnitude of the feature vectors. The closer the calculation result is to 1, the more similar the two vectors are. It is defined as

$$\mathrm{Sim}_{Tanimoto}(x, y) = \frac{x \cdot y}{\|x\|^2 + \|y\|^2 - x \cdot y}. \tag{13}$$

Cosine similarity is determined by the angle between two vectors. The smaller the angle is, the more similar the two vectors are. It is defined as

$$\mathrm{Sim}_{cosine}(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|}. \tag{14}$$

The structural similarity index is a common similarity calculation method used in computer vision to measure image quality [39]. It is defined as

$$\mathrm{Sim}_{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}, \tag{15}$$

where $\mu$ is the mean, $\sigma^2$ is the variance, $\sigma_{xy}$ is the covariance, and $c_1 = 0.001$ and $c_2 = 0.001$ are constants to avoid the denominator being 0. The closer the calculation result is to 1, the more similar the two vectors are. Since technologies originating from computer vision have been widely used in DTI prediction in recent years, we attempt to use this method to calculate the similarity between drugs and between targets. Originally, $\mu$ is used as an estimate of the image brightness, $\sigma^2$ is an estimate of the image contrast, and $\sigma_{xy}$ is the measure of the similarity of the image structure. In our problem, $\mu$ is used as an estimate of the amount of change in gene expression, $\sigma^2$ is used as an estimate of the relative change in gene expression, and $\sigma_{xy}$ is used as an estimate of the change trend in gene expression.
Spearman's rank correlation coefficient is a similarity calculation method based on the ranking of feature data. It is defined as

$$\mathrm{Sim}_{Spearman}(x, y) = 1 - \frac{6 \sum_{i=1}^{n} g_i^2}{n(n^2 - 1)}, \tag{16}$$

where $g_i$ is the difference in the ranks of $x_i$ and $y_i$ and $n$ is the size of the features. For example, if $x = (1, 0, 3)$ and $y = (1, 5, 2)$, then the ranks of $x$ are $(2, 1, 3)$ and the ranks of $y$ are $(1, 3, 2)$; thus $g = (1, -2, 1)$. Similarly, the closer the similarity value is to 1, the more similar the two vectors are.
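The four similarity measures named in this section can be sketched in their standard textbook forms as follows. The function names are illustrative, population (divide-by-n) variance and covariance are assumed for the structural similarity index, and the rank helper does not handle ties.

```python
import math

def tanimoto(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def ssim(x, y, c1=0.001, c2=0.001):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n        # population variance (assumed)
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))

def ranks(v):
    """1-based ascending ranks; ties are not handled in this sketch."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman(x, y):
    n = len(x)
    g = [a - b for a, b in zip(ranks(x), ranks(y))]
    return 1 - 6 * sum(d * d for d in g) / (n * (n * n - 1))

x, y = [1.0, 0.0, 3.0], [1.0, 5.0, 2.0]
rho = spearman(x, y)            # g = (1, -2, 1), so rho = 1 - 36/24 = -0.5
scaled = [2.0 * a for a in x]   # same direction as x, twice the magnitude
```

For `x` and `scaled`, cosine similarity is 1 while the Tanimoto coefficient is 2/3, which illustrates that cosine similarity ignores feature magnitude and the Tanimoto coefficient does not.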
Because the Gaussian function is numerically "sensitive", it can increase the impact of similarity on latent factor training. Thus, it can extend the latent distance between drugs (or targets) and increase the values of different $(r_{i,j} - r_{i,k})$, which increases the penalty for wrong rankings and improves the training of latent factors.
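One plausible form of such a Gaussian-based similarity penalty is sketched below. It is an assumption for illustration, not the formula of this paper: it penalizes the squared gap between the feature similarity of two items and a Gaussian of their latent distance, and the width parameter `gamma` is hypothetical. Applying it once to the drug factors and once to the target factors would give the "dual" regularization.

```python
import math

def gaussian_similarity_penalty(S, F, gamma=1.0):
    """Sum over all pairs of (S[i][j] - exp(-gamma * ||F_i - F_j||^2))^2."""
    total = 0.0
    for i in range(len(F)):
        for j in range(len(F)):
            dist2 = sum((a - b) ** 2 for a, b in zip(F[i], F[j]))
            total += (S[i][j] - math.exp(-gamma * dist2)) ** 2
    return total

# Two items with high feature similarity (toy values).
S = [[1.0, 0.9], [0.9, 1.0]]
F_close = [[0.0, 0.0], [0.1, 0.0]]   # small latent distance
F_far = [[0.0, 0.0], [3.0, 0.0]]     # large latent distance
penalty_close = gaussian_similarity_penalty(S, F_close)
penalty_far = gaussian_similarity_penalty(S, F_far)
```

Under this assumed form, similar items placed far apart in latent space are penalized much more than similar items placed close together, which matches the requirement stated at the start of this section.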
We use stochastic gradient descent to optimize the final objective, in which $\lambda_{adv}$ and $\lambda_{sim}$ are the adversarial and similarity hyperparameters, respectively.
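As an illustration of the optimization step, the sketch below performs stochastic gradient descent on the plain BPR part of the objective only, for one triple at a time. The adversarial and similarity terms are deliberately left out, and the learning rate, regularization weight, and initial factors are assumed values.

```python
import math

def sgd_step(F_D, F_T, triple, lr=0.05, reg=0.01):
    """One in-place SGD update on -ln sigmoid(r_ij - r_ik) + reg * ||theta||^2."""
    i, j, k = triple
    p, qj, qk = F_D[i][:], F_T[j][:], F_T[k][:]   # copies of the old values
    x = sum(a * (b - c) for a, b, c in zip(p, qj, qk))
    coeff = -1.0 / (1.0 + math.exp(x))            # d loss / d x = sigmoid(x) - 1
    for f in range(len(p)):
        F_D[i][f] -= lr * (coeff * (qj[f] - qk[f]) + 2 * reg * p[f])
        F_T[j][f] -= lr * (coeff * p[f] + 2 * reg * qj[f])
        F_T[k][f] -= lr * (-coeff * p[f] + 2 * reg * qk[f])

# One drug, one target (index 0), one nontarget (index 1); initial margin is 0.
F_D = [[0.1, 0.1]]
F_T = [[0.1, -0.1], [-0.1, 0.1]]
for _ in range(200):
    sgd_step(F_D, F_T, (0, 0, 1))
margin = sum(a * (b - c) for a, b, c in zip(F_D[0], F_T[0], F_T[1]))
```

After training on the single triple, the margin r_ij - r_ik becomes positive, so the target is ranked before the nontarget for this drug.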

Experiment and Analysis
The experiments are designed to answer three questions.
The set of interacting drug-target pairs is called the positive set, and the set of noninteracting drug-target pairs is called the negative set. One drug-target pair is randomly selected from each of the positive set and the negative set. AUC is the probability that the model scores the pair from the positive set higher than the pair from the negative set. AUC reflects the overall performance of the model: the larger the value of AUC, the better the performance of the model.
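The probabilistic reading of AUC given above can be computed directly by comparing every positive score with every negative score. The sketch below does this by brute force; counting ties as one half is a common convention and an assumption here, and the scores are made up.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) score pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy scores: 5 of the 6 positive-negative comparisons are ordered correctly.
value = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```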
Top_k_i means, for drug $d_i$, the proportion of all targets that interact with $d_i$ that appear among the $k$ top-ranked drug-target pairs. Top_k is the average of all Top_k_i ($1 \le i \le m$):

$$\mathrm{Top\_}k = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Top\_}k_i.$$

This assessment metric is equivalent to the recall rate.
prec_k_i means, for drug $d_i$, the proportion of the $k$ top-ranked drug-target pairs whose targets interact with $d_i$. prec_k is the average of all prec_k_i ($1 \le i \le m$):

$$\mathrm{prec\_}k = \frac{1}{m} \sum_{i=1}^{m} \mathrm{prec\_}k_i.$$

This assessment metric is equivalent to the precision rate.
With different $k$ values, drug $d_i$ has different (Top_k_i, prec_k_i) pairs. Connecting all (Top_k_i, prec_k_i) points gives a curve. The area enclosed by this curve and the coordinate axes is the AUPR_i of $d_i$, which is also a comprehensive assessment metric. AUPR is the average of all AUPR_i ($1 \le i \le m$):

$$\mathrm{AUPR} = \frac{1}{m} \sum_{i=1}^{m} \mathrm{AUPR}_i.$$

The closer the value is to 1, the better the model performance.
4.2. Results and Analysis. We adopted 5-fold nested cross-validation to evaluate the performance of the proposed method, which means that when analyzing the impact of hyperparameters, we only utilized the training set. For a fair comparison, we tuned the parameters of each method so that each achieved its best performance. The hyperparameters used in the experiments and their values are listed in Table 2.
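The per-drug Top_k and prec_k metrics described above can be sketched as follows. The function names, the dictionary layout of the scores, and the toy values are assumptions made for this example.

```python
def top_k_and_prec_k(scores, interacting, k):
    """scores: {target: score} for one drug; interacting: set of its true targets.

    Returns (Top_k_i, prec_k_i), i.e. recall and precision among the k top-ranked targets.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for t in ranked if t in interacting)
    return hits / len(interacting), hits / k

def average_over_drugs(per_drug_values):
    """Average a per-drug metric over all m drugs."""
    return sum(per_drug_values) / len(per_drug_values)

# Toy scores for one drug (made up); t1 and t2 are its true targets.
scores = {"t1": 0.9, "t2": 0.2, "t3": 0.7, "t4": 0.1}
interacting = {"t1", "t2"}
top2, prec2 = top_k_and_prec_k(scores, interacting, k=2)   # top-2 is t1, t3
top3, prec3 = top_k_and_prec_k(scores, interacting, k=3)   # top-3 is t1, t3, t2
```

Sweeping k and collecting the (Top_k_i, prec_k_i) points gives the per-drug precision-recall curve whose area is AUPR_i.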
Matrix factorization methods demonstrated their power and versatility in bioinformatics, for example, in the prediction of disease subtype alignment [41], drug repositioning [42], and protease target prediction [37]. Thus, we treat a state-of-the-art method which predicts DTI via DNN [8] as baseline and compare it with other state-of-the-art matrix factorization methods [28,30]. Table 3 lists the results of comparative experiments of different similarity calculation methods performed independently in the seven cell lines. Four different methods were used for comparison.

Comparative Experiment of Different Similarity Calculation Methods.
From Table 3, it can be found that the prediction results of the Tanimoto coefficient are better than those of the other three methods in the seven cell lines. The performance based on Spearman's rank correlation coefficient is second to that of the Tanimoto coefficient in this experiment, and the two are very close. The traditional cosine similarity calculation method was unstable in the experiment, and its AUC is under 90% in cell lines A549 and HEPG2. The prediction performance of the structural similarity index is similar to that of Spearman's rank correlation coefficient. Except for cosine similarity, the three other similarity calculation methods all consider the values of the features when calculating the similarity. Cosine similarity only considers the angle between vectors. If two feature vectors have the same direction, they are considered similar regardless of the values of the features. From the results of cosine similarity, it can be inferred that ignoring feature values may cause poor prediction performance. Therefore, based on the above results, the Tanimoto coefficient is more suitable for this prediction problem.
Figure 3 reflects the relationship between the size of the latent factors and the result of Top_10. For example, when factor_size = 5, Top_10 ≈ 0.5. This means that the ten top-ranked drug-target pairs of a particular $d_i$ predicted by the model contain about half of all interacting drug-target pairs of this drug (i.e., the recall rate is about 0.5). The role of the latent factors is to map high-dimensional feature vectors to a low-dimensional latent space and capture the implicit features of gene expression. The larger the size of the low-dimensional latent space, the more feature information of the original high-dimensional drug and target expression can be extracted. That is why the value of Top_10 rises significantly with the increase of the latent factor size.
As shown in Figure 3, when the size of the latent factor increases to a critical size (e.g., factor_size > 25), the feature information is almost completely extracted, and the performance of AdvB-DTI becomes stable. Figure 4 shows the impact of $\lambda_{sim}$ on the values of AUC. When dual similarity regularization was not used (i.e., $\lambda_{sim} = 0$), the values of AUC are lower than those obtained with this method, which indicates that the method can improve the prediction performance.

Impact of Different Settings of Hyperparameters.
First, how does dual similarity regularization improve the training of latent factors? $r_{i,j}$ is the score used for ranking. The ranking interval between different drug-target pairs is calculated as the difference between their scores. If $\lambda_{sim}$ is set to a larger value, the latent distance between the drug and the target also becomes larger, and so does the difference between scores. Increasing the interval between different drug-target pairs therefore aggravates the penalty for the model when ranking errors occur during training. Thus, dual similarity regularization improves the training of latent factors.
Second, how should a proper value for $\lambda_{sim}$ be selected? The difference in $r_{i,j}$ between different drug-target pairs increases with $\lambda_{sim}$. Thus, the interval between different rankings increases. In cell lines with fewer positive samples, the model parameter $\theta$ will not be too large, and increasing $\lambda_{sim}$ can effectively improve the prediction performance. However, in cell lines with more positive samples, increasing $\lambda_{sim}$ means that $\theta$ needs to increase beyond the limit of its regularization term $\|\theta\|^2$, so the model underfits and the value of AUC decreases, as shown in Figure 4. AUC increases with $\lambda_{sim}$ but decreases when $\lambda_{sim}$ is greater than a critical value.
Therefore, in a cell line with fewer positive samples, a larger $\lambda_{sim}$ will improve the prediction performance; however, in a cell line with more positive samples, a smaller $\lambda_{sim}$ is suitable.
Figure 4: Impact of $\lambda_{sim}$ on AUC. AUC increases with $\lambda_{sim}$ but decreases when $\lambda_{sim}$ is greater than a critical value.
In the HEPG2 cell line, the number of positive samples is the smallest among the seven cell lines. In the PC3 cell line, the number of positive samples is the largest among the seven cell lines. Therefore, in this experiment, we select these two cell lines as representatives to study the impact of $\lambda_{adv}$ on prediction performance. In Figures 5(a) and 5(b), the curve of $\lambda_{adv} = 0$ represents the model without ABPR, and the other curves represent the model with ABPR. In the early stages of training, the values of AUPR with ABPR are better than those without ABPR. This is because, with ABPR, the parameters of the model can change within a certain range without changing the past prediction results, that is, the model learns new knowledge without forgetting the knowledge learned in the past. Thus, the prediction performance of the model can be improved effectively and quickly in the early stages of training, and using ABPR yields better performance in the early stage of training.
Because dual similarity regularization increases the difference between the scores of different drug-target pairs, the model parameters can withstand a certain range of perturbations, which improves the model prediction performance. However, when the value of $\lambda_{adv}$ exceeds a certain range, the model parameters cannot resist excessive perturbations because of the constraints of their regularization terms, which leads to the model being underfitted. Therefore, if $\lambda_{adv}$ is given a large value, the model converges fast. The upper bound of model convergence depends on the ability of the model parameters to resist the perturbations, which can be verified in the PC3 cell line. As shown in Figures 5(a) and 5(b), the larger $\lambda_{adv}$ is, the lower the upper bound of model convergence. When $\lambda_{adv} = 0.3$, the model obtained the best prediction performance.

Comparison with Other Methods. AdvB-DTI was compared with other state-of-the-art methods, and the prediction performances are listed in Table 4. The comparison methods include DNN [8], GMF [30], NeuMF [30], and AMF [28].
Xie et al. used a DNN framework [8] for DTI prediction based on transcriptome data in the L1000 database gathered from drug perturbation and gene knockout trials. We used the same configurations for DNN training.
NeuMF [30] is a deep learning matrix factorization framework for recommendation task with implicit feedback. In this method, DNN's input layer is defined as a latent vector instead of drug and target features. It is an improvement of GMF and DNN. To compare with NeuMF and GMF fairly, our model uses the same number of latent factors as NeuMF and GMF. AMF [28] is a state-of-the-art approach designed for item recommendation with users' implicit feedback. It introduces the concept of ABPR and improves the method of BPR [26].
The results of DNN are used as the baseline in Table 4. Since the DTI data are so sparse that each drug interacts with only a few targets, and DNN needs sufficient data for training, the performance of DNN is unsatisfactory. DNN uses the transcriptome data as the features of drugs and targets. However, the transcriptome data are noisy, which also limits its performance. As shown in Table 4, the other state-of-the-art matrix factorization methods perform better than the baseline.
When comparing AdvB-DTI with the other state-of-the-art matrix factorization methods (NeuMF, GMF, and AMF), we observe that utilizing only the relationship between drugs and targets cannot guarantee ideal prediction performance and that efficiently exploiting drug-drug and target-target similarity has a positive impact on performance.
Notice that the performance of AMF is second only to that of AdvB-DTI. This demonstrates that adding perturbations to the latent factors lets the model learn to tolerate noise, rather than training directly on noisy data as DNN does. That is the reason that AMF achieves better performance than the other models except AdvB-DTI.
NDCG is mainly used for evaluating ranking methods [43]. As our model is a ranking method, we compared AdvB-DTI with AMF, which has the best performance in Table 4 except AdvB-DTI, as shown in Table 5. It can be seen from the results that AdvB-DTI outperforms AMF and it is verified that AdvB-DTI can effectively deal with the class imbalance problem and the problem of data sparsity.
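As an illustration of the metric, the following minimal sketch computes NDCG at a cutoff k for one drug, given predicted scores and binary interaction labels for its candidate targets. It assumes the common formulation with a log2 discount and gain equal to the label; the function names are illustrative, and the exact variant used for Table 5 may differ.

```python
import math

def dcg_at_k(relevances, k):
    # The item at 0-based position `rank` is discounted by log2(rank + 2).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(scores, labels, k):
    """NDCG@k for one drug: labels are 1 for known interactions, 0 otherwise."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal = sorted(labels, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0
```

A ranking that places every interacting target ahead of every non-interacting one yields a value of 1, and the value decreases as interacting targets move down the list.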
Finally, we compared the computing resource consumption of these methods. All the algorithms were written using Python programming language and operated on a computer (Ubuntu 16.04.4 LTS, Core i9-7900X CPU, 3.3 GHz, 128 GB memory space). The algorithms were executed by CPU. We conducted 10 experiments in the cell line of A549, and each experiment concurrently executed 10 training procedures with 5-fold cross-validation. The average results are shown in Table 6.
It can be found that DNN has the largest memory cost because of its many parameters. GMF is a traditional matrix decomposition framework with simple structure and few parameters, so its memory cost is minimum. NeuMF is the framework of matrix decomposition combined with neural network, so its memory cost is slightly higher than that of GMF. AdvB-DTI improves AMF and NeuMF improves GMF. Comparing the two groups of models based on Tables 4 and 6, it can be found that the convergence time of the model is related to its final prediction performance, and the improvement of model performance may lead to the increase of training time. In addition, the neural network-based methods, such as DNN and NeuMF, take up a lot of CPU resources.
In summary, AdvB-DTI efficiently utilizes the similarity of drug-drug and target-target and the relationship of drugs and targets to train latent factors for drugs and targets to improve DTI prediction performance.

System Analysis of AdvB-DTI
After the comparison with other methods, we use the top 1% of all the prediction results to demonstrate the ability of our method to predict novel DTIs. In order to verify our model, all the known DTIs that were used in our model are removed from the discussion in this section, and the following analysis is for the A375 cell line.
We used r i,j to rank all predicted DTIs and calculated pair counts that overlap between the predicted results and the interactions from other databases. Then, we counted the number of overlapping pairs in the sliding bins of 500 consecutive interactions (as shown in Figure 6). It suggests that our model can predict novel DTIs validated by known knowledge in other databases. Considering that DTIs in CTD database are curated from the published literature, these interactions are both direct (e.g., "chemical binds to protein") and indirect (e.g., "chemical results in increased phosphorylation of a protein" via intermediate events); it is reasonable that CTD database covers a wider variety of drug-target interactions than other DTI databases.
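The bin-wise overlap count described above can be sketched as follows. The sketch assumes that the predicted pairs are already sorted by decreasing score and that each pair is a (drug, target) tuple. The `step` parameter is an assumption: `step = bin_size` gives consecutive non-overlapping bins, while a smaller step gives overlapping sliding windows. The function name is illustrative.

```python
def overlap_per_bin(ranked_pairs, reference_pairs, bin_size=500, step=500):
    """Count how many predicted pairs in each window also occur in a reference database.

    ranked_pairs: (drug, target) tuples sorted by decreasing predicted score.
    reference_pairs: iterable of (drug, target) tuples from another database.
    A trailing window with fewer than bin_size pairs is ignored, unless the
    whole list is shorter than one bin.
    """
    reference = set(reference_pairs)
    hits = [pair in reference for pair in ranked_pairs]
    counts = []
    for start in range(0, max(len(hits) - bin_size, 0) + 1, step):
        counts.append(sum(hits[start:start + bin_size]))
    return counts
```

If the model ranks well, the counts should be highest in the first bins and decay toward the tail of the ranking.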

Enrichment Analysis.
In this study, the DrugBank database is considered the gold standard. The drug-target interactions from the DrugBank database are the most accurate and strict drug-target interactions. Besides the DrugBank database, there are some other databases containing a large amount of drug-target interaction data. These drug-target interaction data are much larger than the gold standard we used. Therefore, we compare our prediction results with the interactions in these databases.
In order to characterize and quantify the appearance of predicted drug-target relationships (and known drug-target interactions) in other databases, we used the enrichment score and P value.
The enrichment score (ES) is calculated from the following quantities: k is the number of predicted drug-target interactions that appear in the specified database (or the number of known drug-target interactions, i.e., drug-target interactions in our gold standard, that appear in the specified database); N is the number of all possible interactions between the drug set and the target set, that is, the number of drug-target pairs when the drug set and the target set are fully connected; n is the number of predicted drug-target interactions (or the number of known drug-target interactions in our gold standard); and m is the number of drug-target interactions in the specified database. The interactions mentioned above only concern drugs and targets present in the gold standard.
Then, we used the hypergeometric distribution to calculate the P value. FDR correction is used to correct the P values for multiple testing [50].
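For readers who want to reproduce this kind of analysis, the following is a minimal sketch using the quantities k, n, m, and N defined above. It assumes the common fold-enrichment form ES = k / (n·m/N), that is, the observed overlap divided by the overlap expected by chance, and the upper-tail hypergeometric probability P(X ≥ k). The Benjamini-Hochberg procedure is used here as one standard FDR correction. These specific forms are assumptions, not formulas confirmed by the text.

```python
from math import comb

def enrichment_score(k, n, m, N):
    """Observed overlap k divided by the overlap expected by chance (assumed form)."""
    expected = n * m / N
    return k / expected

def hypergeom_p_value(k, n, m, N):
    """P(X >= k) when n pairs are drawn from N, of which m are in the database."""
    upper = min(n, m)
    tail = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    count = len(p_values)
    order = sorted(range(count), key=lambda i: p_values[i])
    adjusted = [0.0] * count
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(count, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * count / rank)
        adjusted[i] = running_min
    return adjusted
```

An ES above 1 means that the predicted (or known) interactions appear in the database more often than expected by chance, and the P value quantifies how unlikely an overlap of at least k would be under random selection.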
As shown in Table 7, the known drug-target interactions and the drug-target interactions predicted using AdvB-DTI are significantly enriched in the other datasets, except for the STITCH database. The known drug-target interactions (drug-target interactions in our gold standard) have larger enrichment scores and smaller P values than the predicted drug-target interactions.
The results indicate that the drug-target interactions predicted by AdvB-DTI can be verified on other DTI datasets and have a potential practical value.

Drug Treatment Property.
Drug ATC (Anatomical Therapeutic Chemical) label, which reflects drugs' therapeutic, pharmacological and chemical properties, is an important label of drugs. By comparing the distribution of drug ATC label in the known drug-target interactions and that of drug ATC label in the predicted drug-target interactions, we can find out which type of drug is more likely to be predicted to be associated with targets.
The distribution of drug ATC labels in the known drug-target interactions and that in the predicted drug-target interactions are illustrated in Figures 7(a) and 7(b). The relative ratio between known and predicted DTIs for each ATC label is shown in Figure 7(c). For example, if 25% of the drugs in the gold standard have ATC label A and 50% of the drugs in the prediction result have ATC label A, the relative ratio is 0.25/0.5 = 0.5. The smaller the ratio, the more potential the drugs with that specific ATC label have to target proteins, so these drugs should be studied further for broader use.
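The relative ratio can be computed as in the following minimal sketch. It assumes that each drug is represented by its first-level ATC letter and that the two input lists contain one entry per drug in the known and predicted interactions, respectively. The function names are illustrative.

```python
from collections import Counter

def label_fractions(labels):
    """Fraction of entries carrying each ATC label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}

def relative_ratios(known_labels, predicted_labels):
    """Known fraction divided by predicted fraction, per ATC label.

    Labels that never occur in the predictions are skipped to avoid
    division by zero.
    """
    known = label_fractions(known_labels)
    predicted = label_fractions(predicted_labels)
    return {
        label: known[label] / predicted[label]
        for label in known
        if predicted.get(label, 0) > 0
    }
```

For instance, with known labels `["A", "B", "B", "B"]` and predicted labels `["A", "A", "B", "B"]`, label A has fractions 0.25 and 0.5, which reproduces the ratio of 0.5 from the example above.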
In Figure 7, the distributions of drug ATC labels for the gold standard and for the predictions (note that only the top 1% of all prediction results are taken) are almost the same. Notably, drugs with ATC label "B" (Blood and Blood Forming Organs) have a low relative ratio. In addition to A375, in most other cell lines, we also predicted more targets for drugs with ATC label "B". The result suggests that drugs with ATC label "B" have more potential to target proteins and should be studied further for broader use.
Figure 7: Distribution of ATC labels between DTIs in the known (a) and predicted (b) interactions. The relative ratio between known and predicted DTIs for each ATC label is shown in the right panel (c). ATC labels include the following: A-alimentary tract and metabolism; B-blood and blood-forming organs; C-cardiovascular system; D-dermatological; G-genitourinary system and sex hormones; H-systemic hormonal preparations, excluding sex hormones and insulins; J-anti-infectives for systemic use; L-antineoplastic and immunomodulating agents; M-musculoskeletal system; N-nervous system; P-antiparasitic products; R-respiratory system; S-sensory organs; and V-several others.

Case Study
To illustrate the reliability of the prediction results of AdvB-DTI, we studied several cases in this section. These examples are all from our prediction results.
Olomoucine (CID: 4592) is a cyclin-dependent kinase inhibitor. For Olomoucine, its predicted target is MAPK3 through AdvB-DTI. [51]. By observing whether the edges (between two proteins) exist or not, we can judge whether drug known targets and predicted targets are neighbors in the PPI network. The closer two proteins are in the PPI network, the more likely they share the same functionality. Therefore, if the predicted targets are neighbors to the known targets of drugs, they might be targeted in the same way as known targets and the prediction results would be relatively reliable. Indeed, recent research has shown that MAPK3 can be substantially inhibited by Olomoucine [52,53]. This indicates that MAPK3 may be a novel target of Olomoucine.
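The neighbor check in the PPI network can be sketched as follows. The sketch assumes that the network is available as an undirected edge list of protein identifiers. The function names and the identifiers in the usage note are hypothetical placeholders, not data from this study.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an iterable of (protein_a, protein_b) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def neighboring_known_targets(predicted_target, known_targets, adjacency):
    """Known targets of a drug that share an edge with the predicted target."""
    neighbors = adjacency.get(predicted_target, set())
    return sorted(set(known_targets) & neighbors)
```

For a toy edge list `[("P1", "K1"), ("P1", "K2"), ("K3", "K1")]` and known targets `["K1", "K2", "K3"]`, the predicted target `"P1"` is a neighbor of `K1` and `K2` but not of `K3`. A non-empty result corresponds to the situation described above, in which a predicted target lies next to known targets of the drug.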
Drug acetylsalicylic acid (commonly known or available as Aspirin, CID: 2244) is used for the treatment of pain and fever due to various causes. For acetylsalicylic acid, its predicted target is cyclin-dependent kinase-2 (CDK2) through AdvB-DTI.
CDK2 (Entrez ID: 1017) is a neighbor to two known targets of acetylsalicylic acid in the PPI network (Entrez IDs: 7157, 6256). Recent research has shown that CDK2 may be a novel target of acetylsalicylic acid [54]. This verifies our prediction.
CDK2 is a member of the protein kinase family. It plays an important role in regulating various events of the eukaryotic cell division cycle. Accumulated evidence indicates that overexpression of CDK2 can cause abnormal regulation of the cell cycle, which would be directly associated with hyperproliferation in cancer cells [55]. Moreover, the examination of different kinds of human cancers, with defined molecular features, for their susceptibility to CDK2 inhibition has unveiled the scope in which CDK2 might represent a good therapeutic target [56-63].
Based on the above information, we speculate that acetylsalicylic acid, which is predicted to target CDK2, may have potential anticancer effects. Interestingly, the results of various studies have demonstrated that long-term use of acetylsalicylic acid may decrease the risk of various cancers, including colorectal, esophageal, breast, lung, prostate, liver, and skin cancer [64]. The predicted target CDK2 explains acetylsalicylic acid's anticancer effect to some extent.
The next example is the drug Panobinostat. Panobinostat (CID: 6918837) is an oral deacetylase (DAC) inhibitor approved on February 23, 2015, by the FDA for the treatment of multiple myeloma. It acts as a nonselective histone deacetylase inhibitor (HDACi).
Histone deacetylase inhibitors (HDACis) are promising agents for cancer therapy. However, the mechanism(s) responsible for the efficacy of HDACis have yet to be fully elucidated [65].
In this study, we predicted that Panobinostat's target is ATF3 through AdvB-DTI.
ATF3 (Entrez ID: 467) is a neighbor to six known targets of Panobinostat in the PPI network (Entrez IDs: 3065, 10013, 83933, 9759, 10014, 8841). As a proapoptotic factor, it plays a role in apoptosis and proliferation, two cellular processes critical for cancer progression [66-68]. ATF3 has also been postulated to be a tumor suppressor gene because it coordinates the expression of genes that may be linked to cancer [69].
Recent research has shown that ATF3 plays an important role in HDACi-induced apoptosis in multiple cell types [70]. HDACi can induce upregulation of ATF3 expression, thus eliciting the antitumor response [71]. Therefore, Panobinostat, as an HDACi, may treat myeloma by targeting ATF3.
Another interesting case is caffeine. Caffeine (CID: 2519) is a widely consumed pharmacologically active product. It can be used for a variety of purposes, including the short-term treatment of apnea of prematurity in infants and pain relief and to avoid drowsiness [72].
PTGS2 is one of two cyclooxygenases in humans. As a proinflammatory gene, it plays an important role in inflammation. Recent research has shown that caffeine treatment can reduce the expression of proinflammatory genes, including PTGS2 [73]. Caffeine can also bind to the PTGS2-acetaminophen complex with high energy, thereby modulating PTGS2 inhibition [74]. Furthermore, upregulation of PTGS2 is a critical oncogenic pathway in skin tumorigenesis. Han et al. verified that caffeine could block UVB-induced PTGS2 upregulation [75]. All these studies show that PTGS2 is a potential target for caffeine.
PPARG, another predicted target, is a ligand-activated transcription factor and an important modulator of inflammation and lymphocyte homeostasis. There is also a study showing that PPARG was suppressed even with a low caffeine dose [76]. This suggests that PPARG is also a potential target for caffeine.
The above cases illustrate that our prediction results have a potential practical value and can provide clues to the analysis of the mechanism of action of certain drugs.

Conclusion
In this paper, we propose a DTI prediction framework named AdvB-DTI. Based on Bayesian Personalized Ranking, it uses the method of matrix factorization to predict DTIs. In order to solve the problem of existing DTI prediction methods based on matrix factorization, the proposed method combines the features of drugs and targets with the matrix factorization method. The advantage of this method over other similar methods is that BPR is combined with the perturbation factor and dual similarity regularization to make the model more robust and the training results more accurate. Experimental results verify that AdvB-DTI efficiently utilizes the similarity of drug-drug and target-target and the relationship of drugs and targets to train latent factors for drugs and targets to improve DTI prediction performance.
This study has the following positive impacts on the biomedical research.
Firstly, by integrating transcriptome data from drugs and genes, our model provides a practically useful and efficient tool for DTI prediction. The results of our study demonstrate that our method could discover reliable DTIs, thereby reducing the size of the search space for wet experiments and improving the drug discovery process.
Secondly, effective DTI prediction is achieved based on the transcriptome data. Our model used drug perturbation and gene knockout transcriptome data from the L1000 database of the LINCS project. Because the cost of experiments in the LINCS project is relatively low, our prediction based on LINCS data not only ensures high accuracy but also has low cost.
Thirdly, our effective predictions verify that there is indeed a correlation between drug perturbation and the drug's target gene knockout at the transcriptional level. This correlation not only provides a basis for high-precision drug-target predictions but also provides a transcriptional perspective for the interpretation of drug mode of action. The correlation can also provide clues for future drug discovery.