Drug discovery is a costly process that usually takes more than 10 years and billions of dollars for one successful drug to enter the market. Despite all the safety tests, drugs may still cause adverse reactions and be restricted in use or even withdrawn from the market. Drug-induced liver injury (DILI) is one of the major adverse drug reactions, and computational models may be used to predict and reduce it. To assess how well DILI can be predicted computationally, we curated DILI endpoints from three databases and prepared drug features including chemical descriptors, therapeutic classifications, gene expressions, and binding proteins. We trained machine-learning models to predict the various DILI endpoints using different drug features. Using the optimal feature sets, the top-performing models obtained areas under the receiver operating characteristic curve (AUC) of around 0.8 for some DILI endpoints. We found that some features, including therapeutic classifications and proteins, predict DILI well. We also discovered that the severity of DILI endpoints as well as the selection of negative samples may significantly affect the prediction results. Overall, our study provides a comprehensive collection, curation, and prediction of DILI endpoints using various drug features, which may help drug researchers better understand and prevent DILI during the drug discovery process.
The drug discovery process is both time-consuming and costly. It typically takes 10-17 years and costs $2.6 billion to develop a new drug [
However, predicting DILI is a challenging task since DILI involves different types of mechanisms such as direct hepatotoxicity, immune reactions, and mechanisms that are not completely understood [
The workflow of this study is shown in Figure
The workflow of this study. We collected drug features from various databases including DrugBank, LINCS, and WHO’s ATC database and curated DILI labels from DrugDex, DrugPoints, and DailyMed for oral drugs. We held out 20% of each dataset as an independent test set and used the remaining 80% for 10-fold cross-validations. We generated or collected the drug features and developed two types of models, logistic regression (LR) and random forest (RF), using different combinations of parameters, and used the best parameters for the independent tests.
For feature processing, we categorized some continuous features into bins, following previous studies [
The relationship between oral drugs and different types of DILI endpoints was extracted and curated from three databases, DrugDex, DrugPoints, and DailyMed, following the extraction methods and criteria of previous studies [
For each endpoint, we defined two types of negative samples, NSap1 and NSap2. For a given hADR endpoint, NSap1 is defined as drugs that have no reported hepatotoxic reaction for the specific endpoint, while NSap2 is defined as drugs that have no reported hepatotoxic reaction across all endpoints. According to these definitions, NSap2 is a “cleaner” subset of NSap1.
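The two definitions above can be illustrated with a minimal sketch. It assumes a hypothetical drug-by-endpoint table of 0/1 flags (1 meaning the reaction was reported); the table layout, the function name `negative_sets`, and the toy drugs are illustrative assumptions, not the curation code used in the study.

```python
import pandas as pd


def negative_sets(labels: pd.DataFrame, endpoint: str) -> tuple:
    """Return (NSap1, NSap2) as sets of drug names for one endpoint.

    `labels` is a hypothetical drug-by-endpoint table of 0/1 flags,
    where 1 means the reaction was reported for that drug.
    """
    # NSap1: no reported reaction for this specific endpoint.
    nsap1 = set(labels.index[labels[endpoint] == 0])
    # NSap2: no reported reaction for any endpoint.
    nsap2 = set(labels.index[labels.sum(axis=1) == 0])
    return nsap1, nsap2


# Toy data; drug names and flags are made up for illustration.
labels = pd.DataFrame(
    {"liver_failure": [1, 0, 0, 0], "jaundice": [0, 1, 0, 0]},
    index=["drug_a", "drug_b", "drug_c", "drug_d"],
)
nsap1, nsap2 = negative_sets(labels, "liver_failure")
```

In this toy case `drug_b` is a negative for liver failure under NSap1 but is excluded from NSap2 because it has a reported jaundice reaction, which is what makes NSap2 a subset of NSap1.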
For each dataset, we randomly held 20% as an independent test set and used the remaining 80% for training and validation. In this study, we trained two classifiers, logistic regression and random forest, using the scikit-learn package in Python. To minimize the data imbalance problem, the “class weight” parameter of each model was set to “balanced.” For each classifier, the best model parameters were selected by grid search based on areas under the receiver operating characteristic curve (AUC) during 10-fold cross-validations. Then, the model with the best parameters was evaluated on the independent test set.
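The procedure above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the parameter grids are invented (the grids actually searched are not listed here), and the stratified split is an added choice rather than something the text specifies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for a drug feature matrix and one endpoint.
X, y = make_classification(
    n_samples=300, n_features=20, weights=[0.8, 0.2], random_state=0
)

# Hold out 20% as the independent test set (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grids only; these values are not taken from the study.
candidates = {
    "logistic_regression": (
        LogisticRegression(class_weight="balanced", max_iter=1000),
        {"C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(class_weight="balanced", random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}

test_auc = {}
for name, (model, grid) in candidates.items():
    # Select parameters by mean AUC over 10-fold cross-validation.
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=10)
    search.fit(X_train, y_train)
    # The refitted best model is then scored once on the held-out set.
    scores = search.predict_proba(X_test)[:, 1]
    test_auc[name] = roc_auc_score(y_test, scores)
```

Keeping the test set outside the grid search means the reported test AUC is not influenced by parameter selection.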
To find out whether the two types of negative samples, NSap1 and NSap2, had an impact on model performance, we performed paired
We trained two types of classifiers, logistic regression and random forest, to predict different DILI endpoints using different types of features for drugs in the DrugDex, DrugPoints, and DailyMed databases. Ten-fold cross-validations and independent tests were conducted to estimate model performance on the three databases. The AUC values of the 10-fold cross-validations on the datasets using the best parameters are visualized as heat maps in Figure
AUC values of different sets of features and DILI endpoints using random forest for drugs in the DrugDex database during 10-fold cross-validations. In the table, each row represents a set of drug features, each column represents a DILI endpoint and the negative sample set (NSap1 vs. NSap2), and each cell represents an AUC value (colored by its value). For DrugDex, there are seven DILI endpoints (fatal hADRs, liver failure, liver transplantation, jaundice, biomarker increase, hepatomegaly, and hepatitis). They were categorized as “severe hADRs” and “less severe hADRs.” “All hADRs” include all DILI endpoints.
Like the previous study [
ATC codes are hierarchical therapeutic classifications of drugs. A previous study has identified associations between drug indications and side effects [
According to the DILIN prospective study [
The gene expression features used in this study [
To explore the importance of protein features in predicting DILI, we trained models to predict various DILI endpoints using drug-binding proteins including targets, carriers, transporters, and enzymes. We found that, using a single type of protein feature alone, the models obtained varying results, with the highest AUC value around 0.8. Combining all types of protein features could improve model performance further. Additionally, we found that combining the protein features with the chemical fingerprints or molecular descriptors could significantly improve performance over using the chemical fingerprints or molecular descriptors alone in most cases of DrugDex and DrugPoints and some cases of DailyMed (Table
Paired
Database | Features | Logistic regression | Logistic regression | Random forest | Random forest
---|---|---|---|---|---
DrugDex | ECFP6 fingerprints | -3.51 | 1.96 | -2.48 | 1.80
DrugDex | PubChem fingerprints | -3.09 | 5.38 | -2.56 | 1.48
DrugDex | Standard fingerprints | -3.32 | 2.86 | -2.26 | 2.94
DrugDex | Constitutional descriptors | -2.12 | 4.35 | -2.96 | 5.41
DrugDex | Electronic descriptors | -4.44 | 1.14 | -6.10 | 7.04
DrugDex | Geometrical descriptors | -5.75 | 4.22 | -8.30 | 6.47
DrugDex | Hybrid descriptors | -3.50 | 1.90 | -8.79 | 5.96
DrugDex | Topological descriptors | -2.35 | 2.43 | -1.93 | 6.11
DrugDex | All fingerprints | -2.34 | 2.68 | -1.94 | 5.95
DrugDex | All descriptors | -2.63 | 1.29 | -2.48 | 1.78
DrugDex | All combined | -10.25 | 2.76 | -10.56 | 3.79
DrugPoints | ECFP6 fingerprints | -2.06 | 5.60 | -2.99 | 8.91
DrugPoints | PubChem fingerprints | -3.26 | 9.78 | 0.10 | 9.19
DrugPoints | Standard fingerprints | -2.66 | 2.10 | -2.49 | 2.51
DrugPoints | Constitutional descriptors | -3.20 | 4.97 | -2.18 | 4.28
DrugPoints | Electronic descriptors | -3.31 | 5.00 | -3.51 | 2.98
DrugPoints | Geometrical descriptors | -5.42 | 4.06 | -5.21 | 6.70
DrugPoints | Hybrid descriptors | -4.80 | 9.79 | -2.31 | 3.55
DrugPoints | Topological descriptors | -4.04 | 8.19 | -3.04 | 7.08
DrugPoints | All fingerprints | -2.41 | 2.75 | -2.03 | 5.80
DrugPoints | All descriptors | -4.61 | 3.56 | -2.35 | 3.08
DrugPoints | All combined | -10.13 | 2.42 | -7.30 | 1.04
DailyMed | ECFP6 fingerprints | -0.79 | 4.50 | -0.31 | 7.62
DailyMed | PubChem fingerprints | -2.24 | 7.56 | -0.35 | 7.37
DailyMed | Standard fingerprints | 0.00 | 1.00 | -0.85 | 4.19
DailyMed | Constitutional descriptors | -0.94 | 3.80 | -1.56 | 1.53
DailyMed | Electronic descriptors | -1.25 | 2.58 | -1.65 | 1.30
DailyMed | Geometrical descriptors | -2.10 | 8.66 | -4.80 | 7.95
DailyMed | Hybrid descriptors | -2.81 | 3.74 | -1.49 | 1.79
DailyMed | Topological descriptors | -0.27 | 7.97 | -0.26 | 8.00
DailyMed | All fingerprints | 0.10 | 9.26 | -0.23 | 8.24
DailyMed | All descriptors | -0.90 | 3.97 | -0.56 | 5.87
DailyMed | All combined | -3.16 | 2.06 | -2.88 | 4.74
For each
In this section, we performed network and pathway analyses of the protein features using the DrugDex database as an example. To find out which proteins and pathways are important to DILI prediction, we calculated the Gini importance values for the protein features using ExtraTrees [
For fatal hADRs as the endpoint, (a) the network of proteins according to the feature importance and (b) KEGG pathway analysis of important protein features. In (a), each protein is represented by its gene symbol. The node size represents the feature importance of the protein to the DILI models. The line thickness represents the combined score from the STRING database. In (b), the important protein features were selected and analyzed by Cytoscape ClueGO using KEGG pathways. The stars indicate the significance levels for the enrichment tests.
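The feature-importance ranking that underlies panel (a) can be sketched with scikit-learn's `ExtraTreesClassifier`, whose `feature_importances_` attribute is the impurity-based (Gini) importance. The binary drug-by-protein matrix, the placeholder protein names, and the toy label below are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
protein_names = ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]  # placeholder names

# Hypothetical binary matrix: 1 means the drug binds the protein.
X = rng.integers(0, 2, size=(200, len(protein_names)))
# Toy endpoint label driven entirely by the first protein.
y = X[:, 0]

forest = ExtraTreesClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based (Gini) importances, normalized to sum to 1.
ranking = sorted(
    zip(protein_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Proteins near the top of `ranking` would be the candidates to carry forward into a network or pathway enrichment analysis.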
We also used the ClueGO plugin in Cytoscape [
We compared the AUC values of all the features between the endpoints of severe hADRs and less severe hADRs and found the models mostly performed better on severe hADRs (Table
Paired
Database | Logistic regression | Logistic regression | Random forest | Random forest
---|---|---|---|---
DrugDex | 2.51 | 1.77 | 3.72 | 8.13
DrugPoints | 3.36 | 1.92 | 1.73 | 9.18
DailyMed | -0.07 | 9.45 | 5.16 | 2.41
For each endpoint, the AUC score vectors of model performance on all features were paired up and compared.
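A minimal sketch of such a paired comparison is given below, assuming a paired t-test as implemented by SciPy's `ttest_rel`; the choice of test is an assumption, and the AUC values are invented for illustration rather than taken from the tables.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical AUC values: one entry per feature set, in the same order
# in both vectors so that each pair refers to the same features.
auc_severe = np.array([0.78, 0.74, 0.81, 0.69, 0.76])
auc_less_severe = np.array([0.71, 0.70, 0.75, 0.68, 0.70])

# Paired t-test on the per-feature-set differences.
result = ttest_rel(auc_severe, auc_less_severe)
t_stat, p_value = result.statistic, result.pvalue
```

A positive `t_stat` here means the first vector has the higher mean AUC; pairing by feature set removes the variation that comes from some feature sets being stronger than others overall.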
To elucidate how the selection of negative samples affects DILI model performance, we prepared two types of negative drugs in the three databases, NSap1 and NSap2. In general, the models performed better using NSap2 as negative samples than using NSap1 (Figure
Paired
Database | Endpoint | Logistic regression | Logistic regression | Random forest | Random forest
---|---|---|---|---|---
DrugDex | Fatal hADRs | -3.80 | 7.69 | -2.83 | 7.53
DrugDex | Liver failure | -3.33 | 2.46 | -1.51 | 1.40
DrugDex | Liver transplantation | -2.33 | 2.63 | -2.50 | 1.69
DrugDex | Jaundice | -3.10 | 4.04 | -3.69 | 1.01
DrugDex | Biomarker increase | -2.76 | 9.05 | -0.59 | 5.60
DrugDex | Hepatomegaly | -0.35 | 7.28 | -0.72 | 4.77
DrugDex | Hepatitis | -3.15 | 3.52 | -3.00 | 4.70
DrugDex | All hADRs | -0.12 | 9.02 | -0.03 | 9.78
DrugDex | Severe hADRs | -3.65 | 1.06 | -0.68 | 5.00
DrugDex | Less severe hADRs | -2.74 | 9.73 | -0.58 | 5.65
DrugPoints | Liver failure | -0.82 | 4.20 | 0.42 | 6.75
DrugPoints | Jaundice | -0.11 | 9.15 | 1.18 | 2.47
DrugPoints | All hADRs | -0.81 | 4.21 | 0.04 | 9.67
DrugPoints | Severe hADRs | -1.37 | 1.78 | -0.03 | 9.74
DrugPoints | Less severe hADRs | 0.85 | 4.01 | -0.41 | 6.81
DailyMed | All hADRs | 0.00 | 1.00 | 0.00 | 1.00
DailyMed | Severe hADRs | 5.22 | 6.75 | -0.60 | 5.50
DailyMed | Less severe hADRs | 1.41 | 1.72 | 10.04 | 1.57
For each endpoint, the AUC score vectors of model performance on all features were paired up and compared.
Defining an accurate negative set is important to study DILI; however, different sources may lead to different negative sets. Zhu and Li [
In this study, we collected different types of drug features, including chemical fingerprints, molecular descriptors, binding proteins, gene expression, and therapeutic classifications, and collected the DILI endpoints from three databases, DrugDex, DrugPoints, and DailyMed. We trained machine-learning models to predict the DILI endpoints using the various features. The models were assessed via 10-fold cross-validations, and the results were analyzed by different types of features and endpoints. We found that
(1) the features of ATC codes or binding proteins may have significant implications for prediction performance, and analyzing the important protein features using networks and pathways may yield insights into DILI mechanisms; (2) endpoints of severe liver injury, such as fatal hADRs, severe hADRs, and liver failure, had better prediction performance than nonsevere endpoints; and (3) the selection of negative samples had an impact on DILI prediction, and clean negative samples of drugs without any DILI information in their labels may produce better performance for DILI predictions.
We also provided all the curated DILI labels from three databases. We believe our study provides valuable information and comprehensive evaluations for computational DILI prediction and may help researchers to better understand DILI and improve drug safety.
The data used to support the findings of this study are available from the article and supplementary information file.
Heng Luo's present address is BenevolentAI, 1 Dock 72 Way, 7th Floor, Brooklyn, NY 11205, USA.
The authors declare that there is no conflict of interest.
Xiaobin Liu and Danhua Zheng contributed equally to the study.
This work was supported by Funds of the Joint Plan for Health Education in Fujian (#WKJ2016-2-25), the Project in Fuzhou Science and Technology Bureau (#2018-G-49), and the National Natural Science Foundation of China (81971837).
Supplementary Figure 1: AUC values of different sets of features and DILI endpoints using logistic regression for drugs in the DrugDex database during 10-fold cross-validations. Supplementary Figure 2: AUC values of different sets of features and DILI endpoints using logistic regression for drugs in the DrugPoints database during 10-fold cross-validations. Supplementary Figure 3: AUC values of different sets of features and DILI endpoints using random forest for drugs in the DrugPoints database during 10-fold cross-validations. Supplementary Figure 4: AUC values of different sets of features and DILI endpoints using logistic regression for drugs in the DailyMed database during 10-fold cross-validations. Supplementary Figure 5: AUC values of different sets of features and DILI endpoints using random forest for drugs in the DailyMed database during 10-fold cross-validations. Supplementary Figure 6: for the other DILI endpoints in DrugDex, the network of proteins according to the feature importance (a), and KEGG pathway analysis of important protein features (b). Supplementary Table 1: DILI endpoints curated from DrugDex. Supplementary Table 2: DILI endpoints curated from DrugPoints. Supplementary Table 3: DILI endpoints curated from DailyMed. Supplementary Table 4: AUC values of different sets of features and DILI endpoints for drugs in the DrugDex database during an independent test. Supplementary Table 5: AUC values of different sets of features and DILI endpoints for drugs in the DrugPoints database during an independent test. Supplementary Table 6: AUC values of different sets of features and DILI endpoints for drugs in the DailyMed database during an independent test. Supplementary Table 7: association between DrugDex DILI endpoints and top-level ATC codes. Supplementary Table 8: feature importance of using top-level ATC codes to predict DrugDex DILI endpoints.