
A novel method for predicting maximum recommended therapeutic dose (MRTD) is presented using quantitative structure property relationships (QSPRs) and artificial neural networks (ANNs). MRTD data for 31 structurally diverse antiretroviral drugs (ARVs) were collected from the FDA MRTD database or package inserts. Molecular property descriptors of each compound, namely molecular mass, aqueous solubility, lipophilicity, biotransformation half life, oxidation half life, and biodegradation probability, were calculated from their SMILES codes. A training set (

Acquired immunodeficiency syndrome (AIDS) is a degenerative disease of the immune and central nervous systems caused by the human immunodeficiency virus (HIV). There are an estimated 33.2 million people living with HIV/AIDS globally [

We reasoned that, because MRTD estimates are derived from human data, they would provide a more relevant, accurate, and specific estimate of toxic dose levels than risk assessment models based on animal data alone. In this article, we predict the MRTD of antiretroviral drugs from their molecular structures using relevant molecular property descriptors and neural network software as a data mining tool. The predictive performance of the models was evaluated and statistically compared with results obtained clinically or reported in the literature. The application of predictive models in the design of safe, effective antiretroviral drug delivery systems is discussed.

The physicochemical descriptors, molecular weight (MW), aqueous solubility (ASol), and lipophilicity (AlogP), were determined using ALOGPS 2.1 from the Virtual Computational Chemistry Laboratory (

MRTD values for 31 structurally diverse antiretroviral drugs were taken from the FDA MRTD database or package inserts. This “clinical MRTD” dataset was randomly split into training and validation subsets as shown in Table

Clinical MRTD data consisted of 23 training set compounds (italic) and 8 test set compounds (bold). Mean and standard deviation statistics are shown indicating the distribution and central tendency of each molecular property descriptor.

| Drug | MRTD (mg/kg/day) | OxidHL (days) | P[BD] (%) | logBioHL (days) | AlogP | ASol (g/L) | MW (Da) |
|---|---|---|---|---|---|---|---|
|  | 13.30 | 0.135 | 0.5475 | −2.2874 | −1.45 | 8.65 | 225.21 |
|  | 20.00 | 0.139 | 0.2005 | −2.418 | −2.62 | 3.20 | 225.21 |
|  | 6.67 | 0.034 | 0.0014 | −2.3428 | 2.77 | 0.086 | 456.57 |
|  | 6.67 | 0.140 | 0.5085 | −1.9956 | −1.26 | 6.43 | 236.23 |
|  | 25.00 | 0.051 | 0.991 | −3.6235 | 0.13 | 1.32 | 321.38 |
|  | 120.00 | 13.37 | 0.772 | −2.4547 | −1.63 | 16.76 | 126.01 |
|  | 16.70 | 0.038 | 0.0501 | −4.6478 | 3.26 | 0.048 | 613.81 |
|  | 0.040 | 0.123 | 0.0792 | 0.9861 | 3.31 | 0.15 | 259.14 |
|  | 5.00 | 0.06 | 0.0719 | −3.9261 | −1.29 | 2.76 | 229.26 |
|  | 3.33 | 0.171 | 0.4624 | 0.4539 | 3.28 | 0.009 | 179.31 |
|  | 200.00 | 0.264 | 0.8963 | −2.8715 | −1.92 | 33.17 | 244.21 |
|  | 50.00 | 0.044 | 0.944 | −3.3309 | −1.03 | 1.49 | 326.41 |
|  | 0.0375 | 0.096 | 0.0909 | −3.5083 | −1.29 | 7.05 | 211.22 |
|  | 0.333 | 0.038 | 0.8247 | −4.4141 | −2.29 | 1.49 | 332.32 |
|  | 10.00 | 0.139 | 0.0432 | −3.0445 | −0.1 | 16.35 | 267.28 |
|  | 33.330 | 0.052 | 0.9780 | −3.9394 | 4.04 | 0.002 | 670.84 |
|  | 53.00 | 0.094 | 0.0001 | −1.8175 | 1.76 | 0.067 | 547.66 |
|  | 6.670 | 0.043 | 0.0000 | −0.0265 | 5.71 | 0.0002 | 602.66 |
|  | 20.00 | 0.105 | 0.9488 | −4.61 | 4.24 | 0.0012 | 720.94 |
|  | 20.00 | 0.129 | 0.0700 | −0.2247 | 4.3 | 0.0106 | 513.66 |
|  | 10.00 | 0.048 | 0.0016 | −3.0098 | −1.51 | 1.87 | 287.21 |
|  | 25.00 | 0.055 | 0.6868 | −1.6709 | 6.00 | 0.0002 | 567.78 |
|  | 1.333 | 0.088 | 0.0768 | −3.082 | −0.8 | 40.51 | 224.21 |
| N | 23.00 | 23.00 | 23.00 | 23.00 | 23.00 | 23.00 | 23.00 |
| mean | 28.54 | 0.672 | 0.402 | −2.5133 | 0.939 | 6.15 | 362.72 |
| SD | 45.53 | 2.769 | 0.394 | 1.5738 | 2.809 | 10.90 | 178.20 |
| CV% | 159.52 | 412.00 | 97.00 | −62.620 | 299.04 | 177.25 | 48.86 |
| **Abacavir** | 10.00 | 0.040 | 0.0229 | −2.1757 | 0.61 | 1.21 | 286.38 |
| **Emtricitabine** | 4.00 | 0.08 | 0.0566 | −3.7784 | −0.8 | 2.00 | 247.28 |
| **Raltegravir** | 30.07 | 0.094 | 0.0006 | −2.6157 | 1.7 | 0.095 | 444.47 |
| **Nevirapine** | 3.000 | 0.167 | 0.0473 | −3.7336 | 1.75 | 0.10 | 266.33 |
| **Efavirenz** | 10.10 | 0.254 | 0.0001 | 0.1111 | 3.88 | 0.008 | 315.67 |
| **Fosamprenavir** | 46.70 | 0.067 | 0.0007 | −2.4135 | 0.84 | 0.068 | 585.68 |
| **Atazanavir** | 5.35 | 0.109 | 0.0202 | −3.4392 | 4.37 | 0.003 | 704.96 |
| **Lopinavir** | 53.33 | 0.103 | 0.9933 | −3.3281 | 4.07 | 0.002 | 614.86 |
| N | 8.00 | 8.00 | 8.00 | 8.00 | 8.00 | 8.00 | 8.00 |
| mean | 20.32 | 0.114 | 0.143 | −2.672 | 2.05 | 0.436 | 433.20 |
| SD | 20.29 | 0.067 | 0.344 | 1.278 | 1.878 | 0.753 | 180.48 |
| CV% | 99.86 | 58.99 | 241.300 | −47.830 | 91.50 | 173.000 | 41.66 |

The results of the multiple linear regression analysis for antiretroviral drugs versus MRTD. In the training dataset, only two of the six molecular descriptors showed a statistically significant correlation with therapeutic dose, that is, P[BD] (

| Independent variables | Coefficient | SE | p-value | Zero-order r |
|---|---|---|---|---|
| Constant | −34.3303 | | | |
| AlogP | −12.1995 | 6.6976 | 0.0873 | −0.221 |
| ASol | 2.1239 | 0.7697 | 0.0140 | 0.476 |
| logBioHL | 15.9000 | 8.0377 | 0.0654 | −0.071 |
| MW | 0.2159 | 0.1015 | 0.0494 | −0.116 |
| OxidHL | 5.6589 | 2.8478 | 0.0643 | 0.448 |
| P[BD] | 46.5133 | 20.1746 | 0.0349 | 0.427 |
| Multiple linear regression | RSD = 33.89 | MCC = 0.7727 | | |
| ANOVA | F-ratio = 3.9507 | | | |
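As a worked illustration of how the fitted regression is applied, the sketch below plugs the tabulated coefficients into the linear equation for two test-set compounds. The descriptor values are copied from the descriptor table rows whose clinical MRTD is 46.70 and 10.00 mg/kg/day; the function and variable names are hypothetical and are not taken from the original software.

```python
# Intercept and coefficients copied from the multiple linear regression table.
INTERCEPT = -34.3303
COEFFICIENTS = {
    "AlogP": -12.1995,
    "ASol": 2.1239,
    "logBioHL": 15.9000,
    "MW": 0.2159,
    "OxidHL": 5.6589,
    "P[BD]": 46.5133,
}


def predict_mrtd_mlr(descriptors):
    """Return the linear-model MRTD estimate (mg/kg/day) for one compound."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * descriptors[name] for name in COEFFICIENTS
    )


# Descriptor values of two test-set compounds (from the descriptor table).
fosamprenavir = {"OxidHL": 0.067, "P[BD]": 0.0007, "logBioHL": -2.4135,
                 "AlogP": 0.84, "ASol": 0.068, "MW": 585.68}
abacavir = {"OxidHL": 0.040, "P[BD]": 0.0229, "logBioHL": -2.1757,
            "AlogP": 0.61, "ASol": 1.21, "MW": 286.38}

print(round(predict_mrtd_mlr(fosamprenavir), 2))  # 44.05
print(round(predict_mrtd_mlr(abacavir), 2))       # -10.67
```

Both results agree, to rounding, with the MLR predictions reported for these compounds in the test-set comparison (44.0515 and −10.6738). The negative estimate for abacavir shows that an unconstrained linear model can return physically meaningless doses.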

The results of the neural network analysis for antiretroviral drugs versus MRTDs. All molecular descriptors show weak correlation with MRTD except for ASol, OxidHL, and P[BD]. The 6-2-1 neural network model predicted training set MRTD values with high accuracy (

| Independent variables | Correlation coefficient (r) | r² |
|---|---|---|
| AlogP | −0.221 | 0.049 |
| ASol | 0.476 | 0.226 |
| logBioHL | −0.071 | 0.005 |
| MW | −0.116 | 0.013 |
| OxidHL | 0.448 | 0.201 |
| P[BD] | 0.427 | 0.182 |
| 6-2-1 neural network | MAX = 13.64 | RMSE = 5.53 |
| Model versus clinical MRTD | Learning rate = 0.700 | |
| ANOVA | F-ratio = 1340.73 | |

The predictability of each of the multiple linear regression (MLR) and neural network (TNN) models was evaluated by a cross-validation procedure [

In this multivariable system, a quantitative relationship between certain molecular property descriptors and maximum recommended therapeutic dose was characterized using two datasets (i.e., MRTD and TEST). A multiple linear regression (Table

A neural network model was constructed with six input variables (OxidHL, P[BD], logBioHL, AlogP, ASol, and MW), two hidden neurons, and one output variable (clinical MRTD). A total of 23 patterns were loaded, all of which were complete and available for training. Two nonlinear neurons were used, and training was stopped once the model error had remained stable for 20 minutes.
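To make the 6-2-1 architecture concrete, the following is a minimal sketch of a network with six inputs, two tanh hidden neurons, and one linear output, trained by batch backpropagation. It is not the Tiberius software used in the study. The synthetic data, the weight initialization, the step size of 0.05 (chosen here for stable plain gradient descent and not comparable to the reported learning rate of 0.700), and the fixed epoch count are all assumptions made for illustration.

```python
import math
import random


def train_621(X, y, epochs=500, lr=0.05, seed=0):
    """Train a 6-2-1 network (tanh hidden layer, linear output) by batch backprop."""
    rng = random.Random(seed)
    n_in, n_hid, n = len(X[0]), 2, len(X)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    b1 = [0.0] * n_hid
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hid)]
    b2 = 0.0
    history = []  # training RMSE measured at the start of each epoch

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(n_hid)]
        return h, sum(W2[j] * h[j] for j in range(n_hid)) + b2

    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hid)]
        gb1, gW2, gb2, sq = [0.0] * n_hid, [0.0] * n_hid, 0.0, 0.0
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target
            sq += err * err
            gb2 += err
            for j in range(n_hid):
                gW2[j] += err * h[j]
                # Error propagated back through the tanh unit.
                delta = err * W2[j] * (1.0 - h[j] ** 2)
                gb1[j] += delta
                for i in range(n_in):
                    gW1[j][i] += delta * x[i]
        history.append(math.sqrt(sq / n))
        # Gradient-descent update using the mean gradient over all patterns.
        b2 -= lr * gb2 / n
        for j in range(n_hid):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n

    return (lambda x: forward(x)[1]), history


# Synthetic stand-in for 23 scaled training patterns with six descriptors each.
demo_rng = random.Random(1)
X_demo = [[demo_rng.uniform(-1, 1) for _ in range(6)] for _ in range(23)]
y_demo = [math.tanh(x[0] + 0.5 * x[1]) + 0.3 * x[4] for x in X_demo]

predict, rmse_history = train_621(X_demo, y_demo)
print(rmse_history[0], rmse_history[-1])
```

In practice the descriptors and the MRTD values would be scaled before training, since their raw ranges differ by several orders of magnitude.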

The results of the neural network model for antiretroviral MRTDs are shown in Table

Each of the models was then validated using the external TEST MRTD dataset and a cross-validation procedure. Model “goodness of fit” and predictability are summarized in Table

Test dataset goodness of fit comparisons. Clinical MRTD values from 8 ARVs were used as an external test dataset to validate model predictability. Model performance is characterized here in terms of root mean squared error (RMSE), Kendall’s correlation coefficient (tau), and Type II error probability (

| Drug | MRTD | MLR | SE | TNN | SE |
|---|---|---|---|---|---|
| Abacavir | 10.00 | −10.6738 | 427.4060 | 6.4861 | 12.3474 |
| Emtricitabine | 4.00 | −23.9239 | 779.7441 | 3.3678 | 0.3996 |
| Raltegravir | 30.07 | 0.0628 | 900.4320 | 19.7700 | 106.0900 |
| Nevirapine | 3.00 | −54.1862 | 3270.2614 | −16.5700 | 382.9849 |
| Efavirenz | 10.10 | −10.2863 | 415.6012 | −5.7782 | 252.1172 |
| Fosamprenavir | 46.70 | 44.0515 | 7.0145 | 50.7500 | 16.4025 |
| Atazanavir | 5.35 | 11.4360 | 37.0393 | 29.8700 | 601.2304 |
| Lopinavir | 53.33 | 42.6361 | 114.3594 | 64.5900 | 126.7876 |
| Mean | 20.32 | −0.1105 | RMSE = 27.27 | 19.06 | RMSE = 13.67 |
| Kendall’s tau | | 0.714 | | 0.643 | |
| Type II error probability | | 0.019 | | 0.035 | |
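The goodness-of-fit statistics in the table above can be recomputed directly from the listed clinical and predicted values. The sketch below does so with the standard library only. The Kendall coefficient is implemented as tau-a, which assumes no tied values (true for these eight compounds); the function names are illustrative.

```python
import math
from itertools import combinations

# Clinical and predicted MRTD values (mg/kg/day) for the eight test compounds.
clinical = [10.00, 4.00, 30.07, 3.00, 10.10, 46.70, 5.35, 53.33]
mlr = [-10.6738, -23.9239, 0.0628, -54.1862, -10.2863, 44.0515, 11.4360, 42.6361]
tnn = [6.4861, 3.3678, 19.7700, -16.5700, -5.7782, 50.7500, 29.8700, 64.5900]


def rmse(observed, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs, no ties."""
    pairs = list(combinations(range(len(a)), 2))
    score = 0
    for i, j in pairs:
        product = (a[i] - a[j]) * (b[i] - b[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)


print(round(kendall_tau(clinical, mlr), 3))  # 0.714
print(round(kendall_tau(clinical, tnn), 3))  # 0.643
print(rmse(clinical, mlr), rmse(clinical, tnn))
```

The recomputed RMSE values come out at roughly 27.3 and 13.7, within rounding of the tabulated 27.27 and 13.67. The SE columns correspond to the individual squared errors that enter the RMSE.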

Method comparisons graph. MRTD values predicted by multiple linear regression (hollow squares, □) nearly traced those values predicted by the 6-2-1 neural network (solid squares, ■) as shown. Multiple linear regression estimates for MRTD were consistently lower than those predicted by the neural network model.

Bland Altman plot for method comparisons. Bland Altman plots are shown for (a) multiple linear regression predicted versus clinical MRTD, and (b) neural network model predicted versus clinical MRTD. Horizontal lines are drawn at the mean difference and at the limits of agreement. All predicted values were within the limits of agreement for both models, although these limits were narrower for the neural network model estimates. The plots are useful for revealing the relationship between the differences and the averages. The slight deviation from symmetry in the multiple regression plot indicates some systematic bias, but no possible outliers were identified.
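The quantities drawn in a Bland Altman plot (mean difference and limits of agreement) can be sketched as follows. The text does not state which compounds were plotted, so the eight test-set values are used here purely for illustration. The limits are taken as the mean difference ± 1.96 sample standard deviations, which is the usual convention and an assumption about what was done in the study.

```python
import statistics

# Clinical and predicted MRTD values (mg/kg/day) for the eight test compounds.
clinical = [10.00, 4.00, 30.07, 3.00, 10.10, 46.70, 5.35, 53.33]
mlr = [-10.6738, -23.9239, 0.0628, -54.1862, -10.2863, 44.0515, 11.4360, 42.6361]
tnn = [6.4861, 3.3678, 19.7700, -16.5700, -5.7782, 50.7500, 29.8700, 64.5900]


def bland_altman(predicted, observed):
    """Return (bias, lower limit, upper limit) for predicted minus observed."""
    differences = [p - o for p, o in zip(predicted, observed)]
    bias = statistics.mean(differences)
    spread = 1.96 * statistics.stdev(differences)  # sample SD, n - 1 denominator
    return bias, bias - spread, bias + spread


for label, predictions in (("MLR", mlr), ("TNN", tnn)):
    bias, lower, upper = bland_altman(predictions, clinical)
    print(f"{label}: bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

On these values the mean difference is about −20.4 mg/kg/day for the regression model and about −1.3 mg/kg/day for the neural network, consistent with the systematic underestimation by the regression model noted in the caption.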

MRTD values and SMILES codes for antiretroviral drugs were collected from the FDA MRTD database, which is a highly reliable source of pharmacologic activity data based on extensive clinical evidence (

We began our study looking at dose-related adverse effects of commercial antiretroviral drugs or new ARVs in development. Although the appearance of serious long-term metabolic complications, such as cardiovascular disturbances [

The molecular descriptors used in this study were selected to represent physicochemical (MW, AlogP, ASol) and bioaccumulation (OxidHL, P[BD], logBioHL) property influences on therapeutic dose. Although molecular weight does not strongly correlate with toxicity of most compounds, the larger the molecular size of a compound, the smaller its membrane permeability and diffusion coefficient become [

Any chemical (even water) can produce toxic side effects in the body if allowed to accumulate to sufficiently large concentrations. While much of the effort in bioaccumulation modeling [

Artificial neural networks (ANNs) are biologically inspired data-mining algorithms that work by detecting patterns and relationships in data. We used the back propagation rule, in which the neural network is trained to map a set of input data by iterative adjustment of the weights. A tangent sigmoid transfer function on the first layer and two neurons with a nonlinear transfer function on the hidden layer were minimalistic structures used to reduce overfitting. Our training processes for TNN were allowed to run until no change in RMSE was observed for 20 minutes, at which point the model was saved. This learning method is commonly used for neural network predictive models given dose-response type data. However, ANNs have several limitations. A major theoretical concern is the “black box” nature of the output, that is, conclusions are generated without mechanistic explanations. ANNs are also limited by the quality of their data and may need to be retrained periodically if their performance changes over time. This is not necessarily counterproductive, since it indicates robustness in a model that adapts to changes in the predictive criteria. Real-time monitoring of the training process is also important, since overtraining can easily occur, especially when the datasets are small. This may be one of the unique advantages of real-time visualization of the data-mining process: it allows the investigator to make “intermediate evaluations” of model predictability and then continue training until the required reliability and accuracy of the predictions are met.
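The stopping rule described above (save the model once the RMSE has stopped changing) can be expressed as a simple plateau check. The sketch below counts successive evaluations and not wall-clock minutes, and the `patience` and `tol` parameters are hypothetical names introduced for illustration. Applied to the error on held-out data, the same check is one way to guard against the overtraining mentioned in the text.

```python
def plateau_stop_index(rmse_history, patience=3, tol=1e-4):
    """Return the index at which training would stop, or None if it never plateaus.

    Training stops at the first evaluation where the RMSE has failed to improve
    on the best value seen so far by more than `tol` for `patience` evaluations
    in a row.
    """
    best = float("inf")
    stale = 0
    for index, value in enumerate(rmse_history):
        if best - value > tol:
            best = value   # meaningful improvement: reset the counter
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return index
    return None


# Illustrative RMSE trace: improves, then stays flat for three evaluations.
trace = [5.0, 3.0, 2.0, 1.5, 1.5, 1.5, 1.5, 1.0]
print(plateau_stop_index(trace))  # 6
```

In this trace a further improvement appears after the plateau (the final value of 1.0), which is why a fixed patience is a judgment call and why intermediate inspection of the training curve remains useful.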

In conclusion, antiretroviral drugs are a chemically diverse class of compounds in terms of both physicochemical properties and bioaccumulation potential. However, commercial ARVs may be categorized for predictive modeling purposes into two groups based on aqueous solubility and lipophilic character. Hydrophilic compounds may be administered at higher doses (MRTD), and prediction of their MRTD values may be possible using simple multiple linear regression models. In contrast, prediction of MRTD values for antiretrovirals with poorer aqueous solubility would be most effective when the neural network approach is used and when both physicochemical and bioaccumulation property descriptors are available for training. With regard to future studies, ANNs represent a promising tool for predicting maximum therapeutic dose, especially for antiretroviral drugs with a narrow therapeutic index in the treatment of AIDS.

AlogP: Calculated octanol/water partition coefficient

ASol: Aqueous solubility

BioHL: Biotransformation half life

CV: Coefficient of variation

r²: Coefficient of determination

MLR: Multiple linear regression

MRTD: Maximum recommended therapeutic dose

MAX: Maximum absolute error

MW: Molecular weight

MCC: Multiple correlation coefficient

OxidHL: Oxidation half life

P[BD]: Biodegradation probability

TNN: Tiberius Neural Network

QSPR: Quantitative Structure Property Relationships

RSD: Residual standard deviation

RMSE: Root mean squared error

SMILES: Simplified molecular input line entry system

SE: Squared error

SD: Standard deviation

r: Zero order correlation coefficient.