Exploring QSAR for Antimalarial Activities and Drug Distribution within Blood of a Series of 4-Aminoquinoline Drugs Using Genetic-MLR

Malaria has been one of the most significant public health problems for centuries. QSAR models of the antimalarial activity and blood-to-plasma concentration ratio of Chloroquine and a new series of 4-aminoquinoline derivatives were developed using a genetic algorithm with multiple linear regression (GA-MLR). We obtained two different models against the Chloroquine-sensitive (3D7) and Chloroquine-resistant (W2) strains of Plasmodium falciparum with good adjustment levels. Drug distribution in blood, defined as the drug blood-to-plasma concentration ratio (R_b), was also related to molecular descriptors. Leave-many-out (LMO) and Y-randomization methods confirmed the models' robustness.

Blood-to-plasma concentration ratio (R_b, defined as C_b/C_p, the drug concentration in whole blood divided by that in plasma) is a measure of the drug distribution within the blood. Drugs, when reaching the blood stream, can bind to plasma proteins and/or to blood cells. If a drug's binding in plasma exceeds its binding in blood cells, R_b values are below 1 (C_p > C_b). When drug binding in blood cells exceeds its plasma binding (C_b > C_p), R_b values are above 1. Red blood cells (RBCs) are the host cells for malaria parasites, and any effect of the drug on red cell membranes might be relevant for its in vivo effects [11]. R_b may be an important parameter in drug potency and is therefore worthy of investigation. It is related to either the volume of distribution or the clearance of the drug. Even though the determination of R_b is relatively simple, such data are absent in most pharmacokinetic studies [12,13].
The first objective of this study was to develop QSAR models that explain the in vitro antimalarial activity of a new series of 4-aminoquinolines, structurally related to CQ, against various P. falciparum clones (3D7, W2) using theoretical molecular descriptors. The second aim was to establish regression models to predict the blood-to-plasma concentration ratio (R_b) using mainly in silico molecular descriptors.

Software.
A Pentium IV personal computer (CPU, 3.2 GHz) running the Windows XP operating system was used. Molecular modeling and geometry optimization were performed with Hyperchem [14]. Dragon software [15] was employed for calculation of theoretical molecular descriptors. SPSS software [16] was used for MLR analysis. Other statistical calculations were performed in the MATLAB [17] environment.

Ensemble Data and Molecular Descriptors.
We used a series of 4-aminoquinoline antimalarial compounds with experimentally determined ADME properties, taken from the paper by Ray et al. [18]. Their research group has developed antimalarial compounds effective against drug-resistant strains of P. falciparum by varying the chemical substitutions around the heterocyclic ring and the basic amine side chain of the popular antimalarial drug chloroquine [19,20]. Recently, they screened a panel of these novel antimalarial compounds for improved leads based on the evaluated ADMET properties [18]. Each compound in the studied database was characterized by its growth inhibition of the 3D7 and W2 strains of P. falciparum and by its blood-to-plasma concentration ratio. The structures are given in Figure 1 and the experimental values in Tables 1 and 2. The molecular structures of all the Chloroquine derivatives were built with Hyperchem (Version 7, HyperCube, Inc.) software. AM1 semiempirical calculation was used to optimize the 3D geometry of the molecules. The Polak-Ribière algorithm with a root-mean-square gradient of 0.1 kcal/mol was selected for optimization. Using DRAGON [15], we derived a total of 1481 1D, 2D, and 3D molecular descriptors from the 3D structure of each compound.
To decrease the redundancy in the descriptor data matrix, the correlation of the descriptors with each other and with the properties of the drugs was examined, and collinear descriptors (i.e., descriptor pairs with a correlation coefficient above the cutoff) were detected. Among the collinear descriptors, the one with the highest correlation with activity was retained, and the others were removed from the data matrix. The list and meaning of the molecular descriptors are provided by the DRAGON package, and the calculation procedure is explained in detail, with related literature references, in the Handbook of Molecular Descriptors [21].
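The descriptor pruning step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `prune_collinear` is hypothetical, and the cutoff is left as a parameter because its value is not recoverable from the text.

```python
import numpy as np

def prune_collinear(X, y, names, cutoff):
    """Drop collinear descriptors, keeping from each collinear group the one
    most strongly correlated with the activity y.

    X: (n_compounds, n_descriptors) array; y: (n_compounds,) activities.
    cutoff: absolute pairwise correlation above which two descriptors are
            treated as collinear (value is an assumption of the caller).
    Constant descriptor columns should be removed beforehand, since their
    correlation is undefined.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Absolute correlation of each descriptor with the activity.
    r_y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    # Absolute descriptor-descriptor correlation matrix.
    r_xx = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    # Visit descriptors from most to least correlated with y, so the first
    # member of a collinear group that is met is the one retained.
    for j in np.argsort(-r_y):
        if all(r_xx[j, k] <= cutoff for k in kept):
            kept.append(int(j))
    return [names[j] for j in sorted(kept)]
```

Visiting descriptors in order of decreasing correlation with the activity is one simple way to honor the "retain the one with the highest correlation with activity" rule without comparing every pair explicitly.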

MLR Modeling Procedure.
Multiple linear regression (MLR), which combines great ease of implementation with the interpretability of the resulting equations, was the statistical method of choice for building the QSAR models. The forward-stepping variant of MLR was utilized, starting with the selection of the single variable that contributes most to the model, based on its highest F-statistic or lowest p value. At each step, MLR alters the model from the previous step by adding a predictor variable, and the search terminates when a statistically significant model has been obtained [22,23]. QSAR Modeling [24] is free Java-based software developed by the Theoretical and Applied Chemometrics Laboratory's research group. A genetic algorithm (GA) search was carried out to explore MLR models. The GA used was the same as that previously used [25,26].
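As a rough illustration of the forward-stepping idea, the sketch below adds, at each step, the descriptor with the largest partial F-statistic and stops when no candidate reaches an entry threshold. It is not the SPSS or QSAR Modeling implementation and does not reproduce the GA search; the names `rss` and `forward_select` and the default `f_to_enter=4.0` are assumptions made for the example.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_select(X, y, max_vars, f_to_enter=4.0):
    """Forward-stepping MLR: greedily add the descriptor with the highest
    partial F-statistic until none exceeds f_to_enter or max_vars is reached."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    current = rss(X, y, selected)  # intercept-only model
    while len(selected) < max_vars:
        best = None
        for j in range(p):
            if j in selected:
                continue
            df = n - len(selected) - 2  # residual d.o.f. once j is added
            if df <= 0:
                continue
            new = rss(X, y, selected + [j])
            # Partial F for adding one variable to the current model.
            f = (current - new) / (new / df) if new > 0 else np.inf
            if best is None or f > best[0]:
                best = (f, j, new)
        if best is None or best[0] < f_to_enter:
            break
        selected.append(best[1])
        current = best[2]
    return selected
```

With only 15 to 21 compounds, a cap such as `max_vars=5` mirrors the paper's limit of five descriptors per model and keeps the residual degrees of freedom positive.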

Results and Discussions
3.1. The Selected Descriptors. The majority of the selected descriptors in our GA-MLR modeling are composite descriptors, which can be divided into five groups: GETAWAY, 3D-MoRSE, RDF, WHIM, and 2D autocorrelation descriptors. The GETAWAY (Geometry, Topology, and Atom Weights AssemblY) descriptors try to match the 3D molecular geometry, provided by the molecular influence matrix, and atom relatedness by topology with chemical information by using various atomic weighting schemes. 3D-MoRSE descriptors are representations of the 3D structure of a molecule and encode features such as molecular weight, van der Waals volume, electronegativities, and polarizabilities. The radial distribution function (RDF) descriptors are based on the distance distribution of the compounds. WHIM descriptors are based on statistical indices calculated on the projections of atoms along principal axes. 2D autocorrelation descriptors, in general, explain how the considered property is distributed along the topological structure. Three spatial autocorrelation vectors were calculated: unweighted and weighted Moran, Geary, and Broto-Moreau autocorrelation vectors. Atomic masses (m), atomic van der Waals volumes (v), atomic Sanderson electronegativities (e), and atomic polarizabilities (p) were used as weighting properties [21]. Table 3 gives the names and meanings of the molecular descriptors used in this work.
Tables 4 and 5 show the data of the descriptors used in this study. The correlation matrices of the descriptors are given in Tables 6, 7, 8, and 9. Inspection of these results shows that all values deviate noticeably from unity, so there is no significant correlation between the independent variables.

Validation of the Models.
A good fit was assessed based on the squared correlation coefficient (R²), the adjusted determination coefficient (R²_adj), the standard deviation (s), the root-mean-square error (RMSE), Fisher's statistic (F), and the number of variables. Most QSAR modeling methods implement the leave-one-out (LOO) or leave-many-out (LMO) cross-validation procedures, which are internal validation techniques [27]. The LOO cross-validation procedure consists of removing one data point from the training set, constructing the model only on the basis of the remaining training data, and then testing on the removed point. The LMO cross-validation procedure calculates the models leaving multiple observations out at a time, reducing the number of times a model has to be recalculated. The outcome of the cross-validation procedure is the cross-validated Q² (Q²_LOO or Q²_LMO), which is used as a criterion of both robustness and predictive ability of the model. In this paper, we performed LOO cross-validation and leave-5-out cross-validation as the internal validation tools. The robustness of the model was examined by the Y-randomization test [28]. For the Y-randomization test, performed ten times, R² ≤ 0.3 and Q²_LOO ≤ 0.05 for all results were considered acceptable. These limits were selected based on the suggestions of Eriksson and coworkers [28]. The Y-randomization test is capable of verifying whether models with high values of R² and Q²_LOO arise from chance correlation [29,30].
To obtain a more realistic assessment of the predictive power of the models, external validation was also performed. For that purpose, six Chloroquine derivatives (3, 6, 8, 15, 18, and 19) were selected at random from the 21 compounds to form the external test set, and the remaining 15 Chloroquine derivatives comprised the training set that was employed to calibrate the QSAR models.
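The cross-validation procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the cross-validated coefficient is taken as 1 - PRESS/SS_tot with the mean of the full response vector (one common convention, not confirmed by the text), the folds for LMO are formed by a single random partition, and the names `fit_predict` and `q2_cv` are hypothetical.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Fit OLS with intercept on the training rows and predict the test rows."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def q2_cv(X, y, leave_out=1, seed=0):
    """Cross-validated Q^2: leave_out=1 gives LOO, leave_out=5 gives leave-5-out."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # LOO needs no shuffling; for LMO the compounds are partitioned at random.
    if leave_out == 1:
        order = np.arange(n)
    else:
        order = np.random.default_rng(seed).permutation(n)
    press = 0.0  # predictive residual sum of squares
    for start in range(0, n, leave_out):
        test = order[start:start + leave_out]
        train = np.setdiff1d(order, test)
        pred = fit_predict(X[train], y[train], X[test])
        press += float(np.sum((y[test] - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - press / ss_tot
```

Calling `q2_cv(X, y, leave_out=1)` and `q2_cv(X, y, leave_out=5)` on the same descriptor matrix gives the two internal validation figures; repeating the second call with different `seed` values shows how sensitive the LMO estimate is to the partition.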

QSAR Models for 3D7 and W2 Strains
Using the best multilinear regression method, equations for the antimalarial activities against the Chloroquine-sensitive (3D7) and Chloroquine-resistant (W2) strains of P. falciparum were constructed with up to five descriptors. The predicted log IC50 values and the residuals for the compounds are listed in Table 1. The QSAR models generated for the two strains (3D7, W2) are shown in Table 10. These models have good capacity to explain the observed values of biological activity because they possess an excellent adjustment level: high correlation coefficient and low root-mean-square error (R² = 0.94, R²_adj = 0.92, and RMSE = 0.14 for the 3D7 strain and R² = 0.94, R²_adj = 0.91, and RMSE = 0.16 for the W2 strain). To validate the selected prediction function, a cross-validation and an external test were carried out. The models also have good predictive capacity (Q² = 0.86 for both strains). In general, the MLR models were able to explain data variance and were quite stable to the inclusion-exclusion of compounds as [...] [28]. This indicates that the variance explained by the model is not due to chance correlation. Y-randomization results are shown in Figures 2 and 3. The related training set equations and statistical parameters are summarized in Table 11. In turn, plots of LOO cross-validation and test set predictions versus experimental log IC50 values (for the 3D7 and W2 strains) for the MLR models are shown in Figure 4.
Table 4: Data of the selected descriptors used in this study for 3D7 and W2 strains.
Table 10: Multivariate linear regression models and statistical parameters for 3D7 and W2 strains.
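For readers who want to recompute adjustment statistics such as R², adjusted R², and RMSE from a table of observed and calculated values, a minimal sketch follows. The function name `fit_statistics` is hypothetical, and the formulas are the textbook ones for an MLR model with an intercept; the paper does not state which conventions its software used (for example, whether RMSE divides by n or by the residual degrees of freedom).

```python
import numpy as np

def fit_statistics(y_obs, y_calc, n_descriptors):
    """Goodness-of-fit statistics for an MLR model with intercept.

    y_obs, y_calc: observed and calculated responses for the same compounds.
    n_descriptors: number of descriptors p in the regression equation.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    n, p = len(y_obs), n_descriptors
    dof = n - p - 1  # residual degrees of freedom
    ss_res = float(np.sum((y_obs - y_calc) ** 2))
    ss_tot = float(np.sum((y_obs - y_obs.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / dof,  # penalizes extra descriptors
        "RMSE": float(np.sqrt(ss_res / n)),          # assumes division by n
        "s": float(np.sqrt(ss_res / dof)),           # standard deviation of residuals
        "F": (r2 / p) / ((1.0 - r2) / dof),          # Fisher's statistic
    }
```

Because R²_adj depends on both the number of compounds and the number of descriptors, it is the more informative of the two coefficients when comparing models of different size on a data set this small.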

QSAR Model for Blood-to-Plasma Concentration Ratio.
The best linear models relating five descriptors to the log R_b values are tabulated in Table 12. The predicted R_b values and the residuals for the compounds are listed in Table 2. As can be seen, the MLR models have good statistical quality with low prediction error. The models obtained were validated by calculating the cross-validated Q² values using the LOO cross-validation method. This is the measure of the predictive power of the regression equations. The Q² values for the best regression models for log R_b were suggestive of robust models. The results of the LMO test are collected in Table 3. On average over the test steps, R² ≥ Q²_LMO and Q²_LOO ≈ Q²_LMO, which is further evidence that the model is not underdetermined. The model was further validated by applying Y-randomization. Several random shuffles of the Y vector were performed. The Y-randomization results are in agreement with the suggested limits [28] and are shown in Figures 5 and 6. The R_b prediction ability of the MLR models was also tested using the validation set of data (Table 13). The correlations between the predicted and experimental values of R_b (from LOO cross-validation and the external test) are shown in Figure 7.
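The Y-randomization test mentioned above can be illustrated with a short sketch: the response vector is shuffled while the descriptor matrix is left untouched, the model is refitted, and the resulting R² values are compared with that of the real model. The names `r2_fit` and `y_randomization` are hypothetical, and only R² is computed here; a fuller test would also recompute Q²_LOO for every shuffle.

```python
import numpy as np

def r2_fit(X, y):
    """R^2 of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

def y_randomization(X, y, n_shuffles=10, seed=0):
    """Return (R^2 of the real model, list of R^2 values for shuffled responses)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r2_true = r2_fit(X, y)
    # Each shuffle breaks the structure-property link but keeps the
    # distribution of y and the descriptor matrix unchanged.
    r2_shuffled = [r2_fit(X, rng.permutation(y)) for _ in range(n_shuffles)]
    return r2_true, r2_shuffled
```

If the shuffled models reach R² values close to that of the real model, the original fit is likely a chance correlation; with few compounds and five descriptors, shuffled R² values well above zero are to be expected, which is why an explicit acceptance limit is useful.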

Conclusions
A quantitative structure-activity relationship (QSAR) study was applied to a series of 4-aminoquinoline antimalarial compounds potentially active against the 3D7 and W2 strains of P. falciparum. For each strain, statistically significant models were obtained using the GA-based MLR method. These models may be considered as mathematical equations for predicting the antimalarial activities of compounds structurally similar to those used in this study. Models based on GA-MLR were also developed to predict the blood-to-plasma concentration ratio of the analogues from selected molecular descriptors. The predictive ability of the R_b models was confirmed on the test and validation sets. The LOO and LMO cross-validation methods, the Y-randomization technique, and the external validation indicated that the models are significant, robust, and have good internal and external predictability. These models may be an important tool in early drug discovery by providing a relevant pharmacokinetic parameter.

F 1 :
Chemical structures of 4-aminoquinolines analogues used in this study.T 9: Correlation matrix of the 5 selected descriptors for   B/P, 10 M.

1 F 3 :F 4 : 1 F 5 : 1 F 6 :
Plots of Y-randomization for log (1/IC 50 ) W2. Plot of the predicted versus the observed of log (1/IC 50 ) growth inhibition of 3D7 and W2 strains of P. falciparum.e LOO cross-validation compounds are represented as grey dots and six derivatives, used as test set, as black dots.Plots of Y-randomization for log  b , B/P(1 M).Plots of Y-randomization for log  b , B/P(10 M).
T 1: Experimental and calculated antimalarial activity (log IC 50 ) for 3D7 and W2 strains.
a Number of compounds given in Figure1.bValuescalculated by equations in Table10.c Observed minus calculated values.
T 2: Experimental and calculated values for Blood-to-plasma concentration ratio (  ).
a Number of compounds given in Figure1.bValuescalculated by equations in Table12.c Observed minus calculated values.
T 3: Brief description of molecular descriptors used in the different modeling approaches.
Data of the selected descriptors used in this study for blood-to-plasma concentration ratio.
log R_b, B/P (1 μM): RDF065u, GATS8m, RDF090m, R2u+, Mor24m
log R_b, B/P (10 μM): R3m, RDF095u, G3p, R4m+, MATS4m
Evaluation of the prediction ability of the MLR models in the external validation set for 3D7 and W2 strains.
Plot of the predicted versus the observed log R_b measured for each of the compounds at 1 and 10 μM. The LOO cross-validation compounds are represented as grey dots and the test set as black dots.
Table 13: Evaluation of the prediction ability of the MLR models in the external validation set for R_b.