Virtual Screening Based on QSAR and Molecular Docking of Possible Inhibitors Targeting Chagas CYP51

Chagas is a neglected tropical disease caused by the parasite Trypanosoma cruziwith no effective treatment in all its forms.&ere is a need to find more effective therapeutic alternatives with reduced toxicity. In this contribution, multiple linear regression models were used to identify the molecular descriptors that best describe the inhibitory activity of 52 fenarimol analogues against Trypanosoma cruzi. &e topological, physicochemical, thermodynamic, electronic, and charge descriptors were evaluated to cover a wide range of properties that frequently encode biological activity. A model with high predictive value was obtained based on geometrical descriptors and descriptors encoding hydrophobicity and London dispersion forces as necessary for the inhibition of Trypanosoma cruzi-CYP51. Docking methodology was implemented to evaluate molecular interactions in silico. &e virtual screening results in this study can be used for rational design of new analogues with improved activity against Chagas disease.


Introduction
Chagas, a tropical disease endemic to Latin America, is caused by the parasite Trypanosoma cruzi (T. cruzi), first described in 1909 [1][2][3]. e most affected population is very low-income people, who do not have the economic resources to pay for expensive treatments and live in conditions of high vulnerability [4][5][6]. e pharmacological therapy currently used is aimed at preventing the chronicity of the infection since the efficacy of antiparasitic drugs decreases in the chronic phases of the disease. e effectiveness of the treatment in the acute phase has been demonstrated [7], while the chronic stage can take up to 20 years resulting in multiple-organ damage [6][7][8]. Studies show that treatment effectiveness in the acute phase is 50-70%, while in the chronic phase, the effectiveness is reduced to 8-30% [9,10]. Additionally, treatment depends on age of the patient and long-term side effects.
ere is no effective treatment for Chagas disease in all its forms; treatment with nifurtimox and benznidazole is highly toxic [6,7,11]. Both compounds are heterocyclic with a furan and imidazole-nitrogenated ring, respectively, which act by inducing cytotoxicity in the parasite by interacting with the nitroreductase enzymes of the parasites [12][13][14][15]. Mutagenic and carcinogenic activities, hepatotoxicity, and bone marrow suppression have been shown to be significant with the use of these medications. Buschini et al. [16] reassessed the genotoxic and mutagenic activity for nifurtimox and benznidazole, respectively. ey found that both drugs damage the infectious agent and DNA of host cells at concentrations in the range of plasma concentrations of patients treated by chemotherapy. Benznidazole and nifurtimox contain nitro groups that produce nitrogenated metabolic radicals that affect the parasite and are responsible for the toxicity, mutagenicity, genotoxicity, and carcinogenicity attributed to them [17]. Owing to the high toxicity, few therapeutic alternatives go to clinical trials [18]; a greater selectivity of the drug and reduced adverse effects on the host are required. us, there is a need to develop therapeutic alternatives with improved efficacy and reduced toxicity [18][19][20]. Sterol 14α-demethylase cytochrome P450 (CYP51) inhibitors with trypanocidal activity have been identified since 1990 [21]. CYP51 is an important target in the treatment of Chagas disease because the inhibition of sterol synthesis is lethal to the parasite. Urbina et al. reported the use of triazoles as inhibitory compounds of CYP51 in T. cruzi in 2002 [22][23][24][25]. e binding site of the ligand in CYP51 contains a hydrophilic region, a hydrophobic region, and a channel formed by hydrophobic residues between the β-helix and the N-terminal helix [26]. Azoles, such as D0870, posaconazole, and ravuconazole, act through the coordination of nitrogen with heme iron of the CYP51 enzyme. e crystallographic structure of T. cruzi-CYP51 linked to posaconazole and fluconazole ( Figure 1) has been reported [27] which favors the design of therapeutic alternatives based on methods such as structure-based drug design. e active site on CYP51 takes the form of Y, one end being longer than the other two. e longer end is a pocket at an angle to the heme group plane, which can interact with up to four fused aromatic rings in this highly hydrophobic pocket.
Other types of substances, such as naphthoquinones, diamines, nitroimidazoles, and their derivatives have been evaluated, albeit with high toxicity. In addition to CYP51 inhibitors, other therapeutic targets for the development of new drugs include inhibitors of cruzipain, inhibitors of pyrophosphate enzymes, HGPRT (hypoxanthine-guanine phosphoribosyltransferase), and trypanothione reductase, the principal enzymes of T. cruzi metabolism [23,28].
In the last 25 years, significant advances have been made in the research of T. cruzi and the complete elucidation of its genome [30,31]; however, the role of many of the encoded proteins is unknown. In addition, significant advances have not been reported in the field of natural product research owing to the lack of knowledge about human toxicity [17].
us, the only treatment available for Chagas disease is the use of nifurtimox and benznidazole. In two decades of research, it has not been possible to find more effective therapeutic alternatives with reduced toxicity, and the therapeutic combinations of the drugs already available are under clinical study. Cheminformatics and molecular modeling approaches represent an important initiative considering the high costs of traditional research [32,33].
Fenarimol 1 is a fungicide discovered with moderate activity in vitro (IC 50 , 350 nM) against T. cruzi [34]. Structure-activity relationship studies to improve the inhibitory capacity of CYP51 led to pharmacophore 2, in which an aromatic ring is replaced by piperazine with substitutions at the N-4 atom.
e new derivatives with amide, sulfonamide, aromatic, carbamate, and carbonate substituents were evaluated for their ability to inhibit T. cruzi [35] in vitro and showed very promising results. Additionally, Keenan et al. [35] demonstrated that piperazine analogues of fenarimol do not exhibit cytotoxicity. ese findings make the piperazine analogues of fenarimol excellent candidates for more advanced optimization studies.
Quantitative structure-activity relationship (QSAR) is an approach that uses statistical models to correlate the molecular properties encoded in molecular descriptors with the biological activity of a group of molecules that share a pharmacophore. To the best of our knowledge, there is a lack of QSAR studies of piperazine analogues of fenarimol reported.
Several regression methods are available for establishing the correlation between the activity and the molecular structure descriptors [33]. Linear methods are efficient in relating the structure to a given biological activity, but the accuracy of nonlinear methods is considered superior [33]. However, nonlinear methods are harder to interpret and suffer from overfitting when the number of descriptors is greater than the number of samples in the dataset [36]. Consequently, we used the multiple linear regression (MLR) model because of its reliability, accuracy, and ease of interpretation, which is supported by an extensive number of studies conducted in the field of medicinal chemistry. MLR is one of the most widely applicable methods to the conduct of QSAR studies [33,[37][38][39].
A large number of descriptors of molecular structures are available; their selection is one of the most important initial tasks of QSAR studies. Descriptors based on topological indexes contain information related to molecular shape, size, flexibility, and the degree of branching. eir correlation with biological activity has been widely demonstrated. e physicochemical descriptors encode properties of hydrophobicity and electronic and steric effects, properties that are attributed to different forms of chemical interaction with biological targets, because of which they are most frequently used for QSAR studies. e partial charge descriptors contain chemical information resulting from the electronic distribution and molecular geometry combined in the same descriptor. e charge descriptors allow prediction of electronic density regions favorable for interaction with biological targets. A total of 53 descriptors were selected as widely as possible to evaluate the structural diversity of the analogues under study. ermodynamic and electronic molecular descriptors have been calculated with Gaussian 09 [40]; other descriptors including topostructural, topochemical, and topological indexes; physicochemical descriptors; and charge descriptors were calculated with MOE [41] to determine the structural factors with greater relevance as determinants of the biological activity against T. cruzi in the series of piperazine analogues of fenarimol used in this article. Additionally, this study evaluated piperazine analogues of fenarimol's interaction with the CYP51 receptor through docking methodology.

Descriptor Selection and Model
Generation. Molecular geometries of the compounds were optimized using the MM2 molecular mechanics method; the generated structures were further optimized using DFT employing B3LYP and 6-311+G(d,p) basis sets. e results were used for descriptor calculations. DFT calculations with B3LYP functional have been successful in the prediction of several molecular properties [42]. No imaginary frequencies were found. erefore, the calculated geometries were local minima in a potential energy surface. Molecular-geometry optimizations were performed using Gaussian 09 Revision-A.02-SMP software [40]. For all calculations and comparisons in this study, only the most stable conformations were considered. Several types of molecular descriptors were calculated using MOE [41]. Microsoft Excel Data Analysis add-in and several validation analyses were used to evaluate the statistical significance of the employed models.
A molecular descriptor is a numerical representation of a molecular property derived from a standardized experiment or mathematical procedure. Any molecular property may be used, and descriptors can be determined experimentally from physical properties or computed from structural features such as van der Waals forces, atomic and bond counts, distances, pharmacophoric features, partial charges, volume, and shape. To reduce the number of descriptors, the correlation coefficients between each descriptor and pIC 50 were calculated. e selected descriptors for the creation of the model have a coefficient of determination (R 2 ) ≥ 0.5 [43]; in total, 12 descriptors were selected, which are defined in Table 1. e values for the selected descriptors are found in Table 2.
e descriptors of Table 2 were selected using the successive step mode [44], and the following parameters were taken into account: use of 1 descriptor for every 5 compounds [45] and selection of descriptors of different nature to avoid collinearity between the variables [46]. Subsequently, multiple linear regression (MLR) was used to study the linear relationship between pIC 50 and the remaining descriptors, but only those models with R 2 higher than 0.568 were considered valid [47]. e linear relationship between pIC 50 and descriptors was determined using the standard equation: where Y is an nx1 vector of values of the dependent variable pIC 50 , X is an nxk matrix of explanatory variables (known as molecular descriptors), B is a kx1 vector of the estimated parameters, and ϵ is an nx1 residual vector whose components are assumed to be independent, normal, random variables with mean zero and known variance σ 2 .
Stepwise regression fitting was performed with the successive addition of descriptors that significantly improved the fit; this process was repeated until no further improvement in the model was possible. Next, the descriptors whose loss gives the most statistically insignificant deterioration of the model fit were removed. is process was repeated until no further descriptors could be deleted without a statistically significant loss of fit. Using these conditions, 21 models were obtained; these models can be accessed from supplementary Table S1 with their corresponding statistics in supplementary Table S2. e best predictive model with the lowest number of descriptors, a high determination coefficient (R 2 ), and no collinearity among the selected descriptors was selected for further improvement.
e "leave-one-out" (LOO) cross-validation scheme was used to assess the validity of the models. During the validation process, the model function is trained on data from all molecules excluding one. pIC 50 for the molecule that was left out was then predicted using the descriptors for the model. Statistical parameters such as Q 2 (cross-validated correlation coefficient), R 2 (determination coefficient), standard deviation (SD), and SE (standard error) were taken into account to evaluate the quality of the proposed QSAR models: where y i ,y i , and y i are the measured, predicted, and averaged (over the whole dataset) values of the dependent variable, respectively; σ 2 XY is the covariance of X and Y; σ 2 x is the variance of X; σ 2 y is the variance of Y; and n is the number of samples used for model building.

Enzyme Collection and Docking Preparation.
e structure of the receptor was obtained from the Protein Data Bank (https://www.rcsb.org/) database, identified in the repository as a crystal structure of sterol 14α-demethylase (CYP51) from Trypanosoma cruzi in complex with the inhibitor fluconazole. PDB ID: 3KHM. e initial crystal

Results
Pharmacological data of 52 fenarimol analogues (molecules 3-54 in Table 3) were taken from the literature [35]. pIC 50 (−logIC 50 ) was used as a measure of biological activity, IC 50 values corresponding to concentration of the compound required to inhibit 50% of T. cruzi measured under the same experimental conditions. Figure 2 shows the pIC 50 distribution of the data, spanning three orders of magnitude. We utilized the principal component regression (PCR) and partial least square regression (PLS) for examining the linear relationship between IC 50 and the corresponding descriptors. However, MLR afforded the best predictive model with the lowest number of predictive variables and a high determination coefficient (R 2 ) of the most significant variables.
Model 01 was obtained using MLR with 57% of the variation in biological activity explained by the descriptors and standard error (SE) 0.374 (Table 4).
Model 01: e jackknife technique was implemented to eliminate the outliers [51] to improve the statistical quality of model 01. After removing molecules 10,11,24,25,27,45,52,53, and 54 as outliers, model 02 was achieved. Occurrence of outliers depends on three main factors [46]: errors in the reported biological activity or calculated descriptors' values; different mechanisms of action for the dataset used to build the model; and sampling design errors. When applied to the information used to build our model, none of the above three factors were able to sufficiently explain why molecules 10,11,24,25,27,45,52,53, and 54 were classified as outliers.
Model 02: e coefficient of determination R 2 increases from 0.568 in model 01 to 0.817 in model 2, and the SE decreases from 0.374 to 0.256 (Table 5). e improved statistical quality of model 02 is reflected since 82% of the variation in biological activity is explained by the selected descriptors. Table 6 shows no collinearity in the selected descriptors, and therefore, the resulting model 02 has good stability.
From the LOO cross-validation procedure in Table 7, the square of the cross-validation coefficient LOO (Q 2 ) is obtained to evaluate the robustness and the predictive capacity of the model. Complete LOO cross-validation statistics for model 02 are available in supplementary Table S3. Good internal prediction power was achieved because the correlation coefficient of cross-validation LOO (Q 2 ) was greater than 0.5 [47]. Cross-validated pIC 50 predicted values are shown in Table 8.
On the contrary, the R 2 value of the original model should not be significantly greater than the Q 2 value, and the difference between R 2 and Q 2 should not be more than 0.3 [52]. In our case that the difference is 0.040 (R 2 � 0.820; Q 2 � 0.780) indicates that the model is not overfitted, and R 2 � 0.820 indicates a good correlation between the observed and predicted pIC 50 (Table 7).
Model 02 was used to calculate pIC 50 of all fenarimol analogues (Table 8).
With these new values, the observed pIC 50 vs. predicted pIC 50 graph is shown in Figure 3. Figure 3 demonstrates the reproducibility of the predicted data with respect to that observed in the pIC 50 data taken to develop the model. e results showed a Pearson correlation coefficient of 0.904,    Journal of Chemistry which demonstrates a good linear correlation between observed pIC 50 and predicted pIC 50 , and an R 2 correlation coefficient of 0.817 (Table 9). In accordance with the statistical parameters discussed, model 02 meets the statistical quality to predict the activity of new piperazine analogues of fenarimol.

Model Interpretation.
A QSAR model with statistical validity was obtained. e coefficients of the descriptors in a QSAR model can be used to evaluate the contribution of each descriptor toward biological activity; this information can be utilized to develop effective strategies in designing compounds with improved specificity and biological activity. Figure 4 shows the estimates of the regression coefficients obtained from model 02. ere are descriptors that       Remarkably, the biological activity in our proposed model is explained by geometrical descriptors and descriptors encoding hydrophobicity and London dispersion forces. Descriptors such as length and Log P(o/w) describe the molecular dimensions and hydrophobicity, respectively, for successful binding with the substrate. Log P(o/w) is the log value of the octanol/water partition coefficient. is property was calculated in MOE from a linear atom-type model using 1,847 molecules [53].
Q_VSA_FPPOS is the code for fractional positive polar van der Waals surface area. Q_VSA_FPPOS is the sum of vi (van der Waals surface area of atom i) such that qi (partial charge of atom i) is greater than 0.2, divided by the total surface area [54]. According to model 02, low positive polar surface areas favor biological activity against T. cruzi. is highlights the importance of descriptors Log P(o/w), length, and Q_VSA_FPPOS in achieving T. cruzi inhibition over a wide range of pIC 50 from 0.056 to 2.301, which in turn makes this model significant [52].
CYP51 is an enzyme with a rigid substrate-binding cavity; fenarimol analogues interact via a coordination bond with iron in the enzyme's heme group, and the other two rings are directed to substrate cavities via van der Waals contact with residues Val213, Ile105, Tyr103, Met106, Tyr116, Ala115, Phe110, Leu127, Met284, Leu130, Ala287, Phe290, Ala291, r295, Leu356, Met460, Met360, and Met358 [26]. Two aromatic planar rings (pyridine/pyrimidine and substituted benzene) and a heterocyclic nonplanar ring (piperazine) are the pharmacophoric features of fenarimol analogues. Clearly, the nature of the van der Waals interactions in the binding pocket is due to multiple aromatic stacking and hydrogen bonding. A decrease in the molecular surface area bearing a polar positive charge would favor intermolecular interactions and coordination with iron in the enzyme's heme group.
Low polarity contributes to our model because 5 out of 7 residues (Val102, Ile105, Met106, Phe110, and Ala115) at the active site in CYP51 are nonpolar [55]. Additionally, a 42 residue-long hydrophobic tunnel connects the heme group with the protein surface [55], so high partition coefficients such as Log P(o/w) values are required to enter the mostly hydrophobic CYP51 active site. e longer groups substituted at R1 of the piperazine ring (length descriptor) favor T. cruzi inhibition due to a deeper entrance in the binding pocket allowing further contact with the residues.
Consequently, if we want to increase the value of the activity, we will start with the most active group of fenarimol analogues, carbonates, or carbamates 33-44 and then (1) Increase Log P: substituents should favor hydrophobicity (2) Increase length: the substitution pattern in the piperazine moiety should contain carbonates or carbamates substituted with a large aromatic ring (3) Decrease Q_VSA_FPPOS: avoid polar positive groups

Correlation of Biological Activity with Descriptor Values.
For the biological activity of amides 3-17, it appears that the bulkier alkyl amides 3-8 are more potent than benzamides 10 and 11 as well as being more potent than amides 13, 14, and 16 possessing smaller alkyl groups (Table 3). Amides generally have the shortest lengths (7.596-7.951Å) at the piperazine end of the pharmacophore (length descriptor) ( Table 2). is shorter length of amides at the piperazine end might limit the access to active sites and interactions with residues that are more important for activity. Pyrimidine amides 12 and 13 are less potent than pyridine amides (Table 3) and have Log P(o/w) values of 2.565 and 1.715, respectively (Table 2); these structures have the lowest values of hydrophobicity in the amide group. erefore, the low hydrophobicity of pyrimidine derivatives might determine their low activity compared to pyridine derivatives. In the sulfonamide group (18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31)(32), aryl sulfonamides 18-23 showed higher activity than alkyls 24-32. e length of aryl sulfonamides ranged 8.006-10.866Å, while that of alkyls ranged 6.398-6.708Å except for 26 (8.253Å). Alkyl sulfonamides 31 and 32 are pyrimidine derivatives that showed lower potency than the pyridine derivatives. In addition, low Log P(o/w) values (0.434-0.777) were observed for alkyl sulfonamides 31 and 32 and aryl sulfonamide 30 (Table 2). In general, pyrimidinyl analogues were less potent than the corresponding pyridyl compounds (e.g., amides 3, 12 and 5, 13; sulfonamides 24, 25 and 28, 31, as well as 29, 32; and carbamates 42, 40 and 46, 53). Lower basicity of pyrimidinyl rings could be accounted for the reduced coordination of nitrogen with heme iron of the CYP51 enzyme and hence lower activity against T. cruzi.
Carbonate and carbamate analogues 33-44 contained piperazines with the longest chain length (10.305-12.952Å (Table 2)) among the compounds examined in this study and were found to be the most potent against T. cruzi. In general,    Table 10.
Accessibility to the binding pocket, hydrophobicity, and low positive polar surface area make carbamates the most potent against T. cruzi, followed by sulfonamides. e structures of carbamates and sulfonamides contain high electron density areas because of the presence of multiple  electronegative heteroatoms that facilitate the formation of van der Waals contacts with the polar residues present at the active site.

4.3.
Docking. Stereochemistry plays a major role in biological activity; there is no scientific literature about the activity of isolated stereoisomers of piperazine analogues of fenarimol used in this paper. Docking is an in silico methodology to evaluate interactions at the active site and therefore three-dimensional structural properties such as chirality. Docking was performed using the R and S stereoisomers of the molecules with the best biological activity for each substituent subtype, that is, amide 03, sulfonamide 18, carbamate 33, and aryl-substituted molecule 45. All stereoisomers act on the active site reported in the literature. Multiple interactions of the aromatic stacking type, π-π, π-alkyl, and π-sulfur type were found for all tested molecules ( Figure 5). Hydrogen bonding was found only in 18(R) stereoisomers ( Figure 5). Among the stereoisomers, those with the highest affinity for the receptor are 3(R), 18(S), 33(S), and 45(S) ( Table 11).
Stereoisomers with coordination distances to Fe between 2.62 and 2.80Å display the strongest interacting conformations. Stereoisomers with greater distances to Fe demonstrated no possible coordination and were, by comparison, weaker interacting conformations. Molecule 45 was an exception because the strongest interacting conformation was not coordinated to Fe (Table 11).
Experimental evaluations of the biological activities of isolated stereoisomers in the series of piperazine analogues of fenarimol are necessary to correlate the findings in this paper. e biological activity expressed as IC 50 based on currently available data was determined using racemic mixtures of the molecules under study. e spatial arrangement and different coordination distances to Fe can lead to different IC 50 values depending on the type of R/S stereoisomerism exhibited by the fenarimol analogues. Stereoselectivity is one of the most important factors to consider to improve drug safety and efficacy; therefore, efforts to elucidate the relevance of stereoselectivity in the inhibition of sterol 14α-demethylase (CYP51) from Trypanosoma cruzi would benefit this research area.
By superimposing the more stable conformations resulting from docking of molecules 18(S), 33(S), and 45 (S), an almost perfect match is observed between piperidines, piperazine, and lateral aromatic substituents except for  sulfonamide 18(S), which suffers a considerable deviation in the piperazine ring ( Figure 6(a)). On the contrary, molecule 3(R) does not coincide with the superimposed conformations due to its chiral arrangement ( Figure 6(b)).

Conclusions
In this study, multiple linear regression (MLR) models were developed to determine the molecular properties relevant to the inhibitory activities of 54 fenarimol analogues against Chagas disease. Twelve descriptors were identified and statistically validated to build a QSAR model. e length, Q-VSA_FPPOS, and Log P(o/w) are determinant descriptors in the prediction of biological activity against Chagas disease. e MLR model has demonstrated that the most active compounds have low values of Q-VSA_FPPOS and high values of length and Log P(o/w). e stability and robustness of the model are supported by the statistical parameters and validation tests applied, indicating that the strategy implemented in this study can be used to predict the biological activity of piperazine analogues of fenarimol and to adopt a rational design for developing new analogues with improved activity against Chagas disease. Stereoisomers 3(R), 18(S), 33(S), and 45(S) showed the highest affinity to the receptor, with coordination distances of approximately 2.62-2.80Å that were found to have the strongest interactions with the receptor. e most frequently found interactions with CYP51 active sites were aromatic stacking, π-π, π-alkyl, and π-sulfur interactions. Experimental determination of the biological activity of isolated stereoisomers has not been reported in the literature for piperazine analogues of fenarimol; therefore, this paper makes an important contribution to the research needed in this area. For virtual screening and computational design of drugs against Chagas disease, it will be essential to define stereoselectivity to improve drug safety and efficacy. Knowledge of different pharmacokinetic and pharmacodynamic profiles of stereoisomers could lead to substantial development in this research area. Chirality in drug design is an issue for both pharmaceutical industry and regulatory authorities. Factors such as industrial scale, production of stereoisomers, stereochemical stability, toxicological profile, and differential clinical results highlight the importance of stereoselectivity in drug design. In the case of Chagas disease, a neglected tropical disease with no effective treatment and high toxicity levels of current therapeutics, it is of vital importance to explore these issues in depth to find the most effective treatment.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare no conflicts of interest. Table S1: regression models. Table S2: statistics of regression models.