3D-QSAR Study of the Chalcone Derivatives as Anticancer Agents

For their biological properties and particularly for their anticancer activities, chalcones are widely studied. In this work, we have submitted diverse sets of chalcone derivatives to the 3D-QSAR (3-dimensional quantitative structural-activity relationship) to study their anticancer activities against HTC116 (human colon cancer), relying on the 3-dimensional descriptors: steric and electrostatic descriptors for the CoMFA (comparative molecular ﬁeld analysis) method and steric, electrostatic, hydrophobic, H-bond donor, and H-bond acceptor descriptors for the CoMSIA method. CoMFA as well as the CoMSIA model have en-couraging values of the cross-validation coeﬃcient ( Q 2 ) of 0.608 and 0.806 and conventional correlation coeﬃcient ( R 2 ) of 0.960 and 0.934, respectively. Furthermore, values of R 2test have been obtained as 0.75 and 0.90, respectively. Besides, y-randomization test was also performed to validate our 3D-QSAR models. Based on these satisfactory results, ten new compounds have been designed and predicted by in silico ADMET method. This study could expand the understanding of chalcone derivatives as anticancer agents and would be of great help in lead optimization for early drug discovery of highly potent anticancer activity.


Introduction
Cancer is one of the most serious health problems in the world. Over the past three decades, the number of people with different types of cancer has increased, and the problem is expected to worsen in the next few years if new methods of treatment are not discovered [1,2].
Chalcone precursors, called α-b-unsaturated ketones (Figure 1), play an important role in synthetic manipulations and also represent a major component of natural products. Chalcones as well as their synthetic analogues ( Figure 1) display enormous number of biological activities [3]. e presence of chalcone derivatives in the biologically active compounds as a substituent, or main component, has prompted synthetic organic chemists to synthesize novel compounds bearing this moiety. Current research shows that molecules containing more than one pharmacophore are useful for the treatment of cancer [4,5]. e biological activities of chalcones due to the presence of double bonds in conjunction with carbonyl functionality, as removal of this functionality, make them inactive [6].
In the modern drug design and discovery, quantitative structure-activity relationship (QSAR) methodology generally plays a crucial role [7]. QSAR studies are performed by correlation structural descriptors from a set of chemicals to their known biological activities and predict the activity values of nonexamined compounds lying in the applicability domain of the model [8,9]. 3D-QSAR studies are successfully applied to guide the development and design of new novel molecules with an improved activity [10]. Steric and electrostatic fields have been used as an independent variable against the biological activity of compounds [11]. One of the types of 3D-QSAR technique is the CoMFA method. is method has an extension called CoMSIA with a difference only in the implementation of the fields [12,13]. e correlation between the 3D structures of the studied molecules and their activities can be better interpreted relying on the calculation of hydrophobic, H-bond donor, and H-bond acceptor similarity fields, as well as the steric and electrostatic fields in the CoMSIA approach. e objective behind this work is developing some 3D-QSAR techniques. is development includes CoMFA and CoMSIA methods for modeling and prediction of the anticancer activities for some chalcone analogues. e results of this study serve not only anticipating the inhibitory activities of untested chemicals but also designing more active ones.

Computer Simulations.
e study done in these research procedures relies on two main components ( Figure 2): data preparation and ligand-based study (3D-QSAR model).

Data Set.
A database of 19 compounds consisting of chalcone derivatives as anticancer agents against HTC116 (human colon cancer) is collected from the literature [14]. All compounds are chemically synthesized and experimentally evaluated in biochemical assays by Wang et al. [14]. e selection of the compounds used for our 3D-QSAR study was done by considering the fact that compounds represent structural diversity and a range of biological activities, as well as the chemistry of chalcones remains a fascination among researchers in the 21 st century due to the large number of replaceable hydrogen molecules that allows a large number of derivatives and a variety of promising biological activities to be generated [15]. e structures and biological activities of both training and test sets are given in Table 1 and Figure 3. For the 3D-QSAR study, the reported IC 50 values of compounds have been converted into corresponding pIC 50 (−log IC 50 ) and subsequently used as the dependent variable for 3D-QSAR model development [16]. e total set of molecules has been randomly split into a training set (14 compounds) for generating the QSAR model and a test set (5 compounds) for validating the predictive ability of the model. e selection of training and test set molecules has been carried out manually such that the test set contains a range of low-, moderate-, and high-activity compounds similar to the training set.

Molecular Modeling.
e molecular modeling studies of 19 compounds have been performed utilizing SYBYL-X 2.0 software package (Tripos Inc., St. Louis, USA), working on Windows 7.0 and 64 bit workstation. After the construction of the 3D structures of the molecules, energy minimization has been performed using the Tripos force field [17], with Powell conjugate gradient methods under the conditions of a gradient convergence of 0.01 kcal/(Å mol) in Gasteiger-Huckel charge [18], using 5000 iterations.

Molecular Alignment.
e data set has been aligned on the common core by distill rigid alignment technique available in SYBYL [19], using the most active compound (compound N°17) as the template (Figure 4). e structural alignment of the molecules is the most critical routine in building a reliable 3D-QSAR model. e accuracy of the prediction and the statistic quality of 3D-QSAR models depend strongly on the alignment [20]. Figure 4 describes the proposed alignment and common substructure for the alignment.

CoMFA and CoMSIA Studies.
In the generation of CoMFA models [21], the aligned molecules have been imported in a 3D cubic lattice generated automatically with a grid spacing of 2Å. In the CoMFA studies, Tripos force field takes a sp3 carbon probe atom with a Van der Waals radius of 1.52Å and positive charge +1 to generate steric (Lennard-Jones 6-12 potential) field energies and electrostatic (Coulombic potential) fields with a distance-dependent dielectric at each lattice point. To reduce noise and speed up the calculation of potentials, the column filtering value has been set to 2.0 Kcal/mol (default parameters).    In CoMSIA, steric (S), electrostatic (E), hydrophobic (H), hydrogen bond donor (D), and hydrogen bond acceptor (A) descriptors have been calculated using the sp3 atom with charge +1 and radius 1Å to develop different CoMSIA models. Attenuation factor α has been set by default at 0.3 [22]. e statistical evaluation of CoMSIA and CoMFA has been performed in the same way. [23], together with cross-validation test, has been used for the generation and internal validation of the CoMFA and CoMSIA models. PLS is a powerful tool to derive multilinear relationships between independent and dependent variables. e self-consistency of models generated by PLS can be then evaluated by cross-validation test. In the current study, CoMFA and CoMSIA descriptors (explanatory properties) are used as independent variables, and pIC 50 (target properties) is used as a dependent variable. e first run of PLS analysis has been performed using the full cross-validated leave-one-out (LOO) method, which gave the cross-validated correction coefficient (Q 2 ) and the optimum number of components. In this step, sample-distance partial least square (SAMPLS) method has been used to increase the speed of cross-validation calculation [24]. In the second run of PLS, the models have been calculated by the non-cross-validation method with the optimum number of components. e models in this step have been accessed by statistical indices such as squared correlation coefficient (R 2 ), standard error of estimation (SEE), and F-test value [25].

Validation of the Models.
In order to be reliable and predictive, 3D-QSAR models should (1) be statistically significant and robust, (2) be validated by making accurate predictions for external data sets that have not been used in the model development, and (3) have their application boundaries defined. In this study, five compounds have been used as a testing set [26]. e molecules used as test data are also aligned by the same methods, before determining their predictive anticancer activities, using CoMFA and CoMSIA models established by the training set. e coefficient of determination for the test set (R 2 test ) is calculated by the following equation [27]: In the above equation, Y iObs(test) and Y ipred(test) indicate observed and predicted values, respectively, of the test set compounds, and Y i training indicates the mean activity value of the training set.

Y-Randomization Test.
e generated model was further validated by the Y-randomization method [28]. e activities of the studied molecules (pIC 50 ) are randomly shuffled many times, and after every iteration, a new QSAR model is developed. e new QSAR models are expected to have lower Q 2 and R 2 values than those of the original model. is technique is performed to eliminate the possibility of chance correlation. If higher values of Q 2 and R 2 are obtained, it means that an acceptable 3D-QSAR cannot be generated for this data set because of structural redundancy and chance correlation.

Model Acceptability Criteria. According to Alexander
Tropsha and Alexander Golbraikh, if methods give a Q 2 value >0.5 and R 2 > 0.6, models are termed as acceptable [29]. Journal of Chemistry

Lipinski's Rule and ADMET Prediction.
Lipinski's rule and ADMET [30] parameters of the newly designed compounds and the most potent inhibitor in the dataset (M 17 ) were calculated using pkCSM [31] and SwissADME [32] web servers. Lipinski's rule including log P, number of hydrogenbond acceptors, number of hydrogen-bond donors, number of rotatable bonds, and molecular weight was determinate.
Molecules violating more than one of these parameters may have problems with bioavailability and a high probability of failure to display drug-likeness [33].  Table 4. Graphs of observed versus predicted biological activity for the training set and test set molecules exhibited satisfactory linear correlation, as shown in Figure 5. erefore, the constructed CoMFA and CoMSIA models are robust and can be used to design new compounds and predict the activities of new compounds in the future.

Analysis of CoMFA and CoMSIA Contour Maps.
e greatest advantage of CoMFA and CoMSIA methodologies is that the field effect on the target property can be viewed with the contour map. e contour maps of CoMFA and CoMSIA models provide information that can be used to identify important regions in the 3D space around the molecules where any change in the steric and electrostatic fields may significantly affect the biological activity. erefore, the contour maps around compound 17 (Figure 6) that has the highest activity value are presented in pictures. Figure 7 shows the steric as well as electrostatic contours of the CoMFA model. Green and yellow contours denote the steric interactions. e red and blue contours denote the electrostatic ones. e steric and electrostatic fields have fractions of 71.7% and 28.3%, respectively. e steric and electrostatic contour maps of the CoMFA study can provide information to alert which area can increase or decrease biological activity. In steric contour maps, the green contour indicated the region where bulky groups favored activity, and the yellow contour indicated the region where bulky groups have been unfavorable to activity. For electrostatic fields, blue contours indicate regions where electron-donating groups increase activity, while red contours indicate regions where electron-withdrawing groups increase activity.   Figure 7, respectively. In the CoMFA steric contour map (Figure 7(a)), two green contour maps are located near position 5 in the benzene ring, of compound 17, suggesting that bulky groups at this position would be favorable to increase the inhibitory activity, whereas two large yellow regions near the 2, 3, and 4 positions in the benzene ring of the same compound suggest that a bulky group at this position could decrease the activity.

CoMFA Contour Map.
Based on these observations, it is possible to explain why the activity of compounds 13 (pIC 50 � 6.00), 14 (pIC 50 � 6. 13), 16 (pIC 50 � 6.02), and 17 (pIC 50 � 6.22) are higher than that of compounds 8 (pIC 50 � 5.77) and 9 (pIC 50 � 5.43). However, these compounds have less steric groups (Cl, Br, and CH 3 ) in the 2, 3, and 4 positions of the benzene ring. Compounds 9 and 10, with low activities, have the bulk groups (OCH 3 ) in the same position of the benzene ring. On the contrary, compound 1 (pIC 50 � 5.89) with less steric group (Cl) in position 2 of the benzene ring is more active than compound 6 (pIC 50 � 5.72), with bulk methyl group in the same position.
In the CoMFA electrostatic contour map (Figure 7(b)), the blue contour near position 3 on the benzene ring indicates the region where the generation of more negative electrostatic potential decreases the anticancer activity (pIC 50 ). is can explain why compound 9 (pIC 50 � 5.43) with methoxy in position 3 is less active than compound 4 (pIC 50 � 5.80) with Br and 7 (pIC 50 � 5.84) with methyl in the same position 3. Compound 14 (pIC 50 � 6.13) with the Cl group in position 3 of the benzene ring is less active than compound 17 (pIC 50 � 6. 22) with the methyl group in the same position. e location of a red contour map near position 2 of the benzene ring suggests that electronegative groups at this position will increase the anticancer activity. is may explain why the activity of compounds 1 (pIC 50 � 5.89) and 3 (pIC 50 � 5.77), with Cl and Br groups in position 2, respectively, is greater than that of compound 7 (pIC 50 � 5.72) with the methyl group in the same position.
According to the preceding analysis, the presence of electropositive groups in position 3 of the benzene ring is favorable for increased activity.

CoMSIA Contour Map.
In the CoMSIA model, three different fields named as steric, hydrophobic, and hydrogenbond acceptor fields have been analyzed. e three contour maps are displayed in Figure 8. CoMSIA steric (a), hydrophobic (b), and hydrogen-bond acceptor (c) fields give the contribution accounting for 0.211, 0.317, and 0.472, respectively.  e steric contour maps in the CoMSIA model (Figure 8(a)) are more or less similar to those in the CoMFA model discussed in the CoMFA section. erefore, the following discussion will focus precisely on the hydrophobic field and hydrogen-bond acceptor field. Figure 8(b) shows the CoMSIA model for hydrophobic and hydrophilic groups which have been illustrated as yellow and gray contour maps, respectively. e yellow contour covering positions 3 and 4 of the benzene ring indicates that the presence of hydrophobic groups at these positions is favorable for increasing the activity, while a gray contour map covering position 2 of the benzene ring in compound 17 indicates that hydrophilic groups are favored for increasing the activity in this region. is can be interpreted by the fact that compounds 1 (pIC 50 � 5.89), 3 (pIC 50 � 5.77), and 13 (pIC 50 � 6.13) with huge hydrophilic group (Cl or Br) in position 2 of the benzene ring show higher activity than compound 6 (pIC 50 � 5.72) with the methyl group at the same position.
On the contrary, compound 19 (pIC 50 � 5.91) with the methyl group in position 4 of the benzene ring shows higher activity than compound 10 (pIC 50 � 5.43) with the methoxy group in the same position of benzene. Compound 8 (pIC 50 � 5.77) with the hydrophobic group in position 4 shows higher activity than compounds 2 (pIC 50 � 5.65), 5 (pIC 50 � 5.65), and 15 (pIC 50 � 5.70) with the hydrophilic group (Br or Cl) in the same position of the benzene ring.   For the H-bond acceptor contour maps (Figure 8(c)), the large red region converting position 2 of the benzene ring and sulfone group indicates that the presence of the hydrogen-bond acceptor is unfavored for improving the activity in this region. Conversely, the presence of H-bond donor groups in this region would increase the activity.
is can be interpreted by the fact that compounds 13 (pIC 50 � 6.00) and 16 (pIC 50 � 6.02) with sulfone as the H-bond donor group show higher activity than compounds 1 (pIC 50 � 5.89) and 6 (pIC 50 � 5.72) with no sulfonic group.
According to the above analysis of CoMFA and CoMSIA contour maps, the deduced optimization strategy for improving the inhibitory activities of the compounds against HTC116 (human colon cancer) is summarized and presented in Figure 9. (Table 5) 3.4. Y-Randomization Test. Y-randomization method was carried out to validate the 3D-QSAR model. Several random scrambles of the dependent variable (pIC 50 ) were conducted; then, after every shuffle, a 3D-QSAR was developed, and the obtained results are shown in Table 6. e low Q 2 and R 2 values obtained after every shuffle indicate that the good result in our original 3D-QSAR model is not due to a chance correlation of the training set.

Design for New Chalcone as Anticancer Agents.
e foregoing study is aimed at designing of novel molecules as anticancer agents by extracting different structural features from the proposed 3D-QSAR (CoMSIA) as the best model in this study and by exploring the main structure-activity relationships derived from this study (Figure 9). It is observed that steric and H-bond acceptor play an important role in the activity of studied compounds. is study proposes nine new molecules, which are optimized and aligned in the same way as those of the dataset compounds. e chemical structure and predicted activity values of new compounds are shown in Table 7. e newly designed compounds show good predicted activity compared with the most potent compound (molecule 17) in the dataset.

Lipinski's Rule and ADMET Prediction.
Lipinski's rule and ADMET parameters of the newly designed compounds and the most potent inhibitor in the dataset (M17) were calculated using pkCSM and SwissADME web servers. Lipinski's rule including log P, number of hydrogen-bond acceptors, number hydrogen-bond donors, number of rotatable bonds, and molecular weight is shown in Table 8.
All the compounds respect the conditions mentioned in Lipinski's rule, except two compounds 4 and 8 with two Lipinski violations. e ADMET prediction was used in this study to calculate the pharmacokinetic parameters (PK) of compounds (Table 9).
e results of ADMET analysis show that all the newly designed compounds have low values for the blood-brain partition coefficient, indicating that they will have a very low potential to cross the blood-brain barrier thereby eliminating the possibility of CNS-related toxicity. Additionally, other pharmacokinetic parameters such as

Conclusions
e construction of the 3D-QSAR model has been the objective behind the collection, optimization, and calculation of a series of chalcone derivatives acting as anticancer agents against HTC116 (human colon cancer). us, the internal and external predictive powers of the obtained CoMFA and CoMSIA models are satisfactory (Q 2 : 0.608, 0.806, respectively, and R 2 : 0.960, 0.934, respectively). e correlation between different fields and activity is well previewed by the contour maps of CoMFA and CoMSIA models. is has been a reference for the design of new compounds.
us, compounds that obtained key structural factors have been used to design ten inhibitors by modifying the chalcone scaffold. e overall result of this generated work gives the possibility to design and synthesize new chalcone derivatives with potential inhibitory activity to treat HTC116 (human colon cancer) and then obtain a better result in terms of biological activity, in silico ADME, and toxicity.

Data Availability
All the data are included within the manuscript.

Conflicts of Interest
e authors declare that they have no conflicts of interest. Journal of Chemistry 11