Cytotoxic Evaluation, Molecular Docking, and 2D-QSAR Studies of Dihydropyrimidinone Derivatives as Potential Anticancer Agents

The diverse pharmacological activity of the dihydropyrimidinone scaffold has made it an attractive framework for drug design. Because of the high incidence and mortality of breast cancer, there is a pressing need to discover new pharmacotherapeutic agents for managing this disease. A series of twenty-two derivatives of 6-(chloromethyl)-4-(4-hydroxyphenyl)-2-oxo-1,2,3,4-tetrahydropyrimidine-5-carboxylate (3a-3k) and ethyl 6-(chloromethyl)-4-(2-hydroxyphenyl)-2-oxo-1,2,3,4-tetrahydropyrimidine-5-carboxylate (4a-4k), synthesized in a previous study, were evaluated for their anticancer potential against a breast cancer cell line. Molecular docking studies were performed to analyze the binding mode and interaction pattern of these compounds against nine breast cancer target proteins. The in vitro cell proliferation assay was performed against the breast cancer cell line MCF-7. The structure-activity relationship of these compounds was further studied using QSARINS. The docking analysis revealed efficient binding of compounds 4f, 4e, 3e, 4g, and 4h against all nine target proteins. The in vitro cytotoxic assay revealed significant anticancer activity of compound 4f, with an IC50 of 2.15 μM. Compounds 4e, 3e, 4g, and 4h also showed anticancer activity, with IC50 values of 2.401, 2.41, 2.47, and 2.33 μM, respectively. The standard tamoxifen showed an IC50 of 1.88 μM. A 2D quantitative structure-activity relationship (QSAR) analysis was also carried out through QSARINS to identify potential breast cancer targets. The final QSAR equation showed good predictivity and statistical validation: the R² and Q² values of the model obtained from QSARINS were 0.98 and 0.97, respectively. The active compounds showed very good anticancer activities, and the binding analysis revealed stable hydrogen bonding of these compounds with the target proteins. Moreover, the QSAR analysis provided useful information on the structural requirements of these compounds as anticancer agents and highlighted the importance of topological and autocorrelation descriptors in affecting the anticancer activities.


Introduction
Many pharmacologically active natural and synthetic compounds contain a heterocyclic nucleus. Heterocycles containing nitrogen, oxygen, and sulphur atoms serve as important scaffolds in drug design. They are also integral to the nucleic acids DNA and RNA, whose bases are purines and pyrimidines [1].
Many alkaloids isolated from marine sources that show significant pharmacological properties contain a dihydropyrimidine nucleus. Batzelladine alkaloids A and B are among these marine alkaloids and act as potent inhibitors of HIV gp-120-CD4 binding. Pharmaceutical interest in this nucleus grew after the identification of another novel cell-permeable molecule, a 4-(3-hydroxyphenyl)-2-thione derivative called monastrol, as an anticancer agent. The anticancer activity of monastrol relies on a new mechanism that affects cell division by specific and reversible inhibition of mitotic kinesin motility without targeting tubulin [2]. The inhibition acts on the human kinesin Eg5, a motor protein involved in mitotic spindle formation, and causes mitotic arrest followed by apoptosis. Other possible targets of these moieties have also been studied, including centrin, calcium channels, and topoisomerase I [3]. Analogs of monastrol such as oxomonastrol, thio, and 3,4-methylenedioxy derivatives were developed, and their activity against HT-29 cancer cell lines was tested. Other synthetic analogs, L-771,688 and SQ 32926, have also been developed [1].
Pyrimidine derivatives show significant pharmacological activities and are essential constituents of living systems. Biginelli compounds have attracted interest over the last two decades because of their structural similarity to clinically active dihydropyrimidines. These compounds are esters of 6-methyl-2-oxo-4-phenyl-1,2,3,4-tetrahydropyrimidine-5-carboxylic acid and were first synthesized by Pietro Biginelli through a one-pot, three-component condensation of β-ketoesters, aryl aldehydes, and urea under acidic conditions [4].
Monastrol is one of the most studied Biginelli adducts because of its promising anticancer activity, which has inspired the design of new compounds. Several monastrol analogs have shown potent anticancer activity against the MCF-7 breast cancer cell line. Globally, breast cancer is the most commonly diagnosed malignancy in women and has the highest mortality rate [5]. The progression of breast cancer is associated with several factors such as age, personal history of breast cancer, and reproductive, environmental, and genetic factors. Prognostic factors can be used to predict the course and clinical outcome of breast cancer. These include ER, PR, Ki-67, and HER-2. Other factors that can be used to predict prognosis, but are not measured routinely, include cyclin E, cyclin D1, and cathepsin D. The status of the progesterone receptor, the estrogen receptor, and the human epidermal growth factor receptor 2, together with clinicopathological factors such as tumor grade, size, and lymph node status, largely determines the treatment scheme for breast cancer [6].
The synthesis of derivatives of different scaffolds of pharmacological importance has helped in determining the biological activities of compounds that can be screened further for disease management [7][8][9][10][11].
In silico drug design is a form of computer-based modeling and a rapidly developing field. Fast, low-cost in silico identification of drug targets is receiving considerable attention worldwide because experimental techniques are limited in throughput, accuracy, and cost and therefore cannot be applied widely [12]. The major roles of in silico approaches in drug discovery include virtual screening, in silico ADME/T prediction, advanced methods for determining protein-ligand binding, and quantitative structure-based drug design.
The in silico quantitative structure-activity relationship (QSAR) is another approach, used to find a statistical correlation between structure and function with the help of chemometric techniques. The structure is represented by the substituents, properties, or descriptors of the molecules and their interaction energy fields, while the function refers to a biological or experimental outcome [13]. The chemometric procedures in QSAR include multiple linear regression (MLR), partial least squares (PLS), principal component regression (PCR), principal component analysis (PCA), and genetic algorithms (GA). Several tools are available for building QSAR models; they perform specific QSAR steps such as modelling, statistical validation, and descriptor generation [14]. Open3DQSAR and PyCoMFA generate CoMFA-like models, while CORAL, a freeware program, uses a specific set of SMILES-based descriptors to generate the QSAR PLS model [15,16]. Another standalone freeware QSAR tool is QSARINS, which helps in building QSAR MLR models and supports model validation, data partitioning, prediction of the activity of new compounds, and determination of the applicability domain [17]. Ezqsar and camb are other openly available R-package-based tools. They are aimed mainly at beginners and use a single function to do the entire job [18].
With a view to finding new potential leads with effective chemotherapeutic activity, twenty-two derivatives of 6-(chloromethyl)-4-(4-hydroxyphenyl)-2-oxo-1,2,3,4-tetrahydropyrimidine-5-carboxylate (3a-3k) and ethyl 6-(chloromethyl)-4-(2-hydroxyphenyl)-2-oxo-1,2,3,4-tetrahydropyrimidine-5-carboxylate (4a-4k) were synthesized in a previous study [19] (Figure 1). A neat mixture of urea, ethyl 4-chloroacetoacetate, and substituted benzaldehyde was refluxed for 1 h to obtain the 6-chloromethyl-DHPMs. The resulting compounds were then reacted with a series of benzyl amine derivatives in methanol. The products were recrystallized from ethanol. The compounds were then characterized using FT-IR, 1H NMR, and 13C NMR. The structures of the synthesized compounds are shown in Figure 2, and their spectral analysis is given in supplementary Table 1. The compounds were screened for their anticancer activities, which were evaluated against the breast cancer target proteins identified through a systems biology approach [20]. The systems biology approach has helped in identifying several gene targets for better management of diseases [21]. In silico molecular docking studies of the synthesized compounds were performed to screen for the best targets for these compounds. Furthermore, the in vitro efficacy of these compounds against the breast cancer cell line MCF-7 was assessed to understand their antitumor effects. The in silico 2D-QSAR analysis was done with QSARINS to evaluate the structure-activity relationship of the synthesized compounds [22]. This was done to analyze the predictivity and stability of the models and the role of the essential descriptors generated from both models.
Journal of Oncology
acceptor (HBA), hydrogen bond acceptor (HBA), LogP, molecular volume (Å³), polarizability, and drug likeness. Moreover, the ADMET properties were also evaluated using the online pkCSM tool. The tool is used to predict the pharmacokinetics, drug likeness, and medicinal chemistry aspects of small molecules. Compounds with molecular weight < 500 g/mol, hydrogen bond donors < 5, hydrogen bond acceptors < 10, and rotatable bonds < 10 are considered drug-like. The server also helps in identifying absorption parameters such as water solubility, intestinal absorption, and skin permeation. Distribution properties such as blood-brain barrier permeation and CNS permeation were also calculated. The total renal clearance and the toxicity profile, including the Ames test, hepatotoxicity, and skin sensitization, were also evaluated. The ligand efficiency (LE), lipophilic ligand efficiency (LLE), and lipophilicity-corrected ligand efficiency (LELP) values were predicted using the DataWarrior tool [23].
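The drug-likeness cut-offs quoted above can be expressed as a simple filter. The following is a minimal sketch and not the pkCSM implementation; the `Compound` class, the `is_drug_like` function, and the property values are hypothetical and serve only to illustrate the four thresholds.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float       # g/mol
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int


def is_drug_like(c: Compound) -> bool:
    """Apply the four cut-offs quoted in the text (strict inequalities)."""
    return (
        c.mol_weight < 500
        and c.h_bond_donors < 5
        and c.h_bond_acceptors < 10
        and c.rotatable_bonds < 10
    )


# Hypothetical property values, for illustration only.
example = Compound("hypothetical-1", 401.8, 3, 6, 7)
too_heavy = Compound("hypothetical-2", 612.4, 3, 6, 7)
print(is_drug_like(example), is_drug_like(too_heavy))
```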

Molecular Docking Studies.
PyRx, an open-source docking software, was used to identify the best target proteins for the proposed compounds [24]. Libraries of compounds can be screened against potential targets using PyRx, from job preparation through submission to analysis of results. PyRx is an easy-to-use tool for computer-aided drug design and includes a docking wizard for AutoDock Vina. Visual analysis of results in PyRx is based on the embedded Python Molecular Viewer (ePMV), and the results are stored in a built-in SQLite database.

Selection of Breast Target Proteins.
The target proteins identified through a systems biology approach were used to study the protein-ligand interactions with the synthesized compounds [20]. Differentially expressed breast cancer genes were identified through extensive data mapping, and functional enrichment analysis was performed to screen the genes differentially expressed between breast tumor cells and treated tissues. Moreover, the interactions of these genes with several other proteins involved in breast cancer progression were studied. The shortlisted genes showed an essential role in the progression of breast cancer. All the source proteins and the target proteins were shortlisted in order to identify the best target for these compounds. These proteins include ESR, PR, BRCA1, BRCA2, AKR1C2, HER2, CTNNB1, PLAUR, and RHEB (Table 1). All the protein structures obtained from the Protein Data Bank contained water molecules and the original ligands. To prepare the protein structures, the cocrystallized ligand and any water molecules present were removed using MGL Tools-1.5.6, nonpolar hydrogens were merged, AD4.2 atom types and Gasteiger charges were assigned, and the proteins were saved in .pdbqt format.
2.5. Active Site Prediction. DOGSITESCORER was used to identify the active sites of the proteins from the 3D coordinates of the receptor. DOGSITESCORER is an automated tool for pocket prediction based on the 3D structure of a protein and calculates the druggability of protein cavities [34]. To predict pocket druggability, it uses a supervised machine learning technique, the support vector machine (SVM), which predicts potential pockets and describes them through descriptors. The tool returns a druggability score between 0 and 1; the higher the score, the more druggable the pocket. PyMOL was used to visualize the active sites of the target proteins and the residues involved [35].
2.6. Preparation of Ligand. The structure of ligands was drawn using ChemBioDraw Ultra 14.0, and energy was minimized using MM2 with the help of ChemBio3D Ultra 14.0. The structures were saved in PDB format for AutoDock compatibility. The ligand.pdb files were converted to ligand.pdbqt format using MGL Tools-1.5.6 (The Scripps Research Institute).

AutoDock Run.
The protein-ligand binding was analyzed with the PyRx tool linked with AutoDock Vina in order to find the conformation and configuration of each ligand with the minimum energy. The grid centers were positioned on the active binding sites of the proteins, and the docked complexes were examined on the basis of their binding affinities (kcal/mol) and interaction patterns.
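One common way to place a docking grid on a predicted pocket is to centre a box on the coordinates of the pocket atoms. The sketch below assumes that approach; the text does not state how the grid centres were derived in PyRx. The function name `grid_box`, the 4 Å padding, and the coordinates are illustrative assumptions.

```python
def grid_box(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box around coords, in Å."""
    axes = list(zip(*coords))  # [(x1, x2, ...), (y1, y2, ...), (z1, z2, ...)]
    center = tuple((min(a) + max(a)) / 2 for a in axes)
    size = tuple((max(a) - min(a)) + 2 * padding for a in axes)
    return center, size


# Hypothetical pocket atom coordinates (Å), for illustration only.
pocket_atoms = [(10.0, 20.0, 30.0), (14.0, 26.0, 30.0), (12.0, 22.0, 38.0)]
center, size = grid_box(pocket_atoms)
print("center:", center)
print("size:", size)
```

The resulting values correspond to the `center_x`, `center_y`, `center_z` and `size_x`, `size_y`, `size_z` settings that AutoDock Vina expects.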
2.8. Analysis of Binding Affinity. The boxplot function in R 4.0.2 was used to perform the scoring analysis of each protein with the synthesized compounds [36]. For interaction analysis, Discovery Studio Visualizer, Version 4.0 (http://www.accelrys.com), was used to study the binding modes of the synthesized compounds with the target proteins.
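The quantities a boxplot displays can be computed directly. The sketch below uses Python's standard library rather than R, so the quartiles may differ slightly from the hinges drawn by R's boxplot function. The function name `five_number_summary` is an assumption, and the input list reuses the six median scores quoted for CTNNB1 in the docking results purely as an illustration.

```python
from statistics import median, quantiles


def five_number_summary(scores):
    """Minimum, quartiles, median, and maximum of a list of docking scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median(scores),
        "q3": q3,
        "max": max(scores),
    }


# Median scores quoted in the text for CTNNB1 (4f, 4h, 4e, 4k, 3e, 3f), kcal/mol.
ctnnb1 = [-11.7, -10.4, -10.3, -10.3, -10.1, -9.8]
summary = five_number_summary(ctnnb1)
print(summary)
```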

In Vitro Breast Cancer Activities of Synthesized Compounds.
The anticancer activity of the synthesized compounds was determined against the human breast cancer cell line MCF-7. The MCF-7 (ATCC® HTB-22™) cells were gifted by Dr Syed Shahzad ul Hussan from Lahore University of Management Sciences (LUMS). The cells were cryopreserved at -196°C. The cells were grown in RPMI (Roswell Park Memorial Institute) medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin purchased from Gibco, USA. The cultures were maintained in a humidified incubator at 37°C in a 5% CO2 atmosphere. Different concentrations of the synthesized compounds were used to assess the anticancer activity. The 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide (MTT) (Sigma) assay was used as described by Mosmann, with a slight modification of 72 h of incubation [37]. The assay plates were read on a spectrophotometer at 520 nm. A dose-response curve was plotted from the data generated to evaluate the concentration of tested compounds
2.11. Molecular Descriptor Generations. The PaDEL descriptor software was used to generate the quantum molecular descriptors and to calculate the additional energy, where a total of 1875 descriptors were calculated. Using all the available descriptors would, however, make the models difficult to calculate; hence, only a few descriptors per model were used, to reduce the computation time and to explore all the combinations with the all-subset technique. Model generation was run for up to 8 variables to see the effect of adding a new descriptor on the quality of the model.
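The reason an all-subset search is only practical for a few descriptors per model becomes clear when the number of candidate subsets is counted. The sketch below is a plain combinatorial count and does not reproduce the search strategy used by QSARINS; the function name `n_subsets` is an assumption.

```python
from math import comb


def n_subsets(n_descriptors, max_size):
    """Number of descriptor subsets containing 1..max_size descriptors."""
    return sum(comb(n_descriptors, k) for k in range(1, max_size + 1))


# 1875 is the descriptor count quoted in the text.
for size in (1, 2, 3):
    print(size, n_subsets(1875, size))
```

Even at two descriptors per model, the full pool of 1875 descriptors yields more than 1.7 million candidate models, which is why descriptor pre-filtering and a genetic algorithm are typically combined with exhaustive search at small model sizes.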

Data Division.
The dataset was divided into a training set and a test set in a 4 : 1 ratio. The training set constituted 70% and the test set 30% of the data, selected according to the Kennard-Stone algorithm.
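The Kennard-Stone selection mentioned above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance in descriptor space; the function name `kennard_stone` and the one-dimensional toy points are hypothetical, and the implementation inside QSARINS may differ in detail.

```python
from math import dist


def kennard_stone(points, n_train):
    """Split point indices into (train, test) with the Kennard-Stone algorithm."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")
    # Seed the training set with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    train = [first, second]
    rest = [k for k in range(n) if k not in train]
    while len(train) < n_train:
        # Add the candidate whose nearest training point is farthest away.
        nxt = max(rest, key=lambda k: min(dist(points[k], points[t]) for t in train))
        train.append(nxt)
        rest.remove(nxt)
    return train, rest


# Hypothetical one-descriptor dataset, for illustration only.
points = [(0.0,), (1.0,), (2.0,), (5.0,), (10.0,)]
train, test = kennard_stone(points, 3)
print(train, test)
```

The algorithm favours compounds that span the descriptor space, so the training set covers the extremes and the test set falls inside the region the model has seen.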

Model Building and Validation.
The genetic algorithm (GA) technique was employed to select the most appropriate descriptors from the large descriptor pool for model development. The MLR model was obtained by the ordinary least squares (OLS) algorithm [22]. Twenty models were generated using up to 8 different descriptors, and the best model was shortlisted according to the lowest lack-of-fit (LOF) value.
2.14. Internal Validation. The model was validated following the OECD principles, which state that a model should have a defined endpoint, a clear applicability domain, an unambiguous algorithm, appropriate measures of robustness and predictivity, and a systematic explanation [38].

Cross Validation.
For cross validation (CV), the Q²LOO criterion was employed by iteratively removing one compound from the dataset and recalculating the model with the remaining compounds. The following parameters were considered to assess the quality of the model: R², where a higher value corresponds to a better fit; Q²LOO, which should be high and close to R²; R² - Q²LOO, where a lower value indicates a more stable model; RMSE, which should be low and close to that of the training dataset; and other prediction measures.
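The leave-one-out procedure can be made concrete with a small ordinary least squares example. The sketch below is an assumption-laden illustration, not the QSARINS code: it takes Q²LOO as 1 - PRESS/TSS with TSS computed around the mean of the full training response, and the helper names (`solve`, `fit_ols`, `r_squared`, `q2_loo`) and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def r_squared(X, y):
    coef = fit_ols(X, y)
    mean_y = sum(y) / len(y)
    rss = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(X, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - rss / tss


def q2_loo(X, y):
    """Refit the model n times, each time predicting the compound left out."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - press / tss


# Hypothetical one-descriptor data, for illustration only.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
r2, q2 = r_squared(X, y), q2_loo(X, y)
print(round(r2, 4), round(q2, 4), round(r2 - q2, 4))
```

Because each left-out prediction is made by a model that never saw that compound, Q²LOO cannot exceed R² for an OLS model, and a small gap between the two is read as a sign of stability.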
Another method was used for cross validation, i.e., Leave-Many-Out (LMO), which studies the model by excluding a larger number of compounds at a time. The stability of the model was based on the calculated values of R² and Q².
Table 2 shows the drug-likeness properties of the synthesized compounds, which are in good agreement with the standard values.

Lead Optimization.
Further drug-likeness properties of all compounds, namely ligand efficiency (LE), lipophilic ligand efficiency (LLE), and lipophilicity-corrected ligand efficiency (LELP), were predicted. Lipophilicity is considered a basic parameter for improving structural efficiency on the way from lead to drug candidate. The cLogP, LE, LLE, and LELP of all compounds were comparable with the standard values of LE > 0.30 kcal/mol/HA, LLE > 0.5 kcal/mol, -10 < LELP < 10, and cLogP < 3.
All the synthetic compounds were predicted to be nonmutagenic and nonirritant (Table 4).
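The three efficiency indices are simple functions of potency, size, and lipophilicity. The sketch below uses the commonly quoted definitions, LE ≈ 1.37 × pIC50 / HA (kcal/mol per heavy atom, at about 300 K), LLE = pIC50 - cLogP, and LELP = cLogP / LE; whether DataWarrior applies exactly these forms is an assumption, and the function name `efficiency_metrics` and the input values are hypothetical.

```python
from math import log10


def efficiency_metrics(ic50_molar, heavy_atoms, clogp):
    """LE, LLE, and LELP from an IC50 in mol/L, heavy-atom count, and cLogP."""
    pic50 = -log10(ic50_molar)
    le = 1.37 * pic50 / heavy_atoms   # kcal/mol per heavy atom
    lle = pic50 - clogp
    lelp = clogp / le
    return {"pIC50": pic50, "LE": le, "LLE": lle, "LELP": lelp}


# Hypothetical compound: IC50 = 1 μM, 30 heavy atoms, cLogP = 2.0.
metrics = efficiency_metrics(1e-6, 30, 2.0)
print({k: round(v, 3) for k, v in metrics.items()})
```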

Molecular Docking.
The molecular docking studies of the synthesized compounds against nine target proteins were performed to identify the best target for these compounds based on docking scores. A boxplot was generated to present the docking scores for all target proteins. Figure 3 shows the boxplot of all synthesized compounds on the basis of their interactions with all target proteins. For protein A (CTNNB1), compound 4f had the lowest median score, -11.7, with 80% of data in the lower quartile and 20% in the upper quartile. Compounds 4h and 4e showed median scores of -10.4 and -10.3, respectively, with equal distribution of data. Compound 4k showed a median score of -10.3 with 75% of data in the lower quartile and 25% in the upper quartile. Compounds 3e and 3f showed median scores of -10.1 and -9.8 with 60% of data in the lower quartile and 40% in the upper quartile. Compounds 4g, 4i, 3g, 4j, and 4h showed median scores in the range of -9.8 to -9 kcal/mol. For protein B (BRCA1), compound 4h showed a median score of -9.3 with 90% of data in the upper quartile, 4e showed a median score of -8.5 with equal distribution, and 4f showed a median score of -9.2 with 90% of data in the upper quartile; 4k and 3e showed -8.8 with equal distribution and -8.6 with 90% in the upper quartile, respectively; 3f had a median score of -8.5 with 80% in the lower quartile and 20% in the upper quartile; 4g showed -9.8; and 4i had a score of -8.4 with 90% in the upper quartile. Protein C (BRCA2) showed a similar pattern but with median scores in the range of -8.9 to -6.6. Compound 4f showed a median score of -8.3 with 80% of data in the lower quartile, 4e showed -8.7, and 4h showed -8.9. Similarly, compounds 3f and 3g had a median score of -8.4 and 3e had -8.3, with equal distribution.
Protein D (AKR1C2) showed median scores in the range of -8.5 to -5.8 with a low range and varying distribution. The binding affinities for AKR1C2 were weaker than those for proteins A, B, and C. Protein E (IGFR1) had median scores ranging between -9.2 and -7, with better binding affinities than protein D and a high range. For protein G (RHEB), all the compounds had median scores in the range of -7.8 to -6 kcal/mol with varying distribution and a high range. Similarly, protein F (HER2) showed median scores in the range of -8.8 to -6.6 with varying distribution. Protein H (PLAUR) showed median scores of -9.2 to -6.8 kcal/mol. The highest median range was observed for protein H, with varying distribution. Moreover, protein I (PR) showed median scores of -7.8 to -5.1 kcal/mol with a low median range and varying distribution of data. Across all the proteins, compounds 4f and 4h showed the lowest binding scores. Docking of the ligands into the active binding site of CTNNB1 gave the lowest binding scores.

Interaction Analysis with Target Proteins.
The protein-ligand interaction analysis was performed to study the interaction patterns of the ligands with the different proteins and to find common binding sites in proteins that may point to new functional roles. Figure 4 shows the binding mode and the residue interactions of the active compounds 4e, 4f, 4g, and 4h and the standard with the target protein CTNNB1 (ProtA). These compounds showed the lowest binding scores of -10.3, -11.7, -9.8, and -10.4 kcal/mol, respectively. The interaction analysis revealed a stable hydrogen bond interaction of compound 4i with ASP199, while compound 4j showed two stable hydrogen bond interactions, with LEU177 and GLU176. The standard tamoxifen showed pi-alkyl interactions with PRO100, ALA138, LEU137, LYS199, and ALA134 and an amide-pi stacked interaction with VAL197 (Figures 4(a)-4(e)).
3.6. Anticancer Activity. In this study, the in vitro anticancer activity of the 22 synthesized derivatives was determined against the human breast cancer cell line MCF-7 with the MTT assay (Table 6). The results revealed that the compounds having the p-hydroxyl group of benzaldehyde (2) showed excellent anticancer activities against the breast cancer cell line when compared to the standard. Compounds that showed more than 50% inhibition were considered active. Compound 4f showed 85% inhibition of cells at 200 μM concentration, with an IC50 of 2.19 μM. The standard tamoxifen showed an IC50 of 1.88 μM. Compounds 4e and 4g showed 82% inhibition with IC50 values of 2.401 and 2.47 μM, respectively. Compound 4h also showed 80% inhibition of cells with an IC50 of 2.33 μM. The inhibition by compounds 3e and 3f was 79.4% and 77.2%, with an IC50 of 2.41 μM. Compounds 4k, 4i, and 4j showed up to 75% inhibition with IC50 values of 2.40, 2.699, and 2.88 μM, respectively. Compounds 3h, 3i, 3j, and 3k showed approximately 55% inhibition at the same concentration, while compounds 3a, 3b, 3c, 3d, 4a, 4b, 4c, and 4d showed less than 50% inhibition (Figure 6).
correlation were eliminated. About 1058 variables were excluded from the study based on the all-subset method. Several models were developed with good correlation with the response and low multicollinearity between descriptors. The genetic algorithm-multiple linear regression (GA-MLR) method provided 4 descriptors, which were then used for calculating the anticancer activities of the compounds.
The average values of R² and Q²LOO (with their standard deviations) were plotted against the size of the developed models to evaluate model performance and to reveal any overfitting (Figure 7). The plot showed that the values of R² and Q²LOO increased as new descriptors were added. The model with four variables was selected, based on the lowest LOF value, to predict the anticancer activities.
The best MLR model equation obtained is shown below.
Table 7 shows the experimental IC50 values and the results predicted by the MLR model for the training set. Table 8 shows the Pearson correlation matrix; the low coefficient (<0.7) between each pair of descriptors indicates no significant multicollinearity among the descriptors in the developed model. The internal validation of the model, that is, the scatter plot, the scatter plot by LOO, the scatter plot by LMO, and y-scrambling, supported the reliability of the model, as shown in Figure 8. The applicability domain also defined the reliability of the model (Figure 9).
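The multicollinearity check behind a Pearson correlation matrix can be sketched as a pairwise scan against the 0.7 cut-off quoted above. The descriptor names `d1` to `d3` and their values are hypothetical placeholders, not the descriptor values of this study, and the helper names `pearson` and `collinear_pairs` are assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def collinear_pairs(columns, threshold=0.7):
    """Return (name, name, r) for every descriptor pair with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            r = pearson(columns[p], columns[q])
            if abs(r) >= threshold:
                flagged.append((p, q, r))
    return flagged


# Hypothetical descriptor columns, for illustration only.
columns = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "d3": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = collinear_pairs(columns)
print([(p, q, round(r, 3)) for p, q, r in flagged])
```

A model is usually accepted on this criterion only when the scan returns no pairs for its selected descriptors.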

Discussion
Breast cancer pathogenesis and progression have been studied extensively, leading to the discovery of several agents that have proved useful in the management of this disease. However, the incidence of breast cancer remains significant to date, and further strategies are required to reduce mortality and morbidity. This study uses computational techniques to identify breast cancer targets for the synthesized compounds, which may have a potential role as anti-breast cancer agents.
The in silico ADMET and lead optimization studies revealed all the compounds to be nonmutagenic and noncarcinogenic with drug-like properties. The results suggest that the compounds may be therapeutically active against the target proteins. All the synthesized compounds also followed the Lipinski Rule of 5, having HBA < 10, HBD < 5, LogP < 5, and molecular weight < 500 g/mol. An increased number of HBA and HBD results in poor permeation. The molecular docking analysis was performed to analyze the binding of the synthesized compounds with the identified target proteins. The interaction analysis revealed a stable hydrogen bond interaction of compound 4i with ASP199, while compound 4j showed two stable hydrogen bond interactions, with LEU177 and GLU176. The standard tamoxifen showed pi-alkyl interactions with PRO100, ALA138, LEU137, LYS199, and ALA134 and an amide-pi stacked interaction with VAL197. The energy scores revealed efficient binding of these compounds with the target proteins. All the other proteins also showed efficient binding and interaction patterns, and the common amino acid residues involved in the interactions are listed in Table 5.
The anti-breast cancer activities of all the synthesized compounds were evaluated against the cell line MCF-7. The MCF-7 cell line is estrogen receptor (ER)-positive and progesterone receptor (PR)-positive and expresses high levels of ERα transcripts [40,41]. The epidermal growth factor receptor (EGFR) and the human epidermal growth factor receptor-2 (HER2) are also present in MCF-7 cells [40]. MCF-7 cells are also positive for β-catenin [42]. Because MCF-7 cells express these proteins, the cell line was used to analyze the role of the synthesized compounds as cytotoxic agents. The activities of compounds 4f, 4h, and 4e were greater than those of all the other compounds and were attributed to the -F, -NO2, and -Br aniline groups, with fluorine giving the most potent activity owing to its high electronegativity. When these groups were replaced with benzylamine (3a and 4a), -Br benzylamine (3c and 4c), or -F benzylamine (3d and 4d), the activity dropped significantly, suggesting that the aniline derivatives are more cytotoxic than the benzylamine derivatives. Compounds 3e, 3f, and 3g also showed better activities owing to their aniline nature, with the -NO2 group of 3h showing the least activity. The benzimidazole moiety of compounds 3g and 4g also proved effective. Compounds 4k, 4i, and 4j, which bear anisidine moieties, also showed good activities. The ortho-anisidine showed more % inhibition than the para and meta isomers. This study was carried out because of the existing evidence for the antiproliferative activities of dihydropyrimidinones. In a similar study, about 22 monastrol analogs were synthesized by Matias and coworkers and studied for their antiproliferative activities against five different cancer cell lines. Their results also showed strong antiproliferative activity against the MCF-7 cancer cell line, with compounds bearing a chlorine moiety displaying significant effects on the proliferation of hepatic (HepaRG), colon (Caco-2), and breast (MCF-7) cancer cell lines [43]. Another series of 32 novel Biginelli dihydropyrimidinones was synthesized by Kumar and colleagues and studied for in vitro antioxidant and anticancer activities. The compounds exhibited significant anticancer activities against the breast cancer cell line MCF-7 at 10 μg concentration [44]. Another synthesized library of dihydropyrimidinone-benzopyran hybrids was evaluated for cytotoxic activity against four human cancer cell lines, A549 (lung carcinoma), MCF-7 (mammary gland adenocarcinoma), HCT-116 (colorectal carcinoma), and PANC-1 (pancreatic duct carcinoma), and showed consistent cytotoxic activities against these cell lines [45]. Another study of the antiproliferative activities of dihydropyrimidinones reported potent cytotoxic activities of dihydropyrimidinone analogs against melanoma (UACC.62), kidney (786-0), breast (MCF-7), and ovarian (OVCAR03) cancer cell lines [46]. All this evidence supports a significant role of dihydropyrimidinones against breast cancer cell lines. Moreover, the significant activities of these compounds against the breast cancer cell line and their optimum binding energies against the identified target proteins support their effectiveness as anticancer agents.
The QSAR studies were performed with two different software packages to analyze the quality and reliability of the models obtained by both methods.
According to the fitting criteria, the R² value of 0.989 is close to 1, which indicates a good-quality model for anticancer inhibition. Moreover, the low LOF value and the R²adj of 0.985, which reflects the convenience of adding a new descriptor to the model, suggest no overfitting. The model proved to be a good model with a small number of descriptors. The high value of F (234.487) and the low value of Kxx (0.324) show minimal correlation between the descriptors. Similarly, the Delta K (0.084) and the small error on the training set (RMSEtr = 0.148) showed appropriate correlation between the descriptors. The scatter plot of the values obtained from the model equation versus the experimental IC50 for the training set reveals any potential outliers (Figure 8(a)). The scatter plot shows the grouping of the data and the possible presence of outliers.

Internal and External Validation of the Model.
The internal validation of the model was done to check the fitting and stability of the model. Cross validation by the Leave-One-Out (LOO) method showed good internal prediction, as Q²LOO = 0.977 (variance explained by LOO) is comparable with R² = 0.989. Moreover, the small prediction error, RMSEcv = 0.217, indicates a robust and stable model. A plot of the values predicted by LOO versus the experimental IC50 values was generated (Figure 8(b)). Another method employed for internal validation was Leave-Many-Out (LMO), which leaves out 30% of the dataset to study the model behavior.

for R² and Q², indicating that the model has not been obtained by random correlation. Figure 8(d) shows the plot of the R²y-scr and Q²y-scr values against the R² and Q² of the model. The external validation of the model was also performed to test its predictive ability. The model showed R²ext (external determination coefficient [47]): 0.97, R²ext: 0.6479, Q²-F1: 0.7320, Q²-F2: 0.8682, and Q²-F3 (variances explained in external prediction [48]): 0.702. The parameters were equivalent to the R² value of the model. The predictions for the compounds in the external set are shown in Figure 8(a).
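The three external Q² statistics differ only in the reference variance against which the prediction error is compared. The sketch below follows the definitions commonly given in the QSAR literature (F1 uses the training mean over the external compounds, F2 the external mean, and F3 the training-set variance); it is an illustration under those assumptions, with a hypothetical function name `external_q2` and hypothetical data, and is not output from QSARINS.

```python
def external_q2(y_train, y_ext, y_ext_pred):
    """Return (Q2-F1, Q2-F2, Q2-F3) for an external prediction set."""
    n_tr, n_ext = len(y_train), len(y_ext)
    mean_tr = sum(y_train) / n_tr
    mean_ext = sum(y_ext) / n_ext
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred))
    q2_f1 = 1 - press / sum((y - mean_tr) ** 2 for y in y_ext)
    q2_f2 = 1 - press / sum((y - mean_ext) ** 2 for y in y_ext)
    tss_tr = sum((y - mean_tr) ** 2 for y in y_train)
    q2_f3 = 1 - (press / n_ext) / (tss_tr / n_tr)
    return q2_f1, q2_f2, q2_f3


# Hypothetical responses and predictions, for illustration only.
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_ext = [2.0, 4.0, 6.0]
y_ext_pred = [2.5, 4.0, 5.5]
f1, f2, f3 = external_q2(y_train, y_ext, y_ext_pred)
print(round(f1, 4), round(f2, 4), round(f3, 4))
```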
The reliability of the model is based on the compounds falling in the applicability domain (AD). The leverage (h) and standardized residuals were used as described in [49]. A Williams plot was generated to observe the compounds lying in the applicability domain of the model (Figure 9) by plotting the standardized residuals for each compound against the leverage values. The applicability domain is a defined region containing all data points whose residuals fall within the boundary and whose leverage (HAT i/i) is below the threshold h* = 1.000 [50]. Most of the compounds fall within the applicability domain, except for compound 3f, whose leverage (h = 1.29) exceeds the critical value, so it can be considered an outlier.
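The leverage of a compound and the warning threshold can be computed as follows. The sketch assumes the widely used cut-off h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds; under that formula, four descriptors and h* = 1.000 would correspond to 15 training compounds, which is an inference and not a figure stated in the text. The helper names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverage_threshold(n_descriptors, n_train):
    """Warning leverage h* = 3(p + 1)/n."""
    return 3 * (n_descriptors + 1) / n_train


def leverages(X_train, X_query):
    """h = x^T (X^T X)^-1 x for each query row, with an intercept column."""
    A = [[1.0] + list(r) for r in X_train]
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    result = []
    for row in X_query:
        x = [1.0] + list(row)
        z = solve(xtx, x)  # z = (X^T X)^-1 x
        result.append(sum(xi * zi for xi, zi in zip(x, z)))
    return result


# Hypothetical one-descriptor training set and one distant query compound.
X_train = [[1.0], [2.0], [3.0], [4.0], [5.0]]
h_train = leverages(X_train, X_train)
h_query = leverages(X_train, [[10.0]])[0]
h_star = leverage_threshold(1, len(X_train))
print([round(h, 3) for h in h_train], round(h_query, 3), h_star)
```

In a Williams plot these leverages form the horizontal axis, and a compound to the right of h* lies structurally outside the region covered by the training set.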

Interpretation of Descriptors.
In the model generated by QSARINS, 70% of the anticancer activity can be described using four descriptors. All the variables belong to the 2-dimensional family (MATS3i, ASP-5, VR2, and GGI10). The descriptor GGI10 belongs to the GALVEZ family and is a topological charge index that has its origin in the first ten eigenvalues. There are two categories in the GALVEZ class, that is, the topological charge index of order n (GGIn) and the mean topological charge index of order n (JGGIn), where "n" is the order of the eigenvalue. GGI10 is the topological charge index of order 10 and showed a positive correlation with the activity, suggesting that an increase in the value of GGI10 would augment the anticancer activities of the synthesized compounds. The descriptor VR2_Dzi also belongs to the topological distance matrix class and is defined as the normalized Randic-like eigenvalue-based index from the Barysz matrix weighted by ionization potential. Its negative correlation suggests that a lower value is associated with the activity of the compounds. The 2D-AUTO descriptor MATS3i is the Moran autocorrelation of lag 3 of the topological structure, weighted by ionization potential. It is the summation of different autocorrelation functions giving different vectors based on the lengths of structural fragments. The weighting component in the descriptor is linked to a physicochemical property, which associates the topology of the structure with the selected property. The lag k of the autocorrelation vector indicates the number of edges in the fragment, while the last character of the descriptor, "i", denotes the physicochemical property, here the ionization potential. The negative correlation of MATS3i in the model suggests unfavorable conditions associated with lag 3 weighted by ionization potential. None of the descriptors were correlated with each other.
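To make the idea of a lag-k Moran autocorrelation concrete, the sketch below computes it on a small molecular graph. It follows the textbook form, in which the mean product of centred atomic weights over all atom pairs at topological distance k is divided by the variance of the weights; PaDEL's treatment of hydrogens and its exact weighting scheme may differ. The function names, the four-atom chain, and the weights are hypothetical.

```python
from collections import deque


def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def moran_autocorrelation(adjacency, weights, lag):
    """Moran autocorrelation of the atomic weights at the given lag."""
    n = len(weights)
    mean_w = sum(weights) / n
    dev = [w - mean_w for w in weights]
    d = topological_distances(adjacency)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if d[i][j] == lag]
    variance = sum(x * x for x in dev) / n
    if not pairs or variance == 0:
        return 0.0  # no atom pairs at this lag, or all weights identical
    return (sum(dev[i] * dev[j] for i, j in pairs) / len(pairs)) / variance


# Hypothetical four-atom chain 0-1-2-3 with arbitrary atomic weights.
chain = [[1], [0, 2], [1, 3], [2]]
w = [1.0, 2.0, 3.0, 4.0]
lag1 = moran_autocorrelation(chain, w, 1)
lag3 = moran_autocorrelation(chain, w, 3)
print(round(lag1, 4), round(lag3, 4))
```

A positive value means that atoms separated by k bonds tend to carry similar weights, while a negative value means that they tend to differ.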

Conclusion
In this study, 22 derivatives of ethyl 6-(chloromethyl)-4-(4-hydroxyphenyl)-2-oxo-1,2,3,4-tetrahydropyrimidine-5-carboxylate were evaluated for their anticancer potential. Compounds 4e, 4f, 4g, and 4h showed good anticancer activities against the breast cancer cell line MCF-7 when compared to the standard tamoxifen. The in silico data also revealed the best binding affinities and interaction patterns of these compounds against the target proteins; moreover, the lead optimization revealed that the compounds have drug-like properties and may act as leads. The QSAR analysis was carried out to investigate the role of molecular descriptors in the anticancer activities of the synthesized compounds. The models developed to predict the structural features of these compounds as anticancer agents revealed useful information about their structural requirements, suggesting the importance of topological and autocorrelation descriptors. Further in vitro assays will be carried out to confirm the role of these compounds in targeting these proteins.

Data Availability
All the data has been included in the manuscript.