Ligand Based Pharmacophore Modeling and Virtual Screening Studies to Design Novel HDAC2 Inhibitors

Histone deacetylases 2 (HDAC2), Class I histone deacetylase (HDAC) family, emerged as an important therapeutic target for the treatment of various cancers. A total of 48 inhibitors of two different chemotypes were used to generate pharmacophore model using 3D QSAR pharmacophore generation (HypoGen algorithm) module in Discovery Studio. The best HypoGen model consists of four pharmacophore features namely, one hydrogen bond acceptor (HBA), and one hydrogen donor (HBD), one hydrophobic (HYP) and one aromatic centres, (RA). This model was validated against 20 test set compounds and this model was utilized as a 3D query for virtual screening to validate against NCI and Maybridge database and the hits further screened by Lipinski's rule of 5, and a total of 382 hit compounds from NCI and 243 hit compounds from Maybridge were found and were subjected to molecular docking in the active site of HDAC2 (PDB: 3MAX). Finally eight hit compounds, NSC108392, NSC127064, NSC110782, and NSC748337 from NCI database and MFCD01935795, MFCD00830779, MFCD00661790, and MFCD00124221 from Maybridge database, were considered as novel potential HDAC2 inhibitors.


Introduction
Histone deacetylases (HDACs) are the enzymes that deacetylase the epsilon-N-acetyl-lysine group on histone tails of the protein and result in tightening of nucleosome structure and gene silencing [1]. There are two types of histone forms which are histone acetylases and histone deacetylases [2]. Histone deacetylases (HDACs) are found in animals, plants, fungi, archaebacteria, and eubacteria [3]. Histone deacetylases are generally classified into four different classes, namely, HDACs 1-3 and 8, belonging to Class I and related to homologous to Rpd3, HDAC 4-7, 9-10 are Class II related to Hda1, Sirt 1-7 are Class III and are similar to Sir2 and HDAC11 belongs to Class IV. Classes I and II are operated by zinc dependent mechanism and Class III by NAD [4][5][6][7][8]. Histone deacetylases (HDACs) control the gene expression and cellular signaling and histone deacetylases 2 (HDAC2) is overexpressed in solid tumors including colon cancer, lung cancer, cervical carcinoma, breast cancer, and kidney/cervix cancer and also in Alzheimer's disease [9,10]. Several HDAC inhibitors are in clinical trial, namely, hydroxamic acid derivatives, benzamide derivatives, cyclic peptides, and short-chain fatty acids [11]. The first histone deacetylase (HDAC) inhibitor SAHA (suberoylanilide hydroxamic acid or vorinostat) approved by FDA for treating cutaneous Tcell lymphoma and other hydroxamic acids are in clinical trial. The benzamide derivatives, which are in clinical trials, are Entinostat (MS-275 or pyridin-3-yl methyl 4-((2aminophenyl) carbamoyl) benzyl carbamate) currently in phase II clinical trial for Hodgkin lymphoma, phase I trial of advanced leukemia and myelodysplastic syndrome (MDS), and Mocetinostat (MGCD0103 or N-(2-Aminophenyl)-4-[[(4-pyridin-3-ylpyrimidin-2-yl)amino]methyl] benzamide) in phase II clinical trial for Hodgkin lymphoma, phase I trial of advanced leukemia, myelodysplastic syndrome (MDS), diffuse large B-cell lymphoma, and follicular lymphoma [12][13][14][15]. Ligand based pharmacophore modeling is a major tool in drug discovery and is applied in virtual screening, de novo design, and lead optimization [16]. Different histone deacetylase (HDAC) inhibitors had been synthesized and experimental activity was found. Different pharmacophore and virtual screening studies had been reported on histone deacetylase (HDAC) with known hydroxamic acid derivatives and QSAR studies reported on histone deacetylases 2 (HDAC2) with N(2-aminophenyl)-benzamides [17][18][19]. In the present study benzamide derivatives are used to generate the pharmacophore model and virtual screening studies have been done for histone deacetylases 2 (HDAC2) proteins to gain knowledge regarding pharmacophore model and virtual screening. This study aims to construct the chemical feature based on pharmacophore models for histone deacetylases 2 (HDAC2).

Data Preparation.
A training set of 48 histone deacetylases 2 (HDAC2) inhibitors of two different chemotypes were selected form previously published data and the IC 50 values were identified using the same biological assay. The chemotype A is N(2-aminophenyl)-benzamide [20][21][22][23][24][25][26][27][28][29][30][31] and chemotype B is N-hydroxy benzamide derivatives (see supplementary Figure 1 in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/812148) [32][33][34]. 3D QSAR module in Discovery Studio (DS) was used for developing the pharmacophore. The 2D structure of compounds was drawn in ISIS draw and they were converted into 3D form and conformational models were generated by FAST method, the conformers minimized by the CHARMm force field and the energy threshold value of 20 kcal/mol. A maximum of 255 conformers were developed for each compound and these conformer models were used for hypotheses generation, fitting the compound into the hypotheses and estimating the activity of the compound. The training set of 48 molecules was chosen with IC 50 values with a range from 0.014 M to 21 M. The dataset activity (IC 50 ) was classified based on the span over four orders of magnitude, that is, active (IC 50 ≤ 0.1 M, ++++), moderately active (0.1 ≤ IC 50 ≤ 1 M, +++), less active (1 ≤ IC 50 ≤ 10 M, ++), and inactive (IC 50 > 10 M, +).

Pharmacophore Model Generation.
HypoGen algorithm was applied to build the pharmacophore model and in the present study four features, which are hydrogen bond donors (HBD), hydrogen bond acceptors (HBA), ring aromatic (RA), and hydrophobic (HY), were selected to generate the pharmacophore hypotheses [35]. HypoGen generates pharmacophore model based on chemical features of active compounds in training set. The uncertainty value 2 was selected from default 3, which means the biological activity is two times higher or lower than the true value. All other parameters were kept as default. The developed pharmacophore model was selected based on the highest correlation coefficient, lowest total cost, and root mean square deviation (RMSD).

Pharmacophore
Validation. The pharmacophore model is validated by three steps: cost analysis, Fischer's randomization test, and the test set prediction. The quality of the model is described in terms of fixed cost, total cost, and null cost. The fixed cost represents the simplest model and it fits the data perfectly. The null cost represents no features with high cost value and it estimates the activity to be average activity of the training set compounds. The best model was selected based on the difference between the two cost values (null cost − total cost); if the difference between the costs is greater than 60 means, the model has excellent true correlation. If the difference is 40-60, the model has prediction correlation of 70-90%, and if the difference is below 40, it may be difficult to predict the model. Fischer's randomization is the second approach to validate the pharmacophore model. The 95% confidence level was selected to validate the study and 19 random spread sheets were constructed. This method generates the hypotheses by randomizing the activity of the training set compounds. The correlation between the structure and biological activity was validated by this method. The final sets of validation were selected using twenty HDAC2 inhibitors as given in supplementary Figure 2. The Ligand pharmacophore mapping module in Discovery Studio was used to map the ligands and estimate the predicted activity of the test set compounds.

Database Search.
Virtual screening studies were used to find novel and potential leads from virtual database for further development [36]. The virtual screening studies were used to find novel leads for HDAC2. The Hypo1 model was used as a 3D query in database screening, and the National Cancer Institute (NCI) database containing 265242 molecules and Maybridge database containing 58723 molecules were used for screening [37,38]. Ligand pharmacophore mapping protocol was used with flexible search option to screen the database. Hit compounds from the database with estimated activity less than 0.1 M were selected for further screening using Lipinski's rule of five; compounds have (i) molecular weight less than 500, (ii) hydrogen donors less than 5, (iii) hydrogen acceptors less than 10, and (iv) an octanol/water partition coefficient (Log ) value less than 5.

Molecular
Docking. Docking is the binding orientation of small molecules to their protein targets in order to predict the affinity and activity of the small molecules. Hence docking plays an important role in the rational drug design. Molecular docking studies were performed by using LigandFit module in Discovery Studio [39]. There are three stages in LigandFit protocol: (i) docking, in which attempt is made to dock a ligand into a user defined binding site, (ii) in situ ligand minimization, and (iii) scoring, in which various scoring functions were calculated for each pose of the ligands. Protein preparation was the main step in docking and all ligands were docked into the active site of the receptor. Protein preparation involves deletion of water molecules and addition of hydrogen atoms and applying CHARMm force field. The active sites were searched using flood filling algorithm. The active site was defined as region of HDAC2 that comes within 12Å from the geometric centroid of the ligand. Ten poses were generated for each ligand during the docking process and the best poses were selected based on the best orientation of the molecule in the active site and dock score values, which was selected after energy minimization with smart minimization. The dock score was calculated using the following formula: Single dock score may fail to obtain active molecules; hence, consensus scoring method was applied which consists of LigScore1, LigScore2, Jain, Piecewise Linear Potential (PLP1 and PLP2), and Potential of Mean Force (PMF). The active molecules were selected based on the consensus scoring method and H-bond interaction with the receptor. The crystal structure of the HDAC2 protein (PDB ID: 3MAX) was downloaded from the protein data bank (http://www.rcsb.org/pdb). The crystal structure of histone deacetylases 2 (HDAC2) protein has three chains, which are A, B, and C. The chain A has higher docking score than chains B and C, so chain A is selected for docking. The hit compounds from the database screening with positive Lipinski's drug likeness were subjected to molecular docking studies into the active site of the 3MAX receptor.

Results and Discussion
3.1. Pharmacophore Generation. Pharmacophore model, virtual screening, and molecular docking studies were performed to find novel HDAC2 inhibitors. Dataset of 48 molecules with structural diversity and four orders of activity magnitude (0.014 to 21 M) were selected to develop pharmacophore model using HypoGen algorithm in Discovery Studio. Four features hydrogen bond donor (HBD), hydrogen bond acceptors (HBA), ring aromatics (RA), and hydrophobic (HY) were selected. Top 10 hypotheses were generated with the following features: HBA, HBD, RA, and HY. The statistical parameters such as cost values, correlation, and RMSD were summarized in Table 1. The best hypothesis was selected out of 10 hypotheses by the highest cost difference. Hypo1 has the highest cost difference between null cost and total cost of 68.45, correlation coefficient of 0.75, the lowest RMS deviation of 1.64, and configuration cost value of 17.66. This indicates the model and the data correlated by more than 90%. High correlation coefficient and low RMSD indicate the ability to predict the activity of the training set compounds is high. The Hypo1 has a correlation coefficient value ( 2 = 0.75), and the model strongly predicts the activity of training set compounds. The correlation between the experimental activity and predicted activity of training set compounds was shown in Table 2. For most of the compounds the model predicts the activity correctly. Figure 1(a) shows the 3D spatial arrangement of all features with distance constraints of Hypo1. The features of Hypo1 were mapped onto the active compound 15 as shown in Figure 1(b). HBD is mapped by amino group, HBA is mapped by phosphonate group, aromatic ring is mapped by aromatic group, and HY is mapped by hydrophobic group. The results indicate that HDAC2 inhibition requires the following features: HBD, HBA, RA, and HYP. Inactive compound 29 was mapped partially onto the features of Hypo1 as shown in Figure 1(c). The fit value for the most active and the least active compounds was generated to be 5.39 and 3.21, respectively.

Pharmacophore
Validation. The pharmacophore model can be validated by three methods: cost analysis, test set prediction, and Fischer's randomization test.

Cost Analysis.
The HypoGen algorithm in DS produces three cost values during the pharmacophore generation, which are fixed cost, total cost, and null cost. The model is validated by the difference between the null cost and total cost; if the model has cost difference above 60, it has the predictability chance of greater than 90%. The Hypo1 having the cost difference of 68.45 shows significant model (shown in Table 1).

Test Set Prediction.
A good pharmacophore model can predict not only the activity of the training set compounds but also external test set compounds. 20 compounds with different activity range were used as a test set to check the predictability power of the pharmacophore model. Ligand pharmacophore mapping protocol with flexible search option  was used to map the test set compounds. In test set analysis, for most of the compounds the model predicted activity to the tune of less than 10%. Out of 20 compounds 17 compounds were predicted with an error factor less than 5% and 3 compounds were predicted with an error factor less than 10%. The experimental and predicted activities of the test set compounds were shown in Table 3.

Fischer Randomization Test.
Fischer randomization test was the third approach to validate the Hypo1 using DS. In this method the experimental activity of the training set compounds was randomly scrambled and generates the random pharmacophore model using the same parameters as used in developing the Hypo1 hypothesis. Confidence level of 95% was set and it created 19 spread sheets, all 19 random spread sheets have high cost values than total cost, and correlation value is less than the Hypo1 (supplementary Table 1). It clearly shows none of the randomly generated pharmacophores has good statistical values than Hypo1. The difference in costs between the HypoGen and Fischer randomizations was shown in Figure 2. All the three validations methods demonstrated that Hypo1 hypothesis has good predictability and can be chosen as the best model.

Database
Screening. The best pharmacophore model Hypo1 was used as a 3D query to search the NCI (265242) and Maybridge (58723) databases using flexible search option in DS. A total of 6130 compounds from NCI and 1379 from Maybridge were mapped using the features of Hypo1. A total of 1198 and 440 compounds from NCI and Maybridge showed HypoGen estimated value of less than 1 M and were considered for further studies and these compounds were screened for Lipinski's rule of 5. A total of 625 (382 NCI, 243 Maybridge) compounds obeyed the rule and were subjected to molecular docking studies. The flowchart in Figure 3 was a schematic representation of virtual screening process.  three chains chain A has given the best docking score and higher H-bond interactions than chains B and C. The docking score of all three chains with Entinostat was shown in supplementary Table 2. Chain A was selected as an active chain and the final hit compounds from virtual screening studies were docked into active site of 3MAX-A. The docking score along with binding orientations and hydrogen bonds was considered for choosing the best pose of the docked compounds. The docking scores were compared with MS-275 (Entinostat). The docking score of the Entinostat was 42.6 kcal/mol and hit compounds from the virtual screening studies show better binding than the active compound Entinostat. The Entinostat has the four hydrogen bonding interactions with Arg39, Cys156, Gly305, and His183 given in Figure 4(a). The hit compounds that scored docking score higher than active compound and form interaction with the crucial amino acids were considered as effective leads for designing novel HDAC2 inhibitors. 74 compounds from both databases showed good interactions in the active site of the HDAC2 and scored more than 45, about 20 compounds that showed better docking score than active compounds were chosen as leads and their docking score and H-bond interactions were listed in supplementary Tables 3 and 4. formamide has interaction with Asp181. MFCD00661790 (N-(2-(3-(2,4-difluorophenyl)thioureido)ethyl)-2-(4-hydroxyphenyl)acetamide) has the docking score of 81. Kcal/mol forming four hydrogen bond interactions with Cys156, His153, His146, and Ala141 amino acids, Hypo1 HBD mapped on oxygen shows hydrogen bond interaction with Ala141, HBA mapped on nitrogen forms bonds with His183, nitrogen of thiourea forms bonds with His146, and acetamide of oxygen has hydrogen bond with Cys156; the binding interactions are shown in Figure 4(h). The fourth hit MFCD00124221 (N-(4-(3-(2,3-dichlorophenyl)thioureido)phenyl)-2-hydroxybenzamide) has docking score of 65.86 kcal/mol with three hydrogen bonds with Cys156, Gly305, and His183. The complex shown in Figure 4(i) shows benzamide of oxygen with Cys156, nitrogen with Gly305, and nitrogen of thiourea with His183 and shows hydrogen bond interaction. The pharmacophore overlay of the hit compounds was shown in Figure 5. The identified lead compounds along with their estimated IC 50 were shown in Figure 6. The studies show Arg39, Cys156, His145, and His146 were the important amino acids in the active site involved in hydrogen bond interaction. Based on pharmacophore modeling, virtual screening, and molecular docking studies, the compounds listed in supplementary Table 3 are selected as novel leads for effective HDAC2 inhibition. All identified hits were with diverse scaffolds and provide opportunities for designing novel HDAC2 inhibitors. The lead compounds were selected based on the docking score and structural diversity. The correlation between the estimated activity and docking score of top 10 lead compounds from NCI and Maybridge is 0.61 and 0.54, respectively, which suggests that the selected inhibitors in the present study could be specific HDAC2 inhibitors.

Molecular
Comparative study on previous developed models with present study shows common pharmacophore features were present in all studies. Pharmacophore model on histone deacetylase (HDAC) with known hydroxamic acids and cyclic peptides shows four pharmacophore features: one hydrogen acceptor and one hydrophobic and two aromatic rings [17] and in the study with known hydroxamic acids, benzamides, and biphenyl derivatives on HDAC [40] three pharmacophore features were shown: hydrogen acceptors, hydrogen donors, and hydrophobic aromatic ring. Pharmacophore model on histone deacetylase (HDAC8) with known hydroxamic acids has shown four pharmacophore features: one hydrogen acceptor, two hydrogen donors, and one hydrophobic group [18]. The present developed pharmacophore model on HDAC2 with known benzamide derivatives shows four pharmacophore features: one hydrogen acceptor, one hydrogen donor, and one hydrophobic and one aromatic rings, which correlates with the previous studies. The selected eight lead compounds and Entinostat were docked into the active sites of histone deacetylase (HDAC) (PDB: 1ZZ1) and histone deacetylase (HDAC8) (PDB: 1T64), and in both the receptors chain A was selected for docking. The docking result shows that compounds with HDAC and HDAC8 comparably showed lesser docking score and interactions than HDAC2. The docking scores and H-bond interactions were shown in supplementary Tables 5 and 6.
The combinations of pharmacophore, virtual screening, and molecular docking successfully give more potential inhibitors that can have great impact for future experimental studies in diseases associated with HDAC2 inhibition.

Conclusion
In this study, ligand pharmacophore model developed by HypoGen algorithm in DS, 10 hypotheses were generated using 48 training set compounds with structural diversity. The best pharmacophore Hypo1 was characterized by high cost difference and correlation coefficient comprised of HBD, HBA, RA, and HY features. The Hypo1 was validated by external test set and Fisher's randomization test suggests the model has good predictability. The Hypo1 was used as a 3D query to screen NCI and Maybridge databases. 625 compounds with estimated activity less than 1 M and favourable Lipinski's rule were selected for docking studies. In molecular docking studies the important interactions with inhibitors and active site residues were determined. Based on docking score and interactions twenty hits were found and finally eight hits NSC108392 (4-((