Combined 3D QSAR-Based Virtual Screening and Molecular Docking Study of Some Selected PDK-1 Kinase Inhibitors

Phosphoinositide-dependent kinase-1 (PDK-1) is an important therapeutic target for the treatment of cancer. To identify the important chemical features of PDK-1 inhibitors, a 3D QSAR pharmacophore model was developed based on 21 available PDK-1 inhibitors. The best pharmacophore model (Hypo1) exhibits all the important chemical features required for PDK-1 inhibitors. The correlation coefficient, root mean square deviation (RMSD), and cost difference were 0.96906, 1.0719, and 168.13, respectively, suggesting that Hypo1 has good predictive ability among the ten pharmacophore models analyzed. Hypo1 was further validated by Fisher's randomization method (95%), the test set method (r = 0.87), and the decoy set method, which gave a goodness of hit (GH) score of 0.73. This validated model was then used as a 3D query to screen molecules from the NCI and Maybridge databases. The resultant hit compounds were subsequently filtered by Lipinski's rule of five and by an ADMET study. A docking study was carried out to refine the retrieved hits and thereby reduce the rate of false positives. The best hits will be subjected to in vitro study in the future.

The recognition process between a ligand and its receptor is based on the spatial distribution of certain structural features of the active site being complementary to those of the interacting ligands, so the features common to the ligands provide information about the active site. Pharmacophore mapping is an essential step towards understanding the receptor-ligand recognition process and is established as one of the successful computational tools in rational drug design [16,17]. It involves the identification of the three-dimensional arrangement of functional groups that a molecule must possess to be recognized by the receptor. A model is then generated by finding chemically important functional groups that are common to the molecules that bind. A pharmacophore can be derived by direct analysis of the structure of a known ligand, either in its most stable conformer or in the conformation observed when it is complexed with the target protein.
In the present study, a three-dimensional pharmacophore model for PDK-1 kinase inhibitors has been developed. The generated model is then used to screen potentially active candidates from the NCI and Maybridge databases. The efficacy of these compounds is further assessed by molecular docking.

Materials and Methods
General Methodology.
All pharmacophore model generation and Hypo1-based virtual screening were performed using the following tools. LigandFit. Docking studies were performed using Discovery Studio 2.5 (Accelrys Inc., San Diego, CA).

Data Set for Pharmacophore Analysis.
A set of 83 compounds identified and reported as inhibitors of PDK-1 kinase was collected from the literature [18][19][20][21]. The inhibitory activity of all compounds was expressed as IC50 (i.e., the concentration of compound required to inhibit 50% of PDK-1 kinase activity). The IC50 values spanned a wide range, from 3.0 to 65,000 nM. Of the 83 compounds, 21 were selected as the training set and the remaining 62 were used as the test set. The chemical structures of all training set compounds are shown in Figure 1. The training and test sets were selected according to the following rules: (i) there is structural diversity among the molecules, (ii) both the training set and the test set cover a wide range of activity, and (iii) the most active compounds are included in the training set because they provide critical information for pharmacophore generation. The geometry of all compounds was built using Accelrys Discovery Studio 2.5 [22]. All compounds were minimized using the steepest descent algorithm with a convergence gradient of 0.001 kcal/mol, and a family of representative conformations was generated by the fast conformational analysis method using the poling algorithm [23] and CHARMM force field parameters [24]. A large number of conformations of each compound were generated within an energy threshold of 20.0 kcal/mol above the global energy minimum.
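To make the "wide range of activity" concrete, the following minimal sketch converts the two extreme IC50 values quoted above to pIC50 and counts the orders of magnitude between them. The function name `pic50_from_nm` is a hypothetical helper introduced only for illustration; only the numbers 3.0 nM, 65,000 nM, 83, and 21 come from the text.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


# Extremes of the data set as reported in the text
most_active_nm = 3.0
least_active_nm = 65_000.0

# Number of orders of magnitude covered by the activity data
span_orders = math.log10(least_active_nm / most_active_nm)

# Training/test split sizes as reported in the text
total_compounds = 83
training_size = 21
test_size = total_compounds - training_size

print(f"pIC50 range: {pic50_from_nm(least_active_nm):.2f} to "
      f"{pic50_from_nm(most_active_nm):.2f}")
print(f"activity span: {span_orders:.2f} orders of magnitude")
print(f"test set size: {test_size}")
```

A span of more than four orders of magnitude is consistent with rule (ii), which asks both sets to cover a wide activity range.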

Pharmacophore Modeling.
Based on the conformations of each compound, the HypoGen module of Discovery Studio 2.5 was used to construct the possible pharmacophore models [25]. Instead of using only the lowest energy conformation of each compound, all the conformational models of each compound in the training set were used for pharmacophore hypothesis generation. The 21 training set compounds, together with their conformations, were submitted to the 3D QSAR Pharmacophore Generation (HypoGen) protocol of Discovery Studio 2.5. The HypoGen module generates hypotheses with features that are common to the active molecules and missing from the inactive molecules.

Model Validation.
The statistical parameters, such as the cost values, determine the significance of the model. The best model was selected on the basis of significant statistical parameters: a high correlation coefficient (r), the lowest total cost, a low RMSD, and a total cost close to the fixed cost and far from the null cost. Another parameter, the configuration cost, is also important for determining the significance of the model and should be <17. The best hypothesis, Hypo1, was also validated by the test set method, Fisher's randomization test, and the decoy set method. The Ligand Pharmacophore Mapping protocol was used to estimate the activity of all 62 test set compounds.

Decoy Set Validation.
The test set validation method could only indicate that the generated pharmacophore model (Hypo1) is highly efficient in picking out the active molecules; it is not conclusive, because the model also picked inactive molecules. The decoy set validation method was therefore used to evaluate the efficiency of Hypo1 by calculating the GH (goodness of hit list) score and the EF (enrichment factor). A data set of small molecules was generated by decoy set finder 1.1; it included 1,980 molecules with unknown activity and 20 active molecules, making a decoy set of 2,000 molecules. GH and EF were calculated using the following equations:

$$\mathrm{GH} = \left(\frac{H_a\,(3A + H_t)}{4\,H_t\,A}\right)\left(1 - \frac{H_t - H_a}{D - A}\right)$$

$$\mathrm{EF} = \frac{H_a / H_t}{A / D}$$

where $H_t$ = total number of molecules in the hit list, $H_a$ = total active molecules present in the hit list, $A$ = total active molecules present in the database, and $D$ = total molecules present in the decoy set.
The GH score ranges from 0 to 1. A GH score of 0 indicates a null model, while a GH score of 1 indicates an ideal model. A GH score higher than 0.7 reflects a very good model. The EF and GH were found to be 69.23 and 0.73, respectively (Table 1), indicating that the generated pharmacophore model is a rational choice for virtual screening.
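The two decoy set metrics can be sketched as follows, assuming the Güner-Henry formulation that is commonly used for GH and EF in pharmacophore validation; here `ht` is the hit-list size, `ha` the number of actives in the hit list, `a` the number of actives in the database, and `d` the database size. The database size (2,000) and the number of actives (20) come from the text; the hit-list counts are hypothetical placeholders, since Table 1 is not reproduced here, and are not claimed to be the authors' values.

```python
def enrichment_factor(ha: int, ht: int, a: int, d: int) -> float:
    """EF = (Ha / Ht) / (A / D): hit-list purity relative to random picking."""
    return (ha / ht) / (a / d)


def goodness_of_hit(ha: int, ht: int, a: int, d: int) -> float:
    """GH = [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]."""
    yield_and_recall = ha * (3 * a + ht) / (4 * ht * a)
    false_positive_penalty = 1 - (ht - ha) / (d - a)
    return yield_and_recall * false_positive_penalty


# From the text: 2,000 molecules in the decoy set, 20 of them active
D_TOTAL = 2000
A_ACTIVES = 20

# Hypothetical hit list (illustrative assumption, not taken from Table 1)
ht_example = 26   # molecules retrieved by the pharmacophore query
ha_example = 18   # of which known actives

ef = enrichment_factor(ha_example, ht_example, A_ACTIVES, D_TOTAL)
gh = goodness_of_hit(ha_example, ht_example, A_ACTIVES, D_TOTAL)
print(f"EF = {ef:.2f}, GH = {gh:.2f}")
```

The second factor of GH penalizes false positives, so a query that retrieves every active but also many decoys scores well on recall yet poorly on GH.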

Virtual Screening and ADMET Analysis.
The final validated hypothesis (Hypo1) was used as a 3D structural query for retrieving potent compounds from the NCI and Maybridge databases, containing 238,819 and 2,000 molecules, respectively. A schematic diagram of the virtual screening protocol is shown in Figure 2.

The best hypothesis (Hypo1) consists of four features, that is, two hydrogen bond acceptors (HBA), one hydrogen bond donor (HBD), and one hydrophobic aliphatic feature (HyA). Figures 3(a

Cost Analysis.
In addition to generating hypotheses, HypoGen also provides two theoretical costs (expressed in bits) to help assess the validity of a hypothesis. The first is the fixed cost (the cost of an ideal hypothesis), which represents the simplest model that fits all data perfectly. The second is the null cost (the cost of the null hypothesis), which represents the highest cost of a pharmacophore with no features and which estimates every activity to be the average of the activity data of the training set molecules. These two costs represent the lower and upper limits, respectively, for the hypotheses that are generated. A meaningful pharmacophore hypothesis may be obtained when the difference between the null cost and the fixed cost is large; a cost difference of 40-60 bits may indicate a 75-90% probability of correlating the data.
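As a rough illustration of this cost reasoning, the sketch below computes the window between the null and fixed costs reported in the caption of Table 2 and applies the rule of thumb quoted in this section. The function names are hypothetical, and the label returned for differences below 40 bits is an assumption, since the text gives no probability for that case.

```python
# Values reported in the caption of Table 2 (in bits)
NULL_COST = 258.686
FIXED_COST = 77.5618
CONFIG_COST = 15.4729

# Threshold on the configuration cost stated in the text
MAX_CONFIG_COST = 17.0


def cost_window(null_cost: float, fixed_cost: float) -> float:
    """Width of the range in which hypothesis total costs can fall."""
    return null_cost - fixed_cost


def interpret_cost_difference(diff_bits: float) -> str:
    """Rule of thumb from the text: 40-60 bits -> 75-90%, above 60 -> >90%."""
    if diff_bits > 60:
        return "more than 90% probability of correlating the data"
    if diff_bits >= 40:
        return "75-90% probability of correlating the data"
    # Assumption: the text does not assign a probability below 40 bits
    return "below the range covered by the rule of thumb"


window = cost_window(NULL_COST, FIXED_COST)
print(f"null - fixed = {window:.4f} bits")
print(interpret_cost_difference(window))
print(f"configuration cost acceptable: {CONFIG_COST < MAX_CONFIG_COST}")
```

A wide window leaves room for a hypothesis whose total cost sits near the fixed cost and far from the null cost, which is the pattern the selection criteria look for.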
Two other parameters also determine the quality of a pharmacophore: the configuration (entropy) cost and the error cost. The configuration cost depends on the complexity of the pharmacophore and should be <17, whereas the error cost depends on the root mean square difference between the estimated and actual activities of the training set. The difference between the null cost and the total cost of Hypo1 was 168.4843 bits, well above the 40-60 bit range, which indicates a greater than 90% probability of correlating the data. Notably, the total cost of Hypo1 was much closer to the fixed cost than to the null cost. Furthermore, a high correlation coefficient of 0.96906 was observed, with an RMSD of 1.0719 and a configuration cost of 15.4729, demonstrating the development of a reliable pharmacophore model with high predictivity. The most active compound shows a good fit with all features of Hypo1, whereas the least active compound lacks a hydrogen bond acceptor feature. Based on this, it may be concluded that the two HBA features are important for PDK-1 kinase inhibitory activity.

Hypo1 was then used to estimate the activities of the 62 test set compounds, giving a significant predictive correlation (r = 0.87) between experimental and estimated activities (Figure 5). The experimental and estimated activities of the test set compounds mapped on Hypo1 are shown in Table 4. The quality of the hypothesis was further characterized using the error ratio, which is the ratio between the estimated and the experimental activity. An error ratio ≤10 means that the estimated and experimental activity values differ by no more than one order of magnitude. Hypo1 exhibited an error ratio ≤10 for 53 of the 62 compounds. Only 9 compounds (compounds 29, 32, 34, 38, 40, 51, 52, 53, and 55), with error ratios >10, were considered outliers and rejected. The most potent compound of the test set, compound 22 (IC50 = 6 nM), was mapped onto Hypo1 (Figure 6). Hypo1 mapped all the chemical features of this compound, and its estimated IC50 was 1.3 nM. Based on these results, it was confirmed that one HBD, two HBA, and one HyA (hydrophobic aliphatic) features are essential for PDK-1 inhibitory activity.
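The error check for a test set compound can be sketched as below. The sketch assumes the error ratio is taken as the fold difference between estimated and experimental IC50, expressed so that it is always at least 1; the function names are hypothetical. The values for compound 22 (6 nM experimental, 1.3 nM estimated) and the 53-of-62 count come from the text.

```python
def error_ratio(estimated_nm: float, experimental_nm: float) -> float:
    """Fold difference between estimated and experimental IC50 (always >= 1)."""
    if estimated_nm <= 0 or experimental_nm <= 0:
        raise ValueError("IC50 values must be positive")
    ratio = estimated_nm / experimental_nm
    # Report under- and over-estimation on the same scale
    return ratio if ratio >= 1 else 1 / ratio


def within_one_order(estimated_nm: float, experimental_nm: float) -> bool:
    """True when the two values differ by no more than a factor of 10."""
    return error_ratio(estimated_nm, experimental_nm) <= 10


# Compound 22 of the test set, values as reported in the text
ratio_22 = error_ratio(estimated_nm=1.3, experimental_nm=6.0)
print(f"compound 22 error ratio: {ratio_22:.2f}")
print(f"within one order of magnitude: {within_one_order(1.3, 6.0)}")

# Share of test set compounds with an error ratio <= 10 (53 of 62)
fraction_ok = 53 / 62
print(f"fraction within one order: {fraction_ok:.1%}")
```

Under this definition compound 22 is well inside the one-order-of-magnitude limit, even though its activity is overestimated.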

Fisher's Validation.
Fisher's randomization test was used to evaluate the statistical relevance of Hypo1 using the CatScramble program. The confidence level was set at 95%. The CatScramble program generated 19 random spreadsheets to construct hypotheses using exactly the same conditions as those used to generate the original pharmacophore hypothesis. The total costs of the 19 randomly generated hypotheses and of the original hypothesis are presented in Figure 7. The original hypothesis (Hypo1) was far superior to the 19 random hypotheses, suggesting that Hypo1 was not generated by chance.

Figure 7: The difference in costs between the HypoGen runs and scrambled runs. The 95% confidence level was selected.
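The link between 19 random spreadsheets and a 95% confidence level can be illustrated with the significance formula commonly quoted for this randomization test, significance = [1 - (1 + x)/y] × 100, where x is the number of scrambled runs with a total cost lower than the original and y is the total number of runs (original plus scrambled). This formula is an assumption here, since the text does not state it, and the cost values below are hypothetical placeholders rather than the data of Figure 7.

```python
from typing import Sequence


def fisher_confidence(original_cost: float,
                      random_costs: Sequence[float]) -> float:
    """Confidence (%) that the original hypothesis did not arise by chance."""
    # x: scrambled runs that beat (have lower total cost than) the original
    x = sum(1 for cost in random_costs if cost < original_cost)
    # y: the original run plus all scrambled runs
    y = len(random_costs) + 1
    return (1 - (1 + x) / y) * 100


# Hypothetical total costs (bits): one original run, 19 scrambled runs
original_total_cost = 90.0
scrambled_total_costs = [150.0 + i for i in range(19)]

confidence = fisher_confidence(original_total_cost, scrambled_total_costs)
print(f"confidence level: {confidence:.0f}%")
```

With 19 scrambled runs and none of them beating the original, the formula gives 95%; a single scrambled run with a lower cost would already reduce it to 90%.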
These results provide 95% confidence in the proposed hypothesis.

A total of 8,833 compounds exhibited good mapping with Hypo1 using the fast flexible search method. Out of the total 8,833 compounds, 8,530 compounds were from NCI and 333 compounds were from the Maybridge database. Out of these 8,833 molecules, 2,033 molecules with an estimated IC50 < 1 μM were selected for further studies. These hits were then sorted by Lipinski's rule of five to evaluate their drug-likeness. A total of 1,613 molecules passed this filter and were further evaluated in the ADMET study. Only 842 molecules passed the ADMET filtration process. Of these, molecules with an estimated activity ≤0.5 μM were selected for molecular docking studies; only 43 molecules satisfied this condition and were prepared for docking.
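The Lipinski filtration step mentioned above can be sketched as follows. The four thresholds are the standard rule-of-five criteria; the descriptor records are hypothetical, and allowing zero violations is an assumption, because the text does not say how many violations the filter tolerated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed molecular descriptors for one hit compound."""
    name: str
    mol_weight: float   # g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors


def lipinski_violations(m: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        m.mol_weight > 500,
        m.logp > 5,
        m.h_donors > 5,
        m.h_acceptors > 10,
    ])


def passes_rule_of_five(m: Descriptors, max_violations: int = 0) -> bool:
    """Strict filter by default (assumption); pass 1 to allow one violation."""
    return lipinski_violations(m) <= max_violations


# Hypothetical hits, for illustration only
hits: List[Descriptors] = [
    Descriptors("hit_A", 342.4, 2.8, 2, 5),
    Descriptors("hit_B", 612.7, 6.1, 3, 9),
    Descriptors("hit_C", 488.5, 4.9, 6, 8),
]

drug_like = [m.name for m in hits if passes_rule_of_five(m)]
print(drug_like)
```

In the screening cascade described here, this kind of filter sits between the activity cutoff and the ADMET evaluation.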

Molecular Docking Studies.
Further studies were conducted on the selected compounds (retrieved hits) to evaluate the binding mode between the compounds and the protein. All the hit compounds and compound 1 were docked into the binding site of PDK-1 [26] (PDB entry: 1UU7) [27] using LigandFit [28].

Compound NSC 24871 formed hydrogen bond interactions with Lys111, Ser160, and Ala162. The phenyl ring of the compound was sandwiched between the aromatic rings of Tyr161 and Phe93, with which it formed pi-pi interactions. Tyr161 formed pi-pi interactions with the phenyl ring of compound NSC 218342, while the carboxyl groups were involved in two hydrogen bonds with Lys111 and Phe94. The phenolic oxygen was involved in hydrogen bonds with Ser160 and Ala162. In all cases, Tyr161 formed a pi-pi interaction with the phenyl ring of the compound. 2D representations of the molecular docking results of all three compounds are shown in Figures 8(b), 8(c), and 8(d). Lys111 formed two hydrogen bonds with two different oxygen atoms of the phenyl groups of compound NSC 24871. In addition, one phenolic oxygen atom formed two hydrogen bonds with the two hinge-region amino acids, Ser160 and Ala162.

These three compounds, retrieved from the two databases (NCI and Maybridge), exhibited good interactions with important amino acids in the active site. Among the three, compound NSC 218342, retrieved from the NCI database, exhibited good estimated activity, fit value, docking score, and hydrogen bond interactions. The molecular docking results support taking these molecules forward as potential leads for designing novel PDK-1 inhibitors.

Conclusions
A ligand-based computational method was used to identify the molecular structural features required for effective PDK-1 inhibitors, with the aim of discovering drugs to prevent and cure a wide variety of cancers. A data set of 83 selective PDK-1 inhibitors, with activities spanning a wide range of magnitude, was used to generate pharmacophore hypotheses and to predict activity. A highly predictive pharmacophore model was generated from 21 training set molecules; it comprises hydrogen bond acceptor, hydrogen bond donor, and hydrophobic aliphatic features, which describe their activity towards PDK-1 kinase. The model was validated on 62 test set molecules, which showed that it could differentiate various classes of PDK-1 inhibitors with a high correlation coefficient of 0.87 between experimental and predicted activities. Hypo1 was further validated by the decoy set method, which gave a GH score of 0.73, indicating that the model is highly efficient in screening molecules from a database. Hypo1 was used as a 3D query to screen potential molecules from the NCI and Maybridge databases. The hit compounds were subsequently filtered by Lipinski's rule of five and ADMET filtration, and the selection was then refined by a docking study. After docking, three molecules with different scaffolds (NSC 218342, NSC 24871, and NSC 211930) exhibited better docking energies as well as better interactions. These candidates will be evaluated further in in vitro and in vivo studies as anticancer molecules.

HypoGen. It was implemented in Catalyst (Catalyst 4.1, Molecular Simulations Inc., San Diego, CA).
Fisher Randomization Test. It was done by the CatScramble program implemented in Catalyst.
Lipinski Filtration. It was performed using Pipeline Pilot Studio (SciTegic Inc., San Diego, CA).

Figure 4: (a) The most active compound (compound 1, IC50 = 3.0 nM) mapped on the best pharmacophore model (Hypo1); (b) the least active compound (compound 21, IC50 = 65,000 nM) mapped on the best pharmacophore model (Hypo1).

Figure 8: 2D representation of the top docking hits retrieved from the databases and of the most active compound (compound 1).

Table 1: Statistical parameters from the validation of the pharmacophore model by means of the decoy set.

Table 2: Statistical significance and predictive power, presented as cost values measured in bits, for the top 10 hypotheses resulting from the automated HypoGen pharmacophore generation process. The cost difference is the difference between the null cost and the total cost; the null cost is 258.686 bits; the fixed cost is 77.5618 bits; the configuration cost is 15.4729 bits.

Table 3: Experimental biological data and estimated IC50 of training set molecules based on pharmacophore model Hypo1.

Table 4: Experimental biological data and estimated IC50 of test set molecules based on pharmacophore model Hypo1.

Table 5: The estimated activity, interaction energy, and LigandFit scoring results of the four top-ranked compounds obtained from the combination of Hypo1-based virtual screening and molecular docking studies.