Correlation between Virtual Screening Performance and Binding Site Descriptors of Protein Targets

Rescoring is a simple approach that could, in theory, improve original docking results. In this study, AutoDock Vina was used as the docking engine, and three other scoring functions besides the original Vina scoring function, as well as their combinations as consensus scoring functions, were employed to explore the effect of rescoring on virtual screenings performed on diverse targets. Rescoring by DrugScore produced the largest number of cases with significant changes in screening power. Thus, the DrugScore results were used to build a simple model, based on two binding site descriptors, that could predict possible improvement by DrugScore rescoring. Furthermore, the screening power of all rescoring approaches, as well as that of the original AutoDock Vina docking results, generally correlated with the Maximum Theoretical Shape Complementarity (MTSC) and the Maximum Distance from Center of Mass and all Alpha spheres (MDCMA). Therefore, it is suggested that, with a more complete set of binding site descriptors, it could be possible to find a robust relationship between binding site descriptors and the response to certain molecular docking programs and scoring functions. The results could be helpful for future research aiming to perform virtual screening using AutoDock Vina and/or rescoring using DrugScore.


Introduction
Molecular docking is a computational method that attempts to find the most probable pose of a ligand in the active site of a receptor and to estimate the binding energy. Its applicability in virtual screening has been demonstrated. Compared with experimental high-throughput screening (HTS), it can save time and cost in a drug discovery project. However, it suffers from some drawbacks, such as a high rate of false positives [1,2]. It has been shown that docking programs have a reasonable power to predict the correct binding pose of ligands. However, their scoring powers are not the same for different protein families, and there is only a weak correlation between docking scores and the binding affinities of the ligands [3,4].
One of the most cited open source docking engines is AutoDock Vina [5]. It uses an iterated local search global optimizer to search for the most energetically favorable pose of a flexible small molecule in either a rigid or a flexible binding site of a protein. Here, AutoDock Vina was employed as the docking engine. Generally, docking engines use scoring functions to discriminate between favorable and unfavorable binding poses of the same molecule [6]. Furthermore, scoring functions rank the best binding poses of different small molecules to find strong binders among them. Scoring functions must deal with a trade-off between speed and accuracy. Thus, rescoring and consensus scoring approaches have been investigated to discover a stable method that could combine the accuracy of various scoring functions and outperform single scoring functions [7-11]. However, it has been suggested that the performance of scoring functions is target dependent. The present study differs from earlier work in some aspects. The data set was retrieved from DUD-E [12] to avoid bias in the design of the active and decoy sets for each protein target. In addition, the set of protein targets is diverse, and we attempted to find possible relationships between scoring function performance and binding site descriptors.
One of the proposed solutions that could possibly improve virtual screening results is rescoring. Scoring functions fall into three categories [6,13]: (1) empirical scoring functions, including ChemScore [14], (2) knowledge-based potentials, including DrugScore [15], and (3) force-field-based approaches, including AutoDock Vina [5] and AutoDock 4.2 [16]. Four metrics can be employed to assess the performance of a scoring function: scoring power, ranking power, docking power, and screening power [6,17]. Thus, rescoring can be done to find the best conformation of a single molecule (improvement of docking power), to improve the estimation of the binding energy and the ranking of ligands (scoring and ranking power), or to rerank the hits of a virtual screening in order to discriminate between decoys and true binders (improvement of screening power). The last of these is the main focus of this research. A consensus scoring method called rank-by-number, which had shown promising results [9], was also tested in this study. Several reports [1,7-11] have investigated the possible effects of rescoring on the different metrics of scoring performance. Among them, the main result of the more recent studies, which were done on larger data sets, is that scoring function performance is highly target dependent [1]. In other words, the current scoring functions are not universal.
In this study, rescoring performance was evaluated in virtual screenings conducted on a large set of predefined ligands and decoys for 32 receptors. In addition, the study aimed to find a method to predict the performance of a scoring function on specific targets. It seeks to address two questions. (1) Can the employed rescoring strategies consistently improve the discrimination of binders from decoys? (2) Can the performance of docking and/or scoring be predicted from the properties of the receptors' binding sites?

Receptors and Ligand Preparation.
Thirty-two diverse targets were selected from the DUD-E database [12] (Table 1). The selection was based on the diversity and size of the set, to keep the computational cost as low as possible. The same 3D structures that had been used in DUD-E for each of the 32 selected targets were retrieved from the Protein Data Bank (PDB) (Table 1). Then, the PDB files were prepared for AutoDock Vina docking. Cocrystal ligands and water molecules were removed, hydrogens and partial charges (Gasteiger) were added, and the coordinates of the 3D structures were saved in pdbqt format. The ligands from the DUD-E data set were used after the following modifications. The ligands in the DUD-E set are divided into active compounds and decoy compounds for each target. There are approximately 50 decoys for each active compound in the whole DUD-E set. The active sets contained some duplicate structures that differ only in their protonation states. As this would generate an analog bias, the duplicate forms were omitted, and only a single structure, in its physiological protonation state, was kept. The corresponding decoy structures were also omitted from the study. All the ligands were converted to pdbqt files. The numbers of active compounds and decoys for each target are reported in Table 1.

Virtual Screening.
AutoDock Vina was employed for the molecular docking [5]. For each target, a box was defined so that the ligands were docked within the active site. In all docking runs, the exhaustiveness was set to 8. The cocrystal ligand of each target was redocked into the binding site of the target, and the results are available in the Supplementary Materials.
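The docking setup described above can be sketched as a small helper that writes an AutoDock Vina configuration file. This is only an illustrative sketch: the helper name `vina_config`, the file names, and the box center and size are hypothetical placeholders and are not the values used in the study. Only the exhaustiveness of 8 comes from the text.

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=8):
    """Build the text of an AutoDock Vina configuration file.

    center and size are (x, y, z) tuples in Angstroms describing the
    search box; the values passed below are placeholders only.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",  # 8, as stated in the text
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names and box geometry, for illustration only.
config_text = vina_config(
    "target.pdbqt", "ligand.pdbqt",
    center=(10.0, 12.5, -3.0), size=(20.0, 20.0, 20.0),
)
print(config_text)
```

The resulting text could be saved to a file and passed to Vina through its `--config` option, with one such file per target so that each box matches the corresponding active site.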

Rescoring.
Four scoring functions and their combinations were evaluated in this study. These four scoring methods come from three different categories. The Vina scoring function (the built-in scoring function of AutoDock Vina) and the AutoDock 4.2 scoring function are force-field based. ChemScore is an empirical scoring function built into SYBYL. DrugScore is a knowledge-based scoring function and is available as a standalone program. The best docked pose of each ligand according to the Vina scoring function was rescored by the other three scoring functions and also by all possible combinations of the four. Thus, 11 consensus scorings were also applied (Tables 2 and 3).
A previously defined consensus scoring method (the rank-by-number method [9]) was employed to summarize the results of multiple scoring functions. The rank-by-number consensus score is the average of the Z-scaled scores calculated by each of the individual scoring functions. Individual Z-scaled scoring function values (ZScore) are computed by

$$\mathrm{ZScore} = \frac{x - \mu}{\sigma},$$

where $x$ is the scoring value of an individual scoring function, $\mu$ is the mean value, and $\sigma$ is the standard deviation of this scoring function for the entire set.
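A minimal sketch of the rank-by-number consensus described above is given here. It assumes that all scoring functions are oriented the same way (lower is better) and that the population standard deviation is used for the scaling; neither detail is stated in the text. The function names and the toy scores are illustrative only.

```python
from statistics import mean, pstdev


def z_scale(scores):
    """Z-scale one scoring function over the entire compound set."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population SD: an assumption, see lead-in
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def rank_by_number(score_table):
    """Average the Z-scaled scores of several scoring functions.

    score_table maps a scoring function name to a list of scores,
    with the compounds in the same order in every list.
    """
    scaled = [z_scale(values) for values in score_table.values()]
    return [mean(column) for column in zip(*scaled)]


# Toy scores for three compounds (illustrative values, not study data).
toy_scores = {
    "vina": [-9.0, -7.0, -5.0],
    "drugscore": [-300.0, -200.0, -100.0],
}
consensus = rank_by_number(toy_scores)
print(consensus)
```

Because each function is scaled before averaging, scoring functions with very different numeric ranges contribute equally to the consensus value.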

Calculation of Binding Site Descriptors.
Binding site environment properties were retrieved from the PLIC database [18], a database that provides clusters of binding sites. It uses Fpocket [19] and LPC [20] to generate the following binding site descriptors: pocket volume, number of alpha spheres, mean alpha sphere radius, proportion of apolar alpha spheres, mean local hydrophobic density, hydrophobicity score, volume score, charge score, proportion of polar atoms, alpha sphere density, maximum distance between the center of mass (COM) and the alpha spheres, Maximum Theoretical Shape Complementarity, observed shape complementarity, and normalized shape complementarity.

Statistical Analysis.
To assess the performance of each scoring function and each consensus scoring in discriminating active compounds from decoys, two parameters were calculated on the docked active and decoy compounds: the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and the enrichment factor (EF) at different levels. An increase in the AUC of the ROC curve can be used as an indicator of improved discrimination between true ligands and decoys. The AUC takes a value between 0 and 1: AUC = 0.5 means that the method of interest performs, on average, like a random selection, while AUC = 1 means complete discrimination between true and false cases (actives and decoys). EF is defined as the fraction of active compounds found divided by the fraction of the library screened:

$$\mathrm{EF}_{x\%} = \frac{\mathrm{Actives}_{x\%} / \mathrm{Actives}_{\mathrm{total}}}{N_{x\%} / N_{\mathrm{total}}},$$

where $\mathrm{Actives}_{x\%}$ is the number of active compounds among the top $x\%$ of the ranked library, $N_{x\%}$ is the number of compounds in that top fraction, and $\mathrm{Actives}_{\mathrm{total}}$ and $N_{\mathrm{total}}$ are the numbers of active compounds and of all compounds in the library, respectively. EF1% and EF2% show the ability of a particular scoring method to retrieve true ligands at high ranks among the virtual screening results.
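The two metrics defined above can be sketched in a few lines. The study computed them in R; the version below is an independent illustration that assumes lower scores are better and uses made-up toy scores. The AUC is obtained through its equivalence with the Mann-Whitney statistic, that is, the probability that a randomly chosen active is ranked ahead of a randomly chosen decoy.

```python
def roc_auc(scores, labels):
    """AUC of the ROC curve; labels are 1 (active) or 0 (decoy).

    Lower scores are treated as better; ties count as half a win.
    """
    actives = [s for s, l in zip(scores, labels) if l == 1]
    decoys = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """EF at the given top fraction of the ranked library (0.01 = 1%)."""
    n_total = len(scores)
    n_top = max(1, round(n_total * fraction))  # size of the top slice
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    hits_top = sum(label for _, label in ranked[:n_top])
    actives_total = sum(labels)
    return (hits_top / actives_total) / (n_top / n_total)


# Toy library: 2 actives and 8 decoys (illustrative values only).
toy_scores = [-10.0, -6.0, -9.0, -8.0, -7.0, -5.0, -4.0, -3.0, -2.0, -1.0]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(roc_auc(toy_scores, toy_labels))
print(enrichment_factor(toy_scores, toy_labels, 0.10))
```

The pairwise loop is quadratic in the library size, which is acceptable for a sketch; a rank-based formulation would be preferable for libraries with many thousands of decoys.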
The significance of the difference between the AUCs of two ROC curves was assessed using the online tool at http://vassarstats.net/roc_comp.html. Other statistical tests and plotting were done using R (R: a language and environment for statistical computing; R Foundation for Statistical Computing, Vienna, Austria; URL http://www.R-project.org/), including the packages enrichvs and ROCR.
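One common way to test the difference between two AUCs is the Hanley-McNeil standard error combined with a z-test. The sketch below illustrates that approach; it is an assumption that the online tool cited above works in exactly this way, and the AUC values and set sizes in the example are made up. The test treats the two curves as independent, which is a simplification when both curves come from the same compounds.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC (Hanley and McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_aucs(auc1, auc2, n_pos, n_neg):
    """Return (z, two-sided p) for the difference between two AUCs."""
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)
    se2 = hanley_mcneil_se(auc2, n_pos, n_neg)
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


# Hypothetical AUCs and set sizes, for illustration only.
z_value, p_value = compare_aucs(0.80, 0.70, n_pos=100, n_neg=5000)
print(z_value, p_value)
```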

Results
The average AUC of the ROC curve for each scoring method and the difference in AUC after rescoring are presented in Tables 2 and 3, respectively. They show the overall performance of each scoring method. Figure 1 shows the fair correlation between DrugScore performance and the binding site descriptor MTSC. Table 6 highlights the protein targets whose AUC of the ROC curve was significantly increased or decreased after rescoring by DrugScore (Figure 2). From the various classification plots (data not shown), it was found that these two groups can be separated on the basis of two descriptors, volume score and MTSC (Figure 3).

Discussion
The calculated performance of AutoDock Vina on individual targets can guide the selection of this docking engine for virtual screenings on specific targets. Furthermore, the results showed a slight general improvement in the discrimination between decoys and ligands when using the consensus rescoring method consisting of the Vina and DrugScore scoring functions. Active site analysis showed that DrugScore significantly improved the discrimination power of AutoDock Vina for receptors that had both a high volume score and a high MTSC. In addition, the screening powers of AutoDock Vina and DrugScore correlated significantly with MTSC and MDCMA.
AutoDock Vina is free for academics and has shown good scoring power in a recent study on a large and diverse data set [4]. Thus, it was selected as the docking engine for pose prediction in the present study. The screening power of AutoDock Vina correlated with MTSC and MDCMA. The reported AUC of the ROC curve and enrichment factor could be used to predict AutoDock Vina performance on each target. Furthermore, MTSC and MDCMA values could be used as a possible indicator of the success of AutoDock Vina in a virtual screening on a specific target protein. It has been suggested [21] that AutoDock Vina had a better average performance than DOCK [22] in virtual screening on 31 protein targets. As AutoDock Vina is open source and performs well compared with other docking engines, its code has been improved in various respects in recent years, such as parallel execution [23].
It has been suggested that the performance of docking programs and scoring functions is target dependent [1,4]. The nature of the active site of the protein, the choice of scoring functions, and the set of ligands used for comparison all affect the performance in scoring and ranking compounds [11]. Some studies concluded that consensus scoring (rank-by-number, consisting of three or four scoring functions) outperformed individual scoring functions [9]. In most of the studies conducted on larger and more diverse data sets, there was no strong correlation between affinity and scoring function predictions [4,10]. In this study, only the ranking power of the scoring functions was estimated. Overall, consensus scoring with both the DrugScore and Vina scoring functions and rescoring with DrugScore slightly improved the ranking metrics (AUC of the ROC curve and EF), but the improvement was not statistically significant. Rescoring by DrugScore produced the most cases with significantly increased or decreased screening power (assessed by changes in the AUC of the ROC curve) with respect to the original Vina scoring. Therefore, these data were used to find binding site descriptors that could predict whether DrugScore rescoring improves the original virtual screening results. After exploring different descriptors, it was found that a simple model based on two descriptors (volume score and MTSC) could fairly predict the improvement of virtual screening results after rescoring by DrugScore for a target protein. DrugScore has also been successful in some other rescoring campaigns [8,24] and was one of the best performers in a ranking power assessment of 16 scoring functions [7]. MTSC indicates the shape complementarity of a binding site with its specific cocrystallized ligand. Here, it was shown that the performance of DrugScore, as well as that of AutoDock Vina docking and subsequent scoring, correlates with the value of MTSC. This could be due to the better performance of the AutoDock Vina docking algorithm in finding near-native poses of active compounds in binding sites with a high MTSC. The values of the volume score descriptor correlated with the improvement of virtual screening results by DrugScore rescoring. This could be explained by better performance of DrugScore when there is a higher number of ligand-protein interactions, as in larger binding sites.

Conclusion
The results, consistent with previous studies, suggest that the performance of docking and scoring functions is target specific. Work on new scoring functions that include terms for aromatic-aromatic, π-cation, or halogen-protein interactions has been suggested. A correlation was found between the screening power of AutoDock Vina and DrugScore and two binding site descriptors, MTSC and MDCMA. The improvement after rescoring with DrugScore was predicted by two descriptors: volume score and MTSC. The ultimate goal of this study was to determine which of the scoring functions, or which combinations of them, would yield the best results in terms of enrichment when used in a virtual screening study. The results could help researchers decide for which targets AutoDock Vina and/or DrugScore are most appropriate in their studies.

Conflicts of Interest
The author declares no conflicts of interest regarding this publication.