Evaluation of Docking Target Functions by the Comprehensive Investigation of Protein-Ligand Energy Minima

The adequate choice of the docking target function impacts the accuracy of the ligand positioning as well as the accuracy of the protein-ligand binding energy calculation. To evaluate a docking target function we compared positions of its minima with the experimentally known pose of the ligand in the protein active site. We evaluated five docking target functions based on either the MMFF94 force field or the PM7 quantum-chemical method with or without implicit solvent models: PCM, COSMO, and SGB. Each function was tested on the same set of 16 protein-ligand complexes. For exhaustive low-energy minima search the novel MPI parallelized docking program FLM and large supercomputer resources were used. Protein-ligand binding energies calculated using low-energy minima were compared with experimental values. It was demonstrated that the docking target function on the base of the MMFF94 force field in vacuo can be used for discovery of native or near native ligand positions by finding the low-energy local minima spectrum of the target function. The importance of solute-solvent interaction for the correct ligand positioning is demonstrated. It is shown that docking accuracy can be improved by replacement of the MMFF94 force field by the new semiempirical quantum-chemical PM7 method.


Introduction
Protein-ligand binding free energy calculation is one of the key problems for molecular modeling in the computer-aided structural based drug design [1][2][3][4]. However, the accuracy of such calculations better than 1 kcal/mol has not been reached yet for a randomly selected target protein [1]. Only with such high accuracy of the protein-ligand binding free energy calculations is it possible to perform the rational inhibitor optimization on the basis of computer modeling and to advance from weak inhibitors to lead compounds (hit to lead) followed by the lead optimization to increase the binding affinity and to improve the selectivity of new inhibitors. Though the most accurate calculations of the protein-ligand binding free energy can be done with molecular dynamic (MD) simulations [5], other approaches of the protein-ligand binding energy calculations, especially docking, are also in demand. Docking is the molecular modeling method based on the search of the ligand binding pose in the target protein active site and the subsequent calculation of the score, which allows the protein-ligand binding free energy to be estimated. Although appreciable progress in improving accuracy of protein-ligand binding free energy calculations with docking is visible in recent years, for example, see [6,7], the success rate, but not high accuracy, is still a measure of the docking success in ligand positioning as well as in the ligand binding energy calculation [7]. It is not surprising because the accuracy of such calculations depends on many interrelated factors in a complicated manner. Those factors are the force field describing inter-and intramolecular interactions, the solvent (water) model, the target protein and ligand models, method and approximations of the free energy calculation, and algorithms of calculations and computing resources concentrated on solving the docking problem for one proteinligand pair, and so forth.
In the frame of the docking procedure, the ligand binding pose is generally believed to be the global minimum of the protein-ligand potential energy function (the docking 2 Advances in Bioinformatics paradigm). Thus, the ligand positioning is the global minimum search problem for the energy target function, depending on the degrees of freedom of the given protein-ligand system. Due to thermal motion in the thermodynamic equilibrium state, the ligand continuously jumps from one binding pose to another and for binding energy estimation we have to find not only the energy global minimum but also at least the low-energy part of the whole local minima spectrum.
The target function is defined by the choice of either the force field or the quantum-chemical method describing interand intramolecular interactions and also by solvent, target protein, and ligand models. Obviously, high accuracy of the correct ligand positioning is the necessary condition for high accuracy of the protein-ligand binding energy calculation and the latter is vitally important for high reliability of docking programs and high effectiveness of their application in drugs design. So, the adequate choice of the target function is extremely important for high accuracy of docking.
There is a wide variety of docking programs, such as AutoDock [8,9], FlexX [10], FlexE [11], ICM [12,13], DOCK [7,14], GOLD [15], SOL [16][17][18], TTDock [19], BUDE [20], and Surflex-Dock [21] with their own target functions and the global minimum search algorithms for the ligand positioning. The situation is aggravated by the fact that most of the target functions used in these docking programs in addition to force field parameters have usually some extra parameters fitted for better predictions at a certain training set of proteins and ligands. These fitting parameters have no physical sense, and their usage makes it difficult to estimate a priori the ligand positioning accuracy as well as the accuracy of the protein-ligand binding energy calculations for a given force field.
In this work, we evaluated 5 target functions for ligandprotein docking on the base of the MMFF94 force field (Merck molecular force field) [22] in vacuum, on the base of the PM7 quantum-chemical semiempirical method in vacuum [23] and also taking into account several implicit solvent models: PCM [24,25], COSMO [26], and SGB [27,28]. These target functions were used without any fitting parameters for the same proteins and ligands structural models. As a global optimization algorithm, we chose the simple Monte Carlo search method to perform the comprehensive search of the protein-ligand low-energy local minima. This search method was implemented in the novel FLM (Find Local Minima) docking program [29]. The FLM program is too slow to compete with existing docking programs but it can find all low-energy local minima in a given part of the protein-ligand phase space if large enough computer resources are available.
The detailed examination of the low-energy minima spectra of a set of protein-ligand complexes has been fulfilled due to use of large supercomputer resources concentrated for this task at the Lomonosov supercomputer of Lomonosov Moscow State University.

The FLM Program.
The FLM program is the MPI (message passing interface) based program, developed to find lowenergy minima of the ligand-protein system. During the minima search, the protein is considered as rigid and the ligand is fully flexible.

Minima Search Algorithm.
The FLM program finds local energy minima of the protein-ligand complex by the simple Monte Carlo search algorithm: multiple local optimizations are performed starting from random initial ligand positions. The random initial ligand position is obtained by a random continuous ligand deformation and rotation-translation: (i) The ligand geometrical center is moved to a random point in the search area. The geometrical center of a molecule is defined here as its center of gravity with all atomic masses equal to unity. The search area is defined as the sphere with the center at the ligand native position geometrical center and with the radius of 8Å. The ligand native position is the position of the ligand in the crystallized protein-ligand complex structure. The search area sphere covers the active site of the target protein.
(ii) The ligand is rotated as a whole around a random axis passing through the ligand center by a random angle from [− , ].
(iii) The ligand torsions are rotated by a random angle from [− , ] (torsion is a single acyclic bond of the ligand). Not all random system conformations are further optimized. At first, atom-atom distances are checked: atoms from each ligand-ligand or protein-ligand atom pair must be separated by more than 0.5Å. Otherwise, this random system conformation is rejected. Local optimization is performed by the L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno) [30,31] algorithm without any restrictions on the positions of the ligand atoms in the search area. All Cartesian coordinates of ligand atoms are moved during optimization. Each local optimization stopped when the maximal component of the optimized target function gradient decreased to the value 10 −5 kcal/mol/Å. If the ligand center moves out of the search area after the optimization, the respective local minimum is rejected.
After successful local optimization, the energy minimum can be recalculated taking into account interaction with solvent; see Section 2.2.
A set of computed different local minima with the lowest potential energies is being kept in operative memory during FLM calculations. A new computed local minimum is included into the set, if it differs from any minimum of the set, and the minimum with the highest energy is excluded from this set. Two minima are different if RMSD (root mean square deviation) between them exceeds 0.1Å. The RMSD is calculated over the ligand heavy atoms without taking into account possible chemical symmetry. The minima search for each protein-ligand complex was conducted during the given time interval: 3 hours. This way of the program halt was used due to some peculiarities of our supercomputer queuing system. Advances in Bioinformatics 3

Parallelization Algorithm and Efficiency.
The local minima search is parallelized: independent local optimizations of different initial ligand conformations are continuously performed in parallel by different MPI processes. The optimization results are collected in the master process to form the low-energy minima set. The current collected minima set is repeatedly sent back from the master process to other processes, so other processes can select only promising minima to send.
Calculations were done on the "Lomonosov" supercomputer [32]: 1024 nodes (8192 CPUs) were utilized for each run of the FLM program and about 20 000 CPU * hours per a protein-ligand complex and about 100 CPU * hours per a free ligand were consumed during these calculations. Total time for obtaining all minima sets was about several million CPU * hours (including time of postprocessing minima energies recalculations with the other programs: MCBHSOLV and MOPAC). Parallelization efficiency can be estimated via number of finished optimizations in a fixed time. Table 1 presents amount of "useful" calculations in a fixed time of 3 hours (number of finished local optimizations) depending on number of used nodes for the 1DWC protein-ligand complex.
As we can see in Table 1, the FLM program performance scales linearly with the increasing number of working processes. So, the chosen simple Monte Carlo search method for the global optimization is justified in terms of scalability.

Minima Obtaining
Protocol. The set of 16 protein-ligand complexes with experimentally known structures and binding constants was chosen from the Protein Data Bank (PDB) [33] for low-energy local minima search: (i) 4 complexes of the CHK1 (checkpoint kinase 1) protein (4FT0, 4FT9, 4FSW, and 4FTA).
These protein-ligand complexes were chosen, because they are available in PDB with good resolution, and the ligands variety covers a wide range from small and rigid ligands (4FSW ligand: 26 atoms including hydrogen atoms, 0 torsions) to big and flexible ones (1VJ9 ligand: 74 atoms including hydrogen atoms, 19 torsions). Also, the locally optimized ligand native position has RMSD from the original native pose less than 1.5Å for all these 16 complexes, both for the optimization with the MMFF94 in vacuo target function and for the optimization with the PM7 in vacuo target function. That means that the locally optimized ligand native position still can represent the native ligand pose.
Protein structures were prepared by elimination of all "HETATM" records (i.e., the records corresponding to atoms, ions, and molecules which are not part of the protein structure) from the PDB files of the complexes, and then hydrogen atoms were added by the original APLITE program [16] to the protein structures. The APLITE program adds hydrogen atoms according to the standard amino acid protonation states at pH = 7. Histidine protonation state is chosen by comparing of electrochemical potentials for hydrogen atom at "HD1" and "HE2" positions. Optimization of hydrogen atoms positions is performed with MMFF94 force field after the hydrogen atoms preplacement. During this optimization, all rotation variants of torsionally moveable hydrogen atoms (e.g., hydroxyl hydrogen atom from tyrosine) are tested. The heavy atoms optimization is not performed. Ligands were also taken from the PDB files. Hydrogen atoms were added to the ligands by Avogadro program [34]. The heavy atoms optimization is not performed for the initial ligand conformation. The table with 2D ligand structures and information about their charges and numbers of atoms and torsions is presented in the Supplementary Material available online at http://dx.doi.org/10.1155/2015/126858.
The target protein was considered as rigid, and the ligand was considered as totally flexible and moveable in the search area (see Section 2.1.1) around its native position during the minima search.
The first set of the low-energy local minima was obtained by the FLM program with the target function from the MMFF94 force field [22]: the local optimization of the random initial ligand position in the search area and the local minimum selection into the minima set were made on the base of the protein-ligand MMFF94 energy value in vacuo. This set is designated as "{1}MMFF94". Then, these local minima were recalculated for the same protein-ligand geometries with the MMFF94 force field target function taking into account the interaction with water solvent by the MCBHSOLV program [35] in the frame of the PCM implicit model. This low-energy local minima set is designated as "{1}MMFF94 + PCM". The third local minima set was obtained from the "{1}MMFF94" set by the protein-ligand energy local optimization in the frame of the semiempirical quantum-chemical PM7 method [23] with the help of the MOPAC program [36], and this minima set was designated as "{1}PM7". Finally, energies of all minima from the "{1}PM7" set were recalculated for the same geometries by the PM7 method with the COSMO implicit water solvent model [26] implemented in the MOPAC program, and this minima set was designated as "{1}PM7 + COSMO".
Further, the second option of the low-energy minima search was performed by the FLM program for all 16 proteinligand complexes as follows: the selection of local minima into the low-energy minima set was made on the base of the MMFF94 force field taking into account water in the frame of the PCM implicit solvent model. In other words, the local energy optimization of the initial random ligand position in the search area was carried out in the frame of the MMFF94 force field in vacuo as previously, but the selection into the low-energy local minima set was made on the base of the minima energies recalculated in the final optimization point taking into account the protein-ligand complex interaction with water solvent in the frame of the PCM model [35]. This set was designated as "{2}MMFF94+PCM". Energies of these minima have been recalculated with the MMFF94 force field in vacuo (the minima set "{2}MMFF94") and also taking into account interaction of protein-ligand complexes with water in the frame of the SGB [27] implicit solvent model (the set "{2}MMFF94 + SGB") implemented in the DISOLV program [16,28].
In other words, the 1st group of the minima sets has been obtained on the base of MMFF94 in vacuo target function local minima search: "{1}MMFF94" and "{1}MMFF94 + PCM", with one ensemble of configurations, and "{1}PM7" and "{1}PM7 + COSMO", with another ensemble of configurations. The 2nd group of the minima sets has been obtained on the base of the local minima search with the MMFF94 target function taking into account the PCM solvent model: "{2}MMFF94", "{2}MMFF94 + PCM", and "{2}MMFF94 + SGB", all with the same minima configurations.
Implicit solvent models PCM and SGB are implemented in the DISOLV program, and it has been shown that the sufficiently high accuracy (<1 kcal/mol) of the protein-ligand complex interaction with solvent could be achieved only for a sufficiently small (≤0.1-0.3Å) step of the triangulation grid on the SES (Solvent Excluded Surface) surface [28]. However, the PCM solving in the DISOLV program is too slow to be used in the docking process. Nevertheless, the MCBHSOLV program [35] has been developed recently on the base of the multicharge method for approximation of large dense matrices [37] generated by triangulation of the molecular SES and by discretization of polarized charges on it. This new implementation of the PCM model is about 300 times faster than the DISOLV program without the accuracy loss. Just the MCBHSOLV program has been used for obtaining the minima sets "{1}MMFF94+PCM" and "{2}MMFF94+PCM" with the triangulation grid steps 0.2Å and 0.3Å, respectively, and in the latter case all selected minima energies have been recalculated with the triangulation grid step 0.15Å. The minima set "{2}MMFF94 + SGB" has been obtained with the DISOLV program using the SGB method (the triangulation grid step was also 0.15Å), which was not so accurate as PCM [28] but it is several times faster than MCBHSOLV [35].
Parameters of the FLM minima search were chosen in such a way that 10 5 -10 6 test local optimizations were performed for one protein-ligand complex and only 10 3 -10 4 different minima (without taking into account chemical symmetry) were chosen into the low-energy minima set. Among these low-energy minima up to several dozen ones were in the interval 5kT from the lowest energy minimum, for example, 46 minima for the 1VJA complex, and only those gave a significant contribution to the protein-ligand binding free energy.
In addition to the low-energy minima found by the FLM program, we considered also the locally optimized ligand poses obtained with the same target functions local optimization but from the experimentally observed native ligand initial positions. Just these optimized native ligands were compared with the respective low-energy minima found by the FLM program.

Binding Free Energy Calculation.
The protein-ligand binding free energy Δ bind is calculated as Δ bind = (PL) − (P) − (L), where (PL), (P), and (L) are free energies of the protein-ligand complex, the free protein, and the free ligand, respectively. Proteins are considered as rigid and free energies of protein-ligand complexes and free ligands are calculated in the multiwell approximation which is similar to the "mining minima" method of Chen et al. [6]. The potential energy of a molecular system is approximated by a set of independent parabolic wells in these methods. The multiwell approximation differs from the "mining minima" method mainly by more uniform and exhaustive low-energy local minima search by the FLM program instead of configuration space exploration along a combination of low-frequency modes as it was made by the "mining minima" method; also we used the Cartesian coordinates instead of the bond-angletorsion coordinates. The configuration integral of a molecular system (thus, the free energy = − ln( )) is approximated by a sum of contributions from different energy wells : where 0 is the potential energy value in the minimum of the th energy well, ] corresponds to the vibrational degrees of freedom of the ligand in the th energy well, and and correspond to the translation and rotation of a molecular system as a whole, respectively [38,39]: where is th natural frequency of th harmonic oscillator corresponding to the th energy well, is the number of vibrating atoms, is the Boltzmann constant, and is absolute temperature: where is Euler's constant, is the overall mass of the molecular system, is the concentration (the reciprocal volume per one molecule; in the case of standard free energy calculation = 1 mol/L = 6.02 * 10 26 units/m 3 ), and ℎ is Planck's constant: Advances in Bioinformatics 5 here , , and are principal moments of inertia of the molecule.
Natural frequencies for protein-ligand complexes and ligands were calculated through respective Hessian matrices. Proteins were considered as rigid, and no frequencies were calculated for them.

Results and Discussion
3.1. Minima Analysis. In principle, the Monte Carlo search algorithm is able to find the truly global minimum at the expense of huge computational resources. If we want to treat the result of the Monte Carlo search as global minimum, then this minimum, at least, must have energy below or equal to the energy of any arbitrary ligand position. The ligand native position is believed to be the global minimum of the proteinligand system potential energy. So, the optimized ligand native position can be used as good upper estimate of the global minimum energy. We can check whether the Monte Carlo search algorithm has found the minimum with the energy below or equal to the energy of the optimized ligand native position. We checked this statement for both of the the FLM program run results, that is, for "{1}MMFF94" and "{2}MMFF94 + PCM" minima sets, and this statement is true in all cases except the 1VJA case from the "{2}MMFF94 + PCM" minima set. So, we can adopt that the Monte Carlo search parameters were good enough, and the found minima are the real low-energy minima. The exceptional case of the 1VJA complex, where ligand has 61 atoms and 17 torsions, is a hard system for the global minimum search, and the time of the minima search was not long enough to reveal the global minimum.
Reliability of the low-energy minima search for a given protein-ligand complex can be verified as follows. Let us analyze the update rate of the local minima set (the set of different local minima with the lowest energies) during "test optimizations." The local minima set is updated, when a new local minimum after a "test optimization" is accepted into it. If the local minima set is not updated for a long time, then the local minima search is likely thorough. Two examples of the local minima set update rate are shown in Figure 1, where the dependences of the number of updates on the total number of the performed "test optimizations" are presented for two protein-ligand complexes. First example, the protein-ligand complex 1SQO where the ligand has 34 atoms and 4 torsions, is a simple system for the local minima search. Second example, the protein-ligand complex 1VJA where the ligand has 61 atoms and 17 torsions, is a hard system for the local minima search. As we can see from Figure 1, the local minima set for the 1SQO complex almost reached the saturation after 4 * 10 5 "test optimizations", but the local minima set for the 1VJA complex was far from the saturation after 4 * 10 5 "test optimizations." So, it indicates that the complete minima search for the protein-ligand complex with a big flexible ligand requires more than 4 * 10 5 "test optimizations." It is also interesting to analyze the global minimum update rate. The global minimum of the 1SQO protein-ligand complex was found almost immediately, after 661 done "test optimizations," and it has not changed till the end. But the global minimum of the 1VJA complex was updated for the last time only after 64205 done "test optimizations." If we did more than 4 * 10 5 "test optimizations," we would probably find a deeper global minimum of the 1VJA complex.
Let us introduce some notations. The minima set of the given protein-ligand complex with energies calculated by a given target function can be sorted by their energy in ascending order; that is, every minimum gets its own index equal to its number in this sorted list of minima. The lowest energy minimum has index equal to 1. When we include the energy of the locally optimized native ligand in this sorted list, it also will get a certain index and we will designate it as "Index of Native" or "IN." When we do not include the optimized native ligand in this sorted minima list, some minima from the list might be close in space to the native (nonoptimized) ligand position. It is possible even that one minimum found by the FLM program will coincide with the optimized native ligand position. We designate the index of the minimum having RMSD from the nonoptimized native ligand position less than 2Å as "Index of Near Native" or "INN." If there are several such minima which are close to the native position, we will choose the minimum with the lowest energy (with the lowest index) as "INN." The extreme values of these indexes could be interpreted as follows:  function is most likely valid for ligand positioning, and the minima search is thorough.  Table 2.
We can see in Table 2 that the protein-ligand MMFF94 energy in vacuo is the valid target function for ligand positioning strictly speaking only for few protein-ligand complexes: 1C5Y, 1F5L, and 1SQO (see column "{1}MMFF94"). Only for these three complexes (20% out of 16 complexes; this percentage was the same when we expanded the test set up to 30 complexes) the optimized native ligand position has the lowest energy among all energy minima found by the FLM program (IN = 1), and the position of the minimum with the lowest energy (the global minimum of the target function) found by the FLM program is close to the ligand native pose (INN = 1). For all other 13 complexes, the energy of optimized native ligand position is higher (IN > 1) than energies of some minima of the target function (MMFF94 in vacuo) and the difference between the optimized native ligand position and the global energy minimum can be several kcal/mol (4.7 kcal/mol for the 1VJA complex) or as large as 91.2 kcal/mol for the 4FTA complex. Nevertheless, there is a low-energy minimum of the target function close to the native ligand pose for many of these complexes (INN = 1, 2, . . . , 28, 131). We can definitely conclude that the MMFF94 in vacuo target function is invalid only for ligand positioning for four complexes (4FTA, 4FV6, 1DWC, and 1TOM): the energy of the optimized native ligand position is higher than energies of many target function local minima (IN ≫ 1) and there are no target function local minima close to the native pose (INN ≫ 1). Except for these four "bad" complexes, for all other complexes there is a local minimum of the target function (MMFF94 in vacuo) which is situated near the ligand native pose and its index is not larger that ≈10 2 . This means that if we find 10 2 -10 3 local minima of the target function (MMFF94 in vacuo) there will be ligand poses among them which are situated near the native ligand pose or near the optimized native ligand pose. So, for the accurate calculations of the protein-ligand binding free energy we have to take into account the whole spectrum of the target function lowenergy local minima. In this case, the local energy minimum (or several minima) near the native ligand pose or near the optimized native ligand position will be included in the calculation of the protein-ligand binding energy for many (for 13 complexes of 16) of the considered protein-ligand complexes.
We can see in Table 2, comparing "{1}MMFF94" and "{1}MMFF94 + PCM" columns, that taking solvent into account in the frame of the PCM method can improve the target function quality for most complexes, except for the 1VJ9 case.    Table 2.
Also, comparing "{2}MMFF94 + PCM" and "{2}MMFF94 + SGB" columns in Table 2, we can conclude that the SGB method improves positioning quality better than the PCM method (e.g., IN/INN for 4FT0 decrease from 164/159 to 8/6 with changing the target function from "{2}MMFF94 + PCM" to "{2}MMFF94 + SGB"), although the PCM method is more sophisticated [28]. We explain this fact by relative smoothness of the SGB method comparing with PCM. The latter depends on the surface triangulation and the multicharge large matrixes approximation [35] which can be different for close local minima and demands many iterations to reach self-consistency. However, the SGB method is based on direct calculations and for close minima it must give close energies of protein-ligand complex interaction with solvent.
Values of minima RMSD from the native (crystallized) ligand position are given in Figure 2 for all 1024 low-energy minima for each of all 16 complexes. Figure 2(a) presents the RMSD of conformations obtained for the "{1}MMFF94" (and also for "{1}MMFF94 + PCM") set, and Figure 2(b) presents the RMSD of conformations obtained for the "{2}MMFF94 + PCM" (and also for "{2}MMFF94" and "{2}MMFF94+SGB") set. Local optimization from the initial "{1}MMFF94" conformations carried out by the PM7 method and resulting in "{1}PM7" and "{1}PM7+COSMO" sets did not change significantly the conformations of the ligands: RMSD between the initial "{1}MMFF94" and PM7 optimized poses were in the range ≈0.1-1.0Å. All low-energy minima were ranked by the RMSD value and the relationships between the RMSD and order number are presented in Figure 2. It may be remarked that the lowest RMSD values are less than 2Å for all complexes (excluding the 4FTA, 4FV6, and 1TOM complexes from the "{1}MMFF94" set).
The comparison of the RMSD values obtained by the FLM program with those obtained by other docking programs (SOL [16][17][18] and Autodock [8,9]) shows that SOL docking with standard parameters (and 99 runs) gives 13 complexes out of 16 which have the smallest RMSD <2Å, and Autodock docking with standard parameters (and 100 runs) gives 14 complexes out of 16 which have the smallest RMSD <2Å. Thus, all these programs can find well enough the position near the native ligand position, but this position does not always correspond to the energy global minimum, which is defined by the energy target function.
All low-energy minima found by FLM can be grouped into different clusters with respect to RMSD between the ligand conformations. Two conformations belong to one cluster if RMSD between them is less than 1.4Å. Numbers of clusters   The indices of the clusters where the native ligand conformation is located. The clusters are sorted here by lowest energies of their conformations in ascending order; that is, the cluster containing the minimum with the lowest protein-ligand energy has index equal to 1. "inf" means that the native position does not fall into any cluster. for each protein-ligand complex are presented in Figure 3 and Table 3.

Complex ID
We can see that, for some complexes, 1TOM and 1O3P, all minima are divided into only several groups, but for many other complexes the number of clusters varies from several dozen to several hundred. Also, change from the energy in vacuum ({1}MMFF94) to the energy in solvent ({2}MMFF94 + PCM) during the minima search results in considerable increase in number of clusters for most of the complexes; that is, the diversity of low-energy minima configurations increases. Table 3 shows that the native ligand configuration does not always belong to the cluster with the lowest minima energies.

The Binding Free Energy Components.
As the initial approximation for the binding energy, it is possible to take the difference between potential energy of the protein-ligand complex global minimum, potential energy of the free ligand global minimum, and potential energy of the free protein.
These values for the MMFF94 force field in vacuum together with several additive corrections as well as the total binding energies (calculated in the multiwell approximation and experimental ones) are shown in Table 4.
As we can see from Table 4, the potential energy component of the binding energy is the most variable and important. The binding energy corrections, corresponding to translational and rotational degrees of freedom, are practically constant for all protein-ligand complexes, and binding energy corrections, corresponding to vibration degrees of freedom and multiple minima accounting, are relatively small. Thereby, the protein-ligand binding free energy is primarily determined by the potential energies of the global minima. We would like to emphasize that the energy of the ligand deformation from its free conformation to the bound one (the ligand strain energy) is automatically taken into account in our calculations, and its value is in the range from several kcal/ mol to several dozen kcal/mol for tested complexes. This strain energy is neglected in the score of most docking programs.

Calculated and Experimental Binding Energies for Different Energy
Functions. Finally, we compared experimental and calculated binding energies, and the results are presented in Table 5. For comparison of our results with those obtained through more conventional approaches and real docking programs, we also presented in Table 5 scoring functions obtained with docking programs SOL [16][17][18] and Autodock [8,9]. The Autodock program is widely used for docking and the SOL program was successfully used recently for development of new thrombin [40], urokinase [41], and factor Xa inhibitors [42].
Experimental binding free energy Δ exp were obtained from the respective binding constants by (5), where are substituted in mol/L units and temperature is equal to 310 K: The "{1}MMFF94", "{1}MMFF94 + PCM", "{1}PM7", and "{1}PM7 + COSMO" and "{2}MMFF94", "{2}MMFF94 + PCM", and "{2}MMFF94 + SGB" energies were calculated as a difference between potential energies of the protein-ligand complex (in the global minimum), free ligand (in the global minimum), and free protein (in its initial conformation, because it is considered as rigid).  Table 5: Binding energies (in kcal/mol) calculated as 1 0 (PL) − (P) − 1 0 (L), where 1 0 (PL) and 1 0 (L) are energies of the protein-ligand complex and the free ligand in their global minima, respectively, and (P) is energy of the protein in its configuration prepared as it is described in Section 2. The global energies of complexes and ligands were taken from respective minima sets (see Section 2.2). Δ exp is the experimental binding energy calculated from the binding constant. "Energy range" is the difference between the highest and the lowest energies among all 16 protein-ligand complexes. "Energy correlation" is Pearson correlation coefficient between experimental and calculated binding energies. Autodock and SOL scoring functions are also given to compare (in kcal/mol). The energy ranges (difference between the highest and the lowest energies among the 16 protein-ligand complexes) and coefficients of correlation with the experimental energies are also shown in Table 5. We can see that calculated binding energies differ strongly from the experimental values for all investigated energy functions (excluding the SOL and the Autodock scoring functions where the fitting coefficients are used). Nevertheless, we can see noticeable improvement in the values of the energy ranges when solvent is taken into account and quantum-chemical PM7 method is used instead of MMFF94 force field. Among seven variants of potential energy function, the "PM7 + COSMO" energy function has the most realistic energy range 35.5 kcal/mol to be compared with 7 kcal/mol for the experimental binding energy range.

PDB ID
Also we can see that the correlation coefficients are not close to 1, and for the "{1}MMFF94 + PCM", "{2}MMFF94 + PCM", and "{2}MMFF94 + SGB" cases the coefficient is even negative. But among seven variants of the potential energy function, the "PM7" energy function has the best correlation coefficient with experimental energies. We should emphasize that these results have been obtained by using the general purpose methods (MMFF94, PCM, PM7, and COSMO) without any fitting coefficients to the special case of the protein-ligand interactions. Table 5 also presents the SOL and the Autodock scoring functions, their energy ranges, and their correlations with the experiment binding energies. Although the energy ranges of Autodock and SOL scoring functions are closer to the range of the experimental binding energies than the ranges of binding energies calculated by the FLM program with different target functions, the correlation coefficients of SOL and Autodock scoring functions with experimental values are much smaller than ones calculated with the FLM program ("{1}MMFF94", "{1}PM7", "{1}PM7 + COSMO", and "{2}MMFF94"). This may follow from the fact that SOL and Autodock docking programs are aimed to virtual screening of large databases of compounds and many simplifications are used in these programs (e.g., simplified force field and absence of the ligand deformation energy in the scoring function).

Conclusions
The results of our investigations show that the docking target function on the base of MMFF94 force field in vacuo can be used for discovery of native or near native ligand positions for some protein-ligand complexes by finding the global energy minimum of the target function, for example, for five complexes of 16 in Table 2 either IN = 1 or INN = 1 (there were 14 such cases when we expanded the test set up to 30 complexes). If a set of low-energy minima of this target function is taken into account (not only the global minimum), we find the near native ligand position for much more complexes; for example, the number of such complexes in Table 2 will be 13 out of 16 for the set of 1024 low-energy minima. We can conclude that for the calculation of the protein-ligand binding free energy, it is better to take into account not only the global energy minimum but a whole set, for example, 1024, of the low-energy local minima. In this case, we take into consideration near native ligand positions for most of the complexes investigated in the present research. Nevertheless for some complexes (4FTA, 4FV6, and 1TOM in Table 2) it is impossible to find near native ligand positions among 1024 local minima of the target function (MMFF94 in vacuo): indexes IN/INN are inf/inf. However, the rational target function improvement has been demonstrated without use of any fitting parameters: the ligand positioning is better for the target function of the more sophisticated physical model, that is, taking into account the implicit solvent model. Also improvement of the ligand positioning occurs when MMFF94 force field is substituted by the semiempirical quantum-chemical method PM7. It is apparent from Table 2 that indexes IN and INN decrease for the more sophisticated models. It is also obvious from Table 2 that usage of more sophisticated target function at the stage of low-energy minima selection results in better ligand positioning: the lowest IN and INN indexes are in the two rightmost columns in Table 2 and there are no "inf" labels at all in these two columns.
The best target functions of all target functions examined in this research are the protein-ligand potential energy in the frame of MMFF94 force field [22] with the implicit solvent SGB model [27,28] and the potential energy in the frame of the semiempirical PM7 method [23] with the COSMO implicit solvent model.
The results of the present investigation show that further improvement of the docking target function is required until its global minimum coincides with or is near the optimized native ligand position for a broad set of protein-ligand complexes (IN = 1, INN = 1).
Calculated in the multiwell approximation binding free energies differ strongly from experimental binding energies for all investigated energy functions. More realistic binding energies were obtained for new quantum-chemical semiempirical PM7 method and taking into account water in the implicit COSMO model. The main contribution to the binding free energy is given by potential energies of the protein, ligand (in the global minimum), and their complex (in the global minimum) with solvent taken into account. Present investigations show that for the increase of docking accuracy for ligand positioning as well as for binding energy calculation it is necessary to take into account interaction of the protein, ligand, and their complex with water solvent and also to look for or create force field better than MMFF94 and/or to carry out docking with quantum-chemical methods, for example, with new semiempirical PM7 method.
The correlation coefficients between the experimental binding energies and the energies calculated by the FLM program are still far from unity, but for some target functions they are much larger than correlation coefficients between SOL or Autodock scoring functions and the experimental binding energies.
On the other hand, the improvement of the docking results for a given force field and a solvent model can be expected if we take into account the mobility of the protein atoms which locate close to the ligand (first of all, protein hydrogen atoms whose positions are not determined experimentally). Such a facility is included in the FLM program, and we hope to carry out respective investigations in the near future.
Despite the fact that parallel computing is used now mainly for docking of large ligand databases (virtual screening), the employment of advanced and more sophisticated models demands larger computational resources and increases importance of parallel algorithms and their acceleration [20] for docking of a single ligand.
Relying on the present investigations, we guess that current supercomputers are powerful enough to perform a comprehensive search of protein-ligand low-energy minima in the frame of existing two-body force fields with implicit solvent models and without usage of preliminary calculated energy grids for flexible ligands with up to about 20 internal torsions. The respective search programs (e.g., our program FLM) is a tool to evaluate adequacy of force fields and solvent models for ligand docking into the active sites of the target proteins.