A New Approach for Flexible Molecular Docking Based on Swarm Intelligence

. Molecular docking methods play an important role in the field of computer-aided drug design. In the work, on the basis of the molecular docking program AutoDock, we present QLDock as a tool for flexible molecular docking. For the energy evaluation, the algorithm uses the binding free energy function that is provided by the AutoDock 4.2 tool. The new search algorithm combines the features of a quantum-behaved particle swarm optimization (QPSO) algorithm and local search method of Solis and Wets for solving the highly flexible protein-ligand docking problem. We compute the interaction of 23 protein-ligand complexes and compare the results with those of the QDock and AutoDock programs. The experimental results show that our approach leads to substantially lower docking energy and higher docking precision in comparison to Lamarckian genetic algorithm and QPSO algorithm alone. QPSO-ls algorithm was able to identify the correct binding mode of 74% of the complexes. In comparison, the accuracy of QPSO and LGA is 52% and 61%, respectively. This difference in performance rises with increasing complexity of the ligand. Thus, the novel algorithm QPSO-ls may be used to dock ligand with many rotatable bonds with high accuracy.


Introduction
Computer-aided tools have wide applications in various fields, such as biotechnology engineering, medical engineering, mathematical modeling, and electronic information [1][2][3][4].In the drug design field, molecular docking is a useful tool in drug discovery and rational drug design.New drugs can be designed efficiently by using computer-aided docking algorithm simulations to find highly affine components that bind well to the targeted protein [5,6].By doing so, the biochemical reaction that the target molecule catalyzes can be altered or prevented.In the past, drug discovery process was very expensive and time consuming by in vitro experiments.Now, structure-based in silico approach allows for a faster and cheaper identification of promising drug candidates by the virtual screening of compound databases [7].
Molecular docking has been developed with primarily two parts varying: scoring function and searching method [8,9].Docking methods typically use an energy-based scoring function to identify the energetically most favorable ligand conformation when bound to the target.The general hypothesis is that lower energy scores represent better proteinligand bindings compared to higher energy values.Therefore, molecular docking can be viewed as a complex combinatorial optimization problem.The aim of combinatorial optimization is to find the ligand-binding mode with the lowest energy [10].
During the last decade, many different tools have been used for the docking problem, such as DOCK [11], GOLD [12], FlexX [13], and AutoDock [14].Among these programs, AutoDock is a well-known example of flexible docking technique.Simulated annealing [15] was originally used within AutoDock.Later, a genetic algorithm (GA) [14] and a Lamarckian genetic algorithm (LGA) [16], which incorporates a local search, have been added.Recently, also particle swarm optimization (PSO) [17][18][19][20] and ant colony optimization (ACO) [21] have been proposed as search methods in the AutoDock program.Although there exist many different algorithms and programs, the problem is far from being solved [22,23].None as of now can be viewed as offering 2 Mathematical Problems in Engineering a robust and accurate solution to the docking problem, even in the context of a rigid protein receptor.In general, the complexity of the task rises with increasing flexibility of the molecules.For flexible docking, the number of rotatable bonds of the side chains increases the dimension of the search space excessively.And if parts of the receptor are also considered to be flexible, the number of rotatable bonds of the system will be the sum of the rotatable bonds of ligand and receptor region; this situation should become more complicated.
In this study, we propose a novel search method called QPSO-ls for solving highly flexible docking problem, which is a hybrid of quantum-behaved particle swarm optimization (QPSO) and a local search method.In 2004, inspired by quantum mechanics and trajectory analysis of PSO [24], Sun et al. [25] used a strategy based on a quantum delta potential well model to sample around the previous best points and later introduced the mean best position into the algorithm and proposed a new version of PSO, quantum-behaved particle swarm optimization.QPSO is a kind of probabilistic algorithm [26]; it needs no velocity vectors for particles and has fewer parameters to adjust.It has been proved that this iterative equation leads QPSO to be globally convergent [27].It has been shown to successfully solve a wide range of continuous optimization problems.For example, it has been used to tackle the problems of multiobjective optimization [28], neural network training [29], bioinformatics [30], and so on.In addition to the applications, many efficient strategies have been proposed to improve the performance of QPSO [31,32].In the study, to improve search performance, an efficient variant of the Solis and Wets local search technique [33] is incorporated into QPSO algorithm such that both techniques cooperate fully with each other.Simulation results reveal that QPSO-ls has a better convergence and a lower docked energy than QPSO algorithm alone.The environment and scoring function that are used by the algorithm are provided by AutoDock version 4.2.
The remaining part of the paper is arranged as follows.First in Section 2 we describe the algorithms and scoring function employed by the program for the docking problem.Next in Section 3 we describe input data preparation and docking protocols.The docking experimental results are discussed in Section 4. Finally, conclusions are given in Section 5.

Algorithm and Scoring Function
2.1.Quantum-Behaved Particle Swarm Optimization (QPSO) Scheme.Quantum-behaved particle swarm optimization (QPSO) was motivated by concepts from quantum mechanics and particle swarm optimization (PSO); it is a probabilistic optimization algorithm.In QPSO, the state of a particle is depicted by wavefunction ( ⃗ , ) instead of ⃗  and ⃗ V.The dynamic behavior of the particle is widely divergent from that of the particle in traditional PSO systems in that the exact values of ⃗  and ⃗ V cannot be determined simultaneously.We can only learn the probability of the particle's appearing in position ⃗  from probability density function |( ⃗ , )| 2 , the form of which depends on the potential field the particle lies in.And then the probability distribution function of the particle's position can be calculated through the probability density function.By employing the Monte Carlo method, the particle's position is updated according to the following equation: In [25], parameter  is evaluated by where parameter  is called the contraction-expansion coefficient, which can be tuned to control the convergence speed of the algorithms.Then we get the position update equation as Parameter  is the only parameter in QPSO algorithm.The control method of  is vital to convergence rate and performance of the algorithm.Therefore, in the later work [34], Sun et al. proposed a mainstream thought point to evaluate parameter , the creativity of a particle.The mainstream thought point or mean best position (mbest) is defined as the center of gravity gbest position of the particle swarm.That is, where  is the population size and   is the gbest position of particle .The value of  is given by Hence, the particle's position is updated according to the following equation: This equation is iterative equation of position in quantum-behaved particle swarm optimization (QPSO) algorithm.
The procedure for implementing the QPSO is given by Algorithm 1.

Local Search.
The local search method of Solis and Wets was employed in this study.The method of Solis and Wets is a stochastic heuristic for continuous parameter spaces, which introduces a probabilistic element.Its primal purpose is the local optimization of functions that do not provide gradient information.Basically, the local optimization starts by exploring a random direction in search space and generally follows this direction with random movements as long as the objective function keeps improving.Continued improvements lead to an expansion of the random search steps, whereas continued failing narrows the search.AutoDock adopts a LGA which is a hybrid of GA and a variant of Solis and Wets local search, and the algorithm is illustrated in detail in [19,33].

Hybrid Algorithm of QLDock.
The landscape of energy function contains many local minima for flexible docking, so the performance of using only individual global search or local search may be not satisfactory.By analogy to combining GA with LS, we combined QPSO with LS and applied a hybrid algorithm to solve high flexible docking problem by proper design to enhance advantages of both algorithms as well as reducing their disadvantages to solve specific optimization problem.
In the hybrid search, QPSO can globally explore promising regions with low energy; the local search modifies the position parameters of a particle without referring to other particles and also maintains the diversity among particles.The local search may be applied to the particle according to a predefined probability.Hence, the algorithm QPSO-ls is beneficial to maintain diversity of particles and in turn prevent premature convergence.The global search method of QPSO and the Solis and Wets local search technique can efficiently compensate with each other in the new QPSO-ls hybrid algorithm.Simulation results show also that QPSO with local search is more efficient than QPSO without local search.A general structure of the QPSO-ls hybrid is as shown in Algorithm 2.

Scoring Function.
Search algorithms are able to quickly generate large number of possible conformations.The purpose of a scoring function is to compare the "quality" of these possible solutions and then select best binding modes.The scoring functions in common use attempt to approximate the binding free energy for the ligand binding to the receptor; a low energy indicates stable system and thus a likely receptorligand binding interaction.AutoDock 4.2 [35][36][37] adopts a semiempirical free energy scoring function to evaluate a docked conformation.The total docked energy is expressed as the sum of the intramolecular energy and the intermolecular energy of the ligand and protein.First, the intramolecular energetics are estimated for the transition from the unbound states to the conformation of the ligand and protein in the bound state.Then, the second step evaluates the intermolecular energetics of combining the ligand and protein in their bound conformation.
The force field includes six pairwise evaluations () and an estimate of the conformational entropy lost upon binding (Δ conf ): where  refers to the "protein" and  refers to the "ligand" in a protein-ligand docking calculation.Each of the pairwise energetic terms is defined by the following energy: The first term is a typical Lennard-Jones 12-6 dispersion repulsion interaction.The second term is a directional 12-10 hydrogen bond potential.The third term is a screened

Data Preparation and Parameter Setting
3.1.Data Preparation.The prepared protein and ligand structures were saved in the PDBQT file format with the AutoDock version 4.2.All the following procedures were performed using AutoDock Tools.First, the protein input files were obtained according to the AutoDock manual: (1) remove water molecules and metal ions not belonging to the binding site first; (2) repair residues with missing atoms; (3) add hydrogen atoms, assign partial charges, and merge nonpolar hydrogen atoms; and (4) assign solvent parameters.Then, the ligand input files also follow the AutoDock manual: (1) extract the coordinates of ligand atoms from the PDB file; (2) add hydrogen to all atoms and assign partial charges; and (3) define rigid root and torsions of the ligand.The program AutoGrid was used to calculate energy grid maps.A default grid size of 60 × 60 × 60 points with a spacing of 0.375 Å was applied, which corresponds to a cube with an edge length of 22.5 Å.

Experiment Parameter
Setting.The 23 protein-ligand complexes that are used for the computational experiments are listed in Table 1.In AutoDock, the solution representation consists of a vector specifying the position and orientation of the ligand and the arrangement of its rotatable bonds.For describing the orientation of a ligand, a quaternion (defined by a unit vector and a rotation around that vector) is used.Hence, a solution of docking needs to optimize 7 + ntor parameters: 3 for the position, 4 for the orientation, and ntor for the number of rotatable bonds in the ligand.
The docking experiments were performed on the same computer.All methods use the same protein coordinate and start with exactly the same initial location, conformation, and orientation of the ligand.Each docking experiment has been repeated 30 times.The initial population was set to 50 individuals; maximum number of energy evaluations was 2.5 × 10 5 ; maximum number of generations was 27,000.The local search was based on the Solis and Wets local search.The probability of performing local search on individual was 0.06; the iterations of Solis and Wets local search were 300.The other parameters provided by the default setting were the same as in AutoDock.

Results
To compare the performance of the docking problem, we applied the same parameter settings for all algorithms.For the evaluation of each complex, we recorded the docking energy, root mean square deviation (RMSD), running time, and the total number of scoring function evaluations in all docking runs.We rewrote the AutoDock program using C++ under Linux.The docking tests run on an Intel Pentium dual core 3.0 GHz PC with 2 G memory.The OS is Red Hat Enterprise Linux.

Evaluation of Docking Energy.
To investigate the efficiency of the hybrid algorithm of QPSO-ls, 30 independent docking experiments were performed for QLDock, QDock, and AutoDock.Optimization algorithms are guided only by energy function; their search ability can be evaluated in terms of docked energy.Table 2 shows the results of the best docking energy and RMSD value of 30 docking runs using three algorithms.We chose 23 protein-ligand complexes to cover a wide range of ligands differing in their flexibility; the dimension of the search space ranges from 7 (2 rotatable bonds) to 27 (20 rotatable bonds).Since molecular docking simulation methods rely on the assumption that a lower binding energy of the predicted complex is closer to the native state, as can be seen in Table 2, QPSO-ls obtained the lowest energies for 19 out of 23 complexes.For highly flexible ligands with torsions >10, the energies obtained by QPSO-ls are much lower than those of LGA, except for the complex 2ifb with plm.The RMSD values obtained by QPSO-ls were also smaller than those of the LGAs, except for 1apu.For the complexes with less flexible ligands, the differences of both docked energy and RMSD between the three methods are not significant.The docking energy and RMSD obtained by QPSO-ls are considerably better than those returned by QPSO and LGA, for instance, especially for highly flexible docking.Also, it has been shown that the QPSO-ls performs very well compared to the common algorithm of QPSO.
In Figure 1, the docking energy of the best solution of 30 docking runs is shown for three algorithms.In the case of the structure 1aaq, the ligand psi has 20 rotatable bonds resulting in a dimension search space of 27 and represents a challenge for flexible docking.In QPSO-ls algorithm, the application of the local search strategy is advantageous resulting in a pose with binding energy = −17.14kcal/mol and an RMSD value of 1.64 Å.In comparison, the value of QPSO's binding energy = −10.63kcal/mol and RMSD = 3.90 Å and the value of LGA's binding energy = −8.82kcal/mol and RMSD = 2.96 Å, respectively.
Moreover, QPSO-ls finds this ligand complex with lower binding energy after approximately 140,000 computing steps in comparison to 150,000 computing steps found by the LGA.It shows that QPSO-ls had better convergence performance than LGA, which also employs a local search strategy.The average execution time of QPSO-ls per independent run was 27.53 s, while that of LGA was 25.69 s.The execution time of QPSO-ls was slightly greater than that of LGA using the same number 250,000 of function evaluations.
For the complexes with less flexible ligands, the evolution of the lowest energy among the individuals through generations displays a similar behavior for three algorithms, and the algorithm converges very rapidly to lower energies.In most of the cases, the lowest energy is reached almost after 70000 computing steps and remains stable after that for the 1abf/fca complexes.Similar results are obtained also for other less flexible ligands.
For the complexes with more than 15 rotatable bonds, the algorithm QPSO-ls is able to find conformations with much lower binding energies than QPSO and LGA.In this situation, QPSO and LGA lose the diversity in the population quickly and converge in some local optima.Moreover, the results show that the good performance of QPSO-ls with respect to QPSO is due to the concept of local search during the optimization, which can help to escape local minima.Although the local search procedure of QPSO-ls algorithm is computationally more demanding, the chance to find the optimum is higher.Consequently, this algorithm should be well suited for docking of ligands with many rotatable bonds.

Comparison of Docking Accuracy.
To assess the docking accuracy of three programs, the root mean square deviation (RMSD) value with respect to the reference structure was chosen as a measure for the quality of the prediction.RMSD can be calculated using the following formula: where  is the total number of atoms in the molecule and  is a distance between each pair of corresponding atoms.The RMSD value of 2 Å is used as a cutoff value.Poses closer to the experimentally determined structure (with RMSD lower than 2 Å) are considered sound.Figure 2 presents the plots of the RMSD values derived by QLDock and two other docking programs versus corresponding protein-ligand complexes.A prediction of a binding mode is considered successful if the RMSD is below 2 Å.As can be seen from Figure 2, the QPSO-ls algorithm of QLDOCK predicts the protein-ligand complexes correctly in 74% of the cases, which is the highest percentage of all docking programs applied.The QPSO algorithm of QDOCK was successful in 52% of the cases and AutoDock in 61%.In 12 of 23 cases, QLDock provides also the lowest RMSD value of all docking programs investigated with an RMSD below 2.0 Å, as three programs employ the same docking function, which indicates the superior sampling efficiency of the QPSO-ls algorithm.
Comparing the performance of algorithm with respect to the flexibility of a ligand, we divided the investigated complexes into two classes depending on the number of rotatable bonds of the ligand: when the rotatable bond is more than five, ligand is regarded as highly flexible molecule.As listed in Table 1, QLDock predicts highly flexible ligands very well with an average RMSD value of 1.65 Å.The average RMSD values are 2.25 Å and 2.10 Å for QPSO and LGA docking programs, respectively.We can observe that better results are obtained for mean RMSD in the case of rotatable bond >10.On the contrary, this confirms again the applicability of the QPSOls algorithm for an efficient sampling in high-dimensional search spaces.
In Figure 3(a)1aaq, we present results for the 1aaq/psi protein-ligand complex which show that the QLDock method gives very useful insights into possible docking paths.The 3D visualization of the docking state as computed by QPSO and LGA for the 1aaq/psi suggests that the ligand could be undocked in suitable position; the aromatic ring of psi is misplaced as demonstrated in Figures 3(b)1aaq and 3(c)1aaq.Only QPSO-ls did find its suitable binding pose; its RMSD is 1.64 Å.In Figure 3 bottom fca was docked into the active site of 1abf by three algorithms, and the resultant binding poses were superposed with their binding poses in the X-ray crystal structures of fca complexes.For instance, the three results are all docked into the 1hvr active site in an acceptable accuracy; the values of RMSD were 0.95 Å, 1.81 Å, and 0.78 Å for QPSOls, QPSO, and LGA, respectively.This demonstrates that the QPSO-ls results are much better than that of other docking programs in handling complex docking problems.

Conclusion
A QPSO-ls algorithm for the molecular docking problem was presented in this paper.We evaluated its performance by a series of experiments using 23 complexes.The proposed algorithm uses the energy evaluation function of the wellknown tool AutoDock 4.2.To ensure the preservation of diversities of the particles and prevent the convergence procedure from prematurity, a local search strategy has been implemented.Although the number of computing steps needed to reach convergence increases, the QPSO-ls algorithm clearly outperforms the QPSO algorithm and the default LGA in terms of docked energy, convergence performance, robustness, and accuracy of the high-dimensional molecular docking problem.As demonstrated above, these results indicate also that successful identification of binding modes might be further improved by combining the results from multiple programs, and the QPSO based program seems to be excellent approach and alternative to solve the flexible docking problem in molecular docking.

10 Figure 1 :
Figure 1: Convergence performance of the lowest docking energy for 1aaq, 1icn, 1hvr, and 1abf during the optimization process.

Figure 2 :
Figure 2: Comparison of the RMSD values with respect to three various algorithms for all instances.

Figure 3 :
Figure 3: The docked ligand conformation with the lowest energies for 1aaq, 1hvr, and 1abf (from top to bottom).The native conformations (shown in green) and the predicted conformations (shown in magenta) are represented by sticks and balls.(a) QPSO-ls of QLDock.(b) QPSO of QDock.(c) LGA of AutoDock.

Table 1 :
Protein-ligand complexes used for the experiments.

Table 2 :
Comparison of docking energy and RMSD values of QLDock, QDock, and AutoDock.