Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules

Since its introduction, the basin hopping (BH) framework has proven useful for hard nonlinear optimization problems with multiple variables and modalities. Applications span a wide range, from packing problems in geometry to characterization of molecular states in statistical physics. BH is seeing a reemergence in computational structural biology due to its ability to obtain a coarse-grained representation of the protein energy surface in terms of local minima. In this paper, we show that the BH framework is general and versatile, allowing to address problems related to the characterization of protein structure, assembly, and motion due to its fundamental ability to sample minima in a high-dimensional variable space. We show how specific implementations of the main components in BH yield algorithmic realizations that attain state-of-the-art results in the context of ab initio protein structure prediction and rigid protein-protein docking. We also show that BH can map intermediate minima related with motions connecting diverse stable functionally relevant states in a protein molecule, thus serving as a first step towards the characterization of transition trajectories connecting these states.


Introduction
Global optimization is an objective of many disciplines, both in academic and industrial settings [1,2]. Characterization of complex systems often poses very hard global optimization problems with many variables [3,4]. Algorithms that target such problems largely build on or combine four main approaches: deterministic, stochastic, heuristic, and smoothing [3,[5][6][7]. All these algorithms are challenged by systems where the variable space contains multiple distinct minima. While most algorithms can efficiently find a minimum, not all can feasibly locate the global minimum.
Some of the most successful applications of global optimization algorithms on characterizing physical and biological systems build on the stochastic Monte Carlo (MC) procedure and its Metropolis variant [8]. For instance, simulated annealing is one of the most widely used algorithms for finding the global minimum of a multivariable function for different complex systems [4,9,10]. Adaptations that build on deterministic and stochastic numerical procedures, such as molecular dynamics (MD) and MC, are abundant in computational biology for the structural characterization of biological macromolecules (cf. [11,12]).
Basin hopping (BH) is a global optimization framework that is particularly suited for multivariable multimodal optimization problems [13], and it is our thesis in this paper that BH is an effective framework for the characterization of biological macromolecules. The basic BH framework is well studied and understood, but modifications to its core components are necessary for application to complex biological systems. In what follows, we first summarize the basic BH framework and some of its salient properties before proceeding to identify modifications necessary for application to biological macromolecules.
BH combines heuristic procedures with local searches to enhance its exploration of the given variable space, conducted as a series of perturbations followed by local optimization. As shown in pseudocode in Algorithm 1, (1) (8) i ← i + 1 Algorithm 1: Basic BH framework in pseudocode. the framework can be described in terms of a local search procedure LOCALSEARCH that maps a point X i in variable space to its nearest minimum Y i , a perturbation move PERTURB that modifies a current minimum Y i to obtain a new point X i+1 in variable space, and a stopping criterion STOP that terminates these repeated applications of a structural perturbation followed by a local optimization. The repeated applications result in a trajectory of local minima Y i . As shown in Algorithm 1, only the lowest minimum needs to be retained in memory when seeking the global minimum of some function f . It is important to note that Algorithm 1 shows a specific realization of the BH framework, known as monotonic BH (MBH), where the current minimum is not accepted if it does not lower the lowest value obtained for the function f so far. In this case, another perturbation is attempted in order to obtain a new starting point for the local optimization that follows. While this basic framework is easy to describe and employ for global optimization, effective implementations exploit specific domain expertise about the system at hand [14][15][16][17][18][19]. Heuristics are designed based on specific system knowledge to implement an effective perturbation component. Domain-specific expertise is also employed for an effective implementation of the local search component. The stopping criterion is often implemented in terms of a maximum number of function evaluations or in terms of no improvements over a window of the last sampled minima. It is important to note that the stochasticity in BH is mainly due to the implementation of the perturbation component, which seeks to take the exploration out of the current local minimum. The local optimization component, on the other hand, can employ deterministic numerical techniques to locate the local minimum with arbitrary accuracy [20].
The core advantage of the BH framework over a multistart method that essentially samples local minima at random is that BH moves between adjacent local minima in the variable space. This strategy is more effective when exploring high-dimensional variable spaces associated with complex physical systems, where the addition of new dimensions can result in an exponential increase in the number of minima in the space [21]. The adjacency is a result of a deep connection between the perturbation and local optimization. Despite the application setting, a good general rule is for the perturbation to preserve some structural characteristics of the local minimum Y i it is disrupting to obtain a new starting point X i+1 for the next application of the local optimization.
If the magnitude of the perturbation jump in the variable space, measured through some distance function d(Y i , X i+1 ), is small, then X i+1 may remain in the basin of attraction of Y i , and the local optimization will bring X i+1 back to Y i . On the other end of the spectrum, the perturbation can completely disrupt Y i and obtain an X i+1 that could have essentially been obtained at random. While the local optimization will yield a new local minimum Y i+1 / = Y i , the BH will degenerate to a multistart method in this case. Different studies have shown that the perturbation needs to preserve some of the structure of the current minimum for BH to be more effective than the multistart method [20,22]. It is the careful implementation of the perturbation component that allows BH to organize the local minima it samples according to an adjacency relationship [21].
The BH framework is sometimes referred to as a funneldescent method, because its core behavior of iterating over adjacent local minima has turned out to be an effective optimization strategy for functions with a funnel landscape [21]. The generality of the framework and its ease of adaption for different systems has resulted in diverse applications, which span from geometry problems, such as packing circles in circular containers [20], to statistical physics problems of characterizing low-energy states of small atomic clusters [3].
The BH framework originated in the computational biology community dating back to the pioneering work of Wales [3], where the objective was to characterize the minima of the Lennard-Jones energy function in small atomic clusters. The term basin hopping was coined in this work, though to an extent, the stated motivation for the BH framework was from related optimization algorithms in the evolutionary search community. In fact, BH can be viewed as a special case of Iterated Local Search, which is popular for solving discrete combinatorial optimization problems [23]. An algorithmic realization of the BH framework was available prior to the work of Wales, most notably in Scheraga's MC with minimization algorithm [9,24].
The BH framework is particularly suited to deal with molecular spaces, where the function sought for optimization is a complex nonconvex potential energy function summing over the interactions among atoms in a 3-dimensional molecular structure. The global minimum of the function corresponds to the structural state of the molecule that is most stable under equilibrium conditions and so relevant for biological activity. Structural characterization of the biologically active (native) state of biological macromolecules is Advances in Artificial Intelligence 3 an important problem in computational structural biology. A grand-standing challenge nowadays is to characterize such states for protein molecules, which are central in many chemical pathways in the cell and are the focus of this paper.
Proteins are complex systems with hundreds to thousands of atoms. These atoms are organized in amino-acid building blocks which connect serially to form a polypeptide chain (the N-terminus of one amino acid connects to the Cterminus of the other to form a peptide). Figure 1(a) shows a short polypeptide chain. Depending on the representation employed, a spatial arrangement of the atoms that constitute a polypeptide chain, also referred to as a conformation, may require the specification of a prohibitive number of variables. A popular representation in computational structural biology employs only the angles shown in Figure 1(a). These angles can be used to define the variable space, as their modification gives rise to different conformations.
The variable, or conformational, space of a polypeptide chain is associated with a funnel-like energy surface [25,26]. The size and ruggedness of this surface, illustrated in Figure 1(b), are the primary reasons why obtaining structural information on native state of a protein polypeptide chain based on the chain's amino-acid sequence alone is an outstanding challenge in computational structural biology [27]. Meeting this challenge, often known as ab initio protein structure prediction, is needed, however, to close the gap between the wealth of protein sequence data and the scarce information on their native structures. Obtaining structural information ab initio promises to elucidate the structurefunction relationship and advance structure-driven studies and applications on protein molecules [28][29][30].
The funnel-like but rugged energy surface of protein molecules seems suitable for the BH framework. In general, it is challenging to locate the global minimum in this surface and so elucidate the native structure of a protein. One of the main reasons relates to imperfect modeling. The energy functions currently available to probe the protein energy surface are semiempirical and contain inherent errors [31]. Due to the specific process undertaken in computational chemistry to design such functions, the actual global minimum of a designed protein energy function may deviate significantly from the true global minimum (the native structure obtained by experiment in the wet laboratory). Studies report deviations in the 2-4Å range [32]. Due to these deviations, computational approaches that aim to obtain a broad view of the energy surface are more appropriate, particularly if they are to be followed by detailed heavy-duty optimization techniques on select conformations.
A common strategy among protocols for ab-initio protein structure prediction is the sampling of a large number of low-energy conformations. These are end points of many independent MD or MC trajectories optimizing some chosen energy function [28,[33][34][35][36][37][38][39]. Alternatively, the trajectories can be integrated in a tree to better control the exploration and use online analysis to bias the tree away from highenergy oversampled regions [40,41]. The conformations are then grouped by structural similarity to reveal local minima from which it is worth continuing the exploration at higher representational detail. The goal then becomes obtaining convergence to a region of the space that can be predicted to represent the native state.
In the context of ab-initio protein structure prediction, BH can be employed to explicitly sample local minima in the protein energy surface. At a superficial level, this would require the retainment of an ensemble of local minima and not just the current one. In addition, while the pseudocode in Algorithm 1 shows a simple realization of the BH framework, MBH, applications of BH on molecular spaces often make probabilistic decisions on whether to accept a current minimum. Procedurally, the framework still consists of repeated applications of a structural perturbation followed by an energy minimization. However, a Metropolis criterion [42] biases the sampling of local minima towards lower energy ones over time. Essentially, the decision to accept where E refers to the energy function, K B is the Boltzmann constant, and T is temperature. Temperature does not need a physical meaning, as its main role is to scale the height of an energy barrier.
The appeal of the BH framework is that it transforms the protein energy surface into a collection of interpenetrating staircases, as illustrated in Figure 1(c). A succinct discrete representation is obtained for this surface in terms of local minima. It is important to note that BH does not modify the energy surface in any way. Instead, it projects each point (conformation) to its closest local minimum to effectively reveal a map of the energy surface in terms of local minima. The details of the energy surface between local minima are lost, but this degree of resolution is still very useful for a structural characterization of protein molecules.
Given its ease of implementation, BH is starting to gain popularity as an optimization framework for biological systems. Current applications of BH for structural characterization of biological molecules essentially differ in the specific implementations for the perturbation and local optimization components. Local optimization, for instance, is implemented as gradient descent or Metropolis MC at low temperature, whereas the perturbation component, on the other hand, directly modifies atomic coordinates in existing work. These implementation choices have allowed BH algorithms to capture local minima of small atomic clusters and even map energy surfaces of polyalanines and other small proteins [14,32,43,44]. However, applications to structure prediction [45] have been limited to small proteins, mainly because representation of conformations through atomic coordinates results in a prohibitive variable space. In particular, the BH algorithm in [45] succeeds in locating conformations closer to the experimentally determined native structure than MD with simulated annealing, but its efficiency drops on sequences longer than 75 amino acids.
In this paper, we show that BH is a useful framework for structural characterization beyond structure prediction. We recognize that BH is general and can be employed to map the equilibrium conformational space of a biological system. For instance, we show that with suitable modifications to the perturbation and local optimization components, BH can be applied to protein-protein docking to reveal native 4 Advances in Artificial Intelligence Rejected minimum conformation lowest-energy configurations of protein molecules resulting from the assembly of various polypeptide chains. Specifically, in the context of ab-initio protein structure prediction, we show that implementations of the main components in BH that employ domain-specific knowledge result in increased efficiency and allow application to longer protein chains. We also show that the ability of BH to provide a map of the energy surface in terms of minima is useful not only when the goal is to locate the global minimum (whether that minimum corresponds to the native structure of one protein polypeptide chain or of a complex resulting from assembly of multiple chains), but also when the objective is to characterize proteins with more than one functionally relevant state. Such proteins are abundant in biology as effective biological machines that can tune their biological function through molecular motions [46][47][48][49]. A map of the minima surrounding the functionally relevant states is useful for understanding how the protein hops between minima in transition trajectories connecting these states [33,49,50].
The presentation in this paper of BH as a general, versatile framework builds over our recent work on abinitio structure prediction and rigid protein-protein docking [22,51]. In particular, in the context of structure prediction, we show that employment of the molecular fragment replacement technique allows BH to efficiently capture the native structure. In the context of protein-protein docking, we incorporate geometric hashing to efficiently obtain structural perturbations of a dimeric configuration.

5
Additional information from evolutionary sequence analysis allows restricting the variable space. Finally, we provide here a proof-of-concept demonstration that BH can be applied to understand the connectivity between functionally relevant states in a protein in terms of the minima surrounding these states. Obtaining a view of minima in the equilibrium conformational space of a protein molecule is the first step into elucidating motions and transition trajectories that take a protein between the states it uses for biological function.

Methods
The basic BH framework was showcased in Algorithm 1 in Section 1. Our algorithmic treatment in this section focuses on modifications to the basic components of BH which allow its application to the three different problems on which we focus in this paper. As described in Section 1, two modifications that allow application to these problems concern accepting a newly obtained local minimum according to the Metropolis criterion (unlike the basic MBH algorithm) and adding that minimum to a growing ensemble of BH-obtained local minima (unlike recording only the last one as in the basic MBH framework). The description of BH below is organized according to the three different applications showcased in this paper in Sections 2.1, 2.2, and 2.3, respectively. The treatment of BH in each application is limited to description of four main components: (1) representation of the system being modeled, which allows defining the variable space; (2) description of the energy function being optimized by BH; (3) implementation of the structural perturbation move; and (4) implementation of the local search procedure for the local optimization.

BH for Sampling Decoy Conformations for Ab Initio
Protein. As described above, the BH framework can be employed to obtain a broad view of the energy surface in terms of low-energy local minima. This can be done efficiently at a coarse-grained level of detail, employing an energy function that sacrifices detail and some accuracy to save computational time. The sampled conformations corresponding to the local minima are low-energy decoy conformations, which can then be fed to any structure prediction protocol for further analysis and refinement of select conformations with dedicated computational resources. The refinement will allow adding further detail, discriminating between decoy conformations, and making a prediction on which refined conformation can be considered to represent the native structure. Figure 1(a), a polypeptide chain of n amino acids contains 2n backbone (φ, ψ) dihedral angles. Our representation of a protein conformation employs only these angles, which constitute the variable space. Side chains are sacrificed, as any structure prediction protocol can pack them as part of the ensuing refinement of decoy conformations [52]. The representation here is essentially the idealized geometry model, which fixes bond lengths and angles to idealized (native) values. Forward kinematics allows computing Cartesian coordinates of the backbone atoms (on which the energy function described below operates) from the φ, ψ angles in the representation [53].

Energy Function.
The energy function is a modification of the associative memory Hamiltonian with water (AMW) [54]. This function has been used previously by us and others in the context of ab-initio structure prediction [40,41,[55][56][57]. AMW sums nonlocal terms (local interactions are kept at ideal values in the idealized geometry model): The E Lennard-Jones term is implemented after the 12-6 Lennard-Jones potential in AMBER9 [58] but allows a soft penetration of van der Waals spheres. The E H-Bond term allows modeling hydrogen bonds and is implemented as in [59]. The other terms, E contact , E burial , and E water , allow formation of nonlocal contacts, a hydrophobic core, and water-mediated interactions, and are implemented as in [39].
The listed energy terms of E AMW sum over pairwise interactions. For instance, the 12-6 functional form of the Lennard-Jones term is −4 i j is a constant characteristic of the types of atoms at positions i and j, σ i j is the average diameter of the atoms, and r i j is their distance. This functional form illustrates the quadratic running time of a typical energy function modeling pairwise interactions. More importantly, the terms summed together in an energy function are competing; minima of one term are obtained by suboptima of the other. This competition is known as frustration and refers to the fact that slight changes in atomic positions may lower the value of one term but increase that of another term. The result of summing competing terms in an energy function is a complex multimodal function, whose optimization is nontrivial. More details on the functional form of the other terms of the AMW energy function can be found in [33,54].

Implementation of Structural Perturbation.
The realization of the BH framework we describe here hops between two conformations representing two consecutive minima C i and C i+1 through an intermediate C perturb,i conformation. The perturbation modifies C i to obtain a higher-energy conformation C perturb,i to escape the current minimum. Essentially, 6 backbone dihedral angles of a fragment of the polypeptide chain associated with three consecutive amino acids in the current conformation C i are modified simultaneously. This process is referred to as the molecular fragment replacement technique, because it allows replacing the current configuration (in terms of angles) of a selected fragment with another fragment configuration [60].
Molecular Fragment Replacement. The fragment replacement technique has allowed ab-initio structure prediction methods to make great advancements [28,[34][35][36][37]. Its key advantage is that it allows obtaining physically realistic modifications if the fragment configurations are sampled from a library of actual native structures obtained in the wet laboratory. The basic idea is that a subset of nonredundant 6 Advances in Artificial Intelligence protein structures are obtained from the Protein Data Bank [61], and configurations of all fragments that can be defined for k consecutive amino acids are excised from these structures and stored in a library. We direct the reader to [28,41] for a detailed description of how the library is constructed. In this work, we employ a fragment of length 3 rather than a longer fragment, so that the magnitude of the jump resulting from the fragment replacement in variable space is limited.
The perturbation component is implemented as follows. Given a conformation C i , a fragment of length 3 is selected at random over the polypeptide chain (n − 2 fragments can be defined with overlap over a chain of n amino acids). Once the fragment is selected, a configuration for that fragment is then sampled at random over those available for the fragment from the fragment configuration library. The replacement of the angles of the fragment in C i with those of the configuration obtained from library results in C perturb,i .
Since low-energy conformations tend to be compact and leave little room for movement without raising energy (a concept known as frustration in protein biophysics), this implementation of the perturbation component is sufficient to obtain a high-energy conformation through which to escape the current local minimum. Additionally, C perturb,i will share nearly all of its local structural features with C i , but the new conformation will have a higher energy and a different overall global structure. We note that the first conformation to initiate BH is obtained after n − 2 fragment configuration replacements over an extended conformation.

Implementation of Local Optimization.
Our implementation of the local optimization conducts a series of modifications starting from C perturb,i to reach a new minimum C i+1 . While numerical techniques can be used here, they tend to be inefficient [45]. We employ instead a greedy search, which essentially attempts a maximum of m consecutive fragment replacements (as described above for the perturbation component) until k consecutive attempts fail to lower energy. The resulting C i+1 conformation is added to the trajectory according to the Metropolis criterion based on the energetic difference with C i .
Our implementation of the local optimization is probabilistic due to the fragment replacement technique. Moreover, the true bottom of a current basin may not be found. A working definition of a local minimum is employed instead in terms of the parameter k. Finding true local minima in the energy surface can be computationally intensive while unnecessary. For instance, analysis of the AMW surface in related work in [22] shows that the native structure is near but not at a minimum. In addition, the results in Section 3 make the case that a working definition of a local minimum is sufficient to discover near-native conformations.

BH for Sampling Decoy Configurations for Rigid Protein-Protein Docking.
In this application, the native structures of two protein polypeptide chains (referred to as monomers) are known atomic coordinates obtained for each of the chains from experiment or structure prediction protocols.
The objective is to find the native quaternary structure that brings the two monomers together. The assumption here is that the monomers do not change structure upon docking but bind rigidly with each other. Under this assumption, the objective is to find the spatial arrangement that brings one monomeric structure over the other and results in a dimeric configuration of lowest energy.

Employed Representation.
In rigid docking, the only variables of interest are those that allow representing a spatial arrangement of one monomeric structure over another. A natural way to do so is through rigid-body transformations, which can be represented as vectors of 6 variables (3 for translation and 3 for rotation in 3-dimensional space). Hence, the variable space here is the 6-dimensional SE(3) space consisting of rigid-body motions or transformations.
The variable space we consider here is not the entire SE(3) but is constrained to rigid-body motions that align geometrically complementary and evolutionary conserved regions of the molecular spaces associated with each of the monomers. This builds upon earlier work by us on rigid docking which makes use of geometric hashing [51,62]. Geometric hashing is a popular technique that essentially discretizes the space of rigid-body transformations by defining these transformations as alignments of geometrically complementary regions on monomeric molecular surfaces [63][64][65][66]. In recent work [51,62] we show that the number of regions relevant for alignment can be further reduced by focusing on regions with high evolutionary conservation. Such regions are often found to be on contact interfaces [67].
While details of the process through which rigid-body transformations are defined are available in previous work [51,62], we provide here a brief summary. The Connolly representation is first obtained for each monomeric surface [68]. The representation stores geometrical information for points on the surface, including whether the point represents a convex, saddle, or concave region. The representation is made less dense by only storing key locations on the molecular surface, known as critical points [69]. Triangles can be defined over these points. Associating evolutionary information with a critical point (through an analysis of related biological sequences [67]) allows focusing on triangles with high sequence conservation. We refer to these as active triangles. Once two geometrically complementary (e.g., concave with convex) active triangles T A and T B are obtained (from the molecular surfaces of monomers A and B, resp.), a rigid body transformation is easily defined as the one that aligns the local coordinate frame associated with T B over that associated with T A .

Energy Function.
Each rigid-body motion can be represented as a transformation, a vector of 3 translation and 3 rotation components (details below) that when applied to the moving monomer (one monomer is designated as moving and the other as reference or base) move that monomer in space and bring it over the reference monomer. Atomic coordinates are then obtained for the resulting dimeric configuration, which can now be evaluated in terms Advances in Artificial Intelligence 7 of the interaction energy. The energy function we employ combines three nonlocal terms useful for contact interfaces: E = E VdW + E electrostatic + E hydrogen-bonding . The first term implements the standard 12-6 Lennard-Jones potential as in the CHARMM force field [70]. The electrostatic term implements Coulomb's law, also as in the CHARMM force field [70]. The hydrogen-bonding term is calculated as in [71] through the 12-10 hydrogen potential: E hydrogen-bonding = 5 × [(r 0 /d i j ) 12 − 6 × (r 0 /d i j ) 10 ], where d i j is the distance between acceptor and donor atoms i and j, and r 0 = 2.9Å is the optimal distance for hydrogen bonding. Energy is computed only for the contact interface, which is defined over pairs of atoms in one monomer in contact with the atoms in the other monomer. Two atoms are in contact if their Euclidean distance is not higher than 4.0Å.

Implementation of Structural
Perturbation. The exposition above describes that a rigid-body motion is obtained by aligning an active triangle T B on the surface of monomer B with a geometrically complementary active triangle T A on the surface of the base monomer A. Let the current minimum C i be the configuration corresponding to the transformation aligning T B with T A . In other words, the contact interface in C i is that obtained by aligning T B with T A . As described in Section 1, an effective structural perturbation needs to preserve the adjacency relationship. For this reason, an effective perturbation in this context needs to modify the contact interface in C i but limit the magnitude of the perturbation. The implementation we pursue here seeks a new pair of triangles, T A and T B , to perturb C i and obtain C perturb,i . In order to limit the magnitude of the perturbation and preserve some of the contact interface of C i in C perturb,i , T A needs to be close to T A , and T B needs to be close to T B . This is implemented as follows. The molecular surface of each monomer is precomputed and represented in terms of a finite list of active triangles. The center of mass of each triangle is computed, and reverse indexing is used in order to sample a triangle T A and a triangle T B whose center of mass is within dÅ of the center of mass of triangles T A and T B , respectively. The process repeats until a pair T A and T B are found which are geometrically complementary. A new rigidbody transformation aligning T B with T A is then defined, resulting in the perturbed configuration C perturb,i . Sampling in a d-radius neighborhood allows controlling and limiting the extent to which C perturb,i perturbs the structural features of C i (in this context, the contact interface).

Implementation of Local Optimization.
As in the realization of the BH framework for protein structure prediction, the local optimization here also attempts at most m structural modifications starting with C perturb,i . The optimization terminates early if k consecutive modifications fail to lower energy. A naive implementation of the local optimization could employ the same structural modifications as the perturbation component; that is, new pairs of geometrically complementary active triangles are sought, but using a smaller d value. Our recent work on docking shows, however, that it becomes difficult to find geometrically complementary active triangles with smaller values of d [72]. A more effective alternative is to sample new rigid-body transformations directly rather than through new pairs of geometrically complementary active triangles and to do so in a continuous small neighborhood of an initial transformation.
Let the vector t, u, θ be a rigid-body transformation, where t refers to the translation component, and u, θ is an axis-angle representation of the orientation component (implemented through quaternions). In each move in the local optimization, a new random transformation is sampled in a small neighborhood of the transformation representing the configuration resulting from the previous modification. A new translation component t is sampled in a δ t neighborhood of t. A new axis u is sampled by rotating around u by a sampled angle value δ φ ; a new angle θ for the rotation component is obtained by sampling in a δ θ neighborhood around θ. The result is that each move is a small modification of the contact interface to project a configuration onto its nearest local minimum. We note that, as before in the context of structure prediction, a working definition is employed here for the local minimum.
The result is a trajectory of low-energy dimeric configurations that are useful as decoys for the purpose of docking protocols. As in protein structure prediction, protein-protein docking protocols rely on first obtaining low-energy decoy configurations. Structural and energetic analysis then allows selecting a subset for further refinement in order to make a prediction on the native quaternary structure.

BH for Mapping Minima Connecting Diverse Stable
States of a Protein Molecule. Many proteins employ motions to access different structures that allow them to tune their biological function [73,74]. An important problem is to understand how a protein transitions between different functionally relevant states [50,75,76]. The problem of obtaining transition trajectories is directly related to that of obtaining the connectivity of the space around stable states. Computing transition trajectories is challenging [77], as such trajectories can connect structural states far away in the variable space. By taking into account a system's dynamics, the typical MD framework is in principle desirable to provide information on the time scales associated with conformational changes in a transition trajectory. However, its practical application is limited. Long simulation times may be needed to observe a transition trajectory to go over energy barriers.
In this proof-of-concept application, we propose that the BH framework can be a valuable tool as a first step towards elucidating transition trajectories. BH can be employed to map the minima connecting two given structural states and thus elucidate energetically credible conformational paths. Treating conformations in the path as important milestones, MD-based techniques can then be employed to locally deform a conformational path into an actual transition trajectory that incorporates dynamics [78].
In this application, the representation, energy function, and the implementations of the perturbation and local search components of BH are as in Section 2.1. Here we pursue a proof-of-principle demonstration as follows. Let us suppose we are given two stable structural states of a protein, A and B. One of them can be regarded as the initial conformation to initiate a BH trajectory of local minima, and the other as the goal. A given number, let us say h, of BH trajectories can be initiated from the initial structure. The trajectories are allowed to grow for a fixed number of energy evaluations. In the unbiased scenario, the trajectories do not employ information about the location of the goal conformation in variable space. The results in Section 3 show that with sufficient sampling, if the initial and goal conformations are low-energy (i.e., stable), even unbiased BH trajectories are successful at approaching the goal conformation.
In a second scenario, the trajectories can be biased. Let us define an -radius ball around the goal conformation. As long as a BH trajectory stays outside this volume of the variable space (i.e., no minima are or closer to the goal), the BH exploration proceeds unbiased. When the trajectory enters the designated goal region of the variable space, say through its current minimum C i , the exploration is biased towards obtaining minima that stay within the goal region. Given C i , multiple perturbations followed by local optimization are attempted until a C i+1 is found which remains in the goal region. While the number of attempts is limited to a maximum of l consecutive failures before the BH exploration returns to its unbiased setting, in practice it is possible to remain in a goal region for a sufficiently large . The exploration terminates when the goal conformation is approached within some determined tolerance.
The value of is related to that of l. Moreover, a meaningful value for depends on the distance metric used and its effectiveness on a particular system. For instance, on small proteins, lRMSD can be used to determine the radius of the goal region. On other systems, instead, other measurements allow circumventing some of the issues with lRMSD. For instance, the TM-score [79] and GDT TS [80] allow better capturing structural similarities than lRMSD when motions are localized to specific regions. The two are also less sensitive to noise. Familiarity with the system to be modeled allows better determining which measurement should be used and what values will be effective for and l. As the goal in this paper is to show a proof-ofconcept demonstration that BH can be useful to obtain information on minima connecting diverse stable states in the equilibrium conformational space of a protein, we do not devote time to fine tuning parameters. The results in Section 3 show that values exist for these parameters that allow BH to come in closer proximity to the goal structural state in the biased over the unbiased implementation. Further tuning of the parameters is expected to improve the results and provide interesting directions for researchers to explore the viability of BH in this application context.

Results
Experimental Setup. The stopping criterion in each experimental setting to evaluate the performance of BH is set to a fixed number of energy evaluations. This number is 10 7 energy evaluations for the application of BH on structure prediction and 10 6 on our last application of connecting between different stable states. Results for the proteinprotein docking do not change after 10, 000 conformations, so this number is employed as a stopping criterion. Additionally, m and k are set to 100 and 20, respectively. In the application on docking, different values are tried for δt, δφ, δθ, and the ones employed for the experiments presented below are 1.5Å, 10 • , and 30 • , respectively. On the last application of BH, h is set to 10 trajectories, is set to a TMscore of 0.4, l = 100, and the exploration terminates earlier than 10 6 energy evaluations when the current minimum is within a TM-score of 0.9 of the goal conformation.
We present three main sets of results according to the three different BH applications analyzed here. Where possible, results are compared to those reported by other stateof-the-art structure prediction or protein-protein docking methods (Tables 1 and 2 in the results presented in Sections 3.1 and 3.2, resp.). In addition, analysis of the BH-obtained minima is conducted, and distributions of the distances between consecutive minima are shown. This allows evaluating whether the implementations for the perturbation and local optimization in each application setting preserve the adjacency relationship between consecutively obtained minima. The comparison with state-of-the-art methods and the adjacency analysis employ the least Root-Mean-Squared Deviation (lRMSD) semimetric. Analysis of results obtained on the third application of BH on connecting stable states of a protein molecule employs additional measurements, such as GDT TS and TM-score (Tables 3 and 4 in the results presented in Section 3.3). The minima sampled by BH in the context of this third application are also visualized on a lowdimensional projection of the variable space (the projection coordinates are detailed below) that reveals where the BH sampling focuses.
The main measurement used in the analysis below is lRMSD. Briefly, lRMSD measures the weighted Euclidean distance between corresponding atoms after optimal superposition of the two conformations under comparison (or configurations, if consisting of more than one polypeptide chain). The optimal superposition refers to the rigid-body motion or transformation in SE(3) minimizing this weighted Euclidean distance [81]. lRMSD captures structural dissimilarity, but it is not a Euclidean metric, as it does not obey the triangle inequality. Low values indicate high similarity, and high values indicate high dissimilarity, but interpretation of intermediate values is difficult. Interpretation has been the subject of many studies [82]. For instance, lRMSD has been found to depend on system size. A 5Å lRMSD between a computed conformation and the native structure of a short protein chain of no more than 30 amino acids is considered a large deviation, but the same dissimilarity is less significant for a medium-size protein of 100 amino acids or more. Working interpretations abound. In general, for mediumsize proteins, if the lowest lRMSD obtained over computed conformations to the known native structure is more than 6-7Å, the native structure is not considered to have been captured in silico. Table 1: Comparison of the lowest lRMSDs obtained by BH to those obtained by other methods on the protein dimers studied here. MBH refers to monotonic BH. lRMSDs reported by BH, MBH, and the work in [56] in columns 5-7 are over backbone atoms, whereas those reported by the work in [36,84] in columns 8-9 are over alpha carbons of the backbone chain.  However, high values of lRMSD cannot be automatically interpreted to indicate significant structural dissimilarity. Since lRMSD weighs each atom equally, it cannot capture global topology changes and overly penalizes cases where the differences are localized to a specific region of the molecule due to, say, a large-scale motion. For instance, the lRMSD of two conformations can be high even if structural deviations are limited to a loop that has a different orientation in the two conformations under comparison [83]. In such cases, other measurements, such as GDT TS and TM-score, can be more appropriate. While different in implementation details, these two scores essentially locate a maximum subset of atoms between two conformations under comparison which are close in space after optimal superposition and minimizes an overall lRMSD-based error. While GDT TS is reported in %, TM-score is unitless. Both capture similarity, so higher values are better. While lRMSD and GDT TS depend on system size, TM-scores are found to be more reliable [83], which is why we employ them here in the analysis in Section 3.3 on the third BH application on connecting diverse stable states.

Analysis of BH-Obtained Decoy Conformations of a Protein Polypeptide Chain.
Our realization of the BH framework for the purpose of ab-initio structure prediction is applied to a comprehensive list of 15 target protein systems. These systems, listed in Table 1, range from 61-123 amino acids in length and cover the α, β, and α/β folds. Many of them are selected due to the availability of data reported on them by structure prediction protocols. On these systems, computing 10 7 energy function evaluations takes 1-4 days of CPU time on a 2.4 Ghz Core i7 processor, depending on chain length. Table 1 shows comparisons to state-of-the-art methods in ab-initio structure prediction in terms of lRMSD. Over all minima obtained from each amino acid sequence by BH, the conformation with the lowest lRMSD to the known native structure of that sequence (experimentally obtained structure with PDB ID shown in this table) is recorded, and that value is reported in column 5 in Table 1. To take into account stochasticity, we report in Table 1 the average lowest lRMSD obtained over 3 independent runs. This value is compared to lowest lRMSDS reported by methods that are popular in ab-initio structure prediction [36,84]. We also compare to data obtained with our previous work on ab-initio structure prediction with a robotics-inspired tree-based exploration  method [40,41]. Monotonic BH (MBH) is also included in the comparisons (column 6). The results in Table 1 make the case that BH performs just as well as state-of-the-art methods in structure prediction in terms of its ability to obtain low-lRMSD conformations in an ab-initio setting. The role of the energy function may partially explain some differences among the methods, as they employ different energy functions (MBH and [41] also employ AMW). It is interesting to note that, while our realization of BH (which uses a Metropolis criterion to add the next minimum to its trajectory) obtains lower lowest lRMSDs on more proteins than MBH, the performance of MBH is comparable to the other methods in many cases. MBH can be regarded as a special case of the BH framework with the Metropolis criterion, where temperature T = 0. The T value we use here for our realization of BH allows a 2.6 kcal/mol energy increase between two consecutive local minima with probability 0.1.

Evaluation of Adjacency Relationship.
Adjacency between local minima obtained consecutively by BH is often stated as important for global optimization. Here we show concretely, in the context of ab-initio structure prediction, how this adjacency correlates with the lowest lRMSD reported by BH to the known native structure of each of the protein systems studied. The lRMSD between two consecutive local minima is computed, and the average is recorded for a given protein system. This value is plotted against the lowest lRMSD from the native structure obtained by BH on each protein system in Figure 2. A strong correlation of 94% is observed in Figure 2 between the average consecutive local minima distance and the lowest lRMSD to the native structure. This result suggests that adjacency of consecutively sampled local minima is related to the ability of BH to locate the native structure. Figure 2 shows that, in cases where the average consecutive local minima distance is large, BH does not come close to the native structure.
Further detail is provided in Figure 3 on two protein systems. These systems are selected to represent two diametrical cases that correspond to the bottom left and top right portions in Figure 2. The lowest lRMSD structures obtained for each of these two systems by BH are superimposed over their respective native structures in Figures 3(a)-3(b). The entire distribution of consecutive local minima distances is shown for these two proteins in Figures 3(c)-3(d). Figures  3(c)-3(d) further show that, in cases where the majority of consecutive minima are not adjacent in variable space, the overall performance of BH in terms of lowest lRMSD to the native structure suffers. One reason for the poor adjacency is that the fragment replacement may perturb too much of a conformation. In a recent analysis [85], we show that this can be controlled by biasing the sampling of fragment replacements towards those that will result in small structural changes in terms of lRMSD between C i and C perturb,i .
Comparison of BH with Multistart Sampling. Adjacency of consecutive local minima in BH is often stated as a key distinguishing characteristic over a multistart method, where initial points for local optimization are essentially sampled at random over the variable space. Here we show the effect of the adjacency relationship in a concrete setting in terms of the energetic quality of the sampled minima.
On the same two protein systems where the above analysis highlights consecutive local minima distances, we show in Figures 3(e)-3(f) the distribution of energies. Figures 3(e)-3(f) superimposes the distribution obtained by BH over that obtained by the multistart method. The results show that BH obtains lower-energy minima than the multistart method. In the context of ab-initio structure prediction, the quality of decoy conformations obtained by BH is superior over that obtained by a multistart method.
Sampling Redundancy in BH. It is interesting to determine how often our realization of the BH framework here comes close to the native structure. We show this visually through a projection of the variable space in a few dimensions. The projection coordinates we choose are based on the ultrafast shape recognition (USR) features [86], which we have employed in previous work to guide a tree-based exploration of the variable space with measurements taken over a low-dimensional projection [40,56]. These coordinates give a coarse representation of the molecular shape. They are first momenta of distance distributions of atoms in a molecule from selected points on the molecule. The selected (reference) points are the centroid (ctd), point closest to centroid (cst), point farthest from centroid (cfd), point farthest from cfd, and so on. More reference points can be defined this way, but the ones we employ for the visual representation here use only ctd and cfd. It is worth noting that, while coarse, the USR-based projection is fast to compute for each conformation, unlike PCA-or ISOMAPbased decompositions [87][88][89], which are time consuming and hard to use in an online setting and contain several other shortcomings noted for conformational space [90]. Figure 4 shows the projection of BH-sampled minima over the two USR-based coordinates measured using the ctd and cfd reference points. The projection is discretized so that cells can be defined in a 2d grid for the purpose of measuring how often BH samples similar minima in terms of their coarse 2d USR-based representation. The 2d grid in Figure 4 is color coded with a blue-to-red color scheme that corresponds to cells with low-to-high number of minima projected to them. The cell that contains the projection of the native structure is marked with an ×.
The projection of the sampled minima in this USRbased 2-dimensional space allows visualizing highly sampled regions by BH. The representation is coarse (e.g., cell widths used here for visualization can be made smaller), as conformations that map to the same cell may be severalÅ apart, but the projection is useful to draw two conclusions. First, compared to the vast variable space (sea of blue in Figure 4), BH sampling seems to focus in regions near the native structure. These regions represent the equilibrium conformational space. Second, sampling can be redundant; some regions are more populated than others. Future research can address redundancy in order to enhance the capability of BH to sample the equilibrium conformational space of a protein molecular in terms of local minima.

Analysis of BH-Obtained Decoy Configurations of Protein-Protein Dimers.
Our realization of the BH framework for the purpose of protein-protein docking is applied to a comprehensive list of 15 different dimers. These vary in size, represent diverse functional classes, and have been tested by other protein-protein docking methods, and some are even CAPRI targets. Testing is carried out on a 2.66 GHz Opteron processor with 8 GB of memory. Depending on system size, obtaining 10, 000 conformations takes 6-12 CPU hours. Table 2 shows the lowest lRMSD from the known native structure (with PDB ID shown in column 1) obtained by BH in column 3. Lowest lRMSDs reported on these systems by other methods are shown in columns 4-5. System size in terms of number of atoms in each of the chains is shown in column 2. Table 2 shows that BH achieves low lRMSDs to the native structure on each system. Moreover, these are comparable to the lRMSDs reported by other related methods. In particular, the method presented in [66] employs geometric hashing, whereas that in [91] uses long optimizations with a carefully designed energy function that employs information on evolutionary conservation to sample low-energy conformations. In addition to a comparable performance with these methods, BH samples many configurations within 5Å lRMSD of the native structure (data not shown). These configurations, if selected and further refined in the course of a multistage docking protocol, will allow obtaining the native structure in great detail.

Evaluation of Adjacency Relationship.
We investigate here the adjacency between consecutively sampled local minima. Figure 5(a) plots the mean consecutive local minima distance in terms of lRMSD for each protein against the lowest lRMSD obtained to the native structure. A positive correlation of 73% is observed. The mean consecutive local minima distance is less than 15Å for about half of the systems. While this may seem like a large number compared to the related results on ab-initio structure prediction, the range is larger due to the size of the dimeric systems (lRMSD depends on size). The strong correlation suggests that adjacency of consecutively sampled minima directly relates with the ability of BH to locate a global minimum. Lower lowest lRMSDs (<5Å) are obtained here compared to the abinitio structure prediction setting. This is not surprising, as the variable space here is 6-dimensional, whereas the space The projection for the protein with native PDB ID 2ezk is shown in (a), and that for the protein with native PDB ID 1hhp is shown in (b). The grids are color coded with a blue-to-red color scheme to show cells with low-to-high number of minima projected to them. The cell that contains the projection of the native structure is marked with an ×. The range of values of each of the coordinates is estimated as in [40]. Maximum values are based on an extended chain, and minimum values are based on the Flory compact self-excluding model of a chain of n amino acids. To improve visualization, the ranges are limited here, and the grids are clipped to allow focusing to regions with some minimal population.
in the ab-initio structure prediction application contains hundreds of dimensions. Further detail is provided in Figures 5(d)-5(e), which shows the distribution of lRMSDs between consecutively sampled local minima on two protein systems. These systems are selected to represent two diametrical cases that correspond to the bottom left and top right portions in Figure 5(a). The actual lowest-lRMSD structures obtained on these systems are shown in Figures 5(b)-5(c), superimposed over the corresponding native structures of these proteins. The distributions in Figures 5(d)-5(e) show that more pairs of consecutive minima with low lRMSDs are obtained for the protein where BH also obtains a lower lowest lRMSD to the native structure.

Analysis of BH Trajectories in Connecting Diverse Stable
States of a Protein. The unbiased setting of BH is tested here in detail on two proteins, calmodulin and adenylate kinase. Some encouraging results are shown for the biased setting as well, but a detailed investigation of the biased implementation and parameter tuning is beyond the purpose of this work.

Mapping Minima between Stable States in Calmodulin.
Calmodulin is a 144 amino-acid long EF-hand protein that binds calcium and regulates more than 100 proteins, including kinases, phosphodiesterases, calcium pumps, and motility proteins [46][47][48]. The protein resembles a dumbbell, with the terminal domains linked by a flexible α-helix and the termini in a transorientation from each other on either side of the central linker. The partial unfolding of the central linker around position 77 gives calmodulin flexibility.
Calmodulin has been captured in three different functionally relevant structural states in the wet laboratory [92][93][94]. These states are documented in the PDB as X-ray structures under PDB IDs 1cfd (apo state), 1cll (calciumbinding state), and 2f3y (collapsed peptide-binding state). The central helix is fully formed in the calcium-binding state, unfolds in the middle in the apo state, and bends in the collapsed state. Transitions between the apo and collapsed states have been observed both in experiment and simulation [49,50].
In order to test the unbiased setting of BH in this application, the following experiment is conducted. Each of the structures is obtained from the PDB and employed as an initial conformation. h = 10 Bh trajectories are launched independently from any of them. The proximity to any of the other two structures is reported in Table 3. Entry at row i and column j reports the best proximity over the 10 trajectories initiated from structure i to goal structure j.
Proximity to the goal is measured in three ways. Table 3 shows lowest lRMSD, highest TM-score [79], and highest GDT TS [80]. The results in Table 3 show that BH is able to capture the structure with PDB ID 1cfd (apo) when initiated from 1cll (semicollapsed state) and vice versa (TMscores above 0.5 are found to capture significant structural similarity [83]). BH also captures the structure with PDB ID 1cll (semicollapsed state) when initiated from 2f3y (collapsed state) and vice versa. A very low lRMSD and very high TMscore and GDT TS score are obtained from 2f3y to 1cll. This is an encouraging result, since 1cll captures a partially closed state, whereas 2f3y captures calmodulin in its close state. Structurally, 1cfd, the apo state, is further away from the semicollapsed and collapsed states. Indeed, all three measurements in Table 3 indicate that BH has not captured state 1cfd from 2f3y and vice versa. Table 3 also shows values along the main diagonal, which record the closest that a BH trajectory comes to its initial conformation. While this is achieved often through the very first minimum, the structure with PDB ID 1cfd is the only exception. This indicates that this structure is not at a local minimum, and BH quickly steers away from this initial conformation. This may also explain the difficulty of capturing that state when initiated from any of the other two. In addition, analysis of the biased setting for calmodulin reveals that when an value of 0.4 in terms of TM-score is used, BH is able to come closer to the goal structures. Improvements are around 0.5Å (data not shown).

Mapping Minima between Stable States in Adenylate
Kinase. Adenylate kinase is a 214 amino-acid long phosphotransferase enzyme that maintains energy balance in cells by catalyzing the reversible reaction Mg 2+ · ATP + AMP Mg 2+ · ADP + ADP [95]. The protein consists of a CORE domain and two substrate-binding (AMP-and ATP-) domains. The binding domains move and bind substrates independently, resulting in different functional states.
Adenylate kinase has been found in four different structural states in the wet laboratory [96][97][98][99] 1dvr and 2ak3). Transitions between the apo and collapsed states have been observed both in experiment and simulation [76,100,101].
Unbiased BH trajectories are initiated from each of the four structures, and the best proximity to any of the other three is reported in Table 4 in terms of TM-score and GDT TS. lRMSD is not employed, as the chains deposited under the PDB IDs listed above are of different lengths (due to differences in the setup of the structure resolution protocol in the wet laboratory). The results in Table 4 show that adenylate kinase is indeed a challenging system. The BH trajectories manage to come close only to the structure with PDB ID 2aky when initiated from the structure with PDB ID 1dvr and vice versa. This is an encouraging result, nonetheless, because the structures with PDB ID 1dvr and 2aky are structurally closer to each other.
Outstanding Challenges. Calmodulin and adenylate kinase are considered challenging systems for computational investigation due to their large size [33]. The results above support the fact that size limits sampling capability in BH. The upper bound of 10 6 energy evaluations limits BH to sample around 1, 500 minima for calmodulin and 1, 000 minima for adenylate kinase. Considering the small number of minima sampled, the results above are encouraging. They suggest that, with improvements to enhance the sampling capability, BH is a promising tool for mapping the equilibrium conformational space of a protein and elucidating the connectivity between different stable states.

Conclusion
We have shown that BH is a general, versatile framework that allows structural characterization of important biological macromolecules, such as proteins. We have selected three different applications of importance in computational structural biology on which to show the power and promise of the BH framework. Domain-specific expertise is used to implement effective perturbation and local optimization components. Important generally recognized characteristics of the BH framework, such as adjacency of local minima and its relation to the quality of the reported global minimum, are demonstrated in the applications selected in this paper.
Taken together, the results show that BH is an effective framework for structural characterization of protein systems. It is more effective than the multistart method, and the adjacency of consecutively sampled local minima is directly related with the ability of BH to come close to the global minimum. The presented results make the case that BH can be an effective tool for generating good-quality decoys for ab-initio structure prediction and protein-protein docking and a promising framework for mapping the connectivity of functionally relevant states in flexible proteins.
We note that the implementations we offer here for the key components in BH are a first step, and further tuning can result in better performance. Further analysis into different implementations is necessary to obtain a better understanding of the BH framework and its capability to enhance sampling of molecular spaces. We are currently pursuing such a comparative analysis. For instance, in recent work [85] we show that the implementation of the perturbation component employed here is sufficient to escape a current minimum and that the greedy search employed for the local optimization is just as effective but more efficient than Metropolis MC searches at low temperature [22,102].
The application on proteins with diverse stable states serves as a proof of concept that BH can be employed to map the intermediate minima that connect stable states. The results presented here show that the framework is promising and merits further investigation in this context. The trajectory of minima obtained by BH in connecting two stable states can be considered a coarse conformational path. This path can be transformed into an actual trajectory that takes the protein through specific molecular motions from one state to another. The process is not dissimilar from how paths in robotics motion sampling are converted to actual execution trajectories with dynamics constraints [103]. The coarse transition paths can be refined through, for instance, short steered MD simulations connecting adjacent minima. Other path deformation techniques are available, and this is a direction we will explore in future research.
We believe the exposition of BH in this paper will bring more attention to this framework as a powerful global 16 Advances in Artificial Intelligence optimization tool for biological systems. Its versatility, as we show here in the context of three different yet related applications on proteins, merits further investigation. In particular, different implementations for the main components in BH can be investigated to balance between accuracy and efficiency. Moreover, related ideas from the evolutionary computing community on population-based strategies can be employed to promote diversity of minima, as proposed in recent work on geometrical problems [20]. Related ideas from our robotics-inspired search of molecular conformational spaces [40] can be exploited to organize the BH-sampled minima, steer the exploration away from overpopulated regions in the variable space, and so enhance the sampling capability in BH.