Mixing Energy Models in Genetic Algorithms for On-Lattice Protein Structure Prediction

Protein structure prediction (PSP) is computationally a very challenging problem. The challenge largely comes from the fact that the energy function that needs to be minimised in order to obtain the native structure of a given protein is not clearly known. A high resolution 20 × 20 energy model could better capture the behaviour of the actual energy function than a low resolution energy model such as hydrophobic polar. However, the fine grained details of the high resolution interaction energy matrix are often not very informative for guiding the search. In contrast, a low resolution energy model could effectively bias the search towards certain promising directions. In this paper, we develop a genetic algorithm that mainly uses a high resolution energy model for protein structure evaluation but uses a low resolution HP energy model in focussing the search towards exploring structures that have hydrophobic cores. We experimentally show that this mixing of energy models leads to significant lower energy structures compared to the state-of-the-art results.


Introduction
Proteins are essentially sequences of amino acids. They adopt specific folded three-dimensional structures to perform specific tasks. However, misfolded proteins cause many critical diseases such as Alzheimer's disease, Parkinson's disease, and cancer [1,2]. Protein structures are important in drug design and biotechnology.
Protein structure prediction (PSP) is computationally a very hard problem [3]. Given a protein's amino acid sequence, the problem is to find a three-dimensional structure of the protein such that the total interaction energy amongst the amino acids in the sequence is minimised. The protein folding process that leads to such structures involves very complex molecular dynamics [4] and unknown energy factors. To deal with the complexity of PSP in a hierarchical way, researchers have used discretised lattice-based structures and simplified energy models [5][6][7].
There are a large number of existing search algorithms that attempt to solve the PSP problem by exploring feasible structures called conformations. For the low resolution hydrophobic-polar (HP) energy model, a memory based local search algorithm [8,9], a population-based genetic algorithm [10], and a hydrophobic core directed local search method [11] reportedly produced the state-of-the-art results on the face-centred-cubic (FCC) lattice. For the high resolution Berrera 20 × 20 energy matrix (henceforth referred to as BM energy model) [12][13][14] produces the state-of-the-art results. Nevertheless, the challenges in PSP largely remain in the fact that the energy function that needs to be minimised in order to obtain the native structure of a given protein is not clearly known. A high resolution 20 × 20 energy model (such as BM) could better capture the behaviour of the actual energy function than a low resolution energy model (such as HP). However, the fine grained details of the high resolution interaction energy matrix are often not very informative for guiding the search. Pairwise contributions that have large magnitudes could be overshadowed by the accumulation of pair-wise contributions having small magnitudes or opposite signs. In contrast, a low resolution energy model could effectively bias the search towards certain promising directions particularly emphasising the pair-wise contributions with large magnitudes.
In this paper, we present a genetic algorithm that mainly uses a high resolution energy model for protein structure evaluation but uses a low resolution HP energy model in focussing the search towards exploring structures that have hydrophobic cores. Protein structures have hydrophobic cores that hide the hydrophobic amino acids from water and expose the polar amino acids to the surface to be in contact with the surrounding water molecules [15]. We apply a macromutation operator that considers the HP energy model and attempts to build hydrophobic cores. We experimentally show that our way of mixing these two energy functions leads to significant lower energy structures compared to the stateof-the-art results.
The rest of the paper is organised as follows: Section 2 reviews background knowledge; Section 3 discusses related work on PSP; Section 4 describes our methods in detail; Section 5 presents the experimental results and analyses; and finally, Section 6 draws the conclusions and outlines the future research.

Background
There are three computational approaches for protein structure prediction. They are homology modeling, protein threading, and ab initio approach. The prediction quality of homology modeling depends on the sequential similarity with proteins that have previously known structures. On the other hand, protein threading, also known as fold recognition, is based on the structural similarity with the previously known fold families. Our work is based on the ab initio approach that depends only on the amino acid sequence of the target protein. Levinthal's paradox [16] and Anfinsen's hypothesis [17] are the basis of ab initio method for protein structure prediction. The idea was originated in 1970 when it was demonstrated that all information needed to fold a protein, resides in its amino acid sequence; that is, given the solvent properties, amino acids in the sequence determine the unique fold of a protein.
To explore an astronomically large search space and to evaluate the conformations using a real energy model are a big challenge for existing search algorithms in PSP. Therefore, simplified models have significant importance in understanding the protein folding process.

Simplified
Model. In our simplified model, we use 3D FCC lattice points to map the amino acids of a protein sequence. In the mapping, each amino acid of the sequence occupies a point on the lattice to form a continuous chain of self-avoiding walk. We use the BM and HP models together within a population-based genetic algorithm (GA) for protein structure prediction. The FCC lattice, the HP and BM energy models, and the GA are described below.
2.1.1. 3D FCC Lattice. The FCC lattice has the highest packing density compared to the other existing lattices [18]. In FCC, each lattice point (e.g., the origin in Figure 1) has 12 neighbours with 12 basis vectors as follows: In simplified PSP, conformations are mapped on the lattice by a sequence of basis vectors or by the relative vectors that are relative to the previous basis vectors in the sequence.

HP Energy Model.
Twenty different amino acids are the primary constituents of proteins. Based on the hydrophobic property, these 20 amino acids are broadly divided into two categories: (a) hydrophobic amino acids (Gly, Ala, Pro, Val, Leu, Ile, Met, Phe, Tyr, and Trp) denoted by H and (b) hydrophilic or polar amino acids (Ser, Thr, Cys, Asn, Gln, Lys, His, Arg, Asp, and Glu) denoted by P. In the HP model [19], when two nonconsecutive hydrophobic amino acids become topologically neighbours, they contribute a certain amount of negative energy, which for simplicity is considered as −1.
The total energy HP (as shown in (2)) of a conformation based on the HP model becomes the sum of the contributions over all pairs of non-consecutive hydrophobic amino acids: where = 1 if amino acids at positions and in the sequence are non-consecutive neighbours on the lattice, otherwise 0, and = −1 if th and th amino acids are both hydrophobic, otherwise 0.  [21] in 1985 statistically deduced a 20 × 20 energy matrix that considers residue contact propensities between the amino acids. By calculating empirical contact energies on the basis of the information available from a huge number of selected protein structures and following the quasi-chemical approximation, Berrera et al. [20] in 2003 deduced another 20 × 20 energy matrix. In this work, we use the latter model and denote it by BM energy model. Table 1 shows the BM energy model with amino acid names at the leftmost column and the bottommost row and the interaction energy values in the cells. The amino acid names that have boldface are hydrophobic. We draw lines in Table 1 to show groupings based on H-H, H-P, and P-P interactions. In the context of this work, it is worth noting that most energy contributions that have large magnitudes are from H-H interactions followed by those from H-P interactions.
The total energy BM (shown in (3)) of a conformation based on the BM energy model is the sum of the contributions over all pairs of non-consecutive amino acids that are one unit lattice distance apart: where = 1 if amino acids at positions and in the sequence are non-consecutive neighbours on the lattice, otherwise 0, and is the empirical energy value between the th and th amino acid pair specified in the matrix for the BM model.

Genetic Algorithms.
GAs are a family of populationbased search algorithms for optimisation problems. GAs maintain a set of solutions known as population. In each generation, it generates a new population from the current population using a given set of genetic operators known as crossover and mutation. It then replaces the inferior solutions by superior newly generated solutions to get a better current population. The generic pseudocode of GA is presented in Algorithm 1. A typical crossover operator randomly splits two solutions at a randomly selected crossover point and exchanges parts between them (Figure 2(a)) and a typical mutation operator alters a solution at a random point (Figure 2(b)). In the case of PSP, conformations are regarded as solutions of a GA population.

Crossover Operators.
The crossover operators are applied on two selected parent conformations to exchange their parts to generate child conformations. In a single-point crossover, both parents are split at a single point (Figure 2(a)) while in a multipoint crossover they are split at more than one point. Nevertheless, the crossover operations succeed if they produce conformations that satisfy the self-avoiding walk constraint. In lattice-based protein representation, a self-avoiding walk constraint ensures no revisitation of any lattice point during the sequence mapping. (Figure 2(b)) are applied on a single conformation. The operators can perform single-point change or multi-point changes. The mutation operations succeed if the resultant conformation remains a self-avoiding walk on the lattice.

Related Work
We explored the literature on protein structure prediction based on both HP and 20 × 20 energy models.
Cebrián et al. [32] used tabu-based local search, and Shatabda et al. [8] used memory-based local search with tabu heuristic and achieved the state-of-the-art results. However, Dotu et al. [34] used constraint programming and found promising results. A constraint programming based exact and complete algorithm for structure prediction is implemented in the CPSP tools by Mann et al. [33]. These algorithms can find the optimal solution if the target protein has a matching hydrophobic-core stored in the CPSP database.
Besides local search, Unger and Moult [24] applied genetic algorithms to PSP and found their method to be more promising than the Monte Carlo based methods [22]. The GA has been used by Hoque et al. [35] for cubic and 3D HCP lattices. They also introduced a twin-removal operator [36] to remove duplicates from the GA population.
However, using 3D FCC lattice points, the recent stateof-the-art results for HP energy models have been achieved by genetic algorithms [10,37], local search approaches [8,11], a local search embedded GA [38], and a multipoint parallel local search approach [39].

Empirical 20 × 20 Matrix Energy Based Approaches.
A constraint programming technique was used in [40] by Dal Palù et al. to predict tertiary structures of real proteins using secondary structure information. They also used constraint programming with different heuristics in [41] and a constraint solver named COLA [42] that is highly optimised for protein structure prediction. In another work [43], a fragment assembly method was utilised with empirical energy potentials to optimise protein structures. Among other successful approaches, a population based local search [44] and a population based genetic algorithm [14] were used with empirical energy functions.
In a hybrid approach, Ullah and Steinhöfel [12] applied a constraint programming based large neighbourhood search technique on top of the of COLA solver. The hybrid approach produced the state-of-the-art results for several small sized (less than 75 amino acids) benchmark proteins. In another   work, Ullah et al. [45] proposed a two-stage optimisation approach combining constraint programming and local search. The first stage of the approach produced compact optimal structures by using the CPSP tools based on the HP model. In the second stage, those compact structures were used as the input of a simulated annealing based local search that is guided by the BM energy model.
In a recent work [13], Shatabda et al. presented a mixed heuristic local search algorithm for PSP and produced the state-of-the-art results using BM energy model on 3D FCC lattice. The mixed heuristic local search in each iteration randomly selects a heuristic from a given number of heuristics designed by the authors. The selected heuristics are then used in evaluating the generated neighbouring solutions of the current solution. Although the heuristics themselves are weaker than the BM energy, their collective use in the random mixing fashion produces results better than that produced by using the BM energy itself.
In this paper, we use the hydrophobic-polar categorisation of the HP model within a hydrophobic-core directed macromutation operator; however, most of the time the search is guided by the BM energy model. The macromutation is applied just like another mutation operator and utilises the distance of the hydrophobic amino acids from the hydrophobic core centre. The hydrophobic core centre is the centroid of only the hydrophobic amino acids. In effect the macro-mutation operator helps explore the structures that would have more interactions between H-H amino acid pairs; these pairs have energy contributions with large magnitudes. In the Results and Analyses section, we compare our experimental results with the results produced by [12,13].

Methods and Implementations
In our genetic algorithm based ab initio protein structure prediction method, we use the BM energy model along with the HP energy model. We use face-centred-cubic lattice points for protein structure representation. In this section, we describe the implementation details of our GA framework.

Genetic Algorithm Framework.
The pseudocode of our genetic algorithm framework is presented in Algorithm 2. It uses an exhaustive generation approach to diversify the search, a hydrophobic core directed macro-mutation operator to intensify the search, and a random-walk algorithm to recover from stagnation. The algorithm initialises (Line 6) the current population. At each generation, it selects a genetic operator based on a given probability distribution to use through the generation (Line 8). In fact, we select the operators randomly giving equal opportunities to all operators. The selected operator is used in an exhaustive way (Lines 10-11 or Lines 13-15) to obtain all conformations in the new population. We ensure that no duplicate conformation is added to the new population. The add( ) method either at Line 11 or at Line 15 also takes care of avoiding the duplicates while adding solutions to the new population list. For a given number of generations, if the best conformation in the new population is not better than the best in the current population, our algorithm then resorts to a random-walk procedure (Line 17) to diversify the new population. Nevertheless, after each generation, the new population becomes the current population (Line 18), and the search continues. Finally, the best conformation found so far is returned (Line 19).
Along with the BM energy mode, we use HP energy model to guide a macro-mutation. The macro-mutation operator is used as other mutation operators (Figures 4(b)-4(e)) in our GA. The exhaustive generation and macromutation are described below.
(1) mutantsadd(conf) operators, our GA adds one resultant conformation for each conformation in the current population to the new population. Operators are applied to all possible points (Algorithm 3) exhaustively until finding a better solution than the parent. If no better solution is found, the parent survives through the next generation. On the other hand, for crossover operators, two resultant conformations are added to the new population from two randomly selected parent conformations. Crossover operators generate child conformations by applying the crossover operator in all possible points (Algorithm 4) on two randomly selected parents. The best two conformations from the parents and the children are then become the resultant conformations for the next generation.

Macromutation
Operator. Protein structures have hydrophobic cores that hide the hydrophobic amino acids from water and expose the polar amino acids to the surface to be in contact with the surrounding water molecules [15]. H-core formation is an important objective of HP based PSP. Macromutation operator is a composite operator (Figure 3) that uses a series of diagonal moves (Figure 4(c)) on a given conformation to build the H-core around the hydrophobic core center (HCC). The HCC is calculated by finding arithmetic means of , , and coordinates of all hydrophobic amino acids. The diagonal moves are applied repeatedly either at each P-or at each H-type amino acid position. Whether to apply the diagonal move on P-or H-type amino acids is determined by using a Bernoulli distribution (Algorithm 5 Line 2) with probability (intuitively we use = 20% for P-type amino acids). For a P-type amino acid, the first successful diagonal move is considered. However, for a Htype amino acid, the first successful diagonal move that does not increase the Cartesian distance of the amino acid from the HCC is taken. All the amino acids are traversed and the successful moves are applied as one composite move. Nevertheless, the macro-mutation squeezes the conformation and quickly forms the H-core by repeating the procedure. In our GA, macro-mutation is used like other mutation operators. Algorithm 5 presents the pseudocode of the macro-mutation operator.

Stagnation Recovery.
In genetic algorithms, the similarity among individuals within the population increase as generations after generations pass on. This characteristic pushes the search towards stagnation. Any premature convergence also leads the search towards stagnation. To deal with a stagnation situation, we remove identical individuals and apply randomwalk as described below.

Removing Duplicates.
In our approach, unlike [36], we remove duplicates from each generation to maintain the diversity of the population. During exhaustive generation, we check the existence of the newly generated child in

Applying Random Walk.
Premature H-cores are observed at search stagnation. To handle stagnation situation, in our GA (Algorithm 2 Line 18), a random-walk [46] algorithm is applied. The pseudocode of the algorithm is shown in Algorithm 6. This algorithm uses pull moves [47] (as shown in Figure 4(d)) to break the H-core. We use pull moves because they are complete, local, and reversible. Successful pull moves never generate infeasible conformations. During pulling, energy level and structural diversification are observed with a view to maintaining a balance between these two. We allow the energy level to change within 5% to 10% with changes in the structure from 10% to 75% of the original. We try to accept the conformation that is close to the current conformation, in terms of the energy level but as far as possible in structural diversity, and which is determined by the function checkDiversity (Algorithm 6 Line 5). In our genetic algorithm, random-walk is very effective [46] in recovering from stagnation.

Further Implementation Details.
Our GA requires a representation of the conformations, initialisation of the population, evaluation of the solution in each iteration, and a set of genetic operators on the conformations.

Representation.
We represent each conformation by using the relative encoding and the Cartesian coordinates of each amino acid. In the crossover operator, the relative encoding helps generate the offspring easily while the coordinates help in other operators particularly in finding free lattice points. The relative encoding is also used in removing duplicates within a given generation.

Initialisation.
Our GA starts with an initial population, which is a set of feasible conformations. We generate initial conformations following a self-avoiding walk on FCC lattice points. The pseudocode of the initialisation algorithm is presented in Algorithm 7. It places the first amino acid at (0, 0, 0). It then randomly selects a basis vector to place the successive amino acid at a neighbouring free lattice point. The mapping proceeds until a self-avoiding walk is found for the whole protein sequence within a given number of iterations. If no valid conformation is found within the given number of iterations, a deterministic structure is returned. However, we have not encountered this situation in our experiments.

Evaluation.
After each iteration, the conformation is evaluated by calculating the contact (topological neighbour) potentials where the two amino acids are non-consecutive.
The pseudocode in Algorithm 8 presents the procedure of calculating the interaction energy of a given conformation. The contact potentials are found in BM energy model.

Implementing Primitive Operators.
Along with exhaustiveness, macro-mutation and random-walk, the primitive operators (as shown in Figure 4) are implemented in GA. The operators we implemented are single-point crossover (Figure 4(a)), rotation mutation (Figure 4(b)), diagonal moves (Figure 4(c)), pull moves (Figure 4(d)), and tilt moves (Figure 4(e)). Rotation, diagonal moves, pull moves, and tilt moves are implemented as mutation operators.
(1) Crossover: at a given crossover point (dotted circle in Figure 4(a)), two parent conformations exchange their parts and generate two children. The success rate of the crossover operator decreases with the increase in the compactness of the structure.
(2) Rotation: one part of a given conformation is rotated around a selected point (Figure 4(b)). This move is mostly effective at the beginning of the search.
(4) Pull moves: the amino acids at points A and B are pulled to the free points (Figure 4(d)) and the connected amino acids are pulled as well to get a valid conformation. Pull moves [47] are local, complete and reversible. Pull moves are very effective especially when the conformation is compact.
(5) Tilt moves: Two or more consecutive amino acids connected in a straight line are moved by a tilt move to immediately parallel lattice positions [25]. Tilt moves pull the conformation from both sides until a valid conformation is found. In Figure 4(e), the amino acids at points C and D are moved and subsequently other amino acids from both sides are moved as well.

Results and Analyses
We implemented our GA framework in Java (J2EE

Benchmark.
The protein instances used in our experiments are taken from the literature (as shown in Table 2). The first seven proteins 4RXN, 1ENH, 4PTI, 2IGD, 1YPA, 1R69, and 1CTF are taken from [12] and the next five proteins 3MX7, 3NBM, CMQO, 3MRO, and 3PNX from [13]. Table 3 compares our experimental results with the state-of-the-art results obtained by [12,13] on FCC lattice and BM energy model. We compare our results using two criteria: free energy level obtained by our algorithm and that obtained by other algorithms, and the root mean square deviation (RMSD) of the structures obtained by our algorithm and that obtained by other algorithms.

Interaction Energy Level.
We present interaction energy values in two different formats: the global lowest interaction energy (column Best) and the average (column Avg) of the lowest interaction energies obtained from 50 different runs.
In case of the global best energy, our approach outperforms the state-of-the-art approaches in [12,13] in 9 out of 12 benchmark proteins. However, in case of average energy, our approach outperforms both of the approaches in all 12 benchmark proteins.

Root Mean Square Deviation.
The root mean square deviation (RMSD) is frequently used in measuring the difference between values predicted by a model and the values actually observed. For any given structure, the RMSD is calculated using where and denote the distances between th and th amino acids, respectively, in the predicted structure and the native structure of the protein. To calculate the RMSD, the distance between two neighbour lattice points ( √ 2 for FCC lattice) is considered as 3.8Å, the average distance between two -Carbons. From the experimental results, it is clear that our approach significantly improves the RMSD values in comparison to the state-of-the-art approaches [12,13].

Statistical Significance Test.
For both average interaction energy and average RMSD values, we performed the Mann-Whitney test with 95% confidence level to verify the significance of difference between our results and the state-of-the-art results. The test outcomes supported our improvements in all 12 proteins.

Relative Improvement (RI)
. The difficulty to improve energy level is increased as the predicted energy level approaches to a known lower bound of a given protein. For example, if the lower bound of free energy of a protein is −100, the efforts to improve energy level from −80 to −85 is much less than that to improve energy level from −95 to −100 though the change in energy is the same (−5). The rightmost RI columns in Table 3 show the relative improvements that our algorithm (target) achieved with respect to the state-ofthe-art approaches (reference). For each protein, the relative improvement of the target ( ) with respect to the reference ( ) is calculated using (5), where and denote the average energy values achieved by target and reference, respectively. We use a similar equation to calculate the relative improvement for RMSD: From these results, we see that our GA in all 12 proteins improves the search quality from 1.12% to 5.46% in the average interaction energy and from 2.93% to 36.92% in the RMSD values.

Detailed
Analyses. The BM energy model actually implicitly bears the characteristic of hydrophobicity. The matrix values present some variations within amino acids of the same class (H or P). A partition algorithm such as 2-means clustering algorithm easily reveals the H-P partitioning within the BM model. Given this knowledge, we study the effect of explicitly using hydrophobic property within our GA.

Effect of Macromutation
Operator. Our macro-mutation operator biases the search towards a hydrophobic core by applying a series of diagonal moves and thus achieves improvements in terms of BM energy values of the output conformations. One question to be investigated is whether the improvement comes only from the repeated application of diagonal moves but not from the exploitation of the hydrophobicity knowledge. If hydrophobicity is the reason for improvement, a related question is whether using solely the HP model instead of using BM model anywhere in the search will be useful; the output energy level of course will be in the BM model. In order to answer these questions, we implemented four different versions of our genetic algorithm.
(1) BH: this version is our final algorithm that we described in detail, and used in presenting our main results in Table 3 and in comparing with the stateof-the-art results. To reiterate, this version uses the BM energy model for search and energy reporting and hydrophobicity knowledge in the macromutation operator that repeatedly applies diagonal moves towards forming a hydrophobic core. (2) BD: this version of our GA uses the BM energy model for search and energy reporting. It also uses the BM energy model within the macro-mutation operator that applies diagonal moves repeatedly. This version will show the effect of the macro-mutation without using the hydrophobic property. (3) BM: this version of our GA uses the BM energy model for search and energy reporting. However, this version does not have any macro-mutation operator and thus could be seen as the baseline algorithm for the questions to be investigated. (4) HP: this version of our GA uses the HP energy model for search. However, we report the energy values of the final conformations returned by the GA in BM energy model. Note that this version has the hydrophobic core directed macro-mutation operator. This version will show whether HP model is sufficient even when the energy of a conformation is to be in the BM model. Table 4 presents the experimental results to show the effects of using hydrophobicity knowledge within the macromutation operator. The energy values are obtained by running each algorithm with a 60-minute time cutoff. The average values are calculated over 50 different runs. We also perform the Mann-Whitney test with 95% confidence level to test the significance of differences. From these results, we see that the HP version performs the worst followed by the BD version. The BD version itself performs worse than the BM version which is worse than the BH version. From all these, we conclude that using solely HP model is not sufficient and the application of repeated diagonal moves without using the hydrophobicity is not helpful.
For further analysis of the macro-mutation, in Table 5 we present the total numbers of contact and the numbers of H-H, H-P, and P-P contacts present in the output conformations of the above-mentioned four variants of our GA. The number of contacts are the average over the 50 runs for each protein.
Analysing the number of contacts, we see that the HP version has more H-H contacts, but fewer H-P, P-P, and total contacts than the other three versions. In the BM energy model the H-H contacts contribute the large energy values; however, unlike the H-P model, other types of contact in BM model also contribute energy values. In the BD and BM versions the improvement compared to the HP version comes from the increase in the number of H-P, P-P, and total contacts, although there is a decrease in the H-H contacts. The BD version performs worse than the BM version implying the repeated diagonal moves without using hydrophobicity are rather harmful. The BH version improves over BD and BM by increasing the H-H contacts and decreasing the H-P contacts; the total contacts and the P-P contacts remain almost similar. All these lead us to conclude that the explicit use of the implicit hydrophobicity knowledge within the macro-mutation is effective. The search algorithm could not utilise the HP knowledge implicitly buried within the BM model.
To demonstrate the search progress, we periodically find the best energy values obtained so far in each run. For a given period, we then calculate the average energy values obtained for that period over 50 runs. We used a 2-minute time interval. Figure 5 presents the average energy values obtained at each time interval for two different proteins: 4RXN ( Figure 5(a)) and 3PNX ( Figure 5(b)) are the smallest and the largest amongst the 12 benchmark proteins. From both of the charts, we see that the final version of our algorithm BH clearly outperforms the other three versions.

Effect of Initialisation.
Initialising the population is an important part of population based search algorithms. We initialise the GA population with randomly generated valid structures using the procedure shown in Algorithm 7. As noted before, our GA uses a macro-mutation operator that tries to take the search towards forming a hydrophobic core. This leads us to test whether initialisation with conformations that already have optimal HP core at the center would expedite the search. For this, we use the CPSP tools that produce structures with very compact hydrophobic cores. Table 6 shows the results of these experiments; the reported values are average over 50 runs for each protein. We see that compared to our random initialisation method, initialisation with the structures produced by the CPSP-tools, although it gives a very good energy value at the beginning, it leads to worse performance at the end. This observation remains the same when the hydrophobic core-directed macro-mutation operator is used and when it is not used.

5.7.
Simplified Structure. Figure 6 shows FCC structures of six different proteins at their lowest free energy levels as obtained by our final GA variant. We use Jmol (Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/) to draw the structures.