Modeling Beta-Traces for Beta-Barrels from Cryo-EM Density Maps

Cryo-electron microscopy (cryo-EM) has produced density maps of various resolutions. Although α-helices can be detected from density maps at 5–8 Å resolutions, β-strands are challenging to detect at these resolutions because neighboring β-strands are closely spaced. The variety of shapes of β-sheets adds to the complexity of β-strand detection from density maps. We propose a new approach, StrandRoller, to model traces of β-strands for β-barrel density regions that are extracted from cryo-EM density maps. In a test containing eight β-barrels extracted from experimental cryo-EM density maps at 5.5 Å–8.25 Å resolution, StrandRoller detected about 74.26% of the amino acids in the β-strands with an overall 2.05 Å 2-way distance between the detected β-traces and the observed ones, if the best of the fifteen detection cases is considered.


Introduction
Cryo-electron microscopy (cryo-EM) has become a major experimental technique for studying the structures of large protein complexes [1,2]. Many large complexes have recently been resolved to about 3 Å resolution [3,4], at which the position of the protein backbone can be distinguished. For cryo-EM maps at lower resolutions, such as 5-8 Å (referred to as medium resolution in this paper), detailed molecular features are not resolved, and deriving atomic structures from such density maps is a challenging problem. Two types of approaches have been proposed. Fitting relies on a suitable atomic structure [5][6][7][8][9], and de novo modeling relies on the match of secondary structures between those in the density map and those in the protein sequence [10][11][12][13][14][15][16][17]. The most characteristic patterns in cryo-EM density maps at medium resolutions come from the secondary structures of a protein chain. An α-helix often appears as a cylinder or a stick and can be identified using image processing methods [14,18-22]. A β-sheet consists of multiple β-strands (Figure 1). A β-sheet often appears as a thin layer of density and can be detected computationally [14,20,23-25]. The spacing between two neighboring β-strands is between 4.5 and 5 Å, and therefore β-strands are not resolved at medium resolutions [26,27].
The position of β-strands provides important constraints in backbone modeling of a protein. We previously proposed an approach, StrandTwister, to predict the location of β-strands [28]. StrandTwister is built on the principle of the right-handed twist of β-strands, which was discovered as early as the 1970s [29]. The right-handed twist was measured along the peptide orientation as about 0° to 30° per residue [30], and an estimation of the strand orientation was proposed using the corners of a β-sheet [31]. StrandTwister was able to detect β-strands from single β-sheets. However, β-sheets have a variety of shapes, which adds complexity to β-strand detection. Some β-sheets appear as rolls and propellers, and others are β-barrels (Figure 2). Currently there is no computational method to detect β-strands from a β-barrel density map. In this paper, we propose a method, StrandRoller, that predicts β-strands by utilizing prior knowledge about β-barrels. Unlike StrandTwister, which relies on the twist of β-strands, StrandRoller relies on the characteristic tilt angles and interstrand distance of β-barrels.
A β-barrel is a large β-sheet in which the first β-strand is hydrogen-bonded with the last β-strand. β-barrels are commonly found in porins and other proteins that span cell membranes [32]. It was noticed by McLachlan in 1979 that the number of strands and their relative stagger completely determine the overall structure of a β-barrel [33]. The main structural characteristics of an ideal β-barrel have been discussed based on a cylindrical barrel [33][34][35]. Studies have shown that tilt angle and interstrand distance for all β-barrel structures vary within a fairly small range [35][36][37]. Our method, StrandRoller, is designed to utilize such characteristics of tilt angles and interstrand distance.
Figure 1: Secondary structure detection from density maps. Left: A density map (gray) was simulated at 10 Å resolution using EMAN software [38] and the atomic structure of protein 3GP6 (PDB ID) in (a) and 1TIM (PDB ID) in (b). Middle: helices (colored lines) and β-sheets (blue voxels) are detected from the density maps using SSETracer [24]. Right: The detected α-helices and β-sheets are superimposed with the atomic structure in (a) and (b), respectively. Each atomic structure contains a β-barrel. Figure 1(a) is reproduced with permission from IEEE [39].
Figure 2: Panel labels: Trefoil, Solenoid, Propeller, Sandwich, Prism, Roll.
A helix identified from a medium-resolution cryo-EM map is often represented as a line (colored line in Figure 1), referred to as an α-trace, that corresponds to the central axis of the helix. We define a β-trace as the central line along a β-strand. In particular, the observed β-trace is the line interpolating the geometrical centers of every three consecutive Cα atoms on a β-strand plus the two Cα atoms at the ends of the β-strand (black line in Figure 3). An observed β-trace represents the line along the atomic structure of a β-strand (Figure 3). Given the β-barrel density voxels, the problem of β-strand detection is to find the orientation (Figure 3(a)) and location (Figure 3(b)) of β-traces in the three-dimensional cryo-EM map.
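The definition of the observed β-trace can be illustrated with a minimal sketch. It assumes the Cα coordinates of one strand are already available as an ordered array; the function name `observed_trace_points` and the input layout are hypothetical and are not taken from the StrandRoller implementation. The sketch only returns the points to be interpolated; the interpolation itself is left out.

```python
import numpy as np

def observed_trace_points(ca_coords):
    """Return the points that an observed beta-trace interpolates.

    ca_coords: (n, 3) array of C-alpha coordinates ordered along one
    strand, with n >= 3 (hypothetical input layout).
    """
    ca = np.asarray(ca_coords, dtype=float)
    if ca.ndim != 2 or ca.shape[1] != 3 or len(ca) < 3:
        raise ValueError("need at least three C-alpha coordinates, shape (n, 3)")
    # Geometric center of every window of three consecutive C-alpha atoms.
    centers = (ca[:-2] + ca[1:-1] + ca[2:]) / 3.0
    # The two terminal C-alpha atoms are added at the ends of the trace.
    return np.vstack([ca[0], centers, ca[-1]])
```

Averaging over three consecutive Cα atoms smooths the zigzag of the backbone, so the resulting points lie close to the central line of the strand.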
Preliminary results of StrandRoller were shown in [39]. This paper presents the details of the method and more thorough tests on β-barrels of different sizes. In addition, we tested our method on eight pieces of experimental cryo-EM data that were downloaded from EM DataBank. The results suggest that StrandRoller can be used to predict β-traces from medium-resolution cryo-EM density maps of β-barrels when the rough barrel region is segmented.
or catenoid surfaces [43]. Although atomic structures of β-barrels are different from β-barrel density maps, many small β-barrels visually appear as cylinders with nonuniform ends. Although various models can be used to approximate the major area of a β-barrel, β-barrel density maps often deviate from the mathematical models in certain regions. To create a surface that represents the density map well, we initially used a simple elliptical model and adjusted the model in the regions that do not fit. Strand generation was then performed on the adjusted barrel surface model (Figure 4). We assume that the β-barrel density has been segmented from the entire cryo-EM density map.

β-Barrel Surface Model from Cryo-EM Density Voxels.
To represent the shape of a β-barrel, we reduced the β-barrel density map to a surface. Two steps are involved in creating the surface model. The first step identifies the axis of the barrel density map (Figure 5(a)). The barrel density map was first translated so that its geometric center lies at the global origin (0, 0, 0). An elliptical cylinder (1) was then used to search for the orientation of the central axis. The orientation was selected using exhaustive search and least-squares fitting to the cylinder. The entire barrel density was then rotated such that the central axis aligns with the z-axis (Figure 5(b)). The second step builds the barrel model. Instead of a mathematical formula, our β-barrel model consists of a thin layer of density voxels that closely represents the morphed barrel shape and the outline of the barrel at its two ends (yellow points in Figure 5(e)). The barrel model was generated using cross-sections from the bottom to the top of the volume. The density voxels on each cross-section along the z-axis (Figure 5(c), gray) lie near the ideal ellipse. The voxels that are closest to the ideal ellipse were selected for the barrel model. Note that such a discrete model closely represents the three-dimensional distribution of the voxels. For example, when the fitted ideal elliptical cylinder is outside the density (arrows in Figure 5(c)), the voxels on the density map were used to adjust the barrel model. It appears that the resulting barrel model clearly represents the morphed regions, especially at the two ends of the barrel (arrows in Figure 5(e)). We find that an accurate barrel surface is important for modeling the β-traces accurately.
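The cross-section step can be sketched as follows, under several assumptions that are not stated in the text: the voxels are already centered and rotated so the barrel axis is the z-axis, the ideal model is the standard elliptical cylinder x²/a² + y²/b² = 1 with already-fitted semi-axes `a` and `b`, "closest to the ideal ellipse" is approximated by the radial deviation along each polar direction, and one voxel is kept per angular bin. The names `ellipse_radius`, `surface_voxels`, and `n_bins` are hypothetical.

```python
import numpy as np

def ellipse_radius(theta, a, b):
    """Radius of the ellipse x^2/a^2 + y^2/b^2 = 1 along polar angle theta."""
    return a * b / np.sqrt((b * np.cos(theta)) ** 2 + (a * np.sin(theta)) ** 2)

def surface_voxels(voxels, a, b, n_bins=36):
    """Keep, per z cross-section and angular bin, the voxel nearest the ellipse.

    voxels: (n, 3) coordinates, centered at the origin with the barrel
    axis along z (assumed to be done beforehand).
    """
    v = np.asarray(voxels, dtype=float)
    theta = np.arctan2(v[:, 1], v[:, 0])
    # Radial deviation from the ideal ellipse (a proxy for true distance).
    deviation = np.abs(np.hypot(v[:, 0], v[:, 1]) - ellipse_radius(theta, a, b))
    z_slice = np.round(v[:, 2]).astype(int)
    bins = np.floor((theta + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    best = {}
    for idx, key in enumerate(zip(z_slice.tolist(), bins.tolist())):
        if key not in best or deviation[idx] < deviation[best[key]]:
            best[key] = idx
    return v[sorted(best.values())]
```

Because the kept points are actual density voxels rather than points on the fitted ellipse, regions where the density departs from the ideal cylinder remain represented in the discrete model.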

Strand Generation on the Barrel Model.
McLachlan noticed in 1979 that the number of strands and their relative stagger completely determine the overall structure of a β-barrel [33]. The main structural characteristics of an ideal β-barrel have been discussed based on a cylindrical barrel [33][34][35]. Studies have shown that the tilt angle (Figure 5(f)) of a β-strand can vary between 30° and 60°, as reflected in the known structures of membrane proteins [35][36][37]. The tilt angle may vary by ±15° for different strands in the same β-barrel [37]. However, the interstrand distance remains 4.5-5 Å because of the hydrogen bonds between two neighboring β-strands. StrandRoller uses this prior knowledge about the tilt angle and the interstrand distance in the modeling of β-strands.
An initial β-trace (blue in Figure 5(f)) was produced by tilting the barrel axis by the tilt angle and projecting it onto the barrel surface model. The second β-trace was then generated from the previous one by traveling a horizontal distance h (2) on the barrel surface (Figure 5(f)).
Given a tilt angle, the entire set of β-traces can be built iteratively on the barrel surface (Figure 5(g)) until the last β-trace is generated (Figure 5(f)). The tilt angle was sampled every 5° between 35° and 55°, and three translations were sampled at each tilt angle. Five tilt angles and three translations give fifteen sets of β-traces for one barrel volume. Note that the fifteen sets are within a very small range of tilt angle (20°) and translation distance (4.8 Å). The barrel density, along with the detected β-strands, was finally translated and rotated back to its original position in the map after the detection was done (Figure 5(h)).
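The iterative construction and the fifteen sampled sets can be illustrated with a simplified sketch on an ideal circular cylinder rather than the adjusted voxel surface. Several choices are assumptions for illustration only: the horizontal step is taken as h = d / cos(tilt), which follows from the geometry of parallel lines with perpendicular spacing d on the unrolled cylinder; d is set to 4.8 Å; the three translations are taken as evenly spaced offsets within d; and the strand count is rounded to close the barrel. All function and parameter names are hypothetical.

```python
import math

def horizontal_step(tilt_deg, strand_spacing=4.8):
    """Shift around the barrel between neighboring traces (assumed d / cos(tilt))."""
    return strand_spacing / math.cos(math.radians(tilt_deg))

def generate_traces(radius, height, tilt_deg, offset=0.0,
                    strand_spacing=4.8, n_points=10):
    """Build one set of tilted traces on an ideal cylinder of given radius."""
    h = horizontal_step(tilt_deg, strand_spacing)
    n_strands = max(1, round(2 * math.pi * radius / h))
    slope = math.tan(math.radians(tilt_deg))
    traces = []
    for k in range(n_strands):
        start_arc = offset + k * h  # each trace is shifted from the previous one
        trace = []
        for s in range(n_points):
            z = -height / 2 + height * s / (n_points - 1)
            # Arc length travelled grows linearly with z for a tilted line.
            phi = (start_arc + z * slope) / radius
            trace.append((radius * math.cos(phi), radius * math.sin(phi), z))
        traces.append(trace)
    return traces

def sample_trace_sets(radius, height, strand_spacing=4.8):
    """Five tilt angles (35..55 degrees) times three offsets gives fifteen sets."""
    sets = {}
    for tilt in range(35, 60, 5):
        for j in range(3):
            offset = j * strand_spacing / 3  # hypothetical translation sampling
            sets[(tilt, j)] = generate_traces(radius, height, tilt,
                                              offset, strand_spacing)
    return sets
```

For example, `sample_trace_sets(10.0, 20.0)` returns a dictionary keyed by (tilt angle, translation index). On a real barrel the traces would be projected onto the discrete surface model instead of an ideal cylinder.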

Result
StrandRoller was tested using three sets of β-barrel density maps: eighteen small simulated maps, fourteen large simulated maps, and eight experimental cryo-EM β-barrel maps. The proteins used in the simulated test set were collected from the β-barrel transmembrane superfamily of the Orientations of Proteins in Membranes (OPM) database [44] with less than 40% sequence similarity. The atomic structures of β-barrels were used to generate β-barrel density maps at 10 Å resolution using the pdb2mrc function in EMAN [38], with a sampling of 1 Å/pixel. The experimental cryo-EM density maps were downloaded from EMDB (http://www.emdatabank.org/). Since atomic structures are available for such cryo-EM maps, the density region that corresponds to one chain of the protein was first segmented using the atomic structure as an envelope. The β-sheet voxels were then manually outlined based on the atomic structure of the β-barrel. Such segmented testing maps bear the characteristics of a β-sheet and have the outline of a β-barrel. The accuracy of β-strand detection was evaluated using two parameters as previously implemented [28]: the 2-way distance between the set of detected β-traces and the set of observed β-traces, and the number of amino acids covered by the detected β-traces. The observed β-trace is the line interpolating the geometrical centers of every three consecutive Cα atoms on a β-strand plus the two Cα atoms at the ends of the β-strand, as shown in Figure 3.
To estimate how much of a β-strand was detected, the percentage of the detected Cα atoms of an observed β-strand was calculated. An amino acid of a β-strand is considered detected if the projection distance from its Cα atom to the corresponding detected β-trace is less than 2.5 Å, which is about half of the β-strand spacing. Since the number of detected β-strands may be different from the number of observed β-strands, a one-to-one correspondence needs to be established between subsets of the β-traces. For example, if the detected set contains five β-traces while the observed set contains six β-traces, the five of the six observed β-traces that have the overall smallest 2-way distance to the five detected β-traces are selected for the calculation of the 2-way distance. This ensures that the same number of detected β-traces ($D_1, D_2, \ldots, D_k$) is compared to the same number of observed β-traces ($O_1, O_2, \ldots, O_k$), in which $D_i$ is compared with $O_i$ for $i = 1, \ldots, k$. The number of missed (and/or wrongly detected) β-strands can be inferred from the difference between the total number of the observed and that of the detected β-traces. The 2-way distance of a β-strand was calculated for each pair of lines $D_i$ and $O_i$. The overall 2-way distance reflects how far the two sets of β-traces (detected and observed) are from each other.
In formula (3), $m$ and $n$ are the numbers of points on the detected β-trace ($D_i$) and the observed β-trace ($O_i$), respectively, and $p$ and $q$ are the indices of a point along lines $D_i$ and $O_i$, respectively. The point-to-line term is the projection distance from point $p$ of $D_i$ to $O_i$. The projection of point $p$ is required to fall within line $O_i$. If it falls outside, the distance between point $p$ and an end of $O_i$ was used as an approximate distance.
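The two evaluation measures can be sketched as follows. This is an illustration under stated assumptions rather than the evaluation code used in the paper: each trace is treated as a polyline of sampled points, the two directions of the 2-way distance are weighted equally, and a projection that falls outside a line is handled by clamping to the nearest end point. The function names are hypothetical.

```python
import numpy as np

def point_to_trace(point, trace):
    """Distance from a point to a polyline, with projections clamped to the ends."""
    p = np.asarray(point, dtype=float)
    t = np.asarray(trace, dtype=float)
    a, b = t[:-1], t[1:]
    ab = b - a
    denom = np.einsum("ij,ij->i", ab, ab)
    denom[denom == 0] = 1.0  # guard against repeated points in the trace
    # Fractional position of the projection on each segment, kept within [0, 1].
    u = np.clip(np.einsum("ij,ij->i", p - a, ab) / denom, 0.0, 1.0)
    nearest = a + u[:, None] * ab
    return float(np.min(np.linalg.norm(p - nearest, axis=1)))

def one_way_distance(source, target):
    """Mean distance from the points of one trace to the other trace."""
    pts = np.asarray(source, dtype=float)
    return float(np.mean([point_to_trace(p, target) for p in pts]))

def two_way_distance(detected, observed):
    """Average of both one-way distances (equal weighting is an assumption)."""
    return 0.5 * (one_way_distance(detected, observed)
                  + one_way_distance(observed, detected))

def detected_fraction(ca_coords, detected, cutoff=2.5):
    """Fraction of C-alpha atoms within the cutoff (in angstroms) of a detected trace."""
    dists = [point_to_trace(p, detected) for p in np.asarray(ca_coords, dtype=float)]
    return sum(d < cutoff for d in dists) / len(dists)
```

For two parallel straight traces that are 2 Å apart, `two_way_distance` returns 2.0. Averaging both directions penalizes a detected trace that is much shorter or much longer than the observed one, which a one-way distance alone would not capture.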

Performance on the Simulated Density Data.
The purpose of this test is to investigate whether traces of β-strands can be modeled from β-barrel density maps simulated at 10 Å resolution, at which the separation of β-strands is not visible. To discuss the ability of our β-trace detection, we use the best of the fifteen sampled sets. The best set is the one that is closest to the observed set in terms of 2-way distance.

Small-Medium Barrels.
Small-medium β-barrels refer to those with fewer than 15 β-strands each. The test of eighteen simulated small-medium β-barrel density maps shows that one of the fifteen sets of β-traces aligns very well with the observed set of β-traces, with an overall 2-way distance of 1.61 Å for the detected β-traces (Table 1). In the case of sheet A13 of PDB structure 1G7K, the detected set of β-traces appears to align with the β-strands very well (Figure 6(a)). In this case, all eleven strands were detected with a small 2-way distance of 1.8 Å (Table 1 row 1).
To analyze the sensitivity of the detection, we calculated the percentage of the detected Cα atoms of an observed β-strand. For example, 1TX2 B has all eight β-strands detected (Table 1 row 5). It missed three amino acids. For the eight detected β-strands, the 2-way distance is only 0.92 Å. Among the eighteen test cases, StrandRoller appears to be able to detect 78.26% of the β-strands fairly accurately in one of the fifteen sampled sets of β-traces (Table 1). Seventeen test cases have the same number of β-strands detected as observed. The number of detected amino acids and the 2-way distance are two parameters that have been used previously in accuracy measurement. A length-association method was proposed recently and is potentially a more sensitive way to evaluate secondary structure detection [45].
Table 1 notes: The number of β-traces in the best of the fifteen possible sets/the number of β-strands in the β-sheet of the PDB structure. c The 2-way distance (in Å) between observed β-traces and modeled β-traces for the best of the fifteen possible sets. d The number of detected/the total number of amino acids in the β-barrel.

Large Barrels.
Large barrels in this paper refer to those with more than 15 β-strands. Large barrels appear to be more challenging. Nevertheless, some extremely large β-barrels, such as the 22-stranded β-barrels 2GUF D23 and 2HDI D23 (Table 2), were still well detected. The 2-way distance is 2.03 Å in the case of 2GUF D23 (Table 2 row 1) and 1.93 Å in the case of 2HDI D23 (Table 2 row 2). β-Barrel 2GUF D23 has twenty-one of twenty-two β-strands detected (Table 2 row 1). It missed sixty-six amino acids, most of which are at the edge (arrows in Figure 6(b)). For the twenty-one detected β-strands, the 2-way distance is 2.03 Å. Among the fourteen test cases of large β-barrels, StrandRoller appears to be able to detect 69.46% of the β-strands fairly accurately in one of the fifteen possible sets of β-traces, with an overall 2-way distance of 2.12 Å for the detected β-traces (Table 2). StrandRoller thus performs better on small-medium barrels than on large barrels.
Table 2 notes: The number of β-traces in the best of the fifteen possible sets/the number of β-strands in the β-sheet of the PDB structure. c The 2-way distance (in Å) between the observed β-traces and the modeled β-traces for the best of the fifteen possible sets. d The number of detected/the total number of amino acids in the β-barrel.
Large β-barrels are more likely to adopt flexible shapes. The missed detections appear mostly at the edge of large β-barrels, where the β-strands tend to be more flexible (arrows in Figure 6(b)). The number of β-strands in large β-barrels also tends to be hard to detect because of the error accumulated during the strand generation step. Since each β-trace is derived from the previously generated one, error can propagate while traveling around the barrel.

Performance on Experimental Cryo-EM Data.
StrandRoller was tested using eight β-barrels obtained from experimental cryo-EM density maps. The eight test cases are small ribosomal proteins in which the first β-strand is hydrogen-bonded with the last β-strand. Experimental data are often more challenging to analyze because of noise and missing density. Figure 7 shows three density regions that were segmented from cryo-EM maps at 5.8 Å, 5.5 Å, and 6.7 Å resolutions, respectively. At these resolutions, β-strands are not visible in density maps. StrandRoller was able to detect all β-strands on the barrels, and they align fairly well with the observed β-traces. In the case of 70S ribosome EMD 1657 (sheet AH4 in protein 4V5H), the 2-way distance for the five-stranded barrel is 1.94 Å, and StrandRoller detected 25 of 30 amino acids on the β-barrel (Figure 7(a)). In the case of 80S ribosome EMD 1780 (sheet AH4 in protein 4V7E), the 2-way distance for the six-stranded barrel is 1.88 Å, and StrandRoller detected 24 of 28 amino acids on the β-barrel (Figure 7(b)). We noticed that the eight β-barrels in the cryo-EM maps are all small barrels with fewer than nine β-strands. StrandRoller appears to be fairly accurate in detecting β-strands from such cryo-EM maps, with an overall 2.05 Å 2-way distance and 74.26% of amino acids detected (Table 3).

Discussion
We previously showed that the accuracy of β-strand detection is affected by the accuracy of β-sheet detection [28]. This is also true in the context of β-barrels. A β-barrel is a closed structure, and the number of β-strands may be estimated   Figure 7(c) is more conservative than that in Figure 7(d).
The relaxed segmentation of the same barrel includes more density volume at the edge of the barrel. Although the 2-way distance is 1.73 Å in the more conservative segmentation versus 1.83 Å in the other, the main difference in the resulting β-traces appears to be their length. Both segmentations yielded the same number of β-strands with similar orientation and position. Our result suggests that the number of detected β-strands is not sensitive to density segmentation errors at the two ends of the β-barrel.

Conclusion
The position of β-strands is critical for modeling atomic structures of proteins. However, it has been a challenging problem to detect β-strands when no separation of the β-strands is visible in the density maps. The variety of shapes of β-sheets adds to the complexity of this problem. We previously proposed StrandTwister to detect β-strands from a single β-sheet using the right-handed twist [28]. Here we propose a new method to predict β-strands from a β-barrel density map directly using the characteristic tilt angles of the β-barrel. This approach bypasses the need to measure twist angles. Our results show that this approach is feasible. As long as the rough density region of a β-barrel is isolated from the entire density map, the location of β-strands can be modeled. However, the current limiting factor is the lack of methods for automatically detecting β-barrels in a cryo-EM density map. A β-barrel has a fundamental shape characteristic in which a hole is surrounded by a β-sheet. However, accurate detection of β-barrels needs to consider the different characteristics of the hole for β-barrels of different sizes. We are hopeful that such a detection tool will be available in the near future.
StrandRoller does not require the resolution of a cryo-EM density map to be higher than 5 Å to resolve the separation of β-strands. It applies to maps with lower resolutions. In the test containing eight experimental cryo-EM β-barrel maps between 5.5 Å and 8.25 Å, StrandRoller detected about 74.26% of the amino acids in the β-strands in one of the fifteen sets of predicted traces. We demonstrate again that it is possible to derive β-strands from density maps at medium resolutions. To our knowledge, StrandRoller is the first method that attempts to address the problem of β-strand detection from medium-resolution β-barrel maps. Future work includes developing more accurate methods for identifying β-traces and generating alternative β-traces for further evaluation in modeling.