Pathway complexity of model virus capsid assembly systems

As computational and mathematical studies become increasingly central to studies of complicated reaction systems, it will become ever more important to identify the assumptions our models must make and determine when those assumptions are valid. Here, we examine that question with respect to viral capsid assembly by studying the ‘pathway complexity’ of model capsid assembly systems, which we informally deﬁne as the number of reaction pathways and intermediates one must consider to accurately describe a given system. We use two model types for this study: ordinary differential equation models, which allow us to precisely and deterministically compare the accuracy of capsid models under different degrees of simpliﬁcation, and stochastic discrete event simulations, which allow us to sample use of reaction intermediates across a wide parameter space allowing for an extremely large number of possible reaction pathways. The models provide complementary information in support of a common conclusion that the ability of simple pathway models to adequately explain capsid assembly kinetics varies considerably across the space of biologically meaningful assembly parameters. These studies provide grounds for caution regarding our ability to reliably represent real systems with simple models and to extrapolate results from one set of assembly conditions to another. In addition, the analysis tools developed for this study are likely to have broader use in the analysis and efﬁcient simulation of large reaction systems.


Introduction
The study of viral capsids has attracted considerable attention from the fields of mathematical and computational modelling, in large part because of capsid assembly's value as a general model of the self-assembly of large macromolecular complexes. Capsids are remarkable for the size, complexity and diversity of the structures they build, as well as the high efficiency and fidelity of their assembly processes. They therefore serve as an important test of our ability to understand and predict self-assembly dynamics in both biological and artificial systems. By learning the principles by which viral capsids assemble so effectively, we hope to learn not only about viral biology, but also about how similar principles can apply to the many simpler examples of self-assembly in biology. Likewise, by understanding these principles, we may learn how to apply them to achieve similarly remarkable behaviours in self-assembling nanotechnology [15,29,31]. There is currently no feasible experimental method to directly observe a rapid, nanometer-scale assembly process such as that involved in building a capsid. Capsid assembly has thus been an important model for numerous simulation studies, where they have helped us elucidate general principles that may guide self-assembly [1,4,28,35], map out parameter spaces of possible assembly products and pathways [6,8,10,11,14,21,26], and perform model-based interpretation of indirect experimental measures of assembly progress in real virus systems [3,9,38,39].
One intriguing observation to emerge from many of these computational studies is the fact that a single assembly system can exhibit very different assembly pathways under only moderately different assembly conditions. Several simulation studies of capsid assembly have suggested that changing parameter values (e.g., binding free energies, concentrations or configurational tolerances of binding) so as to promote more rapid assembly can abruptly shift a system from a productive nucleation-limited assembly pathway to an unproductive pathway dominated by kinetically trapped incomplete structures [4,14,17,18,20,21,32,34]. Other studies have shown that similarly small changes in assembly conditions can shift pathways so as to alter the morphology of final assembled structures [1,14,17,18,21]. Experimental support from diverse real viral assembly systems supports these simulation predictions, showing that modest changes in assembly conditions can indeed shift capsid assembly from productive to kinetically trapped assembly [4,13,16,24,36], or induce malformations or other altered morphologies [5,12,19,22,37]. Recent modelling work by Sweeney et al. [26] has further suggested that even a single end state may be reached by very different pathways depending on small changes in local binding parameters. They showed that a simple, two-parameter T ¼ 1 icosahedral model could exhibit three prominent pathways depending on parameter choices: one proceeding by a fifthorder pentamer nucleation event followed by elongation through successive monomer additions (monomer pathway); a second proceeding by a rapid accumulation of dimers, a third-order nucleation through a trimer of dimers and elongation by dimer additions (dimer pathway); and a third proceeding by rapid accumulation of pentamers, nucleation through a trimer of pentamers, and elongation by pentamer additions (pentamer pathway). Modest shifts in rates or concentrations of subunits, within the ranges accessible to in vitro experiments, were sufficient to shift the system from one pathway to another or to a kinetically trapped region in which all three pathways converge.
All of these results point to a common conclusion: assembly pathways need not be an inherent property of a viral assembly system. Rather, the specific pathways used by a capsid system can shift quite dramatically in response to relatively small changes in assembly conditions. This conclusion has important implications for the study of viral assembly and for its use as a model system for complex self-assembly in general. First, it suggests that basic assembly mechanisms determined from in vitro capsid assembly systems could be quite different from those used by the same systems in vivo, a potentially serious problem given that we currently have a very limited ability to monitor capsid assembly in vivo. Accurately determining the assembly pathways of a given system is important not only as a basic research question, but also potentially for attempts to manipulate or interfere with those pathways, as in current attempts to develop capsid-targeted antiviral drugs [23,25,27,36]. Second, the sensitivity of pathway choice has implications for how we can model assembly mathematically and computationally. Capsid assembly is a complicated and computationally expensive process to model and there are thus tremendous advantages to working with highly simplified models of the process, provided such models are accurate. Endres and Zlotnick [6] have shown that just a few pathways are often adequate to describe almost perfectly the behaviour of a simple capsid model with hundreds of pathways in principle accessible to it. Likewise, mathematical models of capsid assembly based on nucleation theory [30] provide a powerful tool for understanding and predicting capsid behaviour from first principles, provided assembly is under conditions allowing for a simple nucleation/elongation pathway. Studies from stochastic models have suggested that even for complicated capsid assembly models, a large fraction of the accessible parameter space will fall into one of a small number of distinct parameter domains each explicable by a simple pathway model [6,11,14,26].
These studies have, however, left open the possibility that pathway complexity might rise significantly outside of, or between, these discrete domains. It may be the case that as we vary parameters to move between distinct simple pathway domains, we experience rapid 'phase transitions', with simple models providing accurate explanations for almost all parameter values. Or it may be the case that the parameter space contains broad border regions between the domains, in which simple pathway models break down and with them some of our most powerful simulation and analysis tools. In the present work, we attempt to distinguish between these two possibilities by examining how pathway complexity varies as we move across the assembly parameter space. We apply two complementary modelling approaches. First, we use the ordinary differential equations (ODE) framework developed by Zlotnick and Endres [6,35] to examine the trade-off between model complexity and accuracy in a fully deterministic, but relatively simplified setting. This model allows us to follow Endres and Zlotnick in asking rigorously how effectively one can prune the pathway set from a model without substantially affecting overall reaction progress. We also conduct a parallel study using stochastic capsid models [33], which allow us to drop some assumptions of the ODE models and look at a substantially larger model system, but at the cost of limiting our analysis tools. By examining intermediate distributions in this model, we distinguish regions of parameter space that appear well described by simple nucleation -elongation models from those that appear unlikely to be describable without an extremely large number of distinct pathways and reaction intermediates. Both lines of inquiry support the conclusion that the ability of a system to be described by a compact model can vary quite dramatically given relatively small parameter changes, suggesting that the notion of reaction pathways is not as cleanly defined for complicated assembly systems as we might hope. Our results provide guidance for issues of model selection and model-based interpretation of experimental results for capsid assembly and other complicated self-assembly systems. In addition, the analysis tools developed for this study may have broader applicability in efficiently modelling complex reaction systems in general.

Models and methods
We applied two complementary modelling methods to examine the variability of pathway complexity across assembly parameter spaces. ODE models [35] allow us to exhaustively determine fractional pathway usage, but at the cost of requiring highly simplified capsid models and limiting allowed assembly pathways. Stochastic simulation algorithm (SSA) models [7] allow us to perform a random sample of pathways accessible to a far more complicated model system, but their high computational cost and the stochastic noise in their results limit our ability to perform the kind of exhaustive analysis possible in the ODE case. This section describes each model type in turn and explains how it was used to examine pathway complexity across the parameter space.

ODE models
We began our analysis, using an ODE model based on the approach of Endres and Zlotnick [6]. As in that work, we used a simplified dodecamer capsid model, which can be considered a model of T ¼ 1 assembly from pentameric capsomer subunits. We applied two variants of this model. In the first model, the dodecahedral shell is assumed to assemble only through the addition of single subunits per reaction, each representing one of the 12 faces of the dodecamer. This assumption of assembly by successive monomer additions is commonly used to counter the rapid proliferation in possible oligomer/oligomer binding pathways with structure size, which would otherwise make the model computationally infeasible. A heuristic justification is provided by the argument that in typical self-assembly systems the equilibrium concentration of monomers is much larger than that of the intermediates. However, as has been shown recently using stochastic methods, there are regions of parameter domain where the intermediate concentrations can far exceed their equilibrium values during the assembly process. Therefore, we used a second model, which allowed dimer/oligomer binding also in addition to monomer/oligomer binding. This allows us to assess the effects of adding more pathways on the self-assembly kinetics. The distinction is illustrated in Figure 1. Figure 1(a) shows an example of a monomer addition reaction, where a monomer and an octomer bind to form a 9-mer. This reaction would be allowed in either model. Figure 1(b) shows a dimer reaction, with a monomer and a septamer binding to form a 9-mer. This reaction is allowed in the dimer/oligomer pathway set, but not in the monomer/oligomer pathway set. Figure 1(c) shows binding of a trimer to a hexamer to build a 9-mer; this reaction would not be allowed in either of the ODE pathway sets we consider. With either pathway set, we assume that the forward rate constants are identical for all reactions, in effect ignoring the relative variation in diffusion coefficient with size of the oligomer.

Defining the ODE pathway set
Each species in the assembly tree is identified by two indices (size, type), where size is the number of subunits in the species and type is an arbitrary assignment used to distinguish two sister species. Let us represent the molar concentrations of each species (j,k) as [ j, k ] and the degeneracy of forward reaction (monomer/dimer addition) between (j, k) and (m, n) as a m;n j;k : Similarly, the backward reaction degeneracy is represented by b j;k m;n · a m;n j;k and b j;k m;n capture the reaction degeneracies produced by symmetries of the structures, a topic explained in greater detail when we describe our protocol for testing graph isomorphism below. The relative stability of each species is determined by the Arrhenius factor exp (2 G/RT), where G is the molar free energy of the structure, R is the gas constant and T is the absolute temperature. Our model assumes that each bond contributes the same amount of molar free energy, denoted by DG. If species (j,k) has c j,k contacts or bonds, the relative stability of (j,k) with respect to (m,n) is The forward reaction rate also depends on the symmetry of the binding unit (monomer or dimer), denoted by O(n), {n ¼ 1, 2}. The constant forward reaction rate k on is then irrelevant to the kinetics and only sets the overall time scale. In terms of these quantities the set of kinetic equations can be written as: where (m 2 j) represents either the monomer or the dimer. The primary task in constructing the assembly tree is identifying all distinct intermediate oligomers and computing the forward and backward degeneracies for each reaction pair. Since the dodecahedral structure can be projected onto a unit sphere as 12 symmetrically placed points, each of the 12 sites are identified by their spherical-polar coordinates {u i , f i } and nearest neighbour locations can be stored as an incidence matrix. Our procedure represents graphs for any species G by a density distribution: r G ðu; fÞ In graph theoretical terms, each intermediate can be represented by a planar graph. Our representation of binding sites suggests a natural definition for distinguishing species, in analogy with rigid body transformations: if two species can be translated over the unit sphere and rotated about their centre of mass in such a manner that their configurations overlap then they are non-distinct. We iteratively construct the state space by successively adding a subunit to each available binding site of each species identified and testing the resulting oligomers for isomorphism. Each time a species is repeated, the forward degeneracy for the corresponding reaction is incremented by one. We are now in a position to give a precise definition of which graphs are identical: Definition 1. Two graphs G 1 and G 2 are non-distinct iff 't [ SO (3) such that the corresponding density functions satisfy: r G 1 (r) ¼ r G 2 (t r).
Here, t stands for both a member of SO(3) and its 3D representation in terms of the Euler matrices.
We have devised an algorithm to test planar graphs for isomorphism under this restricted set of transformations. The algorithm works in worst case time, steps for most pairs. We now give an outline of the algorithm used: (1) Compute the centroid of this density distribution: (2) Given the orientation of the centroid , check a necessary condition for isomorphism: This is just the SO(2) subgroup of SO (3). In terms of polar coordinates, this implies that the graphs differ only in rotations around the Z-axis.
, then assuming it has a degeneracy d u , it has d u possible candidates for its image in G 2 . Sequentially, test each of these cases. Let the set of possible images of i be If j is the image of i then the transformed GraphG 2 fulfils the identityG 2 ¼ R i;j G 2 ¼ G 1 . If this condition is not met for any j [ I i then the graphs cannot be related by an element of SO(3) and hence by Definition 1 are distinct. (6) We now address the exceptional case of graphs where the centroid is located at the origin. Here, we make use of the fact that if two graphs of size are isomorphic then their parent graphs G 1,N21 and G 2,N21 must differ in only two sites (call them b 1 and b 2 ): for i ¼ {1, 2}. If G 1,N21 and G 2,N21 are non-distinct, then G 1,N and G 2,N are trivially so. Consider the case where the parent graphs are distinct. Referring back to our Definition 1, the graphs are isomorphic iff 't [ SO(3) such that: which implies that 'n 1 [ G 1,N21 and n 2 [ G 2,N21 such that Search for this transformation t by searching for the image of b 1 in G 2,N and of b 2 in G 1,N .
To this end, sequentially choose n [ G 1,N 2 {b 1 } and construct a new graph G n ¼ G 1,N 2 {n}. Then, G i,N are isomorphic iff G n and G 2,N21 are isomorphic for some n [ G 1,N . Since the centroid of either of these does not coincide with the origin, we can check them for isomorphism by the usual procedure.
Once all the distinct species of a given size are identified, the backward reaction degeneracy can be computed using the principle of detailed balance.
where O( j, k) is the order of the symmetry group of species (j, k). A simple way to compute O( j, k) is to sum over all n in the previous equation. The right-hand side of the equation is already known in terms of the symmetry groups and forward degeneracies of the parent, while the left-hand side is 1/O( j, k) times the total number of ways (j, k) can decay, P n b j21;n j;k , which equals the size j minus the number of articulation points of (j, k). We used depth-first search to compute the articulation points, which eventually allows us to compute each individual degeneracy b j21;n j;k using Equation (7). An analogue of the previous equation for dimer -oligomer reactions allows us to compute b j22;n j;k from a j;k j22;n . A subset of the experiments were run using a pruned version of the ODE tree. We used a landscape approach similar to that of Endres and Zlotnick [6] to prune the assembly tree. The controlling parameter for the identification of intermediates to prune is the following probability-like quantity: Pð j; kÞ ¼ P l a j;k j21;l Pð j 2 1; l Þm j;k j21;l P k;l a j;k j21;l Pð j 2 1; l Þm j;k where m j;k j21;l is a control parameter used to assign relative weights to different reactions. One choice that, though ad hoc, has proven useful is to choose m ¼ 0.1 for reactions that proceed by a single subunit addition and 1 otherwise. We have tried to use a similar choice, but for a broader concentration regime, specifically those where the probability distribution among the intermediate states is expected to be more uniform.

ODE experimental design
We first carried out a series of experiments to determine how the efficiency of landscape pruning varies with the initial monomer concentration for a simple pathway model. These experiments were conducted allowing for monomer accretion reactions, such as that in Figure 1(a), but excluding dimer accretion reaction such as that in Figure 1(b), as well as interactions of higherorder oligomers as in Figure 1 We next conducted simulations to estimate the contribution of oligomer/oligomer interactions to pathway kinetics. For each of the following five concentration levels -10 mM, 100 mM, 200 mM, 500 mM and 1 mM -we conducted an additional simulation permitting reactions of dimers with arbitrary oligomers. That is, the simulations permit the reactions of both Figure 1(a) and (b), although, they still exclude higher-order reactions such as those of Figure  1(c). These simulations were likewise conducted with a free energy of binding of DG ¼ 2 3.5kcal/mole and were carried out by numerically integrating the ODE models using an adaptive step size embedded Runge -Kutta 4-5 method with relative error ¼ 10 29 .
For each of the 24 comparisons of pruned to unpruned models and five comparisons of monomer to monomer/dimer models, we examined the deviations in capsid concentrations vs. time between each pruned model and the full model. To perform this comparison, we define the quality factor (Q r ), a measure of error of the restricted model r with respect to the complete model c, as follows: The quality factor is meant to capture the average fractional deviation between the two models across the time to equilibration. We use a relative measure of deviation normalized by capsid count to account for the fact that deviations between the models tend to have a large proportional difference, but a small absolute difference, early in their execution and to converge to the same equilibrium late in their execution. Q r provides a sensitive measure of these large proportional changes early in simulations that would otherwise be difficult to detect. In the case of landscape pruning, the eventual equilibrium concentration depends on the model used, although, the difference is negligible under the experimental conditions examined here. In such cases, we defined the cut-off time T as the point at which the concentrations for the complete model had equilibriated to within 2%, i.e., ð½capsid eqb 2 ½capsid c Þ=½capsid eqb & 0:02, where [capsid] eqb is the equilibrium concentration of the fully formed capsid. This cut-off definition allows us to compare Q r values for different pruning schemes.

Stochastic models
Stochastic simulations were conducted using the simulator of Zhang et al. [33], which implements a SSA model [7] of capsid assembly. The SSA model allows us to test the scaling of pathway control to larger structures and larger pathway spaces than we can explore with the ODE model. In an SSA model, the assembly system proceeds by the activation of successive reaction events. For the models constructed here, each reaction event consists of either the binding of two structures to form a larger structure or the breakage of a structure into two smaller substructures. Binding is controlled by a local rule model [1], which specifies allowed reactions through a set of 'local rules' that define how each subunit can bind to its neighbours in any capsid or substructure. The model allows for any binding reaction of either monomers or oligomers provided the product of the reaction is a subset of a correctly formed capsid. So, for example, an analogous SSA model of a dodecamer would allow all of the reactions depicted in Figure 1(a) -(c). The model does not, however, allow for the production of malformed structures. Reverse reactions are allowed by breaking any single bond that results in a separation of a current assembly into exactly two sub-assemblies. The simulator disallows reverse reactions that would require the simultaneous breaking of two or more binding interactions.
For the present simulations, we used a 60-subunit icosahedral model originally described in Ref. [34], representing a full T ¼ 1 capsid assembly system at the monomer level. This model constructs icosahedrally symmetric 60-mers from an assembly subunit with three binding sites, two that bind to copies of one another to form pentamers of coat protein (capsomers) and one that binds copies of itself to link together capsomers to form the complete icosahedron. We refer to the former as intra-capsomer interactions and the latter as inter-capsomer interactions. We chose this model in part because it allows us to explore a model size beyond what we can feasibly simulate with the ODE approach and in part because the asymmetry of the assembly subunits makes it possible to independently vary two binding free energies to produce distinct productive assembly pathways, a capability that was previously exploited in Sweeney et al. [26].
All icosahedral assembly simulations were run with a fixed population of 6000 coat monomers. Binding kinetics in the model are parameterized by four rate constants: a forward binding rate for binding interactions within pentamers (the intra-capsomer association rate, k aþ ), a reverse binding rate for interactions within pentamers (the intra-capsomer dissociation rate, k a2 ), a forward binding rate for binding interactions between pentamers (the inter-capsomer association rate, k rþ ), and a reverse binding rate for interactions between pentamers (the intercapsomer dissociation rate, k r2 ). k a2 and k r2 were fixed at 10 23 for all simulations. The two forward rates were varied independently to produce a total of 378 simulations, spanning a range of 18 inter-capsomer binding rates (k rþ from 10 22.8 to 10 1.2 in factors of 10 0.2 ) and 21 intra-capsomer binding rates (k aþ from 10 22.6 to 10 0.8 in factors of 10 0.2 ). These values were chosen based on an analysis of data from Sweeney et al. [26] indicating that this region would include examples of all three simple pathways (monomer, dimer and pentamer) and extend into the kinetically trapped region in which the three pathways merge with one another. Each simulation was run for sufficient time to reach a pseudo-equilibrium, as verified by manual examination. Numbers of monomers, dimers, pentamers and 60-mers were recorded vs. simulator time across each simulation.
For each of the 378 simulations, we assessed fractional usage of the three pathway types based on the total mass fraction of the major assembly intermediate of each pathway (monomers for the monomer pathway, dimers for the dimer pathway and pentamers for the pentamer pathway), integrated over the course of the simulation. While all three pathways would be expected to exhibit some transient appearance of each of the three key intermediates, each should have a dominant presence of a single intermediate until the pool of small oligomers is exhausted into capsids or large kinetically trapped forms. Integrated mass fractions of monomers (C 1 ), dimers (C 2 ) and pentamers (C 5 ) were computed by the following formulas: where h ji is the count of assemblies of size j at time step i and T i is the simulator time at time step i. T 50 is the amount of simulator time required to reach 50% of the final yield for a given simulation, which we use as a normalizing constant to correct for varying timescales of different assembly reactions. We assessed fractional usage of the three pathways (P 1 for monomer fraction, P 2 for dimer fraction and P 5 for pentamer fraction) as follows:

ODE simulation results
We first examined the complexity of pathway space in the ODE model allowing assembly only by accretion of successive monomers. Figure 2 shows the relationship between quality factor Q r and degree of pruning for three concentration values. At all concentrations, we observe increasing Q r and thus, decreasing quality of fit, as we prune increasingly more common intermediates. The effect is more pronounced the higher the concentration, suggesting that in high concentration domains a larger number of intermediates and pathways contribute significantly to the overall assembly kinetics. Note that the absolute magnitude of the deviation is small at all points, consistent with the observations of Endres and Zlotnick [6]. The highest total deviation observed, for a 21-intermediate model at 1 mM concentration, is approximately 4.5%. Furthermore, under all parameter conditions examined here, the final equilibrium concentration was not noticeably different between pruned and full models. Rather, deviations appear to predominantly take the form of large relative shifts in concentrations early in the assembly process. The difference in Q r values between high and low concentration conditions appears to be predominantly a result of a slower rate of convergence of the pruned models with respect to the unpruned model when concentration is higher, as opposed to a greater amount of separation prior to convergence. Figure 3 illustrates these early deviations, showing the relative deviation as a function of time for a high-Q r condition (1 mM, Figure 3(b)), with a low-Q r condition (10 mM, Figure 3(a)) provided for comparison. Collectively, the data show that highly pruned models do provide very good fits over a broad parameter range, although they often produce large proportional deviations early in assembly. Furthermore, the deviations they produce vary significantly with assembly conditions, increasing approximately 20-fold between Figure 2. Variation in quality factor Q r as a function of degree of pruning for the ODE model. The figure shows three curves, representing three simulation concentrations (10 mM, 100 mM and 1 mM), with each curve plotting Q r across a range of pruning parameters. The X-axis is labelled with pruning parameter used, as well as the number of intermediates (from 73 possible) retained with that pruning parameter value.
a typical in vitro assembly concentration (10 mM) and a value not far above likely in vivo concentrations (1 mM). The minimal effect of pruning away the less stable intermediates on the kinetics at late stages of assembly is to be expected as the ODE model can be reduced to a linear ODE model whose slowest relaxation rate is bounded from above by the decay rate of the most stable intermediate. Introducing unstable intermediates with large decay rates would not affect the slowest relaxation rate appreciably. In that sense, kinetics in the late stages is primarily controlled by the equilibrium concentration. If a pruned model does not change the equilibrium distribution appreciably, it will display similar relaxation kinetics to the complete model. The value of the fully nonlinear ODE model is, then, for studying the kinetics at the early stages. As we examine early time scales, the unstable, fast decaying intermediates contribute more and more to the kinetics and any pruning strategy is bound to be inaccurate. This observation is clearly seen in Figure 3. The effects of pruning at extremely early times (, 10 units of time as opposed to the time scale over which the system equilibriates, , 10 5 units of time) are almost identical for the 10 mM and 1 mM concentrations. The difference lies only in the rate at which the fractional errors decay in time. Note that the actual units of the time scale are arbitrary, since the shape of the curves depends on the ratio of the forward and reverse rates and the time axis can be scaled arbitrarily without altering that ratio.
While, the ODE approach does not allow us to feasibly examine the full ensemble of oligomer/oligomer pathways, we can estimate the effects of pruning out oligomer/oligomer reactions by adding in the subset of dimer/oligomer reactions. Figure 4 shows deviations between simulation progress by monomer reactions only vs. those allowing monomer and dimer reactions as a function of concentration, as assessed by the Q r measure. The figure shows a rapid rise in Q r from 10 to 100 mM, followed by a more gradual increase across the rest of the concentration range. We attribute the value at 10 mM to the fact that that simulation is below the critical concentration of the system and therefore exhibits minimal growth under either model. The other data points show that pruning of oligomer/oligomer pathways does produce a noticeable error in results, although this error like those above, is small in absolute magnitude and large predominantly at the earliest stages of assembly.

Stochastic simulation results
We next examined how these results would apply to a more complicated but realistic system, defined by a 60-subunit T ¼ 1 stochastic model allowing for independent variation of two distinct binding rates. We assessed pathway usage in this system by monitoring concentrations of key intermediates over time across the simulation. Figure 5 shows mass fractions of the three intermediates -monomer, dimer and pentamer -for the T ¼ 1 capsid model as we vary intracapsomer and inter-capsomer binding rates over a parameter range of 3.4 orders of magnitude in inter-capsomer binding rate and 4 orders of magnitude in intra-capsomer binding rate. Figure 5(a) shows the mass fraction of monomers for each point in parameter space, Figure 5(b) the mass fraction of dimers and Figure 5(c) the mass fraction of pentamers.
Collectively, the images confirm the existence of the three distinct pathway regions identified in Sweeney et al. [26]. In the bottom left of each image, corresponding to low rates of binding for both interaction types, we see simulations dominated by monomers with minimal appearance of dimers or pentamers, as we would expect for the monomer pathway. In the upper left, corresponding to a high rate of inter-capsomer binding and a low rate of intra-capsomer binding, we see simulations dominated by dimers, as we would expect for the dimer pathway. In the lower right, corresponding to high intra-capsomer binding rate and low inter-capsomer binding rate, we see simulations dominated by the appearance of large numbers of pentamers, as we would expect for the pentamer pathway.
The majority of Figure 5 does, however, appear to show significant hybrid use of multiple pathways. The upper left half of each figure shows significant usage of both monomers and dimers, while the lower right half shows significant usage of both monomers and pentamers. The figure therefore, supports the conclusion that there are sizable border regions between discrete pathway domains in which neither pathway alone provides a valid description of the overall pathway space. We observe a steady, continuous shift in mass fractions of the intermediates as we interpolate between different pathway domains, as opposed to a sudden phase transition. It is therefore difficult to define exact boundaries of the border regions. We can somewhat arbitrarily define a border region to be a region of parameter space in which the dominant intermediate type has less than an 80% mass fraction of all three. Using that definition, we find that the border region between monomer and dimer pathways has a width of approximately 1.5 orders of magnitude change in either binding rate and the border between monomer and pentamer pathways has a width of approximately 1.0 orders of magnitude. The more rapid shift from monomer to pentamer vs. monomer to dimer may reflect the comparatively high effective reaction order of pentamer formation relative to dimer formation. The two borders also have very different slopes, approximately 0.4 for the monomer/dimer border and 1.7 for the monomer/pentamer border. We attribute the 0.4 slope to the competition between second-order dimerization needed by the dimer pathway and fifth order pentamer formation needed for the monomer pathway. The 1.7 slope may be explained by the balance between the fifth-order effective rate of pentamer formation, needed by both monomer and pentamer pathways, vs. the third order formation of trimer-of-pentamer nuclei needed by the pentamer pathway. The nature of the model type makes it computationally infeasible to rigorously determine how many intermediates are actually necessary to the model, as we did in the ODE case. Nonetheless, we can reasonably expect that regions in which two small oligomers coexist at high concentrations during most of the assembly will be characterized by an exponential explosion in the number of distinct reactions being used, due to the numerous ways we can get from nucleus to complete capsid by combinations of additions of two sub-assemblies. The figure thus supports the conclusion that border regions are both sizable and are characterized by high pathway complexity. It is possible that the wide border regions in the figure in part reflect the fact that the simulations were conducted close to the region in which kinetic trapping becomes significant, approximately corresponding to the upper limits of concentrations examined in the ODE simulations. Based on the analysis of Sweeney et al., we would expect this region to correspond approximately to assembly at likely in vivo concentrations of subunits (roughly 500 mM). We can observe kinetic trapping becoming prominent in the upper right portions of the images, where all three intermediate types begin to appear with high frequency. We do not observe any appreciable change in the width of the border regions over the parameter range examined here, suggesting that a similarly wide border may extend much further into the lowrate domain than we can feasibly examine. Nonetheless, we cannot be certain that the border region does not become thinner at lower rates more typical of in vitro concentrations. Computational resources required for a simulation increase rapidly with reduced binding rate beyond the region shown here because nucleation rates fall with the third order or fifth order of binding rate, depending on the pathway used. A high-precision examination of the parameter space beyond that shown here to more likely in vitro values would therefore not be possible with current computational tools.

Discussion
We have used a combination of ODE and stochastic models to explore how pathway complexity varies across a space of binding rate parameters for simple models of virus capsid assembly. Simulations reveal that pathway complexity can vary substantially with relatively small changes in model parameters. ODE simulations using a simplified dodecamer model show that the number of intermediates, and thus pathways, we must consider to achieve high model accuracy substantially increases in high vs. low concentration domains. Likewise, the contribution of dimer/oligomer reactions to the assembly kinetics goes up by a significant fraction as concentration varies from levels typically used in vitro to levels likely to be feasible at sites of virus assembly in vivo. Stochastic models allow us to test how pathway complexity behaves in a more complicated model defined by a 60-mer icosahedron with two independently varying binding rates. In this system, we can map out a pathway space that exhibits not only several distinct simple pathways, but also broad border regions where hybrids of the simple pathways appear to operate. It therefore appears that in a large fraction of the parameter space, no single pathway operates in isolation. Such border regions can be expected to require a large number of distinct reactions in order to adequately describe their kinetics.
This work may also have relevance beyond capsid assembly. Our observations of the ease of shifting pathways between in vitro and in vivo models may help in interpreting in vitro assembly studies of many important self-assembling systems in biology. The same questions are also of relevance in understanding how we might prototype, model and optimize assembly reactions for self-assembling nanotechnology. The methods developed here may also be more broadly applicable to modelling complex reaction systems. We have developed an efficient algorithm for testing planar graph isomorphism for a general class of chemical species that works in time O À N 2 * min{N; N 1=2 capsid } Á and shown its utility for efficiently enumerating the pathway space of a complicated assembly model. This method can be extended to include structures with multiple types of bonds, which is one of the directions for further exploration using the ODE model. A similar method may also prove useful for hybrid discrete/stochastic models, a possible avenue for overcoming some of the limits of each of the two model types examined here.