Dynamics of icosahedral viruses: what does Viral Tiling Theory teach us?

We present a top-down approach to the study of the dynamics of icosahedral virus capsids, in which each protein is approximated by a point mass. Although this represents a rather crude coarse-graining, we argue that it highlights several generic features of vibrational spectra which have been overlooked so far. We furthermore discuss the consequences of approximate inversion symmetry as well as the role played by Viral Tiling Theory in the study of virus capsid vibrations.


Introduction
It has been experimentally observed that viruses can alter their shape to fulfill specific functions. In particular, they may swell during maturation [1][2][3][4][5][6], twist to release their genetic material during infection, or morph during assembly. Such large scale conformational changes are consistent with the widespread hypothesis that viruses do vibrate, and it is therefore of interest to study their dynamics with the help of mathematical and computational techniques which have been tried and tested in the context of biomacromolecule vibrations (see [7] for a review).
Normal mode analysis is one such method [8][9][10][11], which has been successfully applied to the study of proteins and a variety of viruses to date [12][13][14][15]. A major challenge is the huge number of degrees of freedom involved in such systems. Several degrees of coarse-graining, as well as group theoretical methods (inspired by their successful application to small molecules and fullerenes [16][17][18]), have been implemented in computer simulations in order to extract information on the low-frequency modes of vibration which are thought to be relevant for protein and virus function [19,20]. Although such theoretical data are becoming increasingly available for icosahedral systems [14,15,21] thanks to advances in computer power, a clear and insightful vibrational pattern across icosahedral viruses has not yet emerged. The art of coarse-graining is a delicate one, as it is often argued that excessive coarse-graining produces a dynamical picture that has little to do with reality. What is actually needed is a hierarchy of coarse-grained calculations, which hopefully reveal complementary aspects of the dynamical jigsaw.
We argue here that even the crudest approximation, where each capsid protein is treated as a point mass located at its centre of mass, is helpful in highlighting dynamical features that are present in more sophisticated normal mode analyses, but have been overlooked so far. Our initial mathematical motivation was to assess to what extent Viral Tiling Theory, a recently proposed model for icosahedral viral capsids which solves a classification puzzle in the Caspar-Klug nomenclature [22,23], provides new insight into the dynamics of viruses. In particular, we ask whether there is a correlation between the vibrational patterns of viruses with a given number of coat proteins and their viral tiling.
The paper is organised as follows. In Section 2, we briefly describe Viral Tiling Theory in the context of the viral capsids RYMV (T = 3), HK97 (T = 7ℓ) and SV40 (pseudo T = 7d), with emphasis on how the underlying icosahedral symmetry manifests itself in subtly different ways in these three cases. In particular, it has implications for the group theoretical analysis of normal modes of vibration. An expanded version of these remarks, applicable to viruses and phages of all T numbers, is available in [24]. Section 3 provides a simple normal mode analysis for the three capsids above, where group theoretical techniques reminiscent of those used in calculations of vibrational modes of small molecules are implemented. This paves the way for the more extensive study performed in [25], which reveals an intriguing universal pattern of low frequency normal modes. We conclude with some open questions prompted by our investigations.

Tilings of Rice Yellow Mottle, Hong-Kong 97 and Simian Virus 40
Viral Tiling Theory provides an elegant way of encoding the icosahedral symmetry of viral capsids by keeping track of the location of coat proteins and the orientation of capsomers on the viral shell, while also keying in the dominant bond structure between those proteins. The Rice Yellow Mottle Virus (RYMV) belongs to the Sobemovirus genus. It is classified as a T = 3 virus in the Caspar-Klug labelling system [26], and its icosahedral capsid accommodates 180 coat proteins or subunits which are clustered in 12 pentamers around the 5-fold axes and 20 hexamers about the 3-fold global symmetry axes of the icosahedron. The locations of the proteins are consistent with a triangular tiling à la Caspar-Klug, and each triangular tile encodes trimer interactions between coat proteins, as represented in Fig. 1. The HK97 bacteriophage on the other hand has a T = 7ℓ capsid made of 420 proteins arranged in 12 pentamers and 60 hexamers, with four types of dimer interactions modelled by rhomb prototiles; see Fig. 2. The SV40 virus is a member of the Polyomaviridae family and has a pseudo T = 7d capsid which accommodates 360 coat proteins organised in pentamers through two types of spherical prototiles, namely rhombs, encoding two types of dimer interactions, and kites, encoding trimer interactions, as represented in Figure 2 of reference [27]. SV40 is an example of an all-pentamer capsid, for which the Caspar-Klug classification is not applicable, and whose symmetries are captured by Viral Tiling Theory.
In order to extract qualitative features of vibrational patterns from viral capsids, we restrict ourselves to a coarse-grained approximation where each capsid protein is replaced by a point mass whose location coincides with the centre of mass of the protein considered. This centre of mass is calculated by taking into consideration all crystallographically identified atoms of the protein, according to data stored in the Protein Data Bank or equivalently the VIPER website. We then assess how much deviation there is between the above distribution of point masses and a theoretical distribution exhibiting a centre of inversion. On the basis of the experimental data available to us, we argued in [24] that the SV40 capsid has an approximate centre of inversion, while RYMV and HK97 do not. This has subtle consequences for the group-theoretical properties of normal modes of vibration: when the capsid exhibits an effective centre of inversion, the group involved is the full icosahedral group $H_3$ with 120 elements (usually called $I_h$ in the science literature), while it is reduced to its subgroup $I$ of 60 proper rotations in the absence of a centre of inversion.
A viral capsid with $N$ 'point mass' coat proteins has $3N$ degrees of freedom, and hence $3N$ modes of vibration, of which 6 are associated with the 3 rotations and 3 translations of the capsid as a whole. These six are therefore not genuine normal modes of vibration. Group theory accounts for the degeneracies of the vibrational modes, and provides a means to organize the normal mode spectrum of a given capsid [28]. A key ingredient in this exercise is the construction of the displacement representation of the given capsid, which is a reducible representation of $H_3$ or $I$ according to whether the distribution of capsid proteins exhibits a centre of inversion or not. Such a representation consists of 120 (resp. 60) matrices $\Gamma^{\rm displ}_{3N}(g)$, $g \in H_3$ (resp. $I$), of size $3N \times 3N$, which encode how proteins are interchanged under the action of each element $g$, as well as how the displacements of each protein from the equilibrium position are rotated under the action of $g$. The latter information is gathered in $3 \times 3$ rotation matrices $R(g)$ which form an irreducible representation of $H_3$ (resp. $I$), while the former is encoded in permutation matrices $P(g)$ of size $N \times N$, so that we have
\[ \Gamma^{\rm displ}_{3N}(g) = P(g) \otimes R(g). \qquad (2.1) \]
The permutation matrices $P(g)$ act on vectors whose components are the vector positions $r^0_i$, $i = 1, \ldots, N$, of the $N$ proteins at equilibrium. The entry $P_{ij}(g)$ of the permutation matrix is 1 if $r^0_j$ is mapped on $r^0_i$ by $g$, and is zero otherwise. Once the displacement representation is constructed, it remains to invoke the well-known property that it can be written in block diagonal form with the help of a $(3N \times 3N)$ matrix $U$,
\[ U^{-1}\, \Gamma^{\rm displ}_{3N}(g)\, U = \bigoplus_p n_p\, \Gamma^p(g), \qquad (2.2) \]
where the multiplicities $n_p$ are obtained via the following character formula
\[ n_p = \frac{1}{120} \sum_{g \in H_3} \chi^p(g)\, \chi^{\rm displ}(g) \qquad \text{or} \qquad n_p = \frac{1}{60} \sum_{g \in I} \chi^p(g)\, \chi^{\rm displ}(g). \qquad (2.3) \]
The characters $\chi^p(g)$ of irreducible representations of the icosahedral group can be found in [24], while the characters of the displacement representations $\chi^{\rm displ}(g)$ are obtained by inspection of the displacement representation considered.
Note that, in view of the very definition of the permutation matrices $P(g)$ given in the previous subsection, and the fact that the characters of a representation are the traces of its constituent matrices, one has
\[ \chi^{\rm displ}(g) = {\rm Tr}(P(g))\, {\rm Tr}(R(g)) = \pm (\text{number of proteins unmoved by } g) \cdot (1 + 2\cos\theta), \qquad (2.4) \]
where $\theta$ is the angle of the proper rotation associated with $g$, and the minus sign is taken when $g \in H_3 \setminus I$. So $\chi^{\rm displ}(g)$ is zero when $\theta = 2\pi/3$ or whenever $g$ is such that no protein of a given capsid is kept fixed under its action.
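The trace identity behind (2.4) can be checked numerically on a toy configuration. The sketch below is only an illustration, not the capsid computation: it assumes four point masses at the corners of a square (a hypothetical stand-in for a protein distribution), builds the permutation matrix and the rotation matrix for a proper rotation, forms their Kronecker product, and compares its trace with the closed formula. All function names are introduced here for illustration, and only proper rotations (plus sign) are covered.

```python
import math

def rotation_about_axis(axis, theta):
    """3x3 rotation matrix about a unit axis (Rodrigues formula)."""
    x, y, z = axis
    c, s, v = math.cos(theta), math.sin(theta), 1 - math.cos(theta)
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]

def apply(R, r):
    """Rotate the position vector r by the matrix R."""
    return [sum(R[i][j] * r[j] for j in range(3)) for i in range(3)]

def permutation_matrix(R, points, tol=1e-9):
    """P[i][j] = 1 if R maps points[j] onto points[i], and 0 otherwise."""
    n = len(points)
    P = [[0] * n for _ in range(n)]
    for j, r in enumerate(points):
        image = apply(R, r)
        for i, target in enumerate(points):
            if all(abs(a - b) < tol for a, b in zip(image, target)):
                P[i][j] = 1
    return P

def kron(P, R):
    """Kronecker product P (x) R as a dense list of lists."""
    n, d = len(P), len(R)
    return [[P[i // d][j // d] * R[i % d][j % d] for j in range(n * d)]
            for i in range(n * d)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def displacement_character(points, axis, theta):
    """Trace of the 3N x 3N matrix P(g) (x) R(g) for a proper rotation g."""
    R = rotation_about_axis(axis, theta)
    return trace(kron(permutation_matrix(R, points), R))

def formula_character(n_fixed, theta):
    """Right-hand side of (2.4) for a proper rotation (plus sign)."""
    return n_fixed * (1 + 2 * math.cos(theta))

# Toy configuration (assumption): four point masses on a square in the xy-plane.
SQUARE = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
```

For instance, `displacement_character(SQUARE, (1, 0, 0), math.pi)` uses the 2-fold rotation about the x-axis, which leaves two of the four points unmoved, and should agree with `formula_character(2, math.pi)`; the 4-fold rotation about the z-axis moves every point, so its character vanishes.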
The decomposition of the displacement representation of a given capsid boils down to the knowledge of the coefficients $n_p$ in (2.3) which, in view of the expression (2.4), receive non-zero contributions only from those elements $g$ which leave at least one capsid protein unmoved (and for which $\theta \neq 2\pi/3$). It can be shown that distributions of capsid proteins with no centre of inversion are such that the only group element which keeps any 'point mass' protein unmoved is the identity element $g = e$ (and under its action, all $N$ proteins are obviously fixed). The second expression in (2.3) thus yields
\[ n_p = \frac{1}{60}\, \chi^p(e)\, \chi^{\rm displ}(e) = \frac{N}{20}\, \chi^p(e), \qquad (2.5) \]
where we used the fact that $I$ has 60 elements and $\chi^{\rm displ}(e) = 3N$ (taking the plus sign and $\theta = 0$ in (2.4)).
Recalling that $\chi^p(e) = p$, we arrive at the following decomposition formula,
\[ \Gamma^{\rm displ}_{3N} = \frac{N}{20} \left[ \Gamma^1_+ + 3\,\Gamma^3_+ + 3\,\Gamma^{3'}_+ + 4\,\Gamma^4_+ + 5\,\Gamma^5_+ \right]. \qquad (2.6) \]
The number $N$ of capsid proteins is always a multiple of sixty, $N = 60k$. In the many cases where the proteins are organised in 12 pentamers and a number of hexamers, $k$ is the $T$-number of the Caspar-Klug nomenclature. Then, the number of non-degenerate normal modes in the singlet (symmetric) representation $\Gamma^1_+$ is $3T$, while the number of $p$-fold degenerate normal modes (corresponding to the $p$-dimensional representation $\Gamma^p_+$) is $3p^2T$, for $p = 3, 4$ and $5$. In particular, $N = 180$ for RYMV, and the displacement representation decomposes into
\[ \Gamma^{\rm displ}_{540} = 9\,\Gamma^1_+ + 27\,\Gamma^3_+ + 27\,\Gamma^{3'}_+ + 36\,\Gamma^4_+ + 45\,\Gamma^5_+. \qquad (2.7) \]
The 6 non-genuine modes belong to two copies of the $\Gamma^3_+$ irreducible representation. There are nine non-degenerate and forty-five 5-fold degenerate Raman active modes, as well as twenty-five 3-fold degenerate infrared active modes.
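The multiplicities quoted for capsids without a centre of inversion can be reproduced with a few lines of code. The following is a minimal sketch, assuming the standard character table of the rotation group I (classes e, C5, C5², C3, C2 with 1, 12, 12, 20 and 15 elements) and a displacement character that is non-zero on the identity only, as stated in the text. The names `multiplicities`, `CLASS_SIZES` and `CHARACTERS` are introduced here for illustration.

```python
import math

TAU = (1 + math.sqrt(5)) / 2  # golden ratio, appears in the characters of I

# Conjugacy classes of I in the order: e, C5, C5^2, C3, C2.
CLASS_SIZES = [1, 12, 12, 20, 15]

# Character table of I (rows: irreducible representations of dimension 1, 3, 3, 4, 5).
CHARACTERS = {
    "1": [1, 1, 1, 1, 1],
    "3": [3, TAU, 1 - TAU, 0, -1],
    "3'": [3, 1 - TAU, TAU, 0, -1],
    "4": [4, -1, -1, 1, 0],
    "5": [5, 0, 0, -1, 1],
}

def multiplicities(n_proteins):
    """Multiplicities n_p of each irreducible representation of I in the
    displacement representation, assuming only the identity fixes proteins."""
    chi_displ = [3 * n_proteins, 0, 0, 0, 0]
    order = sum(CLASS_SIZES)  # 60 elements in I
    result = {}
    for label, chi in CHARACTERS.items():
        total = sum(size * a * b
                    for size, a, b in zip(CLASS_SIZES, chi, chi_displ))
        result[label] = round(total / order)
    return result

def total_dimension(mult):
    """Sum of n_p times the dimension of each representation; should equal 3N."""
    return sum(n * CHARACTERS[label][0] for label, n in mult.items())
```

Calling `multiplicities(180)` gives the RYMV values 9, 27, 27, 36 and 45, and `multiplicities(420)` gives the HK97 values 21, 63, 63, 84 and 105; in both cases `total_dimension` returns 3N, which is a quick consistency check on the decomposition.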
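Since $N = 420$ for HK97, the displacement representation decomposes into
\[ \Gamma^{\rm displ}_{1260} = 21\,\Gamma^1_+ + 63\,\Gamma^3_+ + 63\,\Gamma^{3'}_+ + 84\,\Gamma^4_+ + 105\,\Gamma^5_+, \qquad (2.8) \]
and by the same argument as above, one arrives at twenty-one non-degenerate and one hundred and five 5-fold degenerate Raman active modes, as well as sixty-one 3-fold degenerate infrared active modes. The normal modes of the SV40 capsid would be organised according to the decomposition (2.6) with $N = 360$ if we were not taking into account that the protein distribution on the capsid exhibits an approximate centre of inversion. We would have
\[ \Gamma^{\rm displ}_{1080} = 18\,\Gamma^1_+ + 54\,\Gamma^3_+ + 54\,\Gamma^{3'}_+ + 72\,\Gamma^4_+ + 90\,\Gamma^5_+. \qquad (2.9) \]
Instead, we use the first expression in (2.3) and note that the distribution of 'point-mass' proteins on the capsid is such that, besides the identity element $g = e$ in $H_3$ which leaves all $N$ proteins unmoved, the fifteen elements $g_0 g_2^{(i)}$, $i = 1, \ldots, 15$, obtained by composing the inversion $g_0$ with the fifteen 2-fold rotations $g_2^{(i)}$, each leave 24 proteins unmoved, while ${\rm Tr}(P(g)) = 0$ for all other elements. Taking into account that for these group elements, ${\rm Tr}(R(g_0 g_2^{(i)})) = -(1 + 2\cos\pi) = 1$, we arrive at the following decomposition of the displacement representation,
\[ \Gamma^{\rm displ}_{1080} = 12\,\Gamma^1_+ + 24\,\Gamma^3_+ + 24\,\Gamma^{3'}_+ + 36\,\Gamma^4_+ + 48\,\Gamma^5_+ + 6\,\Gamma^1_- + 30\,\Gamma^3_- + 30\,\Gamma^{3'}_- + 36\,\Gamma^4_- + 42\,\Gamma^5_-. \qquad (2.10) \]
The six non-genuine modes of vibration are confined to one copy of the 3-dimensional irreducible representation $\Gamma^3_+$, and one copy of the 3-dimensional irreducible representation $\Gamma^3_-$. There are twelve non-degenerate and forty-eight 5-fold degenerate Raman active modes, and fifty-two 3-fold degenerate infrared active modes.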
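The SV40 multiplicities in the presence of a centre of inversion can be checked by hand from the character formula for the full icosahedral group. The worked computation below is a sketch under one assumption that is inferred rather than quoted: each of the fifteen elements obtained by composing the inversion with a 2-fold rotation leaves 24 proteins unmoved, which is the value required to reproduce the twelve non-degenerate and forty-eight 5-fold degenerate modes stated above. Here $d_p$ denotes the dimension of the representation labelled $p$, and $\chi^p(g_2)$ is the character of a 2-fold rotation in $I$.

```latex
\begin{align*}
n_{p\pm} &= \frac{1}{120}\Big[ d_p \cdot 3N \;\pm\; 15\,\chi^{p}(g_2)\cdot 24 \cdot 1 \Big]
          = \frac{1}{120}\Big[ 1080\, d_p \pm 360\,\chi^{p}(g_2) \Big]
          = 9\, d_p \pm 3\,\chi^{p}(g_2), \qquad N = 360, \\
\chi^{p}(g_2) &= 1,\; -1,\; -1,\; 0,\; 1 \quad \text{for} \quad p = 1,\; 3,\; 3',\; 4,\; 5, \\
n_{1+} &= 12, \quad n_{3+} = n_{3'+} = 24, \quad n_{4+} = 36, \quad n_{5+} = 48, \\
n_{1-} &= 6, \quad\;\, n_{3-} = n_{3'-} = 30, \quad n_{4-} = 36, \quad n_{5-} = 42, \\
\sum_{p,\pm} n_{p\pm}\, d_p &= 18 + 3\,(24 + 24 + 30 + 30) + 4 \cdot 72 + 5 \cdot 90 = 1080 = 3N .
\end{align*}
```

The last line confirms that the dimensions add up to $3N$, and the sum $n_{3+} + n_{3-} = 54$, reduced by the two copies carrying the rigid motions, matches the fifty-two 3-fold degenerate modes quoted in the text.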

Low frequency modes
Our calculation of the low frequency normal modes is based on a spring-mass model, where the $N$ 'point-mass' proteins are connected by a network of elastic forces described by a harmonic potential which is manifestly rotation and translation invariant,
\[ V = \sum_{m<n} \frac{\kappa_{mn}}{2} \left( |r_m - r_n| - |r^0_m - r^0_n| \right)^2. \qquad (3.11) \]
The associated force matrix or 'Hessian' is given by
\[ F^{ij}_{mn} = \left. \frac{\partial^2 V}{\partial r^i_m\, \partial r^j_n} \right|_{r = r^0}. \qquad (3.12) \]
In the above formulae, the vector $r^0_m$ refers to the equilibrium position of protein $m$ and the vector $r_m$, of components $r^i_m$, to its position after elastic displacement, all vectors originating at the centre of the capsid. The masses of the proteins are all set to unity (reflecting the fact that the various protein chains in a capsid have masses which are, to a good approximation, identical), and $\kappa_{mn}$ is the spring constant of the spring connecting protein $m$ to protein $n$. The set of non-zero spring constants we choose, i.e. the topology of the elastic network we adopt, is dictated by the information derived from the association energies listed in VIPER for RYMV (1f2n.vdb), HK97 (2fte.vdb) and SV40 (1sva.vdb). Fig. 3 encodes the bonds provided by VIPER before acting on them with the icosahedral group in order to generate the complete spring network. We have used the relative values of these energies, and therefore we are left with one parameter $\kappa$ in the force matrix, which sets the overall scale of the vibration frequencies. We are not aware of any experimental measurements of association energies between capsid proteins for the viruses and phages we are considering, and the absolute theoretical values calculated in [29] must be taken with extreme caution.
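A force matrix of this kind is straightforward to assemble for a small network. The sketch below assumes the standard elastic-network form of the potential, V = Σ_{m<n} (κ_mn/2)(|r_m − r_n| − |r⁰_m − r⁰_n|)², whose second derivative at equilibrium gives 3 × 3 blocks proportional to the outer product of the bond direction with itself. The tetrahedron used as input is a hypothetical toy network, not capsid data, and the helper names are introduced here for illustration. The check performed is that rigid translations and infinitesimal rotations are annihilated by the force matrix, which is the origin of the six zero modes mentioned in the text.

```python
def hessian(positions, springs):
    """3N x 3N force matrix at equilibrium for the harmonic spring potential.
    `springs` maps a pair (m, n) to the spring constant kappa_mn."""
    n = len(positions)
    F = [[0.0] * (3 * n) for _ in range(3 * n)]
    for (m, k), kappa in springs.items():
        d = [a - b for a, b in zip(positions[m], positions[k])]
        norm2 = sum(c * c for c in d)
        for i in range(3):
            for j in range(3):
                block = kappa * d[i] * d[j] / norm2
                F[3 * m + i][3 * m + j] += block  # diagonal block of protein m
                F[3 * k + i][3 * k + j] += block  # diagonal block of protein k
                F[3 * m + i][3 * k + j] -= block  # off-diagonal coupling
                F[3 * k + i][3 * m + j] -= block
    return F

def mat_vec(F, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in F]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rigid_motions(positions):
    """Three translations and three infinitesimal rotations as 3N-vectors."""
    axes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    translations = [[c for _ in positions for c in a] for a in axes]
    rotations = [[c for r in positions for c in cross(a, r)] for a in axes]
    return translations + rotations

# Toy network (assumption): a regular tetrahedron with six equal springs.
TETRAHEDRON = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
SPRINGS = {(m, n): 1.0 for m in range(4) for n in range(m + 1, 4)}

def max_residual(positions, springs):
    """Largest component of F v over the six rigid motions (zero up to rounding)."""
    F = hessian(positions, springs)
    return max(abs(c) for v in rigid_motions(positions) for c in mat_vec(F, v))
```

For the toy tetrahedron, `max_residual(TETRAHEDRON, SPRINGS)` vanishes up to rounding, whereas a radial 'breathing' displacement stretches every spring and is not annihilated. Replacing the toy input with protein centres of mass and relative bond strengths would give the kind of matrix described above, with eigenvalues equal to the squared frequencies in units set by κ.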
The force matrices $F^{ij}_{mn}$ we consider here have size $3N \times 3N$ with $N = 180$ for RYMV, $N = 420$ for HK97 and $N = 360$ for SV40. Although computers can handle a brute force diagonalisation of such matrices, and provide eigenvalues which are the squares of the sought frequencies of vibration of normal modes, a group theoretical approach considerably reduces the size of the matrices to be diagonalized and, above all, yields information on the distribution of normal modes within irreducible representations of the icosahedral group. This proves to be useful in an analysis of universal features of such vibrations.
We have calculated the lowest frequency modes of vibration for the RYMV, HK97 and SV40 capsids using well-known group theoretical methods. The association energies listed in VIPER for RYMV allow for a stable capsid. Crucial to the stability are the C-arms linking together distant proteins of the C chain in Fig. 3a. The spectrum of the first 40 low frequency modes is presented in Fig. 4a. Apart from the six zero modes associated with the rotations and translations of the capsid as a whole, and which belong to two copies of the irreducible representation $\Gamma^3_+$ of $I$, one notices a cluster of 24 normal modes of very low and similar frequencies organized in a sum of irreducible representations according to $\Gamma^5_+ + \Gamma^{3'}_+ + \Gamma^5_+ + \Gamma^4_+ + \Gamma^{3'}_+ + \Gamma^4_+$. This low plateau is disrupted by a significant jump in wave number.
Fig. 4. Low frequency spectra of (a) RYMV and (b) HK97. Triangular-shape symbols mark modes in 3-dimensional irreducible representations of the icosahedral group, with two distinct triangular symbols distinguishing $\Gamma^3_+$ from $\Gamma^{3'}_+$ (the latter shown as △). Accordingly, the diamond-shape modes belong to 4-dimensional irreducible representations and the pentagon-shape modes to 5-dimensional irreducible representations. The x-axis labels the normal modes while the y-axis gives the wave numbers in cm$^{-1}$ (up to an overall normalisation which cannot be fixed from VIPER data).
A similar analysis was performed for HK97. The association energies listed in VIPER for HK97 allow for a nearly-stable capsid, with nine strictly zero modes instead of the six expected. The 21 subsequent modes have similar frequencies, as can be seen from Fig. 4b. They are organized in the following sum of irreducible representations of $I$: $\Gamma^4_+ + \Gamma^4_+ + \Gamma^5_+ + \Gamma^{3'}_+ + \Gamma^5_+$. If the spurious triplet of zero modes were lifted by the addition of extra bonds in the spring-mass model of HK97, one would again observe a cluster of 24 low frequency normal modes forming a plateau disrupted by a jump of the same scale as that appearing in RYMV. We have observed this phenomenon in a large number of viral capsids, and we will detail our findings in [25].
The SV40 case is particularly interesting because it does not quite fit with the above observations. As mentioned in Section 2, the viral capsid has a near centre of inversion, and one might want to explore the implications of treating the normal mode analysis with a symmetry-corrected 'point-mass' protein distribution. This, however, destabilizes the capsid, as the vertices of some triangular cells in the network become collinear. We will therefore refrain from considering a capsid with a centre of inversion, and perform the normal mode analysis as in the two previous cases (RYMV and HK97). Once more, we have plotted the low frequency spectrum in Fig. 5.
Fig. 5. Low frequency spectrum of SV40 (see Fig. 4 for the symbols).
Apart from the six zero modes associated with the rotations and translations of the capsid as a whole, one could argue that the next 23 modes should be considered as a cluster since their frequencies are very similar. However, the plateau in this case is not disrupted by a spectacular jump in frequency, as the seven subsequent frequencies are only roughly 1.6 times larger than those of the first 23 non-zero modes. Early comparison with Murine Polyomavirus vibrational patterns does not shed light on the significance of the spectra of these all-pentamer viral capsids, and more investigations are needed.

Conclusion
We have discussed the vibration spectrum of icosahedral virus capsids, obtained from a coarse-grained model in which protein chains and their interactions are replaced by a spring-mass model. The goal of this programme is to understand, in a top-down approach, how properties of the capsid structure, such as an approximate inversion symmetry or a particular tiling type, are reflected in the vibrational spectrum. We believe this is a useful complement to existing bottom-up approaches, which are rooted in all-atom computations.
A comparison of our results with the spectra obtained in earlier all-atom computations reveals some interesting similarities. The most striking one is the existence of a low-frequency plateau of 24 modes, separated by a rather large gap from the remainder of the spectrum. This plateau is present for RYMV as well as HK97 and a large number of other virus capsids. It has been seen before in isolated examples [13,14,30], but the simplicity of our model offers a better chance to understand the general reason behind its existence (more details will be provided in [25]).
While Viral Tiling Theory provides a beautiful classification of the structure of virus capsids, its role in understanding the vibration spectra is at present less clear. Besides the bonds which bind together proteins on the same tile, many other bonds are required in order to obtain a stable capsid. These other, inter-tile bonds are often of a similar strength as the bonds on a single tile. In fact, it is an interesting mathematical problem to understand the best network topology (in terms of the optimal number of bonds) required for stability of a capsid.
The present analysis focuses exclusively on the viral capsid, ignoring in particular the interaction of the virion with its environment and the presence of matter within the shell, which are undoubtedly worth considering in more elaborate models. Large-scale simulations have revealed that some virus capsids are unstable without RNA content [21]. It would be interesting to understand this instability, as well as the effect of RNA content, for larger classes of capsids.