Site-specific IR spectroscopy and molecular modelling combined towards solving transmembrane protein structure

Membrane protein structures are underrepresented in structural databases despite their abundance and biomedical importance. This review focuses on the novel method of site-specific infrared dichroism (SSID) combined with constrained molecular dynamics simulation, which has recently emerged as a powerful method to obtain structures of transmembrane α-helical bundles. The theory of SSID, including its latest developments, is reviewed with the aim of encouraging widespread application of this method. This is followed by an outline of the conformational search using experimentally constrained molecular dynamics simulations. Finally, recent applications, namely the Influenza M2 proton channel, the vpu ion channel of HIV-1 and the MHC class II-associated invariant chain, are critically evaluated.


Introduction
Membrane proteins are estimated to comprise about 30% of most genomes and account for 70% of known pharmaceutical drug targets. Despite their importance, at the time of writing only 82 unique membrane protein structures of sufficient resolution to identify secondary structure elements are available [51] in the Protein Structure Database, compared to more than 23,000 structures of soluble proteins (incl. virus capsids). This discrepancy is due to experimental difficulties, i.e. the difficulty of crystallising membrane proteins for X-ray crystallography and the large size of protein/lipid or protein/detergent aggregates, which exceeds the size limit for solution-state NMR spectroscopy. There is clearly a need in our post-genomic era for new methods to gain more structural information about membrane proteins. The most promising novel methods, which have emerged during the last decade, are solid-state NMR spectroscopy and site-specific infrared dichroism in combination with molecular dynamics simulation. Solid-state NMR uses either peptides in highly aligned lipid bilayers on glass slides, in which a single residue is labelled with 15N, or randomly oriented lipid bilayer/peptide samples studied by magic angle spinning, in which two residues are labelled differently, e.g. with 15N and 13C. Solid-state NMR methods have been reviewed elsewhere [11,32]. This review will focus on the method of site-specific infrared dichroism (SSID), which has been developed by Arkin [2,25], in combination with a conformational search protocol developed by Adams using molecular dynamics simulations [1]. Infrared (IR) spectroscopy has been used successfully to determine the secondary structure content of proteins as well as the solvent accessibility of residues by hydrogen/deuterium exchange (reviewed in e.g. [18,19]).
Automatic secondary structure analysis using neural network approaches is emerging as a useful tool for structural proteomics [22,23]. A unique advantage of IR spectroscopy over X-ray crystallography is that protein dynamics can be studied using difference spectroscopy [4]. For membrane proteins in particular, IR spectroscopy is the method of choice, as the lipid environment does not degrade the resolution or sensitivity of the spectra. In fact, the lipid molecules can be analysed in addition to the embedded protein. A typical spectrum of an α-helical transmembrane peptide in dimyristoyl-phosphatidylcholine (DMPC) lipids is shown in Fig. 1. Various absorption bands can be assigned to molecular vibrations as indicated in Fig. 1A. The protein vibrations most commonly analysed are the amide I mode, mainly the C=O stretch vibration, and the amide II mode, mainly an N-H bending mode with a contribution from C-N stretching (Fig. 1B). Membrane protein studies often employ the technique of attenuated total reflection (ATR) spectroscopy, in which lipid samples containing proteins are deposited on a planar surface of a diffraction element. Although ATR-IR spectroscopy has been extensively reviewed in the last decade [9,17,43], we will give a short introduction to ATR, as it is essential for understanding site-specific infrared dichroism (SSID). This is followed by a short treatment of SSID and an overview of molecular dynamics simulation with orientational constraints; finally, applications of these methods to biological examples are reviewed.

Attenuated total reflection IR spectroscopy
In contrast to transmission spectroscopy, attenuated total reflection (ATR) IR spectroscopy measures the absorption of material attached to the surface of a diffraction element, while the beam of radiation is captured in the diffraction element by total internal reflection. Typical experimental setups for membrane proteins use a trapezoidal plate, in which the infrared light beam is captured by multiple total internal reflections (Fig. 2). At the points of total internal reflection the light penetrates a short distance into the adjacent medium as the evanescent field, and an absorption spectrum can be measured. The plate is often made of germanium, but ZnS, ZnSe, KRS-5 or silicon are also used. The geometry shown in Fig. 2 is the one most commonly used for lipid membrane samples, although a variety of ATR accessories including single reflection devices are available. Using a polarizer, absorption spectra can be measured with IR light polarised parallel and perpendicular to the plane of incidence as defined in Fig. 2. For each particular absorption band a dichroic ratio R can be calculated as the ratio of the absorption at parallel polarisation $A_\parallel$ to that at perpendicular polarisation $A_\perp$:
$$R = \frac{A_\parallel}{A_\perp} \tag{1}$$
The dichroic ratio is related to the order parameter S [16]:
$$S = \frac{E_x^2 - R E_y^2 + E_z^2}{\tfrac{1}{2}\left(3\cos^2\alpha - 1\right)\left(E_x^2 - R E_y^2 - 2E_z^2\right)} \tag{2}$$
whereby α is the angle between the molecular director and the transition dipole moment of the particular vibration mode and $E_x$, $E_y$, $E_z$ are the electric field components of the evanescent field given by Harrick [21], assuming that the thickness of the deposited film is much larger than the penetration depth of the evanescent field (ca. 1 µm for Ge):
$$E_x = \frac{2\cos\phi\,\sqrt{\sin^2\phi - n_{21}^2}}{\sqrt{1 - n_{21}^2}\,\sqrt{(1 + n_{21}^2)\sin^2\phi - n_{21}^2}}, \quad E_y = \frac{2\cos\phi}{\sqrt{1 - n_{21}^2}}, \quad E_z = \frac{2\sin\phi\cos\phi}{\sqrt{1 - n_{21}^2}\,\sqrt{(1 + n_{21}^2)\sin^2\phi - n_{21}^2}} \tag{3}$$
where $n_{21} = n_1/n_2$ is the ratio of the refractive indices of the sample ($n_1$ = 1.43 for a lipid bilayer) and the diffraction element ($n_2$ = 4.0 for germanium). The angle of incidence ϕ between the diffraction element and the infrared beam is typically set at 45°. The angle α between the molecular director and the transition dipole moment is 90° for the symmetric and antisymmetric stretch vibrations of aliphatic lipid chains and also for the C=O stretch vibrations of the lipid ester group [15]. The amide I transition dipole moment is oriented at α = 38° to 39° [35,49], although lower values have been reported.
In the case of transmembrane helical peptides the order parameter S is defined as follows:
$$S = \frac{3\cos^2\beta - 1}{2} \tag{4}$$
whereby β is the angle between the helix and the z-axis, which coincides with the membrane normal. In conventional analysis of ATR spectra Eq. (4) is used to calculate the helix tilt angle against the membrane normal. However, it should be stressed, as pointed out by Arkin et al. [2], that the helix tilt obtained in this way is only a maximum value, since it assumes that the protein sample is completely ordered. The actual helix tilt can range from 0° to β depending on the sample order.
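As a hedged numerical illustration of the ATR relations in this section, the following Python sketch evaluates the thick-film evanescent field amplitudes for the values quoted in the text (n1 = 1.43, n2 = 4.0, ϕ = 45°) and converts between a dichroic ratio, an order parameter and a maximum tilt angle. The function names are illustrative assumptions, and the expressions are the standard uniaxial thick-film forms from the ATR literature; they may differ in layout from the equations printed in the original article.

```python
import math

def evanescent_field(n1=1.43, n2=4.0, phi_deg=45.0):
    """Thick-film evanescent field amplitudes (Ex, Ey, Ez), Harrick-type expressions."""
    n21 = n1 / n2
    phi = math.radians(phi_deg)
    sin2 = math.sin(phi) ** 2
    a = math.sqrt(1.0 - n21 ** 2)
    b = math.sqrt((1.0 + n21 ** 2) * sin2 - n21 ** 2)
    ex = 2.0 * math.cos(phi) * math.sqrt(sin2 - n21 ** 2) / (a * b)
    ey = 2.0 * math.cos(phi) / a
    ez = 2.0 * math.sin(phi) * math.cos(phi) / (a * b)
    return ex, ey, ez

def dichroic_ratio(S, alpha_deg, field):
    """Forward model: dichroic ratio R for order parameter S and dipole angle alpha."""
    ex, ey, ez = field
    p = S * (3.0 * math.cos(math.radians(alpha_deg)) ** 2 - 1.0) / 2.0
    return (ex ** 2 * (1.0 - p) + ez ** 2 * (1.0 + 2.0 * p)) / (ey ** 2 * (1.0 - p))

def order_parameter(R, alpha_deg, field):
    """Inverse relation: order parameter S from a measured dichroic ratio R."""
    ex, ey, ez = field
    p2_alpha = (3.0 * math.cos(math.radians(alpha_deg)) ** 2 - 1.0) / 2.0
    numerator = ex ** 2 - R * ey ** 2 + ez ** 2
    denominator = p2_alpha * (ex ** 2 - R * ey ** 2 - 2.0 * ez ** 2)
    return numerator / denominator

def max_tilt_deg(S):
    """Maximum helix tilt for a given S, assuming a perfectly ordered sample."""
    return math.degrees(math.acos(math.sqrt((2.0 * S + 1.0) / 3.0)))

field = evanescent_field()
# Hypothetical example: a fully ordered helix tilted by 30 degrees has S = 0.625.
R_example = dichroic_ratio(0.625, 39.0, field)
S_back = order_parameter(R_example, 39.0, field)
print(field, R_example, S_back, max_tilt_deg(S_back))
```

With these inputs an isotropic sample (S = 0) gives R = 2 at 45° incidence, which is a convenient sanity check on any implementation.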

Site-specific infrared dichroism
Site-specific infrared dichroism (SSID) exploits the fact that the frequency of molecular vibrations is sensitive to the mass of the participating atoms. Thus introducing an amino acid residue in which the carbonyl group is isotopically labelled, e.g. 13C=16O, results in a shift of the corresponding absorption frequency for that residue. In the case of a peptide composed of the transmembrane region of glycophorin A, a shift of 44 cm−1 towards lower wave numbers from the carbon-12 amide I absorption band located at 1657 cm−1 has been reported [2]. Recently a number of other labels have been developed, namely the 13C=18O carbonyl group, which results in a shift of 64 cm−1 towards lower wave numbers [46], the doubly deuterated glycine [45] and the deuterated methyl group -CD3 of alanine [47]. The deuterated labels have the advantage that the symmetric and antisymmetric stretch vibrations can be analysed separately, thus requiring only one label to deduce helix tilt and orientation in the presence of sample disorder. Another advantage is that the vibrations of the deuterated labels are not coupled to non-labelled groups, while the vibration of carbonyl group labels might couple to the unlabelled peptide backbone carbonyl groups, thus reducing the accuracy of the derived geometrical parameters. However, the 13C=16O and 13C=18O labels have clearly visible bands in the absorption spectrum, while the absorption bands of deuterated labels are difficult to detect. That is the reason why most applications to date have used the carbonyl group labels. In particular, the oxygen-18 containing carbonyl bond has an absorption band in a transparent region of the spectrum, while the 13C=16O band appears as a shoulder of the unlabelled amide I peak (Fig. 1B), so that mathematical band sharpening techniques have to be used to resolve it. The theory of SSID was originally developed by Arkin et al. [2], modelling the sample order as a fraction f of the sample that is completely ordered, while the rest of the sample (1 − f) is at random orientation. Recently this theory has been developed further with regard to sample disorder, assuming a more realistic Gaussian distribution of a peptide α-helix around a particular tilt angle [25].
Fig. 3. Definition of orientational parameters of a transition dipole moment P in an α-helix. The helix is tilted against the z-axis by the angle β, the transition dipole moment's rotational orientation with respect to the helix axis is given by the rotational pitch angle ω, while the angle between the helix axis and the transition dipole moment is given by α. The angle θ between the transition dipole moment and the z-axis is a function of ω and β at constant α; this parameter is used as orientational constraint in molecular dynamics simulations.
The theory of SSID establishes a relationship between the measured dichroic ratio R defined in Eq. (1), the geometry of a polypeptide α-helix as defined in Fig. 3 and the orientation of a particular transition dipole moment vector P. The α-helix is described by a helix tilt angle β and the distribution around this tilt angle, characterised by the standard deviation σ. The orientation of the particular transition dipole moment of the labelled site is described in addition by the rotational pitch angle ω (Fig. 3), which is arbitrarily defined as zero when the transition dipole moment lies in the same plane as the helix director and the z-axis. The measured dichroic ratio of the unlabelled helix $R_{Helix_1}$ for a particular sample 1 can then be expressed in functional form as (for detailed equations see Appendix):
$$R_{Helix_1} = f(\beta, \sigma_1) \tag{5a}$$
while the dichroic ratio of the site $R_{site_1}$ is additionally dependent on the rotational pitch angle ω:
$$R_{site_1} = f(\beta, \sigma_1, \omega) \tag{5b}$$
If the adjacent amino acid residue in the sequence is labelled, the rotational pitch angle of this residue is given by ω + 100° for a standard α-helix [39], thus the dichroic ratios for a second peptide with a label in an adjacent position are given by:
$$R_{Helix_2} = f(\beta, \sigma_2) \tag{6a}$$
$$R_{site_2} = f(\beta, \sigma_2, \omega + 100^\circ) \tag{6b}$$
Equations (5a)-(6b) form a nonlinear system of four equations with four unknowns, β, ω, σ1 and σ2, which can be solved with standard mathematical techniques. Equations (5) and (6) correspond to Eq. (A.5) and (A.2) in the Appendix. This analysis, which takes into account a Gaussian distribution of the helix tilt, described by σ, around a mean tilt angle, presents a significant advantage over the earlier method, which assumed that a fraction of the sample was ordered while the rest was completely disordered. The standard deviation σ yields a new experimental parameter, which can be related to the mosaicity of the membrane preparation and thermal fluctuations of the transmembrane helices [25]. However, the experimental applications so far have made use of the fractional sample order model.
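The bookkeeping of rotational pitch angles between labelled residues can be sketched in a few lines of Python. This is only an illustration of the 100° per residue rule for a standard α-helix quoted above; the function name and the example value of ω are hypothetical.

```python
def pitch_angle_deg(omega_ref_deg, residue_offset, per_residue_deg=100.0):
    """Rotational pitch angle of the residue `residue_offset` positions after the
    reference residue, wrapped into the interval [0, 360)."""
    return (omega_ref_deg + residue_offset * per_residue_deg) % 360.0

# Hypothetical example: first label at omega = 300 degrees, second label on the
# adjacent residue of an otherwise identical peptide.
omega_label_1 = 300.0
omega_label_2 = pitch_angle_deg(omega_label_1, 1)
print(omega_label_1, omega_label_2)
```

The same helper shows why residues seven positions apart point in nearly the same direction: seven steps of 100° amount to 700°, which wraps to 340°, i.e. 20° short of two full turns.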

Conformational search and constraint molecular dynamics simulations
Because of the necessity to satisfy hydrogen bonding requirements in the apolar lipid environment, the transmembrane part of proteins has a well-defined secondary structure. Transmembrane protein structures can be classified into β-barrel and helical bundle structures [51]. So far SSID has been applied to helical bundle structures, which occur frequently in animals as well as in lipid-enveloped animal viruses. Small bitopic transmembrane proteins forming oligomerising α-helical bundles are conceptually well-defined structures and can be described by three structural parameters, the helix tilt β, the helix crossing angle Ω and the helix rotation φ (Fig. 4), which defines the sides of the helices interacting with each other [3]. This leads to a relatively straightforward conformational search algorithm [1], in which a symmetrical helical bundle (as shown in Fig. 4 for a dimer) is generated and a rotation is applied to all helices simultaneously between φ = 0° and φ = 360°. The crossing angle is set at Ω = 25° for left-handed helical bundles and at Ω = −25° for right-handed bundles. The helix rotation is typically varied in steps of 10°, giving rise to 36 × 2 = 72 structures. In molecular dynamics simulations the initial atom velocities can determine the outcome of the simulation; for that reason each structure should be simulated several times with different, randomly assigned atom velocities. Four simulations with different random velocities are commonly used, giving rise to 72 × 4 = 288 structures, which are subjected to a simulated annealing molecular dynamics simulation and energy minimisation protocol in vacuo. Clusters of similar structures are identified on the basis of the root mean square deviation of the peptide backbone coordinates not being larger than a value in the range of 1.0 to 2.0 Å. For each cluster an average structure is calculated, energy minimised and subjected to the same simulated annealing/energy minimisation protocol used in the conformational search.
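The size of the search grid described above can be reproduced with a short Python sketch. It only enumerates the starting conditions (helix rotation, crossing angle, velocity seed); the dictionary keys and function name are illustrative assumptions, and no simulation package is invoked.

```python
from itertools import product

def starting_structures(step_deg=10, crossing_angles_deg=(25.0, -25.0),
                        n_velocity_seeds=4):
    """Enumerate the starting conditions of the global conformational search."""
    rotations = range(0, 360, step_deg)  # helix rotation: 0, 10, ..., 350 degrees
    return [
        {"rotation_deg": phi, "crossing_deg": omega, "velocity_seed": seed}
        for phi, omega, seed in product(rotations, crossing_angles_deg,
                                        range(n_velocity_seeds))
    ]

runs = starting_structures()
# Distinct bundle geometries, ignoring the random velocity assignment.
geometries = {(r["rotation_deg"], r["crossing_deg"]) for r in runs}
print(len(geometries), len(runs))
```

Each entry would then be passed to the simulated annealing and energy minimisation protocol of whichever MD package is used, followed by RMSD-based clustering of the resulting structures.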
Orientational data from SSID are incorporated into the force field for molecular dynamics (MD) simulations and energy minimisation as additional energy terms of the form:
$$E = k_{dichro}\,(\delta - \delta_0)^2 \tag{7}$$
where δ represents the actual angle, δ0 the target angle and $k_{dichro}$ the force constant (cf. Fig. 6). The helix tilt angle and the angle between the C=O bond and the z-axis for all labelled residues are constrained in this way. Constraints between pairs of atoms and external axes are not standard in most MD simulation programs, but energy terms similar to Eq. (7) have been incorporated into the simulation software GROMACS from version 3.1 onwards [7,31].
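A minimal Python sketch of such an orientational energy term is given below, assuming a simple harmonic penalty on the angle between a C=O bond and the z-axis. The force constant of 800 is borrowed from the Fig. 6 caption; treating the angle difference in radians, the coordinates and the function names are assumptions made for illustration and do not reproduce the implementation of any particular MD package.

```python
import math

def angle_to_z_deg(atom_c, atom_o):
    """Angle (degrees) between the C->O bond vector and the z-axis (membrane normal)."""
    dx, dy, dz = (o - c for c, o in zip(atom_c, atom_o))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    cos_delta = max(-1.0, min(1.0, dz / length))  # guard against rounding errors
    return math.degrees(math.acos(cos_delta))

def restraint_energy(delta_deg, delta0_deg, k=800.0):
    """Harmonic penalty k * (delta - delta0)**2 with the difference in radians (assumed)."""
    diff = math.radians(delta_deg - delta0_deg)
    return k * diff * diff

# Hypothetical carbonyl coordinates in Angstrom; target angle of 21 degrees.
delta = angle_to_z_deg((0.0, 0.0, 0.0), (0.5, 0.0, 1.1))
print(delta, restraint_energy(delta, 21.0))
```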

Biological applications
So far six different applications of SSID combined with MD simulations have been reported, leading to transmembrane structures of several biomedically important systems, namely a structure for the M2 proton channel from the Influenza A virus [26], a structure for the transmembrane domain of vpu from the HIV-1 virus [27], the structure of the CM2 transmembrane domain from Influenza C virus [28], a structural model of human phospholamban [44], a structure of the transmembrane domain of human CD3-ζ [48] and a structure of the human MHC class II-associated invariant chain [29]. In the following paragraphs three of the applications will be reviewed in greater detail, because each introduces a novel aspect of SSID application.

The M2 proton channel from Influenza A virus
M2 is a 97-amino-acid bitopic transmembrane protein located in the lipid envelope of the Influenza A virus. M2 forms a tetrameric proton channel and participates in the virus uncoating process after virus uptake by endocytosis as well as in the budding of newly made virus particles in later stages (reviewed in [30]). The method of SSID has been applied to a synthetic peptide containing the transmembrane domain, with the sequence SSDPLVVAASIIGILHLILWILDRL. Two different peptides have been synthesised, each with one 13C=16O labelled alanine residue, at Ala29 and Ala30 respectively. The SSID analysis gives a helix tilt of β = 32°, and the rotational pitch angle about the helix axis for Ala29 is ω = −60°. A global conformational search has been carried out both with and without orientational energy refinement terms. The results of this conformational search are striking with respect to the energy distribution of individual structures. In the absence of orientational energy refinement terms all structures of the conformational search show little difference in energy, whereas with energy refinement terms the structures separate into higher and lower energy structures, and one structure has a significantly lower energy than all others; this structure is also generated from the largest number of substructures forming a cluster of similar structures. The structure is shown in Fig. 5A with the residues Ala29, Ser31 and His37 highlighted. It is instructive to compare this model with the M2 structure determined by solid-state NMR shown in Fig. 5B. In that work a variety of sites were labelled with 15N at their amide group and orientational data were obtained by solid-state NMR of highly aligned samples [50]. In addition, the distance between 15Nπ-labelled His37 and 13Cγ-labelled Trp41 was determined by magic angle spinning of a randomly oriented sample [37]. This structure is fully experimentally defined, including the orientations of the side chains of His37 and Trp41, while other side chain orientations have been taken from a rotamer library. It can be seen that the orientation of the highlighted residues in Fig. 5 is similar in both structures, defining the interacting residues of the helices and the channel-lining residues. However, the SSID model (Fig. 5A) shows a stronger coiled-coil structure, implying strong interaction of the helices throughout their length. This is clearly an artifact of the in vacuo simulation procedure, in which a constant interhelical distance restraint of 10.5 Å has been applied to prevent the bundle from falling apart. In conclusion, the similarity of the SSID model, based on labelling of only two sites, to the NMR structure is convincing, yet the in vacuo modelling does not adequately approximate the structure in a native lipid bilayer. A conformational search in a lipid bilayer/water environment would improve the model.

The vpu protein from HIV-1 virus
The 81-residue vpu protein belongs to the auxiliary proteins of the human immunodeficiency virus type 1 (HIV-1) [10,42]. The bitopic transmembrane protein contains a single membrane-spanning segment and forms homooligomers of at least four subunits according to SDS gel electrophoresis studies [33]. The short, 5-residue N-terminal domain is responsible for virus particle release and the C-terminal cytoplasmic domain causes the degradation of one of the HIV-1 coreceptor molecules, CD4 [40]. The transmembrane domain has been studied independently, and ion channel activity for monovalent cations has been observed in Xenopus oocytes and in planar lipid bilayers [13,41]. The SSID analysis has been applied to a peptide of the sequence MQPIQIAIVALVVAIIIAIVVWSIVIIEYRK, using two peptides, one containing a 13C=16O double label at positions Val13 and Val20, the other at positions Ala14 and Val21. The rationale for double labelling is that in a standard α-helix two residues seven positions apart have approximately the same rotational pitch angle (7 × 100° = 700°, i.e. ω_{i+7} = ω_i − 20° modulo 360°), thus increasing the signal of the 13C=16O absorption band with respect to the background at some sacrifice of accuracy. SSID analysis yields a helix tilt of β = 6.5°, and the rotational pitch angle for the label pair Val13/Val20 is ω = 283°. The conformational search applying orientational constraints given in Eq. (7) does not lead to a distinction of structures in terms of energy. By inserting Eq. (A.13) into Eq. (7) it can be shown that the energetic discrimination between different rotational pitch angles is weak at low tilt angles (Eq. (8)). Plotting the energy against the rotational pitch angle ω (Fig. 6) confirms that for low helix tilt angles the energetic discrimination between different rotational pitch angles is very weak. However, conducting the conformational search and analysing the resulting structures for the rotational pitch angle yields a unique structure with ω close to the experimental data. As the oligomerisation number of vpu was unknown, simulations for tetrameric, pentameric and hexameric bundles have been performed, and only the pentameric assembly gave a structure whose rotational pitch angle was close to the experimental data. This structure, shown in Fig. 7A, reveals a slightly tilted left-handed coiled coil. Interestingly, the potential ion channel pore is occluded by Trp22, while the only hydrophilic residue, Ser23, points to the lipid phase and its hydroxyl group is hydrogen-bonded to the carbonyl backbone atoms.
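The weak energetic discrimination at low tilt can be illustrated with a small Python sketch. It assumes the geometric relation cos θ = cos α cos β − sin α sin β cos ω for the angle θ between the labelled bond and the z-axis (the sign convention for ω is an assumption), a harmonic energy of the form of Eq. (7), α = 39°, and the values k = 800 and θ0 = 21° quoted in the Fig. 6 caption; treating the angle in radians is also an assumption.

```python
import math

def theta_deg(beta_deg, omega_deg, alpha_deg=39.0):
    """Angle between the labelled bond and the z-axis for tilt beta and pitch omega."""
    a, b, w = (math.radians(x) for x in (alpha_deg, beta_deg, omega_deg))
    cos_theta = math.cos(a) * math.cos(b) - math.sin(a) * math.sin(b) * math.cos(w)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def refinement_energy(beta_deg, omega_deg, theta0_deg=21.0, k=800.0):
    """Harmonic refinement energy k * (theta - theta0)**2, difference in radians."""
    diff = math.radians(theta_deg(beta_deg, omega_deg) - theta0_deg)
    return k * diff * diff

def energy_range(beta_deg, step_deg=10):
    """Spread of the refinement energy over a full turn of the pitch angle omega."""
    energies = [refinement_energy(beta_deg, w) for w in range(0, 360, step_deg)]
    return max(energies) - min(energies)

# Compare the vpu tilt (6.5 degrees) with the M2 tilt (32 degrees).
for beta in (6.5, 32.0):
    print(beta, round(energy_range(beta), 1))
```

Under these assumptions θ varies by only 2β as ω goes through a full turn, so a nearly upright helix leaves the restraint almost blind to helix rotation, whereas a strongly tilted helix is sharply discriminated.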
The pentameric oligomerisation has recently been confirmed by an intriguing experiment in which vpu transmembrane peptides were chemically linked to a carrier template, assembling a tetramer or pentamer of transmembrane helices [6]. Ion channel measurements in planar lipid bilayers led to the conclusion that the native state corresponds to a pentameric assembly. However, a recent solid-state NMR study utilising uniformly 15N-labelled peptides of residues 2-30 with a C-terminal GGKKKK attachment to aid solubility reports a helix tilt angle of 12° for residues 8-16 and 15° for residues 17-25, with a slight kink located at Ile17 [38]. Furthermore, the helix rotation is markedly different from the model proposed on the basis of SSID data; the residue Trp22 points inside the pore in the SSID model but into the lipid phase in the NMR structure (Fig. 7B). Several reasons could account for this discrepancy. First, the sequences studied differ: the NMR construct contains a C-terminal attachment not found in naturally occurring viral strains as well as a Tyr29-Gly mutation. Second, different lipid systems were used: dimyristoylphosphatidylcholine (DMPC) for the SSID study and a mixture of dioleoylphosphatidylcholine (DOPC)/dioleoylphosphatidylglycerol (DOPG) for the NMR study. Unsaturated chains adopt different conformations than the saturated aliphatic chains in DMPC and could thus favour a different structure of the embedded peptide. Furthermore, strong electrostatic interactions between the negatively charged DOPG lipids and the C-terminal GGKKKK attachment can be expected. The small kink in the helix at Ile17, which lies between two 13C=O labels, should not distort the SSID analysis to the extent of giving a structure with a completely opposite helix orientation and a different tilt angle.

The MHC class II-associated invariant chain
The class II major histocompatibility complex (MHC) plays an important role in the human immune defence system, presenting peptide fragments of intruding organisms at the surface of B-cells. The MHC class II-associated invariant chain (Ii) is involved in the pathway of MHC maturation and peptide loading [8]. The 279-residue Ii contains a single transmembrane domain and associates into trimers after biosynthesis [34]. The trimeric complex serves as a scaffold for the binding of three MHC class II αβ heterodimers, forming an (αβ)3Ii3 complex, which is exported through the endocytic pathway. In later stages Ii is progressively degraded and the remaining fragments are finally released from MHC until the loading with antigenic peptides occurs [12]. With a truncated version of Ii containing the first 80 residues it has been shown that a patch of hydrophilic residues in the transmembrane domain is important for trimerisation [5].
The transmembrane domain has been modelled with a peptide containing residues 29-60 of Ii with the sequence RGALYTGVSVLVALLLAGQATTAYFLYQQQGR, each underlined residue containing a 1-13C=18O label, amounting to 10 different peptides. This represents the first example in which the peptide has been labelled over the whole transmembrane region [29]. The infrared spectroscopic data confirmed the predicted α-helical structure; in particular, the position of the 13C=18O absorption bands indicated that each labelled residue is in an α-helical environment. The results of the SSID analysis are presented in Table 1. There is considerable variation in the local helix tilt β, ranging from 6° to 25° with an average of 13°; this indicates that the helix is not a straight cylinder, but shows local deviations from the ideal α-helix geometry. The structure resulting from constrained MD simulations (Fig. 8) forms a left-handed coiled coil; the residue Gln47, implicated in trimer formation, forms strong interhelical contacts. Thr50 points to the inside of the trimeric coil and forms a network of hydrogen bonds, while Tyr33 and Leu43 are also involved in stabilising the trimer structure.

Conclusions and outlook
It has been shown that SSID can make a significant contribution to the structural elucidation of transmembrane proteins. The applications so far cover bitopic oligomerising transmembrane proteins, rather than larger proteins with multiple membrane-spanning domains. Given that peptides have been used successfully to dissect the structural domains of voltage-gated ion channels, e.g. [20], applications to larger integral transmembrane proteins are expected. The site-specific labels used so far have been amino acids labelled at their carbonyl groups with 13C=16O or 13C=18O. The 13C=16O label is limited in its application because of the 1% natural abundance of 13C; in a protein of 100 amino acid residues half of the signal would be expected to originate from naturally occurring 13C carbonyl groups at random positions. The 13C=18O label, whose band is not affected by this background, therefore enables us to study proteins of any size. The same applies to the CD2-labelled glycine and the -CD3-labelled alanine, which are also predicted to find increased use in SSID applications. Recently the theory of SSID has been developed towards the analysis of β-sheet proteins [36], and we expect applications to expand into the field of transmembrane β-sheet proteins.
Another area for improvement is the constrained molecular dynamics simulations. It has been discussed above that the in vacuo simulations impose significant limitations on the final structure, in that helix distance constraints as well as constraints to maintain helix geometry have to be applied. Furthermore, the delicate interactions between the protein and lipid molecules with hydrophilic headgroups and hydrophobic acyl chains cannot be modelled in the framework of an in vacuo approach. It is common practice to simulate transmembrane peptides in lipid bilayers, e.g. [14], and we predict that with increasing computer power and the resolution of methodological difficulties the global conformational search outlined above will be conducted in a lipid bilayer system with water and ions added.
The ongoing work in this field has been supported by the Royal Society (grant no. 22605) and by the BBSRC (grant no. 88/B19450).
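The natural-abundance argument can be made concrete with a short back-of-the-envelope calculation in Python. The sketch assumes a fully enriched single label, equal band intensity per carbonyl group, and a 13C natural abundance of about 1.1%; the function name is illustrative.

```python
def label_signal_fraction(n_residues, natural_abundance=0.011, enrichment=1.0):
    """Fraction of the 13C=16O band that stems from the single labelled site.

    The remaining (n_residues - 1) backbone carbonyls each contribute with the
    natural 13C abundance and sit at random positions in the sequence.
    """
    background = natural_abundance * (n_residues - 1)
    return enrichment / (enrichment + background)

for n in (25, 100, 300):
    print(n, round(label_signal_fraction(n), 2))
```

For a 25-residue peptide roughly four fifths of the band comes from the label, for 100 residues only about half, and for larger proteins the background dominates, which is the limitation described above.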

Appendix
This section gives the relevant equations for the application of SSID; for a detailed mathematical treatment the reader is referred to [2,25].
In the ATR configuration the dichroic ratio R can be expressed in terms of the electric field components $E_x$, $E_y$, $E_z$ and the integrated absorption coefficients K (Eq. (A.1)). For the site-specific dichroic ratio the integrated absorption coefficients depend on the rotational pitch angle ω of the labelled site (Eq. (A.2)). The angular brackets denote averaging over all possible tilt angles using a probability distribution function F(β) (Eq. (A.3)). The probability F(β) of finding a helix with tilt angle β is given by a Gaussian distribution with mean tilt angle µ and standard deviation σ (Eq. (A.4)). In the α-helix the non-labelled transition dipole moments are distributed around the helix, therefore the integrated absorption coefficients are not dependent on the rotational pitch angle ω; the dichroic ratio is then given by Eq. (A.5). The averaged absorption coefficients are again obtained by integrating over all possible tilt angles (Eq. (A.6)).
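For orientation, the relations described in this Appendix generally take the shape sketched below. This is a hedged outline built from the verbal definitions given here and from the standard ATR expressions, not a transcription of the original equations; the explicit forms of the integrated absorption coefficients K, the normalisation and the integration limits are given in [2,25] and are deliberately left open.

```latex
% Dichroic ratio from field components and integrated absorption coefficients
R = \frac{E_x^2 K_x + E_z^2 K_z}{E_y^2 K_y}

% Site-specific dichroic ratio with tilt-averaged, pitch-dependent coefficients
R_{site} = \frac{E_x^2 \langle K_x(\omega) \rangle + E_z^2 \langle K_z(\omega) \rangle}
                {E_y^2 \langle K_y(\omega) \rangle}

% Gaussian distribution of the helix tilt around the mean tilt angle \mu
F(\beta) = \frac{1}{\sigma \sqrt{2\pi}}
           \exp\!\left( -\frac{(\beta - \mu)^2}{2\sigma^2} \right)

% Helix dichroic ratio: same structure, coefficients independent of \omega
R_{Helix} = \frac{E_x^2 \langle K_x \rangle + E_z^2 \langle K_z \rangle}
                 {E_y^2 \langle K_y \rangle}
```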

Fig. 1. (A) Infrared spectrum of a transmembrane peptide derived from the Influenza M2 protein reconstituted in dimyristoyl-phosphatidylcholine lipids, with characteristic absorption bands indicated. (B) The amide I and amide II region expanded. The amide I peak exhibits a small shoulder due to 13C labelling.

Fig. 2. Schematic drawing showing a trapezoidal ATR plate for multiple reflection ATR. The electric field vectors of the incident infrared beam are shown for parallel polarisation and perpendicular polarisation. The grey pattern denotes the sample deposited on top of the ATR plate.

Fig. 4. Structural parameters of a transmembrane helical bundle: the helix tilt angle β, helix rotation φ and the helix crossing angle Ω. Please note that only for a dimer as shown here the helix crossing angle is twice the helix tilt angle.

Fig. 5. Structure of the M2 transmembrane domain. Ala29, Ser31 and His37 are highlighted. (A) The structure of the model generated by constrained MD simulation based on SSID data [26]. (B) The structure determined by NMR [37]. The figure was created with VMD [24].

Fig. 6. The refinement energy E for the angle between the carbonyl bond and the z-axis as a function of the rotational pitch angle ω for different helix tilt angles β. Calculated from Eq. (8), setting $k_{dichro}$ = 800 kcal/grad² and θ0 = 21°.

Table 1
Rotational pitch angle ω and local helix tilt β for each 13C=18O labelled Ii transmembrane peptide