Protein structure characterization with mass spectrometry

Mass spectrometry is now commonly used to determine both the primary and higher order structures of proteins. The basis for these investigations lies in the ability of mass analysis techniques to detect changes in protein conformation under differing conditions. These experiments can be conducted on proteins alone (with no modifying substance present) or in combination with proteolytic digestion or chemical modification. In addition to primary structure determination, proteases and chemical modification have long been used as probes of higher order structure, an approach that has recently been rejuvenated with the emergence of highly sensitive and accurate mass analysis techniques. Here, we review the application of proteases as probes of native structure and illustrate key concepts in the combined use of proteolysis, chemical modification, and mass spectrometry. For example, protein mass maps have been used to probe the structure of a protein/protein complex in solution (the cell cycle regulatory proteins p21 and Cdk2). This approach was also used to study the protein/protein complexes that comprise viral capsids, including those of the common cold virus, where, in addition to structural information, protein mass mapping revealed mobile features of the viral proteins. Protein mass mapping clearly has broad utility in protein identification and profiling, yet its accuracy and sensitivity are also allowing for further exploration of protein structure and even structural dynamics.


Introduction
Protein mass mapping combines enzymatic digestion, mass spectrometry, and computer-facilitated data analysis to produce and examine proteolytic fragments. This is done for the purpose of identifying proteins and, more recently, for obtaining information regarding protein structure. For protein identification, sequence-specific proteases are incubated with the protein of interest and mass analysis is performed on the resulting peptides. The observed pattern of peptide masses is then compared with the patterns predicted for all proteins within a database, and the matches are statistically evaluated. Similarly, the higher order structure of a protein can be evaluated when mass mapping techniques are combined with limited proteolytic digestion.
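The database-matching step can be illustrated with a minimal Python sketch. It assumes the simplified trypsin rule (cleave after K or R unless the next residue is P), standard monoisotopic residue masses, observed values already converted to neutral monoisotopic peptide masses (for MALDI [M+H]+ ions, subtract one proton mass first), and a naive score that simply counts matched peaks; real search engines use probabilistic scoring. The protein names and sequences in the usage example are hypothetical.

```python
# Minimal peptide-mass-fingerprint sketch (simplified; not a production search engine).

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(seq):
    """Complete digest: cleave C-terminal to K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, seq, tol=0.02):
    """Number of observed masses matching any predicted tryptic peptide within tol (Da)."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(seq)]
    return sum(any(abs(o - p) <= tol for p in predicted) for o in observed)


def rank_database(observed, database, tol=0.02):
    """Score every database entry; return (name, score) pairs, best first."""
    scores = {name: count_matches(observed, seq, tol) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-entry "database" and a spectrum simulated from protA.
    database = {"protA": "GGKAARPLLK", "protB": "MMKWWRSTEK"}
    observed = [peptide_mass(p) for p in tryptic_peptides("GGKAARPLLK")]
    print(rank_database(observed, database))
```

Note that the K-P and R-P exception is why "GGKAARPLLK" yields only two peptides: the R is followed by P and is therefore not cleaved.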
The sequence specificity of the proteolytic enzyme plays a major role in the application of mass spectrometry to protein structure. A sequence-specific protease reduces the number of fragments that are produced and, concomitantly, both improves the likelihood of statistically significant matches between observed and predicted fragment masses and reduces the opportunities for spurious matches. Another factor, the accessibility/flexibility of the site to the protease, also plays an important role in the analysis of structure. Ideally, only a subset of all possible cleavages is observed, because higher order protein structure renders some sites inaccessible and/or inflexible. An example is illustrated in Fig. 1, where arrows mark potential cleavage sites that are exposed on the surface of a hypothetical protein. Since amino acids with hydrophilic side chains are found in greater abundance on the surface of proteins (at the solvent interface), proteases that cleave at hydrophilic sites are preferred in structural analysis. Trypsin and V8 protease, which cleave at basic (K, R) and acidic (D, E) sites, respectively, are good choices. Reaction conditions must be controlled to produce only limited proteolysis so that the cleavage pattern reliably reports on native tertiary structure. The cleavage of a single peptide bond can destabilize protein structure, causing local structural changes or even global unfolding; any subsequent cleavages would then no longer report on the native structure.
Fig. 1. Illustration of the use of limited proteolytic cleavage as a probe of protein structure. The arrows mark surface-accessible loops that would be susceptible to proteolytic cleavage. If a sequence-specific protease were used, the marked sites would also have to contain the protease recognition sequence to sustain cleavage. Mass analysis of all proteolytic fragments together yields the cleavage "map" that provides information on structure.
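The two requirements stated in the Fig. 1 caption, a recognition sequence and surface accessibility, can be combined in a short Python sketch. The cleavage rules are simplified assumptions: trypsin after K or R but not before P, and V8 protease after D or E as stated above (in practice V8 specificity depends on buffer conditions, and the proline rule is not applied to V8 here). The set of accessible residues is a hypothetical input, for example taken from the loops of a structural model.

```python
# Simplified cleavage rules: (residues cleaved after, blocked if the next residue is P).
PROTEASE_RULES = {
    "trypsin": ("KR", True),
    "V8": ("DE", False),  # assumption: no proline rule applied for V8 in this sketch
}


def protease_sites(seq, protease):
    """0-based indices i such that the peptide bond after seq[i] matches the rule."""
    residues, blocked_by_pro = PROTEASE_RULES[protease]
    sites = []
    for i in range(len(seq) - 1):  # the last residue has no following peptide bond
        if seq[i] in residues and not (blocked_by_pro and seq[i + 1] == "P"):
            sites.append(i)
    return sites


def expected_limited_cleavages(seq, protease, accessible):
    """Sites that carry the recognition residue AND lie in an accessible region."""
    return [i for i in protease_sites(seq, protease) if i in accessible]


if __name__ == "__main__":
    seq = "MAKDELRKPAGEV"          # hypothetical sequence
    accessible = {5, 6, 7}        # hypothetical surface loop (0-based residue indices)
    print(protease_sites(seq, "trypsin"), protease_sites(seq, "V8"))
    print(expected_limited_cleavages(seq, "trypsin", accessible))
```

Only the intersection of the two sets is expected to be cleaved under limited conditions, which is why a sequence-specific protease reports on accessibility only at its own recognition sites.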
Protein mass mapping can also be used to probe the quaternary structure of multicomponent assemblies such as protein-protein complexes, protein-DNA complexes, and even intact viruses [3-5]. The first application of limited proteolysis and MALDI mass spectrometry to the study of a multicomponent biomolecular assembly was performed by Chait and co-workers in 1995 [6]. In their studies, this combined approach was used to analyze the structure of the transcription factor Max, both free in solution and when bound to an oligonucleotide containing its specific DNA binding site. The common feature in analyzing either protein-protein or protein-DNA complexes is that the protease provides a contrast between the associated and unassociated states of the system. The formation of an interface between a protein and another macromolecule will protect otherwise accessible sites from protease cleavage and therefore provide information about the residues that form the interface.
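The contrast between associated and unassociated states reduces to comparing two sets of observed cleavage sites. The Python sketch below assumes the sites have already been assigned to residue positions from the mass maps; the residue numbers in the usage example and the gap used to group neighboring protected sites into regions are hypothetical choices.

```python
def protected_sites(free_sites, complex_sites):
    """Sites cleaved in the free protein but not in the complex: candidate interface residues."""
    return sorted(set(free_sites) - set(complex_sites))


def newly_exposed_sites(free_sites, complex_sites):
    """Sites cleaved only in the complex; these may indicate a binding-induced conformational change."""
    return sorted(set(complex_sites) - set(free_sites))


def contiguous_regions(sites, max_gap=5):
    """Group residue numbers into (first, last) regions whose neighbors are at most max_gap apart."""
    regions = []
    for s in sorted(sites):
        if regions and s - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], s)
        else:
            regions.append((s, s))
    return regions


if __name__ == "__main__":
    free = [12, 48, 53, 57, 110]   # hypothetical sites observed for the free protein
    bound = [12, 110, 131]         # hypothetical sites observed in the complex
    protected = protected_sites(free, bound)
    print(protected, contiguous_regions(protected), newly_exposed_sites(free, bound))
```

Grouping protected sites into contiguous regions mirrors how a protected segment, rather than a single residue, is usually reported as the binding site.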

Recognizing conformational changes
Protein mass mapping can be used to recognize simple conformational differences between different protein states [7,8]. For example, X-ray crystallography data have shown that the protein calmodulin (CaM) undergoes conformational changes in the presence of calcium [9,10]. Calmodulin (148 amino acids) has an overall dumbbell-shaped tertiary structure, with two globular domains separated by a single, long central alpha-helix connector (amino acids 65-92). It has been proposed that calcium binding activates calmodulin by exposing hydrophobic residues near the two ends of the central helix. Mass maps resulting from digests by trypsin, chymotrypsin, and pepsin all demonstrated that the protein had undergone a tertiary structural change in the presence of calcium [11]. Figure 2 shows the trypsin digests of calmodulin in the presence and absence of calcium. Comparison of the two mass spectra reveals differences corresponding to cleavages in the central helical region of the protein. Based on the results of this relatively simple experiment, it can be appreciated that structural changes caused by Ca2+ binding within the dumbbell domains are propagated to the central helix, as manifested by altered protease reactivity within this latter structural feature.
Fig. 2. MALDI mass spectra of the trypsin digestion of calmodulin in the presence and absence of calcium. Differences are observed corresponding to cleavages within the central helical region of calmodulin, indicating a tertiary conformational change. Cleavage sites that are present in the Ca2+/calmodulin complex (black dots) and those that disappear upon addition of Ca2+ (white dots) are shown in both the spectra and the ribbon drawing.
Electrospray ionization mass spectrometry (ESI-MS) has also been used to monitor protein folding and protein complexes [12]. Early in the use of electrospray, it was recognized that some proteins exhibit distinctly different charge state distributions that reflect their solution conformation(s). For example, Fig. 3 shows two charge state distributions, corresponding to a protein's less highly charged native form and its more highly charged denatured conformer. The difference between the spectra is due to the additional protonation sites available in the denatured form. This phenomenon is demonstrated for the protein fibronectin [13], where the charge state distribution of the native protein is maximized at a lower charge state (7+) and that of the denatured protein is maximized at a higher charge state (10+). There is growing recognition that proteins function through a diverse range of structural states, from highly ordered globular structures to highly flexible, extended conformations. ESI-MS is a simple but highly sensitive and informative method for characterizing the functional shape(s) of proteins (i.e., globular or extended) prior to more material-intensive and time-consuming spectroscopic or crystallographic studies.
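The relationship between mass, charge, and m/z in an ESI spectrum can be made concrete with a short Python sketch. It assumes that every charge comes from an added proton and uses a hypothetical 10,000 Da protein rather than the fibronectin data. For a peak at m/z value p carrying z charges, p = (M + z·m_H)/z; two adjacent peaks of one charge state series are therefore enough to recover both the charge and the neutral mass M.

```python
PROTON = 1.007276  # mass of a proton, Da


def mz(neutral_mass, z):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + z * PROTON) / z


def charge_from_adjacent_peaks(p_higher_z, p_lower_z):
    """Charge z of the peak at p_lower_z, given its neighbor p_higher_z carrying z + 1 charges.

    From p = (M + z*H)/z it follows that z = (p_higher_z - H) / (p_lower_z - p_higher_z).
    """
    return round((p_higher_z - PROTON) / (p_lower_z - p_higher_z))


def neutral_mass(p, z):
    """Neutral mass M recovered from one peak of known charge."""
    return z * (p - PROTON)


if __name__ == "__main__":
    M = 10000.0  # hypothetical protein mass, Da
    native = {z: mz(M, z) for z in (6, 7, 8)}       # low charge states: compact fold
    denatured = {z: mz(M, z) for z in (9, 10, 11)}  # high charge states: unfolded
    z = charge_from_adjacent_peaks(native[8], native[7])
    print(z, round(neutral_mass(native[7], z), 3))
    print("denatured 10+ ion appears at lower m/z:", round(denatured[10], 2))
```

The same arithmetic explains why a denatured conformer, carrying more protons, shifts its whole envelope toward lower m/z.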
The use of hydrogen/deuterium (H/D) exchange to study conformational changes (Fig. 4) in proteins or protein/protein interactions has been performed primarily using ESI-MS, although some studies have employed MALDI-MS [13,14]. The concept of this approach is relatively simple: amide protons in portions of the protein that are in close inter- or intramolecular contact may form hydrogen bonds and will therefore exchange at different rates than those in more accessible regions of the complex. By monitoring this amide hydrogen exchange, information can be gained on the noncovalent structure of a protein by itself or in a protein complex. It should be noted that while it is not possible through ESI-MS to monitor exchange in a residue-specific manner, populations of protein molecules with distinct masses can be distinguished. The combined application of ESI-MS and NMR spectroscopy to monitor protein folding reactions using H/D exchange is particularly powerful. For example, Dobson and co-workers [15] used this dual approach to characterize in detail transient folding intermediates of hen egg-white lysozyme. This was the first demonstration of discrete, although transient, intermediates during a protein folding reaction. Importantly, these key findings were made possible only by complementing the more traditional NMR H/D exchange approach with data from ESI-MS.
Fig. 3. Electrospray mass analysis can be used to distinguish between native and denatured conformers of a protein; denaturing a protein can often enhance ionization by dramatically increasing the number of sites available for protonation. The data shown represent both the native and denatured conformers of the fibronectin module. Adapted from [13].
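Global (whole-protein) H/D exchange measured by ESI-MS reduces to simple mass arithmetic. The Python sketch below assumes that back-exchange is negligible and that only backbone amide hydrogens are counted; the sequence and the masses in the usage example are hypothetical placeholders. Two populations such as those in Fig. 4 would appear as two peaks with different deuterium counts.

```python
D_MINUS_H = 1.006277  # mass difference between deuterium and protium, Da


def max_backbone_amides(seq):
    """Exchangeable backbone amide hydrogens: one per residue, excluding the
    N-terminal residue (free amine) and prolines (no amide hydrogen)."""
    return len(seq) - 1 - seq[1:].count("P")


def deuterium_count(mass_undeuterated, mass_exchanged):
    """Number of deuterons incorporated, from the mass shift of a peak (or centroid)."""
    return (mass_exchanged - mass_undeuterated) / D_MINUS_H


def percent_exchanged(seq, mass_undeuterated, mass_exchanged):
    """Deuterium uptake as a percentage of the exchangeable backbone amides."""
    return 100.0 * deuterium_count(mass_undeuterated, mass_exchanged) / max_backbone_amides(seq)


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical 22-residue sequence (21 amides, no Pro)
    m0 = 2600.0                     # placeholder undeuterated mass, not computed from seq
    # Hypothetical populations: a protected (folded) and an unprotected (unfolded) one.
    for label, m in (("folded", m0 + 6 * D_MINUS_H), ("unfolded", m0 + 21 * D_MINUS_H)):
        print(label, round(deuterium_count(m0, m), 2), round(percent_exchanged(seq, m0, m), 1))
```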

Protein mass mapping of a protein/protein complex
The cell cycle regulatory proteins Cdk2 and p21-B (Fig. 5) have been examined using protein mass mapping, which exploits the high mass accuracy, resolution, and sensitivity of MALDI-MS [3,4,16-27]. Because these proteins have known sequences and the enzyme, trypsin, has a known sequence specificity, the mass measurement readily identifies the exact proteolytic fragment within each individual protein's sequence. First, proteolysis reactions are performed on one component alone, before formation of the multi-protein assembly (Fig. 5). Proteolysis reactions are then performed on the complex, and the reaction products from both experiments are analyzed using MALDI-MS. The results obtained from the protein mass mapping experiments on the p21-B/Cdk2 complex are summarized in the histograms shown in Fig. 5.
Fig. 5. Probing protein/protein interactions using proteolysis and MALDI-MS. Schematic view (left) of key concepts. Two cleavage sites are accessible for the protein of interest alone (top), yielding five fragments after limited digestion. In the complex with protein X, one site is protected (bottom), yielding fewer fragments. However, fragments from protein X are also produced (Xf). Actual results, represented as a histogram (right), indicate the "region of interaction" of p21-B with Cdk2 as a 24 amino acid segment, which was later confirmed through crystallographic data on a homologous system (inset on right).
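Because trypsin's specificity is known, each observed mass can be assigned to a contiguous stretch of sequence bounded by cleavage sites or by the protein termini. The Python sketch below enumerates those candidate fragments (allowing missed cleavages, as expected in limited digestion) and matches an observed neutral mass against them; with two accessible sites it yields the five fragments of the Fig. 5 schematic. The sequence, site positions, and tolerance are hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def candidate_fragments(seq, sites):
    """(start, end) residue ranges, 1-based inclusive, bounded by termini or cleavage sites.

    sites are 0-based indices of the residue N-terminal to each cleaved bond;
    the intact protein is excluded.
    """
    bounds = [0] + sorted(i + 1 for i in sites) + [len(seq)]
    fragments = []
    for a in range(len(bounds)):
        for b in range(a + 1, len(bounds)):
            if (bounds[a], bounds[b]) != (0, len(seq)):
                fragments.append((bounds[a] + 1, bounds[b]))
    return fragments


def fragment_mass(seq, start, end):
    """Neutral monoisotopic mass of residues start..end (1-based, inclusive)."""
    return sum(RESIDUE_MASS[aa] for aa in seq[start - 1:end]) + WATER


def assign_mass(observed, seq, sites, tol=0.02):
    """All candidate fragments whose mass matches the observed mass within tol (Da)."""
    return [f for f in candidate_fragments(seq, sites)
            if abs(fragment_mass(seq, *f) - observed) <= tol]


if __name__ == "__main__":
    seq, sites = "MAKGGRLL", [2, 5]  # hypothetical protein with two accessible sites
    print(candidate_fragments(seq, sites))
    print(assign_mass(fragment_mass(seq, 4, 6), seq, sites))
```

A unique match is what makes the assignment "exact"; if two candidates fall within the tolerance, a narrower tolerance or a second protease is needed to disambiguate.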
MALDI analysis of the tryptic fragments of p21-B generated in the presence and absence of Cdk2 revealed a segment of 24 amino acids in p21-B that is protected from trypsin cleavage, thus identifying the segment as the Cdk2 binding site on p21-B. Another segment within p21-B, near the NH2 terminus, was also known to be important for the function of p21-B as an inhibitor of cell division. The identification of the Cdk2 binding site near the COOH terminus, as described above, allowed the NH2-terminal region to be identified as the cyclin A binding site. This established the concept that p21-B binds both components of Cdk/cyclin complexes. The concepts illustrated in this simple example can be extended to much more complex systems, allowing insights into tertiary and quaternary structure to be obtained using extremely small amounts of material. In studies of multi-subunit protein complexes, the resolution of MALDI-MS may not be sufficient to resolve all proteolytic fragments. In these cases, selective isotope labeling can be employed to identify the fragments of individual subunits.
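Selective isotope labeling works because heavy isotopes shift the mass of every peptide from the labeled subunit by a predictable amount. As one illustrative scheme (an assumption; the text does not specify the isotope), the Python sketch below treats one subunit as uniformly 15N-labeled, so each of its peptides shifts by about 0.997 Da per nitrogen atom. The peptide and masses in the usage example are hypothetical.

```python
N15_SHIFT = 0.997035  # mass difference 15N - 14N, Da

# Nitrogen atoms per amino acid residue (backbone plus side chain).
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1, "I": 1, "M": 1,
    "D": 1, "E": 1, "F": 1, "Y": 1, "N": 2, "Q": 2, "K": 2, "W": 2, "H": 3, "R": 4,
}


def n15_mass_shift(peptide):
    """Mass increase of a peptide when every nitrogen atom is 15N."""
    return N15_SHIFT * sum(NITROGENS[aa] for aa in peptide)


def which_subunit(observed, peptide, unlabeled_mass, tol=0.02):
    """Classify a peak as the unlabeled or the 15N-labeled form of a peptide (None if neither)."""
    if abs(observed - unlabeled_mass) <= tol:
        return "unlabeled subunit"
    if abs(observed - (unlabeled_mass + n15_mass_shift(peptide))) <= tol:
        return "15N-labeled subunit"
    return None


if __name__ == "__main__":
    unlabeled = 288.15459  # monoisotopic mass of GGR (hypothetical shared peptide)
    for peak in (288.15459, 294.13680):
        print(peak, which_subunit(peak, "GGR", unlabeled))
```

Because the shift depends on composition (arginine carries four nitrogens, glycine one), the same peptide sequence from two subunits ends up at two distinct, predictable masses.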
In addition, high-throughput mass spectrometric protein complex identification (HMS-PCI) has recently been used to identify protein-protein interactions in yeast [28]. Such studies have traditionally been accomplished using the yeast two-hybrid system, a typically low-throughput technique that often results in artifacts. In HMS-PCI, carefully selected bait proteins are epitope-tagged and attached to a column, and affinity purification is carried out using whole cell lysates. One-dimensional SDS-PAGE is then used to separate the resulting immunoprecipitates; stained bands are excised and digested, and mass analysis is performed. Because multiple proteins are likely to be present in a single stained band, LC/MS/MS is used to analyze the proteolytic peptides, and automated LC/MS/MS allows for high-throughput analysis. Following analysis, a network "map" can be created to illustrate the relationships among the identified proteins. Figure 6 depicts several hypothetical interactions that could exist within a network of proteins; for example, one protein may interact with several downstream partners, resulting in a wide array of effects. These experiments are also clearly valuable for illustrating the dual functionality of proteins; for example, many proteins involved in the DNA repair process have also been shown to have roles in DNA replication [28]. In addition, it appears that HMS-PCI can be used to identify protein complexes from a range of subcellular compartments, including the cytoplasm, nucleus, plasma membrane, and mitochondria. These studies are being conducted as part of a systematic approach to proteomics that will help define the protein networks and pathways involved in cellular functions.
Fig. 6. Hypothetical protein interactions. High-throughput mass spectrometric protein complex identification (HMS-PCI) is being used to characterize protein interactions and to elucidate molecular pathways.
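The network "map" described above can be assembled from bait and prey lists with a few lines of Python. This is a minimal sketch assuming each affinity purification has already been reduced to a list of identified proteins; the protein names are hypothetical, and real analyses also filter out proteins that co-purify nonspecifically with many baits.

```python
from collections import defaultdict


def build_network(pulldowns):
    """Undirected bait-prey graph from {bait: [proteins identified by LC/MS/MS]}."""
    graph = defaultdict(set)
    for bait, preys in pulldowns.items():
        for prey in preys:
            if prey != bait:  # the bait itself is usually recovered as well
                graph[bait].add(prey)
                graph[prey].add(bait)
    return graph


def shared_preys(pulldowns, min_baits=2):
    """Proteins recovered with at least min_baits different baits, i.e. candidates
    for participation in more than one complex or pathway."""
    baits_for = defaultdict(set)
    for bait, preys in pulldowns.items():
        for prey in preys:
            if prey != bait:
                baits_for[prey].add(bait)
    return {prey: baits for prey, baits in baits_for.items() if len(baits) >= min_baits}


if __name__ == "__main__":
    pulldowns = {  # hypothetical pull-down results
        "RepairBait": ["RepairBait", "P1", "P2"],
        "ReplicationBait": ["ReplicationBait", "P2", "P3"],
    }
    print(shared_preys(pulldowns))
    print(sorted(build_network(pulldowns)["P2"]))
```

A protein shared between a repair-related bait and a replication-related bait is the kind of node that would flag the dual functionality discussed above.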

Time-resolved protein mass mapping of viruses
Since viral capsids represent the noncovalent quaternary association of protein subunits, viruses have been a logical target in the development of protein mass mapping techniques [29,30]. For instance, cleavage sites that reside on the exterior of the virus should be the most accessible to a proteolytic enzyme, and the fragments they produce should therefore be among the first digestion fragments observed. Because proteolysis is performed in solution, different conformers can be detected; hence, this method can contribute to an understanding of the dynamic domains within the virus structure.
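The expectation that exterior sites are cut first suggests ranking fragments by the earliest time point at which they appear. A minimal Python sketch, assuming each time point of the digestion has already been reduced to a set of assigned fragments; the fragment labels and times are hypothetical.

```python
def first_appearance(time_course):
    """time_course: {time: set of fragment labels observed at that time}.
    Returns (fragment, first time observed) pairs, earliest first."""
    first = {}
    for t in sorted(time_course):
        for fragment in time_course[t]:
            first.setdefault(fragment, t)  # keep only the earliest time point
    return sorted(first.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    time_course = {  # hypothetical digestion time points (minutes)
        1: {"VP1 30-45"},
        5: {"VP1 30-45", "VP1 1-20"},
        10: {"VP1 30-45", "VP1 1-20", "VP4 1-15"},
    }
    for fragment, t in first_appearance(time_course):
        print(t, fragment)
```

In such a ranking, an early-appearing fragment from a region that crystal structures place inside the capsid is the signature of transient exposure rather than static accessibility.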
Limited proteolysis/MALDI-MS experiments have been performed on human rhinovirus 14 (HRV14) and flock house virus (FHV) [5,31]. HRV14, a causative agent of the common cold, is a member of a family of animal viruses called the picornaviruses. Other members of this family include poliovirus, hepatitis A virus, and foot-and-mouth disease virus. The HRV14 virion consists of an icosahedral protein shell, or viral capsid, surrounding an RNA core. The capsid is composed of 60 copies each of four structural proteins, VP1-VP4. Based on crystal structure data, VP1, VP2, and VP3 compose the viral surface, while VP4 lies in the interior at the capsid/RNA interface. Like HRV14, FHV is a non-enveloped, icosahedral RNA animal virus. Its mature protein coat, or capsid, is composed of 180 copies each of the β-protein and γ-peptide (Fig. 7).
As a means of mapping the viral surface, time-resolved proteolysis was performed on HRV14 and FHV, followed by MALDI-MS analysis. It was expected that the reactivity of virus particles toward different proteases would reveal the surface-accessible regions of the viral capsid. Indeed, cleavages in the surface-accessible regions were generated as anticipated; however, cleavages in regions internal to the viral capsids were also generated. The observation of digestion fragments from "internal" protein regions was initially perplexing. After further examination of these results alongside X-ray crystallography data, it was determined that portions of the internalized proteins, such as VP4, are transiently exposed on the surface of the virus (Fig. 8). These experiments clearly demonstrate the dynamic nature of the viral capsid.
Fig. 7. A non-enveloped icosahedral virus with a portion of the capsid proteins and RNA magnified above the virus. In protein mass mapping experiments, viruses undergo limited proteolysis followed by mass analysis of the proteolytic fragments. Time-resolved proteolysis allows for the study of protein capsid mobility.
Limited proteolysis of intact viruses offers a complementary approach to the inherently static methods of crystallography and electron microscopy. In addition, experiments such as these have revealed dynamic structural changes of proteins in solution, a finding that may fundamentally alter the way we look at viruses. These observations are nonetheless consistent with the events that shape the viral life cycle (cell attachment, cell entry, and nucleic acid release), a life cycle that demands a highly mobile viral surface capable of accommodating structural changes.

Chemical cross-linking and chemical modification
Other methods of probing higher-order structure include investigating the chemical reactivity of individual amino acids in a protein and chemical cross-linking. Both approaches build on the fact that MALDI has proven to be a powerful method with which to characterize covalent post-translational modifications. Using simple modification chemistries such as acylation or succinylation, Glocker et al. have shown that there is a clear correlation between the relative reactivity of specific amino acids and their accessibility at the protein surface (i.e., to solvent) [32]. In chemical cross-linking studies, proteins are treated with cross-linkers prior to digestion, whereupon adjacent protein regions or protein subunits become covalently attached to one another. The resulting proteolytic fragments provide good indications of the overall tertiary and/or quaternary protein structure. However, the complexity of deconvoluting the proteolysis results obtained in such studies can be overwhelming. Here again, selective isotope labeling of individual subunits can be used to simplify interpretation of mass data for cross-linked fragments originating from different polypeptide chains.
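Counting modification sites from a MALDI mass shift is simple arithmetic once the mass added per modification is known. The Python sketch below uses the standard monoisotopic increments for acetylation (C2H2O, +42.0106 Da) and succinylation (C4H4O3, +100.0160 Da); the masses in the usage example are hypothetical, and both the tolerance and the equal-ionization assumption behind the crude extent-of-reaction estimate are simplifications.

```python
MOD_SHIFT = {"acetyl": 42.010565, "succinyl": 100.016044}  # monoisotopic increments, Da


def modification_count(unmodified_mass, observed_mass, mod, tol=0.05):
    """Integer number of modifications that explains the mass shift, or None."""
    shift = observed_mass - unmodified_mass
    n = round(shift / MOD_SHIFT[mod])
    if n < 0 or abs(shift - n * MOD_SHIFT[mod]) > tol:
        return None
    return n


def fraction_modified(intensity_modified, intensity_unmodified):
    """Crude extent of reaction at one site, assuming the modified and unmodified
    peptides ionize equally well (an approximation)."""
    return intensity_modified / (intensity_modified + intensity_unmodified)


if __name__ == "__main__":
    base = 1500.0  # hypothetical unmodified peptide mass, Da
    print(modification_count(base, base + 2 * MOD_SHIFT["acetyl"], "acetyl"))
    print(fraction_modified(30.0, 70.0))  # hypothetical peak intensities
```

Comparing such extents across peptides is one simple way to express the "relative reactivity" that correlates with surface accessibility.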
More commonly than for structural studies, chemical cross-linking has been employed with MALDI-MS analysis to determine the stoichiometry of oligomers (Fig. 9). MALDI-MS analysis of a complex before cross-linking allows the molecular masses of the individual proteins to be determined. A cross-linking agent such as glutaraldehyde, which reacts primarily with the ε-amino group of lysine, is then added to covalently link the associated subunits, and the reaction can be halted by the addition of the MALDI matrix. Comparing the masses observed after cross-linking with those of the individual proteins then reveals the number of subunits in the complex. Chemical modification can be used in a similar way to probe the surface: Fig. 10 shows a viral protein from an intact virus after exposure to acetylation, where the limited chemical reactivity makes it possible to characterize the surface by determining the sites of modification.
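Reading stoichiometry from a cross-linked MALDI spectrum amounts to dividing each observed mass by the monomer mass. The Python sketch below assumes a homo-oligomer, treats the small extra mass from cross-linker adducts as a tolerated fractional excess (the 5% limit is an arbitrary assumption), ignores the proton added in MALDI at this mass scale, and uses hypothetical masses.

```python
def oligomer_order(monomer_mass, observed_mass, max_excess=0.05):
    """Number of monomers in a cross-linked species, allowing up to max_excess
    (fractional) extra mass from cross-linker adducts; None if the peak does not fit."""
    n = round(observed_mass / monomer_mass)
    if n < 1:
        return None
    excess = (observed_mass - n * monomer_mass) / (n * monomer_mass)
    return n if 0 <= excess <= max_excess else None


def estimate_subunits(monomer_mass, observed_masses, max_excess=0.05):
    """Largest oligomer observed after cross-linking (None if no peak fits)."""
    orders = [oligomer_order(monomer_mass, m, max_excess) for m in observed_masses]
    orders = [n for n in orders if n is not None]
    return max(orders) if orders else None


if __name__ == "__main__":
    monomer = 15000.0                             # hypothetical monomer mass, Da
    peaks = [15020.0, 30110.0, 45300.0, 60400.0]  # hypothetical cross-linked species
    print([oligomer_order(monomer, m) for m in peaks], estimate_subunits(monomer, peaks))
```

Taking the largest consistently cross-linked species as the subunit count assumes cross-linking is efficient enough to capture the intact complex; incomplete cross-linking leaves the smaller species seen alongside it.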

Conclusion
The examples presented in this manuscript illustrate the utility of combining proteolysis and chemical modification methods with MS analysis in structural studies of proteins. It is important to note that the key concepts of the methods are straightforward and the probing reactions are simple to perform. In the early stages of structural studies, MS-based probing methods are particularly well suited to provide rapid access to low-resolution maps that can then guide subsequent high-resolution studies. This stage may also be an end point in investigations where the simple identification of interacting residues is the desired information. As a complement to high-resolution structural information (from either X-ray crystallography or NMR spectroscopy), probing studies have already been shown to provide valuable and startling insights into protein structural dynamics and rearrangements. In the context of whole proteome analysis, mass spectrometry is the only technique that offers the opportunity to obtain structural information on minute quantities of sample using automated procedures. As a first step in whole proteome analysis, high-throughput proteomics techniques, such as multidimensional protein identification technology (MudPIT), are being developed to provide a global view of an organism's proteome [33]. These techniques can be combined with those described herein to provide further details regarding the structure and function of individual proteins within a proteome. As illustrated by the use of HMS-PCI, mass spectrometry can also be combined with affinity chromatography to provide information on protein-protein interactions on a proteome-wide scale. The results of such studies can eventually be combined with interaction mapping methods to elucidate the individual residues involved in these protein complexes.

Fig. 4. Theoretical mass spectra of two different populations of proteins, for example, native and denatured. The middle peak represents a theoretical protein folding intermediate that is partially folded.

Fig. 9. Example of how cross-linking of a noncovalent complex of proteins can alter the MALDI mass spectrum and provide the number of subunits in the complex.

Fig. 10. Chemical reactivity as a probe of protein topology. MALDI mass spectrum of a chemically modified viral protein from an intact virus; the sites of reactivity provide information about the surface-exposed topology of the virus.