An analysis of amino acid sequences surrounding archaeal glycoprotein sequons

Although Archaea provided the first example of a prokaryal glycoprotein, little is known of the rules governing N-glycosylation in these organisms. As in Eukarya and Bacteria, archaeal N-glycosylation takes place at the Asn residues of Asn-X-Ser/Thr sequons. Since not all sequons are utilized, it is clear that other factors, including the context in which a sequon exists, affect glycosylation efficiency. As yet, the contribution to N-glycosylation made by sequon-bordering residues and other related factors in Archaea remains unaddressed. In the following, the surroundings of Asn residues confirmed by experiment as modified were analyzed in an attempt to define sequence rules and requirements for archaeal N-glycosylation.


Introduction
The covalent attachment of polysaccharides represents one of the most prevalent post-translational modifications proteins experience. In N-glycosylation, glycan moieties are linked to Asn residues found as part of NXS or NXT motifs referred to as sequons, where X is any residue apart from proline, although rare exceptions to these rules exist (Gavel and von Heijne 1990, Kaji et al. 2003). Although essential for N-glycosylation, the presence of a sequon is not sufficient for modification, given the large number of unmodified sequons in both glycosylated and non-glycosylated proteins (Ben-Dor et al. 2004, Petrescu et al. 2004).
To delineate the rules determining whether modification of a given sequon occurs, researchers have considered numerous and varied properties of the target protein. These include the amino acid composition and secondary structure of the sequon and its surroundings, the number of sequons in the protein, and their position(s) relative to the protein termini or to other sequons in cases where multiple potential N-glycosylation sites exist. Accordingly, studies on eukaryal glycoproteins have pointed to the importance of the character of the amino acids occupying several positions both upstream and downstream of Asn residues experimentally verified to be glycan-bearing. Based on consideration of the residues neighboring those eukaryal glycoprotein sequons experimentally shown to be modified, Ben-Dor et al. (2004) proposed a series of rules designed to predict whether a given sequon is modified. Similarly, a survey of some 500 nonredundant glycoproteins listed in the PDB database revealed an increased occurrence of aromatic residues at the +2 position following sequon Ser/Thr residues and that N-glycosylation often transpires at sites where changes in secondary structure occur (Petrescu et al. 2004). Other studies reported that the presence of a Pro residue immediately following a sequon most often hinders glycosylation (Mellquist et al. 1998). In addition, distinctions can be made between the rules governing the glycosylation of NXS and NXT sequons. For example, glycosylation occurs less frequently at NXS than at NXT sequons (Bause and Legler 1981, Kasturi et al. 1995). Furthermore, NXS glycosylation efficiency is affected by the character of the residues at the X position and at the +1 position following the sequon serine (Shakin-Eshleman et al. 1996, Mellquist et al. 1998).
Although N-glycosylation was once thought to be restricted to Eukarya, it is now clear that prokaryotes are also capable of this modification (Lechner and Wieland 1989, Szymanski and Wren 2005). In addressing N-glycosylated proteins in Campylobacter jejuni, the only bacterium for which a full N-glycosylation pathway has been described (Szymanski et al. 1999), it was observed that modification occurs at NXS/T sequons, where X is any residue apart from proline, as in Eukarya (Nita-Lazar et al. 2005). Although essential for bacterial N-glycosylation, the presence of a sequon is not sufficient for Asn modification. In a recent detailed analysis of C. jejuni glycoproteins (Kowarik et al. 2006), it was noted that only those Asn residues found as part of a D/E-Y-N-X-S/T motif, where X and Y are not proline, experience glycosylation. By contrast, glycosylation of a given sequon in yeast did not require a negatively charged residue at the -2 position. Moreover, earlier analysis of eukaryal N-glycosylation sites revealed that Asp and Glu are disfavored at this position (Ben-Dor et al. 2004). Such differences in the sequence requirements of eukaryal and bacterial N-glycosylation likely reflect differences between eukaryal and bacterial oligosaccharide transferases (OST), responsible for transferring the lipid-linked glycan moieties to sequons in the target protein (Kelleher and Gilmore 2006, Weerapana and Imperiali 2006). In Eukarya, OST comprises a multisubunit complex at the heart of which lies the Stt3 protein in yeast and its homologues in other species. In Bacteria, the Stt3 homologue PglB is thought to be solely responsible for OST activity (Wacker et al. 2002).
Although prokaryal N-glycosylation was first described in Archaea (Mescher and Strominger 1976, Lechner and Sumper 1987), little is still known of either the extent of such modification or the rules governing the process in this form of life. Indeed, while the ability to glycosylate proteins has been demonstrated in numerous species by a variety of experimental methods, the modification of glycoprotein Asn residues has been directly demonstrated in only a limited number of cases (Eichler and Adams 2005).
In this study, archaeal glycoprotein sequences were considered in an attempt to unmask any putative contribution to Asn glycosylation made by residues found in the positions bordering modified sequons. Defining such N-glycosylation sequence requirements in Archaea carries significant implications for understanding how the enzymes involved in the process (in particular the archaeal Stt3 homologue) operate. In addition, selected archaeal genomes were analyzed in an effort to predict the extent of N-glycosylation in Archaea, because such information could provide insight into the relationship between post-translational modification and the ability of archaeal polypeptides to remain properly folded in the face of extreme physical conditions that would normally lead to protein denaturation, loss of solubility and aggregation.

Predicted N-glycosylated archaeal secretory proteins
A database of proteins secreted by ten archaeal species (Aeropyrum pernix, Archaeoglobus fulgidus, Halobacterium sp. NRC-1, Methanococcus jannaschii, Methanothermobacter thermoautotrophicus, Pyrococcus abyssi, Pyrococcus horikoshii, Sulfolobus solfataricus, Thermoplasma acidophilum and Thermoplasma volcanium) was created as described previously (Bardy et al. 2003). Briefly, predicted protein-encoding genes from each organism were analysed using SignalP v.2.0 (www.cbs.dtu.dk/services/SignalP-2.0), designed to detect the presence of signal peptides. In these analyses, protein sequences were examined using the hidden Markov model with truncation set to 70 amino acids, trained on the different data sets available (Eukarya, Gram-positive or Gram-negative Bacteria). Proteins predicted as containing signal peptides by at least one data set were then subcellularly localized by Psort (Nakai and Horton 1999; www.psort.nibb.ac.jp). Only those proteins considered to be secreted in Gram-positive organisms or localized to the periplasm or outer membrane of Gram-negative organisms (listed in Bardy et al. 2003) were scanned for the presence of sequons.
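The final scanning step can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study, whose implementation is not described; the function name `find_sequons` and the regular expression are illustrative assumptions. The sketch reports every Asn that is followed by a non-proline residue and then Ser or Thr, using a lookahead so that overlapping candidates are not missed.

```python
import re

# Assumption: a sequon is N-X-S/T with X being any residue except Pro.
# The zero-width lookahead lets overlapping candidates (e.g. "NNTS") all match.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based Asn position, motif) for every candidate sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Short illustrative peptide; positions are relative to the fragment start.
    print(find_sequons("NSSATNTS"))  # [(1, 'NSS'), (6, 'NTS')]
```

Positions are reported relative to the string supplied, so residue numbering in a mature protein would need an offset for any cleaved signal peptide.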

Sequence profile of regions bordering modified sequons in archaeal glycoproteins
Amino acid residues near sequons could influence N-glycosylation by modulating access to the oligosaccharide transferase active site or the affinity of such interactions, by affecting sequon conformation or by providing sequon residues with partners for hydrogen bonding or other associations that could interfere with N-glycosylation. Toward determining whether any of these potential scenarios are applicable to archaeal N-glycosylation, the ten amino acids both preceding and following archaeal sequons experimentally verified as glycan-bearing (Table 1) were considered, as were the amino acids surrounding unmodified and proposed, but as yet uncharacterized, sequons in the same proteins. At each amino acid position, the relative frequency of a representative of a particular amino acid group was noted. In this study, amino acids were first clustered based on their being non-polar (AGVLIPFMWC), uncharged and polar (QNSTY), positively charged (RKH) or negatively charged (DE). Where warranted, the amino acid clusterings adopted by Ben-Dor et al. (2004), designed to further divide residues along chemical property lines, were also considered. In these instances, the groups of the first clustering set were MLIV, TSC, FYW, AG, KR, DE, QN, P and H, whereas the groups comprising the second clustering set were DE, KRH, QNST, AMLIVFWY, C, G and P. In the latter sets of clusterings, the nonpolar and the uncharged, polar amino acids were further subdivided according to their chemical properties (e.g., hydrophobic, aromatic, small in size).
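The profiling described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the helper names `flanks` and `group_frequencies` are hypothetical, and only the first four-way clustering quoted in the text is encoded. Positions are numbered as in the text, with -1 immediately upstream of the sequon Asn and +1 immediately downstream of the sequon Ser/Thr.

```python
from collections import Counter

# Four-way clustering quoted in the text.
GROUPS = {
    "nonpolar": "AGVLIPFMWC",
    "polar_uncharged": "QNSTY",
    "positive": "RKH",
    "negative": "DE",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def flanks(sequence, asn_index, width=10):
    """Return (upstream, downstream) residues around a sequon.

    asn_index is the 0-based index of the sequon Asn. The upstream string
    ends at position -1; the downstream string starts at position +1, i.e.
    the residue after the sequon Ser/Thr. Flanks are truncated at the termini.
    """
    upstream = sequence[max(0, asn_index - width):asn_index]
    downstream = sequence[asn_index + 3:asn_index + 3 + width]
    return upstream, downstream


def group_frequencies(residues):
    """Fraction of each residue group among residues seen at one position.

    Raises KeyError for symbols outside the 20 standard amino acids.
    """
    total = len(residues)
    if total == 0:
        return {name: 0.0 for name in GROUPS}
    counts = Counter(RESIDUE_TO_GROUP[aa] for aa in residues)
    return {name: counts[name] / total for name in GROUPS}
```

To build a profile, one would collect the residue found at a given relative position across all aligned sequons and pass that list to `group_frequencies`, repeating for each position from -10 to +10.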
A search of the literature (June, 2006) uncovered 27 experimentally verified archaeal sequons distributed between ten proteins from five species, including halophiles (Halobacterium salinarum, Haloferax volcanii), methanogens (Methanothermus fervidus, Methanococcus voltae) and a thermoacidophile (Sulfolobus acidocaldarius). The evidence for Asn glycosylation in these cases came either from amino acid sequencing (Lechner and Sumper 1987, Lin and Tang 1990, Brockl et al. 1991, Mengele and Sumper 1992) or from mass spectrometry approaches (Paul et al. 1986, Zahringer et al. 2000, Voisin et al. 2005). Examination of the sequons themselves revealed that, as in Eukarya and Bacteria (Gavel and von Heijne 1990, Nita-Lazar et al. 2005), the X position immediately downstream of every glycosylated Asn residue was never a Pro. In one third of the sequons, however, Ser or Thr were found at the X position. By contrast, the same residues account for only 17% of the amino acid content of these glycoproteins, apart from the sequon and bordering 10 residues. When the residues bordering glycosylated sequons were considered, several general traits were observed. In contrast to what has been reported for C. jejuni, where the sequon motif was recently expanded to include a necessary Asp or Glu at the -2 position upstream of the sequon Asn residue (Kowarik et al. 2006), negatively charged residues were detected at the -2 position in only one of the 27 archaeal examples considered. By contrast, nonpolar residues were found in this position 48% of the time, whereas uncharged polar residues were detected in 37% of the cases. The remaining examples presented positively charged residues at this position. At the +1 position following the sequon, either Ala or Gly were detected in 37% of the instances, although the same residues account for only 18% of the amino acid content outside the sequon and bordering residues, thus suggesting a preference for a small amino acid at the +1 site. The presence of Pro at the +1 position did not prevent glycosylation, as demonstrated by thermopsin peptide 2, although it should be noted that this is the only example where Pro is detected at this position. Further from the sequon, possible amino acid preferences were noted at the -4 position, where charged residues were detected only once but Ser and Thr were detected 41% of the time, and at the -8, -9 and -10 positions, where nonpolar residues were noted in 81%, 56% and 67% of the sequences, respectively. Specifically, Ala and Gly were observed at position -8 in 48% of the cases and at position -9 in 33% of the cases. Again, since Ser and Thr together and Ala and Gly together account for 17% and 18%, respectively, of the amino acid content outside the sequon and its upstream and downstream flanking 10 residues, the -4, -8, -9 and -10 positions appear to be enriched in particular types of residue.
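The text compares observed frequencies with background composition but does not report a statistical test. As a rough, purely illustrative check (an addition here, not part of the original analysis), an exact binomial upper-tail probability can indicate how surprising a given count would be if each position simply sampled the background composition. The function name is hypothetical, and the count of 10 used below is an assumption obtained by rounding 37% of 27 sequons.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Assumed count: about 10 of 27 sequons with Ala or Gly at +1 (37%),
    # against a background Ala + Gly content of 18% quoted in the text.
    print(binomial_upper_tail(10, 27, 0.18))
```

With only 27 sequons drawn from ten proteins, and with several positions examined at once, any such probability should be read as indicative only.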
Insight into archaeal N-glycosylation can be derived not only from analysis of the environment of modified sequons but also by addressing the surroundings of sequons shown to contain unmodified Asn residues. In the set of sequences examined in the present study, only Asn-17 of the H. salinarum S-layer glycoprotein was verified as unmodified despite being part of an NXT sequon (Lechner and Sumper 1987). In contrast to the 27 glycosylated archaeal sequons, where aromatic residues were never observed at the +1 position, the position immediately after 17 NYT 19 is occupied by a Tyr residue. Given the apparent preference of the glycosylation machinery for smaller residues at this site (see above), it is thus possible that the presence of a bulky Tyr at this site hinders glycan attachment. When the amino acids preceding (NDYQRFNENT) and following (YSTASEDGKT) this H. salinarum S-layer glycoprotein sequon were considered, no marked differences from the corresponding residues surrounding modified sequons were evident. Similarly, no obvious patterns were apparent when the amino acid composition of the upstream and downstream regions surrounding the remaining 56 proposed, but as yet unverified, sequons in the same proteins was considered (Table A1). Nonetheless, no Pro residues were detected at the X position in any of these sequons, whereas at the +1 position, only nine sequences presented an aromatic residue and just one sequon offered a Pro at this site. Thus, it is likely that many of the additional sequons are glycosylated.
In addition to addressing questions of amino acid sequence, examining the distribution of modified sequons may also provide details on the N-glycosylation process in Archaea. Such information could offer insight into the ability of the archaeal oligosaccharide transferase to process sequons lying in close proximity to each other, thereby reflecting spatial considerations of the enzyme's activity. In the present study, such analysis revealed that, despite their proximity, both Asn residues within the 308 NSSATNTS 315 section of the H. volcanii S-layer glycoprotein are modified (Mengele and Sumper 1992). Similarly, Asn-72 and Asn-77 of M. voltae FlaB2, again separated by four residues, are both modified (Voisin et al. 2005). In Eukarya, the closest glycosylated pair of naturally occurring Asn residues, found in haptoglobin-1, are three residues apart (Kurosky et al. 1980, Gavel and von Heijne 1990). At present, no information is available on how close two Asn residues can lie and still both be glycosylated by the C. jejuni N-glycosylation machinery.
In terms of the distribution of modified sequons within the glycoprotein, only the M. voltae flagellins were considered, since 14 of the 15 putative N-glycosylation sites have been characterized (Voisin et al. 2005). In the four proteins considered, i.e., FlaA, FlaB1, FlaB2 and FlaB3, N-glycosylation sites were distributed throughout the proteins and showed no preference for any particular secondary structural region.

The environments of occupied archaeal NXS and NXT sequons are distinct
Several studies have pointed to the differential processing of NXS and NXT sequons in eukaryal glycoproteins (Kasturi et al. 1995, Shakin-Eshleman et al. 1996, Mellquist et al. 1998, Ben-Dor et al. 2004). When the two classes of sequons found in the archaeal glycoprotein sequences considered in the present study were examined, it appeared that the residues surrounding experimentally verified NXS sequons (n = 14) revealed traits distinct from the residues surrounding experimentally verified NXT sequons (n = 13). As reflected in Figure 1, the two sequon classes could be distinguished at the X position, with Ser or Thr being detected in 50% of the NXS sequons, but in only 15% of the NXT sequons. By contrast, the X position of NXT sequons corresponded to Gly in 31% of the sequences, but in only 14% of the NXS sequons. At the -2 position, Ser or Thr were detected in 46% of the NXT sequons, but in only 7% of the NXS sequons. At the +1 position, NXS sequons incorporated Gly 36% of the time, whereas Gly was not detected in the same position of the NXT sequons. Distinctions between NXS and NXT were also noted at positions further upstream or downstream of the sequon. Acidic residues were observed in 39% of the -7 positions of those sequences including the NXS sequon, but were absent from the same position in NXT-containing sequences. At the +8, +9 and +10 positions, NXS-containing sequences presented Ser or Thr 46% of the time, whereas the same residues were only employed at these positions in 8% of the NXT-containing sequences. On the other hand, the hydrophobic residues Ile, Leu, Met and Val were observed in positions +9 and +10 in 46% and 39% of those sequences including NXT, respectively, but only 15% and 8% of the time at the same positions in NXS-containing sequences.

Putative N-glycosylation of predicted secretory proteins in Archaea
To get a general sense of the extent of N-glycosylation in Archaea, a previously assembled database of predicted secretory proteins from 10 archaeal genomes (Bardy et al. 2003) was scanned for the presence of NXS and NXT sequons (where X was not a proline), assuming these to be potential N-glycosylation sites (Table A2). Such analysis predicted major differences in the levels of putative N-glycosylated secreted proteins expressed by the various genomes examined (Table 2). Whereas only 39% of the 121 proteins supposedly secreted by A. pernix are predicted to be N-glycosylated, 30 of the 32 predicted M. jannaschii secretory proteins (i.e., 94%) contain NXS or NXT sequons. The other genomes considered are predicted to N-glycosylate 50-88% of their putatively secreted proteins. Moreover, not only do the different species potentially N-glycosylate their predicted secretomes to differing extents, but they can also often be distinguished by the unbalanced presence of either the NXS or NXT motif. In A. pernix, a total of 37 proteins contain either the NXS or NXT sequon alone, yet only 10 proteins contain both. In Halobacterium sp. NRC-1, 11 proteins contain the NXS sequon, 13 contain the NXT sequon, whereas 10 proteins contain both. By contrast, the NXS and NXT sequons are found in only three and five M. jannaschii proteins, respectively, whereas 22 polypeptides present both motifs. In M. thermoautotrophicus, most of the 45 predicted N-glycosylated secretory proteins contain comparable amounts of the NXS and NXT sequons. This number includes, however, 13 proteins containing the NXT motif alone and only two sequences that include just the NXS motif. In A. pernix, the NXS motif is found in 23 proteins, whereas only 14 proteins present the NXT sequence.
The putative glycoprotein populations in the different species also differed in the number of proposed N-glycosylation sites per protein. In A. fulgidus, 42 proteins contained 1-4 putative N-glycosylation sites, whereas only eight proteins contained five or more such sites. In Halobacterium sp. NRC-1, 31 proteins are predicted to contain 1-4 N-glycosylation motifs, but only three proteins are thought to contain five or six such sites. No proteins are predicted to contain more than six sequons in this species. Likewise, A. pernix is thought to include 44 predicted glycoproteins that contain four or fewer possible N-glycosylation motifs, but only three proteins presenting more. By contrast, 43% of the putative secreted proteins that may be N-glycosylated in S. solfataricus contain more than six sequons. In M. jannaschii, only two proposed secretory proteins contain single N-glycosylation sites, whereas 28 sequences present multiple sequons for possible modification.
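A tally of this kind can be sketched in Python as shown below. The sketch is illustrative only and assumes nothing about how the published counts were produced: the function names and class labels are hypothetical. Each protein is assigned to one of four classes according to whether it carries NXS sequons, NXT sequons, both or neither (X not Pro), and the total number of candidate sites per protein is counted.

```python
import re
from collections import Counter

# Assumption: X is any residue except Pro, as stated in the text.
NXS = re.compile(r"N[^P]S")
NXT = re.compile(r"N[^P]T")
ANY_SEQUON = re.compile(r"(?=N[^P][ST])")  # lookahead counts overlapping sites


def sequon_class(sequence):
    """Classify a protein as 'NXS only', 'NXT only', 'both' or 'none'."""
    has_nxs = NXS.search(sequence) is not None
    has_nxt = NXT.search(sequence) is not None
    if has_nxs and has_nxt:
        return "both"
    if has_nxs:
        return "NXS only"
    if has_nxt:
        return "NXT only"
    return "none"


def count_sites(sequence):
    """Number of candidate N-glycosylation sites, overlaps included."""
    return len(ANY_SEQUON.findall(sequence))


def tally(proteins):
    """Count proteins per sequon class for a {name: sequence} mapping."""
    return Counter(sequon_class(seq) for seq in proteins.values())
```

The fraction of a predicted secretome that is putatively N-glycosylated is then the number of proteins outside the `none` class divided by the total number of proteins.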

Comparing sequences bordering modified archaeal, bacterial and eukaryal sequons
Analysis of experimentally confirmed glycan-modified sequons and their surrounding residues in archaeal glycoproteins, albeit limited in number, reveals that although the archaeal sequences share much in common with their eukaryal and bacterial counterparts, differences from the rules and tendencies governing eukaryal and bacterial sequon glycosylation also exist.
Based on available data, the most obvious difference between sequon glycosylation in Archaea and Bacteria (as exemplified by C. jejuni) is the necessity for an acidic residue in the -2 position in the latter but not the former. Analysis of the putative N-glycosylation sites of the proposed secretory protein populations of 10 archaeal genomes failed to detect an enhanced presence of Asp or Glu at this position. Although further characterization of N-glycosylated proteins from other Bacteria will confirm whether such a requirement for negatively charged residues at the -2 position is unique to C. jejuni, the independence of archaeal sequon modification from this rule points to the archaeal Stt3, believed to be responsible for the transfer of a lipid-linked glycan moiety to target sequons (Abu-Qarn and Eichler 2006, Chaban et al. 2006), being closer to its eukaryal than to its C. jejuni homologue, PglB (Wacker et al. 2002). This relationship was previously suggested on the basis of sequence comparison, even though the archaeal OST appears to be composed of a single Stt3-like component as in Bacteria (Abu-Qarn and Eichler 2006, Chaban et al. 2006). However, the amino acid composition surrounding modified sequons in archaeal and eukaryal glycoproteins also shows differences. In eukaryal glycoproteins, there is a high probability of finding aromatic residues at positions -2 and -1, a small hydrophobic residue (Gly or Val) at the X position and a larger hydrophobic residue (Ile, Leu, Met, Phe, Trp or Tyr) at the +1 position (Ben-Dor et al. 2004, Petrescu et al. 2004). By contrast, Phe, Trp and Tyr were detected at positions -2 and -1 in only 4% and 11% of the archaeal sequences, respectively, Ser or Thr were readily found at the X position and Ala and Gly predominated at the archaeal +1 position. Aromatic residues were never detected at this site. The only time a Tyr was detected at the +1 position was in the only sequon experimentally shown as being unmodified (see above). Nonetheless, as is the case in archaeal glycoproteins, Asp and Glu are also rarely found at the -2 and -1 positions adjacent to modified sequons in eukaryal N-glycosylated proteins (Petrescu et al. 2004). Moreover, in both archaeal and eukaryal glycoproteins, basic residues are poorly represented at position +1.

Insights into the archaeal N-glycosylation process from sequence analysis
The present study not only offers insight into the relationship between the environment in which archaeal sequons lie and their likelihood of being modified but also sheds light onto the interplay between protein translation, protein translocation and N-glycosylation in Archaea. Earlier results have shown that the Asn residues of sequons found fewer than 12-14 positions downstream from a transmembrane domain are not modified by the vertebrate oligosaccharide transferase, suggesting the active site of the complex lies 30-40 Å from the ER membrane (Nilsson and von Heijne 1993). If the active site of archaeal Stt3 is similarly positioned, then the modification of the Asn residue 9 positions from the C-terminus of M. voltae FlaB2 (Voisin et al. 2005) would require glycosylation at this position to transpire post-translationally, following translocation of the protein across the membrane. However, topological analysis of the yeast and mouse Stt3 proteins (Kim et al. 2005) predicts that the WWDYG motif, thought to be responsible for the oligosaccharide transferase activity of the protein (Yan and Lennarz 2002, Wacker et al. 2002), is located 42 residues from the last transmembrane domain of the polypeptide. By contrast, predictive software (HMMTOP; Tusnady and Simon 2001) puts the same motif only 25 residues from the final transmembrane domain of the H. volcanii protein (Abu-Qarn and Eichler 2006). Thus, the spatial dimensions of co-translational/translocational modification by archaeal Stt3 may differ from those employed by its eukaryal homologue. Similar spatial considerations also apply when considering that Asn residues in the H. salinarum, M. fervidus and H. volcanii S-layer glycoproteins found one, six and 12 positions from the N-terminus, respectively, are glycan-modified. These observations suggest that glycosylation occurs following signal peptide cleavage, since these residues would otherwise lie too close to the plasma membrane to be accessible to the archaeal Stt3 active site. A similar scenario has been proposed for the processing of eukaryal signal peptide-bearing glycoprotein precursors (Chen et al. 2001).

Conclusions
The finding that the characteristics of the amino acids surrounding glycosylated sequons in Archaea differ from those in Bacteria or Eukarya is in accord with earlier observations pointing to unique aspects of the archaeal process. Although NXS sequons are clearly modified in archaeal glycoproteins, exchanging the Ser residue of a particular Asn-Ala-Ser sequon within the H. salinarum S-layer glycoprotein sequence with a Val, Leu or Asn did not prevent N-glycosylation (Zeitler et al. 1998). Moreover, whereas glycans are linked to Asn residues of eukaryal glycoproteins almost exclusively through N-acetylglucosamine, Archaea can use other linking sugars (Lechner and Wieland 1989, Voisin et al. 2005). Together, these observations suggest that the archaeal OST works differently from its eukaryal and bacterial counterparts or that it is less stringent in its target selection criteria. Alternatively, Archaea could express a second version of the OST that functions in a novel manner. To date, no candidates for this putative second OST have been proposed, although A. fulgidus is thought to encode two Stt3 proteins (Burda and Aebi 1999) that could be imagined to function distinctly. Accordingly, different catalytic activities have been reported for the two Stt3 homologues detected in vertebrates (Kelleher et al. 2003).
Analysis of the regions bordering the glycan-bearing Asn residues of archaeal glycoproteins may provide insight into various aspects of the N-glycosylation process in Archaea. With the recent development of experimental systems for deciphering the steps involved in the archaeal version of this post-translational modification (Abu-Qarn and Eichler 2006, Chaban et al. 2006), the rules governing sequon occupation, including those proposed here, can be tested.

Figure 1. Analysis of the characteristics of amino acids surrounding modified Asn residues in archaeal glycoproteins. The relative proportions of acidic (DE), basic (KRH), polar uncharged (NQSTY) and nonpolar (AVLIPFMWGC) residues were considered at 10 positions up- and downstream of modified archaeal NXS (upper panel, based on 14 sequences) and NXT (lower panel, based on 13 sequences) sequons. In each case, the character of the residue at the X position was also considered. In each panel, the positions of the modified Asn as well as the sequon Ser (upper panel) or Thr (lower panel) are indicated.

Table 2. Tally of predicted N-glycosylated secretory proteins in selected archaeal species.