The S-Layer Glycoprotein of the Crenarchaeote Sulfolobus acidocaldarius Is Glycosylated at Multiple Sites with Chitobiose-Linked N-Glycans

Glycosylation of the S-layer of the crenarchaeote Sulfolobus acidocaldarius has been investigated using glycoproteomic methodologies. The mature protein is predicted to contain 31 N-glycosylation consensus sites, approximately one third of which are found in the C-terminal domain spanning residues L1004-Q1395. Since this domain is rich in Lys and Arg and therefore relatively tractable to glycoproteomic analysis, this study has focused on mapping its N-glycosylation. Our analysis identified nine of the 11 consensus sequence sites, and all were found to be glycosylated. This constitutes a remarkably high glycosylation density in the C-terminal domain, averaging one site for each stretch of 30–40 residues. Each of the glycosylation sites observed was shown to be modified with a heterogeneous family of glycans, with the largest having a composition Glc1Man2GlcNAc2 plus 6-sulfoquinovose (QuiS), consistent with the tribranched hexasaccharide previously reported in the cytochrome b558/566 of S. acidocaldarius. S. acidocaldarius is the only archaeal species whose N-glycans are known to be linked via the chitobiose core disaccharide that characterises the N-linked glycans of Eukarya.


Introduction
In many Archaea the surface layer (S-layer) proteins are the sole cell wall component [1]. These S-layer proteins assemble into a natural 2-D crystal structure with very strong self-interactions. In Archaea that do not possess other cell wall components, the S-layer has to maintain cell integrity and to stabilize and protect the cell against mechanical and osmotic stresses or extreme pH conditions. It is also predicted that the S-layer has to maintain or even determine the cell shape [2][3][4][5][6].
In Sulfolobus spp. the S-layer is composed of two proteins: a small protein of approximately 45 kDa, SlaB, and a large protein, SlaA, of approximately 120 kDa. SlaB is an integral membrane protein and its strong interaction with SlaA, which covers the whole cell surface, tethers the S-layer to the membrane [8,9]. Taking into account the harsh growth conditions of the thermoacidophilic Sulfolobus spp. (pH 2-3 and 75-80 °C), the S-layer proteins will play an important role in maintaining cell integrity and must be adapted to be functional under these conditions. One possible posttranslational modification that proteins can undergo is glycosylation, which has a major effect on stability and half-life [10]. Indeed, all archaeal S-layer proteins that have been structurally studied to date have been found to carry N-glycans [11][12][13][14][15][16][17].
Although Eukarya, Bacteria, and Archaea all share certain characteristics of the N-glycosylation pathway, the resulting glycan structures in Bacteria and Archaea are more diverse than in Eukarya [18,19]. Notably a far greater variety of monosaccharides is used, many of which carry functionalities such as sulfate and methyl groups, or even amino acids such as threonine [12]. In the last two years, substantial progress has been made in describing the enzymes involved in archaeal N-glycosylation pathways [19][20][21][22][23].
Figure 1: ... [7]. (b) This shows a symbolic representation of the glycan which is used in the annotations of ...
The archaeal N-glycosylation machinery combines aspects of both the eukaryal and bacterial pathways. For instance, Archaea and Eukarya use dolichol as the lipid carrier whereas Bacteria use undecaprenyl. On the other hand, in Archaea and Bacteria the oligosaccharyltransferase is comprised of a single subunit whereas in Eukarya it is a multimeric complex. So far, proteins from about 25 species of Archaea have been reported to be glycosylated and about ten have had their N-glycans partially or fully characterized [19,24]. The best understood are S-layers of the halophiles Halobacterium salinarum and Haloferax volcanii, and S-layers and flagellins of the methanogens Methanothermus fervidus, Methanococcus voltae, and M. maripaludis. In contrast to the N-glycans of Eukarya, which are almost always branched and usually considerably greater than six residues in size, these archaeal glycoproteins contain unbranched glycans most of which have fewer than six sugar residues. The exception is Hbt. salinarum which, in addition to bearing a trisaccharide composed of monosulfated glucuronic acid linked via glucose at about ten consensus sites, has one N-linked site occupied by a GalNAc-linked polysaccharide comprised of multiple repeats of a sulfated pentasaccharide (composed of GlcNAc, GalNAc, Gal, GalA, and 3-O-methyl-GalA) [25]. In H. volcanii the S-layer glycoprotein is modified by the attachment of a pentasaccharide, composed of two hexoses, two hexuronic acids, and a methyl ester of hexuronic acid [26,27]. The N-glycans attached to the S-layer of Methanothermus fervidus are hexasaccharides containing 3-O-methylmannose, mannose, and GalNAc [28]. In M. voltae the flagellins and S-layer proteins are glycosylated with a complex trisaccharide composed of GlcNAc, GlcdiNAcA, and a threonine-substituted ManNAcA [12]. A second strain of M. voltae has been recently found to carry tetrasaccharides which share this trisaccharide sequence capped by an uncharacterized sugar [29]. An even more unusual tetrasaccharide has been found on the pilins and flagellins of M. maripaludis.
Glycosylation of extremely thermophilic members of the Archaea domain is quite poorly understood despite the fact that one member of this class, Thermoplasma acidophilum, from the Euryarchaeota kingdom of Archaea, was amongst the first of the archaea to have its glycoproteins studied by biophysical methods [31]. This early study, which reported the presence of branched mannose-rich glycans linked via GlcNAc to Asn, has not, however, been followed up with more rigorous structure analysis. A second extremophile member of the Euryarchaeota kingdom to have its glycosylation studied is Pyrococcus furiosus, which, interestingly, has also been shown to biosynthesize branched glycans [32]. The oligosaccharyltransferase from this species has been purified, and its ability to glycosylate a fluorescently labeled peptide containing a consensus sequence has been assayed in the presence and absence of lipid-linked oligosaccharide (LLO) prepared from Pyrococcus furiosus cells. In the presence of the LLO a glycopeptide was produced which was shown by mass spectrometry to carry a branched heptasaccharide having a pentose sugar attached to each of the second and third residues of a pentasaccharide of sequence HexNAc-HexA-Hex-Hex-HexNAc [32].
Branching is also a feature of the only glycan so far determined from a member of the Crenarchaeota kingdom. Thus cytochrome b558/566 of Sulfolobus acidocaldarius, which grows optimally at 75-80 °C and pH 2-3, was shown to be modified with a tribranched hexasaccharide of composition Glc1Man2GlcNAc2 plus 6-sulfoquinovose, an unusual sugar which is characteristic of chloroplasts and photosynthetic bacteria (see Figure 1 for structure) [7]. As, to date, this is the only characterised glycan structure from a crenarchaeal species, our objective is to determine the glycan composition of other extracellular proteins of the Sulfolobales. It is known that nearly all extracellular proteins found in these organisms are glycosylated [8,33,34]. As a first model protein, the S-layer protein of S. acidocaldarius was isolated and its glycosylation investigated using glycoproteomic methodologies.

Strains and Growth Conditions.
S. acidocaldarius (DSM639) was grown in Brock medium at pH 3 and 76 °C [35] and the medium was supplemented with 0.1% (w/v) tryptone as sole carbon and energy source. Growth of cells was monitored by measuring the optical density at 600 nm.

S-Layer Isolation.
Fresh cells or frozen cell pellets from a 50 ml culture were resuspended in 40 ml buffer A (10 mM NaCl, 1 mM PMSF, 0.5% Na-lauroylsarcosine) with the addition of a small amount of DNase. The samples were shaken for 45 min at 37 °C and centrifuged for 20 min in an Optima Max-XU Ultracentrifuge (Beckman Coulter) at 16,000 rcf, yielding a brownish tan pellet. The pellet was resuspended in 1.5 ml buffer A and incubated for 30 min at 37 °C. After centrifugation in a tabletop centrifuge at 14,000 rpm the pellet was purified by repeated washes in buffer B (10 mM NaCl, 0.5 mM MgSO4, 0.5% SDS), each followed by incubation for 20 min at 37 °C and subsequent centrifugation, until a translucent tan pellet was obtained. Once the pellet was translucent, the S-layer proteins were washed once with water and then stored in water at 4 °C.

Proteolytic Digestion for Glycoproteomic Analysis.
Purified S-layer samples of S. acidocaldarius were run on a 2-8% precast gel (Invitrogen, Paisley, UK) and stained with Novex Colloidal blue stain (Invitrogen). The S-layer was observed as a broad band between 116 and 160 kDa. The band was then excised and cut into pieces, destained using 400 μl of 50% (v/v) acetonitrile in 0.1 M ammonium bicarbonate (pH 8.4) and dried in a SpeedVac. Reduction/carboxymethylation was carried out by swelling and incubating the dried gel pieces in dithiothreitol (DTT) (10 mM) (200 μl) in ammonium bicarbonate (AMBIC) (50 mM, pH 8.4) (Roche, West Sussex, UK) at 56 °C for 30 min. The DTT solution was then removed and the gel pieces were washed with acetonitrile (200 μl) and dried. The gel pieces were incubated in the dark at room temperature in 50 mM iodoacetic acid (IAA) (200 μl) (Sigma-Aldrich, Dorset, UK) which was dissolved in ammonium bicarbonate (50 mM, pH 8.4). The IAA was then removed and the gel pieces were washed in AMBIC buffer (500 μl) for 15 min. The AMBIC was removed and the gel pieces were shrunk in acetonitrile (200 μl) for 5 min. The gel pieces were dried in a SpeedVac and then subjected to digestion with trypsin or chymotrypsin. For the tryptic digest the dried gel pieces were reswelled in AMBIC solution and incubated at 37 °C with 25 ng/μl trypsin (20 μl) (Promega cat V5111) overnight. The supernatant was removed and placed in a clean Eppendorf tube. The gel pieces were then incubated in 0.1% TFA (50 μl) at 37 °C for 10 min. Acetonitrile (100 μl) was added to the mixture, which was incubated at 37 °C for 15 min. The supernatant was then pooled with the previous supernatant and the process was repeated twice. The supernatant volume was then reduced (to about 35 μl) in preparation for LC-MS. For the chymotryptic digest, reswelled gel pieces were incubated at 37 °C in 25 ng/μl chymotrypsin (Sigma C-3142) dissolved in Tris-HCl (100 mM, pH 7.8), overnight. The remainder of the experiment was carried out as for tryptic digestion.

LC-MS Analysis.
The extracted peptides/glycopeptides from the gel pieces were analyzed by nano-LC-ES-MS/MS employing a quadrupole TOF mass spectrometer (Q-STAR Pulsar I, MDS Sciex). Separation of the peptides/glycopeptides was carried out by using a nano-LC gradient method generated by an Ultimate pump fitted with a Famos autosampler and a Switchos microcolumn switching module (LC Packings, Amsterdam, The Netherlands). The system was coupled to an analytical C18 nanocapillary (75 μm inside diameter × 15 cm, PepMap) and a microprecolumn C18 cartridge for online peptide/glycopeptide separation. The digest was first loaded onto the precolumn and eluted with 0.1% formic acid (Sigma) in water (HPLC grade, Purite) for 4 min. The eluant was then transferred onto the column and eluted at a flow rate of 150 nL/min using the following gradient of solvent A [0.05% (v/v) formic acid in a 95:5 (v/v) water/acetonitrile mixture] and solvent B [0.04% formic acid in a 95:5 (v/v) acetonitrile/water mixture]: 99% A from 0 to 5 min, 99 to 90% A from 5 to 10 min, 90 to 60% A from 10 to 70 min, 60 to 50% A from 70 to 71 min, 50 to 5% A from 71 to 75 min, 5% A from 75 to 85 min, 5 to 95% A from 85 to 86 min, and 95% A from 86 to 90 min. Data acquisition was performed using Analyst QS software with an automatic information-dependent-acquisition (IDA) function.
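The gradient above can be written down as a table of breakpoints, which makes it easy to check the solvent composition at any elution time. The following is a minimal sketch rather than instrument code: the breakpoints are transcribed from the text, and it assumes that each ramp is linear between its stated endpoints. The names `GRADIENT`, `percent_a`, and `percent_b` are illustrative only.

```python
# Gradient breakpoints as (time in min, % solvent A), transcribed from the text.
GRADIENT = [
    (0, 99), (5, 99), (10, 90), (70, 60), (71, 50),
    (75, 5), (85, 5), (86, 95), (90, 95),
]


def percent_a(t_min):
    """Return % solvent A at time t_min, assuming linear ramps (an assumption)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time lies outside the 0-90 min program")
    # Walk through consecutive breakpoint pairs and interpolate within the match.
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)


def percent_b(t_min):
    """Return % solvent B, taken as the complement of % A."""
    return 100.0 - percent_a(t_min)
```

Under these assumptions, the 46 to 52 min window in which the glycopeptides discussed later were summed falls on the shallow 90 to 60% A segment of the program.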

Sugar Composition Analysis.
Samples were hydrolysed in 1 M methanolic hydrogen chloride at 80 • C for 16 h and the reagent was removed under a stream of nitrogen. Hexosamines were re-N-acetylated in 500 μl of methanol/pyridine/acetic anhydride (500:1:5, v/v/v) for 15 min at room temperature, then dried under nitrogen. Trimethylsilyl derivatisation was performed in 200 μL of Tri-Sil "Z" (Pierce) at room temperature for 30 min, after which the reagent was removed under nitrogen. Derivatized monosaccharides were resuspended in 1 ml of hexanes, centrifuged at 3000 rpm for 10 min, and the supernatant transferred and dried under nitrogen for analysis by gas chromatography-mass spectrometry (GC-MS).

GC-MS Analysis.
This was carried out using a Perkin Elmer Clarus 500 instrument fitted with an RTX-5 column (30 m × 0.25 mm internal diameter, Restek Corp.). Temperature program: the oven was held at 65 °C for 1 min before being increased to 140 °C at a rate of 25 °C/min, then to 200 °C at a rate of 5 °C/min and finally to a temperature of 300 °C at a rate of 10 °C/min.
Figure 2: Polypeptide sequence of the S. acidocaldarius S-layer. The N-terminal signal sequence has been omitted. Consensus sequences for N-glycosylation are shaded and the predicted products of tryptic digestion are shown by underlining.

Strategy for Glycoproteomic Analysis of the S. acidocaldarius S-Layer.
The polypeptide sequence of the S. acidocaldarius S-layer SlaA protein (Saci_2355) is shown in Figure 2 [36]. The mature protein is predicted to comprise 1,395 amino acids and to contain 31 consensus sites for N-glycosylation. These are scattered throughout the S-layer with the greatest density being in the C-terminal domain where about 25% of the polypeptide contains one third of the consensus sequences. This domain is rich in Lys and Arg, in contrast to the remainder of the S-layer where these residues are quite sparse. As shown in Figure 2, the majority of the predicted tryptic peptides from Leu 1003 onwards are smaller than about 40 residues, and are thus well suited to electrospray tandem mass spectrometry (ES-MS/MS), whereas many of those in the N-terminal domain are much larger and are therefore expected to be far less tractable to proteomic analysis. We decided, therefore, to focus our efforts on defining glycosylation in the C-terminal domain by performing nano-LC-MS/MS analyses of in-gel tryptic digests (Figure 3) of the S-layer and manually searching the resulting MS/MS data for potential C-terminal glycopeptides. First, we identified spectra containing fragment ions suggesting the presence of sugars. Then, promising MS/MS data were examined for the presence of peptide sequence ions that could be attributed to predicted tryptic peptides in the C-terminal domain. Finally, glycopeptide structures were deduced taking into account likely peptide and glycan compositions, the latter assignment being assisted by knowledge of the glycan on the cytochrome b558/566 of S. acidocaldarius (Figure 1). Additionally, all the mass spectra acquired in, and adjacent to, the elution windows of the identified glycopeptides were examined for evidence of molecular ions consistent with glycoforms whose abundance and/or m/z values had precluded their selection for MS/MS analysis by the automatic software. The results of these analyses are presented in the following sections.
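The two predictions that underpin this strategy, locating consensus sites and predicting tryptic peptides, can be illustrated with a short sketch. It is not the procedure used in this study. It assumes the usual N-glycosylation sequon Asn-X-Ser/Thr with X other than Pro, and the common convention that trypsin cleaves after Lys or Arg unless the next residue is Pro; neither rule is spelled out in the text. The function names and the `offset` parameter are hypothetical, and the peptide sequences in the usage note are those quoted in the Results.

```python
import re


def find_sequons(seq, offset=0):
    """Return 1-based positions of Asn in N-X-S/T sequons (X != Pro).

    offset is added to each position so that a peptide can be numbered
    in the coordinates of the full-length protein.
    """
    # The lookahead keeps matches zero-width after N, so overlapping
    # candidates such as the two Asn in "NNS" are each tested.
    return [offset + m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", seq)]


def tryptic_digest(seq):
    """Split seq after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides
```

Applied to the 22-residue C-terminal peptide AGGPVLSEYPAQLIFTNVTLTQ with `offset=1373` (so that its last residue is Q1395), `find_sequons` returns position 1390, in agreement with the site numbering used in this paper.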

Evidence That T-26 and T-24 Are Glycosylated.
Following on-line nano-LC-ES-MS analysis of the S-layer tryptic digest and manual interpretation of the resulting data, a number of multiply charged molecular ions were observed whose product ion spectra were indicative of peptide glycosylation. Once recognised, related glycoforms were identified by summation across the appropriate nano-LC elution time.
The summed mass spectrum of components eluting between 46 and 52 min is shown in Figure 4.  (Figure 1). The signal at m/z 1111.59 was not selected for MS/MS but its m/z value is consistent with this component having one fewer hexose than m/z 1151.09 (Figure 4). Without MS/MS data, we are not able to determine whether it is Glc or Man that is absent from the glycan. It is interesting that we do not observe a molecular ion corresponding to Hex2HexNAc2.
We observed similar patterns for all the other tryptic glycopeptides, although there was some variation in relative abundances, particularly of minor components. Some sites had significant amounts of glycans that were truncated to a single GlcNAc. An example is shown in Figure 6, which is the tryptic glycopeptide T-24 having the sequence LLNLNVQQLNNSILSVTYHDYVTGETLTATTK. Triply charged [M + 3H]3+ molecular ions are observed at m/z 1256.38, 1378.09, 1507.44 and 1561.45, corresponding to glycans of composition HexNAc, Hex1HexNAc2, Hex2HexNAc2QuiS, and Hex3HexNAc2QuiS, respectively. The MS/MS data confirmed the identity of the peptide T-24 (data not shown). For this glycopeptide a relatively abundant ion for the glycoform truncated to a single HexNAc was observed (m/z 1256.4). This was not a significant glycoform in the case of T-26.
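The composition assignments for T-24 can be cross-checked from the spacing of the triply charged ions alone, since the m/z difference between two glycoforms multiplied by the charge gives the mass of the sugar residues that separate them. The sketch below illustrates this arithmetic and is not the software used in the study. The monoisotopic residue masses are standard values calculated from elemental composition (QuiS taken as C6H10O7S), and the 0.05 Da tolerance and the function names are assumptions chosen for illustration.

```python
from itertools import product

# Monoisotopic residue masses in Da (sugar minus water); calculated values,
# not figures reported in the paper.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "QuiS": 226.0147}


def neutral_delta(mz_low, mz_high, charge):
    """Mass difference between two ions of the same charge state."""
    return (mz_high - mz_low) * charge


def explain_delta(delta, tol=0.05, max_each=2):
    """List residue combinations whose summed mass matches delta within tol."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_each + 1), repeat=len(names)):
        if sum(counts) == 0:
            continue  # skip the empty combination
        mass = sum(n * RESIDUE_MASS[name] for n, name in zip(counts, names))
        if abs(mass - delta) <= tol:
            hits.append({name: n for name, n in zip(names, counts) if n})
    return hits


# Successive T-24 glycoforms, all [M + 3H]3+, as listed in the text.
T24_MZ = [1256.38, 1378.09, 1507.44, 1561.45]
STEPS = [explain_delta(neutral_delta(a, b, 3)) for a, b in zip(T24_MZ, T24_MZ[1:])]
```

With these inputs the three steps resolve to Hex + HexNAc, Hex + QuiS, and Hex, which matches the progression HexNAc, Hex1HexNAc2, Hex2HexNAc2QuiS, Hex3HexNAc2QuiS given above.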

All Observed Consensus Sequences in the C-Terminal Domain Are Glycosylated.
Using the same logic as applied to T-26 and T-24 (see above) we unambiguously identified all the remaining tryptic peptides in the C-terminal domain with the exception of T-29. This is a 66-residue peptide having three consensus sequences (A1115SVYY...SSLTK1180) and is likely to be too large for the ES-MS experiment. The identified glycopeptides are shown in Table 1, which summarises the m/z values and compositions for glycoforms observed by ES-MS. In an attempt to identify the consensus sites falling within the T-29 tryptic glycopeptide we carried out a chymotryptic digestion. Chymotrypsin is not an ideal enzyme for glycoproteomics because it cleaves relatively nonspecifically at large hydrophobic residues as well as at aromatic residues. Hence it yields a very complex digest and consequently the MS and MS/MS data are an enormous challenge for manual interpretation. Fortunately, however, two useful sets of glycopeptide data from the C-terminal domain of the S-layer were revealed by manual inspection. Firstly, high-quality spectra were found for the C-terminus itself (AGGPVLSEYPAQLIFTNVTLTQ; designated C-1 in Table 1), which includes the consensus sequence at Asn1390. MS/MS results (not shown) confirmed that the major glycoform was the same as that assigned in the tryptic digest (Table 1), where our evidence had been confined to MS data only. More usefully, the chymotryptic digest gave data corresponding to a glycopeptide of sequence TIVPNNTVVQIPSSL, which spans the consensus site at Asn1168 within T-29 (designated C-2). Once again the glycan profile was similar to other observed sites, with the hexa- and trisaccharides being the most abundant components (Table 1).

Sugar Analysis Confirms Man, Glc, and GlcNAc Content.
The S. acidocaldarius S-layer was analysed for its sugar composition by GC-MS of trimethylsilyl (TMS) methyl glycoside derivatives and the data are shown in Table 2. Consistent with both the MS data described earlier, and the structure reported by Zähringer and colleagues [7], the analysis showed Man, Glc, and GlcNAc as the only observable sugars. No GalNAc was present, confirming the chitobiose core sequence. The Man:Glc ratio is consistent with the
Table 2: GC-MS analysis of TMS methyl glycoside sugar derivatives obtained from S. acidocaldarius S-layer glycoprotein.
Note that for experimental reasons GlcNAc recoveries are always poor in sugar analysis experiments. We consider it likely that this is the reason for the GlcNAc:Mannose ratio being much lower than expected from the LC-ES-MS/MS data, although we cannot rule out the possibility that other mannose-containing polymers are present in the sample.

Discussion
In this study we have employed glycoproteomic methodologies to map the C-terminal domain of the S. acidocaldarius S-layer, spanning residues L1004-Q1395. This domain has eleven potential N-linked glycosylation sites, nine of which we have shown to be glycosylated (Figure 7), including Asn1390 which is only six residues from the C-terminus. The two sites which were not identified, Asn1120 and Asn1154, fall within a 66-residue tryptic peptide (T-29) that has three consensus sequences, the third of which (Asn1168) was found to be occupied via analysis of a chymotryptic digest of the S-layer. Unfortunately the other two sites were not revealed by this digest. It is noteworthy that we were successful in obtaining good quality data on T-31, which is a 54-residue tryptic glycopeptide carrying two N-glycans (see Figure 7 and Table 1) whose size is not very different from the predicted value for monoglycosylated T-29. We think it likely, therefore, that our failure to detect signals attributable to the T-29 glycopeptide is due to either or both of Asn1120 and Asn1154 being glycosylated, in addition to Asn1168, thus moving this glycopeptide outside the observable m/z range of the glycoproteomics experiment. Irrespective of whether these two sites are indeed occupied, the glycosylation density in the C-terminal domain is quite remarkable, averaging one glycosylation site for each stretch of 30-40 residues.
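The density figure quoted above follows from simple arithmetic on the domain boundaries and site counts given in the text, as the following sketch shows. The helper names are illustrative, and the only inputs are numbers stated in this paper.

```python
def span_length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


def residues_per_site(first, last, n_sites):
    """Average number of residues per glycosylation site in a range."""
    return span_length(first, last) / n_sites


# C-terminal domain L1004-Q1395 with 11 consensus sites, 9 shown to be occupied.
DOMAIN = (1004, 1395)
per_consensus_site = residues_per_site(*DOMAIN, 11)
per_confirmed_site = residues_per_site(*DOMAIN, 9)

# T-29 spans A1115 to K1180.
t29_length = span_length(1115, 1180)
```

The domain is 392 residues long, which gives roughly 36 residues per consensus site and roughly 44 residues per confirmed site. The 30-40 residue figure therefore corresponds to counting all eleven consensus sites.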
Each of the observed glycosylation sites was found to be heterogeneously glycosylated with a family of glycans, the largest of which has a composition consistent with it having the sequence of the tribranched hexasaccharide found in cytochrome b558/566 of S. acidocaldarius (Figure 1, [7]). The other members of the family appear to be biosynthetic precursors of this glycan. The most abundant is Man1GlcNAc2 (see annotations on Figures 4 and 6). In addition, a nonextended GlcNAc is a minor component at some sites (Figure 6), and a pentasaccharide lacking one of the hexoses of the mature glycan was also observed (Figures 4 and 6). Interestingly, Man2GlcNAc2 was only observed as a very minor component at a single site (T-34, Table 1), suggesting that either the second mannose is added after the 6-sulfoquinovose, or addition of the latter is rapid compared with the second mannosylation. It is not surprising that the S-layer and the cytochrome share a glycan sequence because conservation of N-glycosylation within a species appears to be characteristic of Archaea [24]. Moreover, the presence of a family of biosynthetically related glycans is not unexpected since the S-layer of H. volcanii exhibits a similar phenomenon [27].
With the exception of the two halophiles described in the Introduction, whose glycans are linked via glucose, all known archaeal glycans are attached to Asn via GalNAc or GlcNAc. Interestingly, S. acidocaldarius is the only species characterized so far whose glycans are linked via chitobiose (GlcNAcβ1-4GlcNAc), the core disaccharide shared by the N-linked glycans of Eukarya. Moreover the tribranched topology of the Sulfolobus glycan is reminiscent of eukaryotic glycans which are usually multiantennary.
Sulfolobus spp. are developing into model organisms for studies on Eukarya-like mechanisms of transcription, translation, and cell division, and the large number of recently established genetic tools now makes these organisms a prime choice to study these phenomena. The glycan structural information reported in this paper will facilitate the application of genetic tools to the elucidation of the N-glycosylation pathway in S. acidocaldarius. It will be very interesting to establish whether the commonalities in core structure between the glycans of S. acidocaldarius and those of Eukarya are mirrored in the biosynthetic pathways. Moreover, identification of the glycosylation enzymes of S. acidocaldarius could lead to interesting biotechnological applications.