Model for the phycobilisome rod with interlocking disks based on domain-weighted linker-polypeptide sequence homologies of Mastigocladus laminosus

The eight linker polypeptides from the cyanobacterium Mastigocladus laminosus were examined in detail by comparative amino-acid sequence analysis for the predicted domain structure reported in the literature (Glauser 1991, Esteban 1993), using split sequences for the rod and rod-core linkers and for the repeats from the core-membrane linker polypeptide (L cm ). This analysis gives two distinct domains: domain 1 (∼22 kDa, identity between 31 and 70%) is present in the N-terminal two thirds of the class II linkers (∼30 kDa) and in the repeats of the L cm , and domain 2 (∼10 kDa, identity between 28 and 41%) is present in the C-terminal part of the class II rod linkers (L r 31.5 and L r 32.3 ) and in the two capping linkers (L c 7.7 and L r 8.2 ). Based on these data, on X-ray structure analyses of phycobiliproteins and on proteolysis experiments, an interlocking model for the phycobilisome rod organization is proposed, with linkers protruding from one phycobilisome disk into the neighbouring one.
Abbreviations: APC = allophycocyanin (encoded by apcA and apcB, for the α- and β-subunits, respectively), apc = gene locus of allophycocyanin, cpc = gene locus of phycocyanin, L = linker polypeptide [the superscripts denote the molecular weights according to protein sequence calculation, the subscripts give the location, denoted as r (rod), rc (rod-core), c (core), and cm (core-membrane)]; the correspondence of genes and the derived linker polypeptides is as follows: L r 8.2 (cpcD), L r 32.3 (cpcC), L r 31.5 (pecC), L rc 31.8 (cpcG1), L rc 28.7 (cpcG2), L rc 29.6 (cpcG3), L c 7.7 (apcC) and L cm 127.6 (apcE); M. = Mastigocladus, PBS = phycobilisome, PC = phycocyanin (encoded by cpcB and cpcA, for the β- and α-subunits), PE = phycoerythrin, PEC = phycoerythrocyanin (encoded by pecB and pecA, for the β- and α-subunits), pec = gene locus of phycoerythrocyanin.


INTRODUCTION
Phycobilisomes (PBS) are the extramembranous light-harvesting complexes of cyanobacteria, rhodophytes and cyanelles. They funnel the light energy absorbed by their 500 to >1000 chromophores to two specialized ones, and from there to the reaction centers, with close to 100% quantum yield (Porter et al. 1978, Manadori and Melis 1985). The PBS of the cyanobacterium Mastigocladus (M.) laminosus (Fischerella PCC 7603) contains at least 16 different polypeptides, each used in several (up to 108) copies (Sidler et al. 1981, Zuber 1986, Glauser 1991, Esteban 1993); other PBS can be even more complex (Gantt 1986). Biochemically, the two major PBS components are the phycobiliproteins, which carry covalently bound linear tetrapyrrole chromophores (MacColl and Guard-Friar 1987), and the linker polypeptides, which have primarily a structural role but are also involved in the fine-tuning of phycobiliprotein absorption and can carry chromophores themselves (Glazer 1984). Morphologically, the two major components are (a) a central core comprised of allophycocyanins (APC) and the core linkers (L c 7.7 , L cm 127.6 ), and (b) several rods, which in M. laminosus are comprised of phycocyanin (PC), phycoerythrocyanin (PEC), and the rod linkers (L r 8.2 , L r 31.5 , L r 32.3 , L rc 31.8 , L rc 28.7 , and L rc 29.6 ).
PEC can be replaced by PC depending on the growth conditions (irradiation and CO 2 content, Esteban 1993).
The genome organisation and DNA or amino-acid sequences of all known components of the PBS from M. laminosus have been determined (Zuber 1986, Eberlein and Kufer 1990, Glauser et al. 1992, Esteban 1993, Kufer, GenBank database but otherwise unpublished results). This includes three rod-core linkers, which have been located on the cpc-operon, while a fourth one is still under discussion. High-resolution three-dimensional structures are available for several linker-free phycobiliproteins, including PC and PEC of M. laminosus (Schirmer et al. 1986, Schirmer et al. 1987, Dürring et al. 1990, Ficner et al. 1992). The doughnut-shaped trimers (α 3 β 3 ) or hexamers (α 6 β 6 ) are the basic phycobilisome building blocks. At the same time, they appear to be the largest biliprotein aggregates which can be formed in the absence of linker polypeptides (Lundell et al. 1981, Fischer et al. 1990, Gottschalk et al. 1991, Glauser et al. 1992). The linkers are then mainly responsible for the assembly of the trimeric and hexameric biliproteins into the entire phycobilisome, where they must provide both the interaction energies and the specificities (Lundell et al. 1981, Glazer 1984, Tandeau de Marsac et al. 1988).
According to electron microscopy (Mörschel et al. 1977), the linkers are located in the interior holes of the trimeric and/or hexameric biliproteins. This view has been strengthened by the X-ray structure of a B-PE hexamer containing a chromophore-bearing γ-subunit believed to replace the linker (Ficner and Huber 1993). Although no details of this γ-subunit are discernible due to the threefold disorder induced by the C 3 -symmetry of the biliprotein, it fills the central hole of a single hexamer snugly. The X-ray structure of a small linker polypeptide within an AP trimer (α 3 β 3 * L c 7.7 ) from the phycobilisome of the cyanobacterium M. laminosus has recently become available (Reuter et al. 1999). This structure provides several new aspects:
1. The linker, a three-stranded β-sheet and two α-helices, is in contact with only two of the three β-subunits and interacts directly with the corresponding chromophores of the phycobiliprotein subunits.
2. It influences the structure of the αβ-subcomplexes, bringing the β-chromophores closer together.
3. In spite of the generally acidic character of the phycobiliproteins and the basic character of the linker, there is no evident clustering of charged, polar and hydrophobic residues at the protein-protein interface.
In the framework of a project aimed at investigating structure-function relationships in linker polypeptides, we have recently resequenced part of the cpc-operon of M. laminosus and analysed the domain structure of all linkers. Here we wish to extend the idea that part of a linker protrudes from the central hole, leaving part of the hole at the other end of the hexamer empty. This empty space can function as a docking place for the C-terminus of the linker of the neighbouring hexameric disk or, at the end of the rod, for a small capping linker, giving rise to an interlocking model of disks in the rods.

MATERIALS AND METHODS
EMBL-Phage bank. M. laminosus was grown at 55 °C in Castenholz medium D (Castenholz 1970). The cells were washed and lysed according to the method of Mazel et al. (1996), and the DNA was purified by CsCl/ethidium bromide gradient centrifugation as described by Lau et al. (1987). Genomic DNA (100 µg) was partially digested with Sau3A, and the fragments were fractionated on a 10-40% linear sucrose gradient in 10 mM Tris-HCl buffer (pH 7.6) containing 1 mM EDTA. Centrifugation was carried out in a swing-out rotor for 24 h at 120 000 × g.
The fragments of ∼16 kb size were ligated into BamHI-cleaved EMBL3 phage vector (Frischauf et al. 1983), and the phages were grown in E. coli strains LE392 and Y1088 (Ausubel et al. 1987). The genomic library was screened with a radiolabelled DNA fragment encoding part of cpcAC (radiolabelling was carried out using a nick translation kit, Boehringer Mannheim), with hybridization under stringent conditions according to Maniatis et al. (1982).
Subclones were constructed in the plasmid vector pUC19 (Yanisch-Perron et al. 1985), maintained in E. coli strain DH5α (Ausubel et al. 1987). Transformations were carried out according to the standard procedure of Hanahan (1985).
DNA sequencing. Plasmid DNA to be used in sequencing reactions was isolated by an alkaline lysis method (Heinrich 1986). The sequencing reactions were primed using the pUC/M13 sequencing primers (Boehringer Mannheim) and labelled with 10 µCi [α-32P]dATP using a T7 DNA sequencing kit (Pharmacia). Samples were electrophoresed on 6 or 8% denaturing polyacrylamide gels, which were subsequently dried on Whatman 3MM paper before exposure to X-ray films (Amersham).
Alignment of sequences. Amino-acid sequences were analysed either by pairwise alignment using the FASTA program (Pearson and Lipman 1988) or by multiple alignment using the HUSAR system (Deutsches Krebsforschungszentrum, Heidelberg) and CLUSTAL V (Higgins and Sharp 1988). Calculated homologies count unmatched amino acids as mismatches, unless otherwise stated. For the definition of the domain boundaries, see Results and Discussion.
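The counting convention stated above (unmatched amino acids are scored as mismatches) can be illustrated with a minimal sketch. It is not the FASTA implementation; the function name `percent_identity` and the two short aligned sequences are hypothetical and serve only to show how the convention changes the reported identity.

```python
def percent_identity(aligned_a: str, aligned_b: str,
                     gaps_as_mismatch: bool = True) -> float:
    """Percent identity between the two rows of a pairwise alignment.

    Gaps are written as '-'. With gaps_as_mismatch=True an unmatched
    residue (residue opposite a gap) is counted as a mismatch, which is
    the convention described in the text. With False, such columns are
    ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column empty in both rows carries no information
        if a == "-" or b == "-":
            if gaps_as_mismatch:
                compared += 1  # unmatched residue counts as a mismatch
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment (not taken from the linker sequences).
row_a = "MKT-AIL"
row_b = "MRTQA-L"
print(round(percent_identity(row_a, row_b), 1))         # gaps penalised
print(round(percent_identity(row_a, row_b, False), 1))  # gaps ignored
```

Scoring gaps as mismatches always gives an identity that is equal to or lower than the value obtained when gap columns are ignored, so the percentages reported here are conservative.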
Published sequences used in this analysis were taken from GenBank accession number M75599 (Kufer, private communication) for the genes pecC, cpcC, and cpcD, from Glauser (1991) for cpcG1, cpcG2, and cpcG3, and from Esteban (1993) for apcC and apcE.

RESULTS AND DISCUSSION
The work is based on a 16 kbp EMBL phage fragment containing the pec-operon and the 3′-adjacent cpc-operon. The sequences of pecB and pecA have been published by Eberlein and Kufer (1990); those of the entire pec-operon and of cpcB through cpcD were deposited in the GenBank database (see Materials and Methods) but have not yet been published, and that from the end of cpcF through cpcG3 has been reported by Glauser (1991). The sequencing strategy and the nucleotide and derived protein sequences for cpcF and cpcG1 are shown in Figure 1. The structure determined for the cpc-operon is identical to that published by Glauser et al. (1992). CpcB and cpcA, coding for the β- and α-subunits of PC, respectively, are followed by two linker genes (cpcC coding for L r 32.3 and cpcD coding for the rod-terminating L r 8.2 ). The last three genes (cpcG1, cpcG2, cpcG3) code for the linkers responsible for attachment of the rods to the core (L rc 31.8 , L rc 28.7 , L rc 29.6 , respectively). The search for a fourth one, which has been suggested by Glauser et al. (1992), was negative. Using a homologous N-terminal cpcG1 probe (nucleotides 758 to 939, Figure 1B) under stringent conditions (40% formamide in the hybridization buffer at a temperature of 42 °C), an identical hybridization pattern was obtained with four restriction enzymes (NheI, EcoRI, PstI, and HindIII), starting from either the 16 kbp EMBL phage fragment or genomic DNA from M. laminosus. If there is an additional rod-core linker, it should have a significantly different sequence in the probed region.
The pec-operon has a similar ordering (pecB, pecA, pecC, pecE, pecF), but lacks a D-gene (coding for the small rod linker) and the G-genes coding for the rod-core linkers. PEC is believed to be located at the distal ends (Bryant et al. 1979) or at least in the central portion of the rods (Reuter and Nickel-Reuter 1993); the lack of the G-genes is therefore expected. The absence of a small rod linker in the pec-operon is surprising if a terminal location is assumed. One might speculate that, in view of the rather similar three-dimensional structures of PC and PEC, L r 8.2 can cap either biliprotein at the ends of the rods. Reuter and Nickel-Reuter (1993) have questioned this view and used stoichiometric arguments for positioning (at least some of the) PEC in the center of the rods.
A comparison of the derived amino-acid sequences of the M. laminosus linkers (Figure 2) indicates that all of them are composed of a set of domains, which are combined differently in the different linkers (identity values are reported in Table 1). The domains were found by first comparing the class II rod linkers (L r 31.5 and L r 32.3 ) with the small class III linkers (L r 8.2 and L c 7.7 ). (The terminology of class I, II, and III is based on their size in the 100, 30, and 9 kDa range, respectively; Tandeau de Marsac et al. 1988.) High homologies with the latter were found in the C-terminal parts of the former, which defines what we would like to call domain 2. The remaining N-terminal parts of the class II rod linkers were then compared with each other and with the L rc and the L cm .

Table 1. Homologies among the phycobilisome linkers or linker domains. The regions used for comparison are shown in Figure 2, with "N" and "C" denoting the N- and C-terminal parts of the class II linkers (L r 31.5 , L r 32.3 , L rc 31.7 , L rc 28.7 , and L rc 29.6 ), and RA through RD denoting repeats A through D (250 to 410, 550 to 690, 700 to 870, and 950 to 1110) of L cm 127.6 . The values given in the table are amino-acid identities calculated with the FASTA program; "-" indicates that no homology was detected by the programs used.

This analysis corroborates the results of Glauser (1991) and Esteban (1993), with two alterations:
(a) The domain borders found differ slightly from those reported. The "repeats" of the L cm 127.6 give higher homologies if they are extended (A, B, and C are extended by 25 to 50 amino acids in the N-terminal direction, repeat C by 20 amino acids in the C-terminal direction). These extensions lead to shorter "arm" regions between the repeats, but include the highly conserved core regions.

[Figure 1B, excerpt: nucleotide sequence with the derived amino-acid sequence below each row; numbers at the right give nucleotide positions]
AGTTGCTGGCGCAATTCCCAGCACTTTCTTATCCATGGCCGCAAGCATTGCTCCCAGCGG 1440
V A G A I P S T F L S M A A S I A P S G
AATCAGCTACCAACGCACCGCCGACAGCGCCAGAACATTCATCTCCACTGTCAAGCTTCC 1500
I S Y Q R T A D S A R T F I S T V K L P
CGAAACCACAAGTGAATCTAAAACCCCTCCTCCCACCGTCAAACCTGCAACTGTTGCTCT 1560
E T T S E S K T P P P T V K P
(b) A homology between the small linkers (especially L c 7.7 ) and the C-terminal part of the L cm 127.6 (denoted as "arm 5" in Esteban 1993) could not be detected. This might be due to the different domain borders used for the comparative sequence analysis, or to differences in the stringency of the alignment programs. A C-terminal extension of the domain border is supported by the isolation of an APC-L cm fragment complex corresponding to repeat D (Gottschalk et al. 1993), which extends at least to aa 1116, i.e. 22 aa beyond the border given in Esteban (1993).
This defines what we call domain 1 which comprises the N-terminal ∼ 22 kDa portion of the class II linkers, viz. of the rod (L r 32.3 and L r 31.5 ) and rod-core linkers (L rc 31.8 , L rc 28.7 , and L rc 29.6 ), and the repeats in the L cm .
The homology (based on identity of amino acids) in this domain is 30.9 to 69.1%. It is even higher within the two rod-linker subgroups, e.g., among the three L rc (59.1 to 70.2%) and among the two L r (54.3%), and amounts to 39 to 45% among the L cm repeats.
The domain borders found in this analysis can be compared with linker fragments in the 20 kDa range obtained by proteolysis of biliprotein-linker complexes. An N-terminal 22 kDa fragment of L rc 31.8 corresponding to this domain 1 has been identified in a PC trimer with a strongly red-shifted absorption spectrum (Gottschalk et al. 1991). Its C-terminus has not been determined but, judging from the mobility in SDS-PAGE, is expected to lie about 10 aa C-terminal of the domain border. This value is based on an apparent size of 115 ± 1 Da/aa in the three L rc proteins. More recently, the fourth, C-terminal repeat of L cm 127.6 (22 kDa) has been isolated as a component of an APC-trimer complex (Esteban et al. 1990, Esteban 1993, Gottschalk et al. 1993). This fragment extends at least to aa 1116 (from C-terminal protein sequence analysis, Gottschalk et al. 1993, see above).
In both cases the portion of the linker isolated with the biliprotein corresponds to domain 1, which is most likely protected from proteolysis because it is buried in the central hole of the doughnut-shaped trimer.
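The conversion between fragment mass and chain length that underlies this comparison can be sketched as follows. The factor of 115 Da per residue is the apparent value quoted in the text for the three L rc proteins; the function names are illustrative, and the calculation is a rough estimate only, since SDS-PAGE mobilities do not give exact masses.

```python
DA_PER_RESIDUE = 115.0  # apparent average size from the text (115 ± 1 Da/aa)


def residues_from_mass(mass_kda: float,
                       da_per_residue: float = DA_PER_RESIDUE) -> int:
    """Estimate the number of residues in a fragment of the given mass."""
    return round(mass_kda * 1000.0 / da_per_residue)


def mass_from_residues(n_residues: int,
                       da_per_residue: float = DA_PER_RESIDUE) -> float:
    """Estimate the mass in kDa of a stretch of n_residues amino acids."""
    return n_residues * da_per_residue / 1000.0


# A 22 kDa fragment corresponds to roughly 190 residues.
print(residues_from_mass(22.0))
# The uncertainty of ±1 Da/aa shifts this estimate by only a few residues.
print(residues_from_mass(22.0, 114.0), residues_from_mass(22.0, 116.0))
# A shift of 10 aa corresponds to a mass difference of about 1 kDa.
print(mass_from_residues(10))
```

On this basis, a C-terminus lying about 10 aa beyond the domain border changes the fragment mass by only about 1 kDa, which is why the proteolytic 22 kDa fragments and the sequence-derived domain 1 can be regarded as equivalent within the precision of the gel.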
The homology among the five class II linkers decreases sharply in the remaining C-terminal parts of the molecules. However, the two rod linkers are homologous to each other in this region, which forms a second domain, domain 2. This domain 2 of the rod linkers has considerable homology to the class III linkers, viz. to L r 8.2 (40.3 to 40.9%) and even to the related L c 7.7 APC linker (28.3 to 32.1%). Since these two small linkers are also homologous to each other, they are grouped in domain 2, which is then present in all rod linkers (L r ) irrespective of their internal (L r 31.5 and L r 32.3 ) or distal-terminal position (L r 8.2 ). We use the functional designation of Lundell et al. (1981), whose experiments have shown that an increasing proportion of L r 8.2 inhibits the elongation of rods, and have therefore termed it rod-terminating linker. The homology of L c 7.7 suggests that the latter may perform a similar capping function in the short rods of which the core is built up.
The volume of the central hole of a hexameric disk of any of the biliproteins has been calculated to be in the range of one monomer (∼34 kDa) (Ficner et al. 1992, Ficner and Huber 1993). Any of the class II rod or rod-core linkers could therefore fit well into a single such hexamer. However, the presence of domain 2 in the C-terminal part of the rod linkers and (over almost the entire length) of the capping linkers L c 7.7 and L r 8.2 suggests a different arrangement of the rod components, which is shown in Figure 3. Starting at the distal end of a rod, the capping linker (L r 8.2 ) fills one third of the central hole of the distal PC (or PEC) hexamer, leaving space for only ∼22 kDa of the next linker. This space is taken up by the N-terminal 22 kDa part (∼domain 1) of the outermost rod linker (L r ), which therefore protrudes from the hexamer with its C-terminal domain 2. This in turn fits (like the homologous L r 8.2 ) into the second hexamer, again leaving space for the N-terminal domain 1 of the next rod linker. This is repeated in the third hexamer (assuming a three-disk rod), but now the rod-core linkers fit with their N-terminal (and homologous) domains 1 into the remaining space. The C-terminal portions of these linkers differ considerably among each other and from those of L r 32.3 and L r 31.5 . This provides the structures for attaching the rods to the APC core, where the orientation of the adjacent disks is perpendicular rather than parallel as in the rods, and also provides the specificity of the docking positions. In Figure 3, the asymmetric arrangement of the small rod-terminating linker (L rt ) has been drawn in analogy to the respective APC linker; it is propagated along the rod by the terminal portions of the rod and rod-core linkers.
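The bookkeeping behind the interlocking arrangement can be made explicit with a small sketch. It uses only the approximate sizes given in the text (a central hole of ∼34 kDa per hexamer, ∼22 kDa for domain 1 and ∼10 kDa for domain 2); the class and function names and the linker labels are illustrative and are not part of the original analysis.

```python
from dataclasses import dataclass

# Approximate sizes taken from the text (all in kDa).
HOLE_CAPACITY_KDA = 34.0   # central hole of one hexameric disk
DOMAIN1_KDA = 22.0         # N-terminal domain 1 of the class II linkers
DOMAIN2_KDA = 10.0         # domain 2 (C-terminal part or small capping linker)


@dataclass
class Disk:
    biliprotein: str       # "PC" or "PEC"
    distal_part: str       # domain 2 entering from the distal neighbour (or cap)
    proximal_part: str     # domain 1 of the linker continuing towards the core

    @property
    def occupied_kda(self) -> float:
        # Each hole holds one domain 2 and one domain 1 in this model.
        return DOMAIN1_KDA + DOMAIN2_KDA


def build_rod(biliproteins, linkers):
    """Assign linker domains to disks, ordered from the distal end to the core.

    `linkers` starts with the capping linker and has one more entry than
    `biliproteins`, because every disk holds parts of two different linkers.
    """
    if len(linkers) != len(biliproteins) + 1:
        raise ValueError("need exactly one more linker than disks")
    return [
        Disk(bp, f"{linkers[i]} domain 2", f"{linkers[i + 1]} domain 1")
        for i, bp in enumerate(biliproteins)
    ]


# Three-disk rod as drawn in Figure 3: one PEC disk and two PC disks.
rod = build_rod(["PEC", "PC", "PC"], ["L_rt", "L_r_PEC", "L_r_PC", "L_rc"])
for n, disk in enumerate(rod, start=1):
    fits = disk.occupied_kda <= HOLE_CAPACITY_KDA
    print(n, disk.biliprotein, disk.distal_part, "+", disk.proximal_part, fits)
```

In this sketch every hole is filled with ∼32 kDa of linker material, which stays within the ∼34 kDa estimated for the central hole of one hexamer. The C-terminal part of the rod-core linker is not assigned to any disk, in line with its proposed role in attachment to the core.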
It should be noted that the size of the suggested domain 1 fits the space of one single trimer only incompletely (the space in the interior hole of α 3 β 3 is 16 to 18 kDa). The first part of the N-terminal domain 1 is then perhaps responsible for building up the stable hexamer (α 6 β 6 )-linker complex in some cyanobacteria (leaving out strains where stable hexamers without linkers could be detected). This function of the first part of domain 1 may account for the reduced homology within the first 40 amino acids of the rod and rod-core linker polypeptides reported by Glauser (1991). This is supported by the X-ray structure of an AP-linker complex (α 3 β 3 L c 7.7 ). The linker polypeptide, corresponding to a type 2 domain, fills only 2/3 of the central cavity of the phycobiliprotein trimer, leaving space for parts of other linker polypeptides (Reuter et al. 1999).
A domain structure could also be discerned in the chromophore-containing γ-subunit of a marine cyanobacterium. Wilbanks and Glazer (1994) suggest a two (or three) domain structure, consisting of: (a) an N-terminal 50 amino-acid chromophore-containing domain (in size nearly identical to the small L r 8.2 and L c 7.7 and domain 2 of the rod and rod-terminating linkers, but without significant sequence identity), and of (b) a linker domain 1 type stretch of ∼200 amino acids, showing a homology to the phycocyanin- and phycoerythrin-associated linker polypeptides.
A complete domain 2, as represented in the rod linkers and the small linkers, is missing, and only the last 25 amino acids of the C-terminus could be interpreted as the beginning of domain 2 (the facile proteolytic cleavage of this part underlining this suggestion).These results fit very well with the observed X-ray structure from Ficner and Huber (1993) where no electron density could be detected outside of the hexameric disk.
The proposed model for the rods might be extended to the core as well, with the repeating domains 1 of L cm 127.6 filling the interior parts of the APC holes, and L c 7.7 (domain 2) capping them.

Figure 1. A: Sequencing strategy for the sequence shown in B. B: Nucleotide and derived amino-acid sequence of the sequenced genes cpcF and cpcG1.

Figure 2. Sequence comparison of the class III and class II linkers of the phycobilisome from Mastigocladus laminosus (Fischerella PCC 7603), and of the four repeats of the core-membrane linker L cm . Those parts used for the homology analysis shown in Table 1 are marked by different shading: domain 1 is indicated by a light gray background, domain 2 is boxed. The ••• and * * * indicate overall homologies and identities, respectively, between the N-terminal regions (or repeats) of all the linkers. For clarity of display, however, those linkers showing high degrees of homology are grouped. A: rod class II linkers and class III linkers, B: rod-core linkers, and C: linker-like repeats (domain-type 1) A to D of the L cm .

Figure 3. Schematic representation of the interlocking rod model for phycobilisome rods from M. laminosus, shown for a rod containing one PEC and two PC hexameric disks. The linkers are generalized as L rt , L r PEC , L r PC and L rc , corresponding to L r 8.2 , L r 31.5 , L r 32.3 , and the three rod-core linkers L rc 31.7 , L rc 28.7 , or L rc 29.6 , respectively. The orientation is indicated by "N" for the N- and "C" for the C-terminus. The C-terminal protrusion of L rc at the right side is responsible for attachment to the core.