Sequence Analysis of SSR-Flanking Regions Identifies Genome Affinities between Pasture Grass Fungal Endophyte Taxa

Fungal species of the Neotyphodium and Epichloë genera are endophytes of pasture grasses showing complex differences of life-cycle and genetic architecture. Simple sequence repeat (SSR) markers have been developed from endophyte-derived expressed sequence tag (EST) collections. Although SSR array size polymorphisms are appropriate for phenetic analysis to distinguish between taxa, the capacity to resolve phylogenetic relationships is limited by both homoplasy and heteroploidy effects. In contrast, nonrepetitive sequence regions that flank SSRs have been effectively implemented in this study to demonstrate a common evolutionary origin of grass fungal endophytes. Consistent patterns of relationships between specific taxa were apparent across multiple target loci, confirming previous studies of genome evolution based on variation of individual genes. Evidence was obtained for the definition of endophyte taxa not only through genomic affinities but also by relative gene content. Results were compatible with the current view that some asexual Neotyphodium species arose following interspecific hybridisation between sexual Epichloë ancestors. Phylogenetic analysis of SSR-flanking regions, in combination with the results of previous studies with other EST-derived SSR markers, further permitted characterisation of Neotyphodium isolates that could not be assigned to known taxa on the basis of morphological characteristics.

Although taxa such as N. lolii are haploid in nature, other Neotyphodium species were shown to contain multiple gene copies and conform to heteroploid genomic constitutions [17]. The single or multiple gene copies of asexual Neotyphodium species appear to correspond to those of specific haploid Epichloë species. This observation has been interpreted to support a hybrid origin for heteroploid taxa: for instance, N. coenophialum has been proposed to have arisen through hybridisation and subsequent nuclear fusion events involving the extant taxa E. typhina, E. baconii, and E. festucae. The relative genome sizes of haploid and heteroploid endophytes (c. 30 Mb for N. lolii; c. 60 Mb for N. coenophialum) lend some support to this hypothesis [18], subject to the possibility of selective gene loss subsequent to hybridisation events. Phylogenetic relationships between endophyte taxa are hence complex and reticulated. Sequence analysis of individual gene loci may be used to infer such relationships based on affinities between shared genomes. However, performance differences between individual genes have been observed. The resolution capacity provided by rDNA and actA genes was low in comparison to other genes [12][13][14][15], possibly due to homoplasy effects [13]. Heteroploid-like Neotyphodium species also display aneuploidy for some loci, such as the rDNA gene, limiting resolution of complete phylogenies [19]. A broader survey of gene classes is hence desirable to further clarify affinities between endophyte taxa.
Simple sequence repeats (SSRs) or microsatellites [20] have been widely used for analysis of genetic variation within and between closely related species [21]. A high rate of mutation [22] renders SSR array length polymorphism particularly useful for intraspecific genetic studies. However, sequence analysis has revealed complex mechanisms controlling allele size variation, limiting the efficiency of interspecific phylogenetic analysis. Repeat number variation is thought to arise from polymerase slippage during replication [23], but constraints on threshold size for allele expansion [24] and on allele size range [25] are evident. In addition, interruptions of the repeat structure tend to stabilise SSR loci [26]. Constraints on allele size may consequently lead to inaccurate assessment of phylogenetic divergence between taxa. Size homoplasy of distinct alleles, arising from insertions, deletions, and base substitutions in the SSR-flanking regions, is also common [27][28][29][30]. Changes in flanking regions appear to occur independently of changes in the SSR repeat array [28,30]. Due to these factors, allelic variation of SSR loci, as assessed by amplicon size variation, is appropriate only for phenetic analysis and not suitable for phylogenetic reconstruction.
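Size homoplasy can be illustrated with a minimal sketch. The sequences, the (CA)n motif, and the helper names below are hypothetical and are not taken from the endophyte data; they only show how two alleles can share an amplicon length while differing in both repeat number and flanking sequence.

```python
# Toy illustration of SSR size homoplasy (all sequences are invented).

def amplicon(left_flank, motif, repeats, right_flank):
    """Assemble a toy amplicon: flank + repeat array + flank."""
    return left_flank + motif * repeats + right_flank

def is_size_homoplasy(a, b):
    """True when two alleles have equal length but different sequence."""
    return len(a) == len(b) and a != b

# Allele A: intact flanks, six CA repeats (8 + 12 + 6 = 26 bp).
allele_a = amplicon("ACGTTGAT", "CA", 6, "GGATCC")
# Allele B: a 1-bp deletion in each flank, a substitution in the right
# flank, and one extra repeat (7 + 14 + 5 = 26 bp).
allele_b = amplicon("ACGTGAT", "CA", 7, "GGTTC")

print(len(allele_a), len(allele_b), is_size_homoplasy(allele_a, allele_b))
```

Scored by amplicon size alone, the two toy alleles would be recorded as identical, which is why size data support phenetic but not phylogenetic comparison.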
In contrast, several studies have performed phylogenetic interpretation through analysis of SSR-flanking sequence regions. The resolving power of evolutionary studies using individual structural genes may be constrained by limited divergence [31], and studies of a small number of gene loci may not be representative of whole-genome variation [32]. However, the abundant genomic distribution of SSRs [33][34][35] permits phylogenetic assessment across the transcriptional units of multiple gene classes. SSR-flanking regions have been used for phylogenetic analysis of multiple organisms [31,32,36,37], resolving relationships to otherwise inaccessible levels [31,32,37].
Consistent with these previous studies, gene-associated SSR loci have previously been shown to discriminate endophyte taxa based on size polymorphism [38], but did not permit phylogenetic analysis. The present study describes the comparison of sequences that flank the SSR array in 5 independently selected gene loci, across 23 distinct fungal endophyte isolates. The derived data have determined the extent of molecular variation underlying SSR size polymorphism, confirmed current models for genome affinities, inferred phylogenetic relationships and models of genome evolution (including a role for selective gene loss), and elucidated the genomic origin of several previously unclassified Neotyphodium taxa.

Endophyte Isolates.
Phylogenetic analysis was performed on 20 endophyte isolates representing three Neotyphodium and five Epichloë taxa, as well as three Neotyphodium isolates which could not be assigned to known taxa based on their morphological characters (A. Leuchtmann, pers. comm.), and a tall fescue endophyte taxon (FaTG-2) which has yet to be allocated a Linnean name (Table 1). Endophyte isolates were cultured and DNA was extracted as described previously [38].

DNA Sequence Analysis of EST-SSR Amplicons.
Genomic amplicons obtained with primer pairs designed to five EST-SSR loci (NCESTA1AB04, NCESTA1FH03, NCESTA1AG07, NLESTA1GF09, and NLESTA1NF04) were analysed. Amplicons were obtained as described previously [38]. Amplicons from haploid taxa were analysed by direct sequencing, with the exception of locus NLESTA1NF04 for which sequencing was performed on purified plasmids containing the cloned amplicon [38]. Sequencing reactions were performed in 10 μl reaction volumes containing 4 μl Sequencing Reagent Premix from the DYEnamic ET Terminator Cycle Sequencing kit (Amersham Biosciences, Little Chalfont, UK), 0.5 μM of forward or reverse primer for the locus of interest, and 5 μl of amplicon, in a thermocycler (GeneAmp; PE Applied Biosystems, Foster City, California, U.S.A.) programmed for 20 s at 92 °C followed by 30 cycles of 20 s at 95 °C, 15 s at 50 °C, and 2 min at 60 °C, then 10 min at 60 °C. Sequencing products were purified using Autoseq 96 plates (Amersham Biosciences), dried at 80 °C for 30 min, and resuspended in 5 μl of sterile Milli-Q water before analysis on an ABI Prism 3700 automated sequencer (PE Applied Biosystems). For multiple products amplified in a single reaction from nonhaploid taxa, cloning was used to separate the different amplicons. Following amplification, the products were purified using a Microspin S-300 HR Column (Amersham Biosciences). The purified products were cloned into pGEM-T Easy Vector (Promega, Madison, Wisconsin, U.S.A.) and transformed into competent cells. Inserts were amplified from transformed colonies and sequenced. Consensus sequences were derived through analysis of several independently isolated clones or direct sequencing of both strands. Sequences were compared using Sequencher (version 4.0) (Gene Codes Corporation, Ann Arbor, Michigan, U.S.A.).
BLASTX (versions 2.2.1 and 2.2.6) [39] was used to search for similarities between the EST sequences of SSR loci and protein sequences in the protein databases available from the National Center for Biotechnology Information (NR, PDB and SwissProt; http://www.ncbi.nlm.nih.gov/BLAST/).

Phylogenetic Analysis of EST-SSR Amplicons.
The DNA sequences of unique amplicons were prepared for phylogenetic analysis by compilation in FastA format into a single file for sequence alignment in ClustalX (version 1.8) [40]. Manual realignment of sequences removed primer termini and polymorphic SSR arrays and converted insertion-deletion (indel) regions into single multistate characters. Sequence alignments were analysed by clustering or tree searching methods available in PHYLIP (version 3.6a3) (J. Felsenstein, University of Washington, Seattle, Washington, U.S.A., available from http://evolution.gs.washington.edu/phylip.html). For parsimony analysis, sequences were analysed with indels either removed, or coded as single multistate characters. Where multiple trees were resolved, the Kishino-Hasegawa-Templeton (KHT) test [41,42] or Shimodaira-Hasegawa (SH) test [43] was used to test for significant differences. The robustness of the trees was measured by the Bootstrap method [44] with 1000 replicates. A bootstrap value of 70% or greater was considered to be well supported. For Maximum Likelihood (ML) and distance-based analysis, sequences were analysed with indels removed. Distance matrices were obtained using the F84 model [42,45] and clustered using the Fitch-Margoliash (FM) method [46] or Neighbor-Joining (NJ) method [47]. The transition/transversion ratio was estimated using the Tree-Puzzle program (version 5.0) [48] or by the ML method. To estimate the transition/transversion ratio by the ML method, different possible values for the transition/transversion ratio were evaluated in multiple runs to find the value with the maximum likelihood estimate. The same approach was used to estimate the among-site rate heterogeneity by the ML method. To estimate the among-site rate heterogeneity by the Minimum Evolution (ME) method, distance matrices generated for each site were analysed and the total branch length was taken as the estimated value of the rate of change for each of the sites.
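Two of the data-preparation steps described above, removal of indel positions and bootstrap resampling of alignment columns, can be sketched as follows. This is an illustrative sketch only: the analyses in this study were run in PHYLIP, and the function names and the toy alignment below are assumptions introduced for the example.

```python
import random

def strip_gap_columns(alignment):
    """Drop every alignment column in which any sequence has a gap ('-')."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(names)}

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}

# Hypothetical three-sequence alignment with two indel columns.
toy = {"a": "ACG-TA", "b": "ACGTTA", "c": "AC-TTG"}
clean = strip_gap_columns(toy)

rng = random.Random(0)          # fixed seed so the sketch is repeatable
replicates = [bootstrap_replicate(clean, rng) for _ in range(1000)]
print(clean, len(replicates))
```

Each replicate has the same length as the gap-free alignment; a tree would then be inferred from every replicate and branch support expressed as the percentage of replicates recovering that branch.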

Characterisation of Endophyte EST-SSR Amplicons.
Genomic amplicons from five EST-SSR loci, ranging in length from 181 to 385 bp, were characterised from 12 Neotyphodium and 10 Epichloë isolates, as well as FaTG-2 (Table 2). The sequences of two loci (NCESTA1AB04 and NLESTA1GF09) shared amino acid sequence similarity with hypothetical or predicted proteins of Neurospora crassa and Magnaporthe grisea and were mainly composed of coding sequence (CDS) as well as 5′-untranslated region (5′-UTR) and intron sequences, respectively. The sequences of the remaining three loci did not show similarity with any proteins in public databases. Amplification products from locus NCESTA1GA07 were predominantly composed of intron sequence, based on comparison of EST and genomic DNA sequences.
Size polymorphisms between taxa for the selected loci resulted from variation at a number of indel sites (Table 2). Differences also occurred in the repeat unit number of the SSR array for three loci (NCESTA1FH03, NCESTA1GA07 and NLESTA1NF04; data not shown), accounting for the majority of the observed size polymorphisms. A number of sequence haplotypes (defined here as amplicons with multiple sequence variant content, but clearly related to a single common reference) for each locus were identified across the sample set, with identity observed between those of several Neotyphodium and Epichloë isolates. Single haplotypes for individual genes were observed for N. lolii, unclassified Neotyphodium isolate 9727, and the different Epichloë species, while N. coenophialum, FaTG-2, N. uncinatum, and unclassified Neotyphodium isolates 9303/2 and 9728 generated multiple haplotypes. The number of haplotypes present in these species varied between loci, but with a maximum of three for N. coenophialum and Neotyphodium isolates 9303/2 and 9728, and two for FaTG-2 and N. uncinatum. Variation was observed in cloning efficiency of different PCR products for those species possessing multiple haplotypes. Aberrant haplotypes, which were likely to be chimeras generated by PCR-mediated recombination, were also obtained.
Pairwise comparisons identified between 23 and 51 informative characters for the different loci (Table 2). The proportion of informative characters ranged from 14% (locus NLESTA1GF09) to 31% (locus NLESTA1NF04), but was c. 20% for the other loci. A similar proportion of informative characters occurred in both the coding and noncoding sequences of the eligible loci.
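As a worked illustration of how a proportion of informative characters can be derived from an alignment, the sketch below counts parsimony-informative sites, assuming the standard definition (at least two character states, each present in at least two sequences). The counting rule used for Table 2 is not specified here, so this definition, the function name, and the toy sequences are assumptions.

```python
from collections import Counter

def informative_sites(seqs):
    """Return indices of parsimony-informative columns in aligned sequences.

    A column is counted when at least two states each occur in at least
    two sequences; gap characters ('-') are ignored.
    """
    sites = []
    for i, column in enumerate(zip(*seqs)):
        counts = Counter(base for base in column if base != "-")
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            sites.append(i)
    return sites

# Hypothetical four-sequence alignment of five columns.
toy = ["ACGTA", "ACGTA", "ATGCA", "ATGCT"]
sites = informative_sites(toy)
proportion = len(sites) / len(toy[0])
print(sites, proportion)
```

In the toy alignment the final column varies but is not informative under this definition, because the variant state occurs in only one sequence.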

Phylogenetic Analysis of Endophyte EST-SSR Amplicons.
Loci were analysed through sequence alignment (Supplementary Material: Appendices 1-5 available at doi:10.4067/2011/921312) individually, rather than as a combined dataset, due to variation of both inferred ploidy level and number of observed haplotypes between different SSR loci from heteroploid isolates. Between one and four trees were resolved for the different loci using the Parsimony method (Figures 1-5). A single tree was resolved for locus NCESTA1FH03 (Figure 2). The multiple trees obtained for loci NCESTA1AB04 (Figure 1), NLESTA1GF09 (Figure 4) and NLESTA1NF04 (Figure 5) only differed in the placement of one or two species. More variation was evident in the branching of the multiple trees identified for the locus NCESTA1GA07 (Figure 3). The trees, however, were not found to be significantly different in the KHT or SH tests. The majority of branches in the trees were supported by bootstrap analysis. Similar trees were resolved for the different loci using the ML, FM, and NJ methods (data not shown). In tests performed using the ML and ME methods, no significant differences were detected in the rate of change between coding and non-coding sequences or between exon and intron sequences for eligible loci (data not shown).
Phylogenetic analysis of the different loci revealed similar genomic relationships among endophyte species, as summarised in a network format (Figure 6). Close relationships between haplotypes from different taxa were deduced to indicate partial or complete commonality of genome content. Some locus-dependent differences in tree topology (Figures 1-5) were observed, but specific taxa consistently grouped together. In most instances, the Epichloë species were separated into two groups, separation being supported by bootstrap analysis. The first contained E. festucae, E. baconii, and E. bromicola, and the second contained E. typhina, E. clarkii, and E. sylvatica, Neotyphodium species being included within both groups. Group 1 Epichloë species were further divided into distinct branches according to their taxonomic classification, while Group 2 endophytes showed higher levels of genetic similarity. Close genetic relationships were evident between several Neotyphodium and Epichloë species. These relationships were observed in most of the trees and were also supported by bootstrap analysis. The single haplotypes derived from both N. lolii and E. festucae were grouped together in all trees and were identical in structure for two of the loci (NCESTA1FH03 and NLESTA1GF09). Multiple haplotypes from N. coenophialum, FaTG-2, and N. uncinatum were consistently associated with counterparts from specific Neotyphodium and Epichloë species. N. coenophialum and N. uncinatum shared identical or very closely related haplotypes for all five loci. These variants also grouped with those from Group 2 Epichloë species. N. coenophialum also shared common haplotypes with FaTG-2 and E. baconii. The remaining haplotypes common to N. coenophialum and FaTG-2 grouped with the corresponding single haplotypes from E. festucae and N. lolii. For a subset of the target loci, N. uncinatum-derived sequences grouped with the corresponding haplotypes from E. bromicola.
Unclassified Neotyphodium isolates also displayed close genetic relationships with known taxa. Neotyphodium isolate 9727 produced single haplotypes from each locus that were either identical or very similar to the haplotypes common between N. coenophialum and N. uncinatum, and grouped most closely with those derived from E. sylvatica. Neotyphodium isolates 9303/2 and 9728 were closely related, all locus-specific haplotypes showing a high degree of sequence similarity. One of three subclasses of derived haplotypes grouped to form a distinct well-supported group with those from E. festucae, N. lolii, N. coenophialum, and FaTG-2. Isolates 9303/2 and 9728 exhibited a second haplotype subclass that grouped with counterparts from E. bromicola and N. uncinatum, and the same class was present in isolate over two loci. The remaining haplotype sub-class (observed in isolate 9303/2 for four loci, and in isolate 9728 for one locus) grouped with the equivalent haplotypes from E. typhina and E. clarkii.
Figure legend fragment: ... Figure 1. Note that the nucleotide sequence of the amplification product from N. lolii isolate North African 6 detected at this locus by autoradiography [38] was not obtained.

Application of EST-SSR Loci to Endophyte Phylogenetic Analysis.
The application of SSR markers for phylogenetic analysis is limited by two main factors: complex molecular evolution of SSR loci and the occurrence of size homoplasy between distinct SSR alleles. Sequence analysis of selected SSR loci in the current study has demonstrated that these factors contribute to the general inability of SSR markers to resolve phylogenetic relationships among endophyte species [38]. Changes in the SSR array repeat number appeared to be independent of flanking region changes: some of the locus NLESTA1NF04-derived haplotypes from multiple E. festucae isolates differed for SSR array number, but exhibited identity for flanking sequence, while others showed the converse relationship. SSR allele size homoplasy occurred between different endophyte taxa of distinct origins as a result of insertions, deletions, and base substitutions in both the SSR motif and flanking sequences, as observed for the locus NCESTA1FH03-specific E. festucae and E. baconii-related N. coenophialum haplotypes. Endophyte SSR locus arrays were highly variable, and differences in repeat unit number generally accounted for allele size variation between closely related endophyte species, while indel and base substitution incidence increased when comparisons were made between more distantly related taxa.
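The independence of repeat number and flanking sequence described above can be examined by splitting each amplicon into its repeat array and flanks. The following is a minimal sketch under stated assumptions: the regular-expression approach, the `min_repeats` threshold, the function name, and the toy haplotypes are all hypothetical and do not reproduce the manual realignment used in this study.

```python
import re

def split_at_ssr(seq, motif, min_repeats=3):
    """Split a sequence into (left flank, repeat count, right flank).

    The longest uninterrupted run of `motif` with at least `min_repeats`
    units is treated as the SSR array; returns None if no run is found.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
    runs = list(pattern.finditer(seq))
    if not runs:
        return None
    longest = max(runs, key=lambda m: m.end() - m.start())
    count = (longest.end() - longest.start()) // len(motif)
    return seq[:longest.start()], count, seq[longest.end():]

# Hypothetical haplotypes: identical flanks, different repeat number.
hap1 = "ACGTTGAT" + "CA" * 6 + "GGATCC"
hap2 = "ACGTTGAT" + "CA" * 9 + "GGATCC"

left1, n1, right1 = split_at_ssr(hap1, "CA")
left2, n2, right2 = split_at_ssr(hap2, "CA")
same_flanks = (left1, right1) == (left2, right2)
print(n1, n2, same_flanks)
```

Comparing the flank tuples separately from the repeat counts distinguishes haplotypes that differ only in array length from those that differ in flanking sequence.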
The results of this and other studies [30,31] suggest that size variation may provide a relatively accurate measure of genetic variation between closely related species. Although homoplasy was not taken into account, SSRs have previously proven useful for genetic discrimination within and between endophyte species [38]. Presumably, the inherently variable nature of SSRs and the large number of loci analysed reduced the potential biasing effects of individual loci. The complex nature of SSR loci, however, demonstrates the critical value of sequence level analysis for phylogenetic inference.
The flanking regions of gene-associated SSRs were highly conserved within and between endophyte taxa (80%-100% sequence identity across coding and non-coding regions), supporting a common origin for these species [1,9]. Despite this level of sequence conservation, SSR-flanking regions were informative for studying genetic relationships. The different individual loci yielded similar genetic relationships, consistent with previous studies of other genes. Differences in the power of the individual loci to resolve relationships were identified, due to variation in the number of informative characters and in the composition (exon, intron, coding, or non-coding) of amplicons. In other studies, SSR-flanking sequences from different loci have been aggregated to increase the number of informative characters and improve resolution of phylogenetic relationships [31,32,36,37]. However, variation of inferred ploidy level and of number of haplotypes derived from different loci in heteroploids would potentially bias such aggregation studies. As a consequence, each locus was analysed separately in this study.
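Sequence identity values of the kind quoted above can be computed from a pairwise alignment as in the following sketch. The treatment of gaps (positions with a gap in either sequence are excluded from the denominator) is an assumption, as are the function name and the toy sequences; the convention used for the 80%-100% range is not stated here.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Positions where either sequence has a gap ('-') are skipped, so the
    denominator is the number of compared (ungapped) positions.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

# Hypothetical aligned flanking sequences with one substitution.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity)
```

Counting gapped positions as mismatches instead would lower the values for haplotypes that differ by indels, so the chosen convention should be reported alongside any identity figure.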

Genome Affinities between Neotyphodium and Epichloë Species.
The close relationships between taxa were in accordance with those predicted from genes more commonly used for phylogenetic analysis such as rDNA, tubB, tefA, and actG. Single locus-specific haplotypes were obtained from all Epichloë species and from N. lolii, the latter being closely related to E. festucae. Other Neotyphodium species contained multiple haplotypes that were similar to those from different Epichloë species. Occurrence of different haplotype subclasses in N. coenophialum, FaTG-2, and N. uncinatum is consistent with the heteroploid or nonhaploid nature of these species [11,14] and indicates the presence of multiple genomes (Figure 6).

Relationships with Unclassified Neotyphodium Isolates.
Neotyphodium isolates that could not be assigned to known morphological classes also appear to differ from characterised taxa at the molecular level [17]. Isolate 9303/2 and isolate 9728 have been assigned to taxonomic groupings HeuTG-2 and LpTG-2, respectively, based on phylogenetic analysis of the tefA and tubB genes (A. Leuchtmann, pers. comm.; [17]). Analysis of SSR-flanking regions in this study, however, suggests that the isolates show closer affinities than formerly predicted. Moon et al. [17] reported the detection of two haplotype classes for each isolate, closely related to those from E. bromicola and E. typhina (HeuTG) and from E. festucae and E. typhina (LpTG-2). These phylogenetic affinities were also detected in the current study. However, both flanking sequence analysis and phenetic studies based on a larger number of SSR loci [38] detected a third haplotype sub-class for both isolates and suggested common affinities with E. festucae, E. bromicola, and E. typhina. Accurate inference of phylogenetic relationships among Neotyphodium and Epichloë species consequently requires characterisation of a number of different genomic loci.
Although isolates 9303/2 and 9728 share common affinities, DNA-based phylogenetic and phenetic analyses suggest mutual genetic divergence and placement in taxonomic groups with different relative gene content. Although similar-sized haplotypes were detected, SSR polymorphism between these isolates was greater than that detected within N. coenophialum, N. lolii, and E. festucae [38]. In addition, isolates 9303/2 and 9728 failed to cluster together in an AFLP-derived phenogram [38], which represents a genome-wide assessment of genetic polymorphism. SSR polymorphism analysis also detected substantial differences in the number of locus-  [10,16]. However, these endophytes appear to be related to different E. typhina strains and also differ in their genome structure [10,13,16]. Differences in transcript levels associated with different gene-specific sequence variants, as observed for the 60S ribosomal protein-encoding gene in this study, may also contribute to phenotypic trait variation between different heteroploid endophyte taxa. Two asexual endophyte species, N. huerfanum and N. tembladerae, are known to occur in Festuca arizonica [17]. Phylogenetic analysis of the third unclassified Neotyphodium isolate (9727), which was also derived from F. arizonica, suggests that it may belong to the former taxon. Isolate 9727 produced single haplotypes and these sequences, like those of the N. huerfanum tefA and tubB loci [17], are closely related to the inferred E. typhina-related haplotypes from N. coenophialum and N. uncinatum (Section 4.2). These results were also supported by SSR polymorphism-based phenetic analysis, in which 9727 clustered with E. typhina, E. clarkii, and E. sylvatica, while in AFLP analysis the isolate clustered with N. uncinatum [38].

Figure legend fragment: Note that the nucleotide sequence of the second amplification product from FaTG-2 and the third amplification product from unidentified Neotyphodium isolate 9728 detected at this locus by autoradiography [38] was not obtained.
Figure legend fragment (network diagram; labels: N9303/2, N9728; All loci): Predicted partial or complete genomes, based on variant haplotype sub-classes, are indicated as black circles. Lines connect putative common genomes, and the level of support for each inference is indicated in terms of the number of loci providing confirmatory data. This information relates to the next most adjacent taxon in the topology of the diagram. An asterisk indicates a locus-specific variant obtained by PCR, but for which the nucleotide sequence was not obtained. The dotted line defines the inferred division between the two major groups of Epichloë species.

Origins of Neotyphodium and Epichloë Species.
Due to their close phylogenetic relationships with specific Epichloë species, Neotyphodium species have been proposed to have originated from these sexual endophyte taxa either directly through the loss of the sexual state, or through interspecific hybridisation of distinct Epichloë and Neotyphodium species. The first process is proposed to have given rise to haploid Neotyphodium species such as N. lolii, while the heteroploid Neotyphodium species such as N. coenophialum, FaTG-2, and N. uncinatum may have arisen through the second evolutionary process. Because Epichloë species form unique mating populations [49][50][51] and Neotyphodium species are not known to sporulate in vivo [52], this second mode of evolution is thought to have been a parasexual process involving somatic fusion of endophyte hyphae. This hypothesis does, however, require physical colocation between endophyte taxa that generally occur in distinct host species. In addition, mechanisms of gene loss following nuclear fusion are necessary to account for the observed genomic composition of contemporary heteroploid taxa, as a range of studies [2,10,11,13,14,16,17] have shown that extant Neotyphodium species do not appear to have the full complement of genes present in phylogenetically related Epichloë species. Loss of genes involved in sexual reproduction and pathogenicity would be a prerequisite for such genomic rearrangement events, as well as genes vulnerable to dosage-dependent effects. It is also formally possible that sexual Epichloë species may have arisen from asexual Neotyphodium species in response to selective environmental pressures, a mechanism requiring both gene loss and gene gain, possibly through horizontal gene transfer. Mechanisms for both processes have been inferred through comparisons of different fungal taxa at the whole-genome level [53] and may have been facilitated by structural features such as presence of conserved repetitive elements.
In conclusion, this study demonstrates the application of SSR-flanking sequences to studies of genome affinities between pasture grass fungal endophyte species for clarification of novel modes of genome evolution. The inferred affinities were consistent with those obtained from gene loci that are more commonly used in molecular phylogenetics, but provided a more extensive survey of genomic loci, which may ultimately be extended to whole-genome comparisons based on second-generation sequencing technologies.