New Insights on the Evolutionary History of Aphids and Their Primary Endosymbiont Buchnera aphidicola

Since the establishment of the symbiosis between the ancestor of modern aphids and their primary endosymbiont, Buchnera aphidicola, insects and bacteria have coevolved. Due to this parallel evolution, the analysis of bacterial genomic features constitutes a useful tool to understand their evolutionary history. Here we report, based on data from B. aphidicola, the molecular evolutionary analysis, the phylogenetic relationships among lineages and a comparison of sequence evolutionary rates of symbionts of four aphid species from three subfamilies. Our results support previous hypotheses of divergence of B. aphidicola and their host lineages during the early Cretaceous and indicate a closer relationship between subfamilies Eriosomatinae and Lachninae than with the Aphidinae. They also reveal a general evolutionary pattern among strains at the functional level. We also point out the effect of lifecycle and generation time as a possible explanation for the accelerated rate in B. aphidicola from the Lachninae.


Introduction
Aphids constitute a diversified group of insects widespread and of economical relevance as crop pests. The underlying reason of their ecological success is their novel capability to exploit ecological niches with little competitors, mainly due to their diet based on phloem, which is abundant and of easy access but represents an unbalanced source of nutrients, rich in sugars and poor in amino acids [1]. The clue to the use of new resources lies in the establishment of an obligate endosymbiotic relationship between the ancestor of aphids and a gamma-proteobacterium, the ancestor of Buchnera aphidicola. This single event of infection has been dated at least 150-200 million years ago (MYA) [2] according to the fossil record or to 80-150 MYA based on molecular data [3]. As a result of millions of years of cospeciation of host and endosymbiont, the current species of aphids carrying their specific strains of B. aphidicola emerged.
The vertical mode of transmission of B. aphidicola, from mother to eggs and embryos, together with the location in specific host cells (the bacteriocytes), determines a population scenario for this bacterium characterized by their low effective population size, with frequent bottlenecks and little chance of genetic recombination with other bacteria. As a result, the genome reductive process undergone by B. aphidicola encompasses a decrease in the genomic size due to the loss of unnecessary genes in the new intracellular context, the increase in A+T content compared to its freeliving relatives, a significant acceleration in evolutionary rates, mainly due to the accumulation of nonsynonymous substitutions, the loss in codon bias, loss of many regulatory proteins and functions, as well as the retention of genes linked to their symbiotic role [4][5][6][7][8][9].
This particular history of genome reduction is pertinent to understand the coevolution between particular aphid hosts and B. aphidicola. Many of the genes that are involved in recombination and/or genetic transference were lost at the beginning of the symbiotic association and, consequently, the B. aphidicola clones have evolved independently in each particular host with no or little chance of gene exchange among B. aphidicola from different aphid hosts [10].
The comparison of the topology of phylogenetic trees based on aphid genes and those from B. aphidicola reveals a perfect match [2,11]. As a result of this parallel evolutionary pattern, B. aphidicola can be regarded as an excellent marker in order to elucidate the evolutionary relationship of aphids harboring particular B. aphidicola strains.
The analysis of B. aphidicola genes that follow an evolutionary pattern that agrees with the molecular clock hypothesis [12,13] can be used to estimate the divergence time between pairs of aphids. This is possible because two aphid species, Acyrthosiphon pisum and Schizaphis graminum belonging to two tribes of the subfamily Aphidinae, have an estimated divergence time calibrated from their fossil record of 50 to 70 MY [14]. In addition, using molecular data from complete B. aphidicola genomes available, Pérez-Brocal and coworkers calculated the divergence time of aphids belonging to subfamilies Eriosomatinae (Baizongia pistaciae) and Lachninae (Cinara cedri) [15]. Based on morphological traits, the subfamilies Eriosomatinae and Lachninae have traditionally been considered very divergent. In fact, most phylogenetic hypotheses based both on morphological and molecular data consider the Lachninae as a sister group of the Aphidinae [11,16]. However, the position of this subfamily remains controversial, as recent phylogenies based on molecular sequences located the subfamily in a basal position [17][18][19]. Here, we follow a genomic approach to deepen the evolutionary analyses and propose a phylogeny of the three subfamilies of aphids based on the genome sequence of their primary endosymbionts B. aphidicola. In addition, in order to detect if there is any selective effect related to the specific role of the genes, we also gave a closer look to the acceleration pattern of each functional category.

Sequence Alignments.
For protein-coding genes, nucleotide sequences were translated into amino acids using the ClustalW tool implemented in the MEGA4 package [23]. The generated amino acid sequences were used, in turn, as a template to align the corresponding nucleotides with MUSCLE v3.6 [24], to reduce ambiguities.

Estimate of Strain-Specific Evolutionary
Rates. B. aphidicola BCc was used as a reference strain since it is the one with the lowest gene complement of those analyzed. For each one of the genes present in B. aphidicola BCc having an orthologous in at least one of the other B. aphidicola strains, an analysis of relative substitution rates between pairs of B. aphidicola strains was carried out, using E. coli as outgroup. Specifically, we applied a Tajima's relative rate test [25] with MEGA4, generating six comparisons for each of the aligned genes. Genes showing accelerated rates were grouped according to a nonredundant categories classification based on that used in the sequencing work on Aquifex aeolicus [26], with some modifications [27].

Estimate of Evolutionary Acceleration among Genomes.
The sequence from the 338 protein-coding genes shared by the four B. aphidicola strains plus E. coli was used to quantify the relative degree of evolutionary acceleration among strains. To do this, nucleotide sequences were concatenated with BioEdit and aligned using the ClustalW tool implemented in the MEGA4 [23]. Three different estimates of substitution rates per site between species i and j (K i j ) were carried out with MEGA4, using (a) the total and (b) nonsynonymous nucleotide positions, under the Kimura 2-parameters and the modified Nei-Gojobori methods, respectively, and (c) amino acid sequences, using the JTT substitution matrix. K 01 and K 02 were calculated according to Moran [28], being taxon 0 the last common ancestor of the endosymbiont strains compared in each test (taxa 1 and 2). The calculation of total and nonsynonymous substitutions allowed us to account for the phenomenon of saturation. To check for saturation, the "transition and transversion versus divergence" plot was implemented by DAMBE v4.2.13 for the concatenation of shared genes using the first and second positions as well as the third one [29]. This method has been successfully used previously to estimate saturation due to divergence [30][31][32][33].
Additionally, for each protein-coding gene under study, the values of both synonymous (d S ) and nonsynonymous (d N ) nucleotide substitutions were calculated, using a modified Nei-Gojobori model (Jukes Cantor) implemented by MEGA4 [23]. To calculate the synonymous (λ S ) and nonsynonymous (λ N ) nucleotide substitutions per million years, we used the expression λ = K/2T, where K is the number of nucleotide differences per site and T the estimated divergence time. The T values used in these analyses were 107 MY for (B. aphidicola BAp-BSg-)BBp, 111 MY for (B. aphidicola BAp-BSg-)BCc, and 112 MY for B. aphidicola BBp-BCc. These are the previously determined lowest values for each range of estimated divergence times among strains [15], based on the range of 50 to 70 MY since the strains used for calibration (B. aphidicola BAp and BSg) diverged as estimated from the fossil record [14]. The global average λ S and λ N values for each pair of B. aphidicola strains was calculated, as well as the partial average λ S and λ N values for each functional category [26,27] between all the strain pairs.

Phylogenetic Analyses.
Since saturation was achieved at the third position in all comparisons but BAp and BSg, in order to reduce the loss of phylogenetic signal we excluded this position when working with nucleotides to perform International Journal of Evolutionary Biology 3 our phylogenetic analyses. The concatenated sequence of the 338 protein-coding genes shared by the four B. aphidicola strains was used to reconstruct the phylogenetic relationships among them. Maximum Likelihood (ML) analyses were carried out with PAUP4.0b10 [34] for nucleotides, and Phyml v2.4.5 [35] for amino acids, according to the best models of nucleotide (GTR+I+G) and amino acid (CpREV+I+G+F) substitutions for those genes derived from jModelTest [36] and ProtTest 1.4 [37], respectively. Nucleotides and amino acids were also used for Bayesian analysis, with MrBayes v3.1.2 [38], using four MCMC strands, 1,000,000 generations, with trees sampled every 100 generations. Consensus trees were produced after excluding an initial burn-in of 25% of the samples, as recommended.
In a previous study, the evolutionary analyses of the four B. aphidicola strains showed that only 21 genes fulfill the molecular clock hypothesis [15]. The topologies of the 21 phylogenetic trees based on these genes were obtained by ML using PAUP 4.0b10 [34], in order to determine the most plausible evolutionary relationships among strains and compare them with the phylogenetic reconstruction.

Statistical Analyses.
All statistical analyses were performed using the software package R (http://www.r-project.org) [39]. A chi-square analysis was applied to the global distribution of the accelerated genes among B. aphidicola strains compared to the distribution within functional families, to test whether any particular functional category contains a significantly increased or reduced number of accelerated genes. Twelve comparisons with Yates' correction were carried out, at a significance level α = 0.05.
The average rates of synonymous (λ S ) and nonsynonymous (λ N ) substitutions per site per million years of the six possible comparisons among B. aphidicola strains were compared using a one-way ANOVA analysis followed by Tukey's range tests to find which means are significantly different from one another.

Comparison of the Evolutionary Rates in B. aphidicola
Strains at a Genome Level. The relative rate test on the 338 concatenated protein-coding genes (Table 1) reveals that, since the last common ancestor of each pair of strains, the accumulation of both nucleotide and amino acid substitutions, as well as the nonsynonymous substitution rates follows different rates in the different strains, but the values obtained using all three parameters are equivalents for any given strain pair. Thus, for the nucleotide sequences, a similar pattern of relative evolutionary rates was observed when total and nonsynonymous substitution rates are considered. B. aphidicola BSg and BAp show a similar rate (1.12 : 1), the one in B. aphidicola BBp being slightly higher (1.3-1.4-fold that of B. aphidicola BSg and BAp) and B. aphidicola BCc being the one with more accelerated rates (1.7-fold that of B. aphidicola BBp and more than 2-fold that of B. aphidicola BAp and BSg). As for the amino acid sequences, the relative acceleration shows a similar patter as the one observed for the nucleotides, but with values in B. aphidicola BCc of 2 to 3-fold those of B. aphidicola BBp and BAp-BSg, respectively.
The evolutionary acceleration among genomes was also determined through the analysis of the synonymous (λ S ) and nonsynonymous (λ N ) nucleotide substitutions per million years. The results show that both rates exhibit an opposite pattern (Figure 1). Differences in both λ S and λ N are statistically significant (ANOVA test, significance level 0.05), clustering into three separate groups for λ S and two groups for λ N , according to Tukey's range tests. When synonymous substitutions (Figure 1(a)) are considered, the more accelerated rate is found in the comparison between strains B. aphidicola BAp and BSg, a second group includes B. aphidicola BBp with the two aforementioned, and the least accelerated one includes all rates in which B. aphidicola BCc is involved. A different pattern is found for nonsynonymous substitutions (Figure 1(b)), where the more accelerated group includes all the comparisons involving B. aphidicola BCc, and the other one includes the remaining three comparisons.

Analyses of the Evolutionary Rates at a Functional
Level. The general pattern identified at a genomic level is reproduced at every functional category (see Section 2), with the same three and two groups of B. aphidicola strain pairs found in λ S and λ N , respectively ( Figure 2). On the other hand, no significant differences are found among functional categories in any strain for λ S (Figure 2(a)). However, a significant increase in λ N is found for the genes involved in cell envelope in all the strains (P < .05) and to a lesser extent in the category of poorly characterized genes (Figure 2(b)). This could be due to a significant acceleration of the flagellar genes still remaining in B. aphidicola, especially in BCc, the strain which has undergone the most drastic reduction in the flagellar machinery.
In a previous study, we determined the global relative distribution of accelerated genes displayed by the strains, using Tajima's relative rate test [15]. According to this test, B. aphidicola BCc presents a higher number of accelerated genes (56%-83%), while B. aphidicola BBp presents intermediate values (0.6%-35%), and the fewest appear in B. aphidicola BSg and specially BAp. This trait is observed in each functional category with no significant differences (Figure 3). This homogeneous distribution of the accelerated genes across functional categories was tested by the application of χ 2 tests, based on the observed number of accelerated genes for each category and the expected number of genes based on the totality of them for each pair of strains. None of the tests was statistically significant at P < .05 (Table 2).

Phylogenetic Analyses Show an Evolutionary Radiation
Pattern. According to the molecular clock hypothesis, two taxa sharing a common ancestor should have accumulated the same number of substitutions since they diverge. In the B. aphidicola case, only 21 genes do not reject the molecular clock hypothesis [15]. These genes can be used to identify the phylogenetic relationships among the strains under study, which will also reflect the relationships among their insect hosts. However, three different tree topologies appear in   a similar number of cases for these 21 genes (see Figure 4). Six genes generated the topology a (B. aphidicola BCc basal), seven the topology b (B. aphidicola BBp basal), and eight the topology c (B. aphidicola BCc and BBp clustered). Therefore, the analysis of these genes, individually considered, does not resolve the position of the B. aphidicola BCc and BBp strains. This result points at the possibility of a radiation within a relatively short period of time, giving rise to the subfamilies. To confirm this point, and in order to solve the deepest relationship among subfamilies, a more exhaustive phylogenetic reconstruction was carried out, based on all the concatenated protein-coding genes shared by the four B. aphidicola strains. The resulting phylogenetic tree ( Figure 5) shows the same topology as tree c in Figure 4, that is, a well supported clade consisting of both members of the subfamily Aphidinae, as expected, and another clade that shows a clustering of B. aphidicola BBp and BCc, also with the maximum statistical support. The uneven branch length, being that of B. aphidicola BCc significantly longer, indicates the evolutionary acceleration experienced by this strain. The topology obtained using amino acid sequences is identical, but the relative length of B. aphidicola BCc's branch is even longer, reflecting a higher value of nonsynonymous substitutions.

Reconstruction of the Evolutionary History of Aphids
Belonging to Subfamilies Aphidinae, Eriosomatinae and Lachninae. Aphids emerged as a monophyletic group of viviparous insects about 250 MYA as a divergent group from the oviparous Adelgidae and Phylloxeridae [11]. The basal radiation of the family Aphididae was dated by molecular data to the Cretaceous, 80 to 150 MYA [3]. Although the initial development of aphids took place on gymnosperms during the Mesozoic, most of their current diversity is linked to angiosperms, especially to grass [40]. The extraordinary diversity of aphids found today, affecting specially the subfamily Aphididae, started during the Tertiary (Miocene), as a consequence of the proliferation of herbaceous angiosperms [41,42]. The phylogenetic position of the subfamily Lachninae within the Aphididae is controversial. Traditionally, phylogenies based on both morphological characters [11,16] and on mitochondrial rDNA [3] have placed them as a monophyletic group clustering with the Aphidinae. However, phylogenies based on sequences from both nuclear and mitochondrial aphid genes (long-wavelength opsin gene, the elongation factor 1α gene, and mitochondrial genes encoding ATPase 6 subunit and the subunit II of the cytochrome oxidase), as well as those based on their primary endosymbiont B. aphidicola (16S rDNA and the β subunit of the F-ATPase complex) [17][18][19] place them as a basal group apart from the Aphidinae. This fact has implications about those aphids feeding on conifers (such as most members of the subfamily Lachninae, including C. cedri) being regarded as ancestral to groups feeding on angiosperms or, alternatively, as more recent secondarily derived conifer suckers.
Our phylogenetic analysis supports the presence of one clade clustering B. aphidicola BBp and BCc, and another clade consisting of B. aphidicola BAp and BSg. This result is consistent with a panorama of a rapid evolutionary radiation of the main subfamilies of aphids, during the early Cretaceous (144-100 MYA), which seems concordant with previous proposals [3]. In addition, our evolutionary molecular data from B. aphidicola point out that aphids belonging to subfamilies Eriosomatinae and Lachninae share a common ancestor more closely related than compared to the members of subfamily Aphidinae. If true, our data refute the traditional phylogenetic reconstructions that placed Aphidinae and Lachninae as a monophyletic group [11]. However, we do not have evidence to conclude whether, within the subfamily Lachninae, tribes feeding on conifers are ancestral or more recent than those living on herbaceous angiosperms, since our analysis does not resolve which strain (and thus which host aphid) is basal compared to the others.  [14,28], where mutations with amino acid replacement are not efficiently eliminated by a relaxed purifying selection, leading to a greater accumulation of amino acid changes than in free-living bacteria. These nonsynonymous substitutions end up in fixation by genetic drift, due to the mode of transmission and the population dynamics of B. aphidicola. This acceleration of evolutionary rates is particularly evident in B. aphidicola BCc, presumably because factors promoting the accumulation of nonsynonymous substitutions are more intense in this strain. One of those factors is the extreme reduction of the repair machinery, barely able to counterbalance the accumulation of slightly deleterious mutations. In addition, there is a stronger effect of genetic drift that promotes the fixation of slightly deleterious mutations probably imposed by its coexistence within the aphid with a secondary symbiont, Serratia symbiotica, and its larger size compared to other B. aphidicola lineages [43]. A closer look at the particular genes that contribute to this acceleration observed in B. aphidicola BCc allows us to conclude that they are distributed among different functional categories, with none of them accumulating significant differences in the proportion of accelerated genes (as seen in Figure 3 and Table 2). This fact reveals that the process of gene degradation acts on any type of gene independently of their functional role. However, our results indicate that even if the accelerated genes are scattered homogeneously across all the functional categories in all B. aphidicola strains, genes of some functional categories, such as cellular envelope, are significantly more accelerated within all the lineages. That points to the ongoing action of selective constraints affecting nonsynonymous substitution rates. Regarding synonymous substitutions, when pairs of strains of B. aphidicola were compared based on the average number of synonymous substitutions per site (d S ), a greater accumulation was observed in the B. aphidicola BBp strain compared to bacteria from aphids of the subfamily Aphidinae (B. aphidicola BAp and BSg), while the smallest value is found between the B. aphidicola BAp and BSg strains [15]. However, if the temporary factor is considered, the rates of synonymous nucleotide substitutions per site and million years are greater in the endosymbionts from the Aphidinae (B. aphidicola BAp and BSg strains), registering the B. aphidicola BCc strain the smallest values. These results demonstrate that the synonymous substitution rate in B. aphidicola is a variable character, yet the explanation for   Figure 4: Topologies of the phylogenetic trees for the 21 genes that follow the hypothesis of molecular clock [15]. The trees were obtained by maximum likelihood, with the program PAUP 4.0b10. these divergent patterns is not obvious. As stated elsewhere [7,14,44], these differences can be attributed to differences in the host's life cycle, as well as ecological factors such as host-alternation and variations in the effective population size showed by the two members of the Aphidinae subfamily compared to the other two aphid lineages. Additionally a differential mutation rate per generation cannot be ruled out. For example, endosymbionts from aphids with short generation times can accumulate more synonymous mutations per million years (case of the Aphidinae) than those with longer generation times, such as the Eriosomatinae and the Lachninae. Future studies are required to understand the evolutionary processes driving these patterns.