On the Coevolution of Transposable Elements and Plant Genomes

Plant genomes are unique in an intriguing feature: the range of their size variation is unprecedented among living organisms. Although polyploidization contributes to this variability, transposable elements (TEs) seem to play the pivotal role. TEs, often considered intragenomic parasites, not only affect the genome size of the host, but also interact with other genes, disrupting and creating new functions and regulatory networks. Coevolution of plant genomes and TEs has led to tight regulation of TE activity, and growing evidence suggests their relationship became mutualistic. Although the expansions of TEs represent certain costs for the host genomes, they may also bring profits for populations, helping to overcome challenging environmental (biotic/abiotic stress) or genomic (hybridization and allopolyploidization) conditions. In this paper, we discuss the possibility that the possession of inducible TEs may provide a selective advantage for various plant populations.


Transpositional Strategies, Distribution, and Regulation of TEs
Transposable elements (TEs) comprise a palette of immensely diverse DNA structures that can be unified by the following definition: they all are (or have been) able to insert themselves (or new copies of themselves) into new locations within the genome. According to their mechanism of transposition, TEs can be classified [1] into class I elements (retroelements), transposing through an RNA intermediate, and class II elements (DNA transposons), moving only via DNA. The major superfamilies of class I are Ty1-copia and Ty3-gypsy retrotransposons, while class II is represented by TIR (terminal inverted repeat) elements and Helitrons, which are sometimes classified separately [2]. Among both retrotransposons and transposons, nonautonomous forms (e.g., MITEs, SINEs, and LARDs) are quite prevalent [3], utilizing the transpositional machinery of autonomous TEs. A significant portion of plant genomes is constituted by class I elements (specifically LTR retrotransposons, with direct long terminal repeats at both ends), which replicate in a "copy-and-paste" manner. In brief (according to [3,4]), the genomic DNA copy of a retroelement is transcribed into mRNA that enters the cytosol, similarly to standard gene transcripts. The mRNA is translated, typically yielding a structural protein GAG and a polyprotein POL. These protein products associate with other retroelement mRNA copies and pack them into virus-like particles. Within these structures, dimerized mRNA copies are reverse-transcribed into cDNA and the whole complex enters the nucleus, where the new cDNA copy integrates at a new site. This mode of transposition, if not suppressed, allows retroelements to massively increase their copy numbers, resulting in a rapid expansion of genome size.
Compared to retrotransposons, class II elements are anticipated to have a smaller potential of increasing their copy number. Owing to their "cut-and-paste" insertional mechanism mediated by transposases, multiplication arises only when a transposon from a recently replicated genomic region is transposed to a region about to undergo replication [5] or in cases when the excision site is repaired by gene conversion, using the sister chromatid as a template [4]. Nevertheless, short DNA transposons like MITEs (miniature inverted-repeat TEs) can be remarkably effective in increasing their copy numbers each generation [6,7]. In the genome of Lotus japonicus, the copy number of detected MITEs is similar to, or higher than, the copy numbers of major LTR retrotransposon superfamilies [8], and the overall contribution to the genome size is smaller only because of the short length of MITEs (200-500 bp).
Journal of Botany
For most of the TEs in plant genomes, a certain balance between TE proliferation and minimal damage to the host has evolved. This balance is widely achieved by epigenetic silencing, an important feature of which is its reversibility [9]. Epigenetic suppression of TEs is realized at different levels, from general transcriptional control by TE-unspecific histone modification and siRNA-directed DNA methylation [10-12] to posttranslational processes, like the species-specific control of nuclear localization of transposase [13,14]. Silencing of TEs can be immensely effective. For example, LTR retrotransposons constitute a large part of the Gossypium genome, but almost no transcripts were found in the Gossypium EST database [15]. Interestingly, a variety of LINE-like transcripts have been found in the same EST libraries, suggesting different levels of suppression for particular TE classes. However, the attenuation of TEs is reversible. Disruption of epigenetic silencing patterns and the consequent derepression and proliferation of TEs are thought to be associated with two natural phenomena: interspecific hybridization [16-18] and biotic/abiotic stress [19-23].
TEs are not evenly distributed along chromosomes. DNA transposons are overrepresented in gene-rich or euchromatic regions [7,24], avoiding exons [7], while class I elements concentrate in gene-poor, heterochromatic regions around centromeres [24-28]. Exceptions to this general pattern are, for example, copia retroelements of maize overrepresented in euchromatin [29], or the FIDEL retrotransposon absent from heterochromatin in Arachis [30]. Authors frequently suppose the existence of yet unknown mechanisms of region-specific TE targeting. However, the uneven distributions of class I and class II elements across the genomes might be satisfactorily explained by probabilistic principles (as follows), without the need to hypothesize any targeting mechanism.
DNA transposons in heterochromatic regions may be less likely to be transposed because the DNA topology is inaccessible to transposases. By contrast, DNA transposons in euchromatic or genic regions are transposed frequently, and owing to the transposition strategy, the excised element is prone to reinsert at a nearby genomic location, which is likely to be euchromatic too (the shorter the distance between the original and novel position, the higher the probability that the two loci are in the same condition). The passive copies in the heterochromatin are eventually removed from the genome, while only the active copies have a chance to multiply and thus colonize the euchromatin. Transpositions into exons are mostly filtered out by natural selection, so the majority of DNA transposons are observed in introns or in the 5′ and 3′ adjacent regions. The less severe consequences of short TE insertions for splicing might be the reason why MITEs predominate in introns over other elements.
The situation is different for LTR retrotransposons. New copies of class I elements are generated outside the nucleus, in the cytosol; thus the probability of a new copy reintegrating in the proximity of the maternal element is extremely low. Without any targeting mechanism, retroelements integrate randomly across the whole genome, and the observed uneven distribution can be attributed to the following factors: (i) the majority of transpositions within or near genes are filtered out by selection because of their usually deleterious effects; and (ii) LTR retrotransposons (or TEs in general) are removed more efficiently from highly recombining regions because the mechanisms of removal are recombination dependent [31-33]. This hypothesis is supported by the finding that Sorghum LTR retrotransposons younger than 10,000 years appear to be randomly distributed along chromosomes [34]. Distribution patterns of TEs are therefore likely to be a function of the transpositional strategy and age of the individual TE family, affected by methylation [35] and some genomic particularities of the host species (e.g., gene density and recombination landscapes).

Effects of Transposable Elements on
[36] (of course, the latter does not hold true for polyploids). The correlation between the proportion of TEs, specifically LTR retrotransposons, and the physical length of the genome is so evident in the examined plant species [36,37] that the genome size can be generally regarded as a linear function of TE content, and the dynamics of LTR retrotransposons as the major contributor to 1C-value differences among plants.
To answer the question of what the "typical" TE contribution to plant genome length is, one needs some factual idea of a "typical" plant 1C-value. For this purpose, Tenaillon et al. [37] use an arithmetic mean of the 1C-values provided by Bennett and Leitch [38]. Their Plant DNA C-value database currently comprises 6,287 and 204 entries for angiosperms and gymnosperms, respectively, mostly based on Feulgen microdensitometry and flow cytometry data. Although this dataset probably cannot be considered a representative sample of all land plants (e.g., monocots being overrepresented), as the most comprehensive one it still provides informative insights into plant genome size variation. The average genome size (1C) of angiosperms (flowering plants) and gymnosperms sampled in this database is 5.809 Gbp and 18.157 Gbp, respectively. However, the distribution of 1C-values, especially in angiosperms, shows strong positive skewness (Figure 1), with the data unevenly distributed around the mean (72.3% of the examined angiosperm species having smaller genomes than the mean value of the dataset). For such distributions, the median is a better indication of central tendency than the arithmetic mean.
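The effect of positive skew on the two measures of central tendency can be illustrated with a minimal sketch. The values below are synthetic draws from a log-normal distribution with arbitrarily chosen parameters; they are an assumption made for illustration only and are not taken from the Plant DNA C-value database. The function name `summarize` is likewise hypothetical.

```python
import random
import statistics


def summarize(values):
    """Return (mean, median, share of values lying below the mean)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    below_mean = sum(v < mean for v in values) / len(values)
    return mean, median, below_mean


# Synthetic, right-skewed "genome sizes" in Gbp (NOT real C-value data).
# The log-normal parameters are arbitrary and chosen only to give a long right tail.
rng = random.Random(0)
synthetic_sizes = [rng.lognormvariate(0.9, 1.2) for _ in range(5000)]

mean, median, below_mean = summarize(synthetic_sizes)
print(f"mean = {mean:.2f} Gbp, median = {median:.2f} Gbp, "
      f"share below mean = {below_mean:.1%}")
```

In a right-skewed sample of this kind, a few very large values pull the mean upwards, so that well over half of the observations fall below it, while the median still splits the sample in half.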
The median values of the angiosperm and gymnosperm genome size data sampled in the Plant DNA C-value database are 2.401 Gbp (1.416 Gbp for eudicots; 5.746 Gbp for monocots) and 17.506 Gbp, respectively. In other words, half of the examined flowering plants have genome sizes below 2.4 Gbp. Surprisingly, the interval 0.4-0.6 Gbp represents the modal category for angiosperm 1C-values, comprising 8.38% of examined species (Figure 1(b)). Hence, Oryza sativa, Lotus japonicus, Medicago truncatula, Vitis vinifera, Brassica oleracea, and Populus trichocarpa have "typical" genome lengths (Table 1); it can therefore be inferred that transposable elements typically constitute one-third of angiosperm genomes.

Table 1: Known TE proportions in plant genomes. When various authors provide different values, the range is given. Inconsistent estimates of TE content result from incomplete genomic representation and/or varying bioinformatic approaches. Examination of complete genomes and application of the same TE discovery pipeline is therefore essential for comparative analyses at the intra- and interspecies level. No data on TE content are available for gymnosperms, and angiosperms with large genomes (>2.4 Gbp) are underrepresented. In most genomes, Helitrons were not surveyed appropriately.
Predominant fertilization: S, self-pollination; C, cross-pollination; ns, not specified. * Only full-length LTR retrotransposons analysed. ** Only TEs with homology to known repeat elements considered.

Genome size changes are not unidirectional. TE fragments and solo LTRs have been found to constitute a significant fraction of repetitive elements (e.g., [39-43]), and are believed to be remnants of removed TEs [31]. There is evidence that transpositional bursts can be followed by DNA loss [44], and it has been reported that the removal of LTR retrotransposons can proceed with different efficiency in distinct species [33,40,42]. However, it should be noted that estimates of retroelement half-life assume constant removal rates for repetitive sequences and rely on the molecular clock principle; therefore, the revealed interspecies differences in TE survival should be interpreted cautiously.
Disregarding the mutational effects of TEs at this point, a dramatic increase of the noncoding or repetitive fraction in the genome theoretically raises the nutritional and time requirements for DNA replication and maintenance in each cell (leading to putative costs formulated as the "large genome constraint hypothesis" [45]). The higher nutritional and time demands may lead to decreased fecundity and prolonged generation time, the two main constituents affecting selective advantage [46]. While such changes are apparently disadvantageous in populations of unicellular organisms, growing genome size does not seem to impose an evident and unambiguous selective disadvantage in Spermatophyta (seed plants) [45,47].
To offer one intriguing example, phosphorus (P) is known to be the limiting nutrient in most soils (reviewed in [48]). As it is one of the basic components of nucleic acids, plants with significantly larger genomes have higher P demands for DNA synthesis, leaving less for other essential cell processes (e.g., those dependent on ATP or phospholipids). Therefore, it seems natural to anticipate that plants with large C-values should lose the ecological competition against plants whose genomes are severalfold smaller. Despite that, genome sizes in examined land plants range 2056-fold [49], with Genlisea margaretae (1C = 63.4 Mbp = 0.0648 pg) [50] and Trillium hagae (1C = 132.5 pg) [51] at the opposite poles. Relatively large, almost 22-fold differences in 1C-values have also been reported for members of a single genus (Eleocharis) at the same ploidy level [52].
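Because the 1C-values above are quoted partly in picograms and partly in base pairs, a short conversion sketch may help in comparing them. It assumes the commonly used factor of 1 pg of DNA ≈ 978 Mbp, which is not stated in the text; the function names are hypothetical.

```python
# Assumed conversion factor: 1 pg of double-stranded DNA is roughly 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_mbp(picograms):
    """Convert a DNA amount in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


def fold_difference(larger, smaller):
    """Ratio between two genome sizes expressed in the same unit."""
    return larger / smaller


genlisea_pg = 0.0648   # Genlisea margaretae, 1C value quoted in the text
trillium_pg = 132.5    # Trillium hagae, 1C value quoted in the text

print(f"Genlisea margaretae: {pg_to_mbp(genlisea_pg):.1f} Mbp")
print(f"Trillium hagae: {pg_to_mbp(trillium_pg) / 1000:.1f} Gbp")
print(f"Fold difference: {fold_difference(trillium_pg, genlisea_pg):.0f}")
```

With these two quoted values the ratio comes out at roughly 2,045, of the same order as the 2056-fold range cited from [49]; the small difference presumably reflects rounding or slightly different underlying estimates.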
If we reject the unlikely possibility that such enormous genome size variation in plants is attributable to stochastic processes only, the question arises as to why some plants maintain relatively small genomes (Figure 1) while others sink into genomic obesity [53]. In theory, there are two possible causes that may allow transposable elements, and thus genomes to expand extensively: (i) deficiency in the mechanisms of suppression and/or removal of TEs from the genome and (ii) selective advantage that favours individuals and/or populations with high TE activity.
In relation to the former possibility (i), Weil and Martienssen [9] compare the interaction of TEs and host genomes to resistance to pathogens, and hypothesize that as transposons evolve ways around host silencing, host organisms evolve new genes for silencing, perhaps through duplication and subfunctionalization. This idea is supported by the discovery of positive selection acting on some LTR retrotransposons in the rice genome [54]. On rare occasions, mutant variants of retroelements can escape host recognition and rapidly amplify, leading to what is commonly observed as bursts of amplification. Consequently, the detected variability in TE activity across time and taxa can be attributed to different phases of the "host-parasite" interaction. The occasional escape of TEs from host suppression via random mutations and positive selection seems plausible; however, such events should exhibit roughly the same periodicity in all genomes, assuming similar substitution rates of repetitive DNA among species. Therefore, this "Red Queen race" in itself can hardly explain the enormous differences in plant genome sizes. Moreover, the host-pathogen analogy also implies that species with below-average performance in TE suppression are evolutionarily disadvantaged relative to those species that have successfully prevented transpositional bursts of parasitic retroelements. Comparing plant species with small and large genomes, we lack any direct evidence for such a generalization.
The above-mentioned conflicts direct the attention to the latter alternative (ii), which suggests plant genome size variation to be caused by a differential selective advantage of TE possession acting in distinct species or populations (i.e., the presence and activity of transposable elements might be beneficial in some species/populations while detrimental in others). Testability of this hypothesis depends on identifying the nature of such advantage; therefore, other effects of TEs on plant genomes need to be carefully considered (see below).

2.2. Mutability
The discovery of transposable elements was accompanied by the observation of their mutability [55], which, unlike growing genome size, can often have a phenotypic manifestation. Transpositions into the coding regions of genes are usually deleterious; however, those transpositional events that pass through the sieve of selection can induce a variety of genetic changes, including interrupting host genes, creating different expression forms, changing intron length, and affecting expression levels of adjacent genes [56]. Downregulation of genes may be caused not only by TE insertions disrupting the promoters, but more likely by siRNA-guided DNA methylation, which is primarily directed at suppressing TE activity but affects the expression of nearby genes too [35]. Whole-genome differences in TE-siRNA interactions have such dramatic effects on expression patterns that they may contribute to speciation [12]. Among recently reported TE-induced mutations are cluster-shaped somatic variation in grapevine caused by the insertion of the Hatvine1-rrm DNA transposon in the VvTFL1A gene promoter [57]; a flower color gene mutation caused by TgmExpress1 transposition into intron 2 of the F3H gene in soybean [58]; and transposon-induced DNA methylation of the CmWIP1 promoter leading to sex determination in melon [59].
Comprehensive and genome-wide analyses of TE mutability have been accomplished on the fully sequenced genome of rice. According to [39], while LTR retrotransposons constitute ∼17% of the rice genome, 22% of these sequences lie within putative or established rice genes. Within the genic regions, fragmented elements have been predominantly identified, and full-length elements are rare [39]. Available genomic sequences of the two rice subspecies, japonica and indica, provide a powerful resource for comparative and functional genomic analyses, which has been utilized by Huang et al. [56] to study transposon insertion polymorphisms (TIPs). Interestingly, more than 10% of TIPs between the Nipponbare and 93-11 rice cultivars were located in expressed gene regions. Roughly half of those TIPs occurred in introns, often resulting in alternative splicing, and more than a third were found in [−1, −250] regions, relative to the transcription start site. Effects of TE insertions within the promoters are particularly impressive in the case of two genes, causing 18-fold upregulation and 23-fold downregulation of gene expression [56]. In another study [7], high-throughput sequencing was utilized to determine 1,664 insertion sites of the mPing transposon in a population of 24 rice plants. Subsequent comparative microarray analysis concluded that the vast majority of TE insertions either have no impact or preferentially enhance transcription under normal conditions. However, seven out of ten loci unaffected by mPing insertion under normal conditions were inducible by salt and cold stress. Scanning mPing sequences for cis-acting plant regulatory elements resulted in the identification of 96 putative regulatory motifs, one-third of which were stress responsive.
These experiments demonstrate that the mPing transposon, resembling a mobile gene enhancer, provides new binding sites for transcription factors or other regulatory proteins, and may actually benefit the host by creating potentially useful allelic variants and novel, stress-inducible regulatory networks [7].
Among the most intriguing features of transposable elements is the ability of certain classes to capture gene fragments. The potential to contribute to gene evolution by combining genes, exons, and introns into novel functional units is most apparent in Helitrons. Although these elements were initially challenging to identify due to the absence of typical TE structural features [60,61], an effective structure-based program has been developed recently, leading to the detection of thousands of Helitrons in several plant genomes [61]. The frequency of gene capture is particularly striking in the genome of maize [62,63]. For example, Morgante et al. [62] randomly selected nine genic insertions polymorphic in maize inbred lines and demonstrated that eight of them are nonautonomous Helitrons, each containing between one and seven different fragments of host genes. Yang and Bennetzen [63] have shown on a genome-wide scale that 60% of maize Helitrons contain captured fragments of nuclear genes, 4% of which are under purifying selection, while another 4% exhibit apparent adaptive selection, which suggests beneficial effects for the host or for Helitron transposition/retention. Although the vast majority of the genes captured by Helitrons are incomplete, defective copies of conserved functional genes (including exons and also introns), a fraction of those gene fragments may serve as templates for interfering RNAs or for new gene functions via exon shuffling, expanding the repertoire of mutable changes provided by TEs.
The last significant mutable effect of TEs to be mentioned is macrotransposition, that is, a transposition involving two physically close, interacting elements and an intervening chromosomal segment. Such transposon pairs may produce other complex rearrangements, including deletions, inversions, and reshuffling of the intertransposon segment [64]; macrotransposition can thus be another contributor to genome divergence and speciation.

From Individuals to Populations: From Junk to Treasure
In the study of coevolution of TEs and plant genomes, two different populations affected by inconsistent selection forces should be recognized-the population of transposable elements within the genomic niche of an individual, and the population of diverse individuals within an ecological niche. In this section, the term population refers to the latter one.
Since the early work of Barbara McClintock, the presence and action of transposable elements in host genomes have been studied from the perspective of an individual. Because the mutational consequences of TE activity on genes and their expression patterns (see above) are undirected by the host and principally random, it has become apparent that uncontrolled TE transposition or expansion is, in the vast majority of cases, neutral, detrimental, or lethal for an individual. The absence of evident benefits of the possession of transposable elements for an individual led to TEs being regarded as an archetype of selfish or parasitic DNA, whose only functional aim is to reproduce itself, regardless of the effect on the host genome. The large genomic regions occupied by ancient, suppressed, or still active transposable elements have acquired the label "junk DNA", an unnecessary burden for cell and organism.
The individual-based viewpoint in TE research was needed, because a fundamental description of TE diversity, prevalence, and modus operandi was required, and also understandable, because the technical possibilities to study TE dynamics on the population level were unavailable until recently. However, evolution acts on populations, not on individuals. Some recent studies have drawn attention to population processes related to TEs [37,65-67], and it is becoming clearer that to answer the questions about the origin, evolution, function, and importance of transposable elements in plant genomes, it is necessary to move the research focus from individuals to populations.
Environmental stresses are known initiators of TE activity [20-23,68], and diverse effects of transpositional events on the expression of adjacent genes have been reported [7,35,56,57,59,68]. Although the mutational impact of TE bursts is likely to be detrimental for an individual, TE activity creates new variability in the population, providing raw material for selection forces. An illustrative example of how effective TEs can be in generating genetic diversity is provided by the activity of the mPing DNA transposon in some rice strains. With roughly 40 new transpositions of mPing per plant per generation, even small populations contain thousands of new insertions, a large portion of which upregulate genes in their vicinity under stress conditions [6,7].
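The scale of this effect follows from simple multiplication, sketched below. The rate of 40 insertions per plant per generation is the figure quoted from [6,7]; the population sizes are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
def expected_new_insertions(plants, per_plant_per_generation=40):
    """Expected number of new insertions arising in one generation."""
    return plants * per_plant_per_generation


# Hypothetical population sizes, used only to illustrate the order of magnitude.
for plants in (25, 100, 1000):
    print(plants, "plants ->", expected_new_insertions(plants), "new insertions")
```

Under this assumption, a stand of only 100 plants would already be expected to carry about 4,000 new mPing insertions after a single generation.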
Hence, TE activity may actually help the population to overcome changing environmental conditions and adapt to new ecological settings. Under diversifying selection, this capacity for rapid adaptation is likely to outweigh the costs of decreased fitness of some individuals, or the possible large genome constraint. From this perspective, the escape of TEs from the silenced state resembles a regulated response for coping with stress at the population level rather than an undesired side effect of stress exposure, an idea initially hypothesized by McClintock [55] as the "response to genomic shock." Possession of a mechanism that can boost evolutionary change and be switched on and off depending on the situation might be the decisive factor for the survival or extinction of a population in changing environments. It is tempting to hypothesize that transposable elements might represent such a tool and were actually "invented," or at least modified, by eukaryotes to fulfil this function. And if plants possess and use an autonomous mechanism to control TE proliferation, it means they also have basic control over their own genome size.
The hypothesis of transposable elements as intrinsic tools for increasing genetic variability has some testable implications (i)-(iii). For example, (i) TE-driven stimulation of variability could be especially beneficial for species, populations, or genomic regions exposed to strong diversifying selection (e.g., host-pathogen systems). Such entities would be therefore expected to possess more transposable elements and their TE dynamics to be more responsive to stress conditions. Interestingly, Nielen et al. [30] have found LTR retrotransposon FIDEL associated with conserved Arachis genes less frequently than what was expected by chance, but its presence close to fast-evolving NBS genes (resistance gene analogues) was in agreement with random distribution.
(ii) Asexually reproducing organisms and self-pollinating plants, which lack the opportunity to recombine their genetic material, might profit from the enhanced diversity sustained by the TEs, as suggested for rice by Naito et al. [7]. However, the outcrosser Arabidopsis lyrata shows 2-3 times higher TE content than the selfer A. thaliana (Table 1). The relationship between the mating system and TE dynamics for these two relatives has been studied [68,69], and among the possible causes are differences in effective population size and related stochastic processes, or more deleterious consequences of the accumulation of recessive mutations in self-pollinating plants. Self-pollinators might also be more efficient in suppressing TEs owing to more rapid fixation of epigenetic silencing patterns.
On the other hand, Bestor [70] assumes that the aggressiveness of transposons in self-fertilizing sexuals is self-limited compared with outcrossing sexuals, where transposon fixation is nearly certain provided that the coefficient of selection imposed by the transposon is less than 0.5 and there is one or more transposition events per generation. This implies that self-pollinators are not expected to have higher transposon content than related outcrossers, unless the TEs provide some net benefit to the host. While Triticeae seems to be an interesting tribe for such comparisons (Table 1), more comprehensive data on TE content are required.
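The 0.5 threshold can be made tangible with a toy deterministic recursion. This is a minimal sketch under simplifying assumptions and not a reproduction of the derivation in [70]: carriers of the element pay a dominant fitness cost `s`; under random mating a heterozygous carrier passes the element to a fraction `t` of its gametes (`t = 1.0` stands for an element that copies itself onto the homologue every generation, `t = 0.5` for plain Mendelian transmission); under complete selfing the element is assumed to stay confined to carrier lineages. All function and parameter names are hypothetical.

```python
def next_freq_outcrossing(p, s, t=1.0):
    """Element frequency among gametes after one generation of random mating.

    p: current frequency of element-bearing gametes
    s: fitness cost paid by every carrier (dominant cost)
    t: share of a heterozygote's gametes that carry the element
    """
    q = 1.0 - p
    homozygous_carriers = p * p * (1.0 - s)
    heterozygous_carriers = 2.0 * p * q * (1.0 - s)
    non_carriers = q * q
    mean_fitness = homozygous_carriers + heterozygous_carriers + non_carriers
    return (homozygous_carriers + heterozygous_carriers * t) / mean_fitness


def next_freq_selfing(c, s):
    """Share of carrier lineages after one generation of complete selfing."""
    carriers = c * (1.0 - s)
    return carriers / (carriers + (1.0 - c))


def iterate(step, start, generations, **params):
    """Apply a one-generation recursion repeatedly."""
    x = start
    for _ in range(generations):
        x = step(x, **params)
    return x


# Rare element (1%) followed for 200 generations under different costs.
print(iterate(next_freq_outcrossing, 0.01, 200, s=0.4))  # cost below 0.5
print(iterate(next_freq_outcrossing, 0.01, 200, s=0.6))  # cost above 0.5
print(iterate(next_freq_selfing, 0.01, 200, s=0.1))      # selfing lineage
```

In this sketch a rare element in an outcrossing population grows by a factor of about 2(1 − s) per generation, so it spreads when s < 0.5 and is lost when s > 0.5, whereas in a purely selfing lineage any positive cost leads to its decline.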
(iii) Taxa adapted to environmentally stable niches, such as ocean depths or high altitudes, would be expected to contain significantly fewer TEs compared with organisms from more unstable environments, which supposedly overcame multiple bursts of TE activity. An excellent opportunity to study these expectations is provided by three diploid sunflower species. Helianthus anomalus, H. deserticola, and H. paradoxus are independently derived via hybridization events between the same two parental taxa, H. annuus and H. petiolaris. The three hybrid taxa underwent a rapid, retrotransposon-mediated genome expansion [71,72], and all of them occupy habitats considered abiotically extreme relative to either parental species. H. anomalus and H. deserticola inhabit arid desert-like environments whereas H. paradoxus occurs exclusively in saline environments.
Interestingly, the scale of copy number increase for copia LTR retrotransposons differs considerably among the three sunflower hybrid species, with a 3.7-fold increase in copy number in the genome of H. paradoxus (relative to the average parental species value) versus a lower 1.7-fold and 2.2-fold increase for H. anomalus and H. deserticola, respectively [72]. The copy number increase of gypsy LTR retrotransposons is even more stunning, with 5.6- to 23.6-fold multiplication in the hybrid taxa, compared to parental populations (which did not differ significantly) [71]. Ungerer et al. [71] suggest that hybridization, abiotic stress, or both may have been involved in this extensive retrotransposon proliferation. In either case, it is tempting to hypothesize that this is an example of TE proliferation being "switched on" by the host regulatory mechanisms (possibly coevolving with the TEs) as an action to elicit mutational consequences potentially helpful in adapting to new environments.

Conclusion
Genome-wide and population-based examinations of similar projections have the potential to illuminate the role of transposable elements in speciation, adaptation, and the evolution of plant genomes in general. In addition, detailed knowledge of the regulation of TE activity is required to understand the coevolution of TEs and plant genomes, and the extent of the consequent benefits available to plants. During the past decades, the notion of TEs has varied from autotelic junk to a valued tool of evolutionary response. In the light of new evidence, terms like "junk" or "selfish" DNA, and even "host genome" and "defence mechanisms for TE suppression," are becoming more misleading than ever. At present, comparative and functional genomic studies targeting TE population dynamics and TE-cell interactions, supported by high-throughput technologies, are on the way to completing this paradigm shift.