Transcriptome Comparison Reveals Distinct Selection Patterns in Domesticated and Wild Agave Species, the Important CAM Plants

Agave species are an important family of crassulacean acid metabolism (CAM) plants with remarkable tolerance to heat and drought stresses (Agave deserti) in arid regions and multiple agricultural applications, such as spirit (Agave tequilana) and fiber (Agave sisalana) production. The agave genomes are commonly too large to sequence, which has significantly restricted our understanding to the molecular basis of stress tolerance and economic traits in agaves. In this study, we collected three transcriptome databases for comparison to reveal the phylogenic relationships and evolution patterns of the three agave species. The results indicated the close but distinctly domesticated relations between A. tequilana and A. sisalana. Natural abiotic and biotic selections are very important factors that have contributed to distinct economic traits in agave domestication together with artificial selection. Besides, a series of candidate unigenes regulating fructan, fiber, and stress response-related traits were identified in A. tequilana, A. sisalana, and A. deserti, respectively. This study represents the first transcriptome comparison within domesticated and wild agaves, which would serve as a guidance for further studies on agave evolution, environmental adaptation, and improvement of economically important traits.


Introduction
Agave species assembled an important group of crassulacean acid metabolism (CAM) plants with remarkable tolerance to heat and drought stresses in arid regions [1]. CAM plants usually have a higher efficiency in water use than C3 and C4 plants [2]. For this reason, CAM plants brought a great chance to enhance sustainable production of food and bioenergy under the background of limited freshwater resources and global climate change [3]. As a traditional cultivated CAM plant, Agave tequilana has been used for the production of distilled spirit tequila for centuries [4]. Further, it also shows a great potential in bioenergy production [5]. Besides, Agave sisalana has also been widely cultivated as a cash crop for fiber production in tropical regions [6]. As a native wild plant in the Sonoran Desert regions of the Southwestern United States and Northwestern Mexico, Agave deserti has successfully survived in a severe environment within elevation ranges that experience both hot, dry summers and occasional freezing temperatures in winter [7,8]. This capacity of high tolerance to multiple stresses has important values to the improvement of the main food crops. These economic and stress-tolerant features have made agave a model CAM crop system for a hot and dry/droughty/xeric environment [9].
However, few reports have revealed the physiology and molecular basis of agaves, especially in A. sisalana. To date, A. tequilana-related studies mainly focused on fructan, with the obvious purpose of improving fructan production and application to generate bioenergy [5,10,11]. A. desertirelated studies were mainly conducted for its ecological and physiological adaptation to a severe environment, which is highly valuable for the improvement of the main food crops [12]. Besides, a series of saponin-related researches have been reported, while few reports were related to fiber in A. sisalana [13,14]. These three agave species are closely related species but with different remarkable biological features. A previous study has revealed the phylogenetic relations according to their trnL sequences [15]. The result showed that they were closely related species in spite of different origins for A. tequilana (Jalisco), A. deserti (the Sonoran Desert), and A. sisalana (Chiapas) [4,6,7]. Agaves have very large genomes, which has significantly restricted their genome assembly and limited our understanding to their evolution pattern [16]. In other crops, accessible genomes and genome-wide association analysis have revealed many economically important traits for crop improvement [17][18][19]. Recently, the development of next-generation sequencing has brought a new direction for gene-related studies without the restriction of genome data [20,21]. Furthermore, transcriptome comparison has also been conducted for evolution analysis and searching economically important traits in some genome unavailable crops [22]. In this study, we selected three transcriptome databases for comparison to reveal the phylogeny and evolution pattern of the three agave species [23,24]. Those genes related to species-specific traits would be also identified and evaluated for their importance in agronomy production and environmental adaptation of agaves. This study represents the first transcriptome comparison within domesticated and wild agaves, which would serve as a guidance for further studies on agave evolution, environmental adaptation, and improvement of economically important traits.

Materials and Methods
2.1. Phylogenetic Analysis. Phylogenetic analysis was conducted by MEGA 5.0 software with the minimum-evolution method [25]. The methods and parameters were according to the previous study [15] [15,24]. 2.4. In Silico Gene Expression Analysis. The expression pattern of positive selected unigenes was subjected to in silico gene expression analysis in agave leaves. The reads per kilobase per million mapped read (RPKM) value of these unigenes in A. deserti and A. tequilana were calculated by RSEM software according to the previous study [23,30]. The RPKM data was further normalized with two reference genes (tubulin and serine/threonine-protein phosphatase) in each agave species [31].

Selection Pressure Detection and Protein Structure
Modeling. To detect the selection pressure on positive selection unigenes, Ka/Ks ratios were calculated in sliding window (30 bp under a step size of 6 bp) by using DnaSP 5.0 [32]. Translated protein sequences of positive selection unigenes were used for structure modeling by Swiss-Model (https:// swissmodel.expasy.org/) [33].  selected the four chloroplast sequences (about 2490 bp) for phylogenetic analysis and found that A. deserti (GAHT01022741) was separated with the other 3 species (Figure 1(b)).

Sequence Comparison between Agave Transcriptomes.
A summary for the three databases was described in Table 1 (Figure 2(a)). Among these orthologous unigene pairs, more than 91% unigene pairs had an identity over 91% (Figure 2(b)). Furthermore, a total of 6130 unigene terms were obtained from the three agave transcriptomes. GO functional classification indicated that these genes were assigned to 30,405 functional terms. There were 14,915 terms in biological process (49.05%), 8546 in molecular function (28.11%), and 6944 in cellular component (22.84%) (Figure 3).

Identification of Genes Selected in the Domestication of
Agaves. The Ka, Ks, and Ka/Ks values were calculated for 6130 orthologous unigene pairs in A. tequilana and A. sisalana separately with A. deserti (Figure 4(a)). The correlation between Ka and Ks values was also estimated in A. tequilana (r = 0 515, P < 0 05) and A. sisalana (r = 0 206, P < 0 05) unigene pairs, respectively. 393 unigenes (6.5%) in A. tequilana and 262 unigenes (4.5%) in A. sisalana showed a Ka/Ks ratio higher than 1, while the Ka/Ks ratio of more than 90% unigenes was lower than 1 (Figure 4(b)). The significance of the Ka/Ks value for all 6130 unigene terms was analyzed, and the results indicated that 1117 unigenes were significantly selected at least in A. tequilana or A. sisalana (P value < 0.05) (Supplementary Table 1). These genes were further characterized into 15 classifications according to their annotations ( Table 2). Among these, 54 unigenes were subjected to positive selection (Ka/Ks > 1), while the residues of 1064 unigenes were subjected to purifying selection (Ka/Ks < 1). Furthermore, 27 and 19 unigenes were positively selected in A. tequilana and A. sisalana, respectively. 8 unigenes were positively selected in both agave species (Supplementary Table 1  We further characterized the 54 unigenes subjected to positive selection and found that three unigenes were annotated as disease resistance protein (Table 3). In A. tequilana, four unigenes were annotated as 6G-fructosyltransferase, d-2-hydroxyglutarate dehydrogenase, gluconokinase, and 6-phosphofructokinase, respectively, and they might be related to fructan production, while in A. sisalana, five unigenes were characterized as auxin-responsive protein IAA6, gibberellin receptor, PHD and RING finger protein, elongation of fatty acids protein, and zinc finger A20 and AN1 protein, respectively. These unigenes are probably related to the fiber development in A. sisalana. Besides, two unigenes, designated as RING-H2 finger protein and TOM1-like protein, were positively selected in both A. tequilana and A. sisalana.

In Silico Expression of Genes under Positive Selection in
Agave Species. The expression patterns of genes under positive selection were analyzed according to the three transcriptome databases. In agave leaves, the 27 and 8 unigenes were grouped into two expression modes ( Figure 5). However, the 19 unigenes in A. sisalana were not distinctly clustered into different expression modes. All these genes were differentially expressed in A. tequilana or A. sisalana when compared with A. deserti.

Selection Pressure and Structure Model of Putative
Economic Trait-Related Genes. The sliding window analysis was used to examine the selection pressure of putative economic Trait-related genes. The results indicated the existence of different selection pressures in A. tequilana and A. sisalana genes ( Figure 6). The three disease resistance genes all had a strong selection pressure in most sequence regions. Five unigenes showed a stronger selection pressure in A. sisalana (GAHT01109565, GAHT01002417, GAHT01054013, GAHT01038220, and GAHT01027892). Only GAHT01031288 showed a stronger selection pressure   Zinc finger A20 and AN1 protein in A. tequilana. The residue of five unigenes shared a strength-similar but region-different selection pressure. The structure modeling for the 14 unigenes was further conducted to analyze the difference of agave proteins. According to the results, four unigenes designated as 6Gfructosyltransferase (Figure 7(a)), d-2-hydroxyglutarate dehydrogenase (Figure 7(b)), gluconokinase (Figure 7(c)), and 6phosphofructokinase (Figure 7(d)), could match Swiss-Model sequences with identity > 30%, coverage > 75%, and appropriate description (Supplementary Table 2). Their structure models also showed significant differences in A. tequilana or A. sisalana. The unigene with a significantly distinct structure was differentially expressed when compared to their orthologous unigenes in other two species ( Figure 5).

Discussion
A previous phylogenetic analysis suggested that A. deserti was closely related to A. angustifolia [15]. However, we compared the recently published chloroplast sequence (GAHT01022741) with the former one (DQ500894 + DQ500928) in A. deserti and found them having totally different sequences. It might be caused by sample collection because the two studies were separately conducted in Mexico and America. Wild agave species are usually identified by morphological traits, which is not as reliable as molecular identification. Our phylogenetic results also indicated a different evolution relation by both short and long chloroplast sequences (Figure 1). This indicated the closer evolution    [6,7]. The phylogenetic result has also proved that, however, the two species actually possessed very different agronomic traits and applications. An important reason should be artificial selection even if the lowfrequency serendipitous backyard hybridization would lead to distinct domestication of crops. This is a historical inheritance in Mexico and has significantly enriched the genetic diversity of crops [34]. Agave species originate from Central America with high tolerance to drought and temperature, which makes them a main and important kind of plants there [9]. Therefore, they would inevitably confront a series of abiotic and biotic stresses. Actually, we identified 62, 127, and 77 unigenes with classifications of disease/defense, signal transduction, and transcription factor, respectively. Among them, 13 significant selected unigenes were related to disease resistance and ten of them subjected to purifying selection (Supplementary Table 1). This might be accompanied with the process of agronomic Trait-derived domestication. Many disease resistance genes were also found to be lost during domestication in other crops [17,22]. A previous study has already revealed a differentiated selection pressure on NBS-LRR genes in some agave species [35]. In this study, the sliding window analysis also showed a strong selection pressure with the three disease resistance unigenes ( Figure 6). Furthermore, two of the three unigenes (GAHT01070676 and GAHT01099649) were subjected to purifying selection in A. sisalana (Table 3), which might be harmful for growth and development. This might also be responsible for the susceptibility to zebra disease caused by Phytophthora nicotianae in A. sisalana [36].
It has been reported that several transcription factor families play an important role in abiotic stress regulation, such as bHLH, zinc finger, MYB, AP2, NAC, WRKY, and bZIP families [37][38][39][40][41][42]. We also found 47 TFs subjected to purifying selection either in A. tequilana or in A. sisalana, from the bHLH (8) Table 1). For agave species, drought and high temperature are the main abiotic stresses. We speculated that different habitats should be an important natural selection pressure that affected the shape and size of the three agave species [43]. The purifying selection of the 47 stress-related candidate TFs might weaken the drought tolerance of A. tequilana and A. sisalana but enhance their biomass accumulations. The complex regulation and interaction of these TFs might be the key to reveal the mechanism of the remarkable drought tolerance in A. deserti. Much more molecular characterizations are still needed in future studies.
It has been reported that fruit/seed-related traits were subjected to high artificial selection pressure for their economic value [18,19]. In A. tequilana, the most important economic trait focuses on fructan, and several studies have conducted functional characterization for fructosyltransferase genes [10,11,44]. In this study, we identified a fructanrelated unigene and it was subjected to positive selection in A. tequilana (Table 3). Besides, three carbohydrate-related unigenes were also subjected to positive selection. Their positive selection might be responsible for the improvement of fructan yield in A. tequilana. In A. sisalana, fiber is the main economic purpose for its cultivation in tropical areas. In the present study, we found 5 unigenes subjected to positive selection only in A. sisalana (Table 3). A previous publication has reviewed the hormonal regulation of secondary cell wall formation [45]. The zinc finger family TFs are also proved to regulate cell wall development and cellulose biosynthesis [46,47]. Besides, the elongation of fatty acids protein plays an important role in cell elongation [48]. Therefore, we speculated that the fiber-related traits in agaves are more likely controlled by hormonal and transcriptional regulation. And there are significant differences when compared with fructan-related traits, which might be mainly controlled by metabolic regulation in A. tequilana. Natural fiber is commonly generated as the result of secondary cell wall thickening in the main fiber crops such as cotton, ramie, flax, and , gluconokinase (c), and 6-phosphofructokinase (d) in three Agave species by using Swiss-Model [33]. The significant structure difference within species was marked in red dotted line.
hemp [49][50][51][52]. As a constitutive structure of plant cells, there are many housekeeping and regulating genes during cell wall development, especially the secondary cell wall development [45,[53][54][55]. In contrast, fructans are mainly responsible for carbohydrate storage to vegetative tissues in many plant species [56]. They have also been increasingly considered protective agents against abiotic stresses [57]. However, the capacity to produce and store fructans in A. tequilana is much stronger than that in A. deserti, even if there is a much more severe environment for A. deserti [23,44]. It is probably because fructan-related traits are regulated at the metabolic level. A more recent study has combined transcript, protein, and metabolite methods to reveal the molecular basis of the CAM process in agave [58]. The rapid development of high-throughput molecular methods has brought a great opportunity for the further understanding of the differences and evolution patterns among agave species.

Conclusion
This study represents the first transcriptome comparison within domesticated and wild agave species. The results revealed the importance of abiotic/biotic natural selection in agave evolution. Four unigenes related to fructan in A. tequilana and five unigenes related to fiber in A. sisalana were positively selected. These genes revealed the difference between A. tequilana and A. sisalana evolution, which would serve as a guidance for further studies on agave evolution, environmental adaptation, and improvement of economically important traits.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflict of interest.

Authors' Contributions
XH and KY conceived and designed the experiments. XH analyzed the data and drafted the manuscript. BW carried out the Ka/Ks analysis. JX, JG, HC, SZ, JZ, and KY contributed to the transcriptome data. YZ revised the manuscript. CH and WW helped in the expression analysis. YL helped in the selection pressure detection. All authors read and approved the final manuscript.