Complete Chloroplast Genomes of 14 Mangroves: Phylogenetic and Comparative Genomic Analyses

Mangroves are a group of plant species that occupy the coastal intertidal zone and are major components of this ecologically important ecosystem. They belong to about twenty diverse families. Here, we sequenced and assembled the chloroplast genomes of 14 mangrove species from eight families spanning five rosid orders and one asterid order: Fabales (Pongamia pinnata), Lamiales (Avicennia marina), Malpighiales (Excoecaria agallocha, Bruguiera sexangula, Kandelia obovata, Rhizophora stylosa, and Ceriops tagal), Malvales (Hibiscus tiliaceus, Heritiera littoralis, and Thespesia populnea), Myrtales (Laguncularia racemosa, Sonneratia ovata, and Pemphis acidula), and Sapindales (Xylocarpus moluccensis). These chloroplast genomes range from 149 kb to 168 kb in length. We observed a conserved structure of two inverted repeats (IRa and IRb, ~25.8 kb), one large single-copy region (LSC, ~89.0 kb), and one small single-copy region (SSC, ~18.9 kb), as well as ~130 genes (85 protein-coding, 37 tRNAs, and 8 rRNAs). Among the four regions, the IR regions showed the lowest divergence. We also identified simple sequence repeats (SSRs), whose numbers vary among species. Most chloroplast genes are highly conserved, with only four genes under positive selection or relaxed pressure. Combining our data with publicly available chloroplast genomes, we carried out a phylogenetic analysis and confirmed the previously reported phylogeny within rosids, including the positioning of obscure families in Malpighiales. Our study reports 14 mangrove chloroplast genomes and illustrates their genome features and evolution.


Introduction
Mangroves grow in the intertidal zone of the ocean, the transition zone connecting land and sea. Mangrove ecosystems provide essential habitats for marine creatures and benthic organisms and play important roles in regulating the energy cycle and maintaining biodiversity [1,2]. According to their habitats, root morphology, and salt metabolism patterns, mangroves are generally categorized into true mangroves and mangrove associates (or semi-mangroves) [3]. True mangroves live exclusively in mangrove ecosystems and usually have distinct adaptations to the marine environment, including the ability to grow in seawater, complex root structures (allowing enhanced nutrient absorption and respiratory metabolism), and viviparous reproduction (seeds germinating on trees) [4]. Semi-mangroves are amphibious, and many can inhabit both terrestrial and aquatic environments (for instance, Pongamia pinnata (L.) Pierre). In mangrove ecosystems, they may grow at the edge of the true mangroves and are often the dominant species on degraded beaches.
There are more than 80 mangrove species, covering approximately twenty families [5,6]. Given their ecological importance, wide distribution, and unique biological adaptations, the genome features and genome evolution of these species are of considerable interest yet remain largely unexplored. As an essential organelle of plants, the chloroplast has an independent genome with a stable sequence structure and a relatively conserved number of genes associated with energy production and metabolism. Chloroplast genes such as rbcL and psbA have proven useful for inferring the evolutionary origins and phylogenetic relationships of mangrove species from different clades or geographical regions [7-9]. DNA barcodes based on rbcL, matK, and trnH-psbA have also been used to identify unknown mangrove species [6]. However, few whole chloroplast genomes of mangrove species have been available until now [10], and detailed whole chloroplast genome comparisons and phylogenetic analyses have been lacking. To acquire more mangrove genetic resources and determine the phylogenetic position of mangroves within rosids, we sequenced and assembled the complete chloroplast genomes of 14 mangrove species: Pongamia pinnata (L.) Pierre, Avicennia marina (Forssk.) Vierh., Excoecaria agallocha L., Bruguiera sexangula (Lour.) Poir., Kandelia obovata Sheue, Liu & Yong, Rhizophora stylosa Griff., Ceriops tagal (Perr.) C.B.Rob., Hibiscus tiliaceus L., Heritiera littoralis Dryand., Thespesia populnea (L.) Sol. ex Correa, Laguncularia racemosa (L.) C.F. Gaertn., Sonneratia ovata Backer, Pemphis acidula Forst., and Xylocarpus moluccensis (Lamk.) Roem. They represent mangroves of eight families, five rosid orders, and one asterid order (as an outgroup for the phylogenetic analysis). We examined their genome structures and gene contents. Comparative genomics and molecular evolution analyses were performed to further illustrate mangrove chloroplast genome features and reveal relationships among mangrove species.

Materials and Methods
2.1. Sequencing, Chloroplast Genome Assembly, and Annotation. Fresh leaves of mangroves were provided by collaborators in Guangzhou, China. DNA was extracted using a CTAB method and then sequenced on a BGISEQ-500 platform. After sequencing, we randomly extracted five million paired-end reads. We used MITObim v1.9 [11] for the initial assembly, following a closest-reference-based strategy. The size of each chloroplast genome was estimated with SPAdes v3.13.0 [12]. With the initial assembly and the estimated genome size, we applied NOVOPlasty v2.7.2 [13] to assemble the complete chloroplast genome. Finally, we carried out manual curation to obtain circular sequences.
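The random extraction of read pairs described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in this study: the text does not say which tool performed the subsampling, and the function names (parse_fastq, subsample_pairs), the fixed seed, and the choice of reservoir sampling are all assumptions made for the example. The key point it shows is that mates from the two read files must be sampled together so that pairing is preserved.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header line, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, plus, qual


def subsample_pairs(
    r1: Iterable[FastqRecord],
    r2: Iterable[FastqRecord],
    n: int,
    seed: int = 0,  # hypothetical default, only for reproducibility of the sketch
) -> List[Tuple[FastqRecord, FastqRecord]]:
    """Reservoir-sample up to n read pairs in one pass, keeping mates together."""
    rng = random.Random(seed)
    reservoir: List[Tuple[FastqRecord, FastqRecord]] = []
    for i, pair in enumerate(zip(r1, r2)):
        if i < n:
            reservoir.append(pair)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < n:
                reservoir[j] = pair
    return reservoir
```

With real data, the two inputs would be parse_fastq applied to the open forward and reverse read files, and n would be 5,000,000. Reservoir sampling is used here only because it needs a single pass and does not require knowing the total number of reads in advance.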
3.2. Simple Sequence Repeat Content. Simple sequence repeats (SSRs) are tandem repeats (units of 1-6 bp repeated multiple times) in the genome that have been widely applied as markers for population studies and crop improvement [30-33]. In this study, we detected and compared SSRs in the mangrove chloroplast genomes and in 57 terrestrial plant chloroplast genomes (Table S2). SSR content is highly variable among species (Figure S2). Of the 14 mangroves, Kandelia obovata has the highest number of SSRs (194), while Avicennia marina has the fewest (61) (Table 3). Comparing between orders, species of Malpighiales (133 to 194 SSRs; Excoecaria agallocha, Bruguiera sexangula, Kandelia obovata, Ceriops tagal, and Rhizophora stylosa) have more SSRs than species of Malvales (80 to 110 SSRs; Hibiscus tiliaceus, Heritiera littoralis, and Thespesia populnea) and Myrtales (88 to 118 SSRs; Laguncularia racemosa, Sonneratia ovata, and Pemphis acidula) (Table 3). Assessing SSR categories in the 14 mangrove chloroplast genomes, we found that the mononucleotide type accounted for at least half of the total SSRs (up to 80% in Laguncularia racemosa and Avicennia marina). A/T mononucleotide repeats are the most frequent, followed by dinucleotide, tetranucleotide, trinucleotide, pentanucleotide, and hexanucleotide repeats. Similar patterns of SSR variability and composition were also observed in the 57 terrestrial plant chloroplast genomes (Figure S2). We propose that the SSRs identified here can serve as useful genetic resources for future population and evolution studies.
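To make the definition of an SSR concrete, the following is a minimal sketch of how perfect tandem repeats with 1-6 bp units could be located and tallied by unit length. It is not the software used for the counts reported above. The minimum repeat numbers in MIN_REPEATS are assumed thresholds chosen for illustration (the section does not state the values applied), so the counts it yields on real genomes should not be expected to match Table 3.

```python
from collections import Counter
from typing import Dict, List, Tuple

# ASSUMPTION: minimum number of repeats per unit length (bp). Illustrative only.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repetition of a shorter motif ('ATAT' is not)."""
    n = len(unit)
    return not any(n % k == 0 and unit[:k] * (n // k) == unit for k in range(1, n))


def find_ssrs(
    seq: str, min_repeats: Dict[int, int] = MIN_REPEATS
) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, unit, repeat_count) for each perfect SSR; end is exclusive."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if set(unit) <= set("ACGT") and is_primitive(unit):
                j = i + k
                while seq[j:j + k] == unit:  # extend while the unit keeps repeating
                    j += k
                reps = (j - i) // k
                if reps >= min_rep:
                    hits.append((i, j, unit, reps))
                    i = j  # continue scanning after this repeat
                    continue
            i += 1
    return sorted(hits)


def count_by_unit_length(hits: List[Tuple[int, int, str, int]]) -> Counter:
    """Tally SSRs by unit length (1 = mononucleotide, 2 = dinucleotide, ...)."""
    return Counter(len(unit) for _, _, unit, _ in hits)
```

Skipping non-primitive units prevents a homopolymer run such as A(12) from also being reported as a dinucleotide (AA) or trinucleotide (AAA) repeat. Compound and imperfect repeats are deliberately ignored in this sketch.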
3.3. Phylogenetic Relationships of Mangroves. Similar to mitochondrial genomes used in vertebrate genetics, chloroplast genomes are widely used to settle phylogenetic and evolutionary disputes [28]. To reveal the phylogenetic relationships of mangroves, we constructed phylogenetic trees from the chloroplast genomes of the 14 mangrove species and 57 terrestrial plant species in 16 rosid orders and one asterid order (a data set of 71 plant species) (Table S2). Based on these complete chloroplast genomes, we identified 44 highly conserved genes in the 71 species and constructed phylogenetic trees (Figure 1 and Figures S3-S7). Three trees constructed using BI and ML strategies exhibited the same topology (Figure 1, Figure S3, Figure S5, and Table S3). Thus, we have produced a well-supported phylogenetic tree of mangroves and terrestrial plants.
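The step from per-species gene sets to a single phylogenetic data matrix can be sketched as below. This is an illustration under stated assumptions rather than the pipeline used here: "conserved" is simplified to "present in every species", each gene is assumed to have been aligned beforehand so that all species have equal-length sequences for it, and the function names and toy data are hypothetical. The partition table it returns is the kind of per-gene coordinate list that a partitioned BI or ML analysis would take as input.

```python
from typing import Dict, List, Tuple

# species -> gene name -> aligned sequence (gaps as '-')
Alignments = Dict[str, Dict[str, str]]


def shared_genes(alignments: Alignments) -> List[str]:
    """Genes present in every species, sorted by name."""
    gene_sets = [set(genes) for genes in alignments.values()]
    return sorted(set.intersection(*gene_sets)) if gene_sets else []


def concatenate(
    alignments: Alignments, genes: List[str]
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Build a supermatrix plus per-gene partitions (1-based, inclusive)."""
    pieces: Dict[str, List[str]] = {sp: [] for sp in alignments}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene in genes:
        lengths = {len(alignments[sp][gene]) for sp in alignments}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for sp in alignments:
            pieces[sp].append(alignments[sp][gene])
        partitions.append((gene, start, start + length - 1))
        start += length
    matrix = {sp: "".join(parts) for sp, parts in pieces.items()}
    return matrix, partitions


# Toy data (made up for the example; not real sequences).
toy = {
    "sp1": {"rbcL": "ATG-CA", "matK": "TTGA", "infA": "ATGA"},
    "sp2": {"rbcL": "ATGGCA", "matK": "TTGC"},
    "sp3": {"rbcL": "ATGGCC", "matK": "TT-C"},
}
genes = shared_genes(toy)            # infA is dropped: absent from sp2 and sp3
matrix, partitions = concatenate(toy, genes)
```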
In our phylogenetic tree, species from the same order grouped together, and Rosid I (Malpighiales, Oxalidales, Celastrales, Fagales, Cucurbitales, Fabales, Rosales, and Zygophyllales), Rosid II (Malvales, Brassicales, Huerteales, Sapindales, Myrtales, and Geraniales), and a further rosid clade (Saxifragales and Vitales) were resolved. Among the mangrove species, Avicennia marina is close to the other asterid terrestrial plant, Echinacanthus lofouensis. Using these two species as outgroups, we obtained a clear phylogenetic relationship for the remaining 13 rosid mangrove species. Myrtales is close to Geraniales in this tree and contains five families, of which Myrtaceae and Melastomataceae are in one clade [35]. In Malvales, the three mangroves (Hibiscus tiliaceus, Heritiera littoralis, and Thespesia populnea) cluster together, with Huerteales and Brassicales as neighboring orders. The relationships among genera within Malvaceae agree with the trees from chloroplast studies of Aquilaria sinensis [36] and Heritiera angustata [37], and we further confirmed that the semi-mangrove Thespesia populnea is close to Gossypium species. However, for Malpighiales, an order of high morphological and ecological diversity, the phylogenetic relationships among families, especially Linaceae, were less well resolved. Other than the grouping of the families Linaceae and Euphorbiaceae, our phylogenetic tree is consistent with a previous study that employed 82 plastid genes of 58 species from Malpighiales [38]. We found that Euphorbiaceae constitutes a single branch, while Rhizophoraceae (including the four mangroves Bruguiera sexangula, Kandelia obovata, Rhizophora stylosa, and Ceriops tagal) is a neighboring branch to Erythroxylaceae and Clusiaceae. Also, Linaceae forms a sister lineage with Chrysobalanaceae and Malpighiaceae. Again, these relationships are largely consistent with a study of the Linum plastome [39]. Finally, our phylogenetic tree supports a sister relationship between the mangrove Pongamia pinnata and the other orders Rosales, Cucurbitales, and Fagales.

Synteny and Divergence of the Chloroplast Genomes.
We next analyzed the synteny and divergence between the mangrove and related chloroplast genomes. The mangrove genomes have a conserved gene order similar to that of their sister clades, except for Heritiera littoralis, which has probably undergone segmental rearrangements (Figures 2 and 3). Compared to its closely related species Hibiscus tiliaceus and Thespesia populnea, we found a notable rearrangement from position 8,109 bp to 33,498 bp in the Heritiera littoralis chloroplast genome. This region encodes 16 genes: trnC-GCA, petN, psbM, trnD-GUC, trnY-GUA, trnE-UUC, rpoB, rpoC1, rpoC2, rps2, atpI, atpH, atpF, atpA, trnR-UCU, and trnS-CGA. We assessed the genetic divergence of each mangrove chloroplast genome against its most closely related species and within its order (see Materials and Methods), and found the lowest divergence in the genera Heritiera and Xylocarpus (Figure 3). Compared to Heritiera and Xylocarpus, there is relatively higher divergence between Hibiscus rosa-sinensis and Hibiscus tiliaceus, reflecting a higher level of chloroplast genome polymorphism within the genus Hibiscus.
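A rearrangement of the kind described for Heritiera littoralis can be recognized from gene order alone. The sketch below is a simplified stand-in, not the synteny analysis behind Figures 2 and 3: it assumes both genomes carry the same genes once each, ignores strand, and only recognizes the case where a single contiguous block is reversed. The function name and the flanking labels "left" and "right" are hypothetical placeholders, and the toy order uses only a few of the gene names listed above.

```python
from typing import List, Optional, Tuple


def find_single_inversion(
    ref: List[str], qry: List[str]
) -> Optional[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) of the block whose reversal turns
    qry into ref, or None if the orders are identical or differ in a way that
    is not explained by one inverted block."""
    if sorted(ref) != sorted(qry):
        raise ValueError("gene sets differ between the two genomes")
    diffs = [i for i, (a, b) in enumerate(zip(ref, qry)) if a != b]
    if not diffs:
        return None  # identical gene order
    start, end = diffs[0], diffs[-1]
    if qry[start:end + 1] == ref[start:end + 1][::-1]:
        return start, end
    return None  # more complex rearrangement


# Toy gene orders (illustrative only; not the real genome layout).
reference = ["left", "petN", "psbM", "rpoB", "rpoC1", "right"]
query = ["left", "rpoC1", "rpoB", "psbM", "petN", "right"]
block = find_single_inversion(reference, query)
```

In this toy case the inverted block spans indices 1 to 4. Converting such indices into base-pair coordinates like those quoted above would require the annotated start and end positions of the flanking genes.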

Conclusions
Our study reports 14 complete mangrove chloroplast genomes, as well as a comprehensive comparative chloroplast genome analysis of mangroves and related plant species. The sequenced mangroves span six orders (five rosid and one asterid), making this the first large-scale study of mangrove chloroplast genomes. We found that mangrove chloroplast genomes are similar in structure and gene content. Notable exceptions include the retention of the translation initiation factor gene infA in two mangrove species (the asterid Avicennia marina and the rosid Heritiera littoralis) and an inversion in the LSC region of the mangrove Heritiera littoralis. We used our new mangrove genomes to create a well-supported phylogeny. Protein-coding genes of mangroves were found to be under pressure to maintain gene function, with only a small number of genes in a handful of species showing evidence of positive or relaxed selection. In conclusion, we report 14 complete chloroplast genomes from diverse mangrove species and analyze their phylogeny and genome features. This study provides a useful resource for future studies on the evolution of mangroves and their environmental adaptation.

Data Availability
The 14 assembled chloroplast genome sequences along with the annotation can be found in CNSA (https://db.cngb.org/cnsa/) of CNGBdb under the accession number CNP0000567.

Disclosure
This manuscript has been released as a preprint in bioRxiv [49].

Conflicts of Interest
The authors declare no conflict of interest.

Supplementary Materials
Figure S1: the whole chloroplast genomes of 14 mangroves. The inner circle marks the LSC, SSC, and IR regions. Gene positions and orientations are shown along the outer circle. Genes with different functions are colored.
Figure S2: the SSR distribution in the 14 mangrove and 57 terrestrial plant chloroplast genomes. (A) Number of different SSR types in each species. Mangroves are marked in red. (B) The SSR numbers in 17 orders (dots for each species and bars for average number).
Figure S3: the ML phylogenetic tree based on whole chloroplast genes of 14 mangroves and 57 land species.
Figure S4: the BI phylogenetic tree based on whole chloroplast genes of 14 mangroves and 57 land species using partition model.
Figure S5: the ML phylogenetic tree based on whole chloroplast genes of 14 mangroves and 57 land species using partition model.
Figure S6: the BI phylogenetic tree based on four conserved genes (ndhF, matK, rbcL, and atpB) of 14 mangroves and 57 land species.
Figure S7: the ML phylogenetic tree based on four conserved genes (ndhF, matK, rbcL, and atpB) of 14 mangroves and 57 land species.
Figure S8: the genomic comparison and similarities of whole chloroplast sequences among mangroves and their related species within orders Lamiales, Fabales, Malpighiales, Malvales, Myrtales, and Sapindales. CNS: conserved noncoding sequence.
Figure S9: the Ka/Ks values of chloroplast protein-coding genes in species from Oxalidales, Celastrales, Fagales, Cucurbitales, and Rosales (Rosid I clades).
Figure S10: the Ka/Ks values of chloroplast protein-coding genes in species from Brassicales (Rosid II), Huerteales (Rosid II), Geraniales, and Saxifragales.
Table S1: GC content in the 14 mangrove chloroplast genomes.
Table S2: the 57 public terrestrial species used for genomic comparison analyses (SSR, phylogeny, and evolution).