Phylogenetic Information Content of Copepoda Ribosomal DNA Repeat Units: ITS1 and ITS2 Impact

The utility of various regions of the ribosomal repeat unit for phylogenetic analysis was examined in 16 species representing four families, nine genera, and two orders of the subclass Copepoda (Crustacea). Fragments approximately 2000 bp in length containing the ribosomal DNA (rDNA) 18S and 28S gene fragments, the 5.8S gene, and the internal transcribed spacer regions I and II (ITS1 and ITS2) were amplified and analyzed. The DAMBE (Data Analysis in Molecular Biology and Evolution) software was used to analyze the saturation of nucleotide substitutions; this test revealed the suitability of both the 28S gene fragment and the ITS1/ITS2 rDNA regions for the reconstruction of phylogenetic trees. Distance (minimum evolution) and probabilistic (maximum likelihood, Bayesian) analyses of the data revealed that the 28S rDNA and the ITS1 and ITS2 regions are informative markers for inferring phylogenetic relationships among families of copepods and within the Cyclopidae family and associated genera. Split-graph analysis of concatenated ITS1/ITS2 rDNA regions of cyclopoid copepods suggested that the Mesocyclops, Thermocyclops, and Macrocyclops genera share complex evolutionary relationships. This study revealed that the ITS1 and ITS2 regions potentially represent different phylogenetic signals.


Introduction
Copepods are important components of zooplankton and the food chain in marine and freshwater ecosystems. The subclass Copepoda is believed to contain approximately 13,000 morphospecies; however, the actual number of species in this subclass might be much greater [1]. The majority of the freshwater copepod species belong to the order Cyclopoida, which includes the free-living species (approximately 800) in the family Cyclopidae. The other two free-living families (Oithonidae and Cyclopinidae) contain mainly marine species except for a few species in Oithonidae [2].
Molecular markers such as genomic DNA fragments are used for phylogenetic analyses to elucidate the evolutionary history of living organisms, and the region of genomic DNA analyzed is critical. Mitochondrial DNA fragments (genes encoding the cytochrome c oxidase subunit I (COI), 16S rRNA, and cytochrome b) [9-21, 39, 40, 49-52] and/or nuclear rDNA regions have been used for the phylogenetic analysis of cyclopoid copepods [22-37, 41, 53-58]. Mitochondrial DNA fragments might be less useful for the analysis of copepod phylogeny compared to the phylogeny of other taxa; furthermore, amplification of COI is difficult in some copepods [59][60][61]. However, these DNA fragments might be informative for analyses of population differentiation or cryptic speciation [9, 11-15, 38-40, 50, 52]. Therefore, comparison of nuclear rDNA regions might be informative for the phylogenetic analysis of copepods. In most eukaryotes, rRNA genes are located in a multigene family of genomic clusters of repeated sequences. Within these clusters, the 18S, 5.8S, and 28S rRNA genes are separated by internal transcribed spacers (ITS1 and ITS2) and an intergenic spacer [62] (Figure 1). Ribosomal DNA (rDNA) is a reliable and informative phylogenetic marker [63] that contains sequences with different rates of evolutionary variability. In most eukaryotes, the most evolutionarily conserved genes are the rRNA genes; comparison of their sequences allows estimation of the evolutionary distances at intergeneric and higher taxonomic levels. Comparison of the more evolutionarily variable spacer sequences enables the study of phylogenetic relationships at the species and population levels [63][64][65]. Therefore, comparison of different regions of rDNA enables the phylogenetic analysis of organisms over extended evolutionary distances.
Notably, analysis of the evolutionary history of living organisms based on only one molecular marker can uncover bifurcating phylogenetic trees, revealing branched evolution. However, it recently becomes evident that evolution is not always tree-like. Comparisons of gene trees based on different genetic loci often reveal conflicting tree topologies. These discrepancies are not always due to the problems with the sampling and the gene tree reconstruction methods. Reticulation events such as horizontal gene transfer (HGT) and hybridization may be responsible for contradictions in lineages. During an HGT event, a DNA segment is transferred from one organism to another which is not its offspring, whereas hybridization describes the origin of a new species through an interspecies mating. Both processes yield genomes that are mixtures of DNA regions derived from different species. Consequently, evolutionary relationships between species whose past includes reticulation can often be better represented by using phylogenetic networks rather than trees [65][66][67].
In this study, we analyzed the phylogenetic relationships within a small group of cyclopoid copepods representing several genera of freshwater (Cyclops, Thermocyclops, Diacyclops, Megacyclops, Macrocyclops, and Mesocyclops) and marine (Oithona and Paracyclopina) organisms. Specific freshwater species were selected for analysis because these species are important for the maintenance of food chains in Russian freshwater ecosystems. The aim of this study was to analyze the sequence characteristics of the rDNA 28S gene, ITS1, and ITS2 regions as phylogenetic markers for the selected group of organisms. Individuals of each species were collected for further analysis at the specified locations over a 0.5-to 1-hour period.

Materials and Methods
No specific permission was required to collect samples at these locations. None of the studied species is endangered or protected.

DNA Extraction, PCR Amplification, and Sequencing.
The genomic DNA was isolated from 10-20 individuals of each collected species using the DNeasy Blood & Tissue kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions and was frozen at −20 ∘ C. The rDNA region (approximately 2000 bp) was amplified from the genomic DNA by polymerase chain reaction (PCR) using the universal eukaryotic rDNA primers DAMS18 and DAMS28 [95][96][97] (Figure 1). The amplified rDNA regions contained the ITS1 (261-388 bp) and ITS2 (188-262 bp) regions, the 5.8S gene (157 bp), and approximately 200 and 1000 bp of the 18S and 28s genes, respectively. The amplification was performed in 50 L reactions using a PCR Master Mix (2X) (Fermentas, Vilnius, Lithuania) according to the manufacturer's instructions; the reactions were performed in a Primus 25 advanced Thermocycler (PEQLAB, Erlangen, Germany) using previously published rDNA-specific parameters [98]. The PCR products were resolved on 1.0% agarose gels, and DNA was extracted from the observed unique bands using the QIAquick Gel Extraction Kit (QIAGEN, Hilden, Germany). The extracted products were cloned into the pGEM-T Easy vector (Promega, USA), and the resulting plasmids were used to transform Escherichia coli JM109 competent cells (Promega, USA) according to the manufacturer's instructions. For each species, the amplified product and five clones were sequenced. Automated sequences were generated on an ABI PRISM 310 Genetic Analyzer according to Sanger et al. [99] with a BigDye Termination kit (Applied Biosystems, USA). The sequences generated in this study were deposited in GenBank under the accession numbers KF153689-KF153701.

Phylogenetic Analyses.
The rDNA sequences were aligned using ClustalW 2.1 [100,101] with some manual adjustments. The boundaries of the ITS1 and ITS2 regions and the 28S gene were identified by comparing the primerdelimited sequences against sequences in the GenBank database using BLAST analysis. The boundaries of the conserved sequences were considered to represent the 5.8S, 18S, and 28S gene flanking regions if they were 100% similar to the boundaries of rDNA sequences in the GenBank database. The initial sequence alignment flanked by DAMS18/DAMS28 primers was divided into ITS1 and ITS2 alignments. The rDNA genetic distances were estimated using the MEGA V5.2 software [102]. DAMBE (Data Analysis in Molecular Biology and Evolution) software was used to analyze substitution saturation [103][104][105]. This method computes the entropy-based index of substitution saturation and its critical value. If the index of substitution saturation (Iss) approaches 1 or if the Iss is not smaller than the critical Iss value (Iss.c), then sequences are considered to contain substantial saturation. As is known, the substitution saturation decreases phylogenetic information contained in sequences and has plagued the phylogenetic analysis involving deep branches. In the extreme case when sequences have experienced full substitution saturation, the similarity between the sequences will depend entirely on the similarity in nucleotide frequencies, which often does not reflect phylogenetic relationships [106].
The rDNA-based phylogenetic trees were estimated using probabilistic (maximum likelihood (ML), Bayesian) and distance (minimum evolution (ME)) methods [107][108][109][110]. ML and ME analyses of ITS1, ITS2, and 28S data were performed using the program MEGA V5.2. Branch support was assessed using the bootstrap method [111] (1,000 replicates) with the close-neighbor-interchange (CNI) algorithm at a search level of 1 for ME analysis and heuristic search for ML analysis. The Bayesian information criterion (BIC), as implemented in MEGA V5.2, was used to identify the bestfit model of sequence evolution for the trees estimated using ML. The evolutionary history was inferred using the ML method based on the general time reversible with the gamma distribution shape parameter (GTR+G) model for 28S and the Hasegawa-Kishino-Yano with gamma distribution shape parameter (HKY+G) model [112] for the ITS1, ITS2, and concatenated ITS1/ITS2 alignments. In addition to these methods, ITS1 and ITS2 alignments were constructed using the MAFFT version 7 (http://mafft.cbrc.jp/alignment/server/) [113] and Gblocks version 0.91b (http://www.phylogeny.fr/ version2 cgi/one task.cgi?task type=gblocks) software programs [114][115][116] to eliminate poorly aligned and highly divergent regions. Default parameters were used for both of these methods. The Tamura 3-parameter model (T92) [117] and HKY with evolutionary invariable (HKY+I) for Gblockstreated MAFFT ITS1 and ITS2 data, respectively, were used to infer evolutionary history inference using the ML method.
The Bayesian analysis was performed using MrBayes version 3.1.2 software [118,119]. Two replicate analyses of 1 million generations each were performed for each dataset, with sampling every 10 generations. The hierarchical likelihood ratio test (hLRT) implemented in MrModeltest version 2.3 software [120] was used to identify the model of best fit (Hasegawa-Kishino-Yano with invariant sites and gamma distribution shape parameter (HKY+I+G) [112] for ITS1 and the HKY+G model for ITS2). Trees from the first 53,000 and 118,000 generations were discarded as burn-in for ITS1 and ITS2, respectively. The Bayesian tree was estimated from the majority-rule consensus of the post-burn-in trees.
A reticulogram [121] was constructed using the T-REX version 4.01a software [122] with the distance matrix computed using the Kimura 2-parameters model (ignoring missing bases); the weighted least-squares method was used for tree reconstruction [123], and addition of reticulation branches stopped when = 1 branches were added.
Network reconstruction was performed using Splits Tree 4 version 4.11.3 software [65]. The neighbor-net network method and uncorrected p-distances were used to analyze and visualize reticulate relationships. All gaps were excluded for analysis. Network robustness was tested using 1,000 bootstrap replicates.

Characteristics and Analysis of an rDNA Sequence Dataset.
In each species, the nucleotide sequences of the amplified rDNA region and five clones were not significantly different from each other. The frequency of variable nucleotides did not exceed the average rate of nucleotide substitutions caused by DNA polymerase errors, which is approximately one substitution per 1,000 nucleotides. The compared sequences contained both relatively evolutionarily conserved (fragments of 18S and 28S rDNA and the complete 5.8S rDNA) and evolutionary variable genomic regions (ITS1 and ITS2). For different taxa, the ITS1 and ITS2 sequences vary significantly among individuals at the inter-and intrapopulation levels; furthermore, these sequences can exhibit intragenomic variability [25,41,53,54]. Recently, a high level of intrapopulation polymorphism of the 28S rDNA sequences was observed within Oithona spp. [22]. However, there are instances of strong evolutionary conservation of the 28S and ITS sequences [15,23,30,34]. Notably, the M. leuckarti ITS2 sequence obtained in this study did not exhibit any nucleotide substitutions compared to the M. leuckarti ITS2 sequence described previously (GenBank accession number GQ848499) [42]. Therefore, the strong evolutionary conservation of ITS1 and ITS2 sequences is a characteristic feature of the copepod species analyzed in this study.
In this study, the applicability of different segments of rDNA containing the ITS1 and ITS2 regions, the 5.8S RNA gene, and fragments of the 18S and 28S rRNA genes was examined for reconstruction of the phylogenetic relationships among freshwater cyclopoid copepods. The 5.8S gene and the analyzed fragment of the 18S gene were not considered for phylogenetic reconstruction due to their short length and strong evolutionary conservation: only a few nucleotide substitutions were detected by comparing these sequences with evolutionary distant species (data not shown).
For 15 specimens of Cyclopoida species (including the two marine species), the average length of the 28S gene fragment sequenced was 1051 bp. We trimmed these sequences to 703 bp and compared them with the 28S gene sequences of marine species available in GenBank. These 703 bp of 28S rDNA sequences were aligned, and 342 variable sites were observed. Oncaea sp. (Oncaeidae family) was used as the out group.
The ITS1 sequence lengths varied from 267 to 388 bp among the 13 Cyclopidae specimens. The ITS1 sequence alignments possessed 442 characters, and among them, 283 were variable. The ITS2 sequence lengths varied from 188 to 262 bp among the 13 Cyclopidae specimens. ITS2 sequence alignment possessed 302 characters, and 190 were variable.
All alignment sets were examined for homogeneity of base frequencies and substitution saturation. The average base frequencies of the 28S gene fragment ( = 20.52,  To analyze whether the divergence of 28S, ITS1, and ITS2 rDNA fragments among species was saturated, we performed a substitution saturation test and generated saturation plots. Using DAMBE, the substitution saturation test revealed an Iss value that was significantly ( < 0.001) lower than the Iss.c in all cases (Table 1). This result indicated the suitability of the data for phylogenetic analysis. The total numbers of transition and transversion substitutions were plotted individually against model-corrected maximum-likelihood pairwise distances for the 28S, ITS1, and ITS2 sequences (see Supplementary Figure 1 in Supplementary Material available online at http://dx.doi.org/10.1155/2014/926342). Using linear regression analysis on the 28S, ITS1, and ITS2 saturation graphs, the coefficients of determination ( 2 ) were calculated for both classes of substitutions: for transitions, the 2 values were 0.79, 0.68, and 0.74 for the 28S, ITS1, and ITS2 sequences, respectively; for transversions, the 2 values were 0.95, 0.93, and 0.91 for the 28S, ITS1, and ITS2 sequences, respectively. The 2 values indicated that no less than 70% of the total variation in pairwise transitions and transversions could be explained by the linear relationship between pairwise distances and the total number of transitions and transversions. All saturation plots showed significant linear correlations (Supplementary Figure 1). Therefore, both transitions and transversions steadily accumulated as the corrected pairwise divergence increased, indicating that saturation was not reached.

Distance Analyses and Phylogenetic Tree Reconstruction.
Phylogenetic analysis of the cyclopoid copepods species based on rDNA showed that the 28S rDNA sequences are informative for the phylogeny of both higher-level and closely related Copepoda species, whereas the ITS1 and ITS2 sequences are highly informative for reconstruction of the evolutionary history of closely related species. The ITS1 and ITS2 sequences are known to evolve more rapidly than the ribosomal RNA genes. Consistent with this observation, in  this study, the pairwise ITS1/ITS2 p-distances were significantly higher thanthe 28S p-distances (compare Tables 2  and 3). These data are consistent with other studies showing considerable variation in ITS1 and ITS2 divergence levels among different groups of copepods [40,42,52,124,125]. In this study, fragments of 28S rDNA sequences were used for the analysis of marine and freshwater cyclopoid copepods species, whereas ITS1 and ITS2 sequences were used exclusively for the analysis of freshwater cyclopoid copepods species.
The cladogram based on comparison of the 28S rDNA sequences reflected the evolutionary history of the analyzed species (Figure 2). Oncaea sp. was used as the out group. Similar topologies and levels of support at most nodes were obtained for all 28S phylogenetic trees constructed using the ML and ME methods. The specimens belonging to the order Cyclopoida with high bootstrap support (ML/ME 99) formed two major clades on the tree (Figure 2). One clade combined the marine cyclopoid copepods species, whereas the freshwater species specimens formed the second clade. The p-distance between these two clades varied in the range of 0.171-0.245 (  nana [22]. The cladogram based on the comparison of the concatenated ITS1/ITS2 sequences is shown in Figure 3. Notably, the 28S and ITS1/ITS2 cladograms had several common features, reflecting the evolutionary history of the analyzed freshwater cyclopoid copepods species. Both cladograms revealed that D. bicuspidatus and specimens of the Cyclops genus with high bootstrap values (>80) are separated from other studied freshwater copepods in a distinct clade. The pdistance between D. bicuspidatus and Cyclops spp. calculated based on ITS1/ITS2 analysis varied in the range of 0.232-0.250, whereas the p-distance between D. bicuspidatus and Thermocyclops spp. varied in the range of 0.298-0.333, and the p-distance between D. bicuspidatus and other analyzed freshwater species varied in the range of 0.310-0.405. This result is consistent with a previous phylogenetic study based on 18S rDNA sequence analysis [48]. Notably, the systematic position of this species, based solely on the analysis of morphological characteristics, remained unclear. Diacyclops bicuspidatus is considered to be evolutionarily closer to Thermocyclops spp. [8].
Another important conclusion from the analysis of 28S and ITS1/ITS2 cladograms relates to the systematic position of C. strenuus. The Cyclops genera subclade was divided into      the C. kolensis: C. strenuus subsubclade and the C. insignis subsubclade (Figures 2 and 3). The p-distance between C. strenuus and C. kolensis calculated based on ITS1/ITS2 analysis is 0.006, whereas the p-distance between C. strenuus and C. insignis varied in the range of 0.042-0.054. Therefore, C. strenuus is more closely related to C. kolensis than to C. insignis. Notably, the phylogenetic relationships between the studied Cyclops species could not be elucidated solely on the basis of morphological characteristics.
The only difference between the 28S and ITS1/ITS2 cladograms within freshwater copepods was the position of M. leuckarti (Figures 2 and 3). The cladogram based on comparison of the 28S rDNA sequences showed that the M. leuckarti and Thermocyclops cluster together to form a separate subclade (Figure 2). This result is consistent with the previous observation that the Mesocyclops and Thermocyclops genera are phylogenetically closely related, which was confirmed by the similarity of morphological characteristics and using molecular data [42]. ITS1/ITS2 analysis revealed that M. leuckarti is located separately from Thermocyclops and other clades (Figure 3). Using phylogenetic networks, we analyzed whether the M. leuckarti position in the ITS1/ITS2 cladogram was caused by different contributions of ITS1 and ITS2 sequences to the phylogenetic signal.

Phylogenetic Networks.
A reticulogram-based phylogenetic network inference approach was used to verify the reticulate evolution of the studied copepods. Concatenated ITS1/ITS2 sequences of 10 species from the Cyclopidae family were used for reticulogram reconstruction. The reticulogram revealed a network with Mesocyclops and Thermocyclops clustered together and a reticulation (lateral branch) connecting M. leuckarti to the Macrocyclops clade node (Figure 4). Therefore, the reticulogram indicated the reticulation in Mesocyclops evolution.  A split network represents incompatible edges of trees as a band of parallel edges. Parallel edges split a network into two sets of nodes. Split-graph analysis of concatenated ITS1/ITS2 sequences of 10 species from the Cyclopidae family revealed a reticulate relationship between Mesocyclops, Thermocyclops, and Macrocyclops with high reliability ( Figure 5). All principal splits were well supported. Two splits were observed in the ITS1/ITS2 split network. The first split (parallel edges highlighted with bold red) separated M. leuckarti, T. oithonoides, and T. crassus with 80.2% bootstrap support. The second split (parallel edges highlighted with bold blue) separated M. leuckarti, M. distinctus, and M. albidus with 65.0% bootstrap support.
In addition to the network data, we performed phylogenetic reconstruction based on independent ITS1 and ITS2 analyses using probabilistic and distance methods. Irrespective of the method used, the main difference between the topologies of the ITS1 and ITS2 phylogenetic trees was as follows: based on the ITS1 analysis, M. leuckarti is clustered with Thermocyclops, whereas the ITS2 analysis revealed that M. leuckarti clustered with Macrocyclops (Figures 6(a)-6(d)).
The impact of the chosen DNA sequence on the clustering of M. leuckarti might reflect the different evolutionary histories of ITS1 and ITS2, which indicates the potential hybrid origin of M. leuckarti. However, the values of bootstrap support for the clustering of Mesocyclops and Thermocyclops and of Mesocyclops and Macrocyclops depended on the method used for phylogenetic tree reconstruction and varied over a wide range (Figures 6(a)-6(d)).
Phylogenetic trees can be inconsistent due to the socalled long-branch attraction (LBA) phenomenon, which occurs when two nonadjacent taxa share many homoplastic character states along long branches and/or from uncorrected sequence alignments. Interpretation of the observed similarity depends on the method used for phylogenetic analysis, and this similarity can often be interpreted as homology. Model-based methods are most resistant to LBA, but these methods can exhibit LBA if their assumptions are seriously violated or if there are insufficient taxa in the analysis to accurately estimate the parameters of the evolutionary model [126]. Taxon sampling is a crucial factor for avoiding LBA in phylogenetic analysis [127]. The inclusion of additional taxa in phylogenetic analysis increases the accuracy of the inferred topology by dispersing homoplasty across the tree and reducing the effect of LBA. The LBA effect might also be revealed by exclusion of the long-branched taxon from the analysis [127].
To reduce the possible effects of LBA and correctness of the sequence alignment, we used the following approaches: (1) three taxa the most evolutionarily distant from M. leuckarti (M. distinctus, M. viridis Borok1, and M. viridis Borok2) were removed from the list of species used for ITS1 and ITS2 phylogenetic tree reconstruction (the taxa selection was based on the data presented in Table 3) and (2) ITS1 and ITS2 sequences were aligned using new multiple sequence alignment programs to eliminate poorly aligned and highly divergent regions (see Section 2). The final ITS1 and ITS2 alignments are shown in Supplementary Figures  2(a) and 2(b), and the resulting phylogenetic trees are shown in Figures 6(e) and 6(f). Based on the ITS1 analysis, M. leuckarti clustered with Thermocyclops, whereas the ITS2 analysis revealed that M. leuckarti clustered with Macrocyclops (Figures 6(e) and 6(f)). Notably, the topology of the new phylogenetic trees had high bootstrap support: 84 (ML)/77 (ME) for ITS1 and 77 (ML)/72 (ME) for ITS2 (Figures 6(e) and 6(f)).  Figure 6: Phylogenetic relationships of Cyclopidae based on ITS1 and ITS2 sequences. The ITS1 consensus Clustal tree of ten Cyclopidae species constructed using (a) maximum likelihood and minimum evolution and (b) Bayesian inference. The ITS2 consensus Clustal tree of ten Cyclopidae species constructed using (c) maximum likelihood and minimum evolution and (d) Bayesian inference. The consensus Gblocks-treated MAFFT trees of eight Cyclopidae species constructed using maximum likelihood and minimum evolution: (e) ITS1, (f) ITS2. The numbers above the branches indicate bootstrap percentages and Bayesian posterior probabilities. The values are listed for ML/ME. Missing or weakly supported nodes (<50% or 0.5) are denoted by a "-".
We think that one of the most intriguing explanations for the observed differences in the clustering of M. leuckarti is the interspecific hybridization between extinct taxa (presumably closely related) that were ancestral to both Mesocyclops, Macrocyclops, and Thermocyclops. However, a rigorous proof of this hypothesis requires further analysis of a larger number of species. This will be the subject of our further research.

Conclusion
We evaluated the utility of a ∼2000 bp fragment of rDNA (easily amplified by universal primers) for the phylogenetic reconstruction of the relationships of Copepoda species. Our data showed that the 28S rDNA and the ITS1 and ITS2 regions are highly informative for the phylogeny of both higher-level and closely related Copepoda species. Comparative analysis of the ITS1 and ITS2 nucleotide sequences among closely related Copepoda species revealed an unusual evolutionary history of these spacer sequences; therefore, the ITS1 and ITS2 regions might contain different phylogenetic signals.