High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

1 Institute of Fisheries Science, College of Life Science, National Taiwan University, No. 1, Section 4, Roosevelt road, Taipei 10617, Taiwan 2 Biologie du comportement-Ethologie et Psychologie Animale, Départment de Biologie, Ecologie et Evolution, Institute de Zoologie, Université de Liège, 22 quai Van Beneden , 4020 Liège, Belgium 3 Institut de Recherche pour le Développement, Laboratoire Ecologie des Systèmes Marins Côtiers, UMR 5119, Université Montpellier II, 34095 Montpellier, France 4Department of Life Science, College of Life Science, National Taiwan University, No. 1, Section 4, Roosevelt road, Taipei 10617, Taiwan


Introduction
The advent of next generation sequencing started an era of discovery for a wide array of genetic markers for population genetics, phylogeny, and so forth.However, we have only scratched the surface on the analyses of these new markers and most researchers still resort to using the mitochondrial DNA (mtDNA) for population and biodiversity studies.Furthermore, the nonrecombining nature due to maternal inheritance and fast evolution rate compared to the nuclear genome [1] makes mtDNA a popular marker for studying vertebrate population genetics [2][3][4].Within the mitochondrial genome, evolutionary rates are highly variable among genes.For example, in salamanders, nucleotide substitution rates may range from 0.1 to 1 substitution/site/million years (MY) [5].Among the coding genes, the cytochrome b (cyto b) and cytochrome oxidase I (COI) present the highest evolutionary rates; however, the noncoding region of the mitochondrial genome (i.e., the control region (CR)) evolves 2-5 times faster [6].Because of this exceptionally high mutation rate characteristic, the CR was quickly considered of great utility for addressing intraspecific evolutionary questions [7,8].
In Teleostei, the CR is located between tRNA Pro and tRNA Phe .Structurally, the CR is composed of a central conserved domain flanked by two highly variable (left and right) domains (Figure 1).In these large domains, several short (approximately <30 bp) but conserved sequenced blocks (CSBs) have been described and varied in occurrence between taxa [2].In the central conserved domain, the CSB-D is involved in the heavy strand replication, including initiation by a 3-strand displacement loop (D-loop) [9].The CSB-I, CSB-II, and CSB-III are located in the right domain.The functions of these CSBs are not well understood but in mammals the transition to DNA synthesis occurs in the region surrounding CSB-II, which is also evident in flounder fishes [2].Within the regions of the CR, the tRNA Pro end (extreme 5  end) is a popular chosen marker for evolutionary and population genetics studies because of its rapid evolutionary rate compared to other mitochondrial genes and nuclear coding regions [2,10].In fishes, this region is also widely used for inter-and intraspecies investigations, but its hypervariability cannot be generalized to all taxa [10].For example, in some salmonids (brown trout) the tRNA Pro end of the mtDNA CR seems to evolve at a lower rate [11,12].Indeed, differences in sequence patterns and evolutionary rate of the CR are evident but poorly understood [2,10].Intraspecific diversity recovered from the CR is quite variable between fish taxa and among them, the cosmopolitan flathead mullet, Mugil cephalus, which occurs between 42 ∘ N-42 ∘ S latitudes mostly in the tropical and subtropical areas of the world [13], shows exceptionally high CR diversity [14][15][16].Between remote populations from Atlantic and Pacific oceans, Rocha-Olivares et al. [15] showed a level of divergence ranging from 38 to 75%.More recently, Jamandre et al. [16] observed CR haplotypes with sequence divergence of 48% between sympatric populations of M. cephalus in continuous sampling locations along the northwest Pacific.Such exceptional level of sequence divergence brought Rocha-Olivares et al. [15] to question the origin of the CR amplified in their study, as nuclear pseudogenes stemming from the duplication of mtDNA regions can greatly diverge from their mitochondrial copies.However, by sequencing M. cephalus CR they successfully demonstrated the mitochondrial origin of all haplotypes observed in Pacific and Atlantic populations of this species.The levels of divergence observed might be due to an accelerated evolutionary rate in the Mugil lineage as suggested earlier by Caldara et al. [17].In this context, CR would be an interesting marker to investigate the worldwide phylogeographic structure of M. cephalus because all phylogeographic investigations to date, using different markers (mtDNA RFLP, allozymes), failed to highlight any significant genetic structure consistent with the geography [18,19].However, before utilizing it in such way, determining the basic properties of this marker for phylogenetic usefulness is warranted, especially by checking the level of homoplasy that may blur phylogenetic signals.
In this study, we describe a remarkably high evolutionary rate of mtDNA CR in M. cephalus by analyzing its sequence structure and mutations from previously reported and new worldwide samples.Since Mugil liza from South America belongs to the M. cephalus clade as demonstrated by Fraga et al. [20], Heras et al. [21], and Durand et al. [22], we also included samples of this species in this study.

Data Gathering, Sample Collection, and DNA Extraction.
To provide a general and worldwide view of sequence variations and genetic diversity in M. cephalus, we gathered all available M. cephalus CR sequences in GenBank (http://www.ncbi.nlm.nih.gov/)(Table 1).These sequences or haplotypes belong to the different M. cephalus lineage previously described by Rocha-Olivares et al. [14,15], Miya et al. [4], and Jamandre et al. [16].New M. cephalus samples were also added and analyzed from Senegal ( = 1) and Togo ( = 2) in West Africa and South Africa ( = 1).It was stressed by Fraga et al. [20], Heras et al. [21], and Durand et al. [22] that Mugil liza belongs to the M. cephalus clade.Thus, a sample of M. liza from French Guyana (and also to represent the southwest Atlantic) was added in this study to further investigate M. cephalus and M. liza relationships.
For each specimen collected, muscle tissues and fin clips were taken and preserved in the 95% ethanol-water solution and brought back to the laboratory.Genomic DNA was then extracted from a small piece of either muscle tissue or fin clips, using a commercially available total genomic DNA extraction kit (Bioman Scientific Co. Ltd., Taiwan).
PCR amplification was carried out in a 25 L total volume containing 0.5 L DNA template, 0.25 L taq polymerase, 1 L of each primer, 2.5 L PCR buffer, 0.5 L 10x DNTP, and 19.25 L ddH 2 O. Thermal cycling was performed as follows:

Sequence
Analyses.CR sequences were aligned using the software MAFFT version 6 [23] with default parameters and manual improvement.Multiple sequence alignment is a crucial step in the analysis of any genomic data especially for sequences with many overlapping and nonoverlapping gaps.As such, the software MAFFT was used to align the M. cephalus CR data set, since it uses both iterative and progressive methods of alignment, which has been proven to be more accurate and consistent in dealing with gaps, compared to other popular alignment software [24].The mtDNA CR domains (5  and 3  hypervariable domain and central conserved region), termination associated sequences (TAS), and conserved sequence blocks (CSBs) were identified by alignment with reference to the previously published M. cephalus control region sequences [14,15] and other fish: Crossostoma lacustre [25], Chanos chanos [26], Dascyllus trimaculatus [27], Notopterus notopterus [28], Perca fluviatilis [3], Padogobius nigricans [29], and Sinipercine fishes [30].
Intraspecific nucleotide diversity and variability were assessed from all sequence data available.Sequence divergence was estimated accounting for different rates of transition and transversion, unequal base frequencies, and among site variation rates because evolutionary relationships might be blurred by homoplasic mutations (i.e., multiple substitutions) at a specific site [31].To assess the level of saturation of CR, the number of nucleotide substitutions per site for transitions and transversions was plotted against genetic divergent distances.In this case the Tamura and Nei's (TrN-G) model was used.Genetic divergences between groups were calculated using the MEGA 4.1 software [32,33].
The appropriate DNA substitution model was obtained via testing alternative models of evolution using the Modeltest software version 3.7 [34].Bayesian inference phylogenetic tree was constructed with MrBayes v3.1.2[33] using the model of evolution indicated by Modeltest.Four Metropoliscoupled Markov Chain Monte Carlo chains with 2 × 10 6 generations were sampled every 1,000th step.500 trees were discarded as burn-in and a consensus phylogram of the 1,500 recorded trees and posterior probability were calculated.By comparing M. cephalus CR sequences to CR of other fishes species (see Table 2), three domains were highlighted, namely, the left and right hypervariable domain and the central conserved domain (Figure 1).Several conserved sequence blocks (CSBs) were located at the central conserved region and at the right hypervariable domain (Figure 2).CSB-I, -II, and -III are commonly found in vertebrate CR.These were also present in all sequences of M. cephalus.Putative CSB-D was also found in the sequences (Figure 2).Pyrimidine tract was also identified at the central conserved domain.

Control Region Sequence
Insertions and deletions (indels) were found in both the left and right hypervariable domain and generally associated with repetitive sequences (Figure 2).The percent composition of nucleotide varied from 35.96 to 37.50% (A), 29.80 to 34.57% (T), 16.88 to 20.64% (C), and 11.29 to 13.27% (G), which infers that M. cephalus CR is AT rich and poor in G.
The transition/transversion (Ts/Tv) rate ratios are  1 = 20.443(purines) and  2 = 14.496 (pyrimidines).An overall Ts/Tv ratio of 5.131 was estimated from the dataset (Figure 3).A linear relationship between the number of nucleotide substitutions per site and the sequence divergence was observed for both, transversion and transitions mutations (Figure 3).Slope of the transitions /divergence regression indicates a higher transition rate than transversion rate.from the whole mtDNA genome (a sample from Japan) was incorporated and assigned to the lineage 2 of NW Pacific M. cephalus.Hawaii was assigned as the central Pacific population.NW Atlantic and Gulf of Mexico M. cephalus were grouped together as no genetic divergence was found among these locations as described in Rocha-Olivares et al. [14,15].Togo and Senegal individuals were grouped as West Africa and South Africa has its own lineage.M. liza was assigned as a SW Atlantic M. cephalus group.Consensus sequence was obtained at each group, and these were eventually used for further phylogenetic and sequence divergence analyses.The most appropriate model for the CR data set was the GTR + G model with a gamma distribution shape parameter  = 0.7997.Phylogenetic relationships of M. cephalus revealed no geographic structuring.Unrooted Bayesian tree showed short branch lengths stemming from the ancestral nodes (Table 2, Figure 4).The star-like tree had nodes with moderate (71%) to highly significant (100%) posterior probability values.Gulf of Mexico and Atlantic samples gained the longest branch lengths implying high divergence from all of the other populations.M. liza individuals from French Guyana did not present a divergence level higher than the divergence observed within M. cephalus sensu stricto.

Discussion
We compiled and analyzed new sequences and extracted from GenBank database of NCBI all available complete CR sequences of M. cephalus from different populations around the world.The total length of the CR varies greatly among population/lineage.The longest CR sequences were found in individuals belonging to the NW Pacific (909-1015 bp), whereas the shortest were observed in individual samples in the Atlantic and the Gulf of Mexico (884-894 bp).Such size variation is consistent with observation from other teleost fish (e.g., Cichlid-888 bp; Atlantic cod-996 bp; sturgeon-976; salmon-1071; yellowtail ∼ 1600; from (2)).However, to date, there are no other reports of fish species with such highly intraspecific sequence length difference.The observed CR length variations in M. cephalus are either due to the presence of variable number of tandem repeats (VNTR) at the right hypervariable domain (Figure 2) and indels.Indels in M. cephalus were already described by Rocha-Olivares et al. [14]; however they only gave general descriptions on indels patterns and phylogenetic significance.In the sequence alignment presented here, there are two sets of permanent deletions unique to lineage 1 of northwest Pacific samples: 22 bp deletion (starting at position 857 bp in the alignment; Figure 2) and 7 bp deletion 45 bp following the former.However, the lineage 1 of northwest Pacific has tandem repeats (ATn) located at 30 bp after CSB-I, which cannot be found in the other populations.Second, West African samples were found to have 10 to 14 bp deletions at the central conserved domain/region.Lastly, both the left and right hypervariable domain contain considerable amount of indels that greatly affected the sequence length of CR among populations of M. cephalus.An 88 bp indel due to repeated sequence (TATAATATn) was found again in the 3  or left hypervariable domain in lineage 1 of NW Pacific when the consensus sequences among populations were aligned (Figure 2).These indels seem to be unique to lineage 1 of NW Pacific individuals.Indels in CR are common in teleost fish [35].However large deletions are uncommon and only reported in some fish: an 86 bp indel in the 3  region of Oncorhynchus mykiss [36], 40 bp indel in central conserved region of the rockfish Sebastes capensis [37], and 48 bp indel in milkfish Chanos chanos [26].These studies focused more on the interspecies sequence variation with the exception of the milkfish.The large deletions found in this study are intraspecific comparison and seem to be very rare in fish.This is the first study to show large deletions in CR within the M. cephalus species and the family Mugilidae.These indels were hypothesized not to affect the organism due to the noncoding nature of CR [1]; however these indels might be of importance for population identification.The CSBs observed in M. cephalus are commonly found in other teleosts with slight length variations.Comparing the CSB-I to other fish (Table 2), it is obvious that the CSB-I in M. cephalus is shorter than the other fishes.CSB-I appears in M. cephalus to be more conserved than CSB-II and CSB-III.The two latter CSBs show a few nucleotide indels in the alignment.
The phylogenetic tree inferred from CR shows that M. cephalus may have experienced quick radiation due to short branch lengths stemming from the ancestral node.Such branching pattern can indicate either homoplasic mutations that can limit resolution of ancient nodes or a quick ancestral radiation.Rapid evolutionary rate as reported in CR of Chaetodontidae [10] may result in an excess of homoplasic mutations ∼ mutations that reached a "ceiling." Because no transition or transversion saturation was detected in M. cephalus CR sequences, this assumption was not considered.However, comparing between samples from different locations, the long branch lengths can be attributed to deep divergences between these samples.Livi et al. [38] described that these divergences observed among worldwide samples of M. cephalus might have been due to both demographic (bottleneck events, etc.) and geographic and temporal isolations.
The star-like shaped phylogenetic tree was already observed in cichlids of the African lakes, which was interpreted as a rapid radiation [39][40][41].Such an evolutionary event can be attributed to both allopatric and sympatric speciation, which is due to geological events, reproductive isolation, and other selections caused by intrinsic and extrinsic forces [41].In the case of a South American silverside Odontesthes perugiae species complex (a strictly freshwater species), rapid evolution was assumed to be due to its marineestuarine origin and geological events [38,42].
Lastly, the star shaped pattern of the phylogenetic tree inferred no geographic patterning according to the sequence analysis.Previous phylogenetic studies using other regions of mtDNA have also shown no geographic pattern among worldwide population of M. cephalus genetic structure [18,19,21].In conformity with other studies and the comprehensive Mugilidae molecular phylogeny by Durand et al. [22], we clearly confirmed the presence of M. liza in the M. cephalus clade because there are no other or any long branches present to differentiate this species from other M. cephalus haplotypes.The high CR sequence divergence leads Rocha-Olivares et al. [14,15] to reconsider the real taxonomical status of M. cephalus which was also questioned by Heras et al. [21].This indicates the presence of M. liza in the M. cephalus clade.

Conclusion
This study presents a very high sequence and genetic divergence within groups of the same species.Applying the definition of the biological species concept is hard for globally distributed species, mainly due to difficulties in the determination of interbreeding levels between populations or species.In congruence to the growing evidence and as indicated in this study, on a worldwide scale, M. cephalus should be regarded as a complex species.

2 InternationalFigure 1 :
Figure 1: The mtDNA CR structure of Mugil cephalus.Arrows indicate locations of primer binding sites.Broken lines indicate location of variable number of tandem repeats (CSB-conserved sequence block).
Structure and Variability.The observed lengths of M. cephalus control region varied among sampling locations.The complete CR sequences of M. cephalus from northwest Pacific ranged from 909 to 1015 bp, whereas in central Pacific (Hawaii), it ranged from 919 to 930 bp.The Atlantic and Gulf of Mexico M. cephalus CR sequences ranged from 884 to 894 bp while the West and South African individuals ranged from 899 to 912 bp and 911 bp, respectively.The CR length of the Mugil liza individual is 906 bp.
Divergence.NW Pacific M. cephalus were subdivided into lineages 1 and 2 as described in Jamandre et al.[16].The CR sequence International Journal of Zoology Table 2: Comparison of conserved sequence blocks (CSBs) identified in the control region of Mugil cephalus to other teleost fishes.

Figure 2 :
Figure 2: Multiple alignment of consensus sequences of Mugil cephalus from L1 (Lineage 1) and L2 (Lineage 2) of the NW (northwest) Pacific, Hawaii/C (central) Pacific, Gulf of Mexico/ Atlantic, Togo and Senegal/W (west) Africa, and South Africa.Structural features (CSBconserved sequence block and Pyrimidine-tract) are also indicated.

Figure 3 :
Figure 3: Bayesian tree inferred from complete CR sequences.Posterior probability values (%) are indicated close to the nodes of the branches.

Table 1 :
Location, number of samples (), and the GenBank (http://www.ncbi.nlm.nih.gov/)accession numbers of the mtDNA control region sequences of M. cephalus used in this study.C, followed by 35 cycles consisting of 30 sec at 94 ∘ C denaturation, 45 sec at 58 ∘ C annealing, and 45 sec at 70 ∘ C extension, and a final elongation of 10 min at 70 ∘ C. PCR products were visualized on a 1% agarose gel stained with ethidium bromide.Afterwards, the products were sequenced with forward and reverse primers using dye-terminator cycle sequencing on an ABI 373A sequencer.