Unraveling the Limits of Mitochondrial Control Region to Estimate the Fine Scale Population Genetic Differentiation in Anadromous Fish Tenualosa ilisha

The mitochondrial control region has been the first choice for examining the population structure but hypervariability and homoplasy have reduced its suitability. We analysed eight populations using control region for examining the population structure of Hilsa. Although the control region analysis revealed broad structuring between the Arabian Sea and Bay of Bengal (F ST  0.0441, p < 0.001) it was unable to detect structure among riverine populations. These results suggest that the markers used must be able to distinguish populations and control region has led to an underestimation of genetic differentiation among populations of Hilsa.


Introduction
The Hilsa shad, Tenualosa ilisha, is an anadromous fish with broad distribution ranging from foreshore areas, estuaries, brackish water lakes, and freshwater rivers Indo-West Pacific region from the Persian Gulf, along the coast of Pakistan, India, Bangladesh, and Burma to South Vietnam [1]. It ascends the rivers for breeding during the monsoon season and returns to the sea after completion of spawning. There has been increased exploitation on Hilsa fisheries in east and west coast of India over the years because of the introduction of efficient mechanized crafts such as moshari (mosquito net) seine net, behundi jal (set bag net), and char ghera jal (fencelike net operation around the char) and makeshift gears and fishers are tempted to exploit Hilsa stocks without caring for size and season. Overfishing may reduce population sizes to a level at which inbreeding and loss of genetic diversity occur and may result in extinction of local populations [2]. Proper scientific and judicious management actions by assessing the genetic make-up and variability of fish stock must be taken to ensure sustainability of Hilsa population.
The rate of evolution of mitochondrial DNA is generally higher than that of nuclear genes due to the lack of a known repair mechanism for mutations that arise during replication [3]. The control region (about 1 KB) has role in initiation of replication and transcription and is the marker of choice to identify population connectivity, conservation units, and migration routes. It has been used in many phylogenetic and population genetic studies due to high copy number and high mutation rate, as well as its maternal and haploid mode of inheritance [4]. However high evolutionary rate that has made the control region an attractive marker for biologists may be masking the true relationships between populations due to high haplotype diversity as well as homoplasy. So, an alternative marker with a slower evolutionary rate may be more suitable than the control region to reveal the population structure of Hilsa [5].
The population structuring of Hilsa was investigated by various researchers using morphomeristical, biochemical, and molecular approaches and also they differentiated three stocks of Hilsa belonging to Hooghly, Padma, and Ganga rivers using biometrical parameters. Based on tagging experiment, Pillay et al. [6] concluded that the same Hilsa individuals come up the Hooghly River during subsequent seasons, that is, winter and monsoon. Ghosh et al. [7] differentiated Hilsa into slender and broad morphotypes using morphometric data.

Scientifica
Brahmane et al. [8,9] reported more than one stock of Hilsa from India using RAPD markers and cytochrome b region. So, the baseline information on genetic stocks needs to be authenticated. However several morphometric and molecular studies (RAPD, RFLP) were conducted in Bangladesh and India but no study was done using mitochondrial D-loop to understand population genetic structure and patterns of gene flow of T. ilisha [10].

Fish
Sampling. The present study included 77 specimens of T. ilisha from the geographical distribution range in India, namely, rivers draining in Bay of Bengal (i.e., Ganga, Hooghly, and Godavari) as well as from Diamond Harbour and Paradip Port and rivers draining to Arabian Sea (i.e., Narmada and Tapti). T. ilisha was identified and discriminated from T. toli and Hilsa kelee based on morphometric and meristic data following Talwar and Jhingran [11] and Fisher and Whitehead [12]. The Hilsa body is oblong and compressed with 30-33 spines like scutes on abdomen. Difference between two major Hilsa species, that is, T. ilisha and T. toli, is very minute. In the former, dorsal and ventral profile of the body is equally convex, while in the later abdominal profile is more convex than that of dorsal. Further, about 150 to 200 straight to slightly curved gill rakers are present on the lower part of first arch in T. ilisha, while in T. toli gill rakers are curved and the number of gill rakers is 80 to 90 as discussed by Huda and Haque [13]. The fish specimens were photographed on graph papers and meristic counts of the specimens were compared. Muscle and fin tissues were preserved in 95% v/v ethanol and the vouchers were kept in 10% v/v formaldehyde. Specific and unambiguous code was given to tissue samples and voucher of each fish specimen (Table 1).

DNA Extraction, PCR Amplification, and DNA Sequencing.
Approximately 50 mg of caudal or anal fin or muscle tissue was used for DNA isolation following standard phenol : chloroform : isoamyl alcohol method [14]. Precipitated DNA was resuspended in TE buffer (10 mM Tris-HCl, 0.1 mM EDTA, and pH 8) and concentration was determined using Nanodrop 2000 (Thermo Scientific, USA). The primers TIDF (5 -AACTTCCACCCCTAACTCCC-3 ) and TIDR (5 -GTGCTTGCGGGGCTTG-3 ) were designed using Primer3 [15] and BLASTn [16] software of NCBI, so as to amplify complete control region of mitochondrial DNA (mtDNA). The PCR reaction of 50 L volume contained 1x buffer, 100 M dNTPs, 2 mM MgCl 2 , 10 picomoles of each primer, 3 U Taq DNA polymerase, and 100 ng template DNA. Amplifications were performed in Veriti 96 fast thermal cycler (Applied Biosystems, Inc., USA). The thermal regime for control region consisted of initial denaturation of 3 min at 94 ∘ C, followed by 35 cycles of denaturation at 94 ∘ C for 50 sec, annealing at 47 ∘ C for 30 sec, and extension at 72 ∘ C for 80 sec with final extension of 10 min at 72 ∘ C. PCR products were visualized on 1% agarose gels stained with ethidium bromide and documented using a gel documentation system (UVP, USA). DNA sequencing was performed following the dideoxynucleotide chain termination method [17] using an automated ABI 3730 sequencer (Applied Biosystems, Inc., USA).

Sequence Analysis.
Complete control region sequence was generated from forward and reverse sequence reads using MEGA 5.1 [18]. Ambiguities were referred against the sequencing electropherograms. The consensus sequences were blasted in NCBI for the nearest similar sequence matches using BLASTn and submitted to GenBank (Table 1). DNA sequences were analysed by ClustalW, Arlequin version 3.5 [19], DnaSP version 5.10 [20], and MEGA 5.1 software for nucleotide composition, number of polymorphic sites ( ), haplotype diversity (ℎ), and nucleotide diversity ( ).
The evolutionary history was inferred using the ML method. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analysed. The evolutionary distances were computed using the Kimura 2-parameter method [21] and are in the units of the number of base substitutions per site. The rate variation among sites was modelled with a gamma distribution (shape parameter = 1).

Haplotype Analysis.
Minimum spanning network of haplotypes was prepared by Network 4.6 software [22]. Intrapopulation diversity was analysed by estimating haplotype diversity, which indicates the probability that two randomly chosen haplotypes are different, and nucleotide diversity, which indicates the probability that two randomly chosen homologous nucleotides are different.

Population Genetic Analysis.
The isolation-by-distance effects on population genetic structure were estimated by IBDWS 3.23 and pairwise ST statistics [23] using Arlequin 3.5. The hierarchical nesting of genetic diversity was estimated using the analysis of molecular variance (AMOVA) approach and was calculated using Arlequin 3.5. Significance of pairwise population comparisons was tested by 20,000 permutations. The AMOVA tests were organized in a hierarchical manner, and 1,000 permutation procedures were used to construct null distributions and to test the significance of variance components [24]. To detect population expansion or contraction, Tajima's D and Fu's FS values were estimated based on pairwise differences between sequences. Tajima's D test [25] calculates the distribution of allele frequency of segregating sites, whereas Fu's FS test [26] is based on the distribution of alleles or haplotypes. Table 1 shows the number of samples ( ), number of haplotypes ( Hap), haplotype diversity (Hap ), and nucleotide diversity ( ) for each population. A total of 77 individuals were sequenced for the mtDNA control region (873 bp).

Molecular Characterization and Genetic Diversity.
Among 77 samples, 94 polymorphic sites and 58 haplotypes were detected. These polymorphisms included 64 parsimony informative sites and 32 singleton sites. The nucleotide composition (%) was 31.7 (A), 27.9 (T), 23.7 (C), and 16.7 (G). The haplotype diversity (ℎ) of the analysed populations was rather high with observed values between 1.000 ± 0.0058 in Ganga and 0.756 ± 0.01678 in Tapti population. The nucleotide diversity ( ) within each population was very low, ranging from 0.00935 ± 0.0000012 in Hooghly to 0.01835 ± 0.0000059 in Tapti population (Table 1).

Haplotype Distribution and Phylogenies.
In phylogenetic study ( Figure 1) and minimum spanning network of 58 haplotypes (Figure 2), eight populations of T. ilisha were grouped in four lineages. The haplotypes shared among different populations were 25

Population Genetic Analysis.
There is significant geographical structuring among populations only when all populations were grouped in Bay of Bengal and Arabian Sea. AMOVA revealed 4.42% variation among populations and 95.58% variation within population (Table 3), which was further supported by significant ST value (i.e., 0.0441, < 0.001). Estimates of genetic differentiation between eight populations using statistics are given in Table 4. The populations from Narmada and Tapti showed high level of genetic differentiation from other populations.
Tajima's D was nonsignificantly negative for all except Narmada and Tapti populations. Fu's FS test also showed nonsignificant negative values for all except Diamond Harbour and Tapti populations. Large differences were observed between 0 (population before expansion) and 1 (population after expansion). In addition, the SSD and Hri values were also nonsignificant except for Diamond Harbour and Tapti populations (Table 5). Also, low haplotype diversity and moderate to high nucleotide diversity was observed in Diamond Harbour and Tapti populations. Mismatch distribution curves were constructed to study the genetic bottleneck and all histograms presented multimodal curves characteristic of populations with constant size over time. The plot between genetic distance and geographic distances showed a highly significant positive correlation indicating that geographic distance corroborates variation in genetic distance between Hilsa populations ( 2 = 0.230, ≤ 0.01) (Figure 3).

Discussion
The significant positive correlation between geographical and genetic distances usually referred to as "isolation by distance" is typically suggestive of migration-drift equilibrium. Our result showed high haplotype diversity (high genetic variation) and low nucleotide diversity and significant positive Fu's FS and Tajima's D test except for Tapti and Diamond Harbour indicative of recent population expansion after a genetic bottleneck or founder events [27,28].  In this study, statistics ( ST ), AMOVA, haplotype network, and phylogenetic analysis revealed genetic differentiation among the T. ilisha populations, suggesting that T. ilisha does not have a single panmictic population. Brahmane et al. [8] separated Ganga/Yamuna rivers stock from Hooghly and Narmada using RAPD markers while Brahmane et al. [9] reported low genetic diversity and absence of population differentiation of Hilsa by using cytochrome b region in Ganga    and Hooghly rivers and also felt the need of advanced genetic markers like ATPase 8/6, nicotinamide dehydrogenase subunit 2 (ND2), and microsatellite for confirmation. Our results also strengthen the presence of more than one stock of Hilsa in Indian subcontinent; however Ganga and Hooghly Feeder Canal and Hooghly populations shared the same genepool.
Geographical divergence and gene flow among subpopulations can be assessed by phylogenetic analysis and haplotype network provided that gene lineages have accumulated sufficient polymorphisms over time and populations have been isolated completely to allow genetic drift to act. In present study ML tree and haplotype network of control region have clearly distinguished four major clades/haplogroups, namely, Lineage 1 (Bay of Bengal populations, Diamond Harbour, Ganga, Hooghly Feeder Canal, Godavari, Paradip Port, and Hooghly), Lineage 2 (Godavari + some haplotypes from Bay of Bengal population), Lineage 3 (Tapti population), and Lineage 4 (Narmada population). Lineage 2 had mixed haplotypes mostly from Godavari and some from Hooghly Feeder Canal (Hap 1), Diamond Harbour (Hap 12), Ganga (Hap 14), and common haplotype (Hap 8); this can be explained by slightly mistaken migration routes taken by some individuals. However formation of four clades may be explained by the philopatry driven genetic differentiation that also corroborates previous tagging studies that demonstrated limited natal and breeding dispersal of Hilsa [6]. This type of behaviour has also been reported in highly studied American shad which exhibits a high rate of philopatry, with 97% of spawners returning to their natal stream [29]. These results were also confirmed by hierarchical AMOVA analysis, that is, amonggroup analysis (two-gene pool analysis of Ganga, Hooghly, Hooghly Feeder Canal, Diamond Harbour, Paradip Port, Godavari and Tapti, and Narmada gene pools) having significant variation of 3.61%. This behavioural isolation of Hilsa populations was also strongly supported by pairwise significant ST values of Tapti and Narmada with Godavari, Paradip Port, Diamond Harbour, Hooghly Feeder Canal, and Ganga.  Although phylogenetic and phylogeographic analysis (minimum spanning network) shows that genetic variation is not randomly distributed among Bay of Bengal rivers, a pattern of population structure and gene flow has been difficult to verify statistically. Analysis of pairwise population ST provides strong support for divergence between Bay of Bengal and Arabian Sea population; however biologically accepted level of statistical significance ( ≤ 0.05) is too stringent to reveal the subtle genetic differences within Bay of Bengal and Arabian Sea populations when the Bonferroni correction for multiple value is applied [30]. The genetic marker most commonly used to elucidate population structure in fish has been control region and was considered sensitive enough to test for structure among most populations due to rapid accumulation of mutation in the control region as well as the simplicity of mtDNA inheritance through maternal line [3]. However its suitability is questionable at interbasin level as true relationship between populations was masked by high haplotype diversity and homoplasy. statistics ( ST ) and AMOVA could not clearly demarcate the population structure among populations from Bay of Bengal. Mazumder and Alam [10] used RFLP of mitochondrial D-loop region to differentiate riverine, estuarine, and marine stocks and observed significant differentiation between the riverine and marine (Cox's Bazar) populations, but not between the marine and one of the estuarine populations as electrophoretic analysis of PCR-RFLP has lower resolving power than sequencing of the same PCR product. They suggested sequencing of D-loop region and faster evolving molecular markers for population structure studies, such as microsatellite loci. Similar results were recorded for yellowfin tuna where analysis of the control region between Atlantic and Indo-Pacific populations was unable to detect structure but clearly resolved by PCR RFLP of the ATPase 6/8 and COI III genes [31].

Conclusion
The populations are supposed to be separated by the philopatric behaviour. Phylogeographical structure defined by AMOVA and region specific haplotypes distinguished populations up to sea level (Bay of Bengal and Arabian sea) but population structuring at basin level was not noticed using D-loop as a marker so we suggest the use of some moderately evolving markers such as ATPase 8/6, nicotinamide dehydrogenase subunit 2 (ND2), and microsatellites for elucidating population structure of Hilsa.