Species Identification of Marine Fishes in China with DNA Barcoding

DNA barcoding is a molecular method that uses a short, standardized DNA sequence as a species identification tool. In this study, the standard 652 base-pair region of the mitochondrial cytochrome oxidase subunit I gene (COI) was sequenced in marine fish specimens captured in China. The average genetic distance was nearly 50-fold higher between species than within species: Kimura two-parameter (K2P) genetic distances averaged 15.742% among congeners but only 0.319% among conspecific individuals. Pairwise genetic distances did not overlap between conspecific and interspecific comparisons, except in the genus Pampus, in which introgressive hybridization was detected. The present study thus demonstrates that DNA barcoding identifies species with high efficiency. Because of the incidence of cryptic species, we suggest an assumed threshold to expedite the discovery of new species and biodiversity, especially in little-studied biotas.


Introduction
Fishes are an important source of animal protein for humans, and they are frequently used in complementary and alternative medicine/traditional medicine (CAM/TM). The delimitation and recognition of fish species are of interest not only for taxonomy and systematics but also for fisheries management, authentication of food products, and identification of CAM/TM materials [1][2][3].
Because of the complexity and limitations of the morphological characters used in traditional taxonomy, several PCR-based methods of genotype analysis have been developed for identifying fish species, particularly from eggs, larvae, and commercial products. Sequence analysis of species-specific DNA fragments (often mitochondrial or ribosomal genes) and multiplex PCR of species-conserved DNA fragments are efficient for fish species identification [4][5][6][7][8][9][10]. However, these molecular methods are limited to particular known species and are not easily applied to a wide range of taxa. Hebert et al. therefore advocated using a standardized DNA sequence, the DNA barcode, to identify species and uncover biological diversity [11,12]. For many animal taxa, sequence divergences within the 5′ region of the mitochondrial cytochrome oxidase subunit I (COI) gene were much greater between species than within them, suggesting that the approach is widely applicable across phylogenetically distant animal groups [12,13]. Several published studies have since shown that COI barcodes effectively discriminate species in a variety of organisms [14][15][16][17][18][19][20][21][22][23]. However, some scientists are concerned that species identification based on variation in a single mitochondrial gene fragment may yield incorrect or ambiguous assignments, particularly in cases of possible mitochondrial polyphyly or paraphyly [24,25]. In the current study, we test the efficacy of DNA barcoding for the marine fishes of China. The seas of China are part of the Indo-West Pacific, which is regarded as the center of the world's marine biodiversity [26]. Such highly species-rich biotas are particularly attractive for testing the reliability and efficiency of DNA barcoding.

Material and Methods
Most fish specimens were captured by trawl net at 20 localities along the coast of China (collection information is available at http://www.barcodinglife.org/). A total of 329 specimens from one hundred species of fish were collected. Vouchers were deposited in the South China Sea Institute of Oceanography, Chinese Academy of Sciences, and all specimens were preserved in 70% ethanol. Tissue samples were dissected from the dorsal muscle, and genomic DNA was extracted according to the standard Barcode of Life protocol [27]. Fragments of the 5′ region of the mitochondrial COI gene were first PCR-amplified using the C_FishF1t1/C_FishR1t1 primer cocktails [28]. The cocktail C_FishF1t1 contained two primers (FishF2_t1/VF2_t1), and C_FishR1t1 also contained two primers (FishR2_t1/FR1d_t1). All PCR primers were tailed with M13 sequences to facilitate sequencing of the products. The nucleotide sequences of the forward primers were FishF2_t1: 5′-TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC-3′ and VF2_t1: 5′-TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC-3′.
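To make the primer layout concrete, the sketch below splits each forward primer into its M13 sequencing tail and its COI-binding portion. It assumes the tail is the 18-nt M13F(-21) motif TGTAAAACGACGGCCAGT; the text says only that the primers were "tailed with M13 sequences". The names `M13F_TAIL`, `FORWARD_PRIMERS`, and `split_tail` are hypothetical.

```python
from __future__ import annotations

# Assumption: the "M13 sequence" tail is the 18-nt M13F(-21) motif.
M13F_TAIL = "TGTAAAACGACGGCCAGT"

# Forward primers as listed in the Methods, with line-break hyphens removed.
FORWARD_PRIMERS = {
    "FishF2_t1": "TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC",
    "VF2_t1": "TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC",
}


def split_tail(primer: str, tail: str = M13F_TAIL) -> tuple[str, str]:
    """Split a tailed primer into (sequencing tail, template-binding part)."""
    if not primer.startswith(tail):
        raise ValueError("primer does not start with the expected M13 tail")
    return tail, primer[len(tail):]


# Report each primer's tail length and its COI-binding portion.
for name, seq in FORWARD_PRIMERS.items():
    tail, binding = split_tail(seq)
    print(f"{name}: tail {len(tail)} nt, COI-binding {binding} ({len(binding)} nt)")
```

Under this assumption both forward primers carry the same 18-nt tail followed by a 25-nt COI-binding region, which is why a single M13 sequencing primer can read products from either member of the cocktail.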
The thermocycling protocol consisted of 1 min at 95 °C; 35 cycles of 30 s at 94 °C, 40 s at 50 °C, and 1 min at 72 °C; and a final extension at 72 °C for 10 min. Sequencing PCR and sequencing followed the procedure described above.
DNA sequences were aligned with SEQSCAPE v.2.5 software (Applied Biosystems, Inc.). Sequence divergences were calculated using the Kimura two-parameter (K2P) distance model [29], and unrooted neighbor-joining (NJ) trees based on K2P distances were constructed in MEGA software [30]. For the chosen taxonomic group, phylogenetic analysis was carried out in PAUP* 4.0b10 using the maximum parsimony (MP) method, with 1,000 replicates of the full heuristic search.
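The K2P model named above corrects the raw proportion of differing sites as d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of compared sites that differ by a transition and by a transversion, respectively. The Python sketch below computes it for two pre-aligned sequences. Skipping sites with gaps or ambiguous bases is an assumption, not a setting reported here, and the function name `k2p_distance` is hypothetical; multiply the result by 100 to obtain the percentages used in this paper.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption: sites with a gap or ambiguous base in either sequence
    are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition;
        # purine<->pyrimidine is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

For example, two 10-site sequences differing by a single transition give P = 0.1 and Q = 0, so d = -(1/2) ln 0.8, roughly 0.1116 (about 11.16%), slightly above the uncorrected 10% because the model accounts for unobserved multiple substitutions.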
The following categories of K2P distances were calculated: intraspecific distances (S), distances between congeneric species (G), and distances between species of different genera within the same family (F). These values were plotted as boxplots [31] in SPSS 11.5 software (SPSS Inc., Chicago, IL, USA). Separate boxplots were constructed only for families containing two or more genera, to allow comparisons among taxonomic categories. Boxplots show the median (central bar), the interquartile range (IQR, between the upper (Q3) and lower (Q1) quartiles), the most extreme values lying within 1.5 × IQR below Q1 or above Q3 ("whiskers"), and extreme values (outliers). Mann-Whitney tests were performed between the S, G, and F distributions to assess the overlap among taxonomic ranks.
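The boxplot summary and rank test described above can be sketched with the Python standard library alone. Two assumptions apply: quartiles come from `statistics.quantiles(..., method="inclusive")`, which may differ slightly from the quartile definition SPSS uses, and the Mann-Whitney test uses the normal approximation with a tie correction and no continuity correction. The function names are hypothetical.

```python
import math
import statistics


def boxplot_summary(values):
    """Median, quartiles, IQR, Tukey whiskers and outliers of one distance category."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Whiskers reach the most extreme observations still inside the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # every observation tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p_value
```

In use, one would call `boxplot_summary` separately on the S, G, and F lists of a family, and `mann_whitney_u` on each pair of lists. With hundreds of pairwise distances per category, the normal approximation is adequate; for very small samples an exact test would be preferable.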

Results
A total of 329 specimens were analyzed, yielding 321 sequences (all >500 bp) belonging to 121 species, plus one species identified only to the genus level (GenBank accession numbers: EF607296-EF607616). These species cover the majority of fishes living along the coast of the South China Sea. All sequences were aligned to a consensus length of 652 bp, and no insertions, deletions, or stop codons were observed in any sequence. Multiple haplotypes were, however, detected in some species.
The mean intraspecific K2P distance was 0.319%, rising sharply to 15.742% among individuals of congeneric species; overall, the average genetic distance among congeneric species is therefore nearly 50-fold higher than that among individuals within species. At higher taxonomic ranks (family, order, and class), mean pairwise genetic distances increased gradually, reaching 20.199%, 24.656%, and 25.225%, respectively (Table 1). Standard errors of the K2P genetic distances were small, and mean and median values were close within each taxonomic rank (Table 1), indicating that K2P distances within each rank vary little around their central values (Figures 1 and 2). In the unrooted neighbor-joining (NJ) tree (Figure 3), three specimens of Pampus argenteus grouped together within the cluster of Pampus cinereus. These Pampus argenteus specimens were collected at the same site off the west coast of the South China Sea and were difficult to identify because of their complex morphological characteristics (available at http://www.barcodinglife.org/). They combined characteristics of both species: the asymmetrical tail of Pampus cinereus and the silver color of Pampus argenteus. If the suspect congeneric K2P distances in the genus Pampus (the extreme outliers in Figure 1) are excluded, all pairwise genetic divergences among congeneric species are above 10%. There is no overlap between intraspecific and congeneric K2P distances within the same family (Figure 3).
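The "no overlap" statement above amounts to a barcode gap: the largest intraspecific distance is smaller than the smallest congeneric distance. The sketch below checks this and computes the mean fold difference. The toy distance values and the genus labels `GenusA`, `GenusB`, and `GenusC` are invented for illustration, the genus filter only mimics excluding an introgressed genus such as Pampus, and all function names are hypothetical.

```python
from statistics import mean


def barcode_gap(intra, congeneric):
    """Return (max intraspecific, min congeneric, gap); a positive gap means no overlap."""
    max_intra = max(intra)
    min_inter = min(congeneric)
    return max_intra, min_inter, min_inter - max_intra


def fold_difference(intra, congeneric):
    """Ratio of mean congeneric distance to mean intraspecific distance."""
    return mean(congeneric) / mean(intra)


def drop_genera(pairs, excluded):
    """Keep distances from (genus, distance) pairs whose genus is not excluded."""
    return [d for genus, d in pairs if genus not in excluded]


# Invented toy K2P distances (%), for illustration only.
intra = [0.0, 0.2, 0.5, 1.1]
congeneric_pairs = [("GenusA", 14.8), ("GenusB", 18.3), ("GenusC", 0.9)]

print(barcode_gap(intra, drop_genera(congeneric_pairs, set())))     # gap < 0: overlap
print(barcode_gap(intra, drop_genera(congeneric_pairs, {"GenusC"})))  # gap > 0
print(round(fold_difference([0.319], [15.742]), 1))  # published means
```

Applied to the published means, `fold_difference([0.319], [15.742])` gives about 49.3, which is the "nearly 50-fold" figure. Note that a gap based on extremes (max versus min) is a stricter test than a comparison of means.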
At the species level, all COI sequences clustered into monophyletic species units. At the family level, three families (Carangidae, Gobiidae, and Ariidae) formed paraphyletic clusters (Figure 3), although over 98% of specimens fell within their expected families. Intrafamily K2P distances (F) were generally higher than congeneric (G) distances, which were in turn clearly higher than intraspecific (S) distances (Table 1; all Mann-Whitney tests were highly significant, P < 10⁻⁶). However, F and G distances overlapped in Clupeidae, Carangidae, Mullidae, and Muraenesocidae.
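Whether a family forms a single cluster in an unrooted NJ tree can be tested by asking whether one edge separates all of its members from every other leaf. This is a minimal sketch under stated assumptions: the tree is supplied as an undirected edge list with hypothetical node labels, and "cluster" means a split of the unrooted tree; distinguishing strict monophyly from paraphyly would additionally require rooting the tree. The function name `forms_cluster` is hypothetical.

```python
from collections import defaultdict


def forms_cluster(edges, group):
    """True if one edge of the unrooted tree separates the leaves in `group`
    from all other leaves (i.e. they form a single cluster)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    leaves = {node for node, nbrs in adj.items() if len(nbrs) == 1}
    group = set(group)
    if not group <= leaves:
        raise ValueError("group must contain leaf labels only")
    if len(group) <= 1 or group == leaves:
        return True  # trivially a cluster

    # Hang the tree from an internal node and collect the leaves below each node.
    root = next(node for node in adj if node not in leaves)
    below = {}

    def collect(node, parent):
        children = [c for c in adj[node] if c != parent]
        found = {node} if not children else set()
        for child in children:
            found |= collect(child, node)
        below[node] = found
        return found

    collect(root, None)
    # Each non-root node defines the edge to its parent; cutting that edge
    # splits the leaves into below[node] and the remaining leaves.
    for node, subtree in below.items():
        if node != root and (subtree == group or leaves - subtree == group):
            return True
    return False


# Hypothetical five-leaf tree: internal nodes x, y, z; leaves A to E.
tree = [("x", "A"), ("x", "B"), ("x", "z"), ("z", "E"), ("z", "y"), ("y", "C"), ("y", "D")]
print(forms_cluster(tree, {"A", "B"}))  # one edge (x-z) isolates A and B
print(forms_cluster(tree, {"A", "C"}))  # no single edge does
```

The recursive traversal is fine for trees of a few hundred specimens; for much larger or highly unbalanced trees an iterative traversal would avoid Python's recursion limit.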
The factors responsible for deviations from taxonomic monophyly may be varied and complex [35]. One potential cause of species-level polyphyly is occasional mating between distinct species, producing hybrid offspring that carry a mixture of genes from both parent species. Furthermore, mitochondrial genes are generally subject to introgression more frequently than nuclear genes, and introgression can also lead to phylogenetic paraphyly [35][36][37][38], as with the hybridization between Pampus argenteus and Pampus cinereus observed in this study. In such cases, morphological and genotypic data must be combined to assign hybrids to species.
Biological mechanisms, water dynamics, or historical events may cause deep genetic structuring of populations in marine species [26,39]. Many explanations for genetic population structure on local and regional scales invoke life-history traits, such as pelagic early life stages and movement over broad geographic ranges, which are theoretically associated with gene flow [40][41][42]. Many marine fishes lack phylogeographic structure among populations [43,44]; consistent with this, in this study some intraspecific genetic distances between individuals from distant localities were zero in the families Carangidae, Sciaenidae, and Mullidae. However, some pairwise K2P distances exceeded 1.00% within coastal species such as Acentrogobius caninus, Scomber japonicus, Terapon jarbua, Upeneus sulphureus, Elops hawaiensis, Gymnothorax pseudothyrsoideus, and Dendrophysa russelii. This suggests that biological mechanisms are responsible for the fluctuation of intraspecific genetic divergence in marine fishes.
The neighbor-joining method was employed in this study primarily for species identification, but the dendrogram also revealed some phylogenetic information: over 98% of specimens were allocated to their families without polyphyly or paraphyly in the NJ tree (Figure 3). However, DNA barcoding is independent of the way the taxonomy has been built, and a barcode cannot be regarded as a "taxonomic" tag [45]. DNA barcoding is no substitute for taxonomy (Ebach and Holdrege [46]), and a great deal of work is needed to reconcile traditional and molecular taxonomy. It is not feasible to reconstruct the phylogeny of fishes from mitochondrial DNA fragments alone. Polyphyly or paraphyly in the NJ tree probably results from "bad taxonomy", in which named species fail to reflect the genetic limits of separate evolutionary entities, particularly in perplexing taxa involving cryptic species [47]. Without a threshold of genetic variation for species delimitation, we face a dilemma with new or cryptic species: morphological taxonomy cannot give a definite identification, yet without a species-delimitation criterion we cannot claim a new species on the basis of molecular analysis. An assumed threshold helps to expedite the discovery of new species and biodiversity, especially in little-studied biotas, although a single, uniform threshold for species delimitation seems arbitrary because rates of molecular evolution vary widely within and among lineages [24,25,48].
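How an assumed threshold would be used in practice can be sketched as nearest-neighbour assignment with a cut-off: a query barcode is assigned to its closest reference species only if that distance falls below the threshold, and is otherwise flagged for taxonomic follow-up. The multiplier of 10 follows a screening rule proposed in earlier barcoding work (ten times the mean intraspecific distance). It is used here purely as an illustrative assumption, not as a threshold this study adopts. The species labels and function names are hypothetical.

```python
from statistics import mean


def screening_threshold(intraspecific, multiplier=10.0):
    """Threshold as a multiple of the mean intraspecific distance (assumed rule)."""
    return multiplier * mean(intraspecific)


def assign_query(distances, threshold):
    """Nearest-neighbour assignment of a query barcode.

    distances maps reference species -> K2P distance (%) from the query.
    Returns (nearest species, distance, status).
    """
    species, dist = min(distances.items(), key=lambda item: item[1])
    if dist <= threshold:
        return species, dist, "assigned"
    return species, dist, "flagged: possible new or cryptic species"


# With the mean intraspecific distance reported here (0.319%), the assumed
# rule gives a threshold of about 3.19%.
threshold = screening_threshold([0.319])
print(assign_query({"species_A": 0.4, "species_B": 12.5}, threshold))
print(assign_query({"species_A": 6.2, "species_B": 11.0}, threshold))
```

A flagged query is not a new species by itself; as argued above, it marks a specimen whose morphology and distribution deserve closer study, and any fixed threshold will misjudge lineages that evolve unusually fast or slowly.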