The Complete Mitochondrial Genome of Ctenoptilum vasava (Lepidoptera: Hesperiidae: Pyrginae) and Its Phylogenetic Implication

We here report the first complete mitochondrial (mt) genome of a skipper, Ctenoptilum vasava Moore, 1865 (Lepidoptera: Hesperiidae: Pyrginae). The mt genome of the skipper is a circular molecule of 15,468 bp, containing 2 ribosomal RNA genes, 24 putative transfer RNA (tRNA), genes including an extra copy of trnS (AGN) and a tRNA-like insertion trnL (UUR), 13 protein-coding genes and an AT-rich region. All protein-coding genes (PCGs) are initiated by ATN codons and terminated by the typical stop codon TAA or TAG, except for COII which ends with a single T. The intergenic spacer sequence between trnS (AGN) and ND1 genes also contains the ATACTAA motif. The AT-rich region of 429 bp is comprised of nonrepetitive sequences, including the motif ATAGA followed by an 19 bp poly-T stretch, a microsatellite-like (AT)3 (TA)9 element next to the ATTTA motif, an 11 bp poly-A adjacent to tRNAs. Phylogenetic analyses (ML and BI methods) showed that Papilionoidea is not a natural group, and Hesperioidea is placed within the Papilionoidea as a sister to ((Pieridae + Lycaenidae) + Nymphalidae) while Papilionoidae is paraphyletic to Hesperioidea. This result is remarkably different from the traditional view where Papilionoidea and Hesperioidea are considered as two distinct superfamilies.


Introduction
The taxonomic status and the phylogenetic position of skippers (Hesperiidae) within Lepidoptera remain a controversial issue [1][2][3]. Due to the distinct differences between the skippers and the typical butterflies/moths in terms of morphological and behavioral characteristics, such as the short stout bodies, hooked antennae, and rapid skipping flight, the skippers were previously proposed to represent a separate group that is distinct from butterflies/moths in lepidopterans. More specifically, the skippers are assigned to the family Hesperiidae in a monotypic superfamily Hesperioidea, a sister lineage to the typical rhopaloceran butterflies, which mostly belong to superfamily Papilionoidea (true butterflies) [2,4,5]. In addition, the three superfamilies Hesperioidea, Papilionoidea, and Hedyloidea share numerous morphological characteristics, particularly in their egg, larval, and pupal stages, and thus were considered to be a large natural group [2].
The Tawny Angle, Ctenoptilum vasava, is a typical skipper commonly found in southern East Asia, such as China, India, Burma, Thailand, and Vietnam. In this study, we sequenced the complete mitochondrial genome of the skipper, representing the first mt genome sequence from the family Hesperiidae (superfamily Hesperioidea). We next compared this sequence with other lepidopteran mt genomes sequences and examined the phylogenetic relationships within lepidopterans and reevaluated the phylogenetic position of skippers. We show that the skipper shares the general organization and structure of the mt genome with other species from the order Lepidoptera. By examining currently available mt genomes in lepidopterans, we find the Hesperioidea is placed within the Papilionoidea, which may be a paraphyletic group.

Sample Collection.
Adult individuals of Ctenoptilum vasava (Lepidoptera: Hesperiidae: Pyrginae: Ctenoptilum) were captured from National Natural Conservation Areas of Jiu Lianshan Mountain, Jiangxi Province, China, in August, 2009. After a brief examination for species identification, the fresh tissues were preserved in 100% ethanol immediately for DNA fixation and stored at −20 • C until further use for genomic DNA isolation.

DNA Extraction and PCR Amplification.
The whole genomic DNA of C. vasava was isolated from the thoracic muscle of an adult individual using the proteinase K-SiO 2 method as described by Hao et al. [32]. Partial sequences of COI, COIII, Cytb, ND4, 16S rRNA, and 12S rRNA genes were amplified using insect universal primers [33]. Polymerase chain reactions (PCRs) were performed under the following condition: 5 minutes of initial denaturation at 95 • C, 35 cycles of denaturation at 95 • C for 50 seconds, annealing at 45-55 • C (depending on primer pairs) for 50 seconds, extension at 72 • C for 1 minute, and a final extension at 72 • C for 10 minutes. Based on the sequences from the newly acquired gene fragments, the long PCR primers were designed (Table 1) according to the conserved regions by the program Primer premier 5.0 [34], and the entire mt genome of the skipper was in turn amplified in five long fragments (12S-COI, COI-COIII, COIII-ND4, ND4-Cytb, Cytb-12S) by using Takara LA TaqTM (Takara). The long PCR condition is as follow: an initial denaturation at 95 • C for 5 min, 15 cycles of denaturation at 95 • C for 50 seconds, annealing at 50-55 • C (depending on primer pairs) for 50 seconds, extension at 68 • C for 150 seconds, additional 15 cycles of denaturation at 95 • C for 50 seconds, annealing at 50-55 • C for 50 seconds, extension at 68 • C for 150 seconds, and a final extension at 68 • C for 10 minutes. PCR products were examined by electrophoresis on a 1% agarose gel and purified using a DNA gel extraction kit (Takara). All PCR fragments were directly sequenced in both strands after purification with QIA quick PCR Purification Kit (Qiagen). Long PCR fragments were sequenced using the primer walking strategy (walking primer information will be provided upon request).

Sequence Analysis.
The raw sequence files were proofread and assembled in BioEdit version 7.0 [35]. The concatenated amino acid sequences of the 13 PCGs were obtained and analysed by the ClustalX [36] and the MEGA 3.0 [37] softwares. The structures of the 23 tRNAs and a tRNA-like gene of the skipper were identified by the software tRNAscan-SE version 1.21 [38]. The putative tRNAs, which were not found by tRNAscan-SE, were identified by sequence comparisons between the skipper and other lepidopteran tRNAs. Nucleotide composition was calculated using MEGA 3.0 software [37], and the tandem repeats in the AT-rich region were predicted by the Tandem Repeats Finder available online (http://tandem.bu.edu/trf/trf.html) [39]. The mt genome sequence has been submitted to GenBank database under the accession number JF713818.

Phylogenetic Analysis.
The multiple alignment of the 13 PCG concatenated nucleotide sequences of the 32 lepidopteran mt genome sequences (one is from this study, and 31 were extracted from GenBank, see Table 2) was conducted using Clustal X1.8 [36] and was checked by eye. The phylogenetic trees were reconstructed with the maximum likelihood (ML) and bayesian inference (BI) methods using a hymenopteran species Apis cerana (GenBank accession number NC 014295) as the outgroup. In both phylogenetic analyses, the third position of all the codons was excluded. The ML analyses were conducted in PAUP (version 4.0b8) [40] with searching method of TBR branch swapping (10 random addition sequences), the general time reversible model with invariant sites and among-site variation (GTR+I+Γ) was selected as the best fit model using  [42] with the partitioned strategy, and the best fit substituion model was selected as in the ML analysis. MrBayes 3.1.1 simultaneously initiates 2 Markov Chain Monte Carlo (MCMC) that runs to provide additional confirmation for the convergence of posterior probability distribution. Four simultaneous chains were run, 3 hot and 1 cold. Analyses were run for one million generations until the average standard deviation of split frequencies to be less than 0.01, which means convergence was reached. Chains were sampled every 1,000 generations. Starting trees were random.

Genome Organization, Gene Arrangement, and Base
Composition. The organization of the skipper mt genome was shown in Figure 1. The complete mt genome is 15,468 bp in length, containing 13 protein-coding genes (ND1-6, ND4L, COI-III, Cytb, ATP6, ATP8), 2 ribosomal RNAs (12S and 16S), 24 putative tRNAs, and an AT-rich region, same characteristics as with those of other butterfly species available (Tables 3 and 4). The size of so far sequenced mt genomes in lepidopteran insects ranges from 15,140 bp in Artogeia melete [7] to 15,928 bp in Bombyx mandarina [13]; our newly sequenced mt genome of the skipper is within the above size range. Mitochondrial genes of the skipper are arranged in the same order and orientation as those of other lepidopterans, except for the presence of an extra copy of trnS (AGN) and a tRNA-like insertion trnL(UUR). Similar to many insect mt genomes, the heavy strand (H-strand) encodes more genes (9 PCGs and 14 tRNAs), whereas the light strand (L-strand) encodes less (4 PCGs, 8 tRNAs, and 2 rRNA genes). The composition of A, T, G, C nucleotides in the skipper mt genome L-strand indicating a remarkably high AT Ctenoptilum vasava Mitochondrial DNA 15468 bp content (80.55%) and a low GC content (19.45%). This result is consistent with the AT bias generally observed in other lepidopteran mt genomes, given that the composition of AT is ranging from 77.9% in Ochrogaster lunifer [16] to 82.7% in Coreana raphaelis [6]. The AT and GC skew values in Hstrand of the skipper mt genome are −0.0293 and −0.216, respectively, indicating that T and C are favored over A and G, respectively. The start codons for COI gene of the lepidopteran insects are not uniform. All lepidopteran species examined to date use Arginine (R), which correspond to the codon CGA, as the initial amino acid for COI [43]. In this study, the CGA is not conserved across all lepidopteran mt genomes, but it has been suggested to serve as the COI start codon [8]. However, in other insect groups, some other canonical codons, such as the TTA [44], TCG [45], TTG [46], ACG [47], were reported as the COI start codons. Additionally, some tetranucleotides, such as the ATAA, ATCA, and ATTA [48], as well as the hexanucleotides, such as the ATTTAA [49][50][51], TATCTA [52], TTTTAG [13], and TATTAG [29], were also proposed as the start codons for this gene.

Protein-Coding Genes (PCG
The 13 protein-coding genes all possess complete stop codons except for the COI and COII in this study. Most stop codons for COII are single T in most sequenced lepidopteran mt genomes. The single T stop codon is usually located nearby the trnK, the structure of which is recognized by endonucleases processing the polycistronic pre-mRNA transcription, and produces functional stop codons by polyadenylation from its contiguous protein-coding genes [53][54][55]. Accordingly, partial stop codons observed in most lepidopteran species would be one strategy for the selection of stop codon; in other words, it would minimize the intergenic spacers and gene overlaps from an evolutionary economic perspective. Table 3: Comparison of the mt genome characteristics of all butterfly species available. All genomes share the features of 13 protein-coding genes (ND1-6, ND4L, COI-III, Cytb, ATP6, ATP8), 2 ribosomal RNAs (lsu rRNA (12S) and lsu rRNA (16S)), and an AT-rich region. The amino acids composition and the codon usage of the skipper mt genome are shown in the Tables 5 to 6. The results showed that the codon usage of all the genes has a strong bias, and the RSCU (relative synonymous codon usage) of NNU and NNA codons are greater than 1, indicating that the third positions of the UA have a high frequency of codon usage. The codon usage bias in protein-coding genes and the third position of AT bias (92.1%) are positively correlated. In addition, our analysis also showed that UUU (Phe), UUA (Leu), AUU (Ile), AUA (Met), and AAU (Asn) are the most frequently used codons (45.45%). Together, all observations suggest the strong AT bias of the protein-coding genes in the skipper mt genome.

Ribosomal and Transfer RNA Genes.
The srRNA (12S rRNA) and lrRNA (16S rRNA) genes in the skipper mt genome are 774 bp and 1343 bp in size, respectively. Similar to other lepidopteran rRNAs [8,22], the skipper 12S rRNA is located between trnL (CUN) and trnV, whereas its 16S rRNA resides between trnV and the A+T-rich region. The A+T contents of the two rRNA genes are 86.43% and 84.14%, respectively, which is consistent with those observed in other lepidopterans as well.
The skipper mt genome contains 23 transfer RNAs and a tRNA-like gene (trnL (UUR)), and these genes are interspersed throughout the whole genome and ranged in size from 61 to 79 bp. Twenty-three of them are shown to be folded into the expected clover-leaf secondary structure; however, the trnS (AGN) lacks the dihydrouridine (DHU) loop ( Figure 2) as shown in the other insect mt genomes examined to date. Therefore, the incomplete trnS should be considered as one of common features in most, if not all, insect mt genomes. A total of 32 unmatched base pairs were detected in the skipper tRNAs, and 18 of them are GU pairs, which form a weak bond in the tRNAs, the remaining 14 are atypical pairs, including 11 UU pairs, two AC pairs, and one AA pairs (Figure 2).
The extra trnS was detected between trnS (AGN) and trnE in the skipper mt genome (Figure 2), and the tRNA insertions were also observed in other lepidopterans, such as C. raphaelis and A. issoria. The high similarity between the two tandemly repeated copies of the trnS in primary and secondary structures may suggest a recent duplication event [6,9]. Interestingly, a 79 bp insertion of a tRNA-like gene was detected between the trnL (CUN) and lrRNA genes, and this observation was not shown in any lepidopteran mt genomes to date, despite that tRNA translocation is a frequent event in the evolution of lepidopteran mt genomes [56].

Intergenic Spacers and Overlapping
Sequences. The skipper mt genome contains a total of 607 bp intergenic spacers, which distribute in 11 regions, ranging from 2 to 61 bp in length. Most of the spacers are shorter than 10 bp, only 7 spacers are longer ( Table 2). The longest intergenic spacer (61 bp) is located between the trnQ and ND2 genes, with a high AT content (98.4%). This spacer is usually considered as a constitutive synapomorphic feature of lepidopteran mt genomes because of the absence in nonlepidopteran insects [17]. The second longest one (ATACTAAAAATATATTA) is inserted between the trnS2 and ND1 and shared the ATACTAA motif observed in most insects, including lepidopterans [17], and possibly fundamental to site recognition by the transcription termination peptide (mtTERM protein) [57]. The third longest spacer (CAATTTCTTTT) is inserted between the trnC and trnY, which is the same as O. lunifer. The fourth longest spacer (AAATTATTAAATTT) is located between trnK and trnD, which is also found in other lepidopterans. The fifth longest one (TTTTCTTTTCTTT) is   resided between the two trnS, which is almost identical to that in C. raphaelis. The skipper mt genes are overlapped in a total of 51 bp at 12 locations, with the longest overlap of 7 bp, which is located between the ATP8 and ATP6. The overlapped nucleotides are shared in all the lepidopteran mt genomes so far examined and are probably helpful to form the structure of hairpin loop for posttranslation modifications [6,43]. Other overlap regions include the 5 bp nucleotides between the trnW and trnC, 3 bp nucleotides between the trnY and COI, and the remaining 9 overlapping nucleotides are shorter than 3 bp (Table 4). Of the overlaps, the one between the trnF and ND5 is only 1 bp in size, substantially shorter than those of other lepidopteran species, such as the A. pernyi (15 bp

The AT-Rich Region.
The 429 bp AT-rich region with the AT content of 88.11% is located between the srRNA and trnM. This region includes the O N (origin of minority or light strand replication) which has the motif ATAGA and is followed by 19 bp poly-T that is conserved across all lepidopterans. The AT-rich region also includes a few short microsatellite-like repeat regions such as poly-A, poly-T, and (AT) 3 (TA) 9 prior to the structural motif ATTTA or ATTA that is common to all the lepidopteran mt genomes. In addition, for the first time, we found two large repeated elements, the 20 bp repeat (TATAATTTAATAAAAAAAATA), in the ATrich region in a lepidopteran mt genome (Figure 3), similar observation was also reported in satyrid Eumenis autonoe, despite that their functions still remain unknown [11].

Phylogenetic Analysis
A total of 32 lepidopteran species represent seven lepidopteran superfamilies (Tortricoidea, Pyraloidea, Papilionoidea, Hesperioidea, Bombycoidea, Geometroidea, and Noctuoidea). Of them, the superfamilies Papilionoidea and Hesperioidea are usually referred to as the macrolepidopterans together with the superfamilies Bombycoidea, Geometroidea, and Noctuoidea. Their phylogenetic relationships have long been a subject of controversy [58][59][60][61]. One of the most compelling hypotheses is that the Hesperioidea and Papilionoidea are closely related to each other [58,60].
The ML and BI phylogenetic trees in this study based on 13 protein coding genes of the mitochondrial genomes revealed two major clusters: the first includes Bombycoidea, Geometroidea, Noctuoidea, Tortricoidea, and Cramboidea, and the second contains Papilionoidea and Hesperioidea with strong support (Figure 4(a)). Because the first cluster was also revealed by Kim et al. [11] and Mutanen et al. [62], we here focus on the second cluster, in which the superfamily Papilionoidea appears to be paraphyletic (Figure 4(a)) or polyphyletic (Figure 4(b)), and the superfamily Hesperioidea is placed within the Papilionoidea as a sister to ((Pieridae + Lycaenidae) + Nymphalidae). This result is consistent with those reported earlier based on combined data sets of a few nuclear and mitochondrial genes [62][63][64]. It is, however, remarkably different from the traditional view where Papilionoidea and Hesperioidea are considered as two distinct superfamilies. In conclusion, on the contrary to the common view that Papilionoidea and Hesperioidea are two separate groups in light of morphological and behavioral characteristics, our new mt genome data support that, while the majority of the Papilionoidea lineages is a sister group to Hesperioidea, the Papilionoidae is a separate clade among the two superfamilies. Our results call for more data from Hesperioidea to establish the accurate phylogeny of skippers (Hesperioidea) and true butterflies (Papilionoidea).