Codon Usage Bias in Two Hemipteran Insect Species: Bemisia tabaci and Homalodisca coagulata

Codon bias is the nonuniform use of synonymous codons, which encode the same amino acid. In many organisms some codons are used more frequently than others, particularly in highly expressed genes. The spectacular diversity of insects makes them suitable candidates for analyzing codon usage bias, and the recent expansion in genome sequencing of different insect species provides an opportunity to study it. Several studies of codon usage bias patterns have been carried out on Drosophila and related species, but only a few on the order Hemiptera. We analyzed codon usage in two Hemipteran insect species, Bemisia tabaci and Homalodisca coagulata. Most frequent codons end with A or C at the 3rd codon position. The ENC (a measure of codon bias) ranges from 43 to 60 (mean 52.80) in B. tabaci and from 49 to 60 (mean 56.69) in H. coagulata. In both species, significant positive correlations were observed between A and A3%, C and C3%, and GC and GC3%. Our findings suggest that codon usage bias in these two Hemipteran species is not remarkable and that mutation pressure shapes their codon usage pattern.


Introduction
Codon usage bias is the unequal usage of synonymous codons in protein-coding genes. Where such bias exists, some codons are used more often than their synonymous alternatives; these are often referred to as optimized or preferred codons. The extent of codon bias is shaped mainly by mutation and translational selection [1][2][3][4].
Frequently used codons are often termed optimal or major codons, whereas less frequently used ones are termed nonoptimal or minor codons. Nonoptimal codons usually correspond to less abundant tRNAs in the cell than optimal codons do [5][6][7][8][9], and the translational machinery is more likely to pause at them [10]. Processing errors can then occur during the elongation stage of translation; these are termed premature termination [10]. Therefore, the intensity of purifying selection is expected to increase as peptide elongation proceeds [9,11], and so codon usage bias should also increase steadily along a gene [11]. In addition, optimal codons may be selected for translational efficiency and so are used more often in highly expressed genes [2,9,12,13,14,15,16]. Codon usage bias may therefore be accounted for by negative selection against nonoptimal codons and positive selection for optimal codons.
Analysis of codon usage bias is a well-established technique for understanding the protein coding sequences of genomes. Because synonymous codons are used nonuniformly during translation, identifying the patterns of codon usage bias is important for understanding the mode of translational selection of protein coding genes among related species. The study of codon usage bias has gained momentum with the whole genome sequencing of numerous organisms [17]. Molecular evolutionary investigations suggest that codon usage bias varies both within and between genomes and may be of major significance in understanding genome evolution among related species at the molecular level [18]. Investigating the causes and consequences of codon usage bias patterns and detecting the selective forces that shape their evolution are of practical importance to studies of genome biology.

Advances in Biology
Analyses of codon usage patterns in insects are of great interest for several reasons. The exceptional diversity of insects within the animal kingdom makes them suitable for studying codon bias patterns at different evolutionary time scales. Moreover, the recent spurt in genome sequencing of different insect species complexes provides an exceptional opportunity for studying the evolution of codon bias in coding sequences within specific families and genera. Although codon usage bias has been studied within the genus Drosophila and between a few other insect species [19][20][21], a comprehensive analysis of codon bias patterns among Hemipteran insects with sequenced genes is lacking. The insect species analyzed in this study are important because they act as vectors of various plant diseases and are currently considered important agricultural pests of many crop plants. The silver-leaf whitefly, Bemisia tabaci, also referred to as the sweet-potato whitefly, is one of numerous whiteflies that have had a major impact on tobacco crops. It has been transmitting plant viruses such as lettuce infectious yellows virus, tomato yellow leaf curl virus, and African cassava mosaic virus for years and across many continents [22]. The glassy-winged sharpshooter, Homalodisca vitripennis (formerly Homalodisca coagulata; Hemiptera: Cicadellidae), which feeds on xylem fluid, is native to the southwestern United States. The discovery of the role of this insect in the transmission of the pathogenic bacterium Xylella fastidiosa, which causes Pierce's disease of grapevines [23][24][25], has made it an important subject of scientific study.
Codon usage bias is most prominent in highly expressed genes. Differences in codon optimization between genes lead to differences in the efficiency and accuracy of their translation [26][27][28]. Selection associated with translational efficiency or accuracy is often termed "translational selection." The study of codon bias is gaining renewed attention from scientists across the globe with the advent of whole genome sequencing of numerous organisms [17]. Various factors such as gene expression level, gene length, composition bias (%GC content and GC skew), recombination rates, and RNA stability are known to influence codon bias in organisms [4,29,30,31].
In this study, one of the major objectives is to understand the patterns of codon usage bias in two Hemipteran insect species: Bemisia tabaci and Homalodisca coagulata. The Hemiptera order includes the majority of insect genomes that have been sequenced so far. Although codon usage bias has been studied within the genus Drosophila and between a few other insect species [19][20][21], a comprehensive analysis of codon usage bias patterns between the sequenced genes of these two species is lacking.
The goal of this study is to perform a comparative analysis of codon usage bias patterns among the sequenced genes of these two insect species, which belong to the same order (Hemiptera). We analyzed a few selected genes, available in NCBI, from Bemisia tabaci and Homalodisca coagulata. Our results provide useful insights into the patterns of codon usage bias and facilitate a better understanding of the structure and evolution of the gene coding sequences of these insects.

Materials and Methods
Sequence Data.
The complete coding sequences of genes for the insect species were retrieved from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Genbank/). To minimize sampling errors [32], we analyzed only those coding sequences (CDSs) that are 100 codons or more in length and have correct initiation and termination codons.
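The filtering criteria above can be sketched in Python. This is a minimal illustration rather than the procedure actually used in the study: it assumes the standard genetic code, treats ATG as the only valid initiation codon, counts the termination codon toward the 100-codon threshold, and additionally rejects internal stop codons. The function name `is_valid_cds` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def is_valid_cds(seq, min_codons=100):
    """Return True if seq looks like a complete CDS of at least min_codons codons.

    Assumptions (not stated in the text): ATG is the only accepted start codon,
    the stop codon counts toward the length, and internal stops are rejected.
    """
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0 or len(seq) // 3 < min_codons:
        return False
    if set(seq) - set("ACGT"):
        return False  # ambiguous or non-nucleotide characters
    if seq[:3] != "ATG" or seq[-3:] not in STOP_CODONS:
        return False
    # Scan the codons between the start and the final stop for premature stops.
    internal = (seq[i:i + 3] for i in range(3, len(seq) - 3, 3))
    return not any(codon in STOP_CODONS for codon in internal)
```

A list of retrieved sequences could then be filtered with `[s for s in sequences if is_valid_cds(s)]`, where `sequences` is any list of nucleotide strings.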

Compositional Properties.
General nucleotide composition (A, C, T, and G%) and nucleotide composition at the third position of each codon (A3, C3, T3, and G3%) were analyzed for the insect CDSs using an in-house Perl script developed by SC. The GC and GC3 indices refer to the overall GC content of the gene sequence and the GC content at the third position of synonymous codons (excluding Met, Trp, and termination codons), respectively.
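The in-house Perl script is not reproduced in the text, so the following Python sketch only illustrates how such compositional indices can be computed. It assumes the standard genetic code; A3, C3, G3, and T3 are taken over all codons, while GC3 excludes the Met, Trp, and termination codons as described above. The names `composition` and `percent` are hypothetical.

```python
# Codons excluded from GC3: Met (ATG), Trp (TGG), and the three stop codons.
EXCLUDED_FROM_GC3 = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def percent(count, total):
    """Percentage helper that returns 0.0 for an empty denominator."""
    return 100.0 * count / total if total else 0.0


def composition(cds):
    """Return overall and third-position base percentages, plus GC and GC3."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    result = {}
    for base in "ACGT":
        result[base] = percent(cds.count(base), len(cds))
    third = [codon[2] for codon in codons]
    for base in "ACGT":
        result[base + "3"] = percent(third.count(base), len(third))
    result["GC"] = result["G"] + result["C"]
    # GC3 is computed only over codons that have synonymous alternatives.
    syn_third = [c[2] for c in codons if c not in EXCLUDED_FROM_GC3]
    result["GC3"] = percent(sum(b in "GC" for b in syn_third), len(syn_third))
    return result
```

For the toy sequence `ATGGCCAAATAA`, only GCC and AAA contribute to GC3, so GC3 is 50%.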

Measures of Synonymous Codon Usage Bias.
A large number of indices for measuring codon usage bias have been proposed; some of the most relevant and widely used measures analyzed in this study are discussed below.

Relative Synonymous Codon Usage (RSCU).
Relative synonymous codon usage (RSCU) is calculated as the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons of a particular amino acid are used equally. An RSCU value greater than 1.0 indicates that the codon is used more frequently than expected, whereas a value less than 1.0 indicates the reverse [12]. Consider
$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}, \qquad (1)$$
where $X_{ij}$ is the frequency of occurrence of the $j$th codon for the $i$th amino acid (any $X_{ij}$ with a value of zero is arbitrarily assigned a value of 0.5) and $n_i$ is the number of codons for the $i$th amino acid ($i$th codon family).
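As a hedged illustration of the RSCU definition, the sketch below computes RSCU for the 59 codons that have synonymous alternatives under the standard genetic code. It is not the authors' script. The `zero_as_half` flag, a hypothetical parameter, applies the substitution of 0.5 for zero counts mentioned above; amino acids absent from the sequence are skipped, which is an assumption of this sketch.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third position fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds, zero_as_half=False):
    """Return {codon: RSCU} for amino acids that occur in cds."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        if len(codons) < 2:
            continue  # Met and Trp have no synonymous codons
        if sum(counts[c] for c in codons) == 0:
            continue  # amino acid absent from this sequence
        x = {c: counts[c] for c in codons}
        if zero_as_half:
            x = {c: (v if v > 0 else 0.5) for c, v in x.items()}
        mean_usage = sum(x.values()) / len(codons)
        for c in codons:
            values[c] = x[c] / mean_usage
    return values
```

Codons with RSCU above 1.0 can then be listed with `[c for c, v in rscu(seq).items() if v > 1.0]`.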

The Effective Number of Codons (ENC).
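The effective number of codons used by a gene (ENC) is generally used to measure the bias of synonymous codons [32]. The values of ENC range from 20 (when only one codon is used for each amino acid) to 61 (when codons are used randomly). If the calculated ENC is greater than 61 (because codon usage is more evenly distributed than expected), it is adjusted to 61. Consider
$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}, \qquad (2)$$
where $\bar{F}_k$ ($k$ = 2, 3, 4, 6) is the mean homozygosity of the amino acids encoded by $k$ synonymous codons [32].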
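The ENC described above can be sketched as follows, using the standard homozygosity-based formulation, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where each F is averaged over the amino acids of one degeneracy class. This is an illustrative sketch under stated assumptions, not the script used in the study: it assumes the standard genetic code, skips amino acids that occur fewer than twice or have zero homozygosity, estimates a missing three-fold class as the mean of the two-fold and four-fold classes, and returns `None` when any other class is missing.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def enc(cds):
    """Effective number of codons for one CDS, capped at 61 (None if undefined)."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    f_by_class = defaultdict(list)
    for codons in families.values():
        k = len(codons)                      # degeneracy class of this amino acid
        n = sum(counts[c] for c in codons)   # occurrences of this amino acid
        if k < 2 or n < 2:
            continue
        s = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * s - 1) / (n - 1)            # codon homozygosity
        if f > 0:                            # assumption: drop F = 0 to avoid 1/0
            f_by_class[k].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Ile absent: interpolate
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        return None
    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(value, 61.0)
```

A gene that uses exactly one codon per amino acid yields 20, and a gene that uses all sense codons equally is capped at 61, matching the range given in the text.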

Codon Adaptation Index (CAI).
The codon adaptation index (CAI) [33] is a very widely used measure of codon bias in prokaryotes [34][35][36] as well as eukaryotes [37][38][39]. CAI is a measure of the relative adaptiveness of the codon usage of a gene towards the codon usage of highly expressed genes. The relative adaptiveness ($w$) of each codon is the ratio of the usage of that codon to that of the most abundant codon within the same synonymous family. Nonsynonymous codons and termination codons (dependent on genetic code) are excluded from this analysis. CAI values range from 0 to 1, with higher values indicating a higher proportion of the most abundant codons [33]. The CAI is calculated as
$$\mathrm{CAI} = \exp\left(\frac{1}{L}\sum_{k=1}^{L} \ln w_k\right), \qquad (3)$$
where $w_k$ is the relative adaptiveness of the $k$th codon and $L$ is the number of synonymous codons in the gene. The analysis of codon usage bias was done using a Perl script developed by SC.
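The CAI calculation can be illustrated with a short Python sketch: relative adaptiveness values are derived from a reference set of genes, and the CAI of a gene is the geometric mean of those values over its codons. This is a sketch under assumptions, not the authors' Perl script. The text does not say which reference set was used, so `reference_genes` is a hypothetical input; the standard genetic code is assumed, and zero counts in the reference set are replaced by 0.5.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a sequence into complete codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Return {codon: w} computed from a (hypothetical) reference gene set."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(split_codons(gene))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    w = {}
    for codons in families.values():
        if len(codons) < 2:
            continue  # Met and Trp are excluded
        usage = {c: (counts[c] if counts[c] > 0 else 0.5) for c in codons}
        top = max(usage.values())
        for c in codons:
            w[c] = usage[c] / top
    return w


def cai(cds, w):
    """Geometric mean of w over the synonymous codons of cds (None if there are none)."""
    logs = [math.log(w[c]) for c in split_codons(cds) if c in w]
    if not logs:
        return None
    return math.exp(sum(logs) / len(logs))
```

Working with logarithms avoids numerical underflow when many values below 1 are multiplied for long genes.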

Statistical Analysis.
Correlation analysis was used to identify the relationship between the overall nucleotide composition and the composition at the 3rd codon position. All statistical analyses were done using SPSS software.
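The study used SPSS for this step, and the text does not name the correlation coefficient. Purely as an illustration, the Python sketch below computes a Pearson correlation coefficient between two per-gene series (for example, overall C% and C3%) and the corresponding t statistic. The choice of Pearson's r and the function names are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom to obtain a p value; that lookup is left to a statistics package.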

Results
Compositional Properties.
The genes of Bemisia tabaci and Homalodisca coagulata, with their accession number, gene length, ENC, CAI, and overall GC(%), GC1(%), GC2(%), and GC3(%), are shown in Tables 1 and 2. Figure 2 shows the distribution of the overall GC contents in B. tabaci genes. In H. coagulata, the average overall GC and GC3% are 51.07 and 60.10, respectively. Figure 3 shows the distribution of the overall GC contents in H. coagulata genes.
The ENC values range from 43 to 60 with a mean value of 52.80 in B. tabaci and from 49 to 60 with a mean value of 56.69 in H. coagulata (Figure 4). A higher ENC value means lower bias. This result indicates that codon usage bias is not very remarkable in these species and is apparently maintained at a stable level. CAI is used to measure the level of gene expression. Although gene expression, as measured by CAI, is higher in H. coagulata than in B. tabaci, the difference is not statistically significant (p > 0.05), as shown in Figure 5.

Codon Usage in Insect Species.
The overall RSCU values for the 59 codons in Bemisia tabaci and Homalodisca coagulata indicated that C occurred most frequently at the third codon position (as shown in Tables 5 and 6 in the Supplementary Material). In B. tabaci and H. coagulata, the most frequently used codons are the same and end with C at the 3rd codon position. These codons are TTC, CTC, ATC, GTC, TCC, CCC, ACC, GCC, TAC, CAC, AAC, GAC, and TGC. In H. coagulata, the codon GGC was also found to be frequently used. These results indicate that the codon usage pattern in these two insect species is shaped mostly by compositional constraints. Figures 6 and 7 show the codons that are used more frequently than expected (RSCU > 1) in B. tabaci and H. coagulata, respectively.

Amino Acids Contribute Differently to a Gene's Codon Usage Bias.
Except for methionine and tryptophan, amino acid usage differs among the genes. Figure 8 shows that alanine is distinctly the most used amino acid and cysteine the least used in the genes of both B. tabaci and H. coagulata.
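Amino acid usage of the kind summarized above can be tallied directly from a coding sequence. The sketch below is illustrative only: it assumes the standard genetic code, reports one-letter amino acid codes, and ignores termination codons and codons containing ambiguous bases. The function name `amino_acid_usage` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_usage(cds):
    """Return {amino acid: percentage of all encoded residues} for one CDS."""
    cds = cds.upper()
    residues = (CODON_TABLE.get(cds[i:i + 3])
                for i in range(0, len(cds) - len(cds) % 3, 3))
    # Drop stop codons ("*") and codons that are not in the table (None).
    counts = Counter(aa for aa in residues if aa and aa != "*")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}
```

Summing the per-gene counts over all genes of a species before converting to percentages would give a species-level profile.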

Discussion
The present investigation compares the codon usage patterns of two important insect species, Bemisia tabaci and Homalodisca coagulata, belonging to the order Hemiptera. As synonymous codon usage is not uniform during translation, identifying the codon usage pattern is important for understanding the translational selection of codons in the protein coding genes of these two species.
Here we analyzed the synonymous codon usage bias in the two insect species. We found that the most frequent codons end with C in both species. This finding may result from compositional constraint acting on the codon usage pattern of these insects. Similar results have been reported in other insect orders. In dipterans, the most frequent codons mostly end with G or C at the 3rd codon position, whereas in hymenopterans the most frequent codons contain A or T at the 3rd codon position [40]. In Drosophila, G and especially C are generally favored at synonymous sites in biased genes. However, some genes, even very tightly linked genes, can vary greatly in codon bias across species of Drosophila. Moreover, Drosophila genes are as biased as or more biased than those in microorganisms [29]. In B. tabaci, the ENC value ranges from 43 to 60 with a mean of 52.80, and in H. coagulata from 49 to 60 with a mean of 56.69. These relatively high ENC values show that codon bias is not remarkable in these species. In Drosophila, by contrast, the distribution of the effective number of codons differed between short genes (300-500 bp) and longer genes [30]. In our study, H. coagulata genes had greater gene expression than B. tabaci genes. In both species a highly significant positive correlation was observed between A and A3%, C and C3%, and GC and GC3%. In addition, a significant positive correlation was observed between T and T3% (r = 0.8485**, p < 0.01), a positive correlation between G and G3%, and a negative correlation between GC and T3%. The correlation between gene length and synonymous codon usage was negative in Drosophila [30]. These results suggest that mutation pressure has been a major factor in the codon usage bias of these species. In Drosophila, synonymous codon usage is well explained by tRNA availability and is probably influenced by developmental changes in relative abundance [15].

Conclusion
This is the first comparative analysis of the pattern of codon usage in two insect species of the order Hemiptera. This work is useful for understanding the evolution of codon usage patterns in these species. Codon usage bias was not very remarkable in these species. Nucleotide constraint and compositional constraint are two significant factors affecting the codon usage pattern in these species. The most frequent codons end with C at the 3rd position, most probably reflecting the role of compositional constraint under mutation pressure. We find that mutation pressure is the main factor shaping the pattern of codon usage bias. However, further analysis would elucidate the role of any other factors that might be responsible for codon usage bias in insects.
Supplementary Material is available online at http://dx.doi.org/10.1155/2014/145465. The overall nucleotide composition and the nucleotide composition at the third codon position of the Bemisia tabaci and Homalodisca coagulata coding sequences are provided in Table 3 (Supplementary Material). In both species, bases A and C occurred more frequently than T and G, as shown in Figure 1. The nucleotide C occurred most frequently at the third codon position (average C3% = 38.79 ± 13.97; 33.60 ± 8.48) and A occurred least frequently (average A3% = 19.19 ± 7.69; 15.35 ± 5.05) in B. tabaci and H. coagulata, respectively, as shown in Figure 1. The overall nucleotide composition and the composition at the third codon position in B. tabaci and H. coagulata genes suggest that compositional constraint might be influencing the codon usage pattern of these species. The average GC and GC3% in B. tabaci are 50.14 ± 6.11 and 60.73 ± 17.72, respectively.