It is vital to unravel the codon usage bias in order to gain insights into the evolutionary forces dictating the viral evolution process. Influenza
Influenza
What makes influenza
The degeneracy of the genetic code allows more than one codon to encode the same amino acid. This phenomenon is called synonymous codon usage. The use of synonymous codons, however, is not uniform across species, from prokaryotes to complex organisms as well as viruses; certain synonymous codons are used preferentially. This skewed use of codons is termed codon usage bias (CUB). With the rapidly growing stockpile of sequences in public databases following whole-genome sequencing of a large number of species, investigators have studied codon usage bias in specific genes as well as in whole genomes of a vast range of organisms [
The preferential use of synonymous codons is governed by different evolutionary forces [
Several workers have reported that the overall codon usage bias in RNA viruses is low, which is attributed to GC compositional properties and dinucleotide content in these viruses [
In this study, a total of 32 complete coding sequences of the hemagglutinin (HA) gene of human-host derived influenza
Nucleotide composition of the genes used in the study.
Sl No. | A% | T% | G% | C% | A3% | T3% | G3% | C3% | GC% | GC3% | AT% | AT3% | ENC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 35.4 | 24.2 | 22 | 18.3 | 36.4 | 27 | 16.6 | 20 | 40.3 | 38.4 | 59.7 | 61.6 | 57 |
2 | 35.2 | 23.9 | 22.3 | 18.6 | 35.5 | 26 | 17.5 | 20.9 | 40.9 | 40.2 | 59.1 | 59.8 | 58 |
3 | 35.4 | 24.2 | 22.2 | 18.2 | 36.6 | 27 | 16.4 | 20 | 40.4 | 38.3 | 59.6 | 61.7 | 57 |
4 | 35.2 | 23.9 | 22.3 | 18.6 | 35.3 | 26 | 17.7 | 20.9 | 40.9 | 40.4 | 59.1 | 59.6 | 58 |
5 | 35 | 24 | 22.4 | 18.6 | 35.3 | 26 | 17.7 | 20.9 | 41 | 40.4 | 59 | 59.6 | 58 |
6 | 35.3 | 24 | 22.2 | 18.4 | 36.5 | 26.6 | 16.4 | 20.4 | 40.6 | 38.8 | 59.4 | 61.2 | 58 |
7 | 35.3 | 23.9 | 22.3 | 18.5 | 35.7 | 26 | 17.3 | 20.9 | 40.8 | 40 | 59.2 | 60 | 58 |
8 | 35.4 | 23.9 | 22.2 | 18.5 | 35.5 | 26.2 | 17.5 | 20.8 | 40.7 | 40 | 59.3 | 60 | 58 |
9 | 35.4 | 24.2 | 22 | 18.3 | 36.7 | 27 | 16.2 | 20.1 | 40.4 | 38.3 | 59.6 | 61.7 | 57 |
10 | 35.4 | 24.2 | 22.2 | 18.2 | 36.7 | 27 | 16.2 | 20.1 | 40.4 | 38.3 | 59.6 | 61.7 | 57 |
11 | 35.6 | 24.2 | 21.9 | 18.4 | 37.2 | 26.6 | 15.8 | 20.4 | 40.3 | 38.6 | 59.7 | 61.9 | 57 |
12 | 35.3 | 23.9 | 22 | 18.8 | 36.1 | 25.9 | 16.6 | 21.4 | 40.8 | 39.9 | 59.2 | 60.1 | 58 |
13 | 35.2 | 23.8 | 22.2 | 18.8 | 35.9 | 25.5 | 17.1 | 21.5 | 41 | 40.4 | 59 | 59.6 | 58 |
14 | 35.3 | 24 | 22.2 | 18.4 | 36.5 | 26.6 | 16.2 | 20.6 | 40.6 | 39 | 59.4 | 61.2 | 58 |
15 | 35.4 | 24 | 22.2 | 18.5 | 36.4 | 26.1 | 16.8 | 20.7 | 40.6 | 38.8 | 59.4 | 61 | 58 |
16 | 34.9 | 24.1 | 22.4 | 18.6 | 36 | 26.1 | 17.1 | 20.8 | 41 | 39.2 | 59 | 60 | 58 |
17 | 35.3 | 24 | 22.1 | 18.5 | 36.7 | 26.2 | 16.2 | 20.9 | 40.7 | 38.7 | 59.3 | 60.9 | 58 |
18 | 35.3 | 24.2 | 22.1 | 18.4 | 36.4 | 26.4 | 16.6 | 20.7 | 40.6 | 38.7 | 59.4 | 60.8 | 58 |
19 | 35.1 | 24.1 | 22.4 | 18.4 | 36.4 | 26.5 | 16.5 | 20.5 | 40.8 | 38.8 | 59.2 | 61 | 58 |
20 | 35 | 24.2 | 22.2 | 18.6 | 36.4 | 25.9 | 16.7 | 21 | 40.8 | 38.6 | 59.2 | 60.2 | 58 |
21 | 35.3 | 24.1 | 22.2 | 18.5 | 36.4 | 26.5 | 16.4 | 20.7 | 40.6 | 38.7 | 59.4 | 61 | 58 |
22 | 35.1 | 24.1 | 22.4 | 18.3 | 36.2 | 26.7 | 16.7 | 20.4 | 40.7 | 38.8 | 59.3 | 61 | 58 |
23 | 35.1 | 24.2 | 22 | 18.6 | 36.3 | 26.3 | 16.3 | 21.3 | 40.7 | 38.7 | 59.3 | 60.4 | 58 |
24 | 35.3 | 24.1 | 22.2 | 18.5 | 36.4 | 26.4 | 16.5 | 20.7 | 40.6 | 38.5 | 59.4 | 60.8 | 58 |
25 | 35.2 | 24 | 22.2 | 18.5 | 36.3 | 26.3 | 16.3 | 21.1 | 40.8 | 38.6 | 59.2 | 60.7 | 58 |
26 | 35.2 | 24 | 22.3 | 18.5 | 36 | 26.4 | 16.7 | 20.9 | 40.8 | 38.7 | 59.2 | 60.5 | 58 |
27 | 34.9 | 24 | 22.3 | 18.7 | 36.3 | 25.9 | 16.8 | 20.9 | 41 | 39 | 59 | 60.2 | 58 |
28 | 35.3 | 24 | 22.3 | 18.5 | 36.4 | 26.4 | 16.5 | 20.7 | 40.7 | 38.6 | 59.3 | 60.8 | 58 |
29 | 35 | 24.2 | 22.2 | 18.6 | 36.6 | 26.1 | 16.4 | 20.8 | 40.8 | 39 | 59.2 | 60.4 | 58 |
30 | 35.5 | 24 | 22.2 | 18.3 | 37 | 26.3 | 16.3 | 20.3 | 40.4 | 38.6 | 59.6 | 61.4 | 58 |
31 | 35.3 | 24 | 22.2 | 18.5 | 36.6 | 26.2 | 16.6 | 20.6 | 40.7 | 39.2 | 59.3 | 61.8 | 58 |
32 | 35.4 | 23.8 | 22.1 | 18.7 | 36.4 | 26 | 16.9 | 20.7 | 40.8 | 38.9 | 59.2 | 60.7 | 58 |
Relative synonymous codon usage (RSCU) [
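RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal Python sketch under that definition; it is not the in-house program used in this study, and the names `CODON_TABLE`, `synonymous_families`, and `rscu` are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_families():
    """Group the 59 degenerate sense codons by amino acid (Met, Trp, stops excluded)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "MW*":
            families[aa].append(codon)
    return dict(families)


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codons in synonymous_families().values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue                          # amino acid absent: RSCU undefined
        expected = total / len(codons)        # count under equal synonymous usage
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values
```

For example, `rscu("GCAGCAGCAGCC")` gives 3.0 for GCA and 1.0 for GCC, because each of the four alanine codons is expected once among the four alanine positions.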
The effective number of codons (ENC) estimates the extent of codon usage bias in a gene [
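As an illustration, ENC can be sketched in Python assuming Wright's (1990) formulation, in which a homozygosity value F is computed for each amino acid and averaged within each degeneracy class. Whether the study used exactly this variant is an assumption, and the function name `effective_number_of_codons` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(cds):
    """ENC of one in-frame coding sequence (sketch of Wright's 1990 formula)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    # Homozygosity F per amino acid, grouped by degeneracy class (2, 3, 4, 6).
    f_by_class = defaultdict(list)
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue                      # Met/Trp, or too few observations
        f = (n * sum((counts[c] / n) ** 2 for c in codons) - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if any(k not in mean_f for k in (2, 4, 6)):
        raise ValueError("sequence too short to estimate ENC")
    # Ile is the only 3-fold amino acid; if unusable, average F2 and F4.
    f3 = mean_f.get(3, (mean_f[2] + mean_f[4]) / 2)
    enc = 2 + 9 / mean_f[2] + 1 / f3 + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)                 # ENC is capped at 61
```

Under this formulation, a gene that uses a single codon for every amino acid scores 20, and a gene with no bias approaches 61, so values near the upper end indicate weak bias.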
Nucleotide composition plays a crucial role in the codon usage pattern in the genes because most of the indices of codon usage bias are based on the base composition of the genes. GC3 is the frequency of the nucleotides G+C at the synonymous 3rd positions of the codons excluding the
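The compositional parameters can be computed along the following lines. This is a minimal Python sketch; because the list of excluded codons is cut off in the sentence above, the default `excluded` tuple (Met, Trp, and the stop codons) is an assumption of the sketch, and `nucleotide_composition` is a hypothetical name.

```python
def nucleotide_composition(cds, excluded=("ATG", "TGG", "TAA", "TAG", "TGA")):
    """Return overall and third-position base percentages for one coding sequence.

    `excluded` lists codons left out of the third-position counts; the default
    is an assumed convention, not a value taken from the study.
    """
    cds = cds.upper().replace("U", "T")
    if not cds:
        raise ValueError("empty sequence")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    # Overall percentage of each base in the sequence.
    result = {b: 100 * cds.count(b) / len(cds) for b in "ATGC"}

    # Base percentages at the third position of the retained codons.
    third = [c[2] for c in codons if c not in excluded]
    if not third:
        raise ValueError("no codons left after exclusion")
    for b in "ATGC":
        result[b + "3"] = 100 * third.count(b) / len(third)

    result["GC"] = result["G"] + result["C"]
    result["AT"] = result["A"] + result["T"]
    result["GC3"] = result["G3"] + result["C3"]
    result["AT3"] = result["A3"] + result["T3"]
    return result
```

For the toy sequence `"ATGGCAGCC"`, the ATG codon is skipped at the third position, leaving A and C, so GC3 is 50%.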
Gene expressivity was measured by codon adaptation index (CAI) as given by Sharp and Li [
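In Sharp and Li's definition, CAI is the geometric mean of the relative adaptiveness values w of the codons in a gene, where w is the usage of a codon relative to the most used synonymous codon in a reference set of highly expressed genes. A minimal Python sketch follows; the reference set passed to `relative_adaptiveness` is hypothetical, since the reference used in this study is not stated here, and the pseudocount for unobserved codons is an assumption of the sketch.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame sequence into codons, ignoring a trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_cds_list, pseudocount=0.5):
    """Compute w for the 59 degenerate codons from a (hypothetical) reference set."""
    counts = Counter()
    for cds in reference_cds_list:
        counts.update(split_codons(cds))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "MW*":
            families[aa].append(codon)
    w = {}
    for codons in families.values():
        # Unobserved codons get a pseudocount so that CAI never collapses to zero.
        adjusted = {c: counts[c] or pseudocount for c in codons}
        top = max(adjusted.values())
        for c in codons:
            w[c] = adjusted[c] / top
    return w


def cai(cds, w):
    """Geometric mean of w over the degenerate codons of one coding sequence."""
    logs = [math.log(w[c]) for c in split_codons(cds) if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))
```

With a toy reference `["GCAGCAGCAGCC"]`, GCA receives w = 1 and GCC receives w = 1/3, so `cai("GCAGCC", w)` equals the square root of 1/3, about 0.577.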
The frequency of optimal codons (Fop), originally proposed by Ikemura in 1981, is one of the first estimators used in the study of codon usage bias. As an index, Fop shows the level of optimization of synonymous codon choice in each gene for the translation process [
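Fop is commonly computed as the number of optimal codons divided by the total number of synonymous (optimal plus non-optimal) codons in a gene. The sketch below assumes that form and takes the set of optimal codons as an input; the set shown in the usage note is purely hypothetical and not the one used in this study.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def fop(cds, optimal_codons):
    """Fraction of degenerate codons in `cds` that belong to `optimal_codons`."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Met, Trp, stop codons and unrecognized triplets carry no synonymous choice.
    scored = [c for c in codons if CODON_TABLE.get(c) not in (None, "M", "W", "*")]
    if not scored:
        raise ValueError("no synonymous codons to score")
    optimal = set(optimal_codons)
    return sum(c in optimal for c in scored) / len(scored)
```

For instance, `fop("GCAGCCGCAATG", {"GCA"})` scores three alanine codons, two of which are in the (hypothetical) optimal set, giving 2/3.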
The codon usage bias measures, namely, RSCU, ENC, GCs, Fop, and CAI for each coding sequence, were estimated in our study by using an in-house Perl program developed by SC.
The coding sequences were analyzed thoroughly for their nucleotide composition. Individual nucleotides as well as GC and AT content in three synonymous codon positions were estimated. The nucleotide composition in the analyzed genes is summarised in Table
The frequency of codons containing the dinucleotide TpA is much higher than that of codons containing the dinucleotide CpG. Four codons, namely, CGA, CGC, CGG, and CGT, out of the eight possible codons containing CpG, are absent in the analyzed gene; the frequencies of the remaining codons are also very low with the highest value of 9 for GCC. In contrast, most of the codons (5 out of 6) containing TpA showed higher frequency, with the highest value of 17 for GTA and the lowest of 6 for TTA. While three codons containing TpA are preferred, there are no preferred codons containing CpG.
The overall GC content in the dataset was found to be much lower than the overall AT content (40.7% and 59.3%, respectively). The suppression of GC content relative to AT content is also evident from the GC/AT content at the silent position. The overall GC3 was found to be low (39.0%) as against AT3 (60.7%) (Figure
Correlation between different nucleotide compositional parameters.
Parameter | A3% | T3% | G3% | C3% | GC3% | AT3%
---|---|---|---|---|---|---
A% | | | | | |
T% | | | | | |
G% | | | | | |
C% | | | | | |
GC% | | | | | |
AT% | | | | | |
** indicates that the correlation is significant at the 0.001 level.
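A correlation between two compositional parameters can be computed as sketched below. The type of correlation coefficient used in the study is not stated here, so Pearson's r is an assumption, and the significance test is not reproduced. The five data points are the GC% and GC3% values of accessions 1 to 5 from the nucleotide composition table.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# GC% and GC3% for accessions 1 to 5, copied from the composition table.
gc = [40.3, 40.9, 40.4, 40.9, 41.0]
gc3 = [38.4, 40.2, 38.3, 40.4, 40.4]
r = pearson_r(gc, gc3)
print(f"r(GC%, GC3%) over five accessions = {r:.3f}")
```

Running the same function over all 32 accessions and each pair of parameters would fill a correlation matrix of the shape shown in the table above.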
Comparison of AT and GC content at synonymous third codon positions in the genes under study. Clearly, AT3 is much higher than GC3 in all the accessions.
Previous studies have revealed that CpG underrepresentation is attributable to immunologic escape, since the host immune system uses unmethylated CpGs as a pathogen marker [
The general trend of the ENC values suggests the absence of strong codon bias in the hemagglutinin gene. The ENC values were consistently in the higher range, with an average value of
In an attempt to find out the nature of codon usage bias in the genes under study, the RSCU values of the 59 codons were analyzed (Table
Synonymous codon usage pattern in 32 coding sequences.
AA | Codon | RSCU* | Fop* | Frequency
---|---|---|---|---
Ala | GCA | | |
Ala | GCC | 1.01 | 0.26 | 9
Ala | GCG | 0.24 | 0.06 | 2
Ala | GCT | 0.62 | 0.16 | 5
Arg | CGT | 0.00 | 0.00 | 0
Arg | CGC | 0.00 | 0.00 | 0
Arg | CGA | 0.00 | 0.00 | 0
Arg | CGG | 0.05 | 0.01 | 0
Arg | AGA | | |
Arg | AGG | 1.32 | 0.22 | 4
Asn | AAT | | |
Asn | AAC | 0.69 | 0.34 | 14
Asp | GAT | 1.00 | 0.50 | 13
Asp | GAC | 1.00 | 0.50 | 13
Cys | TGT | | |
Cys | TGC | 0.81 | 0.40 | 6
Gln | CAA | 0.87 | 0.40 | 6
Gln | CAG | | |
Glu | GAA | | |
Glu | GAG | 0.66 | 0.31 | 11
Gly | GGT | 0.87 | 0.22 | 9
Gly | GGC | 0.51 | 0.12 | 5
Gly | GGA | 1.31 | 0.33 | 13
Gly | GGG | | |
His | CAT | | |
His | CAC | 0.92 | 0.46 | 7
Ile | ATT | | |
Ile | ATC | 0.52 | 0.17 | 6
Ile | ATA | 0.98 | 0.33 | 12
Leu | TTA | 0.77 | 0.12 | 6
Leu | TTG | 1.35 | 0.22 | 10
Leu | CTT | 0.12 | 0.02 | 1
Leu | CTC | 0.64 | 0.11 | 5
Leu | CTA | | |
Leu | CTG | 1.33 | 0.22 | 10
Lys | AAA | | |
Lys | AAG | 0.72 | 0.36 | 15
Phe | TTT | 0.86 | 0.44 | 9
Phe | TTC | | |
Pro | CCT | 0.23 | 0.06 | 1
Pro | CCC | 0.87 | 0.22 | 4
Pro | CCA | | |
Pro | CCG | 1.02 | 0.26 | 5
Ser | TCT | 1.16 | 0.19 | 9
Ser | TCC | 0.54 | 0.09 | 4
Ser | TCA | | |
Ser | TCG | 0.12 | 0.02 | 1
Ser | AGT | 0.87 | 0.14 | 7
Ser | AGC | 1.08 | 0.18 | 8
Thr | ACT | 1.00 | 0.25 | 9
Thr | ACC | 0.21 | 0.05 | 2
Thr | ACA | | |
Thr | ACG | 0.31 | 0.08 | 3
Tyr | TAT | 0.96 | 0.48 | 13
Tyr | TAC | | |
Val | GTT | 0.76 | 0.19 | 7
Val | GTC | 0.64 | 0.16 | 6
Val | GTA | | |
Val | GTG | 0.78 | 0.19 | 7
Note: *All values are mean values;
To identify possible under- and over-representation of codons, the RSCU values were sorted in ascending order. We observed that the majority of the codons, both preferred and non-preferred, fall into the unbiased or randomly used category (0.6 < RSCU < 1.6). Seven codons (GCA, AGA, CTA, TCA, ACA and GTA) showed very high RSCU values (RSCU > 1.6) and hence were considered to be “over-represented”. Similarly, there were ten under-represented codons (RSCU < 0.6) (Figure
Over- and underrepresented codons in the genes used in the study. The overrepresented codons (RSCU > 1.6) are shown in blue, while the underrepresented (RSCU < 0.6) ones are shown in red.
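The thresholding described above can be expressed as a short Python sketch. The cut-offs 0.6 and 1.6 are those given in the text; the function name `classify_rscu` is hypothetical, and values lying exactly on a cut-off are treated as unbiased, which is an assumption because the text does not cover that case. The sample values are taken from the synonymous codon usage table.

```python
def classify_rscu(rscu_values, low=0.6, high=1.6):
    """Sort codons by RSCU and split them into over, under and unbiased groups."""
    groups = {"over": [], "under": [], "unbiased": []}
    for codon, value in sorted(rscu_values.items(), key=lambda item: item[1]):
        if value > high:
            groups["over"].append(codon)
        elif value < low:
            groups["under"].append(codon)
        else:
            # Includes values exactly equal to a cut-off (assumed convention).
            groups["unbiased"].append(codon)
    return groups


# Mean RSCU values for a few codons, copied from the codon usage table.
table_values = {"GCC": 1.01, "GCG": 0.24, "GCT": 0.62,
                "CTT": 0.12, "TTG": 1.35, "ACC": 0.21}
groups = classify_rscu(table_values)
print(groups["under"])     # codons with RSCU < 0.6, lowest first
```

Of these six codons, CTT, ACC and GCG fall below 0.6, while GCT, GCC and TTG lie in the unbiased range and none exceeds 1.6.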
All the amino acids showed preference over a particular codon except
Frequency of the amino acid usage in the genes under study. Leucine and serine are clearly the most frequent amino acids.
Trend of RSCU and Fop values in the coding sequences of the genes.
Highly expressed genes tend to show a strong bias towards certain codons and to use those codons frequently. To detect such bias and predict the expression level of the genes, CAI values, which range from 0 to 1, were estimated. The CAI values for the hemagglutinin genes were found in the range of 0.3143–0.3447 with an average of 0.3829 and standard deviation of 0.0391, indicating that the codons are not translationally optimized for expression of these genes.
The frequency of optimal codons (Fop) in a gene can be used as an indicative measure to check if the codons are optimized for efficient translation [
Amidst much debate, mutational pressure and natural selection have been cited as the major stimulants in framing the codon usage profiles of different viruses [
Hemagglutinin constitutes one of the most important sites for the human immune system to act on, making it a potential drug target against this virus. Untangling the mechanisms underlying the synonymous codon usage profile of the virus may open new avenues for research on the development of antiviral drugs against this hazardous virus.
The authors did not receive any financial assistance from any source for the present study.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors are grateful to Assam University, Silchar, Assam, India, for providing the research facility.