Patterns of synonymous codon usage on human metapneumovirus and its influencing factors.

Human metapneumovirus (HMPV) is an important agent of acute respiratory tract infection in children, but its pathogenicity and molecular evolution remain poorly understood. Here we report, for the first time, the synonymous codon usage patterns of the HMPV genome. Relative synonymous codon usage (RSCU) values, effective number of codons (ENC) values, and nucleotide contents were calculated, and correlation analysis was performed, for 17 available whole genomes of HMPV covering different genotypes. All preferred codons in HMPV end with A or U, which is strongly associated with the high proportion of these two nucleotides in its genome. Mutation pressure rather than natural selection is the main factor determining synonymous codon usage bias in HMPV. A complementary pattern of codon usage bias between HMPV and human cells was observed, which suggests that the host cell might also be an important factor affecting codon usage bias. Moreover, the codon usage biases of the HMPV genotypes separate into different clades, which suggests that phylogenetic distance might also be involved in the formation of codon usage bias. These analyses of synonymous codon usage bias in HMPV provide more information for better understanding its evolution and pathogenicity.


Introduction
Human metapneumovirus (HMPV) is a negative-sense single-stranded RNA virus of the family Paramyxoviridae and is closely related to avian metapneumovirus (AMPV) subgroup C [1,2]. HMPV is an important aetiological agent of respiratory tract infection (RTI) in infants, the elderly, and immunocompromised individuals. Infection causes symptoms ranging from an influenza-like syndrome (i.e., fever, cough, and rhinorrhea) to severe lower respiratory tract infection. Previous studies have shown that many children are exposed to this virus and that reinfection is common [3][4][5]. HMPV is therefore becoming a major concern in paediatric viral respiratory tract infection. However, its pathogenicity is still unclear.
Genome sequencing and comparative analysis provide a useful approach for studying the pathogenicity of an organism, and may also shed light on its evolutionary history and cell-host interaction. As previously reported, the HMPV genome is approximately 13 kb in length, and the gene order from the 3′ terminus to the 5′ terminus is N-P-M-F-M2-1/M2-2-SH-G-L [6,7]. Comparative analysis suggests that its genomic organization is similar to that of human respiratory syncytial virus (HRSV), except that HMPV lacks the two nonstructural genes NS1 and NS2. Moreover, HMPV comprises two main genetic lineages, termed subtypes A and B, which contain the subgroups A1/A2 and B1/B2, respectively [2]. Genetic diversity analysis shows that the A2 sublineage exhibits the greatest diversity among all HMPV sublineages.
Synonymous codons do not occur with equal frequency in coding DNA, a phenomenon termed synonymous codon usage bias. Briefly, the standard genetic code has 64 different codons (61 codons encoding amino acids plus 3 stop codons) but only 20 different translated amino acids. Alternative codons for the same amino acid are termed synonymous codons. In general, codon usage variation may be the product of natural selection and/or mutation pressure for accurate and efficient translation in various organisms [8][9][10]. Synonymous codon usage bias in viruses can provide a better understanding of their evolution, gene expression, and virus-host interaction [11][12][13][14]. However, little is known about the codon usage pattern of the HMPV genome and the major factors influencing it. Here we present the first comparative analysis of synonymous codon usage in HMPV genomes and analyze its influencing factors. This study provides new insight into the pathogenicity and evolutionary history of HMPV.

Analysis of Codon Usage Pattern.
To investigate the characteristics of synonymous codon usage, relative synonymous codon usage (RSCU) values were calculated for each complete coding region in 17 HMPV genomes and 10 AMPV genomes [15]. The RSCU value of each codon for its amino acid was calculated as previously described [16]. A codon with an RSCU value greater than 1.0 has positive codon usage bias, while a codon with a value less than 1.0 has negative codon usage bias. An RSCU value close to 1.0 means that the codon is used about as often as expected if all synonymous codons for that amino acid were chosen equally and randomly. Codon usage data for human and bird cells were obtained from the online codon usage database (http://www.kazusa.or.jp/codon/) [17].
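The RSCU calculation can be illustrated with a minimal Python sketch. It assumes the usual definition, in which the observed count of a codon is divided by the mean count of all synonymous codons for the same amino acid; the text only cites [16] for the method, so the function name rscu, the codon-table construction, and the toy sequence are illustrative assumptions rather than the authors' code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for amino acids that occur in the coding sequence.

    RSCU = observed codon count / mean count of the synonymous codons.
    Met, Trp, and stop codons are skipped because they have no synonyms
    of interest; amino acids absent from the sequence are skipped as well.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))

    # Group sense codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if len(codons) < 2 or total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy example (hypothetical sequence): alanine encoded three times by GCA
# and once by GCU, so GCA -> 3.0, GCU (stored as GCT) -> 1.0, GCC/GCG -> 0.0.
example = rscu("GCAGCAGCAGCU")
```

Applied to a genome whose coding regions contain every amino acid, the returned dictionary has 59 entries, which matches the 59 codons with synonyms that are considered in this study.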
The effective number of codons (ENC) was used to measure the deviation of HMPV codon usage from the expected random usage; it is independent of hypotheses involving natural selection [18]. ENC values range from 20 to 61: the value is 20 if only one codon is used for each amino acid and 61 if all codons are used equally. In addition, the GC3s index was calculated as the fraction of G + C nucleotides at the synonymous third codon position (excluding AUG [Met], UGG [Trp], and the termination codons) [19].
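The two indices can be sketched in Python as follows. The GC3s function follows the definition given above. The ENC function assumes the widely used homozygosity-based formulation (ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of the k-fold degenerate amino acids); the text only cites [18], so this formula, the cap at 61, the error raised when a degeneracy class is missing, and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame codons; RNA letters (U) are converted to T."""
    cds = cds.upper().replace("U", "T")
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))


def gc3s(cds):
    """G + C fraction at the third position, excluding Met, Trp, and stops."""
    total = gc = 0
    for codon, n in codon_counts(cds).items():
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in "*MW":
            continue
        total += n
        if codon[2] in "GC":
            gc += n
    return gc / total if total else float("nan")


def enc(cds):
    """Effective number of codons (assumed homozygosity-based formulation)."""
    counts = codon_counts(cds)
    family_counts = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            family_counts[aa].append(counts[codon])

    homozygosity = defaultdict(list)  # degeneracy class -> list of F values
    for xs in family_counts.values():
        n, k = sum(xs), len(xs)
        if k > 1 and n > 1:
            f = (n * sum((x / n) ** 2 for x in xs) - 1) / (n - 1)
            homozygosity[k].append(f)

    nc = 2.0  # Met and Trp contribute one codon each
    # (degeneracy, number of amino acids with that degeneracy)
    for k, n_families in ((2, 9), (3, 1), (4, 5), (6, 3)):
        fs = homozygosity.get(k)
        if not fs or sum(fs) <= 0:
            raise ValueError(f"not enough data for {k}-fold degenerate families")
        nc += n_families / (sum(fs) / len(fs))
    return min(nc, 61.0)  # simplification: cap at the theoretical maximum
```

With a sequence that uses a single codon for every amino acid, enc returns 20; with perfectly uniform usage it returns 61, which reproduces the two limits stated in the text.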

Correspondence Analysis.
Multivariate statistical analysis can be used to explore the relationships between variables and samples [20]. In this study, correspondence analysis was used to investigate the major trends in codon usage variation among genomes. The complete coding region of each of the 17 HMPV genomes was represented as a 59-dimensional vector, in which each dimension corresponds to the RSCU value of one sense codon (excluding Met, Trp, and the termination codons).
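For readers unfamiliar with correspondence analysis, the standard computation can be summarized as follows. This is a generic textbook formulation, not a description of the software used in this study, and the symbols N, P, r, c, S, and F are introduced here only for illustration. Let N be the 17 × 59 matrix of RSCU values, with genomes in rows and codons in columns.

```latex
\begin{align}
P &= \frac{N}{\sum_{i,j} N_{ij}}, \qquad r = P\,\mathbf{1}, \qquad c = P^{\top}\mathbf{1}, \\
S &= D_r^{-1/2}\,\bigl(P - r\,c^{\top}\bigr)\,D_c^{-1/2} = U\,\Sigma\,V^{\top}, \\
F &= D_r^{-1/2}\,U\,\Sigma, \qquad
\text{share of axis } k = \frac{\sigma_k^{2}}{\sum_{l}\sigma_l^{2}}.
\end{align}
```

Here D_r and D_c are the diagonal matrices of the row masses r and column masses c, and the σ_k are the singular values on the diagonal of Σ. The first two columns of F give the coordinates of each genome on the first two axes, and the share of axis k is the fraction of the total inertia (variation) that the axis accounts for.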

Correlation Analysis.
Correlation analysis was used to identify the relationship between nucleotide composition and synonymous codon usage pattern [21]. The analysis was based on Spearman's rank correlation. All statistical analyses were carried out with the statistical software STATA 11.5 for Windows.
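As an illustration of the statistic itself, Spearman's rank correlation can be sketched in a few lines of Python. This is not the STATA procedure used in the study; the function names are hypothetical, ties receive the mean of their ranks, and no P value is computed.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Raises ZeroDivisionError if either variable is constant.
    return cov / (var_x * var_y) ** 0.5
```

In the setting of this study, x and y would be, for example, the U% and U3% values of the 17 genomes.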

Pattern of Synonymous Codon Usage on HMPV.
To investigate synonymous codon usage in HMPV, we calculated the RSCU values of all codons in 17 different strains, including different genotypes. As shown in Table 3, the preferred codons in HMPV are GCA, AGA, AAU, GAU, UGU, CAA, GAA, GGA, CAU, AUU, UUA, AAA, UUU, CCA, UCA, AGU, ACA, UAU, and GUU. Interestingly, all preferred codons in HMPV genomes end with A/U, and none ends with G/C. This result suggests that the HMPV genome has a strong synonymous codon usage bias, which might be closely associated with its nucleotide composition. Therefore, we analyzed the GC content of the HMPV genomes. As shown in Table 2, the G + C content of the HMPV genome is 36.91%, which is similar to that of other RNA viruses. Overall, 68.57% of codons in HMPV genomes end with A/U: 40.87% end with A (A3) and 27.7% end with U (U3). This high abundance of A/U nucleotides is consistent with the fact that all preferred codons end with A/U, and indicates that nucleotide composition is the main force affecting codon usage bias in the HMPV genome. Moreover, as shown in Table 2, the ENC values of the HMPV genomes range from 45.127 to 48.28, with a mean of 45.785 and an SD of 0.8458. The stable ENC values suggest that genomic composition is highly conserved among HMPV genomes.

Nucleotide Contents of HMPV Genomes.
Natural selection and mutation pressure are considered to be the two key factors affecting the codon usage patterns of organisms [22]. To investigate whether mutation pressure or natural selection is the determinative factor for codon usage variation in HMPV, we calculated the correlations between A%, U%, G%, C%, GC% and A3%, U3%, G3%, C3%, GC3%. As shown in Table 4, the nucleotide compositions show a complex pattern of correlations. In detail, U3% has a significant positive correlation with U% (r = 0.7821, P < 0.01) and negative correlations with C% (r = −0.7153, P < 0.01) and GC% (r = −0.7474, P < 0.01). C3% has positive correlations with C% (r = 0.8331, P < 0.01) and GC% (r = 0.7880, P < 0.01), and negative correlations with A% (r = −0.6199, P < 0.01) and G% (r = −0.5846, P < 0.05). GC3% has significant positive correlations with C% (r = 0.7178, P < 0.01) and GC% (r = 0.8052, P < 0.01), but a negative correlation with U% (r = −0.7342, P < 0.01). Interestingly, A3%, the most abundant third-position nucleotide, shows no correlation with any nucleotide content. In addition, GC3% correlates positively with C% but negatively with G%. We therefore calculated the GC3 skew using the formula GC3 skew = (C3 − G3)/(C3 + G3) [22]. The GC3 skew of HMPV ranges from 0.371 to −0.592, which indicates that GC composition is involved in codon usage bias. These data suggest that nucleotide constraint might play an important role in influencing synonymous codon usage bias.
Furthermore, correlation analysis between the first two principal axes (f1 and f2) in HMPV and its nucleotide contents was performed (Table 5). The first principal axis (f1) has a significant negative correlation with U3% and a negative correlation with GC3%. This result suggests that U3% and GC3% are the major factors influencing the synonymous codon usage pattern in the HMPV genome. Moreover, the second principal axis (f2) has a significant positive correlation with G3% and negative correlations with C3% and GC3%. Therefore, compositional constraint is a major factor involved in shaping the pattern of synonymous codon usage bias in the HMPV genome.
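The third-position quantities and the skew formula given above can be sketched in Python as follows. The sketch assumes that third positions are counted over every in-frame codon of a non-empty coding sequence; the text does not state whether Met, Trp, or stop codons were excluded, and the function names are hypothetical.

```python
from collections import Counter


def third_position_fractions(cds):
    """Fractions of A, U, G, and C at the third position of each codon."""
    cds = cds.upper().replace("T", "U")  # report in RNA letters, as in the text
    thirds = Counter(cds[i + 2] for i in range(0, len(cds) - len(cds) % 3, 3))
    total = sum(thirds[b] for b in "AUGC")  # ambiguous bases are ignored
    return {b + "3": thirds[b] / total for b in "AUGC"}


def gc3_skew(c3, g3):
    """GC3 skew as written in the text: (C3 - G3) / (C3 + G3)."""
    return (c3 - g3) / (c3 + g3)


# Toy example (hypothetical sequence): third positions are C, C, G, A,
# so C3 = 0.5, G3 = 0.25 and the skew is 1/3.
fractions = third_position_fractions("GCCGCCGCGAAA")
skew = gc3_skew(fractions["C3"], fractions["G3"])
```

With this sign convention, a positive skew means that C is more frequent than G at the third codon position, and a negative skew means the opposite.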

Phylogenetic Distance Effect on Synonymous Codon Usage.
To investigate the effect of different HMPV genotypes on synonymous codon usage, we analyzed the codon usage bias of the different genotypes by correspondence analysis. The first dimension (f1) and the second dimension (f2) account for 43.27% and 33.38% of the total variation, respectively. As shown in the correspondence analysis plot (Figure 1), the genotypes are largely separated and cluster into two clades. This implies that phylogenetic relationship has some effect on codon usage bias. However, the sublineages within each genotype did not differ significantly from one another, which might be due to the limited number of HMPV genomes available in the current study. Therefore, phylogenetic distance might affect the variation of synonymous codon usage in HMPV, and this difference might be reflected in biological properties such as viral replication and virulence.
Table 5: Summary of correlation between the first two principal axes and nucleotide contents in samples. Values in the table are the correlation coefficients (r) calculated from each correlation analysis. Abbreviations: NS, nonsignificant (P > 0.05); *, P < 0.05; **, P < 0.01.

Relationship between Codon Usage Pattern of HMPV and Its Host.
In the ENC-GC3% plot (Figure 2), the points for all HMPV genomes lie below the expected curve, and none lies above it. This result implies that mutation pressure is the major factor influencing codon usage [19]. Nevertheless, other factors can also affect the codon usage bias of HMPV.
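The expected curve in an ENC-GC3 plot is usually drawn from the relation ENC = 2 + s + 29/(s² + (1 − s)²), where s is the GC3s fraction. The text does not state the formula and only cites [19], so the following Python sketch assumes this commonly used relation; the function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is constrained by GC3s alone."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def lies_below_curve(observed_enc, gc3s):
    """True if a genome's observed ENC falls below the expected curve."""
    return observed_enc < expected_enc(gc3s)
```

Under this relation the curve peaks at 60.5 for s = 0.5 and falls to 31 at s = 0 or s = 1. Each genome is plotted as a point (GC3s, observed ENC) and compared with the curve at the same GC3s.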
In the current study, we compared the patterns of codon usage in HMPV and its human host. As shown in Table 3, the pattern of synonymous codon usage in HMPV is complementary to that of human cells. In detail, HMPV and human host cells share only 1 preferred codon (AGA, encoding arginine), while the other 17 preferred codons differ between them. As a reference, AMPVs were included in this study, and they also show a complementary pattern with their host, bird cells. Comparisons were made between HMPV and human host cells, between HMPV and bird cells (the AMPV host), and between HMPV and AMPV. HMPV shares more preferred codons with bird cells (6 preferred codons) than with its human host cells (only 1 preferred codon). This result shows that HMPV codon usage is more complementary to that of human cells than to that of bird cells, which might favour the survival and persistent infection of HMPV in the human host environment. This result also suggests that host factors play an important role in codon usage bias in HMPV. This complementary trend would benefit virus replication by avoiding competition with the host, and it might help us to understand the mechanism of persistent HMPV infection. Interestingly, HMPV shares more than 15 preferred codons with AMPV, which might be due to their close phylogenetic distance.
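The comparison of preferred codons can be sketched in Python as follows. The sketch assumes that the preferred codon of an amino acid is the synonymous codon with the highest RSCU value, which the text does not state explicitly. The function names are hypothetical, and the RSCU numbers are placeholders rather than values from Table 3.

```python
def preferred_codons(rscu_by_amino_acid):
    """Pick the codon with the highest RSCU value for each amino acid."""
    return {aa: max(codons, key=codons.get)
            for aa, codons in rscu_by_amino_acid.items()}


def shared_preferred(virus, host):
    """Amino acids for which virus and host prefer the same codon."""
    return {aa: codon for aa, codon in virus.items() if host.get(aa) == codon}


# Toy RSCU tables: the numbers are placeholders, not values from Table 3.
virus_rscu = {"Arg": {"AGA": 2.9, "CGC": 0.2}, "Ala": {"GCA": 1.8, "GCC": 0.6}}
host_rscu = {"Arg": {"AGA": 1.3, "CGC": 1.1}, "Ala": {"GCA": 0.9, "GCC": 1.6}}

shared = shared_preferred(preferred_codons(virus_rscu),
                          preferred_codons(host_rscu))
print(shared)  # {'Arg': 'AGA'}
```

The fewer entries the shared dictionary contains, the more complementary the two codon usage patterns are in the sense used here.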

Discussion
Synonymous codon usage analysis can reveal much about a virus genome. Understanding the extent and causes of codon usage bias is essential for studying viral evolution, particularly the interaction between viruses and the host immune response. In this study, we analyzed codon usage bias in HMPV and its influencing factors. The variation and evolution of viruses generally occur through changes in nucleotide composition [23]. Nucleotide composition bias is therefore expected to be the main force influencing synonymous codon usage patterns, and several lines of evidence support this statement for HMPV genomes. First, all preferred codons in HMPV end with A/U, the nucleotides that make up the majority of the HMPV genome. This confirms that nucleotide composition is the main force shaping the pattern of codon usage. Second, ENC, one of the best overall estimators of absolute synonymous codon usage bias [18], was used to quantify the codon usage bias. In this study, we observed that the ENC of these genomes fluctuated from 45.13 [8,24,25]. Moreover, the ENC of HMPV is closer to that of other respiratory infection agents, RSV and parainfluenza-3, which suggests that viruses with a similar extent of codon usage bias might cause similar infection syndromes. This observation raises the interesting hypothesis that the synonymous codon usage bias of a virus might be associated with its pathogenicity. Mutation pressure and natural selection are generally treated as the main factors that account for codon usage bias in different organisms [22]. The ENC-GC3 plot is considered part of the general strategy for investigating patterns of synonymous codon usage [9,18,26].
Here, all the points lie below the expected curve, suggesting that codon usage bias in all 17 HMPV genomes is principally influenced by mutation bias. This is consistent with reports that mutation pressure rather than natural selection is the most important determinant of codon usage in human RNA viruses [8,27-31]. This observation can be explained by the much higher mutation rates of RNA viruses compared with DNA viruses.
Mutation pressure is the main force shaping synonymous codon usage bias in RNA viruses. However, correspondence analysis revealed that codon usage bias in HMPV differs distinctly among phylogenetic types, which might suggest that codon usage bias plays an important role in the evolutionary history of HMPV. A similar phenomenon has been observed in several other viruses, which might indicate that phylogenetic difference is a common factor shaping codon usage bias [25,27,30-34]. This difference could potentially affect the rate of viral protein expression or the manner of viral replication. Therefore, we hypothesize that differences in codon usage bias might influence virulence in different genotypes.
In this study, we also observed, by comparing codon usage, that HMPV shows a complementary trend with human cells. This complementarity would benefit the survival of the virus, which can keep replicating by using codons not preferred by the host cell and thus without competition; this could be one of the mechanisms of persistent viral infection in the human environment. Moreover, this pattern might also result from long-term selection and coevolution between the virus and its human host. This characteristic is therefore important for keeping HMPV in balance with its host in terms of codon usage, and for understanding cell-host interaction and viral evolution.

Conclusions
In summary, we report for the first time the synonymous codon usage pattern in HMPV genomes and show that mutation pressure is the main force shaping its codon usage bias. Phylogenetic differences and host factors are also discussed, and this information can provide a better understanding of the molecular evolution and pathogenicity of HMPV.