Analysis of Synonymous Codon Usage Bias in Flaviviridae Virus

Background Flaviviridae viruses are single-stranded, positive-sense RNA viruses that pose a constant threat to humans and are transmitted by mosquitoes, ticks, and sandflies. Considering the recent increase in the prevalence of this virus family and its risk potential, we investigated its codon usage pattern to understand its evolutionary processes and to provide data useful for developing medications against most Flaviviridae viruses. Results The overall extent of codon usage bias in the 65 Flaviviridae viruses is low. The average GC content is 50.5%, with a maximum of 55.9% and a minimum of 40.2%. ENC values of Flaviviridae virus genes vary from 48.75 to 57.83, with a mean value of 55.56. U- and A-ended codons are preferred in the Flaviviridae viruses. Correlation analysis shows a significant positive correlation between ENC value and GC content at the third nucleotide position in this virus family. ENC analysis, neutrality plot analysis, and correlation analysis revealed that the codon usage bias of all the viruses was affected mainly by natural selection. Meanwhile, according to correspondence analysis (CoA) based on RSCU and phylogenetic analysis, the Flaviviridae viruses fall mainly into two groups, Group I (Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and others) and Group II (West Nile virus lineage 2, Japanese encephalitis virus, Usutu virus, Kedougou virus, and others). Conclusions Overall, the codon usage pattern is affected not only by compositional constraints but also by natural selection. Phylogenetic analysis also illustrates that codon usage bias can serve as an effective means of evolutionary classification in Flaviviridae viruses.


Introduction
All amino acids except methionine (Met) and tryptophan (Trp) are encoded by more than one synonymous codon. The unequal use of alternative synonymous codons is referred to as codon usage bias, and it arises through long-term accumulation. As an important evolutionary phenomenon, synonymous codon usage bias is well known to exist in a wide range of species, from prokaryotes to eukaryotes [1]. Compositional constraints and natural selection are thought to be the two main factors influencing codon usage variation among genes in different organisms [2,3]. Flaviviridae viruses, such as Zika virus, Dengue virus, Yellow fever virus, and Japanese encephalitis virus, are single-stranded, positive-sense RNA viruses that pose a constant threat to humans and are transmitted by mosquitoes, ticks, and sandflies. Because their hosts include both vertebrates and invertebrates, most Flaviviridae viruses are associated with human diseases. For example, Dengue virus, Japanese encephalitis virus, and Zika virus are transmitted by mosquitoes. Dengue virus comprises four serotypes (DENV1 to DENV4); infection may cause symptoms ranging from mild dengue fever to dengue hemorrhagic fever and even dengue shock syndrome [4], and stabilizing selection acts on its codon usage bias [5]. According to WHO reports, Japanese encephalitis virus caused a total of 27,059 cases during 2006 to 2009, of which 86% were from China and India; 20% to 30% of cases were fatal, and 30% to 50% of the survivors developed serious postinfection neurological sequelae. Japanese encephalitis virus has a low codon usage bias, influenced by both mutational pressure and natural selection [6]. Zika virus, which has caused numerous cases of microcephaly in Brazil, has been spreading rapidly to other parts of the world since 2015. Zika coding sequences show relatively conserved, genotype-specific evolution of codon usage bias [7].
Powassan virus, yellow fever virus, and spondweni virus are mediated by ticks. Powassan virus is a fatal, neurotropic virus, with a 671% rise in cases in the last 18 years, and it has become an emerging danger worldwide [8]. Yellow fever virus causes yellow fever, which is endemic in many African and South American countries [9]. Spondweni virus causes a self-limiting febrile illness characterized by headache, myalgia, nausea, and arthralgia, similar to Zika virus infection [10]. Codon usage patterns of some members of the Flaviviridae, such as Zika virus [7] and Dengue virus [5], have been studied, but the codon usage characteristics of the Flaviviridae viruses as a whole have not been reported to date. Considering the recent increase in the prevalence of this virus family and its risk potential, we investigated its codon usage pattern to understand its evolutionary processes and to provide data useful for developing medications against Flaviviridae viruses.

Genetic Material.
The complete sequences of 65 Flaviviridae viruses were downloaded from NCBI (http://www.ncbi.nlm.nih.gov), and detailed information about the viruses is listed in Table 1. The ORFs of the viruses were identified with DNAStar.

Nucleotide Composition Analysis.
The following compositional properties were calculated for the coding sequences of the Flaviviridae virus genomes: (i) overall GC content; (ii) overall frequency of nucleotides (A%, C%, U%, and G%); (iii) frequency of each nucleotide at the third site of the synonymous codons (A3S%, C3S%, U3S%, and G3S%); (iv) frequency of nucleotides G + C at the third synonymous codon positions (GC3S%); (v) frequency of nucleotides G + C at the third codon positions (GC3) and the mean frequency of G + C at the first and second positions (GC12). The codons AUG and UGG are the only codons for methionine and tryptophan, respectively, and the termination codons UAA, UAG, and UGA do not encode any amino acid. Therefore, these five codons were excluded from the analysis. Nucleotide composition was calculated using the program CodonW 1.4.2 [11].
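As an illustration of properties (i), (ii), (iv), and (v), the following Python sketch computes them for a single coding sequence. It is not the CodonW or EMBOSS implementation: the function names `split_codons` and `composition` and the toy sequence are assumptions made here, GC12 and GC3 are taken over all in-frame codons, and the per-base indices A3S%, C3S%, U3S%, and G3S% are not reproduced.

```python
from collections import Counter

# Codons excluded in the text: Met (AUG), Trp (UGG), and the three stop codons.
EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def split_codons(cds):
    """Return the in-frame codons of a coding sequence as RNA triplets."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def composition(cds):
    """Return (overall base fractions incl. GC, GC3S, GC12, GC3) as fractions."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGU")
    overall = {b: counts[b] / total for b in "ACGU"}
    overall["GC"] = overall["G"] + overall["C"]

    codons = split_codons(cds)

    # GC3S: G + C at the third position, over codons that have synonyms
    # (Met, Trp, and stop codons are left out).
    third = Counter(c[2] for c in codons if c not in EXCLUDED)
    n_syn = sum(third[b] for b in "ACGU")
    gc3s = (third["G"] + third["C"]) / n_syn

    # GC12 and GC3 over all in-frame codons (assumption of this sketch).
    def gc_at(pos):
        return sum(c[pos] in "GC" for c in codons) / len(codons)

    gc12 = (gc_at(0) + gc_at(1)) / 2
    gc3 = gc_at(2)
    return overall, gc3s, gc12, gc3


# Toy sequence, for illustration only (not one of the 65 genomes).
overall, gc3s, gc12, gc3 = composition("ATGGCCAAATGA")
```

Applied to each of the 65 coding sequences, the returned values would correspond to the GC%, GC3S%, GC12, and GC3 columns used in the later analyses, up to the differences in definition noted above.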

Effective Number of Codons (ENC) Analysis.
ENC analysis was used to quantify the extent of codon usage bias in the viral coding sequences, regardless of the length of a given gene and the number of amino acids. ENC values range from 20 to 61; the larger the value, the weaker the codon preference. An ENC of 20 indicates that only one synonymous codon is used for each amino acid, and a value of 61 means that all synonymous codons are used equally for each amino acid. Generally, a coding sequence is considered to have significant codon bias when its ENC value is less than or equal to 35 [7].
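The two limits quoted above (20 and 61) can be checked with a short sketch of the commonly used ENC estimator based on codon homozygosity. This is an illustrative implementation written for this purpose, not the program used in the study; the function name `enc`, the rule for a missing isoleucine class, and the cap at 61 are assumptions of the sketch.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in UCAG order ('*' marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of amino acids in each degeneracy class (2-, 3-, 4-, and 6-fold).
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def enc(cds):
    """Estimate the effective number of codons of one coding sequence."""
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    # Homozygosity F of each amino acid, collected by degeneracy class.
    f_by_class = defaultdict(list)
    for syn in families.values():
        k = len(syn)
        n = sum(counts[c] for c in syn)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        squares = sum((counts[c] / n) ** 2 for c in syn)
        f = (n * squares - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # If isoleucine (the only 3-fold class) is unusable, average F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in CLASS_SIZES):
        raise ValueError("sequence too short to estimate ENC")

    value = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(value, 61.0)  # values above 61 are reported as 61
```

A sequence that uses a single codon for every amino acid gives 2 + 9 + 1 + 5 + 3 = 20, and a sequence that uses all sense codons equally reaches the cap of 61, matching the range stated in the text.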

ENC-Plot Analysis.
To determine the major factors affecting codon usage bias, an ENC-plot was constructed with the ENC values plotted against the GC3S values. If the points lie on or around the standard curve, the codon usage of the given genes is constrained only by mutational pressure. Otherwise, the codon usage pattern is influenced by other factors, such as natural selection. The standard ENC values were calculated using the equation [12]:

\[ \mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}} \]

where \(s\) represents the given (G+C)3S% value.

Neutrality Plot Analysis.
The neutrality plot, also called neutral evolution analysis, is used to compare the influences of mutation pressure and natural selection on the codon usage patterns of the viral coding sequences by plotting the GC12 values of the synonymous codons against the GC3 values [7]. The GC12 and GC3 values of the Flaviviridae viruses were calculated with the EMBOSS CUSP program and then subjected to neutrality plot analysis.
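Both plots reduce to short calculations. The Python sketch below evaluates the expected ENC curve for a given GC3S fraction and fits the least-squares line of GC12 on GC3 used in a neutrality plot. The function names and the four data points are hypothetical and serve only to show the arithmetic; they are not values from Table 1 or Table 2.

```python
def expected_enc(gc3s):
    """Expected ENC under mutational pressure alone; gc3s is a fraction in [0, 1]."""
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x).

    Returns (slope, intercept, r_squared).
    """
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values, for illustration only.
gc3_values = [0.40, 0.45, 0.50, 0.55]
gc12_values = [0.480, 0.483, 0.486, 0.489]
slope, intercept, r_squared = neutrality_regression(gc3_values, gc12_values)

# In a neutrality plot the slope is read as the relative contribution of
# mutation pressure, and 1 - slope as that of natural selection.
mutation_share = slope
selection_share = 1 - slope
```

A point whose observed ENC lies well below `expected_enc` at its GC3S value is one for which factors other than mutational pressure are inferred; at GC3S = 0.5 the expected value is 60.5.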

Relative Synonymous Codon Usage (RSCU) Analysis.
The RSCU values of the coding sequences were analyzed to characterize the synonymous codon usage pattern independently of amino acid composition and coding region size, following a described method [7]. The RSCU values were calculated as follows:

\[ \mathrm{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}} \]

where \(x_{ij}\) represents the number of occurrences of the \(j\)th codon for the \(i\)th amino acid and \(n_i\) represents the degeneracy of that amino acid, that is, its number of synonymous codons, which ranges from 1 to 6.
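The RSCU calculation can be sketched in Python as follows. The function name `rscu` and the toy sequence are assumptions made for illustration; the study itself obtained RSCU values with the software named in the text. Codons of amino acids that do not occur in the sequence are returned as NaN, which is a choice of this sketch.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in UCAG order ('*' marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return RSCU values for the 59 codons that have synonyms."""
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Synonymous families, excluding stop codons, Met (M), and Trp (W).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)

    values = {}
    for syn in families.values():
        total = sum(counts[c] for c in syn)
        for c in syn:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(syn) / total if total else float("nan")
    return values


# Toy example: alanine encoded three times by GCU and once by GCC.
toy = rscu("GCUGCUGCUGCC")
```

In the toy example GCU is used three times as often as expected under equal usage of the four alanine codons, so its RSCU is 3.0, while GCC has an RSCU of 1.0 and the unused codons GCA and GCG have 0. The 59 values per sequence form the vector used in the correspondence analysis.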

Correspondence Analysis.
Correspondence analysis (CoA) is an effective method for identifying the major trends in codon usage patterns among viral coding sequences [5]. Each coding region was represented as a 59-dimensional vector corresponding to the RSCU value of each synonymous codon (excluding AUG, UGG, and the stop codons). In this research, the CoA of the Flaviviridae viruses was performed with CodonW.

Correlation Analysis.
Correlation analysis was carried out with the statistical software SPSS 22 to identify the factors influencing synonymous codon usage patterns [7]. The parameters of the viruses were obtained from the EMBOSS CUSP program and CodonW.

Phylogenetic Analysis.
The evolutionary processes of viruses significantly influence their codon usage patterns [13]. To determine the evolutionary relationships between the viruses, phylogenetic analysis based on the nucleotide sequences of their coding regions was performed using MEGA7 software.

Nucleotide Composition of 65 Flaviviridae Viruses.
The nucleotide content of the 65 Flaviviridae coding sequences was calculated. The A%, U%, G%, C%, and GC% values were 27.03 ± 0.0236 (mean ± SD), 22.88 ± 0.0192, 28.49 ± 0.0253, 21.48 ± 0.0163, and 50.53 ± 0.0323, respectively. To gain insight into the potential role of nucleotide composition in shaping the codon usage pattern, the base contents at the third codon position were also calculated: A3S%, U3S%, G3S%, C3S%, and GC3S% were 33.11 ± 0.0405 (mean ± SD), 34.54 ± 0.0253, 27.01 ± 0.0104, 29.14 ± 0.0275, and 44.83 ± 0.0508, respectively. U3S% was distinctly the highest and G3S% the lowest of the third-position base contents (Table 2). Calculated relative to human codon usage, the codon adaptation index (CAI) values of the Flaviviridae viruses range from 0.673 to 0.740, with an average of 0.714 and an SD of 0.0163 (Table 1).

ENC-GC3S Plot Analysis.
The mean ENC value of the viruses was 54.58, the highest was 57.83, and the lowest was 48.75; the ENC values of 61 viruses were greater than 50, and those of 4 viruses were less than 50 (Table 2). This indicates that codon usage bias in Flaviviridae viruses is low. To investigate the factors affecting Flaviviridae virus codon usage bias, the ENC values were plotted against the GC3S values. In the ENC versus GC3S graph, the curve represents the ENC values expected if mutation were the only factor, and the points represent the actual ENC values of the coding sequences of the Flaviviridae viruses (Figure 1). In the ENC-GC3S plot, all the viruses clustered together below the expected ENC curve, which indicates that, in addition to mutation pressure, other factors, such as translational selection, also influence the codon usage pattern of Flaviviridae virus coding sequences.

... [14]. In other words, Flaviviridae viruses prefer A/U-ended codons (Figure 2). We performed CoA on the RSCU values, which revealed that the first, second, third, and fourth axes accounted for 50.68%, 9.16%, 3.51%, and 1.63% of the total variation, respectively. Thus, the codon usage bias could be explained mainly by the first and second axes, whose values were plotted to show the distribution of synonymous codon usage patterns. Each point represents a virus, and the closer two points are, the more similar the codon usage patterns of the corresponding viruses. As shown in Figure 3, Flaviviridae viruses can be divided into two groups plus a set of remaining viruses: Group A includes Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and Wesselsbron virus, and Group B includes West Nile virus

Neutrality Plot Analysis.
In the neutrality plot analysis (Figure 4), a significant positive correlation was observed between the GC12 and GC3 values of Flaviviridae viruses (r² = 0.06). The slope of the regression line was 0.062, indicating that mutation pressure accounts for 6.2% of the influence on the codon usage pattern and natural selection for the remaining 93.8% (1 − 0.062 = 0.938). This demonstrates the dominant influence of natural selection [15]. In addition, these viruses can be grouped into two clusters, Group A (Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and others) and Group B (West Nile virus lineage 2, Japanese encephalitis virus, Usutu virus, Kedougou virus, and others), which is similar to the result of the RSCU analysis.

Correlation Analysis.
As shown in Table 4, the ENC values had significant correlations with A%, C%, G%, A3S%, C3S%, and GC3S% in Flaviviridae viruses. Additionally, GC3S% had a significant correlation with GC%. These data suggest that nucleotide constraint influences synonymous codon usage. ENC values have significant negative correlations with Gravy and Aroma. In addition, U3S%, G3S%, C3S%, and GC3S% have significant negative correlations with Gravy values, and A3S% has a significant negative correlation with Aroma values. These results indicate that natural selection also influenced codon usage bias, along with mutational pressure.
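The pairwise correlations summarized above can be illustrated with a Pearson coefficient computed in plain Python. This is only a sketch: the text does not state which correlation coefficient was used in SPSS, so the choice of Pearson's r is an assumption, as are the function name `pearson_r` and the example values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-virus values, for illustration only.
enc_values = [52.1, 54.3, 55.0, 56.8]
gc3s_values = [0.41, 0.44, 0.46, 0.49]
r = pearson_r(enc_values, gc3s_values)
```

A positive `r` would correspond to ENC rising with GC3S%, and a negative `r` to the kind of relationship reported between ENC and the Gravy and Aroma values; significance testing is left to the statistical package.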

Phylogenetic Analysis of Flaviviridae Viruses.
To evaluate the effects of evolutionary processes on codon usage patterns, phylogenetic analysis was carried out. The results show that the 65 Flaviviridae viruses can be divided into two groups (Figure 5), Group I and Group II. Group I includes Kedougou virus, Louping ill virus, West Nile virus lineage 2, and Yaounde virus, and their GC3S content varies over a limited range (0.364 ≤ GC3S ≤ 0.582). Group II includes Omsk hemorrhagic fever virus, Alkhurma virus, Tick-borne encephalitis virus, and Spanish goat encephalitis virus, and the range of their GC3S content is narrower still (0.345 ≤ GC3S ≤ 0.454). These results suggest that the more closely related the viruses are evolutionarily, the more similar their codon usage bias.

Discussion
The study of codon usage patterns of viruses can reveal useful information about overall viral survival, fitness, and evolution [6]. In this research, the majority of Flaviviridae viruses have a weak codon bias, with a mean ENC value of 54.58. This is in accordance with earlier studies on Tembusu virus and West Nile virus, which have a low codon usage bias [16][17][18]. According to the CodonW results (Table 2), A and G are the most abundant nucleotides, and RSCU analysis indicates that Flaviviridae viruses prefer A/U-ended codons.
Comparing with other RNA viruses, such as polioviruses, H5N1 influenza virus, and SARS-CoVs, with mean ENC values of 53.75, 50.91, and 48.99, respectively [19][20][21], we conjecture that weak codon bias in RNA viruses is advantageous for efficient replication in host cells [22]. As the ENC-GC3S plot analysis shows, mutational pressure and other factors shaped the codon usage patterns of Flaviviridae viruses, which is similar to hepatitis C virus [22]. In fact, Hongju et al. have previously reported that the codon usage bias of ZIKV is weak and that its patterns are influenced not only by mutation pressure but also by translational selection, aromaticity, and hydrophobicity [14]. Although previous studies on Zika virus [14,23] observed greater frequencies of A3S/G3S than U3S, some viruses showed the contrary pattern; for example, in Aedes flavivirus U3S% was 0.2994 and G3S% was 0.279, and in Alkhurma virus U3S% was 0.3617 and G3S% was 0.2773. Taking all results together, U3S% was higher overall and G3S% was the lowest. Since Flaviviridae viruses prefer A/U-ended codons and A3S% has a remarkable correlation with ENC (Table 3), we think that the compositional constraint shaping the synonymous codon bias comes from the content of nucleotides A and U at the third codon position. This result differs from many reports in which the compositional constraints influencing codon usage bias come from G and C contents (Zhou et al. 2004) [20,24]. In addition, the correlations of both Gravy values and Aroma values with ENC values are significant, which indicates the role of natural selection in shaping the codon usage patterns of the Flaviviridae viruses [6]. Moreover, the codon usage patterns of this family were influenced by natural selection, which accounts for 93.8%, and by mutation pressure, which accounts for 6.2% (Figure 4).
In the CoA-RSCU analysis, the Flaviviridae viruses can be divided into two groups plus a set of remaining viruses. Viruses that have similar codon usage patterns cluster together. This is similar to the results of the neutrality plot analysis and the phylogenetic tree. Overall, Yellow fever virus, Apoi virus, Tembusu virus, and Dengue virus 1 always clustered together.
In summary, combining the nucleotide composition analysis, ENC-plot analysis, and correlation analysis, it is clear that both mutation pressure and natural selection influence the codon usage patterns of Flaviviridae viruses. In addition, most of the Flaviviridae viruses can be classified into two categories according to the findings of the CoA-RSCU, neutrality plot, and phylogenetic analyses. Codon usage patterns were similar between different virus species in the same group.

Conclusion
In this study, the majority of Flaviviridae viruses were found to have a weak codon usage bias, which helps them adapt to diverse hosts or varied environments. The Flaviviridae viruses can also be classified into two groups according to their codon usage patterns. Their codon usage patterns were influenced by natural selection, which accounts for 93.8%, and by mutation pressure, which accounts for 6.2%. The information from this research may not only help to understand the evolution of Flaviviridae viruses but also have potential value for developing vaccines against them.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no competing interests.