The Effects of the Context-Dependent Codon Usage Bias on the Structure of the nsp1α of Porcine Reproductive and Respiratory Syndrome Virus

The crystal structure of the porcine reproductive and respiratory syndrome virus (PRRSV) leader protease nsp1α makes it possible to analyze the roles of pig tRNA abundance and of the codon usage of the nsp1α gene in the formation of this protease. Using the structural information for this protease from the Protein Data Bank (PDB: 3IFU) and the nsp1α genes of 191 PRRSV strains, we analyzed how pig tRNA abundance, synonymous codon usage, and context-dependent codon bias (CDCB) of the nsp1α gene shape the specific folding units (α-helix, β-strand, and coil) of nsp1α. By mapping the overall tRNA abundance along the nsp1α gene, we found no link between the fluctuation of the overall tRNA abundance and the specific folding units in nsp1α, and we found that a low ribosome translation speed caused by tRNA abundance occurs in the nsp1α gene. Some synonymous codons are strongly correlated with specific folding units in nsp1α, and CDCB occurs within these folding units. These findings provide insight into the roles of synonymous codon usage and CDCB in the formation of the PRRSV nsp1α structure.

The replicative enzymes of PRRSV are encoded in ORF1a and ORF1b, which are located in the 5′-proximal three quarters of the viral genome. The two polyproteins encoded by ORF1a and ORF1b are cleaved extensively by nonstructural protein 4 (nsp4), which derives from ORF1a, yielding a series of nonstructural proteins [9]. In particular, the nsp1 and nsp2 proteases release themselves from the ORF1a polyprotein first, and nsp1 can be further processed into two multifunctional proteases, namely, nsp1α and nsp1β [10,11]. The arterivirus nsp1 region contains a tandem of papain-like autoprotease domains (PCPα and PCPβ), and these domains were found to be active in reticulocyte lysates and in E. coli systems [12,13]. This feature might indicate that the activities of PCPα and PCPβ do not depend on the type of expression system but on their own correct folding. The nsp1α plays an important role in regulating the accumulation of both genome- and subgenome-length minus-strand RNA, thereby fine-tuning the relative abundance of each viral mRNA in infected cells [10,14,15]. The correct secondary structure of nsp1α is required for the biological functions of the protease. The crystal structure of nsp1α shows that this nonstructural protein has three domains, namely, the N-terminal zinc finger (ZF) domain, the papain-like cysteine protease domain, and the carboxyl-terminal extension [16]. Recently, a role of nsp1α in impairing the host immune response has been reported [17]; however, little information about the relationship between synonymous codon usage and the secondary structure of the PRRSV nsp1α is available to date.
The synonymous codon usage and translational speed of a gene are associated with many biological features, such as translation efficiency, genetic diversity, amino acid conservation, transfer RNA abundance, coevolution of a virus and its hosts, and context-dependent codon bias (CDCB) [18–22]. The nucleotide composition of a coding sequence (CDS) is nonrandom, and this nonrandomness is influenced by preferences among the synonymous codons that encode the same amino acid (termed synonymous codon usage bias, SCUB). The link between SCUB and the specific folding units of a protein gives new insight into the correct formation of protein secondary structure [23–26]. mRNA sequences generally have additional potential to carry structural information in the form of SCUB, which can involve a single codon or the nucleotide context of the target coding sequence [27,28]. Within SCUB, the nucleotides flanking a codon influence which codon of a synonymous family is used, a phenomenon termed context-dependent codon bias (CDCB) [20,29–31]. It has been reported that the most important nucleotide determining CDCB is the first nucleotide after a codon, termed the N1 context [32]. Although several lines of evidence indicate a link between SCUB and the formation of specific folding units of viral proteins, little has been reported to date about the role of CDCB in the formation of specific folding units. In this study, we used the structural information for the PRRSV nsp1α and several simple formulas to analyze the relationship between the CDCB of the PRRSV nsp1α gene and the protease.

Information of PRRSV Gene and Structure of the nsp1α.
The 191 coding sequences of PRRSV containing the nsp1α gene were downloaded from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/Genbank/), and the accession numbers of the sequences are listed in Table S1, available online at http://dx.doi.org/10.1155/2014/765320. To investigate the SCUB of nsp1α, the nsp1α genes were extracted from these 191 coding sequences by multiple sequence alignment performed with the Clustal W (1.7) program [33]. Information about the secondary structure of the PRRSV nsp1α was obtained from the Protein Data Bank (PDB: 3IFU).

Analysis of the Overall tRNA Abundance of Each Codon Position along the nsp1α Gene.
To identify the translational selection caused by the various tRNA copy numbers (reflecting tRNA abundance) of pigs (http://gtrnadb.ucsc.edu/) at each codon position in the PRRSV nsp1α gene, we devised an index (C value) representing the overall tRNA abundance for a particular codon position in a target gene. Consider

$$C = \frac{1}{N}\sum_{n=1}^{N}\frac{t_{ij}}{t_{i,\max}},$$

where the C value indicates the overall tRNA abundance for a particular codon position in the target gene, $t_{ij}$ represents the tRNA copy number of the synonymous codon ($j$) used at that position for the corresponding amino acid ($i$), $t_{i,\max}$ represents the optimal tRNA copy number of a synonymous codon for the same amino acid, and $N$ is the number of genes analyzed. The C value ranges from 0 to 1.0. A C value below 0.3 for a codon position represents low tRNA abundance, and a C value above 0.7 represents high tRNA abundance.
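The index described above can be sketched in code. This is a minimal illustration, assuming that the C value at a codon position is the mean, over the aligned genes, of the ratio between the tRNA copy number for the codon used and the highest copy number among its synonymous codons. The function names (`c_values`, `classify`) and the copy numbers in the usage note are hypothetical and are not pig tRNA data; only the 0.3 and 0.7 thresholds come from the text.

```python
# Standard genetic code, built from the usual UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def c_values(aligned_genes, trna_copies):
    """Return one C value per codon position.

    aligned_genes: list of genes, each a list of RNA codons of equal length.
    trna_copies: dict mapping codon -> tRNA copy number (assumed input format).
    """
    n_positions = min(len(gene) for gene in aligned_genes)
    values = []
    for pos in range(n_positions):
        total = 0.0
        for gene in aligned_genes:
            codon = gene[pos]
            amino_acid = CODON_TO_AA[codon]
            # Highest copy number among the synonymous codons of this amino acid.
            best = max(
                trna_copies.get(c, 0)
                for c, aa in CODON_TO_AA.items()
                if aa == amino_acid
            )
            if best > 0:
                total += trna_copies.get(codon, 0) / best
        values.append(total / len(aligned_genes))
    return values


def classify(c_value):
    """Label a position using the thresholds stated in the text."""
    if c_value < 0.3:
        return "low"
    if c_value > 0.7:
        return "high"
    return "intermediate"
```

For example, with two toy genes `[["CUG", "AAA"], ["CUA", "AAG"]]` and hypothetical copy numbers `{"CUG": 10, "CUA": 2, "AAA": 4, "AAG": 8}`, the first position averages the ratios 10/10 and 2/10, and the second averages 4/8 and 8/8. A real analysis would also need to decide how wobble pairing maps tRNA genes to codons, which this sketch ignores.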

Estimation of the Relationship between the Synonymous Codon Usage Bias and the Secondary Structure of the nsp1α.
Based on the alignment between the amino acid sequence of the PRRSV nsp1α structure (PDB: 3IFU) and the 191 nsp1α genes involved in this study, we located the different folding units in the target protein. We devised a formula for a value describing codon preference in each folding unit, based on previous research that analyzed the relationship between codon usage bias and the structure of the target protein [25]. The formula uses the following quantities: the number of occurrences of a specific synonymous codon for the corresponding amino acid in a specific folding unit (the α-helix, the β-strand, or the coil) of the protein; the number of occurrences of that amino acid in the corresponding folding unit; the total number of amino acids in the specific folding unit, obtained by summation; and the total number of codons in the target genes. The folding units are of three kinds, namely, α-helix, β-strand, and coil. When the value is more than zero, the corresponding synonymous codon has the potential to be selected in the specific folding unit. When the value is less than zero, the synonymous codon has no tendency to be chosen in the specific folding unit. Furthermore, we defined that when the value is more than 0.1, the synonymous codon has a strong tendency to exist in the specific folding unit; conversely, when the value is less than −0.1, the synonymous codon has a strong tendency to avoid the specific folding unit.

Calculation of the Relative Abundance of Codons with Context.
To identify the synonymous codons that play an important role in the formation of the specific folding units, codons with a significant tendency to exist in a specific folding unit of the PRRSV nsp1α were analyzed using the formula for the relative abundance of codons with context. Berg and Silva [32] defined the N1 context as the first nucleotide after the target codon. Following this notation, we defined a corresponding context for the last nucleotide before the target codon (the preceding-nucleotide context). We devised formulas for the relative abundance ($R$ value) of a codon with the N1 context ($XYZ\sim N$) and with the preceding-nucleotide context ($N\sim XYZ$), following the formula previously reported [20,34]. Consider

$$R(XYZ\sim N) = \frac{F(XYZ\sim N)}{F(XYZ)\,F(N)}, \qquad R(N\sim XYZ) = \frac{F(N\sim XYZ)}{F(N)\,F(XYZ)},$$

where $F(XYZ)$ is the frequency of the codon and $F(N)$ is the frequency of the nucleotide in the N1 or the preceding-nucleotide context. $F(XYZ\sim N)$ and $F(N\sim XYZ)$ are the frequencies of a codon with its context. Here $X$, $Y$, $Z$, and $N$ are nucleotides (A, U, G, or C), and the codon is composed of $XYZ$. Here and elsewhere the tilde character (∼) separates codons (italic) or oligonucleotides (nonunderlined) from their mononucleotide context.
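The relative abundance of a codon with its single-nucleotide context can be illustrated with a short sketch. It assumes the ratio form used for context-dependent codon bias, in which the observed frequency of the codon-context pair is divided by the product of the codon frequency and the context-nucleotide frequency. The function name, the `side` parameter, the ASCII `~` in the keys, and the choice to count frequencies only over codons that actually have a neighbouring nucleotide are assumptions made for this example.

```python
from collections import Counter


def context_relative_abundance(sequences, side="after"):
    """Relative abundance of each codon with a one-nucleotide context.

    sequences: iterable of in-frame coding sequences (DNA or RNA letters).
    side: "after" for the nucleotide following the codon,
          "before" for the nucleotide preceding it.
    Returns a dict keyed "XYZ~N" (after) or "N~XYZ" (before).
    """
    codon_counts = Counter()
    nucleotide_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for start in range(0, usable, 3):
            codon = seq[start:start + 3]
            idx = start + 3 if side == "after" else start - 1
            if idx < 0 or idx >= len(seq):
                continue  # first or last codon has no context on this side
            nucleotide = seq[idx]
            codon_counts[codon] += 1
            nucleotide_counts[nucleotide] += 1
            pair_counts[(codon, nucleotide)] += 1
    total = sum(pair_counts.values())
    result = {}
    for (codon, nucleotide), count in pair_counts.items():
        # (count/total) / ((codon/total) * (nucleotide/total)), simplified.
        value = count * total / (codon_counts[codon] * nucleotide_counts[nucleotide])
        key = f"{codon}~{nucleotide}" if side == "after" else f"{nucleotide}~{codon}"
        result[key] = value
    return result
```

A value above 1 means that the codon-context pair occurs more often than expected if codon and context nucleotide were chosen independently. To obtain values per folding unit, the same counting would be restricted to codons that fall inside that unit.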

Calculation of the Relative Abundance of Mononucleotides and Dinucleotides in the nsp1α Gene.
To investigate whether the N1 and preceding-nucleotide contexts are shaped by randomness, we calculated the frequencies of each nucleotide, $F(Z)$, and each dinucleotide, $F(YZ)$, where $X$, $Y$, $Z$, and $N$ are each one of the four nucleotides (A, U, G, and C). We then calculated the relative abundances ($R$ value) of mononucleotides and dinucleotides with a single-nucleotide context, for example,

$$R(YZ\sim N) = \frac{F(YZ\sim N)}{F(YZ)\,F(N)}$$

for dinucleotide $YZ$ with context $N$.

Statistical Analysis.
One-way analysis of variance (one-way ANOVA) is a technique used to compare the means of two or more samples. In this study, ANOVA was applied to test whether the overall tRNA abundance at the positions of one specific folding unit differs from that of the other folding units. ANOVA was also used to test whether the frequencies of codon usage in one specific folding unit differ from those in the other folding units. The statistical analysis was carried out with SPSS 11.5.
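The statistic behind a one-way ANOVA can be sketched in a few lines. This is an illustration of the calculation only, not the SPSS procedure used in this study; the function name `one_way_anova_f` is hypothetical, and the groups stand for per-position values (for example, C values) collected for each folding unit.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of numbers, one list per group
            (for example helix, strand, and coil positions).
    """
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group sizes times squared mean offsets.
    ss_between = 0.0
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        ss_between += len(group) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F statistic indicates that the group means differ more than expected from the spread within groups. Turning F into a P value requires the F distribution with the two degrees of freedom; in Python, `scipy.stats.f_oneway` returns both the statistic and the P value.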

The Relationship between the Synonymous Codon Usage Bias and the Structure of the nsp1α.
Based on the values for the synonymous codons involved in the formation of the specific folding units in nsp1α, we found a link between SCUB and the specific folding units (P = 2.75 × 10⁻¹¹). In detail, the synonymous codons with a strong propensity toward the α-helix unit include AUC for Ile, GUA for Val, AGC for Ser, AAG for Lys, and AUG for Met (Table 1). For the β-strand unit, the corresponding codons are UUA for Leu, AUA for Ile, GUG for Val, UCA and AGU for Ser, ACA for Thr, UAC for Tyr, CGC for Arg, and two synonymous codons for His (Table 1). Interestingly, no codon has a strong tendency to exist in the coil of nsp1α (Table 1). All of the codons that have a strong tendency (value > 1.0) to exist in nsp1α strongly tend to exist only in the α-helix or the β-strand of this protein.

[Table 1 note: italic indicates that the corresponding codon has a weak bias to be selected in a specific secondary structure unit; bold indicates that the corresponding codon has a tendency to be selected in a specific secondary structure unit.]

The Relative Abundance of the Codon with Context in the nsp1α Gene.
For the codons that have a strong tendency to exist in a specific folding unit of nsp1α, the relative abundances of the codons with a single-nucleotide context were calculated from the 191 nsp1α genes (Table 2). The data show that the occurrence of a codon with the N1 (following-nucleotide) context or the preceding-nucleotide context is not random, and many codons with either context have a strong tendency to exist in the specific folding units of nsp1α. Based on the SCUB data for the specific folding units (Table 1), the corresponding codons with context were found to have a tendency to exist in the specific folding units of nsp1α.

In detail, the codons with context GUA∼A, AGC∼A, AAG∼C, AGA∼C, A∼AUA, U∼AGC, U∼AAG, C∼AAG, G∼AGA, and U∼AUG have an obvious tendency to exist in the helix unit of nsp1α. Some codons with context have a strong tendency to exist in the β-strand of nsp1α, including UUA∼A, AUA∼G, GUG∼U, UCA∼C, AGU∼G, ACA∼C, UAC∼U, UAC∼C, CAU∼G, CAC∼G, CGC∼U, G∼UUA, U∼AUA, U∼GUG, G∼GUG, U∼UCA, C∼AGU, C∼ACA, C∼UAC, G∼UAC, U∼CAU, U∼CAC, and U∼CGC.

To identify the roles of nucleotide composition (dinucleotides and mononucleotides with a single-nucleotide context) in shaping the codons with context, the values for the codons with context that have a strong tendency to exist in the helix or the β-strand were compared with the values for the corresponding dinucleotide/mononucleotide with context (Tables 3 and 4). The value for the target codon with context is higher than that for the corresponding dinucleotide/mononucleotide with context. For example, for GUA, which tends to exist in the helix of the nsp1α gene, GUA∼A has a tendency to exist in the helix unit, because the value (1.4751) of GUA∼A for the helix unit is higher than the values of GUA∼A for the β-strand and the coil (Table 2) and higher than the values for UA∼A and A∼A (Tables 3 and 4). For UUA, which tends to exist in the β-strand of this gene, UUA∼A has a tendency to exist in the strand unit, because the value (4.9268) for UUA∼A is higher than the values for UUA∼A in the helix and the coil and higher than the values for UA∼A and A∼A (Tables 3 and 4). For AGC, which tends to exist in the helix of this gene, U∼AGC has a tendency to exist in the helix rather than in the strand or the coil, because the value (5.9004) for U∼AGC is higher than the values of U∼AGC in the strand and the coil and higher than the values for U∼AG and U∼A (Tables 3 and 4). For UUA, which tends to exist in the strand of this gene, G∼UUA has a tendency to exist in the strand unit, because the value (3.7019) for G∼UUA in the strand unit is higher than the values for G∼UUA in the helix and the coil and higher than the values for G∼UU and G∼U (Tables 3 and 4). Based on this criterion, GUA∼A, AGC∼A, AAG∼C, AGA∼C, A∼GUA, U∼AGC, U∼AAG, C∼AAG, G∼AGA, and U∼AUG have a strong tendency to exist in the helix of the PRRSV nsp1α gene, and UUA∼A, AUA∼G, GUG∼U, UCA∼C, ACA∼C, UAC∼U, UAC∼C, CAU∼G, CGC∼U, G∼UUA, U∼GUG, U∼UCA, C∼AGU, C∼ACA, G∼UAC, U∼CAU, and U∼CGC have a strong tendency to exist in the β-strand of the nsp1α gene.

[Captions: the relative abundance of codons with context in the helix unit, the β-strand unit, and the coil unit of the PRRSV nsp1α.]
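The comparison used in the examples above can be written as a small decision rule. The sketch assumes the criterion is exactly the one illustrated in the text: a codon with context is assigned to a folding unit when its value in that unit exceeds its values in the other two units and also exceeds the values for the corresponding dinucleotide and mononucleotide with the same context. The function name is hypothetical, and in the usage note only 1.4751 (GUA∼A in the helix) is taken from the text; the other numbers are made up for illustration.

```python
def favored_unit(values_by_unit, background_values):
    """Return the folding unit favored by a codon with context, or None.

    values_by_unit: dict mapping unit name -> relative abundance of the
        codon with context in that unit.
    background_values: relative abundances of the matching dinucleotide
        and mononucleotide with the same context.
    """
    best_unit = max(values_by_unit, key=values_by_unit.get)
    best_value = values_by_unit[best_unit]
    other_units = [v for unit, v in values_by_unit.items() if unit != best_unit]
    # The value must be strictly higher than in every other unit...
    beats_units = all(best_value > v for v in other_units)
    # ...and strictly higher than the dinucleotide/mononucleotide values.
    beats_background = all(best_value > v for v in background_values)
    return best_unit if beats_units and beats_background else None
```

For instance, `favored_unit({"helix": 1.4751, "strand": 0.9, "coil": 0.8}, [1.1, 1.0])` returns `"helix"`, whereas raising one background value to 1.6 makes the function return `None`, because the preference could then be explained by the dinucleotide context alone.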

Discussion
In this study, we mapped the fluctuation of the overall tRNA abundance for each codon position along the PRRSV nsp1α gene and estimated the correlation between synonymous codon usage and the different folding units of nsp1α. Mapping the fluctuation of the overall tRNA abundance for each codon position along the target gene likely reflects, to some degree, the speed at which ribosomes move along the gene as determined by the tRNA abundance of pigs, since tRNA abundance plays an important role in ribosome movement along the target coding sequence [35,36]. A previous report showed that the α-helix is preferentially coded by translationally fast mRNA regions, while slow segments often encode β-strands and coil regions [37]. In this study, the absence of a link between the fluctuation of the overall tRNA abundance at the codon positions along the nsp1α gene and the specific folding units might suggest that translational fine-tuning is not achieved by varying the translation speed at each codon position along the nsp1α gene. Fine-tuning of in vivo protein folding exists in genes, and this regularity is largely believed to occur in a cotranslational process [38]. However, the PRRSV nsp1α derives from the posttranslational processing of pp1a [10,39]. The posttranslational cleavage of nsp1α from the pp1a polyprotein of PRRSV might therefore be free from the fluctuation of tRNA abundance at each codon position along the nsp1α gene. For ribosomes moving along the nsp1α gene, there is no significant link between the fluctuation of the overall tRNA abundance and the specific folding units, and the translation elongation rate of this gene is not high. These results suggest that low tRNA abundance controls ribosomal traffic along the translated message to achieve effective synthesis of the PRRSV pp1a. A low translational elongation rate at the beginning of translation directs the target gene to generate the corresponding protein effectively [40].
Turning to the role of synonymous codon usage in the formation of the specific units of nsp1α, there is a significant relationship between the synonymous codon usage bias and the specific folding units in the target protein. Synonymous codons help messenger RNA carry information about the specific folding units, and a single codon or a contiguous nucleotide region plays a role in shaping the specific folding units [24,25,41,42]. In the PRRSV nsp1α, no synonymous codon tends to exist in the coil unit. However, many synonymous codons are preferred in the α-helix and β-strand regions of this gene, and no synonymous codon has a strong tendency to be selected by both the α-helix and the β-strand of the PRRSV nsp1α simultaneously. These results indicate that SCUB might play a role in shaping this protease with the natural properties needed for the life cycle of PRRSV. The SCUB associated with the formation of the specific folding units of the PRRSV nsp1α is influenced by natural selection. As an example of the role of natural selection, the expressivity of genes is an important factor in shaping SCUB in both prokaryotic and eukaryotic organisms [18,22,43,44]. Although the link between SCUB and the formation of the specific folding units has been reported [25,35,37,38,45], the role of CDCB in the formation of specific folding units has not been clear. In this study, we found that CDCB plays a role in the formation of specific folding units in the PRRSV nsp1α. Synonymous codon usage bias and CDCB, which play important roles in achieving accuracy and efficiency in protein synthesis, are particular manifestations of coding sequence nonrandomness [23,46,47]. Spatial interaction of ribosomal proteins with codon-anticodon RNA pairs inside the A and P sites of the ribosome could be preferable for particular codons with context [20,48].