Computational Analysis Suggests That Lyssavirus Glycoprotein Gene Plays a Minor Role in Viral Adaptation

The Lyssavirus glycoprotein (G) is a membrane protein responsible for virus entry and protective immune responses. To explore possible roles of the glycoprotein in host shift or adaptation of Lyssavirus, we retrieved 53 full-length glycoprotein gene sequences from NCBI GenBank. The sequences were from different host isolates over a period of 70 years in 21 countries. Computational analyses detected 1 recombinant (AY987478, a dog isolate of CHAND03, genotype 1 in India) with incongruent phylogenetic support. No recombination was detected when AY98748 was excluded in the analyses. We applied different selection models to identify selection pressure on the glycoprotein gene. One codon at amino acid residual 483 was found to be under weak positive selection with marginal probability of 95% by using the maximum likelihood method. We found no significant evidence of positive selection on any site of the glycoprotein gene when the putative recombinant AY987478 was excluded. The computational analyses suggest that the G gene has been under purifying selection and that the evolution of the G gene may not play a significant role in Lyssavirus adaptation.


Introduction
Positive selection and recombination are important mechanisms in microbial pathogen adaption to new hosts, resistance to antibiotics, and evasion of immune responses [1]. RNA viruses have high mutation rates due to lack of both proofreading and postreplicative repair activities associated with RNA replicases and reverse transcriptases [2], which benefits RNA viruses in adapting to the changing environment. Recombination is a general phenomenon in evolution and plays a significant role in viral fitness [3,4]. Rabies virus is a single-stranded negative RNA virus belonging to the order Mononegavirales, family Rhabdoviridae, genus Lyssavirus, which causes rabies in all warm-blooded mammals. Host shift and spillover events are frequently reported in rabies [5][6][7][8][9]. The nucleotide substitution rate of lyssaviruses is estimated to be around 10 −4 per site per year [7]. The RNA-dependent RNA-polymerase (RdRp or L) together with phosphoprotein (P), functions as the transcriptase and replicase complex. The glycoprotein (G) is the only outer membrane protein responsible for virus entry and inducing protective immune responses [10,11]. The role of the G gene in rabies spillover, host shift, and adaptation has not been analyzed thoroughly. The information could help understand viral pathogenesis and develop a vaccine for a broad spectrum of lyssavirus infections.
Here, we used newly developed computational algorithms as well as traditional methods to investigate potential recombination events and selection pressures in the G gene of Lyssaviruses. The dataset for the study was comprised of 53 full-length glycoprotein gene sequences isolated from different hosts in 21 countries over a period of 70 years. We hypothesized that if different hosts with rabies infections over decades did not lead to positive selection or recombination events in the G gene, the gene does not play a significant role in lyssavirus adaptation. from 21 countries isolated over a period of 70 years were retrieved from NCBI GenBank. The sequences were aligned using fast statistical alignment (FSA, [12]). Briefly, FSA is a probabilistic multiple-sequence alignment algorithm, which uses a "distance-based" approach to aligning homologous protein, RNA, or DNA sequences. It produces superior alignments of homologous sequences that are subject to very different evolutionary constraints. The nucleotide (nt) sequence alignment of the lyssavirus G genes was corrected manually by visual inspection using the amino acid sequence alignment. Gaps were removed if they existed in majority of the sequences.

Phylogenetic Analyses.
A phylogenetic tree was reconstructed by using the neighbor joining algorithm in the MEGA 4 package [13]. The maximum composite likelihood model was used as well as the pairwise deletion option for gaps. The statistical significance of the phylogeny was measured by bootstrap with 1,000 replicates.

Selection Analyses.
To test positive selection on sites of the G gene in Lyssaviruses, the Codeml program in PAML software package version 4.4 was employed [21]. Codeml implements the maximum likelihood method to test if positive selection has taken place at sites within a gene. This method uses different codon substitution models to estimate the number of nonsynonymous (dN) and synonymous substitutions (dS) per site among codons, since different amino acids in a protein could be under different selective pressures, thus creating a different ω (dN/dS) ratio. The models in our dataset analyses were M0 (one-ratio), M1 (nearly neutral), M2 (positive selection), M7 (β distribution), and M8 (β + ω > 1) [22]. The M0 model estimates overall ω for the data. The M1 model estimates codon site proportion p 0 with ω 0 < 1 and proportion p 1 (p 1 = 1 − p 0 ) with ω 1 = 1. The M2 model allows an additional class of positively selected sites with proportion p 2 (p 2 = 1 − p 1 − p 0 ) with ω 2 estimated from the data. The M7 model specifies that ω follows a beta distribution and the value of ω is allowed to change between 0 and 1. Parameters p and q of the beta distribution are estimated from the data in the M7 model. In the M8 model, a proportion of sites p 0 has a ω in the beta distribution and the proportion p 1 sites are assumed to be positively selected. Two sets of comparisons (M2 versus M1, M8 versus M7) were made to test the hypothesis of selection.
Within the comparison, the likelihood ratio test statistic used to determine the level of significance was calculated as twice the difference of the likelihood scores (2Δl) estimated by each model. The significance was determined under χ 2 distribution. The degrees of freedom for the M1 versus M2 and M7 versus M8 tests are 2 [22]. If M8 or M2 is significantly favored and it contains codons with ω > 1, positive selection is significantly evident. Posterior probabilities of the inferred positively selected sites were estimated by the Bayes empirical Bayes (BEB) approach [23].
We also applied single-likelihood ancestor counting (SLAC), fixed-effects likelihood (FEL), and random-effects likelihood (REL) [18] to indentify selection pressure on individual codons of the G gene in lyssaviruses.

Recombination Analyses.
Our dataset covered lyssaviruses isolated over a period of 70 years from 21 countries (Table 1), including the new and old continents. The hosts included bats, cows, dogs, foxes, humans, raccoons, sheep, and skunks.
The PHI and Max χ 2 tests suggested significant evidence of recombination in the G gene. By 1000 permutations, the P-values of PHI and Max χ 2 test were .006 and 0, respectively. However, no significant evidence (P = .796) of recombination was detected by using the NSS test.
By using 3SEQ, 6 long recombinant sequences (>100 bp) were detected: AF233275, AY237121, AY987478, DQ074978, DQ849071, and L04523 (Table 2). Two breakpoints were identified in all recombinants. The first breakpoint was at nucleotide position between 400 and 800. The second breakpoint was around nucleotide position of 1080. However, the two breakpoints for DQ074978 and L04523 were at the very beginning and around nucleotide position of 109, respectively.
The analysis by using GARD also suggested evidence of recombination with significant topological incongruence at the 2 breakpoints (Table 3). The first breakpoint was at nucleotide position of 441 and the second was at nucleotide position of 1089. The significance value for the 2 breakpoints was 0.01. The left hand side (LHS) and the right hand side (RHS) P-values for the 2 breakpoints were .0004.
We analyzed the recombination events by using Boot-Scanning as implemented in SimPlot. Sequence AY987478 was used as a query sequence in all four cases (Figures 1(a     Since recombination with 2 breakpoints was predicted by 3SEQ, GARD, and Bootscanning, we constructed phylogenetic trees by using sequences from the beginning to the first breakpoint and the sequences from the second breakpoint to the end (Figure 2(a)) and a phylogenetic tree with sequences between the two breakpoints ( Figure 2(b)). The reconstructed trees presented conflicting topological positions of the putative recombinant AY987478. The putative recombinant was clustered with AY237121 and AF325489 in Figure 2(a), but clustered with DQ074978 and AF233275 in Figure 2(b). All other 5 putative recombinants did not present phylogenetic incongruence. The same result was also verified by GARD (data not shown). When AY987478 was excluded from the dataset, the P-values of Phi, Max χ 2 , and NSS were .121, .209, and .791, respectively, suggesting no evidence of recombination. The GARD analysis did not indicate evidence of recombination either.

Selection Pressure Analyses.
The selection pressure analysis with the glycoprotein gene by using PAML is presented in Table 4. The likelihood ratio test statistic (2Δl) estimated by M2 and M1 was 0. The corresponding P value was .99, which is not significant to reject the nearly null hypothesis of neutral selection in M1. In the comparison between the null neutral site model (M7) and the selection model (M8), the 2Δl was 18.18 and the corresponding P-value was .0001, indicating that the positive selection model was significantly favored over the null neutral site model. Posterior probabilities of the inferred positively selected sites estimated by the BEB approach were shown in Table 5. Four amino acid sites at 466, 483, 486, and 490 were identified to be under positive selection. But only the site at position 483 had a marginal significance support with posterior probability of 95% and weak positive selection pressure with ω of 1.466. The corresponding posterior probabilities for sites at 466, 486 and, 490 were 68%, 56%, and 82%, respectively.
To test the effect of recombination on positive selection analysis, we excluded the putative recombinant AY987478 from the dataset. Similar results were observed, and the BEB posterior probability supports for amino acid sites under positive selection were nonsignificant (Table 5). When all six putative recombinants were excluded in our analysis, no evidence was found to support positive selection either in M1 or M7 (data not shown). In all cases, the ω in M0 was either 0.07 or 0.08. Overall, 87% of the sites in the G gene had a very low ω value of 0.05 in M2 and M7, indicating strong selective constraints on those sites.
To study the effect of viral passages and possible genetic bottlenecks on the results, we repeated the analysis with a dataset excluding six vaccine sequences and the sequence AF233275 (PV11) from cell culture of lyssaviruses under intensive cell culture. We found no significant evidence for positive selection pressure on any site of the G gene. Analyses using SLAC, REL, and FEL found no evidence of any amino acid in the G gene under positive selection, instead most of the amino acids were found to be under negative selection (Table 6). One site at position 416 was under marginal positive selection by FEL with P-value of .0999, narrowly passing the significance level of 0.1. However, this result was not supported by SLAC and REL.

Discussion
Lyssaviruses can infect all warm-blooded mammals, and spillover events and host shift have been well documented [5][6][7][8][9]. The molecular mechanism of rabies infection and transmission is still not completely understood, and the phenomenon usually leads to the connection with rabies virus G protein, since G is the only membrane protein responsible for virus entry both in vitro and in vivo. Therefore, it is a reasonable assumption that rabies virus    adaptation is due to the G gene. Positive selection is an important evolutionary force that drives adaptation. It is not surprising that evolutionary scientists first applied selection analysis to the G gene of lyssaviruses [39]. One notable difference between the previous investigations and our study was the dataset. Previous dataset with 55 complete G gene sequences were from isolates of natural rabies infections, excluding passages and vaccine strains. Our dataset included street 53 rabies isolates and vaccine strains collected over a period of 70 years from 21 countries. The neutrality tests on the G in lyssavirus indicated that the protein was under negative selection. Analysis of heterogeneous selective pressures on the amino acid sites across the gene found no evidence for positive selection on any site when the putative recombinant AY987478 was excluded. Instead, most of the sites were under strong negative selection, which was consistent with previous investigations using only street rabies isolates [39,40]. The only weak positive selection identified by our analyses was at amino acid residue 483 (not in the ectodomain). No positive selection has been detected in the main epitope II or III, the site of virus escape identified by monoclonal antibody binding selections in vitro. It is possible that the results were confounded by the sequences from isolates under intensive cell culture. Repeated passages of an RNA virus resulted in loss of fitness due to Muller's ratchet [41]. Serial virus passages severely reduce population size when a small set of founder population is reintroduced into an identical unpopulated environment, which may lead to the stochastic loss of certain genotypes, especially the rare genotypes [42,43]. However, exclusion of sequences of passaged lyssaviruses from the dataset in this study did not affect the readout of the analyses. It appears that rabies spillover, host shift (happened naturally), virus escape by monoclonal antibody selection, and vaccine strains (under various in vitro and in vivo conditions) is not the result of positive selection in the G gene. Recombination is another important evolutionary driving force in adaptation, and it is a mechanism that prevents the accumulation of deleterious substitutions [44]. It allows the acquisition of multiple genetic changes in a single step and can combine genetic information to produce advantageous genotypes. It may be important for incremental host adaptation after switching to new host has occurred [45]. Recombination in rabies viruses had been proposed, but it was not thoroughly inspected [46,47]. Our study suggested one recombinant event. The recombinant sequence AY987478 was from a dog isolate (CHAND03, genotype 1) and the possible parental sequences were isolated from dogs and sheep from the same geographic area (India and Nepal). However, the putative recombinant AY987478 could be an artifact from sequencing or sample contamination. Generation of recombinants in the course of reverse transcription of RNA and subsequent PCR is a well-known phenomenon [48][49][50]. From the bootscanning analysis in this research, the 3 prime and 5 prime regions of AY987478 were clustered with putative parents with a bootstrap value of 100%, indicating little difference between the two sequences in the two regions. By checking the sequences, there are regions of about 450 bases long that are identical between the recombinant and the corresponding parent, which is rare considering the high mutation rate in RNA viruses. The homologous recombination rate in negative-sense RNA virus was found to be low [46], which is supported by a recent report that homologous recombination is very rare or absent in influenza A virus [17]. Further experimentation is needed to prove that the recombinant AY987478 is not an artifact.
In summary, we did not find significant support for positive selection pressure on G gene in lyssavirus isolates from different rabies hosts and vaccine strains that cover 70 years of evolution in 21 countries. The recombination analysis suggested an orphan event that needs further investigation. It appears that evolution of the G gene may not play a major role in lyssavirus adaptation. It is surprising considering the functions of glycoprotein in lyssavirus infection. It has been reported that host switching from chiropters to carnivores has occurred in lyssavirus evolution history [7,9]. Spillovers of lyssaviruses from chiropters to other animals may have happened repeatedly and still occur [8]. Transmission of European bt lyssavirus 1 (EBLV-1) was reported in sheep [51], stone marten [52], and cats [53]. For a successful spillover and subsequent adaptation, there must be effective cross-species viral exposure and compatibility between the virus and the new host to allow replication and transmission. Lyssavirus infections are typically transmitted by the virusladen saliva of a rabid animal via a bite or scratch, which can facilitate cross-species viral exposures. The initial viral interaction with cells of a new host plays a critical role in determining host specificity and host shift [45]. For example, feline virus acquired the ability to infect dogs through changes in its capsid protein that binds to canine transferrin receptor on canine cells [54]. Lyssavirus G is a surface glycoprotein responsible for receptor recognition and membrane fusion [7][8][9]55]. It is reasonable to expect that the protein is under positive selection pressure in the viral adaptation to the new host. The lack of positive selection in the G glycoprotein suggests that the virus is not subject to strong immune selection [25]. The G gene may escape the immunity of the host since lyssaviruses migrate from the peripheral to the central nervous systems [7]. Recent investigation demonstrated that diminishing frequencies of both cross-species transmission and host shifts were found with increasing phylogenetic distance between bat species [9], indicating the virus, thus the G gene, is subject to less selection pressure in a similar host and cellular environment [7,25]. However, the G gene might have been under relative low positive selection that was not detected by current computational methods. More sensitive method or properly relaxed statistical significance stringency with experimental verification may help identify the role of the G gene in lyssavirus adaptation.