Baculovirus: Molecular Insights on Their Diversity and Conservation

The Baculoviridae is a large group of insect viruses with circular double-stranded DNA genomes of 80 to 180 kbp. In this study, genome sequences from 57 baculoviruses were analyzed to reevaluate the number and identity of core genes and to understand the distribution of the remaining coding sequences. Thirty-one core genes with orthologs in all genomes were identified, along with 895 other genes that differ in their degree of representation among the reported genomes. Many of these latter genes are common to well-defined lineages, whereas others are unique to one or a few of the viruses. Phylogenetic analyses based on core gene sequences and on the gene composition of the genomes supported the current division of the Baculoviridae into 4 genera: Alphabaculovirus, Betabaculovirus, Gammabaculovirus, and Deltabaculovirus.


Background
Baculoviruses are arthropod-specific viruses containing large double-stranded circular DNA genomes of 80,000-180,000 bp. Progeny generation is biphasic, with two different phenotypes produced during virus infection: budded viruses (BVs), during the initial stage of the multiplication cycle, and occlusion-derived viruses (ODVs), at the final stages of replication [1,2]. In general, primary infection takes place in the insect midgut cells after ingestion of occlusion bodies (OBs). Following this stage, systemic infection is caused by the initial BV progeny [3,4]. Finally, OBs are produced during the last stage of the infection. These OBs comprise virions embedded in a protein matrix which protects them from the environment [5,6].
The Baculoviridae family is divided into four genera according to common biological and structural characteristics: Alphabaculovirus, which includes lepidopteran-specific baculoviruses and is subdivided into Group I and Group II based on the type of fusogenic protein; Betabaculovirus, comprising lepidopteran-specific granuloviruses; Gammabaculovirus, which includes hymenopteran-specific baculoviruses; and finally Deltabaculovirus, which, to date, comprises only CuniNPV and possibly the still undescribed dipteran-specific baculoviruses [1,18,19,20].
The comparison of the known genome sequences of all baculoviruses has been the source for identifying a common set of genes, the baculovirus core genes. However, there are probably more orthologous sequences that cannot be identified because of the many mutations accumulated throughout evolution. Core genes seem to be a key factor for some of the main biological functions, such as those necessary to transcribe viral late genes, produce the virion structure, infect gut cells, abrogate host metabolism, and establish infections [21][22][23][24]. For this report, previous data as well as bioinformatic studies conducted on the currently available set of completely sequenced baculovirus genomes were taken into account, resulting in a summary of gene content and phylogenetic analyses that validate the classification of this important viral family.
As a first approach to a comparative analysis, the GC content of the genomes was calculated (Figure 1). The histogram revealed that many baculoviruses have a GC content of about 41%, although several of them have significantly higher values (CfMNPV at 50.1%, CuniNPV at 50.9%, AnpeNPV-L2 at 53.5%, AnpeNPV-Z at 53.5%, LyxyNPV at 53.5%, OpMNPV at 55.1%, and LdMNPV at 57.5%). A detailed analysis of DNA content did not show a clear pattern of GC content that could be associated with each genus.
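The GC calculation described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which is not specified in the text; the function name `gc_content`, the choice to ignore ambiguous bases, and the toy sequence are assumptions made for illustration only.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Assumption: ambiguous symbols (e.g. N) are ignored, so the
    percentage is computed over unambiguous A/C/G/T bases only.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous A/C/G/T bases")
    return 100.0 * gc / total


# Toy sequence for illustration only (not a real baculovirus genome).
toy_genome = "ATGCGCGCTTAANNGC"
print(f"GC content: {gc_content(toy_genome):.1f}%")
```

Applied to each complete genome sequence, a function of this kind would yield the per-genome values that a histogram such as Figure 1 summarizes.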
Further characterization of the patterns of gene content and organization may prove useful for establishing evolutionary relationships among members of Baculoviridae. The high variability observed in the number of coding sequences is a key feature of viruses with large DNA genomes that infect eukaryotic cells [18]. Insertions, deletions, duplication events, and/or sequence reorganizations by recombination or transposition seem to be the main forces of macroevolution in this kind of biological entity. For example, the loss or gain of genetic material could provide important new abilities for the colonization of new hosts, or could improve performance within established hosts. However, there seems to be a set of core genes whose absence would imply the loss of basic biological functions, and which could be typical of the viral family. In view of this, and considering previous reports [1,19,22,23], the number and identity of baculovirus common genes were reevaluated (Table 2). As a result, P6.9 and Desmoplakin were recognized in this work as core proteins by using sequence analyses complementary to the standard ones (see Supplementary files available at doi:10.4061/2011/379424).
The group of conserved sequences found in all baculovirus genomes is consistently estimated at about 30 shared genes, regardless of the increasing number of genomes analyzed [22,148]. Meanwhile, the role or function assigned to several sequences has been revised according to new studies. In particular, the 38k (Ac98) gene has been found to encode a protein which is part of the capsid structure [121,122]; P33 (Ac92) is a sulfhydryl oxidase which could be related to the proper production of virions in the infected cell nucleus [123][124][125]; ODV-EC43 (Ac109) is a structural component which would be involved in BV and ODV generation [126]; P49 (Ac142) is a capsid protein important in DNA processing, packaging, and capsid morphogenesis [129]; Ac81 interacts with Actin 3 in the cytoplasm but does not appear in BVs or in ODVs [135]; ODV-E18 (Ac143) would mediate BV production [131]; desmoplakin (Ac66) seems to be essential in release processes from the virogenic stroma to the cytoplasm [132]; PIF-4 (Ac96) and PIF-5 (ODV-56, Ac148) are ODV envelope proteins with an essential role in the per os infection route [145,147]; Ac68 may be involved in polyhedron morphogenesis [130].

International Journal of Evolutionary Biology
Virus names are given as three-letter codes, as established in Table 1. Numbers in columns indicate the corresponding ORFs of each genome.
The number and identity of shared orthologous genes in every accepted member of each genus were investigated, and the unique sequences typical of each clade as well as those shared between different phylogenetic groups were identified (Figure 2). This analysis shows that the four accepted baculovirus genera have accumulated a large number of genes during evolution. Many of these sequences were probably incorporated into viral genomes prior to diversification, since they are found in members of different genera. In contrast, other genes are unique to each genus, suggesting that they were incorporated more recently, after diversification (Table 3). The possibility that nonshared genes found in only one genus represent ancestral baculovirus sequences deleted in the other lineages should also be considered. In any case, this analysis yielded a set of particular genes which could help in the appropriate genus-level taxonomy of new baculoviruses for which only partial sequence information is available.
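The comparison of gene content within and between genera can be expressed with set operations. The following Python sketch assumes that orthologous genes have already been grouped and labeled; the genome names, genus names, and gene labels are hypothetical placeholders, not data from this study.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy data: genome name -> set of ortholog-group labels.
gene_content: Dict[str, Set[str]] = {
    "virusA1": {"g1", "g2", "g3", "g5"},
    "virusA2": {"g1", "g2", "g3"},
    "virusB1": {"g1", "g2", "g4"},
    "virusB2": {"g1", "g2", "g4", "g5"},
}
# Hypothetical genus assignment for each genome.
genus_of: Dict[str, str] = {
    "virusA1": "genusA", "virusA2": "genusA",
    "virusB1": "genusB", "virusB2": "genusB",
}


def shared_genes(genomes: Iterable[str]) -> Set[str]:
    """Genes present in every listed genome (set intersection)."""
    return set.intersection(*(gene_content[g] for g in genomes))


def pooled_genes(genomes: Iterable[str]) -> Set[str]:
    """Genes present in at least one listed genome (set union)."""
    return set().union(*(gene_content[g] for g in genomes))


def genus_profile(genus: str) -> Tuple[Set[str], Set[str]]:
    """Return (genes shared by all members, shared genes absent elsewhere)."""
    members = [g for g, name in genus_of.items() if name == genus]
    others = [g for g, name in genus_of.items() if name != genus]
    shared = shared_genes(members)
    unique = shared - pooled_genes(others)
    return shared, unique


# Core genes: orthologs found in every genome of the data set.
core = shared_genes(gene_content)
print("core:", sorted(core))
for genus in sorted(set(genus_of.values())):
    shared, unique = genus_profile(genus)
    print(genus, "shared:", sorted(shared), "genus-specific:", sorted(unique))
```

In this toy data set, `g5` occurs in members of both genera without being a core gene, which mirrors the situation of genes shared between different phylogenetic groups.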

Whole Baculovirus Gene Content
The study of all genes reported in the 57 completely sequenced viral genomes revealed the existence of about [...] as well as with the proportion of core genes, which represents only 3%. This curious biological feature supports the hypothesis that structural mutations are of great importance in the macroevolution of viruses with large DNA genomes. From this viewpoint, the set of genes shared by all members of each baculovirus genus was compared with the whole gene content of that genus (Figure 3). The analysis shows that Group I alphabaculoviruses and gammabaculoviruses have a lower diversity of gene content than the other lineages. This information, coupled with the significant number of genome sequences obtained from Group I alphabaculoviruses, suggests that this lineage constitutes the newest clade in baculovirus evolutionary history [149]. This is based on the assumption that Group I alphabaculoviruses have had less time to incorporate new sequences from different sources (host genomes, other viral genomes, bacterial genomes, etc.) since the appearance of their common ancestor.
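The proportion of core genes can be checked with simple arithmetic. The sketch below uses the two counts stated in the abstract (31 core genes and 895 other genes) and assumes that their sum represents the whole gene content; the function name `core_fraction` is introduced here for illustration.

```python
CORE_GENES = 31    # core genes with orthologs in all 57 genomes (abstract)
OTHER_GENES = 895  # remaining genes reported among the genomes (abstract)


def core_fraction(core: int, other: int) -> float:
    """Fraction of the whole gene content represented by core genes.

    Assumption: the whole gene content equals core + other genes.
    """
    total = core + other
    if total <= 0:
        raise ValueError("total gene count must be positive")
    return core / total


fraction = core_fraction(CORE_GENES, OTHER_GENES)
print(f"Core genes: {100 * fraction:.1f}% of {CORE_GENES + OTHER_GENES} genes")
```

Under this assumption the result is roughly 3%, in line with the proportion quoted in the text. The same ratio, computed per genus from the genes shared by all members and the pooled genus gene content, would give the kind of comparison summarized in Figure 3.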

Baculovirus Core Gene Phylogeny
Traditional attempts to infer relationships between baculoviruses relied on amino acid or nucleotide sequence analyses of single genes encoding proteins such as polyhedrin/granulin (the major component of OBs), the envelope fusion polypeptides known as F protein and GP64, or DNA polymerase, among many other examples [149][150][151][152].
Figure 4: Baculovirus genome phylogeny. Cladogram based on the amino acid sequences of core genes. The 31 identified core genes from the Baculoviridae family were independently aligned using the MEGA 4 [25] program with gap open penalty = 10, gap extension penalty = 1, and the Dayhoff matrix [26]. Then, a concatemer was generated and the phylogeny inferred using the same software (UPGMA; bootstrap with 1000 replicates; gap/missing data = complete deletion; model = amino (Dayhoff matrix); patterns among sites = same (homogeneous); rates among sites = different (gamma distributed); gamma parameter = 2.25). Baculoviruses are identified by the acronyms given in Table 1, and the accepted distribution in lineages (Group I and Group II) and genera is also indicated. Gammabaculovirus and Deltabaculovirus are referenced by Greek letters. The proposed clades of betabaculoviruses are shown in bold letters.
PAM (point accepted mutation) matrices refer to the evolutionary distance between pairs of sequences. Given the weak similarity between several core proteins, the PAM250 matrix was selected. This matrix corresponds to a divergence of 250 accepted mutations per 100 amino acids and was calculated to analyze more distantly related sequences. PAM250 is considered a good general matrix for protein similarity searches.
For the most part, these evolutionary inferences agreed with later, more robust studies based on sequence analyses of sets of genes with homologous sequences in all baculoviruses. These newer approaches were based on the construction of common-protein concatemers, which were used to propose evolutionary patterns for baculoviruses [149].
Because the members of a viral family share a common pattern of genes and functions, and because each proliferation cycle continuously challenges viral viability, it is essential to take into account how tolerant their genes are to molecular change. The molecular constraints on changes in core genes differ from those on other genes. Core genes should therefore be considered the most ancestral genes, which may have diverged to a greater or lesser degree. Accordingly, a phylogenetic study was performed based on concatemers obtained from multiple alignments of the 31 proteins recognized in this work as core genes for the 57 available baculoviruses with sequenced genomes (Figure 4).
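The concatemer approach can be sketched in a few lines of Python. This is a simplified illustration, not the MEGA 4 procedure used for Figure 4: it replaces the Dayhoff model with gamma-distributed rates by a plain p-distance (proportion of differing sites), omits bootstrapping, and uses hypothetical toy alignments and taxon names. Complete deletion of gapped columns and UPGMA clustering follow the settings named in the figure legend.

```python
from itertools import combinations

# Hypothetical toy alignments: gene -> {taxon: aligned protein sequence}.
alignments = {
    "geneX": {"A": "MKLV", "B": "MKLV", "C": "MRLI", "D": "MR-I"},
    "geneY": {"A": "GAST", "B": "GASS", "C": "GTST", "D": "GTCT"},
}


def concatenate(alns):
    """Join the per-gene alignments of each taxon into one concatemer."""
    taxa = sorted(set.intersection(*(set(a) for a in alns.values())))
    return {t: "".join(alns[g][t] for g in sorted(alns)) for t in taxa}


def complete_deletion(concat):
    """Drop every alignment column that has a gap in any taxon."""
    taxa = list(concat)
    length = len(concat[taxa[0]])
    keep = [i for i in range(length) if all(concat[t][i] != "-" for t in taxa)]
    return {t: "".join(concat[t][i] for i in keep) for t in taxa}


def p_distance(a, b):
    """Proportion of differing sites (simplified stand-in for Dayhoff)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Build a nested-tuple tree by size-weighted average linkage (UPGMA)."""
    sizes = {t: 1 for t in seqs}  # cluster (leaf or tuple) -> number of leaves
    dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Closest pair of clusters; ties resolved by sorted order.
        a, b = min(combinations(sorted(sizes, key=str), 2),
                   key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


concatemers = complete_deletion(concatenate(alignments))
tree = upgma(concatemers)
print(tree)
```

Concatenating before clustering lets every core gene contribute sites to a single distance estimate per pair of genomes, which is the rationale behind the common-protein concatemer strategy.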
The resulting cladogram reproduces the current baculovirus classification based on 4 genera. Additionally, this approach consistently separates the alphabaculoviruses into two lineages: Group I and Group II. The same can be observed within Group I, where the presence of two different clades can be clearly inferred (clade a and clade b). These groupings are in accordance with previous reports [20,150]. In Group II alphabaculoviruses, no clear clustering could be identified, so no subdivision can be suggested.
Despite the evolutionary inference based on core genes, one question remained: is the tolerance to change the same in all core genes? To answer it, the variability of each core gene was analyzed individually by studying the sequence distances for each baculovirus core gene (Figure 5).
The resulting order of core genes shows that pif-2 was the most conserved baculovirus ancestral sequence, whereas desmoplakin was the gene with evidence of greatest variability. This analysis reveals that genomes can be evolutionarily constrained in different ways depending on the proteins they encode.
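A per-gene variability ranking of this kind can be sketched as follows. The example assumes a simple measure, the mean pairwise p-distance within each gene alignment, as a stand-in for the distance estimates behind Figure 5; the gene names and sequences are hypothetical toy data.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy alignments: gene -> {taxon: aligned protein sequence}.
alignments = {
    "geneConserved": {"A": "MKLV", "B": "MKLV", "C": "MKLI"},
    "geneVariable": {"A": "GAST", "B": "GTSS", "C": "CTST"},
}


def p_distance(a, b):
    """Proportion of differing sites, skipping positions with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_pairwise_distance(aln):
    """Average p-distance over all pairs of taxa in one gene alignment."""
    return mean(p_distance(aln[a], aln[b])
                for a, b in combinations(sorted(aln), 2))


# Genes ordered from most conserved to most variable.
ranking = sorted(alignments, key=lambda g: mean_pairwise_distance(alignments[g]))
for gene in ranking:
    print(gene, round(mean_pairwise_distance(alignments[gene]), 3))
```

With real data, the first and last entries of such a ranking would correspond to the most conserved and the most variable core genes, respectively.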
Gaining access to new hosts might be an important force in gene evolution. During an infection, genome variants carrying mutations introduced by errors of the replication/repair machinery could be quickly incorporated into the virus population if the nucleotide changes resulted in proteins offering better biological performance. The DNA helicase gene, considered an important host range factor, was the second most variable core sequence in this study [87]. In contrast, other sequences such as the pif-2 gene would not accumulate mutations because the encoded protein might lose vital functions not necessarily associated with the nature of the host.

Conclusions
Baculoviridae is a large family of viruses which infect and kill insect species from different orders. The valuable applications of these viruses in several fields of life sciences encourage their constant study with the goal of understanding the molecular mechanisms involved in the generation of progeny in the appropriate cells as well as the processes by which they evolve. The establishment of solid bases to recognize their phylogenetic relationships is necessary to facilitate the generation of new knowledge and the development of better methodologies.
In view of this, many researchers have proposed and used different bioinformatic methodologies to identify genes as well as related baculoviruses. Some of them were based on gene sequences [150], gene content [17], or genome rearrangements [152]. In this work, a combination of core gene sequence and gene content analyses was applied to reevaluate the Baculoviridae classification. To our knowledge, this report is the first work to identify the whole baculovirus gene content and the shared genes that are unique to different genera and subgenera. All this information should be taken into account to group and classify new virus isolates and to propose molecular methodologies to diagnose baculoviruses based on proper gene targets according to gene variability and gene content.

Acknowledgments
This work was supported by research funds from Agencia Nacional de Promoción Científica y Técnica (ANPCyT) and