Phylogenomic and Molecular Demarcation of the Core Members of the Polyphyletic Pasteurellaceae Genera Actinobacillus, Haemophilus, and Pasteurella

The genera Actinobacillus, Haemophilus, and Pasteurella exhibit extensive polyphyletic branching in phylogenetic trees and do not represent coherent clusters of species. In this study, we have utilized molecular signatures identified through comparative genomic analyses in conjunction with genome based and multilocus sequence based phylogenetic analyses to clarify the phylogenetic and taxonomic boundary of these genera. We have identified large clusters of Actinobacillus, Haemophilus, and Pasteurella species which represent the “sensu stricto” members of these genera. We have identified 3, 7, and 6 conserved signature indels (CSIs), which are specifically shared by sensu stricto members of Actinobacillus, Haemophilus, and Pasteurella, respectively. We have also identified two different sets of CSIs that are unique characteristics of the pathogen containing genera Aggregatibacter and Mannheimia, respectively. It is now possible to demarcate the genera Actinobacillus sensu stricto, Haemophilus sensu stricto, and Pasteurella sensu stricto on the basis of discrete molecular signatures. The other members of the genera Actinobacillus, Haemophilus, and Pasteurella that do not fall within the “sensu stricto” clades and do not contain these molecular signatures should be reclassified as other genera. The CSIs identified here also provide useful diagnostic targets for the identification of current and novel members of the indicated genera.


Introduction
The family Pasteurellaceae, the single constituent family of the order Pasteurellales, represents a diverse group of commensal and pathogenic bacteria within the class Gammaproteobacteria. The family currently contains 19 genera, some of which are particularly important human and animal pathogens [1,2]. The genus Haemophilus contains species responsible for human bacteremia, pneumonia, acute bacterial meningitis, and the sexually transmitted disease chancroid [3][4][5]; Aggregatibacter species have been implicated in juvenile periodontitis [6]; members of the genera Mannheimia, Pasteurella, and Actinobacillus have been implicated in the causation of shipping fever in cattle, fowl cholera, and pleuropneumonia in pigs, respectively [7][8][9].
The family Pasteurellaceae was originally proposed as a higher level taxonomic grouping of the related pathogenic genera Actinobacillus, Haemophilus, and Pasteurella [10].  [23]; and the species [Pasteurella] haemolytica and [Pasteurella] granulomatis have been transferred to the genus Mannheimia [7]. Additionally, some individual species within the genera Actinobacillus, Haemophilus, and Pasteurella that do not cluster with other members of their genus in phylogenetic trees have been moved or proposed to be moved to novel or neighbouring genera (namely, the transfer of the species [Haemophilus] pleuropneumoniae to the genus Actinobacillus [24], the transfer of the species [Pasteurella] anatis to the genus Gallibacterium [25], the transfer of the species [Pasteurella] trehalosi to the genus Bibersteinia [26], the transfer of the species [Pasteurella] ureae to the genus Actinobacillus [27], and the proposed transfer of the species [Haemophilus] ducreyi to a novel genus [28]). However, despite these changes, the classification of the genera Actinobacillus, Haemophilus, and Pasteurella is still problematic and each genus continues to contain members which exhibit polyphyletic branching [5,[17][18][19][20].
Multiple studies have attempted to define a core group of species which cluster around the nomenclatural type species of Actinobacillus, Haemophilus, or Pasteurella as the only true members of these genera (i.e., sensu stricto) [13,15-17,29-31], but the taxonomy and phylogeny of these bacteria remain inconclusive [20,32,33]. Several methods have been employed for the demarcation of these genera; however, no simple method or criterion is available that can clearly delimit them. It has been suggested that genome based studies may provide reliable means of clarifying the evolutionary relationships of these bacteria [33].
Since the availability of the first complete genome sequence of Haemophilus influenzae [34], a large number of genomes for the members of the family Pasteurellaceae have become available in public databases [35,36]. The availability of these genomes provides us with an opportunity to complete comprehensive genome scale phylogenetic analyses of the family Pasteurellaceae. These genome sequences have also been utilized to carry out comparative genomic analyses to identify molecular signatures (namely, conserved signature indels (CSIs) in various proteins), commonly shared by all or closely related subsets of species within the family Pasteurellaceae. On the basis of the molecular signatures identified from comparative analyses of Pasteurellaceae genomes in conjunction with core genome based and multilocus sequence based phylogenetic analyses, we have identified sensu stricto clades of Actinobacillus, Haemophilus, and Pasteurella that are supported by 3, 7, and 6 unique molecular signatures, respectively. We also report sets of molecular signatures that are unique characteristics of the pathogen containing genera Aggregatibacter and Mannheimia.

Multilocus Sequence Analysis.
Multilocus sequence analysis was completed for members of the family Pasteurellaceae using widely available nucleotide sequences of the 16S rDNA, infB (translation initiation factor IF-2), recN (DNA repair protein), and rpoB (DNA-directed RNA polymerase subunit beta) genes, which have been used, individually or as part of a set, in a number of previous phylogenetic analyses of the family Pasteurellaceae [15-17, 29, 30]. Sequences for these four genes were obtained for 52 Pasteurellaceae strains, representing a large majority of the known Pasteurellaceae species, and 2 strains of Vibrio cholerae from the NCBI nucleotide database [37]. Species which were missing one of these four genes or which did not have a gene sequence that was at least 50% of the length of the full gene were excluded from the analysis. The four genes were individually aligned using MUSCLE [38] and manually concatenated to create a combined alignment 10 183 nucleotides long. A maximum-likelihood tree based on 100 bootstrap replicates of this alignment was constructed using MEGA 6.0 [39] while employing the maximum composite likelihood substitution model.
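The exclusion rule and the concatenation step described above can be illustrated with a minimal Python sketch. This is not the procedure used in this study (the concatenation was done manually); the function names, the dictionary layout, and the reference gene lengths passed in as `full_lengths` are all assumptions made for illustration.

```python
# Gene order used for concatenation (the four loci named in the text).
GENES = ["16S", "infB", "recN", "rpoB"]


def passes_length_filter(seqs, full_lengths, min_fraction=0.5):
    """Return True if a strain has every gene at >= min_fraction of full length.

    seqs: dict gene -> unaligned sequence for one strain (hypothetical layout).
    full_lengths: dict gene -> assumed length of the full gene.
    """
    return all(
        gene in seqs and len(seqs[gene]) >= min_fraction * full_lengths[gene]
        for gene in GENES
    )


def concatenate(alignments, strains):
    """Join per-gene alignments into one sequence per strain.

    alignments: dict gene -> {strain: aligned sequence}.
    """
    return {s: "".join(alignments[g][s] for g in GENES) for s in strains}
```

A strain lacking any one of the four genes, or carrying a fragment shorter than half of the assumed full length, is rejected by `passes_length_filter` before alignment.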

Pasteurellaceae Core Genome Phylogenetic Tree.
A phylogenetic tree of 76 Pasteurellaceae strains, rooted using 7 members of the family Vibrionaceae, based on the core genome of the family Pasteurellaceae was created for this study. The core set of Pasteurellaceae proteins was identified using the UCLUST algorithm [55] to identify widely distributed protein families with at least 30% sequence identity and 50% sequence length. Protein families which were present in less than 50% of the input genomes were excluded from further analysis. Potentially paralogous sequences (additional proteins from the same organism in a single protein family) within the remaining protein families were also excluded from further analysis. Each protein family was individually aligned using MAFFT 7 [56]. Aligned amino acid positions which contained gaps in more than 50% of organisms were excluded from further analysis. The remaining amino acid positions were concatenated to create a combined alignment 128 080 amino acids long. An approximately maximum-likelihood tree based on this alignment was constructed using FastTree 2 [57] while employing the Whelan and Goldman substitution model [58].
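Two of the filtering rules above (dropping protein families found in fewer than 50% of genomes, and dropping alignment columns with gaps in more than 50% of organisms) can be sketched in Python as follows. This is an illustrative sketch only, not the pipeline used in this study; the data layouts and function names are assumptions, and clustering (UCLUST), paralog removal, and alignment (MAFFT) are not reproduced here.

```python
def widely_distributed_families(families, n_genomes, min_presence=0.5):
    """Keep protein families present in at least min_presence of the genomes.

    families: dict family_id -> list of (genome_id, protein_id) pairs
    (a hypothetical layout for clustering output).
    """
    kept = {}
    for family_id, members in families.items():
        genomes_with_member = {genome for genome, _ in members}
        # Families present in less than 50% of input genomes are excluded.
        if len(genomes_with_member) >= min_presence * n_genomes:
            kept[family_id] = members
    return kept


def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns in which more than max_gap_fraction of sequences are gaps.

    alignment: dict organism -> aligned sequence (all of equal length).
    """
    seqs = list(alignment.values())
    n = len(seqs)
    keep = [
        i
        for i in range(len(seqs[0]))
        if sum(s[i] == "-" for s in seqs) <= max_gap_fraction * n
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```

The trimmed per-family alignments would then be concatenated per organism before tree construction.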

Identification of Molecular Signatures (CSIs) for Different Genera of the Family Pasteurellaceae.
The detailed outline of the process of identifying CSIs has been recently published [59]. In brief, Blastp searches were performed on all proteins from the genome of Haemophilus influenzae F3047 [47]. Ten to fifteen high scoring homologues that were present in Haemophilus, other Pasteurellaceae, and Gammaproteobacteria species were retrieved, and their multiple sequence alignments were constructed using Clustal X 1.83 [60]. The alignments were visually inspected to identify any conserved inserts or deletions (indels) that are restricted to particular clades of the family Pasteurellaceae and that are flanked on each side by at least 5-6 identical/conserved residues in the neighbouring 30-40 amino acids. The selected sequences containing the indels and their flanking conserved regions were further evaluated by detailed Blastp searches to determine species distribution and group specificity. The results of these Blast searches were processed using Sig Create and Seq Style to construct signature files [59]. Due to space constraints, the sequence alignment files presented here contain sequence information for a limited number of species within the family Pasteurellaceae and a representative selection of outgroup species. However, in each case, all members of the family and outgroups exhibited similar sequence characteristics to the representatives.
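The screening criteria described above were applied by visual inspection in this work. As a rough illustration of the same logic, the following Python sketch scans an alignment for gap runs that separate a chosen ingroup from all other sequences and checks the flanking regions. It is a simplification under stated assumptions: the function name and inputs are hypothetical, and only fully identical columns are counted as conserved, which is stricter than the "identical/conserved" criterion in the text.

```python
def find_candidate_csis(alignment, ingroup, flank=30, min_conserved=5):
    """Return (start, length, kind) for indels restricted to the ingroup.

    alignment: dict name -> aligned sequence (equal lengths, '-' for gaps).
    ingroup: names of the sequences expected to share the indel.
    """
    names = list(alignment)
    outgroup = [n for n in names if n not in ingroup]
    if not ingroup or not outgroup:
        raise ValueError("both ingroup and outgroup sequences are required")
    length = len(alignment[names[0]])

    def column_state(i):
        in_gap = [alignment[n][i] == "-" for n in ingroup]
        out_gap = [alignment[n][i] == "-" for n in outgroup]
        if not any(in_gap) and all(out_gap):
            return "insertion"  # residues present only in the ingroup
        if all(in_gap) and not any(out_gap):
            return "deletion"  # residues missing only in the ingroup
        return None

    def identical(i):
        column = {alignment[n][i] for n in names}
        return len(column) == 1 and "-" not in column

    hits = []
    i = 0
    while i < length:
        state = column_state(i)
        if state is None:
            i += 1
            continue
        j = i
        while j < length and column_state(j) == state:
            j += 1
        # Count identical columns within `flank` positions on each side.
        left = sum(identical(k) for k in range(max(0, i - flank), i))
        right = sum(identical(k) for k in range(j, min(length, j + flank)))
        if left >= min_conserved and right >= min_conserved:
            hits.append((i, j - i, state))
        i = j
    return hits
```

Any candidate reported by such a scan would still need the follow-up Blastp searches described above to establish its species distribution and group specificity.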

Phylogenetic Analysis of the Pasteurellaceae.
Elucidating an accurate phylogeny of the members of the family Pasteurellaceae has been a long standing challenge in Pasteurellaceae research [10-12, 18, 19]. Early 16S rRNA based studies revealed that the established taxonomy of the family Pasteurellaceae was not consistent with their genetically inferred phylogeny [12,14]. This has led to a long series of taxonomic revisions within the family Pasteurellaceae, a process which is still taking place today [7,18,22,28]. However, it was subsequently discovered that phylogenetic trees of Pasteurellaceae species based on different genes did not completely agree with each other [15,16,31]. In particular, phylogenetic trees based on the 16S rRNA gene, often considered the gold standard in bacterial taxonomy and phylogeny [61,62], disagreed with highly robust multilocus sequence and concatenated protein sequence based phylogenetic trees [9,17,19,20,53,63].
Phylogenetic trees based on concatenated sequences for a large number of unlinked and conserved loci are more reliable and robust than phylogenetic trees based on any single gene or protein [64,65]. Due to a rapid increase in the availability of genomic sequence data, we are now able to complete genome scale phylogenetic analyses of the family Pasteurellaceae which cover a vast majority of the diversity within the family. In this work we have produced a phylogenetic tree for 74 genome sequenced members of the family Pasteurellaceae based on 128 080 aligned amino acid positions (Figure 1(a)). The branching patterns of the core genome phylogenetic tree produced in this work largely agree with a previous genome based phylogenetic tree produced for a limited number of Pasteurellaceae species [19] and a concatenated protein based phylogenetic tree of the family Pasteurellaceae produced by our lab in a previous study [20]. Additionally, we have also produced a multilocus sequence based phylogenetic tree using the 16S rDNA, infB, recN, and rpoB genes which are commonly used in the phylogenetic analysis of the family Pasteurellaceae (Figure 1(b)) [15-17, 29, 30]. This tree also showed broadly similar branching patterns to past multilocus sequence based phylogenetic trees [17,18] and to our core genome based phylogenetic tree. Both our core genome based and multilocus sequence based phylogenetic trees provide evidence for a division of the Pasteurellaceae into at least two higher taxonomic groups (families) which are broadly similar to the two clades of Pasteurellales identified in our previous work [20]. A similar division of the family Pasteurellaceae into two or more large groups is seen in many other robust multilocus or concatenated protein based phylogenetic trees [17,19,53,63]; however, this division is not readily apparent in phylogenies based on the 16S rRNA gene [9,66].
A majority of the known genera within the family Pasteurellaceae form well-defined and coherent clusters in phylogenetic trees (Figure 1) [9,17,19,20,66]. The genera Actinobacillus, Haemophilus, and Pasteurella, which were described before the advent of genetic characterization, exhibit polyphyletic branching in all gene and protein based phylogenetic trees, including the core genome based and multilocus sequence based phylogenetic trees created in this work (Figure 1). However, there are large clusters of Actinobacillus, Haemophilus, and Pasteurella species identifiable in the phylogenetic trees which represent the core or "sensu stricto" members of each genus. The clusters of species that represent Actinobacillus sensu stricto, Haemophilus sensu stricto, and Pasteurella sensu stricto are indicated in Figure 1. Members of each genus which fall outside of the sensu stricto clusters, indicated in our phylogenetic trees by the presence of square brackets around their genus name (e.g., [Pasteurella] pneumotropica), are only distantly related to the sensu stricto members of their genus and will require reclassification in order to make their taxonomy and phylogeny concordant.

The Usefulness of Conserved Signature Indels as Phylogenetic and Taxonomic Markers.
Whole genome sequences are a rich resource for the discovery of molecular signatures which are unique to a group of organisms [67-69]. One useful class of shared molecular signatures is conserved signature indels (CSIs), which are insertions/deletions uniquely present in protein sequences from a group of evolutionarily related organisms [59,70,71]. The unique, shared presence of multiple CSIs by a group of related species is most parsimoniously explained by the occurrence of the genetic changes that resulted in these CSIs in a common ancestor of the group, followed by vertical transmission of these CSIs to various descendant species [59,71-73]. Hence, these CSIs represent molecular synapomorphies (markers of common evolutionary descent) which can be used to identify and demarcate specific bacterial groups in molecular terms and for understanding their interrelationships independently of phylogenetic trees [59,70-72]. CSIs have recently been used to propose important taxonomic changes for a number of bacterial groups (namely, Aquificae, Spirochaetes, Thermotogae, Xanthomonadales, and Borrelia) at different taxonomic ranks [69,74-77]. In the present work, we have completed comprehensive comparative analysis of Pasteurellaceae genomes (Table 1) in order to identify CSIs that are primarily restricted to the different genera within the family Pasteurellaceae. We have identified 3, 7, and 6 unique molecular signatures which are shared by Actinobacillus sensu stricto, Haemophilus sensu stricto, and Pasteurella sensu stricto, respectively. Information regarding these CSIs and their evolutionary significance is discussed below.
Actinobacillus lignieresii, the type species of the genus [24,78]. However, the original classification scheme for the genus Actinobacillus led to the inclusion of a highly heterogeneous and polyphyletic grouping of species within the genus [12-14].
An assemblage of Actinobacillus species closely related to Actinobacillus lignieresii has been recognized as Actinobacillus sensu stricto (i.e., the core members of the genus Actinobacillus) in both our phylogenetic analysis (Figure 1) and past phylogenetic analyses [12-14,17]. Differentiation of Actinobacillus sensu stricto from other Actinobacillus species and the modern criteria for placing novel species within the genus Actinobacillus sensu stricto are heavily reliant on genetic and genomic criteria, namely, DNA-DNA hybridization values, 16S rRNA sequence similarity, and other single gene sequence comparisons [13,18]. There are currently no known discrete characteristics which are unique to Actinobacillus that define the genus. In this work, we have completed a comprehensive comparative analysis of Pasteurellaceae genomes in order to identify unique, defining molecular signatures for different genera within the family Pasteurellaceae. We have identified 3 CSIs which are unique, defining molecular signatures for the sequenced members of Actinobacillus sensu stricto (namely, Actinobacillus capsulatus, A. pleuropneumoniae, A. suis, and A. ureae). An example of a CSI specific for Actinobacillus sensu stricto is shown in Figure 2. The CSI consists of a 1-amino-acid insertion

Table 2: Conserved signature indels specific for genera within the family Pasteurellaceae. Column headings: Protein name, Gene name, GenBank identifier, Figure.

Table 2(A).

Molecular Signatures Specific for Pasteurella sensu stricto.
The genus Pasteurella is highly heterogeneous and polyphyletic (Figure 1) [13]. Similar to the members of Actinobacillus, bacterial isolates were originally classified as members of the genus Pasteurella based on growth factor independent growth and phenotypic or biochemical similarity to Pasteurella multocida, the type species of the genus [78,81]. The monophyletic clusters of Pasteurella species that branch with Pasteurella multocida are considered the core members of the genus (namely, Pasteurella sensu stricto) [9,13,16,17]. Our comparative analysis of Pasteurellaceae genomes has led to the identification of 6 CSIs which are unique characteristics for the sequenced members of Pasteurella sensu stricto (namely, Pasteurella multocida and P. dagmatis). An example of a CSI uniquely found in the sequenced members of Pasteurella sensu stricto, consisting of a 4-amino-acid insertion in a conserved region of menaquinone-specific isochorismate synthase, is shown in Figure 4. This CSI is only found in the sequenced members of Pasteurella sensu stricto and is absent from all other sequenced Gammaproteobacteria.
Partial sequence alignments for 5 additional CSIs which are also unique characteristics of Pasteurella sensu stricto are presented in Supplemental Figures 9-13 and their characteristics are briefly summarized in Table 2.

Figure 5: A partial sequence alignment of (a) a nhaC family sodium:proton antiporter containing a 3-amino-acid insertion specific for all sequenced species of the genus Aggregatibacter and (b) a methyl-galactoside ABC transporter substrate-binding protein containing a 1-amino-acid deletion specific for all sequenced species of the genus Mannheimia. In each case, the identified CSIs were only found in the sequenced members of the genera Aggregatibacter or Mannheimia and were absent from all other sequenced Gammaproteobacteria. Sequence information for other CSIs specific to Aggregatibacter or Mannheimia is presented in Supplemental Figures 14-19 and their characteristics are summarized in Tables 2(D) and 2(E).

distinctly from the "sensu stricto" members of their respective clades [21]. Similarly, the genus Mannheimia was proposed as a novel classification for the Pasteurella haemolytica complex which did not branch with Pasteurella sensu stricto in phylogenetic trees [7]. Currently, other than branching in phylogenetic trees or relatedness in DNA-DNA hybridization studies, the members of the genera Aggregatibacter or Mannheimia do not share any single unique or defining biochemical or molecular characteristic that can differentiate them from all other bacteria [5,82].
In this study we have identified 4 CSIs that are unique molecular characteristics shared by all sequenced species of the genus Aggregatibacter and another 4 CSIs which are uniquely found in all sequenced members of the genus Mannheimia. Examples of CSIs specific to the sequenced members of Aggregatibacter and Mannheimia are shown in Figure 5. A partial sequence alignment of a nhaC family sodium:proton antiporter containing a 3-amino-acid insertion specific for all sequenced species of the genus Aggregatibacter is shown in Figure 5(a) and a partial sequence alignment of a methyl-galactoside ABC transporter substrate-binding protein containing a 1-amino-acid deletion specific for all sequenced species of the genus Mannheimia is shown in Figure 5(b). In each case, the identified CSIs were only found in the sequenced members of the genera Aggregatibacter or Mannheimia and were absent from all other sequenced Gammaproteobacteria. Partial sequence alignments of additional CSIs specific for the genera Aggregatibacter or Mannheimia are provided in Supplemental Figures 14-19 and their characteristics are summarized in Tables 2(D) and 2(E). These CSIs are the first discrete molecular characteristics which are unique for the genera Aggregatibacter and Mannheimia and support their observed monophyly in phylogenetic trees. Additionally, these CSIs could be useful targets for the development of PCR based diagnostic assays for the genera Aggregatibacter and Mannheimia which amplify the CSI containing DNA segment using the conserved flanking regions of the CSIs [83,84].
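Because a CSI is an in-frame indel in a protein-coding gene, each inserted or deleted amino acid corresponds to three nucleotides, so a PCR product spanning the CSI would differ in length between members and non-members of the group. The following Python sketch shows only this arithmetic; the function name and the baseline amplicon length used in the example are hypothetical and are not taken from any assay described here.

```python
def expected_amplicon_length(length_without_indel_nt, indel_aa, kind):
    """Predicted amplicon length in organisms that carry the CSI.

    length_without_indel_nt: amplicon length (nt) in organisms lacking the CSI,
        i.e. between primers placed in the conserved flanking regions.
    indel_aa: size of the CSI in amino acids.
    kind: "insertion" or "deletion".
    """
    if kind not in ("insertion", "deletion"):
        raise ValueError("kind must be 'insertion' or 'deletion'")
    shift = 3 * indel_aa  # one codon per amino acid
    if kind == "insertion":
        return length_without_indel_nt + shift
    return length_without_indel_nt - shift
```

For a hypothetical 300-nt baseline amplicon, a 3-amino-acid insertion such as the one in Figure 5(a) would give 309 nt, and a 1-amino-acid deletion such as the one in Figure 5(b) would give 297 nt.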

Conclusion
The genera Actinobacillus, Haemophilus, and Pasteurella, within the family Pasteurellaceae, are known to exhibit extensive polyphyletic branching. We have utilized molecular signatures and phylogenetic analyses to clarify the taxonomic boundaries of these genera. We have been able to identify large clusters of Actinobacillus, Haemophilus, and Pasteurella species which represent the "sensu stricto" members of these genera. We have identified 3, 7, and 6 unique molecular signatures which are specifically shared by the members of the genera Actinobacillus sensu stricto, Haemophilus sensu stricto, and Pasteurella sensu stricto, respectively. The group specificity of the molecular signatures we have identified in this work is summarized in Figure 6 and their characteristics are briefly summarized in Table 2. Our comparative genomic analyses did not identify any CSIs that were unique characteristics of all sequenced members of the genera Actinobacillus, Haemophilus, or Pasteurella as currently defined, suggesting that the members of these genera that do not fall into the "sensu stricto" clusters should not be considered members of their respective genera. Examinations of phenotypic and biochemical characteristics do not provide a reliable means of assigning a novel isolate to the genera Actinobacillus, Haemophilus, and Pasteurella [18]. However, based upon the CSIs described in this work, it is now possible to demarcate the genera Actinobacillus sensu stricto, Haemophilus sensu stricto, and Pasteurella sensu stricto on the basis of the presence or absence of unique molecular signatures. It is important to note that the current analysis of CSIs is limited to the currently available genomic sequence data and may show slight variance as additional bacterial genomes are sequenced.
However, earlier work on CSIs for other groups of bacteria provides evidence that the identified CSIs have strong predictive value and will likely be found in other members of these groups as more species are sequenced and novel species are isolated [74,77,85,86]. The conserved nature of the sequence regions that contain these CSIs, in conjunction with their strong predictive value, makes CSIs promising targets for the development of highly specific diagnostic assays for Actinobacillus sensu stricto, Haemophilus sensu stricto, Pasteurella sensu stricto, Aggregatibacter, and Mannheimia [83,84]. Additionally, further analysis of these genus specific CSIs should lead to the discovery of their functional role in their respective organisms and may provide important insights into novel distinguishing features of these groups of organisms.