Phylodynamics and Molecular Mutations of the Hemagglutinin Affecting Global Transmission and Host Adaptation of H5Nx Viruses

,

Although the molecular determinants for H5Nx viruses to cross interspecies barriers are still not completely understood, the impact of mutation in hemagglutinin (HA) as a key determinant of host adaptation was observed in previous studies [14]. Evolutionary dynamics of the HA protein regarding HA receptor binding specifcity play a central role in the infuenza viruses (AIVs) switch from avian to mammalian hosts [15][16][17][18]. Specifcally, nonsynonymous mutations in the proximity of the HA receptor binding site (RBS) appear to enhance host adaptation and aerosol transmission of H5N1 viruses in mammals [19][20][21]. Naturally occurring mutations within the 130-and 220loops of H5N1 HAs (such as S137A and K222Q) could increase the afnity for human-like α2,6-linked sialic acid (SA) receptors [22,23]. Residues increasing α2,6 SA binding of the H5N1 HA and lysine at PB2 residue 627 (K627) presented a way to achieve airborne transmission between mammals [16]. It was also reported that human-isolated H5N1 and H5N6 viruses might recognize both avian-like α2,3 and α2,6 SA receptors but could not sustain respiratorydroplet transmission in ferrets [24]. In addition, the increasing antigenic diversity in HA and NA can also potentially broaden the host ranges of AIVs because immune pressure in diferent animal hosts can partially drive the selection and preservation of genetic mutations that arise in both surface glycoproteins [25,26]. Identifying specifc residues within HA and NA by reverse genetics has made it possible to study the efect of certain mutations on the interspecies transmission of H5Nx viruses [16,17,24,27].
Hence, amino acid mutations in HA, especially around the RBS region, of H5Nx variants should be carefully monitored in terms of global public health. Tis study investigated the molecular and evolutionary dynamics of H5Nx HAs using the complete genomic sequences of the HA gene in recently expanded H5 (sub) clades. Te phylodynamics were used to investigate the genealogical relationship of H5Nx HAs, and the selection pressure profle. Te genetic diversity in the key antigenic sites and molecular changes in the HA crystal structure were assessed to investigate the evolutionary dynamics of H5Nx HAs on the molecular level and its potential efects on the viral characteristics of H5Nx viruses.

Data Collection and Phylogenetic Analyses.
A total of 4,069 complete HA sequences of avian and human H5Nx viruses were collected from the database of the National Center for Biotechnology Information (Bethesda, MD, USA) (n = 2,690) and the Global Initiative on Sharing Avian Infuenza Data (GISAID) (n = 1,379) from 1996 to July 2022. We manually inspected HA sequence data and initially selected 3,890 HA sequences of avian and human H5Nx viruses (H5N1 subtype, n = 1,733; H5N2, n = 659; H5N3, n = 90; H5N4, n = 5; H5N5, n = 53; H5N6, n = 379; H5N8 n = 947; and H5N9, n = 24) (Supplemental Table 1 and Supplemental Data S1). Te random subsampling was performed only from HAs containing complete genomic information based on the subtype clades from 0 to 9 based on H5 subclades nomenclature [28], year of isolation (1996-2022), isolation region, and host using an in-house command-line script. Finally, our study used 739 HA sequences including 655 avian and human H5Nx HAs and 84 references HAs (Supplemental Table 2 and Supplemental Data S2).
Multiple alignment using fast Fourier transform (MAFFT) (v7.419) was used to align our HA sequences [29]. Te phylogeny of 739 HA sequences was reconstructed by the time-framed Bayesian inference method implemented on Bayesian evolutionary analysis by sampling trees (BEAST) (v1.10.4) [30]. Te general time-reversible + I + G nucleotide substitution [31] and lognormal uncorrelated relaxed clock [32] [33]. Te lineage-specifc evolutionary rate and divergence times of nucleotide substitutions of the HA genes in four datasets were also estimated and summarized based on the posterior probability of sampling phylogenies.

Selection Pressure Profles Analysis.
Natural selection pressure at individual codon sites was estimated from the ratio of nonsynonymous (dN) vs. synonymous (dS) nucleotide substitutions per site (dN/dS, also known as ω) using online Datamonkey site [34]. Te estimated ω (dN/dS) lower than 1 indicated an overall negative or purifying selective pressure and the ω higher than 1 signifed that the amino acid position is under positive selection. Te individual sites under either positive or negative selection pressure were detected using single-likelihood ancestor counting (SLAC) [35], fast unconstrained Bayesian approximation (FUBAR) [36], and mixed-efects model evolution (MEME) [37] models with the cutofs of p value <0.1 (SLAC and MEME) and posterior probability >0.9 (FUBAR).  Table 3 and Supplemental Data S3).  [39] viruses using PyMOL Molecular Graphics System v2.1.0 (Schrödinger, Inc., New York, NY, USA). Te conserved amino acids at Y95, W153, H183, and Y195 lining the base of each HA RBS were exposed [40], and molecular interactions within 3.0Å of specifc amino acid residues around the HA RBS were investigated.

Phylogenetic Relationships of H5Nx
HAs. Our MCC phylogeny of the HA gene showed that global H5Nx viruses formed a polyphyletic topology with continuous generation and extinction of sublineages. Our study observed independent diversifcation of two major clades 2.3.2.1 and 2.3.4.4 following the recent H5 nomenclature [28], and each subclade heterogeneously involved multiple genotypes and human infection strains ( Figure 1

Temporal Dynamics of the Genetic Diversity in HAs.
Here, we generated lineage through time (LTT) plots and examined the gradual lineage diversifcation rate following a coalescent-based Bayesian model separately for clades 2.  (Table 1).
Among the mutations around the RBS of various H5 subtypes, the substitution ratios of six mutations (A131T, S133L, S137A, D187N, K193N, and R222Q) were with higher proportions in the HAs of the H5N6 and H5N8 subtypes ( Figure 3(a), Supplemental Figure 1(a)) and appeared to continuously increase over the past decade ( Figure 3(b), Supplemental Figure 1(b)). However, most of these mutations were not detected in the HAs of H5N2, H5N3, H5N5, and H5N9 subtypes before 2020 (Figure 3(a)). Specifcally, RBS mutations, such as Δ130, Δ133, and I155T mutations, were not found in HAs of the H5N8 (Figure 3(a)) but were detected in 50-80% of the H5N6 and frequently detected in the HAs of human-isolated H5N6 of subclade 2.3.4.4d and 2.3.4.4h (Supplemental Table 3). We noted that the proportion of the A189E mutation increased after 2014 (Figure 3(b)) and was retained in the HAs of two independent subclades 2.3.4.4b and 2.3.4.4h after 2016 (Supplemental Table 3). Importantly, the subtle molecular changes appear to be dominant in the H5N6 and H5N8 HAs (Figure 3(a)). Deletions at residues 130 (Δ130) and 133 (Δ133) were detected mostly in the H5N6 subtype HAs (Figure 3(a), Table 2), where Δ130 was in increased proportions between 2013 and 2020, while Δ133 increased between 2017 and 2019 and then decreased (Figure 3(b)). Mostly observed in the H5N6 HAs, the I155T mutation also showed an increased proportion between 2013 and 2016. We further noted that the proportions of I155T and T160A mutations in human-isolated H5Nx HAs exhibited a similar pattern from 2006 to 2016 (Supplemental Figure 2). However, the number of human H5Nx HAs with the T160A mutation increased after 2017 (Supplemental Figure 2), while I155T proportions were dramatically reduced, as this mutation was not observed in subclade 2.3.4.4b (Supplemental Table 3).

Genetic Polymorphism and Natural Selection Pressure in H5Nx
HAs. Our study found that estimates of selection pressures in all four datasets ranged from 0.172 to 0.207, which were lower than 1 (  (Table 3).   Figure 3). Furthermore, we found that the P185S mutation might alternatively interact with residues Y95, H183, and E190 residues within the RBS pocket of the simulated CS1 HA (Figure 3(c)). Within the RBS of SC26221 HA, however, in the presence of L133 and I155, the S185 amino acid physically interacted with the Q191 sidechain ( Figure 3(e)). CS1 HA simulation (Figure 3(a)) also showed that Δ130 in CS1 generates a potential N-linked glycosylation (NLG) site at N128 (Table 2, Figure 3(c)), which is in proximity to the previously well-known NLG site at N158 in the HA of VN1194 (Figure 3(d)).

Discussion
Our study explored the phylodynamics of infuenza H5Nx HAs, detected key determinants on the molecular evolutionary level, and tried to unravel the impact of reassortment with the NA genes with the potential of the emergence of novel zoonotic strains. While ftness diference among AIVs continuously afects the dominance of variant selection through the rapid extinction of variants with low ftness, a variant with benefcial antigenic mutations efectively becomes a dominant strain and generates new sublineages. Te HA has been considered a key gene segment that contributes to the ftness of variants due to the impact of its mutation on hosts' partial immunity and transmissibility [43]. Our approach based on the HA gene sequences found that the H5Nx viruses from the Gs/Gd lineage dominantly expanded and diverged into multiple sublineages with unique adaptive genetic mutations in the HAs through their global spread. Since the HA protein is a constant target for neutralizing antibodies, here, we demonstrated that the HA RBS mutations of H5N6 and H5N8 subtypes would play a key role in  (Figure 1). We observed the accumulation of mutations in key antigenic sites around the HA RBS in the H5N8 viruses (Figure 3(a)). Prominent molecular substitutions were A131T, S133L, S137A, D187N, A189E, K193N, and R222Q, and especially most H5Nx viruses in the clade 2.3.4.4 identifed after 2020 held these mutations (Figure 3(a), Table 2). Te P185S mutation was frst detected in clade 2.3.4.1 and became molecular characteristics of the currently dominant HAs of H5Nx viruses throughout clades 2.3.4.1-4 ( Figure 1). Yet with largely unknown function in the presence of L133 and I155, the S185 amino acid residue might physically interact with Q191, thus generating a pull toward the 190-helix (Figure 3(e)). Te HA RBS structure in subclade 2.3.4.4h H5N6 (CS1), however, revealed that the   Te hydrophobic residues of I155, A131, and L133 are colored gray, the oxygen atoms are red, and the nitrogen atoms are blue. Te potential NLG sites at N128 and N158 are colored orange and residues T131 and T155 are colored cyan. S132, S133, and S185 side chains are colored green. amino group of H183 also likely form H-bonds with the -OH groups of S185 and E190, causing a conformational shift and averting its amino acid residues from the usual physical connection between Y95, H183, and Y195 side chains (the base of the RBS dictate the receptor-binding mode [40,46] (Figures 3(d) and 3(e)). Tis plasticity and physical interactions among Y95, H183, S185, and E190 residues might generate a pull toward the 220-loop in the HA of the H5N6 subtype (Figure 3(c)), therefore, are readily accessible for the SA receptors. An increased polarity in amino acid residues T131, S132, S133, T155, and S185 and a shorter 130-loop (ΔE130) would also be a reason for a more fexible RBS pocket in the human H5N6 HA (Figure 3(c)). Tus, the P185S mutation is a unique and signifcant determinant in the evolution of H5Nx HAs forming novel RBS plasticity. H5Nx viruses with deletions in the 130-loop, such as ΔE130 or ΔL/S133, in combination with I155T, might have a potential for zoonotic infections because these were detected in the HAs of H5N1 viruses instigating sporadic human infections between 2007 and 2014 [27,47]. High proportions of deletion in 130-loop and I155T are rarely observed in subtypes other than H5N6 in the past decade (Figure 3(b)), and ΔE130 and I155T in the HAs were notable in the avian and human-isolated H5N6 viruses of subclades 2.3.4.4d and 2.3.4.4h (Table 2, Supplemental Table 3). Te proportions of strains with these two substitutions increased since 2013 and then declined after 2017, which coincided with the period of subclades 2.3.4.4d-h expansion. Another deletion ΔL133 was also reported to enhance the α2,6 SA receptor recognition [21,47] and perhaps mimics the shorter 130-loop in H2/H3 HAs [48]. Terefore, the dynamics of H5Nx avian infuenza viruses in wild migratory birds with these molecular features should be carefully monitored to prevent the risk of the emergence of zoonotic infuenza [49].
Introduction of potential NLG at N128 and/or N158 was detected in the HAs of many human-isolated H5N1 and H5N6 viruses (Supplemental Table 3), whereas NLG at N144 was a characteristic of H5N1 HAs in clade 2.3.2.1 ( Table 2). NLG is an important molecular modifcation due to its impact on the modulation of glycoprotein stability and virus pathogenicity [50,51]. Since amino acid residues 128 and 158 are structurally close to each other (Figures 3(c) and 3(d)), potential NLGs at N128 and N158 might be mutually exclusive. Sun et al. demonstrated that, although NLG at N128 is positioned away from the RBS, it may improve viral ftness by stabilizing the HA trimeric structure [51]. Te impact of the lack of NLG at position N158 around the HA globular head on the pathogenicity of H5 AIVs in mammals remains controversial. Yen et al. [52] and Zhao et al. [53] found that NLG at N158 can increase virulence in mice but does not afect receptor binding preferences. However, Zang et al. [54] and Yin et al. [55] reported that a lack of NLG at N158 enhances virulence in mice. Recently, Gao et al. also showed that NLG at N158 increases H5N6 pathogenicity by inducing higher levels of infammatory factors in mice [50]. Tese observations suggest that a functional NLG site with a shorter 130-loop might be an important mechanism for adaptation and increased pathogenicity in mammalian hosts.
Te lineage diversity plot for clades 2.  [56] and provides evidence that the expansion of one genetic group is gained at the expense of another [57]. We also noted a small increase in the genetic diversity after 2019 was characteristic for clades 2.  (Figure 1). It is still unclear how the maintained balance between HA and NA in the H5N1 subtype suddenly shifted to other stable combinations with various NA subtypes, resulting in the emergence and global spread of H5Nx [59,60]. An NLG camoufage in major HA epitope regions and a shorter 130loop might contribute to the adaptation and evasion of H5Nx viruses from the host immune response [51]. Te shorter 130loop might also shift the sialic acid's sitting position and strengthen the epistatic interactions between the host receptor and HA RBS [48]. As shown previously, a combination of shortened 130-loop and polar amino acid residues T131, S132, and T155 (Figure 3(c)) in the H5N1 or H5N6 HAs could enhance α2,6 SA receptor recognition [47]. Interestingly, these were not observed in the recently reported HAs of humanisolated H5N1 and H5N6 viruses in subclade 2.3.4.4b (Supplemental Table 3), which also suggests an improved genetic and functional balance with the NAs of N1 and N6 subtypes.
Te human mortality rate with H5Nx viruses is higher than that with other infuenza subtypes, including seasonal infuenza viruses [61]. Te evolving H5Nx viruses are becoming increasingly virulent in mammalian models [1,62,63]. Specifcally, the HAs of the H5N1 and H5N6 subtypes isolated after 2020 in the subclades 2.3.4.4b and 2.3.4.4h showed an afnity for both α2,3 and α2,6 SA receptors and high replication capacity in mice [1,64]. Although human-to-human transmission of the H5Nx viruses was very limitedly reported, these H5Nx viruses would have a potential for host adaptation in humans through the accumulation of genetic mutations in the HA key antigenic sites and its functional interaction with the NA as H5N1 and H5N6 viruses showed. Due to the low herd immunity of humans against such novel H5Nx variants, global public health authorities should carefully monitor molecular changes in novel H5Nx viruses. Tough our study solely focused on the genetic characteristics of the H5Nx HAs, our fndings would be still meaningful because the molecular characteristics of H5Nx HAs in the present study will provide valuable insight for better surveillance to precisely monitor key genetic mutations and their potential for zoonotic infuenza H5Nx emergence through both high transmissibility and host adaptation.

Conclusions
Te present study highlighted unique evolutionary characteristics of the HAs of currently circulating infuenza H5Nx viruses. Te global spread of the H5Nx viruses promoted the expansion of their genetic diversity, and diverse viral variants continuously gained adaptive advantages by harboring accumulated mutations in HAs and by the genetic shift. Our study suggests key molecular mutations and evolutionary dynamics of the HAs of HPAI H5Nx viruses. We believe this study will facilitate step-changes in our understanding of molecular virology as well as the evolution of H5Nx viruses for the better molecular monitoring of novel infuenza variants.

Data Availability
All the data in this study are included within the article and supplementary information fles. Te HA genomic data of H5Nx viruses are obtained from the database of Infuenza Virus Resource (https://www.ncbi.nlm.nih.gov/genomes/ FLU/Database/nph-select.cgi?go=database) and the EpiFlu database of Global Initiative on Sharing Avian Infuenza Data (GISAID; https://gisaid.org).

Disclosure
Te funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflicts of Interest
Te authors declare that they have no conficts of interest.