Identification of Influenza A/H7N9 Virus Infection-Related Human Genes Based on Shortest Paths in a Virus-Human Protein Interaction Network

The recently emerging Influenza A/H7N9 virus is reported to be able to infect humans and cause mortality. However, viral and host factors associated with the infection are poorly understood. It is suggested by the “guilt by association” rule that interacting proteins share the same or similar functions and hence may be involved in the same pathway. In this study, we developed a computational method to identify Influenza A/H7N9 virus infection-related human genes based on this rule from the shortest paths in a virus-human protein interaction network. Finally, we screened out the most significant 20 human genes, which could be the potential infection related genes, providing guidelines for further experimental validation. Analysis of the 20 genes showed that they were enriched in protein binding, saccharide or polysaccharide metabolism related pathways and oxidative phosphorylation pathways. We also compared the results with those from human rhinovirus (HRV) and respiratory syncytial virus (RSV) by the same method. It was indicated that saccharide or polysaccharide metabolism related pathways might be especially associated with the H7N9 infection. These results could shed some light on the understanding of the virus infection mechanism, providing basis for future experimental biology studies and for the development of effective strategies for H7N9 clinical therapies.


Introduction
Influenza is one of the most dangerous contagions worldwide and is still a serious global health threat. In the spring of 2013, a novel Influenza A virus subtype H7N9 (A/H7N9) broke out in China and quickly spread to other countries [1][2][3]. As of 11 August 2013, 136 human infections had been laboratory-confirmed, with 44 deaths.
The Influenza A viruses (IAVs) are classified into subtypes according to a combination of 16 hemagglutinin (HA: H1-H16) and 9 neuraminidase (NA: N1-N9) surface antigens [4]. Genomic signature and protein sequence analyses revealed that the genes of this A/H7N9 virus were of avian origin [5][6][7]. The six internal genes were derived from the avian Influenza A/H9N2 strain, whereas the haemagglutinin (HA) and neuraminidase (NA) gene segments were from viruses of domestic duck or wild birds [2,3,8].
Generally, most avian influenza viruses (e.g., subtypes H5N1, H9N2, H7N7, and H7N3) are of low pathogenicity [4], possibly because avian viruses are inefficient at binding to sialic acid receptors located in human upper airways [5]. However, by comparison, the novel reassortant A/H7N9 seems to cross species from poultry to human more easily [5]. The recombinant has mutations in the hemagglutinin protein, which is associated with potentially enhanced ability to bind to human-like receptors. A deletion in the viral neuraminidase stalk may be also responsible for the change in viral tropism to the respiratory tract or for enhanced viral replication. Mammalian adaptation mutations are also observed in the polymerase basic 2 (PB2) gene of the virus [2,9]. These are thought to be correlated with the increased virulence and the better adaptation to mammals of A/H7N9 than other avian influenza viruses [10].
No vaccine for the prevention of A/H7N9 infections is currently available [11]. Although preventing further spread of the infection is important, new drug and vaccine development are also vitally needed for the antiviral treatment. However, viral and host factors associated with the infection of this reassortant are poorly understood [5], which is an obstacle to fight against H7N9. The difficulty is increased by the unusual characteristics from hallmark mutations in the virus, differing from other avian IAVs. Therefore, it is meaningful to identify H7N9 infection-related human genes, which could be used as biomarkers for early diagnosis and targets for new drug development.
In the present study, we proposed a new method for identifying H7N9 infection-related human genes based on a protein-protein interaction (PPI) network. So far the PPI data have been widely used for gene function predictions. The "guilt by association" rule, which was first proposed by Nabieva et al. [12], suggests that interacting proteins share the same or similar functions and hence may be involved in the same pathway. This assumption can be used to identify disease-related genes from existing protein-protein interaction networks. In our previous studies, based on this assumption, we have identified genes related to other diseases, such as the ones mentioned in [13][14][15].
Shortest path and betweenness methods are widely used to identify and analyze biomarkers on virus-host interaction networks [16][17][18]. If a protein lies on many shortest paths between virus target genes, it has high betweenness and can disrupt signal transduction on the network [19,20]. It was found that proteins with high betweenness usually have functions similar to those of the original seed genes [13,21]. In this study, we used this method to identify potential host response genes to the A/H7N9 virus infection.

Materials and Methods
The overall procedure of our method is illustrated in Figure 1. In the following subsections, details are presented.

Dataset Construction of Target Human Proteins.
The course of the Influenza A/H7N9 infection can be determined by comprehensive protein-protein interactions (PPIs) between the virus and its host (human). In this study, whether a human protein interacted with virus proteins was determined based on the Gene Ontology (GO) database. GO terms provide information about the biological process, molecular function, and cellular component of a specific protein. A human protein and an H7N9 protein sharing at least one GO term were assumed to interact with each other, and the human protein was called a target human protein. Since protein pairs sharing only generic GO terms should be ignored, only GO terms at levels below 3 were considered in this study. That is to say, we excluded the root GO terms ("GO:0008150: biological process", "GO:0005575: cellular component", "GO:0003674: molecular function"), their children, and the children of their children. Based on this rule, we constructed a dataset of target human proteins. The procedure is described in detail below.
Based on the rule of shared GO terms, 3,212 target human proteins (coded by 1,023 human genes) were picked out, each of which interacted with at least one H7N9 virus protein. These virus-human protein pairs are provided in Supporting Information S2, together with the shared GO terms for each pair. The 3,212 target human proteins and their 1,023 coding genes are summarized in Supporting Information S3.
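The shared-GO-term rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the toy ontology, the term IDs prefixed with `toy:`, the protein names, and the helper functions are all hypothetical, and the level of a term is taken here to be its shortest distance from a root, which is one of several possible conventions in a DAG such as GO.

```python
from collections import deque

# The three GO root terms named in the text.
ROOTS = ("GO:0008150", "GO:0005575", "GO:0003674")

def term_depths(parents):
    """Shortest distance (in edges) from a root to every reachable term."""
    children = {}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(child)
    depth = {root: 0 for root in ROOTS}
    queue = deque(ROOTS)
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in depth:          # first visit = shortest distance
                depth[child] = depth[term] + 1
                queue.append(child)
    return depth

def target_human_proteins(virus_go, human_go, depth, min_depth=3):
    """Map each human protein to the specific GO terms it shares with any virus protein."""
    def specific(terms):
        # Depths 0, 1, 2 are the roots, their children, and their grandchildren.
        return {t for t in terms if depth.get(t, 0) >= min_depth}

    virus_terms = set()
    for terms in virus_go.values():
        virus_terms |= specific(terms)

    targets = {}
    for protein, terms in human_go.items():
        shared = specific(terms) & virus_terms
        if shared:
            targets[protein] = sorted(shared)
    return targets

# Toy ontology (hypothetical term IDs): child -> parents.
TOY_PARENTS = {
    "toy:binding": ["GO:0003674"],
    "toy:nucleic_acid_binding": ["toy:binding"],
    "toy:rna_binding": ["toy:nucleic_acid_binding"],
}
TOY_VIRUS_GO = {"virus_NP": {"toy:rna_binding", "toy:binding"}}
TOY_HUMAN_GO = {
    "human_A": {"toy:rna_binding"},   # shares a specific term -> target
    "human_B": {"toy:binding"},       # shares only a generic term -> ignored
}
TOY_TARGETS = target_human_proteins(
    TOY_VIRUS_GO, TOY_HUMAN_GO, term_depths(TOY_PARENTS)
)
```

In this toy case only `human_A` becomes a target human protein, because the term it shares with the virus protein lies below the excluded top three levels.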

PPI Data from STRING.
STRING (Search Tool for the Retrieval of Interacting Genes) (http://string.embl.de/) [23] is an online database resource which compiles both experimental and predicted protein-protein interactions with a confidence score to quantify each interaction confidence. A weighted PPI network can be retrieved from STRING, in which proteins are represented as nodes, while interactions between proteins are given as edges marked with confidence scores. Interacting proteins with high confidence scores in such a PPI network are more likely to share similar biological functions than noninteractive ones [23][24][25]. This is because a protein and its interactive neighbours may form a protein complex performing a particular function or may be involved in the same pathway.
We constructed a graph with the PPI data from STRING (version 9.0). In such a graph, proteins were represented as nodes; however, each interaction edge was assigned a distance value $d$ rather than the confidence score ($s$). The value $d$ was derived from the confidence score according to the equation $d = 1000 \times (1 - s)$. Thus, $d$ can be considered as representing the distance between two proteins: the smaller the distance, the higher the interaction confidence score and the more similar their functions.
In this study, we analyzed, in such a graph, the interactions between every pair of proteins in the target human protein dataset.
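The conversion from confidence score to edge length, 1000 × (1 − confidence score), can be illustrated as follows. This is a minimal sketch that assumes scores are expressed on a 0 to 1 scale; the function names and the example protein labels are hypothetical.

```python
def confidence_to_distance(score):
    """Convert a confidence score in [0, 1] to an edge length in [0, 1000]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must lie in [0, 1]")
    return 1000.0 * (1.0 - score)

def build_graph(interactions):
    """Build an undirected weighted graph as a dict of dicts.

    `interactions` is an iterable of (protein_a, protein_b, confidence) tuples.
    """
    graph = {}
    for a, b, score in interactions:
        distance = confidence_to_distance(score)
        graph.setdefault(a, {})[b] = distance
        graph.setdefault(b, {})[a] = distance
    return graph

# Hypothetical interactions: a high-confidence pair ends up "closer"
# than a low-confidence pair.
TOY_GRAPH = build_graph([("P1", "P2", 0.75), ("P2", "P3", 0.25)])
```

With this weighting, a shortest path favours chains of high-confidence interactions, which is the property the distance definition is meant to capture.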

Figure 1: The flowchart of the method developed in this study to identify the Influenza A/H7N9 infection-related human genes. Target human proteins interacting with the Influenza A/H7N9 virus were obtained based on sharing GO terms. Shortest path proteins were calculated from the shortest paths between every pair of the target human proteins, by searching with the Dijkstra algorithm in the network constructed from STRING. Finally, 20 shortest path proteins were screened out with betweenness >10,000, the related genes of which were considered as infection-related human genes.

Shortest Path Tracing.
The Dijkstra algorithm [26] was used to find the shortest paths in the graph between every pair of proteins in the target human protein dataset, that is, the shortest paths from each of the 3,212 proteins to all the other 3,211 proteins in the graph. The Dijkstra algorithm was implemented with the R package "igraph" [27] (no parameters needed to be set in this algorithm). We then collected all proteins lying on the shortest paths (962 proteins, called shortest path proteins) and ranked these proteins according to their betweenness. Results can be found in Supporting Information S4. The top 20 proteins (20 genes) with betweenness over 10,000 were picked out, and the 20 corresponding coding genes were regarded as potential H7N9 infection-related human genes.
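The shortest-path tracing and betweenness ranking can be sketched as below. The study itself used the R package "igraph"; this Python version is only an illustrative stand-in with hypothetical function names and a toy graph. It assumes that one shortest path is kept per protein pair and that betweenness counts only interior proteins of each path, neither of which is stated explicitly in the text.

```python
import heapq
from itertools import combinations

def shortest_path(graph, source, target):
    """Dijkstra search; returns one shortest path as a list, or None if unreachable."""
    dist = {source: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue                      # stale heap entry
        done.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in done:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def path_betweenness(graph, targets):
    """Count, for each protein, the target-pair shortest paths passing through it."""
    counts = {}
    for a, b in combinations(sorted(targets), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        for node in path[1:-1]:           # assumption: endpoints are not counted
            counts[node] = counts.get(node, 0) + 1
    return counts

# Toy weighted graph (edge lengths play the role of the distance values).
TOY_NET = {
    "A": {"B": 100.0, "C": 500.0},
    "B": {"A": 100.0, "C": 100.0},
    "C": {"A": 500.0, "B": 100.0, "D": 100.0},
    "D": {"C": 100.0},
}
TOY_BETWEENNESS = path_betweenness(TOY_NET, {"A", "C", "D"})
RANKED = sorted(TOY_BETWEENNESS, key=TOY_BETWEENNESS.get, reverse=True)
```

Ranking the resulting counts and applying a cutoff (10,000 in the study) then yields the candidate proteins.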

KEGG Pathway Enrichment Analysis.
The functional annotation tool DAVID [28] was used for KEGG pathway enrichment analysis (all parameters were left at their defaults). The enrichment value was corrected with the Benjamini multiple testing correction method [29] to control the false discovery rate at a given level (e.g., ≤0.05). All human protein-coding genes were regarded as background during the enrichment analysis.
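For readers unfamiliar with this kind of correction, the following sketch shows the standard Benjamini-Hochberg step-up adjustment. It assumes that this is the procedure meant by the correction method cited above; it is not DAVID's own code, and the p-values in the example are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw enrichment p-values for four pathways.
RAW_P = [0.01, 0.04, 0.03, 0.5]
ADJ_P = benjamini_hochberg(RAW_P)
SIGNIFICANT = [i for i, p in enumerate(ADJ_P) if p <= 0.05]
```

In this made-up example only the first pathway remains below the 0.05 level after adjustment.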

Comparison with Another Two Species of Viruses.
To further understand the Influenza A/H7N9-human interaction, we compared the potential H7N9 infection-related human genes obtained above with those identified for another two species of viruses: human rhinovirus (HRV) and respiratory syncytial virus (RSV), which also cause human acute respiratory infections. The same procedure as presented above for H7N9 was applied to these two species of viruses.
All protein sequences of HRV and RSV were downloaded from the NCBI protein database (http://www.ncbi.nlm.nih.gov/). After removing those with sequence identities >40%, the remaining proteins were listed in Supporting Information S5 and S6, respectively. The virus-human protein pairs were also provided in Supporting Information S2, and the target human proteins with their coding genes were also summarized in Supporting Information S3. After computing shortest paths, 1,904 and 9,846 shortest path proteins were obtained for HRV and RSV, respectively, also given in Supporting Information S4. The numbers of proteins and genes for the three species of viruses at each step were summarized in Table 1.
We selected betweenness threshold as 10,000 for the shortest path proteins of H7N9. However, the threshold should be different for the other two species of viruses since the numbers of target human proteins were different. We standardized the betweenness threshold for HRV and RSV viruses on that for H7N9 virus in this study.
Shortest paths were computed between every pair of proteins in the target human protein dataset. Denoting the number of target human proteins as $N$, the number of shortest paths was $C_N^2 = N(N-1)/2$. The average threshold per shortest path was calculated on H7N9 as

$$\bar{t} = \frac{10{,}000}{C_{3212}^{2}}.$$

Then the betweenness threshold for HRV and RSV was determined by $\bar{t} \times C_{N_{\mathrm{HRV}}}^{2} = \bar{t} \times C_{6985}^{2} = 47{,}299$ and $\bar{t} \times C_{N_{\mathrm{RSV}}}^{2} = \bar{t} \times C_{36273}^{2} = 1{,}275{,}672$, respectively. Therefore, the top 11 proteins (11 genes) were picked out for HRV (betweenness > 47,299) while the top 44 proteins (44 genes) were picked out for RSV (betweenness > 1,275,672) from the lists in Supporting Information S4, respectively. The corresponding 11 and 44 coding genes were regarded as potential infection-related human genes for HRV and RSV, respectively. The betweenness thresholds and the numbers of proteins picked out were also summarized in Table 1.
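The threshold standardization is a simple rescaling by the number of protein pairs, and the reported values can be reproduced with a short calculation. The sketch below uses only numbers stated in the text (3,212 target proteins and a threshold of 10,000 for H7N9; 6,985 and 36,273 target proteins for HRV and RSV); the function and constant names are hypothetical.

```python
from math import comb

H7N9_TARGETS = 3212        # target human proteins for H7N9
H7N9_THRESHOLD = 10_000    # betweenness cutoff chosen for H7N9

def scaled_threshold(n_targets):
    """Scale the H7N9 cutoff by the ratio of the numbers of protein pairs."""
    return H7N9_THRESHOLD * comb(n_targets, 2) / comb(H7N9_TARGETS, 2)

HRV_THRESHOLD = round(scaled_threshold(6985))    # 6,985 HRV target proteins
RSV_THRESHOLD = round(scaled_threshold(36273))   # 36,273 RSV target proteins
```

Rounding to the nearest integer gives 47,299 for HRV and 1,275,672 for RSV, matching the thresholds quoted above.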

Sharing GO Terms between H7N9 Proteins and Human Proteins.
H7N9 and human proteins sharing at least one GO term were considered as interacting with each other. Based on this rule, 3,212 target human proteins were found to interact with H7N9 proteins. The same procedure was performed on the other two species of viruses, HRV and RSV, for comparison. The types of shared GO terms and the number of times each term was shared could give information about the interaction between the virus and its host. Thus, a statistical analysis was made on the shared GO terms from Supporting Information S2 for each species of virus. Results are depicted in Figure 2.
From Figure 2, it can be seen that the sharing GO terms and their numbers were apparently different between the three species of viruses, indicating specific properties and different interactions with host during infections.
For H7N9, the term "GO:0003723|RNA binding" was the most frequent, indicating important roles of RNA binding proteins in the interactions between H7N9 and human proteins, which was consistent with observations on Influenza A viruses in the literature [30][31][32]. As shown in Figure 2, H7N9 and HRV both shared the significant term "GO:0003723|RNA binding," indicating that RNA binding was essential in virus-human protein interactions during infection by these two viruses. However, this term did not appear for RSV. This possibly suggests that H7N9 and HRV have a specificity that differs from RSV, although all three are RNA viruses. Several other GO terms indicated specific and important virus-human protein interactions for H7N9 infection, such as "GO:0005975|carbohydrate metabolic process," "GO:0015078|hydrogen ion transmembrane transporter activity," and "GO:0015992|proton transport." Nevertheless, 3 terms of H7N9 were the same as those of HRV ("GO:0003723|RNA binding," "GO:0019079|viral genome replication," and "GO:0003968|RNA-directed RNA polymerase activity"), and 2 terms were the same as those of RSV ("GO:0003968|RNA-directed RNA polymerase activity" and "GO:0019031|viral envelope"), indicating similar infection processes among the three viruses.

Potential H7N9 Infection-Related Genes.
The shortest paths were calculated between each pair of the 3,212 proteins. All proteins on the shortest paths were picked out with their betweenness, given in Supporting Information S4. We selected the top 20 proteins with betweenness over 10,000 and ranked them according to their betweenness. The coding genes of the 20 proteins were also retrieved accordingly (20 genes). These are shown in Table 2. The 20 genes were regarded as potential H7N9 infection-related human genes in this study. For comparison, the potential infection-related human genes for HRV and RSV, obtained by the same method as that for H7N9, are also listed in Table 2. Note that the proteins (genes) listed in Table 2 are all human proteins (genes), not viral ones. Potential human genes found for the three viruses are also depicted in Figure 3. It can be clearly seen from Figure 3 that the potential human genes found for H7N9 infection were remarkably different from those for HRV and RSV, although several shared genes existed. Thus, these 20 human genes could be closely related to H7N9 infections. Our further analysis was based on these 20 genes.
The 20 human genes were submitted to the CCSB interactome database to analyze their interactions with viruses (http://interactome.dfci.harvard.edu/V_hostome/). Among them, proteins encoded by RANBP2 and GYS1 were found to be related to EBV or HPV proteins, such as EBV-BVLF1, EBV-BGLF3, and HPV8-E6. These proteins could also have some relationship with H7N9 infections.
Among the 20 genes, some, such as GAPDH and NXF1, had been well documented to be relevant to H7N9 infections. However, there were also other genes that had rarely been associated with H7N9 infections or had been only poorly characterized, such as PGK1, GYS1, YBX1, and NUP214.
GAPDH (glyceraldehyde-3-phosphate dehydrogenase) is a housekeeping gene in carbohydrate metabolism. This finding was consistent with the general agreement that GAPDH is an important gene and is widely used in studies of host gene response to virus infections, including influenza virus infections [33][34][35]. NXF1 (nuclear export factor 1) is a member of a family of nuclear RNA export factor genes. It was reported that viral mRNAs of Influenza A virus were transported to the cytoplasm by the NXF1 pathway for translation of viral proteins [36]. Not surprisingly, the H7N9 virus exploited the same pathway. YBX1 (Y box binding protein 1) has been found to be an interacting partner of the genomic RNA of Hepatitis C Virus, which negatively regulates the equilibrium between viral translation/replication and particle production [37]. NUP214 (nucleoporin 214 kDa) encodes one of the nucleoporins composing the nuclear pore complex (NPC), which forms a gateway regulating the flow of macromolecules between nucleus and cytoplasm. Many viruses have been reported to require these mechanisms to deliver their genomes into the host cell nucleus for replication, such as human immunodeficiency virus (HIV) [38], encephalomyocarditis virus [39], and herpes simplex virus [40]. However, reports relating NUP214 and YBX1 to Influenza A viruses were sparse.
Cancer-related genes were also included. BRCA1 (breast cancer 1) encodes a nuclear phosphoprotein that plays a role in maintaining genomic stability, and it also acts as a tumor suppressor. BARD1 (BRCA1 associated RING domain 1) encodes a protein which interacts with the N-terminal region of BRCA1, regulating cell growth and the products of tumor suppressor genes, and may be related to breast or ovarian cancer.
Interestingly, more genes were involved in energy pathways including glycolysis and gluconeogenesis, such as GPI (glucose-6-phosphate isomerase), PGK1 (phosphoglycerate kinase 1), and TPI1 (triosephosphate isomerase 1). In addition, GYS1 (glycogen synthase 1) encodes a protein catalyzing the addition of glucose monomers to the growing glycogen molecule in starch and sucrose metabolism. GLA (galactosidase) encodes a glycoprotein that hydrolyses the terminal alpha-galactosyl moieties from glycolipids and glycoproteins. Therefore, it was suggested that the H7N9 infection could be linked to saccharide or polysaccharide metabolism related pathways. Central metabolism could be strongly affected by virus infections [41]. Janke et al. [42] also found changes in metabolism in cells infected by Influenza A/H1N1 virus, suggesting that fatty acid synthesis might play a crucial role in virus replication because the viruses acquire lipid. ATP6V1B1 (ATPase, H+ transporting, lysosomal 56/58 kDa, V1 subunit B1) and ATP5B (ATP synthase, H+ transporting, mitochondrial F1 complex, beta polypeptide) were involved in ATP synthesis and hydrolysis.

Figure 3: The potential virus infection-related human genes found based on our method for the three species of viruses. 20, 11, and 44 potential infection-related human genes were found for H7N9, HRV, and RSV, respectively. There were 5 sharing genes between those for H7N9 and HRV, and 2 sharing genes between H7N9 and RSV.

The other related human genes were not all the same, indicating specific properties or particular characteristics of the infections by the three species of viruses.
From Table 2, it can also be seen that although several genes (PGK1, DKC1, and GLA) were located on Chromosome X, none was found on Chromosome Y in this study. Although earlier findings reported that H7N9 infections preferentially occurred in males, our findings suggest that this preference may not be significant. This was also consistent with the work of Chen et al. [43], whose logistic regression analysis showed no statistically significant differences in clinical outcomes between genders.

GO Enrichment Analysis of H7N9 Infection-Related Genes.
We performed GO enrichment analysis on these 20 genes. The 20 proteins encoded by the genes were mapped to GO terms at levels below 3 in Gene Ontology. In total, 504 GO terms were obtained. GO enrichment analysis was performed on these terms. The GO terms and the number of proteins related to each GO term are shown in Table 3. The same procedure was performed on the other two species of viruses for comparison, with results also shown in Table 3. Both commonalities and differences in GO term enrichment among the three species of viruses can be seen in Table 3.
From Table 3, it can be seen that 15 out of the 20 H7N9, all the 11 HRV, and 42 out of the 44 RSV infection-related proteins were involved in protein binding (GO:0005515). Protein binding plays important roles in both virus infection and host immune responses [44]. This could partially explain why the novel reassortant had a more enhanced ability to bind to human receptors than other avian influenza viruses [2,10]. The recombinant proteins could also induce immune responses via protein interactions [45]. Once the host immune system is activated, patients would have severe symptoms, such as cough, sputum, fever, and shortness of breath. Many related proteins of the three viruses fell into the GO terms "GO:0005829 cytosol" and "GO:0005737 cytoplasm," since all three viruses are RNA viruses and replication of RNA viruses usually takes place in the cytoplasm.
These were the commonalities. However, the H7N9-related proteins still showed differences or specific characteristics compared with those of the other two viruses.
Nine of these proteins were enriched in "GO:0044281 small molecule metabolic process" (45.00%) for H7N9, whereas only 1 (9.09%) and 2 (4.55%) proteins were enriched in this term for HRV and RSV, respectively. Furthermore, many related proteins of H7N9 were enriched in "GO:0005975 carbohydrate metabolic process," "GO:0006006 glucose metabolic process," "GO:0006094 gluconeogenesis," and "GO:0006096 glycolysis," which was not the case for HRV or RSV. This specific enrichment of GO terms indicated that the H7N9 infection could be especially relevant to human saccharide or polysaccharide metabolism-related pathways.
For H7N9, 3 proteins fell into the term "GO:0015991 ATP hydrolysis coupled proton transport" and 3 proteins into "GO:0015992 proton transport," but this was not the case for HRV or RSV. Proteins involved in "GO:0005215 transporter activity" and "GO:0055085 transmembrane transport" also differed between the H7N9 infections and the other two viruses.

KEGG Pathway Enrichment Analysis.
KEGG pathway enrichment analysis was also performed on the 20 genes. The KEGG pathway terms and the number of proteins belonging to each pathway term are shown in Table 4.
Only 3 pathways were retrieved. However, all 3 pathways were specific to H7N9; that is, none of them appeared in the KEGG results of the other two viruses (KEGG results for the other two viruses are not shown).
From Table 4, it can be seen that 2 out of the 3 pathways were saccharide or polysaccharide metabolism-related pathways ("Glycolysis/Gluconeogenesis" and "Starch and sucrose metabolism"), suggesting that these types of pathways could play pivotal roles in the H7N9 infections. The other pathway involved was "oxidative phosphorylation." This pathway could also be important, but perhaps less so than the former two, since the genes involved in this pathway (ATP5B, ATP6V1B1, and TCIRG1) were ranked at the bottom of the gene list in Table 2 according to betweenness.

Conclusion
In this study, we developed a computational method to identify Influenza A/H7N9 infection-related human genes based on the shortest paths in a PPI network. Finally, 20 human genes were screened out which could be the most significant, providing guidelines for further experimental validation. Among the genes, several ones such as PGK1, GYS1, YBX1, and NUP214 were previously reported with rare association with influenza virus infections or had been only poorly