A Cross-Study Biomarker Signature of Human Bronchial Epithelial Cells Infected with Respiratory Syncytial Virus

Respiratory syncytial virus (RSV) is a major cause of lower respiratory tract infections in children, elderly, and immunocompromised individuals. Despite of advances in diagnosis and treatment, biomarkers of RSV infection are still unclear. To understand the host response and propose signatures of RSV infection, previous studies evaluated the transcriptional profile of the human bronchial epithelial cell line—BEAS-2B—infected with different strains of this virus. However, the evolution of statistical methods and functional analysis together with the large amount of expression data provide opportunities to uncover novel biomarkers of inflammation and infections. In view of those facts publicly available microarray datasets from RSV-infected BEAS-2B cells were analyzed with linear model-based statistics and the platform for functional analysis InnateDB. The results from those analyses argue for the reevaluation of previously reported transcription patterns and biological pathways in BEAS-2B cell lines infected with RSV. Importantly, this study revealed a biosignature constituted by genes such as ABCC4, ARMC8, BCLAF1, EZH1, FAM118A, FAM208B, FUS, HSPH1, KAZN, MAP3K2, N6AMT1, PRMT2, S100PBP, SERPINA1, TLK2, ZNF322, and ZNF337 which should be considered in the development of new molecular diagnosis tools.


Introduction
Respiratory syncytial virus (RSV) is a major etiologic agent causing acute lower respiratory infections that can progress to bronchiolitis and pneumonia in children, elderly, and immunocompromised individuals [1,2]. RSV outbreaks are influenced by virus diversity and evolution [3,4], environmental factors [5], and host immunity [6].
The epithelium is the primary site for host-virus interface, where cells recognize pathogen-associated patterns on microbes through innate immunity receptors [7,8]. Indeed, epithelial cells constitute an important line of defense against RSV and other airborne pathogens [9]. They form a physical barrier and produce mucus to inhibit microbes from entering the body. Moreover, they express molecules with antimicrobial properties, as lysozyme, lactoferrin, collectins, and antimicrobial peptides [10]. Two human cell lines have been extensively used to understand the interaction between host and RSV, the alveolar epithelial cell, A549, and one from proximal airways, the bronchial epithelial cell, BEAS-2B.
Genome-wide microarrays are powerful tools to investigate host transcriptional response during infections in the pulmonary epithelium, including those induced by RSV [11,12]. Indeed, two studies evaluated the patterns of gene expression from BEAS-2B cell lines infected with RSV [10,13]. However, it is intriguing that after 4 h of infection Huang and collaborators (2008) found that RSV-modulated genes were only associated with the neuroactive ligandreceptor interaction pathway [13]; in contrast, Mayer and collaborators (2007) identified that the same time of RSV infection of BEAS-2B cells induced transcriptional changes similar to those found for other respiratory pathogens as Pseudomonas aeruginosa [10]. In spite of differences, publicly available microarray data offers an interesting opportunity to reveal common features of RSV induced transcriptional profiles to understand the early response of BEAS-2B cell lines and extend the knowledge on biomarkers of acute infections with this virus. Therefore, those datasets were evaluated in a meta-analysis by fitting linear models for each array probe and Empirical Bayesian approach to detect 2 Advances in Virology transcriptional changes that revealed significant associations with unreported pathways. Of importance, this strategy also rendered a biomarker signature of BEAS-2B cell lines infected with RSV that can be useful for the design of molecular diagnosis tools.

Materials and Methods
The datasets GSE3397 and GSE6802 were obtained from GEO database (http://www.ncbi.nlm.nih.gov/), which compared BEAS-2B cells infected with RSV with control experiments. Only arrays in which cells were infected with RSV for 4 h were selected for further analysis. Raw data were processed using the R Language and Environment for Statistical Computing (R) 3.2.0 [14] and Bioconductor 3.1 [15]. The affy package for R [16] was used to perform quality control when applicable. Data was log 2 transformed and quantile normalization was applied for dataset GSE3397 due the absence of CEL files. The dataset GSE6802 was already RMA normalized. Batch effects were corrected with Combat( ) function [17] of sva package for R [18]. Expression data were weighted with the arrayWeights( ) function from limma package for R [19]. Differential gene expression was also evaluated with limma package for R [19], whereby differentially expressed genes (DEGs) were identified by a false discovery rate (FDR) <0.05. Hierarchical clustering was performed with Euclidian distance for metric calculations and the complete linkage method, which were displayed as heatmaps drawn with gplots package for R [20]. Pathway analyses were performed with the online platform for functional analysis InnateDB [21] and significant pathway overrepresentation was computed with hypergeometrical distribution and Benjamini-Hochberg correction for multiple comparisons. Significantly enriched pathways were determined by a value < 0.05 and FDR < 0.1.

Dataset Selection and Preprocessing Analysis.
To define a robust transcriptional signature of BEAS-2B acutely infected with RSV, two publicly available datasets, GSE3397 and GSE6802, were used to conduct a meta-analysis from which data were extracted for BEAS-2B cells infected with RSV for 4 h and controls. First, background subtracted expression data from GSE3397 (Figure 1(a)) were preprocessed and normalized (Figure 1(b)). However, in a first attempt to conduct differential gene expression analysis using limma [19], there were no statistically significant differences in gene expression. Therefore, principal component analysis (PCA) was used to evaluate the expression profiles of each array and, except for arrays named here Control2 and RSV2, the consistent pattern of clustering in Figure 1(c) suggests a batch effect. After normalization, this effect was even more evident (Figure 1(d)), which led to the speculation that Huang and collaborators (2008) [13] analyzed only three microarray experiments from this dataset based on the assumption that differences found for those microarrays were due to failures in experimental procedures; however they did not consider or correct for batch effects. In view of those facts, the datasets were adjusted with Combat function for R, which removed such effects from GSE3397 expression data (Figure 1(e)). Batch correction of GSE3397 did not change the profiles of arrays Control2 and RSV2; nevertheless, those arrays were included in further analysis because the variation observed in this experiment could have a substantial impact over the final result. Even adverse experimental variations that may change the overall expression patterns of a dataset could be useful to power up the identification of genes that are robustly modulated in BEAS-2B cells infected with RSV. The expression dataset GSE6802 (Figure 1(f)) was also included in the analysis. PCA from expression data extracted from GEO demonstrates that most of the variability between the arrays is explained (76.6%) by the infection with RSV, as the standardized PC1 separates RSV-infected from control arrays (Figure 1(g)), whereas standardized PC2 (11.4%) separates one pair of arrays (RSV 3 and ctrl2) and, although these arrays are supposedly from different batches, clustering features of this axis also suggested a batch effect (Figure 1(g)). log 2 transformation of data impacted the profile of array RSV 1 however did not change the profiles from RSV 3 and ctrl 2 ( Figure 1(h)). Combat( ) function was also applied to the expression dataset GSE6802; however, PCA shows that the adjustment did not to improve further clustering between specific arrays (Supplementary Figure 1; see Supplementary Material available online at http://dx.doi.org/10.1155/2016/3605302). In view of that, downstream analyses were carried out with normalized log 2 transformed data.

Differential Gene Expression.
Next, linear model-based statistical analyses with a FDR < 0.05 were conducted to identify differentially expressed genes (DEGs). The dataset GSE3397 exhibited ninety-four DEGs (Figure 2(a) and Table 1). Those genes are highly discordant from DEGs previously reported by Huang and collaborators (2008) [13], which identified 277 DEGs based on different statistical analysis and assumptions. Fifty genes were downregulated and fortyfour were upregulated ( Table 1). The differences found in this study might reflect the inclusion of all microarray experiments from controls and 4 h after RSV infection; exclusion of expression data from 24 h after RSV infection; distinct preprocessing approaches as normalizing method and batch effect correction; and the assessment of statistical significance with a linear model-based method and corrected values. In contrast, 1965 DEGs were identified for the dataset GSE6802. The top hundred DEGs ranked by fold changes (Figure 2(b) and Table 2) included genes such as JUNB, KLF4, CXCL1, CXCL2, and IL6, which are in agreement with those reported by Mayer and collaborators (2007) [10]. Several factors should account for the notable differences in expression analysis from both datasets. First, different RSV strains were used to stimulate BEAS-2B cells. Second, experimental conditions of controls were also different, as control experiments from GSE3397 were incubated with vehicle (not specified) and those from GSE6802 were not stimulated. Third, despite both datasets being generated with affymetrix microarray platform, those include distinct versions, HU133 plus 2.0 for GSE3397 and HU133A 2.0 for GSE6802.

Functional Analysis.
To obtain a biological interpretation of the transcriptional signature of RSV-infected BEAS-2B cells and compare with those reported by previous studies, enrichment analysis was performed with the online platform for functional analysis InnateDB [21]. Based on a FDR < 0.1, DEGs identified for GSE3397 were enriched in pathways related to Chromatin organization, histone acetylation, signaling by NOTCH, IL1, Integrin-linked kinase signaling, 8

Control_2
Control_4    EPO signaling pathway, VEGF signaling pathway, platelet degranulation, p73 transcription factor network, IL-7 signaling, p53 signaling pathway, and others (Figure 3(a) and Supplementary Data 1). Of interest, Huang and collaborators (2008) [13] reported gene overrepresentation within p53 signaling pathway, but only after 24 h following RSV infection of BEAS-2B cells. After 4 h following RSV infection, Huang and collaborators (2008) [13] only found a significant association with neuroactive ligand-receptor interaction pathway, which was not overrepresented in the present analysis. In contrast, DEGs resultant from dataset GSE6802 were enriched in pathways related to AP-1 transcription factor, ATF-2 transcription factor, IL-6 signaling, SMAD function, signaling by TGFBR, HIF-1 transcription factor, signaling by CD40/CD40L, signaling by MAPK, signaling by innate immune receptors, and others (Figure 3(b) and Supplementary Data 1). Some of those pathways as CD40 signaling are indeed commonly induced by a variety of viral respiratory infections [22], whereas several of those pathways could indicate novel directions for studying the host response against RSV. Six pathways were enriched by DEGs from both datasets, the EPO signaling pathway, FBXW7 Mutants and NOTCH1 in Cancer, IL1, p53 signaling pathway, p73 transcription factor network, and signaling by NOTCH1. The erythropoietin (EPO) gene is a primary target of HIF-1 transcription factor, whereas binding of HIF-1 to the EPO enhancer promoter region induces transcriptional programs that influence inflammation and infection processes [23].
In addition, expression of Dll4, a major NOTCH ligand, is upregulated in dendritic cells infected with RSV, whereas blockage of Dll4 in vivo increased hyperreactivity of airways and mucus secretion that impacted the pathology of the disease, showing a key role of signaling by NOTCH in the regulation of immunity against RSV [24]. Moreover, besides modulations of the p53 signaling pathway by infection of RSV in vitro [10,13], this pathway was found to be upregulated in whole blood of children with lower respiratory tract infection by RSV [25]. Taken together, those data point to key pathways which can impact infections of human bronchial epithelial cells with RSV.  Figure 4). Despite particular features in expression data from both datasets, unsupervised hierarchical clustering analysis based on this signature revealed the formation of robust clusters between RSV-infected or uninfected BEAS-2B cells (Figure 4). Of note, human airway epithelial cells were shown to express ABCC4/MRP4, a transporter for uric acid and cAMP [26].

Meta-Analysis Based
Mucosal production of uric acid was recently linked to particulate matter-induced allergic sensitization [26]; therefore RSV infection could trigger such a response and contribute to the development and severity of allergic responses to particulate matter [27]. Moreover, both ABCC4 and SER-PINA1 are annotated into the platelet degranulation pathway (Figure 3(a)), suggesting a role in antiviral mechanisms from bronchial epithelial cells. After an initial encounter with RSV, the transcriptional activity of human bronchial epithelial cells is reprogrammed to counteract viruses and other pathogens [10], whereas MAP3K2 and ZNF322 are clearly involved on the activation and regulation of MAP kinase signaling pathway [28,29]. Indeed, RSV infection leads to the activation of p38 MAPK [30] and c-JUN kinase pathway, which negatively regulates the production of TNFin human epithelial cells [31] and might contribute to virus evasion from an early immune response. Interestingly, the biosignature also included BCLAF1, a molecule involved in processes as apoptosis, transcription and processing of RNA, and export of mRNA from the nucleus [32]. However, this nuclear protein was also implicated as a viral restriction factor targeted to degradation by human cytomegalovirus [32]. Moreover, EZH1 was shown to be involved in the methylation of histone 3 at lysine 27 (H3K27) of the HIV provirus in resting cells [33] and could thus exert a significant function in infections with RSV, whereby other genes such as N6AMT1, FUS, and PRMT2 are also involved in protein methylation. Indeed, using coimmunoprecipitation and mass spectrometry, recent work demonstrated that RSV nucleoprotein (N) interacts with protein arginine Nmethyltransferase 5 (PRMT5) [34], suggesting that PRMT2 could also interact with RSV proteins and play an important role during infections of human bronchial epithelial cells. Several of the genes identified in this study have been poorly studied in the context of RSV infection, whereby none of them was previously reported as a biomarker of infections by this virus. Of note, except for FAM208B and KAZN, analysis conducted by Smith and collaborators (2012) [22] which included both datasets (GSE3397 and GSE6802) also identified the significant modulation of the genes included in the biomarker signature identified herein.

Conclusions
The combined analysis of distinct datasets from BEAS-2B cells infected with RSV retrieved intriguing results, whereby using powerful statistical methods and assumptions this study identified a new set of biomarkers of early infection with RSV composed by seventeen genes: ABCC4, ARMC8, BCLAF1, EZH1, FAM118A, FAM208B, FUS, HSPH1, KAZN, MAP3K2, N6AMT1, PRMT2, S100PBP, SERPINA1, TLK2, ZNF322, and ZNF337. This transcriptional signature could be useful for the development of molecular diagnosis tools as well as future investigations of processes involved in hostpathogen interactions.