Transcriptomic Analyses Reveal Gene Expression Profiles and Networks in Nasopharyngeal Carcinoma

Background Nasopharyngeal carcinoma (NPC) is a rare but highly aggressive tumor that is predominantly encountered in Southeast Asia and China in particular. Aside from radiotherapy, no effective therapy that specifically treats NPC is available, including targeted drugs. Finding more sensitive biomarkers is important for new drug discovery and for evaluating patient prognosis. Methods mRNA expression datasets from the Gene Expression Omnibus database (GSE53819, GSE64634, and GSE40290) were selected. After all samples in each dataset were subjected to quality control using principal component analyses, the qualified samples were used for additional analyses. The genes that were significantly expressed in each dataset were intersected to identify the most significant of these. Gene functional enrichment analyses were performed on these genes, using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes analyses. The protein–protein interaction network of selected genes was analyzed using the Search Tool for the Retrieval of Interacting Genes database. Significantly, differentially expressed genes were further verified with two RNA-seq datasets (GSE68799 and GSE12452), as well as in clinical samples. Results In all, 34 (8 upregulated genes and 26 downregulated) genes were identified as significantly differentially expressed. The immune response and the regulation of cell proliferation were the most enriched biological GO terms. Using reverse transcription quantitative real-time PCR (RT-qPCR), the genes MMP1, AQP9, and TNFAIP6 were detected to be upregulated, and FAM3D, CR2, and LTF were downregulated in NPC tissue samples. Conclusion This study provides information on the genes that may be involved in the development of NPC and suggests possible druggable targets and biomarkers for diagnosing and evaluating the prognosis of NPC.


Introduction
Nasopharyngeal carcinoma (NPC), which originates in the nasopharynx epithelium, is often observed in the pharyngeal recess. NPC is an aggressive head and neck cancer that is highly prevalent in Southeast Asia, particularly in South China [1]. This distinct geographical distribution aligns with other features that distinguish it from other head and neck cancers [2]. Radiotherapy is the main method used to treat NPC, but recurrence after primary radiotherapy, with or without chemotherapy, and treatment failure is severe, at a rate of around 10-30% [3,4]; this has prompted researchers to seek novel effective therapies, such as a targeted drug therapy that could be combined with radiotherapy.
Studying changes in the gene expression in NPC is an important way to identify biomarkers of early clinical diagnosis, improve evaluation of prognosis, and most importantly, target molecules that can effectively target NPC. Highthroughput sequencing allows for the identification of profiles of gene sets that are changed in cancerous tissues relative to normal controls. A combination of multiple microarray studies and RNA-seq studies could help us rule out bias from a single sequencing run and together can provide us with more solid and reliable results to ground further investigation.
Previous studies have reported single transcriptome analyses of NPC or reanalyses of a few integrated datasets [5][6][7]. In our study, we examined changes in the mRNA expression. To investigate mRNA expression profiles in NPC, we searched the Gene Expression Omnibus (GEO) database [8], obtained all the files relevant to NPC, and reviewed each by case numbers and methods. We excluded records for which the original annotation files are not available. Using strict quality control enabled by principal component analysis (PCA), the three RNA microarray datasets GSE64634, GSE53819, and GSE40290 were used in our primary analyses as discovery sets, including 42 NPC tumor tissues and 14 normal tissue controls in total. We conducted downstream analyses, including functional enrichment and protein-protein interaction analyses, of 34 significantly differentially expressed genes (DEGs). Then, the two datasets GSE68799 and GSE12452 were selected as verification sets to validate these DEGs. Clinical samples were also taken for verification.

Materials and Methods
2.1. Microarray Studies, Data Sets, and Sample Characteristics from the GEO Database. The GEO database was used to search all publicly available data about NPC. Each study in the database was reviewed for whether it met the following criteria: data are from the mRNA expression sequencing or microarray analyses of NPC; datasets are available for download and reanalysis, and data include an NPC sample and a control sample, as well as detailed information on the technique and platform used for each. Following these criteria, six datasets were included in our primary study.

Differential Expression Analyses and Data Visualization.
Next, differential analyses were performed to compare tumor tissues to normal tissues using GEO2R for each revised microarray analysis and the DESeq2 package for the RNAseq data in the R computing environment. Gene lists were filtered by |log 2 FC | >2 and p < 0:05 and then were intersected together. Volcano plots were used to visualize the DEGs.

Gene Ontology and KEGG Pathway Analyses, Functional
Enrichment Analyses, and Protein-Protein Interaction. Gene ontology (GO) and KEGG pathway enrichment analyses were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID, version 6.7). Protein-protein interaction analyses were performed using the database of the Search Tool for the Retrieval of Interacting Genes (STRING), version 11.0 [9], with a medium confidence ≥ 0:4. Genes were clustered with k-means clustering.

Clinical Sample Collection and Total RNA Preparation.
This study was approved by the Research Ethics Committee of Peking University Shenzhen Hospital, and written informed consent was obtained from all patients. Three NPC tissues and three normal nasopharynx tissues were obtained from each patient who had undergone a nasopharynx biopsy from December 2017 to December 2019 at the Peking University Shenzhen Hospital, China. The tissue samples were stored at -80°C in a freezer until total RNA was extracted using a total RNA extraction kit (BioTeke, China, code no: RP1201).
2.5. RT-qPCR Analyses. RT-qPCR was conducted as previously described [10], and 1 μg total RNA was reverse tran-scribed in a 20 μL reaction system using StarScript II Firststrand cDNA Synthesis Mix with gDNA Eraser (GenStar, China, code no. A224-10) following the manufacturer's instructions. The reaction products were diluted in 80 μL distilled water. The real-time PCR reaction was performed with 1 μL diluted reverse transcription product, 5 μL TB Green™ Premix Ex Taq™ II (Takara Bio Inc., code no, RR820A), and 0.4 μL forward and 0.4 μL reverse primers (0.4 μM). The reaction was performed in a LightCycler 96 Sequence Detection System (Roche, Basel, Switzerland) for 40 cycles (95°C for 5 s, 55°C for 30 s, and 72°C for 30 s) after an initial 30 s denaturation at 95°C. GAPDH was used as an internal control. The RNA levels of the tumor samples and control samples were calculated using the 2 -ΔCt method. All primers of the hub genes and GAPDH were synthesized by Sangon Biotech (Shanghai, China), and the sequences are listed in Table 1. 2.6. Statistical Analyses. The statistical analyses were performed using Prism GraphPad 8.4.0 software (San Diego, CA). In the RT-qPCR analyses, the results from the NPC samples and control samples were compared using Student's t-test, with a significance threshold of p < 0:05.

Strategy of Transcriptome Biostatistical Analyses in NPC.
To select all available NPC transcriptome data, the GEO database was used. After reviewing each possible dataset, we selected six from which the primary data could be obtained and analyzed. PCAs were first used to check the quality of each set. After all unqualified files were excluded, gene expression analyses were performed separately in each dataset. A differentiated hub gene list was created from the intersection of three datasets, and gene functional annotations and assessment of protein-protein interactions were performed. In addition, we intersected all the genes separately with two RNA-seq datasets and selected a few for BioMed Research International validation in clinical NPC samples. The complete scheme of this study is given in Figure 1.   GSE118719 showed a crossover between NPC and control groups, and it has fewer samples than any other dataset, so we excluded it from our analyses ( Figure 2).

Identification of Significantly Differentially Expressed
Genes Related to NPC. For each dataset, genes with p < 0:05 and |log 2fold change | >2 were identified as significantly changed genes. A volcano plot was used to exhibit the DEGs in each dataset ( Figure 4). In all, three RNA microarray datasets (GSE53819, GSE64634, and GSE40290) were used as an exploratory set to filter out the DEGs related to NPC. In our primary analyses, 42 NPC tumor tissues and 14 normal controls were used. The datasets were analyzed by intersection, and 8 upregulated and 26 downregulated genes were identified as significantly differentially expressed in NPC tissues relative to normal controls ( Figure 5).

GO Functional Annotation and KEGG Pathway
Analyses of all Selected Genes. GO enrichment analyses were performed on the 34 significantly DEGs. The most enriched GO terms included the humoral immune response, regulation of cell proliferation, and others ( Figure 6). KEGG pathway analyses showed one enrichment pathway, namely, transcriptional misregulation in cancer, which included the three genes HOXA10, MMP3, and PROM1.

Protein-Protein Interaction Network of All Selected
Genes. To explore the protein-protein interactions among all the genes selected from the database, the STRING  Figure 7, where all proteins are grouped into five clusters by k-means clustering, which indicates the possible protein interaction network that could be involved in the development of NPC.
3.6. Verification of DEGs by Validation Sets. Then, 34 genes were further analyzed in two datasets acquired from the GEO database. These datasets both contain RNA-seq data from the Illumina HiSeq 2000 platform. After the intersection, two upregulated genes and one downregulated gene were identified in GSE68799, and four upregulated and 15 downregulated genes were found in GSE12452 (Table 3).

The Expression Level of Select Genes in Clinical NPC
Samples. Six DEGs were selected for our further analyses in patient samples (including three NPC tissue samples and three normal control nasopharynx tissues). MMP1 and LTF were selected as known up-and downregulated genes in NPC. As shown in Figure 8, MMP1, AQP9, and TNFAIP6 were overexpressed in tumors compared to controls, and FAM3D, CR2, and LTF were downregulated in NPC.

Discussion
NPC is a severe cancer found particularly in South China. Its sensitivity to radiotherapy is a hopeful sign for NPC patients, but its high recurrence rate requires more effective methods of targeting for NPC treatment. In our study, we gathered all available RNA-seq and RNA microarray data to select possible biomarkers for NPC, and we also obtained information to study the etiology of NPC and its possible molecular signal transduction pathway during tumorigenesis in greater detail. PCAs were first performed to control data quality, as they provide a method of simplifying complexity in highdimensional data and emphasizing variation [11]. After PCAs, we excluded the unqualified samples and analyzed the remaining samples in each dataset separately, considering    6 BioMed Research International that the integration of different microarray platforms and analytical conditions could affect the results. Through our analyses, we obtained 34 significantly DEGs, including 8 upregulated and 26 downregulated ones. A few upregulated genes were previously reported as oncogenes and play important roles in the proliferation, progression, or metastasis of different cancers. Some downregulated genes act as tumor suppressors in many cancers. PTGS2 (COX-2) encodes cyclooxygenase 2, an important enzyme for prostaglandin biosynthesis during inflammation or wound healing. Cox-2 is overexpressed in many cancers, including breast cancer, colorectal cancer, and NPC [12][13][14]. It is also a biomarker for poor NPC prognosis [15].
Matrix metalloproteinase 1 and 3 (MMP1 and MMP3), which are in the matrix metalloproteinase family, are also overexpressed in many cancers. MMP-1 is overexpressed in breast cancer, colon cancer, and NPC [16,17]. Its function in NPC has been well studied. The overexpression of MMP1 has been detected in NPC tissues and NPC cell lines, and the high MMP-1 expression significantly suppresses the sensibility of 5-FU chemotherapy in NPC [18]. The overexpression of MMP-3 is associated with metastasis in ductal   7 BioMed Research International breast cancer patients [19]. However, the function of MMP-3 in NPC has not yet been studied.
HOXA10 is another oncogene that is overexpressed in acute myeloid leukemia, NPC, and many other cancers [20,21]. The high expression of HOXA10 promotes proliferation in bladder cancer, and it is also related to the low survival rate of bladder cancer patients [22]. The overexpression of HOXA10 induces proliferation and invasion in NPC cells [20].
Fibronectin 1 (FN1) belongs to the glycoprotein family. It is upregulated in many cancers, including NPC. The overexpression of FN1 in NPC cell lines promotes proliferation, migration, and invasion of NPC cells. It suppresses apoptosis in NPC cells by upregulating BCL-2 [23].
The remaining upregulated genes are also involved in tumor development, but their function in NPC remains unclear. TNFAIP6, also called TSG-6, belongs to the tumor necrosis factor alpha-induced protein family, and it may be related to cell migration. It is crucial for the maintenance of the stemness of murine mesenchymal stem cells [24,25]. AQP9 is downregulated in hepatocellular carcinoma but is elevated in many other cancers, including clear cell renal cell carcinoma, which suggests that it plays a complex role in cancer development [26,27]. A few bioinformatic studies have shown that COL8A1 is a hub gene associated with several cancers, but this requires further investigation.
Some genes that showed downregulation in our study might serve as tumor suppressors in NPC, and some of these have been studied in NPC. GTP-binding protein RAD (RRAD) has been reported to be a tumor suppressor in NPC, where the high methylation of the promoter region in RRAD causes its low level of expression and inactivates its function [28]. High methylation of the gene ZMYND10 has been reported in NPC and has been suggested to be a biomarker for NPC prognosis [29]. One study reported that serum levels of polymeric immunoglobulin receptor (PIGR) are downregulated in NPC patients, and low levels of the   [30]. Lactotransferrin (LTF) has been reported to be downregulated in NPC tissues and acts as a tumor suppressor by repressing the AKT signaling pathway in NPC [31]. Some genes that we have identified here are also involved in the development of other cancers. Microseminoprotein beta (MSMB) has been reported to be downregulated in prostate cancer tissues and in patient serum [32]. The expression level of uroplakin-1b (UPK1B) is upregulated in bladder cancer, and it shows an oncogene function [33]. Forkhead box protein J1 (FOXJ1) is upregulated in bladder cancer and colorectal cancer [34,35] but downregulated in ependymoma and choroid plexus tumors [36]. The downregulated genes that we have identified in NPC, such as MSMB, UPK1B, and FOXJ1, may be associated with the development or progression of NPC and may play different roles from their roles in other cancers. We also validated six genes that were up-or downregulated in clinical NPC tissues using quantitative RT-PCR. MMP1 is a known upregulated gene that is associated with NPC. TNFAIP6 and AQP9 are newly identified genes that are overexpressed in NPC. Further studies are in the process in our group to identify how the overexpression of these two genes could regulate NPC development and progression. We validated only three of the downregulated genes. LTF was reported to be downregulated in NPC in a previous study [31], and FAM3D and CR2 are newly validated genes that may be involved in the development of NPC. Complement receptor type 2 (CR2) is an important receptor for primary infection of EBV in B cells and epithelial cells [37]; however, we found it to be downregulated in NPC, which suggests a different manner in which EBV can affect NPC. The CR2 promoter region also showed high DNA methylation rates [38], but its actual role in NPC remains unclear and requires deeper study. FAM3D belongs to the family with the sequence similarity 3 (FAM3) gene family, and it is overexpressed in colon cancer [39]. Further study is needed to investigate its function in NPC.

Conclusions
In summary, we identified 34 hub genes, including known and unknown genes associated with NPC. More functional studies must be performed to validate the hub genes identified from multiple transcriptome analyses. Our study provides information to continue to study the mechanism of development of NPC and more importantly to find biomarkers and druggable targets that will be useful for prognosis and new drug discovery.

Data Availability
Original data we selected for analysis are available in the GEO database, and our reanalyzed data are available from the corresponding authors upon request.

Conflicts of Interest
The authors declare that their research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.