Identification of Potential Biomarkers for Thyroid Cancer Using Bioinformatics Strategy: A Study Based on GEO Datasets.

Background
The molecular mechanisms and genetic markers of thyroid cancer are unclear. In this study, we used bioinformatics to screen for key genes and pathways associated with thyroid cancer development and to reveal its potential molecular mechanisms.


Methods
The GSE3467, GSE3678, GSE33630, and GSE53157 expression profiles downloaded from the Gene Expression Omnibus database (GEO) contained a total of 164 tissue samples (64 normal thyroid tissue samples and 100 thyroid cancer samples). The four datasets were integrated and analyzed by the RobustRankAggreg (RRA) method to obtain differentially expressed genes (DEGs). Using these DEGs, we performed gene ontology (GO) functional annotation, pathway analysis, protein-protein interaction (PPI) analysis and survival analysis. Then, CMap was used to identify the candidate small molecules that might reverse thyroid cancer gene expression.


Results
By integrating the four datasets, 330 DEGs, including 154 upregulated and 176 downregulated genes, were identified. GO analysis showed that the upregulated genes were mainly involved in extracellular region, extracellular exosome, and heparin binding. The downregulated genes were mainly concentrated in thyroid hormone generation and proteinaceous extracellular matrix. Pathway analysis showed that the upregulated DEGs were mainly attached to ECM-receptor interaction, p53 signaling pathway, and TGF-beta signaling pathway. Downregulation of DEGs was mainly involved in tyrosine metabolism, mineral absorption, and thyroxine biosynthesis. Among the top 30 hub genes obtained in PPI network, the expression levels of FN1, NMU, CHRDL1, GNAI1, ITGA2, GNA14 and AVPR1A were associated with the prognosis of thyroid cancer. Finally, four small molecules that could reverse the gene expression induced by thyroid cancer, namely ikarugamycin, adrenosterone, hexamethonium bromide and clofazimine, were obtained in the CMap database.


Conclusion
The identification of the key genes and pathways enhances the understanding of the molecular mechanisms for thyroid cancer. In addition, these key genes may be potential therapeutic targets and biomarkers for the treatment of thyroid cancer.


Introduction
Thyroid cancer, a most common endocrine cancer, accounts for 1-2% of human tumors. In recent years, the incidence of thyroid cancer, ranking the seventh among all tumors and the fifth for women, has been increasing. The appropriate treatment plan (surgery, radiotherapy, chemotherapy) is designed according to the stage of the disease. Most patients have a good prognosis after percutaneous transluminal coronary angioplasty, but invasive tumor or distant metastases would occur in a small number of patients following this treatment [1]. Therefore, finding key biomarkers and therapeutic targets of thyroid cancer will greatly improve the quality of life for patients.
Since the invention of gene expression microarray technology about 20 years ago, many mRNA profiling data sets have been generated for different biological processes in organisms. Currently, more than 30,000 series and 1 million array-based gene expression data samples are stored in GEO database of the National Center for Biotechnology Information (NCBI) [2]. Microarray analysis is a new method to study tumor genes, search for molecular targets of tumor drug therapy and monitor prognosis. However, due to the heterogeneity of the experimental samples, different detection platforms and data processing methods generate inconsistent results [3]. The study was based on biological information in the GEO database, and the final results were obtained by analyzing the molecular functions of mRNA associated with thyroid cancer and the signal pathways involved.
To more accurately identify DEGs associated with TCHA, bioinformatics methods were applied to analyze datasets from GEO and then the results in this study were integrated. Bioinformatic analyses were performed on these DEGs. Here, we aimed to explore a reliable basis for exploring the molecular mechanisms of THCA pathogenesis and the identification of molecular targets for clinical diagnosis or treatment.

Materials and Methods
2.1. Microarray Data. The GEO database (https://www.ncbi .nlm.nih.gov/geo/) was searched using the keywords "thyroid cancer". The GSE3467 (including 9 normal thyroid tissue samples and 9 thyroid cancer samples, [4]), GSE3678 (including 7 normal thyroid tissue samples and 7 thyroid cancer samples0, GSE33630 (including 45 normal thyroid tissue samples and 60 thyroid cancer samples, [5]) and GSE53157 (including 3 normal thyroid tissue samples and 24 thyroid cancer samples, [6]) gene expression profile matrix files were downloaded. The data set information is shown in Table 1 and the clinical information is provided in the supplementary materials (available here). We downloaded the four series of matrix TXT files and the corresponding platform TXT files. The gene probe ID in each matrix file was converted to the gene symbol in the platform file by the Perl language. Impute and Limma packages in R software (version: x64 3.2.1) were applied to perform the background correction, normalization and log2 conversion for the matrix data of each GEO dataset.

Data
Processing and Identification of DEGs. Differential expression analysis of the each dataset was carried out using the limma package. We defined |log(FC)| ≥ 1 and adj. P-val< 0.05 as DEGs screening criteria for thyroid cancer samples from four microarray datasets, and generated heat maps and volcano maps of each dataset. In addition, a txt file of all gene lists sorted by log (FC) in each data set was saved for the subsequent integration analysis.

Integration of Microarray Data.
We downloaded the RobustRankAggreg (RRA) package and used the R software to run the instruction code. The four lists of genes ranked by expression level were integrated using the RRA package in R software. The RRA method was based on the assumption that all genes were unordered in each list. Equally, |log(FC)| ≥ 1 and adj. P-val< 0.05 were considered statistically significant for the DEGs. As a result, a post-integration heat map was generated and the upregulated and downregulated genes were further screened for further analysis. All the R packages performed in our research were arranged in R software.

Functional Enrichment
Analysis of DEGs. The Database for Annotion, Visualition and Intrgrated Discovery (DAVID) 6.8 (https://david.ncifcrf.gov), is an online bioinformatics database for gene functional analysis that integrates biological data and analysis tools, and annotates biological functions for large-scale gene or protein lists. We used DAVID to perform GO analysis on the up-regulated and down-regulated DEGs. The difference was considered statistically significant when adj. P-val< 0.05. Subsequently, we introduced the up-regulated and down-regulated differential genes into Cytoscape's plug-in ClueGO for pathway analysis. Similarly, the difference was considered statistically significant when adj. P-val ≤ 0.05.
2.5. PPI Network Construction. STRING (http://string-db .org) was used to analyze the direct (physical) and indirect (function) associations between different genes. Protein-Protein Interaction (PPI) network analysis was conducted on all integrated DEGs by STRING10.0 with the effective binding score set >0.7 [7]. Then the results were imported into Cytoscape 3.7.1 to build a network model. Using Cytoscape's plugin Cytohubba, we selected the top 30 DEGs with high connectivity in the gene expression network as the hub genes according to the degree algorithm.
2.6. Survival Analysis of Hub Genes. To further clarify the relationship between the hub gene expression and thyroid cancer prognosis, we used Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn) for survival and statistical analyses using log rank [8]. The difference was considered statistically significant when P < 0.05. Due to their importance to the prognosis of thyroid cancer, these hub genes were taken as key genes.

Verification of Key Genes in TCGA. Gene Expression
Profiling Interactive Analysis(GEPIA), is a web-based tool (http://gepia.cancer-pku.cn) that delivers fast and customizable functionalities based on TCGA and GTEx data. In this study, we used GEPIA to verify the key genes and the        BioMed Research International expression of key genes in TCGA was shown by boxplot. The following screening criteria were set as follows: |log(FC)| ≥ 1 and P-val< 0.05.

Immunohistochemical
Analysis. Due to the high specificity of the binding between antibodies and antigens, we combined the antibody-based approach with transcriptomics data to summarize the global expression profile [9]. The protein expressions of key genes in normal tissues and thyroid cancer tissues were observed using the human protein map (HPA, https://www.proteinatlas.org), a unique collection of antibodies mapped across the entire human proteome by immunohistochemistry and immunocytochemistry.

Identification of Candidate Small
Molecules. The Connectivity Map database (CMap, http://www.broadinstitute .org/cmap/), a reference database containing drug-specific gene expression profiles, can be used to discover the connections between small molecules that share a mechanism of action, chemicals and physiological processes, as well as diseases and drugs by submitting the genes that are potentially associated with a particular disease [10,11]. We matched the DEGs screened from GEO with the data from CMap to predict small molecules that might reverse the biological status of thyroid cancer. The DEGs were divided into the upregulated group and the downregulated group. Then two lists of gene IDs were introduced into CMap for gene set enrichment analysis to obtain small molecules with enrichment values of -1 to +1. The positive connectivity value (close to +1) indicated that the corresponding small molecule could induce gene expression in thyroid cancer, while the negative connec-tivity value (close to -1) indicated greater similarity between the gene and the small molecule, the potential to reverse the state of thyroid cancer cells. The results were ranked by P values. We chose the top 4 small molecules and analyzed their 3D conformations in PubChem (http://www.pubchem .ncbi.nlm.gov), a collection of chemical information, including substance information, compound structures, and biological activities.

Identification of DEGs.
The thyroid cancer expression profile chip data sets GSE3467, GSE3678, GSE33630 and GSE53157 were normalized ( Figure 1). The GSE3467 dataset contained 501 differential genes, including 282 upregulated genes and 219 downregulated genes. The GSE3678 dataset contained 526 differential genes, including 226 upregulated genes and 300 downregulated genes. The GSE33630 dataset contained 904 differential genes, including 481 upregulated genes and 423 downregulated genes. The GSE53157 dataset contained 93 differential genes, including 11 upregulated genes and 82 downregulated genes. The volcano plots of each dataset are shown in Figure 2, and the cluster heat maps of the top 20 DEGs in each dataset are shown in Figure 3.

Identification of Integrated DEGs.
A total of 330 integrated DEGs, including 154 upregulated genes and 176 downregulated genes, were identified by the RRA method. Table 2 shows the DEGs. Figure 4 presents the heat map of the top 20 upregulated and downregulated integrated DEGs.

DEGs
Gene names 3. GO Enrichment Analysis of DEGs. GO function annotations of the integrated DEGs were mainly divided into three parts: biological process (BP), cell composition (CC) and molecular function (MF). As shown in Figure 5 and Table 3, the upregulated DEGs were mainly concentrated in the extracellular region (ontology: CC), extracellular exosome (ontology: CC), and heparin binding (ontology: MF); the downregulated DEGs were mainly concentrated in thyroid hormone generation (ontology: BP) and proteinaceous extracellular matrix (ontology: CC).

Functional Enrichment Analysis of DEGs.
Functional enrichment analysis of integrated DEGs showed the upregulated DEGs were mainly attached to ECM-receptor interaction, p53 signaling pathway, TGF-beta signaling pathway and other pathways; the downregulated DEGs were mainly attached to tyrosine metabolism, mineral absorption, and thyroxine biosynthesis ( Figure 6, Table 4).
3.5. PPI Analysis of DEGs. The STRING online database was used to analyze the integrated DEGs obtained after screening and a PPI network was construct (Figure 7). The effective binding fraction was set to be greater than 0.7, which resulted in 313 nodes and 228 edges. The top 30 hub genes were screened out by cytoHubba (Figure 8).
3.6. Survival Analysis of Hub Genes. We used GEPIA to analyze the correlation between hub gene expression and thyroid cancer prognosis. It was found that the Disease Free Survival (DFS) was lower in the high FN1 expression group than the 3  8 BioMed Research International low FN1 expression group (P = 0.024); the high NMU expression group had lower DFS than the low NMU expression group (P = 0.021); the low CHRDL1 expression group had lower DFS than the high CHRDL1 expression group (P = 0.0028); the low GNAI1 expression group had lower DFS than the high GNAI1 expression group (P = 0.014); the high ITGA2 expression group had lower DFS than the low ITGA2 expression group (P = 0.029); the low GNA14 expres-sion group had lower DFS than the high GNA14 expression group (P = 0.011). The low AVPR1A expression group had lower DFS than the high AVPR1A expression group (P = 0.013). As shown in Figure 9, the other hub genes were not significantly associated with the prognosis of thyroid cancer (P > 0.05). Hence, we derived seven key genes related to the prognosis of thyroid cancer: FN1, NMU, CHRDL1, GNAI1, ITGA2, GNA14, AVPR1A.

Verification of Key Genes in TCGA.
We validated the reliability of the key genes using GEPIA. The databases showed that the key genes were differentially expressed in normal thyroid tissue samples and thyroid cancer samples ( Figure 10). By reviewing the original data, we found that the results were consistent with our study.   Figure 11). These results confirmed our findings. However, we did not find the association between AVPR1A and thyroid cancer in this database. According to our analysis, we predicted that AVPR1A might also be associated with the development of thyroid cancer, but experimental data were needed to confirm this specific link.
3.9. Screening of Small Molecule Drugs Screening. In order to find drugs involved in the treatment or prognosis of thyroid cancer, we uploaded the selected DEGs (divided into the upregulated and downregulated groups) to the CMap database and then matched them with small molecule therapy. As shown in Table 5, among the top 4 small molecules, adrenosterone (enrichment value -0.914) and hexamethonium bromide (enrichment value -0.813) exhibited highly significant negative scores. The 3D conformer is shown in Figure 12. These small molecule drugs could reverse the gene expression induced by thyroid cancer. Of course, more 13 BioMed Research International experimental data are needed to confirm the potential of these candidate small molecules in treating thyroid cancer.

Discussion
Although thyroid cancer is a least fatal human cancer, its rising incidence imposes a great burden on both the society and individuals. Since the 1990s, the incidence of thyroid cancer has been growing faster than the other tumors in the United States. The rising morbidity and mortality of thyroid cancer are related to overdiagnosis [12]. Therefore, studying the biomarkers and precise targets associated with the development of thyroid cancer will improve the diagnostic accuracy and thus lessen the economic burden.
In this study, bioinformatics methods were applied to analyze the GSE3467, GSE3678, GSE33630 and GSE53157 datasets from GEO, and GO and pathway analyses were performed in the integrated DEGs. We used STRING to structure the Protein-Protein Interaction (PPI) network of integrated DEGs based on the functional association. Interactions are derived from seven sources: (1) the experiments channel; (2) the database channel; (3) the textmining channel; (4) the coexpression channel; (5) the neighborhood channel; (6) the fusion channel; (7) the co-occurrence channel. Then we applied Cytoscape to identify 30 hub genes. Finally, 7 key genes related to thyroid cancer prognosis were obtained. Among them, FN1, NMU and ITGA2 were upregulated DEGs, while CHRDL1, GNAI1 and GNA14 were downregulated ones. At the same time, gene expression was verified through GEPIA and the level of gene expression protein was verified by HPA.
Fibronectin (FN1), produced by fibroblasts and tumor cells, is a high molecular weight glycoprotein component of the extracellular matrix in the tumor microenvironment (TME). Normally, FN1 supports the cell-extracellular matrix interaction and participates in cell adhesion, migration, metastasis, proliferation and differentiation, as well as embryogenesis, blood clotting, wound healing, development, and tissue homeostasis [13]. Tumor-associated fibroblasts are the most abundant cells in the tumor stroma, and their ability to contract the matrix and induce cancer cell invasion has been well certified. FN1, derived from tumor-associated fibroblasts, stimulates cancer cell invasion after assembly [14]. In addition, as an intrinsic component of epithelialmesenchymal transition (EMT) regulatory circuit, FN1 is a potential target for interfering EMT during tumorigenesis  15 BioMed Research International [15]. Here, we suspect that FN1 is related to the pathway of ECM-receptor interaction. FN1 has been found to be highly expressed in esophageal and cervical cancers and associated with their prognosis [16][17][18]. In the present study, FN1, as a hub gene with the highest correlation with thyroid cancer, was highly expressed in tumor tissues compared with the adjacent non-tumor or normal tissues. Further validation of the association between FN1 and the development of thyroid cancer may provide new targets for treating thyroid cancer.
As the gene with the second highest correlation with thyroid cancer in our study, neuromedin U (NMU), a neuropeptide originally isolated from the spinal cord of pigs, has multiple physiological functions and is involved in obesity and inflammation [19]. NMU has been confirmed to confer alectinib resistance in non-small cell lung cancer (NSCLC) [20]. Although no experimental data has suggested a direct link between NMU and thyroid cancer, NMU has been shown to promote the development of various tumors, including breast cancer. Garczyk S et al. have demonstrated that NMU may contribute to the progression of NMUR2-positive breast cancer [21] and enhance resistance to tumor immune responses in breast cancers with HER2 overexpression [22], suggesting NMU is a potential drug target for personalized strategies. Furthermore, NMU is involved in the development of endometrial, colorectal and gastric cancers, as well as acute myeloid leukemia caused by TP53 mutations [23][24][25][26].
CHRDL1 is a secreted protein and antagonist of bone morphogenetic protein (BMP) that antagonizes BMP-4 activity and induces differentiation of spinal cord-derived   [29]. Since no research has proved that CHRDL1 is related to thyroid cancer, more research is needed to verify the association between CHRDL1 and thyroid cancer and evaluate CHRDL1 as a target for thyroid cancer treatment. The GO annotations of GNAI1 (G protein subunit αi1) include GTP binding and outdated signal transduction activity. The guanine nucleotide binding protein (G protein) acts as a transduction downstream of the G protein coupled receptor (GPCR) in many signaling cascades. The alpha chain, which contains a guanine nucleotide binding site, alternates between active and inactive GTP binding states. Activation of GPCRs promotes GDP release and GTP binding. The alpha subunit has low GTPase activity, which converts the bounding of GTP to GDP, and thereby terminates the signal. Both GDP release and GTP hydrolysis are regulated by regulatory proteins [30,31]. Cell migration involves a cycle of adhesion and de-adhesion, and the dynamic balance between adhesion and reversal of adhesion regulated by G protein is very important for this process [32]. Cell migration of normal cells is tightly regulated. However, tumor cells are exposed to a modified microenvironment that promotes cell migration [33]. Changes in G protein affect the tumor cell migration.
ITGA2, the integrin subunit α2, encodes a protein that forms a heterodimer with the beta subunit and mediates     [35]. GNA14 (G protein subunit α14) encodes a member of the guanine nucleotide binding or G protein family. GNA14 mutations induce morphological changes in cells and make cell growth factors independent by upregulating the MAPK pathway. A study found that GNA14 mutations could cause vascular tumors in children [36].
Arginine vasopressin receptor 1A (AVPR1A) is involved in mediation of cell contraction and proliferation, platelet aggregation, release of coagulation factors, and glycogenolysis. A large volume of literature has demonstrated that AVPR1A contributes to a range of social behaviors in lower vertebrates and humans [37]. But evidence on the association between AVPR1A and cancer is rare. Nor did we find evidence of the link between AVPR1A and thyroid cancer.
Using data mining based on bioinformatics tools, the present study has confirmed CHRDL1, GNAI1, GNA14 and AVPR1A are associated with thyroid cancer although their specific biological functions have not been explored by the molecular biology methods.
For the results of the pathway analysis in this study, we consulted the literature and retrieved more relevant information. ECM-receptor interactions have been found to play a role in the development of papillary thyroid carcinoma [38]. Activation of the p53 signaling pathway could induce Nontumor ×100 ×100     In addition, we used the CMap database to find small molecules of drugs that might reverse the expression of thyroid cancer genes. This discovery will help develop new targeted drugs for thyroid cancer. Adrenosterone (enrichment value -0.914) is an endogenous steroid hormone used as a dietary supplement to reduce body fat and increase muscle mass. It is proposed that adrenosterone, which is primarily responsible for reactivation of cortisol from cortisone, may be an inhibitor of the 11beta-hydroxysteroid dehydrogenase type 1 enzyme (11beta-HSD1) [40]. Hexamethonium bromide (enrichment value -0.813), a non-depolarizing ganglion blocker and an nAChR antagonist, can be used to treat hypertension and duodenal ulcers [41,42]. After reviewing the literature, we found that the effectiveness and safety of adrenosterone and hexamethonium bromide in thyroid cancer treatment have not been studied. Therefore, further research is urgently needed to reveal the potential of these small molecules in treating thyroid cancer.
In summary, the 7 key genes obtained from the PPI network are closely related to tumorigenesis and tumor progression, suggesting that these genes may be prognostic markers and therapeutic targets of thyroid cancer. At the same time, the KEGG pathway analysis of DEGs provides a new perspective for elucidating the pathogenesis and diagnosis of thyroid cancer, and a new direction for developing targeted inhibitors for thyroid cancer. The present study is based on big data, therefore, these thyroid cancer-associated signaling pathways and key genes need further validation in clinical samples using methods of molecular biology such as RT-PCR and Western blot.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The author(s) declare(s) that they have no conflicts of interest.