Novel Candidate Key Drivers in the Integrative Network of Genes, MicroRNAs, Methylations, and Copy Number Variations in Squamous Cell Lung Carcinoma

The mechanisms of lung cancer are highly complex. Not only mRNA gene expression but also microRNAs, DNA methylation, and copy number variation (CNV) play roles in tumorigenesis. It is difficult to incorporate so much information into a single model that can comprehensively reflect all these lung cancer mechanisms. In this study, we analyzed the 129 TCGA (The Cancer Genome Atlas) squamous cell lung carcinoma samples with gene expression, microRNA expression, DNA methylation, and CNV data. First, we used variance inflation factor (VIF) regression to build the whole genome integrative network. Then, we isolated the lung cancer subnetwork by identifying the known lung cancer genes and their direct regulators. This subnetwork was refined by the Bayesian method, and the directed regulations among mRNA genes, microRNAs, methylations, and CNVs were obtained. The novel candidate key drivers in this refined subnetwork, such as the methylation of ARHGDIB and HOXD3, microRNA let-7a and miR-31, and the CNV of AGAP2, were identified and analyzed. On three large public available lung cancer datasets, the key drivers ARHGDIB and HOXD3 demonstrated significant associations with the overall survival of lung cancer patients. Our results provide new insights into lung cancer mechanisms.


Introduction
Lung cancer is the most common cause of cancer-related death worldwide, and non-small-cell lung cancer (NSCLC) accounts for approximately 80% of all cases [1]. The overall 5-year survival rate remains low despite the development of clinical diagnosis techniques and chemotherapy [2]. NSCLC has two major subtypes: squamous cell lung carcinoma (SCC) and lung adenocarcinoma (AD). SCC represents approximately 20-30% of NSCLC patients and is characterized by keratinization in squamous pearls and the formation of intercellular bridges [3]. Many studies have provided insight into several driver genes, miRNAs, and crucial signaling pathways that contribute to lung cancer pathogenesis.
Genetic and epigenetic alterations are frequently found in SCC. For example, Sriram et al. [4] found that lung squamous cell carcinoma patients with the loss of SOCS6 have worse disease-free and overall survival rates. Son et al. [5] detected gains at 1p31.1, 3q26.1, and 3q26.31-3q29 and losses at 1p21.1, 2q33.3, 2q37.3, 3p12.3, 4q35.2, and 13q34 in SCC. Many of the loss regions in the chr3, -5, -9, -13, and -17 loss that occur in SCC patients carry known tumor suppressors, such as TP53, RB1, and APC [6][7][8]. The epidermal growth factor receptor (EGFR) is a transmembrane glycoprotein, located on 7p12, that conducts signals to downstream cascades such as PI3K-AKT and RAS-RAF-MEK-ERK. It has been observed that high-frequency copy number gains and the overexpression of proteins occur in SCC cases [9,10]. An in-frame deletion of exons 2 to 7 in EGFR was found in three SCC cases and resulted in the development of NSCLC in a mouse model [11,12]. The PI3K-AKT and RAS-RAF-MEK-ERK signaling pathways play central roles in antiapoptosis and proliferation in many cancers, including SCC [13][14][15]. Kirsten 2 BioMed Research International rat sarcoma viral oncogene homolog (KRAS), located on chr12p12.1, belongs to the canonical RAS family, which also includes HRA and NRAS. The three conserved RAS genes encode monomeric GTPases and have been found to be frequent mutations in approximately 30% of all cancers [16]. RAS receives stimulation from upstream receptors such as EGFR and conducts signals to the downstream pathways to regulate diverse cellular responses, including cell proliferation, differentiation, and apoptosis. Many proteins, including SOS, RAF, and MEK, participate in the process of signal transmission, and their dysfunction may leave the whole pathway dysregulated [17,18]. Mutations of KRAS, NRAS, and HRAS were reported in lung cancer, including squamous cell carcinoma [19,20]. The PI3K/AKT pathway also can be activated by RAS to promote cell escape from programs. The dysregulation of the RAS/ERK and PI3K/AKT pathways is common in many cancer types [18,21]. Phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA), encoding the catalytic subunit of PI3K, is located on the 3q26 amplified area and is found with high-frequency copy number (CN) gains and novel mutations in SCC [22][23][24][25]. Amplification of CN and somatic mutation presumably activate the PI3K pathway, leading to AKT activation, and provide tumor cells with multiple tumor-specific characteristics such as apoptosis arrest and limitless replicative potential [13,26]. However, the detailed mechanisms are not clear. V-Akt murine thymoma viral oncogene homolog 1 (AKT1), located on 14q32, is one of three closely related serine-threonine kinases (AKT1, AKT2, and AKT3). The E17K mutation of AKT1, resulting in activation of the kinase, was found in 5.5% of SCC patients [27]. In SCC, aberrant amplification or expression of FGFR1 and IGF1R leads to the dysfunction of downstream signaling via the PI3K/AKT and RAS/MEK pathways [28][29][30][31][32]. Deletions involving chr3p and 9p21 in lung cancer, including SCC, were reported by The Cancer Genome Atlas (TCGA) [33,34]. Some known tumor suppressor genes such as RASSF1A, FUS1, FHIT, and CDKN2A are located in these regions, and the loss of the allele may lead to aberrant tumorigenesis [35].
MicroRNAs (miRNAs) are a class of endogenous 20-25 nucleotide small RNAs that regulate mRNA translation and degradation through perfect or nearly perfect complementarity to the 3 untranslated regions (UTRs) of the target mRNA [36][37][38][39][40]. In Cho's review [41], many miRNAs can be cancer biomarkers. The target genes regulated by miRNAs have been demonstrated to play critical roles in cancer pathogenesis [42][43][44][45]. For instance, Jiang et al. reported that non-smallcell lung cancer patients with higher BCL11A expression have better prognosis, and the expression of BCL11A is regulated by microRNA-30a. Approximately 1%-3% of the genome codes for microRNA sequences and 30% of known genes are presumably regulated by miRNA [46]. The miRNA let-7 was first found in Caenorhabditis elegans, and emerging evidence suggests its critical role in lung cancer [47]. In vitro and in vivo, reduced expression levels of let-7 have been observed in lung cancer, and the downregulated expression of let-7 was associated with shortened postoperative survival of lung cancer patients. In A549, a lung adenocarcinoma cell line, the proliferation of cancer cells was inhibited by the overexpression of let-7 [48]. The 3 UTR of RAS, a key oncogene in lung cancer, has complementary sites to let-7. In lung cancer, it has been shown that the level of let-7 is reduced and the expression of RAS protein is increased significantly, suggesting a mechanism in which RAS is the target of let-7 [49]. In addition to RAS, target complementary sites to let-7 in the 3 UTR of c-myc and high mobility group protein A2 (HMGA2) were found to inhibit tumorigenesis [50][51][52][53]. In different lung cancer subtypes, miR-205, miR-93, and miR-221 were uniquely increased in the SCC, but let-7e was downregulated both in lung adenocarcinoma (AD) and in SCC [54]. The epithelial to mesenchymal transition is suppressed by miRNA205 targeting of the transcriptional factors ZEB1 and SIP1 [55]. In addition, the tumor suppressor genes IL24 and IL32 may be targets of miR-205 [56]. Xing et al. identified three miRNAs (miR-205, miR-210, and miR-708) to discriminate SCC from healthy individuals (73% sensitivity and 96% specificity) [57]. In SCC, the expression of miR-146b and miR-155 is associated with the overall survival rate [58]. Plasma miR-21 was identified as an early detection and chemosensitivity biomarker in NSCLC [59]. In glioblastoma cell lines, miR-21, as an antiapoptotic factor, exhibits aberrant high expression, and knockdown of it resulted in the activation of cell apoptosis caspases [60]. Further, miRNAs including miR-182, miR-486-5p, miR-30a, miR-140-3p, miR-31, miR-34a, miR-25, and miR-191 play tumor-suppressor roles in lung cancer [57]. Above all, miRNAs can be useful prognostic predictors of SCC to help us to understand SCC pathogenesis.
In this study, by means of regression analysis, we speculated that various factors such as copy number variation (CNV), miRNA, and methylation could be related to the regulation of each gene. Then, we collect SCC related genes from the hsa05225 pathway (non-small-cell lung cancer, Homo sapiens) in KEGG and obtained the regulatory factors, as discussed above. Based on the Bayesian network, we rebuilt a novel key net between the SCC genes and their regulatory factors. In this net, we identified some novel genes and miRNAs involved in SCC pathogenesis and some known genes or factors that were previously neglected and should receive more attention.

Datasets.
The gene expression, microRNA expression, and DNA methylation data of lung cancer patients were downloaded from TCGA (The Cancer Genome Atlas) squamous cell lung carcinoma project [61] website (https://tcga-data.nci.nih.gov/docs/publications/lusc 2012/). The expression level of 18,979 genes was measured with RNA-Seq and transformed into log 2 scale. The expression level of 437 microRNAs was measured with microRNA-Seq and also transformed into log 2 scale. The DNA methylation level of 18,498 genes was measured using infimum.

Construction of Whole Genome Integrative Network.
For each gene, its expression level was considered as the dependent variable, and all other factors including microRNA expression levels, methylations, and copy number variations of all genes were considered as independent variables. The network construction was decomposed into 18,979 regression models.
For each regression model, there were 437 + 18498 + 24174 = 43109 feature variables. We applied the variance inflation factor (VIF) regression algorithm [64] to build this large-scale regression model. VIF regression can not only fit the model as accurately as sophisticated but slow methods such as LASSO but also perform fast feature selection. It evaluates the prediction potential of each feature variable and then performs forward feature selection, also known as incremental feature selection (IFS). IFS has been widely used to solve high dimensional regression [65] and classification problems [66][67][68][69]. The VIF regression program was downloaded from http://cran.r-project.org/web/packages/VIF/.
By combining the VIF regression models that passed the adjusted 2 [65] cutoff 0.4, we obtained a whole genome integrative network in which the expression of target genes was regulated by microRNA, methylation, and copy number variation.

Refined Key Bayesian Subnetwork of Lung Cancer.
Using VIF regression, we constructed the backbone network structure. However, the network was too complex for further functional analysis. Network decomposition technologies [70] were required to obtain the key subnetwork. Another problem was how to fully utilize the existing knowledge about lung cancer to discover novel key genes. A third was that the regression model did not consider the regulation structure between feature variables and therefore included false positive regulations. For example, we found that A and B regulate C, but the actual regulations may be that A regulates B, and then B regulates C.
To address these problems, first, we extracted the hsa05225 pathway (non-small-cell lung cancer, Homo sapiens) from KEGG using KEGGgraph [71]. Then, we used these known regulations as prior knowledge to perform Bayesian network analysis of the KEGG lung cancer genes and their candidate regulators identified by VIF regression. The Bayesian network of KEGG lung cancer genes and their candidate regulators expanded the key lung cancer subnetwork and provided novel key genes, which may enrich the understanding of lung cancer mechanisms. Furthermore, it refined the regulations among KEGG lung cancer genes and their candidate regulators and reduced the false discovery rate. We used the package bnlearn [72] (http://www.bnlearn.com/) to build the Bayesian network.

The Novel Candidate Key Drivers of Lung Cancer.
We used VIF regression to build the whole genome integrative network of genes, microRNAs, methylations, and copy number variations. Then, the subnetwork involving the known lung cancers from KEGG was isolated and refined using the Bayesian network method. The refined key Bayesian subnetwork of lung cancer is shown in Figure 1. There were 48 mRNA genes, 27 microRNAs, 22 methylations, and 8 copy number variations. This integrative Bayesian network facilitates the investigation of the roles of microRNAs, methylations, and copy number variations in lung cancer. It reflects the complex lung cancer pathways that involve multiple levels of components. The key drivers of this network can serve as prognosis biomarkers and therapeutic drug targets.
From Figure 1, we identified some novel key drivers such as HOXD3, ARHGDIB, AGAP2, let-7a, and miR-31 that played important roles in the pathogenesis of lung cancer on the gene, microRNA, methylation, and copy number variation levels.

The Biological Roles of Novel Candidate Methylation Key
Drivers. Based on our analysis, the methylation of HOXD3 interacts with the expressions of hsa-mir-100 and hsa-mir-146a and therefore indirectly affects many neighbor genes and microRNAs in the network.
Homeobox D3 (HOXD3) is a member of the highly conserved homeobox family, which possesses four similar clusters, HOXA, HOXB, HOXC, and HOXD. HOX genes have been reported to play important roles in cell adhesion, cell apoptosis and differentiation, and multiple receptor signaling [73,74], and their aberrant expression occurs in many cancers originating from prostate, stomach, and lung [75][76][77][78][79]. Overexpression of HOXD3 enhanced motility and invasiveness in A549 cells [80]. In NSCLC, miR-100, targeted to polo-like kinase 1 (PLK1), was significantly reduced in expression and correlated with clinical stage [81]. In smallcell lung cancer (SCLC), it has been proved that HOXA1 is targeted and regulated by miR-100, and the expression of HOXA1 is inversely correlated with miR-100 [82]. Our bioinformatic analysis revealed that miR-100 and miR-146a are putative novel regulators to HOXD3 as well as HOXA1. In addition, it revealed that keratin 8 (KRT8), which participates in forming intermediate-sized filaments, is another target of miR-100 and is involved in the cell cycle controlled by CDK6. In NSCLC, miR-146a acts as a tumor suppressor by inhibiting cell growth and migration and inducing apoptosis by targeting EGFR, which plays a critical part in SCC pathogenesis [83]. It was demonstrated that miR-146a regulates multiple targets to affect different aspects of the central network.
As shown in Figure 1, another key methylation driver was ARHGDIB, which interacts with the mRNA expression of EGFR and is associated with the expression of hsa-mir-10a.
It is known that Rho GDP dissociation inhibitor (GDI) beta (RhoGDI, also known as ARHGDIB) is a guanine nucleotide dissociation inhibitor specific to the Rho family of small GTPases. RhoGDI2 is involved in diverse cellular events, such as cell signaling, cell proliferation, and cytoskeletal organization. In different cancer types, RhoGDI2 performs diverse functions and exhibits different aberrant expression levels. In ovarian and stomach cancer, the expression of RhoGDI2 is upregulated, while it is downregulated in bladder cancer and lung cancer [84][85][86][87][88]. It has been observed that RhoGDI2 expression is downregulated in lung cancer, and the lower expression is strongly correlated with higher malignancy grade, lower cell differentiation, and greater lymph node metastasis of lung cancer. The expression of RhoGDI2 is inversely correlated to the activation level of the PI3K/Akt/mTOR pathway, which plays key roles in SCC [88]. In bladder cancer, c-Src protein phosphorylates RhoGDI2 and is regulated by EGFR [89][90][91]. We also found a novel correlation between RhoGDI2 and EGFR. RhoGDI2 is involved in the EGFR-associated key network in SCC directly or indirectly through interaction with c-Src. In addition, the results of our analysis reveal that RhoGDI2 is regulated by miR-10a. There is no evidence that RhoGDI2 is the direct target of miR-10a, and more focus is needed on this subnet of miR-10a-RhoGDI2-EGFR. As discussed above, RhoGDI2 may be a putative molecular marker in metastatic lung cancer, which warrants deeper investigation for validation.
It is known that DNA methylation suppresses gene expression [92]. To investigate whether the gene expression levels of these methylation key drivers, ARHGDIB and HOXD3, play significant roles in prognosis, we collected three large lung cancer survival datasets from PROGgene [93]: GSE4573, which included 129 squamous cell lung carcinomas patients, GSE30219, with 281 lung cancer samples, and GSE41271, with 275 lung cancer specimens. The patients were divided into high expression group and low expression group by the median. The overall survival rates of the two groups were compared using the log-rank test [94], and their Kaplan-Meier plots [95] are shown in Figure 2. The log-rank values of ARHGDIB on GSE4573, GSE30219, and GSE41271 were 0.042, 0.0781, and 0.0021, respectively. The patients with high expression of ARHGDIB had good prognoses. Meanwhile, the log-rank values of HOXD3 on GSE4573, GSE30219, and GSE41271 were 0.0441, 0, and 0.0888, respectively. The patients with high expression of HOXD3 had poor prognoses.
The key drivers, ARHGDIB and HOXD3, play significant roles in prognosis prediction.

The Biological Roles of Novel Candidate MicroRNA
Key Drivers. In addition to novel SCC related mRNA/methylation/CNV genes, we also observed some novel miRNAs to be involved in the central network of SCC pathogenesis, such as let-7a and miR-31.  Figure 1: The refined key Bayesian subnetwork of lung cancer. The grey, green, red, and pink nodes represent mRNA genes, microRNAs, methylations, and copy number variations (CNVs), respectively. The one-arrow edges represent directed regulation, while the two-arrow edges represent undirected regulation.
In a study of Chinese SCC patients, miR-31 was considered as a new prognostic biomarker for SCC [96]. The miR-31, located on chr9p21.3 in genome, is involved in cell proliferation, migration, and invasion. In breast cancer, miR-31 was considered as a suppressor of metastasis [97]. In mesothelioma (MM) cells, the reintroduction of miR-31 reveals activity that suppresses the cell cycle and migration [98]. In our new key network, mir-31 is closely associated with CDKN2A and CDK. It has been reported that the chr9q21.3 region is frequently and homozygously deleted in MM, and the deletion is linked to a known tumor suppressor gene, CDKN2A [99][100][101][102]. In approximately 31% of NSCLC, the deletion of chr9p21.3 was observed [103]. All this evidence indicates that the codeletion of miR-31 with CDKN2A occurs normally in cancers, and both are beneficial to tumor pathogenesis and metastasis. CDKN2A can bond to CDK6 to suppress the cell cycle process directly. We speculate that miR-31 regulates the CDK6-related cell cycle directly or indirectly in SCC and is an excellent diagnostic biomarker for SCC.
It has been reported that the let-7 microRNA family plays critical roles in pathogenesis of SCC. We also found new miRNAs of the let-7 family to be highlighted in our new network, such as let-7a, let-7e, and let-7i, which indicated that more let-7 family members than we previously knew function in SCC tumorigenesis. Let-7a was characterized as a tumor suppressor in various cancers, including lung and colon cancer [104][105][106][107][108]. It has been observed that the expression of let-7 is downregulated in many cancer types and is correlated with poor clinical prognosis [48,106]. In NSCLC cell lines, let-7a inhibits cell proliferation and invasion by interacting with K-Ras and HMGA2 [104]. Target analysis reveals that IGF1R is a target gene, one of let-7's numerous targets, and is involved in the IGF1R/RAS/MAPK/ELK1 pathway, playing major roles in cell proliferation and cell survival [109]. In ovarian cancer, it was reported that let-7e has many target genes including HMGA2, C14ofr28, LIN28B, and ARID3B, and let-7e expression is lower than in adjacent tissues [110]. It has also been reported that let-7i is significantly downregulated in SCC [111]. In our network, we found that let-7 family members appeared frequently. Some let-7 members interact with their putative targets genes directly, and some members function in the central net indirectly through the regulation of other miRNA transcripts, including mir-101-1 and mir-146a. All this evidence implies that the let-7-miRNA subnet is worth greater attention and may be a novel marker for SCC.

The Biological Roles of Novel Candidate CNV Key Drivers.
From Figure 1, we can see that the CNV of AGAP2 can regulate the gene expression of CDK4 and is associated with the CNV of FBXO17. As we know, AGAP2 (ArfGAP With GTPase Domain, Ankyrin Repeat And PH Domain 2), also known as PIKE, belongs to the centaurin GRPase family, including three members, PIKE-L, PIKE-S, and PIKE-A [112][113][114]. PIKE-S and PIKE-L enhance PI3K antiapoptosis activity, but PIKE-A promotes the downstream Akt instead [113,115]. PIKE proteins participate in regulating the signal transducer and activator of transcription 5A (STAT5) by Janus kinase 2 (JAK2) phosphorylation [116]. Because of gene amplification, PIKE-A is overexpressed in glioblastoma and astrocytoma [113]. Its increased expression has been observed in various cancer types including breast, prostate, skin, colon, ovary, liver, stomach, lung, and kidney [117][118][119][120]. In the genome, the PIKE gene is adjacent to CDK4, which plays a key role in cell cycle control [121,122]. As a component of the CDK4 amplicon, coamplification of CDK4 and CENTG1 has been frequently found in various cancers [123][124][125][126][127]. An integrated oncomir/oncogene DNA cluster, comprised of PIKE-A, CDK4, and has-miR26a, accelerates the aggressiveness in glioblastoma [128]. CDK4 and PIKEs are involved concomitantly in cancer pathogenesis by cooperatively targeting the PI3K/AKT and Rb1 pathway, although some studies have shown that overexpression of PIKEs alone can elicit NIH3T3 cell transformation [117,128]. In our study, a relationship was found between CDK4 and PIKEs. This result demonstrated that their combination is more indirectly responsive to PI3K pathway dysfunction in SCC. The direct interaction between CDK4 and PIKEs requires deeper experiments to validate. The supposed effects between them may exist at different levels, including genome, transcriptome, and proteome, and more experimental evidence is needed to support these effects.

Conclusion
In our study, we established a new key network for SCC to help us mine the mechanisms more effectively. We not only found novel SCC related genes and subnets but also noticed significant changes to the old net. Compared to past studies, EGR-EGFR, PIK3CG/PIK3CA, HRAS, CDK6, CDK4, and CDKN2A were found to have interactions and relationships with more genes and miRNAs, expanding the network's scope and depth. These proteins may be the hinge of the whole pathogenesis network and warrant greater focus.