Comprehensive Analysis of Differentially Expressed Long Noncoding RNA-mRNA in the Adenoma-Carcinoma Sequence of DNA Mismatch Repair Proficient Colon Cancer

DNA proficient mismatch repair colon cancer (pMMR CC) is the most common subtype of sporadic CC. We aimed to investigate the role of long noncoding RNAs (lncRNAs) in pMMR CC carcinogenesis. In the present study, we conducted transcriptomic analysis of lncRNAs-mRNAs in five low-grade intraepithelial neoplasia (LGIN), five high-grade intraepithelial neoplasia (HGIN), four pMMR CC, and five normal control (NC) tissues. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment pathway, and coexpression network analyses were performed to elucidate the functions of lncRNAs and mRNAs as well as their interactions. Quantitative real-time polymerase chain reaction (qRT-PCR) was used to validate five dysregulated lncRNAs in a large set of colon tissues. Receiver-operating characteristic (ROC) curves were employed to evaluate the performance of the candidate lncRNAs. A set of 5783 differentially expressed lncRNAs and 4483 differentially expressed mRNAs were detected among the LGIN, HGIN, pMMR CC, and NC samples. These differentially expressed lncRNAs and mRNAs were assigned to 275 significant GO terms and 179 significant KEGG enriched pathways. qRT-PCR confirmed that the expression of five selected lncRNAs (ENST00000521815, ENST00000603052, ENST00000609220, NR_026543, and ENST00000545920) were consistent with the microarray data. ROC analysis showed that four lncRNAs (ENST00000521815, ENST00000603052, ENST00000609220, and NR_026543) had larger area under the ROC curve (AUC) values compared to serum carcinoembryonic antigens, thereby distinguishing NC from pMMR CC. In conclusion, several lncRNAs play various roles in the adenoma-carcinoma sequence and may serve as potential biomarkers for the early diagnosis of pMMR CC.


Introduction
Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide, comprising over 1.8 million new cases in 2018 [1]. More than 70% of CRC cases are located in the colon, which is also called colon cancer (CC) [2]. According to the latest data, CRC ranks fifth in both incidence and mortality among cancers in China [3].
CRC is a heterogeneous disease with a progressive accumulation of genetic and epigenetic alterations [4][5][6]. About two-thirds of CRC cases arise with unknown contributions from germline factors or a significant family history of cancer or inflammatory bowel disease, defined as sporadic CRC. ree major genetic mechanisms underlie the pathogenesis of sporadic CRC [7], namely, chromosomal instability (CIN), microsatellite instability (MSI), and the epigenomic CpG island methylator phenotype (CIMP). CIN, the first described and the most common pathway, accounts for 80%-85% of sporadic CRC [8]. CIN CRCs are nonhypermutated with DNA proficient mismatch repair (pMMR) functions [7]. e mechanism responsible for CIN is different from that responsible for MSI and CIMP.
Understanding the molecular mechanisms underlying CRC is critically important for the clinical prognosis and therapeutic response. A meta-analysis of 63 eligible studies (10126 patients with CRC) showed a worse prognosis for CRC patients with CIN/pMMR [9]. Additionally, stage II CRC patients with defective mismatch repair (dMMR) or highlevel MSI (MSI-H) have better prognosis but no benefits from fluorouracil chemotherapy [10,11].
More than 90% of colorectal tumorigenesis cases follow the adenoma-carcinoma sequence (ACS), with multiple gene mutations and abnormal activation of signaling pathways [12,13]. Mutations in the adenomatous polyposis coli (APC) gene occur early during colorectal tumorigenesis, followed by the activation of the KRAS gene and the inactivation of the TP53 gene, which are often associated with CIN [14].
Long noncoding RNAs (lncRNAs) are a class of noncoding RNAs with more than 200 nucleotides that are not translated into proteins. LncRNAs can regulate gene expression at different levels, including epigenetically, transcriptionally, and posttranscriptionally [15]. Decreased expression of the lncRNA SATB2-AS1 promotes metastasis and influences the tumor microenvironment in CRC by regulating SATB2, thereby resulting in poor survival [16]. Overexpression of the lncRNA UCA1 contributes to the immune escape of cancer cells and protects PDL1 expression from miRNA repression in gastric carcinoma, serving as a potentially novel target for immunotherapy [17]. However, the differential expression and the role of lncRNAs in ACS for CRC remain to be elucidated.
Previous studies on the lncRNA expression profiles of CC were generally performed on all cancers in the colon together including rectal cancer, thereby neglecting the heterogeneity of molecular subtypes of CC. Although often linked together, CC and rectal cancer are different when it comes to treatment. us far, the differential expression profile for lncRNAs in the normal colonic mucosa-pMMR adenoma-pMMR sporadic CC sequence has not been reported. In the present study, we aimed to characterize the expression profile of lncRNAs in the malignant evolution process for pMMR sporadic CC using transcriptome microarray technology. e findings provide useful candidates for the diagnosis and treatment of CC.

Screening of pMMR Samples and Total RNA Isolation.
As in our previous study [18], multiplex PCR and immunohistochemistry were used to determine the MMR status. pMMR colonic tissues were defined as tissue samples with a low frequency of MSI (MSI-L) or microsatellite stability (MSS, no marker tested). Total RNA was extracted from the tissue samples using RNAiso Plus (Code No. 9109, TAKARA, Japan). RNA integrity was confirmed with a NanoDrop ND-2000 spectrophotometer ( ermo Fisher Scientific, Rochester, NY, USA) and Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, US) and further purified with an RNeasy mini kit (Cat. # 74106, Qiagen, GmBH, Germany) and RNase-Free DNase Set (Cat. #79254, Qiagen Co, GmBH, Germany). RNA samples with integrity ≥7.0 and a 28S : 18S ratio ≥0.7 were used for further experiments.

Microarrays Analysis.
e lncRNA and mRNA expression profiles in multistage colonic mucosa tissues (five NC, five LGIN, five HGIN, and four pMMR CC samples) were determined with the Agilent custom SBC Human ( Raw data were normalized with the Quantile algorithm in GeneSpring Software 11.0 (Agilent Technologies, Santa Clara, CA, USA). Cluster analysis was performed to systematically and intuitively display the relatedness between samples. Differential genes expressions (DEGs) among the four groups were filtered with the F test in the random variance model (RVM) using P values < 0.05 and Benjamini-Hochberg false discovery rate (FDR) < 0.05 as the significant cutoff criteria [19]. STC analysis was employed to identify the most probable set of clusters responsible for the observed ACS series by profiling the dynamic nature of the temporal gene expression. NC, LGIN, HGIN, and pMMR CC were set as different points to identify significant trending models related to multistage colonic mucosa tissue and their associated DEGs by STC analysis. Clusters with P values < 0.05/26 were considered statistically significant.

Gene Ontology and Kyoto Encyclopedia of Genes and
Genomes Pathway Analysis. Gene Ontology (GO) analysis was performed to annotate DEGs in three categories: molecular functions, biological processes, and cellular components. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis was performed to identify the functional pathways and pathways for molecular interactions, reactions, and relation networks. e statistically significant threshold was set as P value < 0.01 and FDR <0.05.

Construction of Coexpression and Transcription Factor (TF) Regulatory Network.
To identify coexpressed lncRNA-mRNA pairs and hub DEGS, the lncRNA-mRNA coexpression network was constructed based on the gene expression value and Pearson's correlation coefficient (PCC) between the differentially expressed lncRNAs and mRNAs. |PCC| ≥0.8 and P value < 0.01 were considered as statistically significantly relevant. In order to distinguish the TF regulatory network in the stepwise process encompassing CC progression, the interactions were predicted starting from a TF by searching the conserved TF binding sites within a putative promoter area 2000 bp upstream and 500 bp downstream of the transcriptional initiation site of coexpressed genes.

Validation by Quantitative Real-Time PCR (qRT-PCR).
Based on types of lncRNAs, aberrant expression (fold change), the changes in coexpression degrees, and the results of CPC, CNCI, PFAM, and PhyloCSF scores, 5 lncRNAs were selected to validate the results of microarray. e expression levels of the lncRNAs were examined by qRT-PCR in 225 colorectal tissue samples. qRT-PCR was performed with a QuantStudio TM 6 Flex Real-Time PCR System ( ermo Fisher, Rochester, NY, USA). Primers are listed in Additional file 1 and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) was used as the endogenous control. Each qRT-PCR reaction (in 10 μL) contained 5 μL 2X SYBR Green PCR Mix (ABI, USA), 0.4 μM of each of the forward and reverse primers, and 5 ng of template cDNA. e PCR program was as follows: initial denaturing at 95°C for 5 min followed by 40 cycles at 94°C for 30 s and 60°C for 1 min and fluorescence acquisition at 60°C for 1 min. PCR amplifications were performed in triplicate for each sample.

Statistical Analysis.
Computer-based calculations were conducted using SPSS version 20.0 (SPSS Inc., Chicago, IL, USA) and MedCalc version 19.1 (MedCalc Software Bvba, Ostend, Belgium). e differences in the expression of selected lncRNAs among multiple groups were compared using ANOVA. Further pairwise comparisons were performed with the least significance difference (LSD) test. e differences in the expression of lncRNAs between the pMMR CC group and the dMMR CC group were compared with the nonparametric Mann-Whitney U test. e relationship between lncRNAs and their target coding genes were determined by Spearman correlation analysis. Receiver-operating characteristic (ROC) curves and the area under the ROC curve (AUC) were employed to evaluate the diagnostic accuracy of candidate lncRNAs serum carcinoembryonic antigen (CEA), obtained from Hospital Information System. e data were presented as the mean with standard deviation (SD) for normally distributed data and the median with the interquartile range for skewed data. | Fold change| ≥5 was used as the significant threshold to screen differentially expressed lncRNA and mRNA. All P values were two-sided, and FDR was calculated for multiple comparisons. Differences with P < 0.05 were considered as statistically significant.

Basic Characteristics of the Microarray Data.
e total RNA extracted from pMMR colonic tissue samples, consisting of five NC, five LGIN, five HGIN, and four pMMR CC samples, were hybridized to a transcriptome microarray. e quality control results for RNA/microarray are summarized in Additional file 1 and Additional file 2, respectively. Cluster analysis of normalized data indicated that the samples from each group were well separated when all DEGs were considered (Figure 1), suggesting that the microarray data were reliable for further bioinformatics analysis.

Dysregulated lncRNAs and mRNAs in the Colonic Adenoma-Carcinoma Sequence.
A total of 5783 lncRNAs and 4483 mRNAs were differentially expressed among the NC, LGIN, HGIN, and pMMR CC samples. In order to generate dynamic and significant trending models of DEGs related to colon mucosa malignant transformation, STC analysis was used to further identify the clusters of DEGs with similar expression patterns. As shown in Figure 2(a), the differential expression profile consisted of 26 differentially expressed lncRNA clusters, among which 15 clusters (profile no. 0, 1, 2, 3,4,5,9,12,17,20,21,22,23,24, and 25) showed significant expression trends (P < 0.05). Moreover, the lncRNA expression levels in clusters 21, 22, 24, and 25 were stable or gradually elevated, while the lncRNA expression in clusters 0, 1, 3, 4, 9, and 12 showed a decreasing trend. Cluster 23 contained 207 lncRNAs that gradually increased from NC to adenoma and were suppressed in CC (Figure 2(b)). However, lncRNA expression in cluster 3 was opposite that in cluster 23. As shown in Figure 2(a), a total of nine mRNA expression clusters, specifically clusters 2, 4, 5, 13, 20, 21, 22, 24, and 25, showed significant expression trends (P < 0.05), among which clusters 21, 22, 24, and 25 showed a stable or gradual elevation. e results suggested that adenoma is an intermediate step from normal tissue to CC, and certain lncRNAs may play important roles during the dynamic process in colonic mucosal protruding lesions.

Functional and Pathway Enrichment Analysis.
DEGs from significant expression trends were subjected to GO and KEGG pathway analyses to identify their potential functions and mechanisms in the dynamic process for colonic mucosal protruding lesions (Figure 3). e top 20 GO terms included multicellular organismal development, cell division, cell adhesion, cell cycle, cell differentiation, mitosis, proteolysis, and angiogenesis. KEGG enrichment analysis revealed that the DEGs were mainly involved in several signaling pathways, including metabolism, cancer, cell cycle, PI3K-Akt signaling, and transcriptional misregulation in cancer (Figure 3(b)).

Construction of TF Regulatory Networks.
To investigate the mechanism of gene regulation at the transcriptional level, we constructed two interaction networks (TF-DEG network ( Figure 5(a)) and TF-angiogenesis-DEG network ( Figure 5(b))) between TFs and DEGs that originated from the coexpression network or A-coexpression network. e TF-DEG network contained 99 regulation models and 402 nodes involving 85 TFs ( Figure 5(a)), among which three TFs, namely, LUN-1 (TF, degree � 218), Tel-2 (TF, degree � 22), and 1-Oct (TF, degree � 20), were regulated over 20 nodes (Table 4). LUN-1 had the highest degree and might play more important roles in the TF regulatory networks. e lncRNAs regulated by the most TFs were NR_103548, lnc-ARRDC3-1: 16, ENST00000513626, and ENST00000516496 (regulated by six TFs). In the TF-angiogenesis-DEG network, due to the important role of LUN-1 (TF, degree � 73) and Tel-2 (TF, degree � 18) in the progression of CC, a subnetwork was constructed showing their effects on angiogenesis functions ( Figure 5(b) and Table 4).

qRT-PCR Verification for the Candidate Genes.
According to the probe signal value, type, DEG fold change, degree value, and protein coding ability, five LncRNAs were selected from the networks above for validation in the expanded clinical samples (70 NC, 58 LIGN, 27 HIGN, 62 pMMR CC, and 8 dMMRCC samples). LncRNAs ENST00000545920, ENST00000521815, ENST00000609220, and ENST00000603052 presented a sequentially increasing trend in expression with tumor progression (Figures 6(a)-6(d)), whereas NR_026543 showed a sequentially decreasing trend (Figure 6(e)). No statistically significant differences were observed in the expressions of NR_026543 and ENST00000521815 among LGIN, HGIN, and pMMR CC. However, the expression levels for each group were significantly elevated or decreased compared with those of the NC samples, suggesting that these genes could be used to distinguish normal from lesion tissue. Furthermore, the expressions of ENST00000603052, ENST00000521815, and ENST00000609220 were significantly different in pMMR CC and dMMR CC tissue samples (Figures 6(f)-6(j)), indicating that these three genes could distinguish pMMR CC from dMMR CC.

Diagnostic Efficacy of Selected lncRNAs.
e diagnostic efficacy of the above selected five lncRNAs was evaluated as potential biomarkers for pMMR CC diagnosis compared to carcinoembryonic antigen (CEA), a traditional serum tumor biomarker.
With the improvement of microarray and next-generation sequencing technology, large-scale aberrantly expressed lncRNAs have been discovered in multiple cancers, including breast cancer [22], gastric cancer [23], CRC [26], lung cancer [27], and pancreatic cancer [28]. A large proportion of colon adenomas are able to transform into adenocarcinoma through a dynamic process with the accumulation of multiple mutations and preternatural activation of signaling pathways [13]. Altered lncRNA expression patterns for microsatellite-stable colorectal cancer indicated that several lncRNAs were continuously upregulated/downregulated during the adenoma-adenocarcinoma sequence [29]. Unlike previous studies that focused on one or more lncRNAs in CRC [30][31][32], we first conducted a systematic and comprehensive analysis using microarrays to reveal the differentially expressed profiles of lncRNAs and mRNAs in the malignant evolutionary process for pMMR CC. ousands of dysregulated transcripts were identified in the ACS, including 5783 lncRNAs and 4483 mRNAs. Subsequently, STC analysis revealed that adenoma is an intermediate step from normal tissue to CC and certain lncRNAs participate in this continuous process. As the dysregulation can be detected already in adenomas, these lncRNAs can be used as biomarkers for the screening and early detection of CC.
GO annotation and KEGG pathway analysis were performed to determine the functions underlying the differentially expressed lncRNAs in ACS. GO annotation suggested that aberrant transcripts were predominantly enriched in multicellular organismal development, cell division, cell adhesion, cell cycle, cell differentiation, mitosis, proteolysis, and angiogenesis. KEGG enrichment analysis revealed that the dysregulated transcripts mainly participated in metabolic pathways, pathways in cancer, cell cycle, PI3K-Akt signaling pathway, and purine metabolism. Cell Journal of Oncology 7 cycle involvement was identified in both analyses. e uncontrolled tumor cell cycle is an essential feature of cancer and is caused by aberrant cell cycle proteins including cyclindependent kinases, Aurora kinases, and Polo-like kinases.
Exploiting key molecules in the cell cycle may provide a new perspective for cancer therapy [33].
To further explore the underlying functions of lncRNAs in ACS, coexpression of lncRNAs with their associated coding genes was performed based on the degree of correlation. Among the top 10 elevated and decreased lncRNAs according to the degree, NR_029374 with 12 degrees was upregulated, consistent with the results obtained by Sun et al. [34] who reported that NR_029374 was upregulated in CC and correlated with poor overall survival. NR_029374 promoted the migration, invasion, and metastasis of CC through the miR-30-5p/SOX9 axis and this finding was supported by previous results [35]. NR_029374 was upregulated in hepatocellular carcinoma (HCC), markedly promoting HCC proliferation, migration, and angiogenesis [35].
Angiogenesis plays a key role in tumorigenesis and development by delivering nutrients and evacuating metabolic wastes [36]. Recently, several lncRNAs and mRNAs were demonstrated to participate in tumor angiogenesis. Upregulation of the lncRNA FLANC promoted angiogenesis by upregulating and prolonging the half-life of phosphorylated STAT3/VEGFA in CRC [37]. In the present study, we discovered that NOTCH4 (mRNA, degree � 35), a member of the Notch family of transmembrane receptors, had the highest degree. Wu et al. reported the association of overexpressed NOTCH4 with CRC survival [38]. e NOTCH4 expression level was especially higher in non-small-cell lung cancer tissues than in the NC tissues, related to the tumor size and TNM stage [39]. Active NOTCH4 might inhibit endothelial sprouting in vitro and vivo [40]. Further study is required to determine the expression level of NOTCH4 in a large population of pMMR CC patients and the underlying interactions with lncRNAs in angiogenesis. Several lncRNAs have demonstrated interactions with TFs in tumorigenesis   Co-expression network D

(b)
A-co-expression network E

(c)
A-co-expression network D (d) Figure 4: lncRNA-mRNA coexpression networks. (a-b) LncRNAs with elevated or decreased profiles and mRNAs for significant terms from GO and KEGG enrichment analysis were, respectively, selected to construct coexpression networks E and D in colonic ACS. (c-d) LncRNAs with significantly elevated/decreased model profiles and angiogenesis-related GO and KEGG items were, respectively, selected to construct A-coexpression networks E and D in colonic ACS. Nodes represent DEGs (green for mRNAs and yellow for lncRNAs). Lines represent interactions and node size represents degree value.  and progression. For instance, lncRNA SNHG15 obstructed the ubiquitination and degradation of Slug, a fast-turnover TF critical for controlling cancer cell invasion and metastasis, thereby promoting CC progression [41]. e transcription of lncRNA LINC01503 was activated by TF TP63, resulting in shorter survival times for patients with esophageal squamous cell carcinoma [42]. In the current study, the lncRNA-TF network analysis revealed that LUN-1 had the highest degree in both the TF-DEG network and the TF-angiogenesis-DEG network.
qRT-PCR analysis showed the upregulation of ENST00000545920, ENST00000521815, ENST00000609220, and ENST00000603052 and the downregulation of NR_026543 with tumor progression. Furthermore, ENST00000603052, ENST00000521815, and ENST00000609220 can be used to identify pMMR CC and dMMR CC. Previous studies revealed the upregulation of ENST00000545920, also known as lnc-SNHG1, in CRC [41], non-small-cell lung cancer [43], gastric cancer [44], and glioma [45] to promote cancer cell growth by interacting with EZH2 in the nucleus and miR-154-5p in the cytoplasm [46]. Similar results have been observed in other tumors, including ENST00000521815, also known as CASC19, which possesses an oncogenic function through targeting miR-140-5p/CEMIP in CRC progression [47]. e downregulation of NR_026543 was associated with liver metastasis and poor prognosis for CC by binding with the miR-203 promoter [48]. ENST00000609220 was found to be highly expressed in CRC tissue, promoting CRC growth and metastasis through the miR-206/YAP1 axis [49]. To date, there is little knowledge on the function of ENST00000603052 in cancer. * * * * * * * * *  Figure 6: qRT-PCR verification of the candidate genes. ENST00000545920 (a), ENST00000521815 (b), ENST00000609220 (c), and ENST00000603052 (d) presented a sequentially ascending trend in expression with tumor progression, whereas NR_026543 (e) showed a sequentially decreasing trend. e expression of ENST00000521815 (f ), ENST00000603052 (h), and ENST00000609220 (i) showed statistically significant differences between pMMR CC and dMMR CC. However, the expression of ENST00000609220 (g) and NR _ 026543 (j) did not show statistically significant differences between two groups.  Ultimately, the diagnostic efficacy of the identified lncRNAs was evaluated against serum CEA. Most of the selected lncRNAs achieved a slightly higher AUC in distinguishing CC from normal tissue. Further study is needed to validate these lncRNAs as biomarkers for the early diagnosis of CC by measuring their expression in serum samples to assess the consistency in tissue samples. Monitoring expression changes in these genes may also provide a new strategy to assess disease progression.
ere are certain limitations in the present study. First, the sample size was relatively small, thereby limiting result reliability. We increased the number of clinical samples to confirm the function of the lncRNAs. Second, we only constructed expression profiles for colon tissue. Additional studies are required to evaluate tissues from other tumors. Last, compared with high-throughput sequencing, microarray is incapable of identifying novel lncRNAs.

Conclusions
In conclusion, we verified a series of differentially expressed lncRNAs and mRNAs in a normal mucosa-adenoma-pMMR CC sequence in the current study. e potential roles of these RNAs were predicted through bioinformatics analyses. e findings may provide novel insights for the diagnosis and therapeutic strategy of pMMR CC. Further studies are required to provide robust validation evidence for the functions of lncRNAs in CC.

Data Availability
All the data generated or analyzed during this study are included in this published article and its supplementary information files.

Disclosure
is manuscript was sent to Research Square as a preprint (https://www.researchsquare.com/article/rs-47627/ v1.pdf ); however, it has not been published elsewhere in whole or in part. e funding bodies had no role in the study design, data collection, data interpretation, or writing of this manuscript.

Conflicts of Interest
e authors declare that they have no conflicts of interest.

Authors' Contributions
Wenkun Li and Qian Li contributed equally to this work. However, there were no statistically significant differences between the AUC of ENST00000545920 (0.794) and that of CEA (P � 0.8734) (e). e ROC curve for CEA is shown in blue, while the ROC curves for the validation of the lncRNAs are shown in green.