Differences in miRNA and mRNA Profile of Papillary Thyroid Cancer Variants

Papillary thyroid cancer (PTC) can be divided into classical variant of PTC (cPTC), follicular variant of PTC (fvPTC), and tall cell variant (tcPTC). These variants differ in their histopathology and cytology; however, their molecular background is not clearly understood. Our results shed some new light on papillary thyroid cancer biology as new direct miRNA-gene regulations are discovered. The Cancer Genome Atlas (TCGA) 466 thyroid cancer samples were studied in parallel datasets to discover potential miRNA-mRNA regulations. Additionally, miRNAs and genes differentiating PTC variants (cPTC, fvPTC, and tcPTC) were indicated. Putative miRNA regulatory pairs were discovered: hsa-miR-146b-5p with PHKB and IRAK1, hsa-miR-874-3p with ITGB4 characteristic for classic PTC samples, and hsa-miR-152-3p with TGFA characteristic for follicular variant PTC samples. MiRNA-mRNA regulations discovery opens a new perspective in understanding of PTC biology. Furthermore, our successful pipeline of miRNA-mRNA regulatory pathways discovery could serve as a universal tool to find new miRNA-mRNA regulations, also in different datasets.


Introduction
Papillary thyroid cancer (PTC) is the most frequent thyroid cancer (80% of cases) with the 10-year overall relative survival rate of 93% [1,2]. The other differentiated thyroid cancer that originates from follicular thyroid cells, follicular thyroid cancer (FTC), is less frequent with incidence of around 10% [1]. Most of PTC tumors have good prognosis and are relatively easy to treat [3]. Presence of tall cells in PTC is also considered as a risk factor, especially if percentage of tall cells comprises more than 10% of tumor cells [4]. The most frequent histopathological subtypes of PTC are classical variant of PTC (cPTC) and follicular variant of PTC (fvPTC), which are different in histopathology, but they confer similar risk of aggressive outcome, which in case of both tumors is relatively low [5,6]. On both molecular and morphological levels fvPTC shares similarities with cPTC and follicular tumors, namely, FTC and follicular thyroid cancer (FTA) [7]. Follicular variant of PTC, especially encapsulated fvPTC, shares clinicopathological features of cPTC and FTC [8].
On the other hand, on molecular level fvPTC can harbor a BRAF V600E mutation and PAX8/PPARG translocation [9,10]. Follicular variant of PTC standing in the middle of two distinct tumor types creates diagnostic challenge and incorrect diagnosis possibility. Incorrect classifications of fvPTC-FTA could be problematic and dangerous for the patient [11].
Genomic alterations, like BRAF mutation or PAX8/ PPARG translocation, are not distinct features of fvPTC, first one being characteristic for cPTC and the second one for FTC and FTA. Expression markers may be helpful to distinguish follicular variant from other entities and number of studies were performed with use of both gene expression and miRNA expression markers [12][13][14][15]. Studies on 2 International Journal of Endocrinology the expression markers were limited by sample size and therefore a larger sample cohort would be desirable to catch population diversity.
In most of the cases when new miRNA-gene regulatory pair is introduced it is unclear how the pair was selected. Most frequently bioinformatics analysis leading to selection of interesting miRNA regulatory pathway is poorly described, not reproducible or just missing. Some publications are very focused on finding regulation/regulations of one chosen gene [17] or gene regulation/regulations by one chosen miRNA [18]. Global analysis of thyroid cancer miRNA regulations was done in PTC and in follicular thyroid tumors (FTC, FTA) [20,21]. Both experiments were performed using microarray platforms for both gene and miRNA expression analysis with small sample size for follicular thyroid tumors (total 24 samples) [20] and relatively large sample size for PTC samples (126 samples) [21]. One problem with expression measurement by hybridization method applied in microarrays is that very similar sequences will hybridize to the same probe on microarray, which is extremely important especially for miRNAs, which are small in size and have been proven to have multiple isoforms ranging in size and in sequence [22]. In our opinion high-throughput sequencing is a method more suitable for correct miRNA expression evaluation since it is more isoform specific and enables us to evaluate expression of each miRNA isoform.
Lately published repository of deep sequencing data comprises a large dataset of PTC samples. The Cancer Genome Atlas (TCGA) project enables us to perform large scale analysis with high number of samples, and experiments were performed with high-throughput sequencing, a method most suitable to study miRNA expression [23]. In parallel, the TCGA thyroid cancer dataset covers gene expression data studied by RNA-Seq for the same cohort of patients.

Analysis of miRNA High-Throughput Sequencing Data.
Expression of 46681 unique miRNA isoforms was detected in the miRNA sequencing data in at least one sample of all available in TCGA thyroid samples. Each of the isoforms was described by reads per million miRNA mapped as described in the TCGA data portal. 4133 of those isoforms were detected in at least half of 466 studied samples. Only those 4133 were taken into further consideration to diminish effect of low abundance isoforms on the outcome of the expression analysis. Quantile normalization with R/Bioconductor pre-processCore library was performed. After normalization log 2 transformation of data was carried out together with subtraction of batch effects. Batch effect removal was performed by subtraction of average expression from each of 16 batches for each single miRNA isoform and for mRNA data. Results of batch removal were observed in the Principal Components Analysis of data before removal (Supplementary Figure 1A) and after removal (Supplementary Figure 1B). miRNA isoform data obtained from above preprocessing steps were passed to integrative analysis.

Analysis of mRNA High-Throughput Sequencing Data.
20531 genes were detected in the mRNA sequencing data. Normalized counts of sequences aligning to particular genes were extracted for all of 466 samples common for mRNA, miRNA, and clinical data. 17438 genes were expressed in at least half of the samples so they were considered in the further analysis. Similar to miRNA data preprocessing log 2 transformation, quantile normalization and batch removal were performed. Results of batch removal for gene expression data were presented in Supplementary Figure 1C (before batch removal) and Supplementary Figure 1D (after removal).

Statistical Testing and Annotation.
Median based fold changes (MBFC) were introduced to diminish outlier effects and considered significant when higher than 1.25 or lower than 0.8. -test values were considered as statistically significant when lower than 0.05. False discovery rates (FDR) were computed with reference to total number of miRNAs on the microarray to take into account multiple testing bias. To compute -test and FDR only expression features with variance higher than 1st quartile (25%) of variance in analyzed dataset were considered. Annotations were prepared according to miRBase version 20 [24][25][26][27] (ftp:// mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff3). independently in 3 sample groups: cPTC (321 samples), fvPTC (99 samples), and tcPTC (35 samples). Spearman correlation coefficient was calculated with respective value for each correlation using Hmisc R library [28]. For each one of sample groups (cPTC, fvPTC, and tcPTC) a number of 72,071,254 independent correlation values with their respective values were calculated. Raw values were adjusted with False Discovery Rate (FDR) method. Correlation coefficient threshold of < −0.6 was applied to filter the best inverse correlations. Subsequently best inverse correlations for all 3 samples sets (cPTC, fvPTC, and tcPTC) were tested with miRNA regulation prediction algorithms: miRanda and TargetScan [29,30]. For this analysis purpose miRNA chromosomal coordinates annotations were transformed to miRNA IDs according to miRBase version 20 (ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa .gff3, chromosomal coordinates annotations being still primary annotations). Both algorithms (miRanda and Tar-getScan) were applied to predict miRNA binding sites in 3 -UTR and 5 -UTR and coding sites of mRNAs. Correlation of <−0.6 from all 3 sample groups was considered for further testing when both applied prediction tools (miRanda and TargetScan) were independently predicting a putative regulation of miRNA for miRNA-mRNA pair in tested correlations. To test analysis pipeline utility the best putative regulations were tested with isoform specific method of miRNA-target prediction: TargetRank, which takes into account binding of the seed region of miRNA to target mRNA sequence [31].

Integrative
Only best putative regulations that were predicted by both miRanda and TargetScan and had coefficient value of correlation below −0.65 for cPTC (321 samples in correlation), below −0.7 for fvPTC (99 samples in correlation) and −0.8 for tcPTC (35 samples in correlation), were tested with TargetRank algorithm. For better explanation of our analysis pipeline appropriate block diagram was presented in Supplementary Figure 2.

Follicular Variant of PTC and Its Molecular
Difference from Classic PTC. We find fvPTC standing in between of two thyroid cancer types, PTC and FTC, and therefore we show differences of this variant contrary to cPTC based on TCGA data. Molecular difference between fvPTC and cPTC is of similar level to fvPTC and tcPTC difference, 1689 miRNA isoforms and 8721 genes differentiating with FDR < 0.05. The most differentiating miRNA is hsa-miR-21-3p (FDR = 4.12 * 10 −26 , MBFC = 0.36, downregulated in fvPTC) and the most differentiating gene is TMPRSS11E2 (FDR = 1.72 * 10 −32 , MBFC = 0.36, downregulated in fvPTC). Distributions of the most differentiating miRNAs and genes (fvPTC versus cPTC) are presented on the box plots in Figure 2. Top 8 miRNAs and genes with both significant FDR and significant MBFC were presented in the figure. TargetRank algorithm tested on putative regulations with < −0.65 for PTC, < −0.70 for fvPTC, and < −0.80 for tcPTC has confirmed 4 putative miRNA regulatory pairs: hsa-miR-146b-5p with PHKB and IRAK1; hsa-miR-874-3p with ITGB4 within cPTC samples (Table 1, Figure 3). Hsa-miR-152-3p with TGFA pair was confirmed within fvPTC samples (Table 1, Figure 3).

Discussion
We present in this work a successful pipeline of analysis for miRNA-mRNA regulation discovery in a large dataset of PTC samples from TCGA project. An important statement to make is that our analysis pipeline is designed only for miRNAs causing cleavage of target mRNAs, while it would rather fail in discovering or confirming miRNA triggered translation repression. A simple reason for that is that we correlate in our analysis mRNA expression with miRNA expression and we lack protein expression data to study translation repression.
Described miRNA-mRNA regulations were obtained with very stringent filtering criteria: best inverse correlation coefficient values (<−0.65 for cPTC, <−0.70 for fvPTC, and <−0.80 for tcPTC), additionally confirmed independently with 3 prediction algorithms (miRanda, TargetScan, and Tar-getRank). We provide extended tables with putative miRNA regulatory pairs with correlations below −0.60, confirmed by miRanda and TargetScan for all studied PTC histotypes in supplemental materials (Tables S1, S2, and S3). Sample size of tcPTC set (35 samples) is a limiting factor in analysis and one should be cautious with drawing conclusions for this sample subgroup. Thus, main effort was done to compare fvPTC and cPTC sample sets.
Out of the best correlations (below −0.60) calculated for cPTC (3488 correlations), fvPTC (9460 correlations), and tcPTC (35028 correlations) 10-30% were confirmed by miRanda or TargetScan prediction algorithms. Both miRanda and TargetScan concordantly were predicting 3-7.5% from best correlations (below −0.60). Low percent of those confirmed miRNA regulations from all of the correlation results may seem surprising. On the other hand, large part of high correlations may simply represent coexpression or coregulation of miRNAs and genes. Putative regulation discovery should be well reinforced by suitable confirmation to avoid false positives, coming from biological processes that may also produce high correlation coefficient values.
We have presented the most significantly deregulated genes between cPTC and fvPTC: TMPRSS11, CEACAM6, ACTBL2, FN1, CRLF2, LDLR, LY6G6C, and TM7SF4 (Figure 2). Fibronectin 1 (FN1) is a well-described PTC marker [34,35]. However, we should say that based on our results it is a marker of cPTC samples, as it was highly upregulated in cPTC samples. To the best of authors knowledge, remaining genes (TMPRSS11, CEACAM6, ACTBL2, CRLF2, LDLR, LY6G6C, and TM7SF4) were never reported in the context of thyroid neoplasia and according to our results they should be considered as differentiation markers between cPTC and fvPTC ( Figure 2).
According to our integrative analysis we report four new regulations in PTC: hsa-miR-146b-5p regulating PHKB (phosphorylase kinase, beta), hsa-miR-146b-5p regulating IRAK1 (interleukin-1 receptor-associated kinase 1) and hsa-miR-874-3p regulating ITGB4 (integrin, beta 4) in cPTC samples, and hsa-miR-152-3p regulating TGFA (transforming growth factor, alpha) in fvPTC samples. TGFA/EGFR pair is one of the most tightly controlled ligand/receptor pairs in PTC samples [36]. In light of our results in a fvPTC histotype miRNA-152-3p may control TGFA expression and balance International Journal of Endocrinology    were predicted by two used prediction tools: miRanda and TargetScan as putative miRNA regulations. Top 10 from each type of thyroid cancer (cPTC, fvPTC, and tcPTC) were ranked by correlation coefficient value and 10 lowest correlations are shown for each dataset (cPTC, fvPTC, and tcPTC). In columns from left, "regulation name," assigned name of correlation; "correlation , " correlation coefficient value (Spearman); "Correlation FDR," DR corrected value of correlation; "mature miRNA," mature miRNA name; "confirmed by TargetRank," stating that if regulation is confirmed by TargetRank (rank and score in the parenthesis); "gene symbol," HGNC symbol of gene in correlation; and "gene name," HGNC official full name.  proliferation/differentiation signals of TGFA on thyroid cells ( Figure 3). hsa-miR-152-3p regulation of TGFA expression is rather absent or antagonized by other regulation mechanisms in cPTC, correlation of this two RNAs expressions in cPTC samples is rather high (−0.42), and samples distribution is not suggesting a strong dependence of TGFA expression on hsa-miR-152-3p expression ( Figure 2). Integrin, beta 4 is involved in cell adhesion and motility [37,38]. Putative regulation of ITGB4 expression by hsa-miR-874-3p was observed in cPTC samples and may be also present in fvPTC variant (Figure 3). Expression of ITGB4 is low in most of fvPTC samples and rather high in cPTC samples suggesting different patterns of regulation and/or additional regulation of ITGB4 expression (Figure 3). PTC variants are known to be different in their metastatic potential, as fvPTC more often gives distant metastases and cPTC is more frequently observed with local node metastases and soft tissue invasion noted during surgery [39]. Hsa-miR-874-3p was already reported to control cell invasiveness in nonsmall cell lung and gastric cancers [40,41]. Regulation of ITGB4 may be a one aspect controlling PTC cells invasive potential and as presented on Figure 3 there are samples of PTC where miRNA-874-3p is significantly downregulating ITGB4 expression and there are samples that escape this regulation and have high ITGB4 expression.
Hsa-miR-146b-5p is very well described in a context of thyroid cells biology and neoplastic transformation [15,18,42]. In cPTC it may regulate PHKB and IRAK1 genes expression, and both regulations are most likely absent or antagonist by other regulatory processes in fvPTC (Figure 3). What is striking about both regulatory pairs is that regulation seems to be present only when hsa-miR-146b-5p expression is relatively high (approximately more than 8, Figure 3), suggesting a threshold of miRNA expression that is necessary to trigger the regulation. Most of fvPTC samples have lower hsa-miR-146b-5p expression and that is why they escape this 8 International Journal of Endocrinology regulation by miRNA. PHKB is a gene encoding enzyme involved in carbohydrates metabolism [43]. Hsa-miR-146b-5p may control carbohydrate metabolism in cPTC, as energy supply is crucial for cancer cell development. Second putative regulation exerted by hsa-miR-146b-5p is IRAK1 regulation that is in fact a confirmed and validated miRNA regulation [44][45][46], reported also by miRTarBase, database of experimentally validated miRNA-target interactions [47]. In fact a closely related hsa-miR-146a was already suggested to regulate IRAK1 in PTC [48]. We find highly concordant results regarding hsa-miR-146b-IRAK1 regulation as an indirect proof that our pipeline of analysis is correctly predicting real miRNA regulations in PTC. IRAK1 has been reported to have pleiotropic effect on cell biology, so it is hard to speculate what functional effect hsa-miR-146b-5p regulation of IRAK1 can produce. As IRAK1 is activated by interleukin-1 it is very likely that miRNA 146b controlling IRAK1 activity has a modulating effect on interactions with inflammatory cells infiltrating thyroid parenchyma.