Comparability of Microarray Data between Amplified and Non Amplified RNA in Colorectal Carcinoma

Microarray analysis is increasingly popular for the investigation of prognostic gene clusters in oncology. Standardisation of technical procedures will be essential for comparing datasets produced by different research groups. In several projects the amount of available tissue is limited, and preamplification of RNA may then be necessary prior to microarray hybridisation. To evaluate the comparability of microarray results generated from amplified and non amplified RNA, we isolated RNA from colorectal cancer samples (stage UICC IV) following tumour tissue enrichment by macroscopic manual dissection (CMD). One part of the RNA was directly labelled and hybridised to GeneChips (HG-U133A, Affymetrix); the other part was amplified according to the “Eberwine” protocol and then hybridised to the microarrays. In unsupervised hierarchical clustering the samples were divided into groups according to the RNA pre-treatment, and 5,725 differentially expressed genes were identified. We validated the amplification bias in a set of 50 predictive genes using independent microarray data of 31 amplified versus 24 non amplified RNA samples from colon carcinomas (stage UICC III). In conclusion, microarray data resulting from different pre-processing with regard to RNA pre-amplification cannot be compared within one analysis.


Introduction
Microarray-based investigations of genome-wide gene expression have become a popular method for the molecular characterisation of various tissue types. In molecular oncology, prognosis-related genes have been identified for various cancer types [1][2][3][4][5]. In colorectal carcinoma in particular, gene clusters related to metastasis, tumour recurrence, or chemoradiation have been described [4][6][7][8]. Microarray analysis in most of these studies did not depend on RNA amplification, as enough RNA could be isolated from the tumours. Whenever tissue is limited and high-throughput analysis is required, amplification of RNA by in vitro transcription is essential. However, when using amplification, one has to be sure that the RNA is amplified linearly, meaning that gene expression values remain comparable between native and amplified RNA. This is necessary because more and more data are generated with different methods and stored on the internet, where they are available to the research community. Recently, the limited comparability of gene expression profiles between studies using different techniques has been demonstrated [5]. However, there is an ongoing discussion as to whether microarray data of non amplified and amplified RNA samples are comparable. The aim of our study was to evaluate (1) to what extent the microarray data based on amplified RNA are reproducible, (2) to what extent the expression data from amplified and native RNA are comparable, and (3) whether tumour-specific genes are affected by an amplification bias.

Patients and Experimental Procedure. Primary tumours of four patients with colorectal carcinoma stage UICC IV, resected at the Department of Surgery at the Friedrich-Alexander-University Erlangen-Nuremberg, were chosen for the analysis. No patient received neoadjuvant treatment prior to surgery. The study was approved by the Ethical Committee of our University, patient consent was obtained, and the ethical guidelines for human research respecting the principles of the Declaration of Helsinki were followed. After tumour enrichment by cryotomy after manual dissection (CMD), RNA was isolated from each sample [9]. One part of the RNA of each sample was hybridized to the microarray without amplification; the other part underwent amplification prior to microarray hybridization (Figure 1). For validation purposes, the microarray data of amplified RNA from 31 colon carcinoma samples stage UICC III were compared with the microarray data of non amplified RNA from 24 colon carcinoma samples. Patient selection, tissue workup, and amplification protocol were the same in each group.

Sample Workup and RNA Isolation. The tissue was placed in a cryotube (Roth, Karlsruhe, Germany) together with Tissue-Tek (Sakura, Zoeterwoude, Netherlands), shock frozen in liquid nitrogen immediately after surgery, and stored at −80 °C until further workup. CMD was performed as recently described [9]. RNA was isolated in the same way from all four tissue samples using a commercial kit (RNeasy-Kit, Qiagen, Hilden, Germany), following the manufacturer's protocol. Each sample was added to the Qiagen spin column and centrifuged to bind the RNA to the matrix. The column was washed with the buffers provided in the kit, and the RNA was finally eluted with distilled H2O. A DNase (Qiagen, Hilden, Germany) digestion was included in this procedure following the manufacturer's suggestion. RNA quality and quantity were determined by the "Lab on a Chip" method (Bioanalyzer 2100, Agilent Technologies, Palo Alto, USA) following the manufacturer's instructions [10]. A total of 50 ng to 100 ng of each RNA sample was loaded per well. The analyser allows visual examination of both the 18S and 28S rRNA bands as a measure of RNA integrity.
The 3′/5′ ratios for the housekeeping genes glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and β-actin supplied by the GeneChip were used as further parameters for RNA quality and to exclude partial degradation. A 3′/5′ ratio below 3 was regarded as an indicator of good RNA quality according to the manufacturer's protocol (Affymetrix, Santa Clara, USA) [11].
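The quality criterion above can be written as a simple check. The following is a minimal sketch in Python and not part of the original workflow. The function name, the input layout, and the example signal values are hypothetical; only the threshold of 3 is taken from the text.

```python
def passes_rna_quality(signal_3prime, signal_5prime, max_ratio=3.0):
    """Return True if the 3'/5' signal ratio is below max_ratio."""
    if signal_5prime <= 0:
        raise ValueError("5' signal must be positive")
    return (signal_3prime / signal_5prime) < max_ratio


# Hypothetical (3' signal, 5' signal) pairs for the two housekeeping genes
controls = {"GAPDH": (1200.0, 1000.0), "beta-actin": (2500.0, 700.0)}

# A sample would be accepted only if every control gene passes
results = {gene: passes_rna_quality(s3, s5) for gene, (s3, s5) in controls.items()}
sample_ok = all(results.values())
```

In this made-up example GAPDH passes (ratio 1.2) while β-actin does not (ratio of about 3.6), so the sample would be flagged as possibly degraded.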
RNA Amplification. Amplification of RNA was performed with the Message Amp aRNA kit (Ambion, Austin (Tx), USA) according to the manufacturer's instructions, using 200 ng of total RNA for each sample. Briefly, first-strand cDNA synthesis was primed with the T7 oligo(dT) primer to synthesize cDNA with a T7 promoter sequence from the poly(A) tails of messages by reverse transcription. The second-strand cDNA synthesis converted the cDNA with the T7 promoter primer into a double-stranded DNA (dsDNA) template for transcription. Following a cDNA purification step, an in vitro transcription was performed, generating multiple copies of aRNA from the double-stranded cDNA templates. Finally, in another purification step, unincorporated NTPs, salts, enzymes, and inorganic phosphate were removed. A second round provided additional amplification of the RNA sample. Apart from different primers, the same reagents and methodology were used in the second round. During this second round the biotin labelling of the probe took place before the in vitro transcription step.
For this purpose, 3.75 µL of 10 mM biotin-11-CTP and 3.75 µL of 10 mM biotin-16-UTP were added to each sample and the probe was dried in a vacuum centrifuge concentrator. [...] calculated, and the mean expression values and standard deviations (log2) of 50 recently described predictive genes for lymphatic metastasis were compared [12]. For validation purposes, gene expression measures were computed with the Robust Multichip Average (RMA) method described in Irizarry et al. [12] and implemented in the R function justRMA of the Bioconductor R package affy. The statistical analysis was performed with the open-source software R, Version 2.6.1.
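The comparison of mean expression values and standard deviations on the log2 scale can be sketched as follows. This is an illustrative Python example, not the authors' R code. The function name and the signal values are hypothetical, and it assumes linear-scale input signals; values that are already on the log2 scale, as RMA output is, would skip the transformation.

```python
import math
import statistics


def log2_summary(signals):
    """Mean and sample standard deviation of log2-transformed signals."""
    logs = [math.log2(s) for s in signals]
    return statistics.mean(logs), statistics.stdev(logs)


# Hypothetical linear-scale signals of one probe set in each group
amplified = [256.0, 512.0, 1024.0]
non_amplified = [64.0, 64.0, 256.0]

mean_pa, sd_pa = log2_summary(amplified)
mean_na, sd_na = log2_summary(non_amplified)
print(f"amplified:     mean={mean_pa:.2f}, sd={sd_pa:.2f}")
print(f"non amplified: mean={mean_na:.2f}, sd={sd_na:.2f}")
```

Repeating this per probe set for both groups would give a side-by-side summary of means and standard deviations.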

Comparability of Amplified and Non Amplified RNA.
In the amplified test set 200 ng of RNA was used as the starting amount. Two rounds of RNA amplification resulted in sufficient amounts of cRNA of good quality for microarray hybridization. The correlation of the microarray signals between non amplified and amplified RNA reached 86%-91%. The correlation of the detection P-values of the microarray data was 75%-81% (Table 1). An unsupervised hierarchical cluster analysis including all 22,283 probe sets of the GeneChips separated all unamplified from the preamplified RNA samples (Figure 2). All samples were correctly classified with regard to the method of RNA pretreatment. In the statistical analysis (Mann-Whitney U-test, P < .05) 5,725 genes were identified as significantly differentially expressed between non amplified and amplified RNA samples. In 1,182 probe sets of the microarray, a significantly higher signal intensity was detected in amplified than in non amplified samples. In 4,543 probe sets, a significantly lower signal intensity was detected in amplified than in non amplified RNA. The fold change (FC) of the mean signals (PA/NA, preamplified versus non amplified) was between 8 (bicaudal-D (BICD) mRNA) and 13 (myosin IXb (MYO9B) mRNA). Several ribosomal RNAs (e.g., the 18S rRNA gene), which were included on the microarrays as internal controls, were detected with a fold change of 273 between amplified and non amplified samples. Of the RNAs with an increased FC, 36% had a sequence length of 1000-19000 bp, as did 32% of the RNAs with a decreased FC. A sequence length >300000 bp was found in 5% of the RNAs with an increased FC and in 1% of the RNAs with a decreased FC (Figure 3(a)). Thirteen per cent of the genes with an increased FC were located on chromosome 1, 10% on chromosome 17, 9% on chromosome 6, and 8% on chromosome 2. Eleven per cent of the genes with a decreased FC were located on chromosome 19, 9% each on chromosomes 2, 7, and 12, and 8% on chromosome 16 (Figure 3(b)).
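The per-probe-set comparison can be illustrated with a small sketch. The Python code below is not the authors' R analysis. It assumes an exact two-sided Mann-Whitney U-test and a signed fold change (negative values for decreased signals, matching the sign convention used for decreased FC values in the text). All function names and signal values are hypothetical.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for x: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_mann_whitney(x, y):
    """Exact two-sided p-value by enumerating all group relabellings."""
    pooled = list(x) + list(y)
    centre = len(x) * len(y) / 2.0
    observed = abs(u_statistic(x, y) - centre)
    hits = count = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        count += 1
        # Count relabellings at least as extreme as the observed split
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            hits += 1
    return hits / count


def fold_change(amplified, non_amplified):
    """Signed fold change of mean signals (PA/NA); negative if decreased."""
    ratio = (sum(amplified) / len(amplified)) / (sum(non_amplified) / len(non_amplified))
    return ratio if ratio >= 1 else -1.0 / ratio


# Hypothetical signals of one probe set in four paired samples
pa = [800.0, 900.0, 1000.0, 1100.0]
na = [100.0, 110.0, 120.0, 130.0]
p_value = exact_mann_whitney(pa, na)
fc = fold_change(pa, na)
```

With four amplified and four non amplified samples, the smallest attainable exact two-sided p-value is 2/70 (about 0.029), so under these assumptions a probe set reaches P < .05 only when the two groups are completely separated.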

Influence of RNA Amplification on Colorectal Cancer Specific Genes. Various genes that have recently been described as participating in carcinogenesis and tumour progression in colorectal carcinomas showed significantly different signals between amplified and non amplified RNA samples (e.g., WNT3, APC, and VEGF). VEGFB had a decreased FC of −5 and the APC gene [...] (Table 2).

Validation of Amplification Bias.
The Spearman correlation of 22,115 probe sets (Affymetrix HG-U133A) between 31 amplified RNA samples and 24 non amplified RNA samples of colon carcinomas stage UICC III was 0.8 (Figure 4). When the mean microarray signals of 50 recently described genes predictive for lymphatic metastasis were compared, an equal value was detected in only one case (210701_at). The standard deviation in this case was lower without RNA amplification. For most other genes substantial differences were identified (Table 3).
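For readers who want to reproduce a rank correlation of this kind, a minimal Python sketch follows. It is not the authors' R code. It assumes that one summary value per probe set (for example the group mean) is compared between the two groups, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical mean signals of five probe sets in each group
mean_amplified = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_non_amplified = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(mean_amplified, mean_non_amplified)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotone rescaling of the signals, which makes it a reasonable choice when amplified and non amplified signals differ in scale.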

Discussion
Gene expression profiling has become an attractive tool for tissue typing and prognostic evaluation in cancer research. For colorectal carcinoma, several gene profiles that distinguish healthy mucosa from tumours or allow prognostic classification have been identified [5][7][12][13][14][15]. Nevertheless, there is only a limited overlap between the gene profiles described in most of these studies [5]. One reason for this finding might be the broad variability of the techniques applied during the analysis. With regard to tissue handling and isolation, RNA preparation, and microarray hybridization, various confounding factors may influence the results. Especially when only small amounts of tissue can be harvested or only limited amounts of tissue are available, the yield of RNA might not be sufficient for microarray hybridization, and preprocessing of the RNA by amplification becomes indispensable. Whether samples of amplified RNA can be compared with samples of non amplified RNA is still a matter of controversy. For the amplified probes, we used the linear amplification technique, which is based on double-stranded cDNA synthesis with an oligo-dT primer coupled to the T7 RNA polymerase promoter, followed by in vitro transcription into aRNA by T7 RNA polymerase [16]. This is an established technique for RNA amplification in microarray experiments [17][18][19]. Two rounds of amplification generated enough RNA of sufficient quality for microarray hybridization, which supports the reliability of the method. The Affymetrix (Santa Clara, USA) GeneChip technology provides standardized protocols for microarray procedures on a commercial platform that is frequently used for gene expression profiling of colorectal tumours [4][14][20][21][22][23][24]. Using unsupervised hierarchical cluster analysis of our microarray results, we observed a separation into two groups according to the RNA pretreatment.
We identified 5,725 genes that were significantly differentially expressed between non amplified and amplified RNA samples. As the amplified and non amplified RNA came from the same samples, no separation into clusters should have occurred. The cluster results and the identification of significantly differentially expressed genes demonstrate an amplification bias between native and amplified RNA. The correlation of the microarray signals of NA versus PA was between 86% and 91%. This amplification bias could be validated in a cohort of 55 RNA samples, either amplified or not amplified, from colon carcinoma samples (stage UICC III). The correlation of 22,115 probe set signals reached only 80%, and the signals of 50 genes involved in lymphatic metastasis varied substantially. If the same labelled cRNA is hybridized twice to microarrays, the correlation of signals is 99%. When two cRNA samples are generated from the same mRNA and hybridized to microarrays, the correlation of signals is about 99% as well.

Table 3: Comparison of mean signals and standard deviation (sdv) of genes which were recently described as predictive for lymphatic metastasis in colorectal carcinomas [12], between 31 amplified RNA samples of colon carcinomas and 24 RNA samples of colon carcinomas not amplified prior to microarray hybridization.
Signal correlations of 97% were reached with two separate RNA isolations and microarray hybridizations of one and the same tumour sample (data not published). Against this background, the identified differences between amplified and non amplified RNA are relevant. Analysing the reasons for these findings, we detected that sequences with a length of 1000-19000 bp are mainly affected by differential signal intensity. This may be explained by the more frequent amplification of shorter transcripts, which may depend on the number of amplification rounds. No association with sequences located on specific chromosomes could be identified. These findings are supported by previous studies which identified a correlation between the number of amplification rounds and the comparability of native and amplified RNA [18]. The alterations detected during RNA amplification are important because several genes of interest involved in carcinogenesis (e.g., APC, VEGF) and tumour progression (e.g., CDC2, MMPs) were affected. When amplified and non amplified RNA are compared in the same microarray study, false positive results might occur. Therefore, amplified and non amplified RNA should not be compared in microarray investigations. These findings have been suspected previously but have not been demonstrated in detail so far [18][19].

Conclusion
Amplification of RNA by T7-IVT is an elegant method to generate RNA of good quality and in sufficient yield for microarray hybridization from as little as 200 ng of starting RNA. Nevertheless, alterations occur during amplification which lead to an amplification bias compared with non amplified RNA. For this reason the microarray results of amplified and non amplified RNA samples should not be compared within the same study.