DV200 Index for Assessing RNA Integrity in Next-Generation Sequencing

Poor-quality biological samples lead to inaccurate next-generation sequencing (NGS) analyses, so methods that accurately evaluate sample integrity are needed. Among methods for evaluating RNA quality, the RNA integrity number equivalent (RINe) is widely used; the DV200, the percentage of RNA fragments of >200 nucleotides, is also used as a quality assessment standard. In this study, we compared the RINe and DV200 RNA quality indexes to determine which is more suitable for NGS analysis. Seventy-one RNA samples were extracted from formalin-fixed paraffin-embedded tissue samples (n = 30), fresh-frozen samples (n = 25), or cell lines (n = 16). After assessing RNA quality using the RINe and DV200, we prepared two kinds of stranded mRNA sequencing libraries. Finally, we calculated the correlation between each RNA quality index and the amount of library product (1st PCR product per input RNA). The DV200 correlated more strongly with the amount of library product than the RINe did (R² = 0.8208 for the DV200 versus 0.6927 for the RINe). Receiver operating characteristic curve analyses, using a threshold of >10 ng/ng for the amount of the 1st PCR product per input RNA, showed that the DV200 was a better marker than the RINe for predicting efficient library production (cutoff value for the RINe and DV200, 2.3 and 66.1%; area under the curve, 0.91 and 0.99; sensitivity, 82% and 92%; and specificity, 93% and 100%, respectively). Our results indicate that a DV200 cutoff of >66.1% predicts sufficient library yield with greater sensitivity and specificity than an RINe cutoff of >2.3. These findings suggest that the DV200 is superior to the RINe, especially for low-quality RNA, because it more consistently reflects the amount of the 1st NGS library product per input.


Introduction
Next-generation sequencing (NGS) has become an essential technology in molecular biology research and clinical assessment [1][2][3]. However, the quality of the input biological samples has a critical effect on NGS results. It is therefore important to predict the quality of NGS results before conducting the analysis, in order to avoid wasting precious samples and to minimize cost and labor.
Several RNA quality indexes have been developed, including RNA integrity number equivalent (RINe) and DV200 metrics (percentage of RNA fragments > 200 nucleotides in size). RINe is generally and widely used for assessing RNA integrity, and it is based on a mathematical model that calculates an objective quantitative measurement of RNA degradation that represents the relative ratio of signal in the fast zone to the 18S peak signal.
The DV200 was developed by Agilent in 2014 as a tool to assess the quality of RNA samples more accurately (http://urx.red/OB4Y) and is used as an RNA quality assessment standard even in the protocol published by Illumina. The DV200 can yield values indicative of high quality even for samples exhibiting weak 18S and 28S peaks, provided there is a sufficient proportion of RNA fragments longer than 200 nt. However, the best practice for evaluating RNA quality remains uncertain. In this study, we compared the two RNA quality indexes in terms of the amount of the 1st PCR product obtained during library preparation for NGS analyses, in order to determine the more suitable RNA quality index.

Data Collection
Seventy-one specimens were obtained at four sections of Okayama University Hospital (Center for Clinical Oncology; Department of Hematology, Oncology and Respiratory Medicine; Department of Respiratory Medicine; and Department of Thoracic, Breast and Endocrinological Surgery) during their own studies. All study protocols were approved by the Institutional Review Board/Ethical Committee of Okayama University, Okayama, Japan (reference numbers K1603-066, K1512-024, K1505-033, K1605-022, and K1808-009), and all participants signed written informed consent. Each section commissioned NGS analysis from our biobank for its own research purposes and provided the collected samples to Okayama University Hospital Biobank. This study uses only the data obtained in the steps of RNA extraction from the provided samples, preparation of the NGS library, and the NGS analysis, all of which were conducted at Okayama University Hospital Biobank. Detailed information regarding each of the samples used in the study is shown in Supplemental Table 1.

RNA Extraction and Quality Evaluation
RNA was extracted from frozen samples (n = 25) and cell lines (n = 16) using the RNeasy Mini kit (Qiagen, Hilden, Germany) or from formalin-fixed paraffin-embedded (FFPE) tissue samples (n = 30) using the RNeasy FFPE kit. RINe values were automatically determined on the basis of electropherograms generated using TapeStation HS RNA ScreenTape (Agilent Technologies, Santa Clara, CA, USA). We calculated the DV200 values on the basis of the same electropherograms using TapeStation Analysis software.
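
To make the DV200 definition concrete, the following minimal Python sketch computes the percentage of electropherogram signal that comes from fragments longer than 200 nt. It is an illustration only and not the TapeStation Analysis software algorithm: the function name `dv200`, the (size, signal) input format, and the toy trace are all assumptions, and the vendor software may treat region boundaries and marker peaks differently.

```python
def dv200(trace, threshold_nt=200):
    """Percentage of total signal from fragments longer than threshold_nt.

    trace: iterable of (fragment_size_nt, signal) pairs (hypothetical format).
    """
    points = list(trace)  # allow generators; we iterate twice
    total = sum(signal for _, signal in points)
    if total <= 0:
        raise ValueError("trace contains no signal")
    above = sum(signal for size, signal in points if size > threshold_nt)
    return 100.0 * above / total


# Toy electropherogram (invented values, not study data).
toy_trace = [
    (50, 10.0), (100, 20.0), (150, 20.0),
    (300, 25.0), (1000, 15.0), (4000, 10.0),
]
print(f"DV200 = {dv200(toy_trace):.1f}%")
```

In this toy trace, half of the signal lies above 200 nt, so a sample like this would fall below the 66.1% cutoff reported in this study.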

NGS Library Construction
NGS libraries were prepared using TruSeq RNA Access (Illumina, San Diego, CA, USA) (n = 63) or TruSight RNA Pan-Cancer (Illumina, San Diego, CA, USA) (n = 8). Both library preparation kits use the same workflow (fragmentation, cDNA synthesis, 1st PCR, hybridization, 2nd PCR, and cleanup), although their hybridization probes differ. The amount of the NGS library product was quantified using a Qubit 2.0 fluorometer (Thermo Fisher, Waltham, MA, USA).

Receiver Operating Characteristic Curve Analysis
We generated receiver operating characteristic (ROC) curves using JMP 9.0.2 software (SAS Institute Japan, Osaka, Japan). We set >10 ng/ng for the 1st PCR product per input RNA as the threshold on the basis of two factors: (1) 200 ng of 1st PCR product is needed to proceed to the 2nd PCR step of NGS library preparation and (2) the minimum recommended input amount of RNA is 20 ng. The threshold follows from the formula: 200 ng 1st PCR product/20 ng input RNA = 10 ng/ng.

[...] Table 1). As shown in Figure 1(c), the RINe and DV200 values were correlated (R² = 0.6944). Notably, 12 of 32 (37.5%) samples with a low RINe value (<5) exhibited a high DV200 value (>70%), suggesting that the DV200, compared with the RINe, has the potential to increase the number of samples available for downstream assays.
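
The discordance between the two indexes noted above can be counted with a few lines of Python. This is a hedged sketch: the function name `count_rescued`, the dictionary keys, and the toy sample values are hypothetical and are not the study data; only the cutoffs (RINe < 5, DV200 > 70%) come from the text.

```python
def count_rescued(samples, rine_max=5.0, dv200_min=70.0):
    """Count low-RINe samples that still show a high DV200.

    samples: list of dicts with hypothetical keys "rine" and "dv200".
    Returns (n_rescued, n_low_rine).
    """
    low_rine = [s for s in samples if s["rine"] < rine_max]
    rescued = [s for s in low_rine if s["dv200"] > dv200_min]
    return len(rescued), len(low_rine)


# Toy samples (invented values for illustration only).
toy_samples = [
    {"rine": 2.1, "dv200": 82.0},  # low RINe, high DV200
    {"rine": 2.4, "dv200": 45.0},  # low on both indexes
    {"rine": 4.8, "dv200": 75.5},  # low RINe, high DV200
    {"rine": 8.9, "dv200": 95.0},  # high on both indexes
]
n_rescued, n_low = count_rescued(toy_samples)
print(f"{n_rescued} of {n_low} low-RINe samples have DV200 > 70%")
```

Applied to the counts reported in this study, 12 of 32 corresponds to 100 × 12/32 = 37.5%.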

RINe and DV200 Values and NGS Library Preparation
The median amount of the 1st NGS library product per input was 41.0 ng/μl (0.01-129.5 ng/μl) (Supplemental Table 1). Both the RINe and DV200 values correlated positively with the amount of the 1st NGS library product, although the DV200 exhibited a better correlation than the RINe index (R² = 0.8208 versus 0.6927, respectively) (Figures 2(a) and 2(b)). To investigate the effect of sample type, the fresh and FFPE samples were each analyzed separately from the other samples. In the fresh samples, the RINe values, the DV200 values, and the amount of the 1st NGS library product were all high (more than 8.3, 89.32%, and 73.15 ng/ng, respectively), although the R² value for the RINe was higher than that for the DV200 (Figures 2(c) and 2(d)). Although the amount of the 1st NGS library product was low in all FFPE samples, the DV200 showed a better R² value than the RINe (0.0294 versus 0.0006), suggesting that the DV200 is the more useful index for evaluating RNA in low-quality samples such as FFPE.
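
For readers who want to reproduce this kind of comparison, the sketch below computes R² for a quality index against library yield. It assumes a simple linear regression with an intercept, for which R² equals the squared Pearson correlation; the text does not state which model was fitted, so this choice, the function name `r_squared`, and the toy values are assumptions.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x.

    For ordinary least squares with an intercept, this equals the
    squared Pearson correlation coefficient.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


# Toy data (invented): quality index values and yield in ng/ng.
toy_dv200 = [30.0, 45.0, 60.0, 75.0, 90.0]
toy_yield = [2.0, 6.0, 20.0, 48.0, 80.0]
print(f"R^2 = {r_squared(toy_dv200, toy_yield):.4f}")
```

Running the same function once with DV200 values and once with RINe values against the same yields gives two R² values that can be compared directly, as in Figures 2(a) and 2(b).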

Receiver Operating Characteristic Curve Analyses
The analysis of ROC curves indicated that the optimal RINe and DV200 cutoff values were 2.3 and 66.1%, respectively, when >10 ng/ng for the 1st PCR product per RNA input was considered a sufficient amount for NGS. The area under the curve (AUC) for the DV200 was 0.99, with sensitivity of 92% and specificity of 100%, whereas the AUC for the RINe was 0.91, with sensitivity of 82% and specificity of 93% (Figures 3(a) and 3(b)).
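
The ROC quantities reported above can be illustrated with a small, dependency-free Python sketch. It is not the JMP procedure used in this study: the AUC is computed as the probability that a sufficient-yield sample scores higher than an insufficient one, and the cutoff is chosen by maximizing Youden's J (sensitivity + specificity - 1), which is an assumed criterion. All function names and toy values are hypothetical; only the >10 ng/ng threshold comes from the text.

```python
def label_sufficient(product_ng, input_ng, threshold=10.0):
    """True if the 1st PCR product per input RNA exceeds the threshold (ng/ng)."""
    return product_ng / input_ng > threshold


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A sample is called positive when its score is strictly above the cutoff.
    """
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l and s > c)
        tn = sum(1 for s, l in zip(scores, labels) if not l and s <= c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Toy data (invented): 1st PCR product, input RNA, and DV200 per sample.
toy_product_ng = [40, 120, 180, 260, 900, 1500]
toy_input_ng = [20, 20, 20, 20, 20, 20]
toy_dv200 = [30, 45, 60, 70, 80, 90]
toy_labels = [label_sufficient(p, i) for p, i in zip(toy_product_ng, toy_input_ng)]
print(auc(toy_dv200, toy_labels), best_cutoff(toy_dv200, toy_labels))
```

Note that a yield of exactly 10 ng/ng (for example, 200 ng of product from 20 ng of input) is not labeled sufficient here, because the threshold is defined as strictly greater than 10 ng/ng.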

Discussion
Remarkable progress in the development of NGS technologies has made it possible to analyze a variety of specimens, including highly degraded materials such as 10-year-old FFPE samples [4]. The RINe has been widely used as an indicator of RNA quality in NGS, microarray, and qPCR [5][6][7]. However, the DV200 is more suitable than the RINe for evaluating RNA because it can be applied not only to RNAs extracted from fresh or frozen samples but also to samples with lower RINe values, such as RNAs extracted from FFPE samples [8,9]. In our study, the DV200 showed a better correlation with the amount of the 1st NGS library product than the RINe did, even for low-quality samples such as FFPE. Recently, paraffin-embedded RNA metrics (PERM) has also been proposed as a novel indicator based on the intensity of fluorescence at specific time points using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA) [10]. Although we attempted to perform a PERM analysis, the TapeStation used in this study did not support it.
Furthermore, our study also revealed that the DV200 with a cutoff value of 66.1% provided greater AUC, sensitivity, and specificity than the RINe (cutoff value 2.3) on the basis of the analysis of ROC curves. These results indicate that the DV200 with a cutoff value of 66.1% is more useful than the RINe for predicting whether a sufficient amount of high-quality 1st NGS product can be obtained.
In addition to the 1st NGS library product per input, we examined the effect of RNA quality on quality metrics of RNA sequencing (RNA-seq): duplicates, reads not mapped, and nonspecific matches. As shown in Supplemental Figure 1, the DV200 showed better R² values than the RINe. Consistent with our findings, another study reported a positive correlation between the DV200 value and the number of uniquely mapped NGS reads, that is, reads mapped to a single region of the reference genome [11]. By contrast, sample selection based on RINe values reportedly provides no advantage for determining the quality of NGS reads [12].

To explore the relationship between RNA quality and RNA-seq results, we analyzed the transcripts per million (TPM) of protein-coding genes (Supplemental Figure 2). The total TPM of protein-coding genes in all the fresh samples (RINe > 8.0 and DV200 > 89%) exceeded 950,000 (meaning 95% of total RNA-seq reads). This result suggests that RNA-seq with high-quality input RNA using TruSeq RNA Access library preparation protocols can capture the overall expression profile of protein-coding genes with the least information loss.

BioMed Research International

On the other hand, the total TPM of protein-coding genes in all the FFPE samples (RINe < 3.0 and DV200 < 55%) ranged from 675,000 to 778,000 (meaning 67.5-77.8% of total RNA-seq reads), with one outlier (578,000). This result suggests that RNA-seq of low-quality input RNA may yield gene expression profiles with some information loss due to RNA degradation or fragmentation. The total TPM of protein-coding genes in frozen samples ranged widely, from 150,000 to 970,000, depending on RNA quality. Among samples with RINe values of 2 or less, some had TPM values of more than 800,000, whereas others had TPM values of 800,000 or less. This result suggests that careful interpretation is required when using RNA with an RINe value of 2 or less.
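
The total TPM of protein-coding genes discussed above can be illustrated as follows. This sketch uses the standard TPM definition (length-normalized counts rescaled to sum to one million); the quantification pipeline used in this study is not described, so the function names, gene identifiers, biotype labels, and counts below are hypothetical.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from read counts and gene lengths in kb."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("no reads assigned to any gene")
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


def protein_coding_total(tpm_values, biotypes):
    """Sum of TPM over genes annotated as protein coding."""
    return sum(
        value for gene, value in tpm_values.items()
        if biotypes.get(gene) == "protein_coding"
    )


# Toy annotation and counts (invented values for illustration only).
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 50}
toy_lengths_kb = {"geneA": 1.0, "geneB": 3.0, "geneC": 0.5}
toy_biotypes = {
    "geneA": "protein_coding",
    "geneB": "protein_coding",
    "geneC": "lncRNA",
}
toy_tpm = tpm(toy_counts, toy_lengths_kb)
total_pc = protein_coding_total(toy_tpm, toy_biotypes)
print(f"protein-coding TPM = {total_pc:.0f} ({total_pc / 1e4:.1f}% of total)")
```

Because TPM values sum to one million per sample, dividing the protein-coding total by 10,000 gives the percentage figures quoted in the text (for example, 950,000 corresponds to 95%).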
These data suggest that the DV200 index is superior to RINe for assessing RNA integrity in order to obtain NGS results worthy of evaluation.
In general, the time required for tissue acquisition, fixation, and preservation is important for RNA quality [13,14]. Unfortunately, we could not obtain detailed information such as ischemia time, the interval from sample collection to formalin fixation, and the duration of formalin fixation. We are currently planning to record the duration of processing for sample preservation in order to investigate the effect of the preservation process on RNA quality as well as on NGS libraries.

Conclusion
The DV200 index is a more consistent assessment of the amount of the 1st NGS library product per input than the RINe index, especially for low-quality RNA. Therefore, we conclude that the DV200 is a beneficial RNA quality index for NGS analyses using degraded RNA samples such as those extracted from FFPE samples.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Figure 3: Receiver operating characteristic curves for RINe and DV200. ROC curves for the RINe and DV200 indexes indicating the most efficient amount (more than 10 ng/ng) of the 1st PCR library product per input. The area under the curve for DV200 was greater than that for the RINe index (0.99 with P = 0.0008 and 0.91 with P = 0.0012, respectively).