IROme, a New High-Throughput Molecular Tool for the Diagnosis of Inherited Retinal Dystrophies

The molecular diagnosis of retinal dystrophies is difficult because of the very large number of genes implicated and is rarely helped by genotype-phenotype correlations. This prompted us to develop IROme, a custom-designed, in-solution targeted exon capture assay (SeqCap EZ Choice library, Roche NimbleGen) for 60 retinitis pigmentosa-linked genes and three candidate genes (942 exons). Pyrosequencing was performed on a Roche 454 GS Junior benchtop high-throughput sequencing platform. In total, 23 patients affected by retinitis pigmentosa were analyzed. Per patient, 39.6 Mb were generated, and 1111 sequence variants were detected on average, at a median coverage of 17-fold. After data filtering and sequence variant prioritization, disease-causing mutations were identified in ABCA4, CNGB1, GUCY2D, PROM1, PRPF8, PRPF31, PRPH2, RHO, RP2, and TULP1 for twelve patients (55%), ten mutations having never been reported previously. Potential mutations were identified in 5 additional patients, and in only 6 patients no molecular diagnosis could be established (26%). In conclusion, targeted exon capture and next-generation sequencing are a valuable and efficient approach to identify disease-causing sequence variants in retinal dystrophies.


Introduction
Retinitis pigmentosa (RP) (MIM number 268000) is a group of genetically highly heterogeneous inherited retinal dystrophies [1]. Typically, night blindness starts during adolescence, and patients progressively lose the rod photoreceptor-mediated peripheral vision. At later stages, the cone photoreceptors also become affected, constricting vision over time to the most central fovea and eventually resulting in complete blindness. To date, more than fifty genes have been linked to nonsyndromic RP (RetNet; http://www.sph.uth.tmc.edu/RetNet/). Inheritance can be autosomal dominant (AD), autosomal recessive (AR) or X-linked, and, rarely, mitochondrial or digenic [2]. Sporadic or simplex cases account for about 30% [3].
The molecular diagnosis of RP is difficult because (i) there is no genotype/phenotype correlation in a vast majority of patients, (ii) a high intra- and interfamilial variability of clinical phenotypes is observed in patients carrying the same causative mutation, (iii) different mutations in the same disease-linked gene cause highly variable clinical phenotypes if not clinically distinct retinal degenerations, and (iv) overlapping clinical phenotypes and disease-linked genes exist with additional retinal degenerations, that is, early-onset Leber congenital amaurosis (LCA), congenital stationary night blindness (CSNB), cone-rod dystrophies (CRD), enhanced S-cone syndrome (ESCS), or syndromic RP in Bardet-Biedl and Usher syndrome [2]. However, identification of RP-linked sequence variants is important for genetic counseling and patient management.
Similar to other Mendelian disorders, mutations in RP patients were identified until recently by linkage mapping and subsequent Sanger sequencing of candidate genes [4]. For molecular diagnosis, the validated RP mutations could be detected by arrayed primer extension (APEX) chip technology [5]. However, a low success rate in detecting mutations by APEX was inherent to the genetic heterogeneity of RP patients, and in a cohort of 272 Spanish families affected by ARRP, causative mutations were identified in only 11% of them [6]. The development of next-generation sequencing (NGS) tools in recent years has allowed the production of an enormous volume of sequencing data at low costs [7]. Whole genome sequencing and downstream data handling remain cost and labor intensive, limiting their use in routine mutation detection [8]. Targeted capture of the about 30 Mb of protein-coding regions in the human genome, the so-called exome, reduced the sequencing and data handling effort by a factor of 100 and allowed the identification of mutations in unrelated patients affected by the same syndrome [9]. Exome sequencing has since been widely used as a tool for Mendelian disease gene discovery [10,11]. Initially array-based, targeted sequence capture has become easy to use, thanks to the development of in-solution capture methods [12]. Finally, benchtop high-throughput sequencers made exome sequencing available to small-size diagnostic laboratories [13].
These technological advances prompted us to develop a custom-designed, in-solution targeted capture assay, called IROme, for the detection of mutations located in the exons, including complete 3′-untranslated regions (UTR), intron-exon boundaries and potential promoter, and 5′-UTR regions of 63 genes on a 454 GS Junior sequencing platform.

Patients and DNA Samples
These studies were approved by the Swiss Federal Department of Health (authorization number 035.0003-48) and followed the principles of the Declaration of Helsinki. The 23 patients analyzed in this study were of Swiss, Algerian, and Tunisian origin. Blood samples were collected after informed consent. Genomic DNA was extracted from peripheral blood using a Nucleon BACC2 genomic DNA extraction kit (GE Healthcare, Glattbrugg, Switzerland). Four patients had been previously analyzed at Asper Biotech for known RP-linked mutations by APEX technology [5].

Design of Solution-Based Capture Assay for Retinitis Pigmentosa-Linked Genes
Exons of targeted genes were identified in the reference human genome version hg19 (http://www.ensembl.org/) (Table 1). For each exon, 50 bp were added in both 5′ and 3′ of the exon, including the complete 3′-UTR for each gene. Potential alternative transcripts were also considered in the design. To include potential proximal promoters, an additional 1000 bp in 5′ of the first exon of each gene, containing the complete 5′-UTR, were added. The resulting custom-designed SeqCap EZ Choice library (NimbleGen, Roche) was called IROme, version 1.
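The padding rules described above can be illustrated with a minimal sketch. It is not the design tool used for IROme; the function name `target_intervals`, the tuple-based exon representation, and the reading of "1000 bp in 5′ of the first exon" as the 1000 bp immediately upstream of the first exon in transcription order are assumptions made for illustration only.

```python
def target_intervals(exons, strand, exon_pad=50, promoter_pad=1000):
    """Return merged target intervals for one gene (hypothetical helper).

    exons  : list of (start, end) tuples, 1-based inclusive, same chromosome
    strand : "+" or "-"; decides on which side the promoter window is added
    """
    if not exons:
        return []
    # Pad every exon by exon_pad bp on both sides (intron-exon boundaries).
    intervals = [(max(1, s - exon_pad), e + exon_pad) for s, e in exons]
    # Add a promoter window upstream of the first exon in transcription order.
    if strand == "+":
        first_start = min(s for s, _ in exons)
        promoter = (max(1, first_start - promoter_pad), first_start - 1)
    elif strand == "-":
        first_end = max(e for _, e in exons)
        promoter = (first_end + 1, first_end + promoter_pad)
    else:
        raise ValueError("strand must be '+' or '-'")
    if promoter[0] <= promoter[1]:
        intervals.append(promoter)
    # Merge overlapping or directly adjacent intervals.
    intervals.sort()
    merged = [intervals[0]]
    for s, e in intervals[1:]:
        last_s, last_e = merged[-1]
        if s <= last_e + 1:
            merged[-1] = (last_s, max(last_e, e))
        else:
            merged.append((s, e))
    return merged
```

Merging matters because the padded first exon and the promoter window overlap; summing the lengths of the merged intervals over all genes would give the total size of the targeted region without double counting.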

GS Junior Sequencing
The workflow for GS Junior sequencing is summarized in Figure 1. DNA concentrations were measured on a NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE). 500 ng of gDNA were fragmented by nebulization and size-selected with Agencourt AMPure XP beads (Beckman-Coulter, Beverly, MA) to obtain fragments between 500 and 1200 bp. Adaptors provided in the GS Titanium Rapid Library Preparation Kit (Roche, Basel, Switzerland) were ligated to the fragmented DNA and then quantified by fluorometry (QuantiFluor, Promega, Madison, WI). This library was amplified by ligation-mediated (LM)-PCR using specific 454 primers. Then, 1 μg of the PCR amplification product was dried down with COT-DNA (Roche) and 454-Hybridization Enhancing Primer in a Speedvac. The pellet was resuspended in NimbleGen's hybridization buffer and hybridized to the custom-designed SeqCap EZ Choice library (NimbleGen, Roche), called IROme v1, for 70 h at 47 °C in a thermocycler. The captured DNA was bound to Streptavidin M-270 Beads (Invitrogen Dynal, Oslo, Norway) for 45 min at 47 °C and, using a magnet support, washed with the 4 different NimbleGen buffers provided according to the manufacturer's instructions. The captured DNA-beads were amplified by LM-PCR using the same specific 454 primers as before. Captured and noncaptured DNA was subjected to quantitative PCR on a LightCycler 480 II (Roche, Basel, Switzerland) to measure the relative fold enrichment of the targeted sequences. Postcapture samples with an enrichment higher than 200-fold were further processed. According to the 454 GS Junior protocol (Roche), emulsion PCR was performed at a ratio of 2 molecules per bead. After PCR, the beads were collected, washed, and bound to the Enrichment Beads. The enriched […]
[…] located on chromosome X, CNGA2, was included because of its homology to CNGA1. The total of targeted regions spans 394,758 bp.
Of note, aer the design of IROme was completed, TTC8 (BBS8/RP51), C8ORF37, and MAK were linked to RP, and KCNJ13 and NMNAT1 to LCA. ese latter genes, as well as IDH3B and RD3, will be included in a future version of IROme.

Results and Discussion
Patients 1-4 had previously been investigated by APEX technology for known RP-linked mutations [5]. All nucleotides tested by APEX were correctly detected by IROme, with a 98.9% accuracy of the sequence reads for nucleotides at a homozygous state (Table 2). A p.USH2A-V2562A mutation had been detected by APEX in patient 2 in a heterozygous state, and this was correctly validated by IROme (46.8% of the sequence reads at 47-fold coverage).
As an additional control, the IROme assay was tested on genomic DNA of a previously described family of Algerian origin, affected by LCA or early onset retinal degeneration [14]. The causative 6-base in-frame duplication c.TULP1-1593_1598dupTTCGCC was readily detected in exon 15 ( […] [13]. On average per patient, 1,111.7 ± 222.2 sequence variants were found (range: 736-1,826). Among these, 90.1 ± 10.0 were located in coding sequences, and a further 42.1 ± 4.7 were changing the amino acid sequence. By considering all patients, the median coverage was 17-fold, with a maximal 112-fold coverage in one exon of patient 16 (Figure 3). No coverage was observed for four exons (0.3%): exons 1 of RP9, IMPDH1, and LPCAT1 and an alternative exon 2 of CNGA2. These exons contained GC-rich and/or repetitive sequences impeding efficient probe design and targeting [15]. Another 15 exons were not covered in all patients (1.6%). Because these exons were not restricted to the 5′ regions, absence of coverage was attributed to technical limitations or, as observed for patient 9, to a deletion (see below).
For patients 20 and 21, two potential heterozygous mutations had been detected at 22.6% (53-fold coverage) and 21.3% (61-fold coverage) of the sequence reads, respectively. However, these two sequence variants could not be validated by Sanger sequencing. For further patient analyses, a more stringent threshold of up to 35% of sequence reads might be used for prioritization of sequence variants. Alternatively, a dynamic threshold could be implemented, starting at a high stringency and going down until one or two mutations are identified.
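The dynamic threshold suggested above can be sketched as follows. This is an illustrative assumption about how such a filter might be written, not the workflow used in this study: the function name `prioritize`, the 5-percentage-point step, and the 20% floor are hypothetical, and the read counts in the usage note are chosen only to reproduce read fractions of the order discussed in the text.

```python
def prioritize(variants, start_pct=35, floor_pct=20, step_pct=5, wanted=1):
    """Lower the read-fraction threshold stepwise until enough variants pass.

    variants : list of (name, variant_reads, total_reads) tuples
    Returns (threshold_pct_used, list_of_variants_passing).
    Percentages are kept as integers to avoid floating-point drift.
    """
    threshold = start_pct
    while True:
        kept = [
            v for v in variants
            # variant_reads / total_reads >= threshold / 100, in integers
            if v[2] > 0 and v[1] * 100 >= threshold * v[2]
        ]
        if len(kept) >= wanted or threshold <= floor_pct:
            return threshold, kept
        threshold = max(floor_pct, threshold - step_pct)
```

For example, `prioritize([("var_A", 12, 53), ("var_B", 22, 47)])` stops at the initial 35% threshold and keeps only `var_B`, whereas a list containing only `var_A` (about 22.6% of reads) would be reached only once the threshold has dropped to 20%. Variants recovered at the lowest thresholds would still call for Sanger confirmation.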
In conclusion, the design of IROme resulted in an over 98% coverage of the targeted exons. The variant detection workflow could be improved by further increasing the quality of the sequencing data, that is, by using a benchtop sequencer less prone to homopolymer-associated insertion/deletion errors (e.g., MiSeq, Illumina) [13] and high-fidelity DNA polymerases [16].

IROme: Molecular Diagnosis on RP Patients
IROme analysis yielded a definite diagnosis for 55% of the RP patients, that is, 12 out of 23 patients (patients 4, 5, 8, 9, 10, 11, 12, 13, 16, 17, 19, and 23). This was in line with the approximately 60% success rate reported for exome capture strategies to identify Mendelian disease genes [4], but represented a 5-fold increase in mutation detection as compared to the APEX assay [6]. A solution-based targeted exon capture assay similar to IROme had also identified disease-causing mutations in 11 out of 17 families affected by various retinal degenerations (65%) [17]. In contrast, in a cohort of 100 RP patients, array-based targeted exon capture resulted in the identification of pathogenic mutations in 36 individuals (36%) [15]. Amplicon-based approaches identified potential mutations in 24% of patients affected by retinal degenerations (5/21) [18], in 79% of ADRP patients (15/19) [19], and 24% of LCA patients (4/17) [20].
Figure 3: Fold coverage of targeted sequences. For each patient, the unique depth data provided by column 5 of the 454_AlignmentInfo.tsv file were used to estimate the coverage per targeted bp. The onefold coverage data corresponding to reference genome sequences used for alignment purposes, but not targeted by IROme, were removed. The coverage data are represented as cumulative percentage, that is, indicating what percentage of targeted bp has a minimal coverage of x-fold (the x axis represents the fold coverage). The average coverage for all patients is represented as a black dashed line, and the median coverage for all patients is 17-fold.
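The cumulative representation described in the Figure 3 legend (percentage of targeted bp with at least x-fold coverage) can be computed as in the sketch below. It assumes the per-base unique depth values have already been extracted into a list of non-negative integers; parsing of the 454_AlignmentInfo.tsv file is deliberately left out, and the function name `cumulative_coverage` is hypothetical.

```python
def cumulative_coverage(depths, max_fold=None):
    """Return a list whose entry x is the % of bases with depth >= x.

    depths   : non-negative integer depth for every targeted base pair
    max_fold : optional cap for the x axis; deeper bases are counted at the cap
    """
    n = len(depths)
    if n == 0:
        return []
    top = max(depths) if max_fold is None else max_fold
    # Histogram of depths, with everything above `top` binned at `top`.
    counts = [0] * (top + 1)
    for d in depths:
        counts[min(d, top)] += 1
    # Accumulate from the deepest bin downwards to get "at least x-fold".
    result = [0.0] * (top + 1)
    running = 0
    for x in range(top, -1, -1):
        running += counts[x]
        result[x] = 100.0 * running / n
    return result
```

Entry 0 is always 100%, and entry 1 gives the percentage of targeted bases covered at least once, which is the quantity behind statements such as "over 98% coverage of the targeted exons".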
In addition to the control (patient 5), only the p.PROM1-R373C mutation identified in patient 10 had been previously described [21], further underscoring the importance of screening RP-linked genes for the presence of new mutations.
The workflow for variant detection was not immediately successful for two patients. For patient 9, a deletion of exons 45-47 in ABCA4 was only found by analyzing the coverage data. For patient 16, the 33 bp insertion in PRPF31 was detected by Sequence Pilot, but not by Reference Mapper software.
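A coverage-based screen of the kind that revealed the ABCA4 deletion could look like the following sketch. It is an assumption about one possible implementation, not the procedure used here: it flags exons with zero depth in one patient that are well covered in most other patients, so it can only suggest homozygous or hemizygous deletions. The function name `candidate_deletions`, both cut-offs, and the exon labels in the usage note are hypothetical.

```python
def candidate_deletions(exon_depth, min_other_depth=5, min_fraction=0.8):
    """Flag (patient, exon) pairs with no coverage despite good coverage elsewhere.

    exon_depth : dict mapping patient -> dict mapping exon label -> mean depth
    An exon is flagged for a patient if its depth is 0 while at least
    min_fraction of the other patients reach min_other_depth on that exon.
    """
    flagged = []
    patients = sorted(exon_depth)
    for p in patients:
        for exon, depth in sorted(exon_depth[p].items()):
            if depth > 0:
                continue
            others = [exon_depth[q].get(exon, 0) for q in patients if q != p]
            if not others:
                continue
            covered = sum(1 for d in others if d >= min_other_depth)
            if covered / len(others) >= min_fraction:
                flagged.append((p, exon))
    return flagged
```

Exons that fail to capture in every patient, such as GC-rich first exons, are not flagged because no other patient covers them, which separates technical dropouts from patient-specific losses. Any flagged exon would still need confirmation by an independent method.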
Potential mutations were found in three patients (13%). Patient 1 inherited from her healthy mother a heterozygous p.C2ORF71-R571delRTVVPP mutation and from her healthy father a heterozygous p.FSCN2-P231S mutation. Digenic RP has been linked so far to heterozygous PRPH2 and ROM1 mutations [2], and further analyses will be necessary to validate this molecular diagnosis. Patients 2 and 20 had, respectively, two and one potential mutations, but no family members were available to confirm the result.
Results were questionable for two additional patients. Patient 6 carried a p.RHO-R252P mutation that had been previously reported [22]. However, unaffected family members were not available to confirm this dominant mutation. Also, a heterozygous p.CRX-Q105X sequence variant was detected in patient 14, but his healthy mother was also carrying it.
Finally, no molecular diagnosis could be established for six patients (26%): in patients 18 and 21 no potential mutations were found by IROme analysis, in patients 7 and 15 the potential mutation did not segregate with disease in the family, and in patients 3 and 22 heterozygous mutations were found in genes only reported for recessive inheritance (CLRN1, EYS).
For each patient, the total number of Mb (10^6 bp) sequenced on the Roche 454 GS Junior (total seq Mb) and the average read length (read length bp) are indicated. The median fold coverage (cvg) was extracted from the unique depth information. From all the sequence variants (total seq var), first only the sequence variants located in coding sequences were analyzed (cds seq var), with filtering (filt seq var) and prioritizing (prio seq var) according to Figure 2. The sequence variants eventually tested and validated by Sanger sequencing (test/val seq var) are also indicated. For each potential mutation, the coverage (cvg pot mut) and the percentage of sequence reads reporting the potential mutation (mut reads %) are indicated. For cosegregation analysis, "?" indicates absence of available family members and/or simplex cases. For patients 1 and 14, the mother (M) and/or the father (F) are healthy heterozygous carriers (het norm).
Of note, all these patients carry novel sequence variants in noncoding regions. To prioritize for potential disease-causing sequence variants in these regions, systematic annotation should not only cover splicing sites and 5′- and 3′-UTRs, but also implement detailed information about transcription factor binding sites and regulatory elements located in the potential proximal promoter regions. Promoter sequence variants could then be tested by reporter transactivation assays (e.g., luciferase reporter assays), but this time-consuming approach cannot be implemented in a routine molecular diagnostic lab.

Conclusions
The custom-designed, in-solution targeted exon capture assay IROme efficiently detected disease-causing mutations in 55% of RP patients (12/23). A 99.7% coverage of the targeted regions was obtained. The first translated exon often contains sequences with a high GC content in its 5′-UTR that hinder efficient capture [23]. Remarkably, more than 95% of first exons (60/63) were successfully enriched by IROme. In comparison, a pilot study carried out in our laboratory on 25 patients using whole exome sequencing (SureSelect, Agilent) resulted in no coverage of promoter regions and highly variable coverage of 3′-UTRs, and several genes had their first translated exon very poorly covered. For instance, the first exons of the following RP-linked genes could not be correctly analyzed: C2ORF71, CA4, CABP4, CERKL, CNGA1, FAM161A, FSCN2, GUCY2D, IMPDH1, LPCAT1, MERTK, RDH12, RP9, and RPGR (D. F. Schorderet, unpublished results). It is tempting to speculate that the additional sequences upstream of exon 1 included in IROme further enhanced the performance of the NimbleGen exome capture technology, which reportedly has more specific targeting and a higher percentage of on-target reads than competing products [23,24]. However, because the costs for whole exome sequencing have dramatically decreased to about $1000 per patient, this method may in the future replace target enrichment and resequencing, provided that a new line of "whole exome" kits effectively covering all exons, including the first one, of all genes becomes commercially available [24].
Meanwhile, custom-designed target enrichment and subsequent next-generation sequencing are a cost-efficient approach for the molecular diagnosis of retinal dystrophies, also with respect to the relative ease of data handling and analysis [25]. Finally, the median global coverage of 17-fold observed with the IROme assay also indicated the possibility to include additional retinal degeneration-linked genes, newly discovered ones or candidate genes.