Unravelling the Complexity of Inherited Retinal Dystrophies Molecular Testing: Added Value of Targeted Next-Generation Sequencing

To assess the clinical utility of targeted Next-Generation Sequencing (NGS) for the diagnosis of Inherited Retinal Dystrophies (IRDs), a total of 109 subjects were enrolled in the study, including 88 IRD-affected probands and 21 healthy relatives. Clinical diagnoses included Retinitis Pigmentosa (RP), Leber Congenital Amaurosis (LCA), Stargardt Disease (STGD), Best Macular Dystrophy (BMD), Usher Syndrome (USH), and other IRDs with undefined clinical diagnosis. Participants underwent a complete ophthalmologic examination followed by genetic counseling. A custom AmpliSeq™ panel of 72 IRD-related genes was designed for the analysis and tested using Ion semiconductor NGS. Potential disease-causing mutations were identified in 59.1% of probands, comprising mutations in 16 genes. The highest diagnostic yields were achieved for BMD, LCA, USH, and STGD patients, whereas RP confirmed its high genetic heterogeneity. Causative mutations were identified in 17.6% of probands with undefined diagnosis. Revision of the initial diagnosis was performed for 9.6% of genetically diagnosed patients. This study demonstrates that NGS represents a comprehensive, cost-effective approach for the molecular diagnosis of IRDs. The identification of the genetic alterations underlying the phenotype enabled the clinicians to achieve a more accurate diagnosis. The results emphasize the importance of molecular diagnosis coupled with clinical information to unravel the extensive phenotypic heterogeneity of these diseases.


Introduction
Inherited Retinal Dystrophies (IRDs) are a heterogeneous group of eye disorders characterized by degeneration of rod and/or cone photoreceptor cells, which include Retinitis Pigmentosa (RP), Leber Congenital Amaurosis (LCA), Stargardt Disease (STGD), Best Macular Dystrophy (BMD), and syndromic forms such as Usher Syndrome (USH). The overall prevalence of these disorders is ∼1 in 4,000 individuals for RP, ∼1 in 90,000 individuals for LCA and USH, ∼1 in 5,000-10,000 individuals for STGD, and 1 in 5,000 to 1 in 67,000 individuals for BMD (http://www.orpha.net). Classification of IRDs considers the principal site of retinal dysfunction (rod, cone, retinal pigment epithelium, or inner retina), the mode of inheritance, the underlying gene defect, typical age of onset, rate of progression, and association with systemic syndromes. The genetic bases of IRDs are highly heterogeneous, with almost 150 genes currently known [RetNet, https://sph.uth.edu/retnet/] and a wide clinical and genetic overlap among the different disorders, with high phenotypic variability and genes associated with more than one phenotype. The inheritance of these diseases is also complex, with autosomal dominant (AD), autosomal recessive (AR), X-linked (XL), and even digenic patterns [1]. The extensive clinical and genetic heterogeneity in IRD, along with the variable age of onset, incomplete penetrance, and unclear inheritance, hampers clinical diagnosis.
Recently, Next-Generation Sequencing (NGS) has been used for the genetic diagnosis of retinal diseases [2][3][4][5][6] and has been reported as a cost-effective approach [7,8]. Reported mutation detection rates vary widely, depending on the number of genes analyzed, the NGS platform, and the cohort size, but above all on the composition of the phenotypes in the study cohort. We therefore present a multidisciplinary approach coupled with a comprehensive NGS amplicon-based strategy to explore IRD genetic complexity and evaluate genotype-phenotype correlations.

Patients and Methods
This study was approved by the ethics committee (Comitato Etico di Modena, Modena, Italy). The procedures followed were in accordance with the Helsinki Declaration of 1975, as revised in 2000, and samples were obtained after patients had provided written informed consent.
A total of 109 samples were collected, including 88 IRD-affected probands with unknown molecular diagnosis and 21 healthy family members (Table 1) [...] an anomalous distribution of NGS reads attributable to amplification problems due to the insertion itself located at the end of the target region) and to sequence RPGR ORF15 partially uncovered by the NGS panel. Primers for PCR and sequencing are shown in Supplementary Table 3. The following conditions were used: a 50 μL PCR reaction containing 100 ng of DNA, 100 pmol of forward and reverse primers, 5 μL of buffer, and 0.5 μL of Taq Expand High Fidelity™ DNA Polymerase (Roche). PCR amplification (see Supplementary Table 3) was performed using a GeneAmp PCR System 9700 (Applied Biosystems, California, USA). The resultant amplicons were purified using the High Pure PCR Product Purification Kit (Roche). Additional primers for RPGR sequencing were used. The sequencing reactions were performed with BigDye Terminator v1.0 (Life Technologies) and run on an ABI PRISM® 3130XL Genetic Analyzer (Life Technologies). Due to sequence composition and technical difficulties, part of RPGR ORF15 (∼250 bp, chrX: 38145343-38145593) could not be accurately sequenced with Sanger sequencing.

Data Analysis.
Samples were processed using the Ion Torrent Suite™ (TS) Software for raw data processing and sequence alignment to the human genome reference sequence hg19. The TS Variant Caller was used for the detection of germline variants, which were subsequently analyzed using the following optimized filtering and annotation pipeline. Annovar [9] and Variant Effect Predictor (VEP) [10] were used to functionally annotate the detected variants, retrieving RefSeq gene annotation, dbSNP rs identifiers, ClinVar accession, and allele frequency observed in the population (1000 Genomes Project, NHLBI GO Exome Sequencing Project ESP6500SI-V2, and Exome Aggregation Consortium ExAC 0.3). Variants with low coverage or low frequency (<30 reads or <30%, resp.) were filtered out. Synonymous variants and variants with a reported population allele frequency greater than 1% were discarded as well. In addition, an internal database, built with all variants present in our cohort of processed samples, allowed variants not listed in public databases to be recognized and classified as polymorphisms. Variants were further annotated with conservation scores and functional predictions listed in dbNSFP [11][12][13], a database which compiles scores from various prediction algorithms, among which are SIFT, Polyphen2, LRT, MutationTaster, MutationAssessor, and FATHMM. Retina International (http://www.retina-international.org/), RPGR database (http://rpgr.hgu.mrc.ac.uk/index.php?select_db=RPGR), CEP290base (http://cep290base.cmgg.be/), and BEST1 LOVD database (http://www-huge.uni-regensburg.de/BEST1_database) were used to explore additional annotations and literature information, if present.
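The filtering criteria described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Variant` record, its field names, and the `keep_variant` helper are hypothetical, and the "<30%" threshold is assumed here to refer to the fraction of reads carrying the variant allele.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_DEPTH = 30                # minimum number of reads covering the position
MIN_VARIANT_FRACTION = 0.30   # assumption: fraction of reads carrying the variant
MAX_POPULATION_AF = 0.01      # variants above 1% population frequency are discarded


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified record for one annotated variant call."""
    variant_id: str
    consequence: str          # e.g. "missense", "synonymous", "frameshift"
    depth: int
    variant_fraction: float   # 0-1
    population_af: float      # highest frequency seen in public databases, 0-1


def keep_variant(variant, internal_polymorphisms=frozenset()):
    """Return True if the variant survives all filters described in the text."""
    if variant.depth < MIN_DEPTH or variant.variant_fraction < MIN_VARIANT_FRACTION:
        return False          # low coverage or low variant frequency
    if variant.consequence == "synonymous":
        return False          # synonymous variants are discarded
    if variant.population_af > MAX_POPULATION_AF:
        return False          # too common in the general population
    if variant.variant_id in internal_polymorphisms:
        return False          # recurrent in the in-house cohort database
    return True


# Illustrative calls (made-up values, not data from the study).
calls = [
    Variant("var1", "missense", 120, 0.48, 0.0001),
    Variant("var2", "missense", 25, 0.50, 0.0),      # fails depth
    Variant("var3", "synonymous", 200, 0.51, 0.0),   # synonymous
    Variant("var4", "missense", 150, 0.47, 0.0),     # in-house polymorphism
    Variant("var5", "missense", 180, 0.52, 0.05),    # common variant
]
kept = [v.variant_id for v in calls if keep_variant(v, frozenset({"var4"}))]
```

In this sketch only `var1` is retained; each of the other calls is removed by exactly one of the criteria.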
Splice-altering predictions were obtained using the online tools Human Splicing Finder (HSF 3.0) [14] and NNSPLICE 0.9 [15] and the databases dbscSNV [16] and SPIDEX [17], which provide predicted effects for all of the potential variants within splicing consensus regions or across the entire genome, respectively. For the prioritization of pathogenetic mutations, the evaluation of inheritance mode was taken into account, along with segregation information coming from the sequencing of healthy family members, if available.
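As a rough illustration of how the inheritance mode can enter prioritization, the sketch below checks whether the candidate variants found in one gene are compatible with that gene's expected inheritance pattern. The function name, the zygosity labels, and the rules themselves are simplifying assumptions made for illustration; in particular, two heterozygous variants are treated only as putatively compound heterozygous, since phase would still have to be confirmed by segregation analysis in family members.

```python
def genotype_fits_inheritance(mode, zygosities, sex="unknown"):
    """Check whether candidate variants in one gene fit its inheritance mode.

    mode        -- "AD", "AR", or "XL"
    zygosities  -- list of "het", "hom", or "hemi", one entry per candidate variant
    sex         -- "male", "female", or "unknown" (only used for X-linked genes)
    """
    n_het = zygosities.count("het")
    n_hom = zygosities.count("hom")
    n_hemi = zygosities.count("hemi")

    if mode == "AD":
        # One altered allele is sufficient for a dominant gene.
        return n_het + n_hom >= 1
    if mode == "AR":
        # Homozygous, or two heterozygous variants (phase still unconfirmed).
        return n_hom >= 1 or n_het >= 2
    if mode == "XL":
        if sex == "male":
            return n_hemi >= 1
        # Simplification: any variant is kept for review in non-male samples.
        return n_het + n_hom >= 1
    raise ValueError(f"unknown inheritance mode: {mode!r}")


# A single heterozygous variant in a recessive gene does not explain the phenotype.
single_het_in_ar = genotype_fits_inheritance("AR", ["het"])
two_het_in_ar = genotype_fits_inheritance("AR", ["het", "het"])
hemi_in_xl_male = genotype_fits_inheritance("XL", ["hemi"], sex="male")
```

A real workflow would combine such a check with segregation data and the clinical presentation rather than use it as a stand-alone decision rule.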
NGS procedure and data analysis were tested on the four control samples with known molecular diagnosis as proof of concept. In all cases the previously identified variants were correctly detected and prioritized as pathogenic variants.

Results
A cohort of 109 samples (Table 1), including 88 IRD-affected probands without molecular diagnosis and 21 unaffected family members, was analyzed by the newly developed system based on NGS and data analysis. A total of 19 sequencing runs were performed (6 samples/Ion Chip 318), obtaining on average a mean coverage of 450 mapped reads, with 92% mean uniformity and 97.6% (SD ± 1.4) of target regions covered at least 30x (96.2% > 50x). For each sample, 242 raw variants were detected on average. The annotation and filtering procedure resulted in the identification of possibly causative mutations in 59.1% of patients (n = 52/88) (Table 2, Figure 1). The majority of the obtained molecular diagnoses were consistent with the subject's clinical presentation and family history.
We found pathogenic mutations in 16 genes, with the most recurrent being ABCA4 for STGD and USH2A for RP/USH patients. The majority of the mutated genes were inherited with an AR pattern (78.9%), followed in order by AD (11.5%) and XL (9.6%) inheritance. The majority of cases displaying recessive inheritance were compound heterozygous for two different pathogenic variants, in line with the low frequency of consanguineous marriages in Italy. Identified candidate pathogenic mutations are shown in Table 3. Overall, 63 different mutations were identified: 62.5% of variants were already reported in previous studies, while 37.5% were novel. Among the novel variants, 56% were missense variants predicted to have a deleterious effect on protein function by the prediction algorithms described in the Patients and Methods (predicted to be damaging by at least three of the applied algorithms), and 44% were frameshift, nonsense, or splice-site mutations that might severely affect protein function. Notably, 12% of identified variants were located within splicing consensus regions, and an additional 12% were exonic variants predicted to alter splicing through enhancer/silencer motif modification or the creation of new potential donor/acceptor sites. Table 2 summarizes the mutation detection rates obtained for the different clinical subtypes of our study cohort. The highest diagnostic yields were achieved for BMD, LCA, USH, and STGD patients with well-defined clinical diagnosis, where the number of known genes associated with each disease is relatively limited.
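The "at least three algorithms" criterion used for novel missense variants amounts to a simple consensus vote. The following Python sketch shows one way to express it; the function name and the dictionary layout are hypothetical, and each tool's output is assumed to have already been reduced to a damaging/not-damaging call (with `None` where a tool gives no prediction).

```python
def is_predicted_damaging(predictions, min_damaging=3):
    """Return True if at least `min_damaging` tools call the variant damaging.

    predictions -- dict mapping tool name to True (damaging), False (tolerated),
                   or None (no prediction available for this variant)
    """
    damaging_calls = sum(1 for call in predictions.values() if call is True)
    return damaging_calls >= min_damaging


# Illustrative calls for one variant (made-up values, not data from the study).
example = {
    "SIFT": True,
    "Polyphen2": True,
    "LRT": False,
    "MutationTaster": True,
    "MutationAssessor": None,
    "FATHMM": False,
}
consensus = is_predicted_damaging(example)
```

Counting only explicit damaging calls means that a missing prediction never contributes to the threshold, which is the more conservative choice.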
For BMD cases, all diagnosed patients were heterozygous for mutations in BEST1. Three patients (mother and son) were found to harbour a novel BEST1 missense mutation c.80G>C (p.Ser27Thr) located in the immediate N-terminus, in one of the four mutational hotspot regions in the highly conserved N-terminal half of the protein [18] and predicted to be deleterious by all interrogated algorithms.
For STGD patients, genetic diagnosis was achieved in 11 out of 14 (78.5% of the cases). All diagnosed patients in our cohort carried mutations in ABCA4. In 75% of the unsolved cases at least one ABCA4 pathogenic allele was identified, suggesting the presence of disease-causing mutations lying outside the coding sequence covered by our panel, as reported in a previous study [19].
In LCA patients, causative mutations were identified in CEP290, RPE65, RPGRIP1, and CRX genes, and only one case remained unsolved (20% of the total LCA cases), whereas all Usher 2 syndrome cases were found to carry mutations in USH2A gene.
For RP patients, genetic diagnosis was achieved in 27 out of 45 (60% of the cases), involving mutations in 11 different genes and confirming that these phenotypes are genetically heterogeneous (Figure 1). Dominant mutations were identified in the RHO gene, whereas USH2A, CNGB1, and TULP1 were the most recurrently mutated genes in ARRP. X-linked inheritance was established for 5 male RP patients (4 probands had mutations in RPGR, whereas one had a mutation in RP2). The identification of USH2A as the defective gene in patients with an initial clinical diagnosis of RP was followed by audiometric testing to establish whether there were any hearing deficiencies. A hearing impairment was found in 2 cases out of 5, leading to clinical reassessment and a final diagnosis of USH (Table 2).
For patients with IRD without a defined clinical diagnosis or with unclear disease manifestations, we identified causative mutations in 7 out of 17 probands (23.5% of the total IRD cases). In two cases the molecular results allowed a refined clinical diagnosis: a compound heterozygosity of two mutations in CEP290 led to a genetic diagnosis of LCA in a patient with initial diagnosis of North Carolina or Stargardt macular dystrophy, whereas a homozygous pathogenic variant in ABCA4 was found in a patient with tapetoretinal degeneration.
In 36 patients (12 familial and 24 sporadic) the molecular analysis did not achieve any definitive result, even after the analysis of the healthy family members, which was performed in 8 cases. Half of the cases with a negative test result (18 out of 36) were affected by RP. The additional analysis by Sanger sequencing of the RPGR ORF15 (a mutational hotspot which was insufficiently covered in our panel) for the male patients with a sporadic or suspected X-linked pattern of inheritance (10 patients) yielded no additional mutations.
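The headline figures of this section follow directly from the reported counts, as the short Python check below shows. Only counts stated in the text are used (52 of 88 probands solved overall, 27 of 45 RP probands solved); the helper name `diagnostic_yield` is an illustrative choice.

```python
def diagnostic_yield(solved, total):
    """Percentage of probands with a molecular diagnosis, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * solved / total, 1)


overall_yield = diagnostic_yield(52, 88)   # all probands
rp_yield = diagnostic_yield(27, 45)        # RP probands only

unsolved_total = 88 - 52                   # probands without a definitive result
unsolved_rp = 45 - 27                      # unsolved RP probands
rp_share_of_unsolved = unsolved_rp / unsolved_total
```

The results reproduce the 59.1% overall yield, the 60% yield for RP, the 36 unsolved probands, and the statement that half of the unsolved cases (18 of 36) were affected by RP.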

Discussion
The results of the present study confirm that high-throughput Next-Generation Sequencing represents a comprehensive, cost-effective approach for the molecular diagnosis of Inherited Retinal Dystrophies (IRDs), achieving a molecular diagnosis for 59.1% of the studied cases. More specifically, among the different clinical phenotypes, the highest detection rates were achieved for BMD, LCA, USH, and STGD patients, in whom the genetic test clearly confirmed the clinical diagnoses (Table 2). The results of the RP and undefined IRD cohorts, instead, demonstrated the high genetic heterogeneity of these diseases and the essential contribution of our NGS analysis to achieving an accurate diagnosis, with the involvement of 12 different genes in 28 sporadic cases. Revision of the initial diagnosis, performed for 9.6% of the genetically diagnosed patients, further emphasizes the importance of a comprehensive genotype/phenotype analysis to unravel the extensive heterogeneity of these diseases. Notably, a remarkable fraction of identified variants are splice-altering mutations (25% of the total mutation burden, 16 out of 64), either located within splicing consensus regions or exonic variants predicted to cause enhancer/silencer motif modification or the creation of new potential donor/acceptor sites. Such variants are amenable to antisense-mediated splicing-correction approaches, as recently reported for several genetic diseases, including CEP290-caused LCA [20,21].
The prevalence of IRD and most importantly the frequency of gene mutations causing those diseases are not well characterized in Italy and only few data have been reported [22][23][24]. RPE65, CRB1, and GUCY2D were identified as the most prevalent mutated genes in Italian LCA patients [22] and RHO was reported to be the gene most commonly responsible for ADRP [23] and EYS the most recurrent for nonsyndromic ARRP and sporadic cases [24]. Our study contributes only partially to the knowledge of the gene mutation frequencies, since each IRD type is represented by small cohorts of cases (i.e., the LCA and dominant RP phenotypes were accounted for by 5 and 6 cases, resp.), and some probands of other ethnicities have been included too. Indeed, regarding LCA, we identified mutations in CEP290, RPE65, CRX, and RPGRIP1 genes.
For ADRP, RHO was identified as responsible for the phenotype in one case, whereas, in ARRP and sporadic RP, USH2A, CNGB1, and TULP1 were the most recurrently mutated genes. RPE65 mutations were found in two ARRP cases; in one more case, still unsolved, a single RPE65 heterozygous pathogenic variant was found. ROM1 compound heterozygosity was established in one RP proband, suggesting a mechanism of recessive inheritance for this gene, which has been associated with dominant and digenic forms. X-linked inheritance was established for 5 RP-affected probands, with RPGR and RP2 identified as the disease-causing gene in 4 cases and 1 case, respectively. All BMD diagnosed patients were heterozygous for mutations in the BEST1 gene, the major gene responsible for Best's juvenile form [25], whereas 78.5% of patients with clinically diagnosed STGD carried pathogenic variants in ABCA4 [26].
Similarly to a recent study [6], the clinical sensitivity of our NGS analysis was not uniform, with the highest diagnostic yields obtained in conditions where the disease-causing genes have been nearly all identified.
Direct comparison of our findings with other recently published NGS studies [2][3][4][5][6]27] is not straightforward, due to differences in the number of genes analyzed but especially due to the composition and relative representation of the different phenotypes in the patient cohorts. However, the finding of USH2A and ABCA4 as the most mutated genes for RP/USH and STGD patients is consistent with previous reports [27][28][29]. In our RP cohort, USH2A is followed by CNGB1 and RPGR. These two genes, already reported among the most frequently mutated genes in IRD patients [29], were not frequently altered in the Saudi population [6] or in a large cohort of Western European and South Asian individuals [27]. Also, we did not find any alteration in EYS, one of the top three genes contributing to IRD in other populations [28,29].
The different gene alterations identified in our LCA cohort (CEP290, RPE65, RPGRIP1, and CRX genes) were consistent with the different disease manifestations of the analyzed patients, in accordance with the specific clinical features described for each of the LCA-associated genes [30,31]. Less direct is the correlation between the genes involved and the phenotypic features in RP, due to the known contribution of environmental factors to late-childhood- and adult-onset diseases.
Allelic heterogeneity, with different mutations in the same gene causing different phenotypes, is evident also in USH2A-related retinal disease. Genotype-phenotype correlations observed in our cohort were in accordance with the allelic hierarchy proposed in a recent study [32], supporting the model that USH represents the null phenotype consequent upon severe USH2A defects, whereas milder mutations in at least one allele result in a pure retinal phenotype associated with normal auditory function.
IRD genetic heterogeneity, reflected in the identification of mutations in many genes with a considerable number of previously undescribed alterations, supports the conclusion that molecular diagnosis of these disorders should rely on massively parallel multigene sequencing. Nevertheless, for 36 probands, including 12 familial cases and 24 unrelated probands, our NGS procedure did not result in the identification of a clear genetic cause of the disease. Some subjects may have mutations that cannot be detected by our amplicon-based approach, such as deep intronic mutations, copy-number variations, or large deletions. In designing a more complete new version of the panel, additional deep intronic regions reported in the literature as carrying disease-causing mutations [19,33,34] or a higher exon padding (5 bp in our design, up to 100 bp available in the current pipeline version of the Ion AmpliSeq Designer tool) could be implemented. Moreover, technical limitations, including the difficult amplification of RPGR ORF15, a mutational hotspot for X-linked RP, may have accounted for some of the missed diagnoses (our panel presently covers only 30% of this critical exon). However, the additional Sanger sequencing of RPGR ORF15 in 10 male patients with sporadic/X-linked RP who had previously tested negative for pathogenic mutations using our NGS panel did not reveal any mutation in the analyzed region. Finally, to further support the pathogenicity of novel mutations identified in probands, the analysis of both affected and unaffected family members should be performed, when possible.
In some of the patients who tested negative we however identified single potentially pathogenic heterozygous mutations in recessive genes or novel heterozygous missense variants in dominant genes with unknown significance, lacking the appropriate level of evidence to classify them as disease-causing or not in concordance with patients' clinical presentations or family data. The contribution of these variants in combination with deep intronic mutations or large deletions is suspected but could not be demonstrated with the present technique.
Database incompleteness further complicates variant interpretation. Two probands with BMD phenotype and BEST1 mutation were found to also harbour a heterozygous mutation in RHO (c.578C>T, p.Thr193Met), which was predicted to be damaging and is listed as associated with ADRP in a public database [http://www.retina-international.org/sci-news/databases/mutation-database] but in our cohort was also carried by a healthy subject. This finding reinforces the need for a critical interpretation of the molecular findings in view of the phenotypic features of the patients with IRD, until a more thorough knowledge of variant frequencies and a critical mass of data in the public disease databases are reached.
In conclusion, by presenting profoundly different mutation rates varying according to the clinical diagnosis and by reporting 9.61% of cases of reassessment of the initial diagnosis on the basis of the results of the test, our study reinforces the need of a multidisciplinary work-up before and after the genetic testing, due to the implications of the results in terms of risk assessment for family members and inclusion in gene-based clinical trials.