Genetic Diagnosis of Charcot-Marie-Tooth Disease in a Population by Next-Generation Sequencing

Charcot-Marie-Tooth (CMT) disease is the most prevalent inherited neuropathy. Today more than 40 CMT genes have been identified. Diagnosing heterogeneous diseases by conventional Sanger sequencing is time-consuming and expensive. Thus, more efficient and less costly methods are needed in clinical diagnostics. We included a population-based sample of 81 CMT families. Gene mutations had previously been identified in 22 families; the remaining 59 families were analysed by next-generation sequencing. Thirty-two CMT genes and 19 genes causing other inherited neuropathies were included in a custom panel. Variants were classified into five pathogenicity classes by genotype-phenotype correlations and bioinformatics tools. Gene mutations, classified certainly or likely pathogenic, were identified in 37 (46%) of the 81 families. Point mutations in known CMT genes were identified in 21 families (26%), whereas four families (5%) had point mutations in other neuropathy genes, ARHGEF10, POLG, SETX, and SOD1. Eleven families (14%) carried the PMP22 duplication and one family (1%) carried an MPZ duplication. Most mutations were identified in known CMT genes, but mutations were also found in other neuropathy genes, emphasising that genetic analysis should not be restricted to CMT genes only. Next-generation sequencing is a cost-effective tool in the diagnosis of CMT, improving diagnostic precision and time efficiency.


Introduction
Charcot-Marie-Tooth (CMT) is the most common inherited neuropathy, affecting 40 to 81 cases per 100,000 in the Norwegian general population [1,2]. CMT is clinically, neurophysiologically, and genetically heterogeneous. The clinical classification is based on age at onset, distribution of muscle weakness, sensory loss, walking difficulties, and foot deformities [3]. CMT is neurophysiologically subdivided into a demyelinating (CMT1) and axonal (CMT2) form depending on whether the median motor nerve conduction velocity (NCV) is below or above 38 m/s, respectively. A third form, intermediate CMT, has both demyelinating and axonal features and NCV between 25 and 45 m/s [2,3].
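The neurophysiological subdivision described above can be illustrated with a minimal Python sketch. The function name and the returned labels are hypothetical and chosen only for illustration. Because the intermediate range (25 to 45 m/s) overlaps the 38 m/s cut-off, the sketch returns every form compatible with a given NCV rather than a single label. A value of exactly 38 m/s is not assigned to CMT1 or CMT2, since the text only specifies below or above.

```python
from typing import List


def compatible_cmt_forms(median_motor_ncv: float) -> List[str]:
    """Return the neurophysiological CMT forms compatible with an NCV in m/s."""
    forms = []
    if median_motor_ncv < 38:
        forms.append("CMT1 (demyelinating)")
    elif median_motor_ncv > 38:
        forms.append("CMT2 (axonal)")
    # The intermediate range overlaps both of the forms above.
    if 25 <= median_motor_ncv <= 45:
        forms.append("intermediate CMT")
    return forms


for ncv in (20, 30, 40, 50):
    print(ncv, compatible_cmt_forms(ncv))
```

In practice the final label also depends on the clinical picture, so the sketch shows only the numeric thresholds.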
The mode of inheritance is autosomal dominant, autosomal recessive, or X-linked [3]. At present more than 40 CMT genes have been identified [...] families with information about prognosis and recurrence risk, as well as future options for specific treatment [16,17].
The current strategy for diagnosing CMT is based on the clinical and neurophysiological phenotype. It is favourable to initially test CMT1 patients for the PMP22 duplication due to its high prevalence. Genes are thereafter traditionally tested sequentially by Sanger sequencing, but the low prevalence of specific CMT point mutations renders sequential testing unfavourable in terms of time and cost. Furthermore, most diagnostic laboratories only have capacity for sequencing a few genes [3,7,12,17]. Hence, it is important to develop a more comprehensive approach for clinical diagnosis of heterogeneous disorders such as CMT, dystonia, hereditary spastic paraplegia (HSP), and Parkinson's disease [6]. Next-generation sequencing (NGS) makes it possible to sequence several genes in parallel and at a low cost compared to traditional methods.
We applied NGS on 59 CMT families from the Norwegian general population and sequenced 32 CMT genes along with 19 genes causing other inherited neuropathies, since the phenotypes of CMT, distal hereditary motor neuropathy (dHMN), and other inherited neuropathies overlap [3,4,6].

Study Population.
People with CMT residing in eastern Akershus County, Norway, on January 1, 1995, were included in the study [2]. Akershus County has rural and urban areas and had 297,539 inhabitants [18]. A total of 245 affected persons from 116 CMT families were identified. DNA was available from 81 CMT families (189 affected individuals). The neurophysiological distribution among the families was 38 CMT1, 33 CMT2, two intermediate CMT, and eight families with an unknown neurophysiological phenotype. The families were previously tested for the PMP22 duplication by real-time quantitative PCR and for point mutations in PMP22, GJB1, MPZ, LITAF, MFN2, and EGR2 by conventional Sanger sequencing [2]. Later, a duplication of MPZ was identified in one CMT family, and another CMT family had a point mutation in the SOD1 gene [14,19]. A mutation was identified in 22 CMT families. A more comprehensive description of the study population has been published previously [2].
This study applied NGS on 70 affected individuals from 59 CMT families without a genetic diagnosis; these were 22 CMT1 families, 29 CMT2 families, one intermediate CMT family, and seven families with unknown neurophysiological phenotype. A control group of 180 healthy individuals was included in order to detect polymorphisms present in ≥1% of the population [20]. The Norwegian Regional Ethical Committee for Medical and Health Research approved the project, and the participants gave written informed consent.

Targeted Capture and DNA Sequencing.
Table 1 shows the 51 neuropathy genes included in the panel [3,4,6]. Illumina's DesignStudio (Illumina Inc., San Diego, USA) for TruSeq Custom Enrichment was used to target all exons and flanking 5′ and 3′ UTR (untranslated region) sequences (default settings). In total, 909 oligonucleotide probes covering 256,248 bp (base pairs) were included. Genomic DNA was extracted from whole blood using standard techniques; DNA samples were prepared in multiplex according to standard TruSeq Sample Prep and Custom Enrichment protocols (Illumina), and 75 base pairs were sequenced in each direction (paired-end). Sequencing was performed on the Illumina HiScan SQ. Samples from affected and controls were run in two separate runs.

Sequence Analysis.
Bioinformatic analysis consisted of a standard protocol including image analysis and base calling by Illumina RTA 1.12.4.2, demultiplexing by CASAVA 1.8 (Illumina), and alignment of sequence reads to the reference genome GRCh37/hg19 by BWA [23]. Picard (http://picard.sourceforge.net/) was used for removing PCR duplicates. The GATK (Genome Analysis Toolkit) was applied for base quality score recalibration, INDEL (insertion and deletion) realignment, and SNP (single nucleotide polymorphism) and INDEL discovery [24,25]. Annotation of sequence variants was performed by Annovar [26]. Variants present in exons ±10 bp intron sequences and in the 3′ and 5′ UTR (untranslated region) were included in further analysis.
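The positional criterion in the last step (exons ±10 bp of intron sequence) can be sketched as a simple interval test. This is an illustrative Python sketch, not the pipeline used in the study. The function name and the exon coordinates are hypothetical, and the UTR regions, which were also retained, are not modelled here.

```python
from typing import List, Tuple

FLANK_BP = 10  # intronic flank retained on each side of an exon, as stated in the text


def in_exon_or_flank(pos: int, exons: List[Tuple[int, int]], flank: int = FLANK_BP) -> bool:
    """True if a position lies inside any exon or within `flank` bp of its boundaries.

    Exons are given as inclusive (start, end) coordinates on the same chromosome.
    """
    return any(start - flank <= pos <= end + flank for start, end in exons)


# Hypothetical exon coordinates, for illustration only.
exons = [(1000, 1200), (2000, 2100)]
print(in_exon_or_flank(1205, exons))  # 5 bp into the intron after the first exon
print(in_exon_or_flank(1500, exons))  # deep intronic position
```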

Classification of Variants.
Variants were classified into five pathogenicity classes (Table 2). Variants were classified based on frequency data from 1000 Genomes (http://www.1000genomes.org/), dbSNP 135 (http://www.ncbi.nlm.nih.gov/projects/SNP/), 180 in-house normal controls, pathogenicity predictions through the Alamut interface v2.2-0 (Interactive Biosoftware, Rouen, France), and reports in HGMD, IPNMDB, and the literature [4,6,27]. Variants with prevalence ≥ 0.1% in dbSNP 135 or 1000 Genomes and presence in ≥ 2 in-house normal controls were removed unless homozygous or compound heterozygous. Variants with frequency < 0.1% were considered possibly pathogenic, as the SNP databases may contain information from individuals with disease, especially traits with onset later in life such as CMT. Data from the ESP (the exome sequencing project) (http://evs.gs.washington.edu/EVS/) were also used in classification, but only as guidance, as this database contains data from both selected affected individuals and controls for specific traits. Synonymous, intronic, and UTR variants not predicted to have an effect on a splice site were also removed. The remaining variants were defined as the candidate variants. Variants classified likely or certainly pathogenic in recessive genes had to be present in a homozygous or compound heterozygous state. Classification into these classes also required phenotype-genotype correlation with previously published literature, and/or segregation of the variant(s) within the affected in the families, and/or the possible dual (digenic) effect of two variants in different genes. Identified variants classified certain, likely, and uncertain pathogenic were submitted to the ClinVar database (http://www.ncbi.nlm.nih.gov/clinvar/).
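The two removal rules that define the candidate variants can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Variant` fields and the function name are hypothetical. The frequency rule is read literally, requiring both a database frequency ≥ 0.1% and presence in ≥ 2 in-house controls.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    max_db_frequency: float        # highest frequency in dbSNP 135 / 1000 Genomes, as a fraction
    n_inhouse_controls: int        # number of in-house controls carrying the variant
    biallelic: bool                # homozygous or compound heterozygous
    consequence: str               # "nonsynonymous", "synonymous", "intronic", or "UTR"
    predicted_splice_effect: bool  # any predicted effect on a splice site


def is_candidate(v: Variant) -> bool:
    """Apply the two removal rules described in the text (literal reading)."""
    common = v.max_db_frequency >= 0.001 and v.n_inhouse_controls >= 2
    if common and not v.biallelic:
        return False  # common variant, kept only if homozygous or compound heterozygous
    silent = v.consequence in {"synonymous", "intronic", "UTR"}
    if silent and not v.predicted_splice_effect:
        return False  # silent variant without a predicted splice effect
    return True


print(is_candidate(Variant(0.02, 5, False, "nonsynonymous", False)))
print(is_candidate(Variant(0.0, 0, False, "nonsynonymous", False)))
```

A stricter reading, in which either frequency criterion alone removes a variant, would only require changing `and` to `or` in the definition of `common`.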
Table 2: Criteria for the five pathogenicity classes.

Class 5, certainly pathogenic
(1) Reported pathogenic in at least two unrelated cases
(2) and/or functional studies reveal effect on protein structure/function
(3) and zygosity/inheritance of phenotype fits the variant
(4) and phenotype-genotype correlation with previously published literature

Class 4, likely pathogenic
(1) Reported pathogenic in one case
(2) and/or predicted pathogenic in at least 2 of 4 variant prediction tools: SIFT [28], Polyphen [29], Align GVGD [30], and Mutation Taster [31] through the Alamut interface
(3) and/or predicted loss or gain of splice site predicted in at least 4 of 5 splice site predictors: SpliceSiteFinder [32], MaxEntScan [33], NNSPLICE [34], GeneSplicer [35], and Human Splicing Finder [36] through the Alamut interface
(4) and/or close proximity to known pathogenic mutations with similar or lower variant prediction score
(5) and zygosity/inheritance of phenotype fits the variant
(6) and phenotype-genotype correlation with previously published literature

Class 3, uncertain pathogenic
(1) Present in ≤0.1% of dbSNP135 or 1000 genomes
(2) and/or present in ≤1 in-house control
(3) and zygosity/inheritance of phenotype in family fits the variant
(4) Variants in class 2 may be lifted to this class if present in several affected patients with similar phenotype

Class 2, unlikely pathogenic
(1) Present in 0.1-1% of dbSNP135 or 1000 genomes
(2) and/or present in 2-3 in-house controls
(3) and/or predicted no loss or gain of splice site predicted by 5/5 splice site predictors (applies only to synonymous variants and variants in introns and UTRs)
(4) and/or reported benign in the literature

Class 1, certainly not pathogenic
(1) Present in ≥1% of dbSNP135 or 1000 genomes
(2) and/or present in ≥4 in-house controls

dbSNP = the single nucleotide polymorphism database.
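The two prediction-tool thresholds in the class 4 criteria (at least 2 of 4 variant prediction tools, at least 4 of 5 splice site predictors) amount to simple vote counting. The Python sketch below illustrates this under the assumption that each tool's output has already been reduced to a yes/no call. The function names and the dictionary layout are hypothetical and do not reflect how the Alamut interface reports results.

```python
from typing import Dict, Sequence

VARIANT_TOOLS = ("SIFT", "Polyphen", "Align GVGD", "Mutation Taster")
SPLICE_TOOLS = ("SpliceSiteFinder", "MaxEntScan", "NNSPLICE",
                "GeneSplicer", "Human Splicing Finder")


def count_positive(calls: Dict[str, bool], tools: Sequence[str]) -> int:
    """Count the listed tools that made a positive call; missing tools count as negative."""
    return sum(1 for tool in tools if calls.get(tool, False))


def predicted_pathogenic(calls: Dict[str, bool]) -> bool:
    """At least 2 of the 4 variant prediction tools predict pathogenicity."""
    return count_positive(calls, VARIANT_TOOLS) >= 2


def predicted_splice_change(calls: Dict[str, bool]) -> bool:
    """At least 4 of the 5 splice site predictors predict loss or gain of a splice site."""
    return count_positive(calls, SPLICE_TOOLS) >= 4


print(predicted_pathogenic({"SIFT": True, "Polyphen": True, "Align GVGD": False}))
print(predicted_splice_change({"MaxEntScan": True, "NNSPLICE": True, "GeneSplicer": True}))
```

These counts cover only criteria (2) and (3) of class 4. The remaining criteria (zygosity, literature, phenotype-genotype correlation) require manual review and are not modelled.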

Verification by Sanger Sequencing.
Candidate variants were verified by Sanger sequencing in all available family members to establish genotype-phenotype correlation. Primer design and sequence analysis were performed in CLC Main Workbench (CLC bio, Aarhus, Denmark); the sequencing was carried out using standard procedures and sequenced on the ABI3130XL (Life Technologies Ltd., Paisley, UK) as previously described [2].

Sequencing Performance Results, Variant Identification, and Verification.
Analysis of sequence data revealed uniform coverage and high read depths in all samples. On average among the affected patients, the percentage of nucleotides with at least 30x and 2x coverage was 97.73% and 98.73%, respectively, and the mean coverage depth was 516. On average, 202 variants were detected among the 51 investigated genes per patient. Table 1 shows sequence capture performance results per gene and Table 3 shows sequence capture performance and variant identification results among the 70 affected analyzed by NGS. In the group defined as candidate variants, 63 nonsynonymous exonic variants, zero synonymous variants, and four nonexonic variants remained among the 70 patients. The candidate variants were sorted into the five classes: (5) certainly pathogenic, seven variants; (4) likely pathogenic, ten variants; (3) uncertain pathogenic, 15 variants; (2) unlikely pathogenic, 15 variants; and (1) certainly not pathogenic, 20 variants. All candidate variants were verified by Sanger sequencing.
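As a consistency check on the figures above, the candidate variants counted by type and by pathogenicity class should give the same total. The short Python sketch below uses only the numbers reported in this paragraph; the variable names are illustrative.

```python
# Candidate variants by type, as reported in the text.
by_type = {"nonsynonymous exonic": 63, "synonymous": 0, "nonexonic": 4}

# Candidate variants by pathogenicity class (5 = certainly pathogenic ... 1 = certainly not).
by_class = {5: 7, 4: 10, 3: 15, 2: 15, 1: 20}

total_by_type = sum(by_type.values())
total_by_class = sum(by_class.values())
print(total_by_type, total_by_class)  # both totals are 67
```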

Prevalence of CMT Variants.
The distribution of variants among the CMT families is illustrated in Figure 1. Table 4 shows phenotype-genotype correlations for certain and likely pathogenic variants and Supplemental [...]. NGS identified seven certain, 10 likely, and 15 uncertain pathogenic variants in 24 CMT families. One family was compound heterozygous for likely pathogenic variants in POLG (family 62) and six CMT families had possible dual pathology, that is, mutations in two different genes. Family 252 had one certain variant and one uncertain variant in SH3TC2 and AARS, respectively. Family 95 had two likely pathogenic variants in REEP1 and SETX. Three families had one likely and one uncertain pathogenic variant, LMNA and DCTN1 in family 27, LMNA and ARHGEF10 in family 54, and DYNC1H1 and GAN in family 231. Family 11 had two uncertain pathogenic variants in SEPT9 and SETX. Eleven of the certain, likely, and uncertain pathogenic variants were novel. Two families had mutations in previously sequenced genes, GJB1 in family 5 and MFN2 in family 90. These were not detected in the previous study because of a mix-up of DNA from an affected and an unaffected individual and because of an unidentified laboratory error.
Of the total 81 CMT families, 37 CMT families had certain or likely pathogenic variants in 16 different genes (Table 4). Twelve CMT families had a CNV (11 families had the PMP22 duplication and one family had an MPZ duplication) and 25 CMT families carried a point mutation. Figure 2 illustrates the gene frequencies among the CMT1 and CMT2 subgroups. Of the 38 CMT1 families, 55% (21/38) had certain or likely identified genotypes; that is, 29% (11/38) had a CNV and 26% (10/38) had a point mutation. Thirty-six percent (12/33) of the CMT2 families had a certain or likely identified genotype. One of the two families with intermediate CMT had an identified genotype. Among the eight families with unknown neurophysiological phenotype, one family had the PMP22 duplication and two families had point mutations. Four families had likely pathogenic variants in non-CMT genes, ARHGEF10, POLG, REEP1, SETX, and SOD1 [4,6]. Forty-one percent (11/27) of the sporadic case families had certain or likely identified genotypes; that is, three families had the PMP22 duplication and eight families had point mutations.
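The diagnostic yields quoted in this paragraph follow directly from the family counts. The Python sketch below recomputes them from the reported numerators and denominators; the helper name `pct` and the dictionary labels are illustrative only.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# (families with certain or likely genotype, families in group), taken from the text.
yields = {
    "all families": (37, 81),
    "CMT1": (21, 38),
    "CMT1, CNV": (11, 38),
    "CMT1, point mutation": (10, 38),
    "CMT2": (12, 33),
    "sporadic cases": (11, 27),
}

for group, (n, d) in yields.items():
    print(f"{group}: {n}/{d} = {pct(n, d)}%")

# The 37 diagnosed families split into 12 with a CNV and 25 with a point mutation.
cnv_families, point_mutation_families = 12, 25
print(cnv_families + point_mutation_families)
```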

Main Findings.
This is to our knowledge the first study to provide prevalence data for most of the currently known CMT genes in a population-based sample by targeted NGS.
The main result of our study is as follows. After excluding CMT families with the PMP22 duplication, sequencing identifies certain and likely pathogenic point mutations in 36% (25/70) of the CMT families. The duplication of PMP22 is the most common cause of CMT, found in 14% (11/81) of our families, whereas one family had a duplication of MPZ [2,14]. Large CNVs are not detected by our NGS pipeline but require other methodologies, such as MLPA (multiplex ligation-dependent probe amplification). Thus, before NGS is applied, patients with CMT1 should be tested for the PMP22 duplication. Other CNVs are considered rare [48]. The known CMT genes accounted for the majority of our identified mutations, supporting a correct clinical diagnosis. However, phenotypically certain CMT families had certain and likely pathogenic variants in the non-CMT neuropathy genes, that is, ARHGEF10, POLG, SETX, and SOD1, thus expanding the number of known CMT genes. This highlights the importance of including all neuropathy genes in the NGS panel due to the genetic heterogeneity and pleiotropic genes of inherited neuropathies.

Study Population.
Our material included 27 CMT families with only one affected; that is, the diagnostic certainty of the phenotype might be less than in CMT families with several affected. However, it would be incorrect to exclude sporadic cases, as CMT may be caused by autosomal recessive inheritance, reduced penetrance, and de novo mutation as well as nonpaternity. Autosomal recessive CMT accounts for about 4% of all cases in Europe, while it is considerably more frequent in countries with a high rate of consanguinity [7].
De novo duplication of PMP22 may occur in about 10% of the patients [8]. We identified the PMP22 duplication in three and a point mutation in eight of the total 27 sporadic CMT families. Thus, CMT variants were identified almost equally frequently in the sporadic and nonsporadic CMT families, that is, PMP22 duplication 11% (3/27) versus 15% (8/54) and point mutations 30% (8/27) versus 35% (19/54), justifying the inclusion of the 27 sporadic CMT families in our material.

Methodological Considerations.
Technically, the NGS panel demonstrated excellent results for coverage, read depth, and robustness for all genes in all 250 patients and controls, except one control with poor DNA quality. Lowering the possibility of technical errors is important in a clinical setting. NGS has several diagnostic advantages in heterogeneous diseases; that is, all known genes can be effectively sequenced and interpreted simultaneously. Furthermore, Sanger sequencing does not detect dual pathology, as sequencing is usually finalized when the first pathogenic variant is identified. This may be the reason why the literature rarely reports dual pathology, except for an American study which identified dual pathology in 1.4% of the CMT patients [9]. We identified possible dual pathology in 10% (6/59) of our CMT families. Thus, CMT dual pathology may not be as rare as the earlier literature implies. The digenic effect of two variants in different genes may modulate the phenotype, depending on whether the gene products work in the same pathways or not. Another shortcoming with selective gene testing of a specific CMT phenotype is that unknown genotype-phenotype correlations are missed. An example is MFN2, usually tested only in CMT2 families; thus MFN2 mutations in CMT1 or intermediate CMT families are missed [49,50].

Table 4 (partial): Phenotype-genotype correlations for certain and likely pathogenic variants.

[...] [6]. Scoliosis of variable degree was found in all cases, which is often associated with mutations in this gene. Found as heterozygous in ten unaffected family members and in five in-house controls.

Likely pathogenic

ARHGEF10 c.1013G>C p.Arg338Thr, 257, CMT2. Novel variant, highly conserved, predicted benign but extensive change in amino acid physiochemical properties. Sporadic case with CMT2 and decreased NCV. Close proximity to another heterozygous variant (Thr332Ile) associated with decreased NCV and thin myelination [39]. Functional studies show that the Thr332Ile mutant stimulates increased actomyosin contraction, regulating cell morphology in Schwann cells [40]. Classified likely pathogenic due to similar phenotype, NCV in the same range, and close proximity to the previously reported variant.

[...] CMT, CMT2. Totally conserved, predicted pathogenic. The pathogenicity of this variant has been questioned due to extreme phenotypic diversity including neuropathy and also low penetrance in affected families. But in support of its pathogenicity, found in 19 patients and not in 1000 controls (including our results), totally conserved, and studies of fibroblasts carrying this variant show abnormalities of nuclear shape [44]. Found in the ESP database among 14 individuals (0.1%), but included in this database are also the affected carrying traits associated with LMNA mutations. Digenic inheritance with another variant, which is observed in three reported cases, may explain the phenotypic diversity and nonpenetrance [44].

[...] [45]. Additionally the p.Trp748Ser variant is reported to cause neurodegenerative disorders with ataxia in three patients and dHMN in five patients as compound heterozygote [22,46]. A patient in our clinic with similar phenotype presented with the same two variants, not seen in in-house controls or in SNP databases.

[...] REEP1 SETX [...]
The diagnostic benefit of NGS targeted sequencing has been highlighted in other heterogeneous diseases such as cardiomyopathies and epilepsy [51,52]. The technical quality of NGS targeted sequencing has previously been questioned in relation to clinical diagnostics, but quality has since improved; the study on cardiomyopathies and ours are two examples indicating that NGS targeted sequencing is ready for clinical diagnosis [52].
Exome sequencing is another NGS approach, where every exon in the genome is sequenced. It has been frequently applied on rare Mendelian disorders as well as on some CMT patients [42,53]. Targeted sequencing as compared to exome sequencing shows higher technical performance, increased capacity per run (192 versus 12 samples in our laboratory), easier data analysis, lower cost of data storage, fewer problems with incidental findings, and lower cost (approximately €175 (500x coverage) versus €1165 (70x coverage) in our laboratory). Furthermore, it is easier to adopt in small laboratories. However, exome sequencing can discover new disease genes, whereas targeted sequencing can do so only if the gene panel is expanded. In families with unknown CMT genotype, exome sequencing could be beneficial as a next step towards a genetic diagnosis.
Precise classification of variants with exclusion of nonpathogenic and inclusion of pathogenic variants is extremely important in a clinical setting. Stringent criteria were applied in order to avoid misclassification. The analysis of the 179 controls ensured that ethnically specific normal variants were excluded. Detailed clinical data and family history were necessary for matching genetic data with the phenotype. A limitation of the interpretation of novel variants detected in this study is that no functional tests have been performed, but currently this is rarely available as part of routine genetic testing and is beyond the scope of our present study.

Genotype-Phenotype Correlations.
Among the seven certain pathogenic variants, four families were homozygous for the SH3TC2 Arg954* mutation, previously reported in several populations [6]. The prevalence of 5% (4/81) shows that SH3TC2 should not be considered an unusual CMT gene in Northern Europe.
Eight CMT families had variants classified likely pathogenic. Phenotypically certain CMT families had pathogenic variants in the non-CMT neuropathy genes. ARHGEF10 has been associated with slow NCV [39]. CMT2 and dHMN phenotypes have previously been reported for patients with POLG variants [21,22]. SETX variants are associated with dHMN among other phenotypes, but this patient had a neurophysiological CMT2 phenotype. In a previous study on the same material, the affected in a large CMT2 family carried a certain pathogenic variant in SOD1, a gene usually associated with ALS [19]. The identification of certain and likely pathogenic variants in non-CMT genes and in CMT genes regarded as unusual is especially important in the CMT2 families, where an accurate molecular diagnosis has often been lacking. It could also be speculated whether these genes might be more common than first thought but have been considered unusual due to lack of routine analysis.
Among the 15 uncertain pathogenic variants, nine families had heterozygous variants in GAN, MTMR2, and SH3TC2, genes usually associated with autosomal recessive inheritance, although dominant inheritance has also been reported in a few cases, often related to milder phenotypes [6]. In our cases the variants were predicted pathogenic and the phenotype matched previous reports for these genes, except for one variant in MTMR2. Further analysis is required in order to establish whether the heterozygous state can cause a mild phenotype.
The identification of dual pathology is important to increase our knowledge of the interplay between different gene variants. Dual pathology was observed in six of our CMT families. One sporadic case with CMT2 and spasticity had likely pathogenic variants in SETX and REEP1; we assume the SETX variant causes CMT2 and the REEP1 variant causes spasticity. Thus, in this case we do not consider REEP1 a CMT-causing gene. The pathogenicity of the LMNA variant observed in two CMT families has been questioned due to extreme phenotypic diversity and low penetrance in affected families [44]. It is speculated that the pathogenic effect of this variant might be due to digenic inheritance with another variant [44]; intriguingly this was observed in both our cases, with variants in ARHGEF10 and DCTN1. LMNA and ARHGEF10 are both involved in myelination and cell morphology [39,54]. LMNA and DCTN1 are situated in the same pathway, activation of the transcription factor XBP1, which has been associated with neuron differentiation [55].

Research in Context.
A comparison of our results with the prevalence of identified CMT point mutations in four large clinic populations of affected individuals from Japan, Spain, United Kingdom, and USA is shown in Figure 3 [9,10,12,13].
After exclusion of the PMP22 duplication, point mutations were detected in 36% of our, the Japanese, and the British CMT1 patients, 66% of the American CMT1 patients, and 79% of the Spanish CMT1 patients [9,10,12,13]. Point mutations in GJB1 or MPZ, the two genes most frequently mutated in CMT1, were identified in 44% of the American patients, 60% of the Spanish patients, and only 14-22% of the patients in the other three studies. We identified a higher percentage of pathogenic CMT2 variants than the British and Japanese studies but lower than the American and Spanish studies [9,10,12,13]. GJB1, MPZ, and MFN2 variants, the most common causes of CMT2, accounted for 12% in our study and 18, 20, 28, and 34% in the British, Japanese, Spanish, and American studies, respectively. Apart from the GJB1, MPZ, and MFN2 genes, variants were identified in 24% of our CMT2 families and in 35% of the Spanish patients but only in 1, 6, and 7% of the Japanese, American, and British CMT2 patients, respectively. The high yield of identified "uncommon" CMT2 variants in our study and the Spanish study is most likely due to analysis of almost all CMT2 genes and the additional 19 neuropathy genes in our study. In the Spanish CMT2 patients, 26% had variants in GDAP1, also accounting for the high yield. In another study from northern Norway, CMT patients were analysed for the PMP22 duplication and point mutations in seven genes (EGR2, GJB1, LITAF, MPZ, MFN2, NEFL, and PMP22); a genetic diagnosis was established in only 17% of the patients [11]. These results together with ours indicate that other genotypes might be more common in Norway. At least part of the gene frequency differences is likely due to geographical differences, while different ascertainment might also affect the results. This emphasizes the difficulty of establishing a common sequential testing scheme for a rational diagnosis of CMT and highlights the benefits of NGS targeted analysis.
Why do half of the CMT families lack a molecular diagnosis in our study? Several CMT genes are still to be identified, and there might be unidentified founder variants in the Norwegian population. After the analysis of these families, more than ten new genes have rapidly been identified, mainly due to exome sequencing; these may account for a few unidentified cases [5,6,8]. Small tandem repeats, copy-number variations, mutations in regulatory elements distant from the gene, or cellular changes other than mutations in genomic DNA might be relevant in some cases. Dual pathology is easy to overlook. We also applied stringent criteria for the classification of variants; all heterozygous variants in autosomal recessive genes were classified as uncertain or unlikely pathogenic and variants with prevalence ≥ 0.1% in 1000 genomes or dbSNP135 were removed; thus some of these might be pathogenic. Clinical misclassification cannot be ruled out but it probably explains only a minority of the cases, since pathogenic variants were identified equally frequently in familial and sporadic cases.

Conclusion
A sequential testing scheme is useful for the PMP22 duplication as an initial step in CMT1; otherwise it is advantageous to start with NGS targeted sequencing.
Insight into the pathological mechanisms caused by mutations in CMT genes has prompted promising reports of specific targeted treatments. Examples are treatment with HDAC6 inhibitors in HSPB1 mutant mice, restoring axonal transport defects [16], and treatment with curcumin improving outcome of neuropathy in MPZ mutant mice [56]. Specific treatments require a precise genetic diagnosis. The NGS technology has now become a robust and powerful tool with high technical quality, delivering increased diagnostic precision at a low cost. The NGS technology is likely to change clinical practice in complex diseases over the next years.