Genome- and Exome-Wide Association Studies Revealed Candidate Genes Associated with DaTscan Imaging Features

Introduction. Despite remarkable progress in identifying Parkinson's disease (PD) genetic risk loci, the genetic basis of PD remains largely unknown. Using the endophenotype approach and data from dopamine transporter single-photon emission computerized tomography (DaTscan), we identified genes potentially involved in PD. Method. We conducted an imaging genetic study by performing an exome-wide association study (EWAS) and a genome-wide association study (GWAS) on the specific binding ratio (SBR) of six DaTscan anatomical areas in 489 and 559 subjects of the Parkinson's Progression Markers Initiative (PPMI) cohort, using 83,623 and 36,845 single-nucleotide polymorphisms (SNPs)/insertion-deletion mutations (INDELs), respectively. We also investigated the association of the cerebrospinal fluid (CSF) protein concentration of our significant genes with PD progression using PPMI CSF proteome data. Results. Among 83,623 SNPs/INDELs in EWAS, one SNP (rs201465075) on the 1q32.1 locus was significantly (P value = 4.03 × 10⁻⁷) associated with left caudate DaTscan SBR, and 33 SNPs were suggestive. Among 36,845 SNPs in GWAS, one SNP (rs12450112) on the 17p12 locus was significantly (P value = 1.34 × 10⁻⁶) associated with right anterior putamen DaTscan SBR, and 39 SNPs were suggestive, of which 8 were intergenic. We found that rs201465075 and rs12450112 are most likely related to the IGFN1 and MAP2K4 genes, respectively. The protein level of MAP2K4 in the CSF was significantly associated with PD progression in the PPMI cohort; however, proteomic data were not available for IGFN1. Conclusion. We have shown that particular variants of the IGFN1 and MAP2K4 genes may be associated with PD. Since DaTscan imaging can be positive in other Parkinsonian syndromes, caution should be taken when interpreting our results. Future experimental studies are also needed to verify these findings.


Introduction
PD is a neurodegenerative disease affecting 8-18 per 100,000 individuals annually [1]. It is characterized by tremor, dyskinesia, and rigidity [2]. The diagnosis of PD is based on clinical findings, and patients usually experience a prodromal phase before the definite diagnosis [3]. The prodromal phase mainly manifests with nonmotor symptoms such as constipation, hyposmia, REM-sleep behavior disorder, depression, anxiety, and cognitive impairment [3]. The pathological hallmark of PD is the aggregation of misfolded α-synuclein in inclusions known as Lewy bodies, resulting in the loss of dopaminergic neurons in the substantia nigra [4]. The diagnosis of PD is not usually made until at least 30% of dopaminergic neurons and 50-60% of their axon terminals in the substantia nigra are lost [5].
The exact etiology of PD is not yet fully understood; however, a higher incidence of PD in monozygotic compared to dizygotic twins and in people with a family history of PD than in those without a family history of PD highlights the prominent role of genetics in PD [6,7]. Genetic studies have revealed several monogenic causes of PD, such as SNCA, PRKN, LRRK2, PINK1, DJ-1, VPS35, and GBA [8]. Recently, GWAS and EWAS have indicated the involvement of numerous genetic polymorphisms in PD [9]. Despite all these findings, the overall estimated heritability of PD is about 60%, and the known genetic variants do not account for all of it [10,11].
An alternative approach to better identify the susceptibility genes for complex traits is the endophenotype approach [12]. An endophenotype refers to a biological or psychological feature of a disease believed to be in the causal chain between genetic backgrounds and diagnosable symptoms of the disease [13]. Neuroimaging endophenotypes have been widely used in neurological disorders such as PD. Neuroimaging endophenotypes provide quantitative measures of the brain structure or function that index genetic liability for a neurological condition [14]. In the substantia nigra of patients with PD, these endophenotypes can appear as reduced fractional anisotropy in diffusion tensor imaging, increased echogenicity in transcranial sonography, and decreased signal in the dorsolateral segment in magnetic resonance imaging (MRI) [15]. However, more specific and sensitive neuroimaging tools are needed to assess brain changes in PD.
Dopamine transporter single-photon emission computerized tomography (DaTscan) is a relatively new imaging modality in PD. DaTscan utilizes ioflupane I-123 injection to visualize the striatal dopamine transporters [16]. Ioflupane I-123 is derived from cocaine compounds and binds to the dopamine active transporter (DAT), thereby providing a specific binding ratio (SBR) [17]. Due to its high sensitivity (84.4%) and specificity (96.2%), DaTscan has been used to diagnose PD and differentiate it from other neurodegenerative disorders in probable cases [18]. It also plays a role in detecting the preclinical phase of PD, as a longitudinal study has shown that SBR decline begins before the manifestation of motor symptoms and continues during disease progression [19].
Given the information above, we hypothesized that a genetic association study using neuroimaging findings from DaTscan as endophenotypes can reveal the genetic basis of PD more accurately than case-control studies. Moreover, we aimed to conduct our genomic analyses on a mixed population (PD cases, prodromal patients, and healthy controls) because we believed that increasing the phenotypic variance increases the power of our study. Compared with noncoding variants, exome variants can better prioritize the putative causal genes in PD [20]. As a result, we aimed to explore the possible associations between striatal DaTscan findings and SNPs by conducting EWAS and GWAS on data from the Parkinson's Progression Markers Initiative (PPMI) cohort.

Study Design.
We used data from the PPMI cohort in our study. In brief, PPMI is a longitudinal, observational, multicenter study. The primary goal of the PPMI cohort was to assess clinical and neuroimaging features and genetic and biological markers across all stages of PD.

Participants.
We used the ppmi_wes_645_cohort_vcf.tar, PPMI_NEUROX_Nov11th2013.zip, DaTScan_Analysis.csv, Age_at_visit.csv, and Demographics.csv files, which contained whole-exome genetic polymorphism data of 645 subjects, whole-genome SNP data of 619 subjects, six neuroimaging features from DaTscan in 2930 visits of participants, and participants' age and demographic information at their screening visits. We chose the screening visits of PPMI cohort patients for our study because, based on the inclusion and exclusion criteria of the PPMI cohort, the DaTscan values at screening were least confounded by disease progression or anti-Parkinson's drugs.

Quality Control.
We performed several stages of quality control on the whole-exome sequence data before performing EWAS. In summary, exome sequencing was performed on whole blood-derived DNA samples following the PPMI Research Biomarkers Laboratory Manual using the Illumina Nextera Rapid Capture Expanded Exome kit in 2015. Nextera Expanded Exome targets 201,121 exons, UTRs, and miRNA and covers 95.3% of the RefSeq exome. More details about the exome data in the vcf file can be found on the cohort website. There were data on 707,050 genetic polymorphisms, including SNPs and INDELs, of 645 subjects. We first converted the .vcf file to PLINK binary files with the bed suffix using PLINK 1.9 software (https://www.cog-genomics.org/plink/), which was also used for quality control of the exomic data. All SNPs on the sex chromosomes were excluded. SNPs with a minor allele frequency (MAF) of less than 5%, a genotype call rate of less than 95% (that is, more than 5% missing genotypes), or a Hardy-Weinberg equilibrium (HWE) P value of less than 0.000001 were excluded. Participants with a genotype call rate of less than 95% or a heterozygosity rate more than three standard deviations from the mean value were excluded. In the next step, the exomic data satisfying the quality control were merged with the neuroimaging features at screening visits and the demographic characteristics.
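The filters described above can be sketched as follows. This is a minimal illustration, not the authors' actual command line: the file prefixes are hypothetical, the choice to run all filters in a single PLINK call is an assumption, and the heterozygosity rates are assumed to have been computed separately (for example from PLINK's `--het` output). The flags used (`--autosome`, `--maf`, `--geno`, `--mind`, `--hwe`, `--make-bed`) are standard PLINK 1.9 options.

```python
from statistics import mean, stdev
from typing import List, Sequence


def plink_qc_command(in_prefix: str, out_prefix: str,
                     maf: float = 0.05, max_missing: float = 0.05,
                     hwe_p: float = 1e-6) -> List[str]:
    """Build a PLINK 1.9 argument list for the variant and sample filters.

    in_prefix and out_prefix are hypothetical file prefixes.
    """
    return [
        "plink",
        "--bfile", in_prefix,         # PLINK binary fileset (.bed/.bim/.fam)
        "--autosome",                 # drop sex-chromosome and unplaced variants
        "--maf", str(maf),            # exclude variants with MAF < 5%
        "--geno", str(max_missing),   # exclude variants with > 5% missing calls
        "--mind", str(max_missing),   # exclude samples with > 5% missing calls
        "--hwe", str(hwe_p),          # exclude variants with HWE P < 1e-6
        "--make-bed",
        "--out", out_prefix,
    ]


def het_outliers(het_rates: Sequence[float], n_sd: float = 3.0) -> List[int]:
    """Return indices of samples whose heterozygosity rate lies more than
    n_sd sample standard deviations from the mean."""
    mu, sd = mean(het_rates), stdev(het_rates)
    return [i for i, h in enumerate(het_rates) if abs(h - mu) > n_sd * sd]
```

The argument list can be passed to `subprocess.run` on a machine where PLINK is installed; samples flagged by `het_outliers` would then be removed in a follow-up step.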
We used the NeuroX SNP data of the original cohort to perform GWAS. In summary, SNP genotyping was performed using the Illumina NeuroX array. The NeuroX array is an Illumina Infinium iSelect HD custom genotyping array containing 267,607 Illumina standard content exomic variants and 24,706 custom variants designed for studying neurological diseases. There were data on 267,607 SNPs from 619 subjects. All quality control steps were performed as for the exomic data. Finally, the genomic data satisfying the quality control were merged with the data from DaTscan.

EWAS and GWAS.
We used GCTA 1.94.1 software (https://yanglab.westlake.edu.cn/software/gcta/#Download) to perform EWAS and GWAS. We used a mixed linear model with the --mlma command, which corrects for genetic relatedness and population structure (ethnicity) as confounding factors in genome-wide association analyses. The details of the GCTA software and mixed model association methods have been described elsewhere [22,23]. The mathematical equation of this statistical model is y = a + bx + g + e, where y is the phenotype (DaTscan features in our study), a is the mean term, b is the additive effect (fixed effect) of the candidate SNP to be tested for association, and x is the SNP genotype indicator variable coded as 0, 1, or 2. g is the polygenic effect (random effect), i.e., the accumulated effect of all SNPs (as captured by the genetic relatedness matrix (GRM) calculated using all SNPs), and e is the residual. We used age and gender as covariates. Using the --mlma-no-preadj-covar option, covariates were fitted together with the SNP for the association test. For ease of computation, the genetic variance, var(g), was estimated based on the null model, i.e., y = a + g + e, and then fixed while testing for the association between each SNP and the trait. Using the Bonferroni correction method, the suggestive and significant thresholds for P values were calculated based on the number of SNPs/INDELs used in our study. After performing EWAS and GWAS on six neuroimaging endophenotypes, we computed the λ-statistic to evaluate the degree of genomic inflation, an indicator of residual population stratification. Thereafter, we plotted the observed versus expected P values in Q-Q plots using the qqman R package in R software version 4.2.2. We also used the qqman R package to generate the GWAS Manhattan plots.
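As a rough illustration of the two summary quantities mentioned above, the sketch below computes a Bonferroni significance threshold and a genomic inflation factor. It assumes a family-wise α of 0.05 and the usual definition of λ as the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.4549); neither detail is spelled out in the text, and the function names are illustrative only.

```python
from statistics import NormalDist, median
from typing import Sequence

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Genomic inflation factor (lambda) from two-sided P values in (0, 1]."""
    nd = NormalDist()
    # A two-sided P value p corresponds to |z| = -inv_cdf(p / 2);
    # z squared is the 1-df chi-square statistic.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


exome_threshold = bonferroni_threshold(83_623)   # about 5.98e-7
neurox_threshold = bonferroni_threshold(36_845)  # about 1.36e-6
```

With 83,623 and 36,845 tests, α = 0.05 gives thresholds that agree with the reported values up to rounding in the last digit. A λ close to 1 indicates little inflation of the test statistics.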

Variant Annotation and Functional Fine Mapping.
The positional gene mapping for all suggestive and significant SNPs/INDELs was performed using the UCSC genome browser website (https://genome.ucsc.edu/) based on the GRCh37/hg19 version. Using the Genotype-Tissue Expression (GTEx) version 8 (V8) data from the GTEx website (https://gtexportal.org/home/index.html), top genes with a significant eQTL P value related to the SNPs/INDELs in 3 tissues (caudate nucleus, putamen nucleus, and substantia nigra) were identified for functional annotation. Finally, information about the association of identified genes with PD or PD-related phenotypes was evaluated on the GWAS catalog website (https://www.ebi.ac.uk/gwas/) and the Genecards website (https://www.genecards.org/) to identify novel candidate genes potentially associated with PD.

Replication of Significant Genes with CSF Proteome Data.
In order to confirm the potential role of statistically significant genes identified in EWAS and GWAS, we investigated whether the CSF concentration of proteins produced by these genes is associated with PD progression. We used ordinal regression with PD status as the response variable (control = 0, prodromal = 1, and PD = 2) and age and gender as covariates. P values less than 0.05 were considered statistically significant.
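To make the structure of such a model concrete, the sketch below evaluates category probabilities under a proportional-odds (cumulative logit) model, which is one common form of ordinal regression; the text does not say which link function was used, so this choice is an assumption. The coefficients and cut-points in the usage note are hypothetical values chosen for illustration, not fitted estimates.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(x: Sequence[float], beta: Sequence[float],
                  cutpoints: Sequence[float]) -> List[float]:
    """Category probabilities under a proportional-odds model.

    P(Y <= k | x) = sigmoid(cutpoints[k] - x . beta), with cutpoints in
    increasing order. Returns [P(Y = 0), P(Y = 1), ..., P(Y = K)].
    """
    eta = sum(b * v for b, v in zip(beta, x))
    cumulative = [_sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs
```

For example, with predictors ordered as (protein level, age, sex), `ordinal_probs([7.9, 61.0, 1.0], [0.4, 0.01, 0.1], [3.0, 4.5])` returns the probabilities of control, prodromal, and PD status for one hypothetical subject. A positive coefficient on the protein level shifts probability mass toward the PD category.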

CSF Proteome Data.
Proteomics data from the CSF of patients with PD and healthy volunteers were measured using the SOMAscan platform. The data were quality controlled by removing outlier samples, calibrators, buffer, and nonhuman SOMAmers. The measured values were hybridization normalized, plate scaled, median normalized intraplate, and calibrated at SomaLogic's side, then log2 transformed, median normalized interplates, and batch corrected at the plate level. More details about the quality control steps of proteomic data can be found on the PPMI website.

Results
In the exomic data, 557,019 SNPs/INDELs with MAF of less than 5%, 60,100 SNPs/INDELs with a genotype call rate of less than 95%, 6,212 SNPs/INDELs with a Hardy-Weinberg P value of less than 0.000001, and 96 SNPs/INDELs on sex chromosomes were excluded. In addition, 95 subjects with a genotype call rate of less than 95% and 13 participants with a heterozygosity rate more than three standard deviations from the mean were excluded. Finally, the genetic data of 537 participants and 83,623 exomic SNPs/INDELs passed all steps of quality control. Among the 537 participants, data from DaTscan and demographic characteristics were available for 489 participants. Among the 489 participants, there were 124 healthy subjects, 317 patients with PD, and 48 subjects from the SWEDD cohort (patients with symptoms of PD and normal DaTscan). The mean age of participants was 61.23 ± 10.08 years. Finally, EWAS for six neuroimaging endophenotypes was performed on 489 subjects (318 males and 171 females). Based on the Bonferroni correction method, the statistically significant and suggestive threshold in our EWAS were P value <5.97 × 10
In the NeuroX genotyping data, 217,443 SNPs with MAF of less than 5%, 12,461 SNPs with a genotype call rate of less than 95%, 191 SNPs with a Hardy-Weinberg P value of less than 0.000001, and 667 SNPs on sex chromosomes were excluded. In addition, 13 subjects with a genotype call rate of less than 95% and 2 participants with a heterozygosity rate more than three standard deviations from the mean were excluded. Finally, the genetic data of 606 participants and 36,845 SNPs passed all steps of quality control. Among the 606 participants, data from DaTscan and demographic characteristics were available for 559 participants. Among the 559 participants, there were 148 healthy subjects, 359 patients with PD, and 52 subjects from the SWEDD cohort (patients with symptoms of PD and normal DaTscan). The mean age of participants was 61.06 ± 10.34 years. Finally, GWAS for six neuroimaging endophenotypes was performed on 559 subjects (365 males and 194 females). Based on the Bonferroni correction method, the statistically significant and suggestive thresholds in our GWAS were P value < 1.35 × 10⁻⁶ and 1.35 × 10⁻⁶ < P value < 2.71 × 10⁻⁴, respectively. The λ-statistic indicating the degree of genomic inflation was low in our six analyses. Manhattan plots and Q-Q plots of the six endophenotypes are shown in Supplementary Material 3.
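The variant counts reported for the quality control steps can be cross-checked with simple bookkeeping. The sketch below assumes that each variant or subject was counted under exactly one exclusion criterion, which the text does not state explicitly; all numbers are taken from the Quality Control and Results sections.

```python
def remaining(total: int, *excluded: int) -> int:
    """Items left after subtracting disjoint exclusion counts."""
    return total - sum(excluded)


# Exome data: MAF, call rate, HWE, and sex-chromosome exclusions.
exome_variants = remaining(707_050, 557_019, 60_100, 6_212, 96)
# Exome subjects: call-rate and heterozygosity exclusions.
exome_subjects = remaining(645, 95, 13)
# NeuroX data: MAF, call rate, HWE, and sex-chromosome exclusions.
neurox_variants = remaining(267_607, 217_443, 12_461, 191, 667)

print(exome_variants, exome_subjects, neurox_variants)  # 83623 537 36845
```

Under that assumption, the totals reproduce the 83,623 exomic SNPs/INDELs, 537 exome participants, and 36,845 NeuroX SNPs reported as passing quality control.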
Among 83,623 SNPs/INDELs in EWAS, one SNP (rs201465075) reached the statistically significant threshold (P value = 4.03 × 10⁻⁷) for association with left caudate DaTscan SBR, and 33 SNPs were considered suggestive. Positional gene mapping identified 30 candidate genes associated with the DaTscan features. The results for all SNPs with their mapped genes are shown in Table 1.
Among 36,845 SNPs in GWAS, one SNP (rs12450112) reached the statistically significant threshold (P value = 1.34 × 10⁻⁶) for association with right anterior putamen DaTscan SBR, and 39 SNPs were suggestive, of which 8 were intergenic. To perform positional gene mapping for intergenic SNPs, the two nearest genes on the left and right sides of the SNP were reported. The complete results of GWAS are shown in Table 2.

CSF Proteome Analysis.
One of our 2 significant SNPs was mapped to the IGFN1 gene. The other significant SNP was near the MAP2K4 gene. We hypothesized that the concentration of the protein products of these 2 genes in CSF may be associated with PD progression. Only the CSF concentration of MAP2K4 protein was available in the PPMI CSF proteome dataset. The CSF concentration of MAP2K4 and demographic information were extracted for 1,156 subjects. The mean age of subjects was 61.64 ± 9.32 years. There were 637 males and 517 females. There were 185 healthy subjects, 354 patients in the prodromal phase of PD, and 617 patients with PD in our analysis. The mean CSF concentrations of MAP2K4 among healthy subjects, patients in the prodromal phase of PD, and patients with PD were 7.82 ± 0.49, 7.79 ± 0.53, and 7.95 ± 0.59, respectively. The mean CSF concentration of MAP2K4 among all subjects was 7.88 ± 0.56. There was a statistically significant association between the concentration of MAP2K4 in CSF and PD progression across the PD spectrum (P value = 0.001).

Discussion
Using data from the PPMI cohort, we performed EWAS and GWAS and identified two loci, 1q32.1 and 17p12, at the IGFN1 gene and near the MAP2K4 gene, with a significant association with striatum DaTscan SBR, which may have implications in PD pathology. We also found several suggestive genes (Table 4) associated with DaTscan SBR. Furthermore, the CSF proteome analysis showed that increased CSF concentration of MAP2K4 is associated with PD progression.
Based on The Human Protein Atlas, immunoglobulin-like and fibronectin type III domain containing 1 (IGFN1) is highly expressed in muscular tissue and has low expression in the central nervous system (CNS) and basal ganglia [24]. IGFN1 is essential for myoblast fusion and differentiation [25]. IGFN1 is upregulated during muscle denervation and interacts with eukaryotic translation elongation factor 1A (eEF1A), thereby downregulating protein synthesis during muscle denervation [26]. eEF1A2 knockdown has been associated with impaired autophagy, mitochondrial dysfunction, α-synuclein deposition, and apoptosis in the 1-methyl-4-phenylpyridinium ion (MPP⁺)-induced cellular model of PD [76]. Giri et al. indicated that loss of function in the IGFN1 gene is significantly associated with Parkinson's disease [27]. Nikonova et al. compared differentially expressed genes between kinase hyperactive G2019S transgenic mice and mice with knockout of leucine-rich repeat kinase 2 (LRRK2), a common genetic cause of both autosomal dominant familial and sporadic PD. They indicated that IGFN1 was one of the differentially expressed genes between LRRK2 knockout mice and kinase hyperactive G2019S transgenic mice [77]. Lavin et al. found that rehabilitative training enhances IGFN1 expression in the skeletal muscle of patients with PD; however, data are scarce about its status in the CNS [78]. In our analysis, rs201465075 was significantly associated with left caudate SBR. The rs201465075 variant is a missense variant in the IGFN1 gene and may alter the IGFN1 protein structure. There are no data about the effect of this SNP on the expression of nearby genes.

The MAP2K4 gene, also known as MKK4, encodes mitogen-activated protein kinase kinase 4, a member of the mitogen-activated protein kinase (MAPK) signaling cascade [79]. This family of proteins is an integration point for intracellular signaling pathways and has been implicated in various cellular processes such as proliferation, differentiation, and transcription regulation [80]. The MAP2K4 gene has been previously implicated in the pathogenesis of Alzheimer's disease [81]. Chen et al. showed that the G2019S mutation of LRRK2 enhanced its kinase activity and led to overphosphorylation of MAP2K4 [82]. LRRK2-mediated overphosphorylation of MAP2K4 upregulated and activated proapoptotic factors such as Fas ligand, caspase-9, caspase-8, and caspase-3 in the dopaminergic neurons of the substantia nigra in transgenic mice, resulting in apoptotic neuronal death [82]. However, there were no subjects with an LRRK2 mutation in the GWAS and EWAS samples.
It was also observed that MAP2K4 activated c-Jun N-terminal kinase (JNK)-mediated apoptosis in the MPP⁺-induced cellular model of PD [83]. Consistently, inhibition of JNK attenuated MAP2K4-mediated neuronal death [83,84]. Interestingly, Shakespear et al. found that miR-200a-3p targets the 3′-UTR of MAP2K4 (MKK4) mRNA and downregulates MAP2K4 mRNA and protein expression, thereby preventing apoptosis of MPP⁺-treated SH-SY5Y cells [85]. In our study, rs12450112 near the MAP2K4 gene was significantly associated with right anterior putamen SBR. Due to the lack of data about the effect of this SNP on gene expression, we hypothesized that MAP2K4 is the potential causal gene involved in PD pathology. Interestingly, CSF proteome analysis revealed that the CSF concentration of MAP2K4 increases across the PD spectrum. Therefore, MAP2K4 may be a potential causal gene in PD pathology. However, there are other genes near our significant SNP, such as myocardin (MYOCD), zinc finger protein 18 (ZNF18), dynein axonemal heavy chain 9 (DNAH9), and Rho GTPase activating protein 44 (ARHGAP44), and our significant SNP may also affect the expression level of these genes. Thus, further studies such as single cell-based integration of omic data or CRISPR-based experimental methods are needed to link our significant SNP to its target genes.
Our study has several limitations. First, there are no data regarding the effect of our two significant SNPs on the expression or protein structure of related genes: bioinformatics databases offer no compelling evidence of an effect on the expression level of nearby genes, and no eQTLs were associated with our two significant SNPs in external studies. Second, data on the protein level of IGFN1 do not exist in the PPMI cohort database. Third, DaTscan can be positive in various types of Parkinsonian syndromes; therefore, the significant genes identified in our analyses may be associated with other forms of Parkinsonian syndromes rather than PD. Finally, although using a mixed population increased the power of our study for detecting significant SNPs, our approach has important limitations due to the nonhomogeneity of our samples with respect to the disease status of subjects.

Table 4 row: GRM7; Involved in glutamatergic neurotransmission; Improving motor dysfunction in PD; High; [74,75]

Conclusion
Analyzing data from the PPMI cohort, EWAS and GWAS identified two potential genes, IGFN1 and MAP2K4, with a significant association with DaTscan SBR and PD. Furthermore, we found several suggestive genes (Table 4) for PD by EWAS and GWAS. Moreover, the CSF proteome data showed that the CSF concentration of MAP2K4 is associated with PD progression. Larger sample sizes should be used in future studies. The exact role of our significant SNPs and genes should be investigated in experimental and animal studies.
coated vesicle trafficking and α-synuclein overexpression Maybe, by regulating vesicle trafficking and α-synuclein expression High and WNT/catenin signaling pathway Inhibiting α-synuclein removal and contributing to receptor and ion channel α-Synuclein removal and prevention of neurodegeneration High [72,73]

Table 3: Top eQTLs in one of three PD-related tissues (substantia nigra, caudate nucleus, and putamen nucleus) with their positional and functional mapped genes.

Table 4: Some of the suggestive genes with their possible roles in Parkinson's disease.