Parkinson's Disease in Relation to Pesticide Exposure and Nuclear Encoded Mitochondrial Complex I Gene Variants

Parkinson's disease (PD) is a common age-related neurodegenerative disorder thought to result from the integrated effects of genetic background and exposure to neuronal toxins. Certain individual nuclear-encoded mitochondrial complex I gene polymorphisms were found to be associated with ∼ 2-fold risk variation in an Australian case-control sample. We further characterized this sample of 306 cases and 321 controls to determine the mutual information contained in the 22 SNPs and, additionally, the level of pesticide exposure. Five distinct risk sets were identified using grade-of-membership analysis. Of these, one was robust to pesticide exposure (I), three were vulnerable (II, III, IV), and another (V) denoted low risk for unexposed persons. Risk for individual subjects varied > 16-fold according to their level of membership in the vulnerable groups. We conclude that inherited variation in mitochondrial complex I genes and pesticide exposure together modulate risk for PD.

One testable hypothesis is that PD risk is modulated by inherited sequence variation in complex I genes. Mellick et al [1] addressed this possibility by screening for single nucleotide polymorphisms (SNPs) in nuclear-encoded complex I genes. A total of 22 SNPs (16 genes) polymorphic among Australians were investigated in 306 PD patients and 321 control subjects. Statistically significant associations, with ∼ 2-fold variation in risk, were observed for NDUF genes A1, A10, A6, and S4 when taken individually. None of these associations would have survived correction for multiple comparisons. Although information on pesticide exposure, sex, and age at onset was available, it was not used in the association analysis because of the limited sample size and the large number of additional comparisons that would have been generated.
We extend this work by jointly investigating the 22 SNP genotypes found for these 306 PD patients and 321 control subjects, avoiding multiple comparisons, and considering level of pesticide exposure and age at onset. The goals were to identify combinations of alleles that are robust to pesticide exposure and others that are especially vulnerable, and to quantify risk for individuals. This was accomplished using grade-of-membership analysis (GoM).
Using GoM, two sets of parameters are estimated simultaneously by maximum likelihood (see "Methods"). One set represents a specified number of extreme pure type groups. Here, each of the five groups has distinct frequencies for the SNP genotypes, level of toxin exposure, and PD status according to age. The other set represents the degree of similarity of each subject to the groups. These graded membership scores range from zero, denoting no resemblance to the group, to one, denoting that the individual matches the group exactly, and sum to one for each person. The scores can be entered into logistic models to quantify disease risk with 95% confidence intervals (CIs). This approach has identified highly predictive sufficient genetic risk sets for Alzheimer's disease [18] and multilocus genotypes specific to breast cancer and fibroadenoma [19].
Five model-based groups relevant to PD status were identified. Of these, one set of complex I polymorphisms was robust to pesticide exposure (I), three sets were vulnerable (II, III, IV), and another (V) denoted low risk for unexposed persons. Risk for individuals varied > 16-fold according to level of membership in the vulnerable groups. We conclude that inherited variation in mitochondrial complex I genes and pesticide exposure together modulate risk for PD.

METHODS
The specific details of the case-control sample and the genotyping methods used have been reported previously [1].

Environmental exposures
A structured questionnaire was used to probe exposures to environmental toxins, including insecticides, herbicides, fungicides, solvents, and heavy metals [20]. This study, however, limited exposure assessment to self-reported exposure to pesticides (ie, insecticides, herbicides, and fungicides). Responses were coded as 0 = no exposure, 1 = limited exposure, and 2 = regular exposure (at least weekly for six months). Pesticide exposure was more common among men and among cases than controls (cases vs controls, men: 63% vs 55%; women: 46% vs 39%), especially regular exposure (men: 20% vs 9%; women: 5% vs 3%).

Genetic determinations
SNPs in nuclear genes that encode mitochondrial complex I proteins were identified from HGVbase as of July 2001 [1]. Of the 70 SNPs identified, the 22 that were polymorphic among 16 randomly selected healthy Australian subjects were investigated (Table 1). SNP determinations were made using the DASH method [21,22]. The major allele at each locus was coded "a" and the minor allele "b", yielding genotypes aa, ab, and bb. For genes having more than one SNP, multilocus genotypes were coded by joining the single-SNP genotypes, for example, aa:ab. Missing values were infrequent (0.018) and were ignored in the data analysis.
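The genotype coding described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, each SNP is assumed to be biallelic with two called alleles (the hemizygous "a-" notation that appears for some loci is not handled), and a gene-level genotype is treated as missing whenever any of its SNPs is missing.

```python
from typing import Optional, Sequence


def code_genotype(allele1: Optional[str], allele2: Optional[str], major: str) -> Optional[str]:
    """Code one SNP as 'aa', 'ab' or 'bb' relative to the major allele ('a').

    Returns None when either allele is missing.
    """
    if allele1 is None or allele2 is None:
        return None
    # Count minor alleles: 0 -> 'aa', 1 -> 'ab', 2 -> 'bb'.
    n_minor = int(allele1 != major) + int(allele2 != major)
    return ("aa", "ab", "bb")[n_minor]


def code_multilocus(genotypes: Sequence[Optional[str]]) -> Optional[str]:
    """Join per-SNP codes for one gene, e.g. ['aa', 'ab'] -> 'aa:ab'.

    Assumption: any missing SNP makes the gene-level genotype missing.
    """
    if any(g is None for g in genotypes):
        return None
    return ":".join(genotypes)
```

For example, `code_multilocus([code_genotype("G", "G", "G"), code_genotype("G", "A", "G")])` yields `'aa:ab'`.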

Table 1. Number, gene, and SNP (HGVbase ID) of the complex I SNPs investigated.

The data analytic approach
Detailed clinical genetic profiles were identified using grade-of-membership analysis (GoM) [23][24][25]. Case subjects were classified according to age (< 60, 60-69, 70+) and environmental exposure (0, 1, 2), and control subjects according to exposure regardless of age, that is, 12 categories in total (9 for cases and 3 for controls). GoM can be described after first identifying four indices. The first is the subject index $i = 1, 2, \ldots, I$; here, $I = 627$ subjects. The second is the variable index $j = 1, 2, \ldots, J$; here, $J = 17$ variables. The third is $l \in L_j$, the set of response levels for the $j$th variable. The fourth is the group index $k = 1, 2, \ldots, K$; here, $K = 5$. Each response is represented by a binary variable $y_{ijl}$ (ie, $y_{ijl} = 0, 1$) indicating whether the $i$th subject has level $l$ of the $j$th variable. With these definitions, the basic GoM model is
$$\Pr(y_{ijl} = 1) = \sum_{k=1}^{K} g_{ik}\,\lambda_{kjl},$$
where the $g_{ik}$ are convexly constrained membership scores for subjects ($0 \le g_{ik} \le 1$; $\sum_{k} g_{ik} = 1$) and the $\lambda_{kjl}$ are the probabilities that, for the $k$th latent group, level $l$ is found for the $j$th variable. The procedure thus uses this expression to identify $K$ profiles representing the pattern of responses over the levels $L_j$ of the $J$ variables found for the $I$ subjects.
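As a sketch of the coding and of the basic model expression, the Python below maps a subject to one of the 12 outcome categories and evaluates the GoM response probability, the membership-weighted sum over groups of the group-specific level probabilities. The numbering of the 12 categories and all function names are hypothetical; NumPy is assumed.

```python
from typing import Optional

import numpy as np


def age_band(age: float) -> int:
    """0: < 60, 1: 60-69, 2: 70+ (the age bands used for cases)."""
    if age < 60:
        return 0
    if age < 70:
        return 1
    return 2


def outcome_category(is_case: bool, exposure: int, age: Optional[float] = None) -> int:
    """Hypothetical numbering of the 12 outcome levels.

    0-8  : cases, 3 * age band + exposure (exposure coded 0, 1, 2)
    9-11 : controls, 9 + exposure (age not used)
    """
    if exposure not in (0, 1, 2):
        raise ValueError("exposure must be 0, 1 or 2")
    if is_case:
        if age is None:
            raise ValueError("cases need an age")
        return 3 * age_band(age) + exposure
    return 9 + exposure


def gom_response_prob(g_i: np.ndarray, lam_j: np.ndarray) -> np.ndarray:
    """Pr(y_ijl = 1) for every level l of variable j: sum_k g_ik * lambda_kjl.

    g_i   : (K,) membership scores of subject i, each in [0, 1], summing to 1.
    lam_j : (K, L_j) level probabilities of variable j within each group.
    """
    if np.any(g_i < 0) or np.any(g_i > 1) or not np.isclose(g_i.sum(), 1.0):
        raise ValueError("membership scores must be convex")
    return g_i @ lam_j
```

A subject with scores (0.25, 0.75) on two groups whose level profiles are (0.8, 0.2) and (0.4, 0.6) gets response probabilities (0.5, 0.5).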
E. H. Corder and G. D. Mellick

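The parameters $g_{ik}$ and $\lambda_{kjl}$ are estimated simultaneously by maximizing the likelihood function, which in its most basic form is
$$L = \prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{l \in L_j}\left(\sum_{k=1}^{K} g_{ik}\,\lambda_{kjl}\right)^{y_{ijl}}.$$
In the likelihood, $y_{ijl}$ is 1 if level $l$ of the $j$th variable is present and 0 if it is not. Decade of age provided the starting values.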
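The basic likelihood can be evaluated on the log scale as in the following sketch. It assumes NumPy and an illustrative array layout (one indicator matrix per variable), and it only evaluates the log-likelihood for given $g_{ik}$ and $\lambda_{kjl}$; the maximization performed by GoM software is not shown. Missing responses are represented as all-zero indicator rows, so they drop out, consistent with ignoring missing values.

```python
import numpy as np


def gom_log_likelihood(g, lams, ys):
    """log L = sum_i sum_j sum_l y_ijl * log(sum_k g_ik * lambda_kjl).

    g    : (I, K) membership scores, each row convex.
    lams : list over variables j of (K, L_j) arrays with rows summing to 1.
    ys   : list over variables j of (I, L_j) 0/1 indicator arrays; a missing
           response is an all-zero row and contributes nothing.
    """
    total = 0.0
    for lam_j, y_j in zip(lams, ys):
        p_j = g @ lam_j              # (I, L_j) model probabilities
        observed = y_j == 1          # only observed levels enter the sum
        total += float(np.sum(np.log(p_j[observed])))
    return total
```

With pure-type memberships (each row of `g` one-hot), the log-likelihood reduces to the sum of log group-specific frequencies of the observed levels.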
Information on sex was available to further characterize the groups. One option in the likelihood is to separate the calculations for "internal" (here, clinical and genetic) and "external" (here, sex) variables. For internal variables, maximum likelihood estimates (MLEs) of $g_{ik}$ and $\lambda_{kjl}$ are generated, and the information in these variables is used to define the K groups. For external variables, the likelihood is evaluated and MLEs of $\lambda_{kjl}$ are generated, but the information is not used to redefine the K groups; that is, the likelihood equations involving the $g_{ik}$ are disabled for external variables, so that the $g_{ik}$, and hence the definition of the K groups, are not changed.
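One way to see how an external variable can be fitted without altering the groups is to hold the $g_{ik}$ fixed and maximize the likelihood over $\lambda_{kl}$ for that variable alone. The sketch below does this with expectation-maximization (EM), a technique chosen here for illustration; the paper does not state which algorithm its GoM software uses, and the function name is hypothetical.

```python
import numpy as np


def fit_external_lambda(g, y, n_iter=200, seed=0):
    """Estimate lambda (K x L) for one external variable, holding g fixed.

    g : (I, K) membership scores; never modified, so the groups remain
        defined by the internal variables only.
    y : (I, L) 0/1 indicators with at most one 1 per row; all-zero rows are
        missing responses and are skipped.
    Maximizes sum_i sum_l y_il * log(sum_k g_ik * lambda_kl) over lambda
    using EM for a mixture with known per-subject mixing weights.
    """
    K = g.shape[1]
    L = y.shape[1]
    observed = y.sum(axis=1) > 0
    g_obs = g[observed]
    level = y[observed].argmax(axis=1)         # observed level per subject
    rng = np.random.default_rng(seed)
    lam = rng.dirichlet(np.ones(L), size=K)    # random start, rows sum to 1
    for _ in range(n_iter):
        # E-step: posterior weight of group k for each subject's response
        num = g_obs * lam[:, level].T          # (n_obs, K)
        resp = num / num.sum(axis=1, keepdims=True)
        # M-step: lambda_kl proportional to group k's weight at level l
        counts = np.zeros((K, L))
        for l in range(L):
            counts[:, l] = resp[level == l].sum(axis=0)
        totals = counts.sum(axis=1, keepdims=True)
        # Groups with no weight keep a uniform profile (avoids 0/0).
        lam = np.divide(counts, totals, out=np.full((K, L), 1.0 / L),
                        where=totals > 0)
    return lam
```

When every subject is a pure type, the estimate reduces to the observed level frequencies within each group, which is a convenient sanity check.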
Next, three age-specific logistic models (< 60, 60-69, 70+) were constructed to estimate the risk of PD according to membership in the vulnerable groups II, III, and IV. For this purpose, the graded membership scores were categorized in 0.20 increments from 1 (< 0.20 membership) to 5 (> 0.80 membership).
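The categorization of membership scores can be sketched with NumPy's `digitize`. Assumption: the bins are left-closed (0.20 falls into category 2 and 0.80 into category 5), since the text gives category 5 as "> 0.80" while Figure 1 uses 0.80-1.00. Using bin edges rather than dividing by 0.20 avoids floating-point surprises such as `0.6 / 0.2` evaluating slightly below 3. The function name is hypothetical.

```python
import numpy as np

# Left edges of categories 2..5; bins are left-closed (assumption).
EDGES = np.array([0.20, 0.40, 0.60, 0.80])


def membership_category(scores):
    """Code GoM membership scores into categories 1-5 in 0.20 steps.

    1: < 0.20, 2: 0.20-0.39, 3: 0.40-0.59, 4: 0.60-0.79, 5: 0.80-1.00.
    """
    s = np.asarray(scores, dtype=float)
    if np.any((s < 0) | (s > 1)):
        raise ValueError("membership scores must lie in [0, 1]")
    # np.digitize returns how many edges are <= each score (0..4).
    return np.digitize(s, EDGES) + 1
```

Each subject's category for groups II, III, and IV would then enter the age-specific logistic model as a predictor.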

RESULTS
Five GoM groups represent the data, displayed in Table 2. Group I was robust to toxin exposure. Groups II, III, and IV were vulnerable. Group V had limited toxin exposure and was at low risk. Each group had a distinctive set of SNP genotypes for nuclear genes that encode mitochondrial complex I subunits.

Robust to pesticide exposure (I)
Group I, at low risk for PD despite limited toxin exposure, carried a distinctive genetic signature of infrequent genotypes: X-linked A1 aa:ab:ab or bb:aa:aa, A6 bb, A8 bb:a-, A10 ab:aa, B4 ab, B8 ab or bb, B9 ab, S1 ab:aa, and S4 bb:ab. The group consisted predominantly of females (84% probability). Despite this protective signature, there was some chance of being affected at age 70 or older after regular exposure to toxins.

Early onset, regular exposure (II)
Members of group II were affected before age 60 and were vulnerable to regular pesticide exposure. The group's genetic signature consisted of A6 aa, A8 ab:ab, A10 aa:bb, S1 bb:a, the common S2 ab genotype also found for group I, and DLST bb.

Early onset PD, limited exposure (III)
Group III also had a high risk of PD with onset before age 60 (86%), at limited pesticide exposure (43%). This vulnerability was associated with A6 ab, B7 bb, and DLST aa. Homozygous V2 bb was found (22%) in this group only.

Low risk, no exposure (V)
Group V represents low risk for PD in the absence of pesticide exposure. Two SNP genotypes stand out as determinants: V2 ab (QRF = 1.47, the highest genetic influence score) and S4 aa:aa (QRF = 1.25). QRF stands for "question relevance score" and denotes the relative importance of a variable in determining the group.

Informative variables
No single genetic variable dominated. An information statistic H, related to the information measure introduced by Shannon at Bell Laboratories, was estimated for each variable; values close to zero indicate that the variable was not useful. Three of the four SNPs found statistically significant in chi-square analyses [1] were identified as highly informative (A6: H = 0.92; A10: H = 0.68; S4: H = 1.13). The fourth, A1, had limited heterozygosity and a low H score (0.19). Nonetheless, 8196b + 8197b distinguished robust group I from the other groups. Additional loci were highly informative: B7 bb was associated with risk (III) and B7 aa with protection (I, V) (H = 1.07); A8 5147 bb was protective (I); and 8968 ab was associated with risk for persons exposed to pesticides (II) (H = 0.84), among others shown in Table 1.
Figure 1 shows the membership distributions of case and control subjects in each age group (< 60, 60 to 69, 70+). Few subjects exactly matched the respective groups (N = 0, 1, 1, 3, and 1). Most had divided membership, that is, they carried SNP genotypes found for several of the extreme pure type groups shown in Table 1. Relatively few subjects had membership scores of 0.60 or higher (0.60-0.79: green; 0.80-1.00: gray). Nonetheless, several trends were apparent. Cases with onset before age 60 tended to resemble groups II and III more than control subjects did. Cases at ages 60 to 69 resembled group IV. Cases aged 70 and older did not disproportionately resemble groups II, III, and IV; instead, both cases and controls most strongly resembled group V, suggesting a possible survival advantage for this set of polymorphisms. Membership distributions for control subjects were similar across age groups.

RISK FOR PARKINSON'S DISEASE
The odds of PD for subjects in each age group were predicted by membership in groups II, III, and IV in logistic models (Figure 2). Early-onset PD was significantly predicted by groups II "early onset, regular exposure" and III "early onset, limited exposure" (OR (95% CI): 2.7 (1.8 to 4.0) and 4.0 (2.5 to 6.5), respectively). Note that the confidence limits do not include the null value of one, which would denote no change in risk. Hence, even limited resemblance to either group, that is, a membership score of 0.20-0.39 versus < 0.20, carries a statistically significant increase in risk. Each 0.20 increment multiplies the risk: successive increments in group II membership carry risks of 2.67, 7.13, and 19.0, and successive increments in group III membership carry risks of 4, 16, and 64. Higher levels of risk are predicted by, for example, 0.5 membership in each of groups II and III. Onset at ages 60 to 69 was significantly predicted by groups II "early onset, regular exposure" and IV "late onset, limited exposure" (OR (95% CI): 1.6 (1.1 to 2.3) and 2.63 (1.8 to 3.8), respectively). Successive increments in group II membership carry risks of 1.6, 2.6, and 4.0; successive increments in group IV membership carry risks of 2.63, 6.9, and 18.2. Higher levels of risk are predicted by, for example, 0.5 membership in each of groups II and IV.
The model was not predictive at ages 70 and older; the global null hypothesis that all parameter values were zero could not be rejected (P = .15).
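The successive risks quoted for ages < 60 and 60 to 69 follow from compounding the per-increment odds ratio. The sketch below assumes that the log-odds are linear in the membership category code, so that the OR for category c versus category 1 is OR^(c-1), and, for combined membership, that effects multiply with no interaction term. Both are assumptions about the model form, and the function names are hypothetical.

```python
def or_at_category(or_per_step: float, category: int) -> float:
    """OR versus category 1 (< 0.20 membership), assuming each 0.20 step
    multiplies the odds by the same factor (log-odds linear in category)."""
    if not 1 <= category <= 5:
        raise ValueError("category must be 1..5")
    return or_per_step ** (category - 1)


def joint_or(steps_and_categories):
    """Combined OR over several groups, assuming no interaction terms.

    steps_and_categories: iterable of (OR per 0.20 step, category) pairs.
    """
    result = 1.0
    for or_step, category in steps_and_categories:
        result *= or_at_category(or_step, category)
    return result
```

For example, `or_at_category(4.0, 4)` gives 64 and `or_at_category(2.67, 4)` gives about 19.0. Under the no-interaction assumption, 0.5 membership in each of groups II and III (category 3 for both) would give `joint_or([(2.67, 3), (4.0, 3)])`, about 114, an illustrative figure under these assumptions.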

DISCUSSION
The object of this study was the mutual information contained in multiple SNPs located in nuclear genes that encode mitochondrial complex I subunits, level of toxin exposure, and Parkinson's disease status according to age. A prior investigation of individual SNPs in the study sample [1] found relative risks of about two for PD associated with certain SNPs located in NDUF genes A1, A10, A6, and S4. This more inclusive analysis replicated these findings and yielded better estimates of risk in relation to the available information. Specifically, five model-based groups were identified that represented robustness to pesticide exposure (I), vulnerability to regular (II) and limited exposure (III, IV), and low risk in the absence of exposure (V). The robust group consisted predominantly of females and carried a set of less frequent alleles, including one on the X chromosome. The vulnerable groups differed according to age at onset (before age 60 for groups II and III; age 60 to 69 for group IV) and level of toxin exposure (limited or regular for group II; none or limited for groups III and IV). Even the low-risk groups I and V had some risk of PD at ages 70 and older. Thus, the joint analysis using GoM was more informative with respect to age, sex, and toxin exposure than straightforward association analysis. By avoiding the multiple comparisons that would have rendered each of the individual associations nonsignificant [1], this approach was able to estimate risk for individuals according to their level of membership in the vulnerable groups. Before age 60, statistically significant 3-fold and 4-fold elevations in risk were found for persons who had limited (0.20-0.39) resemblance to groups II and III, respectively, compared with those having very little (< 0.20) resemblance. Successive increments in group II membership carry risks of 2.67, 7.13, and 19.0; successive increments in group III membership carry risks of 4, 16, and 64. Higher levels of risk are predicted by, for example, 0.5 membership in each of groups II and III. Hence, clinically relevant and statistically significant results were obtained.
Taking the Rotterdam cohort as a guide, incidence increases from 0.3 per 1000 person-years at ages 55 to 65 years to 4.4 per 1000 person-years at ages 85 years and older [27]; hence the great care taken in this study to consider age at onset. The incidence of symptoms of parkinsonism was similar for men and women, but men more often met diagnostic criteria (male-to-female ratio, 1.54; 95% CI, 0.95 to 2.51). Because men were more often exposed to pesticides than women, sex per se was not used to determine the risk groups.
Biological interpretation is not straightforward, yet the results lend further credence to the belief that faulty combinations of mitochondrial complex I subunits pose significant risk for age-related PD, presumably through reduced ATP production and increased production of reactive oxygen species. The results imply that certain persons are robust and others vulnerable to PD when exposed to pesticides. Pesticide exposure was assessed in a structured but imperfect way: memory fades, and exposure duration beyond six months was not investigated. One feature of GoM, the identification of extreme types, mitigates this problem by filtering out imprecision in the data. To the extent that groups are misidentified, risk estimates would be expected to be biased toward the null, that is, to underestimate risk. The ability of GoM to interpret mixtures of genetic, clinical, and pathologic data is further demonstrated elsewhere [28][29][30][31][32][33][34].
In summary, fuzzy latent class analysis was employed to identify sets of polymorphisms located in nuclear genes encoding mitochondrial complex I proteins that are associated with PD, together with effect modification by toxin exposure. Even partial resemblance to the identified risk sets carried appreciable risk for PD. This form of analysis may prove particularly useful for hypothesis generation and for subsequent investigation of specific gene × gene and gene × environment interactions in relation to common sporadic PD.