A Two-Study Comparison of Clinical and MRI Markers of Transition from Mild Cognitive Impairment to Alzheimer's Disease

A published predictor model in a single-site cohort study (questionable dementia, QD) that contained episodic verbal memory (SRT total recall), informant report of function (FAQ), and MRI measures was tested using logistic regression and ROC analyses with comparable measures in a second multisite cohort study (Alzheimer's Disease Neuroimaging Initiative, ADNI). There were 126 patients in QD and 282 patients in ADNI with MCI followed for 3 years. Within each sample, the differences in AUCs between the statistical models were very similar. Adding hippocampal and entorhinal cortex volumes to the model containing AVLT/SRT, FAQ, age and MMSE increased the area under the curve (AUC) in ADNI but not QD, with sensitivity increasing by 2% in ADNI and 2% in QD for a fixed specificity of 80%. Conversely, adding episodic verbal memory (SRT/AVLT) and FAQ to the model containing age, Mini Mental State Exam (MMSE), hippocampal and entorhinal cortex volumes increased the AUC in ADNI and QD, with sensitivity increasing by 17% in ADNI and 10% in QD for 80% specificity. The predictor models showed similar differences from each other in both studies, supporting independent validation. MRI hippocampal and entorhinal cortex volumes showed limited added predictive utility to memory and function measures.


Introduction
Mild cognitive impairment (MCI) often represents a transitional state between normal cognition and Alzheimer's disease (AD) [1,2]. Accurate prediction of transition from MCI to AD aids in prognosis and targeting early treatment [3]. Episodic verbal memory impairment and informant report of functional deficits in complex social and cognitive tasks are features of incipient AD, and impairment in these domains is associated with transition from MCI to AD [4,5].
Most biomarkers of MCI transition to AD are related to the underlying disease pathology of amyloid plaques and neurofibrillary tangles [6]. Hippocampal and entorhinal cortex atrophy on MRI scan of brain [7], parietotemporal hypometabolism on 18 FDG PET [8], increased amyloid uptake using PET [9], and decreased amyloid beta-42 (Aβ42) with increased tau/phospho-tau levels in the cerebrospinal fluid (CSF) [10,11] each significantly predict transition from MCI to AD. The apolipoprotein E ε4 allele increases AD risk, but is not a strong biomarker of transition from MCI to AD [3].
In a meta-analysis, memory deficits appeared to be superior to MRI hippocampal atrophy in predicting transition to AD [12], but studies in the meta-analysis had highly variable subject inclusion/exclusion criteria and assessment methods. There has been a lack of direct head-to-head comparison of clinical and neuroimaging predictors of transition across different studies.
In our single-site study (Questionable Dementia or QD study) that evaluated and followed a broadly defined sample of patients with MCI, a published predictor model that included specific cognitive, functional, olfactory, and MRI measures strongly predicted transition to AD [3]. In the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, cognitive and functional measures and several biomarkers are assessed in samples of MCI, AD, and healthy control subjects at baseline and serially during followup. In this paper, the first goal was to test the accuracy of a combination of predictor variables derived from the QD study to predict transition from MCI to AD in a completely independent ADNI sample. The validation of specific predictor combinations, rather than individual measures, has rarely been done in independent samples. Such validation is essential before specific cutpoints and ranges for the predictors in such models can be developed with confidence for eventual clinical application. The second goal was to evaluate the relative utility of clinical and MRI measures in predicting transition from MCI to AD.

Methods
Patients with MCI in the QD and ADNI studies were included, and patients with AD (ADNI) and healthy control subjects (QD and ADNI) were excluded. The 3-year followup samples were chosen because most transitions to AD occur within 3 years of clinical presentation [13].

QD Study.
As previously reported, patients 41-85 years old who presented with subjective memory complaints for clinical evaluation to a Memory Disorders Clinic were eligible if they had a Folstein Mini-Mental State Exam (MMSE) score ≥22 out of 30, memory impairment defined as MMSE recall ≤2/3 objects at 5 minutes or a Selective Reminding Test (SRT) delayed recall score >1 SD below norms, and absence of a consensus diagnosis of dementia made by two experienced raters [3]. Patients could also be included if they had other cognitive and functional deficits. This study began before criteria for MCI were published [1,2]. Baseline MCI subtype using the criterion of >1.5 SD below norms on cognitive tests was determined post hoc by using age, education, and sex-based regression norms derived from 83 healthy control subjects [4]. Using this approach, 73% of patients met the Petersen criteria for single or multidomain amnestic MCI, and this subsample was also compared to ADNI. The presence of specific neurological or major psychiatric disorders led to exclusion [3]. Patients were followed every 6 months for up to 9 years, and the two raters made a consensus diagnosis at each time point. The sample comprised 148 patients with MCI at baseline, and 126 patients were in the 3-year followup sample.

ADNI Study.
Data were obtained from the ADNI study (http://adni.loni.ucla.edu/), a project launched in 2003 by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, the Food and Drug Administration, private pharmaceutical companies, and non-profit organizations as a $60 million, 5-year public-private partnership. The primary goal is to test whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD.
Participants 55-90 years old were enrolled if they had at least 6 years of education, spoke English or Spanish, agreed to longitudinal followup and neuroimaging tests, had single or multidomain MCI by the Petersen criteria with MMSE scores between 24 and 30, a memory complaint verified by informant, an abnormal memory score (1.5 SD below age-adjusted cutoff) on the Logical Memory II subscale (delayed paragraph recall) from the Wechsler Memory Scale-Revised, and absence of a dementia diagnosis. All participants had a Geriatric Depression Scale score of <6 and a modified Hachinski score of ≤4. For a more detailed account of the inclusion/exclusion criteria, please see http://www.adni-info.org/. Raters at each site made consensus diagnoses at six-month intervals that included an evaluation of transition from MCI to AD, which was reviewed by a central committee. Data were obtained from ADNI on October 31, 2010. Of 394 individuals with MCI at baseline evaluation, 282 subjects completed 3 years of followup.

Comparable Baseline Measures Chosen for Analysis from QD and ADNI.
In the QD study, the SRT total recall (12 items, 6 trials) was the strongest predictor among the five hypothesized neuropsychological predictors examined [3]. The SRT was not done in ADNI, but the comparable measure of total recall across 6 trials in the Auditory Verbal Learning Test (AVLT) was not used for study inclusion criteria and was available. Informant report of the patient's functioning using the Pfeffer Functional Activities Questionnaire (FAQ) total score and MRI hippocampal and entorhinal cortex volumes were additional predictors in the final model in QD [3] that were also assessed in ADNI.
Both studies conducted MRI on 1.5T scanners: a single GE scanner in QD, and GE, Siemens, or Philips scanners across 48 sites in ADNI. In QD, hippocampal volume was assessed by a semiautomated method with specific anatomical landmarks used to define hippocampal boundaries, and entorhinal cortex volume was computed from three slices centered at the level of the mammillary bodies [7]. In ADNI, MRI hippocampal and entorhinal cortex volumes were derived from postprocessed image analysis that used FreeSurfer (FS) version 4.3.0 by researchers at the University of California, San Francisco (UCSFFSX); the data are available at http://adni.loni.ucla.edu/. The volume derivation process is described at http://www.loni.ucla.edu/twiki/bin/view/ADNI/ADNIPostProc. For both studies, intracranial volume was a covariate in all analyses of hippocampal and entorhinal cortex volumes.

Statistical Analyses.
Summary statistics were calculated to describe the sample characteristics in the ADNI and QD studies. For each study, Chi-square and t-tests were used to detect differences in baseline categorical and continuous variables between MCI patients with and without transition to AD by three years of followup (there were few non-AD dementia cases in both studies). The QD and ADNI studies had different available followup duration times, and therefore survival analysis was not used for comparisons. For both datasets, specific sets of baseline predictors were examined in logistic regression models for the binary outcome of transition to AD within 3 years after baseline evaluation. With each model, sensitivity and specificity were calculated for all possible cut points on the predicted risk of transition to AD to construct receiver operating characteristic (ROC) curves. From the ROC curves, the area under the curve (AUC) was compared statistically between datasets and between nested models within each dataset.
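The ROC steps described above can be illustrated with a minimal Python sketch. It assumes that predicted risks from an already-fitted logistic regression model are available; the function names (`roc_points`, `auc`, `sensitivity_at_specificity`) and the toy risks and outcomes are hypothetical and are not data or software from either study. The sketch computes sensitivity and specificity at every cut point, the AUC by the trapezoidal rule, and the sensitivity attainable at a fixed minimum specificity.

```python
from typing import List, Sequence, Tuple


def roc_points(risks: Sequence[float], outcomes: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs for every cut point.

    A patient is classified as a predicted converter when risk >= cut point.
    outcomes: 1 = transition to AD within followup, 0 = no transition.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one converter and one non-converter")
    points = [(0.0, 0.0)]  # cut point above every risk: nobody is flagged
    for cut in sorted(set(risks), reverse=True):
        tp = sum(1 for r, y in zip(risks, outcomes) if r >= cut and y == 1)
        fp = sum(1 for r, y in zip(risks, outcomes) if r >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def sensitivity_at_specificity(points: Sequence[Tuple[float, float]],
                               min_specificity: float) -> float:
    """Highest sensitivity among cut points with specificity >= min_specificity."""
    max_fpr = 1.0 - min_specificity
    # Small tolerance guards against floating-point rounding in 1 - specificity.
    return max(sens for fpr, sens in points if fpr <= max_fpr + 1e-12)


# Toy example (made-up values, for illustration only).
risks = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

points = roc_points(risks, outcomes)
print("AUC:", auc(points))
print("Sensitivity at 80% specificity:", sensitivity_at_specificity(points, 0.80))
print("Sensitivity at 90% specificity:", sensitivity_at_specificity(points, 0.90))
```

Comparing two nested models in this way would mean running the same functions on each model's predicted risks and contrasting the resulting AUCs and fixed-specificity sensitivities; the statistical test of the AUC difference is not shown here.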

Demographic and Clinical Features of the Two Samples.
Compared to the QD sample, the ADNI sample was older, had a greater proportion of males, had a higher proportion with the apoE ε4 allele, and reported greater functional impairment (Table 1). The samples did not differ in years of educational attainment and MMSE scores.

Prediction of Transition from MCI to AD by 3-Year Followup.
The majority of patients in ADNI (157/282 or 55.6%) and a minority of patients in QD (33/126 or 26.1%) converted to AD by 3-year followup; the disparity was likely related to more stringent inclusion criteria for memory impairment in ADNI compared to QD. Based on logistic regression analyses, the combination of age and MMSE was a poor predictor in ADNI and showed low sensitivity at the fixed level of 90% specificity in QD (top of Table 2). Models that included age with MMSE and specific combinations of AVLT or SRT total recall, FAQ scores, hippocampal and entorhinal cortex volumes showed greater sensitivity, specificity, and predictive accuracy in the QD study compared to ADNI (Table 2). The AUC increased consistently across the two studies when episodic verbal memory (AVLT/SRT) and function (FAQ) measures were added to the model containing the combination of age, MMSE, and hippocampal and entorhinal cortex volumes (P < 0.0001 in ADNI and P = 0.0254 in QD; Model 2 versus Model 3, bottom of Table 2 and Figure 1), with an appreciable increase in sensitivity for a fixed specificity of 80% and 90% in both ADNI (increases of 17% and 15%, resp.) and QD (increases of 10% and 17%, resp.; top of Table 2 and Figure 1). Conversely, adding hippocampal and entorhinal cortex volumes to AVLT/SRT, FAQ, age, and MMSE significantly increased the AUC in ADNI (P = 0.0035) but not in QD (P = 0.20) and led to a small increase in sensitivity for a fixed specificity of 80% and 90% in ADNI (increases of 2% and 6%, resp.) and QD (increases of 2% and 7%, resp.; top of Table 2). In both samples, the differences in AUCs between the three statistical models examined were very similar (bottom of Table 2).
Table 2 notes. A threshold of 0.5 was used on predicted risk derived from the logistic regression models. Area under the curve (AUC) was derived from receiver operating characteristic (ROC) analyses. N = 282 (157 converters) in ADNI and N = 126 (33 converters) in QD. The differences between models in AUCs are slightly different from the direct subtraction of AUCs between models because of missing data that ranged from 1% to 4% for the variables examined in ADNI and 1% to 5% for the variables examined in QD. * P < 0.05, ** P < 0.01.

Discussion
Within each sample, QD and ADNI, the differences in AUCs between predictor models were similar, suggesting robustness and generalizability across outpatient settings.
International Journal of Alzheimer's Disease
When advising patients and families about the likelihood of transition from MCI to AD, a predictor model with specificity over 80% is essential because a false positive rate of over 20% (specificity less than 80%) is clinically unacceptable [14,15]. In the predictor model, adding hippocampal and entorhinal cortex atrophy to age, MMSE, and the episodic verbal memory and function measures increased sensitivity only to a small extent at fixed specificities of 80% and 90%. These findings suggest limited added utility for MRI hippocampal and entorhinal cortex volumes to clinical assessment of memory and function in predicting transition from MCI to AD. In contrast, adding measures of episodic verbal memory and function to the model that combined age, MMSE, and hippocampal and entorhinal cortex volumes appreciably increased sensitivity for fixed levels of 80% and 90% specificity in both samples. In both studies, the model that included AVLT/SRT, FAQ, and hippocampal and entorhinal cortex volumes with age and MMSE showed the strongest predictive accuracy. For episodic verbal memory measures, numerical ranges and cutoffs for specific ages and education levels can inform the likelihood of transition to AD. Although delayed recall deficit is typical in AD, both immediate recall (incorporates learning) and delayed recall show comparable predictive accuracy for the transition from MCI to AD [4]. The use of a single episodic memory measure in the predictor models examined does not replace the need for a comprehensive neuropsychological evaluation for diagnostic purposes [4]. Informant reports of FAQ scores reflect instrumental, social, and cognitive functional impairments, but specific cutoffs for prediction of transition to AD are not established [5,16].
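To make the link between specificity and false positives concrete, the following Python sketch works through the arithmetic using the sample sizes reported in this paper (282 patients with 157 converters in ADNI; 126 patients with 33 converters in QD). The helper name `expected_false_positives` is hypothetical, and the calculation is illustrative only: it ignores the small amount of missing data and is not a result reported from either study.

```python
def expected_false_positives(n_total: int, n_converters: int, specificity: float) -> float:
    """Expected number of non-converters wrongly flagged as converters.

    False positive rate = 1 - specificity, applied to the non-converters only.
    """
    n_non_converters = n_total - n_converters
    return (1.0 - specificity) * n_non_converters


# Sample sizes taken from the text; an 80% specificity is the fixed level discussed above.
adni_fp = expected_false_positives(n_total=282, n_converters=157, specificity=0.80)
qd_fp = expected_false_positives(n_total=126, n_converters=33, specificity=0.80)

print(f"ADNI: about {adni_fp:.1f} of {282 - 157} non-converters flagged at 80% specificity")
print(f"QD: about {qd_fp:.1f} of {126 - 33} non-converters flagged at 80% specificity")
```

Under these assumptions, each 10-point drop in specificity would add roughly 12 to 13 falsely flagged patients in a sample the size of ADNI, which illustrates why sensitivity is compared at fixed, high specificity levels.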
International efforts to standardize MRI imaging parameters and methods of volumetric assessment [17], both of which have varied widely across studies, may lead to the development of specific cutoffs for hippocampal and entorhinal cortex atrophy that improve predictive accuracy.
The use of cognitive markers has some advantages over neuroimaging: objectivity in scoring, comparative economy in expense and time, and reliability. One argument is that episodic verbal memory should not be used as a marker because it is used for inclusion criteria and in the diagnostic process. However, evaluation of severity of episodic verbal memory deficit as a predictor in patients with amnestic MCI who have episodic verbal memory deficits is analogous to the established strategy of evaluating severity of depression as a predictor of clinical course and treatment response in major depression [18]. Further, using memory test scores in prediction creates a statistical handicap, rather than an advantage, by restricting the range in baseline memory test performance [12]. Of note, the AVLT memory measure examined as a predictor in this paper was not part of the study inclusion criteria in ADNI (WMS-R logical memory was used). The same rationale applies to the incorporation of the MMSE, which is widely used and clinically relevant, in predictor analyses even though it is part of the screening criteria for study inclusion.
Informant report of functional impairment using the FAQ was not part of the inclusion criteria in either QD or ADNI, and the definition of MCI by the original Petersen criteria requires the absence of significant functional impairment [1,2]. Therefore, the use of informant report of functional impairment is independent of the diagnostic criteria for MCI, and our findings indicate that this type of assessment is important in predicting transition to AD [3,5].
Clinical and neurobiological markers have been incorporated recently into diagnostic classification systems. An international panel used the terms "prodromal dementia" and "predementia" to indicate that neurobiological markers may identify patients with incipient AD who cannot be diagnosed clinically [19]. The new NIA diagnostic criteria separate core clinical criteria from research criteria that employ neurobiological markers [20], partly because diagnostic and predictive accuracy for neurobiological markers has not been fully developed and validated. Our results emphasize the need for such validation.
There have been few comparisons of predictor models between studies. In a comparison of ADNI to a Finnish study, classification performance did not increase after the inclusion of 10 variables that included CSF measures, apolipoprotein E ε4, MRI measures, age, and education [21]. The overall model was not strong, possibly because key cognitive and functional measures were excluded. Another study compared different samples of patients with MCI who had 18 FDG PET with generally positive results [22] but without cut-points for clinical application. Our report represents a novel independent validation of predictor models that included clinical, memory, functional, and MRI measures. The consistency in the differences between models in each study indicates that this two-study comparison is broader and more clinically relevant than prior validation attempts [21,22].
From the ADNI database, several reports show moderate predictive accuracy for weighted scores within a global cognitive test [23] and moderately strong predictive accuracy for specific neuropsychological test scores [24], consistent with other studies [4]. The best possible fit from a high-dimensional pattern classification approach using ADNI MRI data [25] led to results similar to our report that used volumetric measures, but other MRI analytic strategies using ADNI data have led to lower predictive accuracy [26,27]. Entorhinal cortex volume enhanced prediction in both ADNI and QD in our comparisons, supporting the evaluation of entorhinal cortex volume as a predictor [7].
There were some limitations to this paper. The two samples differed in sex and age distribution and in cognitive test scores; significant episodic verbal memory deficits were required in ADNI, compared to broader inclusion criteria in QD, which may partly account for higher transition rates in ADNI; and different episodic verbal memory measures and different MRI volumetric assessment methods were compared. Nonetheless, within each sample the differences in AUCs were similar for several combinations of predictors. The high transition rate in ADNI suggests that some patients still diagnosed with MCI at 3-year followup may convert in subsequent years, likely leading to a higher rate of false negatives in ADNI. This may partly explain the lower accuracy for predictor combinations in ADNI. In ADNI, the smaller number of patients at 3-year followup was partly related to some recently recruited patients not yet having had the opportunity to reach 3-year followup at the time of data analysis for this paper. This issue also precluded the use of survival analysis in this sample. In QD, we derived the strongest predictors from a set of a priori measures in a large neuropsychological test battery and examined comparable measures from the shorter ADNI neuropsychological assessment. While administering a comprehensive neuropsychological test battery is important for diagnostic purposes, our clinically relevant approach of examining individual measures facilitates comparison across studies and demonstrates the predictive strength of even a single episodic verbal memory test. Baseline MRI measures were examined because serial MRI measures were not available in QD. It remains unclear if serial imaging measures are superior to baseline imaging in predicting long-term outcome [28]. Serial imaging measures provide useful information about structural changes associated with disease progression, but they are expensive, not current clinical practice, and not useful in early converters.
Cerebrovascular disease may contribute to cognitive decline in these patients [19,20]. However, hyperintensities, lacunes, and infarcts could not be assessed systematically in QD because of the MRI sequences obtained (no FLAIR or comparable sequence) and therefore could not be compared with ADNI. Absent neuropathological validation, we considered examining CSF measures from ADNI (not done in QD) for in vivo validation of transition to AD, but CSF was not collected in approximately half the ADNI sample and neuropathological validation of CSF tau and Aβ abnormalities has not been established.
In QD, the pathophysiological measure [19] of olfactory identification deficits (not done in ADNI) strongly predicted transition to AD with limited overlap in prediction with the SRT and MRI measures [3,29]. In ADNI, 18 FDG indices (not done in QD) significantly predicted transition to AD and were superior to the ADAS-cog [8], but the ADAS-cog is a global cognitive measure used primarily in clinical trials of AD patients and is not established as a strong predictor of transition from MCI to AD. PET amyloid imaging discriminates among AD, MCI, and controls [30] and correlates at autopsy with amyloid plaques [9]. However, approximately 10-30% of healthy controls show increased amyloid uptake [30] and whether these subjects have incipient AD needs confirmation in long-term followup studies. The sensitivity and specificity of CSF levels of Aβ42 and tau/phospho tau, and their ratio, for predicting MCI transition to AD in ADNI [31] and in a European multicenter study [32] ranged from 65% to 75%, which is slightly lower than that in other reports [10,11]. For CSF markers, further refinement of assay technique and validation in long-term followup studies are needed to establish more definitive cut-points for individual and ratio measures that have varied to some extent across studies [10,11,32].
This report suggests that volumetric evaluation of medial temporal lobe atrophy adds only marginally to the information obtained by cognitive testing and assessment of episodic memory, and it cannot yet be recommended for wide clinical use to assess the risk of patients with MCI being diagnosed with AD during followup. In the clinic, visual inspection ratings are likely to lead to lower predictive accuracy than either the QD or ADNI volumetric assessments. Structural neuroimaging with MRI remains useful to rule out specific causes of cognitive impairment, for example, stroke or tumor. A key conclusion from this report is that conducting neuropsychological evaluation is important, and interviewing family members or other informants about the patient's functioning may be at least as important as conducting an MRI scan. Several clinical and neurobiological markers, including cognitive test scores, functional ability, and MRI and 18 FDG PET measures, are influenced considerably by age and other demographic factors, and their utility needs to be evaluated in more heterogeneous samples. The comparative predictive utility of clinical and neurobiological markers needs further assessment across different populations as these measures improve in predictive accuracy.

Disclosure
Data used in the preparation of this paper included data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.ucla.edu/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. Complete listing of ADNI investigators is available at http://adni.loni.ucla.edu/.