Morphological Factor Estimation via High-Dimensional Reduction: Prediction of MCI Conversion to Probable AD

We propose a novel morphological factor estimate from structural MRI for disease state evaluation. We tested this methodology in the context of Alzheimer's disease (AD) with 349 subjects. The method consisted of (a) creating a reference MRI feature eigenspace using intensity and local volume change data from 149 healthy, young subjects; (b) projecting MRI data from 75 probable AD subjects, 76 controls (CTRL), and 49 subjects with Mild Cognitive Impairment (MCI) into that space; (c) extracting high-dimensional discriminant functions; and (d) calculating a single morphological factor based on various models. We used this methodology in leave-one-out experiments to (1) confirm the superiority of an inverse-squared model over other approaches; and (2) obtain accuracy estimates for the discrimination of probable AD from CTRL (90%) and the prediction of conversion of MCI subjects to probable AD (79.4%).


Introduction
A growing body of literature describes the use of machine learning methods to build classification functions from features of interest extracted from medical imaging data (e.g., magnetic resonance images (MRI), positron emission tomography). We focus specifically on applications that aid clinical diagnosis in Alzheimer's disease (AD) and/or predict future clinical status for individuals with Mild Cognitive Impairment (MCI), a putative precursor to AD [1][2][3][4][5][6]. These techniques have in common the reduction of large, high-dimensional image vectors into smaller feature spaces and the identification of a low-dimensional discriminating function. Authors have reported attempts to further simplify the discriminating function by calculating a single, quantitative scalar measure, for example, the structural abnormality index score [7], the structural-functional biomarker score [8], and the disease evaluation factor [9]. While the first two rely on support vector machine analysis of a feature space composed of grey matter concentration patterns, the latter relies on linear kernel approaches to data reduction and classification of MRI appearance, defined as the combination of T1-weighted intensity and local volume shape characteristics for all voxels within a volume of interest. Using the same features of interest as described in [9], we propose a different morphological factor formulation, extensible to other modalities and to other sources of data. We derive the formulation and estimate its efficiency as an aid to diagnosis in probable AD by testing the hypothesis that it accurately describes current and future clinical status.

Subjects.
A total of 349 subjects were included in this study, with ethics approval obtained from each institution represented.
The first cohort, or reference group, consisted of 149 young, neurologically healthy individuals obtained with permission from the ICBM database [10], whose scans were used to create a reference feature space of image data.
The second cohort, or AD test group, consisted of 150 subjects: 75 patients with a diagnosis of probable AD (AD group) and 75 age-matched controls (CTRL group) without neurological or neuropsychological deficit. The probable AD subjects were individuals with mild to moderate probable AD [11] recruited among outpatients seen at the IRCCS Fatebenefratelli (Brescia, Italy) between November 2002 and January 2005. CTRL subjects were taken from an ongoing study of the structural features of normal aging at the same center [12]. All subjects were followed for a minimum of 3 years after inclusion; this longitudinal clinical evaluation constitutes our reference diagnosis.
The third cohort, or MCI test group, consisted of 49 MCI subjects taken from a prospective project on the natural history of MCI, carried out at the IRCCS Fatebenefratelli. The project aimed to study the natural history of nondemented persons with apparently primary cognitive deficits, that is, deficits not due to psychic (anxiety, depression, etc.) or physical (hypothyroidism, vit. B12 and folate deficiency, uncontrolled heart disease, uncontrolled diabetes, etc.) conditions. Patients were rated with a series of standardized diagnostic and severity instruments, including the Mini-Mental State Examination (MMSE; [13]). In addition, patients underwent diagnostic MRI and laboratory testing to rule out other causes of cognitive impairment. These inclusion and exclusion criteria for MCI were based on previous seminal studies [14][15][16]. Amnestic or nonamnestic, single or multiple domain MCIs were included in the study.
All MCI patients underwent a yearly follow-up visit, consisting of complete clinical and neuropsychological examination, from 1 to 4 years after enrolment. In those individuals who converted to dementia, status was ascertained according to clinical diagnostic criteria for AD [11], subcortical vascular dementia [17], dementia with Lewy bodies [18], and frontotemporal dementia [19]. Within the larger prospective cohort of 100 MCI patients enrolled from April 2002 to December 2006, we selected patients retrospectively for this study based on their (a) having been followed clinically for a minimum of 48 months after their baseline MR scan; and (b) having either remained stable (MCI-S group; N = 29) or progressed to probable AD (MCI-P group; N = 20; mean progression 1.5 yrs; SD 0.7 yrs). The 48-month longitudinal clinical evaluation constitutes our reference diagnosis.
Data for the last subject (validation subject) were obtained with permission from the pilot, multicentric European ADNI project [20] (E-ADNI). This subject was a healthy volunteer who acted as a human quality control phantom and was scanned three times at the IRCCS Fatebenefratelli (scan; repeat scan, same session; rescan) on the same day.
Ethics Committees approved the study, and informed consent was obtained from all participants.

Data Processing.
We provide an overview of the automated image processing methodology, which essentially follows the steps outlined in Duchesne et al. [3], with some modifications. Images from all subjects were processed in an identical fashion using a publicly available toolkit (MINC: http://www.bic.mni.mcgill.ca/ServicesSoftware/HomePage). Processing included intensity inhomogeneity correction [21], nonlocal means denoising [22], intensity scaling, global and linear registration [23], extraction of a predetermined volume of interest centered on the medial temporal lobes, nonlinear registration within the volume of interest towards a common reference target [24], and computation of log-determinants of the Jacobian of the deformation field [25].

Data Reduction and Feature Selection.
The first data reduction step was to construct a feature space based on the N = 149 subjects from the ICBM reference group. To this end, we used Principal Components Analysis (PCA) of two high-dimensional image vectors within a volume of interest centered on the medial temporal lobe to generate a low-dimensional feature space for classification: (1) the T1-weighted MRI intensity within the volume of interest, transformed into z-scores; and (2) the log-determinants within the volume of interest. With PCA we moved from a massive amount of data (2 × 149 × 4 × 10^5 voxels) to a lower subspace model of maximum N − 1 dimensionality, further restricted by using only the first k eigenvectors λ that contribute up to a given threshold r in the description of the total variance of the system:

$$r = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{N-1} \lambda_j} \qquad (1)$$

where λ_j here stands for the variance carried by the j-th eigenvector. Once the reference eigenspace was formed, the reference group data was no longer used. We then proceeded by projecting rasterized vectors of intensity and local volume changes for subjects in the AD and MCI test groups into the reference space. The distribution of eigencoordinates along any principal component for a given population was assessed via quantile plots and Shapiro-Wilk statistics for normal distribution.
Following the projection, we used a system of supervised linear classifiers to identify the hyperplane that best separated the groups under study (e.g., CTRL versus probable AD; MCI-S versus MCI-P). To this end, the data was first normalized to guard against variables with larger variance that might otherwise dominate the classification. We employed forward stepwise regression analysis via Wilks' λ method to select the set of discriminating variables {λ_F}, with F ≪ N − 1, forming the discriminating hyperplane.
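The construction of the reference eigenspace and the projection of test subjects can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MINC/MATLAB pipeline: the function names `build_reference_space` and `project` are hypothetical, each row of the input is assumed to be one subject's rasterized feature vector, and the variance ratio is computed from the singular values of the mean-centered reference matrix.

```python
import numpy as np

def build_reference_space(X_ref, r=0.997):
    """Build a PCA space from reference data (hypothetical helper).

    X_ref: (n_subjects, n_voxels) array, one rasterized feature vector per row.
    r:     fraction of the total variance that the retained components must reach.
    Returns the reference mean, the k retained components (rows), and their variances.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    mean = X_ref.mean(axis=0)
    centered = X_ref - mean
    # Thin SVD: the rows of vt are the principal directions, sorted by variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (X_ref.shape[0] - 1)
    cumulative = np.cumsum(variances) / variances.sum()
    # Smallest k whose cumulative variance ratio reaches the threshold r.
    k = min(int(np.searchsorted(cumulative, r)) + 1, len(variances))
    return mean, vt[:k], variances[:k]

def project(X, mean, components):
    """Project new subjects into the reference space (eigencoordinates)."""
    return (np.asarray(X, dtype=float) - mean) @ components.T
```

In this sketch the test-group data never influence the eigenspace: only `X_ref` is passed to `build_reference_space`, and the AD and MCI vectors would be passed to `project` afterwards.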

Comparative Morphological Factor Construction.
The morphological factor is based on the concept of distance along the restricted set of eigenvectors {λ_F}. In the image-based feature space, this distance d can be calculated in a number of different fashions.

Manhattan Distance.
We propose initially the signed difference between the subject's eigencoordinate p_{i,F} along the eigenvector λ_F and the mean of the CTRL distribution for that eigenvector, denoted m_CTRL; as this distance increases, the likelihood of belonging to the CTRL group decreases:

$$d_{i,F} = p_{i,F} - m_{\mathrm{CTRL},F} \qquad (2)$$

International Journal of Alzheimer's Disease

Euclidean Distance.
We propose the Euclidean distance between the position p_i of each subject s_i and both the CTRL and probable AD means along the restricted set of eigenvectors {λ_F}. As the distance to one center decreases, the distance to the second should increase. In (3) we give the distance to the mean of the probable AD group:

$$d_i^{\mathrm{AD}} = \sqrt{\sum_{F} \left(p_{i,F} - m_{\mathrm{AD},F}\right)^2} \qquad (3)$$
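The two distances described above can be illustrated with a short Python sketch. The function names are hypothetical, and the code assumes that each subject is represented by a vector of eigencoordinates along the restricted set of eigenvectors and that the group means were estimated beforehand; how the signed per-eigenvector differences are aggregated into one value is not fixed here.

```python
import numpy as np

def signed_difference(p, m_ctrl):
    """Signed difference, per eigenvector, between a subject and the CTRL mean."""
    return np.asarray(p, dtype=float) - np.asarray(m_ctrl, dtype=float)

def euclidean_distance(p, m_group):
    """Euclidean distance between a subject and a group mean in feature space."""
    diff = np.asarray(p, dtype=float) - np.asarray(m_group, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))
```

For a subject at eigencoordinates (3, 4) and a group mean at the origin, `euclidean_distance` gives 5.0, while `signed_difference` keeps the per-eigenvector values (3, 4) and their signs.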

Weighted Distance.
It is possible to weigh each eigenvector by an associated measure of significance, for example, Wilks' λ from the stepwise regression analysis [9] or a factor derived from univariate t-tests. While Wilks' λ is trivially obtained from the regression analysis, a univariate weight such as the Koikkalainen factor formulation [26] entails performing a t-test comparing the group eigencoordinate distributions (e.g., CTRL versus probable AD; MCI-S versus MCI-P) for each eigenvector of the restricted set, resulting in the P-value p(λ_F) for that distribution; from these P-values the significance weight S_F is calculated [26]. The significance increases as the differences between the CTRL and AD groups grow and reaches zero when there is no statistically significant difference (at the P = .05 level) between both distributions. The resulting weighted distance D_i combines the aforementioned distances (Manhattan, Euclidean) with a weight S_F (either Wilks' λ or the Koikkalainen factor) over all eigenvectors F from the restricted set {λ_F}.

Gravitational Model.
As the final formulation, we extend the principle of image-based distance to the context of an attraction field that follows Newton's Law of Universal Gravitation, whereby any two elements of mass m within the feature space will exert upon one another an attractive force that will vary proportionally to the inverse of the square of the distance between them. In our context the force exerted by one group (e.g., CTRL) decreases as the distance between a subject and the center of mass of the CTRL group grows, while the force exerted by the second group (e.g., probable AD) increases as distance decreases between the same subject and the second group's center of mass. In a multiple group scenario, the calculated combined force serves as a quantitative measure of the likelihood of belonging to one of the groups.
In such a classical formulation, the force between any subject s_i with mass m_i and the centers of mass of, for example, the CTRL group (CM_CTRL) and the AD group (CM_AD) is expressed as:

$$\mathcal{F}_{i,\mathrm{CTRL}} = G\,\frac{m_i\,M_{\mathrm{CTRL}}}{d(p_i, \mathrm{CM}_{\mathrm{CTRL}})^2}, \qquad \mathcal{F}_{i,\mathrm{AD}} = G\,\frac{m_i\,M_{\mathrm{AD}}}{d(p_i, \mathrm{CM}_{\mathrm{AD}})^2}$$

with

$$\mathrm{CM} = \frac{1}{M}\sum_{i \in \mathrm{group}} m_i\,p_i$$

being the formulation for the centers of mass calculations, where M is the total mass for all subjects in the group, m_i their individual masses, and p_i their individual positions in feature space as derived in the previous section. The distance metric d can be any one of the aforementioned distances; for the purposes of the current study, the Euclidean distance as formulated in (3) was employed. We chose to retain the concept of "mass" even though it has no real bearing within the present context of an image-based feature space. It could be replaced with different information regarding individuals in the groups, for example, Braak histopathological staging [27]. Alternatively, one can vary the specificity and sensitivity of the attraction field by increasing the "mass" of subjects in one of the groups (e.g., CTRL or probable AD). For the present purposes, however, we set the mass of each subject to unity and, for the same reasons of simplicity, also set the gravitational constant G to unity. As is, the result is an inverse-squared law relationship.
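A minimal Python sketch of the inverse-squared model is given below, with unit subject masses and G = 1 as stated in the text. The function names are hypothetical. Two points are assumptions made for illustration only: the group mass is taken as the sum of its members' masses, and the two forces are combined into a single signed value by subtraction (attraction towards probable AD minus attraction towards CTRL), since the text does not spell out how the combined force is formed.

```python
import numpy as np

def center_of_mass(positions, masses=None):
    """Mass-weighted mean position of a group; unit masses by default."""
    positions = np.asarray(positions, dtype=float)
    if masses is None:
        masses = np.ones(len(positions))
    masses = np.asarray(masses, dtype=float)
    return (masses[:, None] * positions).sum(axis=0) / masses.sum()

def attraction(p, cm, m_subject=1.0, m_group=1.0, G=1.0):
    """Inverse-square attraction between a subject and a group's center of mass.

    Undefined (division by zero) if the subject sits exactly on the center of mass.
    """
    d2 = np.sum((np.asarray(p, dtype=float) - cm) ** 2)  # squared Euclidean distance
    return float(G * m_subject * m_group / d2)

def gravitational_factor(p, ctrl_positions, ad_positions):
    """Assumed combination: positive values lean towards AD, negative towards CTRL."""
    f_ctrl = attraction(p, center_of_mass(ctrl_positions), m_group=len(ctrl_positions))
    f_ad = attraction(p, center_of_mass(ad_positions), m_group=len(ad_positions))
    return f_ad - f_ctrl
```

Raising the masses of one group in `center_of_mass` and `attraction` would shift the balance of the field towards that group, which is the sensitivity and specificity adjustment mentioned in the text.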
Statistics and measurements were computed using the MATLAB Statistics Toolbox (The MathWorks, Natick, MA).

Experiments.
Once the reference space was created, all of the experiments that we conducted were performed in a leave-one-out fashion whereby one subject from the study groups was temporarily removed, allowing for an independent estimate of the low-dimensional discriminant function and the calculation of the eigendistribution means and centers of mass. Only then was the left-out subject entered in the system and its morphological factor computed. The final results consist of the comparison of the independently acquired morphological factors for each subject.
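The leave-one-out procedure can be sketched as the loop below. This is a simplified illustration under stated assumptions: the function name is hypothetical, subjects are already expressed as eigencoordinates, labels are coded 0 for CTRL and 1 for probable AD, and the factor is the same assumed signed difference of inverse-square attractions with unit masses. The stepwise selection of eigenvectors, which the study repeats inside each fold, is omitted here for brevity.

```python
import numpy as np

def leave_one_out_factors(X, y):
    """Compute one factor per subject, each from group centers that exclude it.

    X: (n_subjects, n_features) eigencoordinates; y: 0 = CTRL, 1 = probable AD.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    factors = np.empty(len(X))
    for i in range(len(X)):
        keep = np.arange(len(X)) != i          # temporarily remove subject i
        X_train, y_train = X[keep], y[keep]
        cm_ctrl = X_train[y_train == 0].mean(axis=0)
        cm_ad = X_train[y_train == 1].mean(axis=0)
        d2_ctrl = np.sum((X[i] - cm_ctrl) ** 2)
        d2_ad = np.sum((X[i] - cm_ad) ** 2)
        # Only now is the left-out subject scored against the independent centers.
        factors[i] = 1.0 / d2_ad - 1.0 / d2_ctrl
    return factors
```

Because the left-out subject never contributes to the centers it is scored against, the resulting factors can be compared across subjects without the optimistic bias of resubstitution.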
We ran three distinct experiments: (a) determination of the relative accuracies of each distance formulation (Manhattan, Euclidean, Weighted Distance (Wilks' λ), Weighted Distance (Koikkalainen), Gravitational model) for the discrimination of CTRL versus probable AD; (b) determination of the accuracy of the best distance formulation for the discrimination of MCI-S versus MCI-P; (c) determination of the resolution of the best distance formulation based on the CTRL versus probable AD discriminant function using the E-ADNI scan-rescan dataset.

Demographics.
There were no statistically significant differences in age between the 75 probable AD and 75 CTRL individuals (P > .05) in the AD test group. There was a statistically significant difference in age between the MCI-S and MCI-P groups (P = .001) and in baseline MMSE (P = .01) (see Table 1).

Data Processing and Feature Selection.
We set the variance ratio r (see Equation (1)) to 0.997, resulting in a reference PCA model composed of 256 intensity and local volume change eigenvectors. We proceeded with forward stepwise regression analysis using Wilks' λ method (P-to-enter = .005) to select the discriminating variables forming the hyperplane separating each group (e.g., CTRL versus probable AD; MCI-S versus MCI-P). This was performed in a leave-one-out fashion to avoid overlearning of the dataset. The median number of eigenvectors λ_F retained in the discriminating function for either CTRL versus probable AD or MCI-S versus MCI-P was four. Table 2 displays the accuracies obtained for the five different formulations of the morphological factor at the task of discriminating CTRL versus probable AD (leave-one-out). The Gravitational model's accuracy was 90%, superior to the Weighted Distance models. Using the Gravitational model, we report the results for the morphological factor for the CTRL versus probable AD experiment and the MCI-S versus MCI-P experiment in Table 3. The distributions of morphological factors for all groups, alongside quantile plots to assess normality (CTRL and probable AD groups), are shown in Figures 1 and 2.

Morphological Factor Calculation.
The receiver operating characteristic (ROC) curve for the task of discriminating CTRL from probable AD shows the trade-offs possible in sensitivity and specificity (Figure 1(c)). The area under the ROC curve was 0.9444. At the 90% accuracy point (135/150), specificity was 87.5% and sensitivity 92.9%.
With the Gravitational model we computed the ROC curve for the discrimination of MCI-S from MCI-P (Figure 2(c)). The area under the ROC curve was 0.7940. At 72.3% accuracy, specificity was 62% and sensitivity 75%.
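For readers who wish to reproduce this type of summary from a list of per-subject factors, the Python sketch below computes the area under the ROC curve and the sensitivity, specificity, and accuracy at a chosen operating point. The function names are hypothetical, and the code assumes that higher factor values indicate the positive class (probable AD or MCI-P). The area is obtained through its equivalence with the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one, with ties counted as one half.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise positive-versus-negative comparisons."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # every positive against every negative
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity, specificity, and accuracy when predicting positive above threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    predicted = scores > threshold
    true_pos = np.sum(predicted & labels)
    true_neg = np.sum(~predicted & ~labels)
    sensitivity = true_pos / labels.sum()
    specificity = true_neg / (~labels).sum()
    accuracy = (true_pos + true_neg) / len(labels)
    return float(sensitivity), float(specificity), float(accuracy)
```

Sweeping `threshold` over the observed factor values traces the trade-off between sensitivity and specificity that the ROC curve displays.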
Finally, we computed the morphological factor for the E-ADNI human phantom volunteer, using the CTRL and probable AD cohorts as a training group for the determination of the discriminating function. Using the Gravitational model, the average factor value was −0.4 or 4 standard deviations away from the mean of the CTRL distribution, with an average difference in scan-rescan factor of 4%. Notably, the morphological index obtained via a weighted distance method (Koikkalainen factor) had an average difference in scan-rescan factor of less than 1%.

Discussion
The gravitational or inverse-squared law model constitutes a novel development in the strategies towards obtaining a single quantitative factor from data reduction and machine learning of very high-dimensional MRI input data towards discrimination of individual subjects. Its inherent flexibility makes multigroup comparisons trivial, alongside the introduction of other sources of data. Its performance compares favorably to other results in the MRI literature within the context of discriminating CTRL versus probable AD [2]. As a single dimensional scalar, the morphological factor metric achieves strong accuracy (90%), especially when compared to other multidimensional discrimination functions (e.g., 92% as reported in [3]). It also performs well when put within the clinical context of discriminating CTRL versus probable AD, where inclusion evaluations are reportedly 78% accurate (albeit against final histopathological diagnosis). While lower, accuracy figures for the prediction of progression to probable AD in the MCI cohort (on average, 1.5 years before clinical diagnosis) are also strong and compare favorably to published results on MRI data [4,6]. A study comparing these approaches (e.g., within a monocentric setting, such as the Open Access Series of Imaging Studies [28], or a multicentric setting, such as the Alzheimer's Disease Neuroimaging Initiative [29]) would be worthwhile.
The paper uses the leave-one-out approach to feature selection (stepwise regression analysis), which allows a correct generalization of the morphological factor as it is not tested on the same data.
Clinical interpretation of changes in image features associated with changes in the morphological factor should provide insight into the development of AD and would need to be compared to existing results from voxel-based morphometry studies, structural studies (e.g., hippocampal and entorhinal atrophy), and histopathological confirmation studies. Overall, we speculate that the specific patterns of intensity and local volume change differences result from different levels of advanced extracellular plaque formation, neurofibrillary tangles accumulation and other pathological processes between CTRL and probable AD, and between stable and progressing MCI. With regards to the features employed in this method, the differences in local volume changes should mirror the changes noticed in other reports, such as visual assessment [30], while differences in grey level might reflect the intensity of neuronal loss induced by the neuropathological changes [31], which precede volume loss as visualized on MRI. Such an evaluation however is beyond the scope of this paper.
The difference in factor averages between probable AD and CTRL was 15%. At this level, the minimum trial size required to detect this difference is 59 individuals for both samples (α = 0.05; β = 0.50) and reaches 75 individuals if we include scan-rescan variability.

Limitations.
There are a number of limitations in this study. One pertains to the fact that the MRI images for the probable AD subjects were acquired at the time of diagnosis; therefore, some of the patients have had AD for a number of years. In turn, this implies that extensive neurodegeneration has taken place at this point and should artificially facilitate the discrimination with CTRL. However, the fact that the latter were age matched and the fact that the results in the MCI cohort remain significant alleviate part of this concern. It would be useful to assess if the morphological factor correlates with different indices of disease severity, cognitive deficits, or other biomarkers. Neuropathological confirmation is also required to replace the clinical evaluation as a gold standard. Finally, the patterns of abnormalities that can be found by the method are restricted to a space that is built from healthy, young controls. It is not the optimal space to describe normal aging and/or AD-related variability. However, it does tend to maximize the distance between both groups, as we noticed from building a few reference spaces in an N-fold validation of the CTRL/probable AD groups that achieved lower accuracies.
We conclude that the proposed formulation of the morphological factor is relevant as an aid to diagnosis and to the prediction of future clinical status in probable AD.