Group Factor Analysis for Alzheimer's Disease

For any neuroimaging study in an institute, brain images are normally acquired from healthy controls and patients using a single track of protocol. Traditionally, the factor analysis procedure analyzes image data for healthy controls and patients either together or separately. The former unifies the factor pattern across subjects and the latter deals with measurement errors individually. This paper proposes a group factor analysis model for neuroimaging applications by assigning separate factor patterns to control and patient groups. The clinical diagnosis information is used for categorizing subjects into groups in the analysis procedure. The proposed method allows different groups of subjects to share a common covariance matrix of measurement errors. The empirical results show that the proposed method provides more reasonable factor scores and patterns and is more suitable for medical research based on image data as compared with the conventional factor analysis model.


Introduction
Modern medical imaging techniques are capable of measuring human brain in vivo [1]. For instance, magnetic resonance (MR) imaging measures nuclei of atoms, and positron emission tomography detects the positron-emitting radionuclides to construct three-dimensional images. The imaging procedures are designed and settled before medical or cognitive experiments. Once the protocol is established, the laboratory and the hospital begin to recruit a variety of subjects of interest into experimental sessions. Errors resulting from individual scans are actually generated from common sources, such as the scanner, protocol, and software. Initial classification of subjects into groups can be realized by using clinical diagnosis, which may be uncertain to some extent, provided by physicians along with subjects' anamnesis.
Conventional factor analysis [2] models reduce highdimensional data into a few latent variables and assume that data x were generated by a set of unobserved independent unit-variance Gaussian source f plus uncorrelated zero-mean Gaussian random noise u, x = f + u, where is the factor loading matrix. The sample covariance of x can be expressed as +Ψ, where Ψ is a diagonal covariance matrix of random noises. The goal of factor analysis is to find and Ψ that maximally fit the sample covariance [3][4][5]. The EM algorithm was proposed to estimate the matrices [6]. Factor analysis is commonly applied to the dataset as a whole or to different groups of data separately, which may result in factor patterns hard to interpret and limit the potential use of the method in a wider range of medical applications. In this study, we propose a mixture factor analysis model (MFAM) to assign a common covariance matrix of noises or measurement errors to different groups of subjects but to allow individual groups having their own latent structures. In the empirical application, we analyzed an Alzheimer's disease (AD) dataset by first extracting the volumetric information from MR anatomical images for both healthy controls and the patients suffering either AD or mild cognitive impairment, followed by applying the proposed MFAM to the volumetric data.  the f scores distributed as Gaussian within each group, the data vector can be decomposed into a linear combination of factor loadings for each group [7,8], that is, ∈ × ,

Material and Method
where x is -dimensional and each factor scores f | has variables, that is, f ∈ . The parameter is associated with the proportion of subjects in the th group, = ( ). The indicator variable is one, = 1, when the data belongs to th group, otherwise is set to zero, = 0. The formula (1) using introduces the main difference from previous mixture models of factor analysis. The data vector x need not be centered and the mean of the th group data is . The covariance matrix of residuals u is a diagonal matrix Ψ = diag[Ψ 1 , Ψ 2 , . . . , Ψ ]. The data distribution can be expressed as In this work, capitalized denotes the probability function of a vector or a matrix and lowercase denotes the probability function of a scalar. The factor scores are assumed to be distributed as Gaussian The notation is the identity matrix of order . The distribution of data x in each group is given by Based on the MFAM (2), the likelihood function is as follows: where denotes the expectation. The is the number of data vectors (subjects) with subscript for the th subject. We need to compute the expectation of the variables, To estimate in (5), the posterior probability of the th group is calculated as where the probability of x given is  The parameters in (7) is the prior probability derived from the clinical diagnosis. Therefore, the expectation of given x in (6) is proportional to the numerator in (7), To calculate (6), we consider that the posterior probability of f given x is After some arithmetic calculation, (f | x) can be expressed as where = ( Ψ −1 + ). Hence, the expectation of f given x is From above, (f | , x ) in (6) is calculated as where = ( Ψ −1 + ), according to (12).
There is no constraint on those factor loadings . The estimation of is simply the maximum of . A convenient way to express in (5) is achieved by settingf The expected log likelihood function can be expressed as To maximize with respect tõ, we equate the derivative of (14) to zero, where Computational  All the variables are estimated by the EM algorithm. In the E-step, the algorithm computes the expectation of the factor scores in (6) and the second moment of the scores, by The covariance matrix of residual, Ψ, can be estimated by its inverse matrix, Substituting (15) for̃and making constraints on the diagonal of Ψ, we obtain The prior probability ( ) should be proportional to the clinical diagnosis such that the estimation of the factor loadings and the factor scores can capture the latent factors of different disease groups. The proposed model also carries the same indeterminacy problem associated with factor patterns; that is there exist numerous orthogonal transformations to rotate the matrix of factor loadings without changing the maximum of [9]. Considering be any × orthogonal matrix, = = . Equation (1) can be written  where The assumption, f * | ∼ (0, ), is kept. The covariance of x is * ( * ) + Ψ = + Ψ = + Ψ, which remains the same. Therefore, there are infinite equivalent solutions to satisfy the maximum of (5). Imposing reasonable constraints to identify a set of model parameters can make the factor loadings scientifically interpretable. A widely used approach for a simple factor structure is realized by setting some factor loadings to hypothetical values such as zeros.
The permutation and changing the sign of columns in the factor loading matrix with factor scores does not affect the model at all and the algorithm will yield the same solution. In order to realize consistent, interpretable, and comparable results, we suggest to recursively test all combinations to find the one of them that has the highest similarity among factor loading matrices so that we can find a coherent interpretation for different groups of subjects. Each pair of factor loading and factor scores can be multiplied by either +1 or −1. The sets of loadings has (2 ) combinations. The possible permutation of the set of loadings is the factorial of . The complexity of the recurrence is therefore 2 × ( !) . The problem can be formulated as a bipartite matching and the Hungarian algorithm can find the match in a lower complexity.

Data Description.
The T1-weighted MR images of 416 subjects were downloaded from the Open Access Series of Imaging Studies [10], which is publically available for analysis. All the T1-weighted images were acquired on a 1. . Multiple intrasession acquisitions provide extremely high signal-to-noise ratio, making the data amenable to our analysis. The available images were provided skull stripped, gain field corrected, and registered to the atlas space of Talairach and Tournoux [11] with a 12parameter rigid affine transform. The resolution of the images is 176 × 208 × 176. The number of voxels, which is more than six million, is much larger than the number of subjects. We extracted the clinically and psychologically interested regions instead of processing whole voxels in the image. The subcortical structures are extracted by the segmentation method [12] which uses manually labeled image data as priori information for a Bayesian framework that utilizes the principles of the active shape and appearance models. The size of a subcortical region was calculated by multiplying the voxel size and the number of voxels in the region. Fifteen subcortical regions were successfully extracted. According to a demographic study by the National Institute on Aging and Alzheimer's Association based on the data collected in the Chicago Health and Aging Project, the prevalence of dementia among individuals aged 71 and older was 13.9%, and AD (Alzheimers disease) was 9.7% [13]. The study was based on a sample of 856 individuals. The was estimated to be [76.4%, 13.9%, 9.7%] which is close to the statistics in our empirical data. The data vector of each subject had fifteen dimensions, each corresponding to the volume size of a subcortical structure divided by the estimated total intracranial volume. The average size of all of the intracranial volume is 1480.5 cm 3 . The intracranial volume is estimated by the linear registration from a manually measured intracranial volume of a standard brain to the individual brain [14]. The analysis of variance (ANOVA) of the data for each structure were calculated and shown in Table 1 and Figure 1. The smaller value indicates high probability of inequality of the structure size among the three groups.
We subtracted the mean from the data and used the remainder for analysis. Using the covariance matrix of the data to estimate the factor scores would cause that a few structures dominate the factor loadings; therefore, we divided each dimension by its standard deviation to compel each of them to have unit variance. After the algorithm converged, we used varimax rotation [15], which transforms the loadings  into the space that maximizes the variance, to rotate the factor loadings. Given data, the expectation of its type was set to  Figure 3 shows the trend of the likelihood climbs as adding the number of factors in the analysis. In the scree plot in Figure 2, three eigenvalues of the covariance matrix of the whole dataset are greater than one and the cumulative percentage of variance from the largest three eigenvalues reaches 78%. Thus we set = 3 in this analysis. The factor loadings for the three groups are shown in Figure 4, in which the vertical axis marks the fifteen regions. The vAD denotes the group of very mild AD. The log likelihood in (5) after the algorithm converges is −5475.814. The loading of structures has symmetric property and usually the right and left structures have similar loadings. Using the factor loadings to estimate and the expected group information given x by (9), we obtain the adjusted and turned proportions as [82.89%, 7.97%, 9.14%] . This may suggest the underlying variation among different groups of subject and need further investigation. Note that the reestimated proportion ℎ is not binary anymore.

Results
We show the results of conventional factor analysis in Figure 5 as a comparison. The program run on the mild AD patients in the dataset cannot achieve reproducible results; therefore, the quantity of mild AD's results in Figure 5 varies from time to time. The analysis for the AD group cannot converge, however the factor loadings are reproducible. The distance of whole factor loading matrices among the three groups for conventional factor analysis is 5.28 while the proposed method is 4.54. The correlation of the three-factor loading matrix estimated by conventional factor analysis methods is  Table 2 lists the values for all factors by the Kolmogorov-Smirnov test [16] on the factor scores against a Gaussian distribution. The test examines the difference between input distributions and a Gaussian distribution. The smaller the values, the more strongly the test rejects the Gaussian assumption. The algorithm tries to search for loads with normally distributed factor score, hence large indicates the factor fit well to Gaussian distribution.
The means (centers) of the clusters are shown in Figure 6. The means are near the origin and include negative value because the data are standardized by the subtraction of the overall mean of the data in the preprocess stage. The yellow color in the first column indicates that healthy controls have larger sizes in subcortical structures, and the second and the third columns indicate that the patients have smaller sizes in different subcortical regions in general. The AD patient has very small thalamus, putamen, and hippocampus. The hippocampus is related to memory and learning. The putamen is a structure involved in the regulation of voluntary movement. The abnormal pallidum in Figure 4 can cause movement disorders. Figure 7 shows the associations of first factor scores with the score of minimental state examination (MMSE) by both methods.

Conclusions
The proposed method finds closer and more correlated factor loadings than the conventional method because it considers the same error matrix for different groups of data. The result of conventional factor analysis having higher normality 8 Computational and Mathematical Methods in Medicine for AD patients than normal subjects is less convincing. Conventional factor analysis that decomposes the observed data together intermixes the latent factors. Taking the data apart will misseparate the noise. This work proposed using a mixture model of factor analysis method for neurodegenerative disease research by showing highly correlated factor loading across different groups of subjects and together with proper normality of the factor scores.