Alzheimer's Disease Classification Based on Image Transformation and Features Fusion

Combining functional magnetic resonance imaging (fMRI) with artificial intelligence technologies such as deep learning is becoming an inevitable trend for analyzing and diagnosing the different stages of Alzheimer's disease (AD). In this paper, a classification method for AD is proposed based on two transformed images of fMRI, an improved 3DPCANet model, and canonical correlation analysis (CCA). First, fMRI images are preprocessed, and the mean regional homogeneity (mReHo) and mean amplitude of low-frequency fluctuation (mALFF) transformations are applied to the preprocessed images. Then, features are extracted from the mReHo and mALFF images using the improved 3DPCANet, and the two kinds of features are fused by CCA. Finally, a support vector machine (SVM) is used to classify AD patients at different stages. Experimental results show that the proposed approach is robust and effective. The classification accuracies for significant memory concern (SMC) vs. mild cognitive impairment (MCI), normal control (NC) vs. AD, and NC vs. SMC reach 95.00%, 92.00%, and 91.30%, respectively, which demonstrates the feasibility and effectiveness of the proposed method.


Introduction
Alzheimer's disease (AD) [1] is a currently incurable brain disorder characterized by insidious onset and continuous progression. It causes a continuous decline in the patient's cognitive and memory abilities and eventually impairs daily life. Studies [2][3][4] suggest that significant memory concern (SMC) may be an early stage of mild cognitive impairment (MCI) and AD. Its clinical symptoms are objectively poor memory and cognitive decline accompanied by changes in brain structure, and it may evolve into AD. Accurately diagnosing the patient's disease stage is very important because, at present, AD cannot be cured completely; treatment can only slow or prevent its further development. In addition, the diagnosis of AD requires a large amount of heterogeneous medical data, so medical staff bear a heavy burden from manual data analysis.
With the rapid development of deep learning [5][6][7][8] and medical imaging technologies, more and more researchers have combined medical imaging such as magnetic resonance imaging (MRI) [9][10][11][12], positron emission tomography (PET), and computed tomography (CT) with deep learning methods to help medical personnel accurately diagnose and treat AD patients at various stages. Huang et al. [13] improved a deep learning network, VGGNet, to classify three-dimensional (3D) images. In their experiments, a classifier was trained using T1-MRI and FDG-PET images, and high precision was achieved. Islam and Zhang [14] proposed a convolutional neural network that combines dense network modules. The experimental results showed that the accuracy for the nondementia stage was 99%. Zhang et al. [15] designed a convolutional neural network to extract the features of two modalities, PET and MRI images. The extracted features were fused with information from the mini-mental state examination (MMSE) and the clinical dementia rating (CDR). The accuracies of AD vs. normal control (NC), MCI vs. NC, and AD vs. MCI were 100%, 96.58%, and 97.43%, respectively. Jain et al. [16] adopted a mathematical model based on a convolutional neural network (CNN) with transfer learning to diagnose AD. In this model, VGG-16 trained on the ImageNet dataset was used as a feature extractor for the classification tasks. The classification accuracies of AD vs. NC, AD vs. MCI, and NC vs. MCI reached 99.14%, 99.30%, and 99.22%, respectively. Most of the above methods address binary classification among AD, NC, and MCI. However, other stages such as SMC exist during the evolution from NC to AD. Therefore, this paper studies the classification of AD at various stages, including SMC.
PCANet is one of the common convolutional neural networks and was proposed by Chen et al. [17]. Subsequently, Li et al. [18] extended PCANet from a two-dimensional (2D) CNN to a three-dimensional (3D) CNN and diagnosed AD patients using structural MRI (sMRI). In this paper, based on the 3D PCANet, a max-pooling layer and a rectified linear unit (ReLU) layer are added behind each convolution layer to reduce the redundancy of image features. The improved 3D PCANet model is used to extract texture and nonlinear features of brain images. Experimental results demonstrate that the improved method effectively increases the classification accuracy.
As a noninvasive imaging technology, fMRI [19] is used to measure spontaneous brain activity, which reflects the status of different brain regions at different times. Many studies suggest that functional characteristics of fMRI at different levels, such as the amplitude of low-frequency fluctuation (ALFF) [20], regional homogeneity (ReHo) [21], and regional functional correlation strength (RFCS) [22], can reflect brain diseases and thus assist medical personnel in diagnosing them. Dai et al. [23] applied different types of transforms to fMRI data, including ALFF, ReHo, RFCS, and gray matter density (GMD), and combined them with multilevel characterization with multiclassifier (M3) to diagnose AD patients, obtaining good results. However, when multimodal data are used directly, feature redundancy often occurs and degrades the classification results. To address this problem, two functional image transforms of fMRI, ALFF and ReHo, are selected in this paper and used to extract features. Then, canonical correlation analysis (CCA) is used to fuse these two kinds of features.
Inspired by the above ideas, a method to diagnose AD based on different functional characteristics of fMRI and a CCA fusion strategy is proposed in this paper. First, fMRI images are preprocessed and transformed into mReHo (mean ReHo) and mALFF (mean ALFF) images. Then, these two kinds of transformed images are separately input into the improved 3D PCANet model for feature extraction. Next, the two kinds of features are fused by CCA. Finally, a support vector machine (SVM) is used for classification. The contributions of this paper are as follows.
(1) Because fMRI data are four-dimensional (4D) and features cannot be extracted from them directly, the 4D fMRI images are converted into 3D form using image transformations such as ALFF and ReHo.
(2) The traditional 3D PCANet network is improved by adding a max-pooling layer behind each convolution layer used to extract image features, so feature redundancy and human error can be effectively reduced.
(3) CCA is used to fuse the two kinds of image features, and the classification accuracy of the model is improved.
(4) AD patients at different stages, especially including SMC, are classified fully automatically, which can assist medical personnel in accurately diagnosing and analyzing AD.
The rest of this paper is organized as follows. Section 2 introduces the experimental dataset and the proposed method. Section 3 gives the experimental results and analysis of the proposed method and the compared methods. A conclusion is drawn in Section 4.

Methodology
The framework of the proposed method is shown in Figure 1. First, fMRI images are preprocessed and transformed. Then, features are extracted from the transformed images using the improved 3DPCANet, and the two kinds of features are fused by CCA. Finally, SVM is used to classify AD patients at different stages. The detailed steps are explained in the following sections.
2.1. Data Preprocessing. The fMRI data used in this study were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The fMRI dataset includes 34 patients with AD, 26 patients with SMC, 57 patients with EMCI, 35 patients with LMCI, 38 patients with MCI, and 50 NC. Detailed information is shown in Table 1.
The fMRI data were analyzed using the DPARSF [24] toolkit. Because the initial fMRI signal is unstable, the first 10 time points of each fMRI scan are deleted, and the remaining time points are slice-timing corrected, realigned, and normalized. The images are registered to the template proposed by the Montreal Neurological Institute (MNI). The preprocessed images are shown in Figure 2.
Low-frequency signal energy is utilized to represent the activity of neurons in different brain regions. Because the brain structure of AD patients has changed, it is believed that the activity of neurons in each brain area also changes compared with that of the normal control group. The mean amplitude of low-frequency fluctuation (mALFF) is obtained by dividing the ALFF of each voxel by the average ALFF of all voxels in the whole brain. mALFF is calculated from the preprocessed fMRI data, and the images after the mALFF transformation are shown in Figure 3.
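The mALFF transformation can be illustrated with a minimal NumPy sketch. It is not the DPARSF implementation: the frequency band (0.01 to 0.08 Hz), the repetition time `tr`, and the optional brain `mask` are assumptions that the text above does not specify, and ALFF is approximated here as the mean FFT amplitude inside the band.

```python
import numpy as np

def malff(bold, tr, band=(0.01, 0.08), mask=None):
    """Return the 3D mALFF map of a 4D series shaped (x, y, z, time).

    band, tr and mask are illustrative assumptions, not values from the paper.
    """
    n_t = bold.shape[-1]
    freqs = np.fft.rfftfreq(n_t, d=tr)  # frequency of each FFT bin in Hz
    # Remove each voxel's temporal mean so the 0 Hz bin carries no offset.
    demeaned = bold - bold.mean(axis=-1, keepdims=True)
    amplitude = np.abs(np.fft.rfft(demeaned, axis=-1)) / n_t
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    alff = amplitude[..., in_band].mean(axis=-1)  # mean in-band amplitude per voxel
    if mask is None:
        mask = np.ones(alff.shape, dtype=bool)
    # mALFF: each voxel's ALFF divided by the mean ALFF inside the mask.
    return np.where(mask, alff / alff[mask].mean(), 0.0)

rng = np.random.default_rng(0)
demo = malff(rng.standard_normal((4, 4, 4, 120)), tr=2.0)
print(demo.shape)
```

Dividing by the whole-brain mean makes the map dimensionless, so values above 1 mark voxels whose low-frequency amplitude exceeds the brain average.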

2.3. ReHo Transform Images.
The ReHo method was originally proposed by Jiang and Zuo [21] and is used to measure the regional synchronization of fMRI time courses. ReHo assumes that a selected voxel is temporally similar to its adjacent voxels, and this similarity is measured by Kendall's coefficient of concordance. Let $f(M_1, N_1, O_1, T_1)$ represent the fMRI data, where $M_1$ is the number of rows, $N_1$ is the number of columns, $O_1$ is the number of layers, and $T_1$ is the number of time points for each voxel (the length of the time series). The data contain $M_1 \times N_1 \times O_1$ voxels, and the local consistency of the time series of the $j_1$th voxel and its $K_1$ nearest neighborhood voxels (usually $K_1$ is 6, 18, or 26) is calculated as follows:
(1) The time series of the $K_1 + 1$ voxels are expressed as a matrix $X_1$ of size $T_1 \times (K_1 + 1)$, where $X_1(i, j)$ represents the $i$th time point of the $j$th voxel.
(2) Each element of a column vector is replaced by its rank within that column, i.e., the ordinal number of its value among the values in the $j$th column, and the matrix $R_1$ of size $T_1 \times (K_1 + 1)$ is obtained, where $R_1(i, j)$ represents the rank at the $i$th time point of the $j$th voxel.
(3) Kendall's coefficient of concordance of the time series of the $K_1 + 1$ voxels is calculated as
$$W = \frac{\sum_{i=1}^{T_1} R_i^{2} - T_1 \bar{R}^{2}}{\frac{1}{12}(K_1 + 1)^{2}\left(T_1^{3} - T_1\right)}, \tag{1}$$
where $R_i = \sum_{j=1}^{K_1+1} R_1(i, j)$ is the sum of the ranks at the $i$th time point, $\bar{R} = (K_1 + 1)(T_1 + 1)/2$ is the mean of the $R_i$, and $W$ ranges from 0 to 1. The closer $W$ is to 1, the greater the similarity of the time series is.
The ReHo transform images therefore show how similar the time series of each voxel is to those of its neighbors. The greater Kendall's coefficient of concordance is, the more similar these time series are. Mean ReHo (mReHo) is obtained by dividing the ReHo value of each voxel by the average ReHo value of the whole brain. After smoothing, the processed images are shown in Figure 4.
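The three ReHo steps can be sketched in plain Python as follows. This is an illustrative sketch rather than the toolkit's code: the function names are hypothetical, tied values receive their average rank, and no tie correction is applied to the coefficient.

```python
def rank_column(values):
    """Ranks 1..T of one time series; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = average
        pos = end + 1
    return ranks

def kendall_w(series):
    """Kendall's coefficient of concordance for K+1 time series of equal length T."""
    m = len(series)        # number of voxels, K_1 + 1
    n = len(series[0])     # number of time points, T_1
    ranked = [rank_column(s) for s in series]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]  # R_i per time point
    mean_sum = m * (n + 1) / 2
    s = sum(x * x for x in rank_sums) - n * mean_sum ** 2
    return 12 * s / (m ** 2 * (n ** 3 - n))

same = [[0.1, 0.5, 0.3, 0.9]] * 3
opposite = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
print(kendall_w(same), kendall_w(opposite))
```

Identical time series give a coefficient of 1, while two perfectly reversed series give 0, which matches the interpretation that values closer to 1 indicate stronger local synchronization.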
2.4. Improved 3DPCANet. PCANet [18] is a simple convolutional neural network that mainly consists of principal component analysis (PCA), convolutional layers, binary hashing, and block histograms. The objective of PCA is to obtain the eigenvectors of the target matrix, which are taken as the convolution kernel parameters. Binary hashing and block histograms play the roles of indexing and pooling. In this paper, based on the traditional 3DPCANet, a max-pooling layer and a ReLU layer are added behind each convolution layer to reduce the redundancy of image features after convolution and to learn texture and nonlinear features of brain images, which improves the classification accuracy. $B$ training images $\{\Gamma_e\}_{e=1}^{B}$ of the same size are taken as the input of 3DPCANet. Features are extracted using 3DPCANet as follows.
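(a) Input layer
For the $e$th image of the training samples, a block of size $k \times k \times k$ is taken around each voxel with a sliding window, so that $L \times W \times H$ patches are produced from the $e$th image. Subsequently, each patch is vectorized and standardized (as shown in Equation (2)), which gives the vectors $u_{e,1}, u_{e,2}, \cdots, u_{e,o}, \cdots, u_{e,L \times H \times W}$, where $u_{e,o}$ represents the $o$th vector of the $e$th image.
All voxel patches processed by Equation (2) are collected to obtain the matrix $U_e$:
$$U_e = \left[u_{e,1}, u_{e,2}, \cdots, u_{e,o}, \cdots, u_{e,L \times H \times W}\right], \tag{3}$$
where $u_{e,1}, u_{e,2}, \cdots, u_{e,o}, \cdots, u_{e,L \times H \times W}$ are the different vectors from the same image. All training images are processed through Equation (3), and the results are concatenated to obtain the matrix $U$:
$$U = \left[U_1, U_2, \cdots, U_B\right]. \tag{4}$$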
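Then, the dimensionality of the matrix $U$ is reduced by PCA, which minimizes the reconstruction error over a group of orthonormal filters:
$$\min_{V \in \mathbb{R}^{k^{3} \times C_1}} \left\| U - V V^{T} U \right\|_F^{2}, \quad \text{s.t. } V^{T} V = I_{C_1}, \tag{5}$$
where $I_{C_1}$ is the identity matrix of size $C_1 \times C_1$. The solution $V$ of this problem consists of the $C_1$ principal eigenvectors of $UU^{T}$. The PCA filters are expressed as
$$w^{1}_{l} = \mathrm{mat}_{k,k,k}\left(q_l\left(UU^{T}\right)\right) \in \mathbb{R}^{k \times k \times k}, \quad l = 1, 2, \cdots, C_1, \tag{6}$$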
where $\mathrm{mat}$ is a function that maps a vector to a tensor $w \in \mathbb{R}^{k \times k \times k}$, $q_l(UU^{T})$ represents the $l$th principal eigenvector of $UU^{T}$, and $w^{1}_{l} \in \mathbb{R}^{k \times k \times k}$ is the $l$th filter generated in the first step. Each PCA filter is convolved with the $e$th training image $\Gamma_e$:
$$\Gamma^{1}_{e,l} = \Gamma_e * w^{1}_{l}, \quad e = 1, 2, \cdots, B, \quad l = 1, 2, \cdots, C_1, \tag{7}$$
where the symbol "$*$" represents convolution. The filters $w^{1}_{l}$ are used to convolve all $B$ training images and generate $B \times C_1$ images. Then, max-pooling and ReLU are performed on the image $\Gamma^{1}_{e,l}$ generated by Equation (7):
$$\Pi^{1}_{e,l} = \mathrm{ReLU}\left(\Gamma^{1}_{e,l} \otimes P_1\right), \tag{8}$$
where the symbol "$\otimes$" represents the max-pooling operation, $P_1$ denotes the max-pooling layer in the first step, and $\Pi^{1}_{e,l}$ represents the image after the max-pooling layer and ReLU layer.
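The filter-learning step of the input layer can be sketched with NumPy as below. This is a simplified illustration, not the authors' implementation: patches are taken with stride 1 and without padding (so fewer than $L \times W \times H$ patches are produced), "standardized" is assumed to mean removing the patch mean, and the function names are hypothetical.

```python
import numpy as np

def extract_patches(volume, k):
    """All k x k x k patches (stride 1, no padding), one vectorized patch per column."""
    L, W, H = volume.shape
    columns = []
    for x in range(L - k + 1):
        for y in range(W - k + 1):
            for z in range(H - k + 1):
                patch = volume[x:x + k, y:y + k, z:z + k].ravel()
                columns.append(patch - patch.mean())  # assumed standardization
    return np.stack(columns, axis=1)  # shape (k^3, number of patches)

def learn_pca_filters(volumes, k, n_filters):
    """Leading eigenvectors of U U^T, reshaped into k x k x k filters."""
    U = np.concatenate([extract_patches(v, k) for v in volumes], axis=1)
    eigvals, eigvecs = np.linalg.eigh(U @ U.T)   # eigenvalues in ascending order
    leading = eigvecs[:, ::-1][:, :n_filters]    # largest eigenvalues first
    return [leading[:, l].reshape(k, k, k) for l in range(n_filters)]

rng = np.random.default_rng(0)
volumes = [rng.standard_normal((6, 6, 6)) for _ in range(3)]
filters = learn_pca_filters(volumes, k=3, n_filters=4)
print(len(filters), filters[0].shape)
```

Because the filters are eigenvectors of a symmetric matrix, they are orthonormal, and no gradient-based training is involved in obtaining them.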

(b) Middle layer
In the middle layer, the first step has generated $C_1$ images from the $e$th of the $B$ training images. Operations similar to Equations (2)-(4) are performed on the $l$th of these images, and the matrix $Y$ is obtained. The filters $w^{2}_{h}$ are obtained by applying PCA to the matrix $Y$ as in the first stage:
$$w^{2}_{h} = \mathrm{mat}_{k,k,k}\left(q_h\left(YY^{T}\right)\right) \in \mathbb{R}^{k \times k \times k}, \quad h = 1, 2, \cdots, C_2. \tag{9}$$
The image $\Pi^{1}_{e,l}$ generated by Equation (8) in the first step is convolved with the obtained PCA filters:
$$\Pi^{2}_{e,l,h} = \Pi^{1}_{e,l} * w^{2}_{h}, \quad h = 1, 2, \cdots, C_2. \tag{10}$$
Each of the $B \times C_1$ images generated in the first step is used to generate $C_2$ images by Equation (10). Max-pooling and ReLU are then performed on the image $\Pi^{2}_{e,l,h}$ after convolution:
$$\Omega^{2}_{e,l,h} = \mathrm{ReLU}\left(\Pi^{2}_{e,l,h} \otimes P_2\right), \tag{11}$$
where $P_2$ denotes the max-pooling layer in the middle layer, and the image $\Omega^{2}_{e,l,h}$ is generated by the max-pooling and ReLU operations.
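The max-pooling and ReLU operations added behind each convolution layer can be sketched as follows. The pooling window size `p` and the use of non-overlapping windows are assumptions, since the text does not state the pooling configuration.

```python
import numpy as np

def max_pool3d(volume, p):
    """Non-overlapping p x p x p max-pooling; voxels that do not fill a window are dropped."""
    L, W, H = (size // p for size in volume.shape)
    trimmed = volume[:L * p, :W * p, :H * p]
    blocks = trimmed.reshape(L, p, W, p, H, p)  # split each axis into (window index, offset)
    return blocks.max(axis=(1, 3, 5))

def relu(volume):
    """Element-wise rectified linear unit."""
    return np.maximum(volume, 0.0)

response = np.arange(64, dtype=float).reshape(4, 4, 4) - 40.0
pooled = relu(max_pool3d(response, 2))
print(pooled.shape)
```

Pooling with a window of 2 reduces the number of voxels by a factor of 8, which is one way the added layers cut down redundant features before the next stage.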
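(c) Output layer
The Heaviside function $H(\cdot)$ is used to binarize all $B \times C_1 \times C_2$ images, and the binarized images are weighted and summed to get $O_{e,l}$:
$$O_{e,l} = \sum_{h=1}^{C_2} 2^{h-1} H\left(\Omega^{2}_{e,l,h}\right). \tag{12}$$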
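The binarization and weighting can be illustrated by the short sketch below, which assumes the usual PCANet convention that the Heaviside function is 1 for positive inputs and 0 otherwise, and that the $h$th map receives the weight $2^{h-1}$. The function name is hypothetical.

```python
import numpy as np

def binary_hash(maps):
    """Combine C_2 equally shaped maps into one integer-coded image.

    maps[h] is binarized with a step function and weighted by 2**h
    (h starts at 0 here, which corresponds to 2**(h-1) for 1-based h).
    """
    code = np.zeros(maps[0].shape, dtype=np.int64)
    for h, omega in enumerate(maps):
        code += (omega > 0).astype(np.int64) << h
    return code

maps = [np.array([1.0, -1.0, 0.0, 2.0]), np.array([0.5, 3.0, 0.0, -2.0])]
print(binary_hash(maps))
```

With $C_2$ maps, every voxel of the coded image is an integer between 0 and $2^{C_2} - 1$, so the $C_2$ responses are summarized in a single volume.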
Finally, each image $O_{e,l}$ is divided into $\beta$ blocks of size $k \times k \times k$, which may be overlapping or nonoverlapping. In the program, $R$ represents the overlapping rate between blocks. The histogram of each block is computed, and the feature vector is formed as
$$F_e = \left[\mathrm{Bhist}\left(O_{e,1}\right), \cdots, \mathrm{Bhist}\left(O_{e,C_1}\right)\right]^{T}, \tag{13}$$
where $\mathrm{Bhist}(O_{e,l})$ is a function that performs block division, histogram statistics, and concatenation on the image $O_{e,l}$, and $F_e$ represents the final feature vector of the $e$th training image $\Gamma_e$ extracted by 3DPCANet. The hyperparameters of the improved 3DPCANet include the block size $k \times k \times k$, the numbers of filters $C_1$ and $C_2$ in each stage, the number $\beta$ of blocks, and the overlapping rate $R$ between blocks in the output layer.
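A block histogram of one hashed image can be sketched as follows. The mapping from the overlapping rate to a stride (`step = round(k * (1 - overlap))`) is an assumption made for illustration, and `n_bins` is expected to equal $2^{C_2}$ so that every code value has its own bin.

```python
import numpy as np

def block_histogram(code, k, n_bins, overlap=0.0):
    """Concatenate the histograms of k x k x k blocks of an integer-coded volume."""
    step = max(1, int(round(k * (1.0 - overlap))))  # assumed stride from overlap rate
    L, W, H = code.shape
    features = []
    for x in range(0, L - k + 1, step):
        for y in range(0, W - k + 1, step):
            for z in range(0, H - k + 1, step):
                block = code[x:x + k, y:y + k, z:z + k]
                # Codes are assumed to lie in [0, n_bins - 1].
                features.append(np.bincount(block.ravel(), minlength=n_bins))
    return np.concatenate(features)

code = np.arange(64).reshape(4, 4, 4) % 4
print(block_histogram(code, k=2, n_bins=4).shape)
```

The final feature vector of an image is obtained by concatenating such histograms over all $C_1$ hashed images, so its length grows with the number of blocks and with $2^{C_2}$.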
In conclusion, the convolution kernel parameters of the improved 3DPCANet are learned by PCA, and the improved 3DPCANet does not require back propagation. Therefore, the improved 3DPCANet does not need a large training dataset and is suitable for small datasets. Because the amount of AD data is small, the improved 3DPCANet is used as the feature extraction model in this paper. It is used to extract features from the mALFF and mReHo transformation images separately. Then, these two kinds of features are fused by CCA.

2.5. Canonical Correlation Analysis. CCA [25] is an algorithm used to find correlations between different kinds of data. $X_2(n_2, m_2)$ and $Y_2(n_3, m_2)$ are assumed to represent two different kinds of datasets, where $m_2$ represents the number of samples of the two datasets, and $n_2$ and $n_3$ represent the feature dimensions of the two datasets, respectively. CCA is used to reduce the dimensions of $X_2$ and $Y_2$, and the $n_1'$-dimensional feature vectors $X_2'(n_1', m_2)$ and $Y_2'(n_1', m_2)$ are obtained as
$$X_2' = a_1^{T} X_2, \quad Y_2' = b_1^{T} Y_2, \tag{14}$$
where $a_1$ and $b_1$ represent the projection vectors of $X_2$ and $Y_2$, respectively. The projection criterion of CCA is that, when the two sets of data are reduced to $n_1'$ dimensions, their correlation coefficient is the largest. The objective function of CCA is
$$(a_1, b_1) = \arg\max_{a_1, b_1} \rho\left(X_2', Y_2'\right) = \arg\max_{a_1, b_1} \frac{\operatorname{cov}\left(X_2', Y_2'\right)}{\sqrt{\operatorname{var}\left(X_2'\right)\operatorname{var}\left(Y_2'\right)}}, \tag{15}$$
where $a_1$ and $b_1$ are obtained by maximizing $\rho(X_2', Y_2')$, which gives the corresponding projections $X_2'$ and $Y_2'$. In this paper, CCA is used to find the correlated features between the mALFF and mReHo transform images of the same patient. The correlated features are fused, and the fused features are then input into the SVM classifier to classify AD patients at different stages.
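A compact NumPy sketch of CCA-based fusion is given below. It is one standard way to solve the CCA objective (whitening with Cholesky factors followed by an SVD), not necessarily the solver used in this work. The small ridge term `reg` and the choice to fuse by stacking the two projected feature sets are assumptions, since the text does not specify them.

```python
import numpy as np

def cca(X, Y, n_components, reg=1e-6):
    """X: (n_x, m), Y: (n_y, m), one sample per column.

    Returns projection matrices A, B and the canonical correlations.
    reg is an assumed ridge term that keeps the covariances invertible.
    """
    m = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Sxx = Xc @ Xc.T / (m - 1) + reg * np.eye(X.shape[0])
    Syy = Yc @ Yc.T / (m - 1) + reg * np.eye(Y.shape[0])
    Sxy = Xc @ Yc.T / (m - 1)
    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx Lx^T
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)           # Lx^{-1} Sxy
    M = np.linalg.solve(Ly, M.T).T         # Lx^{-1} Sxy Ly^{-T}
    Uw, s, Vt = np.linalg.svd(M)
    A = np.linalg.solve(Lx.T, Uw[:, :n_components])    # a = Lx^{-T} u
    B = np.linalg.solve(Ly.T, Vt.T[:, :n_components])  # b = Ly^{-T} v
    return A, B, s[:n_components]

def fuse(X, Y, A, B):
    """Project both centered feature sets and stack them (assumed fusion rule)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    return np.vstack([A.T @ Xc, B.T @ Yc])

rng = np.random.default_rng(0)
shared = rng.standard_normal(200)
X = np.vstack([shared + 0.01 * rng.standard_normal(200), rng.standard_normal((2, 200))])
Y = np.vstack([rng.standard_normal(200), -shared + 0.01 * rng.standard_normal(200)])
A, B, rho = cca(X, Y, n_components=1)
print(fuse(X, Y, A, B).shape)
```

In the toy data, both feature sets contain the same hidden signal, so the first canonical correlation is close to 1. With feature vectors much longer than the number of subjects, as with block-histogram features, the ridge term becomes essential.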
2.6. SVM. The support vector machine (SVM) [26] is a supervised learning classifier that uses the maximum-margin hyperplane of the training samples as its decision boundary. The decision function of the SVM classifier is expressed by Equation (16), where $T_m$ represents the hyperplane, $\omega$ is the normal vector of the hyperplane, $p_m$ denotes the feature vector, $Z$ represents the bias, $\zeta_m$ is the slack variable, $C$ represents the penalty factor, and $N$ represents the number of samples. Sequential minimal optimization (SMO) is the most common method to find the global optimal solution of SVM. In addition to the SMO algorithm, other methods, including elephant herding optimization [31] and the krill herd algorithm [32], can also solve the same problem. SVM has two design methods for multiclass problems, one-versus-one and one-versus-rest. In this paper, the one-versus-one scheme and the SMO optimization algorithm are chosen for the SVM classifier.
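To make the roles of the normal vector, bias, slack variables, and penalty factor concrete, the sketch below evaluates the standard soft-margin SVM objective for a given hyperplane. It assumes the textbook formulation with class labels in {+1, -1}; the labels and the function names are introduced here for illustration and are not taken from the equation in the paper.

```python
def decision_value(omega, z, p):
    """Signed score of feature vector p for the hyperplane (omega, z)."""
    return sum(w * x for w, x in zip(omega, p)) + z

def soft_margin_objective(omega, z, samples, labels, c):
    """Return 0.5 * ||omega||^2 + c * sum(slack) and the slack of each sample.

    The slack of a sample is max(0, 1 - y * score), i.e. how far it falls
    inside the margin or on the wrong side (labels y are +1 or -1).
    """
    slack = [max(0.0, 1.0 - y * decision_value(omega, z, p))
             for p, y in zip(samples, labels)]
    objective = 0.5 * sum(w * w for w in omega) + c * sum(slack)
    return objective, slack

samples = [[2.0, 0.0], [-2.0, 1.0], [0.5, 0.0]]
labels = [1, -1, 1]
objective, slack = soft_margin_objective([1.0, 0.0], 0.0, samples, labels, c=2.0)
print(objective, slack)
```

A larger penalty factor makes margin violations more costly, so the optimizer (SMO in this paper) trades margin width against training errors through this single parameter.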

Experimental Results and Analysis
In this paper, two kinds of image transformation, mALFF and mReHo, are used. The improved 3DPCANet is used for feature extraction, and SVM is used for classification. All the deep learning models used in this study were built using the PyTorch framework and run on a server with a 1.7 GHz Intel Xeon E5-2603 v4 CPU, 16.0 GB RAM, an NVIDIA RTX2070 GPU with 8 GB of memory, and the Windows 10 (64-bit) operating system. The evaluation indicators used in this paper include accuracy, sensitivity, and specificity (Equation (17)). Traditional experiments use only accuracy as the evaluation criterion, so the performance of the algorithm cannot be evaluated comprehensively. In the original dataset of this paper, the positive class is far larger than the negative class, so the data are seriously imbalanced. Therefore, the F1 value and the area under the receiver operating characteristic (ROC) curve (AUC) are also used as evaluation indexes, where F1 is calculated from precision and sensitivity (Equation (18)).
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP}, \tag{17}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \tag{18}$$
where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
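The evaluation indicators can be computed from predicted and true labels as in the following sketch. The function name and the convention of returning 0 when a denominator is empty are choices made here for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

metrics = binary_metrics([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1])
print(metrics)
```

On imbalanced data, a classifier that always predicts the majority class can reach a high accuracy while its specificity is 0, which is why sensitivity, specificity, and F1 are reported together.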
As can be seen from the results in Table 2, for the mALFF transformation, the results for MCI vs. AD are significantly improved, and the other experiments also improve to different degrees while maintaining the original results. For the mReHo transformation, the results of three groups, SMC vs. MCI, SMC vs. AD, and EMCI vs. LMCI, are improved significantly on all indicators. The experimental results show that the improved 3DPCANet extracts more discriminative image features and effectively improves the classification results because the max-pooling layer and ReLU layer are added behind each convolution layer.

(2) The role of fusion strategies
To verify the effectiveness of data fusion, multimodal data fusion and multiple classifiers are used in this paper. The detailed results are shown in Table 3.
As can be seen from the results in Table 3, compared with the single-modal methods, the fusion methods effectively improve the experimental results. However, the results of directly concatenating the two kinds of features are still poor because of feature redundancy. When CCA is used to fuse the features of the two transformed images, better results are obtained for AD patients at different stages because CCA finds the most correlated classification features of the two images, and the fused features enhance the discriminative power of the classifier.
Among these results, the classification accuracy of SMC vs. MCI is 95.00%, and the F1 value and AUC are 95.65% and 92.71%, respectively. The evaluation indicators of NC vs. SMC, NC vs. MCI, NC vs. AD, and MCI vs. AD are improved compared with those of the single-modal methods. This indicates that CCA can mine more effective classification features of AD patients at different stages.
SMC and MCI are early stages of AD, so the changes in brain structure are small, and clinical diagnosis can easily result in misdiagnosis. NC vs. MCI and SMC vs. MCI are classified with accuracies of 88.89% and 95.00%, F1 values of 86.96% and 95.65%, and AUC values of 82.22% and 92.71%, respectively. The experimental results show that the proposed method can effectively classify the different stages of AD, including the initial stage that is difficult to diagnose. In addition, because the results with SVM are better than those with the softmax classifier, SVM is used in the subsequent experiments.
ROC curves are compared for four groups of experiments, namely, mALFF image classification, mReHo image classification, mALFF and mReHo tandem fusion classification, and mALFF and mReHo CCA fusion classification. Specificity is presented as the abscissa, and sensitivity is represented by the ordinate. It can be seen from the ROC curves in the four subimages that the proposed method has the largest AUC compared with the single-modal methods and direct fusion in series.
In this paper, the mReHo and mALFF maps for MCI vs. AD, NC vs. AD, NC vs. MCI, NC vs. SMC, SMC vs. AD, and SMC vs. MCI are visualized using the REST Slice Viewer, as shown in Figures 6 and 7. The differences between groups in these maps are found using the two-sample t-test.
It can be seen from Figures 6 and 7 that the brain regions influenced by the mReHo transformation include the precentral, calcarine, posterior cingulate cortex, cuneus, lingual, medial and paracingulate gyrus, superior occipital gyrus, fusiform, superior parietal gyrus, middle temporal gyrus, and hippocampus. The brain regions influenced by the mALFF transformation include the fusiform, inferior temporal gyrus, hippocampus, middle occipital gyrus, calcarine, middle temporal gyrus, precentral, lingual, and cingulate gyrus. Therefore, brain regions such as the fusiform, hippocampus, calcarine, middle temporal gyrus, precentral, and lingual play an important role in classification, and these regions are the focus of this paper.

(3) Comparison results of different methods
In this paper, the experimental results are compared with those of state-of-the-art methods. As can be seen from the results in Table 4, the proposed method uses fewer experimental data and obtains better classification results for NC vs. MCI and MCI vs. AD, and its NC vs. AD results are close to those of the other methods. Because CCA is used to perform dimensionality reduction and fusion on the features of the fMRI transformation images, the proposed method effectively reduces feature redundancy and image noise, which increases the classification accuracy.

Conclusion
In this paper, an AD classification method based on image transformation and feature fusion is proposed. First, the fMRI data are transformed into mALFF and mReHo images. Then, an improved 3DPCANet is proposed and used to extract features from the two kinds of transformation images, and these two kinds of features are fused by CCA. Finally, SVM is used to classify AD patients at different stages. SMO is used to find the global optimal solution for SVM. Besides the SMO method, some representative computational intelligence algorithms can also be used to solve this problem, such as monarch butterfly optimization (MBO), the earthworm optimization algorithm (EWA), elephant herding optimization (EHO), the moth search (MS) algorithm, the slime mould algorithm (SMA), and Harris hawks optimization (HHO). The experimental results show that the improved 3DPCANet reduces feature redundancy and image noise and extracts texture and nonlinear features of brain images because a max-pooling layer and a ReLU layer are added behind each convolutional layer, which makes the classification features richer and more robust. Compared with the single-modal methods, fusing the two kinds of fMRI features with a strategy such as CCA yields better results, which shows that the fusion strategy can assist medical personnel in accurately diagnosing SMC, MCI, EMCI, LMCI, and AD patients.

Data Availability
The fMRI dataset used in this study comes from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The fMRI dataset includes 34 AD patients, 57 EMCI patients, 35 LMCI patients, 26 SMC patients, and 50 NC. The experimental data were obtained by sending an email to ADNI and signing the related agreement. Because this laboratory studies the classification of Alzheimer's disease by fusing fMRI and sMRI image information, the subjects who have both fMRI and sMRI images were selected from the ADNI dataset. The link to the ADNI dataset is http://adni.loni.usc.edu/.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this manuscript.

Acknowledgments
This work is supported by the Joint Project of Beijing Natural Science Foundation and Beijing Municipal Education Commission (no. KZ202110011015).