Alzheimer’s disease (AD) is an incurable neurodegenerative disorder that causes dementia in elderly people around the globe. It has been predicted that the prevalence of AD will double within the next two decades and that one out of every 85 people will be afflicted with the disease by 2050 [
An accurate and early diagnosis of AD and identification of the risk of progression from mild cognitive impairment (MCI) to AD provide AD sufferers with awareness of the condition’s severity and allow them to take preventative measures, such as making lifestyle changes and taking medications [
Earlier, the majority of diagnostic work was performed manually or semi-manually by measuring a priori regions of interest (ROIs) in MRI, based on the fact that subjects with AD exhibit more cerebral atrophy than HCs [
The aim of this article is to present an automated approach to diagnosing AD using the “whole brain analysis” method. This method has achieved popularity because it examines all voxels of the brain: it requires neither prior segmentation of the brain nor any biomarker for classification. Its main drawback is high dimensionality, which can be resolved through high-speed computers that are now comparatively inexpensive [
Scholars have presented different methods to extract effective features for the detection of AD and other types of pathological brain disease. Additionally, many classification models and methods exist; nevertheless, not all of them are suitable for processing MR brain images. Based on the latest literature, we found two drawbacks in previous work: (i) The discrete wavelet transform (DWT) is usually utilized for feature extraction. The DWT offers directional selectivity in the horizontal, vertical, and diagonal directions and better image representation than the Fourier transform, but its major drawbacks are its still-limited directionality, its sensitivity to shifts, and its lack of phase information. (ii) Most state-of-the-art mechanisms consider only single-slice-based detection (SSD) per patient, and the obtained slice may not contain the foci of the disease.
To tackle the above problems, we suggest two improvements. First, we propose the DTCWT, which possesses attractive properties for image processing, including shift invariance and high directionality. Second, unlike previous studies, we consider multiple slices for each patient, so that the information gained is more consistent, reliable, and accurate. In hospitals, multiple-slice-based detection is utilized because of its inexpensiveness. Research has clearly shown that the DTCWT is more suitable than the traditional wavelet domain for feature extraction [
Our contribution is to introduce a novel method for AD detection, with higher accuracy than state-of-the-art methods, on the basis of DTCWT, PCA, and ANN techniques. Furthermore, we build a computer-aided diagnosis (CAD) system, which can be utilized in the early diagnosis of AD-related brain areas and subjects. Our objective is to develop an assisting tool for clinicians.
All of the preprocessing methods are used to obtain good results. To show the effectiveness of our proposed system, we evaluated performance measures including accuracy, sensitivity, specificity, and precision, and used a bar plot to compare the proposed method with existing systems. The paper is arranged as follows. Section
In our study, the dataset is obtained from the Open Access Series of Imaging Studies (OASIS). OASIS is a project for compiling and sharing brain MRI datasets to make such data accessible to the scientific community. The data are accessible at
Dataset sample (axial view after preprocessing).
Normal
Alzheimer’s disease
OASIS provides two types of data: cross-sectional and longitudinal MRI data. In this study, we used the cross-sectional MRI data because we aimed to develop an automatic system for detecting AD, which does not require longitudinal data gathered from AD patients over long periods of time.
The dataset consists of 416 subjects aged 18 to 96. In our study, we consider 126 samples (28 AD patients and 98 HCs). Table
Statistical data of the participants.
Factor  HC  AD 

No. of patients  98  28 
Age (years)  75.91 ± 8.98  77.75 ± 6.99 
Education  3.26 ± 1.31  2.57 ± 1.31 
Socioeconomic status  2.51 ± 1.09  2.87 ± 1.29 
CDR  0  1 
MMSE score  28.95 ± 1.20  21.67 ± 3.75 
Gender (M/F)  26/72  9/19 
The dataset contains information about the patients’ demographics. The demographic features include gender (M/F), age, education, socioeconomic status, and handedness. The mini mental state examination (MMSE) is a short 30-point questionnaire used to screen for cognitive impairment and dementia. The MMSE comprises simple questions and problems in several areas: orientation to time and place, repeating a list of words, arithmetic, language use and comprehension, and basic motor skills. The clinical dementia rating (CDR) is a numeric scale measuring the severity of dementia symptoms. The patients’ cognitive and functional performance was assessed in six areas: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care. The patients’ CDR ranks and education levels are listed in Tables
Clinical dementia rating scale.
CDR  Rank 

0  Nondementia 
0.5  Very mild dementia 
1  Mild dementia 
2  Moderate dementia 
Education codes.
Code  Description 

1  Less than high school graduate 
2  Secondary school graduate 
3  Some college 
4  College graduate 
5  Above college 
The proposed method consists of three important stages, namely, feature extraction using DTCWT, feature dimensionality reduction using PCA, and classification using feedforward artificial neural network. The overall block diagram of the suggested method is shown in Figure
Block diagram of the proposed system.
For each patient, each scanning session includes three or four T1-weighted MRI scans. In order to improve the signal-to-noise ratio (SNR), all scans acquired with the identical protocol for the same individual are motion-corrected and spatially coregistered to the Talairach coordinate space to produce an averaged image, and then brain-masked. The motion correction registers the 3D images of all scans and then produces an average 3D image in the initial acquisition space. The scans are then resampled to 1 mm × 1 mm × 1 mm, and the resulting image is converted from acquisition space to Talairach coordinate space. Lastly, brain extraction is performed.
We used MRIcro software (which can be downloaded from
The discrete wavelet transform (DWT) is an image processing method [
The
Here, the
The dual-tree complex wavelet transform (DTCWT) is a modified version of the traditional DWT, proposed to restore the directional selectivity that the DWT lacks. The traditional DWT is shift-variant because of the decimation operation used in the transform; as a consequence, a small shift in the input signal can produce a very different set of wavelet coefficients at the output. The DTCWT utilizes two real DWTs processing the input data in parallel [
The DTCWT provides a solution to the “shift-variance problem” as well as to the lack of “directional selectivity in two or more dimensions,” both of which are shortcomings of the ordinary DWT [
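As a toy illustration of the shift-variance problem and of how a second, offset tree mitigates it, the following numpy sketch uses level-1 Haar detail coefficients. This is only a schematic analogy: the real DTCWT uses specially designed filter pairs with a half-sample delay, not a simple decimation-grid offset.

```python
import numpy as np

def haar_detail(x, offset=0):
    # Level-1 Haar detail coefficients, decimated by 2 on a grid starting at `offset`
    x = np.roll(x, -offset)
    pairs = x.reshape(-1, 2)
    return (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)

x = np.zeros(16)
x[6] = x[7] = 1.0                  # a small edge-like feature
y = np.roll(x, 1)                  # the same feature shifted by one sample

# Ordinary DWT: the feature's detail energy collapses or reappears depending
# on how it happens to align with the decimation grid (energy 0 vs. ~1)
d_x, d_y = haar_detail(x), haar_detail(y)

# Toy "dual tree": a second Haar tree on a grid offset by one sample acts as
# the imaginary part; the complex magnitude keeps the energy stable (~1 for both)
c_x = haar_detail(x) + 1j * haar_detail(x, offset=1)
c_y = haar_detail(y) + 1j * haar_detail(y, offset=1)
```

The single tree responds completely differently to the shifted input, while the combined complex magnitude preserves the feature’s total energy, which is the intuition behind the DTCWT’s approximate shift invariance.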
Suppose one is provided with CQF pairs
if and only if
The DTCWT is implemented using two wavelet filter banks operating in parallel.
A 2D image
A study was carried out in [
The coefficients from the DTCWT enlarge the dimensionality of the feature space, which makes the classification task more complicated.
Additionally, this leads to excessive computational overhead and enormous memory storage. As a result, it is essential to lower the dimension of the feature set and retain the significant features to boost the classification result. Over the last two decades, a method called PCA has earned considerable attention for data visualization and dimensionality reduction. It systematically projects the initial input data onto a lower-dimensional space, known as the principal subspace, through an orthogonal transformation while preserving most of the data variation. For a given set of possibly correlated variables, this transformation yields a set of linearly uncorrelated variables called principal components (PCs). All of the steps to implement PCA are demonstrated in Algorithm
Let us consider a set of data. PCA is employed to find a linear lower-dimensional representation of the dataset such that the variance of the reconstructed data is preserved. PCA restricts the feature vectors to the components it selects, which leads to an efficient classification algorithm. The main idea behind applying PCA is to reduce the dimensionality of the DTCWT coefficients, which results in a more adequate and accurate classification.
The following algorithm is used to obtain the principal components from the input matrix, which are finally fed to the feedforward neural network. The input matrix then possesses only these PCs; hence, the size of the matrix is reduced. Therefore, feature extraction is done in two steps: the DTCWT extracts the wavelet coefficients, and the essential coefficients are later selected by PCA, as described in Algorithm
Let
Perform the following steps:
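The core computation of such a PCA reduction can be sketched in a few lines of numpy. The data here are random stand-ins, and `pca_reduce` is an illustrative helper name, not the paper’s implementation:

```python
import numpy as np

def pca_reduce(X, var_keep=0.90):
    """Project the rows of X onto the fewest principal components that
    together preserve at least `var_keep` of the total variance."""
    Xc = X - X.mean(axis=0)                      # center each feature
    # SVD of the centered data; the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # variance fraction per component
    k = int(np.searchsorted(np.cumsum(explained), var_keep)) + 1
    return Xc @ Vt[:k].T, Vt[:k]                 # score matrix, coefficient matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(126, 200))                  # random stand-in for the feature matrix
scores, components = pca_reduce(X)
```

The returned score matrix plays the role of the reduced feature vectors that are subsequently fed to the neural network.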
Feedforward neural networks (FNNs) are widely used in pattern classification because they do not need any information regarding the probability distributions or a priori probabilities of the distinct classes. Neural networks (NNs) derive their power from their densely parallel structure and their ability to learn from experience. As a result, they can be used for the accurate classification of input data into different classes, provided that they are pretrained. The architecture of a multilayer feedforward neural network is shown in Figure
Architecture of a multilayer feedforward neural network.
Three factors need to be considered in designing an ANN for a specific application: (i) the topology of the network, (ii) the training algorithm, and (iii) the neuron activation function. A network may have many layers of neurons, and its complete architecture may follow either a feedforward or a backpropagation structure. A multi-hidden-layer backpropagation NN with sigmoid neurons in its hidden layers is chosen, and linear neurons are selected for the output layer. The training vectors are provided to the NN, which is trained in batch mode [
Mathematicians have proven that a conjugate gradient (CG) algorithm, searching along conjugate directions, converges faster than one following the steepest-descent directions. Among CG algorithms, the scaled conjugate gradient (SCG) method is the most powerful [
Let
The calculation of the outputs of all neurons in the hidden layer is done by
Here,
The outputs of all neurons in the output layer are stated as follows:
Here,
The error is expressed as the mean squared error (MSE) between the output and the target value [
Let us consider that there are
The hidden layer or the output layer
Hidden or output layer
Connection weight matrix between (a) input layer and hidden layer and (b) hidden layer and output layer.
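The forward pass described by these equations (sigmoid neurons in the hidden layer, linear neurons in the output layer, MSE error) can be sketched in numpy as follows; the layer sizes, weights, and sample are arbitrary placeholders, not trained values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Sigmoid neurons in the hidden layer, linear neurons at the output."""
    h = sigmoid(W1 @ x + b1)          # hidden-layer activations
    return W2 @ h + b2                # linear output layer

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 14, 10, 1     # e.g. 14 PCs in, one AD/HC score out
W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

x, target = rng.normal(size=n_in), np.array([1.0])   # one dummy sample
output = forward(x, W1, b1, W2, b2)
mse = np.mean((output - target) ** 2)                # the error to be minimized
```

Training (e.g., by SCG) then adjusts the weight matrices W1 and W2 to reduce this MSE over the training set.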
The proposed method is implemented in the 32-bit Matlab 2015b environment on an Intel(R) Core(TM) i3-2120 with a processing speed of 3.30 GHz and 2 GB of RAM, running Microsoft Windows 7. Readers can reproduce our results on any computer with which MATLAB is compatible.
This article aims at developing a CAD system for AD brains with better performance. The pseudocode is listed in Table
Pseudocode of the proposed system.
Step 1: Import. 
(a) Import the OASIS dataset. 
(b) Ensure MRI as normal or abnormal brain. 
Step 2: Resample the image to 256 × 256. 
Step 3: Compute the 5-level DTCWT on the preprocessed images. 
Step 4: Perform PCA on the obtained matrix. The selected number of principal components (PCs) should preserve at least 90% of the total variance. 
Step 5: Train the feedforward neural network, taking the reduced set of feature vectors and their corresponding class labels as input. 
Step 6: Evaluation 
(a) Obtain the confusion matrix. 
(b) Calculate the classification accuracy and other essential parameters. 
It is always a major concern to find the optimum value of decomposition level
In this paper, we extract the DTCWT coefficients from the input images. The features of the 5th resolution scale are selected because they provide higher classification performance than the other resolution scales. The DTCWT has a multiresolution representation, as the wavelet transform does. For disease detection, it is preferable to use a few intermediate coefficient scales as the classifier input: the lowest scales have lost fine signal details, whereas the most highly detailed scales contain mostly noise. Therefore, we choose only a few intermediate scales of the DTCWT coefficients. The obtained coefficients are sent as input to the PCA.
Excessive features increase calculation time as well as memory storage. In addition, they sometimes make classification much more complicated, a phenomenon known as the curse of dimensionality. In this article, we utilized PCA to decrease the number of features.
Therefore, the features extracted by the DTCWT are sent to the PCA for feature reduction. For each image, there are 768 features after the 5th level of decomposition. As we have employed 32 slices for each patient, the total number of features becomes 32 × 768, and each subject is reshaped into a row vector of 1 × 24,576. The row vectors of the 126 subjects are arranged into an “input matrix” with dimensions of 126 × 24,576. This is still too large for calculation, so the input data matrix is decomposed into the principal component “score matrix” and the “coefficient matrix.” The score matrix size after decomposition is 126 × 125. Here, the rows and columns of the score matrix correspond to subjects and components, respectively.
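The dimension bookkeeping above can be checked with a short sketch; the matrix here is a zero-filled stand-in for the real feature data:

```python
import numpy as np

n_subjects, n_slices, feats_per_slice = 126, 32, 768

# Each subject's 32 slices of 768 DTCWT features flatten to one row vector
row_len = n_slices * feats_per_slice              # 24,576 features per subject
X = np.zeros((n_subjects, row_len))               # stand-in for the input matrix

# After centering, an n x d matrix has at most min(n, d) - 1 components with
# nonzero variance, which is why the score matrix is 126 x 125
max_components = min(n_subjects, row_len) - 1
```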
The cumulative variance preserved by the first 1 to 18 principal components is listed in Table
Detailed data of PCA.
No. of prin. comp.  1  2  3  4  5  6  7  8  9 

Cumulative variance (%)  63.18  72.15  77.08  80.28  83.05  84.55  85.68  86.59  87.47 
No. of prin. comp.  10  11  12  13  14  15  16  17  18 
Cumulative variance (%)  88.18  88.28  89.41  89.96  90.44  90.86  91.23  91.58  91.91 
Variances versus number of principal component.
The 14 PCs are directly sent to BPNN. Thus, the number of input neurons
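Using the variance figures reproduced from the PCA table above, the selection of 14 PCs at the 90% threshold can be checked directly:

```python
# Cumulative variance (%) for components 1-18, as reported in the PCA table
variance = [63.18, 72.15, 77.08, 80.28, 83.05, 84.55, 85.68, 86.59, 87.47,
            88.18, 88.28, 89.41, 89.96, 90.44, 90.86, 91.23, 91.58, 91.91]

# The smallest number of components whose cumulative variance reaches 90%
n_pc = next(i + 1 for i, v in enumerate(variance) if v >= 90.0)
print(n_pc)   # 14
```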
There are several techniques for evaluating the efficiency of classifiers. The performance is calculated on the basis of the overall confusion matrix, which holds the correct and incorrect classification results. Table
Confusion matrix for a binary classifier to discriminate between two classes (AD and HC).
True class  Predicted AD  Predicted HC 

AD  TP  FN 
HC  FP  TN 
Evaluation indicators.
Indicator  Explanation 

TP  True positive, predicting an AD subject as AD 
FP  False positive, predicting an HC subject as AD 
TN  True negative, predicting an HC subject as HC 
FN  False negative, predicting an AD subject as HC 
Here, AD brains are assumed to hold the value “true” and HC ones are assumed to hold the value “false,” following the normal convention.
Accuracy is the most widely accepted empirical measure for assessing the effectiveness of a classifier. It is formulated by
Sensitivity measures the proportion of actual positives that are correctly classified, and specificity measures the proportion of negatives that are correctly classified. They are calculated by
The precision and the recall are formulated by
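All four indicators follow directly from the confusion-matrix counts; a minimal sketch, with illustrative counts that are not the paper’s results:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and precision from the
    confusion-matrix counts (AD = positive, HC = negative)."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)        # true-positive rate (recall)
    specificity = tn / (tn + fp)        # true-negative rate
    precision   = tp / (tp + fp)
    return accuracy, sensitivity, specificity, precision

# Illustrative counts only: 23 of 28 AD and 86 of 98 HC correctly classified
acc, sens, spec, prec = classification_metrics(tp=23, fp=12, tn=86, fn=5)
```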
In order to execute a rigorous statistical analysis, stratified cross-validation (SCV) is used. We apply a 10-fold CV technique in this experiment for two reasons: (1) to balance reliable estimation against computational cost and (2) to provide a fair comparison, since the common convention is to take the value of
A 10-fold CV means we divide our dataset randomly into ten mutually exclusive folds of approximately equal size and almost the same class distribution. In each run, nine folds are used for training, and the remaining one is used for validation. This process is repeated 10 times, so that every fold is used for validation once. The 10-fold CV is itself repeated 50 times; that is, a 50 × 10-fold CV is implemented.
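A stratified split of the 126 subjects into ten folds can be sketched as follows; `stratified_folds` is an illustrative helper, not the exact procedure used in the experiments:

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds that roughly preserve the AD/HC ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)          # deal class members round-robin
    return folds

labels = ["AD"] * 28 + ["HC"] * 98          # the 126 OASIS subjects used here
folds = stratified_folds(labels)
# Each run: train on nine folds, validate on the held-out one; reshuffling and
# repeating 50 times gives the 50 x 10-fold cross-validation
```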
The accuracies, sensitivities, and specificities obtained from the 50 runs of 10-fold CV are presented in Table
Algorithm performance comparison for MRI brain image.
Algorithm  Accuracy (%)  Sensitivity (%)  Specificity (%)  Precision (%) 

DTCWT + PCA + FNN (proposed) 
90.06 ± 0.01  92.00 ± 0.04  87.78 ± 0.04  89.60 ± 0.03 
VBM + RF [ 
89.0 ± 0.7  87.9 ± 1.2  90.0 ± 1.1  N/A 
DF + PCA + SVM [ 
88.27 ± 1.9  84.93 ± 1.21  89.21 ± 1.6  69.30 ± 1.91 
EB + WTT + SVM + RBF [ 
86.71 ± 1.93  85.71 ± 1.91  86.99 ± 2.30  66.12 ± 4.16 
BRC + IG + SVM [ 
90.00  96.88  77.78  N/A 
BRC + IG + VFI [ 
78.00  65.63  100.00  N/A 
Curvelet + PCA + KNN [ 
89.47  94.12  84.09  N/A 
US + SVDPCA + SVMDT [ 
90.00  94.00  71.00  N/A 
To further determine the effectiveness of the proposed “DTCWT + PCA + FNN,” we compared it with seven state-of-the-art approaches in Table
Finally, the proposed “DTCWT + PCA + FNN” achieved an accuracy of 90.06 ± 0.01%, a sensitivity of 92.00 ± 0.04%, a specificity of 87.78 ± 0.04%, and a precision of 89.60 ± 0.03%. With respect to classification accuracy, our approach outperforms five of the other methods and is almost equal in accuracy to the remaining two, which did not report means and standard deviations. We also achieved promising sensitivity and specificity. Hence, our results are either better than or comparable to those of the other methods. The bar plot of the algorithm comparison is shown in Figure
Bar plot of the algorithm comparison ([
Acronyms list.
Acronym  Definition 

AD  Alzheimer’s disease 
HC  Healthy control 
MR(I)  Magnetic resonance (imaging) 
DTCWT  Dualtree complex wavelet transform 
PCA  Principal component analysis 
FNN  Feedforward neural network 
DWT  Discrete wavelet transform 
OASIS  Open Access Series of Imaging Studies 
MMSE  Mini mental state examination 
CDR  Clinical dementia rating 
SNR  Signaltonoise ratio 
VBM  Voxelbased morphometry 
RF  Random forest 
DF  Displacement field 
SVM  Support vector machine 
EB  Eigenbrain 
WTT  Welch’s t-test 
RBF  Radial basis function 
BRC  Brain region cluster 
IG  Information gain 
KNN  k-nearest neighbors 
ANN  Artificial neural network 
SCV  Stratified crossvalidation 
We presented an automated and accurate method for AD identification based on the DTCWT, PCA, and an FNN. The results showed that the proposed method achieved an accuracy of 90.06 ± 0.01%, a sensitivity of 92.00 ± 0.04%, a specificity of 87.78 ± 0.04%, and a precision of 89.60 ± 0.03%, outperforming seven state-of-the-art algorithms.
We will focus our future research on the following aspects: (i) testing other advanced variants of the wavelet transform, such as the 3D-DTCWT, wavelet packet analysis, and fractional calculus; (ii) utilizing different feature reduction techniques, such as independent component analysis (ICA) [
The authors declare that they have no conflict of interest.
This research was supported by the Brain Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT & Future Planning (NRF-2014M3C7A1046050). This study was also financed by the research fund of Chosun University, 2017.