Twin SVM-Based Classification of Alzheimer's Disease Using Complex Dual-Tree Wavelet Principal Coefficients and LDA

Alzheimer's disease (AD) is a leading cause of dementia and poses serious health and socioeconomic problems. A progressive neurodegenerative disorder, AD causes structural changes in the brain, thereby affecting behavior, cognition, emotions, and memory. Numerous multivariate analysis algorithms have been used to classify AD and distinguish it from healthy controls (HC). Efficient early classification of AD and mild cognitive impairment (MCI) from HC is imperative, as early preventive care could help to mitigate risk factors. Magnetic resonance imaging (MRI), a noninvasive biomarker, displays morphometric differences and cerebral structural changes. Here we propose a novel approach for distinguishing AD from HC using the dual-tree complex wavelet transform (DTCWT), principal coefficients from the transaxial slices of MRI images, linear discriminant analysis, and a twin support vector machine. The proposed method achieved an accuracy of up to 92.65 ± 1.18% on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, with a specificity of 92.19 ± 1.56% and a sensitivity of 93.11 ± 1.29%, and an accuracy of up to 96.68 ± 1.44% on the Open Access Series of Imaging Studies (OASIS) dataset, with a sensitivity of 97.72 ± 2.34% and a specificity of 95.61 ± 1.67%. The accuracy, sensitivity, and specificity achieved using the proposed method are comparable or superior to those obtained by various conventional AD prediction methods.


Introduction
Alzheimer's disease (AD) is the most common cause of dementia, with patients comprising 50%-80% of all dementia sufferers. The disease affects memory, cognition, and behavior. As AD is a neurodegenerative condition, several types of atrophy occur in the hippocampus and other areas of the brain. Despite being the 6th leading cause of death in the USA, it is not a common disease. Currently, there is no cure; however, some preventive measures can be taken to mitigate risk factors and slow the degenerative process. An estimated $605 billion globally and $220 billion in the USA is spent annually on diagnosing AD. Many people suffer from AD worldwide, and the demand for research is growing rapidly. MRI is an effective medical imaging technique, as it has the proven potential to reveal structural changes in the human brain, internal organs, and other tissues.
MRI produces high-quality structural images and provides distinctive tissue information, which enhances both the accuracy of brain pathology diagnosis and the quality of treatment. A key advantage of this technique is its noninvasiveness. Many studies have used multivariate analysis algorithms with structural or functional MRI to classify neurological diseases [1][2][3]. A primary focus of these studies was the high dimensionality of the extracted features and the identification of disease signatures among them, that is, the features carrying the most discriminative information about the disease. Results showed significant cerebral structural changes in several brain regions of interest (ROIs), particularly in the hippocampus and entorhinal cortex [4]. Global and internal intensity-based features [3,5], as well as geometric- and surface-based features [6,7], have been used in earlier studies for classifying disease. An electroencephalogram (EEG) coherence study of Alzheimer's disease using a probabilistic neural network (PNN) showed significant accuracy in distinguishing true AD from control groups [8]. Chaplot et al. [9] classified AD using discrete wavelet coefficients as features for training and testing support vector machine (SVM) and neural network classifiers. Extracting essential discriminatory features from MRI brain images is imperative for competent disease diagnosis. The most frequently used feature extraction methods include independent component analysis [10], wavelet transform [11], and Fourier transform [12]. Other studies have combined discrete wavelet features with the k-nearest neighbor algorithm (k-NN) [11] or an artificial neural network (ANN) [11,13]. Zhang and Wang [14] built AD prediction models from displacement field estimation between AD and healthy controls, using an SVM, a twin support vector machine (TWSVM), and a generalized eigenvalue proximal SVM (GEPSVM) as classifiers.
Tomar and Agarwal [15] reviewed several types of twin SVM algorithms, their optimization problems, and their applications.
The biomarkers used in our proposed method are MRI images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and Open Access Series of Imaging Studies (OASIS) datasets. Our primary reason for using DTCWT over DWT is its effective representation of singularities (curves and lines), even though DWT has the advantage of representing functions in multiscale and compressed forms. The DTCWT also achieves a higher degree of shift invariance in coefficient magnitude [16]. In our proposed method, AD classification is based on DTCWT coefficients, which are processed with principal component analysis and linear discriminant analysis; a TWSVM is used as the supervised classifier. Classification performance is reported in terms of accuracy, sensitivity, and specificity, after applying 10-fold cross validation and running the program 10-20 times. Our method produced superior results when compared with several conventional AD classification methods.

Material and Methods
A total of 172 subjects from the ADNI dataset were used: 86 AD and 86 HC. In addition, we used 95 subjects from the OASIS dataset: 44 HC and 51 subjects suffering from very mild to mild AD.
2.1. Overview of Experimental Data. Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu).
The ADNI was launched in 2003 as a public-private partnership led by Principal Investigator Michael W. Weiner, MD. The primary goal of the ADNI is to test whether serial MRI, positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early Alzheimer's disease (AD). For up-to-date information, visit www.adni-info.org. The demographic details of data used from the ADNI are shown in Table 1.
In addition, we utilized MRI images downloaded from the OASIS dataset. OASIS is a database designed to compile MRI datasets and make them freely accessible to the scientific community. OASIS compiles two types of data: cross-sectional MRI data and longitudinal MRI data. Our study utilized cross-sectional MRI data, as our aim is to develop an automatic system for detecting AD, for which longitudinal MRI data are not optimal.
The OASIS dataset consists of 416 subjects aged between 18 and 96 years. Our study included 51 AD patients (35 with CDR = 0.5 and 16 with CDR = 1) out of the 100 subjects with dementia and 44 HC out of the 98 normal subjects. Table 2 shows the demographic details of the subjects used in our study. Both men and women are included, and all subjects are right-handed. The CDR scale is listed in Table 3.

Proposed Approach.
The proposed approach consists of four phases: preprocessing and slice extraction, feature extraction, projection of features into a lower dimension, and efficient classification of the disease. Figure 1 shows all phases in detail.

Preprocessing and Slice Extraction.
All MRI images used for training and testing the TSVM in our proposed approach are viewed using the ONIS toolbox and exported [...]

[...] functions and complex-valued wavelets. The DTCWT employs two real DWTs, which provide the real and imaginary components of the wavelet transform, respectively. In addition, two types of filter bank are used: analysis filter banks and synthesis filter banks. These filter banks are used to implement the DTCWT so that the overall transform is approximately analytic, as shown in Figure 3. The DTCWT can be denoted in matrix form as

$$ D = \begin{bmatrix} D_h \\ D_g \end{bmatrix}, \tag{1} $$

where $D_h$ and $D_g$ are rectangular matrices. For the input image $x$, the complex wavelet coefficients can be represented as

$$ d_c = D_h x + j D_g x. \tag{2} $$

The DTCWT coefficients of input images are approximately shift invariant; their magnitudes change very little when an image is shifted in space. In addition, the DTCWT isolates 6 distinct directions (±15°, ±45°, and ±75°) for 2D images and 28 different directions for 3D images, while conventional DWT only allows for isolation of horizontal and vertical directions. For each subject's 2D slice image, we extracted 5-level DTCWT coefficients from one scale.

Principal Component Analysis.
Principal component analysis (PCA) [17] is a dimensionality reduction technique that maps features onto a lower dimensional space. In general, such a data transformation may be linear or nonlinear. PCA is one of the most frequently used linear transformations: it is an orthogonal transformation that converts possibly correlated variables into linearly uncorrelated variables. The number of principal components is lower than or equal to the number of original variables. The PCA conversion process is shown in Figure 4. PCA can be summarized as follows: (i) calculating the mean of the data and subtracting it to obtain zero-mean data; (ii) constructing the covariance matrix; (iii) computing the eigenvalues and eigenvectors; and (iv) projecting the data matrix onto the eigenvectors, ordered from the highest to the lowest eigenvalue.
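Steps (i) to (iv) can be sketched in a few lines of NumPy. This is only an illustration on synthetic data, not the MATLAB code used in this study; the function name `pca_project` and the toy matrix are assumptions made for the example. For feature vectors as long as those used here, the covariance matrix would be very large, so in practice the same projection is usually obtained from a singular value decomposition of the centered data.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X (samples x features) onto the leading principal components."""
    # (i) Compute the mean and subtract it to obtain zero-mean data.
    Xc = X - X.mean(axis=0)
    # (ii) Build the feature covariance matrix.
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)
    # (iii) Eigen-decomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    # (iv) Project onto the eigenvectors with the largest eigenvalues.
    return Xc @ eigvecs[:, order], eigvals[order]

# Synthetic example: 50 samples, 6 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X[:, 1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=50)
scores, variances = pca_project(X, n_components=3)
```

The projected columns in `scores` are mutually uncorrelated, and `variances` holds the variance captured by each retained component.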

Linear Discriminant Analysis.
A generalized Fisher linear discriminant [18] is used for the linear projection of features to separate two or more classes. To obtain more discriminative features, the PCA coefficients are projected onto a new LDA projection axis.
To find the class separation projection axis, it is necessary to determine between-class scatter and within-class variability.
The between-class scatter matrix can be expressed as

$$ S_B = \sum_{c} N_c (\mu_c - \mu)(\mu_c - \mu)^T, \tag{3} $$

and the within-class scatter matrix can be expressed as

$$ S_w = \sum_{c} \sum_{k \in c} (z_k - \mu_c)(z_k - \mu_c)^T, \tag{4} $$

where $z_k$ is the $k$th sample belonging to class $c$, $\mu_c$ and $N_c$ are the mean and the number of samples of class $c$, and $\mu$ is the overall mean. The generalized Rayleigh coefficient is

$$ J(W) = \frac{\left| W^T S_B W \right|}{\left| W^T S_w W \right|}, \tag{5} $$

where $W$ is the matrix of LDA coefficients. Maximizing this coefficient can be characterized by the generalized eigenvalue problem

$$ S_B W = \lambda S_w W, \tag{6} $$

where $\lambda$ is the eigenvalue. If $S_w$ is a nonsingular matrix, (6) can be simplified as

$$ S_w^{-1} S_B W = \lambda W, \tag{7} $$

so that $W$ consists of the eigenvectors of $S_w^{-1} S_B$. These eigenvectors form the matrix $W_{LDA}$. The PCA coefficients are projected onto an $l$-dimensional LDA subspace spanned by the eigenvectors corresponding to the largest nonzero eigenvalues, where $l \le k$, and the final feature matrix $F$ is obtained from this projection.

2.2.5. Twin Support Vector Machine.
Jayadeva and Chandra [19] proposed the twin SVM (TSVM), a variant of the SVM based on two hyperplanes. It builds on the concept of the generalized eigenvalue proximal support vector machine (GEPSVM) and seeks two nonparallel optimal hyperplanes, one for each class. The TSVM solves a pair of quadratic programming (QP) problems, each similar in form to that of a typical SVM. Mathematically, the TSVM primal problems are

$$ \min_{w_1, b_1, \xi_1} \ \frac{1}{2} \left\| X_1 w_1 + o_1 b_1 \right\|^2 + C_1 o_2^T \xi_1 \quad \text{s.t.} \quad -(X_2 w_1 + o_2 b_1) + \xi_1 \ge o_2, \ \ \xi_1 \ge 0, \tag{11} $$

$$ \min_{w_2, b_2, \xi_2} \ \frac{1}{2} \left\| X_2 w_2 + o_2 b_2 \right\|^2 + C_2 o_1^T \xi_2 \quad \text{s.t.} \quad (X_1 w_2 + o_1 b_2) + \xi_2 \ge o_1, \ \ \xi_2 \ge 0. \tag{12} $$

Here, $X_i$ ($i = 1, 2$) are the input features of the two classes, $w_i$ are the normal vectors of the hyperplanes, $b_i$ are the bias terms, $C_i$ are positive penalty parameters, $o_i$ are vectors of ones of suitable dimension, and $\xi_i$ are the slack variables. Hence, the TSVM finds two hyperplanes, each of which is nearer to the samples of one class than to those of the other. Therefore, minimizing (11) and (12) compels the hyperplanes to approximate the data of each class and enhances the classification rate. The optimization problems can be solved using Lagrangian duality [15].
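To make the pair of quadratic programming problems more concrete, the following is a minimal sketch of a linear twin SVM trained through its dual problems. It is not the authors' MATLAB implementation. The dual form (a box-constrained QP per hyperplane, with a small ridge term `eps` for numerical stability) follows the standard twin SVM literature, while the projected-gradient solver `_box_qp`, the function names, and the synthetic data are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def _box_qp(Q, C, n_iter=5000):
    """Minimise 0.5*a'Qa - sum(a) subject to 0 <= a <= C by projected gradient."""
    a = np.zeros(Q.shape[0])
    step = 1.0 / (np.linalg.eigvalsh(Q)[-1] + 1e-12)  # 1 / largest eigenvalue
    for _ in range(n_iter):
        a = np.clip(a - step * (Q @ a - 1.0), 0.0, C)
    return a

def fit_linear_twsvm(X1, X2, C1=1.0, C2=1.0, eps=1e-6):
    """Fit two nonparallel hyperplanes, one close to each class."""
    H = np.hstack([X1, np.ones((X1.shape[0], 1))])  # [X1, o1]
    G = np.hstack([X2, np.ones((X2.shape[0], 1))])  # [X2, o2]
    ridge = eps * np.eye(H.shape[1])
    HtH_inv = np.linalg.inv(H.T @ H + ridge)
    GtG_inv = np.linalg.inv(G.T @ G + ridge)
    alpha = _box_qp(G @ HtH_inv @ G.T, C1)  # dual variables of the first problem
    gamma = _box_qp(H @ GtG_inv @ H.T, C2)  # dual variables of the second problem
    u1 = -HtH_inv @ G.T @ alpha             # stacked [w1; b1]
    u2 = GtG_inv @ H.T @ gamma              # stacked [w2; b2]
    return (u1[:-1], u1[-1]), (u2[:-1], u2[-1])

def predict_twsvm(model, X):
    """Assign each sample to the class whose hyperplane is nearer (+1 or -1)."""
    (w1, b1), (w2, b2) = model
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)

# Synthetic two-class data standing in for the projected features.
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 0.5, size=(40, 2))
X2 = rng.normal(4.0, 0.5, size=(40, 2))
model = fit_linear_twsvm(X1, X2)
train_acc = np.mean(np.concatenate([predict_twsvm(model, X1) == 1,
                                    predict_twsvm(model, X2) == -1]))
```

Each dual problem has as many variables as there are samples in the opposite class, which is the source of the speed advantage usually attributed to twin SVMs. A production implementation would replace the simple projected-gradient loop with a dedicated QP solver.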

Results and Discussions
3.1. Background. In this article, our proposed approach is presented using Fisher linear discriminant analysis of DTCWT principal components. The details of our proposed method are shown in Figure 1. The advantage of the wavelet transform (WT) over the Fourier transform (FT) is its multiscale representation, which combines frequency components with spatial domain information. Fourier coefficients only provide image frequency information, whereas wavelets capture both the spatial and the frequency domain in a multiscale format. In addition, the wavelet representation is spatially localized; Fourier functions are not spatially localized, as they consist only of image frequency components. With wavelets, MRI images can be represented and processed at numerous resolutions, so wavelets provide an incisive framework for processing multiresolution images. Finally, DWT coefficients can be extracted by using arrays of low-pass and high-pass filter banks.
However, the conventional wavelet transform has several drawbacks. These include oscillation of the wavelet coefficients between positive and negative values around singularities; shift variance, whereby a small shift of the signal substantially perturbs the wavelet coefficient pattern around singularities; substantial aliasing caused by the widely spaced wavelet coefficient samples; and a lack of directional selectivity, which complicates the processing and modeling of geometric image features (such as edges and ridges). The Fourier transform does not suffer from these flaws. Inspired by the Fourier transform, the DTCWT is used to overcome these drawbacks. Previous studies have shown that DTCWT feature-based AD detection performs better than typical DWT-based feature extraction [20]. Furthermore, the DTCWT represents line and curve singularities better. Thus, comparatively more discriminative features can be extracted, which is crucial for any pattern classification problem.
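The shift variance of a conventional decimated DWT can be seen in a very small example. The sketch below uses a one-level Haar transform on a toy step signal, both chosen purely for illustration and not taken from this study: moving the edge by a single sample changes the energy of the detail coefficients from zero to a nonzero value.

```python
import math

def haar_detail(signal):
    """One-level Haar DWT detail (high-pass) coefficients of an even-length signal."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2.0)
            for i in range(0, len(signal), 2)]

def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)

step = [0, 0, 0, 0, 1, 1, 1, 1]      # edge aligned with a coefficient pair boundary
shifted = [0, 0, 0, 1, 1, 1, 1, 1]   # same edge moved by one sample

energy_step = energy(haar_detail(step))        # edge falls between pairs: no detail energy
energy_shifted = energy(haar_detail(shifted))  # edge falls inside a pair: about 0.5
```

A feature built from such coefficients would therefore depend on the exact alignment of an anatomical edge with the sampling grid, which is the behavior the DTCWT is designed to reduce.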
Misclassification and high feature dimensionality are common problems in pattern classification. To ease classification, dimensionality reduction techniques are employed to transform data from higher to lower dimensional spaces. PCA is the most frequently applied linear transformation for this purpose. The extracted features are therefore reduced using PCA. For each MRI image from the OASIS and ADNI datasets, there are 49,152 (1536 × 32) features. After applying PCA, the feature matrix is reduced to 95 × 94 for OASIS data and 172 × 171 for ADNI data.
After PCA, the features may still not be sufficiently separable, as PCA does not account for the variability of features within a class or between classes. To make the principal components (PCs) more separable, the data must be transformed onto another space whose axes maximize the separation between the classes. Thus, LDA is applied to project the PCs onto new projection axes for more effective disease classification. TSVM is an emerging, efficient pattern classification and regression algorithm in machine learning. Numerous studies have shown that TSVM is highly effective in terms of classification, regression performance, and time complexity [19,21,22,23]. Hence, we have applied TSVM using the linear discriminant DTCWT principal components as input features.
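The LDA projection of the principal components can be illustrated with a short NumPy sketch. It builds the between-class and within-class scatter matrices (S_B and S_w in Section 2) and keeps the eigenvectors of S_w^{-1} S_B with the largest eigenvalues. The function name `lda_fit` and the synthetic two-class data are assumptions for the example; this is not the authors' MATLAB code.

```python
import numpy as np

def lda_fit(Z, y, n_axes):
    """Return an LDA projection matrix for features Z (samples x k) and labels y."""
    mu = Z.mean(axis=0)
    k = Z.shape[1]
    S_B = np.zeros((k, k))  # between-class scatter
    S_w = np.zeros((k, k))  # within-class scatter
    for c in np.unique(y):
        Zc = Z[y == c]
        mu_c = Zc.mean(axis=0)
        d = (mu_c - mu)[:, None]
        S_B += Zc.shape[0] * (d @ d.T)
        S_w += (Zc - mu_c).T @ (Zc - mu_c)
    # Eigenvectors of S_w^{-1} S_B; solve() avoids forming the inverse explicitly.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_B))
    order = np.argsort(eigvals.real)[::-1][:n_axes]
    return eigvecs[:, order].real

# Synthetic stand-in for PCA coefficients: two classes separated along the first axis.
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(size=(60, 3)),
               rng.normal(size=(60, 3)) + np.array([6.0, 0.0, 0.0])])
y = np.array([0] * 60 + [1] * 60)
W_lda = lda_fit(Z, y, n_axes=1)
F = Z @ W_lda  # projected features passed on to the classifier
```

For a two-class problem such as AD versus HC, S_B has rank one, so a single projection axis carries all of the discriminative information found by LDA.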
All programs were executed in MATLAB 2015b on a system with an Intel(R) Core(TM) i3-4160 CPU. The CPU-elapsed times for extracting DTCWT and DWT coefficients from a 2D MRI image slice were 0.5148 and 0.5109 seconds, respectively. There is no significant difference in CPU-elapsed time between the two transform methods. As a dimensionality reduction technique, we used PCA to reduce the high-dimensional input features.
In addition, it is not feasible to train and test a classifier with high-dimensional features because of the elapsed time. The CPU-elapsed time for TSVM classification was approximately 88.40 seconds without dimensionality reduction. The time required for our proposed method is approximately 15.74 seconds, which is faster than methods that do not employ Fisher discriminant analysis.

Performance Evaluation.
The performance of a binary classifier can be visualized using a confusion matrix, as shown in Table 4. The number of examples correctly predicted by the classifier is located on the diagonal. These may be divided into true positives (TP), representing correctly identified patients, and true negatives (TN), representing correctly identified controls. The number of examples wrongly stratified by the classifier may be divided into false positives (FP), representing controls incorrectly classified as patients, and false negatives (FN), representing patients incorrectly classified as controls.
Accuracy measures the proportion of examples that are correctly labeled by a classifier:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{13} $$

This may not be an ideal performance metric if the class distribution of the dataset is unbalanced.
For example, if class C1 is much larger than C2, a high accuracy value could be obtained by a classifier that labels all examples as belonging to class C1. Sensitivity is the true positive rate, and specificity is the true negative rate. They are defined as

$$ \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}. $$

Together, these measures provide an effective overall assessment of classifier performance.
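As a small worked example of these measures, the sketch below counts the confusion matrix entries from label lists and computes accuracy, sensitivity, and specificity. The label vectors are invented for illustration (1 for patients, 0 for controls), and the function names are assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def summarize(tp, tn, fp, fn):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Invented example: 4 patients and 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
result = summarize(*confusion_counts(y_true, y_pred))
# accuracy 0.8, sensitivity 0.75, specificity 5/6

# A classifier that labels everything as the majority class (controls)
# still reaches 60% accuracy here, but its sensitivity is zero.
majority = summarize(*confusion_counts(y_true, [0] * 10))
```

The second call illustrates why accuracy alone can be misleading on unbalanced data and why sensitivity and specificity are reported alongside it.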

Performance of Classification.
In this study, the proposed hybrid method has been applied to OASIS and ADNI data to distinguish control subjects from AD subjects. The recorded classification performance in terms of accuracy (acc), sensitivity (sens), and specificity (spec) is shown as bar diagrams in Figures 5 and 6. Performance varies depending on the number of principal components (PCs) used for training and testing, as shown in Figure 7 for ADNI data. After testing with different numbers of PCs for both datasets, we concluded that optimal classification performance was achieved with PC = 20. For a strict statistical analysis, stratified cross validation (SCV) is applied. We applied 5-fold CV to OASIS data and 10-fold CV to ADNI data, as the number of subjects in the OASIS dataset is lower than that in the ADNI dataset. 5-fold CV divides the dataset into five folds, whereas 10-fold CV divides the dataset into ten folds. The accuracies, sensitivities, specificities, and other statistical performance measures obtained with 10-20 runs of 10-fold SCV and 5-fold SCV are shown in Tables 5 and 6, respectively.
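The repeated stratified cross validation described above can be sketched as follows. This is a plain-Python illustration, not the authors' MATLAB procedure: the fold assignment scheme, the function names, and the `evaluate` callback (which would train and test a classifier on the given indices and return a score) are hypothetical.

```python
import random
import statistics

def stratified_folds(labels, k, seed):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across the folds.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds

def repeated_cv(labels, evaluate, k, runs):
    """Mean and standard deviation of the per-run average score."""
    run_means = []
    for run in range(runs):
        scores = []
        for test_idx in stratified_folds(labels, k, seed=run):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            scores.append(evaluate(train_idx, test_idx))
        run_means.append(statistics.mean(scores))
    return statistics.mean(run_means), statistics.stdev(run_means)

# Labels with the same class sizes as the ADNI subset used here (86 AD, 86 HC).
labels = ["AD"] * 86 + ["HC"] * 86
folds = stratified_folds(labels, k=10, seed=0)

# Placeholder scorer: fraction of AD subjects in the test fold (no classifier involved).
def fraction_ad(train_idx, test_idx):
    return sum(labels[i] == "AD" for i in test_idx) / len(test_idx)

mean_score, std_score = repeated_cv(labels, fraction_ad, k=10, runs=10)
```

Reporting the mean and standard deviation over runs corresponds to the "value ± deviation" format used for the results in this paper; replacing `fraction_ad` with a function that fits the PCA, LDA, and TSVM stages on `train_idx` only would keep the test folds free of information leakage.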
Although comparison with conventional methods can be difficult, we have compared our approach with some recent conventional disease detection algorithms using both datasets.
To analyze performance on the ADNI dataset, the classification results have been documented both run-wise and fold-wise, as shown in Tables 7 and 8. Table 8 shows the classification performance where linear discriminant analysis is not used. Individual columns and rows represent the classification accuracy of the corresponding runs and folds. The overall accuracy is calculated as the average over all folds and runs. This layout allows the classification performance in each of the 10 or 5 folds of every run to be examined.
Classification performance improves when LDA-projected features are used, as shown in Tables 5 and 9 and Figure 5. We compared our method with the volumetric feature-based study of Schmitter et al. [24], and it outperforms the results thereof, as shown in Figure 5. Additionally, our results were compared with kernel SVM-based classification and were superior. Likewise, the OASIS dataset was analyzed with identical methods, namely run-wise and fold-wise classification, as depicted in Tables 10 and 11. As shown in Tables 6 and 12 and Figure 6, our method yielded an accuracy of 96.68 ± 1.44, a sensitivity of 97.72 ± 2.34, and a specificity of 95.61 ± 1.67. This classification performance has also been documented without using LDA; however, results improve when LDA is applied to principal dual-tree complex wavelet transform coefficients or principal DWT coefficients and TSVM is used as the classifier. Results are better when DTCWT principal coefficients are used rather than DWT coefficients.
To further verify the efficacy of the proposed method, we compared it with 12 state-of-the-art approaches, as shown in Table 12, which utilized different statistical settings.
The results show that US + SVD-PCA + SVM-DT [25] yielded an accuracy of 90%, a sensitivity of 94%, and a specificity of 71%; BRC + IG + SVM [26] achieved an accuracy of 90.00%, a sensitivity of 96.88%, and a specificity of 77.78%; and curvelet + PCA + KNN [27] obtained an accuracy of 89.47%, a sensitivity of 94.12%, and a specificity of 84.09%. We observed that these methods have lower specificity compared to the other methods mentioned previously. In contrast, BRC + IG + Bayes [26] yielded higher specificity.
Similarly, BRC + IG + VFI [26] yielded a classification accuracy of 78%, a sensitivity of 65.63%, and a specificity of 100%. Although its specificity was high, the accuracy and sensitivity of this algorithm were comparatively poor. All other methods achieved satisfactory results. VBM + RF [28] obtained an accuracy of 89.0 ± 0.7%, a sensitivity of 87.9 ± 1.2%, and a specificity of 90.0 ± 1.1%. These promising results were achieved largely due to voxel-based morphometry (VBM).
Finally, taking classification performance into consideration, our approach outperforms all other methods analyzed here. We have also produced promising performance metrics for sensitivity and specificity. Hence, we submit that our results are either superior or comparable to the other compared methods.

Conclusions
Our proposed method uses LDA on the principal components of DTCWT coefficients and a TSVM to classify AD. On the ADNI dataset, it yielded an accuracy of 92.65 ± 1.18% with high sensitivity and specificity. It also outperforms the methods of Zhang et al. [13] and El-Dahshan et al. [11] and the volumetric feature-based classification proposed by Schmitter et al. [24]. In addition, on OASIS data our method performs better than several of the state-of-the-art approaches considered in this paper, yielding an accuracy of 96.68 ± 1.44% with similarly high sensitivity and specificity.
In the future, we will carry forward our research focusing on the following: (i) 3D DTCWT-based feature extraction with multiresolution analysis and classification and (ii) convolutional neural network (CNN)-based classification using 3D MRI.

Disclosure
The investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report.