WTD-PSD: Presentation of Novel Feature Extraction Method Based on Discrete Wavelet Transformation and Time-Dependent Power Spectrum Descriptors for Diagnosis of Alzheimer's Disease

Alzheimer's disease (AD) is a type of dementia that affects the elderly population. Machine learning (ML) systems can be trained to recognize particular patterns in order to diagnose AD. Developing an effective feature extraction approach is therefore critical for reducing calculation time. In this article, the input images are decomposed with the two-dimensional discrete wavelet transform (2D-DWT). The Time-Dependent Power Spectrum Descriptors (TD-PSD) model is used to represent the wavelet subband coefficients, and the characteristics of the TD-PSD model make up the principal feature vector. The extracted features are then applied independently to several classification algorithms to classify AD. The categorization is used to determine the kind of tumor. The TD-PSD method was used to extract wavelet subband features from three sets of test samples: mild cognitive impairment (MCI), AD, and healthy controls (HC). The outcomes of the classic classification methods KNN, SVM, decision tree (DT), and LDA are documented, as well as the final features employed in each. Finally, we present a CNN architecture for classifying AD patients. The results are reported through output assessment. The presented CNN and DT outperform the other techniques.


Introduction
The brain is the body's most important organ. Disorders that affect the brain are extremely important to manage since, in most situations, alterations are irreversible once they occur. Dementia is defined as the loss of cognitive and functional thinking abilities. The most prevalent cause of dementia is AD. AD strikes people in their mid-60s. Alzheimer's disease affects more than 5.5 million individuals worldwide [1]. Memory loss, language problems, and behavioral changes are all indications of AD. Nonmemory symptoms include trouble finding words, vision problems, decreased cognition, and poor judgment. Biological signs are obtained from brain imaging, cerebrospinal fluid, and blood. AD should be distinguished from the normal age-related decrease in cognitive function, which is more gradual and associated with less impairment. The disease frequently begins with mild symptoms and progresses to serious brain damage. Dementia affects people differently; therefore, their abilities deteriorate at varying rates. Early and reliable identification of AD is advantageous to disease management. Neuroimaging techniques like magnetic resonance imaging (MRI) and computed tomography (CT), as well as single-photon emission computed tomography (SPECT) and positron emission tomography (PET), can be utilized to rule out other forms of dementia or subtypes. Neuroimaging also has the potential to forecast the progression from the prodromal stage to AD. Neurologists can use medical image processing and machine learning methods to see if a person is developing AD. Image segmentation and classification are critical tasks in MRI data analysis for detecting AD [2]. Structural MRI (SMRI) provides visual information regarding the atrophic areas of the brain caused by the tissue-level abnormalities that underpin AD/MCI. PET measures cerebral glucose metabolism, which is a reflection of functional brain activity [3].
The quantity of amyloid beta-protein and amyloid tau tangles accumulated in the cerebrospinal fluid (CSF) is an early predictor of AD. SMRI has already been shown to be sensitive to presymptomatic illness and might be used as a disease biomarker [4]. MRI appears to be the most sensitive imaging examination of the brain in everyday clinical practice. It provides information on gray matter, white matter, and CSF morphology. Structural MRI can record atrophic brain areas noninvasively, allowing us to see anatomical alterations in the brain. As a result, these alterations have been recognized as a possible indicator of disease development, and ML approaches for disease detection are being researched extensively [5].
The MRI scan can be utilized in image processing to evaluate the likelihood of early detection of AD. Intensity adjustment, K-means clustering, and the region growing method are image processing techniques used in MRI to extract white and gray matter [6]. The same approach may be used to compute brain volume. Because the raw MRI brain image is too large to be utilized for classification directly, the MR images must be preprocessed before feature extraction and classification can be performed for illness diagnosis.
One of the most widely used approaches is to divide the image, through the warping of a labeled atlas, into numerous anatomical areas, that is, regions of interest (ROIs); regional measurements, such as volumes, are then calculated as the features for AD classification [7]. To identify the most discriminative features from ROIs for multimodality classification of AD/MCI, a discriminative multitask algorithm was presented. In ML, each data item should be characterized as a feature vector.
Numerous studies have advocated extracting various characteristics from MRI scans and then classifying the resulting vectors. The quality of the produced feature vectors is, nevertheless, reliant on image preprocessing because of registration errors and noise. As a result, domain knowledge is required to extract discriminative features. The layered design of a CNN has a big influence on its performance, and greater classification accuracy is anticipated from a layer structure that is better suited to MRI images. In this article, the input images are decomposed with the two-dimensional discrete wavelet transform (2D-DWT).
The Time-Dependent Power Spectrum Descriptors (TD-PSD) model is used to represent the wavelet subband coefficients. The primary feature vector is made up of the characteristics of the TD-PSD model. The extracted features are then applied independently to several classification algorithms to classify AD. The classification is used to determine the kind of tumor. For feature extraction from the wavelet subbands of three sets of test data, mild cognitive impairment (MCI), AD, and HC, we employed the TD-PSD technique.

Literature Review
For diagnosing AD, feature vectors must be extracted from MRI images. Several feature extraction techniques have been proposed in the recent decade, since the outcome of ML is determined by the extracted feature vectors. Employing many specified templates, Liu et al. [8] retrieved multiview feature representations for subjects and divided the subjects within a particular class into distinct subclasses in each view space; support vector machine (SVM)-based ensemble learning was then used. Suk et al. proposed a multitask and multikernel SVM learning approach for a stacked autoencoder with a deep-learning-based feature representation [9]. Because of registration errors and noise, the quality of the recovered features depends on image preprocessing. As a result, domain knowledge is required when extracting discriminative features. Acquiring hand-crafted features takes a long time and a lot of effort. More crucially, hand-crafted features seldom generalize well. As a consequence, this study recommends employing deep learning to extract data characteristics. Sadeghipour and Sahragard [10] developed a novel approach for facial identification that is based on an enhanced SIFT algorithm. Acharya et al. [11] created an ML system that can detect AD symptoms in a brain scan. For classification, the system combined MRI with a variety of feature extraction techniques. The images were acquired with the T2 imaging sequence. Filtering, feature extraction, Student's t-test-based feature selection, and k-nearest neighbor (KNN)-based classification were among the quantitative approaches used in the paradigm. The findings revealed that, compared to other approaches, the Shearlet Transform (ST) feature extraction methodology provides better outcomes for Alzheimer's diagnosis. With the ST + KNN approach, the suggested tool achieved 94.54 percent accuracy, 88.33 percent precision, 96.30 percent sensitivity, and 93.64 percent specificity. According to Sadeghipour et al. [12], combining fireflies with intelligent systems would lead to breast cancer detection. Their comparison of the suggested system with other methods shows that it is superior in both performance and accuracy. Sadeghipour and Moradisabzevar [13] investigated the development of intelligent toy cars as a method of screening children with autism. The results show that the screening of autistic children was 100 percent accurate. Zhou et al. [14] investigated probabilistic inflection points for the decomposition of LiDAR hidden echo signals. Yan et al. [15] examined the structure and in vitro test results of waxy and regular maize starches after thermal processing using plasma-activated water. Eslami and Yun [16] have developed a novel approach called A + MCNN  [23], the scheduling problems for health care systems [24], and the optimization of users based on a clustering method [25]. A new approach to penetration testing based on extended classifier networks has been proposed by Yazdani et al. [26]. Lauraitis et al. [27] provided a model of an application for mobile Android systems that may be used to examine central nervous system movement problems occurring in individuals suffering from Huntington's, Alzheimer's, or Parkinson's diseases. Specifically, the model detects tremors as well as cognitive deficits through the use of touch and visual stimulation modalities, among other things. According to the findings, the adoption of intelligent applications that may assist in the evaluation of neurodegenerative illnesses is a significant advancement in medical diagnostics and should be encouraged. According to Sadeghipour et al. [28], the XCSLA system can be used to develop an intelligent diabetes diagnosis system. According to the results of the program implementation document (pid) on databases, the proposed technique can detect diabetes more accurately than the conventional XCS system, the Elman neural network, SVM clustering, KNN, C4.5, and AD tree. Farahanipad et al. [29] developed a pipeline for the identification of hand 2D keypoints using unpaired image-to-image translation. Shi et al. [30] investigated the effect of ultrasonic intensity on the structure and characteristics of sago starch complexes and their implications for the quality of Chinese steamed bread.
Sadeghipour et al. [31] developed a new expert clinical method for the diagnosis of obstructive sleep apnea using the XCSR classifier. Rezaei et al. [32] used depth images to automate mild segmentation of hand parts. According to the results, a model without segmentation-based labels may achieve a mIoU of 42%, and quantitative and qualitative findings support the efficiency of their method. Yue et al. [33] used the automated anatomical labeling (AAL) template to divide the brain into 90 regions of interest (ROIs). They chose the informative voxels in each ROI using a baseline of their values and arranged them into a vector in order to exclude the uninformative data. The first-stage characteristics were then chosen based on the voxel correlation between distinct groups. The selected voxels were then fed into a convolutional neural network (CNN) to learn the deeply hidden properties of each subject's brain feature maps. The testing findings showed that the suggested technique was reliable and had a promising performance when compared to other methods in the literature.
To increase classification accuracy and identify high-order features that potentially provide pathological information, Li et al. [44] used a novel feature extraction approach known as radiomics. They defined ROIs as brain regions mostly dispersed in the temporal, occipital, and frontal areas. A total of 168 radiomic characteristics of Alzheimer's disease were found to be stable (alpha > 0.8). In the classification trial, the maximum accuracies for categorizing AD versus HC, MCI versus HC, and AD versus MCI were 91.5 percent, 83.1 percent, and 85.9 percent, respectively. Silva et al. [46] suggested a model for diagnosing AD based on deep feature extraction for MRI classification.
The goal of this model was to distinguish between AD and HC. To extract the best characteristics of the selected region, the CNN architecture was built with three convolutional layers. The model's effectiveness and reliability for the diagnosis of AD were shown by a comparison study with previous studies in the literature. Table 1 lists several more techniques.

Methods and Materials
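If $f(x)$ is expanded as a wavelet series with respect to the wavelet $\psi(x)$ and the scaling function $\varphi(x)$, we get [47]

$$f(x) = \sum_{k} c_{j_0}(k)\,\varphi_{j_0,k}(x) + \sum_{j=j_0}^{\infty}\sum_{k} d_j(k)\,\psi_{j,k}(x), \tag{1}$$

where the $c_{j_0}(k)$ are scaling coefficients and $j_0$ is a starting scale. The $d_j(k)$ coefficients are wavelet coefficients (see Figure 1). The expansion coefficients are as follows:

$$c_{j_0}(k) = \langle f(x), \varphi_{j_0,k}(x)\rangle = \int f(x)\,\varphi_{j_0,k}(x)\,dx, \qquad d_j(k) = \langle f(x), \psi_{j,k}(x)\rangle = \int f(x)\,\psi_{j,k}(x)\,dx.$$

If the function being expanded is a sequence of discrete numbers, the resulting coefficients are known as the discrete wavelet transform (DWT) of $f(x)$. The expansion series is then represented by equations (2) and (3), the DWT pair [47,48]:

$$W_{\varphi}(j_0,k) = \frac{1}{\sqrt{M}}\sum_{x} f(x)\,\varphi_{j_0,k}(x), \qquad W_{\psi}(j,k) = \frac{1}{\sqrt{M}}\sum_{x} f(x)\,\psi_{j,k}(x), \tag{2}$$

$$f(x) = \frac{1}{\sqrt{M}}\sum_{k} W_{\varphi}(j_0,k)\,\varphi_{j_0,k}(x) + \frac{1}{\sqrt{M}}\sum_{j=j_0}^{\infty}\sum_{k} W_{\psi}(j,k)\,\psi_{j,k}(x), \tag{3}$$

where $M$ is the number of samples to be transformed and $J$ is the number of transformation levels, with $M = 2^J$. In two dimensions, a 2D scaling function $\varphi(x, y)$ and three 2D wavelets, $\psi^H(x, y)$, $\psi^V(x, y)$, and $\psi^D(x, y)$, are necessary; they are usually constructed from a 1D scaling function $\varphi$ and its associated wavelet $\psi$ [39].

Each level of the wavelet transformation creates four subbands, as seen in Figure 1. In this diagram, $2\downarrow$ denotes downsampling by a factor of two, and $\psi^H$, $\psi^V$, and $\psi^D$ measure variations along horizontal, vertical, and diagonal edges, respectively. Digital filters and downsamplers can be used to perform the 2D-DWT. The subbands are produced using the discrete 2D scaling and wavelet functions, that is, by applying the 1D-FWT along the rows and columns of $f(x, y)$ [49].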
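The filter-and-downsample view of the 2D-DWT can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the Haar wavelet is used because it is the simplest case (the mother wavelet used in this work is not stated here), the function names are hypothetical, the image sides must be divisible by four, and the LH/HL naming convention differs between libraries.

```python
import math

def haar_1d(values):
    """One Haar analysis step: low-pass and high-pass outputs, downsampled by 2."""
    scale = 1.0 / math.sqrt(2.0)
    low = [(values[i] + values[i + 1]) * scale for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) * scale for i in range(0, len(values), 2)]
    return low, high

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def filter_rows(matrix):
    """Apply the 1D step to every row; return the low-pass and high-pass matrices."""
    pairs = [haar_1d(row) for row in matrix]
    return [low for low, _ in pairs], [high for _, high in pairs]

def haar_2d(image):
    """One-level 2D DWT: filter along the rows, then along the columns."""
    row_low, row_high = filter_rows(image)
    # Columns are filtered by transposing, filtering rows, and transposing back.
    ll, lh = (transpose(m) for m in filter_rows(transpose(row_low)))
    hl, hh = (transpose(m) for m in filter_rows(transpose(row_high)))
    return ll, lh, hl, hh

def two_level_subbands(image):
    """Decompose twice and keep three detail subbands per level plus LL2."""
    ll1, lh1, hl1, hh1 = haar_2d(image)
    ll2, lh2, hl2, hh2 = haar_2d(ll1)
    return {"LH1": lh1, "HL1": hl1, "HH1": hh1,
            "LL2": ll2, "LH2": lh2, "HL2": hl2, "HH2": hh2}
```

Because the Haar filters are orthonormal, the energy of the four subbands of one level equals the energy of the input, which is a convenient sanity check. In practice a wavelet library would normally replace this hand-written version.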

Feature Extraction.
The discrete Fourier transform (DFT) describes the signal trace as a function of frequency, $X[k]$, given the sampled representation of the signal $x[j]$, with $j = 1, 2, \ldots, N$, length $N$, and sampling frequency $f_s$ Hz. According to Parseval's theorem, the sum of the squares of a function equals the sum of the squares of its transform, and this is where the feature extraction procedure begins:

$$\sum_{j=1}^{N} |x[j]|^2 = \frac{1}{N}\sum_{k=0}^{N-1} |X[k]\,X^{*}[k]| = \sum_{k=0}^{N-1} P[k],$$

where $P[k]$ is the power spectrum. The Fourier transform's notion of frequency is usually taken to be symmetrical with respect to zero frequency, with identical parts that cover both positive and negative frequencies. The moment of order $n$ of the power spectrum is therefore defined over the whole spectrum as

$$m_n = \sum_{k=0}^{N-1} k^{n} P[k].$$

Parseval's theorem can be applied directly when $n = 0$. For nonzero values of $n$, the time-differentiation property of the Fourier transform is applied. According to this property, the $n$-th derivative of a time-domain function, denoted $\Delta^n$ for discrete time signals, corresponds to multiplying the spectrum by $k$ raised to the $n$-th power:

$$F\!\left[\Delta^{n} x[j]\right] = k^{n} X[k].$$

Root Squared Zero-Order Moment ($m_0$). This feature indicates the total power in the frequency domain:

$$m_0 = \sqrt{\sum_{j=1}^{N} x[j]^2}.$$

The zero-order moments of all channels can be normalized by dividing each of them by the sum of the zero-order moments of all channels.

Root Squared Second- and Fourth-Order Moments. The second moment is also regarded as a power, but of a spectrum shifted to $k^2 P[k]$, which corresponds to the frequency function:

$$m_2 = \sqrt{\sum_{k=0}^{N-1} k^{2} P[k]} = \sqrt{\frac{1}{N}\sum_{k=0}^{N-1} \left(k X[k]\right)^2} = \sqrt{\frac{1}{N}\sum_{j=1}^{N} \left(\Delta x[j]\right)^2}.$$

The fourth moment is obtained by repeating this approach:

$$m_4 = \sqrt{\sum_{k=0}^{N-1} k^{4} P[k]} = \sqrt{\frac{1}{N}\sum_{j=1}^{N} \left(\Delta^{2} x[j]\right)^2}.$$

Taking the second and fourth derivatives reduces the overall energy of the signal. To decrease the effect of noise on all moment-based features and to normalize the ranges of $m_0$, $m_2$, and $m_4$, we perform the following power transformation:

$$m_0 = \frac{m_0^{\lambda}}{\lambda}, \qquad m_2 = \frac{m_2^{\lambda}}{\lambda}, \qquad m_4 = \frac{m_4^{\lambda}}{\lambda}.$$

The experimental value of $\lambda$ is set to 0.1. With these settings, the first three extracted features are as follows:

$$f_1 = \log(m_0), \qquad f_2 = \log(m_0 - m_2), \qquad f_3 = \log(m_0 - m_4).$$

Sparseness. This feature quantifies how much of the vector's energy is packed into a small number of components:

$$f_4 = \log\!\left(\frac{m_0}{\sqrt{m_0 - m_2}\,\sqrt{m_0 - m_4}}\right).$$

For a vector whose elements are all equal, $m_2$ and $m_4$ are zero because of the differentiation, so $f_4 = \log(m_0/m_0) = 0$ and the sparseness index is zero. All other sparseness levels, on the other hand, should have a value greater than zero.

Irregularity Factor (IF). This metric expresses the ratio of the number of upward zero-crossings (ZC) to the number of peaks (NP). For a random signal, both quantities can be characterized solely in terms of spectral moments, so the feature is written as

$$f_5 = \log\!\left(\frac{ZC}{NP}\right) = \log\!\left(\frac{\sqrt{m_2/m_0}}{\sqrt{m_4/m_2}}\right) = \log\!\left(\frac{m_2}{\sqrt{m_0\,m_4}}\right).$$

Coefficient of Variation (COV). The COV feature is defined as the ratio of the standard deviation to the arithmetic mean:

$$f_6 = \log\!\left(\frac{\sqrt{\frac{1}{N-1}\sum_{j=1}^{N}\left(x[j]-\bar{x}\right)^2}}{\frac{1}{N}\sum_{j=1}^{N} x[j]}\right).$$

Teager Energy Operator (TEO). It mainly reflects the signal amplitude and instantaneous changes and is particularly sensitive to even small variations. TEO was originally proposed for modeling nonlinear speech signals and was later widely employed in audio signal processing. It is given by

$$f_7 = \log\!\left(\sum_{j=2}^{N-1}\left(x[j]^2 - x[j-1]\,x[j+1]\right)\right).$$
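As a rough illustration of how the seven descriptors of this section could be computed for one sequence, the following Python sketch evaluates the moments in the time domain through first and second differences. Several choices are assumptions made for the sketch rather than details confirmed by the text: the function names, the default `lam=0.1` (the power transformation is undefined at zero), the absolute values, and the small constant `EPS` that keeps the logarithms finite.

```python
import math

EPS = 1e-12  # assumption: small constant that keeps log() and divisions finite

def diff(values):
    """First-order difference, the discrete counterpart of a time derivative."""
    return [b - a for a, b in zip(values, values[1:])]

def safe_log(value):
    return math.log(abs(value) + EPS)

def td_psd_features(x, lam=0.1):
    """Return [f1, ..., f7] for a 1D sequence x with at least three samples."""
    n = len(x)
    d1 = diff(x)
    d2 = diff(d1)

    # Root squared moments, computed in the time domain (Parseval's theorem
    # combined with the time-differentiation property).
    m0 = math.sqrt(sum(v * v for v in x))
    m2 = math.sqrt(sum(v * v for v in d1) / n)
    m4 = math.sqrt(sum(v * v for v in d2) / n)

    # Power transformation that normalizes the range of the moments.
    m0, m2, m4 = (m ** lam / lam for m in (m0, m2, m4))

    f1 = safe_log(m0)
    f2 = safe_log(m0 - m2)
    f3 = safe_log(m0 - m4)
    # Sparseness: zero when all elements of x are equal.
    f4 = safe_log(m0 / (math.sqrt(abs(m0 - m2)) * math.sqrt(abs(m0 - m4)) + EPS))
    # Irregularity factor: upward zero-crossings over peaks, from the moments.
    f5 = safe_log(m2 / (math.sqrt(abs(m0 * m4)) + EPS))
    # Ratio of the standard deviation to the arithmetic mean.
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    f6 = safe_log(std / (abs(mean) + EPS))
    # Teager energy summed over the interior samples.
    teager = sum(x[j] ** 2 - x[j - 1] * x[j + 1] for j in range(1, n - 1))
    f7 = safe_log(teager)
    return [f1, f2, f3, f4, f5, f6, f7]
```

For a constant sequence such as `[1.0, 1.0, 1.0, 1.0]`, the differences vanish and the sparseness entry is numerically zero, in line with the property stated for that descriptor.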

Proposed Feature Extraction Methods.
The goal of this research is to apply machine learning algorithms to identify Alzheimer's disease. Figure 2 shows the block diagram of the proposed method. To begin, we employed a two-level 2D-DWT to break down the input images into wavelet subbands. The obtained wavelet coefficients are utilized to derive classification features. The TD-PSD model is then used to extract features from the HH1, HL1, and LH1 subbands of the first level and from the LL2, HH2, HL2, and LH2 subbands of the second level. PCA is employed to reduce the number of features, and AD is then classified with multiple machine learning algorithms using the retained features. The pseudocode for the presented method is given in Algorithm 1.

Data Collection.
In AD, structural MR imaging results demonstrated microscopic neurodegeneration and are a measure of brain atrophy (loss of synapses, dendritic processes, and neurons). In volumetric or voxel-based assessments of brain atrophy, the degree of atrophy and the extent of cognitive impairment are closely associated.
There is a relationship between cognitive decline and brain atrophy. Atrophy on MR images does not appear to be exclusive to AD. The degree of hippocampal atrophy, on the other hand, is highly correlated with autopsy Braak staging [50]. The topographic distribution of atrophy on antemortem MR images (medial, basal, and lateral temporal lobes, as well as the medial parietal cortex) matches the Braak staging of neurofibrillary tangles and postmortem AD staging [51]. The data collection covers atrophy and the clinical stages of AD. There is negligible atrophy in the cognitively normal control individual, while there is significant atrophy in the AD patient. The MCI individual, on the other hand, has an intermediate amount of atrophy. The dataset is accessible online on Kaggle [52]. The MRI images are 256 × 256 grayscale PNG images that have been utilized to analyze and evaluate AD in three classes: AD, MCI, and HC.

Feature Extraction and Reduction.
In this section, the process of feature extraction is described. Based on the conceptual diagram of Figure 2 and the pseudocode, the first step in the presented method is wavelet decomposition. The results of the decomposition are presented in Figure 3. As Figure 3 shows, a two-level decomposition is performed for each input image. From the first level, the three subbands low-high (LH1), high-low (HL1), and high-high (HH1) are used for feature extraction, and from the second level, low-low (LL2), LH2, HL2, and HH2 are used.
In the next step, each subband matrix is reshaped to a vector, and all the zeros are removed from the vectors. The final vectors are the pseudo-time series that are used for feature extraction. The properties of the seven subbands are presented in Figure 4. Based on the amplitude and frequency of the subbands, the LL2 subband retains the largest share of the points and properties of the input images. However, all subbands are consequential in this diagnosis.
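The reshaping step can be sketched in a few lines of Python. This is only an illustration: the function names, the row-by-row flattening order, and the `extract` callable (any function that maps a one-dimensional series to a list of descriptors) are hypothetical and are not taken from the authors' implementation.

```python
def subband_to_series(subband):
    """Flatten a 2D coefficient matrix row by row and drop exact zeros."""
    return [value for row in subband for value in row if value != 0]

def build_feature_vector(subbands, extract):
    """Concatenate the descriptors of every subband into one feature vector.

    `subbands` maps a subband name to its coefficient matrix; `extract` is a
    hypothetical callable returning a list of descriptors for a 1D series.
    """
    vector = []
    for name in sorted(subbands):  # fixed order keeps feature positions stable
        vector.extend(extract(subband_to_series(subbands[name])))
    return vector
```

Iterating over the subband names in sorted order is a design choice of the sketch: it guarantees that a given position in the feature vector always refers to the same subband and descriptor across images.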
Based on the feature extraction results, each image has 49 features (7 subbands with 7 TD-PSD features each). Principal component analysis is then employed to reduce the number of features. As Figure 5 shows, the first seven components account for almost 100% of the effect of all features. Consequently, the number of features is reduced to 7 based on the scree plot in Figure 5(a). The cumulative eigenvalue contribution is presented in Figure 5(b).
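The choice of how many components to keep can be illustrated with a small Python sketch that turns a list of eigenvalues into a cumulative share of the total variance. The function names and the default threshold of 0.99 are assumptions for illustration; the eigenvalues of the actual dataset are not reproduced here.

```python
def cumulative_explained_variance(eigenvalues):
    """Cumulative share of the total variance, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    shares, running = [], 0.0
    for value in ordered:
        running += value
        shares.append(running / total)
    return shares

def components_needed(eigenvalues, threshold=0.99):
    """Smallest number of leading components whose share reaches the threshold."""
    shares = cumulative_explained_variance(eigenvalues)
    for count, share in enumerate(shares, start=1):
        if share >= threshold:
            return count
    return len(eigenvalues)  # fallback if rounding keeps the last share below it
```

With the hypothetical eigenvalues `[4.0, 3.0, 2.0, 1.0]` and a threshold of 0.85, three components are enough, because the first three eigenvalues carry 90% of the total.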

Results of Classification.
In this section, the classification is performed using different machine learning methods. The input of the classification methods is the seven reduced features of each image, and the output is the three-class label of AD, MCI, and HC. In total, 600 MRI images are employed for the classification of AD. The confusion matrices of the presented methods are illustrated in Figure 6. The blue markers show the correctly classified samples, and the red markers show the misclassified samples. Labels 1, 2, and 3 denote HC, AD, and MCI, respectively. With the KNN method, of the 200 input images in each of the HC, AD, and MCI classes, 193, 141, and 109 are diagnosed correctly. Based on these results, the sensitivity of KNN is acceptable only for the HC class. The SVM and LDA approaches gave weak results for the diagnosis of AD. However, the results of DT show that the sensitivity of the method is 94%, 91.5%, and 97.5% for HC, AD, and MCI, respectively; that is, 188, 183, and 195 MRI images from HC, AD, and MCI are detected correctly. This means that the WTD-PSD is compatible with the DT approach for this problem. Moreover, the precision of the method is 91.70%, 95.30%, and 96.10% for HC, AD, and MCI, respectively.
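The sensitivity figures follow directly from the reported counts, as the short Python check below shows. The function and variable names are hypothetical; the counts and the class size of 200 images are the ones reported in this section.

```python
def per_class_sensitivity(correct, per_class_total):
    """Percentage of each class's images that were classified correctly."""
    return [100 * hits / per_class_total for hits in correct]

def overall_accuracy(correct, per_class_total):
    """Percentage of all images classified correctly, assuming equal class sizes."""
    return 100 * sum(correct) / (per_class_total * len(correct))

# Correctly classified HC, AD, and MCI images out of 200 per class.
knn_correct = [193, 141, 109]
dt_correct = [188, 183, 195]

dt_sensitivity = per_class_sensitivity(dt_correct, 200)   # [94.0, 91.5, 97.5]
dt_accuracy = overall_accuracy(dt_correct, 200)           # about 94.33
knn_accuracy = overall_accuracy(knn_correct, 200)         # about 73.83
```

Precision cannot be recovered in the same way, because it also depends on how the misclassified images of the other classes are distributed, which requires the full confusion matrix.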
To validate the presented features, we also applied a CNN architecture to this problem. The architecture of the CNN is presented in Figure 7.
Input layer includes      Figure 8. The horizontal axis of the ROC curve represents the false-positive rate with respect to the HC class, and the vertical axis shows the true-positive rate. The best classifier has the highest true-positive rate and the lowest false-positive rate. Based on the results, CNN and DT are the two best classifiers for the presented features. Moreover, the area under the curve (AUC) is an index for comparing the classifiers. The AUC and the accuracy of the machine learning classifiers are presented in Figure 9. Based on the results, the accuracy of SVM, LDA, KNN, DT, and CNN is 45%, 53.70%, 73.80%, 94.33%, and 98.50%, respectively. According to this chart, the CNN architecture, with the highest accuracy and AUC, is the most accurate and compatible method for diagnosing Alzheimer's disease using the WTD-PSD. DT has the second-highest AUC.
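For readers who want to reproduce an AUC value from a plotted ROC curve, a minimal Python sketch using the trapezoidal rule is given below. The function name and the example points are hypothetical and are not read off Figure 8.

```python
def auc_trapezoid(points):
    """Area under a ROC curve given (false_positive_rate, true_positive_rate) points."""
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        # Each pair of neighbouring points contributes one trapezoid.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical ROC points for one classifier, including both end points.
example_curve = [(0.0, 0.0), (0.1, 0.8), (0.4, 0.95), (1.0, 1.0)]
example_auc = auc_trapezoid(example_curve)
```

A classifier that guesses at random follows the diagonal and has an AUC of 0.5, whereas a perfect classifier reaches an AUC of 1.0, which is why a larger AUC indicates a better classifier.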

Discussion
Since each data sample in ML should be defined as a feature vector, several studies have recommended extracting various features from MRI scans and then classifying the resulting vectors. Image preprocessing, on the other hand, is necessary to increase the quality of the recovered feature vectors because of registration errors and noise in the image. Domain knowledge is necessary in order to derive discriminative features. In this study, the input images are decomposed with the two-dimensional discrete wavelet transform. The wavelet subband coefficients are modeled using the Time-Dependent Power Spectrum Descriptors model, which is implemented in MATLAB. The attributes of the TD-PSD model make up the leading feature vector. The extracted features are applied independently to several classification algorithms to classify AD. On the basis of the findings, the accuracies of SVM, LDA, KNN, DT, and CNN are 45 percent, 53.70 percent, 73.80 percent, 94.33 percent, and 98.50 percent, respectively. SVM is the least accurate of the five models. The CNN architecture, with the highest accuracy and AUC, is the most accurate and compatible technique for diagnosing Alzheimer's disease when utilizing the WTD-PSD. Furthermore, DT is the second most accurate approach and has the second-highest AUC.

Conclusion
Many studies have advised extracting numerous features from MRI scans and then classifying the resulting vectors, since each data sample in ML should be described as a feature vector. However, image preprocessing is required to improve the quality of the recovered feature vectors because of registration errors and noise. Domain knowledge is required for extracting discriminative characteristics. In this work, the input images are decomposed with the two-dimensional discrete wavelet transform. The Time-Dependent Power Spectrum Descriptors model is used to model the wavelet subband coefficients. The leading feature vector is made up of the characteristics of the TD-PSD model. The extracted features are applied independently to several classification algorithms to classify AD. The classification is used to determine the kind of tumor. We extracted wavelet subband features from three sets of MCI, AD, and HC data using the TD-PSD method. With the KNN approach, 193, 141, and 109 of the 200 input images in each of the HC, AD, and MCI classes are detected correctly. According to the findings, the sensitivity of KNN is adequate only for the HC class. The SVM and LDA approaches yielded poor outcomes for diagnosing AD.
The DT findings, on the other hand, demonstrate that the method's sensitivity is 94 percent, 91.5 percent, and 97.5 percent for HC, AD, and MCI, respectively; that is, 188, 183, and 195 MRI images from HC, AD, and MCI are detected correctly. This indicates that, for this problem, the WTD-PSD is compatible with the DT technique. Furthermore, the method's precision for HC, AD, and MCI is 91.70 percent, 95.30 percent, and 96.10 percent, respectively. With the CNN classifier, 197, 198, and 196 out of 200 images are recognized correctly for HC, AD, and MCI, which corresponds to sensitivities of 98.5 percent, 99 percent, and 98 percent, respectively. Eventually, 91.7 percent, 95.3 percent, and 96.1 percent precision are achieved. The CNN architecture, with the greatest accuracy and AUC, is the most accurate and compatible technique for diagnosing AD utilizing the WTD-PSD. DT has the second-highest AUC.

Data Availability
Data are available and can be provided upon direct request to the corresponding author at ali.taghavi.eng@iauctb.ac.ir.