Automatic Diagnosis of Mild Cognitive Impairment Based on Spectral, Functional Connectivity, and Nonlinear EEG-Based Features

Accurate and early diagnosis of mild cognitive impairment (MCI) is necessary to prevent the progression of Alzheimer's disease and other kinds of dementia. Unfortunately, the symptoms of MCI are complicated and may often be misinterpreted as those associated with the normal ageing process. To address this issue, many studies have proposed applying machine learning techniques to electroencephalography (EEG) for early MCI diagnosis. In this study, a machine learning framework for MCI diagnosis is proposed, which extracts spectral, functional connectivity, and nonlinear features from EEG signals. The sequential backward feature selection (SBFS) algorithm is used to select the best subset of features. Several classification models and different combinations of feature sets are evaluated to identify the best ones for the proposed framework. A dataset of EEG recordings from 16 normal and 18 MCI subjects is used to validate the proposed system. Metrics including accuracy (AC), sensitivity (SE), specificity (SP), F1-score (F1), and false discovery rate (FDR) are evaluated using 10-fold crossvalidation. The best configuration of the proposed framework, using the linear support vector machine (LSVM) classifier and the combination of all feature sets, provides an average AC of 99.4%, SE of 98.8%, SP of 100%, F1 of 99.4%, and FDR of 0%. The acquired results confirm that the proposed framework provides an accurate and robust performance for recognizing MCI cases and outperforms previous approaches. Based on the obtained results, the framework could be developed into a computer-aided diagnosis (CAD) tool for clinical use.


Introduction
Dementia is the most prominent neurological disorder among the elderly, resulting in deterioration of cognitive abilities such as memory, thinking, and behavior, and limiting the performance of daily activities [1]. While dementia mostly affects people over the age of sixty, some individuals are afflicted with this condition at a younger age. It has been estimated that 44.4 million people around the world suffer from some form of dementia [2,3]. Alzheimer's disease (AD) is the most common type of dementia, comprising about 60% to 80% of dementia cases [4]. AD causes severe memory loss, cognitive impairment, and behavioral changes. In the United States, it is the third most costly illness and the sixth most common cause of mortality [4]. Unfortunately, there is no effective treatment for AD, and anti-AD medications merely serve to alleviate the symptoms of the disease. Hence, diagnosis of AD at an early stage is essential for controlling the progression of the disease. Mild cognitive impairment (MCI) is the intermediate stage between the normal cognitive decline due to aging and AD or other forms of dementia, and it is frequently considered the early stage of AD. It is estimated that about 15% to 20% of MCI cases progress to AD [5]. According to some experts, diagnosis of MCI is even more critical than that of AD, because the disease is more manageable at the MCI stage. A more accurate method for diagnosing MCI is therefore crucial to controlling disease progression. Nonetheless, the symptoms of MCI are often indistinguishable from those associated with normal aging. Diagnosis of MCI has traditionally integrated multiple examinations, including psychological tests such as the mini-mental state examination (MMSE), blood tests, spinal fluid analysis, neurological examination, and magnetic resonance imaging (MRI). However, these traditional methods are laborious, time-consuming, and error-prone. Therefore, many MCI cases may not be diagnosed accurately and in a timely fashion.
Hence, novel and automated methods for early and accurate MCI diagnosis are in high demand. Electroencephalography (EEG) is an efficient modality that records, from the scalp surface, the bioelectrical activity of brain neurons corresponding to various states. EEG signals can depict the functioning of the brain with high temporal resolution, and their recording is noninvasive, inexpensive, and portable. As a result, EEG might be employed as an objective and trustworthy biomarker for MCI and other types of dementia. However, the manual interpretation of these signals is difficult and challenging due to their nonstationary and nonlinear nature. Advances in computer science and artificial intelligence have therefore encouraged many researchers to employ machine learning techniques for automated EEG signal analysis. For instance, numerous studies have been conducted to automatically detect neurological disorders such as depression [6][7][8], epilepsy [9][10][11], seizure [12][13][14][15], Parkinson's disease [16,17], and schizophrenia [18].
EEG-based machine learning frameworks for diagnosing AD, MCI, and other forms of dementia have been reported in a number of studies [19][20][21][22][23][24][25][26][27][28]. An automatic MCI diagnosis method based on the extraction of spectral characteristics of EEG signals, for instance, was presented by Kashefpoor et al. [19]. This method extracted 19 spectral features from 19-channel EEG data based on the delta, theta, alpha1, alpha2, gamma, beta1, and beta2 frequency subbands. The most discriminative features were then chosen using a correlation-based feature selection algorithm. The classification test was carried out using neuro-fuzzy (NF) and k-nearest neighbor (KNN) classifiers, and an accuracy of 88.89% was reported as the best classification result. Using spectral and complexity analysis, McBride et al. proposed an EEG-based classification framework that distinguishes between AD, MCI, and healthy individuals [20]. The complexity analysis comprises activity, mobility, complexity, sample entropy, and Lempel-Ziv complexity parameters, while the spectral analysis involves extracting features from the delta, theta, alpha1, alpha2, gamma, beta1, and beta2 frequency subbands. In this work, an average accuracy of 79.2% was obtained for MCI classification using the support vector machine (SVM) classifier. In a different study, EEG signals were combined with supervised dictionary learning techniques, namely label-consistent K-SVD (LC-KSVD) and correlation-based label-consistent K-SVD (CLC-KSVD), to diagnose MCI [21]. A time series signal as well as a vector of extracted spectral features were applied to two separate dictionary learning classifiers, and the final prediction for each sample was determined by a vote between the predictions for the time series signal and the spectral feature vector. The CLC-KSVD approach yielded an accuracy of 88.9%, the highest achieved in that investigation.
A single-channel EEG-based technique for MCI diagnosis using speech-evoked brain responses was presented by Khatun et al. [22]. Time- and spectral-domain analyses were used to extract 590 features from the recorded responses, and the top 25 were chosen using the random forest (RF) method. Logistic regression (LR) and SVM classification models were used for the classification task, and the best reported result was an accuracy of 87.9%. Based on spectral-temporal analysis, Yin et al. proposed an integrated MCI diagnosis approach in [23]. This method extracts a collection of features via spectral-temporal analysis and then derives an ideal feature subset using a developed wrapper algorithm called the three-dimensional (3-D) evaluation algorithm. The SVM classifier achieved the best accuracy in that investigation, 96.94%. Power spectral density (PSD), skewness, kurtosis, spectral skewness, spectral kurtosis, spectral crest factor, spectral entropy (SE), and fractal dimension (FD) properties were used by Sharma et al. to construct an automatic EEG-based MCI detection technique [24]. The study's dataset was gathered under four conditions: open eyes, closed eyes, the finger tapping test (FTT), and a continuous performance test (CPT). The SVM classifier achieved this method's best classification accuracy of 96.94%. Fast Fourier transform (FFT) and wavelet transform (WT) feature extraction techniques were used by Durongbhan et al. in [25] to develop an automatic AD detection methodology based on EEG recordings; the presented findings show that this method achieved an accuracy of 99% with the KNN classifier. Using the piecewise aggregate approximation (PAA) compression approach and feature extraction techniques such as permutation entropy (PE) and autoregressive (AR) modeling, Siuly et al. created a system for automatically identifying MCI patients [26].
The classification models utilized in that investigation were the extreme learning machine (ELM), SVM, and KNN. According to the presented data, the ELM classifier had the best classification performance, with an accuracy of 98.78%. Using interhemispheric coherence features and the properties of EEG subbands, Oltu et al. proposed another EEG-based paradigm for the classification of MCI, AD, and healthy people [27]. For feature extraction, this method used the discrete wavelet transform (DWT), PSD, and interhemispheric coherence; for classification, it used bagged trees. The best accuracy of that study was 96.5%. Another strategy for differentiating between AD and healthy participants using EEG data was proposed by Safi and Safi [28]. This method applied the DWT, PSD, and empirical mode decomposition (EMD) algorithms to the signals and then extracted features from the outputs of these algorithms, including variance, kurtosis, skewness, Shannon entropy, sure entropy, and the Hjorth parameters. This study compared SVM, KNN, and regularized linear discriminant analysis (RLDA) and found that KNN had the best classification performance, with an accuracy of 97.64%.
Although numerous studies on automated EEG-based MCI diagnosis have been carried out, a viable and accurate method for clinical usage has not yet been provided. In other words, achieving an optimal and robust performance with high accuracy, suitable for clinical applications, remains the main challenge of automated EEG-based MCI diagnosis. Therefore, we present a precise machine learning-based methodology for MCI detection based on EEG signals. To this end, EEG data are processed to extract three key feature sets: spectral, functional connectivity, and nonlinear properties. The sequential backward feature selection (SBFS) technique is used to choose the optimal feature combination.
To select the optimal classification model for the suggested methodology, various models are evaluated. Additionally, each feature set and its combinations with the other feature sets are investigated to determine which combination is optimal. The most important functional connectivity features, as well as the EEG band-power disparities in common EEG frequency subbands between MCI and healthy control (HC) subjects, were also investigated. The proposed methodology, illustrated in Figure 1, consists of preprocessing, feature extraction, feature selection, classification, and validation stages. The EEG signals underwent preprocessing to remove noise and artifacts; in this step, each signal was sliced into one-minute segments after suppressing noise and artifacts. Then, the spectral, functional connectivity, and nonlinear feature sets were extracted from the EEG segments. Following that, 10-fold crossvalidation was used to randomly divide the samples into training and testing sets. The training set was used during the feature selection process to choose the most discriminative features; the feature selection algorithm in this work was the SBFS approach. After the nonselected features had been eliminated from the training and testing sets, the training set was used to train the classifier. The trained classification model was then used to classify each case of the testing set to validate the suggested methodology.

Materials and Methodology
2.2.1. Preprocessing. Eye movements, eye blinks, electromyogram (EMG) activity, electrocardiogram (ECG) activity, electrode channel drift, and power line interference are just a few of the noise sources and artifacts that can contaminate EEG signals and prevent a clean representation of brain function in the data. To avoid inaccurate downstream analysis, it is crucial to remove noise and artifacts from the EEG signal. Using the EEGLAB toolbox in MATLAB, a preprocessing procedure was applied to all recorded EEG signals in this investigation [29]. In the first step, the signals were filtered with a band-pass filter with a low cutoff frequency of 0.5 Hz and a high cutoff frequency of 32 Hz. Because the components of the aforementioned artifacts are concentrated outside the frequency band between 0.5 Hz and 32 Hz, this filtering reduces the effects of EMG and power line interference. The independent components of the signals were then extracted using independent component analysis (ICA), and each component was classified into artifact and nonartifact classes using a voting scheme: the ICLabel [30] and MARA [31] automatic plugins, along with manual inspection, each predicted a label for every component, and the final label was the one that obtained more than half of the votes. The signals were then reconstructed after the predicted artifact components had been removed, and any remaining noisy intervals were removed from the reconstructed signals by visual analysis. Finally, to increase the number of samples, the preprocessed EEG signals were sliced into one-minute segments using a nonoverlapping sliding window. It should be noted that each EEG segment inherited the label of its original EEG signal.
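The filtering and segmentation steps above can be sketched in Python. This is a minimal illustration with SciPy only: the ICA-based artifact rejection performed in EEGLAB is omitted, and the sampling rate, filter order, and zero-phase filtering are assumed choices, not values from the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(eeg, fs, low=0.5, high=32.0, order=4):
    """Zero-phase 0.5-32 Hz band-pass filter applied channel-wise (channels x samples)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)

def segment(eeg, fs, seconds=60):
    """Slice a (channels x samples) array into nonoverlapping fixed-length windows."""
    win = int(seconds * fs)
    n_windows = eeg.shape[-1] // win
    return [eeg[:, i * win:(i + 1) * win] for i in range(n_windows)]

fs = 256                                   # assumed sampling rate
x = np.random.randn(19, fs * 125)          # 19-channel toy recording, ~2 minutes
clean = bandpass(x, fs)
segments = segment(clean, fs)              # one-minute segments; trailing partial window dropped
```

Each element of `segments` would then be passed to the feature extraction stage and would carry the label of the original recording.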

Feature Extraction.
By converting each raw sample's values into a small number of useful features, feature extraction in machine learning aims to represent each sample with task-relevant characteristics. In the suggested method, the spectral, functional connectivity, and nonlinear feature sets are extracted from the EEG segments. Each extracted feature set is explained as follows: (1) Spectral features: the spectral properties of the EEG signals depict the characteristics of the EEG frequency subbands at various scalp locations. Due to their relation to brain function, these characteristics may serve as diagnostic criteria for neurological illnesses. In this study, the spectral feature set was created by extracting the interhemispheric asymmetry and the band power of the delta (0.5 to 4 Hz), theta (4 to 8 Hz), alpha (8 to 13 Hz), and beta (13 to 32 Hz) frequency subbands of each EEG segment. To determine the band power of the frequency subbands, the PSD of the EEG segments was calculated using the Welch periodogram [32], with a Hamming window and a 50% overlap between windows. It is important to note that the band powers of the aforementioned frequency subbands were extracted from each channel of the EEG segments. The interhemispheric asymmetry, which measures disparities in the band power of the frequency subbands between the left and right hemispheres, is defined as follows:

$\mathrm{IA} = \dfrac{P_{RH} - P_{LH}}{P_{RH} + P_{LH}},$

where IA, $P_{RH}$, and $P_{LH}$ stand for the interhemispheric asymmetry, the band power in the right hemisphere, and the band power in the left hemisphere, respectively.
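The band-power and asymmetry computations above can be sketched as follows. The Welch settings mirror the text (Hamming window, 50% overlap), but the 2-second window length and the exact normalized-difference form of the asymmetry ratio are assumptions for illustration.

```python
import numpy as np
from scipy.signal import welch

def band_power(sig, fs, f_lo, f_hi):
    """Band power via a Welch periodogram (Hamming window, 50% overlap)."""
    nper = 2 * fs  # 2-second windows: an assumed choice, not stated in the paper
    f, pxx = welch(sig, fs=fs, window="hamming", nperseg=nper, noverlap=nper // 2)
    band = (f >= f_lo) & (f < f_hi)
    return np.sum(pxx[band]) * (f[1] - f[0])   # rectangle-rule integral of the PSD

def interhemispheric_asymmetry(p_rh, p_lh):
    """Normalized right-minus-left band-power difference (assumed IA form)."""
    return (p_rh - p_lh) / (p_rh + p_lh)

fs = 256
t = np.arange(60 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)               # pure 10 Hz tone: energy falls in the alpha band
p_alpha = band_power(x, fs, 8, 13)
p_beta = band_power(x, fs, 13, 32)
ia = interhemispheric_asymmetry(p_alpha, 0.5 * p_alpha)
```

For a 10 Hz tone the alpha-band power dominates the beta-band power, and halving the "left" power yields a positive asymmetry of 1/3.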
The interhemispheric asymmetry was computed for the channel pairs Fp2-Fp1, F4-F3, F8-F7, C4-C3, T4-T3, P4-P3, T6-T5, and O2-O1 in the delta, theta, alpha, and beta frequency subbands. (2) Functional connectivity features: in recent years, numerous studies have investigated brain connectivity to understand how information is processed, sent, received, or shared between various brain regions during different cognitive tasks and mental states. Functional connectivity, a branch of neuroscience, seeks to gauge brain connectivity by quantifying the statistical relationships between the dynamics of concurrently recorded signals [33]. Coherence, mutual information, and synchronization likelihood are a few examples of functional connectivity metrics used to assess brain connectivity.
Here, a set of features based on the statistical correlations between EEG channels in various scalp areas was extracted using the synchronization likelihood method [34]. In general, synchronization likelihood estimates the synchronization between two signals by analyzing their linear and nonlinear dependencies, which may be complex and markedly different between the two signals. The synchronization between two signals is represented numerically by a value between zero and one, with a greater value indicating stronger synchronization. First, a state-space representation of an M-channel EEG signal was created using a time-delay embedding, specified as follows [34]:

$X_{k,i} = \left(x_{k,i},\, x_{k,i+l},\, x_{k,i+2l},\, \dots,\, x_{k,i+(m-1)l}\right),$

where $k \in \{1, 2, \dots, M\}$ and $i \in \{1, 2, \dots, N\}$ denote the channel number and the index of each discrete sample, and $m$ and $l$ are the embedding dimension and lag parameters, respectively. Next, the probability $P_{k,i}^{\varepsilon}$ that the vectors $X_{k,i}$ and $X_{k,j}$ are closer than a distance $\varepsilon$ is defined as follows [34]:

$P_{k,i}^{\varepsilon} = \frac{1}{2(\omega_2 - \omega_1)} \sum_{\substack{j=1 \\ \omega_1 < |i-j| < \omega_2}}^{N} \phi\left(\varepsilon - \left|X_{k,i} - X_{k,j}\right|\right),$
where $\phi$, $|\cdot|$, $\omega_1$, and $\omega_2$ stand for the Heaviside step function, the Euclidean distance, the dimension of the Theiler correction window, and the dimension of the sharpening window, respectively. The Theiler and sharpening windows establish a window around the discrete sample $i$ to correct for autocorrelation effects and to sharpen the time resolution of the synchronization measure. Now, for each $k$ and each $i$, a critical distance $\varepsilon_{k,i}$ is determined such that $P_{k,i}^{\varepsilon_{k,i}} = p_{\mathrm{ref}}$. In the next step, for each sample pair $(i, j)$ within the considered window ($\omega_1 < |i - j| < \omega_2$), the number of channels $H_{i,j}$ in which $X_{k,i}$ and $X_{k,j}$ are closer together than $\varepsilon_{k,i}$ is determined as follows [34]:

$H_{i,j} = \sum_{k=1}^{M} \phi\left(\varepsilon_{k,i} - \left|X_{k,i} - X_{k,j}\right|\right).$

$H_{i,j}$ indicates how many of the embedded time series resemble each other, and it varies between 0 and M. Now, the synchronization likelihood $S_{k,i,j}$ for each channel $k$ and each discrete sample pair $(i, j)$ is defined as follows [34]:

$S_{k,i,j} = \begin{cases} \dfrac{H_{i,j} - 1}{M - 1}, & \left|X_{k,i} - X_{k,j}\right| < \varepsilon_{k,i}, \\[4pt] 0, & \text{otherwise.} \end{cases}$
To obtain the rate of synchronization between channel $k$ at sample $i$ and all other $M - 1$ channels, $S_{k,i}$, an average of $S_{k,i,j}$ over all $j$ is performed as follows [34]:

$S_{k,i} = \frac{1}{2(\omega_2 - \omega_1)} \sum_{\substack{j=1 \\ \omega_1 < |i-j| < \omega_2}}^{N} S_{k,i,j}.$

$S_{k,i}$ ranges between $p_{\mathrm{ref}}$ and 1. $S_{k,i} = 1$ indicates maximum synchronization of all M channels, and $S_{k,i} = p_{\mathrm{ref}}$ corresponds to the case where all M channels have minimum synchronization. Finally, the average of $S_{k,i}$ over all $i$ is computed and expressed as the synchronization likelihood for channel $k$. In this work, $p_{\mathrm{ref}}$, $l$, $m$, $\omega_1$, and $\omega_2$ were set to 0.01, 10, 10, 100, and 410, respectively, and the synchronization likelihood between identical channels was not calculated. It is worth mentioning that these parameters were chosen through trial-and-error experiments to obtain the most reasonable connectivity image. (3) Nonlinear features: EEG signals inherently exhibit complicated behavior and nonlinear dynamic properties. In light of this, nonlinear analysis techniques may be superior to conventional linear techniques for describing EEG signals. In this study, several nonlinear features, including detrended fluctuation analysis, the Higuchi fractal dimension, correlation dimension, Lyapunov exponent, C0-complexity, Kolmogorov entropy, Shannon entropy, and approximate entropy, were computed from each channel of the EEG segments. Each feature is detailed in the following paragraphs. (a) Detrended fluctuation analysis: this algorithm estimates the statistical self-affinity of a time series [35]. Consider a finite signal $x(t)$ of length N. First, a cumulative sum of $x(t)$ is obtained as follows:

$X(k) = \sum_{t=1}^{k} \left(x(t) - \bar{x}\right),$

where $\bar{x}$ is the mean value of $x(t)$. Next, $X(k)$ is segmented into windows of equal length $n$, and a least-squares line is fitted within each window by minimizing the squared errors. $Y_n(k)$ denotes the resulting piecewise least-squares fit.
Then, the root-mean-square deviation from the trend, the fluctuation, is computed as follows:

$F(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left(X(k) - Y_n(k)\right)^2},$

where $F(n)$ is the fluctuation. Finally, this computation is repeated for windows of different sizes to form a logarithmic plot of $F(n)$ against $n$. The relation can be written as $F(n) \propto n^{\alpha}$, in which $\alpha$ represents the self-affinity of the signal; that is, $\alpha$ is the feature extracted by detrended fluctuation analysis. (b) Higuchi fractal dimension: Higuchi proposed an algorithm to calculate the fractal dimension of a signal in 1988 [36]. Given a signal $x(t)$ with N samples, new subsampled signals are generated as follows:

$x_{\tau}^{s} : \; x(s),\, x(s+\tau),\, x(s+2\tau),\, \dots,\, x\!\left(s + \left\lfloor \frac{N - s}{\tau} \right\rfloor \tau \right), \qquad s = 1, 2, \dots, \tau,$

where $\tau = 1, 2, \dots, T$ and $\lfloor r \rfloor$ denotes the integer part of $r$. The length $L_{\tau}(s)$ of each subsampled signal is defined as follows:

$L_{\tau}(s) = \frac{1}{\tau} \left[ \left( \sum_{i=1}^{\lfloor (N-s)/\tau \rfloor} \left| x(s + i\tau) - x\left(s + (i-1)\tau\right) \right| \right) \frac{N - 1}{\lfloor (N-s)/\tau \rfloor\, \tau} \right].$

Also, the mean length $L(\tau)$ over all start offsets is defined as follows:

$L(\tau) = \frac{1}{\tau} \sum_{s=1}^{\tau} L_{\tau}(s).$

Finally, $L(\tau)$ is computed for all $\tau$ values ranging from $T_{\min}$ to $T_{\max}$, and the slope of the linear fit of $\ln L(\tau)$ versus $\ln(1/\tau)$ is taken as the Higuchi fractal dimension of $x(t)$. In this study, $T_{\min}$ and $T_{\max}$ were set to 1 and 30, respectively; these values were selected such that the Higuchi fractal characteristic could be extracted from the study's data. (c) Correlation dimension: this is another approach for estimating the fractal dimension, based on measuring the space occupied by a set of random points. In 1983, Grassberger and Procaccia presented the most common method for estimating the correlation dimension [37]. First, it constructs m-dimensional vectors using a time delay $\tau$ and embedding dimension $m$, which can be denoted as follows:

$X_i = \left(x(i),\, x(i+\tau),\, x(i+2\tau),\, \dots,\, x(i+(m-1)\tau)\right),$

where $i = 1, 2, \dots, N - (m-1)\tau$, $x$ is the signal with N samples, and $X_i$ is the m-dimensional vector. Afterwards, the correlation integral of $X$ is defined as the probability that two points of the set lie within a distance $r$ of each other. It is obtained using the following equation:

$C(r) = \lim_{N \to \infty} \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \phi\left(r - \left|X_i - X_j\right|\right),$

where $C(r)$ and $\phi$ represent the correlation integral and the Heaviside step function, respectively. In the next phase, the raw correlation dimension is obtained as

$D = \lim_{r \to 0} \frac{\ln C(r)}{\ln r}.$
Finally, D is computed for incremental values of m. This process causes a gradual increase of D until it eventually reaches saturation; the saturated value of D is taken as the estimated correlation dimension of the signal. (d) Lyapunov exponent: the chaos of a dynamic system is quantified by the Lyapunov exponent, which estimates the growth or decay rate of small perturbations along each major axis of the phase-space system [38]. For a dynamic system of dimension d, it is possible to determine d Lyapunov exponents; however, in the majority of real-world applications, the largest Lyapunov exponent (LLE) is taken as the extracted feature. The largest Lyapunov exponent $\lambda_1$ is defined through the divergence of neighboring trajectories:

$d_j(i) \approx d_j(0)\, e^{\lambda_1 (i\, \Delta t)},$

where $d_j(i)$ denotes the mean Euclidean distance between two neighboring trajectories after $i$ time steps and $d_j(0)$ represents the Euclidean distance between the j-th pair of initially nearest neighbors. In order to compute the LLE, the following equation is used:

$y(i) = \frac{1}{\Delta t} \left\langle \ln d_j(i) \right\rangle,$

where $y(i)$, whose slope approximates the LLE, is the divergence curve, and $\langle \ln d_j(i) \rangle$ represents the average value of the natural logarithm of $d_j(i)$ over all values of $j$, respectively. (e) C0-complexity: this measure quantifies the irregularity of a signal as the ratio of its irregular component to the original signal [39]. Consider a signal $x(n)$ with N samples.
First, the fast Fourier transform $X(k)$ of $x(n)$ is computed, and the mean squared magnitude $M$ of $X(k)$ is obtained as follows:

$M = \frac{1}{N} \sum_{k=0}^{N-1} \left|X(k)\right|^2.$

Now, a spectrum $Y(k)$ is constructed from $X(k)$ and $M$ as follows:

$Y(k) = \begin{cases} X(k), & \left|X(k)\right|^2 > M, \\ 0, & \text{otherwise.} \end{cases}$

By applying the inverse Fourier transform to $Y(k)$, the regular part $y(n)$ is obtained, and the C0-complexity of $x(n)$ is given as follows:

$C0 = \frac{A_1}{A_0}, \qquad A_1 = \sum_{n=0}^{N-1} \left|x(n) - y(n)\right|^2, \qquad A_0 = \sum_{n=0}^{N-1} \left|x(n)\right|^2,$

where C0, $A_1$, and $A_0$ represent the C0-complexity, the power of the irregular part of $x(n)$, and the power of the original signal, respectively. (f) Kolmogorov entropy: it reflects the rate of information loss of a signal, quantifying its degree of chaos [40]. It is defined through the average rate of loss of information of a signal observed over n steps:

$\mathrm{KE} = -\lim_{\tau \to 0}\, \lim_{\varepsilon \to 0}\, \lim_{n \to \infty} \frac{1}{n\tau} \sum_{i_0, \dots, i_{n-1}} P_{i_0 \cdots i_{n-1}} \ln P_{i_0 \cdots i_{n-1}},$

where $P_{i_0 \cdots i_{n-1}}$ denotes the joint probability of observing the sequence of states $i_0, \dots, i_{n-1}$ and KE is the estimated Kolmogorov entropy. It is worth mentioning that a positive, finite KE indicates that the dynamic phenomena in the signal are chaotic, a zero value indicates that the signal contains regular phenomena, and an infinite KE corresponds to the existence of nondeterministic phenomena in the signal. (g) Shannon entropy: it is another metric for quantifying the chaotic rate of a signal, proposed by Shannon [41]. Given a signal with N samples, the Shannon entropy is defined as follows:

$H = -\sum_{i=1}^{N} p_i \ln p_i,$

where $H$ and $p_i$ represent the Shannon entropy of the signal and the probability of having the $i$-th sample value in the signal, respectively. (h) Approximate entropy: it is an algorithm for estimating the regularity and the unpredictability of fluctuations of a signal [42]; a higher estimate indicates that the signal contains more irregularity. In the first step of this algorithm, a sequence of vectors $X(i)$ is constructed from the signal as follows:

$X(i) = \left(x(i),\, x(i+1),\, \dots,\, x(i+m-1)\right),$

where $x(n)$ is the signal with N samples. In the next phase, the distance between $X(i)$ and $X(j)$, $D[X(i), X(j)]$, is computed using the following equation:

$D[X(i), X(j)] = \left|X(i) - X(j)\right|,$

where $|\cdot|$ is the Euclidean distance.
Now, $C_i^m(r)$ is calculated for each $i$, $i = 1, 2, \dots, N - m + 1$, as follows:

$C_i^m(r) = \frac{\#\left\{\, j : D[X(i), X(j)] \le r \,\right\}}{N - m + 1},$

where $r$ denotes the threshold on $D[X(i), X(j)]$. Finally, the approximate entropy (ApEn) is determined as follows:

$\mathrm{ApEn}(m, r) = \Phi^m(r) - \Phi^{m+1}(r),$

where $\Phi^m(r)$ is defined as follows:

$\Phi^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N - m + 1} \ln C_i^m(r).$

In this work, the values of $m$ and $r$ were set to 2 and $0.2\, \mathrm{var}(x)$, respectively.
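Two of the nonlinear measures above can be sketched directly in Python. The Higuchi fractal dimension follows the curve-length construction described in (b); the approximate entropy follows (h), although this sketch uses the maximum-norm distance of Pincus's standard formulation rather than the Euclidean distance mentioned in the text, and the threshold default follows the paper's $0.2\,\mathrm{var}(x)$ choice.

```python
import numpy as np

def higuchi_fd(x, t_max=30):
    """Higuchi fractal dimension: slope of ln L(tau) versus ln(1/tau)."""
    x = np.asarray(x, float)
    n = len(x)
    log_l, log_inv = [], []
    for tau in range(1, t_max + 1):
        lengths = []
        for s in range(tau):
            idx = np.arange(s, n, tau)
            if len(idx) < 2:
                continue
            # curve length of the subsampled series, normalized as in Higuchi (1988)
            l = np.sum(np.abs(np.diff(x[idx]))) * (n - 1) / ((len(idx) - 1) * tau)
            lengths.append(l / tau)
        log_l.append(np.log(np.mean(lengths)))
        log_inv.append(np.log(1.0 / tau))
    return np.polyfit(log_inv, log_l, 1)[0]

def approx_entropy(x, m=2, r=None):
    """Approximate entropy ApEn(m, r) = Phi^m(r) - Phi^(m+1)(r)."""
    x = np.asarray(x, float)
    n = len(x)
    if r is None:
        r = 0.2 * np.var(x)  # threshold as in the text (0.2 * std is also common)
    def phi(mm):
        emb = np.array([x[i:i + mm] for i in range(n - mm + 1)])
        # Chebyshev (max-norm) distances between all embedded vectors
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        return np.mean(np.log(np.mean(d <= r, axis=1)))
    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)        # irregular: FD near 2, high ApEn
line = np.linspace(0.0, 1.0, 1000)       # smooth: FD near 1
sine = np.sin(2 * np.pi * np.arange(500) / 50)  # regular: low ApEn
```

As a sanity check, white noise yields a fractal dimension near 2 and a straight line near 1, while the periodic sine has a lower approximate entropy than noise of the same length.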
The number of features in each feature set of the suggested method is displayed in Table 1. As shown in Table 1, the spectral, functional connectivity, and nonlinear feature sets comprise 108, 171, and 152 features, respectively. It is important to note that the aforementioned features are concatenated into a single vector for each segment. Each frequency subband power contributes 19 features (one per channel). For the IA feature, 8 features were extracted for each of the delta, theta, alpha, and beta frequency subbands, so the IA contributes 32 features in total. Additionally, each nonlinear measure contributes 19 features.
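As a worked example of the synchronization likelihood feature described earlier, the following is a simplified sketch. Toy window and embedding parameters are used instead of the paper's values ($p_{\mathrm{ref}} = 0.01$, $l = m = 10$, $\omega_1 = 100$, $\omega_2 = 410$) so that it runs on a short toy signal, and $S_{k,i,j}$ is averaged over close pairs only, which makes the result range from roughly $p_{\mathrm{ref}}$ for independent channels up to 1 for identical ones.

```python
import numpy as np

def embed(x, m, l):
    """Time-delay embedding of a 1-D signal into m-dimensional state vectors."""
    n = len(x) - (m - 1) * l
    return np.array([x[i:i + (m - 1) * l + 1:l] for i in range(n)])

def sync_likelihood(sigs, m=3, l=1, w1=5, w2=50, p_ref=0.05):
    """Per-channel synchronization likelihood of an M-channel signal (simplified)."""
    X = np.stack([embed(np.asarray(s, float), m, l) for s in sigs])   # (M, n, m)
    M, n = X.shape[0], X.shape[1]
    num = np.zeros(M)
    den = np.zeros(M)
    for i in range(n):
        js = np.array([j for j in range(n) if w1 < abs(i - j) < w2])
        # Euclidean distances from sample i to every window sample, per channel
        d = np.linalg.norm(X[:, [i], :] - X[:, js, :], axis=2)        # (M, len(js))
        eps = np.quantile(d, p_ref, axis=1, keepdims=True)            # critical distances
        close = d <= eps
        h = close.sum(axis=0)                                         # H_{i,j} per window sample
        s_kij = np.where(close, (h - 1) / (M - 1), 0.0)
        num += s_kij.sum(axis=1)
        den += close.sum(axis=1)
    return num / np.maximum(den, 1)                                   # average over close pairs

rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * np.arange(300) / 40) + 0.1 * rng.standard_normal(300)
coupled = sync_likelihood([x, x.copy()])                              # identical channels
independent = sync_likelihood([rng.standard_normal(300), rng.standard_normal(300)])
```

Identical channels are always jointly "close", so their synchronization likelihood evaluates to 1, while unrelated noise channels stay near the chance level $p_{\mathrm{ref}}$.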

Feature Selection.
In machine learning frameworks, the goal of feature selection is to find the best subset of features, improving classification performance by discarding irrelevant and redundant attributes. In this study, a wrapper feature selection algorithm called SBFS was employed to obtain the best feature subset for discriminating MCI and HC samples. The main steps of the SBFS algorithm are illustrated in Algorithm 1. Briefly, SBFS attempts to obtain the best feature subset by sequentially eliminating features from the entire feature set. As shown in Algorithm 1, features are removed as long as the objective criterion is nondecreasing [43]. In this work, the criterion (J) is the average accuracy in 10-fold crossvalidation. It should be mentioned that the classifier of the SBFS method was set to linear discriminant analysis (LDA) in our study.

Algorithm 1: The SBFS algorithm.
Input: the entire feature set, Y = {y_1, y_2, ⋯, y_d}
Output: the selected feature subset, S_k = {s_j | j = 1, 2, ⋯, k; s_j ∈ Y}, where k = 1, 2, ⋯, d
1. Start with the entire set, S_0 = Y; k = 0.
2. Eliminate the worst feature, s* = argmax_{s ∈ S_k} J(S_k − s).
3. Update S_{k+1} = S_k − s*; k = k + 1.
4. If J(S_k) < J(S_{k−1}), go to step 6.
5. Else, go to step 2.
6. Stop.
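A runnable sketch of Algorithm 1 follows: greedy backward elimination with LDA and 10-fold crossvalidation accuracy as the criterion J, as in the text. The synthetic dataset and the scikit-learn implementation details are illustrative assumptions, not the study's data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

def sbfs(X, y, cv=10):
    """Sequential backward feature selection: repeatedly drop the feature whose
    removal maximizes 10-fold CV accuracy of LDA; stop when J starts to drop."""
    selected = list(range(X.shape[1]))

    def J(cols):
        return cross_val_score(LinearDiscriminantAnalysis(), X[:, cols], y, cv=cv).mean()

    best = J(selected)
    while len(selected) > 1:
        # score every candidate removal, keep the best (the "worst" feature)
        scores = [(J([c for c in selected if c != s]), s) for s in selected]
        new_best, worst = max(scores)
        if new_best < best:          # criterion no longer nondecreasing -> stop
            break
        best = new_best
        selected.remove(worst)
    return selected, best

X, y = make_classification(n_samples=120, n_features=8, n_informative=3,
                           n_redundant=2, random_state=0)
subset, score = sbfs(X, y)
```

The wrapper keeps the LDA classifier fixed and only varies the candidate column subsets, exactly as described for the SBFS criterion in the text.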

Classification.
The main component of supervised machine learning frameworks is classification, which aims to predict the class label of incoming data. Classification models carry out this task. The classification model for the suggested framework was chosen by comparing several candidates in this study, including SVM with linear (LSVM) and radial basis function (RBFSVM) kernels, LR, KNN, decision tree (DT), naive Bayes (NB), RUSBoost (RB), and GentleBoost (GB). Additionally, before training and testing the classifiers, the z-score transformation was applied to the training and testing sets. The primary objective of SVM models is to find the optimal decision boundary that separates an n-dimensional feature space into classes: a hyperplane for LSVM and a radial-basis decision surface for RBFSVM. The extreme points, or vectors, selected by these models determine the optimal decision boundary; these extreme cases are called support vectors. The KNN algorithm is a nonparametric supervised learning technique based on neighbor similarity, whose output is a class membership: a datum is assigned to the class supported by the majority vote of its k closest neighbors. A DT is a decision-support tool that categorizes each item using a tree-like model of decisions and their potential outcomes. NB belongs to a family of probabilistic classifiers based on Bayes' theorem that assume independence between sample features. The RB and GB models are ensemble classifiers; ensemble learning combines several base models into a new classifier that outperforms each of its components. The Bayesian optimizer was also utilized to optimize the hyperparameters of the LSVM, RBFSVM, LR, DT, NB, RB, and GB models.

Validation.
The 10-fold crossvalidation strategy was utilized in this work to assess the classification performance of the suggested MCI diagnosis method. This strategy divides the dataset into 10 folds at random; the model is then trained on a subset of 9 folds and validated on the remaining fold, and the procedure is repeated ten times so that each fold serves as the testing subset once. The classification performance of the suggested method is assessed by applying the testing set to the trained model and comparing the predicted and actual labels using the accuracy (AC), sensitivity (SE), specificity (SP), F1-score (F1), and false discovery rate (FDR) metrics, which are calculated as follows:

$\mathrm{AC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{SE} = \frac{TP}{TP + FN}, \qquad \mathrm{SP} = \frac{TN}{TN + FP},$

$\mathrm{F1} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad \mathrm{FDR} = \frac{FP}{FP + TP},$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
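The metric definitions can be checked with a small helper. The confusion-matrix counts below are illustrative toy values, not the study's actual results.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """AC, SE, SP, F1, and FDR from confusion-matrix counts (MCI = positive class)."""
    ac = (tp + tn) / (tp + tn + fp + fn)
    se = tp / (tp + fn)                  # sensitivity (recall)
    sp = tn / (tn + fp)                  # specificity
    f1 = 2 * tp / (2 * tp + fp + fn)
    fdr = fp / (fp + tp)                 # false discovery rate = 1 - precision
    return ac, se, sp, f1, fdr

ac, se, sp, f1, fdr = diagnostic_metrics(tp=80, tn=82, fp=0, fn=1)
```

For these toy counts the helper yields AC ≈ 0.994, SE ≈ 0.988, SP = 1.0, F1 ≈ 0.994, and FDR = 0, illustrating how a near-perfect confusion matrix maps onto the five reported metrics.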

Results
This section evaluates the performance of the suggested framework from several angles. First, the results obtained by the classification models within the suggested strategy are provided in order to choose the best of them. Next, each feature set and their combinations are used as the input of the framework to select the ideal combination. Then, in order to examine the differences between the two groups, the EEG band-power disparities in the delta, theta, alpha, and beta frequency subbands as well as the most important functional connectivity features between MCI and HC cases are investigated. Among the extracted feature sets in this study, the EEG band powers and functional connectivity features may offer some biological notions of MCI, and these analyses are presented to ascertain which EEG band powers and functional connectivity coefficients differ most significantly between MCI and HC subjects. The intersection of the subsets returned by SBFS across the 10-fold crossvalidation runs of the suggested framework is then reported and examined. Finally, the results acquired with the leave-one-participant-out crossvalidation approach for the suggested framework are reported and analyzed. The classification results of the evaluated models are listed in Table 2. According to Table 2's findings, LSVM had the best classification performance: the given framework achieved an average AC of 99.4%, SE of 98.8%, SP of 100%, F1 of 99.4%, and FDR of 0% with the LSVM model. Compared to the other classifiers, it produced higher means for AC, SP, and F1 and a lower mean for FDR. Additionally, it exhibited the lowest standard deviations of the performance measures among the classification models, demonstrating the greater stability of its performance within the presented framework.
According to the results in Table 2, KNN and RBFSVM were the second- and third-best classification models in the proposed framework. Compared to the other classifiers, RBFSVM attained the highest mean and lowest standard deviation of the SP metric. NB offered the worst classification performance, with the lowest means of the AC, SE, and F1 metrics among the employed classifiers. The classification performance of the remaining classifiers, including RB, GB, DT, and LR, was comparable. The box plots of the AC, SE, SP, and F1 values achieved by the proposed framework with each classification model are shown in Figure 2. As depicted in Figure 2, the AC, SE, SP, and F1 values of LSVM, KNN, and RBFSVM were reasonably high, confirming that these classifiers produced acceptable performance in the proposed method. LSVM outperformed the other classifiers in terms of the AC, SE, SP, and F1 metrics, and its box plots for these metrics were more compact than those of the other classifiers. These findings show that, in terms of classification accuracy and performance robustness, the LSVM model is superior to the alternative classifiers for the proposed framework. Overall, LSVM, KNN, and RBFSVM were the three best classifiers for the proposed framework. The receiver operating characteristic (ROC) curves of each classification model in the proposed framework are shown in Figure 3. DT, GB, KNN, LSVM, LR, NB, RB, and RBFSVM had areas under the curve (AUC) of 0.85, 0.86, 0.98, 0.99, 0.92, 0.87, 0.65, and 0.99, respectively. These results indicate that the proposed framework achieved its best classification performance with the LSVM and RBFSVM classification models.
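
The AUC values above can be understood through the rank-based (Mann-Whitney) definition of AUC: the probability that a randomly chosen positive sample receives a higher decision score than a randomly chosen negative one. A small illustrative sketch, with made-up decision scores rather than the study's outputs:

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive (MCI) sample
    scores higher than a randomly chosen negative (HC) one; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier decision scores (illustrative only):
print(auc([0.9, 0.8, 0.6], [0.1, 0.4, 0.7]))  # → 0.8888888888888888
```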

3.2. Results per Feature Set. In this part, each feature set and its combinations were applied as the input of the classification framework to select the optimal input for the proposed framework. Note that these evaluations were carried out with the three best classifiers from the previous section. The classification results of the proposed framework employing each input with the LSVM, RBFSVM, and KNN classification models are shown in Table 3. According to Table 3, the integration of the functional connectivity, spectral, and nonlinear feature sets provided the best classification performance, obtaining the highest averages of the AC, SE, SP, and F1 metrics and the lowest mean of the FDR metric. Moreover, compared to the other inputs, integrating all three feature sets yielded the lowest standard deviations of the evaluation metrics, showing that the proposed framework is more reliable when all EEG feature sets are used as input. LSVM delivered the best classification performance of the proposed framework with this combination, achieving an average AC of 99.4%, SE of 98.8%, SP of 100%, F1 of 99.4%, and FDR of 0%. The combinations of functional connectivity with the spectral or nonlinear feature sets performed nearly as well and ranked second in classification accuracy. For the combination of functional connectivity and spectral feature sets, KNN achieved the best performance, with an average AC of 98.8%, SE of 98.7%, SP of 97.5%, F1 of 98.9%, and FDR of 0.6%. KNN also offered the best performance for the combination of functional connectivity and nonlinear feature sets, with an average AC of 98.8%, SE of 100.0%, SP of 97.0%, F1 of 99.0%, and FDR of 1.8%.
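
Assuming the evaluation simply enumerated every non-empty combination of the three feature sets (an assumption; the paper does not give implementation details), the seven candidate inputs can be generated as:

```python
from itertools import combinations

feature_sets = ["spectral", "functional_connectivity", "nonlinear"]

# All non-empty combinations of the three feature sets (7 candidate inputs).
candidate_inputs = [
    combo for r in range(1, len(feature_sets) + 1)
    for combo in combinations(feature_sets, r)
]
for combo in candidate_inputs:
    # Each combination would then be evaluated with LSVM, RBFSVM, and KNN.
    print(" + ".join(combo))
```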

3.3. EEG Signal Power Analysis. This section investigates the differences in alpha, theta, beta, and delta EEG band power between MCI and HC samples at various scalp locations. To quantify the difference between the two groups, the band powers of these frequency subbands for MCI and HC cases were compared using the t-test approach. Table 4 provides the t-test results on the alpha, theta, beta, and delta EEG signal powers of MCI and HC cases in each EEG channel. According to these results, the alpha and beta band powers provided the most significant differences between MCI and HC cases; using these band powers, the frontal, parietal, and temporal regions of the scalp offered greater discrimination than the other regions. Theta power was the next-best EEG band power for discriminating MCI from HC. The delta band power, however, was unable to distinguish significantly between the MCI and HC subjects. Figure 4 displays the alpha, delta, beta, and theta band powers on the EEG topographic maps of HC and MCI subjects. According to Figure 4, alpha band power distinguished HC and MCI individuals most significantly in the frontal lobe. For theta band power, the frontal, temporal, and occipital lobes contributed most to the differences between HC and MCI subjects. The major differences using beta band power were found in the left temporal and occipital areas, while the frontal and occipital areas supplied the substantial differences for delta band power.
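
A hedged sketch of the per-channel comparison, computing Welch's two-sample t statistic on hypothetical band-power values (the p-values reported in Table 4 would additionally require the CDF of the t distribution with Welch-Satterthwaite degrees of freedom, omitted here):

```python
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's two-sample t statistic for unequal-variance groups."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5

# Hypothetical alpha-band powers for one channel (not the study's data):
mci_alpha = [4.1, 3.8, 4.5, 3.9]
hc_alpha = [5.2, 5.0, 5.6, 5.4]
print(welch_t(mci_alpha, hc_alpha))
```

A large-magnitude t statistic (positive or negative) for a channel corresponds to a small p-value, i.e., a significant group difference in that band power.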

3.4. Functional Connectivity Analysis. The major objective of this section is to identify, using the t-test approach, the functional connectivity coefficients that best distinguish between HC and MCI cases. To this end, the functional connectivity coefficients of MCI and HC cases were ranked by the p-value of the t-test, and the top 10 were identified. These top ten functional connectivity coefficients are listed in Table 5 along with their p-values. The findings show a significant difference between the classes for each of the top ten coefficients, with the p-value of each of the ten features below 1e-3. The F3-C3, Fp1-F3, F3-T5, P3-F7, P4-Cz, C3-O1, C3-C4, Fp1-Fp2, C3-Fp2, and Fz-C4 connections provided the most significant differences between the HC and MCI classes among the functional connectivity coefficients. The box plots of the top ten functional connectivity features for MCI and HC cases are shown in Figure 5; they exhibit large disparities between MCI and HC samples for these features. These findings also suggest that functional connections among the frontal, temporal, central, and parietal scalp regions contributed the most notable disparities between the MCI and HC classes.
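
The ranking step can be sketched as follows, with invented p-values standing in for the study's actual Table 5 values:

```python
# Hypothetical p-values for a few connectivity coefficients (illustrative
# only; the study's actual values are reported in Table 5):
p_values = {
    "F3-C3": 2.1e-4, "Fp1-F3": 3.4e-4, "F3-T5": 4.0e-4,
    "P3-F7": 5.2e-4, "C3-O1": 7.7e-4, "T6-O2": 4.3e-2,
}

# Rank coefficients by ascending p-value and keep those below the threshold.
ranked = sorted(p_values, key=p_values.get)
significant = [name for name in ranked if p_values[name] < 1e-3]
print(significant)  # → ['F3-C3', 'Fp1-F3', 'F3-T5', 'P3-F7', 'C3-O1']
```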

3.5. Selected Features. In the evaluation of the proposed framework using 10-fold cross-validation, the SBFS method returns a specific subset of features in each iteration, yielding 10 subsets in total. The intersection of these ten subsets is reported and examined in this subsection, and its details are listed in Table 6. The intersection totaled 361 features, divided into three sets: spectral (108), functional connectivity (171), and nonlinear (82). These results show that all spectral and functional connectivity features were selected in every iteration of the 10-fold cross-validation and constituted the largest proportion of the selected features, whereas 82 nonlinear features were selected in all iterations.
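
The stable subset can be obtained as a plain set intersection across the per-fold selections; a toy sketch with three invented fold subsets (the study uses ten, and the feature names here are hypothetical):

```python
# Toy selected-feature subsets from three cross-validation folds:
fold_subsets = [
    {"alpha_F3", "plv_F3-C3", "entropy_O1"},
    {"alpha_F3", "plv_F3-C3", "entropy_O1", "beta_P4"},
    {"alpha_F3", "plv_F3-C3", "lz_C4"},
]

# Features chosen in every fold form the stable, consistently selected subset.
stable = set.intersection(*fold_subsets)
print(sorted(stable))  # → ['alpha_F3', 'plv_F3-C3']
```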
3.6. Participant-Independent Evaluation. To assess the proposed framework from the standpoint of participant independence, it was validated using the leave-one-participant-out cross-validation approach. In this method, the spectral, nonlinear, and functional connectivity features were extracted from each participant's EEG signals without first segmenting them. One participant's data served as the testing set while the remaining data were used as the training set, and this was repeated 34 times so that each case was used as the testing set once. The classification model of the proposed framework in this evaluation was the LSVM classifier. The resulting confusion matrix of the proposed framework in this assessment is shown in Table 7.

Figure 4: The EEG topographic maps of HC and MCI cases in terms of the alpha, delta, beta, and theta band powers.
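
A minimal sketch of the leave-one-participant-out split (indices only; feature extraction and LSVM training are omitted):

```python
def leave_one_participant_out(n_participants):
    """Yield (train, test) index lists: each participant is held out once."""
    for held_out in range(n_participants):
        train = [i for i in range(n_participants) if i != held_out]
        yield train, [held_out]

# 34 participants in total (16 HC + 18 MCI), hence 34 train/test splits.
splits = list(leave_one_participant_out(34))
print(len(splits))  # → 34
```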

Discussion
This paper presents a machine learning framework for MCI diagnosis based on EEG signals using spectral, functional connectivity, and nonlinear features. The findings demonstrate that the proposed framework performs accurate and reliable classification. Among the feature sets, the spectral and functional connectivity sets outperformed the nonlinear set, which may indicate that nonlinear features are less able than spectral and functional connectivity features to distinguish between MCI and HC subjects. These feature sets may also offer biological explanations of cognitive states. This study used the t-test method to examine the band powers of the alpha, delta, beta, and theta frequency subbands in each EEG channel of MCI and HC individuals. This test revealed that the alpha and beta band powers provided the most important distinctions between MCI and HC cases, with beta band power more connected to the traits of MCI and other types of dementia than alpha band power. The physiological discrimination provided by beta band power may stem from its relationship with several MCI-related cognitive processes, such as expectation, consciousness, memory, and problem-solving. The discrimination provided by alpha band power may likewise be linked to other MCI-related cognitive states, such as difficulty focusing and extreme relaxation. Additionally, the top 10 functional connectivity features distinguished between the HC and MCI classes. The comparison of the proposed framework with state-of-the-art methods is summarized in Table 8. It should be noted that only the studies in [19,21] utilized the same dataset and evaluation process; the data and methodologies of the other publications differ from our work. According to the results described in Table 8, the proposed framework, which has the highest mean AC among the listed methods, outperforms the other state-of-the-art strategies for automatic MCI diagnosis based on EEG signals.
The fundamental improvement of the proposed framework over earlier research is the integration of the spectral, functional connectivity, and nonlinear feature sets, which have not previously been combined in this way. According to the results, this combination produced the best classification performance, exceeding earlier studies that based MCI diagnosis on EEG data. Another significant distinction from earlier research is that the proposed framework uses the SBFS algorithm as its feature selection strategy, whereas earlier works employed other feature selection approaches such as RF and rank-based methods.
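
A greedy sketch of sequential backward selection under a hypothetical scoring function (in the actual framework, subsets would be scored by classifier performance on the training data):

```python
def sequential_backward_selection(features, score):
    """Greedy sketch of sequential backward feature selection: repeatedly
    drop the feature whose removal hurts the score least, and keep track
    of the best-scoring subset seen along the way."""
    current = list(features)
    best_score, best_subset = score(current), list(current)
    while len(current) > 1:
        # Score every candidate subset obtained by dropping one feature.
        trials = [(score([f for f in current if f != d]), d) for d in current]
        trial_score, drop = max(trials)
        current.remove(drop)
        if trial_score >= best_score:
            best_score, best_subset = trial_score, list(current)
    return best_subset

# Hypothetical score: favors features "a" and "b", penalizes subset size.
score = lambda s: sum(f in ("a", "b") for f in s) - 0.1 * len(s)
print(sequential_backward_selection(["a", "b", "c", "d"], score))  # → ['a', 'b']
```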
The main limitation is the small sample size of the dataset used; as a result, the accurate and reliable performance of the proposed method may not be highly generalizable. We attempted to compensate for this limitation by dividing each EEG signal into five-minute chunks. Nevertheless, further MCI EEG datasets are necessary to establish the applicability of this paradigm and related strategies. On the other hand, very few publicly available MCI EEG datasets with more participants exist; public datasets generally open up new avenues for collaboration and would help generalize the validation of the proposed methodologies. The presented framework also carries a heavy computational load: although the integration of all feature sets produced the highest classification accuracy, it yields a high-dimensional feature matrix that raises the computational cost and complicates the interpretation of the framework in terms of physiological and biomarker characteristics. Additionally, the clinical applicability of this study, and of automatic EEG-based MCI detection methods in general, remains unclear; more clinical experimental evidence is required to confirm the clinical efficacy of these techniques.
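
The epoch-splitting step used to increase the number of samples can be sketched as follows; the sampling rate and epoch length here are illustrative, not the study's values:

```python
def segment_signal(signal, fs, epoch_seconds):
    """Split a 1-D EEG recording into non-overlapping fixed-length epochs,
    discarding any leftover samples shorter than one epoch."""
    n = int(fs * epoch_seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

# Toy recording: 10 samples at 2 Hz split into 2-second (4-sample) epochs.
epochs = segment_signal(list(range(10)), fs=2, epoch_seconds=2)
print(len(epochs))  # → 2
```

Each resulting epoch is then treated as one sample for feature extraction, which multiplies the effective sample count at the cost of within-participant correlation between samples.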
Nonetheless, according to the obtained results, this paper proposes an automated EEG-based MCI diagnosis framework with accurate and robust classification performance. It could be further developed for use as a computer-aided diagnosis (CAD) tool for clinical purposes. Future studies can also focus on providing more MCI EEG datasets and implementing deep learning approaches for automatic EEG-based MCI diagnosis.

Conclusion
This study provided a machine learning framework based on spectral, functional connectivity, and nonlinear features for automatic MCI diagnosis. In this framework, SBFS was applied as the feature selection approach to choose the best subset of features and enhance classification performance. Additionally, several classification models were assessed to choose the best one for the proposed framework. The optimal input for the proposed MCI diagnosis framework was also determined by applying each of the feature sets and their combinations to the framework. Based on the obtained results, the LSVM classifier combined with the functional connectivity, spectral, and nonlinear feature sets achieved the best classification performance of the proposed framework, providing an average AC of 99.4%, SE of 98.8%, SP of 100%, F1 of 99.4%, and FDR of 0%. The framework was also evaluated with the leave-one-participant-out cross-validation method, in which the LSVM model achieved an AC of 91.1%, SE of 88.8%, SP of 93.7%, F1 of 91.4%, and FDR of 5.8%. These findings demonstrate that the presented framework performs accurate and reliable classification. Compared to earlier research on EEG-based automatic MCI diagnosis, the current methodology offered superior classification performance in terms of robustness and accuracy.

Data Availability
The dataset used in this study is a public dataset. The data statement was checked and is valid.