Prediction of Cognitive Decline in Temporal Lobe Epilepsy and Mild Cognitive Impairment by EEG, MRI, and Neuropsychology

Cognitive decline is a severe concern for patients with mild cognitive impairment. In patients with temporal lobe epilepsy, too, memory problems are frequent and potentially progressive. Against the background of a unifying hypothesis for cognitive decline, we merged knowledge from dementia and epilepsy research in order to identify biomarkers with a high predictive value for cognitive decline across and beyond these groups that can be fed into intelligent systems. We prospectively assessed patients with temporal lobe epilepsy (N = 9), mild cognitive impairment (N = 19), and subjective cognitive complaints (N = 4), as well as healthy controls (N = 18). All underwent structural cerebral MRI, EEG at rest and during declarative verbal memory performance, and a neuropsychological assessment, which was repeated after 18 months. Cognitive decline was defined as a significant change on neuropsychological subscales. We extracted volumetric and shape features from MRI and brain network measures from EEG and fed these features, alongside baseline neuropsychological testing, into a machine learning framework with feature subset selection and 5-fold cross-validation. Out of the 50 participants, 27 declined over time in executive functions, 23 in visual-verbal memory, and 23 in divided attention, and 7 had an increase in depression scores. The best sensitivity/specificity for decline was 72%/82% for executive functions, based on a feature combination of MRI volumetry and EEG partial coherence during recall of memories; 95%/74% for visual-verbal memory, by a combination of MRI wavelet features and neuropsychology; 84%/76% for divided attention, by a combination of MRI wavelet features and neuropsychology; and 81%/90% for increase of depression, by a combination of the EEG partial directed coherence factor at rest and neuropsychology.
Combining information from EEG, MRI, and neuropsychology in order to predict neuropsychological changes in a heterogeneous population could create a more general model of cognitive performance decline.


Feature vectors for classifications
• Each of the 14 measures of interaction calculated for EEG segments during the two sessions of rest (14 × 2 classifications).
• Each of the 14 measures during learning, immediate recall, delayed recall after two weeks, immediate recognition, and delayed recognition after two weeks (14 × 5 classifications).
Then, we created combinations of all of these feature vectors:
• All EEG measures during rest with all MRI feature vectors (14 × 2 × 3 classifications).
• All EEG measures during rest with all MRI feature vectors and the neuropsychological feature vector (14×2×3×1 classifications).
• All EEG measures during cognitive tasks with all MRI feature vectors and the neuropsychological feature vector (14 × 5 × 3 × 1 classifications).
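Combining feature vectors across modalities amounts to concatenating the per-subject vectors along the feature axis. A minimal sketch, assuming NumPy; the array names and dimensions here are illustrative placeholders, not the study's actual feature counts:

```python
import numpy as np

# Hypothetical per-subject feature matrices (rows = subjects); the
# sizes are assumptions for illustration only.
eeg_rest = np.random.rand(50, 1734)      # e.g. 17 x 17 channel pairs x 6 bands
mri_volumetry = np.random.rand(50, 120)  # volumetric/shape features
neuropsych = np.random.rand(50, 25)      # baseline test scores

# A combined feature vector is the concatenation of the per-modality
# vectors along the feature axis.
combined = np.concatenate([eeg_rest, mri_volumetry, neuropsych], axis=1)
print(combined.shape)  # (50, 1879)
```

Each listed combination above corresponds to one such concatenated matrix, which is then submitted to the classification pipeline.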

Feature subset selection
Because of the high dimensionality of the data, we implemented a feature subset selection procedure. Specifically, it is known that when the length of the feature vector exceeds the sample size, classification can yield artificially high accuracies due to overfitting. This is easily the case for the EEG measures of interaction, where the feature vector length is up to 17 × 17 × 6 for the 17 selected channels and the 6 frequency bands.
Classification and feature subset selection were done in a nested design with 3 layers, each with 5-fold cross-validation (an illustration can be found in Figure 1 in the Supplementary section). The outer layer divided the data into 20% for testing the resulting model and 80% for feature vector optimisation and cross-validation, i.e. the part submitted to the middle layer. The middle layer is a first inner loop, again implemented with 5-fold cross-validation; it aims to estimate the consistency of the selected features, since each run yields a different feature vector. The inner layer is a second, nested loop, again with 5-fold cross-validation, used to perform the actual feature subset selection. So-called k-fold cross-validation consists of k repetitions in which N/k samples are left out as the test set, while the remaining N − (N/k) samples are used for training.
All subsets were drawn in order to maintain the original proportion of the two groups of participants with vs. without cognitive decline on the respective subscale.
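The nested, stratified design can be sketched as follows, assuming scikit-learn; `X` and `y` are placeholders (50 subjects with a binary with/without-decline label), not the study's actual data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder data: 50 subjects, 30 features, 27 with vs. 23 without decline.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))
y = np.array([1] * 27 + [0] * 23)

# Outer layer: 5-fold split, i.e. 20% testing / 80% optimisation.
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for outer_train, outer_test in outer.split(X, y):
    X_mid, y_mid = X[outer_train], y[outer_train]
    # Middle layer: estimates the consistency of selected features.
    middle = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    for mid_train, mid_val in middle.split(X_mid, y_mid):
        # The inner-layer 5-fold feature-selection loop would run on
        # X_mid[mid_train]; omitted here.
        pass
```

`StratifiedKFold` preserves the with/without-decline proportion in every fold, matching the stratification requirement stated above.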
The whole algorithm is described as follows:
1. First, one fifth of the segments was excluded as the outer-layer test set for the final validation step, while the remaining four fifths were used as the outer-layer training set and submitted to the next step.
2. The outer-layer training set was again divided into 5 equally sized subsets, each maintaining the proportion of group sizes (with/without decline) from the original sample. For each of these 5 sets, the following steps were repeated:
(a) The set was left out; the other 4 sets were merged to form the middle-layer training set.
(b) A t-test between the two conditions was calculated on the middle-layer training-set segments, yielding one p-value for each entry of the feature vector.
(c) The resulting p-values were sorted in ascending order.
(d) The feature vector was initialised with the feature with the smallest p-value; its initial length was thus one.
(e) For this feature vector, the classification accuracy was calculated with 5-fold cross-validation; that is, the middle-layer training set was divided into an inner-layer 5-fold partition with inner-layer training and test sets.
(f) The next feature from the sorted list was added, and the inner-layer classification with 5-fold cross-validation was repeated for the extended feature vector.
(g) The new result was compared to the previous one. The new entry was retained in the feature vector only if the following constraints were met:
• The classification accuracy obtained with the current feature vector was ≥ the maximum of the previously obtained classification accuracies; that is, the second accuracy had to be ≥ the first, and, for example, the accuracy of the 6th entry was compared to that of the previously obtained 5-entry feature vector, which is the vector with the maximal accuracy.
• If the best sensitivity so far (in other words, the accuracy for segments of the first condition) was lower than 0.75, the obtained sensitivity had to be ≥ this maximum.
• If the best specificity so far (the accuracy for segments of the second condition) was lower than 0.5, the obtained specificity had to be > this maximum.
(h) In this way, features were added and tested for their contribution to the classification accuracy until all available features had been used, until the feature vector reached a maximum of 30 entries, or until a consecutive run of more than 10% of all available features had been rejected. If 10% of the available features was fewer than 100, the maximum number of consecutively tested features was 100 or, if fewer than 100 features were available, the total number of available features.
3. The average length N of the resulting 5 optimised feature sets was calculated, and the number of times each feature was selected across these 5 runs was counted.
A final feature vector was formed by including only those features that were selected in at least 2 of the 5 iterations. If this yielded no features, all features selected in at least 1 of the 5 iterations were included. If the resulting feature vector contained more than N features, only the 30 most frequently selected features were kept. If all features were selected equally often (e.g. once), a random selection was made.
4. The resulting feature vector was used to train a support vector machine on the outer-layer training set; the resulting model was then used to classify the outer-layer test set, from which the overall classification accuracy and the within-group accuracies for the two conditions (i.e. sensitivity/specificity) were calculated.
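The core of the inner-layer loop, steps (b) to (h) above, can be sketched as follows. This is a simplified illustration assuming scipy and scikit-learn; the sensitivity/specificity constraints of step (g) and the consecutive-rejection stopping rule of step (h) are omitted for brevity:

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def select_features(X, y, max_len=30):
    """Greedy forward selection along a t-test ranking (simplified)."""
    # (b), (c): one p-value per feature, sorted ascending
    _, pvals = ttest_ind(X[y == 1], X[y == 0], axis=0)
    order = np.argsort(pvals)
    # (d), (e): start with the smallest-p feature and its CV accuracy
    selected = [order[0]]
    best_acc = cross_val_score(SVC(), X[:, selected], y, cv=5).mean()
    for feat in order[1:]:
        if len(selected) >= max_len:      # (h): cap at 30 entries
            break
        trial = selected + [feat]
        # (f): inner-layer 5-fold cross-validated accuracy
        acc = cross_val_score(SVC(), X[:, trial], y, cv=5).mean()
        if acc >= best_acc:               # (g): keep only if not worse
            selected, best_acc = trial, acc
    return selected, best_acc
```

In the full procedure, this loop runs once per middle-layer fold, and the five resulting feature sets are merged by selection count as described in step 3.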
The threshold of 0.75 was selected as a rough estimator of above-chance classification; a value of 0.75 can be considered clearly above chance, since the expected chance level would be around 0.5.
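One way to sanity-check such a threshold, under the assumption of a binomial model of chance performance (our assumption, not a calculation from the study), is to ask what accuracy a random classifier would exceed only 5% of the time:

```python
from scipy.stats import binom

# For a two-class problem with chance level 0.5: the accuracy that a
# coin-flip classifier would exceed only 5% of the time on n samples.
n = 50  # e.g. all subjects, pooled over the five outer folds
threshold = binom.ppf(0.95, n, 0.5) / n
print(threshold)  # well below 0.75, consistent with the statement above
```

For smaller test sets the chance distribution widens, so the margin above 0.5 that counts as "clearly above chance" grows accordingly.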

Task
The learning session contained the presentation of 72 pairs of German nouns. The order of the words was kept constant over all participants. Of these pairs of words, 36 had an obvious semantic relationship (such as water – glass), and 36 had no obvious relationship (such as heaven – bookshelf). This variation was intended to make remembering easier for half of the words and more difficult for the other half. After presentation of each pair of words, the participant first had to indicate, by pressing a button on the keyboard, whether there was a relationship between the two words. After the button was pressed, the participant was prompted on the screen with the question 'Relation between words?' and, in a second line below, the instruction 'Please spell out the relationship and press button to continue.' In this time window, the participant was requested to spell out the potential relationship that came to his or her mind. This step allowed us to control for the learning strategy employed by the participants. Thinking of a possible relationship should facilitate learning.
The recall session consisted of 72 trials, repeating the 72 word pairs from the learning phase in the same order. Each trial was formed by a cued recall and a recognition phase. In the cued recall, only the first word was given on the screen, and a question mark indicated that the second word should be reported. Participants proceeded with a button press to the next screen, on which they were asked to spell out the second word or to indicate that they had forgotten it. An experimenter noted whether the word was correct. Only identical words were considered correct, with one exception where the plural of a word was accepted as correct (story/stories). After that, a further button press brought the participant to the recognition phase. Here, next to the cue word, three words were presented. The correct word appeared in a pseudo-randomized order across the three positions, and the participants had to select the correct word via button press.