The Implicit Function as Squashing Time Model: A Novel Parallel Nonlinear EEG Analysis Technique Distinguishing Mild Cognitive Impairment and Alzheimer's Disease Subjects with High Degree of Accuracy

Objective. This paper presents the results obtained using a protocol based on special types of artificial neural networks (ANNs) assembled in a novel methodology able to compress the temporal sequence of electroencephalographic (EEG) data into spatial invariants for the automatic classification of mild cognitive impairment (MCI) and Alzheimer's disease (AD) subjects. Compared with the procedure reported in our previous study (2007), this protocol includes a new type of artificial organism, named TWIST. The working hypothesis was that, relative to the results presented by the workgroup in 2007, the new artificial organism TWIST could produce a better classification between AD and MCI. Material and methods. Resting eyes-closed EEG data were recorded in 180 AD patients and in 115 MCI subjects. The data inputs for the classification, instead of being the EEG data themselves, were the weights of the connections within a nonlinear autoassociative ANN trained to reproduce the recorded data. The most relevant features were selected and, concurrently, the dataset was split into two halves for the final binary classification (training and testing) performed by a supervised ANN. Results. The best result in distinguishing between AD and MCI was 94.10%, considerably better than the one reported in our previous study (∼92%) (2007). Conclusion. The results confirm the working hypothesis that a correct automatic classification of MCI and AD subjects can be obtained by extracting the spatial information content of the resting EEG voltage with ANNs, and they represent the basis for research aimed at integrating the spatial and temporal information content of the EEG.


INTRODUCTION
Since its introduction, the electroencephalogram (EEG) has been considered the only methodology allowing a direct and online view of the "brain at work." At the same time, abnormalities of the "natural" aging of the brain had already been noticed in different types of dementias. The introduction of structural imaging technologies in the 1970s and 1980s (computed tomography and magnetic resonance imaging), together with the good results obtained in the study of brain function by techniques dealing with regional metabolism, glucose and oxygen consumption, and blood flow (single-photon emission computed tomography, positron emission tomography, functional magnetic resonance imaging) during the following two decades, relegated EEG to a secondary role, particularly in the evaluation of Alzheimer's disease (AD) and related dementias.

Computational Intelligence and Neuroscience
Lately, EEG computerized analysis in aged people has been enriched by various modern techniques able to manage the large amount of information on time-frequency processes at single recording channels (wavelets, neural networks, etc.) and on the spatial localization of these processes [2][3][4][5][6][7][8][9][10]. These results have encouraged the scientific community to explore electromagnetic brain activity, which changes with aging and can greatly deteriorate through the different stages of the various forms of dementia. The use of neural networks represents an alternative and very promising attempt to make EEG analysis suitable for clinical applications in aging, thanks to their ability to extract specific and smooth characteristics from huge amounts of data. Computerized processing of a large quantity of numerical data in wakeful relaxed subjects ("resting" EEG) made the automatic classification of EEG signals easier, providing promising results even with relatively simple linear classifiers such as logistic regression and discriminant analysis. Using global field power (i.e., the sum of the EEG spectral power across all electrodes) as an input, some authors reached an accurate differential diagnosis of AD and MCI subjects, with accuracies of 84% and 78%, respectively [11,12]. Using the spectral coherence between electrode pairs (i.e., a measure of functional coupling) as an input to the classification, the correct classification reached 82% when comparing AD and normal aged subjects [13,14].
Spatial smoothness and temporal fluctuation of the EEG voltage are considered measures of synaptic impairment, in line with the notion that cortical atrophy can affect the spatiotemporal pattern of the neural synchronization generating the scalp EEG. These parameters have been used to successfully discriminate the respective distributions of probable AD and normal aged subjects [15]. The interesting new idea in that study [15] was the analysis of the resting EEG potential distribution instant by instant, rather than the extraction of a global index over periods of tens of seconds or more. Table 1 summarizes the results, showing a higher classification rate with ANN analysis than with standard linear techniques, such as multivariate discriminant analysis or nearest-neighbour analysis [16]. Some authors [17] developed a system consisting of recurrent neural networks processing spectral EEG data; they succeeded in classifying AD and non-AD patients with a sensitivity of 80% and a specificity of 100%. In other studies, classifiers based on ANNs, wavelets, and blind source separation (BSS) achieved promising results [18,19]. In a previous study from the same workgroup as this paper, we used a sophisticated technique based on blind source separation and wavelet preprocessing recently developed by Vialatte et al. [18] and Cichocki et al. [20][21][22], whose results appear to be the best in the field when compared to the literature. We named this method the BWB model (blind source separation + wavelet + bump modeling) [1]. The results obtained in the classification tasks comparing AD patients to MCI subjects using the BWB model ranged from 78.85% to 80.43% (mean = 79.48%).
The aim of this study is to assess the strength of a novel parallel nonlinear EEG analysis technique in the differential classification of MCI subjects and AD patients, with a high degree of accuracy, based on special types of artificial neural networks (ANNs) assembled in a novel methodology able to compress the temporal sequence of electroencephalographic (EEG) data into spatial invariants. The working hypothesis is that this new ANN-based nonlinear approach to the EEG can contribute to improving the reliability of the diagnostic phase in association with other clinical and instrumental procedures, and that, compared to the results already presented by the workgroup [1], the new artificial organism TWIST could produce a better classification between AD and MCI.

MATERIAL AND METHODS
The IFAST method includes two phases.
(1) A squashing phase: an EEG track is compressed in order to project the invariant patterns of that track onto the connection matrix of an autoassociative ANN. The EEG track of each subject is then represented by a vector of weights, without any information about the target (AD or MCI). (2) A "TWIST" (training with input selection and testing) phase: a technique of data resampling based on the genetic algorithm GenD, developed at the Semeion Research Center. The new dataset, composed of the connection matrices (output of the squashing phase) plus the target assigned to each vector, is split five times into two subsamples with similar probability density functions, in order to train, test, and validate the ANN models.

General philosophy
The core of this new methodology is that the ANNs do not classify subjects by directly using the EEG data as an input. Rather, the data inputs for the classification are the weights of the connections within a recirculation (nonsupervised) ANN trained to generate the recorded EEG data. These connection weights represent a model of the peculiar spatial features of the EEG patterns at the scalp surface. The classification, based on these weights, is performed by a standard supervised ANN. This method, named IFAST (acronym for implicit function as squashing time), tries to understand the implicit function in a multivariate data series compressing the temporal sequence of data into spatial invariants and it is based on three general observations.
(1) Every multivariate sequence of signals coming from the same natural source is a complex asynchronous dynamic highly nonlinear system, in which each channel's behavior is understandable only in relation to all the others. (2) Given a multivariate sequence of signals generated from the same source, the implicit function defining the above-mentioned asynchronous process is the conversion of that same process into a complex hypersurface, representing the interaction in time of all the channels' behavior.
(3) The 19 channels of the EEG represent a dynamic system characterized by asynchronous parallelism. The nonlinear implicit function that defines them as a whole represents a metapattern that translates into space (a hypersurface) the interactions that all the channels create in time.
The idea underlying the IFAST method resides in thinking that each patient's 19-channel EEG track can be synthesized by the connection parameters of an autoassociated nonlinear ANN trained on the same track's data.
There can be several topologies and learning algorithms for such ANNs; what is necessary is that the selected ANN be of the autoassociative type (i.e., the input vector is the target for the output vector) and that the transfer functions defining it be nonlinear and differentiable at any point.
Furthermore, it is required that all the processing made on every patient be carried out with the same type of ANN, and that the initial randomly generated weights have to be the same in every learning trial. This means that, for every EEG, every ANN has to have the same starting point, even if that starting point is random.
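The requirement that every ANN start from the same randomly generated weights can be met by re-seeding the random generator before each subject's training run; a minimal sketch (function name and weight range are assumptions, not from the original software):

```python
import numpy as np

def initial_weights(n_channels, seed=42):
    # Re-seeding before every subject guarantees that each
    # autoassociative ANN starts from the *same* random weight
    # matrix, so differences among the trained weights reflect
    # only differences among the EEG tracks.
    rng = np.random.default_rng(seed)
    return rng.uniform(-0.1, 0.1, size=(n_channels, n_channels))

w_a = initial_weights(19)
w_b = initial_weights(19)
assert np.array_equal(w_a, w_b)  # identical starting point for every EEG
```

In this way the "starting point is random, but the same for every EEG," as the method requires.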
We have operated in two ways in order to verify this method's efficiency.
(1) Different experiments were implemented based on the same samples. By "experiment," we mean a complete application of the whole procedure to every track of the sample.
(2) The second way is using autoassociated ANNs with different topologies and algorithms on the entire sample in order to prove that any autoassociated ANN can carry out the task of translating into the space domain the whole EEG track through its connections.

The squashing phase
The first application phase of the IFAST method may be defined as "squashing." It consists in compressing an EEG track: every ith EEG track is processed by a two-layered autoassociative backpropagation ANN in which W(j, j) = 0, that is, the connections on the main diagonal are not present (see Figure 1). It is possible to use different types of autoassociative ANNs to run this search for spatial invariants in every EEG.
(1) A backpropagation network without a hidden unit layer and without connections on the main diagonal (for short, AutoBP).

This is an ANN featuring an extremely simple learning algorithm: AutoBP features N^2 − N internode connections and N biases inside the exit nodes, for a total of N^2 adaptive weights. This algorithm works similarly to logistic regression and can be used to establish the dependency of the variables on each other. The advantage of AutoBP is its learning speed, due to the simplicity of its topology and algorithm. Moreover, at the end of the learning phase, the connections between variables, being direct, have a clear conceptual meaning: every connection indicates a relationship of graded excitation, inhibition, or indifference between a pair of channels in the EEG track of any patient.
The disadvantage of AutoBP is its limited convergence capacity, due to that same topological simplicity. That is to say, complex relationships between variables may be approximated or ignored (for details, see [23,24]).
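A minimal sketch of an AutoBP-style training loop follows; the tanh transfer function, learning rate, and epoch count are assumptions for illustration, since the original equations are not reproduced here:

```python
import numpy as np

def train_autobp(X, epochs=200, lr=0.05, seed=0):
    """Autoassociative net with no hidden layer and no diagonal
    connections (W[j, j] = 0): each channel is reconstructed only
    from the other channels. A sketch, not the published algorithm."""
    n = X.shape[1]
    rng = np.random.default_rng(seed)
    mask = 1.0 - np.eye(n)                   # forbid self-connections
    W = rng.uniform(-0.1, 0.1, (n, n)) * mask
    b = np.zeros(n)
    for _ in range(epochs):
        Y = np.tanh(X @ W + b)               # reconstruction of the input
        G = (Y - X) * (1.0 - Y ** 2)         # delta through the tanh transfer
        W -= lr * (X.T @ G) / len(X) * mask  # keep the diagonal at zero
        b -= lr * G.mean(axis=0)
    return W, b
```

The N^2 − N trained off-diagonal weights (plus the N biases) then constitute the subject's squashed representation.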
(2) The new recirculation network (for short, NRC) is an original variation [25] of an ANN already present in the literature [26], which had not been considered useful for autoassociation between variables.
The topology of the NRC we designed includes only one connection matrix and four layers of nodes: one input layer, corresponding to the number of variables; one output layer whose target is the input vector; and two layers of hidden nodes with the same cardinality, independent of the cardinality of the input and output layers. The matrix between the input-output nodes and the hidden nodes is fully connected and, in every learning cycle, is modified in both directions. NRC thus features N^2 internode adaptive connections and 2·N intranode adaptive connections (biases). The advantages of NRC are its excellent convergence ability on complex datasets and, as a result, an excellent ability to interpolate complex relations between variables. Its main disadvantage has to do with the vector codification that the hidden units perform on the input vectors, which makes the conceptual decoding of its trained connections difficult.
(3) The autoassociative multilayer perceptron (for short, AMLP) may be used for an autoassociative purpose (encoding), thanks to its hidden unit layer, which decomposes the input vector into its main nonlinear components. The algorithm used to train the AMLP is a typical backpropagation algorithm [27].
The MLP, with only one layer of hidden units, features two connection matrices and two intranode connection vectors (biases), with N = number of input variables = number of output variables and M = number of nodes in the hidden layer. The advantages of the MLP are its well-known flexibility and the strength of its backpropagation algorithm. Its disadvantages are the tendency to saturate the hidden nodes in the presence of nonstationary functions, and the vector codification (allocated) of the same hidden nodes.
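A compact sketch of such an autoassociative MLP, trained by backpropagation with the input vector as its own target (hidden-layer size, transfer functions, and learning rate are illustrative assumptions):

```python
import numpy as np

def train_amlp(X, m=8, epochs=300, lr=0.1, seed=0):
    """Autoassociative MLP: N inputs -> M hidden (tanh) -> N linear
    outputs, target = input. A sketch of the AMLP idea only."""
    n = X.shape[1]
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.1, 0.1, (n, m)); b1 = np.zeros(m)
    W2 = rng.uniform(-0.1, 0.1, (m, n)); b2 = np.zeros(n)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # hidden layer: nonlinear components
        Y = H @ W2 + b2                 # output layer: reconstruction
        E = (Y - X) / len(X)            # error against the input itself
        G = (E @ W2.T) * (1.0 - H ** 2) # backpropagated delta (uses old W2)
        W2 -= lr * H.T @ E; b2 -= lr * E.sum(axis=0)
        W1 -= lr * X.T @ G; b1 -= lr * G.sum(axis=0)
    return W1, b1, W2, b2
```

Here the two connection matrices W1 and W2 correspond to the 2·N·M internode weights described above, and b1, b2 to the two bias vectors.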
(4) Elman's hidden recurrent network [28] can be used for autoassociative purposes, again with the backpropagation algorithm (for short, autoassociative hidden recurrent, AHR; see Figure 4). It was used in our experimentation as a variation of the MLP with memory set to one step. In this form it cannot be called a proper recurrent ANN, because the memory is limited to the previous record. We used this variation only to give the ANN an input vector modulated at each cycle by the values of the previous input vector. Our purpose was not to codify the temporal dependence of the input signals, but rather to give the ANN a "smoother" and more mediated input sequence. The number of connections in the AHR is the same as in an MLP with an extended input whose cardinality is equal to the number of hidden units. The software IFAST (developed in Borland C) [29] performs the squashing phase through the training operated by these four networks; in the "MetaTask" section the user can define the whole procedure by selecting (i) the files that will be processed (in our case every complete EEG), (ii) the type of network, (iii) the sequence of the records for every file (generally random), (iv) the number of epochs of training, (v) a training stop criterion (number of epochs or minimum RMSE), (vi) the number of hidden nodes of the autoassociative network, which determines the length of the output vector of the processed file, (vii) the number of matrices, depending on the type of autoassociative network selected, and (viii) the learning coefficient and delta rate.

TWIST
From this phase onwards, the procedure is completely different from the one described in our previous work [1]. The choice of a different methodology was motivated by the aim of improving the classification results and removing sources of information loss.
In the former study, the dataset coming from the squashing phase was compressed by another autoassociative ANN, in an attempt to eliminate the invariant patterns, codified by the previous ANN, relating to specific characteristics of the brain (anxiety level, background level, etc.) that are not useful for the classification, leaving the most significant ones unaltered. The new compressed datasets were then split into two halves (training and testing) using the T&T evolutionary algorithm [30] for the final binary classification.
In the present work, by contrast, the elimination of the noisiest features and the classification run parallel to each other. We will show that the new procedure obtains better performances.
First of all, a new dataset called "Diagnostic DB" was created for easier understanding. The diagnostic gold standard was established, for every patient, on the basis of the clinical and instrumental examinations (magnetic resonance imaging, etc.) carried out by a group of experts, in a way completely independent of the present analysis; the diagnoses were also reconfirmed over time. The diagnoses were divided into the following two classes, based on well-delineated inclusion criteria: (a) elderly patients with "cognitive decline" (MCI); (b) elderly patients with "probable Alzheimer's disease" (AD). We rewrote the last generated dataset, adding to every H_m vector the diagnostic class that an objective clinical examination had assigned to every patient. The H_m vectors represent the invariant traits defined by the squashing phase for the EEG track of every mth subject; their length (the number of columns of the connection matrix) depends on the specific autoassociative network used.
The dataset is then ready for the next step. This new phase is called TWIST [31] and includes the use of two systems, T&T and IS [30], both based on the genetic algorithm GenD, developed at the Semeion Research Centre [32].
The T&T system is a robust data resampling technique able to arrange the source sample into subsamples, each with a similar probability density function. In this way the data are split into two or more subsamples in order to train, test, and validate the ANN models more effectively.
The IS system is an evolutionary system for feature selection based on a wrapper approach. While the filter approach looks at the inner properties of a dataset, providing a selection that is independent of the classification algorithm to be used afterwards, in the wrapper approach various subsets of features are generated and evaluated with a specific classification model, using its performance to guide the optimization of the subsets.
The IS system reduces the amount of data while conserving the largest amount of information available in the dataset. The combined action of these two systems allows us to solve two frequent problems in managing artificial neural networks: (1) the size and quality of the training and testing sets, (2) the large number of variables which, apparently, seem to provide the largest possible amount of information. Some of the attributes may contain redundant information, which is included in other variables, or confused information (noise) or may not even contain any significant information at all and be completely irrelevant.
Genetic algorithms have been shown to be very effective as global search strategies when dealing with nonlinear and large problems.
The "training and testing" algorithm (T&T) is based on a population of n ANNs managed by an evolutionary system. In its simplest form, this algorithm reproduces several distribution models of the complete dataset D (one for every ANN of the population) into two subsets (d[tr], the training set, and d[ts], the testing set). During the learning process each ANN, according to its own data distribution model, is trained on the subsample d[tr] and blind-validated on the subsample d[ts]. The performance score reached by each ANN in the testing phase represents its "fitness" value (i.e., the individual probability of evolution). The genome of each "network individual" thus codifies a data distribution model with an associated validation strategy. The n data distribution models are combined according to their fitness criteria using an evolutionary algorithm. The selection of "network individuals" based on fitness determines the evolution of the population, that is, the progressive improvement of the performance of each network until the optimal performance is reached, which is equivalent to the best division of the global dataset into subsets. The evolutionary algorithm mastering this process, named the "genetic doping algorithm" (GenD for short) and created at the Semeion Research Centre, has characteristics similar to those of a genetic algorithm [33][34][35][36][37], but it is able to maintain an inner instability during the evolution, carrying out a natural increase of biodiversity and a continuous "evolution of the evolution" in the population.
The elaboration of T&T is articulated in two phases. In a preliminary phase, an evaluation of the parameters of the fitness function that will be used on the global dataset is performed. The configuration of a standard backpropagation network that most "suits" the available dataset is determined: the number of layers and hidden units, some possible generalizations of the standard learning law, the fitness values of the population's individuals during evolution. The parameters thus determined define the configuration and the initialization of all the individual networks of the population and will then stay fixed in the following computational phase. The accuracy of the ANN performance with the testing set will be the fitness of that individual (i.e., of that hypothesis of distribution into two halves of the whole dataset).
In the computational phase, the system extracts from the global dataset the best training and testing sets. During this phase, the individual network of the population is running, according to the established configuration and the initialization parameters.
Parallel to T&T runs "input selection" (IS), an adaptive system based on the same evolutionary algorithm GenD, consisting of a population of ANNs, each of which carries out a selection of the independent and relevant variables of the available database.
The elaboration of IS, as for T&T, is developed in two phases. In the preliminary phase, a standard backpropagation ANN is configured in order to avoid possible overfitting problems. In the computational phase, each individual network of the population, identified by the most relevant variables, is trained on the training set and tested on the testing set.
The evolution of the individual networks of the population is based on the GenD algorithm. In the IS approach, the GenD genome is built from n binary values, where n is the cardinality of the original input space. Every gene indicates whether an input variable is to be used or not during the evaluation of the population fitness. Through the evolutionary algorithm GenD, the different "hypotheses" of variable selection generated by each ANN of the population change over time, at each generation; this leads to the selection of the best combination of input variables. As in the T&T system, the genetic operators crossover and mutation are applied to the ANN population; the rates of occurrence for both operators are self-determined by the system in an adaptive way at each generation.
When the evolutionary algorithm no longer improves its performance, the process stops, and the best selection of the input variables is employed on the testing subset.
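The binary-genome wrapper scheme described above can be sketched with a simple genetic algorithm; note that this plain GA (elitist selection, one-point crossover, fixed mutation rate, nearest-centroid classifier as the wrapped model) is an illustrative stand-in, not the actual GenD algorithm:

```python
import numpy as np

def ga_feature_selection(Xtr, ytr, Xts, yts, pop=20, gens=30, seed=0):
    """Evolve binary genomes (1 = use feature) and score each genome
    by the blind testing accuracy of a simple wrapped classifier."""
    rng = np.random.default_rng(seed)
    n = Xtr.shape[1]
    genomes = rng.integers(0, 2, size=(pop, n))

    def fitness(g):
        if g.sum() == 0:
            return 0.0
        cols = g.astype(bool)
        # nearest-centroid classifier plays the role of the wrapped ANN
        mu0 = Xtr[ytr == 0][:, cols].mean(axis=0)
        mu1 = Xtr[ytr == 1][:, cols].mean(axis=0)
        d0 = ((Xts[:, cols] - mu0) ** 2).sum(axis=1)
        d1 = ((Xts[:, cols] - mu1) ** 2).sum(axis=1)
        return ((d1 < d0).astype(int) == yts).mean()

    for _ in range(gens):
        scores = np.array([fitness(g) for g in genomes])
        elite = genomes[np.argsort(scores)[::-1][: pop // 2]]
        children = []
        for _ in range(pop - len(elite)):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, n)           # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(n) < 0.05] ^= 1   # mutation
            children.append(child)
        genomes = np.vstack([elite] + children)
    scores = np.array([fitness(g) for g in genomes])
    best = genomes[scores.argmax()]
    return best.astype(bool), scores.max()
```

The returned mask plays the role of the "best selection of the input variables" employed on the testing subset.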
The software implementing the TWIST phase (developed in C-Builder [31]) allows the configuration of the genetic algorithm GenD:
• the population (the number of individual networks),
• the number of hidden nodes of the standard BP,
• the number of epochs,
• the output function (SoftMax),
• the cost function (classification rate in our case).
The generated outputs are the couple of files SetA and SetB (subsets of the initial db defined by the variables selected) that will be used in the validation protocol (see Section 2.3).

The validation protocol
The validation protocol is a fundamental procedure to verify the models' ability to generalize the results reached in the Testing phase of each model. The application of a fixed protocol measures the level of performance that a model can produce on data that are not present in the testing and/or training sample. We employed the so-called 5 × 2 cross-validation protocol (see Figure 6) [38]. This is a robust protocol that allows one to evaluate the allocation of classification errors. In this procedure, the study sample is randomly divided ten times into two subsamples, always different but containing a similar distribution of cases and controls.
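The 5 × 2 cross-validation procedure can be sketched as follows; note that this sketch uses plain random halvings, whereas the protocol in the study keeps a similar distribution of cases and controls in each half:

```python
import numpy as np

def five_by_two_cv(X, y, fit, predict, seed=0):
    """5x2 CV: five random halvings of the sample; each model is
    trained on one half and blind-tested on the other, in both
    directions, yielding 10 independent accuracy estimates."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(5):
        idx = rng.permutation(len(y))
        a, b = idx[: len(y) // 2], idx[len(y) // 2:]
        for tr, ts in ((a, b), (b, a)):
            model = fit(X[tr], y[tr])
            accs.append((predict(model, X[ts]) == y[ts]).mean())
    return np.array(accs)   # one accuracy per independent experiment
```

The ten resulting confusion matrices (here reduced to accuracies) are what Table 3 averages over.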
The ANNs' good or excellent ability to diagnostically classify all patients in the sample, judged from the confusion matrices of these 10 independent experiments, would indicate that the spatial invariants extracted and selected with our method truly relate to the functioning quality of the brains examined through their EEG. The samples were matched for age, gender, and years of education. Part of the individual datasets was used in previous EEG studies [2][3][4]; in none of those studies did we address the specific issue of the present study. Local institutional ethics committees approved the study. All experiments were performed with the informed and overt consent of each participant or caregiver.

Subjects and diagnostic criteria
The present inclusion and exclusion criteria for MCI were based on previous seminal studies [39][40][41][42][43][44][45][46] and designed to select elderly persons manifesting objective cognitive deficits, especially in the memory domain, who did not meet the criteria for a diagnosis of dementia or AD, namely, with (i) objective memory impairment on neuropsychological evaluation, as defined by performances ≥ 1.5 standard deviations below the mean value of age- and education-matched controls for a test battery including the Rey memory list (immediate and delayed recall) and the Digit forward and Corsi forward tests; (ii) normal activities of daily living as documented by the patient's history and evidence of independent living; (iii) a clinical dementia rating score of 0.5; (iv) geriatric depression scale (GDS) scores < 13.
Exclusion criteria for MCI were: (i) mild AD, as diagnosed by the procedures described above; (ii) evidence of concomitant dementia such as frontotemporal or vascular dementia, reversible dementias (including pseudodepressive dementia), fluctuations in cognitive performance, and/or features of mixed dementias; (iii) evidence of concomitant extrapyramidal symptoms; (iv) clinical or indirect evidence of depression, as revealed by GDS scores of 14 or higher; (v) other psychiatric diseases, epilepsy, drug addiction, alcohol dependence, and use of psychoactive drugs including acetylcholinesterase inhibitors or other drugs enhancing brain cognitive functions; (vi) current or previous systemic diseases (including diabetes mellitus) or traumatic brain injuries.
Probable AD was diagnosed according to NINCDS-ADRDA criteria [47]. Patients underwent general medical, neurological, and psychiatric assessments and were also rated with a number of standardized diagnostic and severity instruments, including the MMSE [48], the clinical dementia rating scale [49], the geriatric depression scale [50], the Hachinski ischemic scale [51], and the instrumental activities of daily living scale [52]. Neuroimaging diagnostic procedures (computed tomography or magnetic resonance imaging) and complete laboratory analyses were carried out to exclude other causes of progressive or reversible dementias, in order to have a homogeneous probable AD patient sample. The exclusion criteria included, in particular, any evidence of (i) frontotemporal dementia, diagnosed according to the criteria of the Lund and Manchester groups [53]; (ii) vascular dementia, diagnosed according to NINDS-AIREN criteria [54] and neuroimaging evaluation scores [55,56]; (iii) extrapyramidal syndromes; (iv) reversible dementias (including pseudodementia of depression); (v) Lewy body dementia, according to the criteria of McKeith et al. [57]. It is important to note that benzodiazepines, antidepressant, and/or antihypertensive drugs were withdrawn for about 24 hours before the EEG recordings.

EEG recordings
EEG data were recorded in the wake resting state (eyes closed), usually during late morning hours, from 19 electrodes positioned according to the international 10-20 system. The analysis was carried out after the EEG data were re-referenced to a common average reference. The horizontal and vertical electrooculogram (EOG) was simultaneously recorded to monitor eye movements. An operator controlled the subject and the EEG traces online, alerting the subject any time there were signs of behavioural and/or EEG drowsiness, in order to keep the level of vigilance constant. All data were digitized (5 minutes of EEG; 0.3-35 Hz band-pass; 128 Hz sampling rate).
The duration of the EEG recording (5 minutes) allowed the comparison of the present results with several previous AD studies using either EEG recording periods shorter than 5 minutes [58][59][60][61][62] or shorter than 1 minute [7,8]. Longer resting EEG recordings in AD patients would have reduced data variability, but they would have increased the possibility of EEG "slowing" because of reduced vigilance and arousal.
EEG epochs with ocular, muscular, and other types of artefact were preliminarily identified by a computerized automatic procedure. Those manifesting sporadic blinking artefacts (less than 15% of the total) were corrected by an autoregressive method [63].
The performances of the software package on EOG-EEG-EMG data related to cognitive-motor tasks were evaluated with respect to the preliminary data analysis performed by two expert electroencephalographists (gold standard). Due to its extreme importance for multicentric EEG studies, we compared the performances of two representative "regression" methods for the EOG correction in time and frequency domains. The aim was the selection of the most suitable method in the perspective of a multicentric EEG study. The results showed an acceptable agreement of approximately 95% between the human and software behaviors, for the detection of vertical and horizontal EOG artifacts, the measurement of hand EMG responses for a cognitive-motor paradigm, the detection of involuntary mirror movements, and the detection of EEG artifacts. Furthermore, our results indicated a particular reliability of a "regression" EOG correction method operating in time domain (i.e., ordinary least squares). These results suggested the use of the software package for multicentric EEG studies.
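The time-domain "regression" correction mentioned above can be sketched in a few lines; this is an illustration of the general class of OLS-based EOG correction methods, not the exact published procedure [63]:

```python
import numpy as np

def ols_eog_correction(eeg, eog):
    """Estimate, by ordinary least squares, how much of each EOG
    channel leaks into each EEG channel, then subtract that
    contribution from the EEG."""
    # eeg: (samples, n_eeg_channels), eog: (samples, n_eog_channels)
    beta, *_ = np.linalg.lstsq(eog, eeg, rcond=None)  # propagation factors
    return eeg - eog @ beta
```

The design choice of working in the time domain, rather than the frequency domain, is the one the comparison above found more reliable.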
Two independent experimenters-blind to the diagnosis-manually confirmed the EEG segments accepted for further analysis. A continuous segment of artefact-free EEG data lasting for 60 seconds was used for subsequent analyses for each subject.

Preprocessing protocol
The entire sample of 466 subjects was recorded at 128 Hz for 1 minute. The EEG track of each subject was represented by a matrix of 7680 sequential rows (time) and 19 columns (the 19 channels).
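The data shapes involved can be checked in a few lines; the weight-vector layout shown for the squashed record is a sketch of the AutoBP case (N^2 − N off-diagonal weights plus N biases), with placeholder arrays standing in for trained weights:

```python
import numpy as np

fs, seconds, n = 128, 60, 19          # sampling rate, duration, channels
eeg = np.zeros((fs * seconds, n))     # one subject's EEG track
assert eeg.shape == (7680, 19)        # rows = time, columns = channels

# After squashing, each subject becomes one record whose variables are
# the trained weights of its autoassociative ANN (placeholders here).
W = np.zeros((n, n))                  # off-diagonal connection weights
b = np.zeros(n)                       # one bias per output node
record = np.concatenate([W[~np.eye(n, dtype=bool)], b])
assert record.size == n ** 2 - n + n  # 361 variables per subject
```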
Every autoassociative ANN independently processed every EEG of the total sample in order to assess the different capabilities of each ANN to extract the key information from the EEG tracks.
After this processing, each EEG track is squashed into the weights of every ANN resulting in 4 different and independent datasets (one for each ANN), whose records are the squashing of the original EEG tracks and whose variables are the trained weights of every ANN.
After TWIST processing, the most significant features for the classification were selected and, at the same time, the training set and the testing set were defined, with similar probability distribution functions, so as to provide the best results in the classification.
The validation protocol 5x2CV was applied blindly to test the capability of a generic supervised ANN to correctly classify each record (the number of inputs depending on the number of variables selected by the input selection (IS) step).
A supervised MLP without hidden units was used for the classification task. In every experiment, in fact, we were able to train the ANN perfectly in no more than 100 epochs (root mean square error (RMSE) < 0.0001). This means that, in this last phase, we could also have used a linear classifier to reach the same results.
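The validation scheme can be sketched as follows. This is not the study's implementation: it assumes a logistic unit as the "MLP without hidden units" and applies the standard 5x2 cross-validation pattern, i.e., five random half splits, each used in both directions:

```python
import numpy as np

def train_linear(X, y, epochs=100, lr=0.5):
    """Logistic unit: a supervised 'MLP without hidden units'."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-Xb @ w))           # sigmoid output
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient step
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

def five_by_two_cv(X, y, seed=0):
    """5x2CV: five random half splits; train on A / blind test on B,
    then train on B / blind test on A; return the mean of 10 accuracies."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(5):
        idx = rng.permutation(len(y))
        a, b = idx[: len(y) // 2], idx[len(y) // 2 :]
        for tr, te in ((a, b), (b, a)):
            w = train_linear(X[tr], y[tr])
            accs.append((predict(w, X[te]) == y[te]).mean())
    return float(np.mean(accs))

# Synthetic two-class data with one informative variable.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.5).astype(int)
X[:, 0] += 3.0 * y
acc = five_by_two_cv(X, y)
```

The two-direction averaging mirrors the subsample A/subsample B protocol described in the Results below.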

RESULTS
The experimental design consisted of 10 different and independent processing runs for the AD versus MCI classification. Every experiment was conducted in a blind and independent manner in two directions: training with subsample A and blind testing with subsample B, versus training with subsample B and blind testing with subsample A. Table 3 shows the mean results summary for the classifications of AD versus MCI, compared to the results obtained in the experiments reported in a previous study [1], based on a different protocol (without the TWIST phase).
Regarding the IFAST-TWIST protocol, the ABP and AHR networks achieved the best results in distinguishing AD from MCI subjects (94.10% and 93.36%), but all the performances were considerably better than those obtained in the previous study.

DISCUSSION
Various types of nonreversible forms of dementia represent a major health problem in all countries where the average life span is progressively increasing. There is a growing amount of scientific and clinical evidence that brain neural networks rearrange their connections and synapses to compensate for the neural loss due to neurodegeneration [64]. This process of plasticity maintains brain functions at an acceptable level before clear symptoms of dementia appear. The length of this presymptomatic period is currently unknown but, in the case of AD, often preceded by MCI, it lasts several years. Despite the lack of an effective treatment able to block progression and/or reverse the cognitive decline, it is generally agreed that early initiation of the available treatment (i.e., cholinesterase inhibitors) provides the best results [65]. A significant advancement in the fight against dementias would be to have in our hands a noninvasive, easy-to-perform, and low-cost diagnostic tool capable of screening, with a high rate of positive prognostication, a large at-risk population sample (i.e., MCI subjects, subjects with genetic defects, and those with a family history of dementia or other risk factors). To test this issue, we performed automatic classification of MCI and AD subjects by extracting with ANNs the spatial content of the EEG voltage. The results showed that the correct automatic classification rate reached 94.10% for AD versus MCI, better than the classification rate obtained with the most advanced currently available nonlinear techniques. These results confirm the working hypothesis that this ANN-based EEG approach can contribute to improving the precision of the diagnostic phase in association with other clinical and instrumental procedures.
The present results suggest that the present variant of the IFAST procedure (TWIST) could be used for large-scale screening of MCI subjects under observation, to detect the first signs of conversion to AD and trigger further clinical and instrumental evaluations crucial for an early diagnosis of AD (this is invaluable for the early initiation of cholinergic therapies, which are generally administered only in overt AD patients due to gastrointestinal side effects). Indeed, the actual percentage of correct discrimination between MCI and probable AD is around 94%. This rate is clearly insufficient for the use of the IFAST procedure for diagnosis, due to the 6% of misclassifications. The present results prompt future studies on the predictive value of cortical EEG rhythms in the early discrimination of MCI subjects who will convert to AD. This interesting issue could be addressed by a proper longitudinal study. MCI subjects should be divided into "converted" and "stable" subgroups, according to the final outcome as revealed by follow-up after about 5 years (i.e., the period needed for the conversion of all MCI subjects fated to decline over time, based on the mentioned literature). That study should demonstrate that the spatial EEG features at the baseline measurement, as revealed by the IFAST procedure, can discriminate between converted and stable MCI subjects. Furthermore, baseline values of spatial EEG features in individual MCI subjects should be successfully used as input by the IFAST procedure to predict the conversion to dementia. These intriguing research perspectives are a sign of the heuristic value of the present findings. However, apart from the clinical perspectives, the present findings have an intrinsic value for clinical neurophysiology.
They provide further functional data from a large aged population to support the idea that the spatial features of the EEG, as a reflection of cortical neural synchronization, convey information able to discriminate a preclinical stage of dementia (MCI) from probable AD.
Furthermore, the evaluation of this diagnostic contribution may motivate future scientific studies probing its usefulness for the prognosis and monitoring of AD across the temporal domain.
Although EEG would fulfil all the previous requirements, the way in which it is currently utilized does not guarantee its ability in the differential diagnosis of MCI, early AD, and healthy nonimpaired aged brains. The neurophysiologic community has always had the perception that there is much more information about brain functioning embedded in the EEG signals than is actually extracted in a routine clinical context. The obvious consideration is that the generating sources of EEG signals (cortical postsynaptic currents at the dendritic tree level) are the same ones attacked by the factors producing the symptoms of dementia. The main problem is that, in terms of signal-to-noise ratio, the noise usually largely overcomes the signal. This paper suggests that the reasons why the clinical use of EEG has been somewhat limited and disappointing with respect to the early diagnosis of AD and the identification of MCI, despite the progress obtained in recent years, are due to the following erring general principles: (A) identifying and synthesizing the mathematical components of the signal coming from each individual recording site, considering each EEG channel as exploring only one discrete brain area under the exploring electrode, and summing all of them up in an attempt to reconstruct the general information; (B) focusing on the time variations of the signal coming from each individual recording site; (C) mainly employing linear analysis instruments.
The basic principle proposed in this work is very simple: all the signals from all the recording channels are analyzed together, not individually, in both time and space. The reason for such an approach is equally simple: the instant value of the EEG in any recording channel depends upon its previous and following values, and upon the previous and following values of all the other recording channels.
We believe that the EEG of each individual subject is defined by a specific background signal model, distributed in time and in the space of the recording channels (19 in our case). Such a model is a set of background invariant features able to specify the quality (i.e., cognitive level) of the brain activity, even in a so-called resting condition. We all know that the brain never rests, even with closed eyes and when the subject is asked to relax. The method that we applied in this research context completely ignores the subject's contingent characteristics (age, cognitive status, emotions, etc.). It uses a recurrent procedure that squeezes out the significant signal and progressively selects the features useful for the classification.

CONCLUSIONS
We have tested the hypothesis that a correct automatic classification of MCI and AD subjects can be obtained by extracting the spatial information content of the resting EEG voltage with ANNs. The spatial content of the EEG voltage was extracted by a novel stepwise procedure. The core of this procedure was that the ANNs did not classify individuals using EEG data as an input; rather, the data inputs for the classification were the weights of the connections within an ANN trained to generate the recorded EEG data. These connection weights represented a useful model of the peculiar spatial features of the EEG patterns at the scalp surface. Then the new system TWIST, based on a genetic algorithm, processed the weights to select the most relevant features and, at the same time, to create the best training and testing subsets for the classification. The results showed that the correct automatic classification rate reached 94.10% for AD versus MCI. The results obtained are superior to those obtained with the most advanced currently available nonlinear techniques. These results confirm the working hypothesis and represent the basis for research designed to integrate EEG-derived spatial and temporal information content using ANNs.
From a methodological point of view, this research shows the need to analyze the 19 EEG channels of each person as a whole complex system, whose decomposition and/or linearization can entail the loss of much key information.
The present approach extends that of previous EEG studies applying advanced techniques (wavelets, neural networks, etc.) to the data of single recording channels; it also complements that of previous EEG studies in aged people evaluating, instant by instant, the spatial distributions of the EEG data and the brain sources of these distributions [2][3][4][5][6][7][8][9][10].
With complex systems, it is not possible to establish a priori which information is relevant and which is not. Nonlinear autoassociative ANNs are a family of methods able to extract from these systems the maximum of linear and nonlinear associations (features) capable of explaining their "strange" dynamics.
This research also documents the need to use different architectures and topologies of ANNs and evolutionary systems within complex procedures in order to optimize a specific medical target. This study's EEG analysis used (1) different types of nonlinear autoassociative ANNs for squashing the data; (2) a new system, TWIST, based on a genetic algorithm, which manages supervised ANNs in order to select the most relevant features and to optimize the distribution of the data in the training and testing sets; (3) a set of supervised ANNs for the final pattern recognition task.
It is reasonable to conclude that ANNs and other adaptive systems should be used as cooperative adaptive agents within a structured project for complex, useful applications.

NOTE
IFAST is a European patent (application no. EP06115223.7, date of receipt 09.06.2006). The owner of the patent is the Semeion Research Center of Sciences of Communication, Via Sersale 117, Rome 00128, Italy. The inventor is Massimo Buscema. For software implementation, see [53]. Dr. C. D. Percio (Associazione Fatebenefratelli per la Ricerca) organized the EEG data cleaning.