Deep Learning Approach for Automatic Classification of Ocular and Cardiac Artifacts in MEG Data

,


Introduction
In magnetoencephalography (MEG) recordings, electrophysiological brain activity is measured noninvasively by means of detecting small magnetic field changes of the electrical activity within the human brain. In addition to environmental noise, in MEG recordings the neuromagnetic brain signals interfere with strong magnetic field components from ocular and cardiac activities. As the signal strength of these biological artifacts is relatively large compared to the brain signal, the artificial signals need to be separated and removed prior to analysis.
For successful isolation of ocular and cardiac artifacts from MEG and electroencephalography (EEG) recordings independent component analysis (ICA) has been extensively and successfully used over the last ten years [1][2][3][4][5][6][7][8]. Semiand fully automatic ICA-based approaches use different strategies to identify ocular and cardiac activities from a set of independent components. To tackle this problem, a variety of time-series-, frequency-, and topographical-based models have been proposed [1,3,7,9,10]. In time-series based models, a statistical evaluation on the decomposed signals [3,11,12] is typically performed, while spatial-based approaches often perform a linear transformation to the sensor space [9,13] to localize or estimate the origin of the source [14]. In many of these approaches a reference signal is needed, such as recordings from electrocardiography (ECG) and electrooculography (EOG), which is used to identify the features of interest. The signal quality of such recordings strongly influences the reliability of the artifact rejection method applied. In particular, poor electrode conductivities, positioning of the electrodes, and even slight arm or chest movements will have a large impact on the ECG and EOG signal quality. In only a few studies, the artifact rejection methods applied have not relied on additional reference signals [3,[15][16][17].

Journal of Engineering
In recent years, statistical machine learning methods have been introduced into the field of neuroimaging [18,19]. These methods have been increasingly used for feature extraction, classification, and decoding in MEG and EEG [20][21][22]. The different approaches can be organized into supervised, unsupervised, semisupervised, or reinforcement learning based on the desired outcome of the algorithm. As pointed out by the authors in [18,20] supervised machine learning methods can be used in decoding or encoding settings to relate brain images to behavioral or clinical observations [18] or identification of artifacts [20].
Another deep learning based multilayer technique is the so-called convolutional neural network (CNN) [23]. In machine learning, CNN refers to a type of feed-forward DNN and is comprised of one or more convolution-pooling layers. CNN-based approaches typically require some preprocessing steps where feature vectors are extracted, for example, using independent component analysis (ICA) on which a classifier is then applied [19,20,24]. Recently, CNN has been successfully applied to EEG recordings to classify motor imaginary signals [21], mental load [25] and artifact rejection [20]; CNNs, however, are often applied to classify image-types of input data, while different approaches have been suggested for the classification of time-series data [26].
In this paper we aim to introduce a combined deep learning approach for the classification of ocular and cardiac artifacts in MEG recordings without using reference signals. To achieve this, a deep convolutional neural network (DCNN) approach is applied to decomposed MEG signals, in which the spatial and temporal signatures are combined to construct an appropriate feature space that could increase the linear separability of the initial data. This way, the overall classification process is greatly simplified and allows for a fast and robust classification with high accuracies.
The main contributions of this study are twofold: (1) we will demonstrate that the DCNN model introduced here can successfully be used in the context of ocular and cardiac artifact rejection using MEG and (2) we describe a fully automatic artifact rejection workflow which can be set up for MEG/EEG without the need for EOG and ECG recordings.

MEG Dataset.
In order to obtain a dataset of sufficient size for the classification task we used a total of 132 MEG recordings from 48 subjects. The neuromagnetic data were recorded using a whole-head magnetometer system with 248 channels (MAGNES-3600WH MEG) from 4D Neuroimaging. To avoid possible bias due to different experimental paradigms the dataset includes task (auditory, visual, and motor tasks) and nontask (resting state) related experiments. The dataset consists of three blocks of experiments or data types, in which the heart rate and ocular activity are expected to vary and where the number of recordings used was almost equally balanced for each subject. The first block (40 experiments) includes data from task related experiments, while the second and third blocks consist of data recorded in rest, where the subject was asked to have their eyes open (43 experiments) or closed (49 experiments), respectively. Participation in all experiments was in accordance with the ethics committee of the Medical Faculty of the Rheinisch-Westfälische Technische Hochschule Aachen (RWTH Aachen University), Germany. All volunteers gave their informed written consent after explanation of the procedure and the purpose of the experiment. The acquisition was performed continuously with a duration varying across experiments from 3 minutes (resting state) to 10 minutes (task related experiment) and a sampling rate of either 1017.25 or 678.17 Hz, depending on the experimental paradigm. For the task related experiments, the central 3 minutes of recordings were extracted for further analysis. ECG and EOG signals were recorded simultaneously with the MEG signals. Both types of ocular activities were recorded measuring horizontal and vertical eye movements (eye blinks).

Data Preprocessing.
After acquisition, the MEG raw signals were visually inspected for unusual large noise or artifacts. Upon occurrence, these channels were removed and replaced by an interpolated signal using information from the surrounding channels and employing routines provided by MNE-Python [27]. Environmental noise was removed by the subtraction of appropriate weighted reference signals from the MEG signals [28]. The method is capable of removing environmental noise as well as interference from the power line by taking the 11 reference channels from the MEG system into account. In addition, the signal quality of the auxiliary channels (ECG and EOG) was checked using visual inspection to ensure sufficient data quality of the ECG and EOG signals in the dataset used.
ICA has been applied to separate brain signals from ocular and cardiac activities. The method is well known to provide good results on both MEG and EEG recordings [3,8,11,14,[29][30][31][32].
Since multiple recordings from different experimental paradigms are used, they consist of different sampling rates and durations. In order to prepare a uniform dataset, only three minutes of data from each measurement were considered, where all data were sampled down to 250 Hz and band pass filtered from 1 to 45 Hz.
To further increase the sample size data augmentation has been applied to each measurement of around 3 minutes duration by means of a sliding window of 60 s with an offset of 10 s. In this way, we were able to obtain a set of additional data segments of one minute length to serve as new input data for ICA decomposition. The total number of all data segments used for analysis is 1112.
Prior to the application of ICA, the continuous data were chopped into segments of one-minute durations. The FastICA algorithm, as implemented in MNE-Python [27], was applied to each of the data segments. To estimate the number of components used in ICA, we used the minimum description length (MDL) method to compute the model order to be used for decomposition [33]. In order not to include any bias from different dimension reduction results in the DCNN model, the median model order of 34 was chosen and fixed for all ICA decomposition.

Labeling Features.
A prerequisite for supervised learning strategies is to label the features adequately. For building a labeled dataset, the components were initially classified using some reference methods. Components reflecting cardiac activity were identified using cross-trials phase statistics (CTPS), a method which is implemented in MNE-Python [27] and is known to reliably identify weak and strong cardiac activity from independent components [11,34]. Ocular activity was identified by correlating the decomposed signals with both EOG channels as implemented in MNE-Python [27]. The performance of artifact removal was visually inspected by plotting the rejection performance as described in [11]. The results of this artifact identification scheme served as a reference and thus provides the labels for the DCNN model.
To prepare the dataset for training, all ocular and cardiac components were labeled and grouped as artifact components, while the same amounts of nonartefactual components were randomly extracted from the remaining set of independent components and labeled as nonartifact components. The total number of samples , which were fed into the model, is = ∑ =1 ( + ), with and being the number of artifacts and nonartifacts (i.e., other components), respectively, and = and being the number of data segments used. For example, if two components are labeled as cardiac activity and two components were found to be highly correlated with ocular activity, we will achieve a set of 8 components from which 4 nonartifact components have been randomly selected from the pool of 30 nonartifact components. Given a dataset of 1112 one minute ICA chops, we end up with a training sample size of 7122 components, of these approximately artifact and nonartifact components are equally balanced.
To further increase the data variability during training, all samples were flipped (i.e., multiplied by −1). This effectively doubles the number of data used for training. The dataset was then split into a training and validation dataset.

A Combined Deep Convolution Neural Network (DCNN)
Model. We developed a combined neural network architecture specifically for the classification task at hand. The workflow of the proposed combined DCNN model includes four stages which can be summarized as follows: (1) data preprocessing, (2) ICA decomposition, (3) DCNN-based classification of artifact components, and finally (4) artifact removal and back transformation. An overview of the workflow is illustrated in Figure 1.
Utilizing a supervised learning strategy, the model parameters are learned by training the deep network using a backpropagation learning technique. Training in which backpropagation is used mainly involves two steps, the feedforward and the backpropagation steps. In the first step, the labeled training set is fed to the network from the input layers, through the hidden layers, to the output layer. During backpropagation, a fitting error (i.e., the error of the output layer) is computed by finding the difference between the network output and the desired output, before it is propagated back through the network. The training dataset is fed through the model over multiple iterations, known as epochs. The number of epochs is decided based on an early stopping method, where the model converges before it suffers from overfitting. Finally, the trained network is used to perform the classification process using new samples from the validation set.
In order to build a fully fledged classifier that is able to identify cardiac and ocular activity from independent components without using additional information from EOG and ECG, we have developed a combined neural network model, in which temporal and spatial information is incorporated. To accomplish a spatiotemporal classification task, we are taking advantage of two different deep learning strategies. The neural network model consists of two input branches or pathways, through which each of the independent components passes in order to extract the temporal (time course) and spatial (topography projected on the sensor array) information (cf. Figure 2) as extracted by ICA. In the temporal branch, time-series are passed through a pair of convolution-pooling layers, while in the spatial branch the information is passed through a fully connected layer ( Figure 2). Finally, the two branches are then concatenated and fed into a two neuron, classification layer.
In the temporal branch, we use two 1D-convolutional layers, each followed by a max pooling operation. In this branch, we use 15 and 10 filter kernels of sizes 24 and 28 and a pool size of 24 and 14 for the first and second layer, respectively ( Figure 2). The temporal signal is thus convolved using multiple kernels using shared weights, while the pooling layers are used to reduce the size of the input signal and at the same time preserve the contained information. In this step, the initial input signal with sample size of 15000 is reduced to a 156 × 10 feature matrix, which is fully connected to 12 neurons in the concatenation layer. The spatial branch consists of a fully connected layer with 150 neurons, where its output is fed into the concatenation layer using 12 fully connected neurons to combine the temporal and spatial information.
Hyperparameters in the proposed DCNN model have been optimized for both branches separately. In particular, the structure of the network, the number of neurons, activation and loss functions, and parameters used for convolution, such as the dropout rate to reduce early overfitting [35], have been varied through testing. The final network architecture, as depicted in Figure 2, uses the rectified linear unit (ReLU) ( ) = max(0, ), as an activation function in all layers except for the output layer, where the softmax function is used [36]. The loss function is defined as the categorical cross entropy between the output vector and the target label vector (often called one-hot encoded vector) [37]. Backpropagation with a minibatch size of 32 is performed utilizing the ADAM optimizer [38] with the following parameter: the learning rate (alpha) was set to 0.001, the two exponential decays for the first (beta1) and second-order moment estimates (beta2) were set to 0.9 and 0.999, respectively, and epsilon was set to 10 −8 to prevent any division by zero. A 50-fold cross-validation was performed using the shuffle-split technique described in [39] with a ratio of 80% and 20% of training and validation data, respectively. The

Results
Neuromagnetic recordings from 48 subjects were divided into 30 and 18 datasets for training and validation, respectively. The data selection for the training and validation set was performed randomly. The training dataset consists of 26 task related and 58 resting state data, respectively. For validation, 14 task related and 34 resting state experiments were included in the dataset. In both types of datasets, for half of the resting state experiments the subjects were asked to have their eyes open.
The average accuracy and loss function were computed as a function of epochs during the training and validation in order to monitor the training progress and convergence of the model (Figure 3). Figure 3 illustrates the effect of the significant amount of dropout applied during training in comparison to the validation, which is performed without dropout. The final model was selected utilizing the early stopping technique, by selecting the model at epoch 32, yielding a peak validation accuracy of 94.6%.
The training accuracy is typically lower than the validation accuracy due to the high dropout rate used when computing the training accuracy. The model is exposed to the training dataset with 50% dropout when computing either accuracy or loss while the validation is performed without any dropout of any form.
The final result is illustrated in the confusion matrix, where the accuracy of the classification for the two classes, "artifact" and "other" component, is shown (Figure 4). The result is illustrated using the confusion matrix shown in Figure 4, which is based on a total of 2494 samples originating from the validation set, with about half of the samples labeled as "artifact" and the other half labeled as "other." The model achieved a sensitivity of 91.8% and a specificity of 97.4%. The area under the curve (AUC) of the receiver operating characteristic (ROC) is 98.2%. To avoid possible bias in terms of the small validation set, we further confirmed the overall accuracy of the model by performing a 50-fold shuffle-split cross-validation. The best median accuracy was encountered at epoch 63 yielding 94.2%, with the lower and upper quartiles at 93.4%, and 94.5%, respectively. Figure 5 illustrates one example of classification results for an ICA decomposition in comparison to the reference (i.e., CTPS and Pearson's correlation) scores. In Figure 5, four components are labeled as artifact and are highlighted in red, while nonartifact components are highlighted in black. This is confirmed by the DCNN classification with a very high and low probability for the artifact and nonartifact components, respectively. The values from the reference method in Figure 5 cannot directly be compared to the model scores, as they refer to the actual correlation and CTPS values, defining ocular and cardiac activities, respectively. In this example, all ICs with correlation and CTPS values above the thresholds (dashed vertical lines) were correctly identified by the DCNN model, which is shown by the high model probability values (orange bars) shown in Figure 5. For comparison, the corresponding EOG and ECG signals are plotted in green and blue, respectively.
In order to evaluate the performance of the classification process within the artifact rejection scheme, a so-called performance plot is shown in Figure 6 [1,11,12,42,43]. In this figure, results from artifact removal using data from a representative subject are shown by averages aligned to peak onsets from the ECG and EOG signal, before and after artifact removal. Figure 6 nicely illustrates that the applied the classification scheme works. A summary of the most important parameters used for training and the results obtained by the DCNN model are given in Table 1.
The DCNN model introduced here consists of two branches, the temporal and spatial branch (cf. Figure 2). To test and show that the model profits from both branches the spatial and temporal branch were trained as individual models to investigate their contribution to the combined model (i.e., the DCNN model). Receiver operating characteristic (ROC) analysis have then been applied to all models in order to compare the performance with the DCNN model. The validation accuracy of the spatial and temporal branch peaked at 93.7% and 93.5% at epoch 31 and 42, respectively. Figure 7 illustrates that the model improves when both, the temporal and spatial branches, are combined. This is indicated by a larger area under the curve (AUC) in Figure 7.

Discussion
In this study, we introduce a fully automated deep learning approach for the classification and removal of ocular and cardiac artifact components for neuromagnetic recordings. This is done without both the need for human intervention and the need for additional EOG and ECG recordings. The method presented here uses both spatial and temporal information to classify and separate artifacts from other source contributions. As input data, the model uses independent components, where ICA separates the multivariate signal into its additive components. ICA is well known and widely applied for artifact rejection in the field of MEG and EEG [1,11,12]. However, after ICA decomposition, the challenge is to correctly identify the artifact components from a set of decomposed signals. Most of the existing methods rely on auxiliary recordings, such as EOG and ECG. These are typically recorded in synchrony with the MEG data and serve as a reference signal and thus may introduce an additional source of error depending on the data quality of the auxiliary signals. Moreover, the use of auxiliary recordings increases the preparation time necessary for measurements. Currently there are very few approaches in literature that address a fully automated artifact rejection workflow using neural networks on MEG/EEG data. Winkler and colleagues presented an artifact rejection method using EEG [32]. The authors used a manually predetermined set of parameters to classify components. The method has been shown to be quite robust when dealing with EEG data, but translation to MEG maybe more difficult and time consuming. More recently, Garg et al. presented a method using multilayered CNNs to classify artifact components from MEG data [44,45]. In their work, the neural networks are trained to classify ECG and EOG related artifact components from the timeseries of the ICs. In a related study, the same authors showed automatic detection of EOG related components using a spatial representation of the independent components [44]. As a part of their conclusions they suggest a neural network which combines both temporal and spatial information, which is ideal. In this paper, we show how the spatiotemporal information of the two different types of artifacts can be easily combined to construct a DCNN model that achieves a very high accuracy rate of 94.4%.
In comparison to [45], our training dataset also consists of data from task and nontask related experiments, which allows the model to be applied to both types of experiments.
To introduce more variability to the model, we used short data segments of 1-minute length and data augmentation for which we (i) randomly extracted different time windows for ICA decomposition and (ii) we flipped the time courses of the decomposed signals (i.e., components are multiplied by −1). It is possible to use longer time windows for ICA analysis as long as one can assume stationarity within the time segment to be analyzed.
The results obtained from our model are comparable to recently published work of [44][45][46]. However, we have demonstrated that when temporal and spatial information are combined into one model the artifact rejection performance is increased (Figure 7). Further, a source of variability in results of machine learning based artifact rejection utilizing ICA is not only derived from the different types of networks and parameters used, but also from the number of components used for ICA decomposition. The number of components used, however, is often fixed to either the full number of sensors or an arbitrarily fixed number [47]. In this approach, we also used a fixed number of components for ICA decomposition, but for this, the minimum description length (MDL) method was applied to estimate the optimal number of components [33]. From a total of 248 components, the median number of 34 components, as  Table 1. We noticed that using larger or smaller learning rates did not change or affect the final classification rates too much. However, we did find that it speeds up or slows down the convergence of the network. Overfitting was further reduced by including dropout layers, as shown in Figure 2. Moreover, the use of the ReLU activation function largely speeds up the training process as suggested in [48].
Another critical issue in a supervised machine learning approach is that the model requires exact labeling of the features to be learned. In [48] labeling was performed by experts making decisions from visual inspection of the ICA decomposed signals. In contrast, we used statistical measures to identify the set of reference components [20,32,44,45]. This choice of labeling cannot be considered as the ground truth either. However, it is well reproducible and more practical and does not rely on the results of experts.
The DCNN model introduced here is capable of learning ocular and cardiac activities from independent components and can effectively work with data from different MEG systems, as long as the model is trained (once) on the decomposed MEG data. For MEG systems using either the magnetometer or the axial gradiometer as pickup coils, retraining the model may not be necessary but is recommended. Within the model the data is normalized; thus the temporal dynamics and the topographies will not differ qualitatively. The only requirement here is that, for the spatial branch of the DCNN model, the topographies have to be interpolated to a fixed number of sensors, in order to cope with different numbers of MEG channels used by different vendors. If MEG data are recorded using planar gradiometer and magnetometer (e.g., using the Vectorview or Triux system from Elekta Neuromag) the topographies for the spatial branch will differ, but the topographies from the magnetometer could be used instead, but retraining the model would be the best method of choice.
In summary, key novel aspects of this work are as follows. (1) Combined DCNN model is applied using time-series from ECG related components and flattened spatial information from EOG related components to classify both artifacts simultaneously. (2) To train the model, the data are not manually labeled with the help of experts; instead statistical measures are used which will ensure more reproducible results. (3) Compared to previous work, in our dataset, a larger variability is ensured by using a large variety of different paradigms in the task-based experiments (auditory and visual stimuli, motor response) and data from resting state with and without eyes open.

Conclusions
A fully automated machine learning-based artifact classification scheme is introduced to classify cardiac and ocular artifacts from an MEG dataset. This method is simple and accurate and does not require user intervention. Once trained, no additional ECG and EOG recordings are needed within the workflow of artifact rejection. The model was validated using a leave-one-out cross-validation test and Journal of Engineering 9 revealed a median accuracy of about 94.4% indicating high reliability of artifact removal in all of our task and nontask related MEG experiments. While we have shown the results from a magnetometer sensor-based MEG system, the approach should translate seamlessly to other types of sensor configurations including EEG.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors' Contributions
Ahmad Hasasneh and Nikolas Kampel contributed equally to this work.