Autism Spectrum Disorder Detection by Hybrid Convolutional Recurrent Neural Networks from Structural and Resting State Functional MRI Images

This study aims to increase the accuracy of autism spectrum disorder (ASD) diagnosis based on cognitive and behavioral phenotypes through multiple neuroimaging modalities. We apply machine learning (ML) algorithms to classify ASD patients and healthy control (HC) participants using structural magnetic resonance imaging (s-MRI) together with resting state functional MRI (rs-f-MRI and f-MRI) data from the large multisite data repository ABIDE (autism brain imaging data exchange) and identify important brain connectivity features. The 2D f-MRI images were converted into 3D s-MRI images, and datasets were preprocessed using the Montreal Neurological Institute (MNI) atlas. The data were then denoised to remove confounding factors. Using three fusion strategies (early fusion, late fusion, and cross fusion), we show that, in this implementation, hybrid convolutional recurrent neural networks perform better than either convolutional neural networks (CNNs) or recurrent neural networks (RNNs) alone. The proposed model classifies subjects as autistic or not according to how functional and anatomical connectivity metrics provide an overall diagnosis based on the autism diagnostic observation schedule (ADOS) standard. Our hybrid network achieved an accuracy of 96% by fusing s-MRI and f-MRI, which outperforms the methods used in previous studies.

ASD refers to a range of neurodevelopmental disorders with behavioral and cognitive impairments that place a huge burden on patients, families, and society. Identifying ASD patients directly in comparison to healthy controls is important for early detection and intervention. ASD's exact cause is still unknown [10]. Due to lack of knowledge of neuropathology, symptom-based diagnosis often results in poor treatment.
Early accurate diagnosis of ASD is pivotal to develop specialized interventions [11]. Due to its complex nature and highly heterogeneous symptoms, the diagnosis of ASD is very challenging [12].
Neuroimaging is an attractive noninvasive modality to bridge the gap between environment, genes, and cognitive and behavioral phenotypes in ASD. Several studies in neuroimaging have used different techniques such as structural and functional magnetic resonance imaging (MRI) [12][13][14][15][16][17]. Similar studies have contributed to our understanding of brain changes in ASD subjects on structural and functional connectivity levels. Functional connectivity has been used to presage early autism diagnosis and restrict correlations within specific neural circuits across blood oxygen level-dependent (BOLD) signals at different brain regions [18].
A number of studies have aimed to diagnose ASD based on structural magnetic resonance imaging (s-MRI) and functional magnetic resonance imaging (f-MRI) data [1]. In an earlier study, McKeown et al. decomposed f-MRI data into spatial components by blind separation [19]. Later, Uddin et al. presented a model using a logistic regression classifier and independent component analysis in order to differentiate between diseased and healthy groups [20]. S-MRI data delineate the structural properties of the brain and have received attention from researchers [21][22][23][24][25][26].
Another study proposed a new model for distinguishing between ASD positive and negative individuals grounded on the features of s-MRI and f-MRI data using histogram of oriented gradients [27].
The goal of the present study is to formulate an effective machine learning (ML) architecture to enhance the effectiveness of ASD diagnosis. We aim to classify ASD patients and HC participants using s-MRI in conjunction with rs-f-MRI data from a large multisite data repository, namely, ABIDE (autism brain imaging data exchange). The dataset is phenotypically rich and consists of different modalities from an important clinical population. We also aim to identify significant brain connectivity features via functional connectivity classification of ASD patients and HC participants. We apply deep learning to identify ASD patients, grounded on the patient's brain blood oxygen level-dependent (BOLD) activation patterns. Multimodality fusion of s-MRI and f-MRI improves classification performance over the existing methods in our implementation. The proposed multimodality hybrid method achieves a state-of-the-art accuracy of 96% in distinguishing ASD from HC individuals. We benefited from the combination of convolutional neural networks (CNNs), which have strong modeling and feature extraction power, and recurrent neural networks (RNNs), which fuse and order time series data. Furthermore, the dataset and the atlas used for preprocessing also offer advantages.

Data Description.
In the present study, both T1 weighted structural MRI and T2 weighted functional MRI data are obtained from the image and data archive powered by the laboratory of neuro imaging (LONI) [28] from ABIDE [29]. All data were used under the direction and approval of the respective institutions' ethics boards. ABIDE is based on a collaboration of 17 international imaging sites that have aggregated and are openly sharing neuroimaging data from 539 individuals suffering from ASD and 573 typical HC [30] in the neuroimaging informatics technology initiative (NIfTI) format. The data collected from these 1112 subjects consist of structural and resting state functional MRI data along with an extensive array of phenotypic information. All subjects have been selected by evaluating phenotypic information such as age, gender, and intelligence. The scanning infrastructure at each imaging site used different parameters, such as repetition time (TR), echo time (TE), number of voxels, number of volumes, whether the eyes were open or closed, and acquisition protocols.
A fivefold cross validation strategy was used to evaluate the performance. In detail, each source was split into five subsets with an approximately equal number of subjects. Each time, we used four subsets of the data for training and the remaining one for validation to select the model. Then, we conducted the adaptation process on time series cross validation. The augmented validation data were used during the adaptation process.
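The fivefold split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `kfold_indices`, the fixed seed, and the use of a single pool of 1112 subject indices (rather than a separate split per acquisition site) are all hypothetical choices made for the example.

```python
import random

def kfold_indices(n_subjects, k=5, seed=0):
    """Yield (train, validation) index lists for k roughly equal folds."""
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        # The other k - 1 folds together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# Hypothetical usage with the ABIDE subject count quoted in the text.
splits = list(kfold_indices(1112, k=5))
for train, validation in splits:
    print(len(train), len(validation))
```

Every subject appears in exactly one validation fold, so each model is selected on data it was not trained on.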
In this study, we used the statistical parametric mapping (SPM) software version 12 (SPM12) built in MATLAB and the computation, display, and analysis of connectivity (CONN) toolbox. The SPM integrated toolbox was developed [31] as an extension to SPM for incorporating morphometric voxel-based (VBM), seed-based (SBM), or region of interest (ROI)-based neuroimaging methods.
F-MRI is a noninvasive technique to assess brain functions by using signal changes [14]. A group of small cubic elements referred to as voxels represents the brain volume of f-MRI data. F-MRI consists of time series data extracted from each voxel by keeping track of its activity over time. The time series represent the signal measured at each voxel. Rs-f-MRI is used for analyzing brain disorders by applying f-MRI techniques while the subject is in a resting state. The major approach explored for discriminating between typically developed and autistic brains was shape and volumetric based analysis of s-MRI. S-MRI is generally classified as an anatomical study consisting of two categories of features, namely, shape features and volumetric features.
The heterogeneity of disorders of autistic individuals has increased the need for personalized approaches to analyze and prognosticate both functionally and anatomically for each autistic subject. Hence, in the present study, we combined s-MRI and f-MRI data with the aim of achieving better diagnostic accuracy and suggesting an optimum treatment plan for every autistic subject. We analyze our results to ascertain that they fit better with the autism diagnostic observation schedule (ADOS). Correlation is analyzed among all subjects for trait score differences and ADOS total scores to extract features of autism severity.

Data Preprocessing.
Neuroimages display thousands of cortical and subcortical areas, providing information on structures and functions. Brain atlases are used to divide brain images into a limited number of regions of interest (ROI) in order to overcome complexity [32]. Figure 1 depicts the overall pipeline of the approach we propose. For each modality, data preprocessing is necessary in order to avoid the risk of scanner bias and the effect of heterogeneity of protocols. In addition, the steps of denoising, fusion, and analysis to evaluate hybrid deep learning methods and correlation with ADOS total score are explained in the following sections.
First, in order to convert 2D f-MRI to 3D s-MRI, we used ROI parcellation with the Harvard-Oxford atlas. Then, our preprocessing pipeline consisted of functional realignment and [1]. For the context of the present study, we downloaded the time series for the brain areas specified in the MNI standard brain atlas [33]. In our literature review, we found that the MNI atlas has rarely been used with the large volume and different modalities of the ABIDE dataset. It is included in different neuroimaging analysis packages, including the statistical parametric mapping package (SPM). We selected the MNI atlas in order to perform comparisons across subjects and studies, particularly of subcortical data, which is accurately aligned by nonlinear volume registration in comparison to cortical data. In addition, the MNI atlas overcomes neuroimage differences in shape, size, and relative orientation. The advantage of the MNI atlas is that it focuses on disorders and artifacts in neuroimaging data used to analyze functional and structural connectivity from the top portion of the brain to the bottom portion of the cerebellum [34].
Preprocessing is a significant step to remove the effects of different scanners, artifacts, or partial volume effects and the variability between subjects that may stem from data acquisition. In order to reduce execution time and achieve better accuracy, preprocessing of neuroimages generally consists of performing a fixed set of operations on the data. We used the CONN [35] functional connectivity toolbox that works with MATLAB/SPM. CONN provides component based noise correction to reduce physiological and other noise sources, together with additional removal of movement and temporal covariates, temporal filtering and windowing of the residual BOLD contrast signal, first-level estimation of multiple standard f-MRI and s-MRI measures, and second-level random-effect analysis. Although global signal regression could also have been considered, the component based noise reduction method allows for interpretation of inverse correlations because no global signal regression is applied in our implementation. The toolbox implements f-MRI and s-MRI measures, such as estimation of seed-to-voxel and ROI-to-ROI functional correlations, as well as semipartial correlation and bivariate/multivariate regression analysis for multiple ROI sources, graph theoretical analysis, and novel voxel-to-voxel analysis of functional connectivity.
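The ROI-to-ROI functional correlations mentioned above can be illustrated with a small sketch that builds a functional connectivity matrix from ROI time series using Pearson's correlation. It is an illustration of the measure only, not of CONN's internals; the helper names `pearson` and `connectivity_matrix` and the toy sinusoidal series are assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def connectivity_matrix(roi_series):
    """Symmetric matrix of pairwise correlations between ROI time series."""
    n = len(roi_series)
    fc = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            fc[i][j] = fc[j][i] = pearson(roi_series[i], roi_series[j])
    return fc

# Toy ROI signals (hypothetical): B follows A exactly, C is its mirror image.
roi_a = [math.sin(0.1 * t) for t in range(100)]
roi_b = [2.0 * v + 1.0 for v in roi_a]
roi_c = [-v for v in roi_a]
fc = connectivity_matrix([roi_a, roi_b, roi_c])
```

In this toy case the A-B entry is close to +1 and the A-C entry close to -1, which is the kind of inverse correlation the text says remains interpretable when no global signal regression is applied.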
In the course of functional realignment and unwarp, all neuroimages that belong to a subject are oriented in reference to the first image of the time series of that subject. The purpose of slice-timing correction is to align the time series of the voxels so that all the voxels in each image have a common reference time. In outlier identification, outlier scans are identified based on the observed global BOLD signal and the amount of subject motion. The change in the global BOLD signal at any time is calculated as the change in the average BOLD signal within SPM's global mean mask scaled to standard deviation units. In addition, we employ the relative probability densities of gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) in MNI space as inputs to the hybrid method. Therefore, direct segmentation provides segmentation into GM, WM, and CSF tissue classes. Also, direct normalization iteratively performs tissue classification from intensity values of the functional and structural reference images and estimates nonlinear spatial transformations that approximate posterior and prior tissue probabilities until convergence. Finally, data are smoothed in order to clean images of nonbrain artifacts from the series of voxels. This consists of averaging the neighboring voxel signals, as blood supply and its functions are usually similar among neighboring brain voxels. Without disturbing the BOLD signal, temporal filtering eliminates redundant components from the time series of voxels [36,37].

Autism Research and Treatment

Data Denoising.
Using neuroimages in order to diagnose ASD is challenging due to the noise arising from the image recording process. Consequently, there are many filtering approaches, such as NLM filters, wavelet based filters, and band-pass filters, to remove the noise [38]. In this study, we prefer band-pass filtering in the denoising pipeline to reduce unwanted phase shifts. The MATLAB signal processing toolbox is particularly useful to filter signals with filter design parameters such as filter type, filter order, and attenuation. Denoising combines two steps: linear regression of potential artifacts in the BOLD signal and temporal band-pass filtering. Factors identified as potential confounding effects on the BOLD signal are estimated and removed separately for each voxel and for each subject. Working with this filtering, we resample all data to ensure equally spaced points for comparison across subjects.
To that end, we use the MATLAB function resample, which applies an antialiasing lowpass filter to the time series and compensates for the delay introduced by the filter. This function resamples the input sequence, the raw head motion in our case [39].
Inhomogeneity correction is applied to improve accuracy by reducing artifacts in images created by nonhomogeneous brain tissues. Various techniques such as histogram matching are available for normalizing the volume of images [38].
To minimize the effects of noise sources such as head movement and physiological variations, temporal frequencies below 0.008 Hz or above 0.09 Hz are removed from the BOLD signal using a band-pass filter [40].
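The effect of the 0.008 to 0.09 Hz pass band can be shown with a small frequency-domain sketch. This is one simple way to realize a band-pass filter and is not claimed to be the filter used by CONN or MATLAB; the function name `bandpass`, the repetition time of 2 s, and the synthetic signal are assumptions chosen so that the test frequencies fall exactly on DFT bins.

```python
import cmath
import math

def bandpass(signal, tr, low=0.008, high=0.09):
    """Zero every DFT component whose frequency lies outside [low, high] Hz."""
    n = len(signal)
    # Forward DFT (O(n^2); adequate for a short illustrative series).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 correspond to negative frequencies.
        freq = (k if k <= n // 2 else k - n) / (n * tr)
        if not (low <= abs(freq) <= high):
            spectrum[k] = 0.0
    # Inverse DFT; the imaginary part is numerical noise for a real input.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

TR = 2.0   # hypothetical repetition time in seconds
N = 200    # hypothetical number of volumes
slow = [math.sin(2 * math.pi * 0.05 * TR * t) for t in range(N)]        # inside the band
fast = [0.5 * math.sin(2 * math.pi * 0.2 * TR * t) for t in range(N)]   # above the band
raw = [3.0 + s + f for s, f in zip(slow, fast)]                         # plus a constant offset
filtered = bandpass(raw, TR)
```

The constant offset (0 Hz) and the 0.2 Hz component are removed, while the 0.05 Hz component passes through unchanged.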
Figure 2 shows a sample of denoising output obtained from our dataset. Functional connectivity (FC) measures can be best assessed by estimating the distribution of FC values between randomly selected pairs of points within the brain before and after denoising in order to minimize the effect of artifactual factors. After the preprocessing pipeline but before denoising of the BOLD signal, FC distributions show large intersession and intersubject variability with varying degrees of positive bias, including large scale physiological and subject motion effects. After denoising, FC distributions are approximately centered, with a slight positive shift, and show considerably reduced intersession and intersubject variability.

Classification Methods.
In another line of research [41], the newly proposed cross fusion fully convolutional neural network (FCN) performed best among the multimodality and fusion networks. Based on that finding, three alternative fusion strategies were considered in the present work: early, late, and cross fusion, as shown in Figure 3.
For early fusion (Figure 3(a)), the preprocessed f-MRI and s-MRI neuroimages are combined for each subject, thus producing a tensor. This input tensor is processed using the model network. For late fusion (Figure 3(b)), parallel streams process the f-MRI and s-MRI images independently before being fed into the model network. The output is fed through the neural network that carries out information fusion. For cross fusion (Figure 3(c)), which we propose, there are two processing branches connected by trainable scalar cross connections. The purpose of the process is to provide the functional connectivity matrix (FCM) information with cross-trainable fusion parameters rather than limiting the features to a single plane. The difference between cross fusion and studies in the related literature is the usage of hyperparameters. To overcome dimensional differences between feature matrices that belong to different neuroimages during the pairwise comparison, training is carried out with a selected value of the parameter α (Figure 4). It was observed through trial runs that higher α values required almost prohibitive processing times and lower values resulted in unacceptably blurred images. Thus, α = 0.05 was selected to provide acceptable image quality with the available processing power. During training, the parameter is automatically adjusted to integrate the two different information modalities, f-MRI and s-MRI.
With the scalar crosslinks formed with A1 (α) and B1 (α) in layer 1, N ∈ {0, 0.01, 0.02, ..., 0.09, 1} probabilities of each layer are calculated within the cross fusion. α controls the gradient range. To further demonstrate the effects of α on fusion results, we have selected a threshold of α = 0.05. The FCM image (Figure 4) shows areas where gray matter, white matter, and CSF features are clustered.
The left side of Figure 4 shows a sample of preprocessed cross-sectional volumes, and the right side shows their corresponding feature maps. In addition, each subimage corresponds to a single filter. The convolutional filters are sensitive to features of the preprocessed cross-sectional volumes of the patients with a diagnosis of ASD.
To tackle the high dimensionality of the acquired features, we selected tissue kind as a feature. In the literature, several novel CNN or RNN models were constructed to create different features with different configuration parameters. Taking inspiration from them, we selected only features related to different tissue areas. The maps in Figure 4 are shown with the descriptive information of the clusters obtained at the selected significance level.
After data preprocessing and denoising, the first stage of our framework consists of a CNN and an RNN in a hybrid form. The main idea of these networks is to use a convolutional layer. Both networks are used to detect spatial dependencies in data with the help of the convolution layer [42]. CNN and RNN are useful for analyzing multidimensional time series [43]. The advantage of this model lies in the possibility of using a pretrained model.
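The three fusion strategies can be contrasted in a toy sketch. This is only one plausible reading of "scalar cross connections", written under explicit assumptions: `layer` is a hypothetical stand-in for a real network layer, the feature vectors of the two branches are assumed to have equal length, and α is held fixed at 0.05 here although the text describes it as adjusted during training.

```python
def layer(features, weight=1.0):
    """Toy stand-in for a network layer: scale, then ReLU."""
    return [max(0.0, weight * v) for v in features]

def early_fusion(fmri, smri):
    # Combine the modalities into one input before any processing.
    return layer(fmri + smri)

def late_fusion(fmri, smri):
    # Process each modality in its own stream, then combine the outputs.
    return layer(fmri) + layer(smri)

def cross_fusion(fmri, smri, alpha=0.05, depth=2):
    # Two branches; after every layer each branch adds the other branch's
    # output scaled by the scalar alpha (assumes equal-length vectors).
    a, b = fmri, smri
    for _ in range(depth):
        a_out, b_out = layer(a), layer(b)
        a = [x + alpha * y for x, y in zip(a_out, b_out)]
        b = [y + alpha * x for x, y in zip(a_out, b_out)]
    return a + b

# Hypothetical two-element feature vectors for one subject.
fmri_features = [1.0, 0.0]
smri_features = [0.0, 1.0]
fused = cross_fusion(fmri_features, smri_features, alpha=0.05, depth=1)
```

With α = 0 the two branches never exchange information and cross fusion reduces to late fusion; larger α mixes the modalities more strongly at every layer.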
The CNN has three introductory layers, referred to as the fully connected convolution layer, the pooling layer, and the final convolution layer. First, the input signal is directly connected to the convolution layer and a kernel is used for the convolution operation. The operation results form a feature map for the next layer. Between two convolution layers is a pooling layer, which is used to reduce the size of the feature map. In contrast, inside the same hidden layer, the RNN sends feedback signals to the other neurons within that layer (Figure 5). The output of the CNN layer was created by selecting an α parameter of 0.05 and given as input to the RNN layer. Then, the feature vector is formed from the RNN output. In the fully connected layer, performance evaluation was made first separately and then by combining subjects through concatenation of the data. At the last stage, the classifier and output process takes place and the model result is parsed as ASD or HC.
Training took a little over 2 hours per epoch and around two and a half days for the fully trained hybrid convolutional recurrent neural network. The number of iterations is the number of passes, each pass processing the data that belong to all subjects. Our method takes on average 2-3 minutes to segment the data of a single subject from the ABIDE dataset (nearly two days for all 1112 subjects). In high performance computing environments, CONN can distribute our processing and analyses in parallel across multiple nodes. This can result in a very significant reduction in processing time.
For each pair of subjects, Pearson's correlation coefficients have been used with the ADOS report. Multiplicity adjustments are important to control the false discovery rate (FDR) for the test. In this study, we have applied the FDR with a threshold of 0.1 for correlation analysis [44].
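The data flow described above (convolution, pooling, recurrent layer, then a classifier output) can be traced with a deliberately tiny forward pass. It is a sketch under stated assumptions and not the architecture of Figure 5: the input is a 1D series, the hidden state is a single scalar, and every weight and kernel value is a made-up number rather than a trained parameter.

```python
import math

def conv1d(signal, kernel):
    """Valid 1D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]

def max_pool(features, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(features[i:i + size]) for i in range(0, len(features) - size + 1, size)]

def rnn_last_state(sequence, w_in, w_rec):
    """Elman-style recurrence with a scalar hidden state fed back at each step."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def hybrid_forward(signal, kernel, w_in, w_rec, w_out, bias):
    """CNN feature map -> RNN summary -> sigmoid score in (0, 1)."""
    feature_map = max_pool(conv1d(signal, kernel))
    h = rnn_last_state(feature_map, w_in, w_rec)
    return 1.0 / (1.0 + math.exp(-(w_out * h + bias)))

# Hypothetical input series and untrained, hand-picked weights.
signal = [0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0]
score = hybrid_forward(signal, kernel=[1.0, -1.0], w_in=0.5, w_rec=0.3, w_out=2.0, bias=-0.5)
label = "ASD" if score >= 0.5 else "HC"
```

The pooling step shortens the feature map before it reaches the recurrent layer, which is the size reduction the text attributes to the pooling layer.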
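The FDR control at a threshold of 0.1 can be sketched as follows. The text does not name the procedure, so the Benjamini-Hochberg step-up method is assumed here; the function name `benjamini_hochberg` and the p-values are hypothetical and serve only to show the mechanics.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= k/m * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values from five correlation tests.
p_values = [0.001, 0.04, 0.03, 0.2, 0.5]
significant = benjamini_hochberg(p_values, q=0.1)
```

Each sorted p-value is compared with a threshold that grows with its rank, so the procedure is less conservative than dividing the threshold by the number of tests.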

Summary Statistics.
There is no public dataset available consisting of data from different modalities, such as electroencephalography (EEG), diffusion tensor imaging (DTI), MRI, and f-MRI (resting state and task based), that belong to the same individuals. Furthermore, there is a lack of data on ASD subsyndromes such as Asperger's syndrome (AS) [45] and pervasive developmental disorder, not otherwise specified (PDD-NOS) [46], and distribution rates according to number of samples by gender are also low. For future studies, availability of datasets that provide different modalities will help researchers to improve ASD detection accuracy using ML and deep learning methods.
We observed that the combination of ML classifiers with other clinical features of ASD improved the accuracy of ASD diagnosis. The current sample size identifies relatively relevant brain regions at high risk for ASD, suggesting that this method can be extended to larger and more heterogeneous ASD populations. Using s-MRI and f-MRI modalities in conjunction, we have shown that a higher level of diagnosis accuracy can be achieved.
For each subject, local diagnosis accuracy for both s-MRI and f-MRI feature matrices is calculated. Table 1 shows the accuracy, sensitivity, and specificity obtained for s-MRI and f-MRI when using all features. Accuracy measures the proportion of correct predictions made by the model. It is defined as the ratio of the number of correct predictions to the total number of predictions made. Sensitivity measures the proportion of actual positives that are correctly identified as positive by the model. It is defined as the ratio of the number of true positives to the total number of actual positives. Specificity measures the proportion of actual negatives that are correctly identified as negative by the model. It is defined as the ratio of the number of true negatives to the total number of actual negatives. Table 2 shows the accuracy achieved by the different fusion strategies (early, late, and cross). As can be seen, cross fusion with ADOS yielded the highest accuracy among the fusion strategies. We do not prefer late and cross fusion processes without ADOS because the score obtained with ADOS is consistently higher than that obtained without ADOS. Our results show that the hybrid model, achieving classification performances of 96.02%, 92.83%, and 85.70% for accuracy, sensitivity, and specificity, respectively, is significantly superior to the single CNN and RNN models.
Our hybrid algorithm provides high accuracy and specificity when s-MRI and f-MRI are analyzed together. Our model also fuses the s-MRI and f-MRI datasets, which provides an accuracy of 96.02%, higher than the alternatives.
We have investigated the effects of different s-MRI and f-MRI parameters on the machine learning algorithm. The proposed diagnosis may improve with both modalities, and we have observed that the addition of s-MRI and f-MRI parameters to features specific for ASD classification yields a higher, significant Pearson correlation (P = 0.001) with the ADOS total score than benchmark data. Thus, the current data suggest that the approach of a localized diagnosis with fusion of different modality datasets, fusion strategies, and correlation to ADOS will greatly improve accuracy, sensitivity, and specificity.
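The three metrics defined above follow directly from confusion-matrix counts, as the short sketch below shows. The labels are invented for illustration (1 for ASD, 0 for HC) and do not reproduce any result reported in the tables; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = ASD, 0 = HC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels for ten subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Here 3 of 4 ASD subjects and 4 of 6 HC subjects are classified correctly, so sensitivity is 0.75, specificity is about 0.67, and accuracy is 0.7.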
In Table 3, we compare individual CNN, RNN, hybrid CNN-RNN, and other recent machine learning methods from similar studies, albeit on different datasets and different diseases, based on the usage of neuroimaging data, in terms of accuracy. Studies using only CNN, only RNN, their combination, and other methods are shown. Hosseini-Asl et al. [47] report a CNN with a very high accuracy of 100 percent for Alzheimer's disease, another study presents a two-dimensional CNN with a high accuracy of 90.29 percent for hyperactivity disease [71], and another achieves an accuracy of 98.8 percent for Parkinson's disease [58]. Among the studies that utilized the Parkinson's disease dataset, one study achieved an accuracy of 82.89 percent using both CNN and RNN, which is a hybrid method [65]. Researchers show the usefulness of ML techniques to identify and predict generalized disease. Application of ML techniques to the EEG of patients with epilepsy is very recent and is emerging with promising results, with a balanced accuracy of 98.13% [70].
In addition, in Table 4, we compare different ASD studies in which machine learning methods have been applied to different sets of neuroimaging data, different modalities, and different ML methods. Another inspiring publication showed that a computer-aided diagnosis system was able to accurately distinguish between individuals with ASD and controls, achieving an accuracy rate of 87.1% [15]. A more recent work by the same author [18] demonstrates the potential of using dynamic functional connectivity analysis to identify brain regions associated with specific symptoms of ASD using 47 subjects, a smaller sample than ours. By identifying these regions, the author aims to contribute to the development of more targeted and personalized interventions for individuals with ASD. Many studies in the literature have focused on group level differences between individuals with ASD and typically developing controls. While these studies have identified some brain regions consistently associated with ASD, they do not account for the variability in brain structure and function that exists within the ASD population.
Another difference between our study and some related studies is the use of a combination of s-MRI and f-MRI data. The combination of these two types of data allows for a more comprehensive analysis of brain structure and function, which may improve the accuracy of ASD diagnosis. Researchers have developed several approaches for seizure detection using ML classifiers and statistical features [88,94]. A recent publication [84] demonstrates substantial differences in the efficiency and accuracy of various biomarkers used for ASD diagnosis. The difference in the performance of various biomarkers is due to the heterogeneity of ASD. Our fusion of f-MRI and s-MRI data has improved on the accuracy of existing autism detection systems by combining two modalities. Some studies in the literature have investigated special biomarkers consisting of biological molecules used for biomedical imaging and neuromodulation. In the present study, we did not investigate biomarkers but rather focused on algorithmic enhancement of accuracy. In addition, we combined CSF with WM and GM. Our machine learning methodology and fusion strategies are different from those applied by Jamwal et al., achieving higher accuracy via a novel neural network structure.

Conclusion
In general, it is difficult to generalize the findings of studies utilizing a small selection of samples. In addition, many studies in related areas focus on different age populations, thus limiting generalizability. Studies in the literature that focus on gender differences also inevitably reduce sample sizes, leading to reduced statistical confidence. An important challenge of neuroimaging datasets is the unavailability of different modalities. By using the ABIDE dataset, we were able to overcome these challenges by utilizing s-MRI and f-MRI data together for a large number of subjects. Clinical studies have shown that multimodality techniques play a significant role in increasing the accuracy of ASD diagnosis [97]. Our contribution can be summarized as implementing different modality fusion with higher accuracy and correlation with ADOS within a hybrid method consisting of CNN and RNN. Future work towards more effective ASD diagnosis and treatment is expected to further exploit the potential of hybrid ML algorithms for classification. Local analysis of the brain regions is expected to enable clinicians to deliver personalized treatments to autistic individuals. Our cross fusion infrastructure will also provide region based analysis of the brain, which we believe can place subjects on the autism spectrum and help clinicians deliver personalized treatments to individuals with autism. Another possibility that has emerged with our approach is the integration of further imaging modalities, such as DTI and EEG data, into diagnostic studies based on neuroimaging, in order to obtain a higher number of features and to use biomarkers to improve classification accuracy. In addition, subcategorization of autistic disorders such as Asperger and PDD-NOS via multimodal neuroimaging may become possible using the proposed hybrid ML approach.

Figure 1: Overall pipeline of the proposed approach.

Figure 3: Framework of the fusion strategies.(a) Early fusion, (b) late fusion, and (c) cross fusion.

Figure 2: A sample of denoising output.

Figure 5: Overall block diagram of a CNN-RNN used for ASD detection.

Table 1: Performance comparison among various ML methods applied on the ABIDE dataset.

Table 2: Accuracies achieved with different fusion strategies.

Table 3: Recent non-ASD studies in the literature.

Table 4: Recent ASD studies in the literature.