Detecting Abnormal Brain Regions in Schizophrenia Using Structural MRI via Machine Learning

Using neuroimaging and machine learning (ML) to differentiate schizophrenia (SZ) patients from normal controls (NCs) and to detect abnormal brain regions in schizophrenia has several benefits and can provide a reference for the clinical diagnosis of schizophrenia. In this study, structural magnetic resonance images (sMRIs) from SZ patients and NCs were used for discriminative analysis. This study proposed an ML framework based on coarse-to-fine feature selection. The proposed framework first used two-sample t-tests to extract the differences between groups, then further eliminated the nonrelevant and redundant features with recursive feature elimination (RFE), and finally utilized the support vector machine (SVM) to learn the decision models with the selected gray matter (GM) and white matter (WM) features. Previous studies have tended to report differences at the group level instead of at the individual level, and their findings cannot be widely applied. The method proposed in this study extends the diagnosis to the individual level and has a higher recognition rate than previous methods. The experimental results of this study demonstrate that the proposed framework distinguishes SZ patients from NCs, with the highest classification accuracy reaching over 85%. The identified biomarkers are also consistent with previous literature findings. As a universal method, the proposed framework can be extended to diagnose other diseases.


Introduction
Schizophrenia (SZ) is a group of major psychiatric diseases with unknown etiology. SZ has the highest prevalence of all mental illnesses and is very difficult to treat. Over the last few decades, many neuroimaging studies have demonstrated that schizophrenia is a disorder involving widespread abnormalities in the brain structure [1][2][3][4][5]. However, the specific mechanisms that produce these structural deficits remain incompletely understood. In recent years, it has been consistently reported that SZ patients have structural abnormalities in the brain, including the middle temporal gyrus, middle frontal gyrus, thalamus, and corpus callosum (CcSum) [6][7][8]. The location of these structural abnormalities and the neurobiological processes underlying them are central to the pathophysiology of schizophrenia. Furthermore, alterations to the brain structure are linked to key psychotic symptoms in SZ (such as auditory hallucinations [9,10], neurosensory deficits [11,12], and social dysfunction [13,14]).
At present, the diagnosis and monitoring of SZ hinge mainly on doctors' judgment of patients' clinical response, history, and neurological examination and therefore depend heavily on doctors' clinical experience and related knowledge. This subjectivity may add risk to the diagnosis and treatment of SZ. For a more accurate diagnosis, neuroimaging methods have been widely used to study brain morphology, which provides important information about possible pathophysiologic mechanisms [15][16][17][18].
Due to its good contrast and high spatial resolution, structural magnetic resonance imaging (sMRI) has become one of the most popular neuroimaging modalities [19][20][21]. Most existing research has investigated conventional statistical analysis methods to explore the differences between SZ patients and normal controls (NCs) based on group studies [15,22,23]. Despite the ability of conventional statistical analysis methods to detect some abnormal brain regions in SZ, they are univariate methods and often overlook the correlations among voxels, which often contain important characteristic information. Furthermore, conventional statistical analysis only considers differences among groups, and it is difficult to generalize the diagnosis to individual patients.
To overcome the drawbacks of conventional statistical analysis, machine learning (ML) techniques have been applied to analyze neuroimaging data. These techniques can extract stable structural or functional patterns from neuroimaging data and may potentially be useful for finding significant neuroimaging-based biomarkers. Currently, promising results have been reported for the classification of SZ patients and NCs [24][25][26].
The most commonly used sMRI feature is brain tissue volume (obtained from voxel-based morphometry). However, too many irrelevant features can greatly degrade the classification accuracy, especially in neuroimaging studies. A preprocessed brain MRI may contain >100,000 nonzero voxels, whereas the sample size (number of subjects or observations) is often less than 1000 [27]. Thus, the number of features (voxels) greatly exceeds the number of observations (sample size). This issue is a common problem in machine learning studies and is known as the "curse of dimensionality" [28][29][30]. The curse of dimensionality can lead to overfitting of the learned model. Therefore, choosing and utilizing appropriate feature selection methods can effectively improve the performance of the model.
For most supervised ML studies, the corresponding supervised feature selection method uses high-dimensional neuroimaging data and the required outcome labels (e.g., +1 for treatment responders and −1 for treatment nonresponders) to select relevant features and discard redundant features and noise [27]. More specifically, these techniques are subdivided into three categories [30][31][32]: (1) "filter methods," which use simple statistical measures (e.g., mean, variance, and correlation coefficients) to rank features according to their relevance in detecting group-level differences, such as t-tests, analysis of variance (ANOVA), and Pearson correlation coefficients; (2) "wrapper methods," which use a cost function to optimize the machine learning model and rank features in terms of their relevance; and (3) "embedded methods," which select relevant features as "part" of the machine learning process by enforcing certain "penalties" on the machine learning model to yield a small subset of relevant features.
A recent study [33] utilized a support vector machine (SVM) to learn a decision model for classifying Alzheimer's disease patients and normal controls and achieved an area under the curve (AUC) of over 88.82%. Another study [8] used cortical thickness in conjunction with surface area in schizophrenia patients to perform discriminative analysis and obtained an accuracy of 85.0%. Although machine learning performed well in these studies, the excessive number of features (voxels) creates a risk of overfitting [34,35]. To overcome this problem, a large number of feature selection methods have been proposed. One study [36] selected discriminative features using Fisher's criterion to train the SVM model. As a result, the classification accuracy reached 76.25% for distinguishing bipolar disorder patients from normal controls. Another study [37], which aimed to detect first-episode psychosis, utilized principal component analysis (PCA) to reduce the number of nonrelevant features in cortical thickness and gray matter volume and then applied deep neural networks (DNNs) to construct the classification model. The authors achieved a classification accuracy of over 70.5%. However, most of these studies have analyzed only gray matter (GM). In fact, several studies [17,38,39] have demonstrated that there is a nonnegligible change in white matter (WM) in SZ and that it is also necessary to analyze WM.
As an effective feature selection algorithm, recursive feature elimination (RFE) evaluates the contribution of each feature and then iteratively eliminates the features with the smallest contributions [40][41][42]. In this study, a machine learning framework based on coarse-to-fine feature selection is proposed. The framework first uses two-sample t-tests to roughly select features and then eliminates the nonrelevant and redundant features via RFE. Finally, the SVM is utilized to learn the decision models for WM and GM separately. The experimental results demonstrate that the proposed method is able to differentiate SZ patients from NCs with a maximum accuracy of approximately 85% and can find biomarkers of SZ that are consistent with those found in previous studies, including the left and right middle temporal gyrus, right middle frontal gyrus, thalamus, corpus callosum, fusiform gyrus, occipital lobe, cuneus, postcentral gyrus, and cerebellum.
The contributions of our work include the following:
(1) We developed a machine learning framework to differentiate SZ patients from NCs. The proposed framework adopts a coarse-to-fine approach that first roughly reduces the dimensionality of the features with two-sample t-tests and then reduces it further with RFE. Hierarchical feature selection helps preserve informative features and eliminate redundant ones. Furthermore, coarse-to-fine feature selection makes it easy to identify biomarkers of schizophrenia (abnormal brain regions). The proposed framework does not apply to schizophrenia only; it can also be generalized to other diseases to classify patients and NCs based on sMRI.
(2) The experimental results demonstrate that the proposed method achieves a better classification performance than other methods. Furthermore, the identified biomarkers are consistent with the findings of previous related research works.
(3) Previous research works have mainly focused on gray matter and have seldom investigated white matter in schizophrenia patients. This study analyzes gray matter and white matter separately and finds that white matter has a better discriminative ability than gray matter, which provides a reference for clinical diagnosis.
Computational Intelligence and Neuroscience

Subjects and MRI Data Acquisition.
The imaging data and phenotypic information used in this study were obtained from the Centers for Biomedical Research Excellence (COBRE) dataset, which was collected and shared by the Mind Research Network and The University of New Mexico (http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html).
To reduce the impact of different subtypes of SZ, we chose only paranoid schizophrenia from the dataset. Paranoid schizophrenia is the most common type of SZ and has a slower course of disease development, a later neurodegenerative onset time, and a better curative effect [43,44].
In this study, we selected 34 paranoid schizophrenia patients and 34 normal controls from the dataset. The selected subjects were right-handed and were aged between 20 and 60. All subjects were examined and excluded if they had a history of a neurological disorder, a history of mental retardation, a history of severe head trauma with more than 5 minutes of loss of consciousness, or a history of substance abuse or dependence within the last 12 months. Diagnostic information was gathered using the Structured Clinical Interview for DSM Disorders (SCID). The demographics are reported in Table 1.
All sMRI data were acquired with a multiecho MPRAGE (MEMPR) sequence. The parameters used were a repetition time (TR) of 2530 ms; echo times (TEs) of 1.64, 3.5, 5.36, 7.22, and 9.08 ms; an inversion time (TI) of 900 ms; an FOV (field of view) of 256 × 256 mm; a matrix of 256 × 256 × 176; a flip angle of 7°; a voxel size of 1 × 1 × 1 mm; a slab thickness of 176 mm; a number of echoes of 5; and a total scan time of 6 min.

Preprocessing.
sMRI data were analyzed with the Statistical Parametric Mapping (SPM) software package SPM8 (Wellcome Department of Imaging Neuroscience, London, UK; http://www.fil.ion.ucl.ac.uk/spm) using the voxel-based morphometry (VBM) [45,46] protocol. First, all 3D volumes were bias corrected and spatially normalized to the T1 template provided by SPM8 (removing positional and volume differences). Second, each T1-weighted MRI was segmented into three tissue probability maps (TPMs): GM, WM, and cerebrospinal fluid (CSF). Third, the tissue volume was obtained by modulating the segmented tissue maps. Finally, a Gaussian kernel with a 6 mm isotropic full width at half maximum was employed for spatial smoothing.
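As a small worked illustration of the smoothing step, the full width at half maximum (FWHM) of a Gaussian kernel is related to its standard deviation by σ = FWHM / (2√(2 ln 2)). The sketch below converts the 6 mm FWHM quoted above. The function name and the `voxel_mm` parameter are hypothetical, and the voxel size of the normalized images is not stated in the text, so the default of 1 mm is only a placeholder.

```python
import math

def fwhm_to_sigma(fwhm_mm, voxel_mm=1.0):
    """Convert a Gaussian FWHM (in mm) to a standard deviation in voxels.

    voxel_mm is a hypothetical parameter: the voxel size after
    normalization is not given in the text.
    """
    sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma_mm / voxel_mm

# The 6 mm kernel corresponds to a standard deviation of roughly 2.55 mm.
sigma = fwhm_to_sigma(6.0)
```

Tools that take the kernel width as a standard deviation rather than an FWHM need this conversion; SPM itself is parameterized by FWHM.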

Machine Learning Framework.
After the preprocessing step, feature selection based on the coarse-to-fine approach was conducted to reduce the dimensionality of the features. First, two-sample t-tests were conducted to roughly select features, and then, RFE was used to further eliminate nonrelevant and redundant features. Lastly, a linear SVM classifier was trained to classify SZ patients and NCs. The workflow of the proposed machine learning framework is shown in Figure 1.

Feature Selection.
To obtain a good classification performance, two-sample t-tests were used to perform a rough preliminary selection in this paper. Then, RFE was used to further select discriminative features.

Two-Sample t-Tests.
Due to the large amount of redundant information in sMRI, two-sample t-tests were used to initially screen the voxels. As a classical statistical analysis method, the two-sample t-test identifies significant differences between groups by computing a test statistic for each feature. Suppose $\bar{x}_1$ and $\bar{x}_2$ represent the means of a feature in the two groups, and $S_1^2$ and $S_2^2$ denote the corresponding variances. The significance of the difference between groups on this feature can then be calculated as follows:

$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(N_1 - 1) S_1^2 + (N_2 - 1) S_2^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}},$$

where $N_1$ and $N_2$ denote the sample sizes. The ability of a feature to distinguish between the two groups is evaluated by the absolute value of $T$. The greater the absolute value of $T$, the more discriminative the feature.

Recursive Feature Elimination.
RFE [47] is a greedy method for ranking all features to obtain an optimal feature subset for classification. To perform this ranking, RFE trains a machine learning model (e.g., a linear support vector machine or relevance vector machine), then ranks all features according to a specific ranking criterion, and finally removes the lowest-ranked features. The procedure is repeated until all features are removed. Since RFE can eliminate a fixed quantity or percentage of features depending on the user's requirements and has a strong ability to explain differences, it has been popular in neuroimaging studies [40][41][42].
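Before turning to how RFE is combined with the SVM, the t-test screening step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the standard pooled-variance form of the two-sample t statistic, and the function names, the toy arrays, and the `t_crit` threshold are all hypothetical. In practice the threshold would be derived from the chosen P value and the degrees of freedom $N_1 + N_2 - 2$.

```python
import numpy as np

def two_sample_t(group1, group2):
    """Pooled-variance two-sample t statistic for every feature (column).

    group1, group2: arrays of shape (subjects, features).
    """
    n1, n2 = group1.shape[0], group2.shape[0]
    mean1, mean2 = group1.mean(axis=0), group2.mean(axis=0)
    var1 = group1.var(axis=0, ddof=1)  # unbiased sample variances
    var2 = group2.var(axis=0, ddof=1)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))

def screen_features(group1, group2, t_crit):
    """Boolean mask of features whose |T| exceeds a chosen threshold."""
    return np.abs(two_sample_t(group1, group2)) > t_crit

# Toy data (made up): feature 0 differs between groups, feature 1 does not.
sz = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
nc = np.array([[4.0, 1.0], [5.0, 2.0], [6.0, 3.0]])
t_values = two_sample_t(sz, nc)
mask = screen_features(sz, nc, t_crit=2.0)  # illustrative threshold only
```

Only the voxels flagged by `mask` would be passed on to the finer RFE stage.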
Currently, most studies [48][49][50][51] have combined RFE with the SVM to perform feature selection. The SVM is presently one of the best-known classification techniques and has computational advantages over other classification methods, and many previous studies [52][53][54][55][56] have shown that the linear SVM performs well on small sample datasets. To allow the classifier to generalize well to unseen data and to avoid overfitting, we used the soft margin SVM classifier.
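Taking the soft margin SVM as an example, assume $m$ input training samples $\{(x_j, y_j)\}_{j=1}^{m}$, where $x_j$ is the feature vector of the $j$-th sample and $y_j \in \{-1, +1\}$ is its class label. The soft margin SVM solves

$$\min_{\omega, b, \zeta}\ \frac{1}{2}\|\omega\|^2 + C \sum_{j=1}^{m} \zeta_j \quad \text{s.t.} \quad y_j(\omega^{T} x_j + b) \ge 1 - \zeta_j,\quad \zeta_j \ge 0,\quad j = 1, \dots, m,$$

where $\omega$ is the vector of feature weights, $b$ is the bias, $C$ is a nonzero penalty coefficient that controls the trade-off between the training error and the margin, and $\zeta_j$ are slack variables that are associated with the misclassified samples.
Since the above optimization problem is difficult to solve directly, it can be rewritten as a dual problem using the Lagrangian multiplier method as follows:

$$\max_{\alpha}\ \sum_{j=1}^{m} \alpha_j - \frac{1}{2} \sum_{j=1}^{m} \sum_{k=1}^{m} \alpha_j \alpha_k y_j y_k x_j^{T} x_k \quad \text{s.t.} \quad \sum_{j=1}^{m} \alpha_j y_j = 0,\quad 0 \le \alpha_j \le C,$$

where $\alpha$ corresponds to the weights of the observation samples. The observation samples with nonzero weights are the support vectors. Consequently, the weights of the features (voxels) are calculated as

$$\omega = \sum_{j=1}^{m} \alpha_j y_j x_j.$$

To evaluate the contribution of each feature, the features are ranked by their squared weights $(\omega_i)^2$, where $i$ indexes the features. Finally, the lowest-ranking feature is removed from the feature set $F$:

$$F \leftarrow F \setminus \{i^{*}\}, \qquad i^{*} = \arg\min_{i \in F}\, (\omega_i)^2.$$

Subsequently, the above process is iterated until a termination criterion is reached or until the feature set $F$ is empty. Each feature then has a corresponding weight, which expresses its importance. Finally, a user-defined ratio (e.g., 2%) is applied to remove the lower-ranking features. The RFE process is shown in Figure 2.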
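The RFE loop described above (train a linear SVM, rank features by squared weight, drop the lowest-ranked ones, repeat) can be sketched as follows. This is a hedged illustration rather than the authors' code. As a stated assumption, the SVM is trained with a simple full-batch subgradient descent on the primal soft margin objective instead of a dual solver, and `fit_linear_svm`, `svm_rfe`, the learning rate, the epoch count, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Soft margin linear SVM via full-batch subgradient descent on
    0.5 * ||w||^2 + C * sum(max(0, 1 - y * (X @ w + b))).
    ASSUMPTION: stands in for whatever SVM solver is actually used."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1.0        # samples with nonzero hinge loss
        grad_w = w - C * (X[viol].T @ y[viol])
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, n_keep, drop_fraction=0.05, C=1.0):
    """Return the column indices of the n_keep features that survive RFE."""
    remaining = np.arange(X.shape[1])
    while remaining.size > n_keep:
        w, _ = fit_linear_svm(X[:, remaining], y, C=C)
        scores = w ** 2                      # ranking criterion (w_i)^2
        n_drop = max(1, int(drop_fraction * remaining.size))
        n_drop = min(n_drop, remaining.size - n_keep)
        keep = np.argsort(scores)[n_drop:]   # discard the n_drop smallest
        remaining = np.sort(remaining[keep])
    return remaining

# Synthetic data (made up): only features 0 and 1 carry class information.
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 20)
X = 0.2 * rng.standard_normal((40, 20))
X[:, 0] += 2.0 * y
X[:, 1] += 2.0 * y
selected = svm_rfe(X, y, n_keep=2)
```

Here each round removes a fraction of the features that are still present. Removing a fraction of the original feature count instead is an equally plausible reading of "elimination ratio" and would only change how `n_drop` is computed.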
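Dice measure (DM) $= \dfrac{2}{(TP + FN)/TP + (TP + FP)/TP}$, where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives. DM is thus the harmonic mean of sensitivity (SN, $TP/(TP + FN)$) and precision ($TP/(TP + FP)$). Of the evaluation measures, GM (here an evaluation measure, not gray matter) and DM are computed with equal weights on SN and precision, whereas F2M is computed with a higher weight on SN.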
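The surviving definitions can be turned into a short sketch that computes these measures from confusion-matrix counts. Only the DM and precision formulas come from the text. As loudly labeled assumptions, GM is taken to be the geometric mean of SN and precision and F2M to be the standard F-beta score with β = 2, since both match the weighting described but their formulas are not preserved in this copy. The function name and the counts are hypothetical.

```python
import math

def classification_metrics(tp, fp, fn):
    """Sensitivity- and precision-based measures from confusion-matrix counts.

    Assumes tp > 0 so that every ratio is defined.
    """
    sn = tp / (tp + fn)                       # sensitivity (recall)
    precision = tp / (tp + fp)
    dm = 2.0 / ((tp + fn) / tp + (tp + fp) / tp)  # Dice measure, as in the text
    gm = math.sqrt(sn * precision)            # ASSUMPTION: geometric mean
    f2m = 5.0 * sn * precision / (4.0 * precision + sn)  # ASSUMPTION: F-beta, beta = 2
    return {"SN": sn, "precision": precision, "DM": dm, "GM": gm, "F2M": f2m}

# Made-up counts for illustration only.
m = classification_metrics(tp=30, fp=5, fn=4)
```

Because F2M weights SN more heavily, it penalizes missed patients (false negatives) more than false alarms, which is often the preferred trade-off in a screening setting.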

Results
To verify the performance of the coarse-to-fine feature selection proposed in this study, we compared it with six other machine learning methods: (1) directly using SVM to classify (SVM), (2) using two-sample t-tests to select features and SVM to classify (2T + SVM), (3) using RFE to select features and SVM to classify (RFE + SVM), (4) using principal component analysis (PCA) and SVM to classify (PCA + SVM), (5) using independent component analysis (ICA) and SVM to classify (ICA + SVM), and (6) using tree-based feature selection and SVM to classify (TBFS + SVM). Furthermore, to test the performance of the proposed machine learning framework, we compared it with other frameworks that select features roughly based on 2T, perform PCA (ICA, TBFS), and then apply SVM for classification (2T + PCA + SVM, 2T + ICA + SVM, 2T + TBFS + SVM). Finally, we analyzed the biomarkers of SZ using coarse-to-fine feature selection.

Parameter Setting.
A linear SVM was applied in this study because previous studies [52,53] have shown that the linear SVM works better on small sample datasets. The value of the penalty coefficient "C" was set to 1.0 because many experimental tests have shown that a value of 1.0 yields a satisfactory discrimination performance.
The number of retained features has a significant impact on the results when using RFE. We therefore tested the effect of different numbers of retained features. The experimental results showed that the best performance was achieved by retaining 40% of the voxels for GM and 14% for WM while eliminating 5% of the voxels in each round.
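To give a feel for what these settings imply, the sketch below counts how many elimination rounds are needed to reach a given retained fraction. It rests on an assumption that the text does not confirm: that each round removes 5% of the voxels still remaining, so the retained fraction shrinks geometrically. The function name is hypothetical.

```python
import math

def rfe_rounds(retain_fraction, drop_fraction=0.05):
    """Smallest number of rounds r with (1 - drop_fraction)**r <= retain_fraction.

    ASSUMPTION: every round removes drop_fraction of the features that
    are still present (geometric shrinkage).
    """
    return math.ceil(math.log(retain_fraction) / math.log(1.0 - drop_fraction))

gm_rounds = rfe_rounds(0.40)  # rounds to get down to 40% of the voxels (GM)
wm_rounds = rfe_rounds(0.14)  # rounds to get down to 14% of the voxels (WM)
```

Under this reading, reaching 40% takes 18 rounds and reaching 14% takes 39 rounds. If 5% of the original voxel count were removed per round instead, the numbers of rounds would be smaller.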

Two-Sample t-Tests.
To test the performance of different P values, two-sample t-tests were performed with three different P value thresholds (<0.05 [57][58][59], <0.01 [60], and <0.001 [61]). The cluster-size value was set to 50 [62][63][64][65], and three differentiated tissue maps were obtained for each of GM and WM; they are shown in Figures 3 and 4. From the figures, we can see that as the P value threshold decreases, the selected region (red parts) becomes smaller. Consequently, the smaller the P value, the more information is filtered out, which may result in the removal of useful features. Therefore, the P < 0.05 criterion was adopted in our coarse-to-fine feature selection algorithm.

Classification Performance.
The results of different methods based on GM and WM are shown in Tables 2 and 3, respectively. From Table 2, we can see that, for GM, feature selection achieves a better classification performance than using SVM directly. Among the methods that use only one feature selection algorithm, such as 2T + SVM, RFE + SVM, or PCA + SVM, RFE + SVM achieves the best performance, which shows that RFE has a better feature selection ability than the other methods. Among the coarse-to-fine frameworks, 2T + PCA + SVM achieves the best performance for GM and reaches an accuracy of 79.81%. The proposed 2T + RFE + SVM method achieves a similar performance (accuracy of 79.62%).
From Table 3, when only one feature selection algorithm is used for WM, all five feature selection methods (2T + SVM, RFE + SVM, PCA + SVM, ICA + SVM, and TBFS + SVM) have a better classification performance than using SVM directly. However, the performances of RFE + SVM, PCA + SVM, ICA + SVM, and TBFS + SVM are worse than that of 2T + SVM for WM. The reason for this result may be that WM has more redundant features, and RFE, PCA, and the other methods did not identify useful features among the many irrelevant voxels. In contrast, the two-sample t-test, as a traditional statistical analysis, readily identifies significant differences by using prior knowledge of group membership. All four coarse-to-fine frameworks perform better than single feature selection, and the proposed 2T + RFE + SVM method achieves the best performance and reaches an accuracy of over 85% for WM.
According to the above experiments, we can see that coarse-to-fine feature selection selects more discriminative features. Within the same coarse-to-fine machine learning framework, RFE can achieve a better performance than the other methods. The receiver-operating characteristic (ROC) curves of WM and GM obtained with the proposed ML framework are shown in Figure 5. The area under the ROC curve (AUC) summarizes the performance of the classification experiment: the larger the AUC, the better the performance. It can be seen that the AUC for WM is better than that for GM.
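For readers unfamiliar with the AUC, it equals the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The function name, the label coding (+1 for SZ, −1 for NC), and the scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: +1 for the positive class, anything else for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up decision values: one patient is scored below one control.
auc = auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, -1, 1, -1])
```

In this toy case three of the four patient/control pairs are ordered correctly, so the AUC is 0.75. A classifier that separates the groups perfectly gives 1.0, and chance-level ranking gives about 0.5.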

Identification of Abnormal Brain Regions.
The discriminative brain regions (biomarkers) of GM and WM selected by the proposed method are illustrated in Figure 6. We selected clusters as biomarkers when the cluster size was ≥50. For GM, 14 brain regions were detected. For WM, 24 brain regions were detected. Detailed brain biomarkers are shown in Tables 4 and 5. From Table 4, we can see that the GM brain regions that were detected included the cerebellum, fusiform gyrus, temporal lobe, occipital lobe, frontal lobe, right supramarginal gyrus, angular gyrus, and postcentral gyrus. From Table 5, we found that the WM brain regions that were detected included the cerebellum, fusiform gyrus, temporal lobe, occipital lobe, frontal lobe, lentiform nucleus, thalamus, corpus callosum, cuneus, subgyral, and postcentral gyrus. The selected abnormal brain regions were similar in GM and WM. This finding shows that SZ can cause changes in specific brain regions, and these regions are also considered SZ biomarkers.

Discussion
Previous group-level statistical analysis of neuroimaging data uncovered some neuroanatomical and functional differences between SZ patients and NCs [66][67][68]. Nevertheless, those findings had limited clinical applications. Machine learning can be adopted for single subject prediction and has shown significant potential for disease diagnosis [69][70][71][72] at the individual level.
To verify the performance of the proposed method, we performed the following experiments: (1) We used the SVM directly on smoothed voxels. (2) We used single feature selection and SVM, such as 2T + SVM, RFE + SVM, PCA + SVM, ICA + SVM, and TBFS + SVM. (3) We used coarse-to-fine feature selection methods for classification, such as 2T + PCA + SVM, 2T + ICA + SVM, 2T + TBFS + SVM, and 2T + RFE + SVM. The experiments illustrated the following: (1) SVM classification with feature selection is better than direct SVM classification, which means that the feature selection method can discard redundant features and extract useful [...] (3) The performance of the coarse-to-fine framework was better than that of single feature selection methods, which indicates that selecting features hierarchically can filter features more effectively.
Structural abnormalities have been repeatedly demonstrated in SZ patients compared to NCs in previous MRI studies [73][74][75]. However, most research has focused on DTI to analyze structural changes in SZ and has rarely found structural differences on sMRI. In the current study, the GM and WM features selected by the proposed method provided discriminatory information about anatomically abnormal patterns in schizophrenia.

Figure 3: Three two-sample t-test maps with different P values in GM. The red brain region represents the differential brain region between SZ patients and NCs. As the P value becomes stricter (smaller), the differential brain region becomes smaller.

Figure 4: Three two-sample t-test maps with different P values in WM. As the P value becomes smaller, the changes in WM are similar to those in GM. However, compared with GM, the differential WM brain regions between SZ patients and NCs are relatively small. (a) P < 0.05. (b) P < 0.01. (c) P < 0.001.
The proposed method in the current study revealed sensitive and accurate information about anatomically abnormal patterns in the frontal lobe, postcentral gyrus, corpus callosum, and cuneus and especially in the thalamus, fusiform gyrus, temporal lobe, and cerebellum. These identified abnormal biomarkers were consistent with those found in the literature [8,[76][77][78][79]. Some brain regions were found in both GM and WM, including the cerebellum, fusiform gyrus, temporal lobe, occipital lobe, and frontal lobe, which showed that SZ could indeed cause structural changes in these brain regions. Many previous studies have reported that the concentrations of some substances (such as dopamine) change in the cingulate cortex and amygdala in SZ, but we did not detect structural changes in these areas. A possible reason is that changes in the dopamine concentration do not cause structural abnormalities.
This study has several limitations. First, a small sample size is a common pitfall in most similar studies. To improve the generalizability and clinical applicability of machine [...] Multimodal data, such as fMRI, DTI, and brain connectivity data, remain to be explored to provide either complementary or additional information for accurate recognition and lateralization in SZ. Finally, we identified the discriminative regions based on the AAL, BA, and JHU atlases, with the potential drawback that some atlas areas might be too large or unspecific to detect group differences.

Conclusion
In this study, a coarse-to-fine ML framework was investigated to differentiate SZ patients from NCs and to detect biomarkers of SZ using sMRI. The experiments demonstrated that feature selection algorithms can effectively improve the classification performance. The use of coarse-to-fine feature selection extracted more effective information and significantly improved the classification accuracy. The experiments also indicated that the classification performance of WM was significantly better than that of GM. Therefore, it can be concluded that SZ has a greater impact on WM. This conclusion is consistent with previous findings. Furthermore, the proposed coarse-to-fine feature selection effectively located abnormal brain regions, which provides a helpful aid for the clinical diagnosis of SZ. As a universal method, the proposed framework can be extended to diagnose other diseases.

Figure 6: Abnormal brain regions located by our method. The selected features are shown on cross sections of the brain taken every 5 mm, and the red brain regions are the selected regions. Each discriminative brain region contains more than 50 voxels and has a P value [...]

Data Availability
The datasets used in the current study can be obtained from the Centers for Biomedical Research Excellence (COBRE) at http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html.