
This paper aims to develop new theory-driven biomarkers by implementing and evaluating novel techniques from resting-state scans that can be used to predict relapse in nicotine-dependent patients and to assess future treatment efficacy. Two classes of patients were studied: one received the drug N-acetylcysteine and the other a placebo. The patients then underwent a double-blind smoking cessation treatment, and resting-state fMRI scans of their brains were recorded before and after treatment. The scientific goal of this study was to interpret the fMRI connectivity maps with machine learning algorithms to predict which patients will relapse and which will not. To this end, a feature matrix was extracted from the brain image slices using voxel selection schemes and data reduction algorithms. The feature matrix was then fed into machine learning classifiers, including an optimized CART decision tree and a Naive-Bayes classifier in both standard and optimized implementations, evaluated with 10-fold cross-validation. Of all the data reduction techniques and machine learning algorithms employed, the best accuracy was obtained with singular value decomposition combined with the optimized Naive-Bayes classifier. This gave an accuracy of 93% with sensitivity and specificity of 99%, which suggests that relapse in nicotine-dependent patients can be predicted from resting-state fMRI images. These approaches may lead to clinical applications in the future.

Smoking cigarettes is the leading cause of preventable mortality in the United States, with around 50% of lifelong smokers dying from illnesses such as heart disease, stroke, and cancer [

In a pilot study, a trend was observed for fewer withdrawal symptoms after smoking cessation for subjects taking NAC versus subjects taking the placebo [

Previously, Smitha et al. [

The aim of this study was to develop a way of using fMRI data to predict which patients will relapse and which will not. This classification is normally done after 6 months of treatment or 12 months past the start of addiction treatment [

In this study, four different machine learning classifiers were employed, along with features based on high-activity areas of the brain extracted by a novel voxel selection scheme. High-activity areas are those with a greater flow of oxygen-rich blood, which fMRI is able to map. In addition, classification accuracy relies heavily on how the data are reduced; thus, three different data reduction techniques were employed in this study. Tahmassebi et al. [

The main goal of this study was to determine whether or not the drug N-acetylcysteine (NAC) would decrease nicotine dependency. NAC may have an effect on relapse in smoking cessation [

Anatomical and functional slices of the brain (panels: anatomical; functional).

The magnetic fields of nuclei in oxygen-rich blood are flipped by the combination of a strong magnetic field and radio waves. This produces a detailed map of the regions with a high flow of oxygen-rich blood, which correspond to the high-activity areas of the brain. The resulting signal is called the BOLD signal and is the signal studied in this paper. The BOLD signal is generally modeled as the convolution of the stimulus function with the hemodynamic response function (HRF) [
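As a concrete illustration, the convolution model described above can be sketched by convolving a stimulus time course with a canonical double-gamma HRF. This is a generic sketch: the paper's scans are resting-state, so the stimulus block, TR, and gamma parameters below are illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(t):
    """Double-gamma HRF: an early positive response minus a late undershoot."""
    peak = gamma.pdf(t, 6)         # positive lobe, peaking around 5 s
    undershoot = gamma.pdf(t, 16)  # delayed undershoot
    h = peak - undershoot / 6.0
    return h / h.sum()             # normalize to unit area

tr = 1.0                           # assumed repetition time in seconds
t = np.arange(0, 32, tr)           # 32 s of HRF support
stimulus = np.zeros(200)           # 200 volumes, matching the scan length
stimulus[10:20] = 1.0              # hypothetical 10-volume stimulus block
# Modeled BOLD signal = stimulus convolved with the HRF.
bold = np.convolve(stimulus, canonical_hrf(t))[:len(stimulus)]
```

The convolution delays and smooths the boxcar stimulus, which is the characteristic shape the BOLD model predicts.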

We were given the fMRI data in 4-dimensional spatiotemporal NIFTI (Neuroimaging Informatics Technology Initiative) format. The data contains subject-dependent artifacts due to the long process of the scans, possible movements of the subjects, and physiological noise [

The preprocessing stage [

Raw and preprocessed fMRI data (panels: raw; preprocessed).

To extract high activity features from the big data, a novel voxel selection scheme (mask) [

Dealing with a size of

ICA [

Feeding vector

PCA transforms features of the original input data orthogonally to a new space to reduce redundancy and reach high information density. These variables in the new space are principal components which are linearly uncorrelated. PCA is also referred to as Karhunen-Loeve transformation or the Hotelling transform [
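A minimal sketch of PCA as described above, via an eigendecomposition of the feature covariance matrix. The matrix dimensions are illustrative placeholders; the choice of 10 components loosely follows the number used in this study.

```python
import numpy as np

def pca(X, n_components):
    """PCA: project centered data onto the eigenvectors of its covariance
    matrix with the largest eigenvalues (the principal components)."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    W = eigvecs[:, order[:n_components]]    # top-k eigenvectors as new basis
    return Xc @ W                           # orthogonal projection

rng = np.random.default_rng(0)
X = rng.normal(size=(39, 50))   # hypothetical 39 subjects x 50 features
Z = pca(X, 10)                  # reduced to 10 uncorrelated components
```

Because the eigenvectors are orthogonal, the resulting components are linearly uncorrelated, which is exactly the redundancy reduction PCA is used for here.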

We could create a new basis by choosing eigenvectors

To demonstrate the importance of PCA, which is reducing the dimension of the data, we could have a rank

Employing all the above-mentioned equations leads us to a strategy such as subspace decomposition to find the largest eigenvalue and project the original data orthogonally onto its subspace. Thus, the eigenvalues closer to zero which contain the redundant information will be discarded [

SVD is the factorization of a matrix X into the product of an orthogonal matrix U, a diagonal matrix of singular values, and the transpose of an orthogonal matrix V; keeping only the largest singular values yields a low-rank approximation of X.
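A rank-k reduction via the SVD can be sketched as follows. The matrix dimensions are hypothetical; 10 components matches the number used in the study.

```python
import numpy as np

def svd_reduce(X, k):
    """Rank-k reduction of X via the thin SVD: keep the k leading
    singular directions and scale them by their singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]   # k-dimensional representation of each row

rng = np.random.default_rng(0)
X = rng.normal(size=(39, 50))   # hypothetical 39 subjects x 50 features
Z = svd_reduce(X, 10)           # 10 components per subject
```

Discarding the small singular values drops the directions that carry the least variance, mirroring the redundancy removal described for the eigenvalue-based methods.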

The performance of different data reduction algorithms with different numbers of components has been previously investigated, which suggests that 10 components would be ideal for this analysis [

Illustration of correlation matrices with 10 components for ICA, PCA, and SVD data reduction techniques.

There are two classification strategies for medical images. In the first, region-based classification, measurements of a set of features from a region in an image are used as the feature vector. In the second, voxel-based classification, contextual or noncontextual information about every single voxel is used as the feature vector fed into the classifier [

A decision tree is a machine learning algorithm that partitions a set of input data recursively. A decision tree consists of a root node, which has no incoming edges and zero or more outgoing edges; internal nodes, each of which has one incoming edge and two or more outgoing edges; and leaf nodes, each of which has one incoming edge and no outgoing edges [
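The recursive partitioning step can be made concrete with a minimal Gini-based split search of the kind CART performs at each internal node. This is a sketch with toy data, not the paper's implementation.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Find the (feature, threshold) pair minimizing the weighted Gini
    impurity of the two children, as CART does at each node."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # skip degenerate splits
            score = (left.mean() * gini(y[left])
                     + (~left).mean() * gini(y[~left]))
            if score < best[2]:
                best = (j, t, score)
    return best

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
feature, threshold, impurity = best_split(X, y)  # splits at x <= 2.0
```

Applying `best_split` recursively to each child produces the tree; the pruning discussed below then trims it back.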

The CART, by Breiman et al. [

Based on how the CART algorithm builds trees, if the splitting process continues until only a few samples remain in each leaf, the tree is likely to overfit the data. On the other hand, a tree that is too small might not capture the important structural information in the sample space. This problem is known as the horizon effect. Therefore, a tree complexity at which the estimated true error is low is desired. To this end, a reduced error pruning algorithm, a bottom-up pruning method, was employed. It improves predictive accuracy by starting at the leaves and replacing each node with its most popular class, reducing overfitting while increasing the simplicity of the tree and the speed of the process. The process continues as long as prediction accuracy is not affected. The optimization was repeated 51 times for each of the data reduction methods to reach the most efficient result.

Naive-Bayes is a classification technique based on Bayes' theorem with an assumption of independence among predictors, used to model probabilistic relationships between the feature matrix and the class labels [
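A minimal Gaussian Naive-Bayes, written out to make the independence assumption explicit: each feature is modeled by a per-class normal distribution, and the log-posteriors simply add the per-feature log-likelihoods. This is a sketch with toy data, not the study's implementation.

```python
import numpy as np

class GaussianNaiveBayes:
    """Gaussian Naive-Bayes: per-class, per-feature normal densities
    combined under the independence assumption; predict the class with
    the maximum posterior."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.array([(y == c).mean() for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        log_post = []
        for p, mu, var in zip(self.priors, self.means, self.vars):
            # log prior + sum over features of the Gaussian log-likelihood
            ll = -0.5 * np.sum(np.log(2 * np.pi * var)
                               + (X - mu) ** 2 / var, axis=1)
            log_post.append(np.log(p) + ll)
        return self.classes[np.argmax(log_post, axis=0)]

X = np.array([[1.0, 1.1], [1.2, 0.9], [3.0, 3.1], [2.9, 3.2]])
y = np.array([0, 0, 1, 1])
pred = GaussianNaiveBayes().fit(X, y).predict(X)
```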

The essential principle of the Bayes method is to assume a known a priori distribution and then minimize the classification error probability. The class-conditional density function can be known or estimated from the available training dataset. During Bayesian estimation, the class-conditional density function is updated by the training set, which acts as a series of observations, allowing the conversion of the a priori information into an a posteriori density [

A simple introduction can be given by considering two pattern classes:

For the two-class patterns

Considering the Gaussian probability distribution function with

By choosing a monotonic logarithmic discriminant function we have

Now, by estimating each class's mean vector and covariance matrix from the training data and substituting them into the discriminant function, the data can be separated by a hyperplane (if the classes share a covariance matrix) or by hyperquadrics (if their covariance matrices differ).

To optimize the Naive-Bayes standard algorithm, the bag-of-token model was employed [

As discussed, the participants were randomized at the first session to receive NAC or placebo. In addition, we first learned the two classes (relapse versus nonrelapse) and then tested them by choosing participants randomly from the two groups. Moreover, due to the low number of participants,

Two classification tasks with three major data reduction methods, ICA, PCA, and SVD, were developed in Python [
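Assuming the pipeline resembles standard scikit-learn components, a sketch of the reduction-plus-classification setup with 10 components and 10-fold cross-validation might look like the following. The feature matrix and labels below are random placeholders, not the study's data, and the exact estimators and settings are assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA, TruncatedSVD
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(39, 200))   # placeholder feature matrix (39 subjects)
y = np.tile([0, 1], 20)[:39]     # placeholder relapse/nonrelapse labels

# Three data reduction techniques, each down to 10 components as in the study.
reducers = {
    "ICA": FastICA(n_components=10, random_state=0, max_iter=1000),
    "PCA": PCA(n_components=10),
    "SVD": TruncatedSVD(n_components=10, random_state=0),
}
classifiers = {"CART": DecisionTreeClassifier(random_state=0),
               "GNB": GaussianNB()}

# Mean 10-fold cross-validated accuracy for every reducer/classifier pair.
scores = {}
for rname, reducer in reducers.items():
    Z = reducer.fit_transform(X)
    for cname, clf in classifiers.items():
        scores[(rname, cname)] = cross_val_score(clf, Z, y, cv=10).mean()
```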

As discussed, the pruning process of the CART using the reduced error pruning algorithm was repeated 51 times.

Statistics of 51 runs of the reduced error pruning of the CART with 10-fold cross-validation with ICA, PCA, and SVD data reduction algorithms.

Algorithm | Type of tree | Min | Mean | Median | Max | STD
---|---|---|---|---|---|---
ICA | Original tree | 0.435 | 0.435 | 0.435 | 0.435 | 0.0
ICA | Pruned tree | 0.307 | 0.472 | 0.487 | 0.487 | 0.035
PCA | Original tree | 0.794 | 0.794 | 0.794 | 0.794 | 0.0
PCA | Pruned tree | 0.358 | 0.484 | 0.487 | 0.487 | 0.017
SVD | Original tree | 0.615 | 0.615 | 0.615 | 0.615 | 0.0
SVD | Pruned tree | 0.410 | 0.482 | 0.487 | 0.487 | 0.018

Misclassification error for the CART for different numbers of terminal nodes with ICA, PCA, and SVD data reduction techniques.

For the learning phase as Figures

To summarize the number of samples predicted correctly (true positive (TP) for positive class, and true negative (TN) for negative class) and incorrectly (false positive (FP) for positive class and false negative (FN) for negative class) they were calculated as a confusion matrix [
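The confusion-matrix counts and the sensitivity and specificity derived from them can be computed directly; a sketch with toy labels:

```python
import numpy as np

def confusion_stats(y_true, y_pred):
    """Confusion-matrix counts plus the derived sensitivity/specificity."""
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return tp, tn, fp, fn, sensitivity, specificity

y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1])
tp, tn, fp, fn, sens, spec = confusion_stats(y_true, y_pred)
```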

Confusion matrices of classification using CART for ICA, PCA, and SVD data reduction techniques.

As shown in Figure

As seen in Figure

Illustration of tree evolution for the CART for ICA, PCA, and SVD data reduction techniques.

Contour plots of probability surface for classification using the CART for ICA, PCA, and SVD data reduction techniques.

In classification with GNB, as shown in Figure

Confusion matrices of classification using the Gaussian Naive-Bayes for ICA, PCA, and SVD data reduction techniques.

Contour plots of probability surface for classification using the Gaussian Naive-Bayes for ICA, PCA, and SVD data reduction techniques.

Similar to the pruning process of the CART, the ONB algorithm was run 51 times.

Statistics of 51 runs of the optimized and Gaussian Naive-Bayes classifier with 10-fold cross-validation for ICA, PCA, and SVD data reduction algorithms.

Algorithm | Total | Min | Mean | Median | Max | STD
---|---|---|---|---|---|---
ICA | 0.564 | 0.358 | 0.411 | 0.384 | 0.538 | 0.053
PCA | 0.615 | 0.333 | 0.498 | 0.487 | 0.666 | 0.058
SVD | 0.615 | 0.384 | 0.495 | 0.487 | 0.666 | 0.039

Convergence rates of the optimized Naive-Bayes classifier for ICA, PCA, and SVD data reduction techniques.

Confusion matrices of classification using the optimized Naive-Bayes for ICA, PCA, and SVD data reduction techniques.

To display the trade-off between sensitivity and specificity of the classifiers, Receiver Operating Characteristic (ROC) curves have been employed [
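An ROC curve traces the sensitivity-specificity trade-off by sweeping the decision threshold over the classifier's scores; a small self-contained sketch (toy scores, not the study's outputs):

```python
import numpy as np

def roc_points(y_true, scores):
    """TPR/FPR pairs obtained by sweeping the decision threshold from
    highest score to lowest -- the points of an ROC curve."""
    thresholds = np.sort(np.unique(scores))[::-1]
    pos, neg = (y_true == 1).sum(), (y_true == 0).sum()
    tpr = [((scores >= t) & (y_true == 1)).sum() / pos for t in thresholds]
    fpr = [((scores >= t) & (y_true == 0)).sum() / neg for t in thresholds]
    return np.array(fpr), np.array(tpr)

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])  # e.g. predicted P(relapse)
fpr, tpr = roc_points(y_true, scores)
# Area under the curve by the trapezoid rule.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
```

An AUC of 1.0 indicates perfect separation and 0.5 indicates chance-level ranking, which is how the curves below are read.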

Figure

ROC curves for classification using the CART, Gaussian, and optimized Naive-Bayes with ICA, PCA, and SVD data reduction techniques.

Next, the validated model (previously presented in Section

Statistics of 51 runs of the reduced error pruning of the CART with 10-fold cross-validation with ICA, PCA, and SVD data reduction algorithms.

Algorithm | Type of tree | Min | Mean | Median | Max | STD
---|---|---|---|---|---|---
ICA | Original tree | 0.512 | 0.512 | 0.512 | 0.512 | 0.0
ICA | Pruned tree | 0.333 | 0.333 | 0.333 | 0.333 | 0.0
PCA | Original tree | 0.410 | 0.410 | 0.410 | 0.410 | 0.0
PCA | Pruned tree | 0.256 | 0.330 | 0.333 | 0.333 | 0.012
SVD | Original tree | 0.512 | 0.512 | 0.512 | 0.512 | 0.0
SVD | Pruned tree | 0.307 | 0.332 | 0.333 | 0.333 | 0.003

Figure

Misclassification error for the CART for different numbers of terminal nodes with ICA, PCA, and SVD data reduction techniques.

As shown in Figure

This result matches the confusion matrices presented in Figure

Confusion matrices of classification using CART for ICA, PCA, and SVD data reduction techniques.

Contour plots of probability surface for classification using the CART for ICA, PCA, and SVD data reduction techniques.

Alternatively, the Naive-Bayes classifier, given its probabilistic nature, solves the prediction problem implicitly. This could be an alternative for better predicting the subjects in the nonrelapse class. Table

Statistics of 51 runs of the optimized and Gaussian Naive-Bayes classifier with 10-fold cross-validation for different data reduction algorithms.

Algorithm | Total | Min | Mean | Median | Max | STD
---|---|---|---|---|---|---
ICA | 0.461 | 0.282 | 0.329 | 0.333 | 0.384 | 0.020
PCA | 0.589 | 0.307 | 0.344 | 0.333 | 0.435 | 0.021
SVD | 0.410 | 0.282 | 0.319 | 0.307 | 0.410 | 0.024

As presented, the best error was achieved with the SVD data reduction technique, with a cross-validation error of

Confusion matrices of classification using the Gaussian Naive-Bayes for ICA, PCA, and SVD data reduction techniques.

Contour plots of probability surface for classification using the Gaussian Naive-Bayes for ICA, PCA, and SVD data reduction techniques.

By optimizing the Naive-Bayes algorithm after

Convergence rates of the optimized Naive-Bayes classifier for ICA, PCA, and SVD data reduction techniques.

Both ICA and SVD algorithms reached the minimum estimated error, but the overall prediction accuracy of SVD was

Confusion matrices of classification using the optimized Naive-Bayes for ICA, PCA, and SVD data reduction techniques.

Figure

ROC curves for classification using the CART, Gaussian, and optimized Naive-Bayes with ICA, PCA, and SVD data reduction techniques.

The scientific goal of this study was to develop new theory-driven biomarkers by implementing and evaluating novel techniques from resting-state brain scans that can be used to predict relapse in nicotine-dependent patients and to assess future treatment efficacy. Two classes of patients were studied: one took the drug N-acetylcysteine and the other took a placebo. The patients underwent a double-blind smoking cessation treatment, and resting-state fMRI scans of their brains were recorded before and after treatment. The high dimensionality of the fMRI data made it impractical to feed the original preprocessed data directly into the classification tasks. Therefore, data reduction algorithms including ICA, PCA, and SVD were employed. The CART decision tree and the Naive-Bayes classifier with two different implementations were chosen for the classification tasks. Based on the results, the following conclusions can be drawn:

The proposed model, including the features extracted from the resting-state fMRI brain scans, was validated by classifying the subjects into NAC and placebo classes. The optimized Naive-Bayes along with independent component analysis gave an accuracy of

The validation results indicated that independent component analysis can be employed to extract structural information to be used in a balanced-unbiased classification problem using both explicit and implicit classification algorithms.

The relapse results showed that singular value decomposition would extract critical features to be used in an unbalanced-biased classification problem employing implicit classification algorithms.

The relapse results showed that interpreting fMRI connectivity maps based on machine learning algorithms might result in developing novel theory-driven biomarkers with clinical applications in future.

All of the analysis was based on the difference between baseline and follow-up scans, both acquired in the resting state. Thus, it can be assumed that any difference between the groups is the result of NAC, since this was the only variable that differed between them. In addition, there were no differences between the groups at baseline, except for motivation to change their behavior.

In future work, deep learning and convolutional neural networks (CNNs) could be employed to maximize the information available to the training process. As discussed, the fMRI data are given in 4-dimensional NIFTI format (three spatial dimensions and one temporal dimension); in effect, each subject has a 3D movie with 200 snapshots before and 200 snapshots after treatment. In this approach, a CNN model can be trained using different kernels, subsampling, max-pooling, and padding, along with fully connected layers, to maximize the amount of meaningful information in the training process. The trained model could then be used for relapse prediction from fMRI scans without any voxel selection scheme.

The authors declare that there are no conflicts of interest regarding the publication of this article.

This work is partially supported by ONR Grant “Gulf of Mexico Spring School (GMSS) Deep Learning Workshop.” In addition, the authors would like to thank Dr. Lauren Bottorf and Eitan Lees for their careful revision of the final version of the manuscript.