This paper proposes a simple yet effective approach for detecting activated voxels in fMRI data by exploiting the inherent sparsity of the BOLD signal in the temporal and spatial domains. In the time domain, the approach combines the General Linear Model (GLM) with a Least Absolute Deviation (LAD) based regression method regularized by the l0 pseudonorm to promote sparsity in the parameter vector of the model. In the spatial domain, detection of activated regions is based on thresholding the spatial map of estimated parameters associated with a particular stimulus. The threshold is calculated by exploiting the sparseness of the BOLD signal in the spatial domain, assuming a Laplacian distribution model. The proposed approach is validated using synthetic and real fMRI data. For synthetic data, results show that the proposed approach detects most activated voxels without any false activation. For real data, the method is evaluated through comparison with the SPM software. Results indicate that this approach can effectively find activated regions similar to those found by SPM, but using a much simpler method. This study may lead to the development of robust spatial approaches that further simplify the complexity of classical schemes.
1. Introduction
The medical imaging modality known as functional Magnetic Resonance Imaging (fMRI) using blood oxygen level-dependent (BOLD) contrast is a noninvasive technique widely accepted as a standard tool for localizing brain activity [1]. During the course of an fMRI study, a series of brain images is acquired by repeatedly scanning the subject’s brain while he/she is performing a set of tasks or is exposed to a particular stimulus. A statistical analysis is carried out to analyze the images and detect which voxels are activated by a particular stimulation or task.
Since its development in the early 1990s, a variety of univariate and multivariate methods for analyzing fMRI data have been proposed [2]. Among these, the most popular are the univariate approaches based on the General Linear Model (GLM) [1–3] and the multivariate approach based on independent component analysis (ICA) [2]. Traditionally, ordinary least squares (OLS) has been the primary approach to solving the inverse problem induced by the GLM, owing to its simplicity and its optimality under the maximum likelihood (ML) principle when the background noise is modelled as white Gaussian noise [4]. However, findings contradicting the Gaussian assumption have been reported by Hanson and Bly [5]. Furthermore, statistical tests with real fMRI data have revealed that the empirical distribution of fMRI data has heavier-than-Gaussian tails [5, 6], and Luo and Nichols [7] have found that the BOLD signal often contains outliers. Under a heavy-tailed distribution model, it is well known that OLS fails to produce reliable estimates of the GLM regression parameters. On the other hand, an important study conducted by Daubechies et al. [8] showed that the most critical factor for the success of ICA algorithms in determining neural activity is the sparsity of the components rather than their independence.
In addition to these drawbacks, a variety of research studies have pointed out the existence of a common underlying principle involved in sensory information processing referred to as “sparse coding” [9]. According to this principle, sensory information is represented by a relatively small number of simultaneously active neurons out of a large population [9]. The literature reports two notions of sparseness: (1) “lifetime sparseness” refers to the sporadic activity of a single neuron over time, going from near silence to a burst of spikes in response to only a small subset of stimuli (stimulus selectivity); and (2) “population sparseness” refers to the activation of a small fraction of the neurons in a population in a given time window in response to a single stimulus [10, 11]. Therefore, it is natural and well justified to explore sparse representation methods to describe fMRI signals of the brain.
In the last decade, there has been a growing interest in the use of sparse signal representation techniques for analyzing fMRI data based on the assumption that the components of each voxel’s fMRI signal are sparse and the neural integration of those components is linear. These studies include, among others, sparse representation of nonparametric components of partially linear models [12] and the sparseness of the activity related BOLD signal by using “activelets” [13]. Additionally, methods based on sparse dictionary learning for detecting activated regions in fMRI data have been proposed, including data-driven sparse GLM [14], fast incoherent dictionary learning (FIDL) [15], and sparse representation of whole-brain fMRI signals using an online dictionary learning approach [16]. In parallel, sparse models have been proposed to predict or decode task parameters from individual fMRI activity patterns by using methods such as the elastic net method [17], generalized sparse classifiers [18], or multivariate pattern analysis based on sparse representation models [19, 20].
In the context of sparse regression over the coefficients of the GLM, the solution of the regularized inverse problem induced by the GLM is obtained by using l1-regularized least squares (l1-LS) based methods [21]. Under this approach, the key assumption is that the underlying distribution of the BOLD signal is Gaussian, in contrast with the above-mentioned studies, where the authors argued that the BOLD activation noise follows a heavy-tailed gamma distribution [5] or a Rician distribution [22]. Besides, impulsive noise is common in fMRI time series [7]. Thus, there is a need to develop robust regression methods that, on the one hand, exploit the sparseness structure of the GLM regression parameters and, on the other, mitigate the effect of impulsive noise on the coefficient estimates.
Motivated by recent developments in sparse signal representation and the biological findings on “sparse coding” in the brain, in this paper we propose a simple yet effective approach based on the sparsity of the underlying BOLD signal that exploits both the temporal and the spatial sparse properties of fMRI images. This paper has two aims: the first is to explore the ability of a new robust regression method named l0-LAD (l0-regularized Least Absolute Deviation) to solve the inverse problem induced by the GLM. This method is suitable for applications where the underlying contamination follows a statistical model with heavier-than-Gaussian tails [23]. The second aim is to test the suitability of a Laplacian model in the spatial domain. More specifically, in the time domain, each time series related to a voxel is considered a linear combination of a few elements (column vectors/atoms) of a suitably designed dictionary of stimuli and confounds. The l0-LAD algorithm is used to find whether or not a stimulus is present in the observed signal and, if so, how much it contributes to signal formation. Subsequently, in the spatial domain, in order to determine the brain activation zones due to a particular stimulus, a statistical map is generated in which each voxel's value is the value of the estimated parameter associated with the stimulus of interest for that voxel. The activated voxels are obtained by thresholding the statistical map, where the threshold is determined by exploiting prior knowledge about the sparsity of activated regions, assuming that the spatial parameters follow a Laplacian distribution model.
Our rationale in the spatial domain is that the set of voxels activated in response to a given task is spatially sparse, in the sense that only a reduced number of the voxels that comprise the brain volume are activated. In fact, the behavior exhibited by different sets of spatial parameters associated with different stimuli of the in vivo PBAIC-2007 fMRI database [24] used in this work confirms what is expected; that is, for each set, a very large number of spatial parameters have very small values while only a reduced number have significant nonzero values. The problem here is to decide how large the spatial parameter of a voxel must be to declare it activated. In order to identify activated voxels, a statistical analysis (described in the Appendix) based on qualitative and quantitative tests over the spatial parameters of each set was conducted. The results of this analysis suggest the Laplacian distribution as a plausible model for the spatial parameters.
The assumption of a Laplacian distribution model for explaining the set of spatial parameters has been proposed [19] in the context of fMRI data analysis based on sparse representation under the framework of predictive modelling. Unlike our approach, this is a multivariate analysis where the prediction task is formulated as a regression problem for which the voxel time series are the predictive variables and the time series of a fixed stimulus task is the response variable. In this context, it seems acceptable to assume a Laplacian model considering that it is common to find in the literature the Laplacian model associated with sparse linear models leading to an l1-regularization term under the maximum a posteriori (MAP) principle [4, 25, 26].
A very preliminary version of this manuscript was presented at the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society; that version focused mainly on exploiting the sparsity of the BOLD signal in the time domain, whereas in the spatial domain the activation maps were generated by selecting the 300 most significant spatial parameters [27].
This paper is organized as follows. In Section 2, we briefly describe the GLM framework under the assumption that the parameter vector is known to be sparse. The proposed method for detecting active voxels by exploiting sparsity is then presented in Section 3. To assess the performance of the proposed methodology, we present in Section 4 numerical experiments with synthetic and real fMRI datasets, whose results are discussed in Section 5. Finally, some concluding remarks are drawn in Section 6, where we give suggestions for future research avenues.
2. General Linear Model With a Sparse Parameter Vector
The GLM for the observed response variable y_j at the jth voxel, j = 1, …, N, is given by

(1)  y_j = Xβ_j + ε_j,

where y_j ∈ R^M, with M the number of experimental scans; X ∈ R^{M×L} denotes the design matrix, which will be described shortly; β_j ∈ R^L represents the signal strength at the jth voxel, with L the number of basic parametric functions; and ε_j ∈ R^M is the noise vector at the jth voxel. This model represents the time series related to a specific voxel as a linear combination of the column vectors of the design matrix; in particular, under the assumption that the parameter vector β_j is sparse, only a few column vectors of X combine to form the observed data y_j. In the GLM representation, one is interested in finding the contribution of each column vector of X to the signal formation and, from there, deciding whether or not the jth voxel has been activated by a particular stimulus. This leads naturally to an L-dimensional inverse problem whose solution we are interested in finding. The most widely used approach for solving the inverse problem is the OLS method, which is optimal under the ML principle when the entries of the noise vector are assumed to be independent and identically distributed following a zero-mean Gaussian distribution with unknown variance σ². However, if one knows that the underlying contamination no longer follows a Gaussian noise model and is instead better characterized by a statistical model with heavier-than-Gaussian tails, a more robust method must be used to estimate the regression parameters [4]. Furthermore, as mentioned above, OLS does not exploit the fact that the parameter vector β_j is sparse.
In this work, the solution to the inverse problem (1) is addressed under the framework of an l0-constrained LAD method. That is, the solution to (1) is given by

(2)  β̂_j = argmin_{β_j} ‖y_j − Xβ_j‖_1  subject to  ‖β_j‖_0 ≤ D,

where ‖·‖_1 denotes the l1-norm, ‖·‖_0 denotes the l0 pseudonorm that counts the number of nonzero entries of its argument, and D ≪ L is the sparsity level of β_j, which is unknown. LAD-based methods are optimal under the ML principle when the underlying contamination follows a Laplacian distribution [23] and are, therefore, more robust than OLS. Furthermore, the l0-based constraint induces the desired sparsity in the solution of the inverse problem. A detailed description of the l0-LAD method is given in the next section.
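For intuition about the l1 data-fitting term in (2), consider an illustration with synthetic numbers (not part of the paper's method): with a single regressor, the LAD estimate is a weighted median of the ratios y_i/x_i with weights |x_i|, which gross outliers barely move, whereas the OLS estimate is dragged away from the true coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 100
x = rng.normal(size=M)
y = 2.0 * x + rng.laplace(scale=0.1, size=M)  # true coefficient: 2.0
y[:5] += 40.0 * np.sign(x[:5])                # inject 5 gross outliers

# OLS estimate (closed form for a single regressor)
beta_ols = (x @ y) / (x @ x)

# LAD estimate: weighted median of y_i/x_i with weights |x_i|,
# the exact minimizer of sum_i |y_i - beta * x_i|
ratios = y / x
order = np.argsort(ratios)
cum = np.cumsum(np.abs(x)[order])
beta_lad = ratios[order][np.searchsorted(cum, 0.5 * cum[-1])]
# beta_lad stays close to 2.0; beta_ols is biased by the outliers
```

This is exactly the robustness that motivates replacing the l2 data-fitting term with the l1 term in (2).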
2.1. The Design Matrix X
An important component of the GLM is the design matrix X, which in the context of sparse signal representation of fMRI is the dictionary of parametric waveforms, called atoms, in which the time series y_j admits a sparse representation. In general, the fMRI time series consists of the BOLD signal, noise, and confound components that arise due to hardware and physiological noise (subject's motion, respiration, and heartbeats) [1]. Therefore, it is common practice to incorporate into X as much information as possible, such that the model fits the data as closely as possible [3]. For this study, we propose to use a parametric dictionary composed as the union of two subdictionaries, D1 and D2. That is, X = [D1 ∣ D2], where the column vectors of D1 are the expected task-related BOLD responses of each stimulus involved in the fMRI experiment, given by convolving the stimulus with a model of the hemodynamic response function (HRF). Here the stimulus is assumed to be equivalent to the experimental paradigm, while the HRF is modelled using a canonical HRF [1, 3]. D2, on the other hand, contains a set of confounds modelled, as we will see later, by low-frequency sinusoidal signals.
3. Active Voxel Detection by Exploiting Sparsity

3.1. Parameter Estimation by Solving the l0-Regularized LAD
Given the linear model

(3)  y = Xβ + ε,

where the index labelling the voxel has been dropped to simplify notation, we want to find the explanatory variables that best fit the model under a certain error criterion. According to (2), we want to locate the column vectors of X and their contributions such that the data-fitting term ‖y − Xβ‖_1 reaches a minimum subject to the constraint that β has just a few nonzero entries. This inverse problem can be reformulated as an l0-regularized LAD regression problem [28]:

(4)  min_β ‖y − Xβ‖_1 + τ‖β‖_0,

where τ > 0 is the regularization parameter that balances the conflicting objectives of minimizing the data-fitting term while, at the same time, yielding a sparse solution β [4]. Solving this l0-regularized LAD problem is NP-hard owing to the sparsity constraint imposed by the l0 pseudonorm. In [23], an iterative algorithm was proposed to solve this optimization problem using a coordinate descent strategy under the framework of a continuation approach for selecting the regularization parameter τ. Following this approach, the solution of the l0-LAD regression problem is obtained by reducing the L-dimensional inverse problem (4) to L one-dimensional problems, supposing that all entries of the sparse vector β are known but one. Therefore, in order to estimate the nth unknown entry of β, which in turn is the contribution of the nth stimulus/confound to the signal formation, the entries β_j, j = 1, …, L, j ≠ n, are treated as known variables taking the values estimated in the previous iteration. Accordingly, the l0-LAD problem reduces to the one-dimensional minimization problem

(5)  β̂_n = argmin_{β_n} Σ_{i=1}^{M} |r_i^(n) − x_{in}β_n| + τ|β_n|_0 + b,

where b = τ Σ_{j=1, j≠n}^{L} |β_j|_0, with |β_j|_0 = 1 if β_j ≠ 0 and |β_j|_0 = 0 otherwise, and r_i^(n) denotes the ith entry of the nth residual vector, defined as r^(n) = y − Σ_{j=1, j≠n}^{L} β_j x_j, where x_j ∈ R^M denotes the jth explanatory variable of the design matrix X or, equivalently, the jth atom of the dictionary.
Note that r^(n) is the residual that remains after subtracting from the observed fMRI signal y the contributions of all explanatory variables (stimuli and confounds) except the nth one. It was shown in [23] that the solution of the optimization problem (5) can be thought of as a two-stage process: parameter estimation and basis selection. More precisely, in the first stage, an estimate of β_n is found by solving the unregularized version of problem (5), which leads to the weighted median operator as the underlying estimator of β_n. That is,

(6)  β̃_n = MEDIAN( |x_{in}| ◇ (r_i^(n)/x_{in}) ), i = 1, …, M,

where W_i ◇ v_i denotes the data replication operation, that is, the value v_i repeated W_i times.
The l0-regularization term in (5) leads to the second stage of the estimation process, where a hard-thresholding operator is applied to the estimate given by (6). That is,

(7)  β̂_n = β̃_n if ‖r^(n)‖_1 − ‖r^(n) − β̃_n x_n‖_1 > τ, and β̂_n = 0 otherwise.

From (7), it can be seen that the nth entry of β̂ is considered relevant and, hence, a nonzero element if τ < ‖r^(n)‖_1 − ‖r^(n) − β̃_n x_n‖_1 ≤ ‖β̃_n x_n‖_1. This latter inequality shows that the regularization parameter τ controls whether β̃_n is significant or not based on an estimate of its magnitude. In this work, τ is treated as an adaptable parameter whose value changes as the iterative algorithm progresses; that is, τ^(k+1) = ατ^(k), k = 1, …, K, where 0 < α < 1, τ^(1) = ‖X^T y‖_∞, and K is the number of iterations; for further details see [27].
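Taken together, the weighted median stage (6), the hard-thresholding stage (7), and the continuation rule for τ suggest the following compact implementation (a sketch of our reading of the algorithm in [23], with illustrative variable names, not the authors' reference code):

```python
import numpy as np

def weighted_median(weights, values):
    """MEDIAN(W_i ◇ v_i): the value minimizing sum_i W_i * |v - v_i|."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def l0_lad(X, y, alpha=0.95, n_iter=100):
    """Coordinate-descent l0-LAD with continuation on tau."""
    M, L = X.shape
    beta = np.zeros(L)
    tau = np.max(np.abs(X.T @ y))      # tau^(1) = ||X^T y||_inf
    for _ in range(n_iter):
        for n in range(L):
            x_n = X[:, n]
            # residual excluding the nth atom's current contribution
            r_n = y - X @ beta + beta[n] * x_n
            # stage 1: weighted median estimate, eq. (6)
            nz = x_n != 0
            b_tilde = weighted_median(np.abs(x_n[nz]), r_n[nz] / x_n[nz])
            # stage 2: hard threshold, eq. (7)
            gain = np.sum(np.abs(r_n)) - np.sum(np.abs(r_n - b_tilde * x_n))
            beta[n] = b_tilde if gain > tau else 0.0
        tau *= alpha                   # continuation: tau^(k+1) = alpha * tau^(k)
    return beta
```

On an exactly sparse, noise-free toy problem this recovers the support and the coefficients; the roles of α and K on real data are discussed in the experiments.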
3.2. Detecting Activation Zones by Exploiting a Spatial Sparsity-Induced Model
Once the regression parameters are determined following the approach described above, each voxel is represented by an L-dimensional vector β_j. We next exploit the sparseness of the BOLD signal in the spatial domain by assuming a sparsity-inducing model. Although there are many statistical models that induce sparsity, we use a Laplacian model for reasons that will be explained below. Let z_j, j = 1, …, N, be a statistical map related to a particular stimulus. That is,

(8)  z_j = c^T β̂_j,  j = 1, …, N,

where the vector c = [c_1, c_2, …, c_L]^T is defined according to the stimulus task of interest. For instance, if one is interested in evaluating the presence of the lth stimulus, all components of c are set to zero except the lth, which is set to one; in this case, z_j = β̂_{lj}, j = 1, …, N, leading to a statistical map related to each stimulus. Next, from the statistical map, it is possible to generate activation maps by exploiting the a priori knowledge about the sparseness of the activation zones, in the sense that only a reduced number of the voxels that comprise the cerebral volume are activated by a particular stimulus. Since the statistical map and the sequence {β̂_{lj}}, j = 1, …, N, are equivalent, this sequence is expected to be sparse; that is, only a few parameters are significant, intuitively those associated with the voxels located in the regions activated by the lth stimulus.
As was mentioned above, the reasons that drive us to use a Laplacian model as a sparsity-inducing model are based on the following considerations. (1) The a priori knowledge about the sparseness of the activated regions suggests that the probability that zj→0 is very high. (2) A statistical analysis, detailed in Appendix, based on graphical tools (histogram, distribution fitting, and QQ-plots) and quantitative values (Akaike information criterion (AIC)) computed over the spatial parameters zj associated with different stimuli belonging to the PBAIC-2007 fMRI database, has yielded statistical evidence that supports the Laplacian model assumed for z. Furthermore, this analysis suggests the feasibility of adopting the Exponential Power Distribution (EPD), a more general approach that includes the Laplacian and Gaussian models as particular cases. (3) Results of applying the normalp package [29], an R package containing a collection of tools related to EPD, to both synthetic and real spatial data, corroborate the Laplacian distribution as the model more closely related to the empirical distribution of data.
Assuming that the entries of z follow a Laplacian distribution, that is,

(9)  z_j ~ L(η_l, κ_l),  j = 1, …, N,

for a particular stimulus, where η_l and κ_l are the location and scale parameters for the lth stimulus, respectively, it is possible to estimate a threshold θ such that the jth voxel is declared activated by the lth stimulus if the corresponding statistic z_j defined in (8) exceeds the threshold value. Under this approach, the threshold θ is obtained from the Laplacian cumulative distribution function

(10)  F(x) = 1 − (1/2)e^{−(x−η)/κ} for x ≥ η, and F(x) = (1/2)e^{(x−η)/κ} for x < η,

as the solution of θ = F^{-1}(p) for a given probability p (so that the probability of false detection is 1 − p). That is,

(11)  θ = F^{-1}(p) = η − κ sgn(p − 0.5) ln(1 − 2|p − 0.5|),

where the parameters η and κ are estimated from the samples z_j by the maximum likelihood estimation (MLE) method [30]; that is,

(12)  η̂ = median(z_1, …, z_N),  κ̂ = (1/N) Σ_{j=1}^{N} |z_j − η̂|.
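Equations (11) and (12) translate directly into code; the following sketch fits the Laplacian by MLE and computes θ, using simulated Laplacian data in place of a real statistical map:

```python
import numpy as np

def laplacian_threshold(z, p):
    """Threshold theta = F^{-1}(p) for a Laplacian fitted to the map z, eqs. (11)-(12)."""
    eta = np.median(z)                   # location estimate (MLE), eq. (12)
    kappa = np.mean(np.abs(z - eta))     # scale estimate (MLE), eq. (12)
    q = p - 0.5
    return eta - kappa * np.sign(q) * np.log(1.0 - 2.0 * abs(q))

rng = np.random.default_rng(1)
z = rng.laplace(loc=0.0, scale=1.0, size=100_000)  # simulated statistical map
theta = laplacian_threshold(z, 0.975)
frac_above = np.mean(z > theta)          # fraction declared "activated"
```

For η = 0 and κ = 1, F^{-1}(0.975) = −ln(0.05) ≈ 3.0, so about 1 − p = 2.5% of the null samples exceed θ, which is the intended false detection rate.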
4. Experiments
Given the critical lack of ground truth, a major difficulty with fMRI research concerns the validation of any analysis of experimental in vivo data. For this reason, the proposed method is first evaluated using synthetic fMRI data designed for this purpose following a similar approach to that described in [31]; for further details, see Section 4.1. Secondly, the method is applied to the in vivo PBAIC-2007 fMRI database.
In all experiments the dictionary is designed according to the structure X = [D1 ∣ D2] described in Section 2.1. Although D1 and D2 vary with each experiment, there is a set of estimated BOLD signals corresponding to 13 different types of stimulus patterns, classified as required features in the PBAIC-2007 database, that is common to all designed dictionaries. These patterns, identified in the PBAIC-2007 database as Arousal, Dog, Faces, FruitsVegetables, Hits, Instructions, InteriorExterior, SearchFruit, SearchPeople, SearchWeapons, Valence, Velocity, and WeaponsTools, are assembled (in that order) into a dictionary denoted, hereafter, by Dc.
4.1. Synthetic Data Generation
The synthetic data were generated by blending a simulated activation into nonactivated time series extracted from the PBAIC-2007 fMRI database. Thus, knowledge of the ground truth is assured while the noise is representative of real data. Accordingly, the synthetic time series y_syn at any voxel is given by

(13)  y_syn = b·s + y_na,

where b > 0 is the activation strength (if b = 0 the voxel is nonactivated); s, described below, is the simulated activation time series; and y_na is a nonactivated time series. The activation time series s, shown in Figure 1(b), was obtained by convolving the canonical hemodynamic response function h used in SPM [32] with a stimulus boxcar function x consisting of 3 randomly placed stimulations lasting about 3 s each; see Figure 1(a). That is,

(14)  s(t) = (x ∗ h)(t),

where

(15)  h(t) = (t^{a1−1} b1^{a1} e^{−b1·t})/Γ(a1) − c·(t^{a2−1} b2^{a2} e^{−b2·t})/Γ(a2).

Here, Γ(z) denotes the gamma function, Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt. In (15), the parameter values are a1 = 2, a2 = 16, b1 = b2 = 1, and c = 1/16 [32].
(a) Synthetic boxcar signal x(t). (b) Synthetic activation s(t)=(x∗h)(t), where h denotes the canonical hemodynamic response function. (c) A synthetic time series ysyn generated by using (13) with b=4.
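The synthetic activation of (13)–(15) can be reproduced along the following lines (a sketch: the onsets, series length, and the Laplacian stand-in for a real nonactivated series y_na are our illustrative choices, not the paper's):

```python
import numpy as np
from math import gamma

def canonical_hrf(t, a1=2.0, a2=16.0, b1=1.0, b2=1.0, c=1.0/16):
    """Double-gamma HRF of eq. (15): an early peak minus a late undershoot."""
    return (t**(a1 - 1) * b1**a1 * np.exp(-b1 * t) / gamma(a1)
            - c * t**(a2 - 1) * b2**a2 * np.exp(-b2 * t) / gamma(a2))

TR = 1.75                                   # repetition time (s)
t = np.arange(0.0, 32.0, TR)                # HRF sampling grid (32 s support assumed)
h = canonical_hrf(t)

x = np.zeros(200)                           # stimulus boxcar over 200 scans
for onset in (20, 90, 150):                 # 3 illustrative stimulation onsets
    x[onset:onset + 2] = 1.0                # ~3 s blocks at TR = 1.75 s

s = np.convolve(x, h)[:len(x)]              # simulated activation s = x * h, eq. (14)

rng = np.random.default_rng(0)
y_na = rng.laplace(scale=0.1, size=len(x))  # stand-in for a real nonactivated series
y_syn = 4.0 * s + y_na                      # eq. (13) with activation strength b = 4
```

In the paper itself, y_na is drawn from real PBAIC-2007 time series rather than simulated noise.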
Likewise, in order to reduce the number of voxels to be analyzed, instead of trying to model the cerebral cortex, a synthetic volume was designed. This volume (depicted in Figure 2) consists of 4 slices of size 20×20 voxels, each with activated regions of different shapes (white-colored regions). There are altogether 108 activated voxels, representing 6.75% of the total number of voxels in the volume.
A synthetic volume consisting of 4 slices. White regions denote the set of voxels that have been activated by the stimulus s.
The synthetic fMRI database was generated from 32000 nonactivated time series in the following steps. First, the 32000 nonactivated time series were divided into 20 sets of 1600 time series each. Then, each set of 1600 time series was inserted into the synthetic volume. Finally, for each of the 108 selected voxels (white regions), the synthetic activation was added to their BOLD signals, while the remaining voxels retained their original real BOLD signals. Using 4 different levels of activation strength (b = 1, 2, 3, 4) in (13), 4 groups (labelled g1, g2, g3, g4) of synthetic data were generated; that is, the synthetic activation time series within each group have the same activation strength. In total, 80 datasets were analyzed. Note that since the background time series are directly extracted from in vivo fMRI data, the contrast between activation and background will not be exactly the same, even for time series within the same group. In order to have a quantitative measure of the difference between groups, the signal-to-noise ratio (SNR) was determined for each group (see Table 1) as the average of the SNR_i, i = 1, …, 20. Here the ith SNR is calculated as in [31]: SNR_i = std(b·s)/std(y_na), with std(·) being the standard deviation.
Table 1: Signal-to-noise ratio of the synthetic dataset.

Group    | 1        | 2        | 3       | 4
SNR      | 0.1419   | 0.2838   | 0.4256  | 0.5675
SNR (dB) | -16.9604 | -10.9398 | -7.4200 | -4.9207
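The SNR definition can be checked against Table 1; note that the decibel row corresponds to 20·log10(SNR). A small sketch (function names are ours):

```python
import numpy as np

def snr(b, s, y_na):
    """SNR = std(b*s)/std(y_na), as in [31]."""
    return np.std(b * s) / np.std(y_na)

def group_snr(b, s, y_na_list):
    """Group SNR: the average of SNR_i over the sets of a group."""
    return float(np.mean([snr(b, s, y_na) for y_na in y_na_list]))

def snr_db(snr_value):
    """Convert an amplitude-ratio SNR to decibels."""
    return 20.0 * np.log10(snr_value)
```

For instance, snr_db(0.1419) ≈ −16.96 dB, matching the group-1 entry of Table 1; doubling b adds ≈ 6.02 dB, which is exactly the spacing between groups 1 and 2.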
Regarding the design of the dictionary, in this experiment D1 = [s ∣ Dc], that is, the concatenation of the stimulus of interest s with the 13 stimuli described above, while D2 is a parametric dictionary whose atoms are the DCT functions

(16)  φ_{ω_k}(m) = a_k cos(ω_k m),  k = 0, …, K − 1,

where a_0 = √(1/M), a_k = √(2/M) for k ≥ 1, ω_k = kπ/M, m = (2n+1)/2 for n = 0, …, M − 1, and K = ⌊2·M·TR/128⌋ denotes the number of DCT basis functions. Here, TR is the repetition time, set to 1.75 s, and M is the number of fMRI scans.
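The DCT subdictionary of (16) can be generated as follows (a sketch; the square-root normalization, which makes the atoms an orthonormal DCT-II set, is assumed):

```python
import numpy as np

def dct_dictionary(M, TR=1.75):
    """Columns are the K DCT atoms of eq. (16), orthonormal over M scans."""
    K = int(2.0 * M * TR / 128.0)       # number of DCT basis functions
    n = np.arange(M)
    m = (2 * n + 1) / 2.0               # m = (2n + 1)/2
    D = np.empty((M, K))
    for k in range(K):
        a_k = np.sqrt(1.0 / M) if k == 0 else np.sqrt(2.0 / M)
        D[:, k] = a_k * np.cos(k * np.pi / M * m)
    return D

D2 = dct_dictionary(500)                # M = 500, TR = 1.75 s gives K = 13 atoms
```

The 128 s constant is the high-pass cutoff period, consistent with the SPM-style filtering used later.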
4.2. In Vivo fMRI Data
The PBAIC-2007 fMRI dataset was collected with a Siemens 3T Allegra scanner using an EpiBOLD sequence, with the imaging parameters TR and TE set to 1.75 s and 25 ms, respectively. Subjects were engaged in a Virtual Reality task during which they had to perform a number of tasks in a hypothetical neighborhood. The tasks included, among others, taking pictures of neighbors with piercings, gathering specific objects (e.g., fruits, weapons), and avoiding a growling dog (for a more detailed description of the experiment, see [33]). Three subjects' data were available in the competition (labelled subject 1, subject 13, and subject 14). Each subject's data consisted of 3 runs with a duration of approximately 20 minutes each (704 volumes per run). Each run also included 24 time series of features describing the subjects' experiences over the 704 TRs, which had been convolved with the double-gamma hemodynamic response function (HRF) produced by the SPM software [32]. In this study, only the 13 features (tasks) classified as required in [33] were considered. For the analysis, fixation periods (the time when the virtual world is off) were excluded from the original dataset, leaving a total of 500 volumes to analyze in each run. Each volume contains 64×64×34 voxels with a voxel size of 3.2×3.2×3.5 mm³. All experiments described in this section are applied on a voxel-by-voxel basis to preprocessed data belonging to subjects 1, 13, and 14. The preprocessing tasks include time and motion correction as well as detrending; further details are described in [24].
Given the acceptance of the SPM software in practice [2], in this work we use the outcome yielded by SPM as a benchmark for comparisons, in order to know whether the proposed method yields reliable results. In order to ensure that the assumptions of the random field theory used by SPM hold [3], an additional preprocessing step is performed on the fMRI data. To be more precise, a spatial filtering operation with a 3D Gaussian kernel is implemented as in [19]. During spatial smoothing, SPM generates a binary 3D array m, which masks out the nonbrain voxels; that is, m(i,j,k) = 1 if the voxel located at position (i,j,k) is within the brain, and m(i,j,k) = 0 otherwise. The analysis is performed on the set of voxels inside the brain according to the SPM mask. Based on the structure of the dictionary, two experiments are considered.
4.2.1. Experiment 1
In this experiment the dictionary is designed following the framework used by SPM [32]. That is, X ∈ R^{500×14} is constructed by combining the 13 convolved stimulus functions assembled in Dc and the all-one column vector 1 of size 500. Thus, the designed dictionary models the whole-brain activity [3]. Furthermore, to remove confounds, SPM applies as part of the analysis a high-pass filter based on the DCT basis set described in (16) to both the data and the design matrix; therefore, in order to have comparable experimental conditions, this high-pass filter is also applied to y_j and X before estimating the regression model with the l0-LAD method.
4.2.2. Experiment 2
In the second experiment, predictors that model confounds are incorporated into the dictionary. Specifically, the 13 DCT basis functions defined by (16) for high-pass filtering purposes are incorporated into X, leading to the dictionary X = [Dc ∣ D2] ∈ R^{500×26}. In this case, no temporal smoothing is applied to the data or to the designed dictionary. The model parameters are then estimated using the iterative l0-LAD regression algorithm, and the activation maps are obtained by thresholding the statistical map according to the approach described above.
5. Results and Discussion

5.1. Synthetic Data
In order to identify the voxels activated by the stimulus s, the synthetic datasets described above were analyzed with the proposed method. All results in this experiment were obtained by choosing p = 0.975 in (11), so that the false alarm probability is 2.5% (equivalent to 2.7 voxels).
Given the knowledge of the ground truth, the proposed method is evaluated by comparing the set of voxels detected as activated against the set of true activations. From this comparison, the numbers of true detections, false alarms, and missed activated voxels are determined. Figure 3(a) shows an illustrative example of the activation maps obtained for set 20; activated voxels in all groups are shown overlaid on the ground truth, colored in gray. From this information, it is clear that the proposed method properly detects the set of voxels activated by the stimulus s when b ≥ 2 (i.e., SNR ≥ 0.2838), keeping the numbers of false alarms and missed activated voxels below the established error margin. In fact, similar results are obtained for the remaining sets, as can be noted from Figure 3(b), which shows, for each group, the number of false alarms (top) and missed detections (bottom) as a function of the set number. It is important to point out that the averages of false alarms and missed detections for groups 2 to 4 are close to or below the error percentage; see Table 2. Regarding group 1, the numbers of false alarms and missed activated voxels exceed the expected rate. This behavior, typical of all sets of group 1 as can be seen in Figure 3(b), is caused by the high level of noise present in the signal (Table 1), a consequence of the low activation energy of this group. That is, the activation level b = 1 keeps the signal below the noise level, making activation detection more difficult.
Table 2: Average numbers of activated voxels, true detections, false alarms, and missed activations for the synthetic database.

Group | Activated voxels | True detections | False alarms | Missed activated voxels
g1    | 72.40            | 67.45           | 4.95         | 40.55
g2    | 102.80           | 102.25          | 0.55         | 5.75
g3    | 106.25           | 106.25          | 0            | 1.75
g4    | 107.20           | 107.20          | 0            | 0.8
(a) Activation maps (left to right: slice 1 to slice 4; top to bottom: group 1 to group 4) for stimulus s, set 20. (b) Number of false positives (top) and false negatives (bottom) for sets 1 to 20 and groups 1 to 4.
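The per-set counts summarized in Table 2 follow from elementwise comparison of the detected activation map with the ground truth; a minimal sketch:

```python
import numpy as np

def detection_counts(detected, truth):
    """True detections, false alarms, and missed activations
    from boolean activation maps of equal shape."""
    tp = int(np.sum(detected & truth))       # true detections
    fa = int(np.sum(detected & ~truth))      # false alarms
    miss = int(np.sum(~detected & truth))    # missed activated voxels
    return tp, fa, miss
```

Averaging these counts over the 20 sets of a group yields the rows of Table 2.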
5.2. In Vivo fMRI Data
To illustrate the performance of the proposed method in detecting activation, we choose the data of subject 14, run 1, and two representative tasks: Instructions and Faces. The selection of these particular tasks is based on a priori knowledge of the anatomical and functional localization of the auditory and visual cortices. The parameters α and K used by the l0-LAD method in the estimation and basis selection stages are selected following the recommendations given in [23]; accordingly, in both experiments, α and K are set to 0.95 and 100, respectively. On the other hand, so that the total numbers of voxels classified as activated by the SPM software and by the proposed method are as close as possible, the activation maps generated with SPM are obtained by setting the family-wise error (FWE) rate to 0.05 [34], while for the proposed method the value of p in (11) is set to 0.975 for the Instructions task (in all experiments) and to 0.998 and 0.999 for experiments 1 and 2, respectively, for the Faces task.
5.2.1. Results of the First Experiment
Figures 4 and 5 show the activation maps for the Instructions and Faces tasks, respectively, obtained with (a) the proposed method with reduced dictionary (14 atoms) and (b) the SPM software.
Activation maps (left to right, top to bottom: slices 11 to 22) for the Instructions task obtained with (a) proposed method with reduced dictionary (14 atoms) and (b) SPM software with FWE rate.
Activation maps (left to right, top to bottom: slices 9 to 19) for the Faces task obtained with (a) proposed method with reduced dictionary (14 atoms) and (b) SPM software with FWE rate.
Under these experimental conditions, for the Instructions task, the proposed method activates 676 voxels contained in 26 slices (2 to 26 and 28) and SPM activates 708 voxels contained in 22 slices (2 to 23). Although the activated slices are not exactly the same, the number of matching activated slices is high, namely, 22 (slices 2 to 23), for a matching percentage of 84.62% at the slice level. At the voxel level, however, the matching percentage is 57.99%. It can be seen in Figure 4 that the activated regions yielded by both algorithms are similar in each of the matching activated slices, although the proposed method tends to exhibit more isolated voxels than SPM, which seems to promote clustering. Despite these differences, it is clear that the activated regions yielded by the proposed method are consistent with those achieved by SPM. Furthermore, as expected for this kind of stimulus, the activated areas detected by both methods appear to lie in the auditory cortex. For the Faces task (Figure 5), SPM activates 91 voxels belonging to 11 slices (9 to 19), whereas the proposed method activates 95 voxels contained in 17 slices (4, 6, 9–19, 21, 22, 24, and 28), so the percentage of coincidence at the slice level is 64.71%. Although only 50 of the voxels classified as activated by both methods are localized at the same position, both the activated regions and the activation patterns are quite similar. This is because, except for a very small number of voxels, the mismatched voxels classified as activated by the proposed method are located sufficiently close to voxels activated by SPM. More importantly, the areas detected as activated by both methods appear to lie in the visual cortex, as expected.
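The slice-level matching percentage reported above can be reproduced from the slice lists. The denominator convention (the larger of the two slice counts) is inferred from the reported numbers, not stated explicitly in the text.

```python
# Slice indices (1-based) reported above for the Instructions task
slices_proposed = set(range(2, 27)) | {28}  # slices 2 to 26 and 28 -> 26 slices
slices_spm = set(range(2, 24))              # slices 2 to 23 -> 22 slices

matching = slices_proposed & slices_spm     # slices activated by both methods
# Matching percentage at the slice level, relative to the method
# activating more slices (assumed convention; it reproduces 84.62%)
mpsl = 100.0 * len(matching) / max(len(slices_proposed), len(slices_spm))
print(len(matching), round(mpsl, 2))        # 22 84.62
```

The same convention reproduces the Faces figure: 11 matching slices out of max(17, 11) gives 64.71%.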
5.2.2. Results of the Second Experiment
Figures 6 and 7 show activation maps for the Instructions and Faces tasks, respectively. Under the Instructions experimental condition, the proposed approach activates 717 voxels contained in 22 slices (2 to 4, 7 to 23, 26, and 27), whereas the SPM software activates slices 2 to 23, coinciding in 17 slices activated simultaneously by both methods, for a matching percentage at the slice level of 77.27%. In this case, 453 activated voxels agree in spatial localization with those of the SPM maps, for a matching percentage at the voxel level of 63.18%. For the Faces task, the proposed method activates 120 voxels contained in 17 slices (8 to 24), while SPM activates 91 voxels contained in 11 slices (9 to 19). Therefore, the percentage of matching slices for this condition is 64.71%, whereas the number of voxels activated by both methods at the same spatial localization is 57, for a matching percentage at the voxel level of 47.5%. An interesting issue to evaluate in this experiment is the effect of incorporating the DCT basis set into the dictionary instead of performing a prior temporal filtering of the data. Although the dictionary and the design matrix are no longer the same, from Figures 6 and 7 it is clear that the activated regions and the activation patterns look quite similar for both methods and, as expected, seem to be localized in the auditory and visual cortices, respectively. More interestingly, using the extended dictionary produces a "filling" effect in the activation maps, caused by an increase in the number of activated voxels relative to the reduced dictionary. For example, for the Instructions task (Table 3), 41 additional voxels are declared activated while the number of activated slices decreases to 22. This new group of activated voxels tends to fill empty spaces near the activated regions in the maps obtained with the reduced dictionary.
A similar behavior is exhibited for the Faces task. This effect is highlighted in Figures 8 and 9, which compare the activation maps obtained with the proposed method using the reduced dictionary (a) and the extended dictionary (b) for the Instructions and Faces tasks, respectively.
Table 3: Results of experiments 1 and 2, indicating the number of activated voxels (av), the number of activated slices (as), and the matching percentages at the voxel level (mpvl) and at the slice level (mpsl) with respect to the SPM software.

                   SPM          Reduced dictionary               Extended dictionary
Stimulus       av     as      av    as    mpvl     mpsl       av    as    mpvl     mpsl
Instructions   708    22      676   26    57.99%   84.62%     717   22    63.18%   77.27%
Faces          91     11      95    17    54.96%   64.71%     120   17    47.50%   64.71%
Activation maps (left to right, top to bottom: slices 11 to 22) for the Instructions task generated by (a) the proposed method with extended dictionary (26 atoms), and (b) SPM software with FWE rate.
Activation maps (left to right, top to bottom: slices 9 to 19) for the Faces task generated by (a) the proposed method with extended dictionary and (b) SPM software with FWE rate.
Activation maps for the Instructions task generated with the proposed method by using the reduced dictionary (a) and extended dictionary (b). The activated regions contained in the ellipses show the effect of “filling” observed when the parameter estimation is performed using the extended dictionary.
Activation maps for the Faces task generated with the proposed method by using the reduced dictionary (a) and extended dictionary (b). The activated regions contained in the ellipses show the effect of “filling” observed when the parameter estimation is performed by using the extended dictionary.
6. Conclusions and Future Work
In this paper, a new method for detecting activation in fMRI data that exploits the sparseness of the BOLD signal in both the time and spatial domains is presented. By adopting a sparse approach in both domains, two objectives are achieved. First, in the time domain, the l0 pseudonorm promotes the concentration of the stimulus-related energy of an fMRI signal in a reduced set of model coefficients. Thus, during parameter estimation, a preselection of candidate voxels is performed by identifying those whose relationship with a selected stimulus is significant, in the sense that if a voxel is activated by a particular stimulus, its regression coefficient is significantly high. Second, in the spatial domain, adopting a Laplacian model makes it possible to determine the threshold parameter that decides how "significant" the magnitude of the spatial parameter must be for the voxel to be considered activated. The proposed approach is validated in two ways: first, using synthetic data where the ground truth is known in advance, and second, by comparing the activation maps with those generated by the SPM software using real fMRI data. For synthetic data, the results demonstrate that our approach is able to identify most activated voxels without any false activation. For real data, the activated regions detected by our approach are similar to those yielded by SPM. The results show the efficacy of our method and suggest that incorporating prior knowledge about spatial sparsity into spatial classification can significantly reduce the burdensome posterior spatial analysis needed when sparsity is ignored. Although the proposed method does not promote clustering, the activated regions exhibit mostly connected patterns. However, given that fMRI data are spatially correlated, an approach that jointly exploits sparsity and induces clustering is under consideration as future work.
Appendix
Statistical Analysis of Spatial Parameters
As mentioned in Section 3.2, the selection of a Laplacian model for the estimated spatial parameters is supported by a set of statistical tests on the parameters output by the l0-LAD based regression approach, which, in turn, are associated with 13 stimuli belonging to the PBAIC 2007 database. The statistical tests consisted of a visual analysis through histograms, fitted distributions, and QQ-plots. Furthermore, a quantitative analysis based on the AIC is performed for each stimulus.
Data analysis through histograms strongly suggests the Laplacian distribution as a plausible model, mainly due to the concentration of mass around zero and the heaviness of the tails of the empirical distribution. Nevertheless, two probability distribution models are fitted to the data: the Gaussian model, which has been the standard model assumed for spatial analysis of fMRI data, and the Laplacian model. For both models, maximum likelihood (ML) estimators are used to compute the respective distribution parameters (i.e., the location and dispersion parameters) from the spatial parameters zl=[β^l1,β^l2,…,β^lN] associated with the lth stimulus, l=1,…,13. Figure 10 shows the logarithm of the estimated probability distribution along with the Laplacian and Gaussian distributions for the stimuli Dog, Faces, Instructions, and WeaponsTools. Furthermore, we focus our attention on the heaviness of the distribution tails, since the selection of the threshold parameter θ depends mainly on the tail behavior of each candidate model. As can be seen in Figure 10, the estimated distribution generally exhibits heavier-than-Gaussian tails. Moreover, comparing the tails of the empirical distribution with those of the Laplacian distribution, there seems to be a good match for all stimuli. These findings are consistent with those obtained through QQ-plots. In particular, for the 4 stimuli considered above, all QQ-plots of the Gaussian distribution versus the estimated distribution (see Figure 11) show that the right side of the plot lies above the straight line and the left side lies below it. Departures from this straight line indicate departures from normality, and the shape of the normal plot indicates that the estimated distribution has heavier tails than the Gaussian model.
Statistical behavior of the spatial parameter set β^lj, j=1,…,N, for the stimuli: (a) Dog, (b) Faces, (c) Instructions, and (d) WeaponsTools. The solid line is the probability density estimated from the data, the dashed line corresponds to the Laplacian density, and the dash-dotted line corresponds to the Gaussian density.
QQ-plots for Gaussian and estimated distributions of stimuli: (a) Dog, (b) Faces, (c) Instructions, and (d) WeaponsTools.
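As a numerical complement to the histogram and QQ-plot inspection, excess kurtosis gives a quick tail check (0 for a Gaussian, 3 for a Laplacian); the synthetic draw below is illustrative and stands in for a spatial parameter set, it is not the paper's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in for a heavy-tailed, zero-centred spatial parameter map
z = stats.laplace.rvs(scale=0.5, size=50000, random_state=rng)

# Excess kurtosis is 0 for a Gaussian and 3 for a Laplacian; a clearly
# positive sample value signals heavier-than-Gaussian tails
k = stats.kurtosis(z)
print(k)
```

For Laplacian-distributed samples of this size, the sample excess kurtosis lands near 3, well above the Gaussian reference of 0.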
Since graphical tools lack rigor, a quantitative measure of the distance between the estimated distribution and the candidate models, based on the AIC, is used. The AIC is defined by AICm = -2 ln(Lm(π^ ∣ z1, z2, …, zN)) + 2K, where K is the number of parameters in the model, Lm is the likelihood function of model m ∈ {L, N}, and π^ is the ML estimate of the parameters of the corresponding model [35]. Accordingly, the Laplacian model fits the data better if AICL - AICN < 0. From Figure 12, it is clear that the Laplacian model fits the data for all stimuli except 7 and 12, for which, according to the AIC, the nearest model is the Gaussian one.
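The AIC comparison can be sketched as follows; the Laplace-distributed draw is an illustrative stand-in for a spatial parameter set, and K = 2 counts the location and dispersion parameters of each candidate model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Illustrative stand-in for the spatial parameters of one stimulus
z = stats.laplace.rvs(scale=0.4, size=10000, random_state=rng)

def aic(logpdf, params, data, k=2):
    # AIC_m = -2 ln L_m + 2K, with K = 2 (location and dispersion)
    return -2.0 * logpdf(data, *params).sum() + 2 * k

aic_L = aic(stats.laplace.logpdf, stats.laplace.fit(z), z)  # Laplacian model
aic_N = aic(stats.norm.logpdf, stats.norm.fit(z), z)        # Gaussian model
print(aic_L - aic_N < 0)  # True: the Laplacian model fits these data better
```

A negative difference AICL - AICN selects the Laplacian model, which is the outcome for 11 of the 13 stimuli in Figure 12.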
Sign of the difference AICL - AICN for the 13 stimuli of the PBAIC 2007 data.
The previous analysis shows that the tails of the estimated distribution are generally heavier than those of a Gaussian distribution, but not as heavy as those of a Laplacian distribution. This finding suggests the feasibility of adopting a more general statistical model: the generalized Gaussian (GG) distribution, which contains the Gaussian and Laplacian distributions as special cases. The GG density has the following form [29]:
(A.1) GG(x; μ, σ, ρ) = (1 / (2σρ^(1/ρ)Γ(1 + 1/ρ))) e^(-|x-μ|^ρ/(ρσ^ρ)),
where Γ(·) denotes the gamma function and μ, σ, and ρ > 0 denote, respectively, the location, scale, and shape parameters. The shape parameter determines the form of the distribution and is linked to the thickness of the tails [29]. An interesting aspect of the GG model is that the shape parameter ρ can be determined from the data, which allows us to estimate the value of ρ that gives the highest probability of producing the observed data. In order to estimate ρ, the normalp package [29] is used to analyze the estimated spatial parameters of both the synthetic and the in vivo data. For both databases, the parameters σ and ρ are determined by using the function estimatep, assuming the median as the location parameter. After that, the threshold θ is estimated by using the function qnormp with probability pr = 0.025, as in our experiments.
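The estimatep/qnormp workflow belongs to the R normalp package. As a hedged Python analogue, SciPy's gennorm implements a generalized Gaussian whose shape parameter beta plays the role of ρ; its scale parameterization differs from (A.1) by the exact factor scale = σρ^(1/ρ). The draw below is illustrative, not the paper's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Laplacian samples correspond to shape rho = 1 in the GG family
z = stats.laplace.rvs(scale=0.3, size=5000, random_state=rng)

# ML fit of the generalized Gaussian; beta is the shape (rho in (A.1)),
# and the fitted scale relates to sigma by scale = sigma * rho**(1/rho)
beta, loc, scale = stats.gennorm.fit(z)

# Upper-tail quantile used as the threshold, analogous to calling
# qnormp with a tail probability of 0.025
theta = stats.gennorm.ppf(0.975, beta, loc=loc, scale=scale)
print(round(beta, 2), theta)
```

For Laplacian data, the fitted shape lands near 1, mirroring the ρ = 1 results reported in Table 4 for most stimuli.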
In the case of the synthetic database, the shape parameter estimated by the function estimatep is ρ=1 for each set of spatial parameters associated with the synthetic stimulus s. According to these results, the most suitable distribution model for each set of synthetic spatial parameters is the Laplacian model. These results are consistent with those inferred from the histograms, the estimated distribution, and the AIC for the synthetic database. Indeed, using ρ=1 in the qnormp function, the estimated value of θ coincides with that calculated with the closed-form expression (11). With regard to the in vivo data, the results of analyzing the estimated spatial parameters of the 13 stimuli described above with the normalp package are presented in Table 4. As can be seen in this table, in most cases the shape parameter is equal to one, confirming, once again, the appropriateness of the Laplacian model. Note that for stimuli 6 and 12, the shape parameters are 1.3637 and 1.7865, respectively, which, by the closeness of the distributions, may be regarded as Laplacian and Gaussian, respectively. These findings are consistent with the previous analysis, even though stimulus 7 is classified as Gaussian by the AIC criterion. This apparent discrepancy in the classification of stimulus 7 can be explained by the estimated values of AICL and AICN, which are close together.
Table 4: Results of analyzing the in vivo data with the normalp package, indicating the shape parameter ρ and the threshold θ estimated with (i) the normalp package and (ii) the closed-form expression (11).

Stimulus            ρ (normalp)    θ (normalp)    θ (expression (11))
Arousal             1              0.00979738     0.00979711
Dog                 1              0.1483275      0.14082884
Faces               1              1.24216047     1.24212600
FruitsVegetables    1              0.1328233      0.13281962
Hits                1              0.39250583     0.39249493
Instructions        1.3637         0.35744665     0.41064940
InteriorExterior    1              0.13646107     0.13645728
SearchFruit         1              0.04498026     0.04497901
SearchPeople        1              0.01944126     0.01944072
SearchWeapons       1              0.03387324     0.03387230
Valence             1              0.04011155     0.04011044
Velocity            1.7865         0.1221627      0.18050036
WeaponsTools        1              0.40075143     0.40074030
Disclosure
A partial version of this manuscript was presented as a plenary at the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society in Boston, in 2011 [27].
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
The authors would like to acknowledge ECOS-NORD-FONACIT under Grant PI-20100000299, the Council of Scientific, Humanistic, Technological and Artistic Development of the Universidad de los Andes (CDCHTA-ULA) under Project I-1336-12-02-B, and the Universidad Nacional Experimental del Táchira (UNET) for their financial support.
References

1. M. A. Lindquist, "The statistical analysis of fMRI data," vol. 23, no. 4, pp. 439-464, 2008. doi:10.1214/09-STS282.
2. G. E. Sarty, Computing Brain Activity Maps from fMRI Time-Series Images, 2006. doi:10.1017/CBO9780511541704.
3. W. D. Penny and K. J. Friston, "Advanced image processing in magnetic resonance imaging," Taylor and Francis Group, vol. 27, pp. 541-563, 2005. doi:10.1201/9781420028669.ch17.
4. E. G. Larsson and Y. Selén, "Linear regression with a sparse parameter vector," vol. 55, no. 2, pp. 451-460, 2007. doi:10.1109/TSP.2006.887109.
5. S. J. Hanson and B. M. Bly, "The distribution of BOLD susceptibility effects in the brain is non-Gaussian," vol. 12, no. 9, pp. 1971-1977, 2001. doi:10.1097/00001756-200107030-00039.
6. C.-C. Chen, C. W. Tyler, and H. A. Baseler, "Statistical properties of BOLD magnetic resonance activity in the human brain," vol. 20, no. 2, pp. 1096-1109, 2003. doi:10.1016/S1053-8119(03)00358-6.
7. W.-L. Luo and T. E. Nichols, "Diagnosis and exploration of massively univariate neuroimaging models," vol. 19, no. 3, pp. 1014-1032, 2003. doi:10.1016/S1053-8119(03)00149-6.
8. I. Daubechies, E. Roussos, S. Takerkart, M. Benharrosh, C. Golden, K. D'Ardenne, W. Richter, J. D. Cohen, and J. Haxby, "Independent component analysis for brain fMRI does not select for independence," vol. 106, no. 26, pp. 10415-10422, 2009. doi:10.1073/pnas.0903525106.
9. B. A. Olshausen and D. J. Field, "Sparse coding of sensory inputs," vol. 14, no. 4, pp. 481-487, 2004. doi:10.1016/j.conb.2004.07.007.
10. J. Wolfe, A. R. Houweling, and M. Brecht, "Sparse and powerful cortical spikes," vol. 20, no. 3, pp. 306-312, 2010. doi:10.1016/j.conb.2010.03.006.
11. R. Quian Quiroga and G. Kreiman, "Measuring sparseness in the brain: comment on Bowers (2009)," vol. 117, no. 1, pp. 291-297, 2010. doi:10.1037/a0016917.
12. J. M. Fadili and E. Bullmore, "Penalized partially linear models using sparse representations with an application to fMRI time series," vol. 53, no. 9, pp. 3436-3448, 2005. doi:10.1109/TSP.2005.853207.
13. I. Khalidov, J. Fadili, F. Lazeyras, D. Van De Ville, and M. Unser, "Activelets: wavelets for sparse representation of hemodynamic responses," vol. 91, no. 12, pp. 2810-2821, 2011. doi:10.1016/j.sigpro.2011.03.008.
14. K. Lee, S. Tak, and J. C. Ye, "A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion," vol. 30, no. 5, pp. 1076-1089, 2011. doi:10.1109/TMI.2010.2097275.
15. V. Abolghasemi, S. Ferdowsi, and S. Sanei, "Fast and incoherent dictionary learning algorithms with application to fMRI," vol. 9, no. 1, pp. 147-158, 2015. doi:10.1007/s11760-013-0429-2.
16. J. Lv, X. Jiang, X. Li, D. Zhu, H. Chen, T. Zhang, S. Zhang, X. Hu, J. Han, H. Huang, J. Zhang, L. Guo, and T. Liu, "Sparse representation of whole-brain fMRI signals for identification of functional networks," vol. 20, no. 1, pp. 112-134, 2015. doi:10.1016/j.media.2014.10.011.
17. M. K. Carroll, G. A. Cecchi, I. Rish, R. Garg, and A. R. Rao, "Prediction and interpretation of distributed neural activity with sparse models," vol. 44, no. 1, pp. 112-122, 2009. doi:10.1016/j.neuroimage.2008.08.020.
18. B. Ng, A. Vahdat, G. Hamarneh, and R. Abugharbieh, "Generalized sparse classifiers for decoding cognitive states in fMRI," vol. 6357, pp. 108-115, 2010. doi:10.1007/978-3-642-15948-0_14.
19. Y. Li, Z. Yu, Z. Gu, P. Namburi, C. Guan, and J. Feng, "Voxel selection in fMRI data analysis based on sparse representation," vol. 56, no. 10, pp. 2439-2451, 2009. doi:10.1109/TBME.2009.2025866.
20. Y. Li, J. Long, L. He, H. Lu, Z. Gu, and P. Sun, "A sparse representation-based algorithm for pattern localization in brain imaging data analysis," vol. 7, no. 12, article e50332, 2012. doi:10.1371/journal.pone.0050332.
21. S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "An interior-point method for large-scale l1-regularized least squares," vol. 1, no. 4, pp. 606-617, 2007.
22. A. J. den Dekker and J. Sijbers, "Implications of the Rician distribution for fMRI generalized likelihood ratio tests," vol. 23, no. 9, pp. 953-959, 2005. doi:10.1016/j.mri.2005.07.008.
23. J. L. Paredes and G. R. Arce, "Compressive sensing signal reconstruction by weighted median regression estimates," vol. 59, no. 6, pp. 2585-2601, 2011. doi:10.1109/TSP.2011.2125958.
24. Pittsburgh-EBC-Group, "PBAIC homepage," accessed March 2013.
25. R. Tibshirani, "Regression shrinkage and selection via the lasso," vol. 58, no. 1, pp. 267-288, 1996.
26. S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," vol. 56, no. 6, pp. 2346-2356, 2008. doi:10.1109/TSP.2007.914345.
27. B. Guillen, J. L. Paredes, and R. Medina, "A sparse based approach for detecting activations in fMRI," in Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2011), USA, September 2011, pp. 7816-7819. doi:10.1109/IEMBS.2011.6091926.
28. J. A. Tropp and S. J. Wright, "Computational methods for sparse solution of linear inverse problems," vol. 98, no. 6, pp. 948-958, 2010. doi:10.1109/JPROC.2010.2044010.
29. A. Mineo and M. Ruggieri, "A software tool for the exponential power distribution: the normalp package," vol. 12, no. 4, pp. 1-24, 2005.
30. P. Puig and M. A. Stephens, "Tests of fit for the Laplace distribution, with applications," vol. 42, no. 4, pp. 417-424, 2000. doi:10.1080/00401706.2000.10485715.
31. F. G. Meyer and X. Shen, "Classification of fMRI time series in a low-dimensional subspace with a spatial prior," vol. 27, no. 1, pp. 87-98, 2008. doi:10.1109/TMI.2007.903251.
32. Wellcome Trust Centre for Neuroimaging, "Statistical Parametric Mapping," http://www.fil.ion.ucl.ac.uk/spm/, accessed March 2015.
33. Pittsburgh-EBC-Group, "2007 Guide Book," accessed March 2013.
34. V. Litvak, J. Mattout, S. Kiebel, C. Phillips, R. Henson, J. Kilner, G. Barnes, R. Oostenveld, J. Daunizeau, G. Flandin, W. Penny, and K. Friston, "EEG and MEG data analysis in SPM8," article 852961, 2011. doi:10.1155/2011/852961.
35. H. Bozdogan, "Model selection and Akaike's Information Criterion (AIC): the general theory and its analytical extensions," vol. 52, no. 3, pp. 345-370, 1987. doi:10.1007/BF02294361.