BOLD Noise Assumptions in fMRI

This paper discusses the assumption of Gaussian noise in the blood-oxygenation-dependent (BOLD) contrast for functional MRI (fMRI). In principle, magnitudes in MRI images follow a Rice distribution. We start by reviewing differences between Rician and Gaussian noise. An analytic expression is derived for the null (resting-state) distribution of the difference between two Rician distributed images. This distribution is shown to be symmetric, and an exact expression for its standard deviation is derived. This distribution can be well approximated by a Gaussian, with very high precision for high SNR, and high precision for lower SNR. Tests on simulated and real MR images show that subtracting the time-series mean in fMRI yields asymmetrically distributed temporal noise. Subtracting a resting-state time series from the first results in symmetric and nearly Gaussian noise. This has important consequences for fMRI analyses using standard statistical tests.


INTRODUCTION
Functional magnetic resonance imaging (fMRI) measures activity in different areas of the brain under different experimental conditions (e.g., active, rest). In the medical imaging literature, magnitudes in MR images are assumed to follow a Rice distribution [1][2][3][4][5][6][7], first studied by Rice [8, pages 100-103]. Most statistical analyses of fMRI images test the difference between experimental conditions against a null distribution, which applies when no task is performed. Parametric statistical fMRI analysis often assumes Gaussian noise [9][10][11][12][13][14], but findings contradicting this assumption have already been reported by Hanson and Bly [15], and tests for the distribution of the residual (noise) signal in fMRI data sets have been developed [16].
In this paper, we examine the properties of Rician noise, and the distribution of resting state images that are made by pairwise subtraction of MR images. Most standard tests, such as the t-test, F-test, and the z-test, rely on Gaussian distributed noise. Petersson et al. [17] argue that with Gaussian spatial smoothing, many degrees of freedom, and the multivariate central limit theorem, these tests are valid in functional neuroimaging, but they warn that low-count PET data show departures from normality. Similar effects can be seen in functional MR images. The Rician probability density function is very asymmetric if the signal is weak compared to the noise, so for low signal intensities and with a low signalto-noise ratio (SNR), Rician noise and Gaussian noise behave very differently and the Rician distribution has to be taken into account in order to prevent biased statistical results.
This problem is important for fMRI, because the scans may have relatively low SNRs, and the values of the BOLD contrast are very small compared to the noise. This is especially true for data with high temporal and/or spatial resolution: this will inevitably lead to lower SNR values.
The remainder of this paper is organised as follows. Section 2 introduces the Rician noise model for MR images. Section 3 derives analytical expressions for the probability distribution of the difference between two MR images, which are verified in a series of tests on synthetic noise images. Section 4 investigates the noise distributions in MR template images contaminated with noise and in a real fMRI time series, and discusses implications for the design of fMRI experiments. Section 5 contains some general conclusions.

NOISE IN MR IMAGES
During image acquisition in an MR scanner, magnetic fields are transmitted in pulses varying in frequency and phase. Voxel locations are selected by frequency and phase, and the resulting data consist of complex values. The frequency space 2 International Journal of Biomedical Imaging in which these data are represented is known as the k-space. The values in the real and imaginary parts of the image are Gaussian distributed. The k-space data are transformed to a Cartesian space via an inverse Fourier transform (IFT). The noise distribution in the resulting image is still Gaussian, because the IFT is a linear transform.
Most applications of MR imaging only use the magnitudes of the signal, because those magnitudes represent a physical property of the scanned object [18]. Let A(x) represent the magnitude of the MR image at voxel location x in the absence of noise. The magnitude r(x) of the signal at voxel location x in the magnitude image is where n 1 (x) and n 2 (x) are the real and imaginary parts of the noise and N(0, σ 2 ) is the Gaussian distribution with mean zero and standard deviation σ. The magnitude signal in each voxel x is Rician distributed [1][2][3], that is, Prob[r ≤ r(x) ≤ r + dr] = p A(x),σ (r), where p A,σ (r) is the Rician probability density with parameters A and σ defined by where I k (z) = 1 π π 0 e z cos(θ) cos(kθ)dθ (3) is the modified Bessel function of the first kind of order k, k∈N. Figure 1 shows the Rician probability density function (pdf) for varying values of A and σ. The shape of the PDF changes with both parameters. The distribution for A = 0 is called the Rayleigh distribution. For high SNRs, the Rician distribution approaches a Gaussian distribution [3]. The mean μ r = ∞ 0 r p A,σ (r)dr of the Rice distribution is given by [8, pages 100-103, Appendix 4B] where z = A/σ is the SNR, and I k , The standard deviation σ r = ∞ 0 r 2 p A,σ (r)dr − μ 2 r of the Rice distribution satisfies the relation [3] σ r = A 2 + 2σ 2 − μ 2 r .
As A/σ goes to infinity, these formulas yield μ r → A, σ r → σ, that is, the mean approaches the noise-free intensity and the standard deviation approaches the corresponding value of the underlying noise distribution N(0, σ 2 ). MR noise was modelled by computing the intensity distribution as in (1). To make a Rice-distributed noisy image from a real-valued noise-free image f (x), we use the following procedure for each voxel location x: Again, the noisy image is denoted by r(x). The local SNR is controlled through the ratio f (x)/σ, where f (x) and σ determine μ r and σ r as described in (4) and (5), respectively.

Statistical testing in fMRI
In fMRI analysis, one searches for activation in certain brain areas via statistical hypothesis testing. The null hypothesis H 0 states that there is no activation, other hypotheses correspond to several kinds of activation. Most fMRI analysis methods, such as statistical parametric mapping [13], use the general linear model (GLM). The GLM treats fMRI responses as the outputs of a linear time-invariant (LTI) system using a number of temporal basis functions, f 1 (·), . . . , f M (·), called explanatory variables. The GLM has the form where Y k,s is the observed data at voxel k, k = 1, . . .
where Y is a T × N matrix, X is the T × M design matrix containing the f m (t s ) values, β is T × N weight matrix, and e is the T × N residual matrix containing the part of the signal not modelled by any component in X. Statistical parametric tests often assume that the error values in e are independent and identically normally distributed, that is, e k,s ∼ N(0, σ 2 k ), where the standard deviation may depend on the voxel location.
In brain activation studies, one considers an equation of the form (6) for both the activated and the rest (null) condition. Let Y q k,s denote the observed signals under condition q (0 = "rest," 1 = "active"). Then a test statistic is formed at each voxel, for example, a t-statistic T k defined by where Y q k is the temporal average per voxel and S 2 k is the pooled variance estimate, that is, Under the assumption of Gaussian noise, that is, e q k,s ∼ N(0, σ 2 k ), Y q k,s is normally distributed with mean μ q and standard deviation σ k , and also the differences Y q k,s − Y q k are normally distributed. This implies that T k ∼ t 2M−2 , that is, T k has a t-distribution with 2M − 2 degrees of freedom under the null hypothesis H 0 : μ 0 = μ 1 , that is, no mean effect of activation occurs. For this reason the distribution of T k under this hypothesis is called the null distribution. Voxels where this hypothesis can be rejected are therefore designated as activated voxels. The significance of a certain observed voxel value is expressed by a so-called p-value, which is the probability of that voxel's intensity being attributable to mere chance. A p-value is calculated as the area under the graph of the t-distribution to the right of a given intensity value on the horizontal axis. A low p-value (say lower than 0.05) indicates that the measured value is probably not due to mere chance, that is, that it is a real activation.
The Rician distribution which applies to MRI data has a heavier right tail than a Gaussian, so p-values based on a Gaussian noise assumption with the standard deviation of the Rice distribution will be too low, introducing false positives. Hanson and Bly [15] found similar deviations, using gamma distributions instead of Rician PDFs.
As we have seen, it is the null distribution which is needed to compute p-values. From the discussion above, it is apparent that a sufficient condition for the usual statistical analysis to hold is that the difference signal at each voxel corresponding to the case of no activation has a Gaussian distribution. Therefore, our object of study in the remainder of this paper is the distribution of the noise in difference images of Rician-distributed MRI images without activation. As we will see, this distribution is indeed very close to a Gaussian, albeit with a standard deviation different from that of the initial Rice distribution.

Null distribution of the difference fMRI signal
The difference of two noisy versions r 1 (x) and r 2 (x), containing Rician noise, of the same underlying image f (x), is not Rician distributed. Let the null image s(x) be defined as s(x) = r 2 (x) − r 1 (x). Its probability density function (PDF) is denoted by C A,σ (s), where we write A instead of f (x) and σ is the standard deviation of the underlying noise distribution (cf. (1)). Then C A,σ (s) is the probability that the value of the difference s(x) falls in an infinitesimal interval around s : We will refer to C A,σ (s) as the null distribution.
Since it is easy to see that C A,σ (s) is symmetric, that is, C A,σ (s) = C A,σ (−s), we have the following expression valid for arbitrary values of s ∈ R: where δ(r) denotes the Dirac delta function. That is, C A,σ (s) is the cross-correlation of two identical Rice distributions.

4
International Journal of Biomedical Imaging The mean μ s and standard deviation σ s of the null distribution C A,σ (s) are given by where σ r is the standard deviation of the Rice distribution, see (5). For the derivation, see Appendix A.1. In the case A = 0, the PDF of r 1 , as well as that of r 2 , is For the Rayleigh case (A = 0), the integral in (10) can be explicitly evaluated. The resulting expression for C 0,σ (s) is ∞ z e −t 2 dt is the complementary error function [19]. For the derivation of this formula, we refer to Appendix A.2.
The following experiments investigate how well the PDF C A,σ (s) can be approximated by a Gaussian distribution.

Numerical approximation by a normal distribution
The distribution C A,σ (s), see (10), was numerically approximated by a Gaussian via the Levenberg-Marquardt curvefitting algorithm. The fit was carried out on an interval centered around zero with negligible function values outside this interval. Figure 2 shows the PDF C A,σ (s), as well as the Gaussian fitted to this distribution, for a number of values of A and σ. The plots show an excellent fit. Table 1 presents some quantitative results. It shows, for various values of A and σ, (i) the exact standard deviation σ s = √ 2σ r of the null distribution (11), where σ r was computed according to (4)-(5); (ii) the width σ Gauss of the Gaussian fitted to C A,σ (s); and (iii) the mean square error of the difference between C A,σ (s) itself and the fitted Gaussian. The difference between the width σ Gauss of the fitted Gaussian and the exact value σ s is very small, especially for high SNR (i.e., A/σ). Since σ r approaches σ for high SNR (see Section 2), σ s approaches √ 2σ in this limit. The mean square error should decrease when the SNR increases; this is confirmed by the experimental results. The null distribution does not have a heavy tail (it is slightly lighter than a Gaussian of width σ, see below). Function values outside the interval used in the fitting procedure represent a negligible portion of the distribution. Therefore, p-values from statistical tests based on Gaussian noise with an estimated standard deviation σ Gauss will be very accurate, and (because of the light tail) where there is a difference, the estimates will be conservative.

Tail of the null distribution
An important property of the PDF C A,σ (s) for statistical fMRI analysis is the tail behaviour (as |s| approaches infinity) of the distribution under the null hypothesis, because this determines the p-value corresponding to a certain threshold (cf. Section 3.1). For the limiting cases of low and high SNR, that is, A = 0 and A/σ large, we mathematically analysed the behaviour of the PDF C A,σ (s) when |s| becomes very large. The details are presented in Appendix A.3. We find that both in the Rayleigh case (A = 0) and for large values of A/σ, the tails of the distribution (10) are lighter than the tail of a Gaussian distribution: where the constant depends on A and σ. This is a Gaussian tail of width σ multiplied by a factor 1/|s|, which means that the distribution approaches zero even faster than a Gaussian distribution of width σ. This implies that if p-values based on a Gaussian are used, the test is slightly conservative, and will not give extra false positives.

Statistical tests of normality
An image of a uniform underlying intensity with Rician noise has a spatially stationary noise distribution. The distribution of the difference between two such images is symmetric.
To test whether this distribution is close to Gaussian, the Kolmogorov-Smirnov (KS) test was employed as follows. We created two images of a uniform intensity A with Rician distributed noise, and computed the difference between the noisy images. The KS test was applied to the difference image. The null hypothesis of the KS test is that the data are normally distributed, and this is rejected if the p-value of the KS test statistic is below 0.05. For a number of intensities A, images of different sizes were tested, and for each size and intensity, the test was repeated 32 times. Table 2 shows the mean p-values of the KS test statistics for each size, with intensity A = 1 and A = 5. As a reference, 32 images of the same size containing N(0, 1) noise were also tested, and their mean p-values are in the right column. This table shows that deviations from normality can only be detected in very large images with low intensities: for high intensities, they are too small to measure.  Figure 2: The exact null distribution C A,σ (s) (solid), the fitted Gaussian (dashed), and the difference between C A,σ (s) and the Gaussian (dotted). Note that the fitted Gaussian is hardly distinguishable from the exact distribution.

Parameter estimation in fMRI with the general linear model
For fMRI analysis, the possibility of accurately estimating the parameters of the noise is at least as important as using the right noise model. We tested the applicability of the GLM (see Section 3.1) by estimating the noise parameters in difference images created in the same way as s(x) in Section 3.2, and comparing them with the real underlying parameters.
A matrix e of error signals was created by making a time series of 128 difference images. The standard deviation of the temporal noise was computed in each voxel. Table 3 shows, for the same input A and σ as before, the measured temporal standard deviation σ temp , the mean standard deviation σ s in the difference images (which equals √ 2σ r , see Section 3.2), and the ratio σ s /σ temp . It shows that the standard deviation σ s is very accurately predicted by formula (11).

Evaluation of the test results
The statistical tests, the analytical results, and the numerical computations, all show that the difference between two MR images whose intensities are Rician distributed, can be very well approximated by a Gaussian distribution. The approximation is closest for high SNR, but is still very good for lower SNR. Given the parameters A and σ of the Rician spatial noise in a series of MR images and defining null images as pairwise difference images, the parameters of the Gaussian distribution that describes the temporal noise can be accurately estimated.

THE NOISE DISTRIBUTION IN fMRI
Images in an fMRI time series have a range of intensities, so the noise distribution is a sum of Rician PDFs (sums of Gamma PDFs have also been used, see [15] for an example). For each intensity A in the image, the noise is distributed differently (see Figure 3(c)), and this will have an influence on the parameter estimates of the GLM. Areas with a "true" grey value of 0, like the space around the body, have Rayleighdistributed noise, and the areas with higher grey values have more symmetric distributions, which are quite similar to a Gaussian, and they are centered around the grey value at that location. The total noise distribution is a mixture of all those distributions. The question is whether the conclusions about the noise in the difference image obtained in Section 2 also hold for noisy images with mixed distributions.

Shape of the noise distribution in MR images
A simulated MR image was acquired from the BrainWeb Magnetic Resonance Imaging simulator [20] with the following parameters: modality T2, slice thickness 1 mm, noise 0%, intensity nonuniformity 0%. Nonbrain voxels were excluded with the Brain Extraction Tool [21]. This noise-free T2 *weighted image (Figure 3(a)) was contaminated by Rician noise with a known σ (see Figure 3(b)). A residual image was obtained by subtracting the original MR image from the noisy MR image, and a null image was made using the procedure proposed in Section 3, that is, as the difference between two MR images containing Rician noise.
The dissimilarity between a Rician distribution and a Gaussian is largest for low signal intensities A. The previous section showed that the difference between two images of a uniform intensity A and Rician noise has zero mean and is near-Gaussian distributed, also for low signal intensities. This section examines the difference images when the noisefree images contain more than one intensity. Figure 4 shows the histograms of a noisy MR image, the difference between a noisy MR image and the noise-free image, and the difference between two noisy MR images, respectively. The histograms were computed for a range of values for σ and are presented together as surface plots. As σ decreases, the histogram of the noisy MR image changes from one Rayleigh-like PDF to a number of near-Gaussian PDFs (see Figure 4(a)). The histogram of the noisy image after subtraction of the original is asymmetric for high σ, and becomes more symmetric as σ decreases (see Figure 4(b)). The histogram of the difference images is symmetric for all σ (see Figure 4(c)).

Time series of MR images
A time series of 164 EPI scans was made on a 3 Tesla Intera scanner (Philips Medical Systems, The Netherlands), with repetition time TR = 3 seconds, volume size = 64 × 64 × 46 voxels of 3.5 × 3.5 × 3.5 mm 3 . No stimuli were presented, and the null hypothesis of no activation was assumed to be true throughout the experiment. Alignment of the images was done with SPM 99 program [13]. The time series was split in two disjoint sets: TS 1 (images 1, . . . , 82) and TS 2 (images 83, . . . , 164). The noise of TS 1 was centered around 0 by subtracting the time series mean image of TS 1 from each image. Note that although this is a common procedure in fMRI analysis, this means treating Rician noise as additive noise. To obtain an image with a symmetric noise distribution, difference images were made by subtracting the corresponding image of TS 2 from each image of TS 1 .
The histogram of the time series mean image (see Figure 5) was used to divide the images into three intensity ranges: low intensity (grey value 0, . . . , 300), medium intensity (grey value 301, . . . , 600), and high intensity (grey value > 600). Figure 6 shows the histograms of the grey values in the resulting time series within the three ranges. Gaussians were fitted to the histograms with the Levenberg-Marquardt curve-fitting algorithm. For medium and high intensities, the time series histograms show no significant asymmetries. For low intensities however, the time series TS 1 after subtracting the mean has an asymmetric histogram, while the time series TS 1 after subtracting TS 2 has a symmetric histogram. The fits are never perfect, except in the case of low intensities and after subtracting TS 2 . In that case, the intensity distribution has one predominant intensity (A = 0, see Figure 5), and the 1500 1000 500 0 difference distribution is close to those in the A = 0 cases of the previous section. In the other cases, the noise originates from voxels with various intensities, and the noise distribution resembles a mixture of Gaussians with mean μ = 0 and various σ.
Because the amount of asymmetry in the medium and high grey-value ranges is very small, the combination of thresholding and subtracting the time series mean may solve most of the problems concerning the Rician distribution of the noise. However, the new method presented in this paper of subtracting a second-time series is preferable: it has proved to yield symmetric noise distributions in all measurements considered.

Implications for fMRI designs
The assumption of Gaussian noise in the analysis of fMRI data should be used with care. Relying on the robustness of the standard tests most often works, but it does not solve the problem of the asymmetric noise distribution. A recent maximum-likelihood test based on the Rician distribution shows to be as powerful as the GLM-based test with a high SNR, but performs much better with a low SNR [22]. For using the assumption of Gaussian noise, difference distributions like the one presented here will be required. The example presented here of using an extra data set for every experiment is difficult for large studies, but this can be solved in a more practical way: a relatively small set of "null data" can be reused after randomisation in the time dimension. The only change in the formula for the GLM (7) is using Y − Y 0 instead of Y, with Y 0 the resting-state data set. It is trivial to see that this does not change the way the estimates are computed, even if different (more complex) design matrices X are used.

CONCLUSIONS
We have presented a noise model in BOLD fMRI that takes into account the Rician distribution of MR noise known from the literature. BOLD noise was defined as the difference between two MR images with Rician noise. We investigated the properties of the difference image under the null hypothesis (no brain activation), which is needed to determine pvalues in a statistical analysis. The problem was studied in several complementary ways: analytical calculation, numerical simulation, statistical estimation, and experimental validation on real EPI data. An analytic expression was derived for the statistical null distribution C A,σ (s) as an integral in terms of two underlying Rician probability densities with parameters A and σ. From this basic formula, analytical expressions were derived for the mean and standard deviation of the null distribution, as well as for its tail, that is, its asymptotic behaviour as s goes to infinity.
The null distribution C A,σ (s) was numerically approximated by a Gaussian function with the Levenberg-Marquardt nonlinear curve-fitting algorithm. The approximation by a Gaussian distribution was very good, with the accuracy increasing with SNR (i.e., A/σ). The standard deviation of the fitted Gaussian was found to be in excellent agreement with the exact standard deviation σ s derived from the analytical expressions.
The statistical properties of the noise were examined in two ways. The Kolmogorov-Smirnov test was applied to difference images of noise-only images with Rician distributed noise. A second test using the general linear model (GLM) compared the estimated noise parameters to the value predicted by the model, and showed that the agreement is excellent.
From the analytical results, the numerical computations, and the statistical tests, we concluded that the assumption of Gaussian distributed noise used in the fMRI literature could be justified. That is, the difference between two images whose intensities follow a Rice distribution can be very well approximated by a Gaussian distribution. The approximation is closest for high SNR, but is still quite good for lower SNR. Given the parameters A and σ of the Rician spatial noise in a series of MR images, the standard deviation of the Gaussian that describes the temporal noise can be accurately predicted.
The noise model was tested on simulated and real MR images. In a test that contaminated noise-free MR templates with Rician noise, MR noise was shown to have an asymmetric distribution when it is-incorrectly-treated as additive noise. As in the test with noise-only images, difference images of noisy MR pictures were found to have a symmetric distribution. The consequence for fMRI time series analysis is that subtracting the time series mean does not get rid of the asymmetry in temporal noise.
We tested thresholding the MR images as a fast and simple alternative to the difference image approach: it can remove asymmetry in the noise distribution to a large extent, depending on the robustness of the test that is used. Subtracting a second time series from the time series being analysed yields symmetric and close to Gaussian distributed noise.

A. MATHEMATICAL ANALYSIS OF THE NULL DISTRIBUTION
This appendix presents the derivations of the exact analytical results in Section 3 on the distribution of the difference signal under the null hypothesis. Extensive use is made of the concept of asymptotic expansions. We first provide a few formal definitions. Let φ(x) and ψ(x) be two functions defined for x ≥ x 0 . One writes φ(x) = O(ψ(x)), x → ∞, when constants K and x 1 exist such that |φ(x)| ≤ K|ψ(x)| for x ≥ x 1 . We call ∞ n=0 a n φ n (x) the asymptotic expansion of a function f when, for every N, Below, we only use the first term in the asymptotic expansion of some special functions (error function, Bessel function), and use the shorthand notation To make this precise, one has to refer to the full asymptotic expansions, as can be found in Abramowitz and Stegun [19]; for easy reference, we refer to the relevant sections of this handbook at the appropriate places.

A.1. Mean and variance of the null distribution
First, the mean μ s is zero because of the symmetry of C A,σ (s). Second, since the mean is zero, the variance of the null distribution satisfies (see (10) Here E(· · · ) denotes the average of the quantity within the brackets. So we have found that σ 2 s = 2σ 2 r , which directly yields (11).

A.2. Exact form of the null distribution in the Rayleigh case
Substituting the form (12) of the Rayleigh distribution in expression (10), we find Putting r/σ = x, |s|/σ = q, A/σ = a, we find after some algebra Writing τ = q/2, we can write this integral as the sum of two terms, each of which can be expressed in terms of the complementary error function Reexpressing τ in terms of the original variable s (i.e., τ = q/2 = |s|/(2σ)), we obtain formula (13).

A.3. Tails of the null distribution
We consider the limiting case of low versus high SNR, that is, A = 0 and A/σ large. A = 0. This is the Rayleigh case, for which we have derived an exact expression for the null distribution, see formula (13). When s is large, we can use the asymptotic behaviour of the error function [ Substituting this in (13), we find (after rearrangement of terms) which behaves as a Gaussian tail of width σ multiplied by a factor 1/|s|. A/σ large. Since A/σ is large, we apply the Gaussian approximation of the Rice distribution: As shown in [3], this approximation is already accurate for A ≥ 2σ. This formula is easy to derive by using the asymptotic expansion of the Bessel function I 0 as given in [19, Section 9.7.1]. Substituting this in (10), we get Putting r/σ = x, |s|/σ = q, A/σ = a, we find after some algebra C A,σ (s) ∼ 1 2πσ which again behaves as a Gaussian tail of width σ multiplied by a factor 1/|s|.