Total Variation Wavelet-Based Medical Image Denoising

We propose a denoising algorithm for medical images based on a combination of the total variation minimization scheme and the wavelet scheme. We show that our scheme offers effective noise removal in real noisy medical images while maintaining sharpness of objects. More importantly, this scheme allows us to implement an effective automatic stopping time criterion.


INTRODUCTION
The advent of digital imaging technologies such as MRI has revolutionized modern medicine. Today, many patients no longer need to go through invasive and often dangerous procedures to diagnose a wide variety of illnesses. With the widespread use of digital imaging in medicine today, the quality of digital medical images becomes an important issue. To achieve the best possible diagnoses it is important that medical images be sharp, clear, and free of noise and artifacts. While the technologies for acquiring digital medical images continue to improve, resulting in images of higher and higher resolution and quality, noise remains an issue for many medical images. Removing noise in these digital images remains one of the major challenges in the study of medical imaging.
While noise in medical images present a problem because they could mask and blur important but subtle features in the images, many proposed denoising techniques have their own problems. One of the widely discussed techniques is the wavelet thresholding scheme, which recognizes that by performing a wavelet transform of a noisy image, random noise will be represented principally as small coefficients in the high frequencies. Thus in theory a thresholding, by setting these small coefficients to zero, will eliminate much of the noise in the image. The wavelet hard thresholding scheme, which sets wavelet coefficients below certain threshold in magnitude to 0, is easy to implement and fast to perform, and depending on the threshold, it removes noise adequately. However, at the same time it also introduces artifacts as a result of the Gibbs oscillation near discontinuities. Since artifacts in medical images may lead to wrong diagnoses, the wavelet hard thresholding scheme is not practical for use in medical imaging without being combined with other techniques. An improvement over the wavelet hard thresholding is the wavelet soft thresholding scheme [1,2], which significantly reduces the Gibbs oscillation but does not eliminate it. The effectiveness of wavelet thresholding schemes in general are limited with combining them with other techniques. These other more complex techniques often try to take account of geometric informations by using waveletlike bases that better characterize discontinuities, such as curvelets [3,4]. Nevertheless, they do not completely eliminate the Gibbs phenomenon. Other methods with varying success have also been studied by different authors, for example, [5][6][7].
Another approach employs variational principles and PDE-based techniques. In this approach, a noisy image is modeled as z(x) = u 0 (x) + n(x) where u 0 denotes the uncontaminated underlying image and n denotes the noise. To reconstruct u 0 one considers the problem of minimizing where λ > 0, Ω is the domain on which z is defined, and the term R(u) is a regularization functional. Earlier efforts focused on least square-based functionals R(u)'s such as Δ 2 L 2 (ω) , Öu 2 L 2 (ω) , and others. While noise can be effectively removed, these regularization functionals penalize discontinuity, resulting in soft and smooth reconstructed images, with subtle details lost. Again, for medical imaging this is not practical, as subtle details could very well yield crucial information about the patients. International Journal of Biomedical Imaging A better choice for R(u) was proposed in [8], in which R(u) is the total variation (TV) of u given by Intensive studies have shown that the total variation better preserves edges in u, thus it allows for sharper reconstructions, for example, [9][10][11][12]. Among all the PDE-based techniques, the TV minimization scheme is a candidate that offers the best combination of noise removal and feature preservation. Solving the minimizers for the TV minimization (2), or (1) in general, amounts to solving certain PDEs, which is very similar to the anisotropic diffusion scheme proposed first in [13]. For the TV minimization it is easy to show that the PDE is given by But in practice, one introduces the time variable t and solves for u(x, t) by time-marching the equation The end result u(x, T), if T is large enough, will have all noise removed. An important attribute of the TV minimization scheme is that it takes the geometric information of the original images into account, in that it preserves significant edges. In fact significant edges are sharpened. This is similar to the anisotropic diffusion methods see [13,14] and references therein.
The time-marching of (4) is in essence solving for the minimizer of E(u) by gradient flow. Two approaches are used for achieving the best combination of noise removal and feature preservation. The straighforward approach is to tune the parameter λ. Obviously if λ is too large we may not remove enough noise. On the other hand, if λ is too small it is well known that the scheme will remove too many features and end up with a cartoon-like piecewise constant image [15,16]. But tuning the parameter λ is time consuming. Since in practice there is no original image to compare to, and the assumption of i.i.d. Gaussian noise is not always realistic, tuning λ often relies on experience and visual inspection. There is no automatic way for it as far as we know. A more widely used approach is to choose λ in a reasonable range without being precise about the choice. Instead, we try to stop the time-marching before it reaches the ground state at a point that offers a good combination of noise removal and feature preservation. But again here we face the problem of deciding when to stop. There have been efforts in this direction, see, for example, [17,18]. These proposed criteria are typically cumbersome and are based on some a priori knowledge about the noise such as the variance and type, which may not be realistic. With the explosion in volumes of medical images, this is a very significant issue.
In this paper we propose a wavelet TV denoising scheme. In our scheme, the wavelet coefficients are selected and modified subjecting to minimizing the TV norm of the reconstructed images. We demonstrate that while being as effective as the TV scheme in removing noise, the wavelet TV scheme allows us to modify the wavelet coefficients primarily in the high frequency domain, something that the regular TV scheme cannot do. Experiments show that the wavelet TV scheme preserves details like the regular TV scheme but offers a slightly higher PSNR in the reconstruction. It is also significantly faster in that far fewer iterations are needed for noise removal. The details of these improvements will be presented in a separate paper [19]. And unlike the traditional wavelet thresholding scheme, it does not introduce Gibbs' oscillations near discontinuities. These properties are consistent with other investigations that combine variational approaches with wavelet framework [20][21][22][23][24]. But more importantly, this scheme allows for an effective automatic stopping time criterion based on a certain statistical property of wavelet coefficients. An added advantage for our approach is that it leads to superior JPEG2000 compression for denoised images [21]. Given the increased use of JPEG2000 standard in medical imaging, this is a significant bonus.

THE WAVELET TOTAL VARIATION DENOISING METHOD
In this section, we describe our image denoising algorithm based on wavelet and TV minimization. We start with a standard noisy monochromatic image model where z(x), u 0 (x), and n(x) are real valued functions defined on R 2 , and they are compactly supported since they represent images in our study. The function u 0 (x) denotes the underlying noise-free image, z(x) the observed image, and n(x) the noise. In our general model, we assume that z(x), u 0 (x), and n(x) are in some space of functions F , such as L 2 (Ω) for some domain Ω. Let ψ j : j ¾ I be a basis for F . This basis can be an orthonormal basis, such as wavelets [25,26] if F is a Hilbert space, or any other type of bases in general. So for any f (x) ¾ F we have for some real (c j ).
In [21] a wavelet TV minimization model is proposed, in which ψ j is taken to be a wavelet basis for F = L 2 (Ω). In that model, the wavelet coefficients are selected and modified to achieve the goals of image processing such as denoising and compression. In this paper, we refine the above model. Key to our innovation is an automatic stopping criterion, a feature we believe to be very important for medical applications. Another improvement is the multiscale fitting parameters targeting denoising in the high frequency domain, which yields a significant reduction in number of iterations needed to achieve the desired denoising as well as a small performance improvement in terms of PSNR on simulated noisy images.
Y. Wang and H. Zhou 3 We first describe the denoising part in the general setting. Let and denote where β = (β j ). Define the total variation functional by where u = u(x, β), λ j > 0. In practice we often replace Ö x u(x, β) by The small parameter is used to prevent denominators from vanishing in numerical implementations. The goal of denoising is to minimize F(u) and find the minimizer u £ := u(x, β £ ) such that The objective functional in (9) differs somewhat from the one used in [21], where all λ j 's are uniformly set to a single parameter λ. With uniform parameter λ and an orthonormal basis ψ j the objective functional F(u) is the same as the objective functional E(u) in (1). Hence the minimizer of F(u) would be the same as that of E(u) for the regular TV scheme. By taking a basis that is not an orthonormal basis, such as a biorthogonal wavelet basis as we do in our implementation, F(u) is typically not the same as E(u), even with uniform parameter λ j . With nonuniform λ j 's the objective functional F(u) can be significantly different from E(u) in the original TV scheme. Like the regular TV denoising scheme, the wavelet TV scheme proposed here retain sharp edges without creating Gibbs' phenomenon.
One can use simple calculus of variation to obtain the derivative of the objective functional (9).
Then the Euler-Lagrange equation for the model is In practice, rather than solving the Euler-Lagrange equation (13) directly to denoise an image, we introduce an artificial time parameter t and time-march the image using gradient flow. More precisely, we set β = β(t) = (β j (t)) and solve the following time evolution equation: The minimizer of the TV wavelet model is the steady state of the above equation. However, it is well known that TV minimization often leads to images with cartoonish features. More precisely, the denoising algorithm will remove noise as well as fine structures, such as textures and subtle details, from an image. The consequence is that unless the parameter λ in (1) is carefully calibrated, if one evolves (14) for an extended time, the denoised image is often over-smoothed to the point that it is almost piecewise constant. The wavelet TV denoising scheme has the same issue. This is often unacceptable for most medical applications. In the original TV minimization scheme introduced in [8] or similar schemes such as anisotropic diffusion, there was no mechanism for stopping the time evolution. In fact, since the objective functionals do not measure information pertaining to noise in the processes, a mechanism to stop the time evolution automatically is virtually impossible. But in our wavelet TV denoising scheme this can be naturally done. The reason is that high frequency wavelet coefficients are well known to encode information about noise in images. This property of high frequency wavelet coefficients has served as the basis for virtually all wavelet denoising methods, such as the widely used hard or soft thresholdings, or wavelet shrinkage. Now, by choosing ψ j to be a wavelet basis, the same principle allows us to design a natural automatic stopping criterion for the wavelet TV minimization method, making it an extremely viable scheme for medical applications.
We now describe our automatic stopping criterion with the basis ψ j : j ¾ I being a wavelet basis-in our case we usually take the biorthogonal wavelet basis generated by the well-known 7-9 biorthogonal wavelets. (We remark that the conventional notation for wavelet bases use two or more indices, such as ψ jk . In this paper we only use one index for conciseness, and there should not be any confusion). Like in the wavelet hard thresholding scheme, we first choose a threshold ρ > 0. Let J ρ = j ¾ I D : β j (0) = α j ρ , where I D I is the index set corresponding to the diagonial portion of the highest frequency wavelet coefficients. Intuitively speaking, as in the wavelet hard thresholding scheme, the coefficients β j (0) : j ¾ J ρ will indicate how noisy the image is. In a noise-free image these wavelet coefficients will mostly be very close to 0. But in a noisy image they will be more substantial. Define μ(t) = (1/ J ρ ) j¾Jρ β j (t) . So μ(t) measures the noise in the image at time t. The key idea is that an automatic stopping criterion of the time evolution can be designed by measuring the reduction in the value μ(t) from the original value μ(0).
We can use two different approaches in setting the automatic stopping criterion. The first approach is the relative criterion. In the relative criterion, we consider μ(t)/μ(0). We will stop the time evolution whenever this value goes below a threshold b. For example, we may set b = 0.1. This threshold intuitively says that we stop the time evolution when we have reduced noise by 90%. The second approach is the absolute criterion. In the absolute criterion, we stop the time evolution if μ(t) drops below a threshold c. Since in a noise-free image we expect μ(t) to be very close to zero, it is reasonable to set an absolute threshold for μ(t) to achieve a desired denoising effect.
In the actual implementation the value ρ does not seem to affect the automatic stopping time sensitively. We usually take ρ = (2/ I D ) j¾ID α j . Both the relative criterion and the absolute criterion work well, although we typically use the relative criterion. For an image with moderate noise we set the threshold b to be between 0.05 and 0.1. In the more noisy cases such as the images shown in this paper, we use smaller threshold b around 0.03. We tested the automatic stopping time criterion on a number of MRI images for one lab. The thresholds for optimal performance stayed remarkably consistent. This is an important property for batch processing of medical images.

EXAMPLES
In this section we provide some examples to illustrate the performance of our algorithm. The first example is for testing. Artificial noise is added to an otherwise rather clean brain scan shown in Figure 1(a). The standard peak signalto-noise ratio (PSNR) is employed to quantify the performance of denoising, where PSNR = 10 log 10 ⎛ ⎝ 255 2 u u 0 where 255 is the maximum intensity value of the gray-scale images, u 0 the noise-free original image, u the noise added image, and ¡ 2 the standard L 2 norm. A conventional criterion is that larger PSNR signifies better performance. In addition, we use visual inspection to compare the performance in preservation of edges and other geometric features, which is not reflected through the PSNR measurement. In all ; the threshold is selected to reach the best PSNR improvement. We note that the hard thresholding gives better PSNR performance because it is optimal in the L 2 norm sense, but the soft thresholding gives better visual quality because its Gibbs' oscillations are less severe. examples shown here, we use Daubechies 7-9 biorthogonal wavelets with symmetric extensions at the boundaries. We performed denoising on the noise-added brain scan image using the standard wavelet thresholding schemes ( Figure 2) and our wavelet TV schemes (Figure 3). The thresholds in the wavelet hard and soft thresholding were chosen after some trials to ensure the best performance (in terms of PSNR) for fairness. This actually exemplifies the problem we try to solve: the only way to get optimal result is through trial and error experiments with the threshold. For our wavelet TV scheme we use the relative approach and have set the autostopping threshold b = 0.03. We show results for two different choices of the parameters λ j . In the first one we choose uniform λ j = 5. In the second, the fitting parameters λ j for the coarsest level wavelet coefficients (including low frequencies) are all set to λ j = 400. Afterwards with each finer level we decrease λ j 's by a factor of 4. Similar idea of choosing the parameters has appeared in [27] for a different purpose. As one can see, the wavelet TV scheme in both examples outperforms the wavelet thresholding significantly. But more importantly, the wavelet TV image maintained sharpness and many fine details, while the wavelet thresholding image looks soft with details lost. The uniform fitting parameter example performed similarly to the regular TV scheme with the same parameter. The multiscale fitting parameters wavelet TV scheme has a small advantage in PSNR, and in our opinion is visually better. However, the number of iterations is significantly smaller than either the uniform λ j wavelet TV scheme or the regular TV scheme.
In the next example (Figure 4), we apply the algorithms with uniform λ j = 5 to a real image without artificial noise.
The original image appears quite noisy. We cannot judge the performance by examining the PSNR as we do not have a noise-free image with which we can compare. However, by visual inspection it is evident that the denoised image, while removing a substantial amount of noise, suffers virtually no degradation in sharpness and details.