Total Variation Regularization of Matrix-Valued Images

We generalize the total variation restoration model, introduced by Rudin, Osher, and Fatemi in 1992, to matrix-valued data, in particular, to diffusion tensor images (DTIs). Our model is a natural extension of the color total variation model proposed by Blomgren and Chan in 1998. We treat the diffusion matrix D implicitly as the product D = LL T, and work with the elements of L as variables, instead of working directly on the elements of D. This ensures positive definiteness of the tensor during the regularization flow, which is essential when regularizing DTI. We perform numerical experiments on both synthetical data and 3D human brain DTI, and measure the quantitative behavior of the proposed model.


INTRODUCTION
Image processing methods using variational calculus and partial differential equations (PDEs) have been popular for a long time in the image processing research community. Among popular PDE methods are the anisotropic diffusion method proposed by Perona and Malik [1], the total variation method introduced by Rudin et al. [2], and various methods related to these [3][4][5][6][7]. Many of these methods were originally introduced for scalar-valued (gray-scale) images, and were later generalized to vector-valued (color) images.
During the last decade or so, a new magnetic resonance modality called diffusion tensor imaging (DTI) has been extensively studied [8][9][10][11][12][13]. Using DTI, it is possible to study anatomical structures like the nerve fibers in the human brain noninvasively. The DTI images are matrix valued. In each voxel of the imaging domain, we construct a diffusion tensor (i.e., diffusion matrix) D based on a series of K direction-specific MR measurements {S k } K k=1 . The matrix D ∈ R 3×3 is a symmetric, positive definite matrix where V is an orthogonal matrix, and Λ is a diagonal matrix with positive elements. We may look at the diffusion matrix as a hyperellipse where the eigenvectors {V i } 3 i=1 span the ellipsoid, and the corresponding eigenvalues {λ i } 3 i=1 determine the length of each semiaxis (see Figure 1). It is customary to arrange the eigenvalues in decreasing order. By the diffusion tensor model we assume that the set of images {S k } K k=1 is related to the nonweighted image S 0 by the Stejskal-Tanner equation [14,15] S k = S 0 e −bg T k Dgk , k = 1, 2, . . . , K.
Here g k ∈ R 3 denotes the direction associated with S k , and b > 0 is a scalar which among other factors depends on the acquisition time and the strength of the magnetic field [16].
Since D ∈ R 3×3 is symmetric, it has six degrees of freedom. Thus at least six measurements {S k } 6 k=1 are required to estimate the tensor, as well as the nonweighted measurement S 0 . The tensor D can be estimated from (2). In the case of more than six measurements S k , we can estimate D by, for example, a least-squares minimization. From the eigenvalue decomposition of the diffusion tensor, we can reveal properties like the dominant diffusion direction and the anisotropy of diffusing water molecules [17]. This information can be used to construct maps of the anatomy of the brain.
From the developments in DTI, a need for robust regularization methods for matrix-valued images has emerged. As far as the authors are aware of, there exists no state-ofthe-art method for regularization of tensor-valued images, although many methods have been proposed [18][19][20][21].
All measurements {S k } K k=1 contain noise, which degrades the accuracy of the estimated tensor. Compared with conventional MR, direction-sensitive acquisitions have a lower 2 International Journal of Biomedical Imaging Figure 1: The diffusion matrix D can be represented by a diffusion ellipsoid, where the semiaxes are spanned by the eigenvectors of D, and the length of each semiaxis is given by the eigen- In this illustration, the diffusion is anisotropic. The principal diffusion direction is along eigenvector V 1 .
signal-to-noise ratio (SNR). Thus the gradient weighted images {S k } K k=1 contain more noise than S 0 . There are several ways to increase the accuracy of the estimated tensor. The most intuitive way is to make an average of a series of repeated measurements. Alternatively, we can increase the number of gradient directions. An obvious disadvantage of both of these approaches is the increased scanner time. Perhaps a better way to improve the quality of the tensor is by postprocessing the data. In this paper, we follow this approach, by introducing a regularization method for tensorvalued data.
Since D models diffusion, regularization methods in DTI must preserve the positive definiteness of D. A positive definite matrix has only positive eigenvalues, which is necessary from the physical modeling perspective. In a minimization method for regularization of the tensor data, one possible way to ensure positive definiteness would be to impose a constraint on the minimization problem. Then the constrained problem would have a solution which is on the manifold of positive definite matrices. Regularization of tensor-valued data constrained to manifolds has been studied during the last couple of years, see [22][23][24]. We however follow a different strategy. Using essentially the same idea as Wang et al. did in a slightly different setting, we treat D implicitly by writing D as the product D = LL T , where L is a lower triangular matrix [18]. Every symmetric positive definite (SPD) matrix has a factorization on this form. We will in this work develop a regularization method for diffusion tensor images by generalizing methods previously developed for scalar-and vectorvalued images [2,25].
Before we go into details of the proposed method, we briefly introduce the total variation (TV) methods for scalarand vector-valued images. During the last 15 years or so, TV models have undergone extensive studies, initiated by the work of Rudin, Osher, and Fatemi (ROF) [2].
Define the total variation (TV) seminorm for scalarvalued data as Throughout this paper, ∇ denotes the spatial gradient, while ∇· denotes the divergence operator. In the ROF model, the TV seminorm with an added L 2 fidelity norm is minimized Note that we can write the functional G more abstractly as where R(u) is a regularization functional and F(u, f ) is a fidelity functional. The regularization term is a geometric functional measuring smoothness of the estimated solution.
The fidelity term is a measure of fitness of the estimated solution compared to the input data. It is customary to measure the fidelity in the sense of least squares. Equation (4) has the corresponding Euler-Lagrange equation We can find a minimum of (4) by searching for a steady state of which is the way the ROF model was first formulated [2]. Alternatively, we can directly attack the zero of the Euler-Lagrange equation for example, by a fixed-point iteration [26]. This is in general less time consuming than solving the equation using the method of steepest descent, but more tedious to carry out numerically. When we generalize the method to matrixvalued images, we solve the minimization problem by the method of steepest descent. Various methods have been proposed to generalize the ROF model to vector-valued image regularization. Among the successful methods, we find the color TV model developed by Blomgren and Chan [25] and the model by Sapiro [27]. Blomgren and Chan [25] generalized the ROF model to color (vector) image regularization using a set of coupled equations with , i = 1, 2, 3, The weight α i in (9) acts as a coupling between the geometric part of the three image channels. In this work, we extend in a natural way the color TV model of Blomgren and Chan to a matrix TV model. However, the method we propose is not restricted to our choice of regularization functional (TV). For a detailed treatment of TV regularization methods we refer the reader to the recent book by Chan and Shen [5].
In Section 2, we define the minimization problem that we propose to solve, and arrive at the Euler-Lagrange equations Oddvar Christiansen et al. 3 corresponding to this minimization problem. We perform numerical experiments in Section 3, before we finish the paper in Section 4 with a conclusion. Details on the Euler-Lagrange equation and the numerical implementation are given in the appendix at the end of the paper.

MINIMIZATION PROBLEM
In this section, we introduce the functional that we minimize in order to regularize tensor-valued data. Let L be a lower triangular matrix. We define D by This has immediate implications on D: symmetry, positive definiteness, and orthogonality of eigenvectors. These properties are required by the diffusion tensor model. Thus (11) is a natural choice. We define i j as the element in the ith row and jth column of L. The elements d i j are defined in the same manner.
Let us look at the algebraic equation expressing D as a function of i j . We derive the expressions for D ∈ R 3×3 ⊂ SPD. We explicitly write out the matrix multiplication (11), In our proposed model, we solve a minimization problem in terms of i j . For each unique kl , we minimize where {kl} ∈ {11, 21, 22, 31, 32, 33} and d i j denotes the elements of the tensor estimated from the noisy data. As in the scalar model, the functional (13) has the abstract form (5). The scalar ROF (TV − L 2 ) functional is convex. But when we introduce the factorization (11) into the model, we cannot expect the functional (13) to be convex or even quasiconvex. However, from numerical experiments where we used different (random) intial conditions we ended up with almost exactly the same solution. This means that even though we are not able to prove that the functional is convex, we have indications that it is at least quasiconvex. We note that minimizing the functional (13) is related to the functional used by Wang et al. [18]. Apart from the fact that they simultaneously estimate and regularize the tensor, there are fundamental differences between our proposed regularization functional and the functional proposed by Wang et al. Even though we represent the diffusion matrix on the form of a Cholesky factorization, we regularize the elements of the full diffusion tensor D, while Wang et al. regularize the elements of the lower triangular matrix L. Intuitively, regularizing the elements of D is more direct than regularizing the elements of L. We highlight the difference between Wang's method and our proposed method by a numerical simulation in a simplified setting in Section 3. In addition, the method proposed in this paper has a coupling between all elements of D in the regularization PDE, while the method proposed by Wang et al. does not have such a coupling between the channels.
We also note that the functional (13) is chosen mainly because of the good properties of the corresponding scalarand vector-valued functionals [2,25], with edge preservation as the most prominent property. Depending on the application at hand, other functionals might be considered as alternatives. The framework developed in this paper is however applicable for other regularization functionals as well.

Euler-Lagrange equations
In this section, we derive the Euler-Lagrange equations corresponding to the minimization functional (13). We first differentiate the fidelity functional We differentiate D with respect to kl , for example The other derivatives follow the same pattern. Writing out (14), we find the derivative of the fidelity functional, where {d i j } 3 i=1, j=1 denote the elements of the matrix D. We differentiate the regularization functional in (13). Define the total variation norm of a matrix D ∈ R 3 × R 3 as This is a straightforward generalization of the total variation norm of a vector [25].
Using the chain rule, we find the derivatives of the regularization functional to be with Note here that this derivative is essentially similar to the derivative in the color TV model of Blomgren and Chan [25], but with the important difference that we represent the diffusion matrix by its Cholesky factors.
In the next section, we perform numerical simulations using the method proposed in this paper. We give more details on the Euler-Lagrange equations in the appendix, which also contains some details on the numerical implementation of the model.

NUMERICAL EXPERIMENTS
In this section we perform numerical experiments on synthetically constructed tensor fields and real tensor fields from a human brain. The numerical implementation of the method is briefly discussed in the appendix.
For the synthetical fields we have constructed clean tensor fields, which are degraded with noise with a prior known distribution. Thus, we are able to measure how well the method performs on synthetical data. For the real human brain DTI, the "true" solution is of course not known in advance. In this case, we measure the performance of the method in terms of a reference solution where a large series of acquisitions are averaged. This is explained in detail in Section 3.3. For the numerical implementation of the method and some of the visualizations, we have used Matlab [28].

Synthetical tensor fields
In the first numerical experiment, displayed in Figure 2, we test the performance of the proposed method on a simple tensor field. This field is mapping a square domain Ω ⊂ R 2 , with four distinct regions, to R 2×2 . We construct the clean tensor-valued data by prescribing the eigenvalues and corresponding eigenvectors. The values of each element of L is in the range [0, 1]. Then we add normally distributed noise η(σ) to each element of the Cholesky factorization of the matrix, that is, L = L + η(σ). Finally, the degraded diffusion tensor is constructed by D = L L T . The noise levels in the simulations in Figures 2 and 4 are given by σ = 0.35 and σ = 0.25, respectively. The time step is Δt = 0.001. Note that the discontinuities in the data are preserved in the solution, that is, the edge preserving property of scalar and vector-valued TV flow is kept in the proposed matrix-valued flow. In the first example, the diffusion is anisotropic in the whole domain. To show how the proposed method differentiates between isotropic and anisotropic regions we show a similar example where one of the four regions is interchanged with an isotropic region. The isotropic region is constructed by considering the orthogonal matrix Q from the QR factorization of a random 2 × 2 matrix. The columns of the matrix Q are considered to be the eigenvectors of the diffusion tensor. We specify two random numbers in the range [0, 1] as the eigenvalues of the tensor. Thus the diffusion is random in the isotropic region. As we can observe from these two numerical examples on synthetical data, edges are preserved in the regularized images. We observe that even when the noise level is high, we are able to reconstruct an image which is close to the true noise-free image.
From these numerical experiments on synthetical data we see that the proposed method gives encouraging results. Similarly as in the scalar-and vector-valued settings, edges are well preserved. We further investigate the edge preservation in the next experiment.

Qualitative experiments
To highlight the qualitative differences between regularizing the elements of the tensor D and the elements of the Cholesky factors L, we have constructed a simple numerical example in 1D. We have removed the fidelity measure from the model, thus the method is in this setting merely a diffusion filter. Thus we have simplified the model in such a way that we can study the qualitative behavior of the two regularization filters in the same setting (see Figure 3). From this example, we clearly see that when we regularize D the edges are better preserved than when we regularize L. Note that Wang et al. regularize the Cholesky factors [18].
We also present a numerical example in 2D where we solve the PDEs first as an uncoupled system, that is, by employing the weighting factors α i j = 1, and then as a coupled system where we use the weighting factors from (10). We denote the clean field by D, the noisy field by D, the field regularized with the uncoupled system by D u and the field regularized with the coupled system by D c . In Figures 5(a) Figure 5, we observe that the uncoupled system does not discriminate the noise from the weak signal. The coupled system on the other hand better discriminates the noise from the weak signal. A similar 1D example is shown by Blomgren and Chan using the color TV model [25].
In the next section, we go one step further and process real human brain DTMRI.

Human brain DTMRI
We also perform numerical experiments on DTMRI acquisitions of a healthy human brain from a volunteer. The human subject data is acquired using a 3.0 T scanner (Magnetom Trio, Siemens Medical Solutions, Erlangen, Germany) with an 8-element head coil array and a gradient subsystem with the maximum gradient strength of 40 mT/m and maximum slew rate of 200 mT/m/ms. The DTI data is based on spin-echo single-shot EPI acquired utilizing generalized autocalibrating partially parallel acquisitions (GRAPPA) 6 International Journal of Biomedical Imaging technique with acceleration factor of 2 and 64 reference lines. The DTI acquisition consists of one baseline EPI and six diffusion weighted images (b-factor of 1000 s/mm 2 ) along the following gradient directions: Each acquisition has the following parameters: TE/TR/averages is 91 ms/10000 ms/2, FOV is 256 mm × 256 mm, slice thickness/gap is 2 mm/0 mm, acquisition matrix is 192 × 192 pixels, and partial Fourier encoding is 75%.
For validation of the proposed regularization method on real data, we construct a reference solution D * by averaging 18 replications. Each replication consists of six-direction weighted and one nonweighted acquisitions. This reference solution is compared to solutions where averages of two, four, and six replications are postprocessed with the proposed regularization method. As a measure of the distance between the reference solution and the processed solution, we use the following metric: Oddvar Christiansen et al.    For each simulation, we report the regularization parameter λ and the metric m(·, ·) in Table 1 and in Figure 6. We display the result before and after applying the proposed method on real DTMRI data in Figures 7 and 8. In the figures, we display a 2D slice of an RGB direction encoded fractionalanisotropy (FA) measure defined by where λ = (λ 1 + λ 2 + λ 3 )/3. The FA measure is directionencoded as described by Pajevic and Pierpaoli [29]. We use the DTMRI software DTIStudio to construct the visualizations [30]. In the figures, we show the color-coded FA.
The noise level is different for each simulation due to the varying number of acquisitions. Consequently, the regularization parameter λ is different for each simulation. However, for clinical applications, the regularization parameter is estimated once for each imaging protocol. When this is done, the same regularization parameter can be used for subsequent applications of the same imaging protocol.

Human brain ROI study
Since our algorithm regularizes the tensor field, we focus on the evaluation of the tensor field, and the derived scalar FA map. However, we note that from the processed tensor field, we may reconstruct the corresponding diffusion weighted images {S i } 6 i=1 by (2). There are obvious visual improvements in the processed diffusion weighted images compared to the noisy diffusion weighted images. Edges are preserved and noise is suppressed. Quantitatively, the mean and standard deviation at certain homogeneous regions of interests (ROIs) show significant improvements. We will now assess the visual and quantitative improvements in terms of the denoised tensors.
For qualitative evaluation, we select three regions of interest (ROIs) from one slice of the acquired images, with a 15-by-15 voxel size. We plot the 2D projection of the eigenvector corresponding to the major eigenvalue in Figure 9. From Figure 9, we can clearly see that our method preserves discontinuities (edges) in the tensor field, while it smooths the tensor field in homogeneous regions. The denoised tensor field from the 4-average acquisition is close to the tensor field obtained from the 18-average acquisition.
For quantitative measures, we use the average deviation angle (ADA) index of Chen and Hsu to measure if the tensor field undergoes gradual changes or sharp turns [21]. The PDE filtering is performed after the tensors are computed, so we use the angle deviation in adjacent voxels as a measure of its performance instead of normalized magnitude of diffusion tensor error (NMTE) index [21]. Denote the eigenvector corresponding to the largest eigenvalue by V * . Define the ADA by where, for example, . We note that we use the absolute value of the inner product (·, ·) to accommodate antisense directional vectors. A small change in direction from one voxel to its neighbor gives a small ADA, while a large change in direction gives a large ADA.
After masking the background, we compute the average ADA within the brain, and call it the global ADA. From Table 2, we see that the global ADA of the data is reduced from 12.31 to 6.27 by the denoising algorithm, whereas the 18-average clean data has an ADA of 6.65. With a higher data fidelity requirement (when λ is larger, e.g., 20), the smoothing is not very aggressive and the ADA is not as close as when λ = 13. When λ is less than 13 (data not shown here), the smoothing is excessive and the ADA values fall well below the   ADA of the 18-average data. From this information, we conclude that for the current acquisition data, λ = 13 is the best choice. The ADA at selected ROIs is larger than the global ADA because in those regions, there are obvious edges that contributed to the relatively large ADA values. Compared with the noisy 4-average data, the denoised data show significant improvements. Using the regularization parameter λ = 13, the ADA is close to the ADA of the 18-average data. The ADAs of all the ROIs are however reduced compared to the noisy data.  Figure 9: ROI study. Top image shows the ROIs that we use. The second row from left to right: the noisy (4-average) data, denoised data with λ = 13, and clean (18-average) data of ROI 1. The third row from left to right: the noisy (4-average) data, denoised data with λ = 13, and clean (18-average) data of ROI 2. The fourth row from left to right: the noisy (4-average) data, denoised data with λ = 13, and clean (18-average) data of ROI 3.

CONCLUSION
In this work, we have generalized the color TV regularization method of Blomgren and Chan [25] to yield a structure preserving regularization method for matrix-valued images. We have shown that the proposed method performs well as a regularization method with the important property of preserving both edges in the data and positive definiteness of the diffusion tensor. Numerical experiments on synthetically produced data and real data from DTI of a human brain of a healthy volunteer indicate good performance of the proposed method.
his very useful comments on the paper. The work of Johan Lie has been funded by the Norwegian Research Council, Project BeMatA. The work of Tony F. Chan and Tin-Man Lee has been funded by NSF, ONR, and NIH.

A. EULER-LAGRANGE EQUATION AND NUMERICAL IMPLEMENTATION
In this appendix, we explicitly write out the Euler-Lagrange equations corresponding to the minimization functional (13). In addition, the numerical scheme used in the simulations in Section 3 is briefly discussed.
Using the short-hand notation where n is the iteration index, and Δt is the time-step parameter. We approximate the gradient ∂G/∂ i j by standard finite difference schemes, see, for example, [4]. We here note that each iteration of the form (A.4) is performed sequentially. Thus, the equations are solved as a coupled system of six PDEs.