Quantitative Evaluation of Temporal Regularizers in Compressed Sensing Dynamic Contrast Enhanced MRI of the Breast

Purpose Dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) is used in cancer imaging to probe tumor vascular properties. Compressed sensing (CS) theory makes it possible to recover MR images from randomly undersampled k-space data using nonlinear recovery schemes. The purpose of this paper is to quantitatively evaluate common temporal sparsity-promoting regularizers for CS DCE-MRI of the breast. Methods We considered five ubiquitous temporal regularizers on 4.5x retrospectively undersampled Cartesian in vivo breast DCE-MRI data: Fourier transform (FT), Haar wavelet transform (WT), total variation (TV), second-order total generalized variation (TGVα2), and nuclear norm (NN). We measured the signal-to-error ratio (SER) of the reconstructed images, the error in tumor mean, and concordance correlation coefficients (CCCs) of the derived pharmacokinetic parameters Ktrans (volume transfer constant) and ve (extravascular-extracellular volume fraction) across a population of random sampling schemes. Results NN produced the lowest image error (SER: 29.1), while TV/TGVα2 produced the most accurate Ktrans (CCC: 0.974/0.974) and ve (CCC: 0.916/0.917). WT produced the highest image error (SER: 21.8), while FT produced the least accurate Ktrans (CCC: 0.842) and ve (CCC: 0.799). Conclusion TV/TGVα2 should be used as temporal constraints for CS DCE-MRI of the breast.


Introduction
Dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) involves the continuous acquisition of 1weighted MR images during and after the injection of a paramagnetic contrast agent (CA). The CA increases the contrast between different tissues by altering their inherent relaxation rates. Across serial images each image voxel yields an intensity time course that can be used to estimate physiological parameters, such as the volume transfer constant, trans , and extravascular-extracellular volume fraction, V e [1,2].
Both high temporal and spatial resolutions are beneficial in DCE-MRI; high temporal resolution is needed for quantitative DCE-MRI analysis, while high spatial resolution aids clinical reading. However, the requirement for repeated, high signal-to-noise images limits simultaneous enhancement of temporal and spatial resolutions by conventional data acquisition methods.
For the acceleration of dynamic MR images, a common strategy used to balance the trade-off between spatial and temporal resolution is to subsample the data (also known as " -space") at each frame. Many successful algorithms have 2 International Journal of Biomedical Imaging used this idea, such as keyhole [3], -FOCUSS (FOcal Underdetermined System Solver) [4], -BLAST (Broad-use Linear Acquisition Speed-up Technique), and -SENSE (SENSitivity Encoding) [5]. However these methods suffer the limitations such as low signal-to-noise ratio (SNR) and aliasing artifacts for high sampling factors.
Compressed sensing (CS) [6,7] is a newer strategy to accelerate data acquisition in dynamic MRI. Using CS, it is possible to accurately reconstruct an MR image from less Fourier data than with traditional acceleration methods, resulting in a further reduced data collection burden [8,9]. According to CS theory, the reconstruction of dynamic MR images can be modeled as a constrained reconstruction in which the resulting image is chosen to have the sparsest representation in some prior sparse transform while still being consistent with the collected Fourier data. For dynamic images, since most of the image is roughly constant in time, the most significant redundancy (and hence sparsity) often manifests in the temporal direction.
CS has great potential in cancer MRI [10] because many protocols in cancer MRI are dynamic. Many applications of the CS to dynamic MRI have been successfully demonstrated, such as -SPARSE [11], -SLR (Sparse and Low Rank) [12], and iGRASP (Golden-angle RAdial Sparse Parallel MRI) [13]. For example, Han et al. [14] demonstrated the enhancement of spatiotemporal resolution of DCE-MRI in an animal model with a CS-accelerated Cartesian fast low angle shot (FLASH) sequence. Chen et al. [15] compared different forms of temporal total variation terms in the reconstruction of undersampled DCE-MRI data acquired in breast cancer patients. Ji and Lang [16] applied a difference operator to the temporal data frames to enhance the spatial signal sparsity for CS reconstruction. Smith et al. [17,18] showed the expected variance in quantitative parameters for spatial TV regularization across a population of randomly generated sampling schemes. Although CS has been applied to breast DCE-MRI, no one has quantitatively compared different temporal sparsity models for breast DCE-MRI across a large number of sampling patterns. Thus, the expected effects of different temporal regularizers on the error in quantitative DCE-MRI parameters are not known.
Before CS can be used clinically in such a critical area of care as cancer imaging, its effect on the reliability and accuracy must be understood. The aim of this paper partially addresses this by quantitatively evaluating five common temporal sparse regularizers for breast DCE-MRI: (1) ℓ 1 -norm of the Fourier transform (FT) We hypothesize that one of these regularizers will produce more accurate reconstructed images than the others and that one regularizer (not necessarily the same one) will produce the most accurate quantitative parameters.

Data Collection.
We applied all models retrospectively to in vivo breast DCE-MRI data [19] collected under a protocol approved by the institutional review board. The data were acquired using a spoiled gradient-recalled echo (SPGRE) sequence on a Philips (Best, Netherlands) Achieva 3T scanner with TR = 4.33 ms, TE = 2.12 ms, and flip angle = 12 ∘ . The dimension of the data was 192 × 192 × 10 × 105, which consisted of 192 readout points by 192 phase encodes across 10 slices repeated over 105 dynamics. The spatial resolution was 1.33 mm by 1.33 mm by 5 mm, and the field of view was 256 mm by 256 mm by 50 mm. Further details about the protocol can be found in [19].
The data that were acquired were fully sampled with a Cartesian geometry and then retrospectively undersampled according to a range of random sampling patterns. While this was less realistic than prospective undersampling, it was necessary because acquiring hundreds of different sampling patterns prospectively would be impractical.
To narrow the focus of the paper, we looked only at the slice that passed through the center of the tumor. For this particular dataset, the center slice was the sixth slice. Also, for all reconstructions, we cropped the image posterior of the chest wall to improve sparsity. The beating heart caused aliasing artifacts in the first-phase encode direction (superior-inferior). The final cropped dimensions of our test data were 192 × 128 in-plane by 105 dynamics. This cropping procedure can be guided by the undersampled data, as even on the aliased images the chest wall can be demarcated.
We generated 200 distinct Cartesian sampling masks by choosing 200 different random seeds before pattern generation. The dimensions of the masks were 192 × 128 × 105, matching the dimensions of the data. For each dynamic, the low frequency region was fully sampled with a central window width of 20 -space lines. Outside the central window, we randomly chose phase encodes such that each phase encode was sampled roughly the same number of times, but at random time points. The total undersampling factor of the mask on average was 4.5 across all 200 masks. Here we chose 4.5 because of the limitations of the Cartesian undersampling scheme. Higher undersampling factors can be achieved using non-Cartesian schemes, such as radial and spiral, which are not in the scope of this paper. The actual acceleration delivered for a given seed varied because the exact number of phase encodes chosen in a mask varied slightly due to the pseudorandom nature of the pattern generation. Figure 1 shows an example sampling mask.

Image Reconstruction.
We denote the reconstructed spatiotemporal image slice as the matrix ∈ C × , where the spatial dimensions are × with temporal dynamics. The dynamic measurements correspond to the samples in -space corrupted with noise: = + , where = F is the measurement operator, F is a 2D spatial Fourier transform on each temporal frame, is the sampling mask on each temporal frame, and is additive complex white Gaussian noise.
International Journal of Biomedical Imaging The reconstruction problem solved waŝ where is the collected (undersampled) data, is a positive, real parameter that balances between data consistency and sparsity, is Frobenius norm, and is one of the five temporal sparsity-promoting regularizers. Minimizing ( ) promotes the sparsity of the outcome and minimizing ‖ − ‖ 2 enforces data consistency.
In this paper, all the sparse models are solved by the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [20] for its simplicity and efficiency. FISTA is an operator splitting algorithm that aims to minimize the following problem: where is a smooth convex function with Lipschitz constant and is a convex function which may be nonsmooth. The outline of FISTA is shown in Algorithm 1.
Here prox ( )( ) is the proximal map of a function ( ) at point , which is defined as And the projection operator is defined as follows: The key step of FISTA is to find an efficient algorithm to solve the proximal map subproblem. As in most compressed sensing models, we use 1 norm as sparsity term, and the proximal maps can be efficiently solved by soft thresholding.
And the solution of (5) is given by where the soft shrinkage operator T : C → C is defined as where = abs , = arg , and "max" is the maximum operator that chooses the largest of two elements. International Journal of Biomedical Imaging features of the signal in time), so the 1D Fourier transform applied along the temporal dimension can be used to sparsify the signal. The more periodic the signal, the sparser the representation in the Fourier domain. A nonperiodic, noisy signal can still be compressed in the Fourier domain due to the denoising effect of downsampling, although this is a minor effect. For example, in cardiac cine imaging, the image can be efficiently sparsified using a temporal Fourier transform due to the periodic motion of the heart. In -SPARSE [11], a wavelet on the spatial domain and a Fourier transform along the temporal direction was used to make the image sparse. Denoting F as 1D temporal Fourier transform, we get the model  [8] with the Fourier transform, the domain of data collection for MRI. Based on the results of CS theory, this should lead to a better reconstruction for a given undersampling factor. For example, -SPARSE [11] has successfully employed spatial wavelets in cardiac imaging.
Here we define as the 1D temporal Haar wavelet transform, yielding the model

Regularizer 3: Total Variation.
In the first application of compressed sensing in MRI, Lustig et al. [8] used total variation (TV) as the sparsifying transform in the model. The TV operator considers an equally weighted combination of finite differences along space and time, essentially representing the image in terms of its gradient. TV is maximally incoherent with the Fourier domain and as such should yield on average a more accurate reconstruction for a given undersampling factor than any other sparse basis. However, real MR images are not piecewise constant, and constraining the gradient of these images produces "staircase" artifacts. Staircase artifacts manifest when higher-order variations in the signal intensity are reduced to piecewise constant regions. This occurs because the smallest gradient coefficients are being reduced to zero in the reconstruction in order to sparsify the image under the gradient transform.
TV or finite difference based strategies, which were originally designed for image denoising [21], have recently gained wide interest in many MRI applications beyond denoising. In -SLR, a three-dimensional total variation transform was used which is defined as follows: where ∇ , ∇ , and ∇ are finite difference operators along , , and . Several works have successfully used temporal gradient only in CS models, such as DLTG [22] and iGRASP [13]. In this paper, we focus on the temporal gradient and evaluate the performance of the following model: 2.2.4. Regularizer 4: Total Generalized Variation. As mentioned above, finite difference operators are well suited to sparsify MR images, but the first-order finite difference can introduce staircase artifacts. Total generalized variation (TGV) [23] was introduced in part to address this issue. Under the TGV, linear intensity changes in the image are preserved because the TGV operates on second-order and higher derivatives.
The discrete second-order TGV is defined as follows: Here the minimum is taken over all discrete complex vector fields V on ∈ C 2 , ∇V = (∇ V 1 , ∇ V 2 ) denotes the firstorder finite difference, and E(V) = (∇V + ∇V )/2 denotes the symmetrized derivative. Here V 1 ∈ C and V 2 ∈ C are the components of V along the first and second direction, respectively. Such a definition provides a way of balancing the first and second derivative of the function.
As can be seen from the definition, TGV 2 involves higherorder derivatives to measure image features. TGV 2 generalizes TV and is more suitable to model intensity variations in smooth regions of the image. Reconstruction with TGV 2 is capable of preserving sharp edges without causing staircase artifacts.
Like the case in total variation, we denote TGV 2 in temporal dimension as TGV 2 and the model becomeŝ where ∇V = ∇ V with V ∈ ∈ C × .

Regularizer 5: Nuclear Norm.
Low-rank matrix completion has been applied to dynamic MRI by considering each temporal frame as a column of a spatiotemporal matrix, where the spatiotemporal correlations produce a low-rank matrix. The combination of compressed sensing and lowrank matrix completion [24] has produced further increases in imaging speed. In dynamic MRI, previous work on this combination proposed a solution that is both low rank and sparse. In -SLR [12], both nuclear norm and total variation transform were used to demonstrate significant improvement in performance of phantoms and in vivo cardiac perfusion MRI data. Nuclear norm (NN) is denoted by ‖ ‖ * , which is the sum of singular values of . Here we use only temporal constraints, so the formulation is as follows: International Journal of Biomedical Imaging 5  Table 1 shows the parameters used for all sparse models. Here was the weight for ( ), "iter" was the number of iterations for the main function, , , and were the first-order step size, second-order step size, and weight of fidelity term, respectively, in TV and TGV 2 based models. We started with very low regularization ( = 10 −5 ) and then increased the parameters step by step until artifacts were visually eliminated from the resulting image.
In each iteration, we tracked the relative norm of the image difference between two iterations. Once it went lower than a chosen threshold (10 −3 ), we terminated the reconstruction.

Pharmacokinetic Modeling.
One of the most commonly used pharmacokinetic models is the standard Tofts-Kety model [25]. This model provides information about the influx forward volume transfer constant from plasma into the extravascular-extracellular space (EES) and fractional volume of EES per unit of tissue. The standard Tofts-Kety model, with both spatial and temporal dependencies made explicit, is where is the concentration of CA in the tissue and is the concentration of CA delivered by the blood plasma.
In our experiments, this model was fit using nonlinear least squares [18] for every voxel in the reconstructed data that enhanced by a factor of two or more relative to the precontrast baselines, calculated by dividing the mean signal in the first three dynamics to the mean signal in the last three dynamics.

Assessments.
We used two methods to quantitatively evaluate the temporal transforms. For consistency with previous paper [12], the first assessment we used was the image-based signal-to-error ratio (SER), defined as where FS is the image reconstructed from the original, fully sampled data. The second assessment was the concordance correlation coefficients (CCCs) of the parametric maps trans and V , which is widely used in the quantitative analysis of DCE-MRI. In statistics, the CCC measures the agreement between two variables and is given by where FS and CS are the means of trans or V for fully sampled and compressed sensing (CS) reconstructed images, respectively, and FS and CS are the corresponding variances.
To visually evaluate the accuracy in determining time profiles, we computed the difference in signal intensity curves with respect to the fully sampled data in specific regions within the tumor. For consistently plotting these curves, we first manually generated a mask for the tumor and applied the mask to all reconstructions. Additional visual assessment of parameter agreement was conducted using Bland-Altman plots of the average of the undersampled parameters and the fully sampled parameters relative to the fully sampled parameter values.

Implementation.
The CS reconstruction was written in MATLAB, and the DCE analysis software used was DCEMRI.jl [26] and was written in Julia. All experiments were run on a dual Xeon E5-2665 2.40 GHz workstation with 20 GB of RAM with MATLAB 2015b (Mathworks, Natick, MA) and Julia 0.4.3. The full code to generate the results and figures here has been provided as open source [27].

Image Quality.
We first evaluated the performance of the five constraints on image error. The first column of Table 2 represents the signal-to-error ratio (SER) comparisons over 200 runs. We found that NN produced the highest SER (29.1, higher is better) among the five constraints tested while WT produced the lowest (21.8). FT (26.4) came in between WT and TV. TV and TGV 2 produced similar SERs with TGV 2 (27.8) being slightly higher than TV (27.7). Figure 2(a) shows the 105th dynamic of the reconstruction for each of the five constraints using the first randomly generated mask. The red arrows indicate background artifacts that remained in the TGV 2 and TV cases, where NN greatly reduced those artifacts. The background artifacts were also suppressed in the WT and FT reconstructions, but WT and FT performed the worst in reconstructing the tumor. out of the 200 tested patterns. We can see from the red arrows that TV and TGV 2 performed visually the best in the reconstruction of tumor area, which is where the majority of voxels used in the trans and V calculations were located. FT performed visually the worst in the tumor area. Figure 3 shows the boxplot of SER using 200 different sampling patterns for all five regularizers. We can observe from the boxplot that, for each regularizer, the variance in SER was small relative to the mean and all of the results are statistically significantly different except for TV and TGV 2 . Also it can be seen that NN produced the highest mean SER (29.1), and WT produced the lowest mean SER (21.8). Table 2 show the CCC comparisons of trans and V , respectively. The highest CCCs for both trans and V were seen using TV (0.974 for trans and 0.916 for V ) and TGV 2 (0.974 for trans and 0.917 for V ). Although NN produced the best SER (29.1), it did not give the highest CCCs for trans (0.842) and V (0.799). The same phenomenon can be found between WT and FT, where WT (21.8) produced a lower SER than FT (26.4), but WT produced a higher CCC (0.878 for trans and 0.733 for V ). A zoomed image of trans and V maps of the first run can be seen in Figure 4. Both TV and TGV 2 produced accurate trans and V with respect to the fully sampled data, while FT produced the worst trans and V . Although NN produced a more accurate V map than WT, the trans map was blurred and less accurate.

Parameter Accuracy. The second and third columns of
Bland-Altman plots of trans and V using the five regularizers in the first run can be seen in Figure 5. TV and TGV 2 produced nearly unbiased trans and V while WT, FT, We can observe that NN produced the highest SER, while WT yielded the lowest SER, which is an illustration of Table 2. and NN underestimated both parameters. Also although WT produced relatively accurate CCCs, it greatly underestimated the tumor means. Figure 6 shows the difference between the mean intensity curves in the undersampled and the fully sampled cases for all voxels (panel (a)) in the tumor ROI and for a single example voxel (panel (b)). In Figure 6, the zero-filled reconstruction was accurate for the first five dynamics. But after CA injection, the intensity became underestimated, decreasing the SER, trans , and V . In Figure 6(a), WT underestimated the mean intensity curve across all the dynamics. In Figure 6(a), FT failed to fit the mean curve in the first five dynamics, and, in the last five dynamics, and the oscillation behavior is more obvious with FT in Figure 6(b). TGV 2 and TV best fit both the mean intensity curves and single voxel intensity curves. NN overestimated the intensity in the first few dynamics, affecting the accuracy of trans and V .
Since predictable accuracy is important in breast CS DCE-MRI, standard boxplots of CCCs and the tumor mean trans and V are presented in Figure 7. Through all the boxplots, the interquartile ranges are small, which suggests that CS undersampling can have a predictable accuracy for Cartesian DCE-MRI of the breast. In the first three boxes (SER and CCCs), the results are all statistically significantly different except between TV and TGV 2 . Similar to our findings in Figure 6, TV and TGV 2 led to the most accurate CCCs with high mean and low variance. The tumor means of both regularizers are closest to the ground truth. Again FT was the least accurate in reproducing both the tumor mean and the voxel-wise CCCs, and the tumor mean trans and V were far from the ground truth. Although WT produced relatively accurate tumor mean trans and V , the variances were the highest among the five, which suggests less predictable accuracy.

Discussions.
The quantitative comparisons of the five temporal constraints showed that NN was most capable of suppressing background artifacts and thus produced the highest SER. We believe this is due to its better artifact suppression ability and better edge preservation. The reason is that the minimization of the nuclear norm will suppress features that are not the same in all dynamics, such as interference and noise, while keeping static features, such as the breast and the tumor. Thus, at least in compressed sensing DCE-MRI of the breast, if one needs relatively higher image quality (SER in our case), nuclear norm would be a reasonable choice.
On the other hand, TV and TGV 2 provided the highest CCCs for trans and V among the five regularizers tested and also appeared to most closely match the true voxel intensity curves in the tumor area, suggesting that error in fitting the voxel intensity curves may predict the ultimate quantitative parameter accuracy. This is because TV and TGV 2 , as opposed to the nuclear norm, better preserve high-SNR International Journal of Biomedical Imaging  information that varies across dynamics, which in this case is the contrast enhancement in the tumor area. This produces a better reconstruction in the tumor and hence more accurate CCCs. If so, then the fit residuals may prospectively inform the parameter accuracy in cases where ground truths are not available. Thus, at least in compressed sensing DCE-MRI of the breast, if one needs to get more accurate quantitative parameters, TV and TGV 2 would be promising choices.
Since the nuclear norm captures the background information and TV and TGV 2 capture the dynamic information, the combination of nuclear norm and TV/TGV 2 could be the best reconstruction model for breast DCE-MRI. This is also our work in the future. The boxplots in Figure 7 show that there is no statistically significant difference between TV and TGV 2 , so it is hard to distinguish between the utility of the two for this application. This may change for other anatomical sites; however, especially those where more regions of a roughly linear intensity gradient are present, rather than areas of mostly piecewise constant intensity, such as the breast.
The results of the quantitative comparisons presented here should inform clinical and research imaging reconstruction methods. For techniques that value image fidelity above accurate quantitative parameters, the best temporal regularizer may be the nuclear norm. For techniques that value quantitative parameter accuracy above image quality, TV or TGV should be preferred. The same data may be reconstructed multiple ways, of course.
Prior to this work, no measurements of the quantitative accuracy obtained from common temporal regularizers across a range of Cartesian sampling patterns had been made. This work addresses that gap, but with three major caveats. First, only one breast DCE-MRI data set was used: it is possible that the results will vary across subjects, although they are not expected to significantly. Second, though we obtained a fairly normal distribution of errors, the entire space of sampling patterns is astronomically large compared to the 200 tested ones here. It is improbable, but still possible, that a nonrepresentative set of sampling patterns was selected. Third, tuning the FISTA parameters to create the best reconstruction for each constraint is an inexact process. It is possible that slightly different results could be obtained with different FISTA parameters, but we have no reason to think that the general patterns would change.

Conclusion and Future Research
In this paper, we compare the quantitative performance of five temporal regularizers for CS DCE-MRI of the breast. We find that the Fourier transform is the least suitable regularizer because of the nonperiodic behavior of breast DCE-MRI data. The Haar wavelet transform was average in performance but was the least consistent in accuracy across the range of sampling patterns. The nuclear norm best suppressed background artifacts caused by undersampling, thus maximizing SER, but was less accurate in the recovery of pharmacokinetic parameters. Total variation and total generalized variation retrieved the most accurate pharmacokinetic parameters, with TGV 2 slightly edging out TV in both image quality and parameter accuracy.
Since the goal of CS DCE-MRI is to accurately measure tumor properties, we recommend using TV or TGV 2 as the temporal constraint in CS reconstructions of breast DCE-MRI.
Future work includes testing on the full 3D breast DCE-MRI datasets instead of only a single slice. Since performing the computations required for this work on the whole 3D data in MATLAB would be prohibitively slow, we intend to use state-of-the-art GPU acceleration techniques to reduce the computation time. We will also examine prospective non-Cartesian sampling schemes such as radial and spiral to explore the effect of higher acceleration on the quality of reconstructions achieved with the temporal regularizers examined here.

Disclosure
Thomas E. Yankeelov is the W.A. "Tex" Moncrief Professor of Computational Oncology and is a CPRIT Scholar in Cancer Research.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.