Diffusion-Weighted Images Superresolution Using High-Order SVD

The spatial resolution of diffusion-weighted imaging (DWI) is limited by several physical and clinical considerations, such as practical scanning times. Interpolation methods, which are widely used to enhance resolution, often result in blurred edges. Advanced superresolution scanning acquires images with specific protocols and long acquisition times. In this paper, we propose a novel single image superresolution (SR) method which introduces high-order SVD (HOSVD) to regularize the patch-based SR framework on DWI datasets. The proposed method was implemented on an adaptive basis which ensured a more accurate reconstruction of high-resolution DWI datasets. Meanwhile, the intrinsic dimensional decreasing property of HOSVD is also beneficial for reducing the computational burden. Experimental results from both synthetic and real DWI datasets demonstrate that the proposed method enhances the details in reconstructed high-resolution DWI datasets and outperforms conventional techniques such as interpolation methods and nonlocal upsampling.


Introduction
Diffusion-weighted imaging (DWI) is a noninvasive magnetic resonance modality which can be used to infer features of local tissue anatomy, composition, and microstructure from water displacement measurements [1]. Water does not diffuse equally throughout the brain and this property has been applied widely for in vivo analysis of white matter architecture and neuronal diseases [2,3]. Despite the rapid development of this technology and the broad application of anisotropic diffusion properties, DWI exhibits an inherently low signal-to-noise ratio (SNR) compared with other imaging modalities. In addition, because DWI implements EPI scanning in multiple directions, the spatial resolution is relatively poor under limited physical and clinical considerations such as durable scanning time of patients.
It has been shown that the limited resolution of DWI introduces a partial volume effect (PVE) which results in bias during DWI imaging analysis [4]. The improvement of DWI spatial resolution with high SNR provides a better sensitivity for the analysis of brain structure and clinical disease [5,6]. Moreover, high-resolution DWI could improve the estimation accuracy of diffusion tensor imaging, thus proving to be beneficial to fiber tractography and finer bundle analysis [7].
Several methods have been proposed in the literature to enhance the spatial resolution of DWI. During the acquisition stage, long acquisition times remain a primary obstacle preventing the method from being of real interest clinically. For example, Miller et al. [8] implemented five days of scanning to obtain postmortem high-resolution DWI with high SNR. In order to avoid the long scanning times, superresolution (SR) acquisition emerged as an effective technology, initially proposed for MRI, but soon adapted to DWI. Subpixel shifting in the in-plane dimension was proposed to obtain multiple lowresolution images for reconstruction into higher resolution images [9]. Anisotropic scanning was another strategy used to obtain low-resolution images for reconstruction. Scherrer et al. [10] employed a maximum of a posteriori estimation from anisotropic orthogonal acquisition to reconstruct isotropic high-resolution DWI images.
Compared with the SR acquisition implemented in specific scanning protocols, the SR algorithm in the postprocessing stage has been involved with the scene image SR reconstruction. This method category is independent of acquisition protocol and was previously implemented on MRI. The most intuitive methods for increasing resolution are interpolation methods, such as bicubic and B-spline interpolation [11]. These methods estimate the new voxels according to some smoothness assumptions which are not valid in inhomogeneous areas. Hence, interpolation methods usually result in blurred edges and artifacts in lines. Nowadays, advanced superresolution algorithms for scene images have been proposed for MRI and the main idea is reconstructing highresolution information from the image content. Manjón et al. [12] used a nonlocal estimator to reconstruct high-resolution MRI using a single low-resolution dataset. Rousseau [13] involved multimodality MRI to improve the SR quality. Coupé et al. [7] implemented a nonlocal estimator in DWI and also incorporated 0 information to enhance the reconstruction results. Sparse representation, a recent trend in signal and image processing, has also been effectively implemented in MRI. Rueda et al. [14] reconstructed single MRI datasets based on prior knowledge with a pretrained overcomplete dictionary. Trinh et al. [15] extended the sparse representation with nonnegative expressions in order to remove noise and superresolve the two sets together.
Most of the currently used SR methods are based on the structure MRI modality and, when applied to DWI, joint information should be considered for better reconstruction [16]. Joint information comes from the redundancy acquired from adjacent scanning directions. This extra information is beneficial for enhancing the spatial resolution of DWI. Single value decomposition (SVD) plays a central role in reducing high-dimensional data to lower-dimensional data and is a classical method involved in inverse problems such as denoising [17] and restoration [18]. Recently, high-order single value decomposition (HOSVD) was used to generalize the SVD of a matrix into a high-order matrix and offered a simple yet elegant method for handling similar patches [19]. Additionally, the HOSVD basis was adapted from image content and may achieve a more sparse representation than the fixed basis. In this paper, we propose a novel SR method for DWI datasets using HOSVD. Similar to the nonlocal patch-based SR approaches successfully implemented on both MRI and DWI [7,12,13], HOSVD was used to construct the regularization framework in the proposed method. The merit of the HOSVD SR method stems from the adaptive HOSVD basis which results in a more accurate reconstruction. Also, the HOSVD is only implemented over similar patches in a stack, which effectively decreases the computational complexity simultaneously. This is especially useful for DWI, since the involvement of joint information from adjacent directions in DWI datasets dramatically increases the computation burden [16].
The remainder of this paper is organized as follows: we first describe the proposed method in detail and then apply it to both synthetic and in vivo DWI datasets for experimental evaluation. Experimental results and computational efficiency are demonstrated in Section 4 and concluding remarks are given in Section 5.

Methods
Image SR leads to an ill-posed inverse problem which is related to the LR image y and HR image x; the general model can be expressed as follows: where n represents acquisition noise, represents the decimator operator, and represents the degradation function [7,12,13]. Based on this model, the SR image can be estimated by minimizing a least-square cost function as follows: For such inverse problems, a regularization term should be added to stabilize the convergence; thus, the HR image x can be estimated from the LR observation y using the following equation: where (x) is the regularization term, ‖y − x‖ is a fidelity term, and is a balancing parameter. As demonstrated by Coupé et al. [7], nonlocal patches methods can be an efficient way to define the regularization term. Instead of using a nonlocal mean estimator, we propose the implementation of a high-order SVD to be used as the estimator in this study, owing to its simple application and promising performance [19].
The HOSVD estimator clusters similar patches into a stack, in a manner similar to other patch-based methods [7,12], and then performs an HOSVD transformation to obtain the HOSVD basis and coefficients. After the truncation of the coefficients, the patches are then reconstructed by an inverse HOSVD transform.
With this in mind, the regularization term for the superresolution process in (3) can be defined as follows: where HOSVD is the HOSVD base estimator. Given an × patch P centered in , we define such similar patches (including P ) as {P }, where 1 < < , and the − 1 similar patches are obtained as follows.
Let us denote {P } as the stack Z ∈ L × × ; the HOSVD of the stack can then be defined as [20] where S is the set of coefficient matrices for a three-order tensor with × × , × stands for the th mode tensor product defined in [20], and U (1) ∈ L × , U (2) ∈ L × , and U (3) ∈ L × are orthonormal unitary matrices.

3
After applying the HOSVD transform, the patches can be estimated by nullifying the coefficients under the assumption that the coefficients of the clean image have a sparse distribution. As indicated in [19], the coefficients can be truncated using hard thresholding as follows: where denotes the hard threshold defined by = √2 log( 2 ) for the stack with patches of size × . As noted in [20], the coefficients in tensor S are not necessarily positive and the hard thresholding is defined as the absolute value of the coefficient array: where S denotes the th element of tensor S. After truncation, the stack Z is reconstructed using an inverting transform with truncated coefficients to obtain the final HOSVD estimator HOSVD : Since the DWI datasets are three-dimensional, the above method should be extended to include fourth-order HOSVD transforms for stacks with 3D similarity patches. Besides this, the threshold should be modified as = √2 log( 3 ) for a stack of size × × × .
In [12], a mean consistency correction followed the estimator to ensure coherence with the physical acquisition model. This was implemented in the fidelity term: This was done for the entirety of location in the LR dataset while subsampling consistency was imposed on the reconstructed patches. Finally, the iteration process is summarized by (8) and (9) and is applied until convergence: where NN is the nearest neighbor interpolation and is the iteration number.
In order to further improve the SR performance, the proposed HOSVD SR method can be augmented using joint information from adjacent directions in the DWI dataset [16]. For each patch P , the corresponding stack Z was constructed with similar patches which were determined as follows: the distance threshold selected all patches for which ‖P − P ‖ < was chosen as = 3 2 2 , where is the variance of the noise. This threshold is balanced between the estimation accuracy and the computational speed as indicated in [19]. The joint information was introduced by enlarging the search window into the adjacent DWI datasets, where is defined as = 2 + 1 and denotes the directions before and after it. In this paper, the HOSVD superresolution method which uses joint information in multiple directions is referred to as HOSVD-M.

Experiments
In order to quantitatively evaluate the quality of the reconstruction, B-spline interpolation, which has been introduced for DWI resolution enhancement in the literature [21,22], is used for comparison. In addition to this, a nonlocal approach for image SR [12] is also involved as an effective nonlocal patch-based SR method for comparison purposes. In this section, both synthetic and in vivo datasets were implemented for evaluation. The patches size was empirically set to 5 as suggested in [23], which was for denoising purposes primarily and also demonstrated robust results in this work. The balance parameter was set to 0.01 in all experiments. Since a sensitivity analysis for this parameter showed that the values between 0.001 and 0.2 only generated less than 0.1 dB variations of the PSNR, this means that the reconstruction has little dependence of this parameter which was also observed in the literature [14].
The simulated dataset without noise was chosen as ground truth, which consists of the 3D structure field presented at the 2012 HARDI Reconstruction Challenge [24] and occupies a 16 × 16 × 5 volume, mimicking a realistic 3D tract configuration. As shown in Figure 1(a), this dataset is comprised of five different fiber bundles which give rise to the nonplanar configurations of bending, crossing, and kissing tracts. All fiber tracts were characterized with a fractional anisotropy between 0.75 and 0.90. To better explore the proposed method, this synthetic dataset was also corrupted by Rician noise (SNR = 30) as demonstrated in Figure 1(e). Both the original dataset and the noisy set were downsampled by factor 2 using nearest neighbor interpolation along each axis. Afterwards, the LR datasets were superresolved using the B-spline method, the nonlocal method, and the proposed method, respectively. In addition to the visual comparison demonstrated in Figure 1, the angular accuracy was also measured for quantitative evaluation purposes [24]. The angular accuracy in the orientation of the estimated fiber compartments was assessed by medians of the average error (in degrees) between the estimated fiber direction and the true direction present in each voxel:  The in vivo DWI dataset was acquired using a 7T Philips Achieva whole body scanner (Philips Healthcare, Cleveland, OH) equipped with a volume head coil for transmission and 32 channels. A DW dual spin-echo, SENSE accelerated msh-EPI was used to acquire the DWI data ( -value: 700 s/mm 2 ; 15 diffusion directions), FOV = 210 × 30 × 21 mm 3 , matrix size = 300 × 300 with 15 slices, and a spatial resolution of 0.7 × 0.7 × 2 mm 3 . In order to validate the proposed approach both quantitatively and qualitatively, a gold standard image was constructed based on the in vivo HR DWI dataset. This was calculated by averaging 10 acquisitions of high-resolution DW images in the image space (0.7 × 0.7 × 2 mm 3 ). LR images were then used in the experiment and were simulated by downsampling the gold standard by factor 2 using nearest neighbor interpolation along each axis (i.e., [2 2 2]), which resulted in simulated LR images of size 1.4 × 1.4 × 4 mm 3 .
In order to quantitatively evaluate the superresolved DWI dataset, two objective measurement matrices, namely, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [25], were used. The PSNR measures the extent to which noise has been suppressed and SSIM measures the structural and perceptual similarities between the original and reconstructed images: where and are the mean values of images and . The terms and are the standard deviation of and , respectively, is the covariance between them, and the constants 1 and 2 are chosen as suggested [25].
Tensor estimation of the in vivo DWI dataset was evaluated quantitatively between the superresolved dataset and the gold standard. First, the diffusion tensor field and principal eigenvectors were computed using CAMINO [26] and are demonstrated in Figure 1; Figure 2 contains the mean angular error estimated by (11). The fractional anisotropy (FA) map and colormap of the estimated DTI were calculated and are shown for comparison. Finally, the primary direction of the tensor is also demonstrated for visual comparison. Figure 1 illustrates the principle eigenvector for the tensor model in the synthetic phantom and the results reconstructed using B-spline interpolation, nonlocal upsampling, the proposed HOSVD, and the proposed HOSVD-M. It is evident from the results in the figure that the superresolved methods dramatically outperformed the interpolation methods. The proposed method achieved the best results, by visual inspection, for both noisy and no-noise configurations. This is likely due to the adaptive HOSVD bases derived from the stacked patches, which are more suitable for reconstruction. Quantitative comparisons were provided in Figure 2 for more comprehensive evaluation. The expected inability of reconstruction using interpolation method was clearly reflected as the highest value in both original and noisy phantom datasets. Meanwhile, the proposed HOSVD and HOSVD method achieved the best reconstruction quality with the lowest value and remarkable stability. It can be seen in Figure 2(a), in the original dataset, the proposed two HOSVD methods achieved narrower angle error distribution compared with other methods. Besides this, in the noisy dataset (Figure 2 plausible stability with significantly less outliers than the NLM method. The reconstructed results for in vivo DWI data are demonstrated both quantitatively and qualitatively in Figures  3 and 4, respectively. Figure 3 displays the PSNR and SSIM for the reconstructed DWI dataset using the above methods. As indicated previously, the patch-based SR methods obviously outperformed the interpolation methods. The proposed HOSVD methods outperformed the nonlocal upsampling method in most of the diffusion directions and the HOSVD-M method achieved the best results in most of them. Figure 4 shows a visual comparison of the reconstructed DWI images. The interpolated results (Figure 4(b)) were the blurriest.

Results
Images reconstructed using the proposed method were the most similar to the original images. The enlarged region (Figure 4(l)) demonstrates that the proposed HOSVD clearly reconstructed the spatial features of the cracked area as indicated in the red arrow, as compared with the same area reconstructed by other methods, in which the edges are blurry and difficult to distinguish. Figures 5 and 6 demonstrate the tensor estimation results using the superresolved DWI datasets. Figure 5 shows the FA map for the estimated DTI datasets. The proposed HOSVD-M method achieved the best results in the enlarged areas and retained most of the structure and tissue from the original images. It can be seen more clearly that, in the residual map,  the result of HOSVD-M remained less in structure information compared with others. The fiber direction indicated in the FA colormap is shown in Figure 6. It can be observed in Figure 6(e) that the proposed HOSVD-M obtained robust direction reconstruction results. For example, in the corpus callosum bundle, the color of most voxels remained the same. This coincides with Figure 7(f) in which the primary eigenvectors for the voxels in the corpus callosum maintain the same direction.
In addition, we applied our method directly to the in vivo golden standard DWI dataset. Figure 8 shows the results of the reconstruction using different method. As it can be observed, the proposed method achieved plausible reconstruction as well. As pointed out by red arrow, the HOSVD-M preserved the fine detail and reconstructed explicit crack boundary which may be beneficial for further applications.

Discussion and Conclusion
In this work, we investigated a novel patch-based single image superresolution method to increase the spatial resolution of DWI datasets. The proposed method introduced HOSVD into the SR framework as a regularization term to achieve better image reconstruction and more efficient computation. Joint information from adjacent DWI directions was also involved to make further improvements. Both synthetic and in vivo DWI datasets were implemented for evaluation of the DWI reconstruction and DTI estimation. Compared with conventional interpolation and patchbased SR methods, the improvements made by the proposed HOSVD method can be contributed to two features. The first is adapted HOSVD basis acquired from a stack of similar patches. This technique obtained bases adaptively, according to image content, and achieved a more effective reconstruction result. The second feature is the introduction of joint information from adjacent directions in the DWI datasets. As noted in [16], the adjacent directions contain a significant amount of image redundancy and the encapsulation of both the processed and adjacent directions effectively benefits the reconstruction.
Computational complexity is another important issue for patch-based methods as well as DWI processing. All experiments were performed on a PC running MATLAB R2013b in Windows 7, with an Intel(R) core i7-4600U processor and 8 GB of RAM. For a typical DWI dataset with a matrix size of 128 × 128, 60 slices, and 32 directions, the runtime for a single direction was approximately 8 minutes for nonlocal upsampling, 3 minutes for the proposed HOSVD, and 5 minutes for the proposed HOSVD-M. This increase in speed is likely due to an inherent dimensional decreasing property of the SVD as well as HOSVD. Specifically, compared with nonlocal methods which average every patch to estimate the reconstructed patches, the proposed method only manipulates a portion of patches with high similarity. This induced a faster convergence speed. We also expect that the implementation of parallel computing on graphic processing units could further speed up the reconstruction. This will require further research in future studies.
In this paper, we proposed a patch-based single image superresolution method which involved applying a highorder SVD to a DWI dataset. The adaptive HOSVD bases acquired from the image ensured a more accurate image reconstruction and manipulation of similar patch stacks led to a reduction in computational complexity. Quantitative and qualitative comparisons of the traditional interpolation and nonlocal patch-based methods demonstrate the competitive results obtained for both DWI reconstruction and DTI estimation.