
Low-dose CT (LDCT) images are often significantly degraded by severely increased mottled noise and artifacts, which can lower diagnostic accuracy in clinical practice. Nonlocal means (NLM) filtering can effectively remove mottled noise/artifacts by exploiting large-scale patch similarity information in LDCT images. However, applying NLM filtering to LDCT imaging incurs a high computation cost, because intensive patch similarity calculation within a large searching window is required to include enough structure-similarity information for noise/artifact suppression. To improve its clinical feasibility, in this study we further optimize the parallelization of NLM filtering by avoiding repeated computation through row-wise intensity calculation and symmetric weight calculation. The shared memory with fast

X-ray Computed Tomography (CT) can reflect the human attenuation map at millimeter resolution, providing rich 3D information on tissues, organs, and lesions for clinical diagnosis. Despite its wide application in clinics, the radiation delivered to patients during CT examinations remains a widespread concern. It was reported in [

The first category refers to techniques that improve CT imaging by suppressing noise in the projected raw data before routine FBP reconstruction. The key to these techniques is to find an accurate statistical distribution of the projected data and to design effective restoration algorithms [

The third category refers to postprocessing methods, which can be applied directly to improve LDCT images. The distribution and scale features of noise, artifacts, and normal tissues in CT images need to be considered jointly when designing effective postprocessing algorithms [

However, since noise and artifacts are often distributed with prominent amplitudes in LDCT images, a large searching window is required in practice to include more structure information for noise/artifact suppression, which implies a large computation cost. This strongly limits its clinical application, considering the large workload in current radiology departments. To overcome this limitation, this paper presents an improved GPU-based parallelization approach to accelerate NLM filtering. The proposed approach optimizes the computation in NLM filtering by avoiding repeated computation through row-wise intensity calculation and symmetric weight calculation. The fast

Compared to restoration algorithms based on intensity gradient information, NLM filtering can provide edge-preserving noise/artifact suppression without blurring image structures. In NLM filtering, each image patch is matched against a group of similar patches in a large neighboring area, so that structure similarity information at a large neighboring scale can be used to suppress noise and artifacts in LDCT images. The NLM algorithm replaces each pixel intensity by a weighted average of intensities within a searching window
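The weighted-average principle described above can be sketched in serial Python for a single pixel. This is a minimal illustration of the classic NLM formula, not the paper's GPU implementation; the function name `nlm_pixel` and the parameter values (`search`, `patch`, `h`) are illustrative choices.

```python
import math

def nlm_pixel(img, x, y, search=2, patch=1, h=10.0):
    """Filter one pixel of a 2D image (list of lists) with basic NLM.

    `search` and `patch` are half-widths of the searching window and
    the patch; `h` controls the decay of the exponential weights.
    """
    H, W = len(img), len(img[0])

    def patch_dist(ax, ay, bx, by):
        # Sum of squared intensity differences between the two patches,
        # clamping coordinates at the image border.
        d = 0.0
        for dx in range(-patch, patch + 1):
            for dy in range(-patch, patch + 1):
                pa = img[min(max(ax + dx, 0), H - 1)][min(max(ay + dy, 0), W - 1)]
                pb = img[min(max(bx + dx, 0), H - 1)][min(max(by + dy, 0), W - 1)]
                d += (pa - pb) ** 2
        return d

    # Weighted average over the searching window: similar patches
    # contribute with weights close to 1, dissimilar ones close to 0.
    num = den = 0.0
    for sx in range(max(x - search, 0), min(x + search + 1, H)):
        for sy in range(max(y - search, 0), min(y + search + 1, W)):
            w = math.exp(-patch_dist(x, y, sx, sy) / (h * h))
            num += w * img[sx][sy]
            den += w
    return num / den
```

Since the output is a convex combination of window intensities, the filtered value always stays within the intensity range of the searching window, which is why NLM does not create overshoot artifacts at edges.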

In Figures

Processed result of one

Processed result of one

Processed result of one

Processed result of one

3D illustration of a set of thoracic LDCT images. (a), (b), and (c) correspond to the original LDCT volume, the corresponding SDCT volume, and the LDCT volume processed by NLM filtering, respectively.

To highlight the importance of a large searching window, we also list in Figure

Utilizing GPU-based techniques to parallelize algorithms has become a notable trend in the field of parallel computing. GPU-based parallelization is achieved by jointly parallelizing coarse-scale patches and fine-scale threads over the original grid computation task, which is inherently parallelizable [

The conventional GPU-based parallelization accelerates the NLM filtering algorithm by direct pixel-wise parallelization. Based on the above (

For data

The third kernel function (

In the final loop

In the above conventional parallelization approach, the second kernel function in (

Row-wise calculation in patch difference calculation.

Similar to the conventional GPU parallelization, we also divide the algorithm into the following four parts (

The first kernel function computes the sum of the intensity differences for each row pair, which is then multiplied by a Gaussian weight determined by the perpendicular distance from the row to the center point. The computational complexity of this kernel function is
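The row-wise decomposition can be sketched in serial Python. The paper's implementation is a CUDA kernel; here `row_term`, `patch_dist`, and the Gaussian parameter `sigma` are illustrative names and choices, not taken from the paper. The point of the decomposition is that a row term depends only on the row pair, so overlapping patch pairs that share rows need each row term computed only once.

```python
import math

def row_term(img, ra, rb, ca, cb, half, offset, sigma=1.0):
    # Squared-difference sum between one row of patch A (row `ra`,
    # centred at column `ca`) and the matching row of patch B,
    # scaled by a Gaussian weight in the row offset `offset`.
    W = len(img[0])
    s = 0.0
    for d in range(-half, half + 1):
        a = img[ra][min(max(ca + d, 0), W - 1)]
        b = img[rb][min(max(cb + d, 0), W - 1)]
        s += (a - b) ** 2
    return math.exp(-offset * offset / (2.0 * sigma * sigma)) * s

def patch_dist(img, pa, pb, half):
    # Patch distance assembled from the precomputable row terms; in
    # the GPU version these terms are reused across patch pairs
    # instead of being recomputed per pixel.
    (xa, ya), (xb, yb) = pa, pb
    H = len(img)
    total = 0.0
    for off in range(-half, half + 1):
        ra = min(max(xa + off, 0), H - 1)
        rb = min(max(xb + off, 0), H - 1)
        total += row_term(img, ra, rb, ya, yb, half, off)
    return total
```

In the serial sketch the reuse is implicit; on the GPU the row terms for a block of pixels are staged once (e.g., in shared memory) and then summed by each thread.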

The second kernel function calculates the similarity of patches based on (

The second improvement saves half of the computation cost by exploiting the symmetry property of the weights calculated in (
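The symmetry saving can be illustrated with a short serial sketch: because the patch distance is symmetric, the weight satisfies w(i, j) = w(j, i), so each unordered pair needs to be evaluated only once and mirrored. The function `weight_table` and its parameters are hypothetical stand-ins for the GPU weight kernel, not code from the paper.

```python
import math

def weight_table(patches, h=10.0):
    """Compute the full weight table for a list of patch vectors,
    evaluating each unordered pair only once (n*(n-1)/2 evaluations
    instead of n*n). Returns the table and the evaluation count."""
    n = len(patches)
    w = [[0.0] * n for _ in range(n)]
    evaluations = 0
    for i in range(n):
        w[i][i] = 1.0  # exp(0): a patch is maximally similar to itself
        for j in range(i + 1, n):
            d = sum((a - b) ** 2 for a, b in zip(patches[i], patches[j]))
            w[i][j] = w[j][i] = math.exp(-d / (h * h))  # mirror the weight
            evaluations += 1
    return w, evaluations
```

On the GPU the same idea means that a thread handling pixel pair (i, j) can also write the result for (j, i), roughly halving the work of the weight kernel.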

Then, a final kernel function (

The final loop number with respect to the required operation number as to the searching window now becomes

In this section we compare the computation costs of the different methods. To verify the improvement brought by the proposed acceleration of NLM filtering, we process the same

Comparison of the computation time of the serial algorithm and the conventional parallelization algorithm for NLM filtering.

CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz; Memory: 8 GB; Graphics Card: NVIDIA GeForce GTX 680 with 1536 CUDA cores; Effective memory clock: 6008 MHz; Memory bandwidth: 192 GB/s; Memory size: 2 GB; Memory bus type: 256 bit.

Operating System: Win7 64 bit; Matlab: R2011a; CUDA: 4.0.

Then, we compare the computation times of the conventional and improved parallelization algorithms with respect to the size of the searching window. The patch size is fixed to

Comparison of the computation time with respect to searching window size for the conventional parallelization algorithm and the improved parallelization algorithm.

Comparison of the computation time with respect to patch size for the conventional parallelization algorithm and the improved parallelization algorithm.

In this paper we further optimize the parallelization for NLM filtering in CT image processing. The proposed approach optimizes the parallelized computation in NLM filtering by avoiding repeated computation with row-wise intensity calculation and weight calculation. The fast

Currently, the structure similarity idea in NLM has found wide application in other fields of image processing (e.g., image segmentation and image reconstruction) [

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This study is supported by the Science and Technology Research Project of Liaoning Province (2013225089), the National Natural Science Foundation under Grants (81370040, 31100713), and the Qing Lan Project in Jiangsu Province. This research was also partly supported by National Basic Research Program of China under Grant (2010CB732503).