Low-Dose CT Image Denoising Based on Improved DD-Net and Local Filtered Mechanism

Low-dose CT (LDCT) imaging reduces radiation damage to patients; however, the unavoidable information loss under low-dose conditions, such as noise, streak artifacts, and oversmoothed details, degrades clinical diagnosis. LDCT image denoising is therefore a significant topic in medical image processing. This work proposes an improved DD-Net (DenseNet and deconvolution-based network) combined with a local filtered mechanism. The DD-Net is enhanced by introducing an improved residual dense block to strengthen its feature representation ability, and the local filtered mechanism and a gradient loss are employed to restore subtle structures effectively. First, the LDCT image is input into the network to obtain the denoised image; the original loss between the denoised image and the normal-dose CT (NDCT) image is calculated, and the difference image between the NDCT image and the denoised image is obtained. Second, a mask image is generated by applying a threshold operation to the difference image, and the filtered LDCT and NDCT images are obtained by elementwise multiplication of the mask image with the LDCT and NDCT images. Third, the filtered image is input into the network to obtain the filtered denoised image, and the correction loss is calculated. Finally, the sum of the original loss and the correction loss of the improved DD-Net is used to optimize the network. Considering that the combination of mean square error (MSE) and multiscale structural similarity (MS-SSIM) is insufficient to generate edge information, we introduce a gradient loss that measures the loss of the high-frequency portion. Experimental results show that the proposed method achieves better performance than conventional schemes and most neural networks. Our source code is made available at https://github.com/LHE-IT/Low-dose-CT-Image-Denoising/tree/main/Local Filtered Mechanism.


Introduction
Computed tomography (CT) is crucial in medical diagnosis and illness analysis [1][2][3][4][5]. As excessive CT scanning may cause a series of acute conditions and potential cancers, medical instruments usually adopt a CT radiation dose as low as possible to avert damage to health. However, reducing the radiation dose inevitably causes information loss in human tissue, and the resulting large amount of noise in the image may influence the accuracy of the diagnosis. Thus, how to reduce the noise in low-dose CT (LDCT) images while preserving the image information is one of the critical issues in medical image processing.
As reducing the CT radiation dose produces projection data with a low signal-to-noise ratio, some methods utilize nonlinear filters [6,7] or the statistical characteristics of the noise [8,9] to reduce the noise in the projection data. In addition, some methods attempt to remove the noise and streak artifacts in LDCT images directly. To remove the streak artifacts fused into the tissue structure, approaches including the nonlinear diffusion filter [10], sparse representation, and dictionary learning [11][12][13] were proposed. Due to their excellent performance, some natural image denoising algorithms [14,15] were also applied to remove the noise in LDCT images.
These methods belong to the postprocessing category and aim to preserve the detailed information while removing the noise and artifacts simultaneously.
Due to the remarkable expressive capacity of neural networks, many researchers have tried to reduce the noise in LDCT images based on the convolution neural network (CNN) and the generative adversarial network (GAN). Some researchers utilized structures such as the autoencoder, residual block, and dense block in CNNs for LDCT image denoising [16][17][18]. Some researchers used CNNs to learn image features and provide prior knowledge for traditional CT reconstruction schemes such as analytic reconstruction and iterative reconstruction [19,20]. However, it is hard to generate realistic and diverse details based on CNNs. To remedy this, some schemes based on GAN [21][22][23][24][25][26][27][28] were proposed. In these GAN-based solutions, the denoised images are generated by the generator and evaluated by the discriminator. Through competition and self-optimization, the generator learns to generate realistic normal-dose CT (NDCT) images. Besides, some novel structures, such as the leap structure [29] and the sharpness detector [23], were developed to enhance the performance of neural networks. Although these networks can effectively remove the noise and recover the image structures, there are still some ambiguous and incorrect subtle structures in the denoised results. Confronting this problem, frequency-separation-based networks [30,31] were proposed to split the LDCT image into a low-frequency portion and a high-frequency portion and generate each portion separately. However, the incorrect subtle structures are usually the areas that are contaminated severely by noise and streak artifacts, rather than the edge information in the high-frequency domain, as shown in Figure 1. On the one hand, frequency separation usually produces information loss during transformation. On the other hand, there is an essential correlation between global structures and detailed information, which is useful for subtle structure restoration.
To address the above problems, we use the difference image between the predicted image and the NDCT image to build the mask image. Compared with frequency-separation-based methods, the difference image can accurately reflect the areas that contain incorrect subtle structures. Using the mask image, obtained by applying a threshold operation to the difference image, the local filtered mechanism filters out the high-quality areas in the LDCT image and NDCT image and preserves the low-quality areas, which the network then optimizes especially. To utilize the correlation between global structures and detailed information, our model learns the global structure between the LDCT image and the NDCT image in the first step. Then, the model learns the detailed information between the filtered LDCT image and the filtered NDCT image in the second step. Through the network optimization in the second step, the ability in subtle structure restoration is enhanced significantly. As the global structures and detailed information are learned in the same network, our model can provide implicit global features for subtle structure restoration and avoid the transformation loss of frequency separation. However, learning two tasks in one network requires a deeper network structure. Therefore, we propose an improved DD-Net [32] that has a strong ability in feature extraction and denoising. In our network, we replace the dense block with an improved residual dense block [33] to deepen the neural network and improve the performance. Although this mechanism solves the subtle structure restoration problem, the combination of MSE and MS-SSIM as a loss function is detrimental to edge information restoration. Thus, we introduce the gradient loss [34] to calculate the loss in the high-frequency portion, which compensates for the performance reduction caused by the local filtered mechanism.
In this work, we evaluate the performance of our proposed network by comparing it with other typical schemes. Experimental results demonstrate that the proposed scheme can restore ambiguous subtle structures well and achieves higher performance in objective metrics than most comparative methods. The contributions of the paper can be listed as follows: (i) We propose an improved network based on DD-Net and a novel local filtered mechanism. Through this mechanism, the network can generate the subtle structures accurately with the global context and accomplish the balance between network generalization and subtle structure restoration. (ii) We introduce the gradient loss to enhance the ability in edge information restoration and significantly compensate for the performance reduction caused by the abatement of generalization. (iii) Experiments on low-dose chest image and brain image denoising prove that our network outperforms conventional schemes and most neural networks in both evaluation metrics and visual appearance.

Related Work
Since we propose an improved neural network for LDCT image denoising, some significant work is reviewed selectively in this section. As neural networks have achieved brilliant performance in image processing, LDCT denoising methods based on neural networks have been presented in the past decades. These methods achieved outstanding results in both objective metrics and visual appearance. Although analytic reconstruction and iterative reconstruction are still the mainstream in commercial settings, it is likely that neural networks will be applied in commercial CT equipment in the future.

CNN Methods.
Due to their powerful ability of feature extraction and mapping, some researchers attempted to reduce the noise in LDCT images based on CNNs. The deep residual network and cascade network [16][17][18] were early applications to improve the performance of LDCT denoising. Zhang et al. [32] combined the dense block and deconvolution structure to build a lightweight network that can reuse features effectively. Some methods [19,20] combined neural networks with analytic reconstruction or iterative reconstruction and improved the quality of LDCT images in the projection domain. Built on the residual block and dense block, the residual dense block (RDB) [33] achieved excellent performance in super-resolution through the contiguous memory (CM) mechanism and local feature fusion (LFF). By introducing feature attention and enhancement attention modules (EAM), the real image denoising network (RIDNet) [35] can denoise real noisy images efficiently. Inspired by the above studies, this work introduces an improved residual dense block into the DD-Net for further enhancement of feature representation and denoising.

GAN Methods.
While methods based on CNNs can greatly improve the denoising performance, they can only generate image structures based on prior knowledge, which restricts subtle structure restoration. Therefore, schemes based on GAN were presented. Yang et al. [21] applied the Wasserstein distance and perceptual loss to train the GAN. Wolterink et al. [22] introduced a voxelwise loss to improve the performance of GAN. Ge et al. [24] developed a conditional GAN to generate thin-thickness slices from thick LDCT images. To compete with commercial algorithms, Shan et al. [25] proposed a modularized adaptive processing neural network (MAP-NN). Choi et al. [26] presented a semisupervised GAN, including a denoising network and a classification network, to reduce the dependence on NDCT images. Taking consecutive low-dose projections as the input, a three-dimensional comprehensive-domain generator [27] was presented to learn the redundant information among slices and generate the subtle structures. To capture structural details, You et al. [29] introduced the leap connection and network-in-network. To describe the uncertainty of the denoised image, Huang et al. [36] used the CutMix technique and a U-Net-based discriminator to provide radiologists with a confidence map. However, although GAN-based methods can well preserve the texture information in LDCT images, they also perform poorly in subtle structure restoration. In contrast, our network preserves the low-quality areas via the mask image and strengthens the ability in subtle structure restoration by especially optimizing those areas.

Subtle Structure Restoration.
Currently, although CNN and GAN have remarkably improved the performance in image denoising, it is still hard to restore subtle structures. Yin and Babyn [23] designed a sharpness detector based on cGAN to preserve more edge information. Wang et al. [30] applied the shearlet transformation to generate the high-frequency information and low-frequency information separately. Fritsche et al. [34] utilized a low-pass filter for frequency separation and adopted the GAN loss for subtle structure restoration. Yang et al. [31] designed two U-Net-based subnetworks in the generator for LDCT image denoising in the spatial domain and the high-frequency domain. Such recent work restores subtle structures through frequency separation, which lacks global information and may bring information loss during transformation. In contrast, our network learns the global structures and detailed information in one network and can provide the implicit global context for subtle structure restoration. Meanwhile, our model can specially optimize the low-quality areas through the mask image and avoid information loss during transformation.

The Proposed Scheme
The critical task of LDCT image denoising is to restore the subtle structures while removing the noise. To enhance the ability in feature representation and denoising, we propose an improved DD-Net. Besides, we present a new local filtered mechanism and introduce a novel gradient loss to restore subtle structures accurately.

Network Structure.
The proposed neural network employs a similar structure to DD-Net [32], which achieves brilliant performance in medical image denoising. However, as the global structures and detailed information are learned in one network, our model requires a deeper network structure and more powerful feature representation ability. Thus, the improved residual dense block (IRDB) [33] is introduced. The improved residual dense block is composed of the densely connected [37] block and the enhanced residual block [35]. Considering the scale variation in the max-pooling layers [38], we remove the batch normalization in the densely connected block. The detailed structure of the improved residual dense block is shown in Figure 2.
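As a minimal PyTorch sketch, an improved residual dense block of this kind can be written as densely connected convolutions without batch normalization, fused by a 1 × 1 convolution and wrapped in a residual connection. The layer count, growth rate, and channel widths below are illustrative assumptions, not the exact configuration of Figure 2:

```python
import torch
import torch.nn as nn

class IRDBSketch(nn.Module):
    """Sketch of an improved residual dense block: densely connected
    3x3 convolutions (no batch normalization), a 1x1 local feature
    fusion layer, and a residual connection around the whole block."""

    def __init__(self, channels, growth=16, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(n_layers):
            # Each layer sees the concatenation of all previous outputs.
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            in_ch += growth
        # Local feature fusion back to the input channel count.
        self.fuse = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        # Enhanced residual connection around the fused dense features.
        return x + self.fuse(torch.cat(feats, dim=1))
```

Because the residual path keeps the input channel count, such a block can replace a plain dense block inside the DD-Net encoder without changing the surrounding layer shapes.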
The detailed network structure is represented in Figure 3. It includes 1 convolution layer, 4 max-pooling layers, 4 improved residual dense blocks, 4 upsampling layers, and 8 deconvolution layers, followed by ReLU and batch normalization.
The input and output of the network are 512 × 512 × 1 medical images. We adopt a 7 × 7 convolution layer after the input layer, followed by 4 encoder modules and 4 decoder modules.
The encoder modules employ a max-pooling layer and the improved residual dense block [33] to extract multiscale features. The decoder modules employ an upsampling layer and 2 deconvolution layers followed by ReLU and batch normalization to restore the image information. The layers with the same feature shape in the encoder and decoder modules are concatenated. The kernel size of the last deconvolution layer is 1 × 1 to generate the denoised image.

Local Filtered Mechanism.
As subtle structure restoration is a significant part of LDCT image denoising, most schemes attempt to restore subtle structures by frequency separation. However, high-frequency information cannot reflect the subtle structures accurately, and approaches that process high-frequency information and low-frequency information separately may cause information loss during the transformation. To solve this problem, we propose a local filtered mechanism to enhance the ability in subtle structure restoration. The mechanism restores the unclear subtle structures in two steps. In the first step, the LDCT image is input into the network to get the denoised image I1. In the second step, the difference image D1 between the NDCT image and I1 is obtained. Then, the mask image is generated by applying the threshold operation to D1. Using the mask image to conduct an elementwise multiplication with the LDCT and NDCT images, the high-quality areas are filtered out, and the poor-quality areas are preserved. At last, the filtered LDCT image is input into the network to get the filtered denoised image I2. The original loss between the denoised image I1 and the NDCT image N1 is calculated to enhance the ability in global structure restoration. The correction loss between the filtered denoised image I2 and the filtered NDCT image N2 is calculated to enhance the ability in unclear subtle structure restoration. The detailed process is shown in Figure 4 and Algorithm 1. Compared with frequency-separation-based methods, the proposed network can achieve feature sharing between the global structure and detailed information and can provide more context information for detail restoration.
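The masking step above can be sketched in a few lines of NumPy. The helper name `local_filter`, the toy 2 × 2 arrays, and the threshold value are illustrative placeholders; in the full pipeline, the denoised image would come from a forward pass of the network:

```python
import numpy as np

def local_filter(ldct, ndct, denoised, thresh):
    """Build the mask from the |NDCT - denoised| difference image and
    filter out the high-quality areas of both LDCT and NDCT images.

    Pixels where the network already matches the NDCT image (difference
    below `thresh`) are zeroed; poorly restored pixels are kept for the
    second, correction pass."""
    diff = np.abs(ndct - denoised)              # difference image D1
    mask = (diff > thresh).astype(ldct.dtype)   # M_xy = 1 where quality is low
    return ldct * mask, ndct * mask             # elementwise multiplication

# Toy example with 2x2 "images" normalized to [0, 1]:
ndct     = np.array([[0.5, 0.5], [0.5, 0.5]])
denoised = np.array([[0.5, 0.4], [0.5, 0.5]])  # one poorly restored pixel
ldct     = np.array([[0.6, 0.6], [0.6, 0.6]])
l2, n2 = local_filter(ldct, ndct, denoised, thresh=0.04)
```

Only the pixel whose restoration error exceeds the threshold survives in the filtered pair, which is exactly the region the correction pass is asked to optimize.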
Thus, it can generate more realistic and precise subtle structures, as shown in Figures 5-8. In fact, the above mechanism can be considered a confrontation between network generalization and subtle structure restoration. The neural network tends to discard some detailed information of a specific image and make the subtle structures oversmoothed, which is beneficial for the enhancement of generalization and the reduction of overall error. To alleviate this phenomenon, the mechanism is designed to filter out the high-quality areas and drive the network to specially optimize the low-quality areas, which contain detailed information and are harder to optimize. Finally, the network can achieve the balance between image denoising and subtle structure restoration through the sum of the original loss and the correction loss of the improved DD-Net. Table 1 and Figure 9 demonstrate the results of different threshold values and training strategies. Although the model trained without the local filtered mechanism can achieve higher PSNR and SSIM results, it drops some subtle structures to improve the generalization ability and cannot achieve further enhancement of subtle structure quality. The models trained with the local filtered mechanism for 160 epochs can restore more appealing subtle structures but achieve poor performance in low-frequency portion denoising. Therefore, all networks without a pretrained model obtain lower PSNR and SSIM results than the pretrained model. The reason is that the global structures and detailed information are learned separately when the local filtered mechanism is adopted at the beginning of the training phase. Without the prior knowledge of global structures, the network cannot efficiently utilize the global structure information learned in the first step to restore the subtle structures in the second step.
Models with threshold values 0.01 and 0.04, which were trained with a pretrained model and the local filtered mechanism for 160 epochs, obtain higher PSNR and SSIM results than the pretrained model, but the improvement is slower than that of the model trained without the local filtered mechanism for 160 epochs. The reason is that the subtle structures are more difficult to optimize. As shown in Figure 9, all models trained with a pretrained model and the local filtered mechanism pay more attention to subtle structure restoration.
Furthermore, how to set the threshold value is a crucial problem. When we use a low threshold value, the filtered images contain a large amount of low-frequency information, which hinders the network from specially optimizing subtle structures. When we use a high threshold value, most subtle structures are filtered out, and the large black areas in the filtered images influence the global structure restoration, as there is a large difference between the filtered LDCT image and the real LDCT image. Here, we first trained the model without the local filtered mechanism for 80 epochs and visualized the filtered results using different threshold values. The visualization results are shown in Figure 10. As the threshold value of 0.04 filters most low-frequency areas and saves enough subtle structures, we choose 0.04 as our training threshold value for the chest dataset. Adopting the same parameter selection strategy, we choose 0.004 as our training threshold value for the brain dataset. Table 1 shows that both a larger and a lower threshold value reduce the quality of the denoised results. The model whose threshold value is 0.04 obtains the highest PSNR and SSIM results on the chest dataset regardless of whether a pretrained model is used. Furthermore, Figure 9 illustrates that an appropriate threshold can obtain a better visual appearance of subtle structures. Experimental verification indicates that the above parameter selection strategy can obtain a relatively proper threshold value.

Figure 3: An overview of our network structure. The notation c16k7s1 denotes that the channel, kernel size, and stride are 16, 7 × 7, and 1, respectively; k2,2 denotes that the kernel size of 2D max-pooling is 2 × 2; f2 denotes that the upscale factor of upsampling is 2; c32k5s1 denotes channel 32, kernel size 5 × 5, and stride 1; c16k1s1 denotes channel 16, kernel size 1 × 1, and stride 1; c1k1s1 denotes channel 1, kernel size 1 × 1, and stride 1.

Algorithm 1 notation. Require: L1 (the low-dose CT image) and N1 (the normal-dose CT image). Ensure: L_total (the total loss used for back propagation to optimize the network). D1 is the difference image, which reflects the difference between I1 and N1. M is the mask image that filters the areas with high quality and saves the areas with low quality. P_xy is the gray value of D1 at pixel (x, y), and M_xy is the gray value of M at pixel (x, y). ImprovedDDNet is an improved convolution neural network based on DD-Net [32]. thresh is the threshold value, and ⊙ is the elementwise multiplication operation. N2 and L2 are the filtered normal-dose and low-dose images. Loss is the loss function of our network; L_original is the loss between I1 and N1, and L_correct is the loss between I2 and N2.

Loss Functions.
In our experiments, we use the weighted sum of the mean square error, the multiscale structural similarity [39], and the gradient loss as the final loss function. To implement the local filtered mechanism, we calculate the original loss between the denoised image I1 and the NDCT image N1 when the LDCT image L1 is the input, as well as the correction loss between the denoised image I2 and the filtered NDCT image N2 when the filtered LDCT image L2 is the input. Thus, the total loss function is the sum of the original loss and the correction loss of the improved DD-Net, as in equations (1) and (2):

L_total = L_original + L_correct, (1)

Loss(I, N) = λ1 · L_MSE(I, N) + λ2 · L_MS-SSIM(I, N) + λ3 · L_grad(I, N), (2)

where L_original = Loss(I1, N1) and L_correct = Loss(I2, N2).
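The two-pass loss described above can be sketched as follows. The `ms_ssim_loss` and `gradient_loss` arguments are placeholders standing in for the MS-SSIM and gradient terms defined later, and the weights match the values reported in the training details (λ1 = 1, λ2 = 0.15, λ3 = 0.8):

```python
import numpy as np

LAMBDA1, LAMBDA2, LAMBDA3 = 1.0, 0.15, 0.8   # loss weights from the training details

def mse_loss(pred, target):
    """Pixel-wise mean square error."""
    return float(np.mean((pred - target) ** 2))

def combined_loss(pred, target, ms_ssim_loss, gradient_loss):
    """Equation (2): weighted sum of the MSE, MS-SSIM, and gradient losses."""
    return (LAMBDA1 * mse_loss(pred, target)
            + LAMBDA2 * ms_ssim_loss(pred, target)
            + LAMBDA3 * gradient_loss(pred, target))

def total_loss(i1, n1, i2, n2, ms_ssim_loss, gradient_loss):
    """Equation (1): original loss on the full images plus correction
    loss on the locally filtered images."""
    original = combined_loss(i1, n1, ms_ssim_loss, gradient_loss)
    correction = combined_loss(i2, n2, ms_ssim_loss, gradient_loss)
    return original + correction
```

Keeping the two terms as one sum means a single backward pass can optimize both the global structure (original loss) and the preserved low-quality areas (correction loss).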

Mean Square Error (MSE).
The mean square error (MSE) is used to generate the objective information of images by calculating the pixel-wise difference between the denoised images and the NDCT images. The MSE is described in equation (3):

L_MSE(I, N) = (1 / (H · W)) Σ_{x=1}^{H} Σ_{y=1}^{W} (I(x, y) − N(x, y))², (3)

where I(x, y) denotes the gray value of the denoised image at pixel (x, y), N(x, y) denotes the gray value of the NDCT image at pixel (x, y), and H and W denote the image height and width.

Multiscale Structural Similarity (MS-SSIM).
The structural similarity (SSIM) is a common metric for evaluating perceptual loss at a single scale. As an extension of SSIM, the multiscale structural similarity (MS-SSIM) [39] is computed over different resolutions and has better performance. The MS-SSIM is described in equations (5)-(9):

l(x, y) = (2 μx μy + C1) / (μx² + μy² + C1), (5)

c(x, y) = (2 σx σy + C2) / (σx² + σy² + C2), (6)

s(x, y) = (σxy + C3) / (σx σy + C3), (7)

SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ, (8)

MS-SSIM(x, y) = [l_M(x, y)]^{α_M} · ∏_{j=1}^{M} [c_j(x, y)]^{β_j} [s_j(x, y)]^{γ_j}, (9)

where l(x, y) evaluates the luminance, c(x, y) evaluates the contrast, and s(x, y) evaluates the structural similarity; μ and σ denote the mean and standard deviation; α, β, γ are constants; and C1 = (K1 · L)², C2 = (K2 · L)², and C3 = C2/2 are small stabilizing constants with dynamic range L = 2^B − 1, where B denotes the bit depth of the image.
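For illustration, the luminance, contrast, and structure terms can be computed with global image statistics. Real MS-SSIM implementations use local Gaussian windows and a downsampling pyramid; the constants below assume K1 = 0.01, K2 = 0.03, and B = 8, which are common defaults rather than values stated in this paper:

```python
import numpy as np

K1, K2, B = 0.01, 0.03, 8
L = 2 ** B - 1                        # dynamic range from the bit depth
C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
C3 = C2 / 2

def ssim_global(x, y):
    """Single-scale SSIM with global statistics (illustrative only)."""
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)   # luminance l(x, y)
    con = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)   # contrast  c(x, y)
    stru = (sxy + C3) / (sx * sy + C3)                    # structure s(x, y)
    return lum * con * stru    # alpha = beta = gamma = 1
```

For identical inputs, every term reduces to 1, so the score is 1; any mismatch in mean, variance, or covariance pulls the product below 1.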

Gradient Loss.
The gradient loss [34] is used to generate the edge information of images. In the first step, we convolve the denoised image and the NDCT image, respectively, with a high-pass filter kernel K instead of the Sobel operator. In the second step, the gradient loss is obtained by calculating the mean square error between the gradient images G_predict and G_normal. The gradient loss is described in equations (10)-(12):

G_predict = K ⊗ I, (10)

G_normal = K ⊗ N, (11)

L_grad = (1 / (H · W)) Σ_{x=1}^{H} Σ_{y=1}^{W} (G_predict(x, y) − G_normal(x, y))², (12)

where K denotes the high-pass filter kernel and ⊗ denotes the convolution operation.
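This two-step computation can be sketched directly in NumPy. The 3 × 3 Laplacian below is a stand-in high-pass kernel: the paper does not list the exact coefficients of K, so the kernel here is an assumption:

```python
import numpy as np

# Stand-in high-pass kernel; the paper's exact K may differ.
K = np.array([[-1.0, -1.0, -1.0],
              [-1.0,  8.0, -1.0],
              [-1.0, -1.0, -1.0]])

def conv2d_valid(img, kernel):
    """Plain 'valid' 2D correlation with no padding."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def gradient_loss(pred, target):
    """MSE between the high-pass filtered (gradient) images."""
    g_pred = conv2d_valid(pred, K)      # G_predict = K conv I
    g_target = conv2d_valid(target, K)  # G_normal  = K conv N
    return float(np.mean((g_pred - g_target) ** 2))
```

Because the kernel coefficients sum to zero, flat regions contribute nothing to the loss; only edge mismatches between the denoised and NDCT images are penalized.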

Experiments
The experimental details and result analysis are introduced in this section to validate the effectiveness of the proposed scheme.

Dataset.
In our experiments, the dataset comes from the "2016 NIH-AAPM Mayo Clinic Low Dose CT Grand Challenge." It comprises 100 chest scans at 10% of the routine dose, 99 head scans at 25% of the routine dose, and 100 abdomen scans at 25% of the routine dose. In our study, the 10% chest scans and the 25% head scans are used as the chest dataset and the brain dataset. For the chest dataset, we randomly select 25 cases for training and 10 cases for testing. For the brain dataset, we randomly select 40 cases for training and 10 cases for testing.
There is no overlap between training and testing. The medical image dataset can be obtained from the Cancer Imaging Archive (TCIA) website at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=52758026.

Details.
We trained and tested the proposed model on the chest dataset and the brain dataset, respectively. The network was trained for 80 epochs without the local filtered mechanism to obtain the pretrained model. Based on the pretrained model, the model was trained for 160 epochs with the local filtered mechanism to obtain the final model. The batch size was 8. The optimizer was Adam with β1 = 0.9 and β2 = 0.999. The initial learning rate was set to 1 × 10⁻⁴ and reduced to 5 × 10⁻⁵ at the 130th epoch. The convolution and deconvolution layers were initialized with a Gaussian distribution whose mean was 0 and variance was 0.01. We set λ1, λ2, and λ3 to 1, 0.15, and 0.8, respectively, as the hyperparameters of the loss function. All models were trained under these settings. Figures 11 and 12 demonstrate that our method achieves higher performance than DD-Net more easily when using MS-SSIM. Although the model trained without the local filtered mechanism can get higher PSNR and SSIM results, it drops some subtle structures to improve the generalization ability, as shown in Figure 9. The PSNR and SSIM results on the chest testing dataset stabilize at 31.80 and 0.76, respectively, when the epoch reaches 120. The PSNR and SSIM results on the brain dataset stabilize at 53.70 and 0.99, respectively, when the epoch reaches 140.
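The learning-rate schedule from the training details above is simple enough to state directly; this small helper reproduces the reported values (1 × 10⁻⁴ initially, reduced to 5 × 10⁻⁵ at the 130th epoch), with the function name chosen for illustration:

```python
def learning_rate(epoch):
    """Piecewise-constant schedule from the training details:
    1e-4 before epoch 130, 5e-5 from epoch 130 onward (0-indexed)."""
    return 1e-4 if epoch < 130 else 5e-5
```

In PyTorch, the same schedule could be expressed with a step-based scheduler, but a plain function keeps the reported numbers explicit.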
To verify the effectiveness of our scheme in subtle structure restoration, we selected some competitive methods for evaluation. Nonlocal total variation (NLTV) [40] is a statistical iterative method based on the compressed sensing (CS) technique. It adopts global search and nonuniform weight penalization to improve the denoised image quality. Block-matching and 3D filtering (BM3D) [15] is a postprocessing method. It combines the advantages of nonlocal methods and transform methods and removes the noise by searching and matching similar blocks. The residual encoder-decoder convolution neural network (RED-CNN) [16] is a CNN-based method. It utilizes the residual structure and encoder-decoder structure and takes MSE as the loss function. Different from RED-CNN, DD-Net [32] applies the dense block in the encoder-decoder structure and uses the combination of MSE and SSIM as the loss function. The generative adversarial network with Wasserstein distance and perceptual loss (WGAN-VGG) [21] is a GAN-based method. It takes the combination of the perceptual similarity calculated by the VGG-19 network and the Wasserstein distance as the loss function. Also adopting the GAN structure, MAP-NN [25] applies multiple conveying-path-based convolution encoder-decoder (CPCE) modules in the generator. It takes the combination of MSE, Wasserstein distance, and edge incoherence calculated by the Sobel operator as the loss function. The 2D conveying-path-based convolution encoder-decoder network (CPCE-2d) [28] uses a single CPCE module for LDCT denoising and takes the combination of adversarial loss and perceptual loss as the loss function. The high-frequency sensitive generative adversarial network (HFSGAN) [31] is a frequency-separation-based method. It obtains the high-frequency portion and low-frequency portion by a guided filter and applies two U-Nets to process the high-frequency portion and the whole image separately.
Generative adversarial networks with dual-domain U-Net-based discriminators (DU-GAN) [36] apply two U-Net-based discriminators to evaluate the difference in image domain and gradient domain and adopt the CutMix technique to describe the uncertainty of the denoised image.
In the NLTV method, we set the number of iterations, time step (dt), gradient regularization (ϵ), and fidelity term (λ) to 5, 0.2, 1 × 10⁻⁶, and 1.2, respectively. In the BM3D method, the parameter σ was 25, the hard threshold value was 2.7 × σ, and β in the Kaiser filter was 2. In the basic estimation, we set the match threshold value and the maximum number of group-matched blocks to 2500 and 16; in the final estimation, the match threshold value, maximum number of group-matched blocks, block size, block stride, search step, and search window size were set to 400, 32, 8, 3, 3, and 39, respectively. In the RED-CNN method, the LDCT images were cropped into 64 × 64 patches as the input, and the batch size was 160. The network was trained for 100 epochs using Adam with default parameters, namely, β1 = 0.9 and β2 = 0.999. The initial learning rate was set to 1 × 10⁻⁵ and decreased by 0.5 every 30 epochs. In the DD-Net method, the 512 × 512 images were taken as the input, and the batch size was 8. The network was trained for 160 epochs using Adam with default parameters. The initial learning rate was set to 1 × 10⁻⁴ and slowly decreased to 1 × 10⁻⁵. In the WGAN-VGG method, the LDCT images were cropped into 64 × 64 patches as the input, and the batch size was 128. The network was trained for 200k iterations using Adam with β1 = 0.5 and β2 = 0.9. The learning rate was set to 1 × 10⁻⁵, and λ1 in the loss function was set to 0.1. λp for the gradient penalty was 10. In the MAP-NN method, the LDCT images were cropped into 64 × 64 patches as the inputs, and the batch size was 128. The network was trained for 80 epochs using Adam with default parameters. The initial learning rate was set to 1 × 10⁻⁴ and decreased by 1/√t after the t-th epoch. λm and λe in the loss function were both set to 50. λp for the gradient penalty was 10. The number of conveying-link-oriented network encoder-decoders (CLONE) was 5.
In the CPCE-2d method, the LDCT images were cropped into 64 × 64 patches as the inputs, and the batch size was 128. The network was trained for 40 epochs using Adam with default parameters. The initial learning rate was set to 1 × 10⁻⁴ and decreased by 1/t after the t-th epoch. λp for the perceptual loss was set to 0.1. In the HFSGAN method, the 512 × 512 images were taken as the input, and the batch size was 12. The network was trained for 200 epochs using Adam with β1 = 0.5 and β2 = 0.999. The initial learning rate was set to 2 × 10⁻⁴. λ1 and λ2 in the loss function were set to 100 and 50, respectively. In the DU-GAN method, the LDCT images were cropped into 64 × 64 patches as the inputs, and the batch size was 64. The network was trained for 100,000 iterations using Adam with default parameters. The initial learning rate was 1 × 10⁻⁴. λadv, λimg, and λgrd in the loss function were set to 0.1, 1, and 20, respectively. The code of NLTV was downloaded from http://math.sjtu.edu.cn/faculty/xqzhang/NLIP_v1.zip. The code of BM3D was implemented in Python. The other deep learning-based methods were implemented in PyTorch according to the official codes.

Evaluation Metrics.
Appropriate evaluation metrics are crucial for evaluating LDCT image denoising because medical images contain more subtle structures and fewer channels than natural images. We select the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) as the evaluation metrics. These metrics are widely used in image processing tasks such as image super-resolution and image inpainting.

Peak Signal-to-Noise Ratio (PSNR).
PSNR is an objective metric that measures pixel-level error and is commonly used for images that are sensitive to distortion, as shown in equations (14) and (15). In general, the higher the PSNR, the lower the distortion of the image.

MSE = (1/N²) ∑_{i=1}^{N} ∑_{j=1}^{N} (x(i, j) − y(i, j))²,  (14)

PSNR = 10 · log₁₀(MAX² / MSE),  (15)

where x, y, and N denote the denoised image, the normal-dose image, and the width or height of the image, respectively, and MAX denotes the maximum gray value.
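As a worked example, equations (14) and (15) can be computed as follows (a minimal NumPy sketch; the function name and the default MAX of 255 are our assumptions, not part of the paper):

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """PSNR between denoised image x and reference y, per eqs. (14)-(15)."""
    # Equation (14): mean squared error over all pixels.
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: zero distortion
    # Equation (15): log-scaled ratio of peak power to error power.
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For instance, a uniform error of 5 gray levels against a 255-level peak gives a PSNR of about 34 dB, which matches the intuition that higher PSNR means lower distortion.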

Structural Similarity (SSIM).
As the PSNR cannot completely reflect subjective visual differences, the SSIM is used as a supplement to measure the visual appearance of images, as shown in equation (16). In general, the higher the SSIM, the closer the visual appearance of the image is to the reference.

SSIM(x, y) = ((2μ_x μ_y + c₁)(2σ_xy + c₂)) / ((μ_x² + μ_y² + c₁)(σ_x² + σ_y² + c₂)),  (16)

where μ_x and μ_y denote the means of x and y, σ_x² and σ_y² denote their variances, σ_xy denotes their covariance, and c₁ and c₂ are small constants that stabilize the division.
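For reference, a window-free global form of SSIM can be sketched as follows (SSIM is usually computed over local windows and averaged; this simplified global variant and its constants k₁ = 0.01, k₂ = 0.03 are our assumptions for illustration):

```python
import numpy as np

def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """Global SSIM over the whole image (window-free simplification)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2  # stabilizing constants
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

By construction, an image compared with itself scores exactly 1, and a constant brightness shift lowers the luminance term while leaving the structure term intact.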

Qualitative Evaluations.
The LDCT denoised images are shown in Figures 5-8. The regions of interest (ROIs) are marked by green rectangles, and the red arrows indicate the differences between our method and the comparative methods.
From the enlarged views of the denoised results, it can be seen that BM3D and NLTV reduce the noise to some extent but fail to remove the artifacts, as shown in Figure 6. The denoised images generated by RED-CNN and MAP-NN are slightly oversmoothed because they adopt MSE as a loss function. As the perceptual loss and adversarial loss cannot accurately reflect pixel-level differences, WGAN-VGG and CPCE-2d preserve the texture information in LDCT images but cannot remove the streak artifacts completely. Using MS-SSIM as a loss function, DD-Net can remove the noise and streak artifacts effectively but cannot restore some subtle structures accurately, as shown in Figure 7. As a frequency-separation-based method, HFSGAN enhances the structural similarity between the denoised image and the NDCT image in the high-frequency portion and can restore the subtle structures clearly; however, it blurs the image information in the low-frequency portion. As shown in Figure 6, DU-GAN can generate images with realistic texture information but cannot restore the subtle structures when the LDCT image is severely contaminated by noise and streak artifacts. Overall, as shown in Figures 6 and 7, our method can restore subtle structures effectively and keep the color and structure consistent with the NDCT image.

Quantitative Evaluations.
To validate the performance of our neural network quantitatively, we adopted the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as objective metrics. Table 2 shows the PSNR and SSIM results on the chest dataset and brain dataset using different methods.
First, as NLTV and BM3D remove the noise and artifacts using the information of a single LDCT image alone, they perform poorly in image information restoration and obtain lower PSNR and SSIM results than the deep learning-based methods. Second, as the perceptual loss and adversarial loss reflect stylistic differences, WGAN-VGG and CPCE-2d cannot remove the streak artifacts completely. Therefore, they obtain lower PSNR and SSIM results on the chest dataset than the methods that use MSE as one of the loss functions, including RED-CNN, DD-Net, MAP-NN, HFSGAN, DU-GAN, and our method. Meanwhile, as RED-CNN was trained with the MSE loss alone, it produces oversmoothed denoised results in Figure 6. Owing to the MS-SSIM loss, the denoised results generated by DD-Net keep high structural similarity with the NDCT images; therefore, DD-Net obtains the second-best PSNR and SSIM results on the chest dataset and the third-best on the brain dataset. As HFSGAN processes the high-frequency portion specifically, it can accurately restore the subtle structures; however, it blurs the image information in the low-frequency portion and obtains lower objective metric results than DD-Net and our model on the chest dataset. In addition, owing to the information loss during frequency separation, HFSGAN performs poorly when the difference between the LDCT and NDCT images is relatively small; therefore, it obtains the lowest PSNR and SSIM results among the deep learning-based methods on the brain dataset. Although DU-GAN can preserve the texture information in LDCT images, it performs poorly in subtle structure restoration, as shown in Figure 6, and cannot keep color consistency with the NDCT images.
Moreover, as there is a relatively large difference in texture information between the LDCT and NDCT images of the chest dataset, DU-GAN obtains lower PSNR and SSIM results than most deep learning-based methods on that dataset. However, when the difference between the LDCT and NDCT images is small, DU-GAN can accurately sense the difference through its confidence map and obtains the best PSNR result on the brain dataset. Our model achieves high performance in subtle structure restoration and obtains the best objective metric results on the chest dataset. On the brain dataset, our model obtains the second-best PSNR result and the best SSIM result, because the confidence map of DU-GAN and its 64 × 64 patch input can sense differences with low confidence scores more accurately than our method. In contrast, our model achieves higher performance when the LDCT images contain a large amount of noise and streak artifacts.
Because the perceptual loss using VGG-19 [41] is another effective loss function for enhancing visual appearance, it is compared with MS-SSIM. As shown in Figures 11 and 12, the model using MS-SSIM as a loss function obtains higher PSNR and SSIM results than the one using the VGG-19 perceptual loss. The reason is that VGG-19 is trained on natural images and cannot perfectly reflect the visual features of medical images. Moreover, as shown in Figure 13, for both DD-Net and our network, the image generated by the model trained with VGG-19 contains more noise than that of the model using MS-SSIM.

Uncertainty Visualization.
The uncertainty visualizations of the different methods are shown in Figure 14. As both BM3D and NLTV reduce the noise in the LDCT image to some extent, they obtain higher global scores of D_img^dec than the LDCT image. However, they cannot remove the streak artifacts or restore the image information efficiently, which results in low per-pixel confidence. RED-CNN can efficiently remove the noise and streak artifacts but oversmoothens the LDCT image at the same time; therefore, its global score and per-pixel confidence are lower than those of the other deep learning-based methods. In addition, although WGAN-VGG and CPCE-2d avoid the oversmoothing problem, they cannot remove the streak artifacts completely; therefore, their global scores are higher than that of RED-CNN but lower than those of most deep learning-based methods. By adopting the gradient loss, MAP-NN pays more attention to subtle structure restoration but blurs the low-frequency portion according to the per-pixel confidence; therefore, it obtains a higher global score and per-pixel confidence than WGAN-VGG. HFSGAN can generate realistic subtle structures but likewise oversmoothens the low-frequency portion of the LDCT image according to the confidence map. DD-Net can remove the noise and streak artifacts efficiently and keeps high structural similarity with the NDCT image according to the confidence map. However, compared with our method, it cannot restore some subtle structures accurately and obtains a lower global score. As DU-GAN is trained with a discriminator and can adjust the image quality accordingly, it generates photo-realistic denoised results according to the confidence map and obtains the best global score. The proposed method obtains higher per-pixel confidence than DU-GAN in the subtle structures and obtains the second-best global score, indicating that our method achieves better performance in subtle structure restoration.

Table 2: The PSNR and SSIM results on the chest dataset and brain dataset using different methods. Chest-10% denotes that the dose of the low-dose chest images is 10% of the routine dose; Brain-25% denotes that the dose of the low-dose brain images is 25% of the routine dose.

Ablation Study.
Since we introduce several novel structures and the local filtered mechanism on top of DD-Net, it is important to perform a comparative analysis. The detailed results are presented in Table 3, which reports the objective metrics of our modifications.

Use Improved Residual Dense Block.
As the global structure and detailed information are learned within one network, we introduce the improved residual dense block to improve the representation ability of the neural network. As shown in Table 3, our model obtains higher PSNR and SSIM results than DD-Net when the network uses the improved residual dense block without the other improvements. Furthermore, when DD-Net is trained with the local filtered mechanism, using the improved residual dense block remarkably mitigates the performance reduction, indicating that the representation ability of DD-Net is insufficient for the local filtered mechanism and that introducing this module enhances the performance in subtle structure restoration.

Use Local Filtered Mechanism.
As the high-frequency portion cannot accurately reflect the incorrect subtle structures, and frequency-separation-based methods cannot utilize the correlation between global structures and detailed information, we introduce the local filtered mechanism. By filtering out the areas that are already restored with high quality, this mechanism drives the network to specifically optimize the poorly restored subtle structures and enhances its ability in subtle structure restoration. Although it causes a slight reduction in the objective metrics, it makes the network generate more precise and realistic subtle structures. The reason is that the areas preserved by this mechanism are more difficult to optimize, and the mechanism prevents the network from dropping some subtle structures in exchange for better generalization ability.
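The masking step of the local filtered mechanism can be sketched as follows (a minimal NumPy illustration; the exact threshold rule |NDCT − denoised| > τ, the function name, and the binary mask form are our assumptions about the threshold operation described above):

```python
import numpy as np

def local_filter(ldct, ndct, denoised, tau):
    """Filter out well-restored regions so the network re-optimizes
    only the poorly restored subtle structures."""
    diff = np.abs(ndct - denoised)            # difference image
    mask = (diff > tau).astype(ldct.dtype)    # 1 where restoration is poor
    # Elementwise multiplication keeps only the masked regions.
    return ldct * mask, ndct * mask, mask
```

The filtered LDCT image is then fed through the network again, and the correction loss is computed on the filtered pair only, so the gradient signal concentrates on the low-quality areas.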

Use Gradient Loss.
As the combination of MSE and MS-SSIM cannot keep the edge information of the denoised result consistent with the NDCT image, the network oversmoothens some subtle structures, as shown in Figure 7. To remedy this, we introduce the gradient loss to sense the edge information in the LDCT image. As shown in Table 3, introducing the gradient loss also efficiently mitigates the performance reduction caused by the local filtered mechanism. In addition, as shown in Figure 13, the color and edge information of subtle structures are enhanced through the local filtered mechanism and gradient loss, indicating that introducing them is beneficial for subtle structure restoration.
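A gradient loss of this kind can be illustrated with finite differences (a sketch only; the paper does not spell out the gradient operator here, so forward differences and an L1 distance are our assumptions):

```python
import numpy as np

def gradient_loss(pred, target):
    """L1 distance between the finite-difference gradients of two images,
    penalizing mismatched edges (the high-frequency portion)."""
    # Horizontal and vertical forward differences approximate the gradient.
    loss_h = np.mean(np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1)))
    loss_v = np.mean(np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)))
    return loss_h + loss_v
```

Note that a constant brightness offset leaves the gradients unchanged, so this term responds only to edge discrepancies, complementing MSE and MS-SSIM.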

Conclusion
In this work, we propose a novel scheme for LDCT denoising based on an improved DD-Net and a local filtered mechanism. As the incorrect subtle structures usually lie in the areas severely contaminated by noise and streak artifacts rather than in the edge information of the high-frequency domain, previous studies cannot restore the subtle structures efficiently. Therefore, we present the local filtered mechanism to filter out the areas restored with high quality and make the network specifically optimize the subtle structures. Based on the original loss and correction loss of the improved DD-Net, the proposed method achieves a balance between network generalization and subtle structure restoration. Furthermore, as learning global structures and detailed information within one network requires more powerful feature representation ability, and as the edge information is significant for subtle structure restoration, we introduce the improved residual dense block and the gradient loss to deepen the network structure and to keep the edge information of the denoised result consistent with the NDCT image, respectively. The ablation study validates the effectiveness of these components.
The quantitative results show that our scheme obtains higher scores on the objective metrics than conventional schemes and most neural networks. Meanwhile, the visual comparison and uncertainty visualization also show that our scheme provides an effective approach for subtle structure restoration in LDCT image denoising. In addition, the proposed network achieves competitive performance on both the chest and brain datasets, even though their radiation doses are quite different, which demonstrates the generalization ability of our network in different scenarios. However, training the network requires calculating the original loss and correction loss of the improved DD-Net separately for each batch, which introduces additional computational cost. Moreover, the effectiveness of our scheme requires further validation in other image processing tasks, such as image superresolution and restoration, which is a significant direction for future research.
Data Availability
The data supporting this work are from previously reported studies and datasets, which have been cited. The processed data are available at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=52758026.

Conflicts of Interest
The authors declare that there are no conflicts of interest.