An Image Denoising Method Based on BM4D and GAN in 3D Shearlet Domain

Abstract. To overcome the disadvantages of the traditional block-matching-based image denoising methods, an image denoising method based on block matching with 4D filtering (BM4D) in the 3D shearlet transform domain and a generative adversarial network (GAN) is proposed. Firstly, the contaminated images are decomposed to obtain the shearlet coefficients; then, an improved 3D block-matching algorithm is applied in the hard-threshold and Wiener-filtering stages to obtain latent clean images; the final clean images are obtained by training on the latent clean images with a generative adversarial network. Taking the peak signal-to-noise ratio (PSNR), the structural similarity (SSIM), and the edge-preserving index (EPI) as evaluation criteria, experimental results demonstrate that the proposed method not only effectively removes image noise in highly noisy environments but also effectively improves the visual quality of the images.


Introduction
The typical image denoising methods can be commonly classified into three schemes: filtering-based methods, decomposition-based methods, and dictionary learning-based methods [1]. The classical filters include the median filter [2] and the Wiener filter [3]. The basic principle of decomposition-based methods is to decompose the contaminated images into low-pass and high-pass subbands and then separate the image noise by manipulating the obtained coefficients; the wavelet transform is the typical decomposition tool. The basic principle of dictionary learning-based methods is to sparsely represent the noisy image with overcomplete atoms: only the large representation coefficients are used to reconstruct the original image, while the small coefficients are discarded [4][5][6]. Another famous strategy for denoising is based on the self-similarity of the image, such as the block-matching and 3D filtering (BM3D) algorithm [7]. The above methods usually perform well, but they lose edge, texture, and other details and produce blur and blocking effects when the noise is heavy.
Nowadays, since the adaptive space estimation strategy for nonlocal means [8] can effectively alleviate the high complexity and low efficiency of nonlocal mean filtering, nonlocal similarity has become an effective feature for image denoising [9]. Its main disadvantage, however, is the loss of directional information, so the geometric regularity of images cannot be effectively captured to sparsely represent the features of the original images [10]. Besides, the computation of nonlocal similarity is implemented only in the spatial domain. As reported, multiscale geometric transformations, such as the contourlet transform, can greatly help to suppress heavy noise in the frequency domain [11], but this scheme is limited by the mathematical properties of the selected transformation. On the other hand, deep learning technologies, such as the convolutional neural network [12] and the generative adversarial network [13], have achieved great success in the areas of image classification, target recognition [14], and image fusion [15]. Therefore, a novel image denoising method based on block matching with 4D filtering in the 3D shearlet transform domain and the generative adversarial network is proposed. The 3D shearlet transform provides better mathematical properties than the commonly used wavelet or contourlet transforms for capturing the anisotropic features of images at different scales and directions. In addition, the traditional BM3D is extended into 4D space, which can effectively improve the edge and texture details of the images. The output of the BM4D stage is used as the input to the designed generative adversarial network to make full use of its good learning ability. The remainder of this paper is organized as follows. The details of the proposed method are presented in Section 2. Experimental results and discussions are shown in Section 3. Finally, the whole paper is concluded in Section 4.

The Proposed Method

2.1. The Architecture of the Proposed Method. The overall structure of the proposed method is shown in Figure 1.
Through multiscale decomposition and directional partitioning, the 3D shearlet coefficients are obtained. Then, the coefficients are fed into the hard-threshold and Wiener-filtering stages of the BM4D model. In the hard-threshold stage, similar cubes are extracted from the observed volume and grouped together: if the distance between two cubes is smaller than the preset threshold, collaborative filtering is applied to the group of similar cubes. In the aggregation process, the basic volume estimate is generated by an adaptive convex combination. In the Wiener-filtering stage, aggregation is implemented after the inversion of the shearlet transform. Finally, the clean results are obtained by the designed generative adversarial network.

2.2. The 3D Shearlet Transform. In 3D space, the shearlet region P_i (i = 1, 2, 3) is obtained by combining the function systems associated with the pyramid regions of the 3D Fourier space R^3. For d = 1, 2, 3 and l = (l_1, l_2) ∈ Z^2, the 3D shearlet system associated with P_d is defined as the set

{ψ^(d)_{j,l,k} = |det A_d|^{j/2} ψ^(d)(B^l_d A^j_d x − k) : j ≥ 0, −2^j ≤ l_1, l_2 ≤ 2^j, k ∈ Z^3},

where A_d is the specific anisotropic expansion matrix and B^l_d is the specific shear matrix. More details can be found in the literature [16, 17]. A shearlet in 3D space is shown in Figure 2.

2.3. The Improved 3D Block-Matching Algorithm. Let z: X → R be the noisy observation of the form

z(x) = y(x) + η(x), x ∈ X ⊂ Z^3,

where y is the original unknown volumetric signal, X ⊂ Z^3 is the 3D coordinate domain, and η(·) ∼ N(0, σ^2) is independent and identically distributed Gaussian noise with standard deviation σ. The goal of the improved 3D block-matching algorithm [18] is to obtain an estimate ŷ of y from the noisy observation. The implementation of the improved algorithm is divided into two cascaded stages, hard thresholding and Wiener filtering [19], each of which includes three steps: grouping, collaborative filtering, and aggregation.
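As an illustration, the observation model above can be simulated with a short numpy sketch (the array shape and function name here are illustrative, not from the paper):

```python
import numpy as np

# Sketch of the additive observation model z(x) = y(x) + eta(x),
# with eta ~ N(0, sigma^2) i.i.d. over the voxel grid X.
def add_gaussian_noise(y, sigma, seed=0):
    """Corrupt a clean volume y with zero-mean Gaussian noise of std sigma."""
    rng = np.random.default_rng(seed)
    return y + rng.normal(0.0, sigma, size=y.shape)

y = np.zeros((16, 16, 16))               # stand-in for the unknown clean volume
z = add_gaussian_noise(y, sigma=0.1)     # noisy observation
```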
In the hard-threshold stage, let C^z_{x_R} denote a cube of volume L × L × L (L ∈ N) extracted from z at coordinate x_R. The similarity between two cubes is measured by the photometric distance

d(C^z_{x_R}, C^z_{x_i}) = ‖C^z_{x_R} − C^z_{x_i}‖^2 / L^3,

where ‖·‖^2 denotes the summation of the squared differences between the corresponding intensities of the two cubes and the denominator L^3 is a normalization factor. No prefiltering is performed before cube matching, so the similarity can be tested directly on the noisy observations.
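The photometric distance translates directly into a small function (a sketch; `cube_distance` is an illustrative name):

```python
import numpy as np

# d(C1, C2) = ||C1 - C2||^2 / L^3: summed squared intensity differences,
# normalized by the number of voxels in an L x L x L cube.
def cube_distance(c1, c2):
    L = c1.shape[0]
    return float(np.sum((c1 - c2) ** 2)) / L ** 3
```

By construction, the distance of a cube to itself is 0, which is what guarantees that every reference cube belongs to its own group in the grouping step.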
In the grouping step, for each reference cube C^z_{x_R}, cubes similar to it are extracted from z and combined to form a group. Two cubes are considered similar if their distance does not exceed a predefined threshold τ^ht_match. Formally, we first define the set containing the indexes of the similar cubes:

S^ht_{x_R} = { x_i ∈ X : d(C^z_{x_R}, C^z_{x_i}) ≤ τ^ht_match }.    (4)

Then, a four-dimensional group is built from this set:

G^z_{S^ht_{x_R}} = { C^z_{x_i} : x_i ∈ S^ht_{x_R} },    (5)

where the reference cube (indexed by x_R) is matched against the similar cubes located in the 3D data; the coordinates x_R and x_i correspond to the tail and head of the arrows connecting the cubes. Note that since the distance from any cube to itself is always 0, by the definition in formula (4) each group in formula (5) contains at least its reference cube.
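The grouping rule in formula (4) amounts to a thresholded similarity search; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def cube_distance(c1, c2):
    L = c1.shape[0]
    return float(np.sum((c1 - c2) ** 2)) / L ** 3

def extract_cube(volume, x, L):
    i, j, k = x
    return volume[i:i + L, j:j + L, k:k + L]

# Build the index set S of candidate coordinates whose cubes lie within
# distance tau of the reference cube.
def match_cubes(volume, x_ref, candidates, L, tau):
    ref = extract_cube(volume, x_ref, L)
    return [x for x in candidates
            if cube_distance(ref, extract_cube(volume, x, L)) <= tau]
```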
In the collaborative filtering step, a separable four-dimensional transform T^ht_4D is applied along each dimension of the group in formula (5). Then, by applying a hard-threshold operator γ^ht with threshold σλ_4D, the four-dimensional group spectrum is obtained:

Ω^ht_{x_R} = γ^ht( T^ht_4D( G^z_{S^ht_{x_R}} ) ),    (6)

representing the filtered group, which is transformed back into the signal domain:

Ĝ^y_{S^ht_{x_R}} = T^{ht,−1}_4D( Ω^ht_{x_R} ).    (7)

From each filtered group, an estimate Ĉ^y_{x_i} of each original cube C^y_{x_i} of the unknown volume y is extracted separately. Formula (7) is an overcomplete representation of the denoised data because cubes may overlap within the same group and across different groups.
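Collaborative hard-threshold filtering can be sketched as follows; note that `np.fft.fftn` is used here only as a generic stand-in for the separable 4D transform T^ht_4D (the actual transform pair in BM4D differs), and the threshold constant is illustrative:

```python
import numpy as np

# Transform the 4D group, zero the small coefficients, invert, and report
# the number N of surviving (nonzero) coefficients, used later for weights.
def hard_threshold_filter(group, sigma, lam=2.7):
    spec = np.fft.fftn(group)                  # 4D group spectrum
    spec[np.abs(spec) < sigma * lam] = 0.0     # hard-threshold operator
    filtered = np.real(np.fft.ifftn(spec))
    return filtered, int(np.count_nonzero(spec))
```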
In the aggregation step, this redundancy is exploited to generate a basic volume estimate by an adaptive convex combination:

ŷ^ht(x) = ( Σ_{x_R} Σ_{x_i ∈ S^ht_{x_R}} w^ht_{x_R} Ĉ^y_{x_i}(x) ) / ( Σ_{x_R} Σ_{x_i ∈ S^ht_{x_R}} w^ht_{x_R} χ_{x_i}(x) ),    (8)

where χ_{x_i}: X → {0, 1} is the indicator function of the support of the cube at x_i and w^ht_{x_R} is a group-dependent weight:

w^ht_{x_R} = 1 / ( σ^2 N^ht_{x_R} ),    (9)

where σ is the standard deviation of the noise in z and N^ht_{x_R} is the number of nonzero coefficients in formula (6). Since the DC coefficient always survives the thresholding operation, that is, N^ht_{x_R} ≥ 1, the denominator of equation (9) is never zero. The number N^ht_{x_R} serves two functions: on the one hand, it measures the sparsity of the thresholded spectrum in (6); on the other hand, it approximates the total residual noise variance of the group estimate in (7). As a result, groups with a high degree of correlation are given larger weights, while groups with larger residual noise are penalized with smaller weights.
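The aggregation of formula (8) is essentially two accumulation buffers, one for weighted cube estimates and one for the weights; a sketch with illustrative names:

```python
import numpy as np

# Scatter each filtered cube back into the volume with its group weight
# w = 1 / (sigma^2 * N_ht), then normalize by the accumulated weights
# (adaptive convex combination over overlapping cubes).
def aggregate(shape, cubes, coords, weights, L):
    num = np.zeros(shape)
    den = np.zeros(shape)
    for cube, (i, j, k), w in zip(cubes, coords, weights):
        num[i:i + L, j:j + L, k:k + L] += w * cube
        den[i:i + L, j:j + L, k:k + L] += w
    den[den == 0] = 1.0        # voxels covered by no cube remain zero
    return num / den
```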
In the Wiener-filtering stage, cube matching is performed within the basic estimate ŷ^ht. In fact, since the noise level in ŷ^ht is much lower than that in the noisy observation z, more reliable matches are expected, making the grouped data sparser. Formally, for each reference cube C^{ŷht}_{x_R} extracted from the basic estimate ŷ^ht, its set of similar-cube coordinates is constructed as

S^wie_{x_R} = { x_i ∈ X : d(C^{ŷht}_{x_R}, C^{ŷht}_{x_i}) ≤ τ^wie_match }.    (10)

The collaborative filtering here is implemented as an empirical Wiener filter. Similarly to formula (6), the coordinate set (10) is first used to extract a group G^{ŷht}_{S^wie_{x_R}} from ŷ^ht, and the empirical Wiener filter coefficients are defined as

W_{S^wie_{x_R}} = |T^wie_4D( G^{ŷht}_{S^wie_{x_R}} )|^2 / ( |T^wie_4D( G^{ŷht}_{S^wie_{x_R}} )|^2 + σ^2 ),    (11)

where σ is the standard deviation of the noise and T^wie_4D is a transform operator composed of four one-dimensional linear transforms; these are usually different from those in T^ht_4D. Subsequently, the same coordinate set (10) is used to extract the noisy group G^z_{S^wie_{x_R}} from z. An element-wise multiplication between the spectrum of the noisy group and the Wiener filter coefficients in formula (11) performs the coefficient shrinkage, and the group estimate is

Ĝ^y_{S^wie_{x_R}} = T^{wie,−1}_4D( W_{S^wie_{x_R}} · T^wie_4D( G^z_{S^wie_{x_R}} ) ).    (12)

Then, the inverse 3D shearlet transform [14] is applied after shrinking the spectrum. The final estimate ŷ^wie is generated by a convex combination similar to formula (8), with formula (4) replaced by formula (10). The aggregation weight of the group estimate (12) is defined by the energy of the Wiener filter coefficients (11):

w^wie_{x_R} = σ^{−2} ‖W_{S^wie_{x_R}}‖^{−2}.    (13)

In this way, each weight (13) provides an estimate of the total residual noise variance of the corresponding group estimate (12).
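The empirical Wiener shrinkage step can be sketched in the same style (again with `np.fft.fftn` as a stand-in for the actual 4D transform):

```python
import numpy as np

# W = |S_basic|^2 / (|S_basic|^2 + sigma^2): shrinkage coefficients from the
# spectrum of the basic (hard-threshold) estimate, applied element-wise to
# the spectrum of the noisy group.
def wiener_filter_group(group_noisy, group_basic, sigma):
    S_basic = np.fft.fftn(group_basic)
    W = np.abs(S_basic) ** 2 / (np.abs(S_basic) ** 2 + sigma ** 2)
    S_noisy = np.fft.fftn(group_noisy)
    return np.real(np.fft.ifftn(W * S_noisy)), W
```

W always lies in [0, 1): coefficients with high signal energy relative to σ² are kept almost unchanged, while low-energy ones are strongly attenuated.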

2.4. The Generative Adversarial Network for Training. After obtaining the intermediate results, the final denoised image is obtained by training a generative adversarial network (GAN) [20]. The training process is shown in Figure 3.

2.4.1. The Generator Network G. The generator network generates a fake image from the noisy image, as shown in Figure 4. It consists of 11 cascaded convolution layers, which are trained to learn the residual between the input image and the label image. An internal skip connection is introduced into each block to preserve information and reduce the training time. To maintain good performance while reducing computational complexity, the network adopts a bottleneck structure in which the numbers of feature maps in the first, middle, and last layers are all 64. Following the suggestions in references [21, 22] for low-level computer vision problems, a 3 × 3 convolution kernel is used in each convolution layer, and the rectified linear unit (ReLU) is used as the activation function.

2.4.2. The Discriminator Network D. As shown in Figure 5, the discriminator network D is trained to distinguish fake images from real images. It has four convolution blocks and two fully connected layers. Each convolution block consists of a convolution layer, a batch normalization layer, and the ReLU activation function. The kernel size K is 3 × 3, and the number of filters N increases from 64 to 256. The stride S of each convolution layer is 2 to reduce the resolution of the image. The probability that the input image is noiseless is generated through a fully connected layer of 1024 neurons.

2.4.3. Adversarial Training. The aim is to use the adversarial strategy to train a model to remove the image noise. Adversarial training is a way to train the generator network G to generate samples from the real data distribution x ∼ p_data. The generator takes as input a noise variable z with distribution p_z and is trained to learn a mapping G(z; θ_g) into the data space, where θ_g denotes the parameters of the generator network. When training the generator, the essential objective is to maximize the probability that the generated samples match the data, which can be noted as p_data(G(z; θ_g)).

To guarantee the above probability, the discriminator network D, whose input is a data sample x and whose output is D(x; θ_d), should learn to distinguish the generated samples from the real samples. It must maximize the probability assigned to real data samples and minimize the probability assigned to generated samples, that is,

min_G max_D V(D, G) = E_{x ∼ p_data}[log D(x)] + E_{z ∼ p_z}[log(1 − D(G(z)))].

The generator and discriminator networks are trained alternately, each trying to fool the other. Finally, the whole process converges when the generator has successfully learned to generate samples from p_data.
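On toy discriminator outputs, the standard GAN value function V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))] behaves as expected; in this sketch D is simply a vector of probabilities, not a trained network:

```python
import numpy as np

# Monte-Carlo estimate of the GAN value function from discriminator scores.
def gan_value(d_real, d_fake, eps=1e-12):
    d_real = np.clip(d_real, eps, 1 - eps)  # clip to avoid log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))
```

At the theoretical equilibrium D ≡ 1/2 the value is −2 log 2 ≈ −1.386, while a perfect discriminator (D = 1 on real samples, D = 0 on fakes) drives it toward 0.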
In Figure 6, the experiments on four groups of color images show the necessity and effectiveness of using the GAN.

Experimental Results and Discussions
In this section, two groups of experiments are implemented to show the performance of the proposed method. The platform is a Dell M4800 workstation with an Intel 2.5 GHz CPU and 32 GB RAM, running Matlab and Python. PSNR [23], SSIM [24], and the edge-preserving index (EPI) [25] are used as objective evaluation measurements. PSNR is computed as

PSNR(y, ŷ) = 10 log_10( D^2 |X̃| / Σ_{x ∈ X̃} (y(x) − ŷ(x))^2 ),

where D is the peak value of y, X̃ = { x ∈ X : y(x) > 10 · D/255 }, and |X̃| is the cardinality of X̃. The SSIM is defined as

SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ,

where l(x, y), c(x, y), and s(x, y) are computed as

l(x, y) = (2u_x u_y + c_1) / (u_x^2 + u_y^2 + c_1), c(x, y) = (2σ_x σ_y + c_2) / (σ_x^2 + σ_y^2 + c_2), s(x, y) = (σ_xy + c_3) / (σ_x σ_y + c_3),

Mathematical Problems in Engineering
in which x and y are the reference image and the image to be tested, respectively, u_x and u_y are the mean values of the two images, σ_x and σ_y are the standard deviations, σ_xy is the covariance of x and y, and c_1, c_2, and c_3 are small positive constants introduced to avoid instability when a denominator approaches 0. When α = β = γ = 1 and c_3 = c_2/2, we obtain

SSIM(x, y) = ( (2u_x u_y + c_1)(2σ_xy + c_2) ) / ( (u_x^2 + u_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2) ).

Based on the edge contrast, the edge-preserving index (EPI) is defined as

EPI = ( Σ_{i,j} [ |p_s(i, j) − p_s(i + 1, j)| + |p_s(i, j) − p_s(i, j + 1)| ] ) / ( Σ_{i,j} [ |p_o(i, j) − p_o(i + 1, j)| + |p_o(i, j) − p_o(i, j + 1)| ] ),

where p_s(i, j) is a pixel of the testing image, p_o(i, j) is a pixel of the original image, both located in the edge region, i is the row index, and j is the column index. The range of EPI is 0 to 1: when EPI equals 1, the edges of the image are completely preserved; when EPI equals 0, the image has degenerated to a plane without any variation. The larger the EPI value, the stronger the edge-preserving ability.
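The three metrics translate directly into numpy; this is a hedged sketch (PSNR uses the masked definition above with the peak D as a parameter; SSIM is the simplified global form with α = β = γ = 1 and c_3 = c_2/2; the EPI edge-contrast term is implemented with forward differences, one common choice):

```python
import numpy as np

def psnr(y, y_hat, D=1.0):
    mask = y > 10.0 * D / 255.0                  # foreground voxels only
    mse = np.mean((y[mask] - y_hat[mask]) ** 2)
    return 10.0 * np.log10(D ** 2 / mse)

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))           # sigma_xy
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def epi(p_o, p_s):
    # Sum of absolute forward differences as an edge-contrast measure.
    grad = lambda p: np.abs(np.diff(p, axis=0)).sum() + np.abs(np.diff(p, axis=1)).sum()
    return grad(p_s) / grad(p_o)
```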

Experiment 1. Adding Gaussian white noise with different standard deviations, the first experiment is implemented on several popularly used natural images, shown in Figure 7. Five well-known methods are used for comparison with the proposed method: BM3D [26], EPLL [12], TNRD [27], WNNM [28], and WSNM [29]. All the results are reported in Tables 1-3, respectively. In addition, the visual performance of the different methods at noise level σ = 25 is shown in Figure 8 to provide a subjective comparison. A small region, marked by a red rectangle, is enlarged to clearly display the visual differences.

Experiment 2. The second experiment is implemented under Gaussian noise and Rician noise. The testing volume is a T1 brain phantom volume with a size of 181 × 217 × 181 and a slice thickness of 1 mm. Without loss of generality, it is assumed that all the images are normalized to real-valued signals in the intensity range [0, 1]. The experimental results are shown in Figures 9 and 10.

In Figures 9 and 10, the first row is one slice of the original volume data. Each column shows the lateral, coronal, sagittal, and cross-sectional results when the standard deviation is 0.05, 0.11, and 0.015, respectively. The quantitative results under OB-NLM3D [30], OB-NLM3D-WM [31], ODCT3D [32], PRI-NLM3D [33], BM4D [34], and the proposed method are reported in Tables 4 and 5, respectively. The comparison of all the above experimental results strongly demonstrates the efficiency of the proposed method, both visually and quantitatively. In particular, the PSNR and SSIM performance shows that the proposed method obtains better results as the noise level increases. In addition, better visual effects can be observed in the smoothness of flat areas, the preservation of details along the edges, and the accurate restoration of the phantom intensity. The time complexity of the proposed method is O(N log_2 N). The computing time is indeed a weakness at present: the main reason is that our method contains more steps than the other methods, although it is not the slowest. We expect this to be alleviated by improved hardware conditions, and parallel computing (CUDA in our plan) can also be used to reduce the time cost; this is work we will address in the future.

Conclusion
In this paper, an image denoising method based on BM4D in the 3D shearlet transform domain and a GAN is proposed. The proposed method makes full use of the sparse representation of the 3D shearlet transform and the good deep learning ability of the generative adversarial network. All the experimental results prove the effectiveness and accuracy of the proposed method in terms of both subjective comparison and objective quantification, strongly demonstrating its superiority over traditional image denoising methods.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.