Research on Image Denoising and Super-Resolution Reconstruction Technology of Multiscale-Fusion Images

Image denoising and image super-resolution reconstruction are two important techniques in image processing. In recent years, deep learning has been used to solve image denoising and super-resolution reconstruction problems, usually with better results than traditional methods. However, image denoising and super-resolution reconstruction are studied separately in state-of-the-art work. To optimally improve image resolution, it is necessary to investigate how to integrate these two techniques. In this paper, based on the Generative Adversarial Network (GAN), we propose a novel image denoising and super-resolution reconstruction method, i.e., the multiscale-fusion GAN (MFGAN), to restore images interfered by noise. Our contributions are reflected in the following three aspects: (1) the combination of image denoising and image super-resolution reconstruction simplifies the process of upsampling and downsampling images during model learning, avoids repeated input and output image operations, and improves the efficiency of image processing. (2) Motivated by the Inception structure and introducing a multiscale-fusion strategy, our method is capable of using multiple convolution kernels with different sizes to expand the receptive field in parallel. (3) The ablation experiments verify the effectiveness of each loss measurement employed in our devised loss function. Our experimental studies demonstrate that the proposed model can effectively expand the receptive field and thus reconstruct images with high resolution and accuracy, and that the proposed MFGAN method performs better than a few state-of-the-art methods.


Introduction
As images can carry a great deal of data and information, image processing technology is applicable to many fields, such as medical treatment, transportation, military, space and aerospace, and communication engineering. It has penetrated into our lives and become inseparable from each of us. However, during image collection, storage, and processing, image noise may be introduced, which interferes with the information provided by the image and reduces image sharpness. Generally, image noises include salt-and-pepper noise, gamma noise, and Gaussian noise [1]. To solve this problem, image denoising and image super-resolution reconstruction (SR) techniques [2] have been investigated to restore high-quality and high-resolution images [3]. The image denoising technique reduces the information loss of the original image while removing image noise. A variety of noise reduction methods have been proposed. Traditional methods [4] include the bilateral filtering method [5], the Gaussian filtering method [6], the nonlocal algorithm [7], and the block-matching 3D (BM3D) algorithm [8]. In recent years, deep learning has been widely adopted to solve image denoising problems, where the main methods are multilayer perceptrons and fully convolutional networks. Specifically, the multilayer-perceptron-based method splits the original image into several blocks and subsequently denoises the blocks one by one; finally, all the processed image blocks are stitched together. The algorithms based on fully convolutional networks mainly include the denoising convolutional neural network (DNCNN) [9], the convolutional blind denoising network (CBDNet) [10], etc. Moreover, the Generative Adversarial Network (GAN) has attracted much attention from researchers, and various network frameworks, such as the deep convolutional GAN (DCGAN), CycleGAN [11], and Pix2pix, have been proposed based on GAN for image processing.
It is worth noticing that image denoising and SR are two important phases of image processing, and they are normally studied separately in state-of-the-art work. However, when we aim to complete the two tasks simultaneously, it is necessary to study the two techniques integrally. To this end, we combine image denoising with image super-resolution for image processing and specifically propose the multiscale-fusion GAN (MFGAN) method. The network model is designed based on the convolutional neural network and GAN. The proposed model can denoise the noise-disturbed image with high quality and efficiency and restore the image from lower resolution to a higher one by super-resolution reconstruction. The main contributions of this paper are summarized as follows: (1) Based on GAN, we organically integrate the image denoising and image super-resolution reconstruction tasks, which avoids the considerable, repeated input and output operations for upsampling and downsampling images during the model learning, and achieves a simplified workflow with high performance. (2) Derived from the Inception structure and adding multiscale fusion into the GAN network, we successfully expand the receptive field in parallel and thereby enhance the quality of reconstructed images.
(3) We put forward a novel loss function, and the effectiveness of each of its terms is verified by ablation experiments. Our experimental studies demonstrate that the proposed MFGAN method overall performs better than a few state-of-the-art methods.
The rest of this paper is organized as follows. Section 2 introduces the state-of-the-art work regarding our study. Section 3 presents the proposed model and method. Section 4 demonstrates the performance of the proposed model and shows the experiment results. Finally, Section 5 summarizes this paper.

Basic Theory of Residual Network.
For convolutional neural networks, a deeper network architecture can lead to better accuracy but may result in higher computational complexity and the vanishing gradient problem. The reason lies in that a deep neural network has small training gradients. The residual network was proposed to solve these problems and can greatly improve the performance of the deep neural network. The residual network consists of a stack of residual blocks, all of which can be represented in a general form [12, 13]. We assume that H(x) is the basic mapping of several stacked layers, and x represents the input of the first layer. The network can be designed with a jump connection, i.e., H(x) = F(x) + x, which means the two inputs are added element-wise. Using the jump connection, the gradient-related problems of the deep neural network can be solved without increasing the computational complexity. In addition, an approximate residual function, i.e., F(x) = H(x) − x, can be obtained by the gradual approximation of multiple nonlinear layers. Figure 1 is the diagram of the residual network module.
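To make the jump connection concrete, the following minimal NumPy sketch (layer widths and weights are illustrative placeholders, not the paper's architecture) implements H(x) = F(x) + x with F as two weighted layers:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A minimal residual block: H(x) = F(x) + x,
    where F(x) is two weighted layers with a ReLU in between."""
    f = relu(x @ w1) @ w2   # the residual mapping F(x)
    return f + x            # jump connection adds the input back

# Sanity check: with zero weights, F(x) = 0 and the block is the identity,
# which is what lets gradients flow through untouched.
x = np.array([[1.0, -2.0, 3.0]])
w = np.zeros((3, 3))
y = residual_block(x, w, w)   # equals x
```

This identity-by-default behavior is why stacking many residual blocks does not degrade training the way plain stacked layers can.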

Basic Research on SR.
Super-Resolution Convolutional Neural Network (SRCNN) is the first method that applies deep learning to image SR [1], and it is also the most classic method in the field, which is fundamental to the subsequent super-resolution reconstruction research. The network structure of SRCNN is shown in Figure 2.
SRCNN is mainly composed of three layers, i.e., feature extraction, nonlinear mapping, and image reconstruction [14], whose corresponding convolution kernel sizes are 9 × 9, 1 × 1, and 5 × 5, respectively. The feature extraction in the first layer extracts the overlapping patches from the low-resolution image Y and expresses them with high-dimensional vectors. It can be formulated as

F1(Y) = max(0, W1 ∗ Y + B1),

where W1 and B1 represent the filters and biases, respectively, and F1 represents the set of feature maps extracted by SRCNN. Subsequently, all the high-dimensional vectors are nonlinearly mapped to another set of high-dimensional vectors, which are then used to construct the image features. The nonlinear mapping layer can be expressed by

F2(Y) = max(0, W2 ∗ F1(Y) + B2),

where F2 denotes the mapped high-resolution feature vectors. Furthermore, the final high-resolution image is generated with a convolution layer, which is presented as

F3(Y) = W3 ∗ F2(Y) + B3,

where F3 represents the final high-resolution image.
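The three-stage pipeline can be sketched as follows (NumPy, single channel and one random placeholder filter per stage for brevity; the real SRCNN uses trained banks of 64 and 32 filters):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive single-channel 'valid' 2D sliding-window filtering
    (cross-correlation, as used in CNNs; no padding)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# SRCNN's three stages with kernel sizes 9x9, 1x1, 5x5.
y = np.random.rand(33, 33)                                   # bicubic-upscaled patch
f1 = np.maximum(conv2d_valid(y, np.random.rand(9, 9)), 0)    # feature extraction
f2 = np.maximum(conv2d_valid(f1, np.random.rand(1, 1)), 0)   # nonlinear mapping
f3 = conv2d_valid(f2, np.random.rand(5, 5))                  # reconstruction
```

With 'valid' convolutions the 33 × 33 input shrinks to 25 × 25 after the 9 × 9 stage and to 21 × 21 after the 5 × 5 stage, matching SRCNN's patch-based training setup.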

Image Quality Evaluation Method.
In order to quantitatively evaluate the performance of the network model for image denoising and restoration, two widely used metrics for image quality evaluation are utilized in this paper, i.e., the Peak Signal-to-Noise Ratio (PSNR) [15] and the Structural Similarity Index (SSIM) [16].

PSNR.
PSNR is often used to objectively evaluate image quality [15]. It represents the ratio of the maximum power of the effective signal to the noise power in the signal. The unit of PSNR is dB, and the mathematical expression of PSNR is

PSNR = 10 × log10(MAX² / MSE),

where MAX is the maximum possible pixel value (255 for 8-bit images) and MSE denotes the mean squared error between the original image and the generated image, i.e.,

MSE = (1 / (H × W)) Σ_i Σ_j (S_ij − T_ij)²,

where S_ij and T_ij represent the values of the pixels in the i-th row and the j-th column of the original image and the generated image, respectively, and H and W are the image height and width. The larger the MSE is, the greater the differences between the original image and the generated image are, and the smaller the corresponding PSNR is. Thus, a larger PSNR indicates a higher-quality generated image.
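A direct implementation of the two formulas above (assuming 8-bit images, i.e., MAX = 255):

```python
import numpy as np

def psnr(original, generated, max_val=255.0):
    """PSNR = 10 * log10(MAX^2 / MSE), in dB."""
    mse = np.mean((original.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A constant pixel error of 10 gives MSE = 100,
# so PSNR = 10 * log10(255^2 / 100) ≈ 28.13 dB.
a = np.full((4, 4), 100.0)
b = np.full((4, 4), 110.0)
value = psnr(a, b)
```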

SSIM.
SSIM is another indicator to quantify image quality [16]. It is adopted in this paper since PSNR cannot accurately indicate the clarity of an image in some cases. The value of SSIM ranges from −1 to 1. Specifically, SSIM = 1 means that the structures of the two images are completely the same. On the contrary, the structures of the two images are completely different if SSIM is equal to −1. SSIM is given by

SSIM(x, y) = ((2 μ_x μ_y + c_1)(2 σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2)),

where μ_x and μ_y are the average values of all the pixels in images x and y, respectively, σ_x² and σ_y² are the corresponding variances, and σ_xy is the covariance of the two images. c_1 and c_2 are both constants.
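A simplified single-window implementation of the SSIM formula (the standard implementation averages SSIM over local windows; the constants follow the common choice c_1 = (0.01·MAX)², c_2 = (0.03·MAX)², which is an assumption here since the paper does not state them):

```python
import numpy as np

def ssim_global(x, y, max_val=255.0):
    """Simplified SSIM computed over the whole image as one window."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

img = np.random.rand(8, 8) * 255
s = ssim_global(img, img)   # identical images give SSIM = 1
```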

Comparison of Image Denoising Methods Based on Deep Learning.
Table 1 summarizes the image denoising methods based on deep learning in detail in terms of their advantages, limitations, and applicability.

GAN.
With the widespread use of deep learning in image processing, more and more scholars pay attention to the efficiency and comprehensiveness of network models. In fact, image denoising and image super-resolution reconstruction are two different but closely related topics in the field of image processing. Image denoising inevitably reduces the definition and resolution of the image, so it is necessary to conduct super-resolution reconstruction afterwards. As shown in Figure 3, GAN can combine the image denoising method with the super-resolution reconstruction technique, which simplifies the processing procedure and decreases the image processing time. In this section, we redesign the network model and optimize the model parameters based on GAN in order to improve the quality of the generated image. The generator and discriminator are the key modules of GAN. The generator is mainly used to obtain data information and subsequently generate an image. Both the real image and the generated image are sent to the discriminator for training. The discriminator identifies whether the generated image is a fake one. Specifically, if the generated image is regarded as fake, the abovementioned process is executed again. The loop does not stop until the discriminator cannot determine the authenticity of the generated image. As a result, we obtain a well-trained generator.

Research on Multiscale-Fusion Module.
The Inception structure proposed by Google is a new functional unit, whose main idea is to improve the convolution kernels to increase the receptive field [17]. In this way, more image information can be learned by the network, and thus the clarity of the image may be improved. At present, the Inception structure has been developed from Inception V1 to the current Inception V4, with the network structure and processing delay improved version by version.
To improve network performance, the easiest way is to increase the depth of the network. However, as the network depth increases, the computational complexity grows exponentially, which may lead to overfitting and makes it very difficult to optimize the network parameters. To solve this problem, as illustrated in Figure 4, the Inception structure uses a set of parallel convolution kernels, which expands the original convolution into one 1 × 1 convolution, one 3 × 3 convolution, one 5 × 5 convolution, and one max-pooling branch. In this way, the receptive field of the network can be enlarged, and more features can be learned.
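The parallel-branch idea can be sketched as follows (NumPy, single channel, random placeholder kernels; a real Inception module uses trained multichannel filters and concatenates branch outputs along the channel axis):

```python
import numpy as np

def conv2d_same(img, k):
    """Naive 'same'-padded single-channel filtering (odd kernel sizes)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i+kh, j:j+kw] * k)
    return out

def maxpool3_same(img):
    """3x3 max pooling with stride 1 and 'same' padding."""
    padded = np.pad(img, 1, constant_values=-np.inf)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i+3, j:j+3].max()
    return out

x = np.random.rand(16, 16)
branches = [
    conv2d_same(x, np.random.rand(1, 1)),   # 1x1 branch
    conv2d_same(x, np.random.rand(3, 3)),   # 3x3 branch
    conv2d_same(x, np.random.rand(5, 5)),   # 5x5 branch
    maxpool3_same(x),                       # max-pooling branch
]
fused = np.stack(branches, axis=0)          # concatenate along channel axis
```

Because every branch uses 'same' padding, all four outputs share the input's spatial size and can be stacked, while each branch contributes a different receptive field.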
Based on the Inception structure, the GoogLeNet network was developed by Google, as presented in Figure 5. It adds one 1 × 1 convolution to each branch of the network, which reduces the number of network parameters.
It is well known that the dimensionality of a convolutional neural network cannot be reduced below a certain level; otherwise, information loss may occur and bring a negative impact on the network training. Therefore, to solve this problem, two methods are proposed in the Inception structure to decompose the convolution kernel. The first is to equivalently decompose a large convolution kernel into two or more small convolution kernels. This retains the same receptive field while decreasing the number of network parameters, which means that increasing the depth of the network will not degrade its performance. The second method is to decompose a symmetric convolution kernel into multiple small asymmetric convolution kernels. This also decreases the number of parameters while improving the expressive ability of the model and allowing it to learn more features.
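The parameter savings of both decompositions are easy to check (single input/output channel for simplicity):

```python
# Decomposition 1: one 5x5 kernel vs. two stacked 3x3 kernels.
# Both cover a 5x5 receptive field, but the stacked version is cheaper.
params_5x5 = 5 * 5                # 25 parameters
params_two_3x3 = 2 * (3 * 3)      # 18 parameters, same receptive field

# Decomposition 2: one symmetric 3x3 kernel vs. an asymmetric 1x3 + 3x1 pair.
params_3x3 = 3 * 3                # 9 parameters
params_asym = (1 * 3) + (3 * 1)   # 6 parameters
```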
At present, deep learning networks are becoming more and more complex, and the network load is growing as well, which tends to cause overfitting and gradient disappearance. Since the abovementioned residual network also performs well on image processing, it is promising to combine the Inception structure with the residual structure for feature fusion, which can alleviate these problems to the greatest extent.

Multiscale-Fusion GAN.
In this section, we integrate the SR model and the Inception structure into GAN for image denoising and SR. The diagram of the designed network model is shown in Figure 6.

Generator.
The input of the generator is a blurry image with noise, while the output is a generated image with high resolution. Figure 7 shows the schematic diagram of the generator's network model, which is based on the convolutional neural network and incorporates the multiscale-fusion structure. Specifically, the convolution kernels with different sizes can read information of different dimensions in the image. Jump connections are added between any two consecutive layers.
The information of the entire image can be read maximally in this way, thus improving the quality of the final generated image. Moreover, PReLU [18] is used as the activation function here instead of ReLU, as a number of experiments show that using PReLU can reduce the number of dead neurons in the network.
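For reference, PReLU applies the identity to positive inputs and a learnable slope a to negative inputs, so negative activations are scaled rather than zeroed (a is trained in practice; a fixed placeholder value is used here):

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU: x for x > 0, a*x otherwise.
    a = 0 recovers plain ReLU; in a network, a is a learned parameter."""
    return np.where(x > 0, x, a * x)

out = prelu(np.array([-2.0, 0.0, 3.0]))   # -> [-0.5, 0.0, 3.0]
```

Because the negative side keeps a nonzero gradient, neurons cannot get permanently stuck at zero output the way they can with ReLU.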

Discriminator.
The diagram of the discriminator's network model is shown in Figure 8. The discriminator is to distinguish the super-resolution image generated by the generator from the real image in the training set. The network is designed based on the super-resolution GAN (SRGAN). Experiments demonstrate that eight convolutional layers [19] can achieve good network performance. As the number of network layers rises, the number of features that the network can obtain grows as well, while the feature size decreases. In addition, we use Leaky ReLU as the activation function in the network; it allows backpropagation to be performed even when the input value is negative, which is very suitable for the discriminating network and solves the problem of neuron death.

Loss Function.
The super-resolution image reconstructed by the computer has high accuracy, and it is difficult for humans to recognize the difference with the naked eye. Therefore, we need a suitable loss function to evaluate the prediction accuracy of the model. In this section, based on SRGAN, we design a loss function that combines the perceptual loss with the adversarial loss to optimize the model, which can be formulated as

l = l_MSE + 10⁻³ × l_Gen,

where l_MSE denotes the perceptual loss, for which the traditional MSE is used:

l_MSE = (1 / (s²WH)) Σ_{x=1}^{sW} Σ_{y=1}^{sH} (I^HR_{x,y} − G_θ(I^LR)_{x,y})²,

where s is the downsampling factor, sW and sH represent the image width and height, respectively, and θ is the feedforward propagation parameter of the corresponding generator or discriminator. The difference between the two images is obtained by calculating the difference between their pixel values; the larger the MSE is, the larger the difference between the two images is. l_Gen denotes the adversarial loss and is used to test the performance of the discriminator network to ensure that the generated fake image can fool the discriminator.
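A minimal sketch of the combined loss (NumPy; `disc_prob` stands for the discriminator output D(G(z)), and the adversarial term −log D(G(z)) follows the standard SRGAN formulation — the exact form used in the paper's implementation is an assumption here):

```python
import numpy as np

def generator_loss(sr, hr, disc_prob, adv_weight=1e-3):
    """Perceptual (pixel-wise MSE) loss plus adversarial loss,
    combined with the 1 : 1e-3 coefficient ratio described above."""
    l_mse = np.mean((hr - sr) ** 2)            # perceptual term
    l_gen = -np.log(disc_prob + 1e-12)         # adversarial term: -log D(G(z))
    return l_mse + adv_weight * l_gen

# Perfect reconstruction and a fully fooled discriminator drive the loss to ~0.
loss = generator_loss(np.zeros((4, 4)), np.zeros((4, 4)), disc_prob=1.0)
```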
Moreover, experimental results demonstrate that the proposed network model is optimal when the coefficient ratio of l_MSE to l_Gen is 1 : 10⁻³. The Adam gradient descent algorithm is used for training, the initial learning rate is set to 0.001, and the batch size is set to 64. The datasets used in the experiments include CIFAR-10, CIFAR-100, VOC 2012, RENOIR [20], BSD100, and a self-made ImageNet subset.

Experiment Implementation and Results.
The experiment uses the imnoise function in MATLAB to add noise to the tested datasets. We compared the performance at noise coefficients of 0.2 and 0.05. With a noise coefficient of 0.5, the image damage is more serious and difficult to denoise well, so this case was discarded. In addition, the amount of noise reduction is evaluated under different levels of Gaussian noise interference. Two kinds of noise are used here, namely, Gaussian noise and salt-and-pepper noise. Tables 2 and 3, respectively, show the experimental results of PSNR and SSIM under different datasets and different noise coefficients. Comparing the tables vertically, due to the image characteristics of different datasets, not every dataset is suitable for denoising experiments. Among them, BSD100 is the most suitable for denoising experiments, and the results obtained on it are relatively good.
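Since MATLAB's imnoise is referenced, a rough Python analogue of the two noise types follows (the exact imnoise parameterization is not replicated; the `var` and `density` values are illustrative):

```python
import numpy as np

def add_gaussian_noise(img, var=0.05, rng=None):
    """Additive zero-mean Gaussian noise on an image scaled to [0, 1]."""
    rng = rng or np.random.default_rng(0)
    noisy = img + rng.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_pepper(img, density=0.05, rng=None):
    """Salt-and-pepper noise: 'density' fraction of pixels set to 0 or 1."""
    rng = rng or np.random.default_rng(0)
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < density / 2] = 0.0                         # pepper
    noisy[(mask >= density / 2) & (mask < density)] = 1.0   # salt
    return noisy

clean = np.full((64, 64), 0.5)
noisy_sp = add_salt_pepper(clean, density=0.2)
noisy_g = add_gaussian_noise(clean, var=0.05)
```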
Comparing the results in Tables 2 and 3 horizontally, after performing super-resolution reconstruction, the performance of the GAN-based denoising methods is significantly better than that of the other methods. In particular, SRGAN and ADGAN perform excellently and show similar results under different network settings, but the proposed MFGAN achieves better experimental results in most cases. Based on the above comparison and the visual comparison of image clarity discussed below, the performance of the proposed MFGAN is slightly better than that of SRGAN and ADGAN, which effectively improves the denoising effect.
We evaluated the experimental results from a statistical point of view through the Friedman test [34] and the Holm post hoc test [35]. The Friedman test is used to calculate the average ranking of the compared methods and to determine whether the observed differences are statistically significant. We set the significance level of the test to 0.05: if the p-value is less than 0.05, the null hypothesis H0 is rejected, and we can confirm that there are significant differences. The Holm post hoc test is then performed to evaluate the statistical differences between the control (i.e., the method that achieves the best Friedman rank) and the other methods. The results of the Friedman test are shown in Table 4, and the results of the Holm post hoc test are shown in Table 5.
As shown in Table 4, the Friedman test results reveal that the MFGAN method performs better than the other seven compared methods. The results of the Holm post hoc test in Table 5 also show that, compared with the other methods, MFGAN has better performance.
This once again shows that our proposed MFGAN achieves better denoising results. Figure 9 compares the images generated by SRGAN and by the proposed MFGAN. The images are obtained after 100 rounds of training. The first column of images is generated by SRGAN, the second column shows the original training images, and the third column shows the images generated by MFGAN.
From the figure, we can observe that the images generated by MFGAN show better rendering performance than those of SRGAN. Although the reconstruction quality varies across images, MFGAN obtains higher accuracy than SRGAN, especially for the images in the first and fifth rows. In addition, compared with SRGAN, the proposed method also better expands the receptive field.
Furthermore, we conduct ablation experiments to verify the importance of the perceptual loss and the adversarial loss for the experimental results. As shown in Table 6, when the perceptual loss is used alone as the loss function, overfitting occurs and the generated image is pixelated. If the adversarial loss is used alone, the PSNR of the generated image is generally between 20 and 21, which is significantly lower than the original result. Therefore, both the perceptual loss and the adversarial loss are of significance to ensure the integrity and accuracy of the experimental results.

Conclusion and Future Work
In this paper, we propose a GAN-based method that combines image denoising with image super-resolution reconstruction for image processing. The proposed method improves the residual network in SRGAN and increases the receptive field by adding the idea of multiscale fusion. Moreover, the activation function is carefully selected so that the problem of neuron death is solved. Furthermore, we also design the loss function to improve the discriminator and the accuracy of the generated image. In practice, multiple types of noise coexist with each other. When an image is severely interfered by various types of noise, the performance of the proposed model may be affected. Therefore, the model needs to be improved for scenarios with different noises in future work. For example, the coefficient ratio of the perceptual loss to the adversarial loss should be studied, and how to intelligently determine the appropriate loss coefficient ratio according to the noise remains to be solved. In addition, the introduced Inception structure leads to a long computational delay for model training. Hence, it is worth investigating methods to improve the training speed in the future.

Data Availability
The datasets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.