Image Super-Resolution Using Lightweight Multiscale Residual Dense Network

Introduction
The purpose of image super-resolution reconstruction is to recover the corresponding high-resolution image from one or more low-resolution images. At present, the technology has been successfully applied to fields such as remote sensing, medical imaging, image compression, video monitoring, and military applications. As an important image processing technology, image super-resolution has received wide attention from researchers, and many effective super-resolution methods have been proposed [1]. The existing image super-resolution methods mainly include the following: interpolation-based methods [2], reconstruction-based methods [1,3,4], and learning-based methods [5][6][7]. Interpolation-based methods include bilinear interpolation, nearest neighbor interpolation, bicubic interpolation, and edge-based image interpolation [2]. These methods have low computational complexity, but they often lose the high-frequency detail of the original image, resulting in poor visual quality.
Reconstruction-based methods are based on the assumption that the low-resolution image is generated from the high-resolution image through downsampling, deformation, translation, and noise. Combined with prior knowledge, the high-resolution image can be recovered by optimization. Although reconstruction-based methods can retain more image details, the reconstruction effect is susceptible to the parameters of the image degradation model and the regularization model. Meanwhile, when the resolution is improved beyond a scale factor of 4×, the reconstruction effect is usually not ideal. Compared with the other two types of methods, learning-based methods can introduce more high-frequency information and are more robust to noise, so they have become a research hot spot in recent years. The basic idea of learning-based methods is to establish a relationship between low- and high-resolution images through learning and then use this relationship to guide high-resolution image reconstruction. Freeman et al. [5] first established this relationship based on Markov random fields, but this method required large amounts of time to construct training sets, and image reconstruction was also time-consuming. Chang and Yeung [6] proposed a super-resolution reconstruction method based on neighborhood embedding, assuming that the high- and low-resolution image blocks share the same geometric manifold. Owing to the excellent performance of sparse representation in computer vision tasks [8][9][10][11][12][13], Yang et al. [7] proposed a super-resolution reconstruction method based on sparse representation, which assumes that high- and low-resolution images have the same sparse coding coefficients.
In recent years, deep learning has attracted wide attention owing to its excellent performance in image super-resolution, and a series of effective deep learning-based super-resolution algorithms have been proposed [14][15][16][17][18][19][20].
Deep learning-based methods usually use a single-scale network to extract image features. Under the constraint of minimizing the loss function, the network extracts texture details from the low-resolution image to restore the high-resolution image. However, some feature information at different scales is lost when features are extracted through a single-scale network, resulting in unsatisfactory super-resolution quality.
To solve the above problem, we propose a multiscale feature extraction model for single image super-resolution. The model can extract features at different scales and in different receptive fields, which improves the efficiency of feature extraction considerably while reducing the depth of the network. Therefore, our method can not only make the model lighter and improve its training efficiency but also effectively avoid significant degradation of the quality of the reconstructed image. Specifically, we design a multiscale residual dense network to extract feature information at different scales and propose to integrate the features of each layer to realize the sharing of multiscale information. Thus, the method can receive more information from different receptive fields, which is very beneficial for avoiding information loss [21]. In addition, a convolutional neural network (CNN) typically introduces adaptive parameters by adding a fully connected layer to the network, resulting in an increase in parameter size. Inspired by the idea of lightweight networks [22], we propose to add lightweight parameters at each scale. Thus, the method ensures that the reconstruction quality does not degrade noticeably. Moreover, it reduces the parameter scale of the model, enhances the nonlinear mapping ability of the network, and improves the efficiency of the algorithm. The major contributions and innovations of our work are summarized as follows:

(i) In order to obtain a high-resolution image from the corresponding low-resolution image, we design a multiscale residual dense network. The network can extract features at different scales and improve the efficiency of the algorithm while keeping the reconstruction quality from degrading significantly.

(ii) In multiscale reconstruction, a lightweight parameter learning method is developed and added to each scale to enhance the nonlinear mapping ability of the network. Different from existing residual dense networks, we do not introduce the common fully connected layer after feature extraction at the various scales but use a lightweight method to generate multiscale parameters.

(iii) The proposed method not only uses multiple scales to extract feature information under different receptive fields but also adopts dilated convolution to increase the area of the receptive fields. It can extract features from more receptive fields at different scales with the same number of parameters, so that the recovered high-resolution image retains more feature information.
The remainder of this paper is organized as follows: Section 2 briefly reviews related work; Section 3 describes the proposed network structure; Section 4 presents the experimental results and analysis; finally, Section 5 concludes our work.

Related Work
In recent years, researchers have witnessed the impressive performance of deep learning in single image super-resolution. In particular, Dong et al. [23] first applied deep learning to image super-resolution and proposed the super-resolution convolutional neural network (SRCNN), which achieved clear reconstruction results. In order to reduce the large number of parameters in SRCNN, researchers have proposed various solutions. Dong et al. [24] improved SRCNN and proposed the fast super-resolution convolutional neural network (FSRCNN). This method first performs convolution and extracts features at the low-resolution stage and then generates super-resolution images with upsampling at the end of the network. Kim et al. proposed a deeply-recursive convolutional network (DRCN) [14] and a very deep convolutional network (VDSR) [25] for image super-resolution. Based on the idea of residual learning, VDSR reduces the training parameters by adding a skip connection to learn the residual parameters instead of all parameters. Zhang et al. [16] proposed an image super-resolution reconstruction method based on the residual dense network (RDN). This method effectively addresses the problems of gradient vanishing and low convergence efficiency in training, as well as the progressive loss of image information during the convolution process. Ledig et al. [17] proposed a super-resolution generative adversarial network (SRGAN). This method is based on adversarial learning and uses the adversarial training of a generator and a discriminator to generate texture details consistent with the distribution of natural images. The above methods all extract image features at a single scale and perform super-resolution. However, they ignore the complementary information of image features at different scales, leading to the loss of some information of the source image. This hinders the reconstruction of detailed high-resolution images.
In order to solve the above problems, researchers have proposed the concept and models of multiscale convolutional neural networks in recent years [18,26,27]. The multiscale methods use convolution kernels of different scales to extract features from different scale layers of the image and then fuse them, thus alleviating the loss of image feature information and improving the quality of super-resolution. Specifically, Hu et al. [27] proposed a multiscale convolutional neural network, which can effectively extract feature information under different receptive fields. It performs feature fusion of different scale layers after each feature extraction module and then extracts the residuals between the fusion information of adjacent modules. Gao and Zhuang [26] developed a multiscale super-resolution method based on a deep neural network and showed the advantages of the multiscale residual dense network in feature extraction compared with single-scale networks. In addition, the enhanced deep super-resolution network [18] also utilizes multiscale residual blocks to eliminate gradient vanishing and gradient explosion. Meanwhile, it adds a fixed hyperparameter to the multiscale network to enhance the network's ability to fit feature information at each scale and removes the batch normalization layer, thereby reducing the scale of parameters. The existing multiscale models can solve the problem that a single-scale model cannot extract enough feature information because it has too few branches. However, there are still some issues to consider: (1) by increasing the number of branches of the network, these models can enhance the ability of the network to extract complementary feature information; however, most of them do not build on the multiscale basis to further enhance extraction capability. (2) Most of the existing models feed the fused multiscale information from the previous multiscale feature extraction block directly into the subsequent feature extraction block, which is likely to cause the problem of gradient vanishing. To solve this problem, a multiscale residual dense network is proposed. In this method, convolution kernels with different receptive fields are set up at different scales to integrate the advantages of multiple scales, and the feature information of different receptive fields is also extracted. In this paper, the residual feature is directly input into the subsequent layer instead of the multiscale fusion feature, which speeds up the convergence rate of the network. Moreover, dilated convolution is used to expand the receptive field without changing the parameters of the convolution kernel, which reduces the scale of parameters compared with the direct use of convolution kernels of different sizes, as the sketch below illustrates.
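As a concrete illustration of this point (our own sketch, not code from the paper), the following PyTorch snippet compares a standard 3 × 3 convolution with a dilated one: the parameter counts are identical, while the effective receptive field grows.

```python
import torch
import torch.nn as nn

# A standard 3x3 convolution and a dilated 3x3 convolution (dilation = 3).
conv_std = nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)
conv_dil = nn.Conv2d(64, 64, kernel_size=3, padding=3, dilation=3)

# Identical parameter counts: dilation inserts gaps between the kernel taps
# instead of adding weights.
n_std = sum(p.numel() for p in conv_std.parameters())
n_dil = sum(p.numel() for p in conv_dil.parameters())
print(n_std, n_dil)  # both 64*64*3*3 + 64 = 36928

# Same-size padding keeps the spatial resolution, while the effective
# receptive field grows from 3x3 to 7x7 (k_eff = d*(k-1) + 1).
x = torch.randn(1, 64, 32, 32)
print(conv_std(x).shape, conv_dil(x).shape)  # both (1, 64, 32, 32)
```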

Our Approach
We propose a super-resolution model based on a multiscale residual dense network (MRDN). The model consists of three modules: shallow feature extraction, multiscale deep feature extraction, and reconstruction. In the shallow feature extraction module, a 3×3 convolution layer is used for shallow feature extraction. Let F_0 denote the output of the shallow feature extraction block, which is also the input of the multiscale deep feature extraction module. In the multiscale deep feature extraction module, we have N multiscale residual blocks (MRBs). Each MRB contains M multiscale fusion layers (MFLs). Each MFL contains three feature extraction branches at different scales. Each scale contains a dilated convolution with a kernel size of 3, with dilation rates of 1, 3, and 5, respectively. In order to prevent the gridding effect, a convolution with kernel size equal to the dilation rate is added before the dilated convolution of each scale. The MRBs are linked in the form of dense connections to ensure that the feature information extracted at each layer is not lost. Meanwhile, the information flows quickly to the subsequent convolution layers, accelerating convergence. In the reconstruction module, upsampling is performed by a 3×3 deconvolution to generate the high-resolution feature map after the dimension is reduced by a 1×1 convolution. The specific network structure is shown in Figure 1.
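A minimal PyTorch skeleton of this three-module pipeline is sketched below. It reflects our reading of the architecture rather than the authors' released code: in particular, squeezing the dense connections between MRBs back to a fixed width with 1 × 1 convolutions is an assumption, and the MRB placeholder stands in for the block defined in the next subsection.

```python
import torch
import torch.nn as nn

# Placeholder so this skeleton runs standalone; the real multiscale
# residual block (MRB) is sketched after equations (1)-(5) below.
def MRB(channels):
    return nn.Conv2d(channels, channels, 3, padding=1)

class MRDN(nn.Module):
    def __init__(self, channels=64, n_blocks=8, scale=2):
        super().__init__()
        # Shallow feature extraction: a single 3x3 convolution (output F_0).
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)
        # N multiscale residual blocks, densely connected: block i receives
        # the concatenation of F_0 and all earlier block outputs, squeezed
        # back to `channels` by a 1x1 convolution (our assumption).
        self.blocks = nn.ModuleList(MRB(channels) for _ in range(n_blocks))
        self.squeeze = nn.ModuleList(
            nn.Conv2d(channels * (i + 1), channels, 1) for i in range(n_blocks))
        # Reconstruction: 1x1 dimension reduction, then a 3x3 deconvolution
        # (transposed convolution) for upsampling.
        self.reduce = nn.Conv2d(channels, channels, 1)
        self.up = nn.ConvTranspose2d(channels, 3, 3, stride=scale,
                                     padding=1, output_padding=scale - 1)

    def forward(self, x):
        f0 = self.shallow(x)
        feats = [f0]
        for blk, sq in zip(self.blocks, self.squeeze):
            feats.append(blk(sq(torch.cat(feats, dim=1))))
        return self.up(self.reduce(feats[-1]))
```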

Multiscale Deep Feature Extraction Module.
The existing deep networks cannot extract feature information under different receptive fields. To solve this problem, we propose a multiscale residual block (MRB), which can extract feature information under different receptive fields and produce better texture details in the recovered high-resolution image. The MRB is composed of three parts: a feature extraction layer, multiscale fusion layers, and residual learning. The proposed multiscale residual network structure is shown in Figure 2.
Let F_{n-1} and F_n denote the input and output of the n-th MRB, respectively. In the feature extraction layer, features are extracted, and the feature map generated by this layer is transmitted directly to the end of the block to form the residual, which accelerates network convergence and prevents gradients from vanishing. The formula is as follows:

$$FS_n = \operatorname{relu}\big(\operatorname{sconv}(F_{n-1}, k=3)\big), \tag{1}$$

where FS_n is the output of the convolution with kernel size k = 3, relu is the ReLU activation function, and sconv is an equal-size convolution; that is, the input feature map is zero-padded so that the output feature map has the same size as the input. FS_n is used as the input to the multiscale fusion layers. Each MRB consists of M MFLs, each with three scales, and the output of the former MFL is the input of the latter. We use dilated convolution for feature extraction, which can expand the receptive field of the feature map without increasing the number of parameters. In this way, feature information from different receptive fields is extracted at different scales. Moreover, in the multiscale fusion layer, we propose a lightweight parameter method to simulate the channel attention mechanism. Instead of using a fully connected layer to generate adaptive parameters as channel attention does, the proposed method uses lightweight parameters. The lightweight parameters are learnable tensors, which can be generated by a built-in function; these tensors are introduced into the MRDN and become trainable parameters. As shown in Figure 2, the lightweight parameters p_1, p_2, and p_3 can be initialized to 1. In the feature fusion stage, concatenation and a 1 × 1 convolution are used to fuse the feature maps extracted by the three-scale feature extraction branches.
The formula for the multiscale fusion layer is as follows:

$$Fi_n^m = p_i \cdot \operatorname{relu}\big(\operatorname{dconv}(\operatorname{sconv}(Fi_n^{m-1}, k=d_i),\ k=3,\ d=d_i)\big), \quad i = 1, 2, 3, \tag{2}$$

where F1_n^m, F2_n^m, and F3_n^m are the three outputs of the m-th MFL in the n-th MRB; F1_n^{m-1}, F2_n^{m-1}, and F3_n^{m-1} are its inputs; and F1_n^0 = F2_n^0 = F3_n^0 = FS_n when m = 1. Here dconv denotes dilated convolution, k is the convolution kernel size, d is the dilation rate used to control the receptive field of the dilated convolution, the dilation rates of the three branches are d_1 = 1, d_2 = 3, and d_3 = 5, and p_i is the lightweight parameter of the i-th branch. In the process of dilated convolution, part of the feature map information may not be convolved, because the zero insertion that enlarges the receptive field skips positions, resulting in information loss. Therefore, an ordinary convolution with kernel size equal to the dilation rate is performed before each dilated convolution, which effectively eliminates the gridding effect of the dilated convolution without introducing a large number of parameters. At the end of the multiscale fusion layers, the feature information extracted by the M-th MFL is fused by concatenation and a 1 × 1 convolution:

$$FM_n = \operatorname{conv}\big(\operatorname{concat}(F1_n^M, F2_n^M, F3_n^M), k=1\big), \tag{3}$$

where F1_n^M, F2_n^M, and F3_n^M are the outputs of the three scales in the last MFL and concat is the concatenation operation.
In the multiscale residual block, residual learning is utilized to further improve the feature map:

$$FR_n = FM_n + FS_n, \tag{4}$$

where FR_n denotes the feature after residual learning. At the end of the multiscale residual block, a 3 × 3 convolution is used to further extract features, and the final output of the n-th MRB can be formulated as

$$F_n = \operatorname{sconv}(FR_n, k=3). \tag{5}$$
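The following sketch translates equations (1)–(5) into PyTorch; it is our interpretation of the block structure, not the authors' released code. In particular, applying the lightweight parameters p_1, p_2, p_3 as multiplicative scalars on the branch outputs follows the channel-attention analogy described above and is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFL(nn.Module):
    """One multiscale fusion layer, eq. (2): three dilated branches with
    d = 1, 3, 5, each preceded by an anti-gridding convolution whose kernel
    size equals the dilation rate, and each scaled by a learnable
    lightweight parameter initialized to 1."""
    def __init__(self, c=64, dilations=(1, 3, 5)):
        super().__init__()
        self.pre = nn.ModuleList(
            nn.Conv2d(c, c, d, padding=d // 2) for d in dilations)
        self.dil = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=d, dilation=d) for d in dilations)
        # Lightweight parameters p1, p2, p3: learnable scalars rather than a
        # fully connected attention branch.
        self.p = nn.Parameter(torch.ones(len(dilations)))

    def forward(self, xs):  # xs: list with one tensor per branch
        return [self.p[i] * F.relu(dil(pre(x)))
                for i, (pre, dil, x) in enumerate(zip(self.pre, self.dil, xs))]

class MRB(nn.Module):
    """Multiscale residual block implementing eqs. (1)-(5)."""
    def __init__(self, c=64, n_layers=3):
        super().__init__()
        self.head = nn.Conv2d(c, c, 3, padding=1)   # equal-size conv, eq. (1)
        self.layers = nn.ModuleList(MFL(c) for _ in range(n_layers))
        self.fuse = nn.Conv2d(3 * c, c, 1)          # concat + 1x1 conv, eq. (3)
        self.tail = nn.Conv2d(c, c, 3, padding=1)   # final 3x3 conv, eq. (5)

    def forward(self, x):
        fs = F.relu(self.head(x))                   # FS_n
        feats = [fs, fs, fs]                        # F1_0 = F2_0 = F3_0 = FS_n
        for layer in self.layers:
            feats = layer(feats)
        fm = self.fuse(torch.cat(feats, dim=1))     # FM_n
        return self.tail(fm + fs)                   # eq. (4), then eq. (5)
```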

Loss Function.
In this paper, we use the L1 loss defined in equation (6) to make the reconstructed super-resolution image I^{SR} approximate the real high-resolution image I^{HR}:

$$L_1 = \frac{1}{N} \sum_{i=1}^{N} \left\| I_i^{SR} - I_i^{HR} \right\|_1, \tag{6}$$

where N is the number of training image pairs.

Datasets.

The network is trained separately on a power image dataset and the DIV2K dataset. When training the network model with magnification factor 2, the image size of the high-resolution label sets is 128 × 128, and that of the low-resolution training sets is 64 × 64. When training the network model with magnification factor 4, the image size of the high-resolution label sets is 256 × 256, and that of the low-resolution training sets is 64 × 64.
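As an illustration of how such a training pair could be produced (the paper does not specify the downsampling kernel, so bicubic interpolation is our assumption):

```python
import torch
import torch.nn.functional as F

# Hypothetical preparation of one training pair for the 2x model: a 128x128
# high-resolution label patch and its 64x64 low-resolution input, obtained
# here by bicubic downsampling (our assumption for the degradation model).
hr_patch = torch.rand(1, 3, 128, 128)                 # HR label patch
lr_patch = F.interpolate(hr_patch, scale_factor=0.5,  # 64x64 LR input
                         mode='bicubic', align_corners=False)
print(lr_patch.shape)  # torch.Size([1, 3, 64, 64])
```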

Implementation Details.
In the experiment, the training network contains 8 MRBs, each containing 3 MFLs. Each MFL has 3 feature extraction branches with different receptive fields. The L1 loss is used to make the reconstructed super-resolution image approximate the real high-resolution image. The learning rate in our method is set to 1 and remains unchanged during the iterations. The parameters of the whole MRDN are trained by back propagation until the model converges. In addition, we set epoch = 2200 when training on DIV2K and epoch = 1200 when training on the power image set.
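A minimal training loop under these settings might look as follows. The Adam optimizer and the learning rate of 1e-4 are our assumptions for a runnable sketch (the paper states only that the learning rate is fixed), and train_loader is a hypothetical loader yielding low-/high-resolution pairs.

```python
import torch
import torch.nn as nn

model = MRDN(channels=64, n_blocks=8, scale=2)   # 8 MRBs, as sketched earlier
criterion = nn.L1Loss()                          # equation (6)
# Fixed learning rate, per the paper; the value 1e-4 and the Adam optimizer
# are our assumptions, not stated choices.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(2200):                    # epoch = 2200 for DIV2K
    for lr_img, hr_img in train_loader:      # hypothetical paired data loader
        optimizer.zero_grad()
        loss = criterion(model(lr_img), hr_img)
        loss.backward()                      # back propagation
        optimizer.step()
```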
In the testing process, a group of high-resolution power images and three groups of high-resolution natural images from the DIV2K dataset are selected to verify the effectiveness of the proposed method. The low-resolution images are obtained by downsampling the selected high-resolution images with scale factors of 2× and 4×. In addition, the proposed method is compared with SRCNN [23], FSRCNN [24], SRGAN [17], and RDN [16]. We use the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) to evaluate the quality of the super-resolution reconstruction results objectively. PSNR measures the mean square error between the reconstructed image and the original one: a larger PSNR means less image distortion and higher quality of the reconstructed image. SSIM measures the structural similarity between the reconstructed image and the original one: the greater the SSIM, the closer the reconstructed image is to the original.
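For reference, the two metrics can be computed as in the generic sketch below, using NumPy for PSNR and scikit-image's structural_similarity for SSIM; this is not the authors' evaluation script.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(ref, rec, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# ref and rec are 8-bit grayscale images of identical size (toy arrays here).
ref = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
rec = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
print(psnr(ref, rec))
print(structural_similarity(ref, rec))  # SSIM in [-1, 1]; 1 = identical
```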

Comparisons with State-of-the-Art Methods.
For the power image, the results of super-resolution reconstruction with scale factors of 2× and 4× using different methods are shown in Figures 3 and 4. From the perspective of visual effect, the results of FSRCNN have artifacts, and the reconstructed images are blurred. The 2× super-resolution results of SRCNN, SRGAN, and RDN have better visual effect, but their 4× results are blurred and have jagged edges. The 2× and 4× super-resolution reconstruction results of the proposed method have good visual effects. This is because the method can well integrate the feature information under different receptive fields and generate a reconstructed result with rich high-frequency information such as edge details and textures. To further compare the performance of different methods, quantitative assessments are presented in Table 1. As can be seen from these results, the proposed method achieves the highest values of PSNR and SSIM on the power image. Compared with RDN, our method improves PSNR and SSIM by 1.86 dB and 0.01, respectively, for magnification factor 2, and by 4.13 dB and 0.03, respectively, for magnification factor 4. Therefore, in terms of both subjective and objective evaluation, the multiscale fusion model proposed in this paper is superior to single-scale, single-receptive-field models such as SRCNN, FSRCNN, SRGAN, and RDN. Thus, the effectiveness of our method is verified.

In the second experiment, the "baby," "mountain," and "girl" images from DIV2K are used to further validate the effectiveness of the proposed method. The results of 2× and 4× super-resolution reconstruction of different methods are shown in Figures 5-10. In terms of visual effect, the reconstructed image of SRCNN displays jagged edges, and SRGAN produces details that are not consistent with the real texture. The results produced by FSRCNN lose more high-frequency details and have poor visual effect. The 2× results of RDN have better visual effects, but its 4× results lose details. Based on these results, we can see that our method can better fit the real high-resolution image and enhance the brightness of the reconstructed high-resolution image while recovering its texture and removing jagged edges. Table 1 shows the quantitative evaluation of these reconstructed results. From these data, it can be seen that the PSNR value of the 4× super-resolution result of the "mountain" image by RDN is slightly higher than that of our method, and our method outperforms the compared methods in all other cases.

To verify the benefit of the lightweight parameters, we also compare the proposed network with its counterpart without them; the quantitative results are reported in Table 2, and the visual results are illustrated in Figure 11. From these results, it can be seen that our method achieves an improvement in PSNR, which demonstrates the effectiveness of adding lightweight parameters to the multiscale networks. The reason is mainly that this design can further enhance the ability of nonlinear mapping at each scale, thus improving the quality of the generated high-resolution images. Although some parameters are added in the training, the scale of the parameters is greatly reduced compared with the traditional method of using a fully connected layer. More importantly, while increasing the nonlinear mapping ability, the complexity of the algorithm is not increased greatly.

Effectiveness Analysis of Lightweight Model.
In order to verify the validity of the lightweight model, we replace the lightweight parameters with the traditional channel attention mechanism (the nonlightweight method) to compare their efficiency. For fairness of comparison, both methods use the power dataset to train the model, and the low-resolution images are generated by downsampling their corresponding high-resolution versions. Table 3 shows that the proposed multiscale lightweight method improves the training efficiency by 0.5 h under the same 1200 epochs, which proves the effectiveness of the proposed algorithm. In addition, as can be seen from Table 3, the proposed lightweight model improves the objective quality compared with the traditional approach. This shows that the algorithm not only reduces the complexity of model training and improves the training efficiency but also keeps the quality of the results from degrading.

Discussion of Multiscale Selection.
In the above experiments, our method is compared with single-scale networks, which demonstrates its validity and superiority. However, regarding multiscale selection, it remains to verify whether the three-scale design is optimal. To this end, we analyze the quality of the reconstructed image and the training time of the algorithm at different numbers of scales. In this process, the power image dataset is used as the training set to train the model, and the reconstructed high-resolution results are shown in Table 4. As can be seen from these results, the reconstruction performance is best with three-scale feature learning, and the training efficiency is not greatly reduced. In contrast, under the four-scale condition, the quality of image reconstruction decreases, and the training time grows from 4.5 h to 9.5 h. This shows that feature extraction at three scales is an appropriate choice.

(Fragment of Table 2, PSNR (dB)/SSIM, corresponding to Figure 11: comparison method, SSIM 0.91 (Figure 11(a)) and 25.56/0.91 (Figure 11(c)); our method, 28.05/0.91 (Figure 11(b)) and 26.81/0.91 (Figure 11(d)).)

Conclusion
This paper proposes a new lightweight residual dense network based on multiscale analysis for single image super-resolution. The method avoids the loss of feature information that occurs when features are extracted by a single-scale network. The power image dataset and the natural scene image dataset (DIV2K) are used to train the network separately, and the power test images and the natural test images are employed to verify the effectiveness and performance of the model. The experiments demonstrate the validity of our algorithm. After analysis, the following conclusions are drawn: (1) compared with the single-scale model, the multiscale residual dense network proposed in this paper can extract feature information from different scale layers and different sizes of receptive fields, which is very conducive to the extraction of image feature information. (2) Lightweight parameters reduce the redundancy of the algorithm effectively while enhancing the nonlinear mapping of the network, and the experiments indicate that using three scales to construct the network model produces the best performance.

Data Availability
The power images used to support the findings of this study are supplied by the Electric Power Research Institute of Yunnan Power Grid Co., Ltd., under license and so cannot be made freely available. The DIV2K dataset is available at https://data.vision.ee.ethz.ch/cvl/DIV2K/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.