Single image super-resolution (SISR) is a classical image restoration problem: given a low-resolution (LR) image, the task is to recover the corresponding high-resolution (HR) image. Since the problem is ill-posed, it has been approached from many points of view, and deep learning has recently shown impressive performance on various image processing tasks, including convolutional neural network (CNN) based super-resolution. In this paper, we propose an adaptive residual channel attention network for image super-resolution. We first analyze the limitation of the standard residual connection and propose an adaptive design for more suitable feature fusion. Besides the adaptive connection, channel attention is introduced to adjust the importance distribution among channels. Combining the two, we propose a novel adaptive residual channel attention block (ARCB). We further propose a simple but effective upscale block design that generalizes across scaling factors. We build our adaptive residual channel attention network (ARCN) from the proposed ARCBs and upscale block. Experimental results show that our network not only achieves better PSNR/SSIM performance on several testing benchmarks but also recovers structural textures more effectively.
Super-resolution (SR) is an important problem in image restoration. The task of single image super-resolution (SISR) is to recover a high-resolution (HR) image from a low-resolution (LR) input. Since the problem is ill-posed, many potential HR images correspond to one and the same LR image. SISR methods have practical applications such as video quality enhancement, remote sensing image processing, and MRI analysis. To find the most plausible HR image, various methods have been proposed for SISR and other image restoration tasks.
Deep learning has shown impressive performance in various tasks [
ResNet [
Attention mechanisms are inspired by the human brain: when viewing a picture, the brain focuses on the more important areas. A number of attention methods have been proposed for image processing tasks. SENet [
In this paper, we propose a novel adaptive residual channel attention block (ARCB) for image super-resolution. Different from vanilla residual blocks, an adaptive weight is learned from paired data to combine the information of the main path and the shortcut. Considering that channels in a residual block differ in importance, channel attention is introduced into ARCB to distribute weights over channels. Besides block designs, recent works devise special upscale modules for each scaling factor; we instead introduce a simple but effective upscale block design that generalizes across factors. The adaptive residual channel attention network (ARCN) is built from ARCBs and the proposed upscale block. Experiments on several testing benchmarks show that ARCN not only achieves better PSNR/SSIM performance but also recovers complex structural textures more effectively.
The contributions of this paper are summarized as follows. We propose a novel block named ARCB with a channel attention mechanism. In ARCB, we propose an adaptive residual connection with learned weights; the weight factors find suitable ratios for combining information from different paths, while the channel attention mechanism distributes different weights over channels so that the block concentrates on the more important information. We propose a tiny but effective upscale block design; with this design, the network can be flexibly adapted to different scaling factors. Experimental results show that the proposed ARCN achieves better PSNR/SSIM results on several testing benchmarks and recovers complex structural textures better than other methods.
Let
Convolutional neural networks (CNNs) have proved to be effective tools for image restoration [
A deeper network entails a large number of parameters. Recursive designs with parameter sharing are one way to build lightweight networks, and several recursive networks have been proposed for SISR. DRCN [
Recently, several works with good performance have focused on different block designs and network pipelines [
The attention mechanism was first proposed to simulate the human brain: when viewing an image or reading a sentence, more attention is paid to the important areas. Different attention methods are used in image processing, and they fall into four kinds: item-wise soft attention, item-wise hard attention, location-wise soft attention, and location-wise hard attention. Item-wise and location-wise attention differ in their input form: item-wise attention requires a sequence of items, while location-wise attention operates on a single feature map. From another point of view, attention can be separated into soft and hard attention. Soft attention weights different areas and channels, and the weights are generated by the network after training. Hard attention, in contrast, concentrates on individual pixels; it is a stochastic prediction procedure, usually implemented with reinforcement learning.
Spatial transformer network (STN) [
In this section, we describe the proposed ARCN. The network is composed of adaptive residual channel attention blocks (ARCBs), whose adaptive factors, weighting information of different importance, are learned during training. After the adaptive residual connection, a channel attention mechanism distributes weights over channels, considering importance from another point of view. The main body of ARCN consists of several ARCBs and a padding structure, and a global skip connection is introduced for residual learning. After the main body, an effective and tiny upscale module allows the scaling factor to be changed flexibly. The remainder of this section is organized as follows: first, the network design is described in general; then the details of ARCB are discussed together with channel attention; the flexible upscale block follows; finally, we compare the design with other SISR works.
The entire network structure is shown in Figure
The architecture of our proposed adaptive residual channel attention network (ARCN).
There are three modules in the proposed ARCN. First, the feature extraction module extracts feature maps from the input LR image. Then, the nonlinear mapping module maps the feature maps from the LR space into the HR space; a skip connection is applied around this module for global residual learning. Finally, the restoration module, with a flexible upscale block, restores the HR image from the resulting feature maps.
There is one convolution layer in the feature extraction module. It extracts low-level features from the LR image and builds the feature maps. Let
After feature extraction, several ARCBs are applied in the nonlinear mapping module to map feature maps from the LR space to the HR space. Let us denote
After the K blocks, there is a padding structure composed of two convolution layers with ReLU activation. The padding structure increases the network depth and weights the information from the main path for global residual learning. The operation of the padding structure and global residual learning can be written as follows.
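A plausible form, assuming $F_0$ denotes the extracted features, $F_K$ the output of the $K$-th ARCB, and $H_{pad}(\cdot)$ the padding structure (these symbols are assumed here rather than taken from the surviving text), is

$$ F_{GR} = H_{pad}(F_K) + F_0, $$

where $F_{GR}$ denotes the features passed to the restoration module.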
Finally, an effective upscale block is applied in the restoration module, where the final HR image
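To make the pipeline concrete, the following PyTorch sketch wires the three modules together. Here `make_block` and `make_upscale` stand in for the ARCB and the flexible upscale block sketched in the following subsections, and the block count `K`, the channel width, and the conv-ReLU-conv layout of the padding structure are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ARCN(nn.Module):
    """Sketch of the three-module pipeline: feature extraction,
    nonlinear mapping with a global skip, and restoration."""

    def __init__(self, make_block, make_upscale, channels=64, K=8, scale=4):
        super().__init__()
        # Feature extraction: a single convolution layer.
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        # Nonlinear mapping: K ARCBs followed by the padding structure
        # (assumed here to be conv-ReLU-conv).
        self.blocks = nn.Sequential(*[make_block(channels) for _ in range(K)])
        self.pad_struct = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Restoration: the flexible upscale block.
        self.tail = make_upscale(channels, scale)

    def forward(self, lr):
        f0 = self.head(lr)
        f = self.pad_struct(self.blocks(f0)) + f0  # global residual learning
        return self.tail(f)
```

With the `ARCB` and `FlexibleUpscale` sketches below, `ARCN(ARCB, FlexibleUpscale)` would assemble the whole model.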
ARCB introduces an adaptive residual connection and channel attention into the network. An illustration of the proposed ARCB is shown in Figure
Structure of different blocks. (a) Res blocks. (b) Proposed ARCB.
There are two convolution layers with ReLU activation in ARCB. Different from the ResBlock used in most SISR works, a channel attention layer is placed after the convolution layers to weight the information from different channels. After that, a learned adaptive factor
Let
The adaptive factor
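A minimal PyTorch sketch of ARCB follows, assuming 3 × 3 convolutions and a single learnable scalar applied to the shortcut; the truncated equations above may place the adaptive factor differently.

```python
import torch
import torch.nn as nn

class ARCB(nn.Module):
    """Sketch of the adaptive residual channel attention block:
    conv-ReLU-conv, channel attention, and an adaptively weighted
    shortcut."""

    def __init__(self, channels=64, attention=None):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Channel attention (sketched in the next subsection);
        # falls back to identity if none is supplied.
        self.ca = attention if attention is not None else nn.Identity()
        # Adaptive factor: a learnable scalar (assumed initialization).
        self.lam = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.ca(self.body(x)) + self.lam * x
```

Initializing the factor to 1 lets the block start as a vanilla residual block and drift toward a learned mixing ratio during training.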
Another main difference between the vanilla ResBlock and ARCB is channel attention. Convolution layers treat information from all channels equally; to concentrate on the more important channels, channel attention is introduced into ARCB. The structure of channel attention is shown in Figure
Structure of channel attention.
From Figure
Let us denote by
There is a Sigmoid activation between the excitation and the multiplication. On one hand, it contributes nonlinearity; on the other hand, it constrains the weights to (0, 1), keeping them nonnegative. Since there are no negative responses in the human visual system, this design fits the biological process.
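A squeeze-and-excitation style sketch of this layer is given below; the reduction ratio `reduction=16` is an assumed hyperparameter, not taken from the paper.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of channel attention: global average pooling (squeeze),
    a two-layer excitation, and a Sigmoid producing per-channel
    weights in (0, 1), i.e., nonnegative as discussed above."""

    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: one value per channel
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.excite(self.pool(x))  # rescale each channel
```

Plugged into the previous sketch, `ARCB(64, attention=ChannelAttention(64))` instantiates the full block.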
An upscale block is widely used in SISR works to increase the resolution of the feature maps and restore the final HR image. Existing designs differ from one scaling factor to another without a unified pattern. In this paper, we propose a flexible upscale block design pattern: with it, the structure can be easily modified for different scaling factors. The structure of our proposed flexible upscale block is shown in Figure
Structure of proposed flexible upscale block with scaling factor ×4.
As shown in Figure
There are two benefits of the flexible upscale block design. On one hand, there is only one convolution layer in the block, which saves parameters and decreases the computational complexity. On the other hand, when the scaling factor changes, the only modification to the block is the channel number of the convolution layer; after the change, the main body of the network can be fine-tuned for the new factor within a few iterations.
There is one main difference between the proposed block and others. In other designs, a convolution layer follows the last pixel-shuffle or deconvolution layer, usually to restore the 3-channel HR image from the feature maps. In our proposed block, however, the restoration is performed by the single convolution layer itself. This mirrors the feature extraction module, which is also composed of only one convolution layer.
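Under this reading, the single convolution produces 3·s² channels that pixel-shuffle rearranges into the 3-channel HR image (this channel layout is inferred from the parameter counts reported in the experiments). A PyTorch sketch:

```python
import torch.nn as nn

class FlexibleUpscale(nn.Module):
    """Sketch of the flexible upscale block: one convolution to
    3 * scale**2 channels, then pixel-shuffle to the 3-channel HR
    image. Changing the scaling factor only changes the conv's
    output channel count."""

    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.conv = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))
```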
To introduce the design in detail, we give examples for different scaling factors. The specific configurations are shown in Table
Upscale block configurations for different scaling factors.
Scale | Layers | Input channels | Output channels | Kernel size |
---|---|---|---|---|
×2 | Conv | 64 | 3 × 2 × 2 | 3 × 3 |
 | PS | 3 × 2 × 2 | 3 | — |
×3 | Conv | 64 | 3 × 3 × 3 | 3 × 3 |
 | PS | 3 × 3 × 3 | 3 | — |
×4 | Conv | 64 | 3 × 4 × 4 | 3 × 3 |
 | PS | 3 × 4 × 4 | 3 | — |
×8 | Conv | 64 | 3 × 8 × 8 | 3 × 3 |
 | PS | 3 × 8 × 8 | 3 | — |
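As a sanity check, the parameter counts of the block follow directly from this configuration. The short computation below (a sketch, assuming the 3-channel output layout above) reproduces the "Proposed" row of the parameter comparison table in the experiments.

```python
def upscale_params(scale, in_ch=64, k=3, out_ch=3):
    """Parameters of the proposed upscale block: one k x k convolution
    from in_ch channels to out_ch * scale**2 channels (weights plus
    biases); the pixel-shuffle layer itself is parameter-free."""
    c_out = out_ch * scale * scale
    return in_ch * c_out * k * k + c_out

for s in (2, 3, 4, 8):
    print(s, upscale_params(s))
# 2 -> 6924, 3 -> 15579, 4 -> 27696, 8 -> 110784
```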
Difference from DRDN [

Motivation for global and local residual learning: in ARCN, global and local residual learning strategies are applied jointly for feature exploration. The residual connections effectively alleviate the gradient vanishing problem, making it possible to train a deeper network. The local residual connection in ARCB ensures the gradient flow, while the global residual connection in ARCN guarantees identical information transmission, which improves the network capacity and representation.
In the proposed ARCN, there are
Our ARCN is trained on DIV2K [
Five testing benchmarks are used to evaluate the network performance. They are Set5 [
We compare our ARCN with some SISR works: SRCNN [
Quantitative PSNR/SSIM comparison for scaling factors ×2, ×3, and ×4, on testing benchmarks Set5, Set14, B100, Urban100, and Manga109. Our performance is shown in bold.
Scale | Method | Set5 [ | Set14 [ | B100 [ | Urban100 [ | Manga109 [ |
---|---|---|---|---|---|---|
×2 | Bicubic | 33.66/0.9299 | 30.24/0.8688 | 29.56/0.8431 | 26.88/0.8403 | 30.80/0.9339 |
SRCNN [ | 36.66/0.9542 | 32.45/0.9067 | 31.36/0.8879 | 29.50/0.8946 | 35.60/0.9750 | |
VDSR [ | 37.53/0.9587 | 33.03/0.9124 | 31.90/0.8960 | 30.76/0.9140 | 37.22/0.9750 | |
FSRCNN [ | 37.00/0.9558 | 32.63/0.9088 | 31.53/0.8920 | 29.88/0.9020 | 36.67/0.9710 | |
DRCN [ | 37.63/0.9588 | 33.04/0.9118 | 31.85/0.8942 | 30.75/0.9133 | 37.55/0.9732 | |
LapSRN [ | 37.52/0.9591 | 32.99/0.9124 | 31.80/0.8952 | 30.41/0.9103 | 37.27/0.9740 | |
DRRN [ | 37.74/0.9591 | 33.23/0.9136 | 32.05/0.8973 | 31.23/0.9188 | 37.88/0.9749 |
MemNet [ | 37.78/0.9597 | 33.28/0.9142 | 32.08/0.8978 | 31.31/0.9195 | 37.72/0.9740 |
IDN [ | 37.83/0.9600 | 33.30/0.9148 | 32.08/0.8985 | 31.27/0.9196 | 38.01/0.9740 | |
EDSR (B) [ | 37.99/0.9604 | 33.57/0.9175 | 32.16/0.8994 | 31.92/0.9272 | 38.54/0.9749 | |
SRMDNF [ | 37.79/0.9601 | 33.32/0.9159 | 32.05/0.8985 | 31.33/0.9204 | 38.07/0.9769 | |
CARN [ | 37.72/0.9590 | 33.52/0.9166 | 32.09/0.8978 | 31.92/0.9256 | 38.36/0.9761 |
Ms-LapSRN [ | 37.62/0.9600 | 33.13/0.9130 | 31.93/0.8970 | 30.82/0.9150 | 37.38/0.9765 |
×3 | Bicubic | 30.39/0.8682 | 27.55/0.7742 | 27.21/0.7385 | 24.46/0.7349 | 26.95/0.8556 |
SRCNN [ | 32.75/0.9090 | 29.30/0.8215 | 28.41/0.7863 | 26.24/0.7989 | 30.48/0.9117 | |
FSRCNN [ | 33.18/0.9140 | 29.37/0.8240 | 28.53/0.7910 | 26.43/0.8080 | 31.10/0.9210 | |
VDSR [ | 33.66/0.9213 | 29.77/0.8314 | 28.82/0.7976 | 27.14/0.8279 | 32.01/0.9340 | |
DRCN [ | 33.82/0.9226 | 29.96/0.8311 | 28.80/0.7963 | 27.53/0.8276 | 32.66/0.9343 | |
LapSRN [ | 33.81/0.9220 | 29.79/0.8325 | 28.82/0.7980 | 27.07/0.8275 | 32.21/0.9350 | |
DRRN [ | 34.03/0.9244 | 29.99/0.8349 | 28.95/0.8004 | 27.53/0.8378 | 32.71/0.9379 | |
MemNet [ | 34.09/0.9248 | 30.00/0.8350 | 28.96/0.8001 | 27.56/0.8376 | 32.51/0.9369 | |
IDN [ | 34.11/0.9253 | 29.99/0.8354 | 28.95/0.8013 | 27.42/0.8359 | 32.71/0.9381 | |
EDSR (B) [ | 34.37/0.9270 | 30.28/0.8417 | 29.09/0.8052 | 28.15/0.8527 | 33.45/0.9439 | |
SRMDNF [ | 34.12/0.9254 | 30.04/0.8382 | 28.97/0.8025 | 27.57/0.8398 | 33.00/0.9403 |
CARN [ | 34.29/0.9255 | 30.29/0.8407 | 29.06/0.8034 | 28.06/0.8493 | 33.50/0.9440 |
Ms-LapSRN [ | 33.88/0.9230 | 29.89/0.8340 | 28.87/0.8000 | 27.23/0.8310 | 32.28/0.9360 | |
×4 | Bicubic | 28.42/0.8104 | 26.00/0.7027 | 25.96/0.6675 | 23.14/0.6577 | 24.89/0.7866 |
SRCNN [ | 30.48/0.8628 | 27.50/0.7513 | 26.90/0.7101 | 24.52/0.7221 | 27.58/0.8555 | |
FSRCNN [ | 30.72/0.8660 | 27.61/0.7550 | 26.98/0.7150 | 24.62/0.7280 | 27.90/0.8610 | |
VDSR [ | 31.35/0.8838 | 28.01/0.7674 | 27.29/0.7251 | 25.18/0.7524 | 28.83/0.8870 | |
DRCN [ | 31.53/0.8854 | 28.02/0.7670 | 27.23/0.7233 | 25.14/0.7510 | 28.93/0.8854 | |
LapSRN [ | 31.54/0.8842 | 28.09/0.7700 | 27.32/0.7275 | 25.21/0.7562 | 29.09/0.8900 | |
DRRN [ | 31.68/0.8888 | 28.21/0.7720 | 27.38/0.7284 | 25.44/0.7638 | 29.45/0.8946 | |
MemNet [ | 31.74/0.8893 | 28.26/0.7723 | 27.40/0.7281 | 25.50/0.7630 | 29.42/0.8942 |
IDN [ | 31.82/0.8903 | 28.25/0.7730 | 27.41/0.7297 | 25.41/0.7632 | 29.41/0.8942 | |
EDSR (B) [ | 32.09/0.8938 | 28.58/0.7813 | 27.57/0.7357 | 26.04/0.7849 | 30.35/0.9067 | |
SRMDNF [ | 31.96/0.8925 | 28.35/0.7787 | 27.49/0.7337 | 25.68/0.7731 | 30.09/0.9024 | |
CARN [ | 32.13/0.8937 | 28.60/0.7806 | 27.58/0.7349 | 26.07/0.7837 | 30.47/0.9084 | |
Ms-LapSRN [ | 31.62/0.8870 | 28.16/0.7720 | 27.36/0.7290 | 25.32/0.7600 | 29.18/0.8920 | |
From Table
Quantitative comparison on parameters and performance for scaling factor ×4. Our results are shown in bold.
Model | Param | Set5 | Set14 | B100 | Urban100 | Manga109 |
---|---|---|---|---|---|---|
CARN [ | 1.592 M | 32.13/0.8937 | 28.60/0.7806 | 27.58/0.7349 | 26.07/0.7837 | 30.47/0.9084 |
EDSR (B) [ | 1.518 M | 32.09/0.8938 | 28.58/0.7813 | 27.57/0.7357 | 26.04/0.7849 | 30.35/0.9067 |
SRMDNF [ | 1.552 M | 31.96/0.8925 | 28.35/0.7787 | 27.49/0.7337 | 25.68/0.7731 | 30.09/0.9024 |
DRCN [ | 1.774 M | 31.53/0.8854 | 28.02/0.7670 | 27.23/0.7233 | 25.14/0.7510 | 28.93/0.8854 |
Visualization comparisons are shown in Figure
Visual quality comparisons of different methods under BI ×4 degradation. The images are chosen from the Urban100 dataset. ARCN recovers the structural information more effectively.
Illustration of different features from ARCB. (a) Processed feature. (b) Shortcut.
Illustration of learned
Quantitative PSNR/SSIM comparison on adaptive weights for scaling factor ×4. Our results are shown in bold.
Weight | Set5 | Set14 | B100 | Urban100 | Manga109 |
---|---|---|---|---|---|
w/o | |||||
w/ | 32.09/0.8931 | 28.48/0.7794 | 27.50/0.7337 | 25.88/0.7801 | 30.20/0.9044 |
Comparisons on parameters of two upscale blocks with different scaling factors.
Scaling factor | ×2 | ×3 | ×4 | ×5 | ×6 | ×7 | ×8 |
---|---|---|---|---|---|---|---|
Cascading | 149443 | 334083 | 297155 | 924931 | 481795 | 1811203 | 444867 |
Proposed | 6924 | 15579 | 27696 | 43275 | 62316 | 84819 | 110784
Study on parameters: by design, the proposed flexible upscale block saves parameters. To compare parameters and performance, we test the model on five benchmarks. The quantitative results are shown in Table

Study on adaptive factors: to demonstrate the effect of the adaptive factors, we illustrate the learned features from the two different parts of ARCB. As shown in Figure

Furthermore, to demonstrate the learned weights of different ARCBs, we show

Study on upscale block: in this paper, we propose an efficient upscale block for arbitrary upscaling factors. The vanilla cascading design and the proposed efficient one achieve competitive performance; however, the proposed block holds a much smaller number of parameters. A comparison of the two upscale blocks with different scaling factors is shown in Table
In this paper, we proposed a novel adaptive residual channel attention network named ARCN for the single image super-resolution (SISR) problem. In the proposed ARCN, an adaptive residual channel attention block (ARCB) was designed for better performance. The mixture factors in ARCB were learned during training, weighting the information from the two paths of each block adaptively. A channel attention mechanism was introduced into ARCB to distribute importance among channels. Besides ARCB, a tiny but flexible upscale block design was proposed for different scaling factors. Experimental results showed that the proposed ARCN could not only achieve competitive or better performance with fewer parameters than other lightweight works but also recover complex structural textures more effectively.
In the future, more reference-free perceptual assessments will be performed to demonstrate the network performance. Furthermore, more experiments will be conducted on real-world datasets.
The image and quantitative comparison data used to support the findings of this study are included within the article.
The authors declare that they have no conflicts of interest.
This paper was supported by the Science Foundation of Shenyang University of Chemical Technology (LQ2020020).