Infrared Image Deblurring Based on Generative Adversarial Networks

Blind deblurring of a single infrared image is a challenging computer vision problem: the blur may be caused not only by the motion of different objects but also by the relative motion and jitter of the camera, and the scene depth varies. In this work, a method based on a generative adversarial network (GAN) and channel prior discrimination is proposed for infrared image deblurring. Unlike previous work, we combine traditional blind deblurring with learning-based blind deblurring, and we consider both uniformly and nonuniformly blurred images. Experiments on different datasets show that the proposed method achieves competitive deblurring quality, both objectively and subjectively.


Introduction
The main cause of motion blur is rapid relative motion between the camera and the captured object during the exposure time. Blur reduces the perceptual quality of images for human viewers, and it also harms high-level vision tasks such as object detection and semantic understanding. Image deblurring is therefore a common and important problem in image processing and computer vision. However, because motion blur can be complex, most existing methods may not produce satisfactory results when the blur kernel is complicated and the desired sharp image is rich in detail. In addition, because an infrared (IR) imaging system is more complex than a conventional visible-light imaging system, infrared images tend to suffer more severe degradation, such as Gaussian blur, motion blur, and noise. Infrared image deblurring therefore plays an important role in IR imaging systems. Some researchers have focused on hardware-based approaches to infrared image deblurring. In [1], a fluttered shutter is used to deblur infrared images. The work in [2] uses an ordinary inertial measurement unit (IMU) to estimate the trajectory of the camera during the exposure time. Oswald-Tranta et al. [3] used a parameterized Wiener filter to deblur infrared images acquired by a microbolometer detector. Oswald-Tranta also worked on obtaining accurate temperature measurements by deblurring infrared images [4]. Wang et al. [5] used an iterative Wiener filter to estimate the point spread function (PSF) of motion blur in infrared images. Because hardware-based deblurring is more expensive, algorithm-based deblurring of infrared images is more widely used. Luo et al. [6] developed an infrared blurred image restoration model based on the principle of nonuniform exposure.
To remove motion blur and restore the image, Jing et al. [7] proposed an infrared target motion deblurring method based on the Haar wavelet transform. Liua et al. [8] proposed deblurring infrared images using an Lp quasinorm and overlapping group sparse total variation.
Inspired by recent progress in both traditional and learning-based blind deblurring, we propose a method based on a GAN and channel prior discrimination. The contributions of this article are as follows: (i) A channel-based prior discrimination is proposed and built into a new GAN framework, which improves the blind deblurring performance on infrared images. (ii) Because different blur types arise from the motion of the camera or of objects, two different methods are used to synthesize two kinds of blurred datasets. (iii) Extensive experiments are carried out on two different datasets, and the proposed method is compared qualitatively and quantitatively with four other state-of-the-art methods.

Image Deblurring.
Solutions to the deblurring problem fall into two main types: blind deblurring and nonblind deblurring. Early work was mainly nonblind, that is, the blur function is assumed to be known. Most of these algorithms rely on the Lucy-Richardson algorithm or on Wiener or Tikhonov filtering, which are sensitive to noise, to perform deconvolution and obtain an estimate of the sharp image I_S. In practice, however, the blur function is usually unknown, and it is unrealistic to determine it for every pixel. Much recent work has therefore focused on blind deblurring. The first modern attempt was the variational Bayesian method of Fergus et al. [9] for removing uniform camera shake. Over the past decade, many methods [10][11][12][13][14][15][16][17][18][19][20] have addressed camera-shake blur by assuming that the blur is uniform across the image. These algorithms first estimate the camera motion in the form of a blur kernel and then invert its effect by deconvolution. Unfortunately, they usually cannot remove nonuniform motion blur.
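As a concrete picture of the nonblind setting, the sketch below applies a Wiener filter when the blur kernel is known. It is a minimal NumPy illustration, assuming circular (periodic) convolution and a constant noise-to-signal ratio `nsr`; the function names `psf_to_otf` and `wiener_deconvolve` are hypothetical, and this is not part of the proposed method.

```python
import numpy as np


def psf_to_otf(kernel, shape):
    """Zero-pad a small PSF to `shape`, centre it at (0, 0), and take its FFT."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def wiener_deconvolve(blurred, kernel, nsr=1e-3):
    """Non-blind Wiener deconvolution, assuming circular blur and a constant NSR."""
    H = psf_to_otf(kernel, blurred.shape)
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)  # Wiener filter in the frequency domain
    return np.real(np.fft.ifft2(G * np.fft.fft2(blurred)))


rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
kernel = np.full((1, 9), 1.0 / 9.0)  # 9-pixel horizontal motion blur
H = psf_to_otf(kernel, sharp.shape)
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * H))  # noise-free circular blur
restored = wiener_deconvolve(blurred, kernel, nsr=1e-4)
```

The `nsr` term keeps the filter bounded where |H| is small; this regularization is also why such filters trade off noise amplification against sharpness.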
In practice, because of camera rotation, radial camera motion, depth variation, or fast object motion, images captured in the wild may contain more complex, spatially varying blur. Most existing nonuniform blind deblurring methods [21][22][23][24][25][26] are therefore based on specific motion models. For example, Gupta et al. [27] proposed modeling camera motion as a motion density function, from which the spatially varying blur kernels can be derived directly. By imposing sparsity and compactness priors on the density, they formulate an optimization problem in which the density function and the deblurred image are solved iteratively. A projective motion path model is proposed in [28,29]. Another way to handle spatially varying blur is to estimate blur kernels block by block [30][31][32]. Segmentation-based blur estimation [24,33] also considers the spatially varying blur caused by object motion.
This method learned to predict the complex Fourier coefficients of a deconvolution filter for patches of the blurred input image and then used a conventional optimization strategy to estimate a global blur kernel from the restored patches. Gong et al. [34] used a fully convolutional network to estimate the motion flow. All these methods use CNNs to estimate unknown blur functions. More recently, Noroozi et al. [23] and Nah et al. [44] adopted kernel-free end-to-end approaches, using multiscale CNNs to deblur images directly. The recent work of Tao et al. [42] extends the multiscale CNN of [37] to a scale-recurrent CNN for image deblurring, with impressive results. Ramakrishnan et al. [38] combined the pix2pix framework [45] with a densely connected convolutional network [46] to perform blind, kernel-free image deblurring. These methods can handle blur from different sources. The success of GANs in image restoration has also influenced single-image deblurring: Ramakrishnan et al. [38] were the first to address image deblurring by drawing on the idea of image-to-image translation [45]. More recently, Kupyn et al. [36] introduced DeblurGAN, which is built on a Wasserstein GAN [47] with gradient penalty and a perceptual loss.

GAN.
The generative adversarial network (GAN) was proposed by Goodfellow [48], inspired by the zero-sum game in game theory. GANs have achieved many exciting results in image restoration [49], and beyond image style transfer [45,50,51] they have also been applied in other fields. The system consists of a generator G and a discriminator D, which play a two-player minimax game. The generator tries to capture the underlying real data distribution and outputs new data samples, while the discriminator tries to determine whether its input comes from the real data distribution. The minimax game with value function V(G, D) is given by formula (1):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big] \quad (1)$$

where p_data(x) is the real data distribution, p_g is the model distribution of the generated samples G(z), and the input z is a sample from a simple noise distribution p_z(z). Both the generator and the discriminator can be built from CNNs and trained according to this objective.
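The value function in formula (1) can be estimated from a batch of discriminator outputs by replacing each expectation with a sample mean. The minimal NumPy sketch below does this; `gan_value` is a hypothetical helper, and the small `eps` that guards against log(0) is an implementation choice rather than part of the formula.

```python
import numpy as np


def gan_value(d_real, d_fake, eps=1e-12):
    """Sample-mean estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples, in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, in (0, 1).
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))


# A confident, correct discriminator pushes V towards its upper bound of 0 ...
strong_d = gan_value([0.99, 0.98], [0.01, 0.02])
# ... while D = 1/2 everywhere (generator matches the data) gives V = -log 4.
balanced = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator is trained to increase this quantity and the generator to decrease it; when p_g equals p_data, the optimal discriminator outputs 1/2 everywhere, which is the `balanced` case.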
GANs are known for their ability to preserve texture details in images and to produce solutions that are close to real images and perceptually convincing. The work in [51] builds on the conditional GAN [52] and trains with a cycle-consistency objective, which produces more realistic images in image-to-image translation tasks. Inspired by this idea, Isola et al. [45] put forward the earliest GAN-based idea of image deblurring. Recently, applying GANs has led to great progress in related fields such as image super-resolution [53] and image restoration [54].

International Journal of Optics

Dark Channel Prior Algorithm.
He et al. [55] proposed a dehazing algorithm based on the dark channel prior (DCP). The DCP assumes that most non-sky patches of outdoor haze-free images contain some pixels with very low intensity in at least one color channel. For any image I, its dark channel I^dark(x) is given by

$$I^{\text{dark}}(x) = \min_{y \in \Omega(x)} \Big( \min_{c \in \{r, g, b\}} I^{c}(y) \Big) \quad (2)$$

in which Ω(x) is a local patch centered on x and I^c is the c-th color channel of I. The bright channel proposed in a similar work [56] is based on the assumption that most blurred image patches contain some pixels with very high intensity in at least one color channel. For any image I, its bright channel I^bright(x) is

$$I^{\text{bright}}(x) = \max_{y \in \Omega(x)} \Big( \max_{c \in \{r, g, b\}} I^{c}(y) \Big) \quad (3)$$

Many methods use dark and bright channels for image dehazing [55,56], and they have also been used to estimate the blur kernel in conventional blind image deblurring [15,57]. In [15], Pan et al. added an L0-based regularization term on the dark channel image to improve the gradient-based L0-minimization blind deblurring method [11]. In [57], Yan et al. further combined L0-based regularization on both the dark and bright channel images.
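Formulas (2) and (3) amount to a per-pixel minimum (or maximum) over channels followed by a minimum (or maximum) filter over Ω(x). The NumPy sketch below assumes a square patch with an odd side length and edge padding at the image border, both assumptions; for a single-channel infrared image, the channel reduction is simply skipped. The 15 × 15 default is a common choice in the dehazing literature, not a value taken from this paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _per_pixel(image, reduce):
    """Reduce over colour channels; a 2D (single-channel) image is returned as is."""
    image = np.asarray(image, dtype=float)
    return reduce(image, axis=2) if image.ndim == 3 else image


def _patch_filter(channel, patch, reduce):
    """Min/max over the odd-sized patch Omega(x) centred on every pixel (edge padding)."""
    r = patch // 2
    windows = sliding_window_view(np.pad(channel, r, mode="edge"), (patch, patch))
    return reduce(windows, axis=(2, 3))


def dark_channel(image, patch=15):
    """Formula (2): min over channels, then min over the patch."""
    return _patch_filter(_per_pixel(image, np.min), patch, np.min)


def bright_channel(image, patch=15):
    """Formula (3): max over channels, then max over the patch."""
    return _patch_filter(_per_pixel(image, np.max), patch, np.max)


img = np.zeros((5, 5, 3))
img[2, 2] = [0.2, 0.9, 0.5]  # a single bright pixel on a black background
dark = dark_channel(img, patch=3)
bright = bright_channel(img, patch=3)
```

The single bright pixel leaves the dark channel at zero everywhere but spreads into a 3 × 3 block of the bright channel, which is the patch-level behavior the two priors rely on.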

Method
In this work, the goal of the infrared image deblurring model is to restore a sharp image given only the blurred infrared image. Following the architecture proposed in [51], two GAN models are built. The generators are G_B2S: I_B → I_S and G_S2B: I_S → I_B. G_B2S restores sharp images from blurred images, while G_S2B generates blurred images from sharp images.
The discriminators are D_B and D_S: D_B tries to determine whether its input is a real blurred image, while D_S tries to determine whether its input is a real sharp image. The architecture of the proposed method is shown in Figure 1. The inputs are a blurred image and a sharp image. The sharp image is fed to the generator G_S2B to generate a corresponding blurred image, which is then fed to the generator G_B2S to generate a deblurred image. This deblurred image and the real sharp image are sent to the discriminator D_S, which distinguishes real from fake. Likewise, the real blurred image is fed to G_B2S to generate a deblurred image, which is then fed to G_S2B to synthesize a blurred image; the synthesized and real blurred images are sent to D_B to judge their authenticity. Through repeated iterations, the generators learn to produce increasingly realistic deblurred images. The algorithm is summarized in Algorithm 1.
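The two cycles described above can be summarized as a forward pass that pairs each generated image with the real image its discriminator sees. The sketch below is a framework-agnostic outline with hypothetical names (`forward_cycles`, `discriminator_inputs`, `CycleOutputs`); the generators are arbitrary callables, and the losses and optimizer steps of Algorithm 1 are omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable

Image = Any  # e.g. a NumPy array or a framework tensor


@dataclass
class CycleOutputs:
    fake_blurred: Image   # G_S2B(I_S)
    cycle_sharp: Image    # G_B2S(G_S2B(I_S)), judged by D_S
    deblurred: Image      # G_B2S(I_B), the deblurring output
    cycle_blurred: Image  # G_S2B(G_B2S(I_B)), judged by D_B


def forward_cycles(g_b2s: Callable[[Image], Image],
                   g_s2b: Callable[[Image], Image],
                   sharp: Image, blurred: Image) -> CycleOutputs:
    """One forward pass through both cycles (losses and updates omitted)."""
    fake_blurred = g_s2b(sharp)          # sharp -> synthetic blurred
    cycle_sharp = g_b2s(fake_blurred)    # synthetic blurred -> deblurred
    deblurred = g_b2s(blurred)           # real blurred -> deblurred
    cycle_blurred = g_s2b(deblurred)     # deblurred -> re-blurred
    return CycleOutputs(fake_blurred, cycle_sharp, deblurred, cycle_blurred)


def discriminator_inputs(out: CycleOutputs, sharp: Image, blurred: Image) -> dict:
    """(real, generated) pairs fed to each discriminator."""
    return {"D_S": (sharp, out.cycle_sharp), "D_B": (blurred, out.cycle_blurred)}


# Toy generators on numbers: "blurring" adds 1 and "deblurring" subtracts 1.
out = forward_cycles(lambda x: x - 1, lambda x: x + 1, sharp=10, blurred=20)
pairs = discriminator_inputs(out, sharp=10, blurred=20)
```

With generators that invert each other exactly, both cycles return their inputs, so each discriminator receives identical real and generated images.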

Model Architecture.
The proposed method includes two GANs, that is, two generator-discriminator pairs. The architecture of one pair is shown in Figure 2; it consists of two deep convolutional neural network (DCNN) modules. The generator is similar to that proposed by Johnson et al. [50]: it contains two strided convolution blocks (stride 2), nine residual blocks (ResBlocks), and two transposed convolution blocks (fractional stride 1/2). An instance normalization (IN) layer follows the convolution layer in every convolution module except the ResBlocks. The discriminator has the same network structure as that in [45]: it contains five convolution modules, and every convolution layer except the one in the last module is followed by an IN layer and a LeakyReLU layer.
Batch normalization (BN) layers normalize features with the mean and variance of the current batch during training and with the mean and variance estimated over the whole training set during testing, whereas IN layers normalize each sample with its own statistics. One motivation for applying BN or IN is to accelerate the training of deep neural networks (DNNs). However, recent work [58] on single-image super-resolution points out that BN layers introduce artifacts in both training and testing, and that these artifacts become more likely as the network deepens and when it is trained in a GAN framework. In blind deblurring, we empirically observed that IN layers introduce similar artifacts, namely irregular blockwise color shifts. Therefore, no IN or BN layer is used in the residual blocks, as shown in Figure 3. The network configurations of the generator and discriminator are listed in Tables 1 and 2.
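To see how this generator outline preserves the spatial size, the sketch below traces one spatial dimension through it using the standard output-size formulas for strided and transposed convolutions (the formulas PyTorch documents for `Conv2d` and `ConvTranspose2d` with dilation 1). The kernel size 3, padding 1, and output padding 1 are assumptions in the style of CycleGAN-like generators; the actual settings are those in Tables 1 and 2.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a standard (strided) convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_out(size, kernel, stride, padding, output_padding=0):
    """Output length of a transposed (fractionally strided) convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def generator_spatial_sizes(size=256):
    """Spatial size after each block of the assumed generator outline."""
    sizes = [size]
    for _ in range(2):   # two stride-2 downsampling blocks
        sizes.append(conv_out(sizes[-1], kernel=3, stride=2, padding=1))
    for _ in range(9):   # nine residual blocks (3x3, stride 1, padding 1) keep the size
        sizes.append(conv_out(sizes[-1], kernel=3, stride=1, padding=1))
    for _ in range(2):   # two transposed-convolution upsampling blocks
        sizes.append(transposed_conv_out(sizes[-1], 3, 2, 1, output_padding=1))
    return sizes


sizes = generator_spatial_sizes(256)
```

Under these assumptions a 256 × 256 input shrinks to 64 × 64 for the residual blocks and is restored to 256 × 256, so the deblurred output matches the input size.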

Adversarial Loss.
The adversarial loss comprises a generator adversarial loss and a discriminator adversarial loss. The generator adversarial loss has two terms: the first is the adversarial loss between the reconstructed blurred image I_B and the discriminator D_B, and the second is the adversarial loss between the reconstructed sharp image I_S and the discriminator D_S. Because the least-squares loss has been reported to work better than the original GAN loss in image style transfer tasks, the discriminators use a least-squares adversarial loss. Its first term is the misidentification loss of the discriminator D_B, and its second term is the misidentification loss of the discriminator D_S.
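The exact loss formulas are not reproduced here, so the sketch below shows the standard least-squares GAN form as an assumption: each discriminator pushes its outputs on real images towards 1 and on generated images towards 0, while the generators push the discriminators' outputs on generated images towards 1. The 0/1 targets and the factor 1/2 are conventional choices, not values confirmed by this paper, and the function names are hypothetical.

```python
import numpy as np


def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares loss for one discriminator (real target 1, fake target 0)."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2))


def lsgan_generator_loss(d_fake):
    """Least-squares generator loss: push D(generated) towards the real target 1."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean((d_fake - 1.0) ** 2))


def total_generator_loss(db_fake, ds_fake):
    """D_B term (reconstructed blurred images) plus D_S term (reconstructed sharp images)."""
    return lsgan_generator_loss(db_fake) + lsgan_generator_loss(ds_fake)


def total_discriminator_loss(db_real, db_fake, ds_real, ds_fake):
    """Sum of the D_B and D_S least-squares terms."""
    return lsgan_discriminator_loss(db_real, db_fake) + lsgan_discriminator_loss(ds_real, ds_fake)
```

A squared penalty keeps gradients informative for generated samples that lie far from the decision boundary, which is the usual argument for preferring it to the log-loss.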

Cycle Perceptual Consistency Loss.
In a typical GAN, the reconstructed image is compared with the original image during training using some metric as a content loss. A common choice is a pixel-space loss, the simplest being the L1 or L2 loss. However, such losses tend to produce overly smooth pixel-space outputs, which leads to blurring artifacts in the generated image.
This is detrimental to the deblurring task, so the cycle perceptual consistency loss suggested in [58] is adopted. It aims to preserve the original image structure by comparing a combination of high-level and low-level features extracted from the second and fifth pooling layers of VGG-16 [59]. Under the constraints of the generators G_B2S: I_B → I_S and G_S2B: I_S → I_B, two cycle perceptual consistency terms are defined: L_cycle_perceptual1 is the cycle perceptual consistency loss of the generator G_B2S, and L_cycle_perceptual2 is that of the generator G_S2B. The goal is to make the reconstructed image as close as possible to the input image. Here φ_{i,j} is the feature map obtained by the j-th convolutional layer before the i-th max-pooling layer of VGG-16, and W_{i,j} and H_{i,j} are the width and height of the corresponding feature maps.
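The perceptual distance at a given VGG-16 layer can be sketched as a squared feature difference normalized by W_{i,j} H_{i,j}. In the NumPy sketch below, `phi` is a hypothetical stand-in for the VGG-16 feature extractor, and the assignment of the two cycles (blurred → sharp → blurred and sharp → blurred → sharp) to L_cycle_perceptual1 and L_cycle_perceptual2 is an assumption.

```python
import numpy as np


def perceptual_distance(feat_a, feat_b):
    """Squared feature difference summed over positions and channels, divided by W*H."""
    feat_a = np.asarray(feat_a, dtype=float)
    feat_b = np.asarray(feat_b, dtype=float)
    _, h, w = feat_a.shape  # feature maps are (channels, H_ij, W_ij)
    return float(np.sum((feat_a - feat_b) ** 2) / (w * h))


def cycle_perceptual_losses(phi, g_b2s, g_s2b, blurred, sharp):
    """Perceptual distance between each input and its cycle reconstruction."""
    blur_cycle = perceptual_distance(phi(g_s2b(g_b2s(blurred))), phi(blurred))
    sharp_cycle = perceptual_distance(phi(g_b2s(g_s2b(sharp))), phi(sharp))
    return blur_cycle, sharp_cycle


def phi(img):
    """Hypothetical feature extractor: the image itself as a single feature channel."""
    return np.asarray(img, dtype=float)[None, ...]


# Generators that invert each other exactly give zero cycle loss.
losses = cycle_perceptual_losses(phi, lambda x: x - 1.0, lambda x: x + 1.0,
                                 blurred=np.ones((4, 4)), sharp=np.zeros((4, 4)))
gap = perceptual_distance(np.ones((2, 4, 4)), np.zeros((2, 4, 4)))
```

Comparing feature maps rather than pixels penalizes structural differences while tolerating small pixel shifts, which is why it avoids the over-smoothing of L1 and L2 losses.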

Prior Loss Based on the Dark Channel and Bright Channel.
Using the dark and bright channels presented in formulas (2) and (3), two energies, Energy_dark and Energy_bright, are defined for an image. For a sharp image I_S and its blurred counterpart I_B, they satisfy Energy_dark(I_S) < Energy_dark(I_B) and Energy_bright(I_S) > Energy_bright(I_B). To visualize this relation, 200 images were randomly selected, and their energy curves are shown in Figure 4.
Based on this observation, sharp and blurred images can be distinguished by the dark and bright energies defined in (9) and (10). To improve the GAN with domain knowledge, the prior discrimination used in traditional blind image deblurring is taken as a training loss; for example, the bright channel prior loss on the output of G_S2B is

$$L_{BCP}\big(G_{S2B}(I_S)\big) = \mathrm{Energy}_{\text{bright}}\big(G_{S2B}(I_S)\big)$$
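The energies in (9) and (10) are not reproduced above. Assuming each energy is the sum of the corresponding channel values, the NumPy sketch below reproduces the expected ordering on a toy example: a sharp 0/1 checkerboard and its 3 × 3 box-blurred version. The energy definition, patch size, and blur are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _channel_extreme(image, patch, reduce):
    """Dark (reduce=np.min) or bright (reduce=np.max) channel with edge padding."""
    img = np.asarray(image, dtype=float)
    per_pixel = reduce(img, axis=2) if img.ndim == 3 else img
    r = patch // 2
    windows = sliding_window_view(np.pad(per_pixel, r, mode="edge"), (patch, patch))
    return reduce(windows, axis=(2, 3))


def energy_dark(image, patch=3):
    """Assumed energy: sum of the dark channel values."""
    return float(_channel_extreme(image, patch, np.min).sum())


def energy_bright(image, patch=3):
    """Assumed energy: sum of the bright channel values."""
    return float(_channel_extreme(image, patch, np.max).sum())


def box_blur(image, size=3):
    """Mean filter with edge padding, used here as a simple stand-in blur."""
    r = size // 2
    windows = sliding_window_view(np.pad(image, r, mode="edge"), (size, size))
    return windows.mean(axis=(2, 3))


sharp = np.indices((16, 16)).sum(axis=0) % 2 * 1.0  # 0/1 checkerboard
blurred = box_blur(sharp)
```

Blurring averages dark and bright pixels together, so the darkest value in each patch rises and the brightest falls; a bright-energy loss on G_S2B's output therefore rewards outputs that look blurred, as the loss above intends.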

Experiment
All models are implemented in the PyTorch deep learning framework and trained on the FLIR_ADAS_1_3 and LTIR datasets using a desktop computer with an Intel Xeon(R) Silver 4114 CPU @ 2.20 GHz × 40, a GeForce GTX 1080 Ti GPU, and 64 GiB of memory. This section presents the experimental results and compares them with those of mainstream methods; qualitative results on real images are also provided.

Synthetic Blurring Dataset.
There are two types of blurred images: either the whole image is blurred because of the movement of the imaging device, or part of the image is blurred because of the movement of the imaged object. To verify that our method is effective for both types of blur, we simulate them with two different schemes.
For whole-image blur caused by motion of the imaging device, we use blur kernels to create synthetic blurred images. Sun et al. [40] created synthetic blurred images by convolving sharp natural images with one of 73 possible linear motion kernels. Xu et al. [60] also used linear motion kernels to create synthetic blurred images. Chakrabarti [61] created a blurring kernel by … [62] that have been used for multiple datasets. However, the largest of these eight blur kernels is only 41 × 41, which is relatively small in practice. We therefore follow the algorithm in [63] to generate four uniform blur kernels ranging from 51 × 51 to 101 × 101 by sampling random 6D camera trajectories. A convolution model with 1% Gaussian noise is then used to synthesize each blurred image. For local blur caused by motion of the imaged object, we simulate blur by averaging frames of a video sequence.
This is a typical way of simulating pairs of blurred and sharp images [23,37]. It produces realistic blurred images, but it is restricted to scenes for which video sequences are available, which limits the dataset. Figure 5 compares the two blur types. The blurred image generated by averaging frames shows blur caused by moving objects against a static background: the car in Figure 5(b) is blurred, but the surrounding trees are sharp. The blur kernel method simulates whole-image motion blur caused by camera motion: in Figure 5(c), both the car and the surrounding trees are blurred. To verify the generality of our algorithm, we use the blur kernel method to synthesize blurred images for the LTIR dataset, and we use both the frame-averaging and blur kernel methods for the FLIR dataset. The FLIR blurred dataset synthesized with blur kernels is called FLIR-A, and the one synthesized by frame averaging is called FLIR-B.
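Both synthesis schemes can be sketched in a few lines of NumPy. The kernel below is a simple linear motion kernel in the spirit of [40, 60], not the 6D camera-trajectory kernels of [63] used for FLIR-A and LTIR, and "1% Gaussian noise" is interpreted, as an assumption, as a standard deviation of 0.01 on images scaled to [0, 1]. Frame averaging follows the FLIR-B scheme, with the middle frame taken as the sharp target; this choice and all function names are hypothetical, since the number of averaged frames is not stated.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def linear_motion_kernel(length, angle_deg):
    """Line-segment PSF of the given length and angle, normalized to sum to 1."""
    size = length if length % 2 else length + 1  # odd size keeps a centre pixel
    kernel = np.zeros((size, size))
    c = size // 2
    theta = np.deg2rad(angle_deg)
    # Densely sample points along the segment and rasterize them.
    for t in np.linspace(-(length - 1) / 2, (length - 1) / 2, 4 * length):
        x = int(round(c + t * np.cos(theta)))
        y = int(round(c - t * np.sin(theta)))
        if 0 <= x < size and 0 <= y < size:
            kernel[y, x] += 1.0
    return kernel / kernel.sum()


def blur_with_kernel(sharp, kernel, noise_sigma=0.01, rng=None):
    """Convolve a [0, 1] image with an odd-sized kernel, then add Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    kh, kw = kernel.shape
    padded = np.pad(sharp, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    windows = sliding_window_view(padded, kernel.shape)
    # Flipping the kernel turns the sliding-window correlation into a convolution.
    blurred = np.einsum("ijkl,kl->ij", windows, kernel[::-1, ::-1])
    noisy = blurred + rng.normal(0.0, noise_sigma, blurred.shape)
    return np.clip(noisy, 0.0, 1.0)


def average_frames(frames):
    """Average consecutive video frames; the middle frame is the sharp target."""
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])
    return stack.mean(axis=0), stack[len(frames) // 2]


kernel = linear_motion_kernel(9, angle_deg=0)
flat = blur_with_kernel(np.full((16, 16), 0.5), kernel, noise_sigma=0.0)
# A bright dot moving one pixel per frame smears into a streak when averaged.
frames = [np.eye(1, 5, k) for k in range(5)]
streak, target = average_frames(frames)
```

The kernel route blurs every pixel identically (global camera motion), whereas frame averaging only smears pixels that change between frames, which matches the contrast between Figures 5(c) and 5(b).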

FLIR_ADAS_1_3 Dataset Results.
The FLIR_ADAS_1_3 dataset provides annotated thermal images and corresponding unannotated RGB images for training and validating neural networks. The data were acquired with an RGB camera and a thermal camera mounted on a vehicle. The dataset contains 14,452 infrared images in total, of which 10,228 come from multiple short videos and 4224 come from a single 144 s video. All videos were recorded on streets and highways. The videos run at 30 frames per second, and most images are sampled at 2 frames per second; in some environments with few targets, the sampling rate is 1 frame per second. In the experiments, 8862 8-bit infrared images are divided into a training set of 7090 images and a test set of 1772 images. Figure 6 shows test images from the FLIR-A blurred dataset, and the quantitative results are listed in Table 3.
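The reported splits are consistent with an 80/20 train/test ratio rounded to whole images, both here and for the LTIR split reported below. The ratio is an inference from the counts, not a value stated in the paper, and `split_counts` is a hypothetical helper.

```python
def split_counts(total, train_fraction=0.8):
    """Round the training share to the nearest image; the rest is the test set."""
    n_train = round(total * train_fraction)
    return n_train, total - n_train


flir = split_counts(8862)   # FLIR_ADAS_1_3 images used in the experiments
ltir = split_counts(11262)  # LTIR images used in the experiments
```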
To further compare how the methods handle different types of blur, we compare their deblurring results on the FLIR-A and FLIR-B blurred datasets. Figure 7 shows the images deblurred by the different methods on the two datasets, and the evaluation metrics are listed in Table 4. Both the subjective and objective results show that our method deblurs better than the other methods.
This advantage is particularly obvious on the FLIR-B blurred dataset. For partially blurred images caused by motion of the imaged object, the performance of the other methods drops significantly: the originally sharp background becomes more blurred, while the blurred region is not restored well. Our method, in contrast, restores the blurred region while keeping the background sharp. We attribute this largely to the channel prior discrimination adopted in our method, which operates on local patches and therefore gives our method better performance on locally blurred images.

LTIR_v1_0 Dataset Results.
The LTIR dataset is a thermal infrared dataset for evaluating short-term single-object (STSO) tracking. Only one version is currently available: version 1.0 consists of 20 thermal infrared sequences with an average length of 563 frames. The dataset was used as a subchallenge of the 2015 Visual Object Tracking (VOT) Challenge. In the experiments, 11,262 8-bit images are divided into a training set of 9010 images and a test set of 2252 images. Figure 8 shows test images from the LTIR dataset.
The quantitative results are listed in Table 5. … Table 6. We can see that the proposed dark channel and bright channel prior discrimination components steadily improve PSNR and SSIM, and that the dark channel prior discrimination module contributes the most. When the perceptual loss is replaced with an L1 or L2 loss, both the average SSIM and the average PSNR decrease, and Figure 9 shows that the resulting deblurred images are too smooth. In summary, the perceptual loss is more suitable than the L1 and L2 losses for the deblurring task.
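PSNR, one of the two metrics used in these comparisons, can be computed as follows for 8-bit images (peak value 255). This is a minimal sketch with a hypothetical `psnr` helper; for SSIM, a maintained implementation such as scikit-image's `skimage.metrics.structural_similarity` is a safer choice than a hand-written one.

```python
import numpy as np


def psnr(reference, estimate, data_range=255.0):
    """Peak signal-to-noise ratio in dB for images on [0, data_range]."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


sharp = np.full((4, 4), 100, dtype=np.uint8)
off_by_one = sharp + 1  # every pixel differs by 1, so MSE = 1
```

Casting to float before subtracting avoids the silent wrap-around that unsigned 8-bit arithmetic would cause.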

Use Advanced Vision Tasks to Compare Deblurring Results.
Basic vision tasks such as image deblurring serve high-level vision tasks. To further verify the effectiveness of our method, we match the deblurred images produced by several methods against the real sharp images. The scale-invariant feature transform (SIFT) describes each keypoint by statistics of Gaussian-smoothed image gradients in its neighborhood and is a commonly used local feature extraction algorithm. The number of matched points can serve as a criterion of matching quality, and the matched points also indicate how similar the local features of the two images are. Figure 10 shows the results of matching the deblurred images with the real sharp images using SIFT: the deblurred image produced by our method obtains more correct matching pairs than those of the other methods. We also use the classic YOLO [65] detector to detect targets in the deblurred images (Figure 11). As can be seen, the deblurred images generated by the proposed method give better detection results, with more targets detected.
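Counting matches usually relies on a nearest-neighbor search with Lowe's ratio test. In practice the descriptors would come from a SIFT implementation such as OpenCV's `cv2.SIFT_create()`; the NumPy sketch below shows only the counting step on precomputed descriptor arrays, with a hypothetical `count_ratio_test_matches` helper, and the ratio of 0.75 is a common default rather than a setting reported in this paper.

```python
import numpy as np


def count_ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Count matches from desc_a (N, D) to desc_b (M, D), M >= 2, passing the ratio test."""
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Nearest and second-nearest distances for each query descriptor.
    two_best = np.sort(dists, axis=1)[:, :2]
    return int(np.sum(two_best[:, 0] < ratio * two_best[:, 1]))


rng = np.random.default_rng(0)
reference = rng.random((10, 128))
query = reference[:5] + 1e-3 * rng.standard_normal((5, 128))  # slightly perturbed copies
n_matches = count_ratio_test_matches(query, reference)
# A query equidistant from two candidates is ambiguous and is rejected.
ambiguous = count_ratio_test_matches([[1.0, 0.0]], [[0.0, 0.0], [2.0, 0.0]])
```

The ratio test discards matches whose best candidate is not clearly better than the runner-up, so a higher count indicates more distinctive, reliably matched local structure.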

Conclusion
Blind deblurring of a single infrared image remains a challenging computer vision problem. In this work, a method based on a GAN and channel prior discrimination is proposed for infrared image deblurring. Unlike previous deblurring work, we combine traditional blind deblurring with learning-based blind deblurring. Considering the different types of blur caused by motion of the imaging device and of the imaged object, extensive experiments were carried out on different public datasets. The experimental results show that the proposed method is more competitive than other popular image deblurring methods in terms of deblurring quality (subjective and objective) and efficiency.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.