Infrared and Visible Image Fusion Based on Iterative Control of Anisotropic Diffusion and Regional Gradient Structure

To improve the fusion performance of infrared and visible images and effectively retain the edge structure information of the image, a fusion algorithm based on iterative control of anisotropic diffusion and regional gradient structure is proposed. First, an iterative control operator is introduced into the anisotropic diffusion model to effectively control the number of iterations. Then, the image is decomposed into a structure layer containing detail information and a base layer containing the residual energy information. According to the characteristics of the different layers, different fusion schemes are applied: the structure layer is fused by combining the regional structure operator with the structure tensor matrix, and the base layer is fused through the Visual Saliency Map. Finally, the fused image is obtained by reconstructing the structure layer and the base layer. Experimental results show that the proposed algorithm can not only effectively handle the fusion of infrared and visible images but is also computationally efficient.


1. Introduction
In recent years, UAVs have played an increasingly important role in many fields due to their high flexibility, low cost, and ease of operation, and they are often used for battlefield reconnaissance, battle situation assessment, target recognition, and tracking in the military. Image sensors mounted on UAVs can now acquire multiple types of images, such as multispectral, visible, and infrared images [1]. However, owing to the limitations of environmental conditions such as lighting, imaging with only one sensor is affected by these factors and cannot meet the requirements of practical applications. Combining multiple imaging sensors can overcome the shortcomings of a single sensor and provide more reliable and comprehensive information. The imaging sensors commonly used on UAVs are infrared and visible sensors. Infrared sensors exploit thermal radiation to obtain images in which infrared targets are prominent, but the targets are not clear and their edges are blurred [2]. Visible sensors exploit light reflection to obtain images with clear details, but under low-visibility conditions these images have limitations. Research has shown that effectively combining infrared and visible images yields a more comprehensive and accurate description of the scene or target, which provides strong support for subsequent task processing [3].
The methods widely used in the field of infrared and visible image fusion can be roughly classified into multiscale transform- (MST-) based methods [4], sparse representation-based methods [5], spatial domain-based methods [6], and deep learning-based methods [7]. At present, the most researched and applied are the MST-based methods, including the wavelet transform [8], the Laplacian pyramid transform [9], the nonsubsampled shearlet transform [10], and the nonsubsampled contourlet transform [11]. These methods decompose the source images at multiple scales, fuse each scale according to certain fusion rules, and finally obtain the fusion result through the inverse transform; they can extract the salient information in the images and achieve good performance. For example, the nonsubsampled contourlet transform is utilized by Huang et al. [11] to obtain a precise decomposition of the source images. However, because the traditional MST methods lack spatial consistency, structural or brightness distortion may appear in the result.
In addition, image fusion methods based on edge-preserving filtering [12] are also receiving attention. Edge-preserving filtering can effectively reduce the halo artifacts around edges in the fusion results while retaining the edge information of image contours, and it offers good visual performance. Popular methods include mean filtering [13], bilateral filtering [14], joint bilateral filtering [15], and guided filtering [16]. These methods perform the decomposition according to the spatial structure of the images, thereby achieving spatial consistency and smoothing texture while preserving edge and detail information. For example, Zhu et al. [16] proposed a fast single-image dehazing algorithm that uses guided filtering to decompose the images and obtained good performance. Edge-preserving fusion algorithms maintain spatial consistency and effectively reduce distortion and artifacts in the fused image, but they have certain limitations: (1) they can introduce detail "halos" at the edges; (2) when the input image and the guide image are inconsistent, the filtering becomes insensitive or even fails; and (3) it is difficult to satisfy the requirements of fusion performance, time efficiency, and noise robustness simultaneously.
Inspired by previous research, this article focuses on reducing "halos" at the edges so as to retain edge structure information and on obtaining better decomposition performance for both noise-free and noise-perturbed images. In this paper, a new infrared and visible image fusion method based on iterative control of anisotropic diffusion and a regional gradient structure operator is proposed. Anisotropic diffusion is utilized to decompose the source image into a structure layer and a base layer. Then, the structure layer is processed by using the gradient-based structure tensor matrix and the regional structure operator. Because the base layer contains little detail but most of the energy, the Visual Saliency Map (VSM) is utilized to fuse the base layers. By reconstructing the two prefused components, the final fusion image is obtained.
The main contributions of the proposed method can be summarized as follows. (1) A novel infrared and visible image fusion method is proposed. The anisotropic diffusion model with an iteration control operator adaptively controls the number of iterations, so the image is decomposed adaptively into a structure layer with rich edge and detail information and a base layer with pure energy information; in particular, the computational efficiency is greatly improved. (2) The regional structure operator is introduced into the structure tensor matrix, which can effectively extract information such as image details, contrast, and structure. It also greatly improves the detection of weak structures and yields structure images with good prefusion performance. (3) Since anisotropic diffusion can effectively deal with noise, the proposed method also performs well on noisy image fusion. In addition, the algorithm is widely applicable and is also suitable for other types of image fusion.

The paper is organized as follows. Section 2 briefly reviews anisotropic diffusion and structure tensor theory and introduces the new operators. Section 3 describes the proposed infrared and visible image fusion algorithm in detail. Section 4 presents the related experiments and compares the method with several current advanced algorithms. Finally, conclusions are drawn in Section 5.

2. Related Theories
2.1. Anisotropic Diffusion Based on Iterative Control. Anisotropic diffusion [17] can be utilized to smooth the image while maintaining its details and edge information. Compared with other filtering methods, it is more suitable for image decomposition. The anisotropic diffusion equation is expressed as

$$\frac{\partial I}{\partial t} = \operatorname{div}\big(v(x, y, t)\,\nabla I\big) = v(x, y, t)\,\Delta I + \nabla v \cdot \nabla I, \quad (1)$$

where $v(x, y, t)$ is the flux function or diffusion rate, $\nabla$ is the gradient operator, $\Delta$ is the Laplacian operator, and $t$ denotes the time, scale, or iteration. Equation (1) can be discretized on the image grid by using the four nearest-neighbour differences of the Laplacian:

$$I_{i,j}^{t+1} = I_{i,j}^{t} + \mu\big[v_{N}\cdot D_{N}I + v_{S}\cdot D_{S}I + v_{W}\cdot D_{W}I + v_{E}\cdot D_{E}I\big]_{i,j}^{t}, \quad (2)$$

where $I_{i,j}^{t+1}$ is the coarser-resolution image at scale $t+1$, obtained from $I_{i,j}^{t}$, and $\mu$ is a constant with $0 \le \mu \le 1/4$. $D_{N}$, $D_{S}$, $D_{W}$, and $D_{E}$ are the nearest-neighbour differences in the four directions North, South, West, and East, respectively, defined by

$$D_{N}I_{i,j} = I_{i-1,j} - I_{i,j}, \quad D_{S}I_{i,j} = I_{i+1,j} - I_{i,j}, \quad D_{W}I_{i,j} = I_{i,j-1} - I_{i,j}, \quad D_{E}I_{i,j} = I_{i,j+1} - I_{i,j}, \quad (3)$$

and $v_{N}$, $v_{S}$, $v_{W}$, and $v_{E}$ are the conduction coefficients or flux functions in the four directions:

$$v_{N\,i,j}^{t} = g\big(|D_{N}I_{i,j}^{t}|\big), \quad v_{S\,i,j}^{t} = g\big(|D_{S}I_{i,j}^{t}|\big), \quad v_{W\,i,j}^{t} = g\big(|D_{W}I_{i,j}^{t}|\big), \quad v_{E\,i,j}^{t} = g\big(|D_{E}I_{i,j}^{t}|\big), \quad (4)$$

where $g(|\cdot|)$ is a monotonically decreasing function with $g(0) = 1$. $g(\cdot)$ is the "edge-stop" function or diffusion coefficient, which has a very important influence on the noise suppression and edge retention ability of anisotropic diffusion. Two classical choices are

$$g(\nabla I) = \exp\!\left(-\left(\frac{\|\nabla I\|}{k}\right)^{2}\right), \qquad g(\nabla I) = \frac{1}{1 + \left(\|\nabla I\|/k\right)^{2}}. \quad (5)$$
The scale spaces weighted by these two functions are different. The first function favours abrupt areas with large gradients, namely, the edge and detail areas, whereas the second function favours flat areas with small gradients. Both functions contain a free parameter $k$.
Anisotropic diffusion is a differential iterative process in which the number of iterations is a key issue. Too many iterations lead to oversmoothing, while too few prevent the detail components from being separated effectively. Moreover, the appropriate numbers of iterations for noisy images and for noise-free images are different and uncertain. Therefore, an iterative control operator $\theta$ is introduced to control $k$, thereby adaptively controlling the number of iterations and reasonably separating structural information such as gradients and details. This also improves the computational efficiency.
In Equation (6), which defines the update of $k$, $K_0$ is an empirical value controlling the diffusion strength and is usually set to 30. The value of $k$ is related to the edge strength at region boundaries, and $k$ is updated through positive and negative excitation by $\theta$ so as to obtain the optimal number of iterations and thus the most effective and accurate separation results.
The anisotropic diffusion of an image $I$ is denoted by $\operatorname{aniso}(I)$. After the image is diffused anisotropically, since the iterative control operator precisely controls the number of iterations, almost all of the oscillating and repetitive texture content is effectively preserved in the structure layer, while the energy information and weak edges are preserved in the base layer. Figure 1 shows the base layer and structure layer images obtained after anisotropic diffusion decomposition; they are consistent with the theoretical analysis.
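As a point of reference, the classical Perona-Malik diffusion step on which this decomposition builds can be sketched in a few lines of NumPy. The fixed iteration count num_iter and the fixed k below are illustrative placeholders for the adaptive iterative control of Equation (6), whose update rule is not reproduced here.

```python
import numpy as np

def aniso(image, num_iter=10, k=30.0, mu=0.2):
    """Classical Perona-Malik anisotropic diffusion (illustrative sketch)."""
    img = image.astype(np.float64)
    for _ in range(num_iter):
        # Nearest-neighbour differences in the four directions
        # (np.roll gives periodic borders, which is adequate for a sketch).
        d_n = np.roll(img, 1, axis=0) - img   # North
        d_s = np.roll(img, -1, axis=0) - img  # South
        d_w = np.roll(img, 1, axis=1) - img   # West
        d_e = np.roll(img, -1, axis=1) - img  # East
        # Exponential edge-stop function g(|x|) = exp(-(|x| / k)^2).
        c_n = np.exp(-(d_n / k) ** 2)
        c_s = np.exp(-(d_s / k) ** 2)
        c_w = np.exp(-(d_w / k) ** 2)
        c_e = np.exp(-(d_e / k) ** 2)
        # Discrete update: I^{t+1} = I^t + mu * sum of (conduction * difference).
        img = img + mu * (c_n * d_n + c_s * d_s + c_w * d_w + c_e * d_e)
    return img
```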

2.2. Gradient-Based Structure Tensor Matrix. The gradient is the rate of change of an image, reflected by the difference between a central pixel and its surrounding pixels. It can be used to accurately capture the texture details, contour features, and structural components in the image. The structure tensor is an effective tool for analysing the gradient, and it has been applied to a variety of image processing tasks.

The gradient operator [18] is described as follows. For a local window $\Theta(x, y)$ and any $\varepsilon \to 0^{+}$ in the direction $\beta$, the squared change of the image $I(x, y)$ at the point $(x, y)$ is

$$\big\|I(x + \varepsilon\cos\beta,\; y + \varepsilon\sin\beta) - I(x, y)\big\|^{2} \approx \varepsilon^{2}\big(I_{x}\cos\beta + I_{y}\sin\beta\big)^{2}.$$

In any direction $\beta$ at the point $(x, y)$, the change rate $C(\beta)$ of the local features of the image $I(x, y)$ is

$$C(\beta) = \lim_{\varepsilon \to 0^{+}} \frac{\big\|I(x + \varepsilon\cos\beta,\; y + \varepsilon\sin\beta) - I(x, y)\big\|^{2}}{\varepsilon^{2}} = \big(I_{x}\cos\beta + I_{y}\sin\beta\big)^{2}.$$

To better analyse the gradient features and effectively realize image processing, the structure tensor matrix $S$ is introduced, and $C(\beta)$ can be expressed as

$$C(\beta) = \begin{pmatrix}\cos\beta & \sin\beta\end{pmatrix} S \begin{pmatrix}\cos\beta \\ \sin\beta\end{pmatrix}, \qquad S = \begin{pmatrix} I_{x}^{2} & I_{x}I_{y} \\ I_{x}I_{y} & I_{y}^{2} \end{pmatrix},$$

where $I_{x}$ and $I_{y}$ are the partial derivatives of $I$ with respect to $x$ and $y$, accumulated over the local window $\Theta(x, y)$.

The two extreme values of the structure tensor $S$ can be expressed as

$$\lambda_{1,2} = \frac{1}{2}\left(\operatorname{tr}(S) \pm \sqrt{\operatorname{tr}(S)^{2} - 4\det(S)}\right),$$

where $\operatorname{tr}(S)$ and $\det(S)$ are the trace and determinant of $S$. The structural characteristics of a local image area are related to these extreme values. Generally, if both extreme values are relatively small, the region does not have gradient characteristics; that is, the region lies in an isotropic part. Otherwise, the local area exhibits obvious changes and contains certain structural information. Because the saliency measurement of image areas involves a wide range of structure types, the structural saliency operator SSO is finally defined from these extreme values, following [19].
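To make the computation concrete, the following NumPy/SciPy sketch evaluates the structure tensor entries and its two extreme values per pixel. The saliency returned at the end (the eigenvalue sum) is only an assumed stand-in for the SSO of [19], and the Gaussian window is an assumed choice for the local window $\Theta$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_saliency(image, sigma=2.0):
    """Per-pixel structure tensor, its two extreme values, and a saliency measure."""
    img = image.astype(np.float64)
    # Image gradients (central differences): I_y along rows, I_x along columns.
    iy, ix = np.gradient(img)
    # Structure tensor entries accumulated over a local (Gaussian) window.
    s11 = gaussian_filter(ix * ix, sigma)
    s12 = gaussian_filter(ix * iy, sigma)
    s22 = gaussian_filter(iy * iy, sigma)
    # Eigenvalues of the 2x2 symmetric tensor at every pixel.
    trace = s11 + s22
    disc = np.sqrt((s11 - s22) ** 2 + 4.0 * s12 ** 2)
    lam1 = 0.5 * (trace + disc)
    lam2 = 0.5 * (trace - disc)
    # Assumed saliency: sum of the eigenvalues (total local gradient energy);
    # the exact SSO of [19] is not reproduced here.
    saliency = lam1 + lam2
    return lam1, lam2, saliency
```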

3. Fusion Framework
Based on the above theories, a new image fusion framework is constructed, as shown in Figure 2. Different from traditional decomposition schemes, in order to make better use of the information in the original images, the iterative-control anisotropic diffusion is first utilized to decompose each source image into a base component and a structure component. At this point, most of the gradients and edges are effectively preserved in the structure layer, while the base layer contains the remaining energy information. Then, according to the characteristics of each layer, different fusion rules are applied to obtain the prefusion of each layer: the structure layers are prefused through the regional gradient structure, and the base layers are prefused through the VSM. Finally, the fusion result is obtained by reconstructing the two prefused layers.
3.1. Anisotropic Decomposition. Let the source images $\{I_{n}(x, y)\}_{n=1}^{N}$ all be coregistered. The base layer, with smoothed edges, is obtained through the anisotropic diffusion model of the previous section:

$$I_{n}^{B}(x, y) = \operatorname{aniso}\big(I_{n}(x, y)\big),$$

where $I_{n}^{B}(x, y)$ is the $n$th base layer and $\operatorname{aniso}(I_{n}(x, y))$ represents the anisotropic diffusion process applied to the $n$th source image. The structure layer is then obtained by subtracting the base layer from the source image:

$$I_{n}^{S}(x, y) = I_{n}(x, y) - I_{n}^{B}(x, y).$$

After anisotropic decomposition, a structure layer with rich outline and texture details and a base layer with intensity information are obtained.
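Under this additive two-layer model, the decomposition itself reduces to two lines; the snippet below reuses the aniso sketch from Section 2.1 and assumes numpy is imported as np.

```python
def decompose(source):
    """Two-layer decomposition under the additive model described above."""
    source = source.astype(np.float64)
    base = aniso(source)           # base layer: smoothed energy information
    structure = source - base      # structure layer: edges and texture details
    return base, structure
```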

3.2. Fusion of Structure Layers. Since the structure saliency operator (SSO) of the previous section can effectively detect the gradient structure information of the images, SSO can be used to prefuse the structure layers. However, owing to the lack of an intensity variable, SSO cannot accurately detect weak feature information in the images. To improve the structure detection ability, the regional structure operator (RSO) is introduced to enhance the performance of SSO. RSO is the regional structural component centred at position $(x, y)$. The regional gradient structure (RGS) is then built from $SS_{I}(x, y)$, the salient image produced by SSO, and $RS_{I}(x, y)$, the regional structure feature at position $(x, y)$, where $N$ controls the size of the region and influences the efficiency and effect of the fusion. By comparing the RGS values of the input images over a central local area $\Theta$ of size $T \times T$ around $(x, y)$, the structure saliency map $M_{1}(x, y)$ of the structure layer $I_{1}^{S}(x, y)$ is calculated, and the prefused structure layer $F_{I}^{S}(x, y)$ is obtained by weighting the two structure layers with this map.

3.3. Fusion of Base Layers. Since the base layers contain few details, a weighted-average scheme based on the VSM [20] is used to fuse the base layers into $F_{I}^{B}$. First, the VSM is constructed. Let $I_{p}$ denote the intensity value of a pixel $p$ in the image $I$. The saliency value $V(p)$ of pixel $p$ is defined as

$$V(p) = \sum_{j=0}^{L-1} M_{j}\,\big|I_{p} - j\big|,$$

where $j$ denotes a pixel intensity, $M_{j}$ is the number of pixels whose intensity equals $j$, and $L$ is the number of gray levels (256 in this case). The base layers are then combined by a weighted average whose weights are derived from the normalized saliency maps. After obtaining the two prefused components, the final fusion image $F_{I}$ is reconstructed as

$$F_{I} = F_{I}^{S} + F_{I}^{B}.$$
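The following sketch ties the pieces together under the assumptions already noted: the structure layers are combined with a per-pixel choose-max rule on a regionally averaged saliency measure (a stand-in for the RGS comparison of Section 3.2, whose exact formulas are not reproduced above), the base layers are blended with weights derived from the VSM defined in this subsection (a common weighting; the paper's exact weights are not reproduced), and the final image is the sum of the two prefused layers. The functions decompose and structure_saliency are the ones sketched earlier.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def vsm(image, levels=256):
    """Visual Saliency Map V(p) = sum_j M_j * |I_p - j|, normalized to [0, 1]."""
    img = image.astype(np.uint8)
    hist = np.bincount(img.ravel(), minlength=levels).astype(np.float64)
    # Precompute the saliency value for every possible intensity level.
    lut = np.array([np.sum(hist * np.abs(np.arange(levels) - j)) for j in range(levels)])
    sal = lut[img]
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)

def fuse(ir, vis, region=7):
    """End-to-end sketch of the fusion pipeline under the stated assumptions."""
    b1, s1 = decompose(ir)
    b2, s2 = decompose(vis)
    # Structure layers: regional average of an assumed saliency measure,
    # then a per-pixel choose-max map (a stand-in for the RGS comparison).
    _, _, sal1 = structure_saliency(s1)
    _, _, sal2 = structure_saliency(s2)
    m1 = (uniform_filter(sal1, region) >= uniform_filter(sal2, region)).astype(np.float64)
    fused_structure = m1 * s1 + (1.0 - m1) * s2
    # Base layers: weighted average driven by the normalized VSMs.
    w = 0.5 + 0.5 * (vsm(ir) - vsm(vis))
    fused_base = w * b1 + (1.0 - w) * b2
    # Reconstruction: sum of the two prefused layers.
    return fused_structure + fused_base
```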

4. Experimental Analysis and Results
In order to verify the effectiveness and reliability of the proposed algorithm, multiple pairs of images are used for experimental verification, and the results are analysed through subjective visual assessment and objective quantitative evaluation. After setting the algorithm parameters, the experimental results are presented and discussed. The compared methods include, among others, image fusion through infrared feature extraction and visual information preservation (IFEVIP) proposed by Zhang et al. [24] and multisensor image fusion based on fourth-order partial differential equations (FPDE) proposed by Bavirisetti et al. [25]. In addition, the fusion performance is quantitatively evaluated by six indicators: entropy (EN) [26], edge information retention ($Q^{AB/F}$) [27], the Chen-Blum metric ($Q_{CB}$) [28], mutual information (MI) [29], structural similarity (SSIM) [30], and peak signal-to-noise ratio (PSNR) [31].

For some of the compared methods, although the structure is well preserved, the details are relatively weakened or lost. The IFEVIP method maintains good contrast, but the visual effect is overenhanced, especially in the partially enlarged areas, resulting in obvious errors in the result. The FPDE method shows blurred internal features. The CNN method obtains a relatively good fusion result, but its image is somewhat unnatural, and the colour of the result in Figure 5(c4) contains errors. In contrast, the proposed method can effectively separate the component information of different images, preserve the useful information of the source images in the fused images, and obtain the best visual performance in terms of edge and detail preservation.

Objective Evaluation.
In addition to the subjective evaluation, the fusion results are quantitatively evaluated, and the results are shown in Table 1, in which the best results are marked in bold. The data in the table show that the objective scores of the proposed method are significantly higher than those of the other methods. Across all quantitative evaluations, only a few entries are not optimal, and they do not affect the overall advantage of the proposed method. In addition, Figure 6 shows bar-chart comparisons of the EN, $Q^{AB/F}$, $Q_{CB}$, MI, SSIM, and PSNR values of the various fusion methods for the car example.
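For reference, the two simplest of the indicators above can be computed as in the following generic sketch; this is not the evaluation code used for Table 1, and here PSNR is written against a single reference image.

```python
import numpy as np

def entropy(image, levels=256):
    """Information entropy EN = -sum_i p_i * log2(p_i) of an 8-bit image."""
    hist = np.bincount(image.astype(np.uint8).ravel(), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def psnr(reference, fused, peak=255.0):
    """Peak signal-to-noise ratio between a reference image and the fused image."""
    mse = np.mean((reference.astype(np.float64) - fused.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```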
In summary, for infrared and visible image fusion, the proposed method performs well both subjectively and objectively. As shown in Figure 7, for other types of source images the fusion results have both high spatial resolution and high spectral resolution, and the fused images have a strong ability to express structure and details. The corresponding objective evaluation results are shown in Figure 8. The visual and objective results indicate that the algorithm can effectively retain high-spatial-resolution and hyperspectral information and can improve the accuracy of subsequent remote sensing image processing.

4.4. Computational Efficiency. The methods tested in this paper are all run in the same experimental environment. The average running times over six pairs of images are compared in Table 2. It can be seen that the computational efficiency of the proposed algorithm has a considerable advantage over the compared algorithms.

5. Conclusions
In this paper, an infrared and visible image fusion algorithm based on iterative control of anisotropic diffusion and regional gradient structure is proposed. The algorithm makes full use of the advantages of anisotropic diffusion and improves the decomposition efficiency and quality through the iterative control operator. The regional gradient structure operator is introduced to fully extract the detailed information in the structure layer and thus obtain better fusion performance. Extensive experimental results show that the algorithm is significantly better than existing methods in terms of subjective and objective evaluation. In addition, it achieves higher computational efficiency and stronger noise robustness, and it can be effectively applied to other types of image fusion.

Data Availability
The data used to support the findings of this paper are available from http://imagefusion.org/.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.