Infrared and Visible Image Fusion Combining Interesting Region Detection and Nonsubsampled Contourlet Transform

The most fundamental purpose of infrared (IR) and visible (VI) image fusion is to integrate the useful information and produce a new image with higher reliability and understandability for human or computer vision. In order to better preserve the interesting region and its corresponding detail information, a novel multiscale fusion scheme based on interesting region detection is proposed in this paper. Firstly, the MeanShift is used to detect the interesting region, containing the salient objects, and the background region of the IR and VI images. Then the interesting regions are processed by the guided filter. Next, the nonsubsampled contourlet transform (NSCT) is used to decompose the background regions of IR and VI into a low-frequency layer and a series of high-frequency layers. An improved weighted average method based on per-pixel weighted average is used to fuse the low-frequency layers. The pulse-coupled neural network (PCNN) is used to fuse each high-frequency layer. Finally, the fused image is obtained by fusing the fused interesting region and the fused background region. Experimental results demonstrate that the proposed algorithm can integrate more background details as well as highlight the interesting region with the salient objects, and that it is superior to conventional methods in objective quality evaluations and visual inspection.


Introduction
Image fusion is an important branch of information science which has been widely used in many fields, such as bioinformatics, medical image processing, and military target visualization. Especially in the military field, infrared (IR) and visible (VI) image fusion is important to military science and technology, for example, in automatic military target detection and localization. As a hot topic in image fusion, it has attracted the attention of many researchers [1][2][3][4][5][6][7]. The key problem of IR and VI image fusion is to integrate and extract the feature information of the source images to produce a new image that is more reliable and understandable; the fused image not only has the detailed texture information of the VI image but also highlights the target area of the IR image.
Many different algorithms for IR and VI image fusion have been proposed and developed over the past few decades. Early fusion methods such as intensity-hue-saturation (IHS) and principal component analysis (PCA) processed pixel values in the spatial domain; these were traditional classical methods, but their fusion effect was limited compared with other excellent fusion methods [8][9][10]. Many fusion methods based on multiscale transform (MST) have become popular in recent years, such as the Laplacian pyramid (LP), wavelet transform (WT), discrete wavelet transform (DWT), and nonsubsampled contourlet transform (NSCT) [11][12][13][14][15][16]. Due to the excellent characteristics of multiscale decomposition, MST-based methods such as NSCT-PCNN [17] can achieve a good fusion effect compared with the early fusion methods. However, these methods usually fail to highlight the target information in the fused image. IR image target detection-based methods are another popular class of IR and VI image fusion methods; these methods first detect the target region of the IR image, then fuse the background regions using other methods to obtain the fused background image, and finally fuse the target region and background regions directly to produce a new image. The advantage of these methods is that they can fully retain the infrared target information in the fused image, but commonly the infrared target regions of the fused image lack the corresponding detail information of the VI image. Our previous work proposed a fusion algorithm based on target extraction; it was useful for highlighting the target of the infrared image because the target region was directly fused into the final image [8]. Taking into account the shortcomings of these algorithms, a novel IR and VI image fusion method is proposed in this paper to overcome these problems. Compared with our previous work, we improve the accuracy of interesting region detection so that it contains the highlighted targets and heat sources. In addition, in order to enrich the visible information in the interesting region, we also adopt a fusion strategy to fuse the interesting regions.
The first step of the proposed method is to detect the interesting region, which contains the significant target, in the IR image by the MeanShift method. The MeanShift has many applications, such as clustering, discontinuity-preserving smoothing, object contour detection, image segmentation, and nonrigid object tracking [15, 18]. We use it to separate the interesting region, with its significant infrared target, from the background regions of the IR image. In order to fully retain and highlight the interesting region and significant target information in the fused image, the interesting region is taken as a separate component and directly fused into the final image. However, the interesting region extracted from the IR image loses the details of the corresponding region of the VI image. To solve this problem, we use the guided filter to fuse the interesting regions of the IR image and its corresponding VI image [19][20][21]. The interesting region of the IR image serves as the guidance image and the interesting region of the VI image as the input image. The guided image filter was proposed in 2013 by He et al. [19]; it has many good characteristics, such as edge preservation and image smoothing. We use it to preserve the edges of the VI image, so that the produced interesting region contains the significant target information as well as the detail information.
Next, the background regions are decomposed by the nonsubsampled contourlet transform into a low-frequency layer and a series of high-frequency layers. NSCT, an effective decomposition tool, was proposed by Da Cunha et al. [16]. NSCT has many good properties, including time-frequency localization, multidirectionality, and multiscale analysis; therefore, it has been widely used in image fusion compared with other multiscale-based methods [22][23][24]. For the low-frequency layer, we propose an improved weighted average method based on per-pixel weighted average. Due to the characteristics of the low-frequency layer (a hazy image), the per-pixel weighted average based method is effective; it is described in detail in the fusion rule section. For the high-frequency layers, the pulse-coupled neural network (PCNN) is used to process each layer. PCNN was proposed by Eckhorn et al. [25]. Since it was introduced, it has been widely used in the field of image processing, for example, in image segmentation, image enhancement, image edge detection, and image fusion [26, 27]. In the proposed method, the spatial frequency (SF) metric of the high-frequency layers is used as the external incentive information of the PCNN model, which makes it better at dealing with overexposed or weakly exposed images and makes the fusion result more suitable for human visual inspection.
The remaining sections of this paper are organized as follows: the related work and proposed methods are introduced in Section 2, including the interesting region detection and fusion, the background region fusion, and the concrete fusion steps. Experimental result comparisons and analysis are given in Section 3. The conclusions are presented in Section 4.

Related Work and Proposed Methods
2.1. Related Work

2.1.1. MeanShift Algorithm. The most important function of the MeanShift is as a tool for estimating the probability density function of a set of data samples [28]. It has been widely used in discontinuity-preserving smoothing, object contour detection, and image segmentation.
Given a finite number of data points $x_1, \ldots, x_n$ in the d-dimensional space $R^d$, a multivariate kernel density function is defined as

$$\hat{f}(x) = \frac{1}{nh^{d}} \sum_{i=1}^{n} K\left(\frac{x - x_{i}}{h}\right), \tag{1}$$

with the kernel K(x) being a bounded function of the form

$$K(x) = c\,k\left(\|x\|^{2}\right), \tag{2}$$

with the following properties:

$$\int_{R^{d}} K(x)\,dx = 1, \qquad K(x) = K(-x), \qquad \lim_{\|x\|\to\infty} \|x\|^{d} K(x) = 0, \tag{3}$$

where c is a constant and the conditions in (3) mean normalized, symmetric, and exponential weight decay, respectively. The normal kernel K(x) is computed by

$$K_{N}(x) = (2\pi)^{-d/2} \exp\left(-\frac{1}{2}\|x\|^{2}\right). \tag{4}$$

Estimate the kernel density gradient by

$$\nabla \hat{f}(x) = \frac{2c}{nh^{d+2}} \sum_{i=1}^{n} (x_{i} - x)\, g\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right), \tag{5}$$

and using the normal kernel form, (5) can be rewritten as

$$\nabla \hat{f}(x) = \frac{2c}{nh^{d+2}} \left[\sum_{i=1}^{n} g\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)\right] \left[\frac{\sum_{i=1}^{n} x_{i}\, g\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)} - x\right], \tag{6}$$

where $g(x) = -k'(x)$ and the second bracketed term is the mean shift vector. We use the MeanShift to process the IR image, cluster the infrared target pixels, and obtain the interesting region of IR. For image clustering and segmentation, we treat the image as data points in the joint spatial and gray-level domain; two radially symmetric kernels are used, defined as follows:

$$K_{h_{s},h_{r}}(x) = \frac{C}{h_{s}^{2} h_{r}^{p}}\, k\left(\left\|\frac{x^{s}}{h_{s}}\right\|^{2}\right) k\left(\left\|\frac{x^{r}}{h_{r}}\right\|^{2}\right), \tag{7}$$

where $x^{s}$ is the spatial coordinate, $x^{r}$ is the range feature vector in color space, and $h_{s}$ and $h_{r}$ are the employed kernel bandwidths. An example of interesting region detection in IR with different bandwidths is given in Figure 1. We can see from Figure 1 that the MeanShift method can effectively and accurately extract the IR image region in which we are interested, together with the infrared target information. Compared with IR, the interesting region of VI contains more detailed information. In order to highlight the interesting region of IR and enrich the details of the corresponding region of VI in the fused image, once the interesting regions of IR and VI are determined, the guided filter is used to fuse them.
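As an illustration of the joint spatial-range kernel idea, the following NumPy sketch performs discontinuity-preserving smoothing with flat (box) kernels in place of the normal kernel; the bandwidths `hs`, `hr` and the iteration count are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def mean_shift_filter(img, hs=8.0, hr=16.0, n_iter=5):
    """Discontinuity-preserving smoothing in the joint spatial-range
    domain: each pixel moves toward the mean of the neighbors that lie
    within the spatial bandwidth hs AND the range bandwidth hr."""
    h, w = img.shape
    out = img.astype(np.float64).copy()
    r = int(hs)
    for _ in range(n_iter):
        new = np.empty_like(out)
        for i in range(h):
            for j in range(w):
                patch = out[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
                mask = np.abs(patch - out[i, j]) <= hr   # flat range kernel
                new[i, j] = patch[mask].mean()
        out = new
    return out
```

Because the range kernel excludes pixels whose gray levels differ by more than `hr`, region boundaries survive the smoothing, which is what makes the clustered output usable for segmenting the interesting region.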
2.1.2. Guided Filter. The guided filter is an edge-preserving filter that computes the filter output by considering the content of a guidance image. The guided filter has many good characteristics, especially in edge detail preservation [18, 19, 29]. The filtered output image is very similar to the input image, and it also contains the texture and detail information of the guidance image, as shown in Figure 2.
Supposing that the guidance image is I, a detailed description of the guided filter is given as follows:

$$O_{i} = a_{k} I_{i} + b_{k}, \quad \forall i \in \omega_{k}, \tag{8}$$

where O is a linear transformation of I in a local window $\omega_{k}$ centered at pixel k, and the coefficients $a_{k}$ and $b_{k}$ are constant in $\omega_{k}$. To make the output image as similar as possible to the input image, we minimize the difference between the output image O and the input image P as follows:

$$E(a_{k}, b_{k}) = \sum_{i \in \omega_{k}} \left[\left(a_{k} I_{i} + b_{k} - P_{i}\right)^{2} + \epsilon a_{k}^{2}\right], \tag{9}$$

whose solution is

$$a_{k} = \frac{\dfrac{1}{|\omega|} \sum_{i \in \omega_{k}} I_{i} P_{i} - \mu_{k} \bar{P}_{k}}{\sigma_{k}^{2} + \epsilon}, \qquad b_{k} = \bar{P}_{k} - a_{k} \mu_{k}, \tag{10}$$

where $\mu_{k}$ and $\sigma_{k}^{2}$ are the mean and variance of I in the local window $\omega_{k}$, $|\omega|$ is the total number of pixels in $\omega_{k}$, $\epsilon$ is a regularization parameter, and $\bar{P}_{k}$ is the mean of the input image P in $\omega_{k}$. Figure 2 shows a set of examples of the guided filter.
It can be seen from Figure 2 that the guidance image contains a large amount of detail and texture information, while the input image contains only the significant regional information and lacks detail, texture, and edge information. As can be seen from Figure 2(c), the output image of the guided filter is consistent with the input image, but it also contains the detailed texture information of the corresponding region of the guidance image. This is well suited to processing the salient target in the IR image and its corresponding region in the VI image. Through the guided filter, we can fuse the detail information into the interesting region of the IR image; in this way, the produced new interesting region contains both the salient object and the detail information.
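The linear-model guided filter described above admits a compact closed-form implementation. The following NumPy sketch follows He et al.'s formulation; the window radius `r` and regularization `eps` are illustrative choices, and the unoptimized box mean stands in for the usual integral-image version:

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1)-sided box window, clamped at the borders."""
    h, w = x.shape
    out = np.empty((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            out[i, j] = x[max(0, i - r):i + r + 1,
                          max(0, j - r):j + r + 1].mean()
    return out

def guided_filter(I, P, r=4, eps=1e-2):
    """Guided filter: fit O = a*I + b per window, then average a and b."""
    I = I.astype(np.float64); P = P.astype(np.float64)
    mean_I, mean_P = box_mean(I, r), box_mean(P, r)
    var_I = box_mean(I * I, r) - mean_I ** 2        # variance of guidance
    cov_IP = box_mean(I * P, r) - mean_I * mean_P   # covariance with input
    a = cov_IP / (var_I + eps)
    b = mean_P - a * mean_I
    # average the per-window coefficients over all windows covering a pixel
    return box_mean(a, r) * I + box_mean(b, r)
```

In the paper's setting, the IR interesting region would play the role of the guidance image `I` and the corresponding VI region the input `P`, so the output follows the VI content while inheriting edges from the IR target.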
2.1.3. Nonsubsampled Contourlet Transform (NSCT). It can be seen from Figure 3 that the source image can be decomposed by NSCT into a low-frequency layer and a series of high-frequency layers. All obtained layers are the same size as the source image. Figure 4 shows an example of NSCT decomposition. In Figure 4, we decompose the source image into four levels, each of which is decomposed into four directional images; we select two images from each level, as shown in Figure 4. Figure 4(b) is the low-frequency layer; it can be seen that it contains only the low-frequency information of the source image without high-frequency details. Levels 1 to 4 are the high-frequency layers, which show the detail information at different scales.
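NSCT itself requires a dedicated filter-bank toolbox, but the way its undecimated layers relate to the source image can be illustrated with a much simpler shift-invariant decomposition. The sketch below is not NSCT (it uses a moving-average lowpass and has no directional analysis); it only demonstrates that every layer keeps the source size and that low plus detail layers reconstruct the image exactly:

```python
import numpy as np

def blur(x, k=5):
    """Separable moving-average lowpass (stand-in for a pyramid filter)."""
    pad = k // 2
    xp = np.pad(x, pad, mode='edge')
    xh = np.mean([xp[:, i:i + x.shape[1]] for i in range(k)], axis=0)
    return np.mean([xh[i:i + x.shape[0], :] for i in range(k)], axis=0)

def multiscale_decompose(img, levels=3):
    """Undecimated split: one low-frequency layer plus `levels` detail
    (high-frequency) layers, all the same size as img."""
    img = img.astype(np.float64)
    highs, current = [], img
    for _ in range(levels):
        low = blur(current)
        highs.append(current - low)   # detail lost by the lowpass
        current = low
    return current, highs             # low-frequency layer, detail layers

def reconstruct(low, highs):
    """Perfect reconstruction: the differences telescope back to img."""
    return low + sum(highs)
```

The same low/high split is what the fusion rules below operate on; in the paper the layers come from NSCT instead, with additional directional subbands per level.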

2.1.4. Pulse-Coupled Neural Network (PCNN). The PCNN is a single-layered artificial neural network [25]. A basic neuron of PCNN contains the receptive field, the modulation field, and the pulse generator, which are shown in Figure 5.
The receptive field of PCNN can be described in detail as follows:

$$F_{ij}[n] = e^{-\alpha_{F}} F_{ij}[n-1] + V_{F} \sum_{kl} M_{ijkl} Y_{kl}[n-1] + S_{ij}, \tag{11}$$

$$L_{ij}[n] = e^{-\alpha_{L}} L_{ij}[n-1] + V_{L} \sum_{kl} W_{ijkl} Y_{kl}[n-1], \tag{12}$$

where $S_{ij}$ is the input stimulus at pixel (i, j) of the source image, $F_{ij}$ is the feeding input, $L_{ij}$ is the linking input, the matrices M and W are the constant synaptic weights, $\alpha_{F}$ and $\alpha_{L}$ are the time constants, and $V_{F}$ and $V_{L}$ are normalizing constants.
In the modulation field, the internal state is controlled by the linking strength β, which is given by

$$U_{ij}[n] = F_{ij}[n] \left(1 + \beta L_{ij}[n]\right), \tag{13}$$

where $U_{ij}$ is the internal state of the neuron, created by modulating the feeding and linking channels.
The pulse generator field can be described as

$$Y_{ij}[n] = \begin{cases} 1, & U_{ij}[n] > \theta_{ij}[n], \\ 0, & \text{otherwise}, \end{cases} \qquad \theta_{ij}[n] = e^{-\alpha_{\theta}} \theta_{ij}[n-1] + V_{\theta} Y_{ij}[n], \tag{14}$$

where $Y_{ij}$ is the output for input $S_{ij}$ and $\theta_{ij}$ is the dynamic threshold of the neuron, which is compared with $U_{ij}$. It can be seen from (14) that if $U_{ij}$ is larger than $\theta_{ij}$, the output $Y_{ij}$ of the neuron at (i, j) is 1, in which case we say the neuron is fired. The time matrix T recording when each neuron first fires can be described as follows:

$$T_{ij}[n] = \begin{cases} n, & \text{if } Y_{ij}[n] = 1 \text{ for the first time}, \\ T_{ij}[n-1], & \text{otherwise}. \end{cases} \tag{15}$$

2.2. Proposed Method. The proposed fusion algorithm framework is depicted in Figure 6. The first step in the proposed method is to detect the interesting region, which contains the significant target areas, and then fuse the interesting regions of the IR and VI images. In our method, the MeanShift and the guided filter are used to perform this step. The background region is obtained by removing the interesting region from the source image. The background region is processed by a multiscale transform-based method. Firstly, the nonsubsampled contourlet transform (NSCT) is used to decompose the background region of the two source images into a low-frequency layer and a series of high-frequency layers for each image. Next, we use an improved weighted average method based on per-pixel weighted average and a pulse-coupled neural network (PCNN) to fuse the low-frequency and high-frequency layers, respectively.
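A minimal sketch of the PCNN recursion of Section 2.1.4, accumulating the time matrix T that the high-frequency fusion rule later compares. The parameter values and the threshold initialization here are illustrative defaults, not the paper's tuned settings:

```python
import numpy as np

def pcnn_time_matrix(S, beta=0.2, alpha_F=0.1, alpha_L=0.05, alpha_T=0.2,
                     V_F=0.5, V_L=0.02, V_T=40.0, n_iter=200):
    """Basic PCNN iteration; T[i, j] records the iteration at which the
    neuron at (i, j) first fires (0 = never fired)."""
    S = S.astype(np.float64)
    F = np.zeros_like(S); L = np.zeros_like(S); Y = np.zeros_like(S)
    theta = np.full_like(S, V_T)             # dynamic threshold, starts high
    T = np.zeros_like(S)
    W = np.array([[0.707, 1.0, 0.707],
                  [1.0,   0.0, 1.0],
                  [0.707, 1.0, 0.707]])      # 3x3 linking weights

    def neighbor_sum(Y):
        """Weighted sum of each neuron's 8 neighbors' previous outputs."""
        Yp = np.pad(Y, 1)
        out = np.zeros_like(Y)
        for di in range(3):
            for dj in range(3):
                out += W[di, dj] * Yp[di:di + Y.shape[0], dj:dj + Y.shape[1]]
        return out

    for n in range(1, n_iter + 1):
        link = neighbor_sum(Y)
        F = np.exp(-alpha_F) * F + V_F * link + S    # feeding channel
        L = np.exp(-alpha_L) * L + V_L * link        # linking channel
        U = F * (1.0 + beta * L)                     # internal state
        Y = (U > theta).astype(np.float64)           # pulse output
        T[(T == 0) & (Y == 1)] = n                   # first firing time
        theta = np.exp(-alpha_T) * theta + V_T * Y   # threshold decay / reset
    return T
```

Stronger stimuli drive the internal state above the decaying threshold sooner, so their entries in T are smaller; the linking term also lets a fired neighborhood pull nearby neurons into firing, which is the pulse-coupling behavior the fusion rule exploits.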

2.2.1. Low-Frequency Layer Fusion Rules.
In natural images, low-frequency information is the main component of an image; on the contrary, high-frequency information contains the details of the image [30]. It can be seen in Figure 6 that, compared with the images B1 and B2, the low-frequency layers L1 and L2 are the main components without the details. Most low-frequency layer fusion methods are weighted-averaging-based methods, which do not consider the membership relationship between pixels and only weight the independent pixel values. These methods cannot fully fuse the details of the low-frequency layers. In order to obtain a better fusion effect, we propose an improved weighted average method based on per-pixel weighted average, which can be described as follows:

$$C_{F}^{L}(i, j) = w(i, j)\, C_{A}^{L}(i, j) + \left(1 - w(i, j)\right) C_{B}^{L}(i, j), \tag{16}$$

where $C_{F}^{L}(i, j)$ denotes the final result of the low-frequency layer, $C_{A}^{L}(i, j)$ is the low-frequency layer of the background region in the source IR image A, and $C_{B}^{L}(i, j)$ is the low-frequency layer of the background region in the source VI image B. The per-pixel weight is computed by a Gaussian function,

$$w(i, j) = \exp\left(-\frac{\left(C_{B}^{L}(i, j) - \mu\right)^{2}}{2\tau\sigma^{2}}\right), \tag{17}$$

where μ and σ² are the mean and variance of the background region in the source VI image B and τ is the adjustment factor of the Gaussian function. The Gaussian function curve and an example are shown in Figure 7. In the proposed method, we set τ = 1. It can be seen in Figure 7(d) that, after the source image is processed by the per-pixel weighted average, only the low-frequency information of the source image is preserved; to some extent, it acts as a low-pass filter, similar to the NSCT low-frequency layer in Figure 4(b). Therefore, it is effective to process the low-frequency layer by the weighted average method based on per-pixel weighted average.
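A sketch of one plausible reading of this per-pixel rule, assuming the weight is a Gaussian of the VI low-frequency value's deviation from the VI background mean; the function name and this exact weight form are illustrative, not lifted from the paper:

```python
import numpy as np

def fuse_low_frequency(LA, LB, mu, sigma, tau=1.0):
    """Per-pixel weighted average of two low-frequency layers.
    LA: IR low-frequency layer; LB: VI low-frequency layer.
    The IR weight peaks where LB equals the VI background mean mu and
    falls off (Gaussian, width set by sigma and tau) elsewhere, so
    distinctive VI content is passed through largely unchanged."""
    w = np.exp(-((LB - mu) ** 2) / (2.0 * tau * sigma ** 2))
    return w * LA + (1.0 - w) * LB
```

Because the weight depends on each pixel's value rather than being a single global constant, the averaging adapts across the layer, which is the point of the "per-pixel" refinement over plain weighted averaging.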

2.2.2. High-Frequency Layer Fusion Rules.
From Figure 4, it can be seen that most details of information, texture, and edge are included in the high-frequency layers. For the high-frequency layers, PCNN is used in the proposed method. In the modulation field, the linking coefficient β is a key parameter whose value directly affects the weighting of the linking channel. We use the spatial frequency (SF) of the high-frequency layer as the linking coefficient β in our proposed method. In Section 2.1.4, we analyzed the PCNN model. The spatial frequency (SF) can reflect the overall definition level of an image; the SF of the source image is used to determine the linking strength β, which can be described as follows:

$$\beta = \mathrm{SF} = \sqrt{\mathrm{RF}^{2} + \mathrm{CF}^{2}}, \tag{18}$$

where RF is the spatial row frequency and CF is the spatial column frequency, which, for an M × N layer F, can be computed by

$$\mathrm{RF} = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=2}^{N} \left[F(i, j) - F(i, j-1)\right]^{2}}, \qquad \mathrm{CF} = \sqrt{\frac{1}{MN} \sum_{i=2}^{M} \sum_{j=1}^{N} \left[F(i, j) - F(i-1, j)\right]^{2}}. \tag{19}$$

The fused high-frequency layer $C_{F,ij}$ is determined by selecting, at each position, the coefficient whose neuron fires first:

$$C_{F,ij} = \begin{cases} C_{A,ij}, & T_{A,ij}(n) \le T_{B,ij}(n), \\ C_{B,ij}, & \text{otherwise}, \end{cases} \tag{20}$$

where $T_{A,ij}(n)$ and $T_{B,ij}(n)$ denote the time matrices of each neuron obtained by (15), and $C_{A,ij}$ and $C_{B,ij}$ are the high-frequency layers of the background regions in the source IR image A and VI image B.
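The spatial frequency computation can be sketched as follows; the NumPy first-difference version divides by the number of differences rather than by M·N, a normalization detail that does not change the ranking of layers:

```python
import numpy as np

def spatial_frequency(img):
    """Spatial frequency SF = sqrt(RF^2 + CF^2), where RF and CF are the
    root-mean-square horizontal and vertical first differences."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)
```

A flat layer scores 0 and a rapidly alternating one scores high, so feeding SF in as the linking strength β makes neurons over sharper, more detailed high-frequency content couple and fire earlier.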

2.3. Fusion Steps. The framework of the proposed method is shown in detail in Figure 6, and the concrete fusion steps are summarized as follows. Input: source IR image A and VI image B.
Step 1: detect the interesting region, which contains the salient infrared objects, in the IR image and the corresponding VI image by the MeanShift, to obtain the interesting regions and the background regions.
Step 2: for the interesting regions of the source images, fuse them by the guided filter method described in Section 2.1.2 to produce the fused interesting region.
Step 3: perform NSCT in the background regions and then obtain a low-frequency layer and a series of high-frequency layers for each source image.
Step 4: for the low-frequency layer, the improved weighted average method based on the per-pixel weighted average algorithm is used to produce the fused low-frequency layer, as shown in (16) and (17).
Step 5: for the high-frequency layers, the SF-PCNN-based method is used to produce the fused high-frequency layers, which is described in Section 2.2.2 in detail.
Step 6: the fused background region is produced by NSCT reconstruction.
Step 7: fuse the fused interesting region and the fused background region to produce the final fused image.

Experimental Results and Analysis
In order to illustrate the effectiveness of the proposed fusion algorithm, several groups of IR and VI image fusion experiments are described in detail in this section. These images are available at http://figshare.com/articles/TNO_Image_Fusion_Dataset/1008029. All simulations are conducted in MATLAB 2014a on an Intel(R) Core(TM) i5-6400 @ 2.7 GHz PC with 16 GB RAM. Firstly, the experimental parameter settings are introduced; then the fusion results are discussed and compared with those of other methods.

3.1. Experimental Introduction.
To show the improvement of the proposed method, the fusion results of "Jeep" by the proposed method and the method of [8] are shown in Figure 8.
In [8], the target region was directly fused into the final image to highlight the target in the infrared image. In this paper, we improve the accuracy of interesting region detection so that it contains both the highlighted targets and heat sources.
In addition, we integrate the interesting regions of VI and IR to enrich the visible information in the interesting region. As shown in Figures 8 and 9, the fusion result of the proposed method contains rich details while highlighting the target of the IR image.
The proposed method is compared with eight current fusion methods: the principal component analysis- (PCA-) based method [10], the discrete wavelet transform- (DWT-) based method [11], the PCNN-based method [15], the NSCT-based method [23], the Laplacian pyramid transform- (LP-) PCNN-based method [14], the NSCT-PCNN-based method [17], the IFM-based method [31], and the MWGF-based method [32]. In all PCNN-based experiments, after extensive verification and comparison, the parameters of PCNN are set as $\alpha_{\theta} = 0.2$, $\alpha_{L} = 0.05$, $V_{L} = 0.02$, $V_{\theta} = 40$, $N = 200$, and

$$M = W = \begin{bmatrix} 0.707 & 1 & 0.707 \\ 1 & 0 & 1 \\ 0.707 & 1 & 0.707 \end{bmatrix},$$

where N is the number of iterations of PCNN. For all NSCT-based methods, "9-7" and "pkva" are set as the pyramid and direction filters, respectively. For all multiscale decomposition methods, the decomposition level is set to 3, "averaging" is used to fuse the low-frequency layer, and the high-frequency layers are fused by "absolute maximum choosing." In order to objectively evaluate the fusion results of the different methods, the three most commonly used objective indicators are used as evaluation indices: mutual information (MI), visual information fidelity (VIF), and the edge-preservation metric $Q^{AB/F}$. MI is used to measure the amount of the source images' information retained in the fused image. VIF is an evaluation index for the human visual system, which is based on natural scenes and image distortion [33]. $Q^{AB/F}$ is used to measure the edge information, based on edge strength and orientation preservation, transferred from the source images. Commonly, greater values of these evaluation metrics indicate that the fused image has a better quality [34].
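Of the three indicators, MI is straightforward to compute from a joint histogram. The sketch below uses an illustrative bin count; note that the fusion literature typically reports the sum MI(A, F) + MI(B, F) over both source images:

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Mutual information (in bits) between two images, estimated from
    their joint gray-level histogram."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)     # marginal of a
    py = pxy.sum(axis=0, keepdims=True)     # marginal of b
    nz = pxy > 0                            # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```

A fused image that preserves more of a source's gray-level structure yields a more concentrated joint histogram with that source and hence a higher MI score.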

3.2. Fusion Results and Discussions.
The experimental images consist of six pairs of IR and VI images, which are shown in Figure 10. The first row in Figure 10 shows the VI images, and the second row shows the IR images. A large number of details and textures are included in the VI images, while the IR images contain only the significant information.
The fusion results obtained by the different fusion algorithms on "Sand path" are given in Figure 11. From Figure 11, we can see that the fused image produced by the proposed method contains more detail information from the VI image, as well as the highlighted infrared target information, compared with the other methods. In addition, the fused image produced by our method has advantages in visual effect, and it is also superior to the other algorithms in the objective evaluation, which is shown in Table 1.
In order to illustrate the applicability of the proposed method, other groups of experiments are performed, which are given as follows.
It can be seen from Figures 12 and 13 that the proposed method has more advantages in detail information integration. In order to better reflect the differences among the fused images obtained by the different fusion methods, Figure 13 shows details of Figure 12 at an enlarged scale. In Figure 13(i), the red frame region is more suitable for the human visual system, with more visible detail information. Compared with the same position in the two source images, the fused region has higher readability and reliability. Figures 14 and 15 show the third and fourth groups of experiments, on "UN Camp" and "Trees," respectively. The objective evaluation metrics are given in Table 1. In order to reflect their differences more directly, a line chart comparison of the MI, VIF, and $Q^{AB/F}$ values of the experiments is given in Figure 18.
All fusion results by the proposed method and the method of [8] are shown in Figure 18. The first line in Figure 18 shows the results of [8], and the second line shows the results of the proposed method. The objective evaluation metrics are given in Table 1.

Figure 1: Interesting region detection with different bandwidths.
Figure 11: Fusion results for "Sand path": (a) the fused image by PCA, (b) the fused image by DWT, (c) the fused image by PCNN, (d) the fused image by NSCT, (e) the fused image by LP-NSCT, (f) the fused image by PCNN-NSCT, (g) the fused image by IFM, (h) the fused image by MWGF, and (i) the fused image by the proposed method.
Figure 8: (a) Visible image "Jeep", (b) infrared image "Jeep", (c) fused "Jeep" by [8], and (d) fused "Jeep" by the proposed method.

Figure 16 and Figure 17 show the experimental results for "Jeep" and "Kaptein." The objective evaluation metrics are given in Table 1. A line chart comparison of the MI, VIF, and $Q^{AB/F}$ values of the experiments is given in Figure 19. It can be seen that the fusion results of the proposed method have a better visual effect; compared with the same position in the two source images, the fused region has higher readability and reliability.

Table 1: Objective results for the various fusion methods.