Image Fusion Based on Nonsubsampled Contourlet Transform and Saliency-Motivated Pulse Coupled Neural Networks

In the nonsubsampled contourlet transform (NSCT) domain, a novel image fusion algorithm based on the visual attention model and pulse coupled neural networks (PCNNs) is proposed. For the fusion of high-pass subbands in the NSCT domain, a saliency-motivated PCNN model is proposed. The main idea is that high-pass subband coefficients are combined with their visual saliency maps as input to motivate the PCNN. Coefficients with large firing times are employed as the fused high-pass subband coefficients. Low-pass subband coefficients are merged by a weighted fusion rule based on the firing times of the PCNN. The fused image contains abundant detail from the source images and effectively preserves the salient structure while enhancing the image contrast. The algorithm preserves the completeness and sharpness of object regions. The fused image is more natural and satisfies the requirements of the human visual system (HVS). Experiments demonstrate that the proposed algorithm performs better than the pyramid-, wavelet-, and simple NSCT-based fusion methods it is compared with.


Introduction
Owing to the tremendous growth in the application of image sensors, image fusion has great potential and has been used successfully in many fields, such as remote sensing, medical imaging, defense surveillance, and computer vision [1][2][3]. The aim of image fusion is to combine several source images (obtained from different sensors and viewpoints) into a single fused image that contains all the important content of the source images and conveys more information about the scene. In practical applications, a directly acquired image often fails to meet requirements because of many factors, for example, sensor limitations, varying illumination, occlusions, and viewing angles. Image fusion addresses these problems effectively by exploiting information from multiple sources to produce a fused result suited to the perception system.
According to the processing level, image fusion approaches are generally classified into three types: pixel level, feature level, and decision level [4]. According to whether the fusion method relies on multiscale transform (MST) tools, they can also be divided into two main classes [5]: MST-based and non-MST-based approaches. A variety of MST tools have been developed for image fusion. The earliest and most popular MST tools are the pyramid [6,7] and wavelet [8,9] transforms. They are constructed directly by combining two 1D transforms, so they are not true 2D transforms. To improve the accuracy of decomposition and reconstruction, more advanced MST tools have been proposed, such as ridgelets [10], contourlets [11,12], curvelets [13,14], and the NSCT [15]. These are true 2D geometric MST tools, which can decompose and reconstruct image signals perfectly and satisfy the requirements of image fusion. Among non-MST-based methods, Piella [16] performs image fusion with a variational model; the fused result contains the geometric structure of all the inputs and enhances the contrast for visualization. Ludusan and Lavialle [17] propose a variational approach based on error estimation theory and partial differential equations for concurrent fusion and denoising of multifocus images.
The strategy for combining the decomposed coefficients is another key step in MST-based fusion approaches. Fusion strategies can mainly be divided into three categories: pixel-based, window-based, and area-based [18]. The simplest pixel-based fusion rule selects each fused coefficient using a single pixel, so it is easily influenced by noise. Window-based and area-based fusion rules take advantage of the local characteristics of neighboring pixels and are therefore superior to pixel-based rules [19].
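The difference between pixel-based and window-based rules can be illustrated with a small sketch. It is not taken from the cited works: the function names, the mean-absolute-value activity measure, and the 3 x 3 window are assumptions chosen only for illustration.

```python
import numpy as np

def pixel_activity(coeffs):
    """Pixel-based activity: absolute value of each coefficient on its own."""
    return np.abs(coeffs).astype(float)

def window_activity(coeffs, radius=1):
    """Window-based activity: mean absolute value over a (2r+1) x (2r+1) window."""
    mag = np.abs(coeffs).astype(float)
    padded = np.pad(mag, radius, mode="edge")  # replicate borders
    h, w = mag.shape
    size = 2 * radius + 1
    total = np.zeros_like(mag)
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + h, dx:dx + w]
    return total / size**2

def select(c_a, c_b, activity):
    """Keep, at each position, the coefficient whose activity is larger."""
    return np.where(activity(c_a) >= activity(c_b), c_a, c_b)

# Toy subbands: A holds one isolated spike (noise-like), B holds a vertical edge.
c_a = np.zeros((5, 5))
c_a[2, 2] = 1.0
c_b = np.zeros((5, 5))
c_b[:, 2] = 0.6

pixel_fused = select(c_a, c_b, pixel_activity)    # keeps the spike at (2, 2)
window_fused = select(c_a, c_b, window_activity)  # keeps the edge at (2, 2)
```

In this toy case the pixel-based rule lets the isolated spike win, while the window-based rule prefers the coefficient that is supported by its neighbors.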
Existing image fusion approaches do not take full account of the characteristics of the HVS, which tends to focus on the most salient regions in a scene. According to the visual perception mechanism, the fused image should improve the quality of object areas in a scene. The goal of the proposed algorithm is to preserve the completeness, saliency, and sharpness of object areas and to satisfy the requirements of the HVS. Consequently, the paper proposes a novel image fusion algorithm based on the NSCT and a saliency-motivated PCNN. The visual saliency model and the PCNN are two important tools in image processing. The former is inspired by the behavior and the neuronal architecture of the early primate visual system; the latter is a visual cortex-inspired neural network characterized by the global coupling and pulse synchronization of neurons. The saliency map produced by the visual saliency model is used as input to motivate the PCNN, and the resulting fusion rule preserves the salient objects of the source images, so that the fused image contains more abundant content.
The rest of the paper is organized as follows. Section 2 reviews basic NSCT theory in brief. Section 3 presents the proposed image fusion algorithm in detail. Section 4 demonstrates and discusses the experimental results. Section 5 concludes.

Nonsubsampled Contourlet Transform
In this section, we briefly review the theory and properties of NSCT, which will be used in the rest of this paper (see [15] for details).
The NSCT is an overcomplete transform and a shift-invariant version of the contourlet transform. It has several excellent properties for image decomposition, including shift invariance, multiscale analysis, and multidirectionality. The NSCT is used here as the MST tool to provide a better representation of contours and to overcome pseudo-Gibbs phenomena. Its main components are a nonsubsampled pyramid filter bank (NSPFB) structure for multiscale decomposition and a nonsubsampled directional filter bank (NSDFB) structure for directional decomposition. The NSCT is displayed in Figure 1.
The multiscale property of the NSCT is achieved by using two-channel nonsubsampled 2-D filter banks (NSFBs), which form the NSPFB. The filters for the next level are obtained by upsampling the filters of the previous level, so the multiscale property is obtained without additional filter design. We assume that the NSPFB decomposition has $J$ levels. At the first level, input images are decomposed by the low-pass filter $H_0(z)$ and the corresponding high-pass filter $H_1(z)$, respectively. The ideal passband support of the low-pass filter at the $j$th level is the region $[-(\pi/2^j), (\pi/2^j)]^2$. The ideal support of the equivalent high-pass filter is the complement of the low-pass filter, that is, the region $[-(\pi/2^{j-1}), (\pi/2^{j-1})]^2 \setminus [-(\pi/2^j), (\pi/2^j)]^2$.
The NSDFB, a shift-invariant version of the directional filter bank (DFB), is obtained by eliminating the downsamplers and upsamplers in the DFB. To achieve multidirectional decomposition, the NSDFB is applied iteratively. All filter banks in the NSDFB tree structure are obtained from a single NSFB with fan filters. Each filter bank in the NSDFB tree has the same computational complexity as the building-block NSFB. Figure 1 shows the NSCT, which is constructed by combining the NSPFB and the NSDFB. The two-channel NSFBs in the NSPFB and the NSDFB satisfy the Bezout identity and are invertible, so the NSCT is invertible. The key to the NSCT is the filter design problem for the NSPFB and NSDFB: the aim is to design filters that satisfy the Bezout identity and have other useful properties. In addition, for a fast implementation, the mapping approach is used to transform the filters into a ladder or lifting structure. More details can be found in [15].
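The idea of obtaining each level's filter by upsampling the previous one, with no downsampling of the image, can be sketched as follows. This is a simplified à trous-style pyramid and not the NSPFB of [15]: the 5-tap binomial low-pass filter, the periodic boundary handling, and the function names are assumptions, and the high-pass band is taken as a plain difference instead of the output of a designed filter $H_1(z)$.

```python
import numpy as np

def atrous_kernel(base, level):
    """Upsample a 1D filter by inserting 2**level - 1 zeros between its taps."""
    step = 2 ** level
    up = np.zeros((len(base) - 1) * step + 1)
    up[::step] = base
    return up

def smooth_separable(img, kernel):
    """Filter along rows and then columns with periodic boundaries."""
    out = np.asarray(img, dtype=float)
    half = len(kernel) // 2
    for axis in (0, 1):
        acc = np.zeros_like(out)
        for k, w in enumerate(kernel):
            if w != 0.0:
                acc += w * np.roll(out, k - half, axis=axis)
        out = acc
    return out

def nonsubsampled_pyramid(img, levels=3):
    """Return the coarsest low-pass band and one full-size high-pass band per level."""
    base = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed low-pass prototype
    low = np.asarray(img, dtype=float)
    highs = []
    for j in range(levels):
        next_low = smooth_separable(low, atrous_kernel(base, j))
        highs.append(low - next_low)  # detail removed at this level
        low = next_low
    return low, highs

def reconstruct(low, highs):
    """Invert the decomposition by summing all bands."""
    return low + sum(highs)
```

Because nothing is downsampled, every band has the size of the input, and a circular shift of the input shifts every band by the same amount, which is the shift-invariance property the section refers to.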

The Proposed Algorithm
In this section, the proposed image fusion algorithm based on NSCT and saliency-motivated PCNN is presented in detail. The main idea is that the visual saliency map is first built on the high-pass subband coefficients of the NSCT using the visual attention model (the phase spectrum of Fourier transform (PFT) model presented in Section 3.1) and is then combined with the source high-pass subband coefficients as input to motivate the PCNN. Coefficients with large firing times are employed as the fused high-pass subband coefficients. Low-pass subband coefficients are merged by a weighted fusion rule based on the firing times of the PCNN. A PCNN is built in each subband to simulate the biological activity of the HVS. The fused image has a more natural visual appearance and satisfies the requirements of the HVS. The framework of the proposed algorithm is shown in Figure 2. For clarity of presentation, we assume that two registered source images are combined.
The algorithm first decomposes source images into the low-pass subband and high-pass directional subband coefficients by the NSCT. The coarsest subband contains the main energy from source images and denotes the abundant structural information. Therefore, an adaptive weighted average fusion rule based on the firing times of PCNN is developed to merge the low-pass subband. High-pass directional subbands contain the abundant detail contents of source images, so we create a maximum selection fusion principle based on saliency-motivated PCNN for selecting the fused coefficients. The final fused image is reconstructed by applying the inverse NSCT on the merged coefficients.

Images Decomposition and Saliency-Motivated PCNN.
The decomposition of source images employs the NSCT presented in Section 2. Input images $A$ and $B$ are decomposed into subbands of different scales and directions using the NSCT. The subbands $\{C_0^A(x,y), C_{j,l}^A(x,y)\}$ and $\{C_0^B(x,y), C_{j,l}^B(x,y)\}$ are obtained, where $C_0(x,y)$ denotes the low-pass subband coefficients of the input images at the coarsest scale and $C_{j,l}(x,y)$ denotes the high-pass directional subband coefficients at the $j$th scale and in the $l$th direction.
Next, the proposed saliency-motivated PCNN model is discussed. Eckhorn developed a biological neural network, called the PCNN, based on experimental observations of synchronous pulse bursts in cat and monkey visual cortices [20]. The PCNN is a feedback network, and each PCNN neuron consists of three parts: receptive field, modulation field, and pulse generator [21]. In image processing, the PCNN is a single-layer, two-dimensional connected neural network [22,23], as shown in Figure 3.
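In this paper, let $C_{j,l}(x,y)$ denote the coefficient located at $(x,y)$ in the $j$th scale and the $l$th direction. $C_{j,l}(x,y)$ in each subband is input to the PCNN to motivate the neurons and generate neuron pulses with (1). The firing times $T^{j,l}_{x,y}$ are then computed as in (2):

$$
\begin{aligned}
F^{j,l}_{x,y}[n] &= C_{j,l}(x,y), \\
L^{j,l}_{x,y}[n] &= e^{-\alpha_L} L^{j,l}_{x,y}[n-1] + V_L \sum_{p,q} W_{p,q}\, Y^{j,l}_{x+p,\,y+q}[n-1], \\
U^{j,l}_{x,y}[n] &= F^{j,l}_{x,y}[n]\,\bigl(1 + \beta L^{j,l}_{x,y}[n]\bigr), \\
\theta^{j,l}_{x,y}[n] &= e^{-\alpha_\theta}\, \theta^{j,l}_{x,y}[n-1] + V_\theta\, Y^{j,l}_{x,y}[n-1], \\
Y^{j,l}_{x,y}[n] &= \begin{cases} 1, & U^{j,l}_{x,y}[n] > \theta^{j,l}_{x,y}[n], \\ 0, & \text{otherwise}, \end{cases}
\end{aligned} \tag{1}
$$

$$
T^{j,l}_{x,y}[n] = T^{j,l}_{x,y}[n-1] + Y^{j,l}_{x,y}[n]. \tag{2}
$$

In (1), the coefficient $C_{j,l}(x,y)$ is assigned to the feeding input $F^{j,l}_{x,y}$. The linking input $L^{j,l}_{x,y}$ is equal to the weighted sum of neuron firings in the linking range, where $\alpha_L$ is the decay constant and $V_L$ is the amplitude gain. $W_{p,q}$ is the weighting coefficient ($p$ and $q$ range over the linking range in the PCNN). The internal state signal $U^{j,l}_{x,y}$ is obtained by modulating $F^{j,l}_{x,y}$ with $L^{j,l}_{x,y}$, where $\beta$ is the linking strength. $\theta^{j,l}_{x,y}$ is the threshold, where $\alpha_\theta$ and $V_\theta$ are the decay constant and the amplitude gain, respectively. $n$ denotes the iteration number. If $Y^{j,l}_{x,y} = 1$, the neuron generates a pulse, called one firing. If $Y^{j,l}_{x,y} = 0$, the neuron does not generate a pulse. In applications, $T^{j,l}_{x,y}[n]$ defined in (2) is often used to indicate the total firing times over $n$ iterations. The firing times are employed to represent image information. The saliency maps $S^A_{j,l}(x,y)$ and $S^B_{j,l}(x,y)$ are computed on the high-pass directional subbands $C^A_{j,l}(x,y)$ and $C^B_{j,l}(x,y)$ at the $j$th scale and $l$th direction. The saliency maps are used as importance indicators of the coefficients in order to preserve the important information of the source images.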
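The PCNN iteration described above (feeding input, linking input, internal state, decaying threshold, pulse, and accumulated firing times) can be sketched as follows. The parameter values, the 3 x 3 linking weights, the initial threshold of 1, and the zero padding at the borders are all assumptions for illustration; the source text does not list the values it used.

```python
import numpy as np

# Assumed 3x3 linking weights (inverse-distance style); not given in the source.
LINK_W = np.array([[0.707, 1.0, 0.707],
                   [1.0,   0.0, 1.0],
                   [0.707, 1.0, 0.707]])

def neighbor_sum(pulses, weights):
    """Weighted sum of neighboring pulses, with zeros outside the image."""
    r = weights.shape[0] // 2
    padded = np.pad(pulses, r, mode="constant")
    h, w = pulses.shape
    out = np.zeros((h, w))
    for p in range(weights.shape[0]):
        for q in range(weights.shape[1]):
            out += weights[p, q] * padded[p:p + h, q:q + w]
    return out

def pcnn_firing_times(stimulus, n_iter=200, alpha_l=0.06931, alpha_theta=0.2,
                      beta=0.2, v_l=1.0, v_theta=20.0, weights=LINK_W):
    """Run the PCNN for n_iter iterations and return the firing count per neuron."""
    feed = np.asarray(stimulus, dtype=float)  # feeding input: the stimulus itself
    link = np.zeros_like(feed)                # linking input
    theta = np.ones_like(feed)                # threshold (assumed initial value)
    pulse = np.zeros_like(feed)               # pulse output of the previous step
    times = np.zeros_like(feed)               # accumulated firing times
    for _ in range(n_iter):
        link = np.exp(-alpha_l) * link + v_l * neighbor_sum(pulse, weights)
        internal = feed * (1.0 + beta * link)
        theta = np.exp(-alpha_theta) * theta + v_theta * pulse
        pulse = (internal > theta).astype(float)
        times += pulse
    return times
```

With these assumed settings a neuron with a larger non-negative stimulus reaches its decaying threshold sooner and therefore fires more often, which is why the firing count can serve as an activity measure. Signed high-pass coefficients would typically be passed as magnitudes, which is again an assumption.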
The phase spectrum of Fourier transform (PFT) model proposed in [24] is employed as a saliency detection model for grayscale images. The PFT work showed that the saliency map can be computed easily from the phase spectrum of an image's Fourier transform when its amplitude spectrum is set to a nonzero constant value. Only the phase spectrum is used to reconstruct an image, which reflects the saliency information of the source image. The implementation of the PFT model consists of three steps. An image is first transformed into the frequency domain using the Fourier transform, and the amplitude and phase spectra are then obtained. Finally, the saliency map is obtained by applying the inverse Fourier transform to the phase spectrum only. Given an input image $I(x,y)$, the three steps have the corresponding equations as follows:

$$ f(x,y) = \mathcal{F}\bigl(I(x,y)\bigr), \tag{3} $$

$$ p(x,y) = P\bigl(f(x,y)\bigr), \tag{4} $$

$$ S(x,y) = g(x,y) * \bigl\| \mathcal{F}^{-1}\bigl[e^{\,i\,p(x,y)}\bigr] \bigr\|^2, \tag{5} $$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the Fourier transform and the inverse Fourier transform, $P(f)$ is the phase spectrum of $f$, and $g$ is a 2D Gaussian filter. The saliency value at location $(x,y)$ is computed using (5).
PFT model is a simple and efficient saliency detection method. An example of the PFT saliency detection is shown in Figure 4.
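The three PFT steps can be sketched in a few lines of NumPy. The Gaussian width `sigma`, the kernel truncation at three standard deviations, and the zero-padded separable smoothing are assumptions; the source does not state the filter parameters. The input is assumed to be at least as large as the Gaussian kernel in both dimensions.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian truncated at about three standard deviations."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def pft_saliency(image, sigma=3.0):
    """Phase-spectrum saliency map of a 2D grayscale array."""
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))  # step 1: Fourier transform
    phase = np.angle(spectrum)                              # step 2: keep the phase only
    recon = np.fft.ifft2(np.exp(1j * phase))                # step 3: unit-amplitude inverse
    saliency = np.abs(recon) ** 2
    kernel = gaussian_kernel_1d(sigma)
    for axis in (0, 1):                                     # separable Gaussian smoothing
        saliency = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, saliency)
    return saliency

# Usage: a small bright patch on a flat background should dominate the map.
demo = np.zeros((64, 64))
demo[20:25, 40:45] = 1.0
demo_saliency = pft_saliency(demo)
peak_row, peak_col = np.unravel_index(np.argmax(demo_saliency), demo_saliency.shape)
```

In the usage example the maximum of the map falls near the patch, since discarding the amplitude spectrum concentrates the reconstructed energy around its edges.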

Subband Coefficients Fusion.
The high-pass subbands of the NSCT decomposition contain abundant detail and indicate the salient components of images, for example, lines, edges, and contours. In order to preserve these salient components during image fusion, we propose a fusion rule based on the saliency-motivated PCNN for the high-pass subbands. According to the visual attention mechanism, different regions in an image have varying importance for the HVS, so saliency detection is performed on the source images to yield saliency maps that indicate the significance level of every pixel. Accordingly, the PFT model is applied to the high-pass subbands to produce saliency maps, which indicate the importance level of the coefficients. The obtained saliency maps are then combined with the corresponding high-pass subband coefficients as the input to motivate the PCNN. Coefficients with large firing times are selected as the fused coefficients. In addition, the low-pass subband of the NSCT decomposition at the coarsest scale contains the main energy of the source images and carries abundant structural information. The low-pass subband is fused with a weighted rule based on the firing times of the PCNN.
The activity maps of the high-pass subbands, which serve as the criterion for selecting coefficients, are given by the firing maps of the saliency-motivated PCNN. The activity level indicates the magnitude of the coefficients. Coefficients of greater energy carry more important information, so the coefficients with the greater activity level are selected as the fused coefficients. Now, according to (7) and (2), the fused coefficients at location $(x,y)$ of the high-pass subbands, denoted by $F_{j,l}(x,y)$, are defined as follows:

$$
F_{j,l}(x,y) = \begin{cases} C^A_{j,l}(x,y), & T^A_{j,l}(x,y) \ge T^B_{j,l}(x,y), \\ C^B_{j,l}(x,y), & \text{otherwise}. \end{cases}
$$

The fused coefficients of the low-pass subband, denoted by $F_0(x,y)$, employ a weighted fusion rule based on the firing times of the PCNN on the coefficients $C^A_0(x,y)$ and $C^B_0(x,y)$, defined as follows:

$$
F_0(x,y) = \omega\, C^A_0(x,y) + (1 - \omega)\, C^B_0(x,y),
$$

where $\omega$ is the weight of the coefficients, determined from the firing times $T^A_0$ and $T^B_0$, and $T_0$ is computed by (1) and (2). Because the low-pass subband at the coarsest scale has no direction, the subscript $j,l$ in (1) and (2) is changed to 0. Specifically, $C_{j,l}(x,y)$ in (1) is replaced by $C^{A/B}_0(x,y)$, and $T_{j,l}$ in (2) is replaced by $T_0$.
Finally, the inverse NSCT is applied to the fused coefficients $\{F_0(x,y), F_{j,l}(x,y)\}$ to obtain the fused image $F$. The proposed image fusion approach is summarized in Algorithm 1.
Algorithm 1: Image fusion method with saliency-motivated PCNN.
Step 0: Given source images $A$ and $B$.
Step 1: Perform decomposition on source images $A$ and $B$ using NSCT to obtain the high-pass directional subband coefficients and the low-pass subband coefficients.
Step 2: Fuse the high-pass directional subband coefficients: compute the saliency maps with the PFT model, combine them with the coefficients as input to motivate the PCNN, and select the coefficients with the larger firing times.
Step 3: Fuse the low-pass subband coefficients with the weighted rule based on the firing times of the PCNN.
Step 4: Construct the fused image by applying the inverse NSCT to the fused subband coefficients.
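The two fusion rules can be sketched as array operations once the firing-time maps are available. Two details are assumptions and not taken from the source: ties in the high-pass rule go to image A, and the low-pass weight is the firing count of A normalized by the sum of both counts (falling back to 0.5 where neither neuron fired). The function and argument names are hypothetical.

```python
import numpy as np

def fuse_highpass(c_a, c_b, t_a, t_b):
    """Maximum selection: keep the coefficient whose neuron fired more often."""
    return np.where(t_a >= t_b, c_a, c_b)

def fuse_lowpass(c0_a, c0_b, t0_a, t0_b):
    """Weighted average with an assumed weight t0_a / (t0_a + t0_b)."""
    t0_a = np.asarray(t0_a, dtype=float)
    t0_b = np.asarray(t0_b, dtype=float)
    total = t0_a + t0_b
    # Where neither neuron fired, fall back to a plain average (weight 0.5).
    weight = np.divide(t0_a, total, out=np.full_like(total, 0.5), where=total > 0)
    return weight * c0_a + (1.0 - weight) * c0_b

# Usage on toy one-row subbands.
high_fused = fuse_highpass(np.array([[1.0, -2.0]]), np.array([[3.0, 4.0]]),
                           np.array([[5.0, 1.0]]), np.array([[2.0, 6.0]]))
low_fused = fuse_lowpass(np.array([[10.0, 10.0]]), np.array([[20.0, 20.0]]),
                         np.array([[3.0, 0.0]]), np.array([[1.0, 0.0]]))
```

The selection rule copies coefficients unchanged from one source, which keeps fine detail sharp, while the averaging rule blends the coarse structure of both sources in proportion to their activity.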

Experiments and Analysis
In this section, the proposed image fusion algorithm based on NSCT and saliency-motivated PCNN (named NSCT-SPCNN) is tested on several sets of images. The goal of the tests is to validate whether the proposed algorithm can be used in real applications and varying environments. For comparison, besides the fusion scheme proposed in this paper, three other fusion algorithms, based on the Laplacian pyramid transform (LPT), the discrete wavelet transform (DWT), and the NSCT (NSCT-simple), are used to fuse the same images. All three use averaging and absolute maximum selection for merging the low- and high-pass subband coefficients, respectively. The decomposition level of all of the transforms is three. Extensive experiments with multifocus image fusion and multisensor image fusion have been performed. Here, three groups of images were tested to evaluate the performance of the proposed algorithm: a set of multifocus images, a set of multimodal medical images, and a set of artificial out-of-focus images. It is assumed that the source images have been registered. The fused results were evaluated using subjective visual inspection and objective assessment tools.
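The baseline rule shared by the three comparison methods, averaging for the low-pass band and absolute maximum selection for the high-pass bands, can be sketched as follows. The function name and the list-of-arrays layout of the high-pass bands are assumptions; any transform that returns same-sized coefficient arrays for both sources would fit.

```python
import numpy as np

def fuse_baseline(low_a, low_b, highs_a, highs_b):
    """Average the low-pass bands; keep the larger-magnitude high-pass coefficient."""
    low_f = 0.5 * (np.asarray(low_a, dtype=float) + np.asarray(low_b, dtype=float))
    highs_f = [np.where(np.abs(h_a) >= np.abs(h_b), h_a, h_b)
               for h_a, h_b in zip(highs_a, highs_b)]
    return low_f, highs_f

# Usage on toy coefficients with a single high-pass band.
base_low, base_highs = fuse_baseline(
    np.array([[2.0]]), np.array([[4.0]]),
    [np.array([[-5.0, 1.0]])], [np.array([[3.0, -2.0]])])
```

Unlike a firing-time criterion, this rule looks only at the magnitude of each individual coefficient.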

Visual Analysis.
The first experiment uses two multifocus source images and four fused images produced by the LPT, DWT, NSCT-simple, and NSCT-SPCNN methods, shown in Figure 5. Figure 5(a) is focused on the right region, and Figure 5(b) is focused on the left region. The fused images contain all the in-focus regions of the source images and effectively extend the depth of field of the scene. In Figure 5(f), the saliency value associated with the coefficients, used as the input to motivate the PCNN, is employed to compute the activity level of the coefficients. In this way, the algorithm ensures that the activity level of salient pixels is higher, so that the fused image preserves the salient regions of the source images. The images in Figures 5(c)-5(e) are not clear enough and have lower contrast; artifacts were also introduced. The differences among the fused images are very slight, so it is difficult to evaluate the image quality by direct visual inspection. To observe the image quality in more detail, one area in the fused images was magnified. Figure 5(a) can be seen. This further demonstrates that the NSCT-SPCNN-based method has higher fusion performance. Figure 8 shows a group of multimodal medical images and the images fused using the four different fusion algorithms. A set of spatially out-of-focus images is shown in Figure 9.
Mathematical Problems in Engineering

Objective Analysis.
In the previous discussion, the fusion results of the different algorithms were analyzed visually. However, the performance of fusion algorithms needs to be further evaluated using objective metrics. A successful fusion technique has to satisfy many conditions, such as preserving the important features of the source images, enhancing contrast, and avoiding artifacts. Mutual information (MI) [25] and an objective image fusion performance measure ($Q^{AB/F}$) [26] are employed to evaluate the fusion performance of the different methods quantitatively. MI indicates how much of the input information the fused image contains.
$Q^{AB/F}$ reflects the preservation of input edge information in the fused image. For both metrics, higher values indicate better fusion results. Figure 10 shows the quality measurement results for the fused images in Figures 5, 8, and 9. From Figure 10, we can see that the LPT and DWT methods perform worst, which is consistent with the subjective visual analysis. Compared with the other fusion algorithms, NSCT-SPCNN yields the best performance. The experimental results demonstrate that the proposed NSCT-SPCNN algorithm preserves the salient regions of the source images and improves the quality of the fused image.
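A histogram-based estimate of the MI metric can be sketched as follows, assuming the common definition in which the score of a fused image is the sum of its mutual information with each source. The 256-bin joint histogram over the range [0, 256), the base-2 logarithm, and the function names are assumptions suited to 8-bit grayscale images; whether [25] uses exactly this form is not confirmed by the text.

```python
import numpy as np

def mutual_information(x, y, bins=256):
    """Mutual information (in bits) between two equally sized grayscale images."""
    hist, _, _ = np.histogram2d(np.ravel(x), np.ravel(y), bins=bins,
                                range=[[0, 256], [0, 256]])
    pxy = hist / hist.sum()               # joint distribution
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = pxy > 0
    ratio = pxy[nonzero] / (px @ py)[nonzero]
    return float(np.sum(pxy[nonzero] * np.log2(ratio)))

def fusion_mi(src_a, src_b, fused, bins=256):
    """Assumed fusion score: MI(A, F) + MI(B, F)."""
    return mutual_information(src_a, fused, bins) + mutual_information(src_b, fused, bins)

# Usage: an image with 256 equally likely gray levels shares 8 bits with itself
# and nothing with a constant image.
ramp = np.arange(256).repeat(4).reshape(32, 32)
flat = np.zeros((32, 32))
score = fusion_mi(ramp, flat, ramp)
```

Histogram estimates of this kind are biased upward for small images, so scores are best compared only between fused results of the same size and bin count.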
Finally, the computational performance of the proposed NSCT-SPCNN algorithm is tested on the three sets of images (Figures 5, 8, and 9). The hardware setup is an Intel Core i5-3479 PC with 4 GB of RAM. Our Matlab implementation takes about 16 seconds for Figures 5 and 8 and 73 seconds for Figure 9. Meanwhile, the NSCT-simple-based fusion method takes about 17 seconds for Figures 5 and 8 and 72 seconds for Figure 9. The LPT- and DWT-based fusion methods take less than 1 second. Because the two NSCT-based methods have nearly identical running times, the computational bottleneck lies in the NSCT itself and not in the PCNN-based fusion rule. Therefore, a more efficient MST tool needs to be applied in the future.

Conclusion
The paper proposes a novel image fusion algorithm based on NSCT and saliency-motivated PCNN. In fusion for high-pass subbands, a saliency-motivated PCNN model is proposed.
The key idea is that, following the human visual attention model, the visual saliency map is first built on the high-pass subband coefficients of the NSCT, and the algorithm then combines the visual saliency map with the NSCT coefficients as input to motivate the PCNN. Coefficients with large firing times are employed as the fused high-pass subband coefficients. Low-pass subband coefficients are merged by a weighted fusion rule based on the firing times of the PCNN. The algorithm preserves the completeness and sharpness of object regions. The fused image is more natural and satisfies the requirements of the HVS. Experiments illustrate that the proposed fusion algorithm improves the quality of the fused images in terms of both MI and $Q^{AB/F}$.