Infrared Target Detection and Location for Visual Surveillance Using Fusion Scheme of Visible and Infrared Images

The main goal of image fusion is to combine substantial information from different images of the same scene into a single image that is suitable for human and machine perception or for further image-processing tasks. In this study, a simple and efficient image fusion approach based on the histogram of infrared images is proposed. A fusion scheme that adaptively selects weighted coefficients to preserve salient infrared targets from the infrared image and to retain most of the spatial detail from the visible image is presented. Moving and static infrared targets in the fused image are labeled with different colors. This technique enhances the perception of the image by the human visual system. In view of the modalities of infrared images, namely, low resolution and low signal-to-noise ratio, an anisotropic diffusion equation model is adopted before the fusion stage to remove noise while effectively preserving edge information. By using the proposed method, relevant spatial information is preserved and infrared targets are clearly identified in the resulting fused images.


Introduction
With the rapid improvement of sensor technology, many surveillance systems have been developed in recent years. Infrared sensors, which are useful tools because of their individual modalities, have been employed in fields that include military surveillance, medical imaging, and machine vision. Infrared sensors can detect relative differences in the amount of thermal energy reflected from objects in the scene. They are thus more effective than visible cameras under poor lighting conditions. Current studies have focused on object detection and tracking from infrared images, and infrared sensor-based methods have shown good performance [1][2][3][4]. However, sensors often differ in modalities owing to physical and technological limitations. The image data acquired using different sensors exhibit diverse modalities, such as degradation and thermal and visual characteristics. By combining two or more different sensor data sets, a surveillance system can perform better than a system that uses only a single sensor. This technology is called image fusion. Image fusion is defined as the process of combining substantial information from several sensors using mathematical techniques to create a single composite image that is highly comprehensive and thus extremely useful for a human operator or for the execution of other computer vision tasks [5].
In this study, we focus on the fusion of visible and infrared imagery. Visible imagery has high resolution and can provide spatial details of the scene. Infrared imagery aids in the detection and recognition of heat-based targets under poor lighting conditions or in cases in which the target and the background are of the same color [6,7]. Relevant information in both visible and infrared images should be preserved in the resulting fused images. A surveillance system can benefit from an efficient fusion to improve targeting and identification performance. In recent years, image fusion has become an important topic. Many fusion methods for visible and infrared images have been proposed [4,6,8,9]. However, these techniques often lead to spatial distortions in the visible image because of the mixing of irrelevant infrared information from the nontarget area of the infrared image. In [8], the experiments on visible and infrared images show the contamination of the background regions by the infrared information. Considering the modalities of visible and infrared images, the following three basic requirements must be met to achieve a good fusion method: (1) infrared targets must be perfectly preserved in the fused image; (2) spatial detail information from the visible image must not be contaminated by the nontarget regions of the infrared image; and (3) the fused image should be enhanced to be easily understood by the human visual system.
In the current work, a novel fusion algorithm is proposed to meet these requirements. The method is aimed at preserving high spatial detail information and showing infrared targets clearly in the resulting fused image. An adaptive threshold is determined from the histogram of the infrared image. If the gray value of the infrared image (evaluated pixel by pixel) is greater than or equal to the threshold value, then the pixel value of the fused image is calculated by a weight determination function. Otherwise, the new value of the fused image is obtained directly from the visible image. The proposed algorithm is highly suitable for the fusion of visible and infrared images because it displays salient infrared targets and avoids spatial distortion. Furthermore, labeling moving targets with red contours and marking static targets (or the highlighted nontarget areas of the infrared image) with green contours is very effective for the human visual system.
The rest of the paper is organized as follows. Section 2 gives a brief introduction to nonlinear diffusion filtering. Section 3 describes the image fusion scheme in detail. Section 4 provides the experimental results and comparisons. Section 5 discusses infrared target detection. Section 6 presents the conclusions.

Image Denoising
Given the physical limitations of sensors, many infrared imaging systems produce images with low signal-to-noise ratio, contrast, and resolution; these features reduce the detectability of targets and impede further investigation of infrared images [1,10,11]. Thus, image denoising has become an important issue in addressing the shortcomings of infrared images. Several methods, such as Gaussian filtering [12] and wavelet-based denoising, have been developed to address these issues [10,13]. A disadvantage of some common methods is that they not only smooth noise but also blur edges that contain significant image features. In the current study, an anisotropic diffusion [14][15][16] algorithm is adopted for infrared image denoising to filter noise while effectively preserving edge information.
After more than two decades of research, partial differential equation- (PDE-) based nonlinear diffusion processes have been widely applied in signal and image processing. PDE-based and improved PDE approaches have been successfully applied in image denoising, segmentation, enhancement, and restoration [15][16][17][18]. The first PDE-based nonlinear anisotropic diffusion technique was reported by Perona and Malik [14]. Different from the isotropic heat conduction equation, the anisotropic diffusion equation filters an image depending on its local properties. The anisotropic diffusion equation removes image noise and simultaneously preserves its edges.
The diffusion equation in two dimensions can be expressed as

∂I/∂t = div( g(|∇I|) ∇I ),    (1)

where div is the divergence operator, ∇ is the gradient operator, and g(|∇I|) denotes the diffusion conductance function, chosen as a decreasing function to ensure strong diffusion in homogeneous regions and weak diffusion near edges. A typical choice for the conductance function is

g(|∇I|) = 1 / (1 + (|∇I|/k)²),    (2)

where k is a gradient threshold that can be fixed as an empirical constant.
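For illustration, an explicit finite-difference scheme for this diffusion can be sketched as follows (a minimal sketch in Python/NumPy; the function name and parameter defaults are assumptions, and periodic boundaries via `np.roll` are used purely to keep the code short):

```python
import numpy as np

def perona_malik(img, steps=20, tau=0.2, k=15.0):
    """Explicit Perona-Malik diffusion on a 2-D grayscale image.

    The conductance g(s) = 1 / (1 + (s/k)^2) damps diffusion across
    strong gradients (edges) while smoothing homogeneous regions.
    tau <= 0.25 keeps the explicit scheme stable.
    """
    u = img.astype(np.float64).copy()
    g = lambda d: 1.0 / (1.0 + (d / k) ** 2)
    for _ in range(steps):
        # One-sided differences to the four neighbors
        # (np.roll gives periodic boundaries, for brevity only).
        n = np.roll(u, -1, axis=0) - u
        s = np.roll(u, 1, axis=0) - u
        e = np.roll(u, -1, axis=1) - u
        w = np.roll(u, 1, axis=1) - u
        # Conductance-weighted divergence of the four directional fluxes.
        u += tau * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)
    return u
```

With a gradient threshold k well below the intensity jump at an edge, the conductance is near zero across the edge, so the edge survives while noise in flat regions is averaged away.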

Image Fusion Scheme
First, a common weighted combination rule is presented:

I_F(x, y) = ω₁·I_IR(x, y) + ω₂·I_V(x, y),  if I_IR(x, y) ≥ th;
I_F(x, y) = I_V(x, y),  otherwise,    (3)

where I_V(x, y) is the intensity value of the pixel at coordinate (x, y) in the visible image; I_IR(x, y) and I_F(x, y) are the intensity values of the corresponding pixel (x, y) in the infrared image and the fused image, respectively; and th is a threshold, determined by an image histogram-based method, that distinguishes the dark background from the bright objects. The fused image is produced pixel by pixel according to the gray value of the infrared image. If the gray value is greater than or equal to th, then the new gray value of the fused image at the corresponding pixel is obtained using (3). Otherwise, it is taken directly from the visible image. The next step of the fusion algorithm is the computation of the weighted coefficients ω₁ and ω₂.
Second, a weight determination function is defined as

ω₁(I) = 1 / (1 + exp(−a·((I − sp)/(I_max − sp) − b))),  ω₂ = 1 − ω₁,    (4)

where I represents the intensity value of the infrared image, I_max is equal to 255, a controls the slope of the function, b determines the translation of the curve, and sp is the starting position of the curve. Figure 1 depicts the function defined in (4) under the conditions a = 10, b = 0.5, and sp = 120. An increase in a results in a steeper curve; therefore, a is usually limited to the range [8, 12], depending on the intensity levels of each image. The quantity b is a constant equal to 0.5; this value specifies that ω₁ and ω₂ must both equal 0.5 when I is the midpoint of the range [sp, I_max]. In this study, the parameters a = 10 and b = 0.5 are chosen, whereas the sp value is selected adaptively based on the intensity histogram of the infrared image. Third, the sp value is determined from I_max (equal to 255) and th, the same threshold described in (3) that distinguishes the targets from the background according to the histogram of the infrared image. The value of th lies in the range [40, 80], as determined from the intensity level distribution of the infrared images in the experiments.
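A pixelwise sketch of the rule in (3) combined with a logistic weight of the kind described above might look as follows (the function names and the fixed `th` and `sp` values are assumptions; the adaptive histogram-based selection of `th` is simplified here to a constant, and the logistic expression is one form consistent with the stated properties rather than the published implementation):

```python
import numpy as np

I_MAX = 255.0

def weight_ir(I, sp, a=10.0, b=0.5):
    """Logistic infrared weight on [sp, I_MAX].

    Equals 0.5 when I is the midpoint of [sp, I_MAX] (since b = 0.5),
    and approaches 1 for bright (hot) infrared intensities.
    """
    t = (I - sp) / (I_MAX - sp)  # normalize [sp, I_MAX] -> [0, 1]
    return 1.0 / (1.0 + np.exp(-a * (t - b)))

def fuse_ir_visible(ir, vis, th=60.0, sp=120.0):
    """Pixelwise fusion: weighted blend where ir >= th, visible elsewhere."""
    ir = ir.astype(np.float64)
    vis = vis.astype(np.float64)
    w1 = weight_ir(ir, sp)  # infrared weight; visible weight is 1 - w1
    return np.where(ir >= th, w1 * ir + (1.0 - w1) * vis, vis)
```

For a hot target pixel (e.g., ir = 220 with sp = 120), w1 is close to 1, so the infrared target dominates the fused value; for any infrared intensity below th, the visible intensity passes through unchanged, which keeps the nontarget regions free of infrared contamination.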

Experimental Results
4.1. Image Data Sets. Two sets of visible and infrared images were chosen. They are publicly available through the ImageFusion.org website [19]. These image sets were chosen because (1) the grayscale levels of the visible image are abundant and the light objects of the scene are clearly distinguishable in the infrared image (Figures 2(a) and 2(b)); and (2) the infrared image is corrupted by noise and the intensities of the targets are low (Figure 3(b)). The first image set (UN Camp, frame 1815) comprises a terrain scene characterized by a path, trees, fences, and a house roof (Figure 2(a)). A person standing behind the trees and close to the fence is shown in the infrared image (thermal: 3 μm to 5 μm, Figure 2(b)). Although the two images depict the same scene, the visible image clearly provides spatial details of the scene, but the human figure is invisible. On the contrary, the human figure is highlighted in the infrared image, but other objects are hard to recognize correctly. The second data set (AIC) was chosen from the AIC thermal and visible nighttime data set sequence [19,20]. Frame 3853 (Figure 3(a)) was extracted from the visible MPEG format file, and the corresponding frame (Figure 3(b)) was obtained from the infrared (thermal: 7 μm to 14 μm) sequence. The image in Figure 3(b) was warped using planar homography to align it with the visible spectrum image [20]. All input images were assumed to be registered, and each pair of images contains exactly the same scene. To ensure consistency in the procedure, the color image was converted to a grayscale image, and the white margin of the infrared image was manually filled with black to achieve good image fusion. The scene contains buildings, bright windows, roads, and pedestrians. Further details and descriptions of the data acquisition procedure can be found in [20].

Infrared Image Denoising.
The method for reducing noise in this case is nonlinear diffusion filtering, using techniques similar to those discussed in Section 2. An additive operator splitting (AOS) algorithm [21], an efficient type of nonlinear diffusion filtering, was applied to the infrared images. The AOS algorithm performs anisotropic diffusion to remove image noise while preserving edges.
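One AOS step, in the style of Weickert's scheme, splits the semi-implicit diffusion into independent tridiagonal solves along rows and columns and averages the results; the solves remain stable for large time steps. A compact sketch (helper names and parameter values are assumptions):

```python
import numpy as np

def _thomas(lower, diag, upper, rhs):
    """Thomas algorithm for tridiagonal systems, vectorized over leading axes."""
    n = diag.shape[-1]
    cp = np.zeros_like(diag)
    dp = np.zeros_like(rhs)
    cp[..., 0] = upper[..., 0] / diag[..., 0]
    dp[..., 0] = rhs[..., 0] / diag[..., 0]
    for i in range(1, n):  # forward elimination
        m = diag[..., i] - lower[..., i] * cp[..., i - 1]
        cp[..., i] = upper[..., i] / m
        dp[..., i] = (rhs[..., i] - lower[..., i] * dp[..., i - 1]) / m
    x = np.zeros_like(rhs)
    x[..., n - 1] = dp[..., n - 1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[..., i] = dp[..., i] - cp[..., i] * x[..., i + 1]
    return x

def aos_step(u, tau=2.0, k=10.0):
    """One AOS step: average of (I - 2*tau*A_axis)^(-1) u over both axes."""
    gy, gx = np.gradient(u)
    g = 1.0 / (1.0 + (gx ** 2 + gy ** 2) / k ** 2)  # conductance
    out = np.zeros_like(u)
    for axis in (0, 1):
        ua = np.moveaxis(u, axis, -1)
        ga = np.moveaxis(g, axis, -1)
        off = 0.5 * (ga[..., :-1] + ga[..., 1:])  # conductance at half-points
        lower = np.zeros_like(ua)
        upper = np.zeros_like(ua)
        lower[..., 1:] = -2.0 * tau * off
        upper[..., :-1] = -2.0 * tau * off
        diag = np.ones_like(ua)
        diag[..., :-1] += 2.0 * tau * off  # rows of (I - 2*tau*A) sum to 1,
        diag[..., 1:] += 2.0 * tau * off   # so constant regions are preserved
        out += 0.5 * np.moveaxis(_thomas(lower, diag, upper, ua), -1, axis)
    return out
```

Because each row of the system matrix sums to one, a constant image is a fixed point of the step, while noisy regions are smoothed even with time steps far larger than the explicit stability limit.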

Fusion Results and Comparisons.
To illustrate the performance of the proposed image fusion approach, image fusion was also performed using conventional methods, namely, weighted averaging (WAV-based), discrete wavelet transform (DWT-based), and nonsubsampled contourlet transform (NSCT-based). The DWT-based and NSCT-based methods merge the low-pass and high-pass subband coefficients using the averaging scheme and the choose-max selection scheme, respectively. The DWT-based fusion algorithm uses five-level decomposition. The NSCT-based method uses db3 wavelets in scale decomposition and 9-7 wavelets in a nonsubsampled directional filter bank, in which the number of directions is 4.
Figures 2 and 3 illustrate the fusion results obtained using the aforementioned methods. The proposed scheme captures most of the salient areas of the infrared images and preserves the spatial detail information of the visible images. The infrared targets are obvious, and the nontarget regions are seldom contaminated in the fused images. For clear comparison, the difference images between the fused images and the source visible images are given in Figure 4. The fused images obtained using the proposed method have the best visual quality.
Objective evaluation criteria, namely, fusion root-mean-square error (FRMSE) and correlation coefficient (CC), were used to evaluate the fused images. Considering that the size of the target in the infrared image is small compared with that of the scene, we suppose that the visible image I_V is the ideal reference image. I_V(x, y) and I_F(x, y) denote the pixel values of the visible image and the fused image at point (x, y), respectively. The size of the images is M × N.
(1) FRMSE effectively reflects the similarity between two images; small FRMSE values indicate satisfactory fusion results:

FRMSE = sqrt( (1/(M·N)) · Σₓ Σ_y [I_V(x, y) − I_F(x, y)]² ).

(2) CC can be evaluated to compare I_V(x, y) with I_F(x, y):

CC = Σₓ Σ_y (I_V(x, y) − μ_V)(I_F(x, y) − μ_F) / sqrt( Σₓ Σ_y (I_V(x, y) − μ_V)² × Σₓ Σ_y (I_F(x, y) − μ_F)² ),

where μ_V and μ_F are the means of the visible image and the fused image, respectively. The maximum coefficient corresponds to the optimum fusion.
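Both criteria are straightforward to compute from the two images (a minimal sketch; the function names are assumptions):

```python
import numpy as np

def frmse(vis, fused):
    """Fusion RMSE between the reference (visible) image and the fused image."""
    vis = vis.astype(np.float64)
    fused = fused.astype(np.float64)
    return np.sqrt(np.mean((vis - fused) ** 2))

def cc(vis, fused):
    """Correlation coefficient between the visible and fused images."""
    a = vis.astype(np.float64) - vis.mean()
    b = fused.astype(np.float64) - fused.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
```

Note that CC is invariant to a constant intensity shift of the fused image, whereas FRMSE is not, which is why the two criteria complement each other.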
Table 1 shows the results of the quantitative evaluation using the two criteria. The proposed method achieves superior results. The values of the fusion results of the first image set are not entirely consistent with those of the second image set because the visible image from the UN Camp sequence contains abundant texture information, whereas the AIC visible sequence contains structural details of buildings, such as edges. These objective criteria prove that the images fused using the proposed method are strongly correlated with the corresponding visible images; that is, the proposed scheme ensures that useful spatial detail information of the visible images is preserved in the fused images.

Infrared Target Detection
Automatic object detection remains a difficult task. An object detector needs to cope with the diversity of visual imagery that exists in the world at large. Different detection methods can be used for different environmental conditions, so numerous approaches for automatic object detection have been investigated [22][23][24][25][26]. A common method is to extract targets from the image sequence through background subtraction when the video is captured by a stationary camera. The simplest way is to calculate an average image of the scene or to smooth the pixels of the background with a Kalman filter [23] under the assumption that the background consists of stationary objects. A preferable way to tolerate background variation in a video is to employ a Gaussian function that describes the distribution of each pixel belonging to a stable background object. Among these background subtraction techniques, the mixture of Gaussians (MoG) has been widely utilized to model scene backgrounds at the pixel level [24][25][26].
In MoG, the distribution of recently observed values of each pixel in the scene is characterized. A new pixel value is represented by one of the major components of the mixture model and is used to update the model. One significant advantage of this method is that when something is allowed to become part of the background, the existing model of the background is not destroyed. The original background color remains in the mixture until it becomes the most probable distribution and a new color is observed [26]. Good results of foreground object detection by applying MoG to outdoor scenes have been reported.
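For a single pixel, the MoG update in the Stauffer-Grimson style can be sketched as follows (scalar intensities, a simplified learning rate ρ = α, and an assumed background weight threshold of 0.7; the class name and parameters are illustrative, and a production system operates on whole frames):

```python
import numpy as np

class PixelMoG:
    """Mixture of K Gaussians modeling one pixel's intensity over time."""

    def __init__(self, k=3, alpha=0.05, var0=36.0, match_sigma=2.5):
        self.w = np.full(k, 1.0 / k)       # component weights
        self.mu = np.linspace(0, 255, k)   # component means
        self.var = np.full(k, var0)        # component variances
        self.alpha = alpha
        self.var0 = var0
        self.match_sigma = match_sigma

    def update(self, x):
        """Update the mixture with observation x; return True if x is foreground."""
        matched = np.abs(x - self.mu) < self.match_sigma * np.sqrt(self.var)
        self.w = (1.0 - self.alpha) * self.w
        if matched.any():
            # Reinforce the highest-weight matched component.
            i = int(np.argmax(np.where(matched, self.w, -1.0)))
            self.w[i] += self.alpha
            self.mu[i] += self.alpha * (x - self.mu[i])
            self.var[i] += self.alpha * ((x - self.mu[i]) ** 2 - self.var[i])
        else:
            # Replace the least probable component with the new observation.
            i = int(np.argmin(self.w))
            self.mu[i], self.var[i], self.w[i] = x, self.var0, self.alpha
        self.w /= self.w.sum()
        # Background components: most probable ones covering weight > 0.7.
        order = np.argsort(-self.w / np.sqrt(self.var))
        bg, total = [], 0.0
        for j in order:
            bg.append(j)
            total += self.w[j]
            if total > 0.7:
                break
        return not any(matched[j] for j in bg)
```

After a stable intensity has been observed for a while, its component dominates the mixture, so the pixel is classified as background; a sudden bright value matches no dominant component and is flagged as foreground without destroying the learned background model.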
In surveillance applications, moving and static targets need to be correctly detected (in military surveillance systems, static targets must not be arbitrarily ignored), and the locations of the targets need to be identified. In Section 4, the resulting fused image is obtained according to the proposed fusion rule for visible and infrared images. However, the fusion step by itself does not determine whether a moving object is present in the scene.
To separate moving targets from static targets, background modeling based on the MoG method is applied, a task for which infrared images are well suited. The chromatic contours of infrared targets in the fused images are highly suitable for the human visual system. The visual effect in the natural scene is enhanced when moving and static targets are labeled with different colors. The data sets are again the UN Camp and AIC images described in Section 4. In Figure 5, the moving and static targets are marked in red and green, respectively. These colors help humans clearly distinguish targets from the background in the fused images. All AIC infrared frames were warped to align them with the corresponding visible spectrum images. Green-dotted borders can be observed at the top of the frames of the AIC sequence (Figure 5).
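The colored labeling can be sketched by outlining binary target masks on the fused image (NumPy only; the mask extraction itself, e.g., MoG for moving targets and infrared thresholding for static ones, is assumed to happen elsewhere, and the function names are illustrative):

```python
import numpy as np

def mask_contour(mask):
    """Boundary pixels of a binary mask: in the mask, with a 4-neighbor outside.

    Pixels on the image border are treated by their in-image neighbors only.
    """
    m = mask.astype(bool)
    interior = m.copy()
    interior[1:, :] &= m[:-1, :]   # neighbor above in mask
    interior[:-1, :] &= m[1:, :]   # neighbor below in mask
    interior[:, 1:] &= m[:, :-1]   # neighbor left in mask
    interior[:, :-1] &= m[:, 1:]   # neighbor right in mask
    return m & ~interior

def label_targets(fused_gray, moving_mask, static_mask):
    """Overlay red contours for moving targets and green for static ones."""
    rgb = np.stack([fused_gray] * 3, axis=-1).astype(np.uint8)
    rgb[mask_contour(moving_mask)] = (255, 0, 0)   # moving targets: red
    rgb[mask_contour(static_mask)] = (0, 255, 0)   # static targets: green
    return rgb
```

Only the one-pixel boundary of each mask is recolored, so the fused intensities inside and outside the targets remain visible to the operator.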

Conclusions
An efficient method for the fusion of infrared and visible images is proposed. To obtain good results, a weight determination function is described that selects suitable coefficients to enhance infrared targets from the infrared image and to preserve spatial detail information from the visible image. The method is well suited to the fusion of visible and infrared images. The infrared targets in the natural scene can be clearly distinguished in the resulting fused images, and they are highlighted by marking them in red (or green) in the fused image. This technique is useful for visual surveillance. Moving targets are also detected and marked in red in this study. Future work will focus on ways to track certain moving targets effectively.

Figure 1: Example of a weight determination function.

Figure 2: Source images and fusion results of different fusion algorithms from the UN Camp sequence set. (a) Visible image; (b) denoised infrared image; and (c)-(f) images obtained by the WAV-based, DWT-based, NSCT-based, and proposed methods, respectively.

Figure 3: Source images and fusion results of different fusion algorithms from the AIC sequence set.

Figure 4: Difference images between the fused images and the corresponding visible images. (a)-(d) For the UN Camp sequence, fusion was performed using the WAV-based, DWT-based, NSCT-based, and proposed methods, respectively. (e)-(h) For the AIC sequence, fusion was performed using the WAV-based, DWT-based, NSCT-based, and proposed methods, respectively.

Table 1: Comparison of fusion results.