Novel Spatiotemporal Filter for Dim Point Targets Detection in Infrared Image Sequences

Dim point target detection is of great importance in both civil and military fields. In this paper, a novel spatiotemporal filter is proposed that incorporates both the spatial and temporal features of moving dim point targets. Because targets should be detected at the greatest possible range, they have no texture features in the spatial dimensions and appear as isolated points. Based on this attribute, potential targets are extracted by searching for the local maximum point in a sliding window. The potential targets are then correlated based on target moving patterns. By combining local maximum points with target moving patterns, the structure background in the infrared scene is removed. Next, the temporal profiles of the infrared scene are reviewed and examined. A new max-median filter applied to the temporal profiles extracts the intensity of the target pulse signal. Finally, each temporal profile is divided into several pieces to estimate the variance of the temporal profile, which leads to a new detection metric. The proposed approach is tested on several infrared image sequences. The results show that the proposed method significantly reduces the complex background in aerial infrared image sequences and achieves good detection performance.


Introduction
Detecting dim point targets is a key component of a variety of applications, such as infrared search and track systems, precision guidance, air traffic control, and telescopic monitoring. Because targets should be detected at the greatest possible range, they have no texture features in the spatial dimensions and appear as isolated points, which makes target detection difficult and complex [1][2][3].
In the last two decades, a number of target detection approaches have been proposed to deal with this issue. Generally, they can be categorized into two classes: detect before track (DBT) approaches and track before detect (TBD) approaches. DBT methods detect targets in single frames and then track them using temporal associations. A target is brighter than the neighboring background in its local area. Thus, a direct detection method is to match point-like signals. The top-hat transform [4,5] and the LoG filter [6] were introduced for detecting point-like signals. Another way to detect targets is to suppress the background, for example with the TDLMS filter [7] or the max-median filter [8]. TBD methods, in contrast, track all the pixels of a scene over a short period of time and then detect targets based on the temporal differences between targets and background [9]. TBD approaches were proposed to handle situations in which targets are too dim to be detected in single frames. Silverman, Caefer, Tzannes, et al. analyzed the temporal profiles of targets and background [10,11]. Their work indicated that damped sinusoid filters [12,13], the continuous wavelet transform [14], and hypothesis tests performed on temporal profiles [15,16] are effective for detecting dim point targets in evolving clutter. Subsequently, Lim et al. developed an adaptive mean and variance filter for detecting dim point-like targets [17]. In [18], Liu et al. found that the connecting line of the stagnation points (CLSP) of a temporal profile performs well in detecting dim moving targets. Recently, the CLSP based method was examined and improved in [19,20].
In fact, the differences between targets and background exist in both the spatial and temporal domains. Considering spatial or temporal information independently is not sufficient. In [21], Akula et al. proposed a ground moving target detection method for thermal infrared imagery. It is designed for extended targets and is not suitable for aerial point targets. Therefore, a novel spatiotemporal dim point target detection method is presented that considers both the spatial and temporal information. The proposed approach is executed in two stages: a structure background removing stage and a target detection stage. The detailed execution of the approach is shown in Figure 1.
In the structure background removing stage, local maxima are first extracted as potential targets. The moving patterns of the local maxima are then correlated to remove the structure background. In the target detection stage, the pulse signals in temporal profiles are extracted by using a new max-median filter. The variances of the temporal profiles are then estimated by segmenting each temporal profile into small pieces. Targets can thus be detected by using a newly designed detection metric. Finally, a threshold determined by the probability of false alarm is used to segment moving targets. The contributions of this paper are threefold: (1) the structure background in infrared scenes is removed by using a spatiotemporal filter; (2) the intensity of the target pulse signal is extracted by using a local contrast model; (3) the variance level of a temporal profile is estimated by dividing the temporal profile into several pieces. The proposed method is tested on several infrared image sequences. The results show that the proposed method significantly reduces the complex background in aerial infrared image sequences and achieves good detection performance.

Structure Background Removing
2.1. Spatial Feature Extraction. Aircraft generally have hot engines and plumes, which make targets brighter than the background in their local areas in infrared images. In practical applications, targets should be detected at as long a range as possible. In this situation, a target occupies only one or a few pixels in the infrared image and appears as a bright point, sometimes resembling noise. In a small area where a target is present, the target pixel appears as a local maximum point. Figure 2 shows some targets in their local areas.
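As shown in Figure 2, the target pixels are the brightest pixels in their local areas (in a 3 × 3 window). This model is based on the assumption that a target is very small (on the order of one pixel). This attribute can be used for target detection. The position of a local maximum point in its neighborhood is

$$(\Delta x^{*}, \Delta y^{*}) = \arg\max_{\Delta x, \Delta y} f(x + \Delta x, y + \Delta y, k),$$

where $(x, y)$ denotes spatial position, $\Delta x = -1, 0, 1$, and $\Delta y = -1, 0, 1$. If the position of the local maximum point is $(0, 0)$, the pixel is labeled as a potential target:

$$L(x, y, k) = \begin{cases} 1, & (\Delta x^{*}, \Delta y^{*}) = (0, 0), \\ 0, & \text{otherwise}, \end{cases}$$

where $k$ denotes the index of frames.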
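The local maximum labeling described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `label_local_maxima` is hypothetical, a frame is assumed to be a list of rows of pixel values, border pixels are skipped, and a pixel is labeled only if it is strictly brighter than all eight neighbors (how ties are handled is an assumption).

```python
def label_local_maxima(frame):
    """Return a boolean mask marking pixels that are the strict maximum
    of their 3 x 3 neighborhood (potential targets).

    Assumptions: border pixels are never labeled, and ties do not count
    as maxima.
    """
    rows, cols = len(frame), len(frame[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbors = [
                frame[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            # The pixel is a potential target if the maximum of the
            # window sits at offset (0, 0).
            mask[r][c] = frame[r][c] > max(neighbors)
    return mask


# Usage: a flat 5 x 5 background with one bright pixel.
frame = [[10] * 5 for _ in range(5)]
frame[2][3] = 50
mask = label_local_maxima(frame)
```

In this example only the pixel at row 2, column 3 is labeled; a perfectly flat frame yields no potential targets because of the strict comparison.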

2.2. Target Moving Patterns Correlation. Considering local maxima alone removes only a small fraction of the background. To improve background removal, the moving patterns of targets are introduced. In fact, aircraft flying across the sky obey Newton's laws and do not follow sharp trajectories. Figure 3 shows an example of a moving target in several frames. As shown in Figure 3, a target present in one frame may move from its original pixel to a neighboring pixel or stay in the same position in the next frame; it does not move along a sharp trajectory.
Suppose a target is present in a pixel of a frame. In the previous frame, the target can only be present in a neighboring pixel or in the same pixel. Similarly, in the next frame, the target can only be present in a neighboring pixel or in the same pixel. The possible patterns of a target moving in three consecutive frames are shown in Figure 4.
As shown in Figure 4, the three consecutive frames are the previous frame (left), the current frame (middle), and the next frame (right). The dark boxes denote the possible positions of a target. Figure 4(a) shows a target that appears in the upper left corner in the previous frame. In this situation, the target will appear in one of the four dark pixels in the next frame. Considering all possible positions in the previous frame, the corresponding positions in the next frame are shown in Figures 4(a)-4(i). If we index the nine pixels in Figure 4 from 1 to 9, as shown in Figure 5, the corresponding indexes at which the target is present in the previous frame and the next frame are listed in Table 1.
The labeled potential targets are then tested against the target moving patterns listed in Table 1. Potential targets whose movement does not follow these patterns are removed.
The moving patterns in Table 1 are denoted as a set of index pairs, each pairing an index in the previous frame with an index in the next frame; the first pattern is {1, 5}. The pixels at which a target may be present are the labeled potential targets whose neighbors in the previous and next frames match one of these patterns, evaluated over all frames of the sequence. The remaining pixels are removed as structure background. By incorporating local maxima with the moving patterns of targets, many background pixels can be removed. Generally, the structure background of an infrared scene, such as clouds and buildings, has fixed patterns. The local maxima of the structure background have moving patterns that are distinct from those of targets. The pixels removed by local maxima labeling and target moving patterns therefore come mainly from the structure background. The result is shown in Figure 6.
As shown in Figure 6, most of the structure background pixels are removed. In fact, the structure background in an infrared scene is difficult to eliminate.
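The moving pattern test can be illustrated with the following Python sketch. Because the entries of Table 1 are not listed in the text, the allowed patterns here are generated by an assumed rule rather than taken from the paper: along each axis the target may continue in the same direction or stop, but may not reverse. This rule reproduces the four candidate pixels described for Figure 4(a), but it is only a stand-in for Table 1. The function names and the mask representation (lists of rows of booleans) are hypothetical.

```python
def allowed_next_offsets(prev_offset):
    """Offsets (row, col) accepted in the next frame, given the offset of
    the target in the previous frame; both are relative to the current pixel.

    ASSUMED rule (a stand-in for the paper's Table 1): along each axis the
    target may keep moving in the same direction or stop, but not reverse.
    """
    def options(d):
        # d == 0: no motion observed along this axis, any step is accepted.
        # d != 0: the target came from side d, so it may stay (0) or
        # continue to the opposite side (-d).
        return (-1, 0, 1) if d == 0 else (0, -d)

    d_row, d_col = prev_offset
    return {(r, c) for r in options(d_row) for c in options(d_col)}


def follows_target_motion(prev_mask, next_mask, row, col):
    """Check whether the potential target at (row, col) in the current frame
    has potential targets in the previous and next frames that form an
    allowed moving pattern."""
    rows, cols = len(prev_mask), len(prev_mask[0])

    def is_set(mask, r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c]

    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if not is_set(prev_mask, row + d_row, col + d_col):
                continue
            for n_row, n_col in allowed_next_offsets((d_row, d_col)):
                if is_set(next_mask, row + n_row, col + n_col):
                    return True
    return False
```

A potential target for which `follows_target_motion` returns `False` would be treated as structure background and removed. Replacing `allowed_next_offsets` with a lookup into the actual Table 1 entries would give the test described in the text.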

Target Detection
3.1. Temporal Profile of Infrared Image Sequences. When a focal plane array (FPA) detector constantly monitors a scene, each pixel produces a temporal profile over a short period of time. The temporal profile indicates the variation of the pixel value over this period. When a target moves across a pixel, a pulse-like signal is created on its temporal profile. The width of the pulse is inversely proportional to the target velocity. Its height above (or depth below) the background depends on its differential radiance with respect to the background. This model is based on the assumption that the target is very small (on the order of one pixel) and moving across the scene. For a clear sky background, the temporal profiles affected by targets can easily be discriminated; in practice, however, the background also contains drifting and evolving cloud clutter. Temporal profiles produced by this drifting and evolving clutter may behave similarly to those of targets, which leads to false alarms in detection. Figure 7 shows the temporal profiles of a target, clear sky, inner cloud, and cloud edge pixel.

Figure 4: Possible patterns of a target moving in consecutive three frames.

As shown in Figure 7, pixels affected by clear sky or inner cloud background have temporal profiles that behave like a constant mean value plus white noise. Pixels affected by cloud edges or other difficult clutter features have less regular temporal behaviors. A pixel affected by a small moving target has a pulse-like shape on its temporal profile, which is distinct from that of the cloud clutter and clear sky.

3.2. Target Pulse Signal Extraction. A temporal profile that is affected by a target contains a pulse signal. The height of the pulse signal is proportional to the radiance of the target, and its width is related to the relative velocity of the target and the detector. Temporal profiles that are affected by evolving clouds have large irregular fluctuations, which increase the false alarms of pulse signal extraction. Therefore, we present a new, efficient target detection algorithm that uses a max-median approach to extract the target pulse signal.
Suppose a target pulse signal appears on a temporal profile with $N_t$ points. A sliding window of width $w_s = 2r_s + 1$ moves along the temporal profile, accompanied by two background estimation windows of width $w_b$ located on the two sides of the sliding window. The maximum value in the sliding window is

$$f_{\max}(k) = \max_{i} f(k + i),$$

where $f(k)$ denotes the pixel value of a temporal profile in frame $k$ and $i = -r_s, -r_s + 1, \ldots, r_s$. The background level on the left side of the sliding window can be estimated by

$$b_l(k) = \operatorname{median}_{j} f(k + j),$$

where $j = -r_s - w_b, -r_s - w_b + 1, \ldots, -r_s - 1$. The background level on the right side of the sliding window is

$$b_r(k) = \operatorname{median}_{j} f(k + j),$$

where $j = r_s + 1, r_s + 2, \ldots, r_s + w_b$. The background level $b(k)$ in position $k$ is estimated from $b_l(k)$ and $b_r(k)$. Hence, the height of a potential pulse signal is

$$h(k) = f_{\max}(k) - b(k).$$

The height of the target pulse signal is then obtained from $h(k)$. The calculation of target pulse signal extraction is shown in Figure 8.
Since the max-median filter is nonlinear and robust under heavy noise, using it to extract the target pulse signal avoids interference from noise and blind pixels.
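The pulse extraction step can be sketched in Python as follows. The window layout (a sliding window with a background window on each side) follows the description above; two details are assumptions made only for this illustration: the background level is taken as the larger of the two side medians, and the target pulse height is taken as the largest non-negative window height over the profile. The function name `pulse_height` and the default window sizes are hypothetical.

```python
from statistics import median


def pulse_height(profile, half_width=1, bg_width=5):
    """Estimate the pulse height on one temporal profile.

    profile    : sequence of pixel values over time
    half_width : half width r of the sliding window (window width 2r + 1)
    bg_width   : width of each background estimation window

    Assumptions (not confirmed by the text): the background level is the
    larger of the left and right medians, and the result is the largest
    non-negative height found over all window positions.
    """
    r, w = half_width, bg_width
    best = 0.0
    # Only positions where both background windows fit are evaluated.
    for k in range(r + w, len(profile) - r - w):
        peak = max(profile[k - r:k + r + 1])
        left = median(profile[k - r - w:k - r])
        right = median(profile[k + r + 1:k + r + w + 1])
        background = max(left, right)
        best = max(best, peak - background)
    return best


# Usage: a flat profile with a single-frame pulse of height 15.
profile = [10] * 20
profile[10] = 25
height = pulse_height(profile)
```

Using medians for the side windows means that a single outlier inside a background window, such as a blind pixel, does not shift the background estimate.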

3.3. Temporal Profile Variance Estimation. In [22], the authors employ a local contrast model to detect targets, using the average of the background as reference. This is inappropriate for temporal profiles. Because evolving clouds can generate sharp peaks on temporal profiles, the local contrast model will extract these sharp peaks as target pulse signals, leading to a high probability of false alarm. To reduce the sharp peaks of evolving clutter, the fluctuation level of a temporal profile is used to normalize the amplitude of pulse signals. In this paper, the variance of a temporal profile is introduced to represent its fluctuations. For target temporal profiles, the estimated variance should not be affected by the target pulse signal. Therefore, we propose a new variance estimation method that divides each temporal profile into several pieces and uses the minimum variance of the pieces as reference:

$$\sigma_m^2 = \frac{1}{N_p} \sum_{i=(m-1)N_p+1}^{m N_p} \bigl(f(i) - \mu_m\bigr)^2,$$

where $\mu_m$ is the mean of the $m$th piece, $N_p = \lfloor N_t / M \rfloor$, $m = 1, 2, \ldots, M$, and $M$ is the number of divided pieces. The reference temporal profile variance is

$$\sigma_{\mathrm{ref}}^2 = \min_{m} \sigma_m^2.$$

The estimation of temporal profile variance is shown in Figure 9. When a temporal profile is divided into several pieces, a target pulse signal appears in only one or two of them. Using the minimum variance of the pieces therefore avoids the influence of the target pulse signal. The calculation procedure is also simple and highly efficient.
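The piecewise variance estimate can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `reference_variance` is hypothetical, the population variance is used for each piece, and samples left over when the profile length is not a multiple of the number of pieces are ignored.

```python
from statistics import pvariance


def reference_variance(profile, num_pieces=4):
    """Split a temporal profile into equal pieces and return the smallest
    piece variance as the reference fluctuation level.

    Assumptions: population variance per piece; trailing samples that do
    not fill a whole piece are dropped.
    """
    piece_len = len(profile) // num_pieces
    if piece_len < 1:
        raise ValueError("profile is shorter than the number of pieces")
    variances = [
        pvariance(profile[m * piece_len:(m + 1) * piece_len])
        for m in range(num_pieces)
    ]
    # A target pulse inflates the variance of at most one or two pieces,
    # so the minimum is taken as the pulse-free estimate.
    return min(variances)


# Usage: low-level fluctuation plus one strong pulse.
profile = [10, 12] * 10
profile[9] = 60
sigma2_ref = reference_variance(profile, num_pieces=4)
```

In this example the pulse at index 9 falls entirely inside the second piece, so the reference variance reflects only the background fluctuation, whereas the variance of the whole profile would be dominated by the pulse.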
Finally, the target detection in infrared sequences can be achieved by the following metric:

Experimental Results and Discussion
We test the proposed approach on three infrared image sequences captured by Rome Laboratory, named "npa," "j2a," and "na23a." The "npa" and "j2a" scenes have two targets in the sky with heavy clutter. The "na23a" scene has one very dim point target below clouds. All the targets are marked by red boxes. For comparison, the CWT [14], CLSP [18], TCF [23], and fusion filters (FF) [24] approaches are selected as references. The FF approach focuses on tracking small targets, whereas our approach focuses on detecting moving targets in infrared sequences. Therefore, only the target detection stage of the FF approach is compared with our approach. The detection results are shown in Figure 10.
The rows from the second onward show the detection results of the CWT, CLSP, TCF, FF, and proposed approaches, respectively. As shown in Figure 10, the CWT approach strongly enhances the response of the targets, but the heavy clutter and noisy background also produce a high response. The CLSP approach performs well in background suppression, yet some target pixels are also suppressed. The TCF approach leaves much stronger interference in the three evaluated scenes. The FF approach slightly enhances the target pixels. However, the evolving clutter has a high response, which can lead to many false alarms. The results of the proposed approach show high contrast between targets and background in the evaluated scenes. All the targets are enhanced and the response of the background becomes much weaker. The bottom row of Figure 10 is the thresholded result of the proposed approach, indicating that all targets are correctly segmented.
To further evaluate the performance of the proposed approach, we calculate the receiver operating characteristic (ROC) curves of the evaluated approaches. The results are shown in Figure 11.
As shown in Figure 11, the solid lines with down triangle symbols are the ROC curves of the proposed approach. These curves lie above those of the other evaluated approaches. The results indicate that the proposed approach outperforms the other evaluated approaches on these sequences.
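For reference, a ROC curve of the kind discussed above can be computed from per-pixel detection scores and ground-truth labels as in the following Python sketch. It is a generic illustration, not the evaluation code used for Figure 11; the function name `roc_points` and the input layout (two flat lists of equal length) are assumptions.

```python
def roc_points(scores, labels):
    """Return (false_alarm_rate, detection_rate) pairs, one per distinct
    threshold, with thresholds taken in decreasing order.

    scores : detection metric value of each pixel
    labels : 1 (or True) for target pixels, 0 (or False) for background
    Both classes must be present in labels.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(
            1 for s, label in zip(scores, labels) if s >= threshold and label
        )
        false_pos = sum(
            1 for s, label in zip(scores, labels) if s >= threshold and not label
        )
        points.append((false_pos / negatives, true_pos / positives))
    return points


# Usage with four pixels, two of which are targets.
points = roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A curve that lies closer to the upper left corner reaches a given detection rate at a lower false alarm rate.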

Conclusion
In this paper, we propose a moving dim target detection approach based on a local signal-to-noise filter to eliminate the interference of slow moving clouds and abnormal blind pixels. By using a median-mean model, the new approach removes the large fluctuations on temporal profiles and eliminates the impact of blind pixels. The approach estimates temporal profile variances by using a segmentation method, which avoids the interference of the target pulse signal. The proposed approach is tested and compared with several conventional temporal profile based target detection approaches. The experimental results validate the high efficiency and robustness of the proposed approach.

Figure 1: Execution of the proposed approach.

Figure 2: Targets in their local areas.

Figure 3: A moving target in several frames.

Table 1: Corresponding indexes in the previous frame and the next frame.