Infrared Dim and Small Targets Detection Method Based on Local Energy Center of Sequential Image

To detect infrared (IR) dim and small targets against a strongly cluttered background, this paper proposes a method based on the local energy center of sequential images. The background is first predicted with an improved anisotropic background prediction (IABP) method, and the targets are then enhanced with improved high-order cumulants (HOC). Finally, on the basis of this preprocessing, a sequential-image energy center detection algorithm is constructed that integrates the neighborhood, continuity, area, energy, and other motion characteristics of the target. Experiments show three results. The improved anisotropic background prediction preserves the true background of the original image to the greatest extent and outperforms other background prediction methods overall. The improved HOC significantly increases the signal-to-noise ratio (SNR) of images. When the SNR is lower than 2.5 dB, the proposed method effectively eliminates noise and detects the targets.


Introduction
Detecting IR dim and small moving targets in strongly cluttered environments is a core technology of infrared search and track systems, and it remains a popular and difficult research topic in the field. Traditional detection algorithms fall into two categories according to the order in which detection and tracking are performed: detect before track (DBT) and track before detect (TBD). A DBT algorithm extracts candidate targets from the preprocessed images and then confirms the target with sequence trajectory analysis. Sequence trajectory analysis eliminates false targets by relying on the motion continuity of the target and the randomness of noise. It therefore performs poorly on dim and small moving targets submerged in noise and clutter, and DBT algorithms are applicable only to scenes with a high signal-to-noise ratio (SNR > 5 dB). To detect targets at low SNR, researchers have proposed TBD algorithms [1][2][3][4][5]. A TBD algorithm first searches all possible trajectories of the target and accumulates interframe energy along them with an appropriate method. It then compares the posterior probabilities of the trajectories and applies a threshold to decide which one is the real trajectory of the target [6].
Both DBT and TBD algorithms detect IR dim and small targets through multiframe image processing. They differ only in the order in which interframe information is processed. The methods proposed in [1][2][3][4][5] require a priori knowledge of the target, including its motion state and trajectory, which undoubtedly limits their scope of application in real infrared scenes. In addition, these methods concentrate on the decision procedure and neglect the target criteria, namely continuity, neighborhood, speed, area, energy, and other target characteristics. As a result, target motion information is underused, and their detection performance does not meet requirements at low SNR.
To this end, this paper starts from the motion characteristics of IR dim and small targets, analyzes how to use the target's motion information rationally, and explores detection criteria suited to such targets. The aim is to exploit as many motion characteristics of the target as possible while minimizing restrictions on its motion and reliance on a priori knowledge. The analysis shows that the target remains relatively stable and concentrated within a certain neighborhood across the image sequence, which produces characteristic area and concentration features. Noise, by contrast, is randomly and discretely distributed. Therefore, this paper proposes an energy center that integrates these composite features of the sequential image (target neighborhood, area, and degree of concentration) to realize multiframe motion-correlation detection of the target.

Related Works
This section reviews related work on detection algorithms and background estimation for dim and small targets.

Target Detection Algorithm.
This paper reviews the typical dim and small target detection algorithms of the last dozen years. It focuses on their application scenes and disadvantages, identifies the problems behind those disadvantages, and then proposes a detection algorithm for low-SNR scenes. A typical DBT algorithm is the pipeline filter method [7], which is simple and easy to implement. It detects well when the SNR is high (>5 dB) but fails when the SNR is low and the target position does not change. Moreover, once a detection error occurs in one frame, detection in the following frames of the pipeline is at risk. The incorrect pipeline location increases the likelihood that the target falls outside the pipeline, and detection then fails. This situation is likely to occur in practical applications.

Classical TBD algorithms include the three-dimensional matched filter method proposed by Reed et al. [1]. This method applies only when the magnitude and direction of the velocity are known. An unknown velocity may cause a velocity mismatch, which lowers the output SNR. In addition, because of its computational complexity, the method applies only to scenes with small velocity variations. Classical TBD algorithms also include the projection transformation method proposed by Falconer [2]. This method effectively reduces the amount of data and the memory required by the three-dimensional search and detection process, but it may lose SNR. Its detection performance also drops rapidly when the noise is strong and the target moves a large distance between frames, so it is unsuited to low-SNR targets with large interframe displacement. The dynamic programming method proposed by Johnston and Krishnamurthy [3] can detect the trajectory of a point target in linear motion at low SNR, but it requires a priori knowledge of the velocity window parameter. If the target velocity is unknown, the velocity window must be widened, which increases the computation and degrades detection performance. The multilevel hypothesis testing method proposed by Blostein and Huang [4] can detect multiple targets in linear motion simultaneously. However, at low SNR there are many candidate trajectory starting points (kept in order to reduce the false alarm rate), so the number of subsequent tree branches and the computational complexity increase sharply. Furthermore, the method requires the target to move in a locally uniform straight line, a condition that IR dim and small targets fail to meet in most cases. Its scope of application is therefore limited. The high-order correlation method proposed by Liou and Azimi-Sadjadi [5] can detect straight or curved trajectories in a noisy three-dimensional image without a priori knowledge of the number of targets, the initial conditions, and so forth. It is applicable to multitarget detection under different clutter densities. However, an order that is too high increases computation and storage, and an order that is too low increases the false alarm rate.

In recent years, as machine learning has developed rapidly in target detection, researchers have proposed dim and small moving target detection methods based on visual saliency [8,9] and sparse representation [10][11][12][13]. Saliency-based methods perform well only when the target differs greatly from the background. The practicality of sparse-representation-based methods is undoubtedly limited when the target signal is seriously polluted by noise and the sparse feature difference between target and background is severely degraded. These methods therefore struggle to meet the requirements of dim and small target detection at low SNR (SNR < 3 dB).
All of the detection algorithms above perform poorly in low-SNR scenes (SNR < 3 dB). This paper explores a new multiframe motion-correlation detection algorithm for such scenes, based on a summary of the advantages and disadvantages of each detection method (see Table 1). It proposes a local energy center detection algorithm for sequential images that takes full advantage of the motion information, neighborhood gray level, area, energy, and other characteristics exhibited by the target in the space-time domain.

Background Estimation.
The target has a low SNR and is seriously disturbed by noise, so improving subsequent detection usually requires a background estimation step. The background pixels are first estimated from the image and then subtracted from the original image. This yields an image that contains the target components and part of the noise, which then undergoes further detection processing. Representative methods include Top-Hat and TDLMS [14]. Top-Hat is a practical nonlinear background estimation method, but it is sensitive to the choice of structuring element and therefore adapts poorly. To improve adaptability, some researchers have proposed adaptive filtering techniques such as the two-dimensional least mean square (TDLMS) filter. TDLMS requires no prior knowledge of the image and has a simple structure. However, it requires the statistical characteristics of the background to be constant or slowly changing, which dramatically limits its scope of application.

There are also statistics-based background estimation methods. One is the single Gaussian method proposed by Benezeth et al. [17], which can handle simple scenes with small, slow changes. It cannot describe the background accurately when the background changes substantially or suddenly or when background pixels follow a multimodal distribution. To handle multimodal backgrounds, Bouwmans et al. [19] proposed a mixture of Gaussians model, which suits dynamic background estimation over long time series. Its disadvantages are that it requires a certain amount of training data and responds poorly to rapid illumination changes, so it handles shadows poorly. Noise interference or insufficient training data makes the parameters of the mixture of Gaussians uncertain during background estimation. To address this, Sigari et al. [18] proposed a fuzzy running average method, which suits background estimation under camera shake and in dynamic scenes but has difficulty removing shadows.

Compared with statistics-based methods, nonparametric background models have the advantage that they require no underlying model to be specified and no parameters to be estimated explicitly, so they can adapt to any unknown data distribution. For instance, Liu et al. [15,22] describe background changes with a model of influencing factors. They then infer the most reliable background state from the local extrema of the underlying distribution, that is, the point where the data are most densely concentrated. This model is robust and adapts to cluttered, not entirely static backgrounds that contain small disturbances. Application scenes have grown more complex, with illumination variation, camera shake, and dynamic backgrounds. Sobral and Vacavant reviewed the background estimation methods proposed for these scenes over the past dozen years [23] and developed an open-source background estimation library, BGSLibrary [24], laying a foundation for subsequent research. However, all of these algorithms apply only to fixed-scene shooting or to cameras that shake slightly or move slowly. They adapt poorly to scenes in which the camera moves fast together with the dim small target, and to low-SNR environments.

Visual saliency [8,9]. Advantages: able to quickly locate the region of interest. Disadvantages: only adapted to scenes with large differences between the target and the background and with distinct target characteristics.
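The estimate-and-subtract pipeline described above can be sketched with Top-Hat as the background estimator. The following minimal NumPy sketch computes a grey-level opening (erosion followed by dilation) with a square structuring element and subtracts it from the image. The function names and the default 5 × 5 element are illustrative assumptions, not settings from this paper. The element size is precisely the parameter to which Top-Hat is sensitive.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _square_filter(img, size, reducer):
    """Apply a square min or max filter of odd side `size`, replicating edges."""
    if size % 2 == 0:
        raise ValueError("structuring element size must be odd")
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1))


def tophat_background(img, size=5):
    """Grey-level opening (erosion then dilation) used as the background estimate."""
    eroded = _square_filter(img, size, np.min)
    return _square_filter(eroded, size, np.max)


def tophat_residual(img, size=5):
    """White top-hat: image minus estimated background (target plus residual noise)."""
    img = np.asarray(img, dtype=float)
    return img - tophat_background(img, size)


# Usage: a single bright pixel on a flat background survives the subtraction.
frame = np.full((15, 15), 10.0)
frame[7, 7] = 50.0
residual = tophat_residual(frame)
print(residual[7, 7], residual.sum())  # 40.0 40.0
```

A target larger than the structuring element would be absorbed into the opening and lost from the residual. This is why the element size must be tuned to the expected target size.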
In recent years, some researchers have estimated the background by isolating the "gray singularity" that a target creates as a "gray disturbance" in the image. For example, Song et al. [25] describe the gray singularity of the target area with gradient operators and estimate the background accordingly. However, gradient operators struggle to distinguish the target from strong textures in a complex background, so the algorithm suppresses background edge textures poorly. Wang and Liu [26] used an anisotropic diffusion filter to separate the target from the background by their different gradient features, thereby improving the image SNR. Their algorithm performs unidirectional diffusion, which cannot enhance the target signal and only passively preserves it, so its improvement in SNR is limited. In an imaging system, the target signal diffuses outward from its center pixel, and at each pixel the anisotropic gradient relationships in different directions resemble those of the point spread function. This paper therefore introduces the idea of anisotropy into background estimation and improves it so that the target signal is enhanced.

Anisotropic Background Prediction
3.1. Anisotropic Differential Principle. Background prediction is crucial for dim small target detection. Anisotropic diffusion can smooth and stabilize background regions while preserving edge details and abrupt-change zones. The diffusion equation is

$$\frac{\partial I}{\partial t} = \operatorname{div}\big(g(\|\nabla I\|)\,\nabla I\big), \tag{1}$$

where $I$ is the grayscale image, $\nabla$ is the gradient operator, $g(\nabla I)$ is the edge-stopping function, and $\operatorname{div}$ is the divergence operator. The edge-stopping function $g(\nabla I)$ computes the smoothing coefficient from the gradients in different directions. Literature [27] presents the anisotropic edge-stopping functions

$$g_1(\nabla I) = \exp\!\left[-\left(\frac{\|\nabla I\|}{K}\right)^{2}\right], \qquad g_2(\nabla I) = \frac{1}{1 + \left(\|\nabla I\|/K\right)^{2}}, \tag{2}$$

where $K$ is a constant. In flat regions with small gradients, $g(\nabla I)$ is large, and strong smoothing is applied. In abrupt-change regions with large gradients, $g(\nabla I)$ is small, and little or no smoothing is applied, so these regions are preserved.
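The diffusion principle above can be sketched with the classic explicit four-neighbour Perona-Malik scheme. The sketch uses the two standard edge-stopping functions exp(-(|∇I|/K)²) and 1/(1 + (|∇I|/K)²). It is the original (unimproved) anisotropic filter, not the method proposed in this paper. The defaults K = 120 and four iterations mirror the experimental settings reported later, reading "step" as the iteration count, which is an assumption. A step size `lam` of at most 0.25 keeps the explicit scheme stable.

```python
import numpy as np


def g1(grad, K):
    """Edge-stopping function exp(-(|grad|/K)^2)."""
    return np.exp(-(np.abs(grad) / K) ** 2)


def g2(grad, K):
    """Edge-stopping function 1 / (1 + (|grad|/K)^2)."""
    return 1.0 / (1.0 + (np.abs(grad) / K) ** 2)


def directional_gradients(img):
    """Differences to the N, S, E and W neighbours, with replicated borders."""
    p = np.pad(img, 1, mode="edge")
    c = p[1:-1, 1:-1]
    return [p[:-2, 1:-1] - c,   # north
            p[2:, 1:-1] - c,    # south
            p[1:-1, 2:] - c,    # east
            p[1:-1, :-2] - c]   # west


def perona_malik(img, K=120.0, n_iter=4, lam=0.25, g=g2):
    """Explicit anisotropic diffusion; lam <= 0.25 keeps the scheme stable."""
    I = np.asarray(img, dtype=float).copy()
    for _ in range(n_iter):
        I = I + lam * sum(g(d, K) * d for d in directional_gradients(I))
    return I


# Usage: one iteration flattens a bright spike, while a flat image is unchanged.
spike = np.zeros((5, 5))
spike[2, 2] = 100.0
smoothed = perona_malik(spike, n_iter=1)
print(round(smoothed[2, 2], 2))  # 40.98
```

The spike loses most of its energy in a single step. This illustrates the limitation discussed in the text: the original anisotropic filter at best preserves, and never enhances, a small target.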

Improved Anisotropy for Background Prediction.
Analysis of dim small target images shows that the directional features of the target region differ from those of other regions. These differences can be used to process regions with different characteristics differently. The gradient operators of the local region containing a dim small target, and the corresponding directional coefficients, are

$$\nabla_N I_{x,y} = I_{x,y-1} - I_{x,y},\quad \nabla_S I_{x,y} = I_{x,y+1} - I_{x,y},\quad \nabla_E I_{x,y} = I_{x+1,y} - I_{x,y},\quad \nabla_W I_{x,y} = I_{x-1,y} - I_{x,y},$$
$$c_d = g\big(|\nabla_d I_{x,y}|\big),\quad d \in \{N, S, E, W\}. \tag{3}$$

Suppose the image is filtered pixel by pixel with the mean of $\mathrm{min}_1$ and $\mathrm{min}_2$, the two smallest of the four directional coefficients of the original anisotropic edge-stopping function. The coefficients are then relatively large in both stationary and nonstationary background regions and relatively small in singular regions (target signals). As a result, the target signals can only be preserved, not enhanced. $\mathrm{min}_1$ and $\mathrm{min}_2$ are given by

$$\mathrm{min}_1 = \min_{d} c_d,\qquad \mathrm{min}_2 = \min_{d \neq d^{*}} c_d,\quad d^{*} = \arg\min_{d} c_d. \tag{4}$$

To enhance the signals of the singular (target) region, the edge-stopping function is improved as in (5), and its curve is shown in Figure 1. In Figure 1, the horizontal axis is the directional gradient value and the vertical axis is the function value. The improved edge-stopping function is monotonically increasing. In stationary and nonstationary regions of the infrared image, small gradient values yield small function values. In singular regions, larger gradients yield larger function values. Substitute (5) into (3), use (4) to compute the mean of $\mathrm{min}_1$ and $\mathrm{min}_2$, and filter the image pixel by pixel with this mean; the dim small targets are then highlighted smoothly. This gives the filter equation (6).
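The min₁/min₂ step can be sketched as follows. The four directional coefficients are computed per pixel and sorted, and the two smallest are averaged into one coefficient shared by all four directions. The improved edge-stopping function (5) and the filter equation (6) are not reproduced in this text. For that reason `edge_stop` is left as a parameter, and the update `I + lam * c * sum(gradients)` is an assumed reading of the pixel-by-pixel filtering, not the authors' exact equation. The names `mean_of_two_smallest` and `min_pair_filter` are hypothetical.

```python
import numpy as np


def directional_gradients(img):
    """Stack of N, S, E, W neighbour differences, shape (4, H, W)."""
    p = np.pad(img, 1, mode="edge")
    c = p[1:-1, 1:-1]
    return np.stack([p[:-2, 1:-1] - c, p[2:, 1:-1] - c,
                     p[1:-1, 2:] - c, p[1:-1, :-2] - c])


def mean_of_two_smallest(coeffs):
    """Per-pixel mean of min1 and min2, the two smallest directional coefficients."""
    s = np.sort(coeffs, axis=0)
    return 0.5 * (s[0] + s[1])


def min_pair_filter(img, edge_stop, n_iter=4, lam=0.25):
    """Hypothetical filter: all four directions at a pixel share one coefficient,
    the mean of min1 and min2 (assumed form of the filtering step)."""
    I = np.asarray(img, dtype=float).copy()
    for _ in range(n_iter):
        d = directional_gradients(I)
        c = mean_of_two_smallest(edge_stop(d))
        I = I + lam * c * d.sum(axis=0)
    return I


# Usage with the classic function 1 / (1 + (|grad|/K)^2); the improved
# function (5) would be passed in its place.
K = 120.0
classic = lambda d: 1.0 / (1.0 + (np.abs(d) / K) ** 2)
frame = np.random.default_rng(0).normal(100.0, 5.0, size=(32, 32))
background = min_pair_filter(frame, classic)
difference = frame - background
```

Averaging the two smallest coefficients, rather than using all four, lets a single strong direction (an edge) avoid dominating the smoothing decision at a pixel.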

Target Enhancement with High-Order Cumulants
4.1. High-Order Cumulants Principle. High-order cumulants can effectively accumulate space-time energy, suppress Gaussian noise, and enhance transient signals, and thus enhance the energy of dim and small targets [20]. The noise in infrared sequential images can be regarded as Gaussian, so the following binary hypothesis is applied to the background-removed image:

$$H_0: f_0(x,y,t) = n(x,y,t),\qquad H_1: f_0(x,y,t) = s(x,y,t) + n(x,y,t),$$

where $f_0(x,y,t)$ is the background-removed image, $H_0$ corresponds to pixels in areas where the target is absent, and $H_1$ corresponds to pixels in areas the target passes through. The $k$-frame high-order cumulant can be written as

$$C_{f_0} = C_s + C_n,$$

where $C_s$ is the cumulant of the target and $C_n$ is the cumulant of the noise. Because the noise $n(x,y,t)$ is Gaussian, $C_n = 0$ as $k \to \infty$, so $C_{f_0} = C_s$. The high-order cumulant is then used as the detection statistic.

4.2. Improved HOC. The original high-order cumulants consider only time-domain characteristics, which inevitably limits the enhancement. To accumulate energy in the space domain as well, the spatial characteristics (the motion information of the target) must be taken into account. The target's movement between adjacent frames can be described by the 12 forms shown in Figure 2. The first five are horizontal movements, the middle five are vertical movements, and the last two are diagonal movements. Whichever direction the target moves in, it always remains within a continuous neighborhood region across neighboring frames. The moving target's energy can therefore be accumulated within its neighborhood region in adjacent frames. In this operation, $T$ is the template of the target neighborhood region and $r$ is the accumulation window radius. $f_0'$ is the blocked-out image sequence, which takes the maximum value within the target neighborhood region of adjacent frames as the accumulated value. The improved $k$-frame HOC is defined on this blocked-out sequence.
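The two ingredients described above can be sketched separately, assuming a third-order cumulant. The order used by the paper and its exact statistics are not recoverable from this text. `third_order_cumulant` estimates, per pixel, the sample third-order cumulant along the frame axis. The estimate tends to zero for Gaussian noise but is clearly nonzero for a transient pulse. `neighbourhood_max` is a sketch of the blocking-out step, replacing each pixel with the maximum of its (2r + 1) × (2r + 1) window. How the paper combines the two into the improved HOC statistic is not reproduced here, and both function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def third_order_cumulant(stack):
    """Per-pixel sample third-order cumulant along the frame axis (axis 0):
    c3 = m3 - 3*m1*m2 + 2*m1**3, which tends to 0 for Gaussian data."""
    x = np.asarray(stack, dtype=float)
    m1 = x.mean(axis=0)
    m2 = (x ** 2).mean(axis=0)
    m3 = (x ** 3).mean(axis=0)
    return m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3


def neighbourhood_max(frame, r):
    """Blocking-out step: each pixel takes the maximum of its (2r+1)x(2r+1) window."""
    size = 2 * r + 1
    padded = np.pad(np.asarray(frame, dtype=float), r, mode="edge")
    return sliding_window_view(padded, (size, size)).max(axis=(-2, -1))


# Usage: Gaussian noise gives a near-zero cumulant; a one-frame pulse does not.
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 1.0, size=(20000, 1, 1))
pulse = np.zeros((4, 1, 1))
pulse[1] = 10.0
print(third_order_cumulant(pulse)[0, 0])  # 93.75
```

A constant (non-transient) pixel value has a third-order cumulant of exactly zero. This is why the statistic responds to a target that passes through a pixel rather than to a static offset.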

Local Energy Center of Sequential Image
The local energy center (LEC) of a sequential image is the center of the target's moving-energy region. This region is formed in the IR image sequence by continuous multiframe energy accumulation. It is determined by the target moving area $S$, the target concentration degree $C$, and the moving-area multiple $\lambda_S$. These quantities are defined in terms of the following. $B(x, y, t)$ is the local binary image of the candidate target at frame $t$. $P(x, y)$ is the region obtained by projecting the target's moving trajectory onto the binary image, computed by an OR operation over multiple frames. $N(\cdot)$ counts the candidate targets in a local region. $C$ is the frequency with which the candidate target appears in a sequential region, and $S$ is the target moving area. $S_{\mathrm{mean}}$ is the mean area of the candidate target, obtained as the ratio of the area accumulated over $k$ frames in the local region to the degree of concentration.
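The LEC quantities can be sketched for a single candidate's local window. The multiframe OR-projection gives the moving area S. C counts the frames in which any candidate pixel is present, a simplification of N(·), which counts candidate targets. S_mean is the accumulated k-frame area divided by C. The moving-area multiple is assumed here to be S / S_mean, because its defining equation is not given in this text. The function name `lec_features` is hypothetical.

```python
import numpy as np


def lec_features(binary_stack):
    """Features of one candidate's local window over k frames.

    binary_stack: (k, h, w) boolean masks of candidate pixels in the window.
    Returns S (moving area), C (frames with a candidate), S_mean, area multiple.
    """
    b = np.asarray(binary_stack, dtype=bool)
    projection = b.any(axis=0)                 # multiframe OR: trajectory footprint
    S = int(projection.sum())
    per_frame_area = b.sum(axis=(1, 2))
    C = int(np.count_nonzero(per_frame_area))  # assumed stand-in for N(.)
    S_mean = float(per_frame_area.sum()) / C if C else 0.0
    multiple = S / S_mean if S_mean else float("inf")  # assumed: S / S_mean
    return S, C, S_mean, multiple


# Usage: a one-pixel target seen in three of four frames, moving one pixel.
stack = np.zeros((4, 5, 5), dtype=bool)
stack[0, 2, 2] = True
stack[1, 2, 3] = True
stack[3, 2, 3] = True
print(lec_features(stack))  # (2, 3, 1.0, 2.0)
```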

Proposed Algorithm.
To obtain the local energy center of the sequential image, the three elements $S$, $C$, and $\lambda_S$ are computed for each candidate target point. Each is computed within the point's $(2r+1)\times(2r+1)$ neighborhood over the cumulative frame length, that is, over the $(k-1)/2$ frames before and after the current frame. Each element is then sorted by size according to (13). The centroid of the candidate with the smallest total rank is selected as the local energy center of the sequential image:

$$\mathrm{LEC} = \arg\min\big(\mathrm{order}(S) + \mathrm{order}(C) + \mathrm{order}(\lambda_S)\big), \tag{13}$$

where $\mathrm{order}(\cdot)$ is the sorting (ranking) function, $\min(\cdot)$ takes the minimum, and $\arg$ returns the argument that satisfies the condition. The detection method is shown in Algorithm 1.
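Equation (13) ranks the candidates on each of the three elements and picks the one with the smallest rank sum. A minimal sketch follows. The sort directions are assumptions, because the text says only that the elements are sorted by size. Here S and the area multiple are ranked ascending and C descending, on the reasoning that a true target occupies a compact area and appears frequently. `ranks` and `select_lec` are hypothetical names.

```python
import numpy as np


def ranks(values, descending=False):
    """Rank of each value (0 = first after sorting); ties keep input order."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(-v if descending else v, kind="stable")
    r = np.empty(len(v), dtype=int)
    r[order] = np.arange(len(v))
    return r


def select_lec(S, C, multiple, centroids):
    """Return the centroid of the candidate with the smallest rank sum.
    Assumed sort directions: small S, large C, small area multiple rank first."""
    score = ranks(S) + ranks(C, descending=True) + ranks(multiple)
    best = int(np.argmin(score))
    return centroids[best], best


S = [5, 20, 8]
C = [4, 1, 3]
multiple = [1.2, 5.0, 2.0]
centroids = [(10, 12), (40, 3), (25, 25)]
print(select_lec(S, C, multiple, centroids))  # ((10, 12), 0)
```

A rank sum, rather than a weighted sum of raw values, makes the three elements comparable without normalizing their different units (pixels, frame counts, ratios).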

Background Prediction Performance and Enhancement Effect Evaluation
To evaluate background prediction, this study uses three indexes: the mean squared error (MSE) [28], the structural similarity (SSIM) [29], and the local signal-to-noise ratio gain (GSNR) [30]. The enhancement effect of the high-order cumulants is evaluated with the target's average grayscale value (AGV) and the image's local signal-to-noise ratio (LSNR). (1) MSE measures the average error between each pixel of the predicted background image and the real background image:

$$\mathrm{MSE} = \frac{1}{W H}\sum_{i=1}^{W}\sum_{j=1}^{H}\big(P(i,j) - R(i,j)\big)^{2}, \tag{14}$$

where $P$ is the predicted background image and $R$ is the real background image. Because a dim and small infrared target contributes very little energy, the original infrared image is used as the real background image. $W$ and $H$ are the image width and height, respectively.
(2) SSIM measures the similarity of the geometric structure information of the predicted and real backgrounds, which makes it very effective for evaluating background prediction performance:

$$\mathrm{SSIM} = \frac{(2\mu_P\mu_R + C_1)(2\sigma_{PR} + C_2)}{(\mu_P^{2} + \mu_R^{2} + C_1)(\sigma_P^{2} + \sigma_R^{2} + C_2)}, \tag{15}$$

where $P$, $R$, $W$, and $H$ are as defined above. $\mu_P$ and $\mu_R$ are the mean values of the predicted and real backgrounds, and $\sigma_P$ and $\sigma_R$ are their standard deviations. $\sigma_{PR}$ is their covariance, and $C_1$ and $C_2$ are small constants that keep the denominator nonzero.
(3) GSNR is the mean of the signal-to-noise ratios of the frames in the sequence:

$$\mathrm{GSNR} = \frac{1}{M}\sum_{m=1}^{M}\frac{G_{\max,m} - \mu_m}{\sigma_m}, \tag{16}$$

where $G_{\max}$ is the maximum value of the target area, $\mu$ is the mean of the local region of the target, $\sigma$ is the standard deviation of that local region, and $M$ is the number of frames.
(4) AGV and LSNR are computed as

$$\mathrm{AGV} = \frac{1}{N}\sum_{(i,j)} f_{i,j}, \qquad \mathrm{LSNR} = \frac{\mu_t - \mu_b}{\sigma_b}, \tag{17}$$

where $f_{i,j}$ is the grayscale value of the candidate target at row $i$ and column $j$, and $N$ is the total number of pixels occupied by the candidate target. $\mu_t$ is the local mean value of the candidate target, $\mu_b$ is the local background mean value, and $\sigma_b$ is the local background standard deviation. The side length of the local background area is generally three times that of the target area; for a 3 × 3 target, the background area is the 9 × 9 region centered on the target. In this paper, the background is predicted with the improved anisotropic background prediction method, using the edge-stopping function $g_2$ with $K = 120$ and step = 4.
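The evaluation indexes above can be sketched in NumPy. `ssim_global` evaluates the SSIM formula once over the whole image. This differs from the usual SSIM implementation, which averages over sliding Gaussian windows. The constants C1 = (0.01L)² and C2 = (0.03L)² with L = 255 are the conventional choices, not values stated in this paper. All function names are illustrative. Extracting the target pixels and the surrounding background window (for example, 9 × 9 around a 3 × 3 target) is left to the caller.

```python
import numpy as np


def mse(pred, real):
    """Mean squared error between predicted and real background images."""
    p, r = np.asarray(pred, float), np.asarray(real, float)
    return float(np.mean((p - r) ** 2))


def ssim_global(pred, real, L=255.0):
    """SSIM formula evaluated once over the whole image (no sliding window)."""
    p, r = np.asarray(pred, float), np.asarray(real, float)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mp, mr = p.mean(), r.mean()
    cov = np.mean((p - mp) * (r - mr))
    num = (2 * mp * mr + c1) * (2 * cov + c2)
    den = (mp ** 2 + mr ** 2 + c1) * (p.var() + r.var() + c2)
    return float(num / den)


def frame_snr(target_max, local_region):
    """(G_max - local mean) / local standard deviation for one frame."""
    loc = np.asarray(local_region, float)
    return float((target_max - loc.mean()) / loc.std())


def gsnr(per_frame_snrs):
    """Mean of the per-frame SNRs over the sequence."""
    return float(np.mean(per_frame_snrs))


def agv(target_pixels):
    """Average grayscale value of the candidate target's pixels."""
    return float(np.mean(target_pixels))


def lsnr(target_pixels, background_pixels):
    """(target mean - background mean) / background standard deviation."""
    t = np.asarray(target_pixels, float)
    b = np.asarray(background_pixels, float)
    return float((t.mean() - b.mean()) / b.std())


# Usage: identical images give MSE 0 and SSIM 1.
img = np.arange(16.0).reshape(4, 4)
print(mse(img, img), round(ssim_global(img, img), 6))  # 0.0 1.0
```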
The smaller the MSE, the smaller the error and the better the background prediction. The closer the SSIM is to 1, the closer the predicted background is to the real background. The larger the GSNR, the better the target enhancement in the difference image obtained from background prediction. Comparing the three indicators MSE, SSIM, and GSNR (Tables 3, 4, and 5) shows that the improved anisotropic background prediction method predicts the background better than the other algorithms.
In addition, an image frame with an SNR of 0.86 is selected, and its background is predicted with each of the above methods. The results are shown in Figure 3:
(a) the original infrared image, with the target marked by a red rectangle;
(b) the background prediction, difference image, and three-dimensional plot for Top-Hat;
(c) the same for TDLMS;
(d) the same for the nonparametric method;
(e) the same for the anisotropic method;
(f) the same for the improved anisotropic background prediction method;
(g) the same for the single Gaussian method;
(h) the same for the fuzzy running average method;
(i) the same for the mixture of Gaussians method.
As Figure 3 shows, the backgrounds predicted by the traditional methods (Top-Hat and TDLMS) are blurred and exhibit a pronounced block effect. The nonparametric method needs more training background frames to obtain a clearer background image, which severely undermines the adaptability of the background model. The difference images obtained by the single Gaussian, fuzzy running average, and mixture of Gaussians methods are prone to target drift or loss. These methods need different numbers of training frames in different scenes to build the background model, and they apply only to scenes whose background is stationary or changes very slowly. The anisotropic method can only passively preserve the target signal and cannot enhance it. The improved anisotropic background prediction method effectively removes most of the background in the image. It preserves the edge contours of both stationary and nonstationary backgrounds and also eliminates the block effect and target drift. Subtracting its predicted background from the original image extracts the candidate targets and reduces the false alarm rate.

Enhancement Results and Analysis
7.2.1. Parameter Selection Analysis. The main parameters of the high-order cumulants are the cumulative window radius $r$ and the cumulative frame length $k$. To accumulate the energy of a moving target effectively, $r$, $k$, and the target velocity $v$ must satisfy condition (18). To limit the accumulation of noise energy in the image, $r$ should be the minimum value that satisfies (18). Figure 4 shows the relationship between the SNR gain after accumulation and the cumulative length $k$; the SNR enhancement is better when $k$ is set to 4 frames. In the experiments, the cumulative frame length is $k = 4$. A small target in long-distance infrared imaging moves slowly (usually $v \le 2$ pix/s), so the cumulative window radius is set to $r = 4$.

Results and Analysis.
To verify the enhancement effect of the high-order cumulants, a simulation experiment is performed on an image frame with an SNR of 1.05. The main parameters are a cumulative window radius of $r = 4$ and a cumulative frame length of $k = 4$. Figure 5 shows the results:
(a) the difference image obtained with the improved anisotropic method and its 3D plot;
(b) the image obtained by enhancing (a) with the original high-order cumulants (HOC) method and its 3D plot;
(c) the image obtained by enhancing (a) with the improved high-order cumulants (IHOC) method and its 3D plot.
The target position is marked with a red rectangle. Table 6 lists the AGV and LSNR of the images after the original and the improved high-order cumulants methods are applied. Both methods enhance the dim and small target. Overall, the improved method performs better because it improves the image SNR more markedly.

Parameter Selection Analysis.
The detection performance of the local energy center of the sequential image depends on the cumulative frame length $k$ and the cumulative neighborhood size. Figure 6 shows the relationship between the detection rate (Pd) and the cumulative frame length $k$. It can be seen that the target detection rate is highest when …

To evaluate the detection performance of the proposed method, infrared images from 10 different sequences are processed with the pipeline filtering method [7], Wu et al.'s method [20], and the proposed method. The average SNR, number of sequence frames, detection rate (Pd), and false alarm rate (Pf) of each sequence are listed in Table 7. Figure 10 compares the average SNR with the detection rates. In the figure, W denotes the result of Wu et al.'s method, E denotes the result of the proposed method, and the remaining curve denotes the pipeline filtering method. Figure 11 compares the average SNR with the false alarm rates, using the same symbols as Figure 10.
Table 7 shows that, at the same SNR, the proposed method has the highest detection rate and the lowest false alarm rate, followed by the method of [20]. Figures 10 and 11 show that the detection rates of all three methods increase with SNR, while their false alarm rates decrease. The proposed method detects dim and small targets well in infrared sequential images with a local SNR below 2.5 dB. At the same SNR, its detection rate is markedly higher and its false alarm rate lower than those of the other two methods.

Conclusions
To improve the detection and recognition of small targets in images, this paper first predicts the background with improved anisotropy and then enhances the target with improved high-order cumulants. Finally, on the basis of background suppression and target enhancement, it proposes a new motion feature, the local energy center of the sequential image, as the basis of a multiframe motion-correlation detection algorithm. The algorithm needs no a priori knowledge, such as the motion velocity and direction, and imposes no excessive restrictions on target motion. Compared with traditional methods, it therefore has a wider range of applications and better meets the needs of real infrared scenes. The simulation experiments show the following. (1) The overall performance of the improved anisotropy is better than that of other background prediction methods. For images with different SNRs, the MSEs of the improved anisotropy are all below 10.45, with a lower MSE at a higher SNR. Its SSIMs are all above 0.93. For low-SNR images, such as … and small targets in infrared sequential images whose local SNR is below 2.5 dB. At the same SNR, the method's detection rate is markedly higher and its false alarm rate lower than those of the pipeline filtering method. The method can detect a target with an SNR as low as 0.86.

Sparse representation [10–13]. Advantages: effectively enhances the sparse feature difference between the target and the background and improves detection accuracy through training. Disadvantages: only adapted to stable or slowly changing backgrounds and to scenes with high SNR.
The proposed method. Advantages: able to effectively detect targets in scenes with low SNR (SNR < 3 dB). Disadvantages: requires a large amount of computation.

Figure 7(a) is the first frame of scene A. Figure 7(b) is the trajectory image sequence obtained by superimposing the binary images produced by the improved anisotropic and high-order cumulant methods. Figure 7(c) is the trajectory image obtained by removing noise with the local energy center of the sequential image and superimposing the target detection results. Figure 8(a) is the first frame of scene B, and Figure 9(a) is the first frame of scene C; in each case, (b) and (c) are obtained by the same methods. Figures 7, 8, and 9 show that the local energy center of the sequential image effectively eliminates noise from the image and accurately detects the target.

Figure 6: The relationship curve between the detection rate and the number of cumulative frames.

Figure 7: Detection results of image scene A with the proposed method.

Figure 8: Detection results of image scene B with the proposed method.

Figure 9: Detection results of image scene C with the proposed method.

Table 1: Advantages and disadvantages of different detection algorithms.

Table 2: Signal-to-noise ratios of six image frames.

Table 3: MSE comparison of background prediction methods.

Table 4: SSIM comparison of background prediction methods.

Table 5: GSNR comparison of background prediction methods.

Table 6: Comparison of the enhancement effects of the original and the improved high-order cumulants.
Three scenes, A, B, and C, with frame lengths of 85, 114, and 245, respectively, are selected for the experiments. The target in scene A moves randomly around a certain point. The target in scene B is highly maneuverable: it first moves upward and then downward, suddenly accelerates obliquely upward, and finally turns around and moves downward. The target in scene C moves obliquely downward in uniformly accelerated rectilinear motion.

Table 7: Simulation results for each sequence.