Multifeatures Based Compressive Sensing Tracking

To benefit from the development of compressive sensing, we cast tracking as a sparse approximation problem in a particle filter framework based on multifeatures. In this framework, the target template is composed of multiple features extracted from visible and infrared frames; in addition, occlusion, interruption, and noise are addressed through a set of trivial templates. With this model, sparsity is achieved via a compressive sensing approach without nonnegativity constraints; the residual between the sparse representation and the compressed sensing observation is then used to measure the likelihood that weights the particles. After that, the target template is adaptively updated according to the Bhattacharyya coefficients. Experimental results demonstrate that the proposed tracker appears to be more robust than four other algorithms.


Introduction
Visual tracking is an essential task in computer vision. It is applicable in many fields such as vehicle tracking, medical imaging, robotics, and surveillance. Much effort has been devoted to developing robust visual tracking algorithms that overcome the challenges posed by occlusion, illumination changes, viewpoint variation, and noise [1][2][3].
With the help of sparse representation, [4] proposed the L1 tracker for robust visual tracking, casting tracking as a sparse approximation problem under a particle filter framework. This approach performs better in dealing with occlusions, pose changes, and illumination changes than the mean shift tracker, the covariance tracker, and the appearance adaptive particle filter. However, it is computationally demanding because it needs to solve an $\ell_1$ norm related minimization problem many times. Since then, much effort has gone into accelerating the tracking process and improving its robustness: [5] developed a real-time compressive sensing tracking (RTCST) algorithm to reduce the computational complexity, while [6] proposed an accelerated proximal gradient approach in place of the interior point method. The results show that both algorithms are more accurate and robust than the standard L1 tracker in many complex scenarios but are still insufficient in accommodating extreme illumination variations. A main reason for this insufficiency may be that both algorithms directly use the cropped target image to generate the target templates, which is not robust to environmental changes; in addition, tracking becomes unstable when the target is similar to the background. To improve robustness and accuracy, our approach employs a multifeature based method, which leads us to treat the tracking target as a sparse representation in the linear span of a multifeature space.
Multifeature methods are widely used in image fusion and face recognition, and many attempts have been made to extend them to tracking problems [7,8]. A mixture of infrared and visible features is a typical way of modeling the tracked object. Reference [9] uses mixed visible and infrared features and combines a mean shift method with the level set evolution algorithm to track visual objects; [10] employs the intensity cue and edge cue of the infrared target as the feature template and applies a particle filter framework to track pedestrians. These algorithms cope successfully with complex environmental conditions such as illumination change, shadow, and occlusion but need to be improved to handle more severe conditions. To benefit from these previous works, we use some of the features mentioned above together with a modified compressed sensing tracking method to obtain a more robust tracking result.

Mathematical Problems in Engineering
The remainder of the paper is arranged as follows. We briefly review related work in Section 2. Section 3 details the proposed tracker, including a sparse target template representation model, a compressive sensing tracking strategy, and an adaptive template update scheme. Experimental results and conclusions are given in Sections 4 and 5, respectively.

Related Work
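2.1. Particle Filtering. The particle filter is a sequential Monte Carlo sampling method for Bayesian filtering; it provides a convenient framework for estimating the posterior distribution of the state variables of a dynamic system that may be nonlinear and non-Gaussian. Assume that $x_t$ is the state variable at time $t$, which can be defined by affine motion parameters or any other parameters reflecting the properties of the system. The predictive distribution of $x_t$ given all available observations $z_{1:t-1}$ up to time $t-1$ can be computed as

$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}.$$

When the observation $z_t$ is available at time $t$, the state vector is updated using the Bayes rule

$$p(x_t \mid z_{1:t}) = \frac{p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})}.$$

2.2. Compressive Sensing. For a signal $\mathbf{x}$ represented in a basis $\Psi$ as $\mathbf{x} = \Psi\mathbf{s}$, the signal $\mathbf{x}$ is $K$-sparse if $\|\mathbf{s}\|_0 = K$, in which $\|\mathbf{s}\|_0$ counts the nonzero elements in vector $\mathbf{s}$. Compressive sensing theory demonstrates that $\mathbf{x}$ can be recovered with overwhelming probability from a small number of measurements:

$$\mathbf{y} = \Phi\mathbf{x} = \Phi\Psi\mathbf{s} = \Theta\mathbf{s},$$

where $\Phi$ is incoherent with $\Psi$ and $\Theta$ satisfies the RIP (restricted isometry property) [11].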
Many studies indicate that a Gaussian measurement matrix $\Phi$ stably preserves the salient information of any $K$-sparse or compressible signal under dimensionality reduction. With a small constant $c$, an $M \times N$ iid Gaussian matrix $\Phi$ can be shown to have the RIP with high probability if $M \ge cK\log(N/K)$; moreover, because $\Phi$ is universal, $\Theta = \Phi\Psi$ will also be iid Gaussian and thus have the RIP regardless of the choice of $\Psi$ [12].
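As a rough numerical illustration of why a Gaussian measurement matrix preserves the information in a sparse signal, the sketch below projects a random $K$-sparse vector and compares its energy before and after projection. It is only a sanity check under stated assumptions, not a verification of the RIP: the function names, the sizes, and the $1/\sqrt{M}$ column scaling (used so that the expected energy ratio is 1) are choices made for this example and do not come from the paper.

```python
import math
import random

def gaussian_matrix(m, n, rng):
    """Return an m x n matrix (list of rows) with iid N(0, 1/m) entries.

    The 1/sqrt(m) scaling is an assumption of this sketch; it makes
    E[||Phi x||^2] equal to ||x||^2.
    """
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(vi * vi for vi in v)

def energy_ratio(m, n, k, seed=0):
    """Return ||Phi x||^2 / ||x||^2 for a random k-sparse x of length n."""
    rng = random.Random(seed)
    phi = gaussian_matrix(m, n, rng)
    x = [0.0] * n
    for idx in rng.sample(range(n), k):
        x[idx] = rng.gauss(0.0, 1.0)
    return sq_norm(matvec(phi, x)) / sq_norm(x)

# Hypothetical sizes: 200 measurements of a 10-sparse, length-1000 signal.
print(f"energy ratio: {energy_ratio(m=200, n=1000, k=10):.3f}")
```

With these sizes the ratio concentrates around 1, which is the behaviour the RIP formalizes uniformly over all $K$-sparse vectors.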
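Since directly obtaining $\mathbf{s}$ (which equals $\mathbf{x}$ in the $\Psi$ domain) from the measurements $\mathbf{y} = \Theta\mathbf{s}$ via

$$\min \|\mathbf{s}\|_0, \quad \text{s.t. } \Theta\mathbf{s} = \mathbf{y}$$

is an NP-hard problem, it is commonly relaxed to the optimization

$$\min \|\mathbf{s}\|_1, \quad \text{s.t. } \Theta\mathbf{s} = \mathbf{y}.$$

When noise is considered, the optimization is modified to

$$\min \|\mathbf{s}\|_1, \quad \text{s.t. } \|\Theta\mathbf{s} - \mathbf{y}\|_2 \le \varepsilon,$$

where $\varepsilon$ is a prespecified threshold.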

L1 Tracker and Real-Time Compressive Sensing Tracking.
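The $\ell_1$ tracker considers the tracking target as a sparse representation in the linear span of the target template set, along with noise. Given the target template set $T = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_n] \in \mathbb{R}^{d \times n}$, a candidate target $\mathbf{y}$ is modeled as

$$\mathbf{y} = [T, I, -I] \begin{bmatrix} \mathbf{a} \\ \mathbf{e}^+ \\ \mathbf{e}^- \end{bmatrix} = D\mathbf{c}, \quad \mathbf{c} \ge 0,$$

where $\mathbf{a} = (a_1, a_2, \ldots, a_n)^T \in \mathbb{R}^n$ is the target coefficient vector and $\mathbf{e}^+ \in \mathbb{R}^d$, $\mathbf{e}^- \in \mathbb{R}^d$ are called the positive and negative trivial coefficient vectors. With this definition, at time $t$, the coefficient vector $\mathbf{c}_t^i$ of each candidate target $\mathbf{y}_t^i$ is obtained by solving the $\ell_1$ regularized least squares problem

$$\min_{\mathbf{c}} \|D\mathbf{c} - \mathbf{y}_t^i\|_2^2 + \lambda\|\mathbf{c}\|_1, \quad \text{s.t. } \mathbf{c} \ge 0, \tag{10}$$

where $i = 1, \ldots, N_s$ indexes the particles and $N_s$ is the number of particles; the residuals of the particles, $r_t^i = \|\mathbf{y}_t^i - T\mathbf{a}_t^i\|_2^2$, are then passed to the particle filter to estimate the tracking result [4].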
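The way residuals feed the particle filter can be sketched as follows. This is a minimal illustration under assumptions rather than the procedure of [4]: the exponential likelihood $\exp(-\alpha r)$, the scale `alpha`, and all function names are hypothetical choices made for this example.

```python
import math
import random

def likelihood_weights(residuals, alpha=10.0):
    """Map reconstruction residuals to normalized particle weights.

    Uses w_i proportional to exp(-alpha * r_i); alpha is a hypothetical
    scale parameter. The minimum residual is subtracted before the
    exponential for numerical stability (it cancels after normalization).
    """
    r_min = min(residuals)
    raw = [math.exp(-alpha * (r - r_min)) for r in residuals]
    total = sum(raw)
    return [w / total for w in raw]

def best_particle(particles, weights):
    """Return the particle with the largest weight (smallest residual)."""
    best = max(range(len(particles)), key=lambda i: weights[i])
    return particles[best]

def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their weights."""
    return rng.choices(particles, weights=weights, k=len(particles))

# Three hypothetical candidate states with their residuals.
particles = ["state_a", "state_b", "state_c"]
weights = likelihood_weights([0.5, 0.1, 0.9])
print(best_particle(particles, weights))
print(resample(particles, weights, random.Random(0)))
```

Candidates that the target templates reconstruct well receive small residuals and therefore dominate both the estimate and the resampled particle set.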
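Based on the model above, to reduce the computational load, [6] modifies (10) to the following minimization model:

$$\min_{\mathbf{c}} \frac{1}{2}\|\mathbf{y} - D\mathbf{c}\|_2^2 + \lambda\|\mathbf{c}\|_1 + \frac{\mu_t}{2}\|\mathbf{e}\|_2^2, \quad \text{s.t. } \mathbf{a} \ge 0, \tag{11}$$

where $\mathbf{c} = [\mathbf{a}, \mathbf{e}]^T$, $\mathbf{e} = [\mathbf{e}^+, \mathbf{e}^-]^T$, and $\mu_t$ is a parameter controlling the energy in the trivial templates; an accelerated proximal gradient approach is then proposed for solving problem (11). At the same time, RTCST uses compressive sensing to deal with the extremely high dimension of the feature space in the $\ell_1$ tracker. Given the measurement matrix $\Phi$, we get the measurement vector $\mathbf{z} = \Phi\mathbf{y} = \Phi D\mathbf{c}$; a sparse coefficient vector $\mathbf{c}$ can then be recovered from

$$\min \|\mathbf{c}\|_1, \quad \text{s.t. } \|\Phi D\mathbf{c} - \mathbf{z}\|_2 \le \varepsilon.$$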
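Sparse recovery problems of this kind are often attacked greedily. The sketch below implements orthogonal matching pursuit (OMP), the solver named later for the proposed tracker, in plain Python. It is an illustrative sketch only: the function names, the stopping tolerance, and the normal-equation least squares step are assumptions of this example, and `columns` stands for the columns of the compressed dictionary $\Phi D$.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def omp(columns, z, max_atoms, tol=1e-9):
    """Greedy sparse recovery of x such that sum_j x[j] * columns[j] ~ z.

    columns: list of dictionary columns (each a list of length m).
    max_atoms: upper bound on the number of nonzero coefficients.
    """
    residual = list(z)
    support, coeffs = [], []
    for _ in range(max_atoms):
        if math.sqrt(dot(residual, residual)) <= tol:
            break
        # Score each unused column by its normalized correlation with the residual.
        scores = []
        for j, col in enumerate(columns):
            norm = math.sqrt(dot(col, col))
            unusable = j in support or norm == 0.0
            scores.append(-1.0 if unusable else abs(dot(col, residual)) / norm)
        best = max(range(len(columns)), key=scores.__getitem__)
        if scores[best] <= 0.0:
            break
        support.append(best)
        # Least squares fit on the selected columns via the normal equations.
        gram = [[dot(columns[i], columns[j]) for j in support] for i in support]
        rhs = [dot(columns[i], z) for i in support]
        coeffs = solve_linear(gram, rhs)
        approx = [sum(c * columns[j][k] for c, j in zip(coeffs, support))
                  for k in range(len(z))]
        residual = [zk - ak for zk, ak in zip(z, approx)]
    x = [0.0] * len(columns)
    for c, j in zip(coeffs, support):
        x[j] = c
    return x

# Toy dictionary with three columns in R^2; z is 3 times the middle column.
print(omp([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]], [3.0, 3.0], max_atoms=2))
```

Note that OMP targets a sparse solution directly instead of minimizing the $\ell_1$ norm, so it should be read as a fast approximation to the constrained problem, not as an exact solver for it.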
where $r$ is the distance from the region center; the histogram at location $l$ can then be described as

$$p_l^{(u)} = f \sum_{i=1}^{n} k\!\left(\frac{\|l - l_i\|}{h}\right) \delta\left[b(l_i) - u\right],$$

where $f$ is a normalization factor, $\{l_i\}$, $i = 1, \ldots, n$, are the coordinates of the pixels in the candidate target, and $h = \sqrt{h_x^2 + h_y^2}$ is the bandwidth of the kernel function $k(\cdot)$. $\delta$ is the Kronecker delta function, $u = 1, \ldots, N_b$ is the histogram bin index, and $b(l_i)$ associates pixel $l_i$ with a histogram bin according to its RGB or gray intensity.
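A kernel-weighted histogram of this form can be sketched as below. The kernel profile $k(r) = 1 - r^2$ for $r < 1$ (and 0 otherwise) is an assumption of this example, since the kernel definition is not reproduced in the text above; the function names, the single-channel intensity input, and the uniform binning are likewise illustrative.

```python
import math

def kernel_profile(r):
    """Assumed profile: weight decreases with distance from the region center."""
    return 1.0 - r * r if r < 1.0 else 0.0

def kernel_histogram(pixels, center, half_width, half_height, n_bins, levels=256):
    """Kernel-weighted intensity histogram of a candidate region.

    pixels: iterable of ((px, py), intensity) with intensity in [0, levels).
    center: (cx, cy) of the region; half_width and half_height play the
    roles of h_x and h_y in the bandwidth h = sqrt(h_x^2 + h_y^2).
    """
    bandwidth = math.sqrt(half_width ** 2 + half_height ** 2)
    hist = [0.0] * n_bins
    for (px, py), intensity in pixels:
        r = math.hypot(px - center[0], py - center[1]) / bandwidth
        # Uniform binning of the intensity range into n_bins bins.
        bin_index = min(int(intensity * n_bins // levels), n_bins - 1)
        hist[bin_index] += kernel_profile(r)
    total = sum(hist)  # normalization so that the bins sum to 1
    return [h / total for h in hist] if total > 0 else hist

# Three hypothetical pixels around the center of a 4 x 4 region.
pixels = [((0, 0), 10), ((1, 0), 200), ((0, 1), 200)]
print(kernel_histogram(pixels, center=(0, 0), half_width=2, half_height=2, n_bins=4))
```

Pixels near the region boundary contribute less, which reduces the influence of background pixels and partial occlusions at the edge of the candidate window.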

LBP. LBP (local binary pattern) is a type of feature characterizing the texture spectrum of an image; it has proved to be a powerful feature for texture classification. The original LBP feature first divides the examined window into cells and then, for each pixel $g_0$ in a cell, compares it to each of its 8 neighbors $\{g_p\}$, $p = 1, \ldots, 8$; the LBP value can be calculated as

$$\mathrm{LBP}(g_0) = \sum_{p=1}^{8} s(g_p - g_0)\, 2^{p-1}, \qquad s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

After that, a histogram is computed over the cell to generate the feature vector for the window.

) is modeled as a Gaussian distribution.
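The basic 3 x 3 LBP operator and the per-cell histogram can be sketched as follows. The clockwise neighbor ordering starting at the top-left pixel and the "greater than or equal" comparison are conventions assumed for this example; other orderings only permute the bit positions of the code.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the center pixel of a 3 x 3 patch.

    patch: list of 3 rows, each a list of 3 intensities.
    """
    center = patch[1][1]
    # Neighbors visited clockwise from the top-left corner (assumed convention).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << p
    return code

def lbp_histogram(cell):
    """Normalized 256-bin histogram of LBP codes over the interior of a cell.

    cell: 2-D list of intensities with at least 3 rows and 3 columns.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            patch = [row[c - 1:c + 2] for row in cell[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
print(lbp_code(patch))  # neighbors 9, 7, 8 are >= 6: bits 1, 3, 4 -> 2 + 8 + 16 = 26
```

Because the code depends only on the sign of intensity differences, it is unaffected by monotonic gray-level changes, which is one reason texture features of this kind complement raw intensity histograms.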

Compressive Sensing
In the $\ell_1$ tracker and RTCST, the dictionary $D$ is composed of the target template $T$, the positive trivial template $I$, and the negative trivial template $-I$; meanwhile, the coefficient vector is divided into the target coefficient vector $\mathbf{a}$, the positive trivial coefficient vector $\mathbf{e}^+$, and the negative trivial coefficient vector $\mathbf{e}^-$ to enforce nonnegativity constraints. The main reason for imposing nonnegativity constraints is that they help filter out clutter that is similar to the target templates but has reversed intensity patterns; furthermore, the target in the current frame normally resembles the target in the previous frame. Since multifeatures have the potential to filter out clutter and occlusion, the nonnegativity constraints can be relaxed, and an extra occlusion handling method becomes less necessary. For example, when tracking a pedestrian, an occlusion may occur in the visible image but not in the corresponding infrared image; in other words, the infrared features help filter out the occlusion. Once the target template set $T = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_{N_T}]$ is available after initialization or update, we model the observation as

$$\mathbf{z} = T\mathbf{x}_T + I\mathbf{x}_I = [T, I] \begin{bmatrix} \mathbf{x}_T \\ \mathbf{x}_I \end{bmatrix} = D\mathbf{x},$$

where $I$ is the trivial template representing noise, $\mathbf{x}_I$ is the trivial coefficient vector, and $\mathbf{x}_T$ is the target coefficient vector.
Up to a certain point, adding more weakly correlated features increases robustness and stability but also increases the computational cost. To deal with this problem, we adopt compressed sensing theory. Choosing a random Gaussian matrix $\Phi \in \mathbb{R}^{M \times N}$, $\phi_{i,j} \sim \mathcal{N}(0, 1)$, as the measurement matrix, we take the measurements $\mathbf{z}' = \Phi\mathbf{z} = \Phi D\mathbf{x}$, where $D = [T, I]$ is the dictionary; then, by using orthogonal matching pursuit (OMP), we recover the coefficient vector $\mathbf{x}$ from

$$\min \|\mathbf{x}\|_1, \quad \text{s.t. } \|\Phi D\mathbf{x} - \Phi\mathbf{z}\|_2 \le \varepsilon.$$

The main flow of the multifeature based compressive sensing tracking (MFCST) algorithm is presented in Algorithm 1.

Template Update
A fixed template cannot follow variations of the target due to illumination or pose changes; on the other hand, updating the template too frequently accumulates errors and makes the tracker drift away from the target. We tackle this problem by adaptively updating the target template. Notice that $\mathbf{x}_T$ can be seen as a sparse representation of $\mathbf{z}$ in the linear span of $T$, which means that the larger the coefficient $x_i$ is, the more important the template $\mathbf{t}_i$ is. The Bhattacharyya coefficient is applied to measure the similarity between the target observation $\hat{\mathbf{z}}$ and the template $\mathbf{t}_i$ that has the maximum coefficient $x_i$; it can be defined as

$$\rho(\hat{\mathbf{z}}, \mathbf{t}_i) = \sum_{u} \sqrt{\hat{z}^{(u)}\, t_i^{(u)}}.$$

The value of $\rho(\cdot)$ lies between 0 and 1, and a larger $\rho(\hat{\mathbf{z}}, \mathbf{t}_i)$ means that $\hat{\mathbf{z}}$ is more similar to $\mathbf{t}_i$. We adopt two thresholds $\tau_2 > \tau_1$ to guide the update process: $\rho(\hat{\mathbf{z}}, \mathbf{t}_i) < \tau_2$ indicates that $\hat{\mathbf{z}}$ is not very similar to $\mathbf{t}_i$, which reflects variations during tracking; moreover, when $\rho(\hat{\mathbf{z}}, \mathbf{t}_i)$ falls below the lower threshold, that is, $\rho(\hat{\mathbf{z}}, \mathbf{t}_i) < \tau_1$, we consider that a strong interference has occurred in the tracking process. The template update scheme is summarized in Algorithm 2.
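The similarity test that drives the update can be sketched as follows. This is not a reproduction of Algorithm 2: the threshold values, the function names, and the three labels returned by `update_decision` are hypothetical, and the action the tracker takes in each case is left to the algorithm itself.

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms p and q."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def update_decision(rho, tau1, tau2):
    """Classify the similarity rho using two thresholds with tau2 > tau1.

    Returns one of three hypothetical labels:
      "interference": rho < tau1, a strong interference is assumed;
      "variation":    tau1 <= rho < tau2, the target appearance has changed;
      "similar":      rho >= tau2, the observation still matches the template.
    """
    if tau2 <= tau1:
        raise ValueError("expected tau2 > tau1")
    if rho < tau1:
        return "interference"
    if rho < tau2:
        return "variation"
    return "similar"

# Hypothetical 4-bin histograms of the observation and the dominant template.
z_hat = [0.25, 0.25, 0.25, 0.25]
t_i = [0.40, 0.30, 0.20, 0.10]
rho = bhattacharyya(z_hat, t_i)
print(round(rho, 3), update_decision(rho, tau1=0.5, tau2=0.98))
```

Separating the two thresholds lets the tracker treat moderate dissimilarity as genuine appearance change while treating very low similarity as evidence that the observation is corrupted.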

Experiments
We test our algorithm on real-world visible and infrared video sequences obtained from data set 03, the OSU Color-Thermal Database, of the OTCBVS Benchmark Dataset [15]; three scenarios from two videos are involved in our experiments. To evaluate the performance of the proposed tracking framework, we compare the tracking results of our multifeature based compressive sensing tracking (MFCST) with those of the adaptive multicue particle filter (AMC-PF) [10], the tracking method based on infrared and visible dual-channel video (IVDT) [9], the $\ell_1$ tracker [4], and the real-time compressive sensing tracking (RTCST) [5]. All targets are manually marked in the first tracking frame without careful selection.
The test sequences used in our first and second experiments, S1 and S2, are based on location 2 in the data set. In S1, two pedestrians cross paths, which causes a partial occlusion. In the visible frames, the whole scene is well illuminated and the two pedestrians are distinct from each other and from the background, but in the corresponding infrared frames the two pedestrians look similar to each other. Some tracking results are given in Figure 1, where the frame indexes are labeled in the upper left corner of the images. It can be observed that, although all trackers follow the target pedestrian well, AMC-PF, RTCST, and MFCST localize the target more precisely than the $\ell_1$ tracker and IVDT.
The second experiment, S2, is based on a scene in which the target pedestrian passes by a static car. The target pedestrian has an appearance similar to that of the front part and the tire of the car in both the visible and the corresponding infrared frames. The results in Figure 2 indicate that AMC-PF drifts to the car when the target occludes the tire. This is because AMC-PF uses the intensity cue and edge cue of the infrared image, in which the pedestrian and the front part of the car are easily confused.
The third experiment, S3, considers a more complex scene; we use a video sequence from location 1 in the OSU Color-Thermal Database and test the 5 trackers mentioned above from frame 220 to 429. Since it is hard to distinguish the target pedestrian from the background, we manually labeled the tracking target in the first infrared frame (frame 220) at the beginning of experiment S3, without careful selection. The whole video sequence can be described in 5 clips. In the first clip, which runs from the starting frame 220 to frame 250, the target walking in the shadow is confused with the background in the visible frames but is distinct in the infrared frames. As the target walks straight ahead in clip 2, from frame 250 to 322, the tracked pedestrian is partially occluded by the first street lamp, which we denote as lamp 1; then, as the target walks out of the shadow, the illumination conditions change drastically.
A heavy occlusion appears in clip 3, from frame 322 to 374: when the target passes by the second street lamp, street lamp 2 completely occludes the pedestrian in both the visible and the infrared images. After that, the target is partially occluded by pedestrian 2, who wears a red coat and is clearly different from the target pedestrian in the visible frames; after several frames, the target walks past pedestrian 2 and encounters the third street lamp. These events make up clip 4, from frame 374 to 390. Finally, in clip 5, from frame 390 to 427, the target walks into a square and the tracking process ends. The performance of the 5 methods is shown in Figure 3, from which we can see that the $\ell_1$ tracker and RTCST fail to track the target in clip 1; the main reason may be that these two algorithms use only the pixel information of the visible images, in which it is very hard to distinguish the target from the background. AMC-PF drifts to street lamp 1 in clip 2, where the illumination changes severely and lamp 1 occludes the target in both the visible and the infrared frames. The dual-channel method IVDT succeeds in clips 1 and 2 but begins to lose its target in clip 3. Meanwhile, our proposed MFCST accurately tracks the target pedestrian throughout the whole sequence; this indicates that our approach appears to be more robust in dealing with heavy occlusions and drastic illumination changes. In addition, compared with the multifeature tracking methods AMC-PF and IVDT, our approach takes fewer measurements; furthermore, compared with the $\ell_1$ tracker and RTCST, our algorithm achieves better accuracy.

Conclusion
We model the tracking target as a sparse representation in the linear span of a multifeature space. To generate the dictionary, we use the RGB histogram and local binary patterns (LBP) of the visible frames together with the intensity histogram of the infrared frames to create the target template $T$ and then combine $T$ with the trivial template $I$. A compressive sensing method is employed under a particle filter framework to reduce the high dimension of the feature space and to solve for the coefficients of the sparse representation. For further robustness, we introduce an adaptive template update scheme. The proposed tracker achieves robust and stable performance in dealing with occlusions and illumination variations.