We present a new method for object tracking; we use an efficient local search scheme based on the Kalman filter and the probability product kernel (KFPPK) to find the image region with a histogram most similar to the histogram of the tracked target. Experimental results verify the effectiveness of the proposed system.
1. Introduction
Real time object tracking is one of the most active research areas in computer vision. The goal of object tracking is to estimate the locations and motion parameters of the target in a video sequence, given its initialized position in the first frame. Research in tracking plays a key role in understanding motion and structure of objects. It finds numerous applications including surveillance [1], human-computer interaction [2], traffic pattern analysis [3], recognition [4], and medical image processing [5], to name a few. Although object tracking has been studied for several decades and numerous tracking algorithms have been proposed for different tasks, it remains a very challenging problem. One of the most challenging factors in object tracking is accounting for appearance variation of the target object caused by changes in illumination, deformation, and pose. In addition, occlusion, motion blur, and camera view angle pose significant difficulties for algorithms tracking target objects. Furthermore, no single tracking method can be successfully applied to all tasks and situations.
In this paper we present a new method for real time tracking of complex nonrigid objects. This new method successfully copes with camera motion, partial occlusions, and target scale variations. The scale/shape of the object is approximated by an ellipse and its appearance by histogram-based features derived from local image properties. We use an efficient search scheme (a Kalman filter [6, 7], using the probability product kernels [8–11] and an integral image [12] as a similarity measure) to find the image region with a histogram most similar to the target of the object tracker. In this work, we address the problem of scale/shape adaptation and orientation changes of the target. The proposed approach (KFPPK) is compared with MKF [13]. In [13] the authors build a color histogram-based visual representation regularized by a spatially smooth isotropic kernel. Using the Bhattacharyya kernel [8] as the similarity measure, a mean shift procedure is performed for object localization by finding the basin of attraction of the local maxima. However, the MKF tracker only considers color information and therefore ignores other useful information such as target scale variations, making it sensitive to background clutter and occlusions. Extensive experiments are performed to test the proposed method and validate its robustness and effectiveness in tracking the scale and orientation changes of the target in real time.
The rest of the paper is organized as follows: Section 2 presents the probability product kernels. Section 3 introduces the basic Kalman filter for object tracking. Section 4 presents scale estimation. Section 5 presents the proposed approach and Section 6 the experimental results. Section 7 concludes the paper.
2. Probability Product Kernel
In this paper we define a class of kernels $K_\rho : \mathcal{P} \times \mathcal{P} \to \mathbb{R}$ on the space $\mathcal{P}$ of normalized discrete distributions over some index set $\Omega$. Specifically, we define the general probability product kernel [8, 9] between distributions $p$ and $q$ as

(1) $K_\rho(p, q) = \sum_k p(k)^\rho\, q(k)^\rho$,

where $\rho$ is a parameter and $p(k)$ and $q(k)$ are probability distributions defined on the space $\Omega$. This is a valid kernel since, for any $p_1, p_2, p_3, \ldots, p_N \in \mathcal{P}$, the Gram matrix $K$ consisting of elements $K_{ij} = K_\rho(p_i, p_j)$ is positive semidefinite:

(2) $\sum_i \sum_j \alpha_i \alpha_j K_\rho(p_i, p_j) = \sum_k \left( \sum_i \alpha_i\, p_i(k)^\rho \right)^2 \ge 0$,

for $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N \in \mathbb{R}$. Different $\rho$ values correspond to different types of probability product kernels. For $\rho = 1$, we have

(3) $K_1(p, q) = \sum_k p(k)\, q(k) = \mathbb{E}_p[q(k)] = \mathbb{E}_q[p(k)]$.
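As a concrete illustration, the kernel in (1) can be evaluated directly for discrete histograms. The following is a small NumPy sketch (the function name is ours, and Python stands in for the paper's MATLAB implementation); $\rho = 1$ gives the expected likelihood kernel of (3) and $\rho = 1/2$ gives the Bhattacharyya kernel [8]:

```python
import numpy as np

def probability_product_kernel(p, q, rho=1.0):
    """General probability product kernel K_rho(p, q) = sum_k p(k)^rho q(k)^rho."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p ** rho) * (q ** rho)))

# Two normalized 4-bin histograms.
p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.4, 0.3, 0.2, 0.1])

k1 = probability_product_kernel(p, q, rho=1.0)   # rho = 1: expected likelihood kernel (3)
kb = probability_product_kernel(p, q, rho=0.5)   # rho = 1/2: Bhattacharyya kernel
```

Note that for $\rho = 1/2$ the kernel of a distribution with itself is $\sum_k p(k) = 1$, so the Bhattacharyya kernel attains its maximum for identical histograms.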
We call this the probability product kernel defined by $K(p, q) = \sum_k p(k)\, q(k)$. We denote the histogram of the tracked target $T$ by $h_T$ and the number of pixels inside $T$ by $|T|$, which is also equal to the sum over bins, $|T| = \sum_k h_T(k)$. Let $q = h_T / |T|$ be the normalized version of $h_T$, so that $q$ is a discrete distribution with $\sum_k q(k) = 1$. Let $p$ be the normalized histogram obtained from the frames of the video sequence. The value of the $k$-th bin of $h_T$ is obtained by counting the pixels that are mapped to the index $k$:

(4) $h_T(k) = \sum_{x \in T} \delta(b(x) - k)$,

where $\delta(t)$ is the Kronecker delta with $\delta(t) = 1$ if $t = 0$ and $\delta(t) = 0$ otherwise. The mapping function $b(x)$ maps a pixel $x$ to its corresponding bin index. The computation of the probability product kernel can then be expressed as

(5) $K(p, q) = \sum_k p(k)\, q(k) = \sum_k p(k) \frac{1}{|T|} \sum_{x \in T} \delta(b(x) - k) = \frac{1}{|T|} \sum_{x \in T} \sum_k p(k)\, \delta(b(x) - k) = \frac{1}{|T|} \sum_{x \in T} p(b(x))$.
Therefore, the probability product kernel can be computed by summing the values $p(b(x))$ over the target region $T$. The output of the algorithm is a support map, computed with an integral image, giving the similarity measure between the target and every candidate region of each frame of the video sequence.
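The evaluation of (5) over all candidate windows can be sketched with an integral image over the backprojected values $p(b(x))$: four lookups per window replace the inner sum. This is a NumPy sketch under our own naming (the paper's implementation is in MATLAB):

```python
import numpy as np

def kernel_map(bin_image, p, win_h, win_w):
    """For every win_h x win_w window, compute K(p, q) of (5),
    i.e. (1/|T|) * sum of p(b(x)) over the window, via an integral image."""
    back = p[bin_image]                     # backprojection: p(b(x)) at every pixel
    # Integral image with a zero row/column prepended for clean indexing.
    ii = np.pad(back.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    H, W = bin_image.shape
    # Window sums via four integral-image lookups per window.
    sums = (ii[win_h:H+1, win_w:W+1] - ii[:H-win_h+1, win_w:W+1]
            - ii[win_h:H+1, :W-win_w+1] + ii[:H-win_h+1, :W-win_w+1])
    return sums / (win_h * win_w)           # normalize by window size |T|

# 2x2 image of bin indices and a 2-bin target histogram p.
scores = kernel_map(np.array([[0, 1], [1, 0]]), np.array([0.25, 0.75]), 2, 2)
```

The cost is independent of the window size, which is what makes the exhaustive local search over candidate regions affordable in real time.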
3. Kalman Filter
The Kalman filter has many uses, including applications in control [14], navigation [15], and computer vision, where one important application is object tracking. Different movement conditions and occlusions can hinder the visual tracking of an object. In this paper, we use the Kalman filter for object tracking, exploiting its capacity to tolerate small occlusions and complex movements of objects [16]. The Kalman filter is a framework for predicting a process state and using measurements to correct, or "update," these predictions.
3.1. State Prediction
For each time step $k$, a Kalman filter first makes a prediction $\hat{s}_k$ of the state at this time step:

(6) $\hat{s}_k = A\, s_{k-1}$,

where $s_{k-1}$ is a vector representing the process state at time $k-1$ and $A$ is the process transition matrix. The Kalman filter concludes the state prediction step by projecting the estimated error covariance $P_k^-$ forward one time step:

(7) $P_k^- = A\, P_{k-1}\, A^t + W$,

where $P_{k-1}$ is a matrix representing the error covariance of the state prediction at time $k-1$ and $W$ is the process noise covariance (the uncertainty in our model of the process).
3.2. State Correction
After predicting the state s^k (and its error covariance) at time k using the state prediction steps, the Kalman filter next uses measurements to “correct” its prediction during the measurement update steps.
First, the Kalman filter computes a Kalman gain $K_k$, which is later used to correct the state estimate $\hat{s}_k$:

(8) $K_k = P_k^- \left( P_k^- + R_k \right)^{-1}$,

where $R_k$ is the measurement noise covariance. Determining $R_k$ for a set of measurements is often difficult; in our contribution we calculate $R_k$ dynamically from the state of the measurement algorithm.
Using the Kalman gain $K_k$ and the measurement $z_k$ from time step $k$, we can update the state estimate:

(9) $\hat{s}_k = \hat{s}_k + K_k \left( z_k - \hat{s}_k \right)$.

Conventionally, the measurements $z_k$ are derived from sensors. In our approach, the measurement $z_k$ is instead the output of the tracking algorithm on the current frame of the streaming video: the most likely $x$ and $y$ coordinates of the target object in this frame (the first two dimensions of $\hat{s}_k$).
The final step of the Kalman filter iteration is to update the error covariance $P_k^-$ into $P_k$:

(10) $P_k = \left( I - K_k \right) P_k^-$.

The updated error covariance is decreased significantly if the measurements are accurate (some entries of $R_k$ are low) and only slightly if the measurements are noisy (all of $R_k$ is high).
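One full predict/correct cycle of (6)–(10) can be sketched as follows. The sketch adds a measurement matrix $H$ so that a position-only measurement fits the same update; the paper's equations (8)–(10) correspond to the special case $H = I$. Python/NumPy stands in for the paper's MATLAB implementation, and all names are ours:

```python
import numpy as np

def kalman_step(s, P, z, A, H, W, R):
    """One Kalman iteration: prediction (6)-(7), then correction (8)-(10).
    H maps the state to the measurement space; the paper assumes H = I."""
    # --- prediction ---
    s_pred = A @ s                                            # (6)
    P_pred = A @ P @ A.T + W                                  # (7)
    # --- correction ---
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # (8)
    s_new = s_pred + K @ (z - H @ s_pred)                     # (9)
    P_new = (np.eye(len(s)) - K @ H) @ P_pred                 # (10)
    return s_new, P_new

# Constant-velocity model in 2-D: state [x, y, vx, vy], measured [x, y].
A = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
              [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
s, P = np.zeros(4), np.eye(4)
W, R = 0.01 * np.eye(4), 0.1 * np.eye(2)
s, P = kalman_step(s, P, z=np.array([1.0, 1.0]), A=A, H=H, W=W, R=R)
```

With a low $R$ relative to $P_k^-$, the corrected position moves most of the way toward the measurement, as (9) predicts.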
4. Target Model Update
A target is represented by an ellipsoidal region in the image in order to eliminate the influence of different target dimensions [13]. We denote a dataset of independent samples by $\chi = \{x_1, \ldots, x_N\}$. We assume that the Gaussian probability density function $p(x) = \mathcal{N}(x_i, b(x_i), V)$ is a good generative model for our data [17]. The Gaussian density function is used to determine the parameters of the model, including the pixel locations $x_i$, the covariance $V$, and the location of the target $b(x_i)$. The advantages of using the covariance matrix representation are as follows:
It is robust to illumination changes, occlusion, and shape deformations.
It can capture the intrinsic self-correlation properties of object appearance.
It is low dimensional.
These advantages lead to computational efficiency:

(11) $V_0 = \sum_i \left( x_i - b(x_i) \right) \left( x_i - b(x_i) \right)^t$.
We represent an object $O = [o_1, \ldots, o_K]^t$ by modeling the spatial-color joint probability distribution of the corresponding region with a Gaussian density function. We define $b(x_i) : \mathbb{R}^2 \to \{1, \ldots, K\}$ as the function that assigns a color bin to the pixel at location $x_i$. We employ the normalized RGB color space as the color feature; a normalized RGB histogram provides some robustness to illumination change. The features of spatial position and color can be written as

(12) $O_k = \sum_{i=1}^{N_{v_0}} \mathcal{N}(x_i, b(x_i), V_0)\, \delta(b(x_i) - k)$,

where $\delta$ is the Kronecker delta function. We use the Gaussian kernel $\mathcal{N}$ to rely more on the pixels in the middle of the object and to assign smaller weights to the less reliable pixels at the borders of the object. We use only the $N_{v_0}$ pixels from a finite neighborhood of the kernel and discard the pixels further away.
The color histogram that describes the appearance of the region is $r_k(b(x), V)$, and the value of its $k$-th bin is calculated by

(13) $r_k(b(x), V) = \sum_{i=1}^{N_v} \mathcal{N}(x_i, b(x_i), V)\, \delta(b(x_i) - k)$.
The weights of the $k$-bin values in the object are calculated by

(14) $\omega_i = \sum_{k=1}^{K} O_k\, r_k(b(x), V)\, \delta(b(x_i) - k)$.
The new color histogram distribution $p$ is calculated by

(15) $p_i = \dfrac{\omega_i\, \mathcal{N}(x_i, b(x_i), V)}{\sum_{i=1}^{N} \mathcal{N}(x_i, b(x_i), V)}$.
The covariance matrix update can be used to approximate the new shape of the object and is calculated by

(16) $V = \beta \sum_{i=1}^{N} p_i \left( x_i - b(x_i) \right) \left( x_i - b(x_i) \right)^t$,

where we use $\beta = 1.5$; the correct value of $\beta$ depends on the noise present in the image sequence.
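The shape update of (16) amounts to a weighted scatter matrix of the pixel locations. The following NumPy sketch makes the assumptions explicit: $b(x_i)$ is read as the object center and the weights are the normalized $p_i$ of (15); all names are ours:

```python
import numpy as np

def update_shape_covariance(points, weights, beta=1.5):
    """Scale/shape update V = beta * sum_i p_i (x_i - c)(x_i - c)^t of (16),
    with b(x_i) interpreted as the weighted object center c (an assumption)."""
    points = np.asarray(points, float)
    w = np.asarray(weights, float)
    w = w / w.sum()                       # ensure the p_i sum to 1
    c = w @ points                        # weighted center of the region
    d = points - c
    return beta * (d * w[:, None]).T @ d  # 2x2 covariance of the ellipse

# Pixels spread along the x-axis -> elongated ellipse (V[0,0] > V[1,1]).
pts = [(0, 0), (4, 0), (2, 1), (2, -1)]
V = update_shape_covariance(pts, [0.25, 0.25, 0.25, 0.25])
```

The eigenvectors of the resulting $V$ give the ellipse orientation and its eigenvalues the squared semi-axis scales, which is how the tracker follows scale and orientation changes.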
5. Proposed Approach
To ensure good organization of the work, we exploited the benefits of modular design in our MATLAB implementation. The goal of object tracking is to generate the trajectory of an object over time by finding its position in every frame of the video sequence. The steps of the object tracking system are shown in Figure 1.
Flowchart for the proposed approach (KFPPK).
The proposed approach (KFPPK) for object tracking is composed of five blocks named block processing, block prediction, block tracking, block correction, and block result. The functions of these blocks are as follows.
Block Processing. The algorithm reads the video sequence and converts it into images, extracting the color information of each frame and the target of the object tracking.
Block Prediction. This step evaluates how the state of the target will change by feeding it through the state prediction of the Kalman filter. The time update equations project the current state and error covariance estimates forward in time to obtain the a priori estimate for the next time step.
Block Tracking. In this block we use the probability product kernels combined with integral image as a similarity measure to find the image region with a histogram most similar to the target of the object tracker. Furthermore, we estimate the scale/shape and orientation of the object tracker using target model update.
Block Correction. The correction (measurement update) equations provide the feedback: they incorporate a new measurement into the a priori estimate to obtain an improved a posteriori estimate. The time update equations can be considered predictor equations, while the measurement update equations act as corrector equations based on the output of block tracking.
Block Result. Tracking trajectory of the object is done on the basis of the region properties of the object such as scale/shape and centroid.
The algorithm of the proposed approach (KFPPK) can be explained as follows.
Step 1.
Start video sequence and select the target of the object tracker in the first frame.
Step 2.
In this stage we use the state prediction of the Kalman filter to estimate how the state of the target will change, feeding the current state and error covariance estimates forward to obtain the a priori estimate for the next time step. This step uses (6) and (7).
Step 3.
Calculate the similarity measure $p(b(x))$ between the target model and the candidate regions using (5), and estimate the shape and orientation of the object tracker using (13)–(16).
Step 4.
Correct the a priori estimate with the new measurement to obtain an improved a posteriori estimate, using (8)–(10).
Step 5.
Draw the trajectory, then return to Step 2 for the next frame.
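The five steps can be sketched end to end on a toy sequence of bin-index images. This is a simplified Python sketch, not the paper's MATLAB implementation: a constant-velocity model, an exhaustive search in place of the integral-image search, a fixed scale, and names of our own choosing:

```python
import numpy as np

def track(frames, x0, y0, h, w, n_bins):
    """Toy loop for Steps 1-5: target histogram from the first frame,
    Kalman prediction, search scored by (5), then Kalman correction."""
    # Step 1: normalized target histogram q from the initial window.
    q = np.bincount(frames[0][y0:y0+h, x0:x0+w].ravel(),
                    minlength=n_bins).astype(float)
    q /= q.sum()
    A = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)   # constant velocity
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # measure [x, y]
    s, P = np.array([x0, y0, 0, 0], float), np.eye(4)
    Wn, R = 0.01 * np.eye(4), 0.1 * np.eye(2)
    traj = [(x0, y0)]
    for f in frames[1:]:
        s, P = A @ s, A @ P @ A.T + Wn                  # Step 2: (6)-(7)
        back = q[f]                                     # Step 3: p(b(x))
        best, z = -1.0, None
        for yy in range(f.shape[0] - h + 1):            # exhaustive search;
            for xx in range(f.shape[1] - w + 1):        # the paper speeds this
                score = back[yy:yy+h, xx:xx+w].mean()   # up with an integral image
                if score > best:
                    best, z = score, np.array([xx, yy], float)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Step 4: (8)-(10)
        s = s + K @ (z - H @ s)
        P = (np.eye(4) - K @ H) @ P
        traj.append((s[0], s[1]))                       # Step 5: trajectory
    return traj

# A 2x2 block of bin 1 moves from (1, 1) to (2, 2) on a bin-0 background.
f0 = np.zeros((8, 8), int); f0[1:3, 1:3] = 1
f1 = np.zeros((8, 8), int); f1[2:4, 2:4] = 1
traj = track([f0, f1], x0=1, y0=1, h=2, w=2, n_bins=2)
```

The corrected estimate lands close to, but slightly short of, the measured position, reflecting the blend of prediction and measurement in (9).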
6. Experimental Results
This section shows the tracking results of the proposed approach (KFPPK). Experiments on real video sequences have been carried out to evaluate the performance of the proposed algorithm. In these experiments, we compared our system with the existing MKF algorithm [13]. The proposed algorithm achieves good estimation accuracy of the scale and orientation of the object in the video sequence. We used different sequences; each has its own characteristics, but a single moving object is common to all of them. We then set up experiments to report the estimated width, height, trajectory, and orientation of the object.
In this work, we selected the normalized RGB color space as the feature space; it was quantized into 16 × 16 × 16 bins to compare the algorithms. One synthetic video sequence and one real video sequence are used in the experiments.
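The quantization into 16 × 16 × 16 bins can be sketched as follows. This is one plausible mapping from a normalized RGB pixel to a bin index (a Python sketch; the exact mapping used in the implementation may differ):

```python
def rgb_to_bin(r, g, b, bins_per_channel=16):
    """Map an RGB pixel to one of 16*16*16 bins after normalizing each
    channel by the pixel's intensity sum (r' = r/(r+g+b), etc.)."""
    s = r + g + b
    if s == 0:
        return 0                          # black pixels fall into bin 0
    idx = [min(int(c / s * bins_per_channel), bins_per_channel - 1)
           for c in (r, g, b)]            # quantize each normalized channel
    return (idx[0] * bins_per_channel + idx[1]) * bins_per_channel + idx[2]
```

Normalizing by the intensity sum before binning is what lends the histogram its robustness to illumination changes mentioned in Section 4.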
We first use a synthetic ellipse sequence (resolution 352 × 240, frame rate 25 fps, 77 frames) to verify the efficiency of the KFPPK. As shown in Figure 2, the external ellipses represent the target candidate regions, which are used to estimate the real targets, that is, the inner ellipses. The experimental results show that the KFPPK reliably estimates the mean position and trajectory (as shown in Figure 3) of the ellipse under scale and orientation changes. Meanwhile, the MKF algorithm [13] does not perform as well as the proposed KFPPK under significant scale and orientation changes of the tracked object.
Tracking results of the synthetic ellipse sequence by different tracking algorithms. Frames 1, 9, 28, and 77 are displayed.
Proposed approach (KFPPK)
MKF approach
Trajectory results in the synthetic ellipse video sequence by the proposed approach KFPPK and the MKF tracking algorithms. We can see the KFPPK is closer to the real trajectory than the MKF tracking algorithm.
The second test is an occlusion sequence (resolution 320 × 240, frame rate 25 fps, 193 frames), used to verify the efficiency of the KFPPK on a more complex sequence. The object exhibits large scale changes with partial occlusion. The results of the proposed approach (KFPPK) and the MKF [13] are shown in Figure 4. We observe that the KFPPK works much better in estimating the scale and orientation of the target, especially when occlusion occurs.
Tracking results of the occlusion sequence by different tracking algorithms. Frames 1, 31, 35, 44, 148, and 193 are displayed.
Proposed approach (KFPPK)
MKF approach
Table 1 lists the average times on the video sequences. Our proposed approach (KFPPK) has a better average execution time than the MKF algorithm [13].
The average time by different methods on the videos sequences.
Methods/sequences        KFPPK     MKF [13]
Ellipse (77 frames)      0.07 s    0.48 s
Occlusion (193 frames)   0.08 s    0.56 s
6.1. Tracking Performance Evaluation
The main objective of this section is to evaluate the performance of our method using Receiver Operating Characteristic (ROC) curves. An ROC curve shows how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples. ROC curves are defined via two major rates:

(17) $\text{True Positive Rate} = \dfrac{TP}{TP + FN}, \qquad \text{False Positive Rate} = \dfrac{FP}{FP + TN}$,

where TP (true positives) are examples correctly labeled as positive, FP (false positives) are negative examples incorrectly labeled as positive, TN (true negatives) are negative examples correctly labeled as negative, and FN (false negatives) are positive examples incorrectly labeled as negative.
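The rates in (17) can be computed directly from binary ground-truth labels and classifier outputs (a small Python sketch; the function name is ours):

```python
def roc_rates(labels, predictions):
    """True/false positive rates of (17) from binary ground truth
    and binary classifier output."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    return tp / (tp + fn), fp / (fp + tn)

# 4 positives (3 detected), 4 negatives (1 false alarm).
tpr, fpr = roc_rates([1, 1, 1, 1, 0, 0, 0, 0],
                     [1, 1, 1, 0, 1, 0, 0, 0])
```

Sweeping the decision threshold of the tracker's similarity score and plotting the resulting (FPR, TPR) pairs traces out the ROC curves of Figure 5.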
We studied the ROC curves of the proposed algorithm and compared them with those of the MKF algorithm [13]. Figure 5 compares the performance of the algorithms using ROC curves for each video sequence. The KFPPK algorithm performs better than the MKF algorithm [13].
ROC curves by different algorithms on the different videos sequences.
Occlusion sequence
Synthetic ellipse sequence
The experimental results demonstrate that the KFPPK robustly tracks the trajectory of objects in different situations (scale variation, pose, rotation, and occlusion) and achieves good real-time estimation accuracy of the scale and orientation of the target.
7. Conclusion
In this paper, the proposed approach (KFPPK) has been presented for tracking a single moving object in a video sequence using color information. The approach combines the Kalman filter and the probability product kernel as a similarity measure, using an integral image to compute the histograms of all possible target regions in each frame of the video sequence. The KFPPK has been compared with a state-of-the-art algorithm on several tracking sequences and outperforms it in processing speed. Extensive experiments test the KFPPK and validate its robustness to scale and orientation changes of the target in real time. The implemented system can be applied to computer vision applications for moving object detection and tracking.
Conflict of Interests
The authors declare that they have no competing interests.
References
[1] I. Haritaoglu, D. Harwood, and L. Davis, "W4S: a real-time system for detecting and tracking people," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 1998, pp. 962–968.
[2] M. de La Gorce, N. Paragios, and D. J. Fleet, "Model-based hand tracking with texture, shading and self-occlusions," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), Anchorage, Alaska, USA, June 2008, pp. 1–8.
[3] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection: a review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 694–711, 2006.
[4] M. Kim, S. Kumar, V. Pavlovic, and H. Rowley, "Face tracking and recognition with visual constraints in real-world videos," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), Anchorage, Alaska, USA, June 2008, pp. 1–8.
[5] X. S. Zhou, D. Comaniciu, and A. Gupta, "An information fusion framework for robust shape tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 115–129, 2005.
[6] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[7] A. Salhi and A. Y. Jammoussi, "Object tracking system using Camshift, Meanshift and Kalman filter," vol. 64, pp. 674–679, 2012.
[8] T. Jebara and R. Kondor, "Bhattacharyya and expected likelihood kernels," Springer, Berlin, Germany, 2003, pp. 57–71.
[9] T. Jebara, R. Kondor, and A. Howard, "Probability product kernels," Journal of Machine Learning Research, vol. 5, pp. 819–844, 2004.
[10] H. A. Abdelali, F. Essannouni, L. Essannouni, and D. Aboutajdine, "Algorithm for moving object detection and tracking in video sequence using color feature," in Proceedings of the 2nd World Conference on Complex Systems (WCCS '14), Agadir, Morocco, November 2014, pp. 690–693.
[11] H. A. Abdelali, F. Essannouni, L. Essannouni, and D. Aboutajdine, "A new moving object tracking method using particle filter and probability product kernel," in Proceedings of the Intelligent Systems and Computer Vision (ISCV '15), Fez, Morocco, March 2015, pp. 1–6.
[12] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[13] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564–577, 2003.
[14] M. Ghanai and K. Chafaa, InTech, 2009.
[15] X. Ning and J. Fang, "An autonomous celestial navigation method for LEO satellite based on unscented Kalman filter and information fusion," Aerospace Science and Technology, vol. 11, no. 2-3, pp. 222–228, 2007.
[16] E. Cuevas, D. Zaldivar, and R. Rojas, "Vision tracking prediction," in Proceedings of the International Congress on Computer Sciences, Biomedical, Engineering and Electronics (CONCIBE '05), Guadalajara, México, October 2005.
[17] H. Wang, D. Suter, K. Schindler, and C. Shen, "Adaptive object tracking based on an effective appearance filter," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1661–1667, 2007.