
Many modern visual tracking algorithms incorporate spatial pooling, such as max pooling or average pooling, to achieve invariance to feature transformations and better robustness to occlusion, illumination change, and position variation. In this paper, a max-average pooling method and a Weight-Selection strategy are proposed within a hybrid framework that combines sparse representation and particle filter, in order to exploit the spatial information of an object and to make compromises that help ensure the correctness of the results. The proposed algorithm addresses the challenges of occlusion, illumination change, and pose variation. Experimental results demonstrate the effectiveness and robustness of the proposed algorithm compared with the state-of-the-art methods on challenging sequences.

Visual tracking has a wide range of applications in computer vision, such as space visual surveillance, driver assistance systems, and visual navigation. The main challenges in designing a robust visual tracking algorithm are occlusion, background clutter, and illumination change.

Recently, many visual tracking algorithms have been developed to tackle these challenges using sparsity of the image. They usually combine other sophisticated methods to track the target object. Examples are sparse representation combined with learning [

However, in spite of the successes of the previous work, there still exist many limitations. As discussed by Boureau et al. [

Another popular tracking approach is the sequential Monte Carlo method, also known as the particle filter, which recursively estimates the target posterior with discrete sample-weight pairs in a dynamic Bayesian framework. The basic idea was introduced by Sarkar [

In our studies, we found that sparse representation and particle filter share some similarities: (1) both assume that the tracking result from the last frame is correct, which in turn means that the result of every frame is assumed to be correct; (2) both sample particles before analyzing the current frame; (3) both compute weights for the particles in order to choose the best one. We exploit these similarities to propose a novel framework for robust object tracking that merges the steps shared by sparse representation and particle filter. The proposed framework samples particles around the result tracked in the last frame, which avoids particle degeneration. The two algorithms are illustrated in Figure

The illustrations for the algorithmic procedures and the proposed framework: (a) algorithm procedures of sparse representation and particle filter and (b) the proposed framework.

In this paper, we introduce a robust tracking method using a hybrid framework that combines sparse coding and particle filter, for which we propose max-average pooling to improve on the traditional pooling methods and the Weight-Selection strategy to avoid drift during object tracking. The rest of the paper is organized as follows. Section

A typical assumption underlying sparse representation is that the tracking result in the last frame is accurate enough for the tracker to use it for the current prediction. Based on this assumption, we draw particles around the last result, that is, the center point of the bounding box from the last frame, and make the dispersion of the particles follow a normal distribution. The particles are therefore scattered around the result of the last frame.
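The sampling step described above can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the authors' implementation; the function name `sample_particles`, the particle count, and the noise scale `sigma` are assumptions introduced only for this example.

```python
import numpy as np

def sample_particles(center, num_particles, sigma, rng=None):
    """Draw candidate centers from a normal distribution around `center`."""
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(center, dtype=float)
    # Each row is one particle: the last center plus zero-mean Gaussian noise.
    noise = rng.standard_normal((num_particles, center.size)) * sigma
    return center + noise

# Hypothetical values: last bounding-box center (120, 80), 500 particles, sigma = 4 pixels.
particles = sample_particles(center=(120.0, 80.0), num_particles=500, sigma=4.0,
                             rng=np.random.default_rng(0))
```

A larger `sigma` spreads the candidates more widely, which may help with fast motion at the cost of more candidates falling on the background.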

The local patches within the target region can be represented as the linear combination of a few basic elements of the dictionary with the sparsity assumption. For the data in the current image

In many tracking methods, the earlier tracking results are stored longer than the newly acquired results since they are assumed to be more accurate [

The average pooling scheme for histogram generation used by He et al. [

Input: An image from the video sequences, the center point of the bounding box, the noise coefficient

(1) Sample

is the normal distribution, which is implemented by the function randn.

(2) Initialize the dictionary using tracking results of the first ten frames.

(3) Compute the formula to obtain the sparse codes:

Note that

(4) Pool the features by computing:

(5) Compute the error of reconstruction:

(6) Finally, select the particle that corresponds to the minimum error.
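Steps (3), (5), and (6) can be sketched as follows, under stated assumptions. The sparse codes are computed here with a plain ISTA solver for the l1-regularized least-squares problem, used only to keep the example self-contained; the pooling of step (4) is omitted, and each candidate is treated as a single vectorized feature. The names `ista`, `reconstruction_error`, and `select_best_particle`, and the value of `lam`, are hypothetical.

```python
import numpy as np

def ista(D, x, lam=0.1, n_iter=200):
    """Approximate argmin_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2        # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))        # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft thresholding
    return a

def reconstruction_error(D, x, lam=0.1):
    """Squared error between x and its sparse reconstruction D a."""
    a = ista(D, x, lam)
    return float(np.sum((x - D @ a) ** 2))

def select_best_particle(D, candidates, lam=0.1):
    """Return the index of the candidate with the minimum reconstruction error."""
    errors = np.array([reconstruction_error(D, x, lam) for x in candidates])
    return int(np.argmin(errors)), errors

# Toy data: a random unit-norm dictionary and five candidate feature vectors.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 20))
D /= np.linalg.norm(D, axis=0)
candidates = rng.standard_normal((5, 64))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
candidates[1] = 0.8 * D[:, 3] + 0.5 * D[:, 7]     # candidate built from two dictionary atoms
best, errors = select_best_particle(D, candidates, lam=0.01)
```

In this toy setting the candidate built from dictionary atoms is reconstructed almost exactly, so it is the one selected; candidates that the dictionary cannot explain keep a large error.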

The particle filter is a Bayesian sequential importance sampling technique for estimating the posterior distribution of state variables characterizing a dynamic system [

Let

To accomplish the tracking, we compute and update the importance weight of each particle. The posterior

Two bounding boxes for weights.

The above idea can be illustrated in Figure
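The weight computation and update described above can be sketched as one cycle of a generic bootstrap particle filter. This is an illustration under assumptions, not the paper's exact formulation: the random-walk motion model, the Gaussian toy likelihood, and the name `particle_filter_step` are all hypothetical.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, motion_std, rng):
    """One resample / propagate / re-weight cycle of a bootstrap particle filter."""
    n = len(particles)
    # Resample particle indices in proportion to the previous weights.
    idx = rng.choice(n, size=n, p=weights)
    # Propagate with a random-walk motion model (an assumption of this sketch).
    moved = particles[idx] + rng.standard_normal(particles.shape) * motion_std
    # Re-weight each particle by the observation likelihood and normalize.
    new_weights = np.array([likelihood(p) for p in moved], dtype=float)
    new_weights /= new_weights.sum()
    # Report the highest-weight particle as the particle filter result.
    best = moved[np.argmax(new_weights)]
    return moved, new_weights, best

# Toy usage: the "observation" favors states near a known target position.
rng = np.random.default_rng(1)
target = np.array([50.0, 50.0])
likelihood = lambda p: np.exp(-np.sum((p - target) ** 2) / (2 * 3.0 ** 2))
particles = np.array([48.0, 52.0]) + rng.standard_normal((300, 2)) * 5.0
weights = np.full(300, 1.0 / 300)
particles, weights, best = particle_filter_step(particles, weights, likelihood, 2.0, rng)
```

Choosing the highest-weight particle mirrors the idea of selecting the best particle; a weighted mean of the particles is a common alternative estimate.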

Sparse representation and particle filter each have their own advantages and disadvantages. The sparse representation method is more stable for tracking and is able to take spatial information into account, while the particle filter adapts better to nonlinear, non-Gaussian continuous systems than the other state-of-the-art methods. Nonetheless, wrong reconstruction occasionally happens in the sparse representation method; this is a fatal problem that can cause drift when shape distortion occurs. For the particle filter, particle degeneration is also hard to solve.

In this paper, we combine the two methods for tracking. Under the particle filter framework, we employ the sampling method of sparse representation in the initialization phase of each frame. We solve the optimization problem using a popular tool called the SPAMS library [

Observing the two tracking procedures, we found that the results of sparse representation are more accurate than those of particle filter. However, when tracking with the above method, some results always jump far away from the object; for example, Figure

Tracking results in ThreePastShop2cor with the problem (green: result of SR, yellow: result of PF, blue: the sampled particles, and red: the final tracking result): (a) the 95th frame; (b) the 96th frame.

Analyzing the problem above, we find that sparse reconstruction with the proposed max-average pooling method is sometimes not sufficiently accurate. This problem also exists in the max pooling and average pooling methods. The primary reason for tracking inaccuracy, and even for drift, is the sparse reconstruction error. Figure

Data analysis of reconstruction errors: (a) the relationship between sparse reconstruction errors and the image sequence as occlusions occur; (b) the effect of the Weight-Selection strategy, which reduces the impact of reconstruction errors during tracking. The value 1 or 2 represents the option chosen by the W-S strategy.

Reconstruction errors for sparse representation

Weight-Selection strategy for tracking

To lower the impact caused by the reconstruction errors, Zhang et al. [

We compute the histograms in the bounding boxes centered at the tracking points. Comparing each histogram with that of the result in the last frame, we choose the one with the smaller distance. Let

The W-S strategy improves the performance of our tracker. Its behavior is shown in Figure
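The selection rule can be sketched as follows. This is a minimal illustration under assumptions: gray-level histograms with 16 bins, a distance derived from the Bhattacharyya coefficient, and the mapping of option 1 to the sparse representation result and option 2 to the particle filter result are all hypothetical choices made for this example, as are the function names.

```python
import numpy as np

def gray_histogram(patch, bins=16):
    """Normalized gray-level histogram of an image patch with values in [0, 256)."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    hist = hist.astype(float)
    return hist / max(hist.sum(), 1.0)

def hist_distance(p, q):
    """Distance sqrt(1 - BC), where BC is the Bhattacharyya coefficient of p and q."""
    bc = np.sum(np.sqrt(p * q))
    return float(np.sqrt(max(1.0 - bc, 0.0)))

def weight_selection(patch_sr, patch_pf, patch_prev):
    """Pick the candidate whose histogram is closer to the last frame's result.

    Returns 1 for the sparse representation candidate and 2 for the particle
    filter candidate (this numbering is an assumption of the sketch).
    """
    h_prev = gray_histogram(patch_prev)
    d_sr = hist_distance(gray_histogram(patch_sr), h_prev)
    d_pf = hist_distance(gray_histogram(patch_pf), h_prev)
    return (1 if d_sr <= d_pf else 2), d_sr, d_pf

# Toy patches: the SR candidate has "jumped" onto a dark background region.
rng = np.random.default_rng(2)
patch_prev = rng.integers(100, 140, size=(32, 32))
patch_pf = rng.integers(100, 140, size=(32, 32))
patch_sr = rng.integers(0, 40, size=(32, 32))
choice, d_sr, d_pf = weight_selection(patch_sr, patch_pf, patch_prev)
```

Here the jumped candidate shares no histogram bins with the previous result, so its distance is maximal and the other candidate is selected.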

We demonstrate the proposed method on the ThreePastShop2cor sequence. The results of the 95th and 96th frames are shown in Figure

Tracking results in ThreePastShop2cor with the Weight-Selection strategy (green: result of SR, yellow: result of PF, blue: the sampled particles, and red: the final tracking result): (a) the 95th frame; (b) the 96th frame.

In the following, we provide a summary of the proposed tracking algorithm.

(1) Locate the target in the first frame, either manually or by using an automated detector, and use a single particle and a bounding box to indicate this location.

(2) Initialize the dictionary with the results of tracking the target object in the first ten frames using the KNN method.

(3) Advance to the next frame. Draw particles according to the dynamical model.

(4) For each particle, extract the corresponding window from the current frame and calculate its reconstruction error using max-average pooling; choose the particle with the minimum error as the temporary result of the sparse representation method. At the same time, calculate the weight of every particle and choose the best one as the result of the particle filter.

(5) Apply the Weight-Selection strategy to select the better of the two results as the final result.

(6) Go to Step (3).

Our tracker works as a collector at the very beginning, when it initializes the dictionary, which is also called the set of templates. There is a tradeoff between the accuracy and the speed of the algorithm. In the next section, we discuss the implementation issues and analyze the experimental results.

The proposed algorithm is implemented in MATLAB R2012b and runs at about 1.5 frames per second on a 3.2 GHz Intel Core processor with 4 GB of memory. We apply the affine transformation with six parameters to model the target motion between two consecutive frames. The sparse coding problem is solved with the SPAMS package [

We evaluate the performance of the proposed algorithm on three different kinds of challenging video sequences from the previous work [

Evaluation criteria are employed to quantitatively assess the performance of the trackers. Figure

Quantitative evaluation of the trackers in terms of position errors (in pixels). The center errors of the proposed algorithm (red) in comparison with the ILVT (green) and TLD (blue) algorithms on three sequences.
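The center position error used in this evaluation can be computed as in the sketch below. The box format `(x, y, w, h)`, with `(x, y)` the top-left corner, and the function name `center_errors` are assumptions made for illustration.

```python
import numpy as np

def center_errors(pred_boxes, gt_boxes):
    """Per-frame Euclidean distance (in pixels) between box centers.

    Boxes are rows of (x, y, w, h); this layout is an assumption of the sketch.
    """
    pred = np.asarray(pred_boxes, dtype=float)
    gt = np.asarray(gt_boxes, dtype=float)
    pred_centers = pred[:, :2] + pred[:, 2:] / 2.0
    gt_centers = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_centers - gt_centers, axis=1)

# Two frames: a perfect prediction and one shifted by (3, 4) pixels.
errors = center_errors([[10, 10, 20, 40], [13, 14, 20, 40]],
                       [[10, 10, 20, 40], [10, 10, 20, 40]])
# errors -> [0.0, 5.0]
```

Plotting `errors` against the frame index gives the kind of curve used to compare trackers on a sequence.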

Overall, the proposed algorithm performs well against the state-of-the-art methods. It is able to overcome the influences of the occlusions, illumination change, and pose variation. The performance of our method can be attributed to the efficient pooling method and W-S strategy.

The first sequence, ThreePastShop2cor, has been used in several recent tracking papers [

ThreePastShop2cor: the challenges are occlusions and pose variation (red: the proposed method; blue: TLD method; and green: ILVT method).

The second sequence, Car4, shown in Figure

Car4: a car moving underneath an overpass and trees. The challenges are the illumination change and pose variation (red: the proposed method; blue: TLD method; green: ILVT method).

We notice that both the ILVT method and the proposed method keep track of the object throughout the sequence. The proposed method is not as steady as the ILVT method because of the influence of the “jump.” Whenever the result jumps to a place far away from the object, the W-S strategy makes a compromise that preserves the correctness of the method, and the error of the proposed method remains similar to that of the ILVT method. Without the W-S strategy, the tracker would drift. Overall, although the proposed method is not as steady as the ILVT method, it is able to track the object, and its rectangle includes much less background than that of the ILVT method.

The last video, which was taken on a bus in the evening, is shown in Figure

Bjbus103: the sequence is captured on the bus in the evening. The target object is the bus in front of the camera. The challenges are the illumination change and pose variation (red: the proposed method; blue: TLD method; green: ILVT method).

In this paper, we propose an efficient tracking algorithm based on sparse representation and particle filter that uses the proposed max-average pooling and Weight-Selection strategy. The method exploits both spatial and local information of the target through max-average pooling and avoids the drift resulting from sparse reconstruction errors through the Weight-Selection strategy. Together, these make better use of the spatial and local information. In addition, the combination of sparse representation and particle filter avoids particle degeneration, highlights their respective advantages, and improves the performance of the algorithm. Experimental results demonstrate the effectiveness and robustness of the proposed algorithm compared with the state-of-the-art methods on challenging sequences.

The authors declare that there is no conflict of interests regarding the publication of this paper.

This work is supported by the National Basic Research Program of China (973 Program) 2012CB821200 (Grant no. 2012CB821206) and the National Natural Science Foundation of China (Grant no. 61320106006).