Low-Rank Representation-Based Object Tracking Using Multitask Feature Learning with Joint Sparsity



Introduction
Object tracking is one of the well-known problems in computer vision with many applications including intelligent surveillance, human-computer interface, and motion analysis. In spite of significant success, designing a robust object tracking algorithm remains a challenging issue due to real-world factors such as severe occlusion, scale and illumination variations, background clutter, rotations, and fast motions.
An appearance model-based tracking method, which evaluates the likelihood that an observed image patch belongs to the object class, depends on critical factors such as the object representation and the representation scheme. The object representation can be categorized by adopted features [1, 2] and description models [3, 4]. The representation scheme can be either generative or discriminative. Generative methods regard appearance modeling as finding the image observation with minimal reconstruction error [5, 6]. On the other hand, discriminative methods focus on determining a decision region that distinguishes the object from the background [7, 8].
Various object tracking methods based on object appearance models can handle only moderate changes and usually fail when the object appearance changes significantly. As a result, an appearance model learning process is required for robust object tracking under challenging conditions such as object deformation.
Recently, sparse representation-based $\ell_1$-norm minimization methods have been successfully employed for object tracking [9-13], where an object is represented as one of multiple candidates in the form of a sparse linear combination of a dictionary that can be updated to maintain the optimal object appearance model. Although the sparse representation-based method can robustly track an object under partial occlusion, the computational cost of the $\ell_1$-norm minimization in each frame is still high.
Bao et al. applied the accelerated proximal gradient (APG) approach [14] to efficiently solve the $\ell_1$ minimization problems for object tracking. Structured multitask tracking (MTT) was proposed by mining the discriminative structure shared between different particles rather than learning each particle individually [15]. Zhang et al. achieve their tracking performance through a fast solution of the MTT problem, using Bao's learning method for the joint particle representation. By decomposing the sparse coefficients into two matrices for maintaining inliers and outliers, robust MTT- (RMTT-) based object tracking was proposed in [16].
Low-rank sparse tracking (LRST) was proposed by representing all samples using only a few templates [17]. LRST has been improved with an incremental learning method for low-rank features [18] and with adaptive pruning that exploits temporal consistency [19].
In spite of these improvements to MTT and LRST in the particle filter framework, the computational cost increases with the number of particles. Furthermore, MTT regards the sparse representations of sampled particles as independent state data without considering relationships between particles. For this reason, MTT may not work if the particles are drawn from a specific probability distribution. The LRST-based methods are difficult to apply directly to object tracking in online video processing because of their structural computational complexity, such as nuclear norm minimization.
To solve the above mentioned problems, we propose a novel object tracking algorithm based on multitask feature learning using joint sparsity and low-rank representation. We assume that the object representation can be incrementally optimized in the robust principal component analysis (RPCA) framework. RPCA decomposes the observations into the sum of a low-rank matrix and a sparse matrix; thus it can successfully recover the intrinsic subspace structure from corrupted observations. We extract features with a low-rank representation within a few frames. After obtaining the subspace basis of object features, the features represented by all possible low-rank and sparse properties are learned using a variant of the multitask feature learning framework. Finally, a novel incremental alternating direction method- (ADM-) based low-rank optimization strategy is applied to efficiently update the sparse error and features. The low-rank optimization problem for learning multitask features can be solved by a short sequence of efficient closed form updates for the optimal state variables of object tracking.

Low-Rank Representation of Object with Joint Sparsity
where $\|\cdot\|_{1,1}$ represents the $\ell_{1,1}$ norm and $\lambda$ is a regularization parameter.
The assumption under the RPCA framework [20], for disentangling the low-rank and sparse components, is that Z has singular vectors and is not sparse. The singular value decomposition of Z can be expressed as

$$Z = U \Sigma V^{T} = \sum_{i=1}^{r} \sigma_i u_i v_i^{T},$$

where $r$ is the rank of Z.
In the feature learning literature, multitask feature learning [22, 23] employs $\ell_{2,1}$-norm minimization to perform feature selection across tasks under a strict assumption that all tasks share a common underlying representation.
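As a minimal sketch of the singular value decomposition referred to above (not the authors' implementation), the following NumPy snippet keeps only the positive singular values of a matrix and rebuilds the matrix from the corresponding singular vectors. The function name `rank_r_factors` and the tolerance `tol` are assumptions introduced only for illustration.

```python
import numpy as np

def rank_r_factors(Z, tol=1e-10):
    """Return (U, s, V) restricted to the r positive singular values of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Numerical rank: singular values above a relative tolerance (an assumption).
    r = int(np.sum(s > tol * s[0])) if s.size and s[0] > 0 else 0
    return U[:, :r], s[:r], Vt[:r, :].T

Z = np.outer([1.0, 2.0], [3.0, 4.0])   # a rank-1 example matrix
U, s, V = rank_r_factors(Z)
Z_rebuilt = (U * s) @ V.T              # sum over i of sigma_i * u_i * v_i^T
```

Scaling the columns of `U` by `s` before the product avoids forming the diagonal matrix of singular values explicitly.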
The goal of the proposed object tracking framework is to search for the particles whose features are most similar to the previous tracking result. Since particles are densely sampled around the current object state, in order to estimate the optimal low-rank features in iterative steps, we focus on all possible nonsmooth sparse errors, joint sparsity of features based on low rank, and $\ell_{1,1}$ and $\ell_{1,2}$ constraints that produce a solution for the within- and between-feature-task sparsity. With this consideration, we formulate the following object tracking problem as

$$\min_{W,E} \; \lambda_1\|W\|_{*} + \lambda_2\|W\|_{1,2} + \lambda_3\|W\|_{1,1} + \lambda_4\|E\|_{1,1} \quad \text{s.t. } Y = UW + E, \quad (4)$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are regularization parameters.
In the proposed object tracking formulation in (4), we should minimize the rank of the feature matrix of all object candidates. Since there is no closed form solution for the rank minimization problem, we replace the rank of the matrix with its convex envelope, the nuclear norm $\|W\|_{*}$. Moreover, mining similarities between all particle structures can improve the tracking results. However, in the object tracking process, some candidates are completely different from others when particles are sampled from an abnormally large region. In order to address this problem, we place the matrix W in a shared feature term $\|W\|_{1,2}$ and a nonshared sparse feature term $\|W\|_{1,1}$. The shared inlier feature exploits the similarities of particles, while the nonshared sparse feature considers possible appearance variations of object and background between candidates, and only a small number of candidates are required to reliably represent the observation of each candidate. In order to reduce sparse errors, we minimize the $\ell_{1,1}$ norm of the error matrix E. This error assumption was originally introduced in [9] for the presence of object occlusion.
In order to solve the objective function in (4), we introduce three equality constraints with slack variables $W_1$, $W_2$, and $W_3$ as

$$\min_{W_1, W_2, W_3, E} \; \lambda_1\|W_1\|_{*} + \lambda_2\|W_2\|_{1,2} + \lambda_3\|W_3\|_{1,1} + \lambda_4\|E\|_{1,1} \quad \text{s.t. } Y = UW + E,\; W = W_1,\; W = W_2,\; W = W_3. \quad (5)$$

The above problem can be minimized using the conventional inexact augmented Lagrange multiplier (IALM) method [24], which has well-defined convergence properties for nonsmooth optimization and has been used in matrix rank minimization problems. This method is an iterative process that augments the Lagrangian function with quadratic penalty terms, which assign a high cost to infeasible points and admit closed form updates for each unknown variable. Since the alternating direction method (ADM) [25] was introduced for low-rank matrix completion and is based on the alternating direction augmented Lagrangian method, we employ ADM for minimizing the ALM functions.
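To make the regularized objective concrete, the sketch below evaluates its four terms for given W and E with NumPy. It is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, and the $\ell_{1,2}$ term is taken here as the sum of the Euclidean norms of the rows of W, which is one common convention that this section does not spell out.

```python
import numpy as np

def nuclear_norm(W):
    # Sum of singular values, the convex surrogate for rank(W).
    return float(np.linalg.svd(W, compute_uv=False).sum())

def l12_norm(W):
    # Assumed convention: sum of the Euclidean norms of the rows of W.
    return float(np.linalg.norm(W, axis=1).sum())

def l11_norm(W):
    # Sum of the absolute values of all entries.
    return float(np.abs(W).sum())

def objective(W, E, lams):
    """Weighted sum of the four regularizers, with lams = (l1, l2, l3, l4)."""
    l1, l2, l3, l4 = lams
    return (l1 * nuclear_norm(W) + l2 * l12_norm(W)
            + l3 * l11_norm(W) + l4 * l11_norm(E))

W = np.array([[3.0, 4.0], [0.0, 0.0]])   # toy coefficient matrix (rank 1)
E = np.zeros((2, 2))                     # no sparse error in this toy case
value = objective(W, E, lams=(5.0, 0.5, 0.1, 1.0))
```

With the slack variables tied to W, the same function evaluates the split objective by passing the shared matrix for all three regularizers.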

Optimization.
To solve the objective function given in (5), we formulate the ALM-based ADM model [24, 26] as

$$L(W, E, M_{1-4}) = \lambda_1\|W_1\|_{*} + \lambda_2\|W_2\|_{1,2} + \lambda_3\|W_3\|_{1,1} + \lambda_4\|E\|_{1,1} + \langle M_1, W_1 - W\rangle + \langle M_2, W_2 - W\rangle + \langle M_3, W_3 - W\rangle + \cdots \quad (6)$$

where $M_1$, $M_2$, $M_3$, and $M_4$ are Lagrangian multipliers, and $\mu > 0$ is a penalty parameter. The proposed update strategy computes the sparse coefficients based on the low-rank representation with the updated error and jointly sparse features. Thus, we first update the sparse error E of the observation matrix Y by referring to the proof in [24]. For updating the low-rank variable $W_1$, we employ the proof in [27], by which a low-rank matrix approximation problem with F-norm data fidelity can be solved by a soft-thresholding operation on the singular values of the observation matrix. We update $W_2$ and $W_3$ by referring to the proof in [24].
Because the ADM performs minimization in an alternating manner, we formulate the objective function for updating the sparse coefficient W based on (6), where W is updated with the already updated sparse error $E^{k+1}$, low-rank representation $W_1^{k+1}$, $\ell_{1,2}$-norm variable $W_2^{k+1}$, and $\ell_{1,1}$-norm variable $W_3^{k+1}$. In the resulting closed form update, the gradient term is the partial derivative of the objective for W with respect to $W^{k}$, and $J_{\tau}(x)$ is a component-wise soft-thresholding operator defined in [28], whose threshold $\tau$ is the ratio of two parameters: if $|x| \le \tau$, $J_{\tau}(x) = 0$; otherwise, $J_{\tau}(x) = x - \mathrm{sign}(x)\,\tau$. The two parameters forming this ratio are set to 0.25 and 1.
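The shrinkage operations mentioned in this subsection can be sketched as follows. This is a hedged illustration of the standard proximal operators (element-wise soft-thresholding, singular value thresholding, and row-wise group shrinkage), not a reconstruction of the paper's exact update equations; the function names are hypothetical and the row-wise grouping is an assumption.

```python
import numpy as np

def soft_threshold(X, tau):
    # Element-wise: 0 if |x| <= tau, otherwise x - sign(x) * tau.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    # Soft-threshold the singular values (proximal step for the nuclear norm).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def row_shrink(X, tau):
    # Shrink each row toward zero by tau in Euclidean norm (assumed grouping).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale

shrunk = soft_threshold(np.array([-1.0, 0.1, 0.5]), 0.25)
low_rank = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
grouped = row_shrink(np.array([[3.0, 4.0], [0.1, 0.0]]), 1.0)
```

Each operator has a closed form, which is what allows every variable in an alternating scheme of this kind to be updated without an inner iterative solver.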

Convergence.
The convergence of the proposed object tracking algorithm can be guaranteed using the Karush-Kuhn-Tucker (KKT) point approach by referring to the linearized ADM with adaptive penalty (LADMAP) approach [26]. LADMAP [26] aims to solve the following type of convex problems:

$$\min_{x,y} \; f(x) + g(y) \quad \text{s.t. } \mathcal{A}(x) + \mathcal{B}(y) = c, \quad (14)$$

where x, y, and c could be either vectors or matrices, $f$ and $g$ are convex functions, and $\mathcal{A}$ and $\mathcal{B}$ are linear mappings. In many problems, $f$ and $g$ are vector or matrix norms, and when $\mathcal{A}$ and $\mathcal{B}$ are identity mappings, the augmented Lagrangian subproblems of (14) for minimizing over x and y have closed form solutions. However, when $\mathcal{A}$ and $\mathcal{B}$ are not identity mappings, the subproblems may not be easy to solve. Therefore, Lin et al. [26] proposed linearizing the quadratic penalty term in the augmented Lagrangian function and adding a proximal term for updating x and y. In the resulting updating scheme, $\lambda$ is the Lagrange multiplier, $\mu$ is the penalty parameter, and $\eta_A > 0$ and $\eta_B > 0$ are parameters for each norm. Lin et al. [26] proposed a strategy to adaptively update the penalty parameter $\mu$ and proved that when $\mu$ is nondecreasing and upper bounded, and $\eta_A > \|\mathcal{A}\|^2$ and $\eta_B > \|\mathcal{B}\|^2$, $(x_k, y_k)$ converges to an optimal solution to (14), where $\|\mathcal{A}\|$ and $\|\mathcal{B}\|$ are the operator norms of $\mathcal{A}$ and $\mathcal{B}$, respectively.
For updating the penalty parameter $\mu$ in (6) for the proposed problem, we apply the concept of LADMAP, where x, y, c, and $\lambda$ are W, E, Y, and $M_{1-4}$, respectively, in (6). With some algebra, the penalty parameter is updated as

$$\mu_{k+1} = \min(\mu_{\max}, \rho\,\mu_k), \quad (16)$$

where $\mu_{\max}$ is an upper bound of $\{\mu_k\}$. The value of $\rho$ is decided using a constant $\rho_0 \ge 1$, and we used $\rho_0 = 1.9$ in this work. The iterations of (7)-(13) and (16)-(17) stop when two conditions are satisfied, in which $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$ are small thresholds. These two stopping criteria are based on the KKT conditions of problem (6). Algorithm 1 summarizes the overall algorithm for the proposed low-rank representation-based multitask feature learning with joint sparsity.
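The adaptive penalty rule and the two stopping tests can be sketched as below. The sketch follows the generic LADMAP rule of Lin et al. [26] as commonly stated, with the penalty written as `mu`; how the quantities map onto W, E, Y, and the multipliers in the proposed method is not recoverable from this section, so the argument names and default values other than `rho0 = 1.9` are assumptions.

```python
def relative_change(mu, dx_norm, dy_norm, c_norm, eta_a, eta_b):
    # mu * max(sqrt(eta_A) * ||x_new - x_old||, sqrt(eta_B) * ||y_new - y_old||) / ||c||
    return mu * max(eta_a ** 0.5 * dx_norm, eta_b ** 0.5 * dy_norm) / c_norm

def update_penalty(mu, dx_norm, dy_norm, c_norm, eta_a, eta_b,
                   rho0=1.9, eps2=1e-4, mu_max=1e10):
    # Increase the penalty only when the iterates have nearly stopped moving.
    change = relative_change(mu, dx_norm, dy_norm, c_norm, eta_a, eta_b)
    rho = rho0 if change < eps2 else 1.0
    return min(mu_max, rho * mu)

def converged(residual_norm, mu, dx_norm, dy_norm, c_norm, eta_a, eta_b,
              eps1=1e-6, eps2=1e-4):
    # Both the constraint residual and the iterate change must be small.
    feasible = residual_norm / c_norm < eps1
    stationary = relative_change(mu, dx_norm, dy_norm, c_norm, eta_a, eta_b) < eps2
    return feasible and stationary

grown = update_penalty(1.0, 0.0, 0.0, 1.0, 1.0, 1.0)     # iterates stalled
kept = update_penalty(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)      # iterates still moving
capped = update_penalty(1e10, 0.0, 0.0, 1.0, 1.0, 1.0)   # already at the bound
```

Capping the penalty at `mu_max` keeps the sequence nondecreasing and bounded, which is the condition the convergence argument above relies on.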

Object Tracking.
Given a set of observed images up to the $t$th frame, $\mathbf{Y}_t = \{Y_1, Y_2, \ldots, Y_t\}$, the goal of object tracking is to recursively estimate the state $\mathbf{x}_t$ by maximizing the posterior state distribution, in which $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$ denotes the motion model between two temporally consecutive states and $p(Y_t \mid \mathbf{x}_t)$ indicates the likelihood function.
In the motion model, we regard the state variable $\mathbf{x}_t$ as six affine parameters [4], namely, horizontal and vertical translations, rotation, scale, aspect ratio, and skew, such that $\mathbf{x}_t = [x_t, y_t, \theta_t, s_t, \alpha_t, \phi_t]$. The motion model is assumed to have a Gaussian distribution, $p(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \mathbf{x}_{t-1}, \Phi)$, where $\Phi$ is a diagonal covariance matrix.
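A minimal sketch of drawing candidate states from this Gaussian motion model is given below. Because the covariance is diagonal, each affine parameter is perturbed independently. The function name, the standard deviations, and the particle count are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def sample_particles(prev_state, sigmas, n_particles, rng=None):
    """Draw states from N(prev_state, diag(sigmas**2))."""
    rng = np.random.default_rng() if rng is None else rng
    prev_state = np.asarray(prev_state, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Independent zero-mean noise per parameter, scaled by its standard deviation.
    noise = rng.normal(0.0, 1.0, size=(n_particles, prev_state.size)) * sigmas
    return prev_state + noise

# State order: x, y translation, rotation, scale, aspect ratio, skew.
prev = [120.0, 80.0, 0.0, 1.0, 1.0, 0.0]
sigmas = [4.0, 4.0, 0.01, 0.01, 0.002, 0.001]   # hypothetical values
particles = sample_particles(prev, sigmas, 100, rng=np.random.default_rng(0))
frozen = sample_particles(prev, [0.0] * 6, 5)   # zero variance keeps the state
```

Each row of `particles` is one candidate state whose image patch would then be scored by the likelihood function.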
The solution of the proposed algorithm in Algorithm 1 is obtained by minimizing the corresponding functional. The input of Algorithm 1 is a data set of $n$ data points $Y = [y_1, y_2, \ldots, y_n]$ and the parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$.
The likelihood function can be formulated by the reconstruction error given in [29]. This likelihood function, however, cannot deal with severe appearance deformations. For this reason, the likelihood function is modified by an additional labeled penalty constraint on the similarity between neighboring error matrices for the deformed region of the object, $|(1 - A^{(t+1)}) - (1 - A^{(t)})|$, where $A^{(t)}$ and $A^{(t+1)}$, respectively, represent matrices indicating the nonzero elements of $E^{(t)}$ and $E^{(t+1)}$. In the proposed likelihood function, $\odot$ denotes the element-wise multiplication of matrices.
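The two ingredients described above can be sketched separately: a reconstruction-error likelihood and the penalty that counts how the support of the error matrix changes between neighboring error matrices. The exponential form of the likelihood and the `scale` parameter are assumptions (a common choice in reconstruction-based trackers), and how the paper combines the two terms is not recoverable from this section, so the sketch does not combine them.

```python
import numpy as np

def reconstruction_likelihood(y, U, w, scale=1.0):
    # Assumed form: exp(-scale * ||y - U w||^2) for one candidate observation y.
    r = y - U @ w
    return float(np.exp(-scale * (r @ r)))

def support_mask(E, tol=0.0):
    # 1 where the error matrix is nonzero, 0 elsewhere.
    return (np.abs(E) > tol).astype(float)

def support_change_penalty(E_prev, E_next):
    # Sum of |(1 - A_next) - (1 - A_prev)| over all entries.
    A_prev, A_next = support_mask(E_prev), support_mask(E_next)
    return float(np.abs((1.0 - A_next) - (1.0 - A_prev)).sum())

U = np.eye(3)
w = np.array([1.0, 2.0, 3.0])
perfect = reconstruction_likelihood(U @ w, U, w)   # zero residual
E_prev = np.array([[0.0, 1.0], [0.0, 0.0]])
E_next = np.array([[0.0, 1.0], [2.0, 0.0]])
penalty = support_change_penalty(E_prev, E_next)   # one entry changed
```

The penalty grows with the number of positions whose error status flips, which is the behavior the text associates with deformed object regions.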

Experimental Results
The proposed object tracking algorithm is implemented and tested using ten challenging sequences. Major challenges in the test sequences include scale variation, shape deformation, fast motion, out-of-plane rotation, background clutter, object rotation, occlusion, illumination variation, out of view, motion blur, and low resolution. The proposed method is compared with a number of state-of-the-art tracking algorithms: SMTT [15], APGL1 [14], SPCA [21], ASL [2], ILRF [18], and RMTT [16]. The proposed algorithm is implemented in MATLAB and processes 1.5 frames per second on a Pentium 2.7 GHz dual core PC without any hardware accelerator such as a GPU. For each test sequence, the initial location of the object is manually selected in the first frame. Each image sample from the target and background is normalized to a 32 × 32 patch.
We test different combinations of $\lambda_1$, $\lambda_2$, and $\lambda_3$ on three different image sequences, with 1500 frames, that exhibit challenging issues such as occlusion, deformation, and scale variation, and we compute the center pixel errors over all frames. For each $\lambda_1$ in the discrete set, with different $\lambda_2$ and $\lambda_3$, we produce 10 × 10 average center pixel errors. When the tracking performance is acceptable for a fixed $\lambda_1$, we average all the 10 × 10 center pixel errors. In this way, we set the regularization parameters as $\lambda_1 = 5$, $\lambda_2 = 0.5$, $\lambda_3 = 0.1$, and $\lambda_4 = 1.0$.

Quantitative Evaluation.
For quantitative performance comparison, the center pixel error and the overlap ratio criterion are computed. The center pixel error represents the distance between the predicted and the ground truth center pixels. Figure 1 shows the center pixel errors of seven different object tracking algorithms for ten test sequences.
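The center pixel error, together with a bounding-box overlap score of the kind used for the success criterion, can be sketched as follows. Boxes are assumed to be given as (x, y, width, height); the intersection-over-union form of the overlap score and the 0.5 success threshold are assumptions based on the common PASCAL VOC convention, since the definition in [30] is not reproduced in this section.

```python
import numpy as np

def center_error(box_pred, box_gt):
    # Euclidean distance between the centers of two (x, y, w, h) boxes.
    cp = np.array([box_pred[0] + box_pred[2] / 2.0, box_pred[1] + box_pred[3] / 2.0])
    cg = np.array([box_gt[0] + box_gt[2] / 2.0, box_gt[1] + box_gt[3] / 2.0])
    return float(np.linalg.norm(cp - cg))

def overlap_score(a, b):
    # Intersection area divided by union area of two (x, y, w, h) boxes.
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_rate(preds, gts, threshold=0.5):
    # Fraction of frames whose overlap score exceeds the threshold.
    scores = [overlap_score(p, g) for p, g in zip(preds, gts)]
    return sum(s > threshold for s in scores) / len(scores)

err = center_error((0, 0, 10, 10), (3, 4, 10, 10))
iou = overlap_score((0, 0, 10, 10), (5, 0, 10, 10))
rate = success_rate([(0, 0, 10, 10), (0, 0, 10, 10)],
                    [(0, 0, 10, 10), (20, 20, 5, 5)])
```

The center error is measured in pixels, whereas the overlap score is scale-invariant, which is why the two criteria are usually reported together.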
The overlap ratio criterion represents the ratio between the number of frames in which a specific object is successfully tracked and the total number of frames in the image sequence. In order to decide whether the object is successfully tracked, we employ the overlap score defined in [30]. Figure 2 shows overlap ratios between the ground truth region and the tracking region.
Existing methods except ASL [2] drift away from the object. On the other hand, the proposed tracking method can successfully track the object since it can represent the object appearance more completely using large-scale training data with efficient multitask feature learning for object representation. The moving object in the Football 1 sequence undergoes in- and out-of-plane rotation and background clutter. Experimental results demonstrate that the proposed method achieves the best tracking performance in this sequence. Other methods cannot avoid drift at some instances in the neighborhood of the 60th frame.
There is a blurry scale variation in the Freeman 3 and Car Scale sequences. For this reason, it is difficult to predict the location of the moving object. Furthermore, these sequences include drastic appearance change caused by motion blur. The proposed method performs well since it can adapt to the scale and appearance change of the object and overcome the influence of motion blur by using the joint sparsity-based object appearance representation.
In addition to scale variation and in- and out-of-plane object rotations, the Couple and Crossing sequences include a critical factor, fast motion. The fast motion can be caused by both the camera and the object. For tracking an object in the Crossing sequence, the proposed method and SPCA [21] perform very well, whereas other methods lose the object in the neighborhood of the 30th and 80th frames.
The Liquor sequence contains severe motion blur, scale and illumination variation, and fast motion. Although full occlusion by a bottle with a similar color occurs, the proposed tracking method successfully tracks the object without drifting.
In the Subway sequence, the object's information is lost because of factors such as background clutter and total occlusion. The proposed method can successfully track the moving object since it preserves the sparsity of the object appearance with optimal sparse coding, using the low-rank feature representation while considering the joint sparsity. Other trackers except ILRF [18] and RMTT [16] drift in the neighborhood of the 20th and 40th frames.
The Skiing sequence contains a severely deformed object. The proposed method successfully performs tracking due to its novel object likelihood function defined in (21). It employs a more refined likelihood function formulated by the weighted reconstruction error for rigid object regions and by the labeling penalty constraint on the similarity in residual matrices for deformed object regions as well. Figure 3 shows the overall tracking results in each test sequence. The overlap ratio between the ground truth and the tracking region predicted by the proposed method drops to 0 at around frame 40, while the center pixel error between the ground truth and the predicted tracking region still increases. From this fact, we conclude that the proposed method loses track at around frame 40, while other state-of-the-art tracking methods totally lose track at around frame 10.

Conclusion
In this paper, we present an effective, robust object tracking method using a multitask feature learning-based low-rank representation with joint sparsity. In order to overcome the limitation of existing sparse representation-based object tracking methods, we employ a novel optimization process for the low-rank representation of objects by using a recently proposed model minimization method. With the efficient subspace learning-based sparse coding and the simultaneous update method, both the optimal sparse codes and the error matrix can be appropriately updated during tracking in case of severe appearance variation. Experimental results demonstrate that the proposed method can successfully track objects in various video sequences with critical issues such as occlusion, deformation, plane rotations, background clutter, motion blur, scale and illumination variations, and fast motion.

Algorithm 1: The algorithm for low-rank representation-based multitask feature learning.

Figure 1: Center pixel error. This figure shows the center pixel error for ten test sequences. The proposed method is compared with six state-of-the-art methods.

Figure 2: Overlap ratio evaluation. This figure shows overlap ratios for ten test sequences. The proposed method is compared with six state-of-the-art methods.
$\sigma_1, \ldots, \sigma_r$ are positive singular values, and $U = [u_1, \ldots, u_r]$ and $V = [v_1, \ldots, v_r]$ are the matrices of left- and right-singular projection vectors. The object tracking problem can be solved under the RPCA framework. Suppose we have $n$ sampled particles and the correspondingly observed candidates of an object $Y = [y_1, \ldots, y_n]$ at time $t$. Each observation can be represented as a linear combination of $d$ basis vectors $\{u_1, \ldots, u_d\}$, which span the data space such that $Y = U_t W_t$, where $U_t = [u_1, \ldots, u_d]$, and each column of $W_t = [w_1, \ldots, w_n]$