Scale-Adaptive Context-Aware Correlation Filter with Output Constraints for Visual Target Tracking

The context-aware correlation filter tracker is one of the most advanced target trackers, and it significantly improves tracking accuracy and success rate compared with traditional trackers. However, background complexity during tracking can make the output response inaccurate, so an accurate tracking model is difficult to establish. Moreover, because of the imprecise tracking model, drift easily occurs during tracking, especially when the target undergoes large-area occlusion, fast motion, or deformation. To address the drift problem in target tracking, a novel algorithm is proposed in this paper. The developed method first derives a specific representation of the constrained output by assuming that the output response follows a Gaussian distribution and obtains a variable update parameter from this output constraint. The tracking filter is then selectively updated with either the variable update parameter or a fixed update parameter, and finally, the target scale is updated by maximizing the posterior probability distribution. The effectiveness of the developed algorithm is verified by comparison with other trackers on the OTB-50 and OTB-100 benchmark datasets, and the experimental results show that the proposed tracker achieves higher overall tracking performance than the other trackers.


Introduction
With the rapid development of computer technology, computer vision plays an increasingly important role in our lives [1,2]. The scope of computer vision research is broad, including face recognition [3], vehicle or pedestrian detection [4,5], target tracking [6,7], and image generation [8][9][10][11]. Visual target tracking has become one of the most influential research fields in computer vision because it is widely used in video surveillance [12], intelligent transportation, and military guidance [13,14]. As visual tracking technology is successfully applied in daily life, visual object tracking is increasingly used in complex environments involving illumination variation, occlusion, fast motion, deformation, and background clutter, and these factors pose great challenges to stable target tracking [15][16][17][18]. Target tracking algorithms can be divided into generative and discriminative algorithms according to whether the background information of the target is used [19][20][21]. Generative tracking algorithms use only target information and discard background information. Although this shortens computation time, it gives up useful background information, which can lower tracking accuracy [22]. Discriminative tracking algorithms treat tracking as a binary classification problem: in the current frame, the target area is labeled as a positive sample and the background region as a negative sample [23]. The target and background areas are then distinguished by a classifier trained on a large number of samples with ridge regression [24].
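As a minimal illustration of the discriminative formulation above, the sketch below trains a ridge-regression classifier that separates positive (target) samples from negative (background) samples. The features are synthetic random vectors rather than image patches, there is no bias term, and the function names are hypothetical; a real tracker would extract features from the current frame.

```python
import numpy as np

def train_ridge_classifier(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^(-1) X^T y.

    X: (num_samples, num_features) array of sample features.
    y: labels, +1 for target (positive) samples, -1 for background (negative) samples.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def classify(X, w):
    """Label a sample as target (+1) when its score is positive, else background (-1)."""
    return np.where(X @ w > 0, 1, -1)

# Synthetic example: "target" features cluster around +1.5, "background" around -1.5.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.5, 1.0, size=(50, 4))
X_neg = rng.normal(-1.5, 1.0, size=(50, 4))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), -np.ones(50)])

w = train_ridge_classifier(X, y, lam=1.0)
accuracy = np.mean(classify(X, w) == y)
print(f"training accuracy: {accuracy:.2f}")
```

The closed form is what makes ridge regression attractive for tracking: training reduces to one linear solve, which correlation filters further accelerate with the FFT.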
Because discriminative tracking algorithms perform better than generative ones, they have been used extensively for target tracking in recent years. The context-aware correlation filtering (CACF) tracker is a discriminative tracking algorithm, and it outperforms many other algorithms on some complex targets [25]. The algorithm uses the target information and the background information around the target area as the sample input and takes the target position as the sample output in the current frame. The target position is determined from the correlation filter and the target position in the previous frame. This algorithm improves tracking accuracy while maintaining real-time tracking. However, when the CACF tracker tracks a fast-moving target, the target is prone to drift, which causes tracking to fail. Moreover, because the CACF tracker updates the model with a fixed learning rate, the model is easily updated inaccurately when the tracked target is occluded or deformed, and such inaccurate updates may cause the target to drift during tracking.
Although many trackers have achieved excellent results on some of these problems, some problems remain unsolved [26]. To address the drift problem that easily occurs during tracking, this paper proposes the scale adaptive context-aware filter with constraint for output response (OURS) to reduce target drift. The presented method has three main innovations: (1) assuming that the output response follows a Gaussian distribution, a variable update parameter is derived from the Gaussian distribution and correlation filtering, and the filter is then updated with this parameter. (2) The variable update parameter obtained under the Gaussian output constraint and a fixed update parameter are used to update the filter selectively.
(3) The maximum posterior probability distribution is used to update the target scale adaptively.

Correlation Filter Tracker.
The correlation filtering algorithm greatly reduces computational complexity by converting time-domain calculations into frequency-domain calculations, which has led many researchers to apply it to visual target tracking. The principle of correlation filtering is to produce a correlation response peak when an object of interest is encountered. Correlation filters are widely used in target tracking because traditional tracking algorithms are slow. Bolme et al. [27] applied the correlation filter to target tracking for the first time and proposed the minimum output sum of squared error (MOSSE) tracker, which is obtained by minimizing the squared error between the actual output and the expected output. Inspired by the MOSSE tracker, many researchers then proposed a variety of improved tracking algorithms. Henriques et al. [28] proposed the circulant structure kernel tracker (CSK), which improves tracking speed by obtaining samples through dense sampling. Yuan et al. [29] proposed a metric learning model for visual tracking in the correlation filtering framework, which used a metric learning function to solve the target size problem. Ou et al. [30] developed a new method for selecting representative samples based on a coefficient-constrained model, which regards the template as a linear combination of representative samples.
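The MOSSE idea described above can be sketched in a few lines of NumPy. This is a simplified version under stated assumptions: a single training image, no preprocessing or cosine window, and a hypothetical regularizer `eps`. It shows that the filter fitted against a Gaussian-shaped desired output produces a response peak that follows a (circular) shift of the input.

```python
import numpy as np

def gaussian_label(shape, center, sigma=2.0):
    """Desired output g: a 2-D Gaussian peak at `center` (row, col)."""
    rows, cols = np.indices(shape)
    r0, c0 = center
    return np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2.0 * sigma ** 2))

def train_mosse(images, labels, eps=1e-2):
    """MOSSE-style filter (conjugate form) in the Fourier domain:
    H* = sum_i G_i . conj(F_i) / (sum_i F_i . conj(F_i) + eps)."""
    num = np.zeros(images[0].shape, dtype=complex)
    den = np.zeros(images[0].shape, dtype=complex)
    for f, g in zip(images, labels):
        F, G = np.fft.fft2(f), np.fft.fft2(g)
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / (den + eps)

def respond(H_conj, patch):
    """Correlation response map: ifft2(F . H*)."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))

rng = np.random.default_rng(1)
image = rng.standard_normal((32, 32))
label = gaussian_label(image.shape, center=(10, 12))
H_conj = train_mosse([image], [label])

# Circularly shift the image by (3, 5); the response peak moves by the same amount.
shifted = np.roll(image, shift=(3, 5), axis=(0, 1))
peak = np.unravel_index(np.argmax(respond(H_conj, shifted)), shifted.shape)
print(peak)
```

Because training and detection are element-wise operations on FFTs, the cost is dominated by the transforms, which is the speed advantage the paragraph refers to.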
To further improve tracking accuracy, Danelljan et al. [31] presented a real-time visual tracker with adaptive color attributes (CN), which embeds multichannel color attributes into a kernel space to obtain an adaptive correlation filter. To deal with the impact of illumination variation on target tracking, Henriques et al. [32] developed the high-speed kernelized correlation filter (KCF) by applying histogram of oriented gradient (HOG) features to the correlation filtering algorithm. The KCF tracker extends the linear correlation filter to multiple channels by introducing a linear kernel function, which greatly improves both computational efficiency and tracking performance. The traditional correlation filtering algorithm uses a fixed-size window to estimate the displacement of the target, but it cannot effectively handle changes in target scale. To handle scale changes, Li et al. [33] proposed a scale-adaptive kernel correlation filter tracker (SAMF) that fuses color features and HOG features and determines the optimal target position by comparing the maximum output amplitudes at different scales. Exploiting the complementary advantages of color and HOG features, Bertinetto et al. [34] proposed Staple, a real-time tracker built from complementary learners that uses both color and HOG features. The successful application of correlation filters has greatly improved the computational efficiency of trackers, but some problems remain unresolved, such as drift caused by fast motion, occlusion, and deformation.

Context-Aware Trackers.
Context information can provide significant auxiliary information for target tracking. A context-aware tracker selects not only the target area as a sample but also the background around the target area as a context area [35], so its whole sample space contains both target-area samples and context-area samples, as shown in Figure 1. Context information can provide important supplementary cues for target detection and tracking and helps to identify the target's state. A context-aware tracker (CAT) that finds auxiliary targets by online learning and uses them to provide context information for the target was proposed in [36]; the CAT tracker reduces uncertainty during tracking and improves tracking performance. To effectively suppress interference, a context-aware sparse tracker (CEST) that uses context particle information to build a dynamically updated dictionary template was proposed in [37].
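The sample space in Figure 1 can be illustrated with a small patch-extraction sketch. The placement used here (four context patches offset by one target size above, below, left, and right of the target) is an assumption for illustration, since the text does not specify how the context areas are positioned, and the helper names are hypothetical.

```python
import numpy as np

def extract_patch(frame, center, size):
    """Crop an (h, w) patch centred at (row, col), replicating border pixels.

    The padding covers centres that lie up to one patch size outside the frame.
    """
    h, w = size
    r, c = center
    pad = 2 * max(h, w)
    padded = np.pad(frame, pad, mode="edge")
    top = r - h // 2 + pad
    left = c - w // 2 + pad
    return padded[top:top + h, left:left + w]

def target_and_context_patches(frame, center, size):
    """Return the target patch and four context patches (above, below, left, right).

    The one-patch-size offsets are an illustrative assumption, not taken from the paper.
    """
    h, w = size
    r, c = center
    target = extract_patch(frame, center, size)
    offsets = [(-h, 0), (h, 0), (0, -w), (0, w)]
    contexts = [extract_patch(frame, (r + dr, c + dc), size) for dr, dc in offsets]
    return target, contexts

frame = np.arange(100).reshape(10, 10)   # toy "image"
target, contexts = target_and_context_patches(frame, center=(5, 5), size=(2, 2))
print(target)
print([p.shape for p in contexts])
```

In a context-aware filter, the target patch carries the desired Gaussian response, while the context patches are pushed toward zero response.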
Although the CAT has achieved some success in target tracking, it still cannot deal with the drift caused by fast motion, occlusion, and deformation. To solve the drift problem during tracking, this paper proposes a scale adaptive context-aware correlation filter with constraints on the output response. The experimental results show that the proposed algorithm reduces tracking drift and improves tracking performance.

Scale-Adaptive CACF with Output Constraints
This section describes the principle of the context-aware correlation filtering (CACF) tracker, analyzes the filter update when the output response follows a Gaussian distribution, introduces the scale-adaptive update method, and gives the implementation steps of the proposed method.

Context-Aware Correlation Filter.
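The classifier of the traditional correlation filter tracking algorithm is trained by ridge regression. Its objective function is

$$\min_{\omega}\ \|A_0\omega-y\|_2^2+\lambda_1\|\omega\|_2^2, \qquad (1)$$

where $A_0$ is the sample set obtained by cyclically sampling the target area, $\lambda_1$ is the regularization coefficient, $y$ is the expected output, and $\omega$ is the correlation filter. Each $A_j$ ($j\ge 0$) is a circulant matrix, which is diagonalized by the discrete Fourier transform (DFT) matrix $F$, so $A_j$ and its transpose $A_j^{T}$ can be written as

$$A_j=F\,\mathrm{diag}(\hat{a}_j)\,F^{H},\qquad A_j^{T}=F\,\mathrm{diag}(\hat{a}_j^{*})\,F^{H}, \qquad (2)$$

where $a_j$ ($j\ge 0$) is the target sample, $\hat{a}_j$ is its Fourier transform, and $\hat{a}_j^{*}$ is its complex conjugate. Solving equation (1) gives

$$\omega=(A_0^{T}A_0+\lambda_1 I)^{-1}A_0^{T}y. \qquad (3)$$

Using equation (2) with $j=0$ in equation (3), the Fourier form of the correlation filter $\omega$ is

$$\hat{\omega}=\frac{\hat{a}_0^{*}\odot\hat{y}}{\hat{a}_0^{*}\odot\hat{a}_0+\lambda_1}, \qquad (4)$$

where the division is element-wise. To estimate the target position, the response (confidence) map of the sampled data must be computed. Let $z$ be the input, $Z$ its circulant matrix, and $f(z)$ the output:

$$f(z)=Z\omega. \qquad (5)$$

The Fourier transform of $f(z)$ is then

$$\hat{f}(z)=\hat{z}\odot\hat{\omega}, \qquad (6)$$

where $\hat{f}(z)$ is the Fourier form of the output response, $\hat{z}$ is the Fourier form of the input $z$, and $\odot$ is the element-wise product.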
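The following NumPy sketch checks equations (3) to (6) numerically for a 1-D signal: the Fourier-domain filter of equation (4) matches the explicit ridge-regression solution of equation (3), and the response of equations (5) and (6) needs only one element-wise product. It assumes the circulant convention A[i, j] = a[(i − j) mod n], under which the complex conjugate appears as in equation (4); the signal length, λ1, and the Gaussian label are hypothetical choices.

```python
import numpy as np

def circulant(a):
    """Circulant matrix with A[i, j] = a[(i - j) % n]; column j is a cyclically shifted by j."""
    return np.stack([np.roll(a, j) for j in range(len(a))], axis=1)

def train_filter_fourier(a0, y, lam1):
    """Equation (4): w_hat = conj(a0_hat) * y_hat / (conj(a0_hat) * a0_hat + lam1)."""
    a0_hat = np.fft.fft(a0)
    w_hat = np.conj(a0_hat) * np.fft.fft(y) / (np.conj(a0_hat) * a0_hat + lam1)
    return np.real(np.fft.ifft(w_hat))

def train_filter_direct(a0, y, lam1):
    """Equation (3) solved as an explicit n x n linear system (for checking only)."""
    A0 = circulant(a0)
    return np.linalg.solve(A0.T @ A0 + lam1 * np.eye(len(a0)), A0.T @ y)

def response(z, w):
    """Equations (5)-(6): f(z) = Z w, computed as ifft(z_hat * w_hat)."""
    return np.real(np.fft.ifft(np.fft.fft(z) * np.fft.fft(w)))

n = 16
rng = np.random.default_rng(0)
a0 = rng.standard_normal(n)
y = np.exp(-0.5 * ((np.arange(n) - n // 2) / 2.0) ** 2)   # Gaussian-shaped expected output
w_fft = train_filter_fourier(a0, y, lam1=1e-2)
w_dir = train_filter_direct(a0, y, lam1=1e-2)
print(np.allclose(w_fft, w_dir))
```

The Fourier version costs O(n log n) instead of the O(n^3) linear solve, which is why correlation filters never form the circulant matrix explicitly.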
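To further simplify the calculation, we set $\omega=A_0^{T}\alpha$, and equation (3) simplifies to

$$\alpha=(A_0A_0^{T}+\lambda_1 I)^{-1}y. \qquad (7)$$

According to equation (2) (with $j=0$) and equation (7), the Fourier form of $\alpha$ is

$$\hat{\alpha}=\frac{\hat{y}}{\hat{a}_0\odot\hat{a}_0^{*}+\lambda_1}. \qquad (8)$$

Because $\omega=A_0^{T}\alpha$, equations (5) and (6) can be rewritten as

$$f(z)=ZA_0^{T}\alpha,\qquad \hat{f}(z)=\hat{z}\odot\hat{a}_0^{*}\odot\hat{\alpha}. \qquad (9)$$

The context-aware filter takes not only the target area as a sample but also the background information around the target area as auxiliary samples. A new filter is obtained by ridge regression, and the objective function of the context-aware filter is

$$\min_{\omega}\ \|A_0\omega-y\|_2^2+\lambda_1\|\omega\|_2^2+\lambda_2\sum_{i=1}^{k}\|A_i\omega\|_2^2, \qquad (10)$$

where $A_0$ is the sample set obtained by cyclically sampling the target area, $A_i$ ($i=1,\dots,k$) is the sample set obtained by cyclically sampling the $i$-th context area, $y$ is the expected output, $\omega$ is the new filter, and $\lambda_1$ and $\lambda_2$ are the regularization coefficients of the tracking system. The circulant matrices can be constructed with equation (2), and $B$ and $\bar{y}$ are set as

$$B=\begin{bmatrix}A_0\\ \sqrt{\lambda_2}A_1\\ \vdots\\ \sqrt{\lambda_2}A_k\end{bmatrix},\qquad \bar{y}=\begin{bmatrix}y\\ 0\\ \vdots\\ 0\end{bmatrix}. \qquad (11)$$

From equations (10) and (11), the objective function $f(\omega,B)$ becomes

$$f(\omega,B)=\|B\omega-\bar{y}\|_2^2+\lambda_1\|\omega\|_2^2. \qquad (12)$$

The solution of equation (12) is

$$\omega=(B^{T}B+\lambda_1 I)^{-1}B^{T}\bar{y}. \qquad (13)$$

According to equations (11) and (13), the Fourier form of the correlation filter $\omega$ is

$$\hat{\omega}=\frac{\hat{a}_0^{*}\odot\hat{y}}{\hat{a}_0^{*}\odot\hat{a}_0+\lambda_1+\lambda_2\sum_{i=1}^{k}\hat{a}_i^{*}\odot\hat{a}_i}. \qquad (14)$$

Letting $\omega=B^{T}\alpha$, we know from equations (13) and (14) that

$$\alpha=(BB^{T}+\lambda_1 I)^{-1}\bar{y}. \qquad (15)$$

Substituting equation (15) into equation (5) gives

$$f(z)=ZB^{T}\alpha. \qquad (16)$$

Equation (18) can be computed from equation (16) and equation (6):

$$\hat{f}(z)=\hat{z}\odot\Big(\hat{a}_0^{*}\odot\hat{\alpha}_0+\sqrt{\lambda_2}\sum_{i=1}^{k}\hat{a}_i^{*}\odot\hat{\alpha}_i\Big), \qquad (18)$$

where $\alpha=[\alpha_0;\alpha_1;\dots;\alpha_k]$ is partitioned conformally with $B$.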
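The following NumPy sketch checks the context-aware closed form numerically. It assumes 1-D signals, k = 4 context samples, the circulant convention A[i, j] = a[(i − j) mod n], and hypothetical values λ1 = 1e-4 and λ2 = 25. It compares the element-wise Fourier solution for the primal filter ω (equation (14)) with an explicit solve of the stacked system built from B and the zero-padded expected output (equations (11) to (13)).

```python
import numpy as np

def circulant(a):
    """Circulant matrix with A[i, j] = a[(i - j) % n]."""
    return np.stack([np.roll(a, j) for j in range(len(a))], axis=1)

def cacf_filter_fourier(a0, contexts, y, lam1, lam2):
    """Equation (14): w_hat = conj(a0_hat) y_hat / (|a0_hat|^2 + lam1 + lam2 * sum_i |ai_hat|^2)."""
    a0_hat = np.fft.fft(a0)
    denom = np.conj(a0_hat) * a0_hat + lam1
    for ai in contexts:
        ai_hat = np.fft.fft(ai)
        denom = denom + lam2 * np.conj(ai_hat) * ai_hat
    return np.real(np.fft.ifft(np.conj(a0_hat) * np.fft.fft(y) / denom))

def cacf_filter_direct(a0, contexts, y, lam1, lam2):
    """Equations (11)-(13): stack B = [A0; sqrt(lam2) A1; ...], y_bar = [y; 0; ...] and solve."""
    B = np.vstack([circulant(a0)] + [np.sqrt(lam2) * circulant(ai) for ai in contexts])
    y_bar = np.concatenate([y] + [np.zeros_like(y) for _ in contexts])
    n = len(a0)
    return np.linalg.solve(B.T @ B + lam1 * np.eye(n), B.T @ y_bar)

n, k = 16, 4
rng = np.random.default_rng(0)
a0 = rng.standard_normal(n)
contexts = [rng.standard_normal(n) for _ in range(k)]
y = np.exp(-0.5 * ((np.arange(n) - n // 2) / 2.0) ** 2)
w_fft = cacf_filter_fourier(a0, contexts, y, lam1=1e-4, lam2=25.0)
w_dir = cacf_filter_direct(a0, contexts, y, lam1=1e-4, lam2=25.0)
print(np.allclose(w_fft, w_dir))
```

The context terms only add to the denominator, so the context-aware filter costs barely more than the plain one while penalizing strong responses on the background patches.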

Filter Updating with Gaussian Distribution.
This section discusses the filter update when the output is unconstrained and when the output response is constrained to follow a Gaussian distribution.

Filter Updating with Unconstrained Output.
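Let $y$ be the expected output and $f(z)$ the actual output. The tracking problem can then be expressed as the optimization problem in equation (19), where $\varphi(\cdot)$ is a nonlinear transformation, $x_i$ is the input, $y_i$ is the output, and $B$ is a small constant. Equation (20) is the Lagrangian obtained by transforming equation (19), and solving it gives the expressions of the variables $\alpha$ and $\omega$ in equation (21). Both the model $x_t$ (where $t$ is the frame index) and the filter $\alpha_t$ are generally updated with a fixed coefficient $\eta$ as

$$x_t=(1-\eta)\,x_{t-1}+\eta\,\tilde{x}_t, \qquad (22)$$
$$\alpha_t=(1-\eta)\,\alpha_{t-1}+\eta\,\tilde{\alpha}_t, \qquad (23)$$

where $\tilde{x}_t$ and $\tilde{\alpha}_t$ are the model and filter estimated from frame $t$ alone.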
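The fixed-coefficient update of the model and filter is a linear interpolation between the previous estimate and the estimate from the current frame. The minimal sketch below assumes the usual form (1 − η)·previous + η·current and a hypothetical η = 0.02 (the value is not given in this section). It shows that the weight of older frames decays geometrically at the same rate regardless of frame quality, so an occluded or deformed frame is absorbed as quickly as a clean one.

```python
def fixed_rate_update(previous, current, eta):
    """Fixed-rate update of a model or filter: (1 - eta) * previous + eta * current.

    Works for floats and, element-wise, for NumPy arrays.
    """
    return (1.0 - eta) * previous + eta * current

# Hypothetical learning rate; the paper does not state the value of eta here.
eta = 0.02
model = 1.0
for _ in range(10):          # ten frames whose new estimate is 0.0
    model = fixed_rate_update(model, 0.0, eta)
print(model)                  # approximately (1 - eta) ** 10, about 0.817
```
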

Filter Updating with Constrained Output.
This section introduces a filter with constraints on the output response. Assuming that the target output response follows a Gaussian distribution, the objective function can be expressed as equation (24), where $y_i$ is the Gaussian representation of the $i$-th sample, $\omega_i^{T}$ is the correlation filter for the $i$-th frame, $\varphi_i$ is a nonlinear transformation for the $i$-th frame, $\mu_t$ and $\sigma_t^2$ are the mean and variance of the Gaussian model $p$, and $y_t$ is a variable defined by the Gaussian function. Based on maximum likelihood theory, the optimal solution of the Gaussian distribution is given by equation (25). Equation (25) shows that only the value of $y_t$ affects this optimal solution, so it can be simplified to equation (26). Setting $y_t=\omega_t^{T}x_t$ (where $x_t$ is the $t$-th sample in the sample set) gives equation (27). To reduce the computation in the optimization, equation (27) is transformed as follows. According to the theory of the Gaussian distribution, the means $\mu_t$ and $\mu_{t+1}$ are updated as in equation (28), where $x_t$ is the learned appearance of the target in the $t$-th frame, which is acquired iteratively by equation (29) from the target observed in the $t$-th frame. Substituting equation (29) into equation (28) gives a recursive expression for the mean. Adding $-\omega_t^{T}x_t$ to both sides of equation (31) and then multiplying both sides by $-1$ rearranges it, and substituting equation (30) into equation (31) then leads to equation (34). Because $x_t\approx x_{t-1}$ when two adjacent frames do not differ significantly, equation (34) can be approximated by equation (35), which simplifies to equation (36). Equation (36) shows that $|\omega_t^{T}x_t-\mu_t|$ reaches its minimum when $\|\omega_t-\omega_{t-1}\|$ reaches its minimum. Therefore, once the optimal solution of $\|\omega_t-\omega_{t-1}\|$ is computed, the optimal solution of $|\omega_t^{T}x_t-\mu_t|$ is obtained as well.
Because computing $\|\omega_t-\omega_{t-1}\|^2$ is still somewhat complicated, a new variable $\beta_t^{i}$ is introduced to simplify the calculation further; according to equation (16), it is obtained by setting equation (37). Minimizing $\|\omega_t-\omega_{t-1}\|^2$ is thereby transformed into minimizing $\|\beta_t-\beta_{t-1}\|^2$, and a new Lagrangian function, equation (38), can be derived. With a change of variables, it is rewritten as equation (39).
Mathematical Problems in Engineering 5
The partial derivative of $\lambda$ in equation (39) is given by equation (40); its Fourier transform is equation (41), which simplifies to equation (42). Letting

$$\tau=\frac{\mathcal{F}(k)+\lambda}{\mathcal{F}(k)+\lambda+4\lambda s}$$

(with element-wise division) and substituting $\tau$ into equation (42) gives equation (43), where $\mu_t$ and $\sigma_t^2$ are used to select samples as shown in equation (44), and $\tau$ is a matrix of the same size as $\mathcal{F}(\alpha)$. Unlike correlation filters that use a threshold to detect tracking failure, this paper considers that the Gaussian prior can effectively prevent drift. The proposed method uses the Gaussian prior to select the samples whose response output follows the Gaussian distribution, that is, the samples that satisfy

$$\frac{|y_t-\mu_t|}{\sigma_t}<G, \qquad (44)$$

where $G$ is an empirical value, $y_t$ is the maximum output response, $\mu_t$ is the mean, and $\sigma_t$ is the standard deviation. From the definitions of the mean and variance, $\mu_t$ and $\sigma_t^2$ are

$$\mu_t=\frac{1}{t}\sum_{i=1}^{t}y_i, \qquad (45)$$
$$\sigma_t^2=\frac{1}{t}\sum_{i=1}^{t}\left(y_i-\mu_t\right)^2, \qquad (46)$$

where $y_i$ is the maximum output response of the $i$-th frame.
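The sample-selection rule can be sketched as follows: a frame is treated as satisfying the Gaussian constraint when |y_t − μ_t| / σ_t < G, where y_t is the frame's maximum response. The sketch assumes population statistics (dividing by the number of frames) for the mean and variance, uses G = 1.7 from the parameter study, and only returns which update rule Algorithm 1 would apply; the variable-parameter update of equation (43) itself is not implemented. Function names and response values are hypothetical.

```python
import math
import statistics

def gaussian_stats(max_responses):
    """Mean and population standard deviation of per-frame maximum responses."""
    mu = statistics.fmean(max_responses)
    sigma = math.sqrt(statistics.pvariance(max_responses, mu))
    return mu, sigma

def satisfies_constraint(y_max, mu, sigma, G=1.7):
    """Gaussian output constraint: |y_t - mu_t| / sigma_t < G (sigma guarded against zero)."""
    return abs(y_max - mu) / max(sigma, 1e-12) < G

def select_update(y_max, history, G=1.7):
    """Follow Algorithm 1: variable-parameter update (equation (43)) inside the
    constraint, fixed-rate update (equation (23)) otherwise."""
    mu, sigma = gaussian_stats(history)
    return "variable" if satisfies_constraint(y_max, mu, sigma, G) else "fixed"

history = [0.4, 0.6]                   # hypothetical peak responses from earlier frames
print(select_update(0.55, history))    # variable: |0.55 - 0.5| / 0.1 = 0.5 < 1.7
print(select_update(0.20, history))    # fixed:    |0.20 - 0.5| / 0.1 = 3.0 >= 1.7
```
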

Scale Updating with Maximum Posterior Probability Distribution.
At present, many scale-updating methods are based on the likelihood principle: in the detection stage, the likelihood of each scale is evaluated, and the scale with the maximum likelihood is selected from the scale pool. Let the scale template size be $S_T=(S_x,S_y)$ and the scale pool be $S=\{t_1,t_2,\dots,t_k\}$. Assuming that the target window in the original image is $s_t$, $k$ windows of sizes $\{t_i s_t \mid t_i\in S\}$ are used to search for the target, and bilinear interpolation resizes each sample to the fixed template size $S_T$. The response is calculated as

$$f(z^{t_i})=\mathcal{F}^{-1}\big(\hat{K}^{xz^{t_i}}\odot\hat{\alpha}\big), \qquad (47)$$

where $z^{t_i}$ is the sample patch of size $t_i s_t$, $K^{xz}$ is the kernel correlation between the sample $z$ and the template $x$, and $\alpha$ is the filter. According to equation (47), the maximum response is

$$\max_{t_i\in S}\ \max f(z^{t_i}). \qquad (48)$$

After the response maps $f(z^{t_i})$ are computed for all scales, the scale with the largest peak response is selected. Because the displacement read from the response map is measured in the resized template coordinates, it must be rescaled by $t_i$ to obtain the real motion of the target.
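The maximum-likelihood scale search can be sketched as below. The scale pool, peak values, and sizes are hypothetical, and computing each response map is abstracted away; the sketch shows only the argmax over scales and the mapping of the peak displacement from template pixels back to image pixels.

```python
def ml_scale_step(scale_pool, peak_responses, target_size, template_size, displacement):
    """Pick the scale factor t_i with the largest response peak (equation (48)) and
    map the peak displacement from template coordinates back to image pixels.

    target_size:   current target window s_t as (width, height) in image pixels
    template_size: fixed template size S_T = (S_x, S_y)
    displacement:  (dx, dy) of the response peak, measured in template pixels
    """
    best = max(range(len(scale_pool)), key=lambda i: peak_responses[i])
    t = scale_pool[best]
    # The searched window had size t * s_t and was resized to S_T, so one template
    # pixel corresponds to (t * s_t / S_T) image pixels along each axis.
    sx = t * target_size[0] / template_size[0]
    sy = t * target_size[1] / template_size[1]
    new_size = (t * target_size[0], t * target_size[1])
    return t, new_size, (displacement[0] * sx, displacement[1] * sy)

# Hypothetical pool and peak values for illustration.
t, size, motion = ml_scale_step([0.95, 1.0, 1.05], [0.40, 0.60, 0.50],
                                target_size=(40, 20), template_size=(20, 10),
                                displacement=(3, -1))
print(t, size, motion)   # 1.0 (40.0, 20.0) (6.0, -2.0)
```
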
Although the maximum likelihood method has achieved some success in handling target scale, it cannot cope well with complex scale changes. To address this, this paper replaces the maximum likelihood method with the maximum posterior probability method:

$$P(s_i\mid y)=\frac{P(y\mid s_i)\,P(s_i)}{P(y)}\propto P(y\mid s_i)\,P(s_i),\qquad y_s=\max_{i}\,P(s_i\mid y),$$

where $s_i$ is the $i$-th scale, $y$ is the output in the current frame, $P(y\mid s_i)$ is the likelihood, which can be calculated from the maximum response peak at the $i$-th scale, $y_s$ is the optimal scale response of the current frame, and $P(s_i)$ is the prior, which follows a Gaussian distribution centered on the previous scale with a standard deviation set in the experiments.
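To make the difference from maximum likelihood concrete, the sketch below scores each candidate scale by likelihood × prior. It assumes that the likelihood is proportional to the (non-negative) response peak at that scale and that the prior is a Gaussian centred on the previous scale; σ_s, the candidate scales, and the peaks are hypothetical values. In the example, the largest raw peak is at 0.9, but the prior pulls the decision to the previous scale 1.0.

```python
import math

def map_scale(candidate_scales, peak_responses, prev_scale, sigma_s):
    """Maximum a posteriori scale selection.

    Likelihood P(y | s_i): assumed proportional to the response peak at scale s_i.
    Prior P(s_i): Gaussian centred on the previous scale with standard deviation sigma_s.
    Returns the MAP scale and the normalised posterior over the candidates.
    """
    unnormalised = []
    for s, peak in zip(candidate_scales, peak_responses):
        prior = math.exp(-0.5 * ((s - prev_scale) / sigma_s) ** 2)
        unnormalised.append(max(peak, 0.0) * prior)
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("all candidate scales have zero posterior mass")
    posterior = [p / total for p in unnormalised]
    best = max(range(len(candidate_scales)), key=lambda i: posterior[i])
    return candidate_scales[best], posterior

scales = [0.9, 1.0, 1.1]          # hypothetical candidate scales
peaks = [0.50, 0.48, 0.30]        # hypothetical response peaks
best, posterior = map_scale(scales, peaks, prev_scale=1.0, sigma_s=0.05)
ml_best = scales[max(range(len(scales)), key=lambda i: peaks[i])]
print(ml_best, best)              # ML picks 0.9; MAP picks 1.0
```

The prior acts as a temporal smoothness term: a scale jump is accepted only if its response peak is clearly stronger, which damps scale oscillation caused by noisy peaks.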

Steps of the Proposed Algorithm.
The detailed steps of the proposed algorithm are summarized in Algorithm 1 based on the results of studies in Sections 3.1, 3.2, and 3.3.

Experiments
The proposed OURS tracker is compared with several state-of-the-art trackers on the standard benchmarks OTB-50 and OTB-100. In the OTB datasets, sequences are annotated with 11 attributes that represent challenging aspects of target tracking: illumination variation (IV), out-of-plane rotation (OPR), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-view (OV), background clutter (BC), and low resolution (LR). The comparison algorithms include CSK [28], CACF [25], KCF [32], and Staple [34]. The precision curve gives the percentage of frames whose estimated position lies within a given distance threshold of the ground truth, and the success curve is based on the overlap rate between the predicted and ground-truth target boxes. Both curves are used as evaluation criteria in this work. A frame is counted as successfully tracked when its overlap rate exceeds the given threshold; otherwise, it is counted as a failure. In this study, the threshold of the precision curve is set to 20 px, and that of the success curve is set to 0.5.
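The two evaluation criteria can be sketched as follows for boxes given as (x, y, w, h). This is a simplified illustration with hypothetical helper names: it reports the fraction of frames at a single threshold (20 px and 0.5) and does not reproduce the full OTB toolkit, which, for instance, also summarizes success plots by their area under the curve.

```python
import math

def center_error(box_a, box_b):
    """Euclidean distance between box centres; boxes are (x, y, w, h)."""
    ax, ay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx, by = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return math.hypot(ax - bx, ay - by)

def overlap(box_a, box_b):
    """Intersection over union (overlap rate) of two (x, y, w, h) boxes."""
    iw = min(box_a[0] + box_a[2], box_b[0] + box_b[2]) - max(box_a[0], box_b[0])
    ih = min(box_a[1] + box_a[3], box_b[1] + box_b[3]) - max(box_a[1], box_b[1])
    inter = max(iw, 0) * max(ih, 0)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_at(pred, gt, threshold=20.0):
    """Fraction of frames whose centre error is within `threshold` pixels."""
    return sum(center_error(p, g) <= threshold for p, g in zip(pred, gt)) / len(gt)

def success_at(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap rate exceeds `threshold`."""
    return sum(overlap(p, g) > threshold for p, g in zip(pred, gt)) / len(gt)

gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
pred = [(0, 0, 10, 10), (5, 0, 10, 10)]   # second box: centre error 5 px, overlap 1/3
print(precision_at(pred, gt), success_at(pred, gt))   # 1.0 0.5
```

The example shows why both criteria are reported: a box can be within 20 px of the target centre and still overlap it poorly.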

Parameter Setup.
This experiment tests the effect of the parameters on the OURS tracker. For example, Figure 2 shows an experiment on a subset of OTB with different values of s, in which the precision is highest when s is 1050. To choose the best constraint G in equation (44), different values were tested, as shown in Figure 3: the precision changes little when G varies from 1.0 to 1.5 and is highest when G is 1.7, so G = 1.7 is used in this work. In the correlation filtering part,

ALGORITHM 1: Scale adaptive context-aware correlation filter with output constraints for target tracking.
Inputs:
b_0: target location and size of the first frame
x_0: target model of the first frame
α_0: correlation filter of the first frame
λ_1, λ_2: regularization coefficients of the tracking system
Outputs:
b_{n−1}: target location and size of the previous frame
x_n: target model of the current frame
α_n: correlation filter of the current frame
P_{s_i}: maximum probability of the i-th scale
y_scale: optimal scale response of the current frame
y_n: maximum correlation filter response of the current frame
P_n: target location in the current frame
Preprocessing position and model
(1) Initialize the bounding box of the target b_0 = [x_0, y_0, w, h].
(2) When the frame number n is less than 11, calculate the mean and variance of the Gaussian constraint with equations (45) and (46).
(3) Get a search window based on b_{n−1} (n ≥ 1), calculate the correlation response y according to equation (16), record its maximum value as y_n, and obtain the target position P_n from y_n.
(4) If n < 11, continue with step (5); if n ≥ 11, go to step (8).
(5) Update the target model x_n with equation (22).
(6) Update the correlation filter α_n with equation (23).
(7) Return to step (3).
Model updating
(8) Update the target model x_n with equation (22).
Scale updating
(9) Calculate the response P_{s_i} according to equations (47) and (48).
(10) Calculate the maximum posterior probability by equation (47), and obtain the optimal scale response y_scale from equation (50).
Filter updating
if |y_n − μ|/σ < G
(11) Update the correlation filter α_n according to equation (43).
else
(12) Update the correlation filter α_n with equation (23).
end
(13) Repeat from step (3) until the end of the video sequence.

Experiments on the Full Dataset OTB-50.
This experiment uses the one-pass evaluation (OPE): each tracker is run on the 51 benchmark video sequences, initialized with the ground-truth position in the first frame, and the average accuracy is reported. The precision and success rates of the trackers on the standard test set OTB-2013 are shown in Figure 4. The proposed OURS tracker achieves a precision score of 0.845 at the 20-pixel threshold and a success rate of 0.774 at the overlap threshold of 0.5, the best performance among these trackers. Table 1 compares the OURS tracker with the CACF tracker: OURS achieves higher precision than CACF on sequences with illumination variation (IV), scale variation (SV), out-of-plane rotation (OPR), occlusion (OCC), deformation (DEF), in-plane rotation (IPR), motion blur (MB), and fast motion (FM). The running speeds of the trackers are shown in Figure 5. Although the CSK tracker is the fastest and the OURS tracker is the slowest, OURS runs only 13 frames per second slower than the CACF tracker. Considering both its accuracy and its running speed, the OURS tracker still has high application value.

Experiments with Various Challenging Sequence Attributes on OTB-50.
The attribute annotations of the standard test set describe the challenges that the tracker faces in each video sequence, which allows researchers to characterize tracker behavior without analyzing each sequence individually. Figure 6 reports the precision for each attribute, and the OURS tracker performs best on most attributes. The success rate for each attribute is shown in Table 2, where the maximum value is shown in bold. Table 2 shows that the OURS tracker has the highest success rate on illumination variation (IV), out-of-plane rotation (OPR), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), and fast motion (FM). Although the Staple tracker has the highest success rate on in-plane rotation (IPR) and the CACF tracker has the highest success rate on out-of-view (OV), background clutter (BC), and low resolution (LR), the OURS tracker achieves near-optimal performance on these four attributes. In short, the OURS tracker performs better in most cases.

Experiments on the Full Dataset OTB-100.
To further verify the superiority of the OURS tracker, the precision and success rate of the OURS tracker and the other trackers are tested on the standard test set OTB-100, and the results are shown in Figure 7. According to Figure 7, the precision scores of the OURS and CACF trackers at the 20-pixel threshold are 0.827 and 0.793, respectively, so the precision of OURS is 0.034 higher than that of CACF. Their success rates at the overlap threshold of 0.5 are 0.718 and 0.699, so the success rate of OURS is 0.019 higher than that of CACF. To highlight the value of the OURS tracker, the overall performance of the five trackers is summarized in Table 3, which shows that OURS has the highest precision and success rate among the five trackers. Although the OURS tracker runs more slowly, it is only 14.13 frames per second slower than the CACF tracker. When running speed is not a strict requirement, the OURS tracker performs better than the other trackers.

Experiments with Various Challenging Sequence Attributes on OTB-100.
To further verify the effectiveness of the OURS tracker, it is again compared with the other trackers on the standard test set OTB-100. Figure 8 shows that the OURS tracker achieves the best tracking performance on objects with occlusion, geometric deformation, and fast motion. Improving performance on these key attributes greatly alleviates the target drift problem, and the OURS tracker has the best tracking performance on most attributes. In addition, Figure 9 shows that the success rate of the OURS tracker has clear advantages over the other trackers on most attributes. In summary, the OURS tracker achieves the best tracking performance compared with the other trackers on almost all attributes.

Experiments under Different Video Sequences.
To describe the performance of the OURS tracker more precisely, nine video sequences are selected and their precision is tested. The precision on these nine sequences is shown in Table 4, with the best results in bold. According to Table 4, the OURS tracker has the highest precision on the Bolt, Coke, Couple, Jumping, Freeman1, Freeman4, Jogging, and Girl sequences, and near-highest precision on the Football sequence. Overall, the OURS tracker performs better than the other four trackers.
This experiment provides a qualitative comparison of the OURS tracker with existing trackers to verify its stability. The selected video sequences cover different challenging situations, including illumination variation (IV), out-of-plane rotation (OPR), scale variation (SV), in-plane rotation (IPR), background clutter (BC), occlusion (OCC), deformation (DEF), fast motion (FM), and motion blur (MB). The tracking results of the OURS, CACF, and KCF trackers on four video sequences are shown in Figure 10. Figure 10(a) shows that the KCF tracker cannot stably track a video sequence with scale variation, whereas the OURS tracker and the CACF tracker can. Figure 10(b) shows that the CACF tracker cannot successfully track the Bolt sequence, which contains in-plane rotation, but the OURS tracker and the KCF tracker handle it very well. Figure 10(c) shows that only the OURS tracker can stably track the Freeman video sequence; neither the KCF tracker nor the CACF tracker tracks it successfully. Figure 10(d) shows that only the OURS tracker can successfully track the Jumping sequence, which contains fast motion and motion blur. In summary, the OURS tracker has excellent stability.
Per-attribute OPE plot panels (number of sequences in parentheses): (c) deformation (19); (d) scale variation (28); (e) background clutter (21); (f) out-of-plane rotation (39); (g) in-plane rotation (31); (h) illumination variation (25); (i) motion blur (12); (j) out-of-view (6); (k) low resolution (4).

Conclusion
In this work, a scale-adaptive context-aware correlation filtering algorithm with constrained output response is proposed for object tracking: (1) the output response is assumed to follow a Gaussian distribution, and a variable update parameter is derived from the Gaussian output constraint.
(2) The filter is updated with the variable update parameter when the output response satisfies the Gaussian constraint and with a fixed update parameter otherwise. (3) The optimal target scale is obtained from the maximum posterior probability distribution. The proposed OURS tracker performs well in the experiments because of the following advantages. First, it provides a new filter that yields a more accurate model. Second, it adopts a selective update strategy that effectively increases tracking accuracy. Finally, it uses the maximum posterior probability method to obtain a more accurate target scale. The experimental results show that the proposed tracker outperforms the other trackers when dealing with drift caused by fast motion, deformation, and occlusion. Therefore, the developed tracker significantly improves the ability of the CACF tracker to handle drift and performs better than many other trackers.
Further research will focus on two aspects: (1) improving tracking performance under low resolution and out-of-view conditions; (2) since only single-target tracking is studied here, extending the Gaussian output-response constraint to multiple-target tracking.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.