Analysis of the Occlusion Interference Problem in Target Tracking

As an indispensable part of computer vision, target tracking has been widely used in intelligent transportation, missile guidance, unmanned aerial vehicle (UAV) tracking, and many other fields. It has become one of the most active research directions in computer vision in recent years, yet occlusion has always been one of its greatest difficulties and challenges. In this article, the problem of occlusion interference in target tracking is described, and solutions are proposed for different occlusion conditions. Because the feature point center weighting, multiparticle template matching, and Kalman filter trajectory prediction algorithms each have disadvantages in certain cases, algorithms with higher robustness and stability have been developed to solve the occlusion problem. The analysis of anti-occlusion models shows that some tracking errors caused by occlusion can be resolved by improving the quality of negative training samples and enriching the diversity of positive sample sets. Based on the different training characteristics of online and offline tracking algorithms, anti-occlusion models suited to active learning under different tracking conditions are identified, and representative active learning tracking algorithms and their characteristics are listed, which helps in selecting a suitable tracking model for a given scenario. Finally, future developments in handling occlusion in target tracking are discussed.


Introduction
Computer vision simulates human visual function through image and video processing, allowing machines to take over part of the human brain's perception and understanding of the external world. In 1982, Marr [1] proposed a relatively complete theoretical framework for vision. This theory provides a clear theoretical system for computer vision; it not only promoted the development of the field but also laid a solid foundation for the wide application of computer vision technology in various fields [2]. Target tracking is one of the basic and most active issues in vision research and in recent years has been widely applied in intelligent transportation systems [3], advanced driver assistance [4], missile manufacturing [5], medical diagnosis [6], video surveillance, and other areas [7]. The accuracy, robustness, and efficiency of target tracking directly affect subsequent research such as target recognition and behavior analysis. Therefore, improving the accuracy of target tracking has become a hot topic for researchers in China and abroad. The main challenges of target tracking come from interference in the tracking scene and uncertainty in the target's motion state. Target tracking must cope with illumination change, in-plane and out-of-plane rotation, target deformation, scale change, background clutter, and occlusion, among which occlusion is one of the greatest difficulties. When the target is occluded, most tracking methods tend to absorb background information into their results; this contaminates the target model and ultimately leads to tracking failure. Table 1 presents data from the tracking algorithm competitions on the VOT (Visual Object Tracking challenge) platform [8] in recent years. As Table 1 shows, from VOT2017 to VOT2019 the accuracy under both scale change and motion change increased from just over 40 percent to over 50 percent, whereas the accuracy under occlusion remained at about 40 percent. This shows that occlusion is among the most challenging of the factors that affect tracking accuracy.
During target tracking, various uncertain factors can cause the target to be occluded by unexpected objects. The performance degradation caused by occlusion is not instantaneous but a gradual process that runs from entering occlusion, through continuous occlusion, to leaving occlusion. So far, methods for solving the occlusion problem can be roughly divided into three categories: tracking algorithms based on predicting the target trajectory [9], tracking methods based on space-time context [10], and tracking methods based on large-scale random detection [11]. With the spread of deep learning in target tracking, training networks to learn the general patterns of change in a target's appearance has provided a new way to address target occlusion [12].
This article analyzes the occlusion interference problem in target tracking. Beyond the traditional methods, it analyzes solutions with higher robustness and accuracy: tracking algorithms that predict the target trajectory, tracking algorithms based on space-time context information, tracking algorithms based on large-scale random detection, and tracking algorithms based on deep learning. It also shows how improving the quality of negative training samples and enriching the diversity of positive sample sets leads to an adaptive anti-occlusion tracking model based on active learning, and lists representative tracking algorithms and their characteristics.

Analysis of Occlusion Interference
Occlusion interference is an important factor that degrades tracking accuracy over time, and it is unavoidable in long-term tracking. Different occlusion situations affect target localization differently in different scenarios, so adopting a coping strategy suited to each situation strengthens the stability of a tracking algorithm, and classifying occlusion situations effectively is a prerequisite for solving the occlusion problem.

Tracking Scenarios.
A simple scene is one whose background is uniform, with few texture features and no similar targets, as shown in Figure 1.
In a simple background there is no interference from similar targets and the tracked target does not deform significantly, so the impact on localization accuracy is small, the response of the tracked target is strong, and the target is easy to track and localize. Because external interference is minimal, simple scenes are well suited to verifying and debugging individual components of a tracking algorithm.
A complex scene is one whose background is rich in texture information; it may contain similar targets or background features resembling the target's features, and the target may be occluded, deformed, or subject to lighting changes, as shown in Figure 2.
Complex scenes contain many background interference factors, so it is difficult to predict and prevent interference in advance, and the tracker must handle it in real time. This tests the robustness and stability of the tracking algorithm; therefore, robust and stable tracking in complex backgrounds is an essential requirement for a tracking algorithm.

Occlusion Duration.
When the tracked target remains occluded for 20 frames or fewer, the occlusion is regarded as short-term, as shown in Figure 3.
In short-term occlusion, the time from entering to leaving occlusion is short, so little erroneous information is introduced into the target feature template. Because the occlusion is brief and the target moves only a limited distance, the target is probably still within the tracker's sampling range. When the target emerges, the tracker drifts briefly near the target's position and eventually recaptures it. Even this short-term drift, however, can reduce tracking accuracy.
When the tracked target remains occluded for more than 20 frames, the occlusion is regarded as long-term, as shown in Figure 4. Because the target model is updated in every frame, long-term occlusion introduces a large amount of erroneous background information into the target feature template. This seriously contaminates the tracker, corrupts the target model, and lowers its confidence, eventually causing the tracker to drift away from the target and fail.
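To make the contamination argument concrete, the Python sketch below assumes a running-average template update, T_t = (1 - η)·T_{t-1} + η·x_t, which is common in correlation-filter trackers but is not specified in this article; the learning rate η and the function names are hypothetical. If the occluder fills every sampled patch x_t for k consecutive frames, the occluder's share of the template is 1 - (1 - η)^k, so the damage grows with the length of the occlusion. The sketch also applies the 20-frame threshold used above.

```python
SHORT_TERM_MAX_FRAMES = 20  # from the text: more than 20 occluded frames is long-term


def occlusion_duration_type(occluded_frames: int) -> str:
    """Label an occlusion episode by its length in frames."""
    return "long-term" if occluded_frames > SHORT_TERM_MAX_FRAMES else "short-term"


def occluder_share(occluded_frames: int, learning_rate: float) -> float:
    """Fraction of a running-average template contributed by the occluder.

    Assumes T_t = (1 - lr) * T_{t-1} + lr * x_t, with every occluded sample x_t
    being pure occluder content, so the original appearance decays as (1 - lr)**k.
    """
    return 1.0 - (1.0 - learning_rate) ** occluded_frames


if __name__ == "__main__":
    eta = 0.02  # hypothetical learning rate
    for k in (5, 20, 60):
        print(k, occlusion_duration_type(k), round(occluder_share(k, eta), 3))
```

With η = 0.02, five occluded frames contaminate roughly 10% of the template, while 60 frames push the share to about 70%, which is why long-term occlusion is far more destructive than short-term occlusion.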

Occlusion Range.
According to the extent of the occluded region, occlusion can be divided into partial occlusion and severe occlusion.
When less than 40% of the target is covered by the background, the occlusion is regarded as partial, as shown in Figure 5. Partial occlusion hides part of the target's features and lowers the tracker's confidence. Because the feature template is updated continuously, incorrect background information keeps entering it, gradually deepening the contamination of the target model, and tracking accuracy drops significantly in subsequent frames. To alleviate this problem, a good feature extraction method is needed to maintain tracking, and different feature extraction methods perform differently even within the same tracking algorithm.
When more than 40% of the target is covered by the background, the occlusion is regarded as severe; when the target is completely covered, it is regarded as full occlusion, as shown in Figure 6.
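The range-based categories can be expressed as a small rule. The sketch below is illustrative: it assumes the occluder is a single axis-aligned box whose overlap with the target box is known, which a real tracker would have to estimate, and it treats exactly 40% as severe because the text leaves that boundary case open.

```python
PARTIAL_MAX_RATIO = 0.40  # from the text: under 40% covered is partial occlusion


def occluded_ratio(target_box, occluder_box):
    """Fraction of the target box covered by one occluder box.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    """
    ix = max(0.0, min(target_box[2], occluder_box[2]) - max(target_box[0], occluder_box[0]))
    iy = max(0.0, min(target_box[3], occluder_box[3]) - max(target_box[1], occluder_box[1]))
    area = (target_box[2] - target_box[0]) * (target_box[3] - target_box[1])
    return ix * iy / area


def occlusion_range_type(ratio: float) -> str:
    """Map a covered fraction to the categories used in the text."""
    if ratio <= 0.0:
        return "none"
    if ratio >= 1.0:
        return "full"
    if ratio < PARTIAL_MAX_RATIO:
        return "partial"
    return "severe"  # 40% or more; the exact boundary is an assumption
```

For example, an occluder covering the right fifth of the target gives a ratio of 0.2 (partial), while one covering its right half gives 0.5 (severe).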
Under severe or full occlusion, the tracker cannot locate the target correctly because too much of the target is hidden and too little information remains for localization. In addition, the tracker's sampling range is limited, so once its estimated coordinates drift too far from the target's true position, the target falls outside the search region. The tracker model is then damaged more and more seriously, resulting in tracking failure. Therefore, one effective way to handle severe or full occlusion is to stop updating the target model when occlusion occurs, activate a detector to scan a certain range until the target reappears, and then resume tracking with the tracker.
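The strategy just described, freezing the model and widening the detection area until the target returns, can be sketched as a small state machine. All thresholds, the radius growth factor, and the class name are hypothetical values chosen for illustration; the confidence and detector scores would come from whatever tracker and detector are in use.

```python
from dataclasses import dataclass, field


@dataclass
class OcclusionPolicy:
    """Hypothetical controller: freeze model updates and re-detect while occluded."""

    lost_threshold: float = 0.3   # tracker confidence below this => treat target as occluded
    found_threshold: float = 0.6  # detector score above this => target has reappeared
    base_radius: float = 32.0     # initial detection radius in pixels
    growth: float = 1.5           # radius multiplier per frame while the target is missing
    max_radius: float = 256.0
    occluded: bool = field(default=False, init=False)
    radius: float = field(default=0.0, init=False)

    def __post_init__(self):
        self.radius = self.base_radius

    def step(self, tracker_conf: float, detector_score: float = 0.0) -> dict:
        """Decide, for one frame, whether to update the model and how far to search."""
        if not self.occluded:
            if tracker_conf < self.lost_threshold:
                self.occluded = True  # entering occlusion: stop updating the model
        elif detector_score > self.found_threshold:
            self.occluded = False  # target found again: resume normal tracking
            self.radius = self.base_radius
        else:
            self.radius = min(self.radius * self.growth, self.max_radius)
        return {
            "update_model": not self.occluded,
            "run_detector": self.occluded,
            "search_radius": self.radius,
        }
```

Calling `step` once per frame returns whether to update the model, whether to run the detector, and the current search radius; the radius grows geometrically while the target stays hidden, reflecting that the target may have moved farther the longer it is occluded.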

Traditional Solutions.
The traditional methods for solving the occlusion problem mainly include the following four. (1) Target tracking algorithm based on feature point center weighting. This algorithm, represented by the mean-shift algorithm [13], first finds the distribution of the main feature points of the target and assigns the largest weight to the feature points at the target center; the other feature points are weighted in inverse proportion to their distance from the center. Finally, the target is located through iterative search.
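A minimal sketch of the center-weighted iteration, assuming 2-D feature-point coordinates and the weight w = 1/(1 + d), where d is the distance to the current center. This inverse-distance rule follows the description above; the standard mean-shift algorithm [13] instead weights samples with a kernel profile over a color histogram, so this is a simplification rather than that method.

```python
import math


def center_weighted_mean_shift(points, start, max_iter=100, tol=1e-6):
    """Move a center estimate toward the inverse-distance-weighted mean of feature points.

    points: iterable of (x, y) feature-point coordinates.
    start:  initial (x, y) guess, e.g. the target position in the previous frame.
    """
    pts = list(points)
    cx, cy = start
    for _ in range(max_iter):
        wsum = sx = sy = 0.0
        for px, py in pts:
            w = 1.0 / (1.0 + math.hypot(px - cx, py - cy))  # nearer points weigh more
            wsum += w
            sx += w * px
            sy += w * py
        nx, ny = sx / wsum, sy / wsum
        if math.hypot(nx - cx, ny - cy) < tol:  # converged
            return nx, ny
        cx, cy = nx, ny
    return cx, cy
```

Because distant points receive small weights, a few stray feature points (for example, on an occluder) shift the converged center only slightly.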
(2) Multiparticle template matching tracking algorithm. In essence, this algorithm divides the target into multiple subtargets by region, performs template matching on each, and computes the target position from the positions of the multiple particles [14]. The particle filter imposes no special requirements on the tracking model, can maintain a multimodal state distribution, and is not easily affected by clutter, so it has been widely developed in tracking. However, conventional particle filter tracking suffers from heavy computation and low sampling efficiency.
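The idea of locating the target from several independently matched parts can be illustrated with a voting sketch. It assumes each part has already been matched and that its offset from the target center in the template is known; taking the per-axis median of the votes is one robust fusion choice (an assumption here, not a detail given in [14]) that tolerates a minority of occluded, mismatched parts.

```python
from statistics import median


def fuse_part_votes(matches, offsets):
    """Estimate the target center from independently matched parts.

    matches[i] is where part i was found in the current frame, and offsets[i]
    is that part's displacement from the target center in the template.
    Each part votes for center = match - offset; the per-axis median stays
    correct as long as fewer than half of the parts are mismatched.
    """
    votes = [(mx - ox, my - oy) for (mx, my), (ox, oy) in zip(matches, offsets)]
    return median([v[0] for v in votes]), median([v[1] for v in votes])
```

With five parts and two of them matched to an occluder, the three correct votes still determine the median.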
(3) Trajectory prediction tracking algorithm using the Kalman filter. When the target is occluded, the Kalman filter tracking algorithm predicts its trajectory during occlusion from the previously recorded target coordinates [15]. The Kalman filter restricts the tracking model: it can only handle linear, Gaussian, unimodal cases. In image tracking, however, real situations are often variable and complex, which limits the applicability of the Kalman filter to some extent.
(4) Target tracking algorithm based on multi-algorithm fusion. For example, literature [16] fuses the Kalman filter, the mean-shift algorithm, and an SVM classifier. When the target moves normally or is partially occluded, the Kalman filter predicts the target position in the current frame from the target position in the previous frame, and this prediction serves as the initial point of the mean-shift iteration. This shortens the distance between the initial point and the actual target position, reducing the number of iterations and improving computational efficiency. When the target is severely occluded, an SVM detector assists localization, and the redetected position serves as the starting point for continued mean-shift tracking.
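Trajectory prediction during occlusion can be sketched with a one-dimensional constant-velocity Kalman filter, run once per image axis. The noise values q and r and the initial covariance are assumptions; when a frame is occluded, no measurement is passed and the filter simply coasts on its motion model, which is exactly where the linear, Gaussian, unimodal assumptions noted above become limiting. In a fusion scheme like [16], the predicted position would serve as the starting point of the mean-shift iteration.

```python
class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter; run one instance per image axis.

    State is (position p, velocity v); only the position is measured.
    """

    def __init__(self, p0, v0=0.0, dt=1.0, q=1e-3, r=1.0):
        self.p, self.v = float(p0), float(v0)
        self.dt, self.q, self.r = dt, q, r  # q: process noise, r: measurement noise (assumed)
        self.P = [[10.0, 0.0], [0.0, 10.0]]  # assumed initial state covariance

    def predict(self):
        dt = self.dt
        a, b = self.P[0]
        d = self.P[1][1]
        self.p += dt * self.v  # x <- F x with F = [[1, dt], [0, 1]]
        # P <- F P F^T + Q, with Q = q * I for simplicity
        self.P = [[a + 2 * dt * b + dt * dt * d + self.q, b + dt * d],
                  [b + dt * d, d + self.q]]

    def update(self, z):
        a, b = self.P[0]
        d = self.P[1][1]
        s = a + self.r          # innovation variance, H = [1, 0]
        k0, k1 = a / s, b / s   # Kalman gain
        resid = z - self.p
        self.p += k0 * resid
        self.v += k1 * resid
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [(1 - k0) * b, d - k1 * b]]

    def step(self, z=None):
        """Advance one frame; pass z=None for an occluded frame (prediction only)."""
        self.predict()
        if z is not None:
            self.update(z)
        return self.p
```

For a target moving at a constant 3 pixels per frame, ten visible frames are enough for the filter to learn the velocity, after which it extrapolates the position through several occluded frames.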
As research in China and abroad has deepened, algorithms with higher robustness, stability, and accuracy have been developed from these traditional algorithms to solve the occlusion problem.

Target Tracking Algorithm for Predicting Target Trajectory.
The algorithm is divided into three modules: an image processing module, an automatic target labeling and feature extraction module, and a multitarget tracking module. The image processing module detects moving targets and removes image noise to obtain a clean binary image. The labeling and feature extraction module labels the binary image and computes the centroid position, area, perimeter, and other features of each target. The multitarget tracking module tracks the features of multiple targets across the image sequence and also handles the segmentation of measurements from closely spaced targets and crossing target trajectories [17]. The basic flow of the algorithm is shown in Figure 7. The image preprocessing module detects moving objects and removes noise from the image sequence to obtain a clean binary image. The targets are then labeled automatically to obtain their feature quantities, and the multitarget tracking module tracks the features of the multiple targets through the sequence. If the trajectories of targets cross, Kalman trajectory prediction is used to predict each target; if they do not cross, the Kalman filter processes the features directly for tracking.
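The labeling and feature extraction module can be illustrated with 4-connected component labeling on a binary image. The sketch computes the area, the centroid, and a simple perimeter defined as the number of boundary pixels (one of several possible definitions, assumed here); it is not the implementation from [17].

```python
from collections import deque


def label_blobs(binary):
    """4-connected component labelling of a binary image given as rows of 0/1.

    Returns one dict per blob (in scan order) with its area in pixels, its
    centroid as (row, col), and a perimeter counted as boundary pixels, i.e.
    foreground pixels with at least one 4-neighbour that is background or
    outside the image.
    """
    rows, cols = len(binary), len(binary[0])

    def is_fg(y, x):
        return 0 <= y < rows and 0 <= x < cols and bool(binary[y][x])

    def neighbours(y, x):
        return ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1))

    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not is_fg(r, c) or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in neighbours(y, x):
                    if is_fg(ny, nx) and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = len(pixels)
            centroid = (sum(p[0] for p in pixels) / area, sum(p[1] for p in pixels) / area)
            perimeter = sum(
                1 for y, x in pixels if not all(is_fg(ny, nx) for ny, nx in neighbours(y, x))
            )
            blobs.append({"area": area, "centroid": centroid, "perimeter": perimeter})
    return blobs
```

The centroids returned per frame are the kind of measurements a Kalman filter per target would then consume.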

Target Tracking Algorithm Based on Space-Time Context Information.
To preserve space-time context information, this approach builds on the SiamMask algorithm by introducing a short-term memory pool that stores features of historical frames, together with an appearance distinctiveness feature enhancement module. This not only enhances the salient features of the target but also suppresses interference from nearby similar objects, thereby improving tracking accuracy. Figure 8 shows the implementation process of the target tracking algorithm based on space-time context information [18]. The network framework is built on SiamMask [19] and has two branches, forming a Siamese (twin) network. The upper branch takes a 127 × 127 image as the template input and extracts feature information of the video target; the lower branch is modified to take a 255 × 255 image as input and extracts feature information from the current video frame. The upper branch uses only the shared-weight ResNet-50 backbone Φ to extract image features, whereas the lower branch also uses the short-term memory pool to retain features of historical frames. The lower branch then captures context information through the appearance distinctiveness feature enhancement module, which makes the current frame's features more distinctive and reduces interference from similar objects in the environment. Next, the feature map of the video target is cross-correlated with the feature map of the current frame to obtain the response features of the candidate regions. Finally, the tracking and segmentation results are obtained through convolution and activation.
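The cross-correlation step at the heart of Siamese trackers can be shown on single-channel toy arrays. This sketch slides the template feature map over the search feature map and returns the location of the peak response; real trackers such as SiamMask [19] correlate multi-channel CNN features, and the memory pool and enhancement module of [18] are omitted.

```python
def cross_correlate(search, template):
    """Slide template over search (valid positions only) and return the response map.

    Both inputs are 2-D lists of numbers standing in for single-channel
    feature maps.
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    return [
        [
            sum(template[i][j] * search[r + i][c + j] for i in range(th) for j in range(tw))
            for c in range(sw - tw + 1)
        ]
        for r in range(sh - th + 1)
    ]


def peak(response):
    """Return (row, col) of the maximum response."""
    _, r, c = max((v, r, c) for r, row in enumerate(response) for c, v in enumerate(row))
    return r, c
```

The peak of the response map marks where the search region best matches the template, which is where the candidate region for the target is taken.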

Target Tracking Algorithm for Large-Scale Random Detection.
Building on the P-N learning tracking framework, literature [20] proposes a tracking algorithm that adaptively generates the detection range. A Kalman filter estimates the target's position, scale, and their rates of change, and the detection range is generated from these estimates before detection to improve detection efficiency. The large-range random detection tracking framework consists of four parts: a tracker, a detector, an integrator, and a learning module. The tracker is a median-flow tracker that can detect its own tracking errors: while the target is visible, it follows the target frame by frame and stops when the error exceeds a threshold. The detector is a cascade classifier that searches for the target in every frame and reinitializes the tracker when necessary. The integrator evaluates the tracker and detector outputs against the target model and fuses them. The learning module generates training samples from the integrator's result and updates the detector to improve its next detection. Note that the learning module updates only the detector; the tracker remains fixed. The framework of the algorithm is shown in Figure 9.
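The idea of generating the detection range from a Kalman estimate can be sketched as follows. The exact rule used in [20] is not given here, so this version is an assumption: it moves the last box by the estimated velocity, adjusts its size by the estimated scale change, enlarges it by a margin, and clips it to the frame.

```python
def detection_range(state, velocity, margin=0.5, frame_size=None):
    """Hypothetical detection window built from a Kalman estimate.

    state = (cx, cy, w, h) and velocity = (vx, vy, vw, vh) are the filter's
    estimates of centre, size, and their per-frame rates of change. The window
    is the predicted box enlarged by `margin` times its size on every side,
    then clipped to the frame (width, height) if one is given.
    """
    cx, cy, w, h = state
    vx, vy, vw, vh = velocity
    pcx, pcy = cx + vx, cy + vy                     # predicted centre
    pw, ph = max(w + vw, 1.0), max(h + vh, 1.0)     # predicted size, kept positive
    half_w, half_h = pw / 2 + margin * pw, ph / 2 + margin * ph
    x1, y1, x2, y2 = pcx - half_w, pcy - half_h, pcx + half_w, pcy + half_h
    if frame_size is not None:
        fw, fh = frame_size
        x1, y1 = max(0.0, x1), max(0.0, y1)
        x2, y2 = min(float(fw), x2), min(float(fh), y2)
    return x1, y1, x2, y2
```

For a 20 × 10 box moving 5 pixels per frame with margin 0.5, the window is 40 × 20 pixels, well under 1% of a 640 × 480 frame, instead of the whole image.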
The specific implementation process is as follows. The Kalman filter estimates the target's position and size, the bounding box is initialized, and the tracker and detector then process the following frames together. The tracker estimates the target state in the current frame from the target position and size in the previous frame. The detector scans the current frame with a sliding window and outputs one or more windows that may contain the target. The integrator combines the tracker and detector results and outputs the final tracking result for the current frame. Finally, the learning module updates the detector according to the integrator's result to obtain the new target state, thereby realizing tracking.
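One frame of the integrator and learning step can be sketched as below. The fusion rule (keep the most confident candidate above a threshold) and the negative-sample rule (detections with low overlap with the fused result) are simplifying assumptions in the spirit of P-N learning, not the exact procedure of [20]; boxes are (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def integrate(tracker_result, detections, min_score=0.5):
    """Fuse the tracker's (box, conf), or None, with detector candidates [(box, conf), ...].

    Returns the most confident candidate at or above min_score, or None if the
    target is considered not visible in this frame.
    """
    candidates = list(detections)
    if tracker_result is not None:
        candidates.append(tracker_result)
    candidates = [c for c in candidates if c[1] >= min_score]
    return max(candidates, key=lambda c: c[1]) if candidates else None


def detector_samples(final_box, detections, neg_iou=0.2):
    """Learning step: the fused box is a positive; detections far from it are negatives."""
    negatives = [box for box, _ in detections if iou(box, final_box) < neg_iou]
    return [final_box], negatives
```

Only the detector is retrained from these samples, matching the note above that the tracker itself stays fixed.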

Tracking Algorithm Based on Deep Learning.
Because of their strong ability to recognize the target, deep learning-based tracking algorithms can identify the target well under slight and partial occlusion. Under severe occlusion, however, target information is seriously lost, and deep learning trackers resort to random sampling over a large range to detect the target location [21]. In deep tracking, there are two main ways to handle partial occlusion. The first is to include occluded targets in the training samples so that, during offline training, the network learns through the loss function how the target changes when partially occluded; during online tracking, the offline-trained network then evaluates test samples to achieve accurate tracking. The second is to divide the target into several parts and extract deep features for each. When the target is occluded, the deep features of a test sample are matched against those of the target template, and the sample is judged to be the target if the similarity is high. The deep learning flowchart is shown in Figure 10.
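The part-based matching route can be sketched with cosine similarity between per-part feature vectors. The thresholds and the acceptance rule (enough parts still match) are hypothetical; they illustrate why splitting the target into parts helps: occluded parts score low, but the visible ones can still confirm the target.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def is_target(sample_parts, template_parts, part_thresh=0.8, min_matched=0.5):
    """Judge a candidate from per-part deep features (hypothetical thresholds).

    Each part of the candidate is compared with the same part of the template;
    the candidate is accepted when at least `min_matched` of the parts are
    similar enough. Returns (decision, per-part similarities).
    """
    sims = [cosine(s, t) for s, t in zip(sample_parts, template_parts)]
    matched = sum(1 for s in sims if s >= part_thresh)
    return matched / len(sims) >= min_matched, sims
```

A candidate with half of its parts hidden is still accepted, whereas one with three of four parts mismatched is rejected.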
At present, complete occlusion is mainly handled by searching the full image with target detection: matching the deep features of the whole image against the target template features finds test samples that may be the target. However, because the whole image is large and the matching involves a large amount of data, accurate judgment of the target is difficult to achieve.

Anti-occlusion Model
Because occlusion is so difficult and challenging in target tracking, researchers have classified and studied effective anti-occlusion models from different perspectives, which is of great significance for building highly robust tracking models.

Tracking Model Based on High-Quality Negative Training Samples.
Current mainstream tracking models are data-driven, so improving the quality of training samples can significantly improve tracker performance. High-quality training samples let the tracking model learn discriminative information that separates the occluded target from interfering objects, improving the anti-occlusion performance of the tracking algorithm. When the target is occluded, the tracking model tends to drift to the occluding object; when the target is severely or fully occluded, the model drifts to semantically similar distractors, resulting in tracking failure. In this case, addressing the imbalanced distribution of training samples in the tracking task can effectively improve the accuracy of the tracking model.
Therefore, sufficient negative training samples that are prone to false detection should be introduced or mined, so that a large number of easy negative samples does not dominate the learning of the tracker. The tracking model can then correctly distinguish the features of positive and negative semantic samples, which ultimately improves the model's resistance to occlusion interference. Table 2 shows three typical types of algorithm models based on high-quality negative training samples and their respective strategies: models that improve the loss function to reduce the influence of invalid negative samples, models that mine the few negative samples that are easily misjudged, and models that introduce high-quality negative samples from real scenes.
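The first two families of strategies can be illustrated in a few lines. Hard negative mining keeps the negatives that the current model scores most like the target, as described for MDNet [26]; the focal-style weight p^γ, with p the predicted target probability of a negative, follows the standard focal loss and down-weights easy negatives. The values of k and γ are illustrative.

```python
def mine_hard_negatives(neg_scores, k):
    """Return indices of the k negatives the current model scores most like the target.

    neg_scores[i] is the model's positive-class score for negative sample i.
    Keeping only the top-k stops the many easy negatives from dominating a batch.
    """
    order = sorted(range(len(neg_scores)), key=lambda i: neg_scores[i], reverse=True)
    return order[:k]


def focal_negative_weight(pos_score, gamma=2.0):
    """Focal-loss weight for a negative sample: (1 - p_t)**gamma with p_t = 1 - pos_score.

    Easy negatives (pos_score near 0) get weight near 0; confusing ones keep most weight.
    """
    return pos_score ** gamma
```

Both routes shift the training signal toward the distractors that actually cause drift under occlusion.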

Tracking Model Based on Effective Positive Sample Set.
When the target is occluded, its appearance changes because of external interference, and the occluding object contaminates the positive samples. Improving positive sample quality benefits the discriminative power of the tracker's features and its resistance to occlusion and background interference. Sample diversity also enhances the generalization of the model and improves its ability to track deforming targets in occlusion scenarios. Table 3 analyzes the positive sample processing strategies and characteristics of different algorithms from two aspects, improving sample quality and increasing sample diversity, and presents the scenarios to which each is suited.
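One common way to add occluded views to the positive set is to paste a synthetic occluder onto positive patches. The sketch below is a generic augmentation under that assumption (a constant-valued rectangle covering a chosen fraction of the patch), not a method from the algorithms in Table 3.

```python
import random


def occlude_patch(patch, ratio, fill=0, rng=None):
    """Return a copy of a 2-D patch with a random rectangle overwritten by `fill`.

    The rectangle keeps the patch's aspect ratio and covers about `ratio` of its
    area, imitating an occluder so the positive set contains partially hidden
    views of the target. The input patch is not modified.
    """
    rng = rng or random.Random()
    h, w = len(patch), len(patch[0])
    oh = max(1, min(h, round(h * ratio ** 0.5)))
    ow = max(1, min(w, round(w * ratio ** 0.5)))
    top, left = rng.randint(0, h - oh), rng.randint(0, w - ow)
    out = [row[:] for row in patch]
    for r in range(top, top + oh):
        for c in range(left, left + ow):
            out[r][c] = fill
    return out
```

Choosing ratios below 0.4 produces the partial-occlusion cases defined earlier; larger ratios simulate severe occlusion.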

Table 2: Typical algorithm models based on high-quality negative training samples.
Improving the loss function to reduce the influence of invalid negative samples:
Reference [22]. Strategy: the original cross-entropy loss is optimized into a high-order cost-sensitive loss. Limitation: the cost-sensitive function brings only a small improvement.
DSLT [23]. Strategy: the traditional ridge regression loss is optimized into a shrinkage loss. Limitation: performance does not decrease noticeably when the residual connections are removed.
DiMP [24]. Strategy: a hinge-like loss is obtained by optimizing the focal loss. Limitation: large frame-to-frame shifts cannot be handled efficiently.
Mining the few negative samples that are easily misjudged:
RPNT [25]. Strategy: a candidate generation network produces background distractor samples and distant semantic distractors with similar appearance. Limitation: features dominated by semantic information discriminate weakly between classes.
MDNet [26], SANet [27], RT-MDNet [28]. Strategy: when selecting mini-batch training samples, the negatives with the highest positive scores are chosen as hard negatives, and the emphasis on hard negatives is increased during iterative training. Limitation: there is no optimization strategy for extracting potential targets; insensitivity to intra-class differences makes panoramic scenes difficult; the feature extraction and fusion modules are time-consuming, so real-time performance is poor; and estimation of target scale changes is limited.
Introducing high-quality negative samples from real scenes:
BACF [29]. Strategy: negative samples containing real background information are cropped from the densest part of the image. Limitation: temporal stability is lacking, and selecting features with a fixed mask is limiting.
DaSiamRPN [30]. Strategy: negative sample pairs are introduced from large-scale datasets, with data augmentation such as translation, illumination change, and blur. Limitation: candidate box extraction and region search take a long time, which lowers tracker speed.

Table 3: Positive sample processing strategies of different algorithms.
Strategy: samples are collected by traversing the target manifold structure to generate object appearances that do not occur in the training data. Scenarios: partial occlusion, deformation, elapsed time.
Reference [37]. Strategy: a generative adversarial network generates deformed and blurred samples and adaptively learns discriminative features. Scenarios: deformation, motion blur, partial occlusion.

Adaptive Tracking Algorithm Model Based on Active Learning.
In target tracking, online and offline training give tracking algorithms different characteristics. Online training adapts better to changes in the target's appearance, but real labeled samples are scarce and the target's state is changeable, which makes the model prone to overfitting sample errors and increases the tracking time in complex occluded scenes. Offline training handles occluded targets better and avoids the model contamination caused by learning from wrong frames. In real tracking, however, many complex problems arise, and a fixed appearance model learned from an offline training set often cannot track the target accurately. Therefore, to improve tracker performance, the tracking model must learn actively: a highly robust target model is built through offline learning, and online learning then corrects the information damaged by overfitting errors, strengthening the model's resistance to interference. Table 4 lists representative algorithms and characteristics of active learning strategies in tracking models.
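Active learning is described above only at a high level. One widely used criterion is uncertainty sampling, sketched below: candidates the offline model is confident about are left alone, and a limited budget of the most ambiguous ones is selected for online updating. The thresholds, budget, and function name are assumptions, not taken from the algorithms in Table 4.

```python
def select_for_update(scores, budget, low=0.3, high=0.7):
    """Uncertainty sampling for online updates (hypothetical thresholds).

    scores[i] is the offline-trained model's target probability for candidate i.
    Candidates outside [low, high] are trusted to the fixed offline model; the
    `budget` most uncertain ones, closest to 0.5, are returned as indices for
    labeling and online updating.
    """
    uncertain = [i for i, s in enumerate(scores) if low <= s <= high]
    uncertain.sort(key=lambda i: abs(scores[i] - 0.5))
    return uncertain[:budget]
```

Restricting online updates to a small, informative subset is one way to limit the overfitting risk of online training while still adapting to appearance changes.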

Table 4: Representative algorithms and characteristics of active learning strategies in tracking models.
Strategy: an embedding loss is introduced to increase inter-class distance in a shared feature space and learn to identify instances of semantic objects across domains. Limitation: localization is limited in high-precision regions.
DaSiamRPN [30]. Strategy: distractor-aware re-ranking of candidates using semantic distractor context, together with incremental learning of the target, is introduced. Limitation: network depth is limited, so the extracted features are limited.
TADT [41]. Strategy: a frame-by-frame regression loss and a ranking loss are used to select feature channels specific to the target instance. Limitation: fixed matching between templates limits the algorithm.
MLT [42]. Strategy: gradient information is used to customize the feature space for each object and suppress false responses from background distractors. Limitation: the meta-learning, transfer, and merging steps are simple, and localization precision is low.

Conclusions
This article has analyzed the interference that occlusion causes in target tracking. The most fundamental effect of occlusion on a tracking algorithm is that it corrupts the tracker's localization of the target. For trackers whose sample analysis is corrupted by occlusion, performance can be improved by raising the quality of negative training samples and enriching the diversity of positive sample sets. Based on the different training characteristics of online and offline tracking algorithms, anti-occlusion models suited to active learning in different tracking situations were identified, so that highly robust and accurate tracking models can be chosen for each situation, which helps complete the tracking process more reliably. As research in China and abroad deepens, target tracking continues to mature. Target occlusion is not only an important cause of tracking failure but also a key problem in remote target tracking.
Because the tracking task follows a single target from beginning to end, once the target is occluded, tracking accuracy is greatly affected and tracking may even fail. Occlusion therefore places stricter demands on the tracking model. Research on target tracking has been carried out for many years, and great progress has been made, from the earliest generative algorithms to correlation filter-based algorithms and now deep learning-based algorithms. Target tracking has been widely used in many industries and attracts growing attention. Better solutions to the occlusion problem therefore remain an important research direction. As generative adversarial learning, meta-learning, and other methods mature, future trackers are expected to draw on richer multidimensional prior information, from capturing the target, to tracking the scene and motion information of the target, to transferring to long-term complex tracking tasks, and so track targets more accurately. A tracking algorithm with strong robustness, high accuracy, and high speed is still needed, and it is believed that, with the joint efforts of researchers, this goal will be reached before long.
Data Availability
The authors confirm that the data supporting the findings of this study are available within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.