Target Tracking and 3D Trajectory Reconstruction Based on Multicamera Calibration

In traffic scenarios, vehicle trajectories provide almost all the dynamic information about moving vehicles, so analyzing the trajectories in a monitored scene makes it possible to grasp dynamic road traffic information. Associating vehicle trajectories across multiple cameras breaks the isolation of target information between single cameras and reveals the overall road operation conditions in a large-scale video surveillance area, which helps road traffic managers to conduct traffic analysis, prediction, and control. Based on the framework of detection-based tracking (DBT) with automatic target detection, this paper proposes a cross-camera vehicle trajectory matching method that uses the Euclidean distance between trajectory points as the similarity metric. For the multitarget vehicle trajectories acquired in a single camera, we first perform 3D trajectory reconstruction based on the joint camera calibration in the overlapping area, then complete the similarity association and the trajectory update between cross-camera trajectories, and finally complete the trajectory transfer of the vehicle between adjacent cameras. Experiments show that the method addresses the difficulty that current tracking techniques have in matching vehicle trajectories under different cameras in complex traffic scenes and achieves long-term, long-distance continuous tracking and trajectory acquisition of multiple targets across cameras.


Introduction
Target tracking is one of the research hot spots in computer vision, and it has been widely used in military applications, unmanned driving, video monitoring, and other fields. Current target tracking algorithms [1] can be divided into three categories according to the observation model: methods based on generative models, methods based on discriminative models, and methods based on deep learning. The methods based on generative models are also called classical target tracking algorithms. Such a method extracts the features of the target in the current frame, constructs the target model, and searches the next frame for the region that best matches the appearance model, which is taken as the predicted position of the target. Typical representative algorithms are the particle filter, mean shift, and Kalman filter algorithms. The methods based on discriminative models regard target tracking as a classification or regression problem, in which the target is separated from the background by combining background information with feature extraction. The TLD (tracking-learning-detection) algorithm [2] is a representative long-term tracking algorithm of this kind. To cope with target deformation, scale change, and occlusion during long-term tracking, TLD combines tracking with a traditional detection algorithm and updates the model and parameters online to make the tracking more robust and reliable. Target tracking based on correlation filtering also belongs to the discriminative methods. The minimum output sum of squared error (MOSSE) algorithm [3] applied correlation filtering to target tracking for the first time. Through the fast Fourier transform, the calculation is transferred from the spatial domain to the frequency domain, and the tracking speed reaches 615 fps. This speed advantage shows the potential of correlation filtering in target tracking. The KCF algorithm [4] calculates the discriminant function by regression and introduces cyclic shifts to approximate dense sampling. A kernel method maps the input to a high-dimensional space, and HOG features are added to improve the tracking effect while maintaining fast calculation. SRDCF [5] introduces spatial regularization and weights the filter coefficients so that they are mainly concentrated in the central area, which alleviates the influence of boundary effects.
Among the methods based on deep learning, C-COT [6] combines shallow surface information and deep semantic information in deep features, synthesizes the feature map information at multiple resolutions, interpolates the response map in the frequency domain, and then calculates the target position through iteration. The SiamRPN algorithm [7] proposes a Siamese network structure based on RPN, which is composed of a Siamese network and an RPN network. The Siamese network shares weights and maps the input to a new space to extract features. The RPN network generates candidate regions, which are used to distinguish the target from the background and to fine-tune the candidates, achieving end-to-end input and output. The SiamMask algorithm [8] no longer represents the target position by an axis-aligned rectangular box; it adds a mask branch to the Siamese network architecture and generates a rotated rectangle from the target mask, which further improves the tracking accuracy.
The methods above address single-object tracking (SOT). In practical applications, however, target tracking is more often multiobject tracking (MOT) [9]: multiple targets are locked in a given video sequence, each target is distinguished in subsequent frames, and its motion trajectory is output. According to how the target box is initialized, multitarget tracking methods are divided into two categories: DBT (detection-based tracking) and DFT (detection-free tracking). DFT requires the location box of each target to be initialized manually and cannot handle new targets that appear in the video; DBT detects new targets automatically and ends the trajectory of a target that leaves the field of view. In multitarget tracking, the key problem [10] is the data association between detection nodes and existing trajectories, together with the association between trajectories. Xiang et al. [11] transformed the multitarget tracking problem into a Markov decision process (MDP). The target trajectory is assigned one of four states, and the trajectory states and state transitions are described by MDP modeling and decision-making. The SORT algorithm [12] uses a Kalman filter to track the detected targets, uses IOU (intersection over union) to measure the distance between target boxes, and performs optimal association matching with the Hungarian algorithm. The Deep SORT algorithm [13] improves on SORT. Faster R-CNN is used to detect the targets, and the Kalman filter is still used to track and predict them. For the distance measurement, the Mahalanobis distance is combined with the minimum cosine distance between the feature vector of the detection result and the set of most recent deep features successfully associated with the track, and priority is assigned to targets through cascade matching. This addresses the track association problem under target occlusion.
Among the multitarget tracking methods based on deep learning, Feng et al. [14] proposed a unified multitarget tracking framework. A SiamRPN network is used for short-term target tracking, and the long-term appearance characteristics of the target are integrated. A ReID network is used to improve the tracking stability when the target is occluded and to deal with abnormal motion. On top of association matching, switch-aware classification (SAC) is proposed to achieve a good multitarget tracking effect. However, because the model is complex, the tracking speed is slow and cannot meet the needs of practical applications.
Tracking multiple targets continuously and accurately in complex traffic scenes remains an important research task. It is of great value for improving the utilization efficiency of traffic video monitoring data and for grasping road traffic information and regional road operation status in a timely and accurate manner. Cross-camera multitarget tracking can overcome the inability of a monocular camera to track accurately over a long time and a long distance, which lays an important foundation for acquiring wide-area traffic information.

Principle of Multitarget Tracking
A traffic scene is a typical application scene for multitarget tracking. This paper uses DBT target boxes to realize multitarget vehicle tracking in traffic scenes. The process flow of multitarget tracking based on DBT is shown in Figure 1.
The target detector first detects the targets in each frame of the video to obtain and identify multiple target positions. The multitarget tracking process then associates the current detection results with the existing target tracks to extend the tracks.
Next, the effective association between trajectories and targets has to be solved. In cross-camera multitarget tracking, the first step is to obtain the multitarget vehicle trajectories in a single camera. Following the team's recent work [15], the similarity between target boxes is calculated based on IOU, and the Hungarian algorithm is used to complete the association between each new detection node and the existing vehicle trajectories. Definitions of the stages and states of a trajectory, and criteria for delimiting them, are proposed to classify trajectories better. Then, cross-camera vehicle tracking solves the 3D trajectory reconstruction based on joint camera calibration in the overlapping area, as well as the similarity association and trajectory update between cross-camera trajectories, and completes the trajectory transfer between adjacent cameras.
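The single-camera association described above can be illustrated with a minimal Python sketch. It is not the implementation of reference [15]: the box format `(x1, y1, x2, y2)`, the function names, and the gating threshold `min_iou` are assumptions made for illustration, and an exhaustive search stands in for the Hungarian algorithm (it reaches the same optimum but is only practical for a handful of boxes).

```python
from itertools import permutations


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(track_boxes, det_boxes, min_iou=0.3):
    """Return {track_index: detection_index} maximising the total IOU.

    min_iou is a hypothetical gate: pairs below it are never matched.
    """
    n_t, n_d = len(track_boxes), len(det_boxes)
    if n_t == 0 or n_d == 0:
        return {}
    scores = [[iou(t, d) for d in det_boxes] for t in track_boxes]
    # Pad with None so that any track may remain unmatched.
    slots = list(range(n_d)) + [None] * n_t
    best, best_total = {}, -1.0
    for perm in permutations(slots, n_t):
        pairs = {i: j for i, j in enumerate(perm) if j is not None}
        if any(scores[i][j] < min_iou for i, j in pairs.items()):
            continue
        total = sum(scores[i][j] for i, j in pairs.items())
        if total > best_total:
            best, best_total = pairs, total
    return best


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11)]
matches = associate(tracks, detections)
```

Tracks missing from the returned mapping have no detection in the current frame, and detections that appear in no pair are candidates for new trajectories.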

Data Association Based on Cross-Camera Calibration
Figure 2 shows a multicamera monitoring scene with overlapping areas, in which many cameras cover a long stretch of road. Starting from the end of the monitoring area with the smallest camera number, the cameras are renumbered consecutively from 0. Each camera is responsible for monitoring one section of the road. In Figure 2, different color blocks mark the monitoring area of each camera. The views of adjacent cameras overlap, and the overlapping area is shown in yellow. Given the cross-camera calibration, the similarity association can be completed by calculating the similarity matrix of vehicle trajectories between adjacent cameras. The basic idea is to unify the cameras in one world coordinate system through joint calibration and to calculate the similarity matrix from the Euclidean distance between the track points of adjacent cameras in that world coordinate system.

Cross-Camera Joint Calibration.
According to the imaging principle of the monocular camera and the description of the coordinate systems in reference [16], the conversion relationship between the pixel coordinate system and the world coordinate system under the same camera can be obtained as follows:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [R \mid t] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \quad (1) $$

where $(u, v)$ is the coordinates of the point in the pixel coordinate system, $(X_w, Y_w, Z_w)$ is its coordinates in the world coordinate system, $s$ is a scale factor, $K$ is the camera's internal parameter matrix, and $[R \mid t]$ is the camera's external parameter matrix. The above formula is derived without considering distortion. If the distortion of the camera is considered, it can be divided into radial distortion and tangential distortion. In the image physical coordinate system, the corresponding radial distortion correction is shown in equation (2), and the corresponding tangential distortion correction is shown in equation (3), with $r^2 = x^2 + y^2$. These formulas can be introduced for parameter correction.

$$ x_{\text{corrected}} = x \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right), \qquad y_{\text{corrected}} = y \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right), \quad (2) $$

$$ x_{\text{corrected}} = x + \left[2 p_1 x y + p_2 \left(r^2 + 2x^2\right)\right], \qquad y_{\text{corrected}} = y + \left[p_1 \left(r^2 + 2y^2\right) + 2 p_2 x y\right]. \quad (3) $$

The conversion of the world coordinate system between multiple cameras proceeds as follows: first, a coordinate origin is selected; then the subworld coordinate system of each camera is transformed into the globally unified world coordinate system. The schematic diagram of the calibration conversion between adjacent cameras in a large area is shown in Figure 3. Take two cameras monitoring a two-lane road as an example. Suppose that Figure 3(a) is the monitoring scene of camera i and Figure 3(b) is the monitoring scene of camera i + 1. Through cross-camera calibration, the points in the field of view of each camera are converted to the same world coordinate system, as shown in Figure 3(c).
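As an illustration of the two steps in this subsection, the following Python sketch evaluates the distortion model of equations (2) and (3) and then maps a ground-plane point from one camera's subworld coordinate system into the global one. Several things here are assumptions rather than details given in the paper: the radial and tangential terms are combined in a single expression (the usual convention in calibration toolkits), the tangential term uses the standard form with the product xy, and the subworld-to-global conversion is modeled as a planar rotation plus translation. The function names and the parameters `origin` and `heading_rad` are hypothetical.

```python
import math


def apply_distortion_model(x, y, k1, k2, k3, p1, p2):
    """Evaluate equations (2) and (3) on image physical coordinates (x, y).

    k1, k2, k3 are radial coefficients; p1, p2 are tangential coefficients.
    """
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_tangential = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_tangential = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x * radial + x_tangential, y * radial + y_tangential


def to_global(point, origin, heading_rad):
    """Map a ground-plane point from a camera's subworld frame to the global frame.

    origin: position of the subworld origin in the global frame (assumed known
    from the joint calibration); heading_rad: rotation of the subworld axes.
    """
    x, y = point
    c, s = math.cos(heading_rad), math.sin(heading_rad)
    return origin[0] + c * x - s * y, origin[1] + s * x + c * y


# With all coefficients zero the model leaves the point unchanged.
undistorted = apply_distortion_model(0.1, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0)

# A point 1 m along the x axis of a subworld frame rotated by 90 degrees.
global_point = to_global((1.0, 0.0), origin=(10.0, 5.0), heading_rad=math.pi / 2)
```

Once every camera has its own `origin` and `heading_rad`, track points from adjacent cameras can be compared directly in the global frame.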

Calculation of Association Matrix.
The association matrix of cross-camera vehicle trajectories is obtained by calculating the Euclidean distance between the trajectories to be matched in adjacent cameras after the trajectories have been transformed from image coordinates to world coordinates. Suppose that the vehicle trajectories under each camera are divided into two sets, T = {RT ∪ NT}, where RT represents the set of real tracks in the scene and NT represents the set of new tracks that have just changed from undetermined tracks to real tracks. The similarity matrix of vehicle trajectories between adjacent cameras is D = (d_ij), d_ij ≥ 0, where d_ij is calculated by formula (4). In formula (4), m is the number of trajectory nodes involved in the calculation. In this paper, m ≤ 15 is determined by the number of nodes that the trajectories to be matched in adjacent cameras share under the same frame numbers; p_k is the world coordinate of a track point in the real track RT of the current camera; p_k' is the world coordinate of a track point in the new track NT of the adjacent camera; and p_k and p_k' have the same frame number, indicating the vehicle position at the same time. Taking m = 5 as an example, the calculation process of d_ij between vehicle trajectories across cameras is shown in
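The calculation of D can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's formula (4): d_ij is taken to be the mean Euclidean distance over the shared frames, the m nodes are taken to be the most recent common frames, and each track is stored as a dictionary from frame number to world coordinates. The names `trajectory_distance` and `association_matrix` are hypothetical.

```python
import math

MAX_NODES = 15  # the paper uses m <= 15


def trajectory_distance(real_track, new_track, max_nodes=MAX_NODES):
    """Mean Euclidean distance over the most recent frames both tracks share.

    Tracks are dicts {frame_number: (X, Y)} in world coordinates.
    Returns math.inf when the tracks share no frame.
    """
    common = sorted(set(real_track) & set(new_track))[-max_nodes:]
    if not common:
        return math.inf
    total = sum(math.dist(real_track[f], new_track[f]) for f in common)
    return total / len(common)


def association_matrix(real_tracks, new_tracks):
    """D = (d_ij): rows are real tracks RT, columns are new tracks NT."""
    return [[trajectory_distance(rt, nt) for nt in new_tracks]
            for rt in real_tracks]


rt_tracks = [{1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0)}]
nt_tracks = [
    {2: (1.0, 0.5), 3: (2.0, 0.5), 4: (3.0, 0.5)},  # same vehicle, small offset
    {2: (1.0, 4.0), 3: (2.0, 4.0)},                 # vehicle in another lane
]
D = association_matrix(rt_tracks, nt_tracks)
```

In this toy example the first new track lies 0.5 units from the real track at every shared frame and the second lies 4 units away, so the first is the natural match.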

Multitarget Vehicle Tracking Algorithm across Cameras
Cross-camera vehicle tracking relies on the unified calibration between multiple cameras and on single-camera multitarget vehicle tracking. Its main work is to associate the tracking results of each camera. First, the global trajectory set GT is established to save the global trajectory information of each vehicle target from the moment it enters the monitoring area until it leaves. When the target leaves the monitoring area, the corresponding vehicle target information is recorded in a file. After the cameras in the monitoring area are synchronized and their video frames are associated with each other, the new trajectory nodes are updated into the global trajectory set GT.
Journal of Advanced Transportation
Assuming that there are n cameras in a large monitoring area, the flowchart of the cross-camera vehicle tracking algorithm is shown in Figure 5, and the steps of the algorithm are as follows:
Step 1: the vehicle trajectories of the n cameras are obtained at the same time. The multitarget vehicle trajectories in a single camera are obtained by the method in Section 2.
Step 2: association between adjacent cameras is tracked. The vehicle tracks in each camera are divided into two [...] trajectory, including target ID, trajectory color, and some updated trajectory attributes. Among them, the camera number and the starting frame number of the track under the camera are used to draw the target track under the camera.
Step 3: the global track is updated. The global trajectory needs to be updated in every frame:
(i) For unmatched real trajectories, the newly added trajectory nodes need to be updated into the corresponding global trajectories
(ii) An unmatched new trajectory is treated as a new target, and its trajectory is newly added to the global trajectory set
(iii) For a real trajectory and a new trajectory that are successfully matched, in addition to the above-mentioned trajectory attribute changes, it is also necessary to fuse the trajectory nodes in the overlapping area of the two trajectories
Figure 6 shows the successful matching of vehicles between adjacent cameras. When the target vehicle moves from the current camera to the next camera, the vehicle passes through the overlapping area of the two cameras. The target ID of a successfully matched vehicle needs to be unified, and the vehicle trajectory color follows the initial color attribute. In Figure 7, when a black car drives from the field of view of camera 0 into that of camera 1, the black car can be detected in both fields of view in the overlapping area. The two cars connected by the yellow line are the positions of the black car under the two cameras. The target vehicle is matched in the overlapping area, and the vehicle information is transferred to camera 1.
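The three cases of Step 3 can be sketched in Python as follows. The data layout and names are hypothetical, and the paper does not specify how nodes in the overlapping area are fused; averaging the two world positions at each shared frame is an assumption made here for illustration.

```python
def update_global_tracks(global_tracks, real_tracks, new_tracks, matches, next_id):
    """Apply Step 3 for one frame and return the next unused target ID.

    global_tracks: {target_id: {frame: (X, Y)}}, the global set GT
    real_tracks, new_tracks: lists of {"id": int or None, "nodes": {frame: (X, Y)}}
    matches: {index in real_tracks: index in new_tracks} from Step 2
    """
    matched_new = set(matches.values())
    for i, real in enumerate(real_tracks):
        merged = dict(real["nodes"])
        if i in matches:  # case (iii): matched pair
            new = new_tracks[matches[i]]
            new["id"] = real["id"]  # unify the target ID across cameras
            for frame, (x, y) in new["nodes"].items():
                if frame in merged:
                    # Overlapping area: fuse by averaging (assumption).
                    mx, my = merged[frame]
                    merged[frame] = ((mx + x) / 2.0, (my + y) / 2.0)
                else:
                    merged[frame] = (x, y)
        # Case (i): an unmatched real track contributes its own nodes only.
        global_tracks.setdefault(real["id"], {}).update(merged)
    for j, new in enumerate(new_tracks):
        if j not in matched_new:  # case (ii): a new target
            new["id"] = next_id
            global_tracks[next_id] = dict(new["nodes"])
            next_id += 1
    return next_id


GT = {}
real = [
    {"id": 1, "nodes": {10: (0.0, 0.0), 11: (1.0, 0.0)}},
    {"id": 2, "nodes": {11: (5.0, 3.0)}},
]
new = [
    {"id": None, "nodes": {11: (1.0, 0.2), 12: (2.0, 0.2)}},
    {"id": None, "nodes": {12: (9.0, 9.0)}},
]
next_free_id = update_global_tracks(GT, real, new, matches={0: 0}, next_id=3)
```

Because the fused nodes are recomputed from the raw per-camera nodes on every call, calling the function again for the same frame does not average a position twice.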

Experiment and Analysis
Because this method is still at the simulation testing stage and no public data set offers a scenario suited to the experiment, cross-camera vehicle tracking is evaluated on a simulated test scene built on campus, in which two cameras collect images synchronously. After the detection results of the YOLOv3 detector are obtained, the method of reference [15] and the algorithm in this paper are applied to carry out the tracking experiment, with the following results. The scene tested is an overtaking maneuver. The silver car first enters the surveillance area of camera 0, and the black car overtakes it, as shown in Figure 7. In the 58th collected frame, both cars are detected at the same time under camera 0. Since the silver car enters the field of view first, it is detected first and receives ID = 1. The black car enters afterwards and receives ID = 2. After overtaking, the black car is the first to enter the field of view of camera 1. However, when vehicles travel across cameras, ID values are assigned in the order in which they first enter the entire monitoring area. After the cross-camera trajectories are matched, the trajectory information is migrated, so the ID of the silver car in camera 1 is still 1 and its trajectory color is blue, the same as the trajectory information of the vehicle under camera 0. The trajectory information of the black car is migrated in the same way. Figure 8 shows the tracking result of camera 1 at frame 78.
After the two scenes are calibrated across cameras, the vehicle trajectories can be drawn in the panoramic view reconstructed from the surveillance scenes of both cameras. Taking the 70th frame of the panoramic reconstruction of multitarget vehicle tracking as an example, the entire overtaking process of the vehicles under the two cameras can be seen intuitively, as shown in Figure 9. The panoramic reconstruction gives a realistic overview of the operating state of the vehicles from a macro perspective and is not affected by the loss of occluded trajectories. The reliability of the resulting data is a significant technical advance.
In order to further verify the effectiveness of the proposed method, the trajectory coincidence degree TC is used for description, and its definition formula is as follows:

$$ TC = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \delta_{ij}, \qquad \delta_{ij} = \begin{cases} 1, & \left| P_{iA} - P_{jB} \right| < T, \\ 0, & \text{otherwise}. \end{cases} $$

Here m and n represent the numbers of discrete points on trajectory A and trajectory B, and P_iA and P_jB are points on trajectory A and trajectory B. The absolute distance between each point in trajectory A and each point in trajectory B is calculated, the number of distances less than the threshold T is accumulated, and this count is divided by the product of the numbers of discrete points on the two trajectory curves, which gives the evaluation value of the coincidence degree of the two trajectories. The degree of trajectory coincidence obtained in the experiment is shown in Table 1.
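The verbal definition of TC translates directly into a short Python sketch. The function name and the representation of a trajectory as a list of (X, Y) world coordinates are assumptions for illustration.

```python
import math


def trajectory_coincidence(track_a, track_b, threshold):
    """Fraction of all point pairs (one point from each track) closer than threshold."""
    m, n = len(track_a), len(track_b)
    if m == 0 or n == 0:
        return 0.0
    close_pairs = sum(
        1
        for point_a in track_a
        for point_b in track_b
        if math.dist(point_a, point_b) < threshold
    )
    return close_pairs / (m * n)


track_a = [(0.0, 0.0), (1.0, 0.0)]
track_b = [(0.0, 0.1), (1.0, 0.1), (5.0, 5.0)]
tc = trajectory_coincidence(track_a, track_b, threshold=0.5)
```

In this example only two of the six point pairs lie within the threshold, so the value is 2/6. Because every point of A is compared with every point of B, TC stays well below 1 even for identical trajectories whenever the threshold is small relative to the spacing between track points.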
It can be seen from Table 1 that the method in this paper unifies the cameras in a world coordinate system for target tracking and association matching. The coincidence degree between trajectories was lowest for camera 0 and camera 1, and the effect of trajectory-based target behavior analysis was the same as that of observation from high altitude. As a result, the occlusion and overlap problems of 2D images do not arise, and the whole running state of the target in the large scene is reflected intuitively. In addition, the proposed method meets real-time requirements.

Conclusion
Through the joint calibration of multiple cameras, the cameras are unified under one world coordinate system. The Euclidean distance between trajectory nodes at the same time in the overlapping area is used to measure the similarity between trajectories, and the trajectory association matrix is calculated to match the real trajectories in the current camera with the new trajectories in the adjacent camera. Target tracking and association matching under single and multiple cameras complete the trajectory transfer of the vehicle between adjacent cameras and realize the 3D bird's-eye-view reconstruction of the vehicle trajectory. The results show that the operating state of the vehicle can be viewed from a realistic macro perspective and that the data are reliable, which represents a significant advance. The method makes long-term, long-distance continuous tracking of multiple targets across cameras reliable and accurate.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Junfang Song and Tao Fan are mainly engaged in image processing and artificial intelligence research. Huansheng Song is mainly engaged in research on image processing and recognition and intelligent transportation systems. Haili Zhao is mainly engaged in image processing and information security research.