Traffic Foreground Detection at Complex Urban Intersections Using a Novel Background Dictionary Learning Model

In complex urban intersection scenarios, due to heavy traffic and signal control, there are many slow-moving or temporarily stopped vehicles behind the stop lines. At these intersections, it is difficult to extract traffic parameters, such as delay and queue length, based on vehicle detection and tracking, owing to the dense and severe occlusion of vehicles. In this study, a novel background subtraction algorithm based on sparse representation is proposed to detect the traffic foreground at complex intersections in order to obtain traffic parameters. By establishing a novel background dictionary update model, the proposed method solves the problem that the background is easily contaminated by slow-moving or temporarily stopped vehicles, so that the complete traffic foreground can be obtained. Using real-world urban traffic videos and the PV video sequences of i-LIDS, we first compare the proposed method with other detection methods based on sparse representation. Then, the proposed method is compared with other commonly used traffic foreground detection models in different urban intersection traffic scenarios. The experimental results show that the proposed method performs well in keeping the background model unpolluted by slow-moving or temporarily stopped vehicles and achieves good performance in both qualitative and quantitative evaluations.


Introduction
Real-time traffic data collected at intersections are essential information for intelligent traffic control. Compared with the continuous traffic flow on road sections, the traffic flow at intersections is interrupted by traffic signals or traffic signs. Traffic data such as traffic volume, vehicle speed, and time occupancy cannot reflect the inherent traffic state characteristics at intersections. Other parameters can better reflect the traffic operation state of the intersection, such as control delay, queue length, saturation degree, and number of stops, but these parameters are difficult to capture directly with traditional sensors.
At present, many urban intersections are equipped with traffic video surveillance systems to obtain traffic parameters that directly reflect the traffic state.
Existing methods for obtaining traffic parameters from traffic videos are mainly divided into two categories: the virtual coil or virtual line method and the image zone method. The virtual coil or virtual line method sets a virtual coil or virtual line at a specified position on the road; when vehicles pass through the virtual coil area or cross the virtual line, traffic parameters are obtained by analyzing changes in image features. The image zone method obtains traffic parameters through vehicle detection and tracking [1,2]. Recently, deep learning has been successfully applied to object detection and tracking, for example, the faster region-based convolutional neural network (R-CNN) [3], You Only Look Once (YOLO) [4], and the single shot multibox detector (SSD) [5]. However, with increasing traffic volume at urban intersections, vehicle detection and tracking methods are increasingly affected by factors such as occlusion and slow-moving or temporarily stopped vehicles. To reduce the algorithmic complexity incurred in overcoming environmental disturbances, methods based on visual features for obtaining traffic state parameters are receiving increasing attention.
Traffic foreground detection is the first step in video-based traffic parameter extraction. However, due to the influence of traffic signals, the traffic flow at an intersection periodically queues and dissipates behind the stop line, resulting in many slow-moving or temporarily stopped vehicles, so robust foreground detection at urban intersections faces enormous challenges. Existing foreground detection methods mainly include the background subtraction (BS) method, the optical flow method, and the interframe subtraction method. Both the optical flow method and the interframe subtraction method are suitable for detecting the moving foreground. However, there are both moving vehicles and queuing vehicles at urban intersections; if the queuing vehicles are not detected, the foreground detection is incomplete. Background subtraction is an effective technique for detecting both queuing vehicles and moving vehicles at urban intersections. In BS methods, a background model is first established, and the foreground is then obtained from the subtraction between the current frame and the background frame [6]. Existing BS methods can be roughly divided into parametric and nonparametric background models. The simplest parametric background modeling method is based on a statistical analysis of the histogram values of each pixel over the past K frames, and the mean, median, or maximum frequency is used to estimate the background [7]. Another parametric background method establishes a single-peak probability density function for background pixels, such as the running Gaussian average [8]. Since a single Gaussian density function cannot handle dynamic background scenarios, the Gaussian mixture model (GMM), composed of N Gaussian components, is established for each pixel to estimate the background [9]. Many improved GMMs have been proposed for detecting foreground objects in traffic scenarios.
To simplify the calculation and increase the operation speed, an image block-based GMM background model has been constructed [10]. In addition, the expectation-maximization (EM) algorithm has been fused with the Gaussian mixture model to improve the segmentation quality of moving vehicles [11]. The Gaussian mixture model and the previous methods are very effective for continuously varying scenarios, but dynamic scenarios with fast nonstationary changes cannot be accurately described by a set of Gaussian functions. Nonparametric background modeling is more suitable for cases where the density function is complicated or cannot be parametrically modeled. Kernel density estimation is a nonparametric method that estimates the background probability of each pixel from the most recent multiframe sequence [12]. Another nonparametric method is the codebook model, a background modeling method based on pixel color that uses a quantization technique to create a codebook from the color distance and brightness of each pixel in the image sequence. An improved codebook algorithm achieves better results than the original codebook model and the Gaussian mixture model; however, it cannot handle slow-moving or temporarily stopped vehicles at intersections [13]. The visual background extractor (ViBe) is a motion detection algorithm based on background subtraction; it is very fast and is based on samples and several innovative processes, including time subsampling, random substitution, and spatial diffusion [14]. The sigma-delta filtering algorithm uses the sigma-delta filter for background estimation and foreground detection to achieve computational efficiency and low memory consumption [15].
Recently, sparse representation [16] has been successfully applied to background detection. Methods under this framework assume that the background in a video sequence can be modeled by a low-rank matrix and that moving objects correspond to sparse outliers. In [17], a k-means classifier is used to train the dictionary, the matching pursuit algorithm is used to obtain the sparse coefficients, and the background is then estimated by a sparse linear combination; the method is simple and obtains good preliminary detection results. To effectively deal with dynamic changes in background illumination and environment, background modeling has been performed by combining a K-SVD (singular value decomposition) trained dictionary with averaged sparse coefficients [18]. Fixed dictionaries have been used for background modeling but do not reflect background changes in dynamic scenarios [19]. In [20], a dynamically and adaptively updated dictionary is used for background modeling, and background subtraction is performed through the sparse reconstruction error between the current image and the background image. Yang and Qu [21] proposed real-time vehicle detection and counting in complex traffic scenarios using a background subtraction model with low-rank decomposition. These methods provide good performance for background modeling. However, the background is easily contaminated by slow-moving or temporarily stopped vehicles at intersections, resulting in missed foreground detections. To address this problem, Toral et al. and Manzanera and Richefeu developed a sigma-delta model with confidence measurement (sigma-delta with CM) to detect vehicles in urban traffic scenarios [22,23]. Instead of the sigma-delta model, Zhang et al. used a GMM with confidence measurement for each pixel to efficiently resolve the deficiencies of the background subtraction model [24,25]. These methods have achieved good results at simple intersections with small traffic volumes.
In this study, aiming at traffic foreground detection at complex urban intersections, we do the following work: (1) We establish a novel background dictionary update model, which automatically removes foreground information from the background when updating the background dictionary, preventing the background from being contaminated by slow-moving or temporarily stopped vehicles at complex intersections. (2) The independence of sparse representation leads to large differences in the sparse representations of similar feature blocks of the same foreground object; we therefore introduce a manifold regularity term to build an adaptive sparse coding model that yields a continuous and consistent representation of the foreground object. (3) The traffic foreground detection is obtained based on the feature reconstruction error. This study is organized as follows. Related works are summarized in Section 2. Section 3 describes the details of the proposed method for background estimation and traffic foreground detection at the intersection. Experimental results are discussed in Section 4, and the conclusion is drawn in Section 5.

Materials and Methods
The overview of our proposed method is shown in Figure 1. First, we extract image frames from video sequences of the urban intersection to form a training sample set. Each image is divided into blocks, and some blocks of each image are randomly extracted to form a subset for training the background dictionary. Then, to avoid the background being polluted by slow-moving or temporarily stopped vehicles at the intersection, we establish a background dictionary update model that limits the foreground's entry into the background dictionary. Due to the independence of the sparse representation, the sparse representation coefficients of different image blocks of the same object may be very different, resulting in discontinuities in the foreground object representation. To obtain a more accurate background update dictionary, we introduce a manifold regularity term to build an adaptive sparse coding model that yields a continuous and consistent representation of the foreground object. Finally, for any frame of the video set, the sparse reconstruction error between the current frame and the background is used for foreground detection. The proposed BS method is based on two assumptions. The first is that the backgrounds of an arbitrary scenario are linearly correlated with each other and can be sparsely and linearly represented by the atoms of the dictionary, whereas the foreground appears as a sparse and contiguous region in consecutive frames that changes the background and greatly alters the projection onto the dictionary [16]. The second is that the locations of the foreground in successive frames are likely to be grouped together rather than randomly scattered, which indicates that the foreground objects satisfy a structured sparsity constraint [26].

Initial Background Dictionary Learning.

According to assumption one, we formulate the BS problem by linearly decomposing the input image X into a low-rank matrix X_B and a sparse matrix X_F:

X = X_B + X_F, (1)

where X is the video image frame, X_F is the foreground candidate, and X_B is the background model. The background model is generally linearly represented by the background dictionary D_b and the sparse coefficient α [27]:

X_B = D_b α. (2)

The initial background dictionary D_b needs to be trained first. Most existing methods manually extract clean background frames from video sequences or use the first N frames of the video sequence as the training set. However, in complex intersection scenes, especially at intersections with large traffic volumes, it is difficult to obtain clean background frames without foreground. We therefore use actual video frames directly to train the dictionary. To avoid the background dictionary being contaminated by the foreground in the video images, we randomly extract training images from video sequences captured when the intersection traffic volume is small. Then, each image is segmented into blocks, and blocks are randomly extracted from each image to form a set of sample blocks for background training. Because of the temporarily stopped or slow-moving vehicles at intersections, we select training images at a specific time interval instead of selecting every frame in the image sequence. The specific interval is determined according to the actual situation: the slower the foreground targets move at the intersection, the longer the interval between selected video frames.
Given a video set, each of the collected images is divided into n nonoverlapping blocks. The blocks in the video frames are selected at a certain interval to form a training set X, and the dictionary D_b satisfies the following formula:

min_{D_b, α} Σ_{m=1}^{M} ( (1/2) ||x_m − D_b α_m||_2^2 + λ_1 ||α_m||_1 ), (3)

where M is the number of sample blocks in the training set, x_m is the m-th block vector in the training set, α_m is the m-th sparse coefficient, and λ_1 is the regularization parameter. The trained dictionary is shown in Figure 2.
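The dictionary training step above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation (which is in MATLAB): it alternates a few ISTA iterations on the lasso subproblem with a least-squares dictionary update and unit-norm atom renormalization. All function and parameter names are our own.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding: proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def learn_background_dictionary(X, q=16, lam=0.1, n_iter=30, seed=0):
    # X: matrix of vectorized sample blocks, one block per column.
    # Alternates sparse coding (a few ISTA steps on the lasso subproblem)
    # with a least-squares dictionary update, renormalizing atoms each pass.
    rng = np.random.default_rng(seed)
    d, M = X.shape
    D = rng.standard_normal((d, q))
    D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    A = np.zeros((q, M))  # sparse coefficients, one column per block
    for _ in range(n_iter):
        step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant
        for _ in range(10):  # ISTA iterations for the lasso subproblem
            A = soft_threshold(A - step * (D.T @ (D @ A - X)), step * lam)
        # Dictionary update: least squares, then unit-norm atoms.
        D = X @ A.T @ np.linalg.inv(A @ A.T + 1e-8 * np.eye(q))
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, A
```

In practice, the training set would be formed by vectorizing the randomly extracted image blocks into the columns of X, as described above.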

Background Dictionary Update Model Based on Foreground Uncorrelation.
A fixed initial background dictionary does not adapt to the background changes in the video sequences. To effectively extract the foreground, the background dictionary needs to be adaptively updated over time. However, in complex scenarios, such as those with temporarily stopped or slow-moving vehicles at intersections, the foreground gradually merges into the background and becomes an element of the background dictionary. As a result, the reconstruction error may fail to accurately describe the difference between the current image and the background, leading to missed foreground detections.
To solve this problem, we propose a background dictionary update model that has minimal correlation with the foreground. During the background dictionary update, the atoms are learned by minimizing the correlation between the background dictionary and the foreground dictionary. In this way, when the background is expressed as a sparse linear combination, the foreground is not merged into the background, which yields better foreground detection results.

Model.

The correlation between the background dictionary D_b and the foreground dictionary D_f is required to be as small as possible; that is, ||D_f^T D_b||_F^2 should be as small as possible. Then, we formulate our optimization model as follows:

min_{D_b, α} Σ_{i=1}^{N} ||x_i − D_b α_i||_2^2 + λ_1 Σ_{i=1}^{N} ||α_i||_1 + λ_2 ||D_f^T D_b||_F^2, (4)

where α = [α_1, α_2, ..., α_N] ∈ R^{q×N} is the matrix of sparse coefficients of the image blocks on the dictionary D_b, λ_1 and λ_2 are the regularization parameters, and the regularization term ||D_f^T D_b||_F^2 is the cross-correlation of D_b and D_f.

Model Solving.

The solution of model (4) is a joint optimization problem over the background dictionary D_b and the representation coefficients α. If both are treated as variables, the problem is nonconvex; if one of them is fixed, the problem becomes convex. Following the alternating optimization strategy of the K-SVD algorithm, the solution proceeds in two steps.

Step 1: fixing the dictionary D_b, (4) reduces to

min_{α} Σ_{i=1}^{N} ||x_i − D_b α_i||_2^2 + λ_1 Σ_{i=1}^{N} ||α_i||_1, (5)

which is a standard lasso problem that can be solved using existing methods.

Step 2: fixing α, the atoms of D_b are updated one by one. Let β_j denote the j-th row vector of α, j = 1, 2, ..., q. Then (4) can be reformulated as

min_{b_l} ||X − Σ_{j=1}^{q} b_j β_j||_F^2 + λ_2 ||D_f^T D_b||_F^2, s.t. b_l^T b_l = 1. (6)

Letting Z = X − Σ_{j≠l} b_j β_j and discarding the terms that do not depend on b_l, we can rewrite (6) as

min_{b_l} ||Z − b_l β_l||_F^2 + λ_2 ||D_f^T b_l||_2^2, s.t. b_l^T b_l = 1. (7)

Applying the Lagrange multiplier method to (7), it can be presented as the equivalent problem

min_{b_l} ||Z − b_l β_l||_F^2 + λ_2 ||D_f^T b_l||_2^2 + c (b_l^T b_l − 1), (8)

where c is the Lagrange multiplier. Then, (8) can be rewritten by expanding the Frobenius norm:

min_{b_l} Tr((Z − b_l β_l)(Z − b_l β_l)^T) + λ_2 b_l^T D_f D_f^T b_l + c (b_l^T b_l − 1). (9)

Ignoring the terms unrelated to b_l and using the properties of symmetric matrices, we simplify (9) into the following formula:

J(b_l) = (β_l β_l^T + c) b_l^T b_l − 2 b_l^T Z β_l^T + λ_2 b_l^T D_f D_f^T b_l. (10)

Differentiating with respect to b_l, the outcome is

∂J/∂b_l = 2 (β_l β_l^T + c) b_l − 2 Z β_l^T + 2 λ_2 D_f D_f^T b_l. (11)

Setting the derivative equal to zero, the optimal solution is

b_l = ((β_l β_l^T + c) I + λ_2 D_f D_f^T)^{−1} Z β_l^T. (12)

We can then normalize b_l as

b_l = b_l / ||b_l||_2. (13)

Each atom is updated as described above, and subsequent atoms are updated using the previously updated atoms until the stopping criterion is reached. For the determination of c, it can be proved that g(c) = b_l(c)^T b_l(c) is a monotonic function of c and that g(c) = 1 has a unique solution. Bisection is used to solve for c, and the optimal atom is obtained by substituting c into formula (12).
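The closed-form atom update of formula (12), together with the bisection search for the multiplier c, can be sketched as follows. This is an illustrative NumPy sketch under the simplifying assumption that c is searched over positive values only; all names are our own, not the paper's.

```python
import numpy as np

def update_atom(Z, beta_l, D_f, lam2, c_lo=1e-6, c_hi=1e6, tol=1e-8):
    # Z: residual matrix for atom l; beta_l: l-th row of the coefficient
    # matrix; D_f: foreground dictionary; lam2: decorrelation weight.
    d = Z.shape[0]
    s = float(beta_l @ beta_l)        # scalar beta_l beta_l^T
    C = lam2 * (D_f @ D_f.T)          # foreground decorrelation penalty
    rhs = Z @ beta_l                  # Z beta_l^T as a length-d vector

    def b_of(c):
        # Closed-form atom of formula (12) for a given multiplier c.
        return np.linalg.solve((s + c) * np.eye(d) + C, rhs)

    def g(c):
        b = b_of(c)
        return float(b @ b)           # squared norm, monotone in c

    # Widen the bracket so that g(c_lo) >= 1 >= g(c_hi) when possible.
    while g(c_hi) > 1.0:
        c_hi *= 10.0
    while g(c_lo) < 1.0 and c_lo > 1e-300:
        c_lo /= 10.0
    c = c_lo
    for _ in range(200):              # bisection on g(c) = 1
        c = 0.5 * (c_lo + c_hi)
        gc = g(c)
        if abs(gc - 1.0) < tol:
            break
        if gc > 1.0:
            c_lo = c
        else:
            c_hi = c
    b = b_of(c)
    return b / np.linalg.norm(b)      # final unit-norm normalization
```

The final division enforces the unit-norm constraint even when the bisection bracket does not contain an exact root.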

Algorithm Convergence Analysis.
The objective function (4) is monotonically nonincreasing under the update rule (12). Let b̂_j, j = 1, 2, ..., l − 1, denote the atoms already updated in the previous l − 1 steps. Because the update of step l chooses b_l as the exact minimizer of the objective with all other atoms fixed, the objective value after step l is no larger than its value before; similarly, the update of step l + 1 cannot increase the objective. Therefore, as the atoms are updated one by one in the dictionary update phase, the objective function decreases monotonically.

Sparse Coding Based on Foreground Consistency Representation.

For the standard sparse representation, when the dictionary is given, the sparse coefficient α_i is affected only by the input data x_i, regardless of the other input data x_k (k ≠ i). That is, the sparse coefficients are independent of each other, and the correlation between the input data is cut off. According to assumption two, image blocks belonging to the same foreground object have similar features, and the corresponding sparse representations of these blocks should therefore be similar. However, the independence of the sparse representation causes large differences in the sparse representations of similar image blocks. This difference leads to discontinuities and inconsistencies in the extracted foreground, which further affect the accuracy of the background dictionary update. Manifold learning focuses on constructing data relationships and revealing the intrinsic local structure between image blocks. The relationship is described by the local geometry between the data points; the Laplacian matrix is derived from it and added as a constraint to the standard sparse representation model. In this way, the neighborhood geometry of the original space is preserved in the sparse space, and the obtained sparse representation better reflects the original geometry of the data.

Model Construction.
Based on the above, the following structural sparse representation model is established:

min_{D_b, α} Σ_{i=1}^{N} ||x_i − D_b α_i||_2^2 + λ_1 Σ_{i=1}^{N} ||α_i||_1 + (β/2) Σ_{i,j} W_ij ||α_i − α_j||_2^2, (16)

where β and λ_1 are the regularization parameters and W_ij is the spatial adjacency weight between blocks, a constant with W_ij = W_ji. Two coefficients α_i and α_j are treated as adjacent when ||α_i − α_j||_2^2 ≤ ε, that is, when α_i lies in the k-neighborhood of α_j or α_j lies in the k-neighborhood of α_i. The adjacency weights define a graph Laplacian L, through which the last term of (16) can be written compactly as β Tr(α L α^T).

Model Solving.
When D_b is fixed in (16), the problem becomes convex. Because directly solving this convex problem for all columns of α at once is time-consuming, we optimize α_i one column at a time until the entire α converges. At each step of optimizing α_i, we fix the other columns α_j (j ≠ i) and rewrite (16) as a subproblem in α_i alone:

min_{α_i} ||x_i − D_b α_i||_2^2 + λ_1 ||α_i||_1 + β Σ_{j≠i} W_ij ||α_i − α_j||_2^2. (17)

Relaxing the Frobenius norm, removing the items unrelated to α_i, and collecting the quadratic terms, (17) reduces to minimizing a quadratic function J(α_i) in which the l1 term is handled through θ ∈ {−1, 0, 1}, the sign of the corresponding entry of α_i. Setting the derivative of J(α_i) with respect to α_i equal to zero, the optimal solution of (17) can be obtained as

α_i = Φ^{−1} (B^T x_i − β Λ_i L_i − (λ_1/2) θ), (24)

where Φ = B^T B + β L_ii I (here B = D_b), Λ_i is the submatrix after the i-th column is removed from the matrix α, L_i is the subvector after the i-th element is removed from the i-th column of the Laplacian L, and I is the identity matrix. To speed up the convergence of the sparse coding, α is initialized by standard sparse coding.
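The adjacency and Laplacian machinery used above can be illustrated as follows: a small NumPy sketch (names are our own) that builds a symmetric k-nearest-neighbor adjacency, forms the graph Laplacian L = D − W, and evaluates the manifold smoothness term.

```python
import numpy as np

def knn_adjacency(X, k=3):
    # Symmetric 0/1 adjacency: i and j are neighbors if either lies among
    # the other's k nearest neighbors (columns of X are block vectors).
    N = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((N, N))
    for i in range(N):
        nearest = np.argsort(d2[i])[1:k + 1]  # skip the point itself
        W[i, nearest] = 1.0
    return np.maximum(W, W.T)                 # enforce W_ij = W_ji

def graph_laplacian(W):
    # L = D - W with D the diagonal degree matrix.
    return np.diag(W.sum(axis=1)) - W

def manifold_penalty(A, W):
    # (1/2) * sum_ij W_ij * ||a_i - a_j||^2 over the columns a_i of A.
    N = A.shape[1]
    return 0.5 * sum(W[i, j] * float(np.sum((A[:, i] - A[:, j]) ** 2))
                     for i in range(N) for j in range(N))
```

For symmetric W, manifold_penalty(A, W) equals np.trace(A @ graph_laplacian(W) @ A.T), which is the compact Laplacian form of the smoothness term.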

Foreground Detection Based on Feature Reconstruction.

Foreground detection in the video sequence is treated as a reconstruction error estimation process on the background dictionary. When foreground blocks and background blocks are projected onto the same background dictionary, their reconstruction errors differ greatly. Based on the above background dictionary model and sparse coefficient calculation method, we can distinguish whether an image block belongs to the foreground or the background according to its reconstruction error. A larger reconstruction error indicates a larger difference between the image block and the background and a higher probability that the block is foreground. To characterize the reconstruction error well, the foreground detection model y in (25) is constructed from three terms. The first term in (25) is the feature reconstruction error: if a background block is successfully represented by the background dictionary D_b and the sparse coefficient α, the reconstruction error is small, whereas a foreground block has sparse representation coefficients of almost zero on the dictionary D_b, so its reconstruction error is relatively large. The second term in (25) is a regularization term: since some foregrounds can be reconstructed well using many background dictionary atoms, the reconstruction error alone is not sufficient to measure foreground changes, so we introduce the regularization term to further reflect the change of the foreground. The last term indicates that a larger difference in the sparse coefficients corresponds to a higher probability of being foreground. Then, y is compared with a threshold T: if y is below T, the block is classified as background; otherwise, it is classified as foreground. By finding a suitable threshold T, we can obtain the foreground of the target. The proposed framework of the detection algorithm is given in Table 1.
Given the current frame X_t (the t-th frame) and the current dictionary D_{t−1}, the sparse representation α_{t−1} is updated by (24), and the foreground detection result y_t is obtained according to (25). Then, the foreground dictionary is updated according to the foreground detection result y_t, and the background dictionary is updated according to (12). Finally, the detection of the next frame X_{t+1} is performed.
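The per-block decision rule can be sketched as follows. This hedged NumPy sketch keeps only the reconstruction-error term of the detection model in (25) (the full model adds the further regularization terms described above); the names are illustrative, not the paper's.

```python
import numpy as np

def classify_blocks(X, D_b, A, T):
    # X: image blocks as columns; A: their sparse codes on D_b;
    # T: decision threshold. Returns True for foreground blocks.
    # Only the reconstruction-error term is used in this sketch.
    err = np.sum((X - D_b @ A) ** 2, axis=0)  # per-block reconstruction error
    return err > T
```

A background block that is well represented by D_b yields a small error and falls below T; a foreground block, whose codes on D_b are nearly zero, yields a large error and is flagged.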
To adapt to environmental disturbances and lighting changes in the background, the background model needs to be dynamically updated over time. However, continuous sparse reconstruction incurs relatively large computational overhead, and if the background is updated too frequently, error accumulation is easily introduced, affecting the accuracy of the foreground extraction. Because the background usually does not change substantially over a certain time interval, the background dictionary is updated every T frames instead of every frame. The specific frame interval depends on the scenario.

Results and Discussion
To verify the performance of traffic foreground detection at complex intersections, we first compare and evaluate our algorithm against other traffic foreground detection methods based on sparse representation (Section 3.1). Then, we compare and evaluate our algorithm against other typical traffic foreground detection methods in different intersection scenarios (Section 3.2). The test videos are the PV video sequences of i-LIDS [28] and video sequences captured at actual complex intersections.
For quantitative evaluation, the metrics of precision, recall, and F-measure are used to show the overall accuracy of our method, defined as follows:

Precision = TP / (TP + FP), Recall = TP / (TP + FN), F-measure = (2 × Precision × Recall) / (Precision + Recall), (26)

where TP (true positive) is the number of pixels correctly classified as foreground, FP (false positive) is the number of background pixels misclassified as foreground, and FN (false negative) is the number of foreground pixels misclassified as background. Precision gives the percentage of correctly detected foreground pixels among all detected foreground pixels. Recall gives the percentage of foreground pixels that are correctly detected among all true foreground pixels. F-measure is the harmonic mean of precision and recall and measures the overall detection quality of the algorithm. For all three metrics, the higher the value, the better the performance. The algorithms are implemented in MATLAB and run on a desktop with a 2.4 GHz Core i5 processor and 4 GB of memory.
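The three metrics above can be computed as in the following sketch (plain Python over boolean masks flattened to sequences; the names are our own):

```python
def detection_metrics(pred, truth):
    # Pixel-level precision, recall, and F-measure from boolean masks
    # (True = foreground), given as flat sequences of equal length.
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

The guards against empty denominators handle frames with no detected or no true foreground.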

Comparative Analysis of Foreground Detection Algorithms Based on Sparse Representation.

We test our method on video clips captured at an actual complex intersection. Due to the influence of traffic signals, the traffic flow at the intersection periodically queues and dissipates behind the stop line, resulting in many slow-moving or temporarily stopped vehicles. Moreover, because of the heavy traffic at this intersection, many vehicles wait in line for long periods of time. The video resolution is 192 × 256 pixels.
Our method is compared with two representative foreground detection methods based on sparse representation. In our algorithm, the image block is the basic processing unit, and the block size has a certain impact on processing speed and detection results: the smaller the block, the higher the detection accuracy and the slower the processing speed. The image block size of our algorithm is set to 8 × 8, and it takes 1.5 s to 2.0 s to process one frame at this size. Figures 3-5 show the comparison results of background extraction and foreground detection for the standard sparse representation method, the structured sparse representation method, and our proposed method. To save space, we list only three frames with different queue lengths. In Figures 3-5, the first column is the original video image, the second column is the current background extracted by each method, and the third column is the foreground detection image. The results in Figures 3 and 4 show that the extracted backgrounds are contaminated to varying degrees by slow-moving or temporarily stopped vehicles at the intersection, and the detected foreground is incomplete because part of the foreground has degraded into the background. Moreover, in Figure 3, due to the independence of the sparse representation, the sparse representations of image blocks belonging to the same object differ, resulting in inconsistencies in background estimation and foreground detection. In Figure 4, the structured sparse representation improves the detection. Figure 5 shows that our method provides a significant improvement over the other methods: with the foreground-unrelated limitation in the background updates, we can prevent slow-moving or temporarily stopped vehicles from blending into the background and recover accurate backgrounds without smearing and ghosting artifacts. Moreover, owing to the manifold regularity term, the foreground consistency of the detection is better.
Although the traffic volume at the intersection in the test video clips is heavy and the traffic scenarios are complicated, our method meets the foreground detection requirements of complex intersections well. Table 2 and Figure 6 show the quantitative evaluation results in terms of precision, recall, and F-measure on the test video clips. For the accuracy of the results, we only perform statistics on a road area where there are many slow-moving or temporarily stopped vehicles. Our method achieves the highest F-measure at complex urban intersections. For the standard sparse representation method (Method 1) and the structured sparse representation method (Method 2), the precision and recall values show that these approaches perform the traffic foreground detection task with medium and low performance due to the slow-moving or temporarily stopped vehicles in the complex scenario. Compared with the standard sparse representation method (Method 1), the structured sparse representation method (Method 2) achieves higher precision due to foreground consistency detection, but lower recall due to more false negatives. The proposed method eliminates most detection errors through the limitation of being irrelevant to the foreground in the background update, obtaining the best results.

Comparative Analysis of Detection Algorithm Performance in Different Intersection Scenarios.
We further compare our algorithm with the GMM model and the sigma-delta confidence model (proposed in [22]), which are typically used for traffic foreground detection at intersections. These algorithms are evaluated in both simple and complex intersection scenarios. A simple intersection scenario refers to a situation where the traffic environment is simple, the traffic volume is small, and the vehicle queuing time is short. A complex intersection scenario refers to a situation where the traffic environment is complex, the traffic volume is heavy, and the vehicle queuing time is relatively long. In this study, the simple intersection scenarios use the PV Hard video sequences of i-LIDS [28], in which there are multiple slow-moving or temporarily stopped vehicles; the video resolution is 720 × 576 pixels. The complex intersection scenarios use the captured intersection video sequences; the video resolution is 192 × 256 pixels. Figure 7 shows the results extracted by the compared methods in simple intersection scenarios on the PV video sequences of i-LIDS. Figures 8 and 9 show the results extracted by the compared methods in complex intersection scenarios on the captured video clips. Table 3 and Figure 10 show the quantitative evaluation results by precision, recall, and F-measure on the test videos.

Table 1: Traffic foreground detection algorithm for intersection scenarios.
Input: current frame X_t; background dictionary D_{t−1}.
Output: foreground detection result y_t; background update dictionary D_t; sparse representation α_t.
Initialization: initial background dictionary D_b; initial foreground dictionary D_f; parameters λ_1, λ_2, and β.
1. Sparse coding: with fixed D_{t−1}, the sparse coefficient α_{t−1} is updated by equation (24).
2. Foreground detection: y_t is updated by equation (25).
3. Foreground dictionary update: the foreground dictionary D_f is updated according to the foreground detection result y_t.
4. Background dictionary update: the dictionary D_b is updated according to formula (12) to obtain the new background dictionary D_t.
5. Return to step 1 for the detection of the next frame.

Comparative Analysis of Detection Algorithms in Simple Intersection Scenarios.
The intersection scenario in the PV video sequences of i-LIDS is relatively simple, and a slight camera shake in the video causes unstable background interference. To save space, Figure 7 shows visual comparison results of foreground detection for only the most challenging video clip, in which the vehicles start to move slowly and queue up temporarily.

Figure 6: Evaluation results for test sequences.

Figure 7: Detection results in a simple intersection scenario (columns: original image, background image, foreground image).

In Figure 7, the first column is the original video image, the second column is the current background extracted by each method, and the third column is the foreground detection image. Row (a) shows the detection results of the GMM model, row (b) the detection results of the sigma-delta confidence model, and row (c) the detection results of our method. Using the GMM model, the background is contaminated by slow-moving or temporarily stopped vehicles, and it fails to detect the intact foreground. To some extent, the sigma-delta confidence model prevents the slow-moving or temporarily stopped vehicles from blending into the background. However, the background recovered using this model is not very accurate, and the extracted background has smear and ghost artifacts. With the foreground-unrelated constraint on the background updates, our method achieves promising results and prevents slow-moving or temporarily stopped vehicles from blending into the background. The processing speed is significantly improved after converting the image to 192 × 256 pixels. The sigma-delta confidence model runs at about 4.35 frames/s, which is an obvious speed advantage. The GMM model runs at about 1.47 frames/s, and our algorithm at 0.65 frames/s (1.54 s/frame).
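For context, the update rule of a basic sigma-delta background estimator can be sketched as follows. This is a simplified version that omits the confidence mechanism of the model in [22]; `N` is the usual variance amplification factor. Because the background follows the frame by only ±1 per step, slowly moving or stopped vehicles gradually leak into it, which is the contamination behavior discussed above.

```python
import numpy as np

def sigma_delta_step(frame, bg, var, N=4):
    """One update of a plain sigma-delta background estimator
    on integer grayscale images (confidence extension omitted)."""
    bg = bg + np.sign(frame - bg)          # background tracks the frame by +/-1
    diff = np.abs(frame - bg)
    var = var + np.sign(N * diff - var)    # variance tracks N * |difference|
    var = np.maximum(var, 1)               # keep a nonzero detection threshold
    fg = diff > var                        # foreground where the difference is large
    return bg, var, fg
```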

Comparative Analysis of Different Algorithms in Complex Intersection Scenarios.
We compare our method with the GMM model and the sigma-delta confidence model (SDCM) in an actual complex intersection scenario. Due to the heavy traffic at this intersection and the influence of traffic signals, many vehicles queue behind the stop line, and the waiting time is long. Figures 8 and 9 show the comparison results of the three detection algorithms: Figure 8 shows the case in which vehicles have a shorter queue length and shorter queuing time at the intersection, and Figure 9 shows the case in which vehicles have a long queue length and a long waiting time.
In Figures 8 and 9, the first column is the original video image, at which time vehicles have started to queue; the second column is the background extracted by each algorithm; the third column is the foreground detection. Row (a) is the detection result of the GMM model; row (b) is the detection result of the sigma-delta confidence model; row (c) is the detection result of our algorithm. Figure 8 shows the case where the actual intersection scenario is relatively simple and the vehicle queuing time is not long. Using the GMM model, the background is contaminated by slow-moving and queuing vehicles, causing the foreground detection to fail. The background extracted by the sigma-delta confidence model is partially contaminated, and the foreground is not intact. Our method achieves accurate background-foreground separation. Figure 9 shows the difference in the detection results of the three algorithms when the traffic volume becomes heavier and the waiting time becomes longer at the intersection. The background extracted by the GMM model is seriously contaminated, so large parts of the foreground are missing. Due to more vehicles and longer waiting times at the intersection, part of the foreground blends into the background with the sigma-delta confidence model, resulting in poor foreground detection. Our method successfully detects the foreground in this complex scenario and recovers an accurate background. Table 3 and Figure 10 show the quantitative evaluation results by precision, recall, and F-measure on the test video clips. For the accuracy of the results, we only perform statistics on a road area where there are many slow-moving or temporarily stopped vehicles. The precision results show that the GMM model performs the foreground detection task with medium accuracy for simple intersection scenarios and very low accuracy for complex ones.
This is due to the slow-moving or temporarily stopped vehicles in the intersection scenario. The sigma-delta confidence model and our method detect foreground regions with high accuracy in the simple sequences, although the sigma-delta confidence model introduces a few false positives into the final mask. For the complex sequences, the accuracy of the sigma-delta confidence model drops dramatically due to the longer vehicle waiting times; the background extracted by this model presents more severe smearing artifacts. The proposed method achieves promising results for all the video sequences. With the help of the foreground-unrelated constraint on the background updates, we can prevent the slow-moving or temporarily stopped vehicles from leaking into the background and recover accurate backgrounds without smearing and ghosting artifacts.
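The precision, recall, and F-measure reported in Table 3 and Figure 10 can be computed from binary foreground masks as follows; this is a straightforward sketch of the standard definitions (the paper additionally restricts the statistics to a selected road area).

```python
import numpy as np

def prf(pred, gt):
    """Precision, recall, and F-measure for binary foreground masks.
    pred and gt are boolean arrays of the same shape."""
    tp = np.logical_and(pred, gt).sum()      # detected foreground pixels
    fp = np.logical_and(pred, ~gt).sum()     # false alarms
    fn = np.logical_and(~pred, gt).sum()     # missed foreground pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```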

Conclusions
In this study, based on background subtraction via sparse representation, a background dictionary update model with a foreground-unrelated constraint is established. This model prevents the foreground from entering the background dictionary when the dictionary is updated, thus avoiding the background pollution and the incomplete traffic foregrounds caused by slow-moving or temporarily stopped vehicles at urban intersections. At the same time, a manifold regularization term is introduced to establish a sparse coding model with a consistent foreground representation. This model overcomes the problem of foreground discontinuity caused by large differences in the sparse representation coefficients of the same foreground target, which arise from the independence of sparse representation. We conducted experimental verification of the proposed method on real-world urban traffic videos. First, comparing our method with two other background subtraction methods based on sparse representation, the results show that our method not only keeps the background model unpolluted by slow-moving or temporarily stopped vehicles but also maintains a continuous and consistent representation of the foreground target. Then, our method is compared with two other traffic foreground detection methods, the GMM model and the sigma-delta confidence model, at a simple intersection and a complex intersection. The results show that our method maintains good detection results at complex intersections.
With increasing traffic volume, traffic parameter detection and traffic state recognition based on vehicle detection and tracking at urban intersections are increasingly limited by occlusion and by slow-moving or temporarily stopped vehicles. Our future work is to select appropriate visual features and their combinations to characterize traffic state parameters and to identify traffic conditions based on the extracted robust traffic foreground.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.