Real-time, accurate detection of parking and dropping events on the road is important for avoiding traffic accidents. Existing detection algorithms require accurate modeling of the background, and most of them use two-dimensional image characteristics such as area to distinguish the type of target. However, these algorithms depend heavily on the background and lack accuracy in distinguishing target types. This paper therefore proposes an algorithm for detecting parking and dropping objects that uses real three-dimensional information to distinguish the type of target. Firstly, an abnormal region is preliminarily identified based on a status change, that is, when an object that did not exist before appears in the traffic scene. Secondly, the preliminarily determined abnormal region is bidirectionally tracked to confirm the parking or dropping area, and the eight-neighbor seed filling algorithm is used to segment it. Finally, a three-view recognition method based on inverse projection is proposed to distinguish parking from dropping objects. The method is based on matching the three-dimensional structure of the vehicle body: the three-dimensional wireframe of the vehicle extracted by back-projection can be matched against structural vehicle models, so the vehicle model can be further identified. Constructing the 3D wireframe of the vehicle is efficient and meets the needs of real-time applications. The proposed algorithm is verified on experimental data collected in tunnels and on highways, urban expressways, and rural roads. The results show that the algorithm can effectively detect parking and dropping objects in different environments, with low miss and false detection rates.
With the increasing demands of traffic transportation in modern life, such as express delivery and logistics, the number of motor vehicles in the city continues to rise. The increase in the number of motor vehicles has caused numerous problems, such as parking and dropping incidents that reduce road traffic efficiency [
Parking and dropping objects are static targets in traffic scenes. The detection algorithms for such targets in intelligent traffic-incident-detection systems, both in China and abroad, are mainly divided into two steps: target area detection and target type differentiation.
There are two methods for target area detection: the tracking method and the nontracking method.
The tracking method detects the stationary target by analyzing the characteristics of the foreground target trajectory. For example, Bevilacqua et al. [
The nontracking method mainly relies on background modeling and analysis of foreground pixel time series features to detect stationary target regions. For example, Fatih Porikli et al. [
Static targets in traffic scenes are mainly parking and dropping objects, so distinguishing static targets amounts to distinguishing parking from dropping objects. Current algorithms mainly use two-dimensional features of the target for identification. For example, Wang Dianhai and Hu Hongyu [
Today’s machine learning [
A survey of research on video-based detection of parking and dropping objects shows that the key issues in most existing algorithms lie in two aspects. The first is how to detect the target, which is the core of the algorithm. Tracking and nontracking methods are generally used to detect the target area, and both need to extract and update the background. However, in complicated traffic scenes with low visibility, heavy traffic flow, or intense lighting changes, it is difficult to extract an ideal background image. The second is how to distinguish the target type. Two-dimensional features of the target are often used, but the camera imaging process is a dimension-reducing projection in which the target undergoes significant scale changes and geometric deformation, so methods that identify targets using image features have significant limitations. In view of these two shortcomings, it is of great theoretical and practical value to study detection algorithms that do not rely on the background and that use real three-dimensional information for target recognition.
In this paper, the above problems are studied and a new method is proposed. Real-time video collected by the camera is used as the data source, and image analysis and processing realize automatic detection and feedback of parking and dropping events. The method consists of three steps. Firstly, based on status change, the abnormal region in the image is preliminarily determined. Secondly, bidirectional tracking and the eight-neighbor seed filling algorithm are used to segment the parking and dropping areas in the image. Finally, three-dimensional information is used to distinguish the target type.
The detection of abnormal regions is the core of the whole algorithm, so choosing an appropriate detection algorithm is the first problem to consider. Current detection algorithms are too dependent on the background and computationally intensive. We instead use status change to detect abnormal regions, which effectively avoids both shortcomings.
The abnormal area refers to the image area where the steady state changes. The basic idea of the algorithm is as follows: Firstly, the image is preprocessed to highlight useful information, which lays a foundation for improving the accuracy of subsequent detection. Secondly, the detection of abnormal regions is carried out based on the status change, and solutions are given for some of the shortcomings. Finally, an improved algorithm for detection is proposed, and the results before and after the improvement are compared and analyzed.
When video images are captured, the camera is affected by factors such as illumination changes and noise pollution, which degrade the captured image. So that these factors do not affect the result of the algorithm, the image is preprocessed by image enhancement, edge extraction, and median filtering.
At night or in smog, the contrast and color of collected traffic video images degrade and much useful information is obscured, which is unfavorable for the subsequent algorithm. Among image enhancement algorithms, the piecewise gray-level transformation has been widely used due to its simplicity and the diversity of its transformation functions [
Linear transformation of gray scale.
In Figure
Original image and enhanced image.
Original image
Enhanced image
Comparing the two images, it can be seen that the contrast of the enhanced image is significantly improved.
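As an illustration, a piecewise linear gray-level transform that stretches a chosen input range [a, b] onto [c, d] can be sketched as follows (a minimal sketch; the parameter names and the three-segment form follow the standard textbook formulation, not the paper's own figures):

```python
def piecewise_linear(g, a, b, c, d, g_max=255):
    """Three-segment gray-level transform: map [0, a] -> [0, c],
    [a, b] -> [c, d], and [b, g_max] -> [d, g_max]."""
    if g < a:
        return g * c / a                              # compress dark range
    if g <= b:
        return (g - a) * (d - c) / (b - a) + c        # stretch mid range
    return (g - b) * (g_max - d) / (g_max - b) + d    # compress bright range
```

With a and b placed around the low-contrast gray range of a foggy frame and d - c larger than b - a, the mid-range contrast is stretched, which produces the contrast gain visible in the enhanced image.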
The edge is the most basic feature of the image and is invariant to changes in light, so to reduce the effect of light on the detection result, edge extraction is applied after image enhancement. There are many classic operators for edge detection [
Noise may be introduced during acquisition, transmission, or storage, and its presence seriously affects the result of edge extraction. Therefore, it is necessary to denoise the detected edge image. The median filter [
Median filter effect.
Image preprocessing.
Original image
Enhanced image
Edge image
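The edge-extraction and denoising steps above can be sketched with a Sobel gradient operator and a 3x3 median filter (an illustrative pure-Python sketch; the paper does not name its edge operator, so the choice of Sobel is an assumption):

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edges(img, thresh=128):
    """Binary edge map from the Sobel gradient magnitude |gx| + |gy|;
    border pixels are left as non-edges."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = 255 if abs(gx) + abs(gy) >= thresh else 0
    return out

def median_filter3(img):
    """3x3 median filter to suppress impulse noise; borders copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # the median of the 9 neighbors
    return out
```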
Usually the gray value of each pixel in the image does not change for a long time. Only when a foreground target passes through the pixel area does the gray value change, and this change is larger than the change caused by environmental influences. Thus, when a pixel whose gray value changes greatly is detected, a status change has happened and a foreground object is judged to exist. The detected change may be caused either by a moving target passing through the detected area or by a target entering the area and stopping. Therefore, to correctly detect parking and dropping objects, gray-value changes caused by moving targets that merely pass through the detected area must be excluded.
When the pixel gray value changes suddenly and returns to its initial value in a short time, a moving target has passed through the area without stopping. When the gray value changes suddenly and then remains stable for a while, the pixel is occupied by a foreground target and a parking or dropping event has most likely occurred. Since a single pixel contains too little information and does not account for the influence of surrounding pixels, the image block is used as the basic processing unit.
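A minimal sketch of this block-level change test: the block is compared against the template background by a sum of absolute differences, and a counter tracks how long the change persists (the SAD-style measure and the threshold are illustrative assumptions, not the paper's exact formula):

```python
def block_sad(cur, tmpl):
    """Sum of absolute differences between a current block and the template block."""
    return sum(abs(c - t)
               for row_c, row_t in zip(cur, tmpl)
               for c, t in zip(row_c, row_t))

def update_counter(counter, cur, tmpl, change_thresh):
    """Increment the stability counter while the block stays changed relative to
    the template; reset it as soon as the block returns to the template state
    (i.e., the foreground target was only passing through)."""
    if block_sad(cur, tmpl) > change_thresh:
        return counter + 1   # block occupied: candidate stationary target
    return 0                 # block matches background again: passing target
```

A passing vehicle drives the counter up briefly and then resets it, while a stopping target keeps it growing, which is what separates the two cases.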
Figure
Texture changes of blocks in the detection area.
Moving object passing through the detection area
Moving object enters the detection area and stops
As can be seen from Figure
Block coordinate.
Based on these basic concepts, this section introduces the core of the algorithm. According to the above method of dividing the image, each block is assigned a counter C(m,n) with an initial value of 0 and abnormality flags D1(m,n) and D2(m,n) with an initial value of False, where (m,n) are the block coordinates. D1(m,n) marks whether the block meets the state-change condition, and D2(m,n) marks whether, after bidirectional tracking, the block has a forward trajectory but no backward trajectory. First, the first frame of the video is assigned to the template frame; then, starting from the second frame, each image block is detected according to the following steps.
Firstly, calculate
Secondly, when the value of the counter C(m,n) reaches the threshold Tha, the gray value of the block in the current frame is saved. And if the threshold Tha is reached for the first time, the gray value is saved to
Thirdly, when the value of the counter C(m,n) reaches the threshold Thb (Thb>Tha), save the gray value of the block in the current frame to
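The steps above can be combined into a per-block state machine along the following lines (a hedged sketch: blocks are flattened to lists, the change measure is a simple SAD against the template, and the paper's exact handling of the saved gray values may differ):

```python
def detect_block(frames, tha, thb, change_thresh):
    """Per-block state machine: a C(m,n)-style counter counts consecutive
    changed frames; when it reaches Thb (> Tha) the block is flagged abnormal."""
    template = frames[0]            # the first frame serves as the template
    counter, d1, saved = 0, False, None
    for cur in frames[1:]:
        changed = sum(abs(c - t) for c, t in zip(cur, template)) > change_thresh
        counter = counter + 1 if changed else 0
        if counter == tha:
            saved = list(cur)       # save the block's gray values at Tha
        if counter >= thb:
            d1 = True               # sustained change: mark D1(m,n) = True
    return d1, saved
```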
In order to verify the effectiveness of the above algorithm, the experiment was carried out in four scenarios: Chongqing Expressway, Xi’an South Second Ring Road, Xi’an South Second Ring Bus Lane, and Shanghai Fuxing Tunnel. The experimental results are as follows.
In Figures
Test results of parking on Chongqing Expressway.
Frame 389
Frame 509
Frame 609
Xi’an South Second Ring Road parking test results.
Frame 12
Frame 634
Frame 786
Xi’an South Second Ring Busway parking detection results.
Frame 2435
Frame 2601
Frame 2755
Detection results of the droppings on Chongqing Expressway.
Frame 891
Frame 965
Frame 1067
Results of detecting the dropping in the bus lane of South Second Ring Road in Xi’an.
Frame 2435
Frame 5201
Frame 5356
As can be seen from Figures
Shanghai Fuxing Road Tunnel Exit.
Frame 106
Frame 830
Frame 830
Shanghai Fuxing Road Tunnel Entrance.
Frame 79
Frame 1566
Frame 1566
Xi’an South Second Ring Road.
Frame 1556
Frame 3484
To eliminate false alarms, the algorithm is analyzed and found to have the following defects.
Texture analysis of the false alarm block.
Original image
Enlarged edge enhancement map of block (6, 30)
Histogram corresponding to (b)
Texture analysis of anomalous blocks.
Original image
Enlarged edge enhancement map of block (12, 51)
Histogram corresponding to (b)
Therefore, whether a block is abnormal can be judged according to the total gray value in the image block, calculated as in formula (
Experiments on road sections with different traffic densities show that the time (number of image frames) required to detect anomalies on sections with low traffic density is shorter than on sections with high traffic density. As shown in Figures
Figure
Effect of passing vehicles on detection.
Original image
SADT and counter curve of block (67, 16) in t period
From Figure
Effect of illumination on detection.
Original image
SADT and counter curve of block (12, 29) in t period
It can be seen from Figure
Comparison of the effects before and after the improvement of the algorithm for the exit of Shanghai Fuxing Road Tunnel.
Before improvement
After improvement
In this paper, the variance of historical ST values is used to measure the change. The specific calculation formula is as follows:
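Assuming the standard population variance over a block's stored ST history (the paper's exact formula may differ in normalization), the computation is:

```python
def st_variance(history):
    """Population variance of a block's historical ST values: the mean squared
    deviation from the mean, used here to measure how much the block changes."""
    n = len(history)
    mean = sum(history) / n
    return sum((v - mean) ** 2 for v in history) / n
```

A near-zero variance indicates a stable block, while a large variance indicates that the block's state is still fluctuating, e.g., under changing illumination.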
Based on the algorithm proposed in Section
The first frame in the video is assigned to the template frame, then from the second frame, each image block is detected as follows.
In order to verify the improved algorithm, the results are compared. As shown in Figures
Comparison of the effects before and after the improvement of the algorithm for the entrance of Shanghai Fuxing Road Tunnel.
Before improvement
After improvement
Comparison of the effects before and after the improvement of the algorithm for Xi’an South Second Ring Road.
Before improvement
After improvement
Comparison of the effect of detecting parking.
Frame 786 before improvement
Frame 755 after improvement
The abnormal area detected in the second section inevitably includes effects caused by illumination changes, noise, etc. Therefore, this section first performs bidirectional tracking on the selected abnormal area to determine whether it is caused by a parking or a dropping event. Then, the eight-neighbor seed filling algorithm is used to analyze the final abnormal region and segment the parking and dropping areas.
Parking and dropping objects pass from motion to rest and thus have a forward trajectory without a backward trajectory. False alarms caused by shadows do not have this feature, so bidirectional tracking of the detected abnormal region can further reduce false alarms and improve accuracy.
The pixel points in the image where the brightness changes drastically, and the points of maximum curvature on the image's edge curves, are called corner points; they carry much important information [
Figure
Detected corner points.
Corner of parking on Chongqing Expressway
Corner of the dropping object of Chongqing Expressway
Corner of parking in Xi’an South Second Ring Road
After acquiring the corner points of the abnormal block, tracking requires finding the positions of these corner points in adjacent frames, i.e., matching the corner points. Matching is the process of finding the location in the search area with the greatest similarity to a given template. Commonly used matching methods [
The block matching method uses a matching criterion to measure how well two small blocks match, so the matching criterion must be chosen; it directly affects tracking accuracy and the amount of computation. Common matching criteria [
Among them, the MSE and NCCF criteria require more multiplication operations and have higher time complexity, so they are rarely used. SAD requires less computation than MAD and is simple to implement, so it is widely used. Therefore, this paper uses SAD as the selection criterion for the optimal matching point.
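A minimal sketch of SAD-based block matching: the template is slid over the search area and the offset with the smallest SAD wins (an exhaustive full search; the paper does not specify its search strategy):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b)
               for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_match(template, search, bh, bw):
    """Slide a bh x bw template over the search area exhaustively and
    return the top-left offset (y, x) with the smallest SAD."""
    best, best_pos = None, (0, 0)
    for y in range(len(search) - bh + 1):
        for x in range(len(search[0]) - bw + 1):
            cand = [row[x:x + bw] for row in search[y:y + bh]]
            s = sad(template, cand)
            if best is None or s < best:
                best, best_pos = s, (y, x)
    return best_pos
```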
First, in the preliminary abnormal block, the corner point P0(x, y) is obtained by the Moravec algorithm.
Second, read the m-frame image closest to the preliminary abnormal block, and carry out backward tracking with P0 as the initial point. Find the best matching point
Third, forward tracking with P0 as the initial point. Find the best matching point
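The bidirectional-tracking decision can be sketched as follows: a true parking or dropping target has matched positions away from P0 in the preceding frames (forward trajectory) but stays at P0 in the following frames (no backward trajectory). The position lists, the threshold, and the Manhattan distance here are illustrative assumptions:

```python
def is_stopping_target(prev_positions, next_positions, p0, move_thresh):
    """Bidirectional-tracking test: the target moved in from elsewhere in the
    preceding frames (forward trajectory) and then stays at P0 in the
    following frames (no backward trajectory)."""
    moved_in = any(abs(x - p0[0]) + abs(y - p0[1]) > move_thresh
                   for x, y in prev_positions)
    stays_put = all(abs(x - p0[0]) + abs(y - p0[1]) <= move_thresh
                    for x, y in next_positions)
    return moved_in and stays_put
```

A shadow or illumination false alarm fails the first test, because backward tracking finds it at essentially the same position in earlier frames as well.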
As can be seen from Figure
The result of bidirectional tracking of the corner points in Figure
Results of bidirectional tracking of Figure
Results of bidirectional tracking of Figure
Results of bidirectional tracking of Figure
The comparison of the detected results before and after the two-way tracking is shown in Figure
Comparison of abnormal blocks detected before and after bidirectional tracking.
Before two-way tracking
After two-way tracking
The parking and dropping areas after two-way tracking contain many abnormal image blocks, which constitute the actual target area. Therefore, in order to further determine the position and size of the target area and segment the target area, the connected domain analysis of the image is performed in units of image blocks.
An image area whose pixels are adjacent in position and have the same pixel value is generally referred to as a connected domain. The process of finding and marking all connected domains in an image is called connected domain analysis. There are many methods for connected domain analysis; the two most commonly used are two-pass scanning and seed filling. A method that finds and marks all connected domains by scanning the image twice is called the two-pass scanning method [
The above are two basic connected domain analysis methods. In view of the fact that the two-pass scanning method requires two scans of the image, and a large amount of space is needed to store the equal relationship between the markers, the seed filling method is used to analyze the connected region of the abnormal region.
After abnormal block detection, the abnormality flags D1(m, n) and D2(m, n) of each image block fall into one of the following three cases:
The first is D1(m, n) = False and D2(m, n) = False, meaning the image block has neither a state change nor the trajectory feature.
The second is D1(m, n) = True and D2(m, n) = False, meaning the image block has a state change but does not meet the trajectory characteristics.
The third is D1(m, n) = True and D2(m, n) = True, meaning the image block has both a state change and the trajectory characteristics.
The image block of the second case is an abnormal area determined initially, and the image block of the third case is an abnormal area caused by the occurrence of a parking or a dropping event.
(a) The currently scanned abnormal block is taken as the seed and marked, and the upper, lower, left, and right boundaries of the connected area are initialized to the boundaries of this block; the eight image blocks adjacent to it are then scanned, and any abnormal blocks among them are pushed onto the stack. (b) Pop the top block of the stack, give it the same mark, update the four boundaries of the connected domain according to the block's position relative to the current upper, lower, left, and right boundaries, then scan its eight adjacent image blocks and push any abnormal blocks onto the stack. Repeat (b) until the stack is empty; a connected area with four known boundaries is then found, and all abnormal blocks in it are marked as True. Second, repeat step one until the end of the scan.
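The eight-neighbor seed filling described above can be sketched as a stack-based flood fill over the block grid that returns each connected region's four boundaries (an illustrative sketch of the basic version, where only abnormal blocks are pushed):

```python
def eight_neighbor_fill(abnormal):
    """Label connected regions of abnormal blocks with 8-connectivity seed
    filling; return each region's bounding box (top, bottom, left, right)."""
    h, w = len(abnormal), len(abnormal[0])
    label = [[0] * w for _ in range(h)]
    boxes, cur = [], 0
    for m in range(h):                       # scan the block grid row by row
        for n in range(w):
            if abnormal[m][n] and not label[m][n]:
                cur += 1                     # a new connected region appears
                stack = [(m, n)]
                label[m][n] = cur
                top, bottom, left, right = m, m, n, n
                while stack:
                    y, x = stack.pop()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy in (-1, 0, 1):    # the eight neighbors
                        for dx in (-1, 0, 1):
                            yy, xx = y + dy, x + dx
                            if (0 <= yy < h and 0 <= xx < w
                                    and abnormal[yy][xx] and not label[yy][xx]):
                                label[yy][xx] = cur
                                stack.append((yy, xx))
                boxes.append((top, bottom, left, right))
    return boxes
```

Each returned bounding box gives the position and size of one target area; diagonal neighbors are merged because of the 8-connectivity.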
After the scan, all connected domains in the image are found. All abnormal blocks in each connected domain carry the same mark, and the four boundaries of each connected domain are known. By treating each connected domain as a target area, the location and size of each target area can be determined, as shown in Figure
Results of connected domain analysis.
Parking
Dropping
The above method of connected domain analysis has defects in two respects. First, due to interference from illumination, vehicles, pedestrians, etc., the image blocks within an abnormal target may reach the abnormal state at different times, as shown in Figure
Status difference of different blocks in the abnormal area.
In Figure
In view of this, the abnormality flag of a detected abnormal block should be kept for a while (usually 2-3 seconds). If an image block with abnormality flags D1(m, n) = D2(m, n) = True appears around the block during this time, the block is considered to meet the adjacency condition. Second, when bidirectional tracking is used to reduce false positives caused by shadows and other disturbances, some image blocks within the abnormal target are also removed, so an originally connected area becomes nonconnected and one target becomes multiple targets. Therefore, during connected domain analysis, all image blocks with the abnormality flag D1(m, n) = True that are adjacent to the seed are pushed onto the stack (the initial seed must be an abnormal block).
According to the two shortcomings proposed in Section
Firstly, scan the image row by row in units of image blocks. If an abnormal block is scanned, a new connected region is considered to appear. (a) The currently scanned abnormal block is taken as the seed and marked, and the upper, lower, left, and right boundaries of the connected area are initialized to the boundaries of this block; its eight adjacent image blocks are then scanned, and any blocks with the abnormality flag D1(m, n) = True are pushed onto the stack. (b) Pop the top block of the stack, give it the same mark, update the four boundaries of the connected domain according to the block's position relative to the current boundaries, then scan its eight adjacent image blocks and push any blocks with D1(m, n) = True onto the stack. Repeat (b) until the stack is empty; a connected area with four known boundaries is then found, and all abnormal blocks in it are marked as True. Secondly, repeat step one until the end of the scan.
After the scan is finished, all the connected domains in the image can be acquired; likewise, each connected domain is considered as a target region, and the experimental result is shown in Figure
Improved results.
Parking
Dropping
By comparing Figure
The work in the previous section only delimits the parking and dropping area in the image and does not distinguish the target type, i.e., whether it is a parking or a dropping object. Traditional differentiation features such as area, rate of change of motion direction, and average speed have limitations. This section describes how to distinguish targets using 3D information.
Capturing images or video with a camera loses target information, which causes problems such as geometric deformation and scale change when processing images of traffic scenes. If the image can be used to recover the object in space, so that the properties of the object itself are used, these problems are eliminated. Camera calibration technology arose for this purpose: its main aim is to determine the internal and external parameters of the camera under a specific imaging model and thus establish the relationship between image pixel coordinates and world coordinates.
The camera imaging process can be described by its imaging model. In this paper, we use a linear imaging model for calibration, which can make calculations easier.
With the deepening of calibration research, many scholars have proposed representative calibration methods [
Traffic scenes contain a large number of parallel markings, and national standards specify uniform dimensions for them. These marking lines therefore make it easy both to find the vanishing points in three directions and to find a line segment of known length in the image. Accordingly, this paper employs the camera calibration method based on vanishing points proposed by Zheng Yuan et al.
Since the Direct Linear Transformation (DLT) [
The image captured by a camera is a projection from three-dimensional space onto a two-dimensional plane, known mathematically as perspective mapping. According to the camera imaging model, the closer an object is to the camera, the larger it appears, and during projection the image is deformed and physical size information is lost.
In this paper, the inverse perspective mapping method [
If a plane is determined in advance in three-dimensional space, the points on the plane are in one-to-one correspondence with those on the two-dimensional image; this plane is called a back-projection plane. The data in the two-dimensional image can then be mapped onto the back-projection plane to obtain a map that contains the information of that plane. The closer the back-projection plane is to the surface of the object [
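Mapping an image point onto a back-projection plane amounts to applying a plane-to-plane homography. A minimal sketch, assuming the 3x3 homography H has already been obtained from the calibration (the paper derives it from vanishing points, which is omitted here):

```python
def back_project(h, u, v):
    """Map image pixel (u, v) onto the back-projection plane with a 3x3
    homography H, using homogeneous coordinates and a final divide by w."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w   # normalized plane coordinates
```

Because the correspondence is one-to-one, distances measured between back-projected points on the plane are real physical distances, which is what allows the target's true size to be recovered.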
We set the back-projection surface in the anomalous area and get the true size of the abnormal target through the corresponding back-projection map. If the size is close to the size of the real vehicle [
A rough three-dimensional model frame.
Van
Car
SUV
Extraction results of the chassis line.
Location of the back-projection plane of the front and left views.
Van
Car
SUV
Location of the back-projection plane of the front and left views of truck.
The inverse projection of the main view of van.
The inverse projection of the main view of truck.
The inverse projection of the left view of van.
The inverse projection of the left view of truck.
Three-view color image of car.
Three-view color image of SUV.
Extraction result of the line in the main view of van.
Extraction result of the line in the main view of truck.
Extraction result of the line in the left view of van.
Extraction result of the line in the left view of truck.
Extraction result of the line of car.
Extraction result of the line of SUV.
Inverse projection of the top view of van.
Extraction result of the line in the top view of van.
Test results for white vehicle size.
| Vehicle | Object width (cm) | Object length (cm) | Object height (cm) |
|---|---|---|---|
| Van | 188 | 405 | 197 |
| Car | 175 | 405 | 147 |
| SUV | 169 | 390 | 168 |
| Truck | 254 | 1843 | 321 |
Table
There are three inverse projective planes involved in this paper, which are the tail (or head), side and top of the vehicle, as shown in Figure
As can be seen from the Figure
In this application, the linear segments of the vehicle body must be detected. However, the smooth styling of modern vehicle manufacturing makes the originally distinct straight line segments on the vehicle contour smooth and inconspicuous. Conventional straight-line extraction methods therefore cannot link partially broken edges and gently curved segments, and the detected lines are prone to breaks.
This paper designs an edge-coding method based on the back-projection image: a pixel on an edge is encoded as 1, and a pixel not on an edge is encoded as -1. The extracted line is then the segment whose code sum is the largest along the search direction. An example of edge encoding is shown in Figure
Edge coding.
Construction of the inverse projection.
The advantage is that, without time-consuming and complex algorithms such as the Hough transform, the line is corrected using the back-projection image, so the line detection algorithm only needs to consider vertical or horizontal lines. This reduces the complexity of the algorithm, greatly improves the accuracy of the detected lines, and tolerates lines with small curvature and local breaks. The algorithm flow is shown in Figure
Algorithm for detecting straight lines based on the inverse projection.
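The largest-code-sum segment along one scan direction is a maximum-sum subarray problem, so it can be found in a single pass with Kadane's algorithm (an illustrative sketch of the edge-coding idea; the +1/-1 coding follows the text, while the use of Kadane's algorithm is an assumption):

```python
def best_segment(edge_row):
    """Encode edge pixels as +1 and non-edge pixels as -1 along one row or
    column, then return the (start, end) indices of the segment with the
    largest code sum (Kadane's maximum-sum subarray algorithm)."""
    codes = [1 if e else -1 for e in edge_row]
    best_sum, best_span = codes[0], (0, 0)
    cur_sum, cur_start = codes[0], 0
    for i in range(1, len(codes)):
        if cur_sum < 0:
            cur_sum, cur_start = codes[i], i   # restart the candidate segment
        else:
            cur_sum += codes[i]                # extend the candidate segment
        if cur_sum > best_sum:
            best_sum, best_span = cur_sum, (cur_start, i)
    return best_span
```

Because a short run of -1 codes only slightly reduces the sum, the winning segment can bridge small gaps, which gives the tolerance to locally broken lines described above.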
The algorithm proposed in the paper is implemented in VC6.0 environment [
Parking detection results in different scenarios.
| Scenario | Number of parking events | Total number of alarms | Total number of correct alarms | Recognition rate (%) | Missed detection rate (%) | False detection rate (%) | Average detection time (ms) |
|---|---|---|---|---|---|---|---|
| Xi'an South Second Ring | 38 | 39 | 37 | 97.3 | 2.63 | 5.12 | 4300 |
| Beijing Yanqing Road Section | 52 | 54 | 49 | 94.2 | 5.77 | 9.25 | 4900 |
| Chongqing Expressway | 43 | 45 | 42 | 97.7 | 2.32 | 4.44 | 4100 |
| Shanghai Outer Ring Road | 60 | 63 | 58 | 96.7 | 3.33 | 7.93 | 4800 |
| Shanghai Fuxing Road Tunnel | 26 | 28 | 25 | 96.1 | 3.84 | 10.0 | 5000 |
Detection results of the falling objects in different scenarios.
| Scenario | Number of dropping events | Total number of alarms | Total number of correct alarms | Recognition rate (%) | Missed detection rate (%) | False detection rate (%) | Average detection time (ms) |
|---|---|---|---|---|---|---|---|
| Xi'an South Second Ring | 14 | 15 | 13 | 92.8 | 9.09 | 13.3 | 4300 |
| Beijing Yanqing Road Section | 19 | 22 | 18 | 94.7 | 7.14 | 18.2 | 4900 |
| Chongqing Expressway | 16 | 18 | 15 | 93.7 | 6.25 | 16.7 | 4100 |
| Shanghai Outer Ring Road | 15 | 17 | 14 | 93.3 | 6.67 | 17.6 | 4800 |
| Shanghai Fuxing Road Tunnel | 13 | 14 | 12 | 92.3 | 7.69 | 14.3 | 5000 |
It can be seen from Tables
The existing algorithms for detecting parking and dropping objects generally have two shortcomings: significant dependence on background and inaccurate distinction between parking and dropping objects.
In view of the above deficiencies, we study two aspects: the detection of stationary targets and the differentiation of target types. First, based on status change, when an object that did not exist before appears in the traffic scene, the abnormal region is preliminarily determined; the region is then bidirectionally tracked to confirm targets that pass from motion to rest, and finally the eight-neighbor seed filling method is used to segment the target area. Dependence on the background is thus reduced, and only target areas whose state changes need to be tracked, which significantly reduces computation. Second, a method of distinguishing the target type using the target's three-dimensional information is proposed. Firstly, the difference between the projections of feature points is used to determine the relative height between them, and this height distinguishes parking from dropping objects. Secondly, 3D wireframe models of common vehicle types are established, and parking and dropping objects are distinguished by matching the projection of the wireframe model on the two-dimensional image against the target area. Thirdly, by establishing inverse projection planes at different heights, the length, width, and height of the target are obtained, and parking and dropping objects are distinguished using known vehicle sizes. This method of distinguishing target types with three-dimensional information not only accurately distinguishes parking from dropping objects but also roughly classifies the models of stationary vehicles.
By testing in a large number of different traffic scenarios [
The measured data used to support the findings of this study have not been made available because it belongs to the local authorities of traffic control and management in Shanghai, Xi’an, and Chongqing, China.
The authors declare that all data in this article are true and reliable and that there are no conflicts of interest regarding the publication of this paper.
The work was funded by the Project of Shaanxi Provincial Science and Technology Program (Grant no. 2014JM8351), the Fundamental Research Funds for the Central Universities (Grants nos. 2013G1241109 and 300102248305), and the National Natural Science Foundation of China (Grant no. 61501058). Thanks are due to Liting Sun for the great work she has done.