The Line Pressure Detection for Autonomous Vehicles Based on Deep Learning

Vehicle line pressure detection is an important function of an intelligent transportation system. At present, line pressure detection algorithms mainly fall into two groups: algorithms based on traditional features and models, and algorithms based on deep learning. However, these algorithms have shortcomings such as low detection accuracy or reliance on specific scenarios. In this regard, this paper proposes a fast and accurate vehicle line detection algorithm based on deep learning for vehicle images. The algorithm builds a GoogleNet-based FCN semantic segmentation network and adds a BN layer, 1 × 1 convolution, and an FPN structure to improve the segmentation effect of the GoogleNet-FCN network and reduce network parameters. The MobileNet-SSD network structure (no pretrained model) is used for vehicle detection. According to the relationship between the receptive field and the anchor, combined with the specific data, the prediction branches of the network and the Default Boxes on those branches are modified, and the FPN structure is added for feature fusion to form the final improved MobileNet-SSD network. The experimental results show that the algorithm takes an average of 67.8 ms per frame, the detection rate of line pressing for a vehicle is 96.6%, and the deep learning models are 25.5M and 19.2M, respectively. The experimental results verify the effectiveness and practicality of the detection algorithm proposed in this paper.


Introduction
With the development of science and technology, automobiles have become more and more common in our lives, and the number of traffic accidents caused by automobiles has also increased rapidly. The main reasons for accidents include the bad habits and illegal operations of drivers. Among them, the line pressing of a vehicle (LPV) is the most common illegal driving behavior and accounts for the highest proportion of traffic accident mortality. The existing traffic management system [1] uses surveillance cameras to monitor LPV, which can record the violations of drivers and optimize the distribution of traffic flow. In addition, it has a deterrent effect on drivers.
At present, vision-based algorithms for lane and vehicle detection mainly use either fixed cameras or vehicle-mounted cameras. Among the fixed camera methods, literature [2] uses the wavelet transform to segment the yellow line area in an image without vehicles. The vision processor then separates the same area from the subsequently captured real-time data and compares it with the template to determine whether LPV occurs. Literature [3] is based on the statistical method of gray frame difference, which detects LPV mainly by calculating the difference between the average pixel values of two adjacent frames in the region of interest of the image. If the difference is greater than the set threshold, it is judged as a broken lane line. Otherwise, it is judged as an LPV. Literature [4] observes that if the yellow line is pressed by a vehicle, the yellow line will have a gap. By analyzing the contour characteristics of the object that produces the gap, it judges whether the object matches a vehicle contour, finds the centroid position of the vehicle from the contour, and then judges whether there is an LPV based on the distance between the vehicle centroid and the lane line. On the other hand, among the mobile camera methods, literature [5] collects a bird's-eye view of the road by installing cameras directly above the four wheels. If there is an intersection between the lane line and the wheel tangent line, it is judged as an LPV. Literature [6] uses the camera to select one frame of picture data every 20 minutes and extracts the color histograms around the yellow line of the picture as a template. It then extracts a picture every 5 frames from the subsequent real-time data, matches the information at the same position against the template, and compares the similarity between the two to determine whether there is an LPV.
Literature [7] expands the detected vehicle target in proportion, estimates the place where the wheel touches the ground, finds the tangent line between the vehicle and the ground, and judges whether it intersects the lane line. If there is an intersection, an LPV is determined. In literature [8], a data set for pressure line detection is first constructed, vehicle and lane line detection is then completed with an image semantic segmentation method, the front and rear wheel positions of the vehicle are obtained by front and rear wheel estimation, and finally the vehicle pressure line judgment is made by comparing the positions of the wheels and the lane line. Literature [9] proposes to use the angles formed by the two lane lines and the vehicle during driving: if the maximum angle formed by the vehicle and the lane lines on both sides is greater than the set threshold, line pressure is determined. Literature [10] uses object detection and semantic segmentation techniques to obtain vehicle and lane information and uses a lightweight spatial convolutional network module to achieve high-precision segmentation of lane lines. The vehicle and lane line information is fused to detect the line pressing behavior, and solid and dotted lines are distinguished using pixel-based statistics in multiple areas of interest. In literature [11], an automatic verification method for vehicle pressure line violations based on CNN and geometric projection is proposed. That is, in the three-dimensional coordinate system, the ground is taken as the x-y plane, the projection frame of the vehicle onto the ground is fitted, and line pressing is finally judged according to the pose relationship between the lane line and the projection frame.
In general, the current algorithms for line pressing detection have the following issues: (1) fixed cameras are costly and therefore cannot cover a large area; (2) most of the current algorithms make judgments based on template matching, which easily causes false detection and missed detection; (3) the algorithms based on traditional features and models perform better in speed and accuracy in specific environments but have poor robustness on complex roads; (4) the algorithms based on deep learning adapt well to the environment, but they are slow and cannot detect in real time; (5) for algorithms based on CNN and geometric projection, it is difficult to automatically verify the line pressing behavior of large vehicles.
Motivated by the aforementioned discussions, this article uses mobile devices to collect in-car video, designs a deep learning algorithm with a small number of model parameters and high accuracy to extract information, and finally determines from the lane line and vehicle position whether the line is pressed. The main contributions of this paper are summarized as follows: (1) Considering that deep learning detection algorithms need a large amount of data, this paper builds the data set required for the experiment. The constructed data are preannotated, which saves a lot of manual time, and the annotations are then corrected with an annotation tool. (2) A GoogleNet-FCN semantic segmentation network is constructed to extract lane line and wheel-line information. Morphological filtering and a connected area threshold method are applied to the segmentation output to improve the segmentation effect, and the model reaches an MIoU of 66.2% on the test set. This paper also improves the MobileNet-SSD network by modifying the prediction branches and anchor values according to the specific data set and by performing feature fusion. As a result, the precision of the network on the data set is 95.7%, and the recall rate is 94.6%. (3) This paper establishes a pressure line algorithm model and uses the information extracted by the deep learning models to judge the line pressure of the vehicle. According to the position of the intersection of the lane line and the wheel-line, a threshold is set to determine whether the line is pressed, and the best threshold is found according to the evaluation results of different thresholds on the line pressure data. Finally, the performance of the algorithm is analyzed, which shows that the average time of the algorithm is 67.8 ms and the accuracy rate is 96.6%. This indicates that the target vehicle pressure line detection algorithm in this experiment has good accuracy and robustness and can also meet real-time requirements.
The remaining part of this paper is organized as follows. In Section 2, the extraction scheme of vehicle image information based on deep learning is proposed. The performance analysis of vehicle line pressing and the simulation results validating the designed method are presented in Section 3. Finally, the conclusion is given in Section 4.

Extraction of Vehicle Image Information Based on Deep Learning
Deep learning is a new direction in machine learning. Due to its strong learning ability, strong adaptability, and excellent portability, it performs well in many fields, especially computer vision-related tasks [12]. In this paper, lightweight models are used to extract useful information from experimental data. A GoogleNet-based fully convolutional network semantic segmentation model is built on the segmentation data to extract lane lines and other information. The vehicle detection data set uses an improved MobileNet-SSD target detection model.

GoogleNet-FCN.
GoogleNet-FCN is a fully convolutional network (FCN) based on the GoogleNet classification network, which is used in this experiment to detect lane lines and wheel-lines in the semantic segmentation data set [13].
The Inception structure in GoogleNet has a small number of parameters and a wide width, balancing speed and effect. This experiment improves it further. The improved Inception structure, visualized in Netscope, is shown in Figure 1.
From Figure 1, it can be observed that the improved Inception structure of this experiment replaces the 5 × 5 convolutional layer with two 3 × 3 convolutional layers, which can greatly reduce network parameters and enhance network nonlinearity.
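The parameter saving from this replacement can be checked with simple arithmetic. The sketch below counts the learnable parameters of one 5 × 5 convolution against two stacked 3 × 3 convolutions, which cover the same 5 × 5 receptive field. The channel count of 32 and the helper name `conv_params` are hypothetical and are not taken from the network in this paper.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Learnable parameters of one kernel x kernel convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)

# Hypothetical channel count, chosen only for illustration.
channels = 32

# One 5 x 5 convolution.
single_5x5 = conv_params(5, channels, channels)

# Two stacked 3 x 3 convolutions with the same input/output width.
stacked_3x3 = conv_params(3, channels, channels) + conv_params(3, channels, channels)

print(single_5x5, stacked_3x3, round(stacked_3x3 / single_5x5, 3))
```

With equal channel widths the stacked version needs roughly 72% of the parameters, and the extra activation between the two 3 × 3 layers is where the added nonlinearity comes from.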
Figure 2 shows the convergence effect of the BN layer by comparing the loss curve of the lane line segmentation network with the BN layer against the loss curve without it. The loss in the segmentation network is softmax loss, the batch size is set to 64, and training runs for 150,000 iterations in total. It can be seen from the figure that without the BN layer, the loss fluctuates more and convergence takes longer. Therefore, the network adds a BN (Batch Normalization) layer after each convolution, which addresses the nonuniform scales of change between feature variables, so that the data become stable, gradient dispersion is reduced, and network convergence is accelerated.
The entire model is an FCN model. In the decoder part of the model, the last convolutional layer of the encoder is upsampled, as shown in the black boxes in Figure 3. The five black boxes are all upsampling processes. The entire network structure is FCN32s, which can improve the segmentation effect on small targets. The blue part in Figure 3 belongs to the last convolutional layer of the decoder. In each black box, a 1 × 1 convolutional layer is introduced before the upsampling result channels are added, which reduces the number of channels and greatly reduces the amount of calculation. The green part of Figure 3 shows the addition of feature channels. On the whole, the green boxes integrate network feature maps of different depths. This is the Feature Pyramid Networks (FPN) feature fusion structure commonly used in deep learning, which better retains the target position and edge information and makes the segmentation effect more refined.
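As a rough illustration of what a BN layer computes during training, the sketch below normalizes the values of one channel over a batch in plain Python. The function name `batch_norm` and the sample values are hypothetical; `gamma`, `beta`, and `eps` follow the usual Batch Normalization formulation and are not parameters reported in this paper.

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time batch normalization of one channel over a batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased batch variance
    scale = gamma / (var + eps) ** 0.5
    return [scale * (v - mean) + beta for v in values]

# Hypothetical activations with very different scales.
activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)

out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
print(normalized, out_mean, out_var)
```

After normalization the channel has approximately zero mean and unit variance regardless of the input scale, which is the stabilizing effect the paragraph above attributes to the BN layer.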
After repeated training and testing to modify the network, the brief structure parameters of GoogleNet-FCN are shown in Table 1. Table 1 gives the final parameters of the GoogleNet-FCN network model; the 1 × 1 convolutions, BN layers, and FPN structure are not described in detail, and only the inputs and outputs of the key nodes of the network are summarized. In the table, conv3 indicates that the size of the convolution kernel is 3 × 3, and Inception indicates the input to the output of the Inception structure. The final output size is 480 × 224 × 4, consistent with the size of the original input image. The 4 represents the 3 target categories of this experiment and one background category, which is used for loss calculation or output during training. The overall number of network parameters is small; the BN layer is added to reduce overfitting and speed up convergence, and the FPN structure is used to fuse features to improve the detection effect. At the same time, the network is FCN32s to improve the segmentation effect on small targets.

Lane Line Segmentation Effect and Postprocessing.
The experiment was carried out under the Linux operating system. The corresponding Caffe environment for GoogleNet-FCN was built, the segmentation data set was used for training, and the converged model was used to test the data. The original picture and the predicted result are as follows.
It can be seen from Figure 4 that the overall segmentation of lane line and wheel-line targets is good, and the edge contours are segmented. Analyzing the segmentation results on the whole test set shows that some edges are slightly uneven, background pixels appear inside the target connected areas, and a small number of target pixels appear in the background. Because the edges of the lane line and the wheel-line are close to straight lines, these problems are handled in two steps: first, a morphological closing operation is used to smooth the segmented target edges; second, a connected region threshold method removes the pixel regions whose area is less than the threshold. Figure 5 shows the model prediction and the postprocessing result. It can be seen that the target edge segmentation after postprocessing is more detailed and more in line with the target shape, and the small target pixel regions incorrectly detected in the background of the initial segmentation are also reduced.
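The connected region threshold step can be sketched as follows. This is a minimal pure-Python illustration on a binary mask for a single class, assuming 4-connectivity; the function name `remove_small_regions`, the toy mask, and the area threshold are hypothetical, and the closing operation that precedes this step is not shown.

```python
from collections import deque

def remove_small_regions(mask, min_area):
    """Return a copy of a binary mask with 4-connected regions below min_area cleared."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions smaller than the threshold are treated as noise.
            if len(region) < min_area:
                for ry, rx in region:
                    out[ry][rx] = 0
    return out

# Hypothetical mask: one 4-pixel target region and one isolated noise pixel.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
cleaned = remove_small_regions(mask, min_area=3)
print(cleaned)
```

In practice an image library would normally be used for both the closing and the connected component labeling; the explicit search here only makes the area test visible.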

Model Evaluation Results.
Semantic segmentation is pixel-level segmentation. Commonly used evaluation indicators include Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), and Mean Intersection over Union (MIoU). In this experiment, the evaluation code is written as needed and run on the experimental segmentation test set. The final results are shown in Table 2.
Suppose that there are n + 1 classes in the segmented data set, where class 0 denotes the background and n is the number of target categories. $p_{ii}$ is the number of pixels whose real class is i and whose predicted class is also i; $p_{ij}$ is the number of pixels whose real class is i but whose predicted class is j.
Pixel Accuracy (PA) represents the proportion of all pixels in the image that are correctly classified. The calculation formula is expressed as follows:
$$\mathrm{PA} = \frac{\sum_{i=0}^{n} p_{ii}}{\sum_{i=0}^{n} \sum_{j=0}^{n} p_{ij}} \tag{1}$$
Mean Pixel Accuracy (MPA) computes, for each category, the proportion of its pixels that are correctly classified, and then averages over the categories:
$$\mathrm{MPA} = \frac{1}{n+1} \sum_{i=0}^{n} \frac{p_{ii}}{\sum_{j=0}^{n} p_{ij}} \tag{2}$$
From Table 2, we can obtain the pixel accuracy results of the network on the experimental data. The background has the largest proportion in the image, and its segmentation accuracy is also the highest. Although PA_wheel_line_v has the lowest accuracy at 0.668, it can still meet the application requirements.
Mean Intersection over Union (MIoU) represents the average, over categories, of the intersection over union between the prediction result and the real label. The calculation formula is defined as follows:
$$\mathrm{MIoU} = \frac{1}{n+1} \sum_{i=0}^{n} \frac{p_{ii}}{\sum_{j=0}^{n} p_{ij} + \sum_{j=0}^{n} p_{ji} - p_{ii}}$$
The IoU evaluation indicators in Table 3 are more widely used for segmentation models. The MIoU in this experiment is 0.662. The IoU of the background is the largest, as the background is easier to distinguish from the other categories. The lowest IoU is 0.505, indicating that nearly 0.7 of the prediction results for this category are correct pixels, which satisfies the requirements of the experiment in this paper.
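The three indicators can all be computed from one confusion matrix. The sketch below assumes the standard definitions of PA, MPA, and MIoU, with `conf[i][j]` counting pixels whose real class is i and whose predicted class is j; the function name and the two-class toy matrix are hypothetical, and every class is assumed to occur at least once so that no denominator is zero.

```python
def segmentation_metrics(conf):
    """Return (PA, MPA, MIoU) for a square confusion matrix conf[real][predicted]."""
    k = len(conf)
    total = sum(sum(row) for row in conf)
    diag = [conf[i][i] for i in range(k)]
    real_count = [sum(conf[i]) for i in range(k)]                       # pixels per real class
    pred_count = [sum(conf[i][j] for i in range(k)) for j in range(k)]  # pixels per predicted class

    pa = sum(diag) / total
    mpa = sum(diag[i] / real_count[i] for i in range(k)) / k
    miou = sum(diag[i] / (real_count[i] + pred_count[i] - diag[i]) for i in range(k)) / k
    return pa, mpa, miou

# Hypothetical 2-class example (background and one target class).
conf = [
    [8, 2],
    [1, 9],
]
pa, mpa, miou = segmentation_metrics(conf)
print(pa, mpa, miou)
```

For this toy matrix the per-class IoU values are 8/11 and 9/12, so MIoU is lower than PA even though both classes are classified with similar accuracy, which is why IoU is the stricter indicator.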

Vehicle Detection Algorithm Based on Improved MobileNet-SSD Network

Improvement of MobileNet-SSD Network.
MobileNet-SSD is a faster network with fewer parameters, designed on the basis of the SSD network [14]. Its detection part is the same as that of SSD: it uses the convolutional features of different stages for multiscale prediction. The difference between MobileNet-SSD and SSD is that MobileNet-SSD replaces the base network with the MobileNet [15] structure. All convolutions in the MobileNet-SSD convolution structure are depthwise separable convolutions, and a BN layer and a ReLU layer follow each convolution to optimize the model. Aspect ratios are used to change the shape of the anchor into a rectangle of the corresponding proportion, which helps the model find positive and negative samples for training and, at the same time, gives a higher degree of matching between the target frame and the label during target regression. A well-designed Default Box benefits the detection capability of the model. How to set the anchor size for different data and how to choose the corresponding feature layer for anchor placement are the key points for model improvement.
The Anchor, an important parameter in target detection, transforms the detection problem into whether there is a recognized target in a fixed frame and how far the target frame deviates from the fixed frame. A well-designed Anchor helps the model find the difference between the characteristics of positive and negative samples, and the target box and the label match better during target regression. The setting of the Anchor size in the network is related to the Receptive Field (RF) of each convolutional layer and to the actual data. The receptive field represents the size of the region of the input image onto which a pixel of a feature map in the network is mapped.
Within the range mapped by the receptive field, pixels at different positions differ in importance: the closer a pixel is to the center, the greater its impact on the RF. Owing to the characteristics of convolution, pixels closer to the center take part in more convolutions, so the overall distribution of importance is similar to a Gaussian distribution, and the output of the receptive field is essentially concentrated in the central area, which is the Effective Receptive Field (ERF) [16]. Each pixel on the feature map of a convolutional layer corresponds to a receptive field. The pixel extracts the features of the corresponding area of the theoretical receptive field, but the final response range of each layer to the output of the feature map is actually the ERF. The effective receptive field area is related to the actual size of the target, so the size of the target in the image can be used to set the anchor size. Following the anchor size settings of classic networks, the ratio of anchor to RF was calculated and explored with anchors set in all layers of the network. Nine sets of comparative experiments were set up with the ratio of anchor to RF ranging from 0.1 m to 0.9 m, and the networks were trained and tested on multiple sets of data. The ratio of anchor to RF was finally found to be best in the range of 0.2 m-0.3 m. The most suitable anchor value for a layer can therefore be calculated from the RF size of that convolutional layer. The specific Anchor value is related to the detection target in the data set. In this experiment, vehicles are detected, and the anchor is the size of the vehicle target frame. This experiment uses a clustering algorithm to cluster all target sizes and obtain representative sets of data as anchor values.
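To make the link between receptive field and anchor size concrete, the sketch below computes the theoretical receptive field of a stack of convolutions with the standard recurrence (the receptive field grows by (kernel - 1) times the accumulated stride) and then derives an anchor range from the 0.2 to 0.3 ratio reported above. The layer list is hypothetical and is not the MobileNet-SSD configuration; `receptive_fields` and `anchor_range` are illustrative names.

```python
def receptive_fields(layers):
    """Theoretical receptive field after each (kernel, stride) layer, in input pixels."""
    rf, jump, result = 1, 1, []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # growth depends on the accumulated stride so far
        jump *= stride             # distance in input pixels between neighboring outputs
        result.append(rf)
    return result

def anchor_range(rf, low=0.2, high=0.3):
    """Anchor sizes suggested by an anchor-to-RF ratio between low and high."""
    return low * rf, high * rf

# Hypothetical stack of 3 x 3 convolutions with alternating strides.
layers = [(3, 2), (3, 1), (3, 2), (3, 1)]
rfs = receptive_fields(layers)
smallest, largest = anchor_range(rfs[-1])
print(rfs, smallest, largest)
```

A prediction branch would then be attached to the layer whose suggested range contains the clustered target size, which is the selection procedure the text describes.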
Furthermore, the clustering method of YOLOv3 was selected, which uses the IoU between a box and the actual target box as the clustering criterion, so that the clustering result does not depend too much on the size of the target box and the clustered data are more balanced. This clustering method is used to cluster the vehicle detection data set, and 9 sets of coordinates are obtained by clustering. The cluster coordinates are converted into the Default Box sizes and aspect ratios required by the MobileNet-SSD network.
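A minimal sketch of IoU-based k-means in the style used by the YOLO family is given below. Boxes are reduced to (width, height) pairs and compared as if they shared a corner, each box is assigned to the centroid with the highest IoU, and centroids are updated with the mean of their members. The seeding rule, the mean update, the toy boxes, and k = 2 are assumptions made for a deterministic illustration; the paper clusters its own data set into 9 groups.

```python
def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_iou(boxes, k, iters=100):
    """Cluster (width, height) boxes using IoU as the similarity measure."""
    # Deterministic seeding for this sketch: spread seeds over boxes sorted by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda c: iou_wh(b, centroids[c]))
            clusters[best].append(b)
        updated = []
        for c, members in enumerate(clusters):
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids, key=lambda b: b[0] * b[1])

# Hypothetical target sizes: three small and three large vehicles.
boxes = [(10, 8), (12, 9), (11, 10), (50, 40), (55, 42), (48, 45)]
anchors = kmeans_iou(boxes, k=2)
aspect_ratios = [w / h for w, h in anchors]
print(anchors, aspect_ratios)
```

Because IoU is a ratio, a 2-pixel error on a small box costs as much as a proportionally larger error on a big box, which is the balancing effect mentioned above. Each resulting (width, height) pair gives a box size and an aspect ratio.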
According to the basic structure of MobileNet-SSD, the theoretical receptive field size of each convolutional layer is calculated, and according to the ratio of the anchor to the receptive field, the suitable anchor range for each layer can be found. The appropriate convolutional layers are then selected to predict the Default Box values calculated from the target box clustering. The prediction convolution branches of the improved MobileNet-SSD network and their related parameters are shown in Table 4.
There are still 6 prediction branches in the network, but they are better matched to the vehicles in the vehicle data set. In total, the branches of the network produce 6219 prediction boxes. The modified prediction branches sit at shallow convolution depths, especially the prediction branches corresponding to the conv6 and conv8 convolutional layers. The network at these points may be too shallow to have extracted features that can be used for detection. Using the FPN structure, the deep feature layers and the shallow feature layers can be merged, so that the location and detail features of the shallow network and the deep semantic features can both be used for network prediction. The final improved MobileNet-SSD network structure is shown in Figure 6.
As can be seen from the figure, upsampling is performed at conv13, the number of relevant convolution channels is changed, and finally, feature fusion is performed with the features of the conv10 layer. After conv10 to conv8 and conv8 to conv6 adopt 1 × 1 convolution and deep convolution operations, the FPN structure is formed after fusion, and the fused features are trained and predicted.

Model Evaluation Results.
In the experiment, the original MobileNet-SSD network and the improved network were trained separately, and the trained models were tested on the test set of the vehicle detection data. The test comparison results are shown in Figure 7.
The evaluation indexes of target detection are generally precision (denoted as P) and recall (denoted as R), which describe the error relationship between the detected targets and the real labels. The terms related to precision and recall are TP, FP, FN, and TN. Precision and recall are calculated as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
It can be seen from Figure 7 that the prediction effect of the improved network model is significantly better than that before the improvement. The improved network is more suitable for this data set, both in the fit between the prediction frame and the target and in the prediction precision.
Journal of Advanced Transportation
The experiment draws a comparison chart of the P-R curves of the network before and after the improvement, as shown in Figure 8. The red curve in the figure is the P-R curve of the improved network, and the blue curve is the P-R curve of the original network. It can be clearly seen that the red curve lies above the blue curve. The larger the area enclosed by the P-R curve and the coordinate axes, the better the detection ability of the network, so the improved network performs better on the experimental data set.
In the experiment, a prediction is counted as a true positive sample when the IoU between the predicted target frame and the label is greater than 0.5 and the predicted category is consistent with the label. The conf value is the predicted probability of belonging to the category. With this value set to 0.5, the vehicle target detection data set is evaluated; the total number of vehicles in the test set is 18916. At this setting, the TP is 17801, the FP is 800, and the FN is 996. The corresponding precision is 0.957, and the recall rate is 0.946, which meets the test availability standard.
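Precision and recall follow directly from the TP, FP, and FN counts. The sketch below applies the standard definitions to the counts reported above; the function name `precision_recall` is illustrative.

```python
def precision_recall(tp, fp, fn):
    """Standard detection precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp)  # share of predicted vehicles that are real vehicles
    recall = tp / (tp + fn)     # share of real vehicles that were detected
    return precision, recall

# Counts reported for the vehicle detection test set at conf = 0.5.
p, r = precision_recall(tp=17801, fp=800, fn=996)
print(f"precision={p:.3f} recall={r:.3f}")
```

With these counts the precision evaluates to about 0.957, matching the reported value, and the recall to about 0.947, close to the reported 0.946.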

Establishment of the Vehicle Line Model.
The judgment of the vehicle pressure line is based on whether the entire car and the lane line intersect in three-dimensional space, but the images collected by the vehicle camera belong to two-dimensional space. Therefore, how to use the two-dimensional image to judge the vehicle pressure line is the key to the experiment. The experiment uses a semantic segmentation network to segment the lane line and the wheel-line of the vehicle ahead captured by the vehicle camera (the blue line and the red line in Figure 4 pressed. The specific process is shown in Figure 9.
As can be seen from the figure, the whole process is mainly divided into three steps. The first step is the straight line fitting of the lane line and the wheel-line, the second step is the judgment of wheel-line ownership, and the third step is the judgment of the vehicle pressure line.

Lane Line Fitting.
The lane lines are fitted using the postprocessed connected regions of the semantic segmentation model. When the detected vehicle presses the line, the distance between the recorder and the detected vehicle is relatively short, so the lane line can be approximated as a straight line, and the wheel-line is a straight line in reality. In this paper, fitting is performed on the convex hull of each region after postprocessing. This method greatly reduces the number of points to be fitted. Fitting adopts the improved probabilistic Hough transform [17,18] to fit the straight edges. From the two straight edges of the same lane line, a line is drawn through their center points to represent the middle line of the lane line.

Judgment of Wheel-Line Ownership.
The wheel-line attribution judgment determines whether a wheel-line belongs to the vehicle target frame detected in the current image, which prevents other objects with similar characteristics from interfering with the experimental results. After vehicle target detection is run on the image, the target frame of the vehicle ahead is predicted, so it is necessary to match the best wheel-line-h line (red line in Figure 4(b), defined as the line connecting the tangent points between the two front or rear wheels of the target vehicle and the ground) and wheel-line-v line (blue line in Figure 4 Through the position of the intersection T in the line segment AB, it is determined whether the target vehicle is pressing the line. The relational expression is as follows:
$$\frac{T_x G_x}{A_x B_x} < \theta$$
In the formula, $T_x G_x$ represents the distance of the line segment TG along the X axis, and $A_x B_x$ represents the distance of the line segment AB along the X axis; θ is a set threshold on the ratio between them. Only when the ratio of the distance TG from the intersection to the center point to the line segment AB is less than θ is the vehicle judged to be pressing the line; otherwise, the line is not pressed. In the same way, line pressing can be judged for the wheel-line-v.
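The decision rule can be sketched as follows, under several assumptions that are not confirmed by the text: the center point G is taken as the midpoint of the wheel-line segment AB, the intersection T must fall between A and B horizontally, and distances are measured along the X axis of the image. The function names and the pixel coordinates are hypothetical.

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines p1-p2 and p3-p4, or None if they are parallel."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return px, py

def is_line_pressed(a, b, lane_p, lane_q, theta):
    """Judge line pressing from the wheel-line AB and a lane line through lane_p and lane_q."""
    t = line_intersection(a, b, lane_p, lane_q)
    if t is None or a[0] == b[0]:
        return False
    low, high = sorted((a[0], b[0]))
    if not low <= t[0] <= high:      # assumption: T must lie on segment AB
        return False
    g_x = (a[0] + b[0]) / 2          # assumption: G is the midpoint of AB
    return abs(t[0] - g_x) / abs(a[0] - b[0]) < theta

# Hypothetical wheel-line from x = 100 to x = 200 and three vertical lane lines.
A, B = (100, 200), (200, 200)
near_center = is_line_pressed(A, B, (140, 300), (140, 100), theta=0.25)
near_edge = is_line_pressed(A, B, (190, 300), (190, 100), theta=0.25)
outside = is_line_pressed(A, B, (250, 300), (250, 100), theta=0.25)
print(near_center, near_edge, outside)
```

In this toy case the lane line crossing close to the middle of the wheel-line gives a ratio of 0.1 and is judged as pressed, while a crossing near the wheel gives 0.4 and is not.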

Evaluation Results of the Line Pressing Algorithm.
The judgment of LPV is a classification problem, and the evaluation indicators are set according to specific needs: accuracy, false detection rate, and missed detection rate. In this experiment, the accuracy rate is the ratio of the number of correctly classified images to the total number of predicted images; the false detection rate is the ratio of the number of non-line-pressing images that are falsely detected as line-pressing images to the total number of predicted images; the missed detection rate is the ratio of the number of line-pressing images that are predicted to be non-line-pressing images to the total number of predicted images. The total number of frames in the pressure line detection data set is 3415, and the number of pressure line frames is 996. The experiment is evaluated at different threshold values.
From Table 5, it can be found that when the lane line intersects the wheel-line, a large number of missed detections occur when the threshold on the distance between the intersection point and the center of the wheel-line is too small, and a large number of false detections occur when it is too large. The accuracy of the algorithm is better when the threshold is moderate. When the set threshold is equal to 0.25 m, the performance of the whole algorithm is the best. At this point, the accuracy rate reaches 96.6%, the missed detection rate is 1.67%, and the false detection rate is 1.72%, which shows that the vehicle line pressing algorithm in this paper is accurate and reliable.
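The three indicators can be computed from per-frame labels as sketched below. The sketch takes a false detection to be a non-pressing frame predicted as pressing and a missed detection to be a pressing frame predicted as non-pressing, with all three rates normalized by the total number of frames. The function name and the five-frame example are hypothetical.

```python
def lpv_rates(y_true, y_pred):
    """Return (accuracy, false detection rate, missed detection rate); 1 = line pressed, 0 = not pressed."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    false_detections = sum(1 for t, p in pairs if t == 0 and p == 1)
    missed_detections = sum(1 for t, p in pairs if t == 1 and p == 0)
    return correct / n, false_detections / n, missed_detections / n

# Hypothetical labels for five frames.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0]
accuracy, false_rate, missed_rate = lpv_rates(y_true, y_pred)
print(accuracy, false_rate, missed_rate)
```

Because every frame is either correct, a false detection, or a missed detection, the three rates sum to one, which is consistent with the reported 96.6%, 1.72%, and 1.67% up to rounding.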

System Performance Analysis.
The algorithm is tested in an environment with a Core i7 CPU and an RTX2070 graphics card. The performance of the algorithm is related to the experimental environment, but the algorithm itself is more important. This experiment is mainly divided into three modules: the segmentation network module, the target detection module, and the pressure line judgment module. The data in Table 6 are the average times, in ms, for testing all images in the respective test sets. The complete pressure line detection algorithm takes 67.8 ms, which corresponds to 14.7 frames per second in the above experimental environment. For a vehicle recorder at 30 frames per second, one or two frames can be selected for detection to meet real-time requirements.
This experiment uses two deep learning models to extract features, and the model parameters are given in Table 7. Table 7 lists the sizes of the model parameters obtained after training the networks built under the Caffe framework. Both networks have a small number of parameters and can be ported to a mobile terminal for detection.
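The throughput figure follows from the per-frame latency, as the short calculation below shows. The frame stride helper is one possible way to reason about frame selection for a 30 frames per second recorder; it is an assumption for illustration and not a scheme stated in this paper.

```python
import math

def frames_per_second(latency_ms):
    """Throughput implied by an average per-frame latency in milliseconds."""
    return 1000.0 / latency_ms

def frame_stride(camera_fps, latency_ms):
    """Smallest stride s such that processing one frame in every s keeps pace with the camera."""
    return math.ceil(camera_fps * latency_ms / 1000.0)

fps = frames_per_second(67.8)
stride = frame_stride(30, 67.8)
print(round(fps, 2), stride)
```

At 67.8 ms per frame the detector sustains a little under 15 frames per second, so under this reading it would process one frame out of every three from a 30 frames per second stream.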

Conclusion
This article uses the image information obtained by the vehicle-mounted camera. Lightweight deep learning networks are used to extract useful information from the vehicle image, and a vehicle pressure line detection algorithm model is constructed to realize a fast and accurate target vehicle pressure line detection method. Experiments show that this method has good accuracy and robustness for target vehicle pressure line detection, that it can meet real-time requirements, and that the algorithm can be ported to a mobile terminal. The whole system has certain practical application value.

Data Availability
The data supporting the research results of this paper are divided into three parts: the original vehicle video image, the segmentation effect image, and the vehicle detection effect image. The image data used to support the results of this study have been deposited at https://github.com/chenjiayang-fm/data-statement.git.

Conflicts of Interest
The authors declare that they have no conflicts of interest.