A Robust Detection Method for Multilane Lines in Complex Traffic Scenes

The robustness and stability of lane detection are vital for advanced driver assistance technology and even autonomous driving technology. To meet the challenges of real-time lane detection in complex traffic scenes, a simple but robust multilane detection method is proposed in this paper. The proposed method breaks down the lane detection task into two stages, that is, a lane line detection algorithm based on instance segmentation and a lane modeling algorithm based on adaptive perspective transform. Firstly, the lane line detection algorithm based on instance segmentation is decomposed into two tasks, and a multitask network based on MobileNet is designed. This algorithm includes two parts: a lane line semantic segmentation branch and a lane line Id embedding branch. The lane line semantic segmentation branch is mainly used to obtain the segmentation results of lane pixels and reconstruct the lane line binary image. The lane line Id embedding branch mainly determines which pixels belong to the same lane line, thereby classifying different lane lines into different categories and then clustering these different categories. Secondly, the adaptive perspective transformation model is adopted. In this model, the motion information is used to accurately convert the original image into a bird's-eye view image, and then the least-squares second-order polynomial fitting is performed on the lane line pixels. Finally, experiments on the CULane dataset show that the proposed method achieves similar or better performance compared with several state-of-the-art methods. The F1 score of the proposed method on the normal test set and most challenge test sets is better than that of the other algorithms, which verifies the effectiveness of the proposed method. Field experiment results show that the proposed method has good practical application value in various complex traffic scenes.


Introduction
Vehicle and road safety has been a key issue for the communities and governments [1]. With emerging new technologies and knowledge, advanced driver assistance systems (ADAS) have been proposed to reduce road accidents and improve vehicle safety [2]. In ADAS and even autonomous driving vehicles, the main technical bottleneck is the perception problem, which has two elements: road and lane perception and obstacle detection [3].
The robustness and stability of lane detection are vital for advanced driver assistance technology and even unmanned driving technology [4]. Firstly, lane detection and tracking aid in localizing the ego-vehicle motion, which is one of the very first and primary steps in most ADAS, such as lane departure warning (LDW) and lane change assistance. Furthermore, lane detection is also able to aid other ADAS modules such as vehicle detection and driver intention perception.
At present, a large number of research results have been achieved in lane detection at home and abroad. Current lane detection methods usually contain two main steps: (1) lane feature extraction and (2) lane modeling [3,5,6]. Generally, lane detection methods can be divided into three categories: (1) traditional lane detection methods, (2) image processing methods combined with deep learning, and (3) end-to-end lane line detection methods.

The traditional lane line detection method can be divided into three steps [7,8]. Firstly, the road image data is preprocessed to remove the noise and to obtain the lane line features; then, the lane line is detected from the preprocessed image by means of feature-based or model-based methods; finally, the detection results are fitted to convert the lane lines represented by image coordinates into world coordinates. Aly [9] proposed a real-time and robust model-based detection method for urban road lane lines. Inverse perspective mapping (IPM) is used to convert the front view into a top view, and then a selective directional Gaussian filter is used to detect the lines in the image and deal with other abnormal lines. Finally, the lane line is mapped back into the front view. Bounini et al. [10] proposed a feature-based lane line detection method in a virtual simulation environment. In the initialization stage, the method integrates the Hough transform, Canny edge detection, and the Kalman filter to greatly reduce the region of interest and predict the future position of the lane line. Overall, the feature-based method has higher computational efficiency but poor robustness. Compared with the feature-based method, the model-based method is more stable, but it is difficult to implement, and its computational load is heavy.

The lane line detection method combined with deep learning [11] can extract features from various types of complex environments, which breaks through the limitations of traditional image processing methods. Amayo et al. [12] designed a target classification method based on the extracted lane line geometric elements. Firstly, a weakly supervised neural network is used to extract lane line elements under different conditions; then, a global energy optimization method is used to retrieve and cluster the lane geometry, and the corresponding semantic categories are assigned. Finally, the tracking over continuous frames is optimized to improve the stability of the classification. Song et al. [13] designed a lane line detection and classification system based on 3D vision. Firstly, the lane line is extracted from the region of interest (ROI) by the Sobel filter; then, an adaptive lane line model is designed in Hough space to further extract lane lines; finally, a convolutional neural network is used to obtain the specific classification results.
Due to the rapid development of deep learning [14][15][16], researchers prefer to use end-to-end methods to solve computer vision problems [17]. As shown in Figure 1, the initial picture is input into the model, and the desired result can be obtained through the end-to-end method. Lane line detection based on end-to-end deep learning can increase the accuracy of lane line recognition in a complex environment and simplify the transformation between the two coordinate systems, that is, between the pixel coordinate system and the world coordinate system, thus increasing the robustness of lane line detection. Li et al. [18] proposed an end-to-end system called Line-CNN for lane line detection, which uses straight-line anchors to locate the precise position of the lane line. That work considers the semantic information of the whole lane line at the global level. Van Gansbeke et al. [19] presented a method that realizes lane line detection by solving a weighted least-squares problem, which is divided into two parts. One part predicts the weight of the pixels of each lane line by using a deep convolutional network, and the other part uses a differentiable least-squares fitting module to return the parameters of the best-fitting curve of the lane line features. Generally, the end-to-end method does not need image preprocessing or manual feature extraction, and the experimental results show clear advantages in accuracy and robustness.
Although many novel end-to-end methods have been proposed for lane detection and have achieved very good performance, they are usually implemented on high-performance PCs or embedded systems, so they can use complicated algorithms and large storage space. However, automotive companies are very sensitive to hardware cost, so they prefer lane detection methods that can run on low-cost, resource-limited platforms. Meanwhile, because of the complexity and uncertainty of road conditions, there is still room for improvement in existing methods. The challenges include shadows cast by trees, other vehicles, and buildings; invisible or defective lanes; highlights; unusual lane shapes; and poor-quality lines.
To meet the challenges of real-time lane detection in complex traffic scenes, a simple but robust multilane detection method is proposed for the resource-limited automotive embedded platform in this paper. The proposed method breaks down the lane detection task into two stages, that is, a lane line detection algorithm based on instance segmentation and a lane modeling algorithm based on an adaptive perspective transform model. The main contributions can be listed as follows:

(1) Multilane line detection is considered as an instance segmentation problem in this paper, and a deep learning network is designed to solve the problems of an uncertain number of lanes and lane changes. The lane detection problem is decomposed into two tasks, that is, a lane line semantic segmentation branch and a lane line Id embedding branch, and a multitask network based on MobileNet [20] is designed to shorten the forward propagation calculation time without loss of detection accuracy. In MobileNet, a convolution method called depthwise separable convolution is used to replace traditional convolution methods to reduce the network weight parameters. By sharing the first four encoding stages between the two tasks, the calculation speed and the accuracy of lane line segmentation are improved. Meanwhile, by fully considering the semantic characteristics of the encoding and decoding process, the strategy of fusing the corresponding encoded information during decoding is adopted to improve the accuracy of lane segmentation.

(2) An adaptive perspective transformation model is designed to overcome the disadvantage that the conventional perspective transformation model is only suitable for smooth roads: with the conventional model, the image is distorted when the vehicle is driving on an uneven road or when the pitch angle of the camera changes due to turbulence. The adaptive perspective transformation model overcomes the deficiency of the transformation matrix with fixed parameters in the traditional model and can realize the accurate conversion from the camera image to the bird's-eye view under camera motion so as to improve the robustness of lane line fitting.

The remainder of the paper proceeds as follows. The lane detection method is given in Section 2. Section 3 presents the lane line detection algorithm based on instance segmentation. Section 4 is devoted to the adaptive perspective transform model. Section 5 shows the experimental results and analysis. Section 6 gives the conclusion.

Overview of the Proposed Multilane Line Detection Method
In this paper, multilane line detection is considered as an instance segmentation problem. Each lane line is treated as a separate category, and then each lane line is fitted. In order to increase the running speed, improve the detection accuracy, and meet the requirements of real vehicle applications, the lane detection problem is decomposed into two tasks, and a multitask network based on MobileNet is designed.

After completing the lane line instance segmentation, the pixels belonging to each lane line need to be fitted with a parametric curve. Curve fitting refers to the process of finding an appropriate function that fits a finite set of data points. The commonly used curve fitting models include Bayesian fitting [21], B-spline curve fitting [22], and least-squares curve fitting [23]. Generally, in order to improve the accuracy of fitting, the fixed perspective transformation model [9,24] is commonly used to convert the image into a bird's-eye view for curve fitting. However, this approach is only suitable for smooth roads. When the road is uneven or the vehicle is bumpy, the image will move and be distorted, resulting in large errors in the conversion into a bird's-eye view. To solve this problem, the adaptive perspective transformation model is designed. In this model, the motion information is used to accurately convert the original image into a bird's-eye view image, and then the least-squares second-order polynomial fitting algorithm is performed on the lane line pixels. Compared with the traditional fixed perspective transformation, this model has better robustness.
The specific process of lane line detection is shown in Figure 2.

Lane Line Detection Algorithm Based on Instance Segmentation
In this paper, a MobileNet-based network is designed to train an instance segmentation model for lane line detection. Its advantage is that it can handle an uncertain number of lanes and lane changes. The instance segmentation is composed of two branches: (1) lane semantic segmentation, which is used to obtain the lane line binary image; (2) lane line Id allocation, which is used to determine the pixels that belong to the same lane line.
To improve the calculation speed and the accuracy of lane line segmentation, the two branches of the instance segmentation algorithm share the first four encoding stages. To ensure the real-time performance of the multilane line detection algorithm, the input image is converted into a 512 × 256 pixel format for network learning. The encoding-decoding structure of the network is shown in Figure 3.
To meet the cost requirements of in-vehicle computing, MobileNet is adopted as the encoding structure to shorten the forward propagation calculation time without loss of detection accuracy. In the encoding process, the semantic feature information is less, and the lane line position is accurate, while the semantic feature information in the decoding process is richer, and the lane line position information is relatively rough. Therefore, similar to ENet [25], the corresponding encoded information is fused during the decoding process to improve the accuracy of lane line segmentation.

MobileNet.
As a representative of lightweight convolutional networks, the core idea of MobileNet is to use a convolution method called depthwise separable convolution instead of traditional convolution methods to reduce the network weight parameters. The depthwise separable convolution divides standard convolution into two steps. The first step is depthwise convolution, that is, channel-by-channel convolution. In depthwise convolution, one convolution kernel is only responsible for one channel, and one channel is "filtered" by only one convolution kernel. The second step is pointwise convolution, which is to "string" together the results obtained by the first step. Assuming that, in standard convolution, the size of the input feature map is $D_F \times D_F$, the number of input channels is $M$, the number of output channels is $N$, and the size of the convolution kernel is $D_K \times D_K$, then the calculation amount of standard convolution is

$$D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F. \tag{1}$$

In MobileNet, the depthwise separable convolution splits the standard convolution into two parts, whose convolution kernel sizes are $D_K \times D_K$ and $1 \times 1$, respectively. Then, the calculation amount of the depthwise separable convolution is

$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F. \tag{2}$$

As an example, Figure 4 is a schematic diagram of a standard convolution where the input feature map size is 5 × 5, the number of input channels is 3, the number of output channels is 4, and the size of the convolution kernel is 1 × 1. Figure 5 is a schematic diagram of the depthwise separable convolution where the input feature map size is 5 × 5, the number of input channels is 3, the number of output channels is 4, and the sizes of the convolution kernels are 3 × 3 and 1 × 1, respectively.
From (1) and (2), it can be seen that the ratio of the calculation amount of the depthwise separable convolution to that of the standard convolution under the same input and output feature map size is

$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}. \tag{3}$$

In order to compare the performance of convolutional neural networks using standard convolution and depthwise separable convolution, these two networks were trained and tested on the ImageNet dataset in [20]. The results show that, in terms of accuracy, using standard convolution is 1.1% higher than using depthwise separable convolution, but in terms of calculation amount and parameter amount, the former is 8-9 times the latter. From [20], we can conclude that the use of depthwise separable convolution can greatly reduce the amount of calculation and the number of parameters while largely preserving the accuracy. Correspondingly, it can reduce the difficulty of network model training, reduce training time, and reduce the performance requirements of hardware devices.
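The cost comparison between the two convolution types can be checked numerically. The following is a minimal sketch: the function names and the example layer sizes are hypothetical and are not taken from the paper. It counts multiply-accumulate operations for a standard convolution ($D_K^2 M N D_F^2$) and for a depthwise separable convolution ($D_K^2 M D_F^2 + M N D_F^2$), and compares their ratio with the closed form $1/N + 1/D_K^2$.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-accumulate count of a standard convolution:
    D_K * D_K * M * N * D_F * D_F."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f):
    """Cost of a depthwise (D_K x D_K, one kernel per channel) convolution
    followed by a pointwise (1 x 1) convolution."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise


def cost_ratio(d_k, n):
    """Closed-form ratio separable / standard = 1/N + 1/D_K^2."""
    return 1.0 / n + 1.0 / (d_k * d_k)


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels,
# 32 x 32 feature map.
standard = standard_conv_cost(3, 64, 128, 32)
separable = separable_conv_cost(3, 64, 128, 32)
print(standard, separable, separable / standard, cost_ratio(3, 128))
```

For a 3 × 3 kernel the ratio is dominated by the $1/D_K^2 = 1/9$ term, which is consistent with the 8-9 times reduction reported in [20].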

Lane Line Semantic Segmentation Branch.
The purpose of lane line semantic segmentation is to obtain the segmentation result of lane pixels and reconstruct the binary image of the lane line, and then determine which pixels belong to the lane line. Due to the extreme imbalance between the lane pixels and the background pixels, the focal loss function [26] is used for model training. In the lane line pixel segmentation loss, $s(c, x, y)$ indicates whether the pixel $(x, y)$ belongs to category $c$, and the predicted probability is the probability that pixel $(x, y)$ is predicted to be category $c$; $n_c$ is the number of segmentation categories, $n_c = 2$; $h$ and $w$ are the length and width of the output prediction image, respectively; $\gamma$ represents the focus parameter. The focus parameter reduces the weight of the pixels that are easy to classify and thereby increases the relative weight of the pixels that are difficult to classify, so that the network pays more attention to the pixels that are difficult to learn; in this paper, $\gamma = 2$. $\alpha$ refers to the weighting parameter of the lane line category and is mainly used to solve the problem of the uneven number of pixels in each category. $\alpha$ is calculated from $n$, the number of lane line pixels, and $m$, the total number of pixels.

Figure 4: The standard convolution (input 5×5×3, convolution kernel 1×1×3×4, output 5×5×4).

Mathematical Problems in Engineering
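To illustrate how the focal loss down-weights easy pixels, the following is a minimal sketch of the standard α-balanced focal loss of [26] for the two-class case (lane versus background). It may differ in detail from the loss used in this paper. The function name, the flat list representation of the image, and the value of `alpha` are assumptions; only the focus parameter of 2 comes from the text.

```python
import math


def focal_loss(probs, labels, alpha, gamma=2.0):
    """Mean alpha-balanced focal loss over a flat list of pixels.

    probs[i]  : predicted probability that pixel i is a lane pixel
    labels[i] : 1 for a lane pixel, 0 for background
    alpha     : weight of the lane class (background gets 1 - alpha);
                the value passed in is an assumption, not the paper's
    gamma     : focus parameter (2 in the paper)
    """
    eps = 1e-7  # keeps log() finite for probabilities of exactly 0 or 1
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        a_t = alpha if y == 1 else 1.0 - alpha    # class weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confidently correct lane pixel contributes far less than a misclassified one.
easy = focal_loss([0.9], [1], alpha=0.75)
hard = focal_loss([0.3], [1], alpha=0.75)
print(easy, hard)
```

With `gamma = 0` and `alpha = 0.5` the expression reduces to half of the ordinary cross-entropy, which makes the role of the focus parameter easy to isolate.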

Lane Line Id Embedding Branch.
In the inference stage, the DBSCAN clustering algorithm [27] is used to cluster each pixel. The DBSCAN method is a clustering algorithm based on high-density connected regions, which can divide regions with sufficiently high density into clusters and can find clusters of any shape in noisy data. In this paper, DBSCAN clustering is applied until all lane line pixels are assigned to the corresponding lanes. The cluster center is taken as the center of the circle, the radius is 0.28 m, and the minimum number of points in the neighborhood is 180.
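The clustering step can be sketched with a small pure-Python DBSCAN. This is an illustrative implementation of the generic algorithm of [27], not the authors' code: the function names are hypothetical, the points are plain coordinate tuples rather than network embeddings, and the toy call uses a minimum of 3 points instead of the 180 quoted in the text so that the example stays small.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Two compact groups and one isolated point; radius 0.28 as in the text.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
       (10.0, 0.0)]
print(dbscan(pts, eps=0.28, min_pts=3))
```

Because the number of clusters is not fixed in advance, this kind of clustering copes with a varying number of lane lines per image.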

Adaptive Perspective Transform Model
After obtaining the clustered pixel point set of each lane line based on the lane line instance segmentation, the pixel points should be fitted. According to the imaging principle of the camera, the lane lines will gradually converge into a point at a long distance, so the accuracy of lane line fitting performed directly on the original image will be greatly reduced. In the commonly used improvement method, the picture is converted into a bird's-eye view through a perspective transformation model to make the lane lines parallel so that the fitting accuracy is improved. However, in the conventional perspective transformation model, the transformation matrix with fixed parameters is adopted, which is only suitable for smooth road conditions. If the vehicle is driving on an uneven road or the pitch angle of the camera changes due to turbulence, the image will be distorted, which will affect the generated bird's-eye view. Therefore, an adaptive perspective transformation model that can realize the accurate conversion of the camera image to the bird's-eye view under camera motion is adopted.
As shown in Figure 6, the origin of the world coordinate system is defined as the projection of the car's center of mass onto the ground, and the vertical distance between the camera installation position and the origin of the world coordinate system is $h$. The position of a pixel in the camera coordinate system is obtained from the relationship between the world coordinate system ($O - XYZ$), the camera coordinate system ($O_2 - cr$), and the image pixel coordinate system ($O_1 - uv$). In this relationship, $K$ is the conversion scale factor between the camera coordinate system and the image pixel coordinate system; $m$ is the width of the image; $n$ is the length of the image.

In order to better explain the parameters of the adaptive perspective transformation model, the lateral view structure of the model is established, as shown in Figure 7. The $x$-axis coordinate in the world coordinate system can be written as a function of the pixel coordinate $v$, $\theta_0$, and $\theta_b$. When $v = m$, $\theta(v)$ is $\alpha$. Substituting formula (7) into formula (10) gives $f_r$, and substituting formulas (9) and (11) into (8) gives the resulting expression. Here, $\theta_0$ is the tilt angle of the camera; $\theta_b$ is the change value of the pitch angle when the camera bumps; $\theta(v)$ is the angle value between the ground pixel and the camera; $\alpha$ is half of the vertical field of view of the camera; $f_r$ is the vertical focal length of the camera.

As shown in Figure 8, the vertical view structure of the adaptive perspective transformation model is established, from which the $Y(u, v, \theta_b)$ coordinate in the world coordinate system is expressed. When $u = n$, $\theta(u)$ is $\beta$, so $f_h$ can be obtained from formula (13); substituting formulas (12) and (14) into (13) gives the resulting expression. Here, $\beta$ is half of the horizontal field of view angle of the camera; $f_h$ is the horizontal focal length of the camera.

Figure 5: Depthwise separable convolution.
Assuming $z = 0$, according to the relationship established among the world coordinate system, camera coordinate system, and pixel coordinate system, the $x$-axis and $y$-axis world coordinates of the pixel coordinate $(u, v)$ are updated whenever vehicle motion turbulence changes the camera pitch angle by $\theta_b$. Therefore, the influence of image motion distortion on the aerial view image is reduced, and the adaptive perspective transformation model is more robust to camera motion.
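The effect of the pitch change on the ground projection can be illustrated with a generic flat-ground pinhole model. The sketch below is not a reproduction of the paper's formulas: it is the textbook projection of a pixel onto the plane $z = 0$ with the pitch change added to the camera tilt. All function and parameter names are hypothetical, the origin is placed directly below the camera rather than below the center of mass, and lens distortion is ignored.

```python
import math


def pixel_to_ground(u, v, width, height, cam_height, tilt, pitch_change,
                    half_vfov, half_hfov):
    """Project pixel (u, v) onto the flat ground plane z = 0.

    Angles are in radians. Returns (X, Y): the forward and lateral ground
    distances from the point directly below the camera.
    """
    # Focal lengths in pixels, derived from the half field-of-view angles.
    f_r = (height / 2.0) / math.tan(half_vfov)  # vertical
    f_h = (width / 2.0) / math.tan(half_hfov)   # horizontal
    # Normalised offsets from the image centre (v grows downwards).
    a = (u - width / 2.0) / f_h
    b = (v - height / 2.0) / f_r
    # Downward angle of the optical axis, including the pitch change.
    phi = tilt + pitch_change
    denom = math.sin(phi) + b * math.cos(phi)
    if denom <= 0:
        raise ValueError("pixel is at or above the horizon")
    x = cam_height * (math.cos(phi) - b * math.sin(phi)) / denom
    y = cam_height * a / denom
    return x, y


# Same pixel, with and without a small pitch change of the camera.
print(pixel_to_ground(320, 300, 640, 480, 1.5, 0.10, 0.00, 0.4, 0.5))
print(pixel_to_ground(320, 300, 640, 480, 1.5, 0.10, 0.02, 0.4, 0.5))
```

The forward distance is equivalent to $h / \tan(\theta_0 + \theta_b + \theta(v))$ with $\theta(v) = \arctan b$, so even a small uncorrected pitch change shifts distant ground points noticeably, which is why a fixed transformation matrix distorts the bird's-eye view.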
In the transformed aerial view, the pixels of the lane line are fitted with a second-order polynomial by the least-squares algorithm, and the fitted curve is mapped back to the original image so that the quadratic polynomial of the lane line in a real road scene is obtained.

Figure 6: Relationship between world coordinate system, camera coordinate system, and image pixel coordinate system.

Figure 7: The lateral view structure of the adaptive perspective transformation model.
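The least-squares second-order polynomial fit can be sketched by solving the 3 × 3 normal equations directly. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and it assumes that the lateral coordinate `x` is fitted as a function of the forward coordinate `t` in the bird's-eye view. In practice a library routine such as `numpy.polyfit` would normally be used instead.

```python
def fit_quadratic(ts, xs):
    """Least-squares fit of x = a*t^2 + b*t + c; returns (a, b, c)."""
    # Power sums needed by the normal equations.
    s = [sum(t ** k for t in ts) for k in range(5)]
    r = [sum(x * t ** k for t, x in zip(ts, xs)) for k in range(3)]
    # Augmented matrix of the normal equations for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least three distinct t values")
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        acc = m[row][3] - sum(m[row][k] * coef[k] for k in range(row + 1, 3))
        coef[row] = acc / m[row][row]
    return tuple(coef)


# Hypothetical lane pixels lying exactly on x = 0.5*t^2 - t + 2.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
xs = [0.5 * t * t - t + 2.0 for t in ts]
print(fit_quadratic(ts, xs))
```

Fitting in the bird's-eye view keeps the lane lines roughly parallel, so a single low-order polynomial per lane is usually sufficient.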

Experimental Results and Analysis
The CULane dataset [13] was used to train and validate the proposed multilane line detection model, and then field experiments were conducted to verify the performance of the proposed model.

Experimental Environment Configuration.
The algorithm design, training, and testing are based on the deep learning framework PyTorch. The experimental configuration used in this experiment is shown in Table 1. Table 2 shows the various scene information of the CULane dataset.

CULane
It can be seen from Table 2 that a total of 9 types of traffic scenes are included in CULane. Among them, the images of normal traffic scenes with clear lane lines account for 27.7% of the total dataset, while the images of complex road scenes, in which the lane lines are unclear or disturbed for various reasons, account for 72.3% of the dataset. This shows that CULane places more emphasis on complex road scenes, which is also consistent with the proportion of various traffic scenes encountered in the actual driving process. Therefore, this dataset is more suitable for verifying the ability of the proposed method to detect lane lines in complex road scenes.

Data Processing Method.
In order to ensure the real-time performance of the multilane line detection algorithm, the CULane input image is converted into an 800 × 288 pixel format for network learning. The input data is normalized so that the value of each pixel is limited to the interval [0,1], which improves the computational efficiency of the model fitting process. Data augmentation strategies are used, including brightness conversion, horizontal flip, and overall image translation of 0∼2 pixels; an example is shown in Figure 10.
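The preprocessing and augmentation steps can be sketched on a small image stored as nested lists. This is a simplified illustration under stated assumptions: the function names are hypothetical, the input is treated as an 8-bit single-channel image, the brightness factor range is invented for the example, and the same flip and translation would also have to be applied to the lane labels.

```python
import random


def normalize(image):
    """Scale 8-bit pixel values to the interval [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]


def adjust_brightness(image, factor):
    """Multiply every pixel by factor and clip to [0, 1]."""
    return [[min(1.0, max(0.0, px * factor)) for px in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def translate_right(image, shift, fill=0.0):
    """Shift the whole image right by `shift` pixels, padding with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in image]


def augment(image, rng):
    """Apply the three augmentations with randomly drawn parameters."""
    out = adjust_brightness(image, rng.uniform(0.8, 1.2))  # assumed range
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return translate_right(out, rng.randint(0, 2))  # 0 to 2 pixels, as in the text


raw = [[0, 128, 255], [64, 32, 16]]
print(augment(normalize(raw), random.Random(0)))
```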

Model Training Parameters Setting.
Taking into account the performance of the hardware platform used in this study, the training parameters are shown in Table 3. In Table 3, batch size represents the number of training samples input for each iteration of training; epoch represents one training pass over the whole training set; the learning rate decays exponentially to prevent gradient dispersion. In addition, in order to accelerate the convergence of the network, the MobileNet pretraining model is used to initialize the weight parameters of the unmodified part of the feature extraction network. The mainstream Glorot uniform distribution method is used for the initialization of other parameters.
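An exponentially decaying learning rate can be written in one line. The sketch below is only an illustration of the schedule's shape: the function name, the initial learning rate, and the decay rate are hypothetical and are not the values in Table 3. In PyTorch, the same schedule is typically obtained with `torch.optim.lr_scheduler.ExponentialLR`.

```python
def exponential_lr(initial_lr, decay_rate, epoch):
    """Learning rate after `epoch` epochs: initial_lr * decay_rate ** epoch."""
    return initial_lr * decay_rate ** epoch


# Hypothetical values, for illustration only.
for epoch in range(5):
    print(epoch, exponential_lr(0.01, 0.9, epoch))
```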

Training Result Analysis.
To verify the performance of the proposed multilane line detection algorithm based on deep learning, the $F_1$ measure is used as the evaluation metric. To judge whether a lane marking is successfully detected, the lane markings are viewed as lines with widths equal to 30 pixels, and the intersection over union (IoU) between the ground truth and the prediction is calculated. Predictions whose IoUs are larger than 0.5 are viewed as correctly predicted lane lines.
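The evaluation rule can be sketched as follows, using the usual definition $F_1 = 2PR/(P + R)$ with precision $P$ and recall $R$. The function name and the input format are assumptions: the sketch expects that each predicted lane has already been matched one-to-one with a ground-truth lane and that its IoU has been computed.

```python
def f1_score(matched_ious, num_ground_truth, iou_threshold=0.5):
    """F1 from the IoU of each predicted lane with its matched ground truth.

    matched_ious     : one IoU value per predicted lane (0.0 if unmatched)
    num_ground_truth : number of annotated lane lines
    """
    tp = sum(1 for iou in matched_ious if iou > iou_threshold)
    fp = len(matched_ious) - tp      # predictions that miss the threshold
    fn = num_ground_truth - tp       # annotated lanes that were not detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Three predictions against four annotated lanes; two exceed the 0.5 threshold.
print(f1_score([0.8, 0.6, 0.3], num_ground_truth=4))
```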
From Table 4, it can be seen that the $F_1$ score of the proposed method on the normal test set and most challenge test sets is better than that of the other algorithms, which verifies the effectiveness of the proposed method. Figure 11 shows the multilane line detection results on the CULane dataset, where curves of different colors represent different detected lane lines. As can be seen from Figure 11, the proposed method can accurately detect multilane lines in various scenarios.

Real Vehicle Experimental Verification.
To verify the positioning performance of the proposed strategy, experiments were conducted on the intelligent driving vehicle platform. The platform is refitted from a Chery pure electric vehicle [31], as shown in Figure 12.

Camera Parameters.
The HD industrial camera with a USB driver interface is selected as the visual acquisition device. The specific parameters are shown in Table 5.

Industrial Computer.
The Advantech MIC-7700 industrial computer is selected as the data computing platform, as shown in Figure 13. The MIC-7700 is a compact, fanless computing platform designed for the industrial market. It can work in an environment of 0∼50 degrees, reduces the effect of external vibration and impact on the inside of the chassis, and can be used all day in bad weather.

Software Platform.
The real vehicle data acquisition and experiments in this paper are carried out in ROS (Robot Operating System). Ubuntu 16.04 is used as the operating system on which ROS is installed and deployed. The version of ROS is Kinetic.
To achieve multitask road multitarget detection and the integration of the drivable area segmentation model on the ROS platform, as shown in Figure 14, the topic structure diagram of multitask target detection and segmentation node and network connection mode is established.
Firstly, the node /lane_node is created, which subscribes to the topic /usb_cam/image_raw, the real-time road image stream published by the driver of the HD camera installed on the vehicle, and converts the image format into the format recognized by the OpenCV library by calling the cv_bridge module in the ROS library. Then inference is run on the image data with the multitask road target detection and segmentation model, and finally, the results are published in real time as the topic /lane_image. The ROS graphical tool RViz can display the multitask road multitarget detection and drivable area segmentation prediction results by subscribing to the topic.
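The node described above can be sketched with `rospy` and `cv_bridge`. This is a minimal outline under stated assumptions, not the authors' node: `detect_lanes` is a hypothetical placeholder for the trained model, the `bgr8` encoding and the queue sizes are assumptions, and only the node and topic names come from the text. The ROS imports are kept inside `main()` so that the file can be imported on a machine without ROS.

```python
def detect_lanes(image):
    """Hypothetical placeholder for the lane detection model.

    A real implementation would run the network and draw the fitted lane
    lines; here the image is returned unchanged.
    """
    return image


def main():
    # Imported here so the module can be loaded without a ROS installation.
    import rospy
    from cv_bridge import CvBridge
    from sensor_msgs.msg import Image

    rospy.init_node("lane_node")
    bridge = CvBridge()
    publisher = rospy.Publisher("/lane_image", Image, queue_size=1)

    def on_image(msg):
        # ROS image message -> OpenCV array (encoding is an assumption).
        frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        result = detect_lanes(frame)
        # OpenCV array -> ROS image message, published for RViz.
        publisher.publish(bridge.cv2_to_imgmsg(result, encoding="bgr8"))

    rospy.Subscriber("/usb_cam/image_raw", Image, on_image, queue_size=1)
    rospy.spin()  # process callbacks until the node is shut down
```

Calling `main()` in an environment with a running ROS master starts the node. A queue size of 1 is a common choice for camera streams because it drops stale frames when inference is slower than the camera frame rate.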

Real Vehicle Test of Multilane Line Detection Algorithm.
Figure 15 shows the multilane line detection effect of the algorithm proposed in this paper for the data collected by the onboard vision system of the intelligent driving vehicle platform. It can be seen that the proposed method can achieve better detection results in various complex scenarios. The reason is that the lane line detection problem is transformed into an instance segmentation problem, which can solve any number

Conclusion and Discussion
The experimental results show that the lane line detection $F_1$ score in normal scenes is 91.2%. Integrating the model into the ROS platform realizes the real-time detection of multilane lines in various complex traffic scenes, which has good practical application value. We also conclude that the proposed strategies can be easily embedded into other advanced driver assistance approaches with slight modifications.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
Xiang Song and Hai Wang contributed to methodology; Xiang Song, Xiaoyu Che, and Huilin Jiang contributed to software; Xiang Song, Ling Li, Chunxiao Ren, and Hai Wang contributed to validation; Xiang Song, Shun Yan, and Hai Wang contributed to investigation; Xiang Song was responsible for original draft preparation; Xiang Song, Xiaoyu Che, Chunxiao Ren, and Hai Wang were responsible for review and editing; Xiang Song, Ling Li, and Huilin Jiang contributed to funding acquisition. All authors have read and agreed to the published version of the paper.