A Network That Balances Accuracy and Efficiency for Lane Detection

In an automated lane-keeping system (ALKS), the vehicle must stably and accurately detect the boundaries of its current lane for precise positioning. Deep-learning-based lane detection algorithms have achieved a substantial leap in accuracy over traditional algorithms and handle curves and occlusions far better. However, mainstream algorithms struggle to balance accuracy and efficiency. In response, we propose a single-step method that directly outputs the parameters of a lane shape model. The method combines MobileNet v2 and a spatial CNN (SCNN) to build a network that quickly extracts lane features and learns global context information. Then, through deep polynomial regression, it outputs a polynomial representing each lane marking in the image. Finally, the proposed method was verified on the TuSimple dataset. Compared with existing algorithms, it achieves a balance between accuracy and efficiency: experiments show that, in the same environment, its recognition accuracy and detection speed both reach the level of mainstream algorithms.


Introduction
In ALKS, the vehicle must reliably detect the boundaries of its current lane for precise positioning; on this basis, the traffic scene is understood and the vehicle is kept in its lane through trajectory planning and vehicle control. The lane detection module is the starting point of the entire system, so its safety and effectiveness are particularly important. However, the inherent slenderness of lane markings and complex conditions, such as weather changes, lighting changes, and occlusion by other road users, make this task very challenging. In addition, the computation time of the detection module is critical for real-time operation of the whole system: the algorithm must run efficiently and adapt well to deployment while maintaining detection accuracy [1]. Early algorithms were mostly traditional methods [2, 3] that fused handcrafted features with heuristics and then applied postprocessing techniques such as the Hough transform (which detects lines, circles, or other parametric curves) and random sample consensus (RANSAC), which estimates the parameters of a mathematical model from observed data containing outliers. This type of algorithm is computationally cheap but requires manual parameter tuning, entails a large workload, and is not robust: when the driving environment changes significantly, lane markings are detected poorly. Deep-learning-based methods have become mainstream owing to their high accuracy. Among them, instance-segmentation-based methods [4] first generate segmentation results, then apply a perspective transformation to obtain a bird's-eye view (BEV) and perform curve fitting; popular lane models include polynomials, splines, and clothoids.
This type of method achieves high accuracy by automatically extracting features from data, but its inefficient decoder makes it computationally slow and insufficiently sensitive to curved scenes [5, 6]. In response, message-passing-based methods [7, 8] exploit spatial information in deep neural networks to capture global context and improve recognition accuracy. However, these methods are usually computationally intensive and difficult to run in real time, which hinders their use on the embedded in-vehicle devices of an ALKS. Finally, end-to-end methods [9] directly regress lane-line parameters; they are much faster but slightly less accurate and lack overall interpretability.
To effectively balance accuracy and efficiency, we propose a single-step lane detection method based on a MobileNet v2 + SCNN network. The backbone adopts the lightweight MobileNet v2 [10], which effectively reduces the computation and parameter count of the lane model. At the same time, an SCNN layer is added so that information can be transmitted effectively across the spatial dimensions. The network finally outputs the polynomial coefficients and a confidence score for each lane.
Through experiments, we show that the recognition accuracy and detection speed of the proposed model reach the level of mainstream algorithms in the same environment, achieving an effective balance between the two.

Related Work
Lane detection methods can be based on traditional image processing or on deep learning. In this section, we briefly review the most representative methods related to lane detection and highlight the differences between them.

Traditional Methods.
Traditional lane detection methods usually model road image features, geometric features, and other information [11]. Feature extraction is based on global and local location information [12, 13]; it is essentially a filtering step that reduces the number of features in a dataset by deriving new features from existing ones. Then, the Hough transform and RANSAC are used for straight-line and curve fitting, respectively. Finally, false detections are eliminated, and the lane boundary segments are clustered into the final result. Traditional methods cannot track lanes, and most are limited to specific environments: they are not robust to lighting changes or sudden weather changes, nor to the lane color degradation, structural damage, road noise, and occlusion present in real data. They cannot handle the complex situations faced in actual driving.
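To make the RANSAC step above concrete, here is a minimal pure-Python sketch that fits a line y = m·x + b by repeated two-point sampling. The function name, the sampling scheme, and the inlier tolerance are illustrative choices of ours, not any cited implementation:

```python
import random

def ransac_line(points, n_iters=200, inlier_tol=1.0, seed=0):
    """Fit a line y = m*x + b by RANSAC: repeatedly sample two points,
    fit the line through them, and keep the model with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:                         # skip degenerate vertical sample
            continue
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = sum(abs(y - (m * x + b)) <= inlier_tol for x, y in points)
        if inliers > best_inliers:
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers
```

Because the model is chosen by inlier count rather than by total error, a few gross outliers (e.g., spurious edge pixels) do not pull the fitted line away from the true boundary.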

Deep Learning Methods.
Lane detection methods based on deep learning can be roughly divided into two categories: single-step models and two-step models.
The two-step model first extracts lane-line features and then clusters and fits each line. Feature extraction in the first stage is mostly based on segmentation. For example, VPGNet [14] uses four-quadrant segmentation to define the location of the vanishing point and guides network learning through it, obtaining better convergence (a model has converged when additional training no longer improves it). SCNN stacks convolutional layers so that information can be transmitted across rows and columns; it is effective for long, narrow lane detection, but its running speed is only 7.5 FPS. The authors of [15] proposed a self-attention distillation (SAD) module: based on knowledge distillation, it aggregates contextual information to offset the speed cost of a large backbone network. CurveLanes-NAS [16] downsamples the entire image and performs row-wise classification over a grid of cells. Although these methods achieve state-of-the-art results, they are computationally very time-consuming. In the second stage of the two-step model, most works perform curve fitting through a learned transformation matrix: the segmentation result is first converted into a BEV, and then uniform point sampling plus least squares is used to fit a line to the mask map.
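The second-stage fitting (uniform point sampling plus least squares on the BEV mask) can be written in closed form for a first-order model. The sketch below is illustrative: it regresses x on y, since lanes are near-vertical in a bird's-eye view, and the function name is our own:

```python
def least_squares_line(points):
    """Closed-form least-squares fit of x = a*y + b to BEV mask points.
    Lanes are near-vertical in a bird's-eye view, so x is regressed on y."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * syy - sy * sy)  # slope of x w.r.t. y
    b = (sx - a * sy) / n                          # intercept
    return a, b
```

Higher-order lane models are fitted the same way by solving the corresponding normal equations (or with a library routine such as `numpy.polyfit`).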
The single-step model directly outputs the parameters of the lane shape model, for example, Line-CNN [17] and LaneATT [18]. LaneATT is anchor-based and applies an attention mechanism, achieving state-of-the-art results at 250 FPS. In addition, PolyLaneNet [9] models each lane line as a curve and uses polynomials to learn the curve parameters; however, the imbalance of existing datasets introduces some bias. The rest of the paper is organized as follows: methods and materials are discussed in Section 3, results are given in Section 4, and the paper is concluded in Section 5.

Method
In this section, we describe the architecture and loss function of our proposed single-step lane detection method based on the MobileNet v2 + SCNN network.

Architecture Design.
The proposed network structure is shown in Figure 1. It mainly consists of two parts, the backbone network and the SCNN layer, after which a fully connected layer outputs the prediction results.
The lightweight MobileNet v2 uses depthwise separable convolutions instead of ordinary convolutions, which reduces the computation and parameter count of the model. In addition, it draws on the residual-connection idea of ResNet [19] and builds on it with an inverted residual structure: the network is deepened and its feature-representation ability enhanced, while a linear bottleneck replaces the nonlinear bottleneck to reduce the loss of low-dimensional feature information. Because of these advantages, we use MobileNet v2 as the backbone, discard its last two fully connected layers, and replace them with a dilated convolution layer. Dilated convolutions obtain a larger receptive field and thus denser features, while preserving the spatial characteristics of the image without information loss. After the dilated convolution layer, an SCNN layer is added that propagates messages downward and to the right, so that each pixel can receive messages from all other pixels, further enlarging the receptive field.
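The directional message passing performed by the SCNN layer can be caricatured in a few lines. A real SCNN applies a learned 1-D convolution to each slice; the sketch below substitutes a single scalar weight and a ReLU purely to show how activations in the top row reach every row below:

```python
def scnn_downward_pass(feat, weight=0.5):
    """One SCNN-style top-to-bottom pass on a 2-D feature map (list of rows).
    Each row receives a ReLU-gated, weighted copy of the row above it, so
    information propagates vertically across the whole map."""
    out = [row[:] for row in feat]          # copy so the input is untouched
    for i in range(1, len(out)):
        for j in range(len(out[i])):
            out[i][j] += max(0.0, weight * out[i - 1][j])
    return out
```

Analogous passes in the other directions (upward, left-to-right, right-to-left) give every pixel a path to every other pixel, which is what makes SCNN effective for long, thin structures such as lane markings.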
For each frame to be detected, the network outputs at most M lane-marking candidates (each expressed as a polynomial), the vertical offset of each marking, and a shared vertical height h of the horizon. The output for lane j can be written as

    O_j = {P_j, s_j, c_j},  j = 1, ..., M,

where P_j is the polynomial of the j-th lane-marking candidate, s_j is its vertical offset, and c_j ∈ [0, 1] is the prediction confidence score, as illustrated in Figure 2 [9]. P_j can be expressed as

    P_j(y) = Σ_{k=0}^{K} a_{k,j} · y^k,

where K defines the order of the polynomial and a_{k,j} are the predicted coefficients.
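Under this representation, decoding one predicted lane reduces to evaluating its polynomial at chosen image heights. The helper below is a hypothetical sketch of ours: the function name, the 0.5 confidence threshold, and the convention that the lane is kept for y ≥ s_j (image y grows downward) are our assumptions:

```python
def eval_lane(coeffs, s_j, conf, ys, conf_thresh=0.5):
    """Evaluate one predicted lane P_j(y) = sum_k a_k * y**k at the given
    vertical positions, keeping only heights at or below the marking's
    vertical offset s_j, and only if the confidence passes the threshold."""
    if conf < conf_thresh:
        return []                            # lane suppressed as absent
    return [(sum(a * y ** k for k, a in enumerate(coeffs)), y)
            for y in ys if y >= s_j]
```

This is the whole "decoder" of a single-step method, which is why it is so much cheaper than the segmentation-plus-clustering pipeline of two-step models.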

Loss Function.
The loss function is a basic and key element in deep learning. To balance the magnitudes of the different parts of the loss and improve convergence speed, the total loss is defined as a weighted sum:

    L = W_p · Loss_p + W_s · Loss_s + W_c · Loss_c + W_h · Loss_h,

where the weighting coefficients W_p, W_s, W_c, and W_h are manually tuned hyperparameters. The first term, Loss_p, measures how well the polynomial fits. Here, we use the mean squared error (MSE) between the predicted values and the ground truth: the closer the prediction x*_j is to the ground truth x_j, the smaller the MSE between the two. Loss_p is defined as

    Loss_p = (1/M) Σ_{j=1}^{M} MSE(x*_j, x_j),

where x*_j = [x*_{1,j}, ..., x*_{N,j}]^T and x_j = [x_{1,j}, ..., x_{N,j}]^T. The second term is the loss on the vertical offset s_j, and the last term is the loss on the vertical position h; both also use the MSE. The third term is the loss on the prediction confidence score c_j. For this binary classification task, we use cross-entropy, which measures the relative entropy between two probability distributions over the same set of events.

Experiments
In this section, we present the results of the proposed single-step lane detection method based on the MobileNet v2 + SCNN network on the TuSimple dataset. Experimental results show that the recognition accuracy and detection speed of the proposed algorithm reach the level of mainstream algorithms in the same environment, with an effective balance between the two. Below, the implementation details are described and the experimental results analyzed.

Hyperparameters.
We implemented the proposed network in PyTorch. Random cropping, scaling, rotation, and color jittering are used to increase the diversity of images and the number of training samples. Then, each image is resized to 1280 × 720 pixels. Finally, ImageNet's [21] mean and standard deviation are used for normalization. The hyperparameters are set as follows: the batch size is 4, the SCNN convolution kernel width is 7, the loss coefficients W_s, W_c, and W_h are set to 1, and W_p is set to 800. Cosine annealing with a period of 750 is used to adjust the learning rate, starting from an initial learning rate of 3e-4. We trained for 9.5k iterations on a single Nvidia P4000 GPU to obtain the final result.
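The cosine-annealing schedule used here follows lr(t) = lr_min + (lr_max − lr_min)(1 + cos(πt/T))/2. A small sketch with the initial rate and period above as defaults (in practice one would use PyTorch's built-in `CosineAnnealingLR` scheduler):

```python
import math

def cosine_annealing(t, period=750, lr_max=3e-4, lr_min=0.0):
    """Cosine-annealed learning rate: starts at lr_max at step 0 and decays
    to lr_min at step `period`, following half a cosine wave."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))
```

The smooth decay keeps large steps early for fast progress and small steps late for stable convergence, which is why it pairs well with the heavily weighted polynomial loss.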

Results.
Based on the same environment, we verify the effectiveness of our method on the TuSimple dataset and compare it with PolyLaneNet and SCNN, reporting the accuracy under TuSimple's official evaluation together with the running time.
From Table 1, we can see that the proposed method reaches the recognition performance of mainstream algorithms. Its accuracy (ACC) is 1.05% higher than that of PolyLaneNet. Although it does not match SCNN's accuracy, it comes close, and at the same time it is about 4 times faster than SCNN, which gives it practical value in a real ALKS. Our method thus achieves an effective balance between accuracy and efficiency.
In addition to the quantitative evaluation, qualitative visualizations on the TuSimple dataset are shown in Figure 3.

Conclusion
In this research, we propose a single-step lane detection method based on the MobileNet v2 + SCNN network to address the trade-off between accuracy and efficiency. The backbone is the lightweight MobileNet v2, which greatly reduces the computation and parameter count of the lane model. Additional dilated convolutions then expand the receptive field, and the SCNN layer enables the effective transmission of spatial information, which preserves the recognition accuracy of the model. The network directly outputs the polynomial coefficients and a confidence score for each lane. Experiments show that the recognition accuracy and detection speed of our method reach the level of mainstream algorithms in the same environment, achieving an effective balance between the two. In the future, we will deploy this method on mobile devices.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.