A Real-Time Complex Road AI Perception Based on 5G-V2X for Smart City Security

The Internet of Vehicles and information security are key components of a smart city. Real-time road perception is one of the most difficult tasks. Traditional detection methods require manual adjustment of parameters, which is laborious and susceptible to interference from object occlusion, light changes, and road wear. Designing a robust road perception algorithm remains challenging. On this basis, we combine artificial intelligence algorithms with the 5G-V2X framework to propose a real-time road perception method. First, an improved model based on Mask R-CNN is implemented to improve the accuracy of detecting lane-line features. Then, linear and polynomial fitting of feature points in different fields of view are combined. Finally, the optimal parameter equation of the lane line is obtained. We tested our method in complex road scenes. Experimental results show that, combined with 5G-V2X, this method achieves faster processing and senses road conditions robustly under various complex real-world conditions.


Introduction
The 5G network has the characteristics of high transmission speed and low transmission delay. It provides a more reliable communication environment for V2X. The 5G-based intelligent traffic management system has stronger management capabilities and robustness, which helps to improve traffic flow. With the help of 5G-V2X, real-time complex road perception becomes possible. Therefore, 5G-V2X is the key to a smart city [1,2].
Lane detection is one of the most important tasks in understanding road scenes and is also the most complicated part. Extracted lane-marker information can be used to locate the road and determine the relative position between the vehicle and the road. Lane detection using visual algorithms is a relatively common solution, but lane markings vary widely: markings may be occluded by crowded vehicles, lines may be corroded and worn, and weather and other factors bring further challenges to lane detection [3][4][5].
Early cognitive algorithms for urban roads relied on manual design, which required a great deal of work. These methods use the Hough transform, random sample consensus (RANSAC), and similar techniques to segment the road area and detect lane lines [6]. Their obvious disadvantage is poor generalization: when the driving environment changes significantly, accuracy may drop sharply [7].
Convolutional neural networks have achieved great success in computer vision. Methods based on deep learning can directly learn domain knowledge from large datasets, greatly improving the ability to understand complex urban road scenes [8][9][10]. On this basis, we propose a new marker-detection method that combines a traditional detection algorithm with deep learning. First, we extract the overall road area by training Mask R-CNN [11][12][13][14][15]. The identified road area is used as the constraint region, and the lane markings are detected within it. The resulting discrete lane-line feature-point information is clustered by the least-squares method, and the lane lines are fitted in different fields of view using straight-line and curve-fitting models [16,17].
The main contributions of this paper are the following two aspects: (1) The model combines a convolutional neural network and 5G-V2X with a traditional algorithm, improving the speed of extracting feature points. (2) In different fields of view, straight-line and curve-fitting models are used together, making the fitting result more accurate so that the optimal parameter equation of the lane line can be obtained.
This article is organized as follows. Section 2 introduces related work. Section 3 describes the improved deep learning method used and the clustering and fitting algorithms for lane lines. Section 4 reports the results of the experiment. Section 5 provides our conclusions.

Related Work
In recent years, with the availability of parallel computing, the training process for large-scale data has accelerated. The convolutional neural network (CNN) has become a research hotspot and is widely used in computer vision and pattern recognition [18,19]. A convolutional neural network can automatically learn the hierarchical features of an image, avoiding the blindness of manually designing and selecting features, and it shows excellent performance in object detection and instance segmentation tasks.

Semantic Segmentation.
A convolutional neural network-based SegNet learns high-order features in a scene to perform road-scene segmentation; training labels are generated by applying a training algorithm to a common image dataset, and other test images are then classified. A new texture descriptor based on color-layer fusion obtains the maximum consistency of the road area [20], and offline and online information are combined to detect the road area. A combined hierarchical framework for road-scene segmentation can reliably estimate the topological structure of a scene and effectively identify traffic scenes of multilane roads [21]. The state-of-the-art instance segmentation methods Mask Scoring R-CNN and Cascade Mask R-CNN are both improvements on Mask R-CNN, which proves the effectiveness of Mask R-CNN [22].

Lane Line Detection.
Lane line detection is the most important part of the entire road surface inspection. In recent years, deep learning has enabled great success in computer vision. In the lane-line detection problem, a deep neural network is used to learn lane-line features, which improves the accuracy of lane-line feature extraction and suits complex road environments [23]. The University of Sydney used a CNN and an RNN to detect lane lines, with the CNN providing geometric information on lane-line structures for the RNN to use. Kyungpook National University combined CNN and RANSAC algorithms to stably detect lane-line information even in complex road scenes [24]. The Baidu map project team proposed a dual-view convolutional neural network (DVCNN) for lane-line detection. The Korean Robotics and Computer Vision Laboratory proposed a method that extracts multiple regions of interest, merges regions that may belong to the same class, and uses a principal component analysis network (PCANet) and neural networks to classify candidate regions [25]. The same laboratory proposed a vanishing point guided network (VPGNet) [26] to solve lane-line and pavement-marking recognition and classification under complex weather conditions. The Ford Research and Innovation Center used the DeepLanes network to extract lane-line features acquired by cameras on both sides of a vehicle.

Clustering and Fitting.
The lane-line features extracted by deep learning cannot be used directly; it is still necessary to cluster and fit the lane-line feature points. The main purpose of fitting is to depict the lane markings on a picture and to display their location in the road image [27]. In lane-line clustering and fitting algorithms, the commonly used road models are the linear, linear-parabolic, least-squares (LS) curve-fitting, cubic-curve, and Bezier-curve models [28,29].
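As a minimal illustration of the least-squares road models listed above, the sketch below fits lane feature points with `np.polyfit`; the sample points, and the choice of fitting x as a function of the image row y, are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def fit_lane_model(xs, ys, degree):
    """Least-squares polynomial fit of lane feature points.

    degree=1 gives the linear model, degree=2 a parabola,
    degree=3 a cubic curve. Returns coefficients, highest power first.
    """
    return np.polyfit(ys, xs, degree)  # fit x as a function of row y

def eval_lane_model(coeffs, ys):
    """Evaluate the fitted lane model at the given image rows."""
    return np.polyval(coeffs, ys)

# Illustrative points sampled from the line x = 0.5*y + 10.
ys = np.arange(0.0, 100.0, 5.0)
xs = 0.5 * ys + 10.0
coeffs = fit_lane_model(xs, ys, degree=1)
```

Higher-degree models are obtained by the same call with a larger `degree`, which is how a single routine can cover the linear, parabolic, and cubic cases.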
5G-V2X.
The goal of the 5G-V2X communication system is to achieve accurate and efficient road-scene perception and accident-free, efficient collaborative autonomous driving [30]. The literature [31][32][33] proposed a network protocol based on edge computing and a new vehicular-network privacy-protection protocol to enhance road safety, intelligent transportation, and smart-city systems. Literature [34] introduced a real-time communication method based on 5G-V2X, which reduces the energy and time costs of the system and improves the management efficiency of vehicular networks. Literature [35] proposed an intelligent traffic vehicle-detection model based on 5G-V2X, which can dynamically coordinate computing and content caching and effectively allocate network resources.

Complex Road Lane Line Detection and Processing
We can divide the detection process of the lane line into three steps: first, use deep learning to extract the feature points; second, cluster the extracted lane-line feature points; and third, fit the clustered points.

Feature-Point Extraction Based on an Improved Model.
Building deep models is just a means of learning. Unlike traditional shallow learning, which relies on artificial features, deep learning has a deeper model depth, a greater emphasis on feature learning, and a greater amount of data for training. Therefore, it can better describe the internal correlation between the data. As shown in Figure 1, Mask R-CNN consists of three parts. The first part is the backbone network, which is used for feature extraction. The second part is the head, which is used to obtain category scores and regression bounding boxes, and the third part is to generate a mask.

Wireless Communications and Mobile Computing
The RPNs in Mask R-CNN and Faster R-CNN are the same, but with the added mask layer, the category, bounding box, and mask of each ROI are predicted in parallel. The ROI Align of Mask R-CNN is shown in Figure 2.
The loss function of each ROI is redefined as L = L_cls + L_box + L_mask, where L_box and L_mask are computed only for positive ROIs. For the mask branch of each ROI, the output dimension is K * m * m, i.e., K binary masks of resolution m * m, one for each of the K categories. A per-pixel sigmoid is therefore applied for binary classification, and L_mask is defined as the average binary cross-entropy loss. For an ROI with ground-truth category k, L_mask is defined only on the k-th mask.
The definition of L_mask allows the network to generate a mask for each category without competition between categories. A dedicated classification branch predicts the category label, thus decoupling the category and mask predictions. An FCN instead uses a per-pixel softmax and a multinomial cross-entropy loss, in which case the masks of different classes compete. Mask R-CNN uses a per-pixel sigmoid and binary loss and avoids this consequence. Experiments prove that this is the key point for improving instance segmentation.
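The decoupled per-pixel sigmoid loss can be sketched in NumPy as follows. This is an illustrative reimplementation of the standard Mask R-CNN mask loss, not the authors' code; the tensor shapes and function name are assumptions.

```python
import numpy as np

def mask_loss(logits, target, cls):
    """Average binary cross-entropy over the m*m mask of the ROI's
    ground-truth class only (per-pixel sigmoid, no softmax across classes).

    logits: (K, m, m) raw mask predictions for K classes
    target: (m, m) binary ground-truth mask
    cls:    index of the ROI's ground-truth class
    """
    z = logits[cls]                       # only the k-th mask contributes
    p = 1.0 / (1.0 + np.exp(-z))          # per-pixel sigmoid
    eps = 1e-12                           # numerical-stability constant
    bce = -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))
    return bce.mean()
```

Because only `logits[cls]` enters the loss, gradients never flow between the masks of different classes, which is exactly the decoupling described above.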
We use ResNet101 as the backbone of Mask R-CNN for feature extraction. ResNet comprises multiple computing blocks composed of convolution, bias, and Batch Normalization (BN) layers. After the model is trained, some steps in the model are redundant for forward propagation, and the redundant parameters can be reduced by parameter combination.
Parameter combination merges the five parameters of the bias layer and the BN layer (the bias β_b and BN's scale α, shift β, running mean μ, and running variance σ²) into two parameters α′ and β′, so that the output can be computed as Y = α′X + β′, with α′ = α/√(σ² + ε) and β′ = β + α′(β_b − μ).
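A sketch of this parameter folding, under the assumption that BN with scale α, shift β, and running statistics μ, σ² is applied to X + β_b (a small stability constant ε is included, as is standard; the function name is ours):

```python
import numpy as np

def fold_bias_bn(beta_b, alpha, beta, mu, var, eps=1e-5):
    """Fold a bias layer (beta_b) and a BatchNorm layer (scale alpha,
    shift beta, running mean mu, running variance var) into the two
    parameters of Y = alpha_p * X + beta_p.

    BN applied to (X + beta_b):
        Y = alpha * (X + beta_b - mu) / sqrt(var + eps) + beta
    """
    alpha_p = alpha / np.sqrt(var + eps)            # combined scale
    beta_p = beta + alpha_p * (beta_b - mu)         # combined shift
    return alpha_p, beta_p
```

After folding, the forward pass needs only one multiply and one add per element, which is the speedup the parameter combination is meant to deliver.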
Driving Area Division.
In this paper, the samples are labeled according to the Mask R-CNN labeling rules, and the labeled sample images are sent to Mask R-CNN for training. Mask R-CNN is well suited to detecting and segmenting lane markings; its architecture is shown in Figure 3. However, the lane-marking features obtained through deep learning cannot be used directly: the extracted lane-line features carry only the coordinate information of the lane lines. For lane lines formed by dashed segments, we still must identify which lane line each dashed segment belongs to, and this information is not available from discrete coordinate points. Moreover, a real driving scene is multilane, so each lane line must be classified. We therefore propose a clustering method for lane-line feature points, which eliminates the interference between multiple lane lines and recovers their identities. This provides more accurate and comprehensive input for subsequent lane-line fitting. First, the pavement area is divided. Assuming that during data acquisition the smart camera has a collection period of 10 milliseconds and the vehicle a top speed of 200 kilometers per hour (about 55.6 m/s), the vehicle travels only about 0.56 meters in one cycle. In principle, the camera can capture a clear image up to about 100 meters in front of the vehicle, and lane lines farther away are difficult to recognize.
The original image acquired through the camera is shown in Figure 4. We use the upper edge of the overall road region identified by the neural network as the boundary of area C; above this boundary is the sky, which contains no lane-marking information. Area A is the near-field area, the main part of the field of vision, where the lanes can be approximated as straight lines. Area B is the mid-field area, where there may be a curve with small curvature. The horizontal constraint is given by the following equation.
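The division of feature points into areas A, B, and C by image row can be sketched as below. The row-fraction thresholds are illustrative placeholders, since the paper does not give numeric boundaries here.

```python
def split_fields_of_view(points, h, near_frac=0.35, mid_frac=0.35):
    """Partition lane feature points (x, y) by image row into near-field
    area A (bottom of the image), mid-field area B, and far area C.
    h is the image height; the fractions are illustrative thresholds.
    """
    near_top = h - int(near_frac * h)    # rows >= near_top -> area A
    mid_top = near_top - int(mid_frac * h)
    a = [p for p in points if p[1] >= near_top]
    b = [p for p in points if mid_top <= p[1] < near_top]
    c = [p for p in points if p[1] < mid_top]
    return a, b, c
```

Each returned group is then fitted with the model appropriate to its field of view (straight line for A, low-curvature curve for B).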

Straight-Line Fitting Model.
To ensure that the regression line mainly includes normal data points whose error is close to zero, the noise points with the largest errors on both sides of the regression line are removed. Considering the advantages and disadvantages of the Hough transform (HT) and least squares (LS), we propose a line-detection method that combines the two algorithms. First, HT is used to determine the approximate region of the line; then, the improved least-squares method determines the line parameters from the specific points of each region after clustering. The algorithm flow is shown in Figure 6.
where d(p) represents the distance from point p to the regression line.
P1(x1, y1) and P2(x2, y2) are the two end points of line L1, whose inclination angle is θ1; P3(x3, y3) and P4(x4, y4) are the two end points of line L2, whose inclination angle is θ2. The inclination angle of the straight line joining points P2 and P3 is θ.
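A minimal sketch of the HT-then-LS combination described above: a coarse Hough vote locates the dominant line, points near it are kept (the role of d(p)), and the line parameter is refined by least squares. The bin sizes and tolerances are illustrative values, not the paper's.

```python
import numpy as np

def hough_coarse_then_ls(points, n_theta=180, rho_res=2.0, inlier_tol=3.0):
    """Coarse Hough voting to locate the dominant line, then a
    least-squares refinement on the points near it.

    points: iterable of (x, y) feature points.
    Returns (theta, rho) of the refined line x*cos(t) + y*sin(t) = rho.
    """
    pts = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # Each point votes for a (theta, rho) cell of the accumulator.
    rhos = pts[:, 0, None] * np.cos(thetas) + pts[:, 1, None] * np.sin(thetas)
    rho_bins = np.round(rhos / rho_res).astype(int)
    best_votes, best = -1, None
    for j in range(n_theta):
        vals, counts = np.unique(rho_bins[:, j], return_counts=True)
        k = counts.argmax()
        if counts[k] > best_votes:
            best_votes, best = counts[k], (j, vals[k])
    j, rb = best
    theta = thetas[j]
    # Keep points whose distance to the coarse line is small, then refine:
    # for fixed theta, the LS estimate of rho is the mean projection.
    proj = pts[:, 0] * np.cos(theta) + pts[:, 1] * np.sin(theta)
    inliers = np.abs(proj - rb * rho_res) <= inlier_tol
    rho = proj[inliers].mean()
    return theta, rho
```

HT supplies robustness to outliers while the trailing LS step supplies sub-bin precision, which is the intended division of labor between the two algorithms.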
Different Area Merging Algorithm.
After obtaining the candidate marker lines in the different fields of view, we still need to merge the near-field and mid-field lines. For segmented lanes, the model of each zone is connected first, and then the lane model is built. The connection method divides into a straight-line model and a curve model: the linear connection compares the slopes of the lines, while the curve connection compares the curvatures of the curves at the same point, and the connection is determined according to the distance between the segments to be merged. The specific method is as follows.
As shown in Figure 8(a), in the straight-line connection mode, A and B are the two end points of straight line L1, and C and D are the two end points of straight line L2. B0 and C0 are the vertical-axis coordinates of the intersections of the two straight lines with the dividing line. K1 and K2 denote the slopes of L1 and L2, respectively. If the condition of equation (9) is satisfied, A and D are connected to form a merged line segment. T_k is the slope-difference threshold for merging, and T_d1 is the intercept-difference threshold; both can be set according to the actual situation.
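The slope-and-intercept merging test can be sketched as follows. The parameterization x = k·y + b, the default threshold values for T_k and T_d1, and the function name are our assumptions; the source does not reproduce equation (9).

```python
def can_merge_lines(l1, l2, y_div, t_k=0.1, t_d=5.0):
    """Decide whether two straight segments from adjacent fields of view
    belong to the same lane line.

    l1, l2: (slope, intercept) of x = slope*y + intercept for the
    near- and mid-field segments. Merge when the slopes agree within
    t_k (the paper's T_k) and the x positions at the dividing row
    y_div agree within t_d (the paper's T_d1).
    """
    k1, b1 = l1
    k2, b2 = l2
    same_slope = abs(k1 - k2) <= t_k
    close_at_divider = abs((k1 * y_div + b1) - (k2 * y_div + b2)) <= t_d
    return same_slope and close_at_divider
```

Comparing positions at the dividing row, rather than raw intercepts, keeps the test meaningful even when the two segments cover disjoint row ranges.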
As shown in Figure 8(b), in the curve connection mode, if end point B of curve S1 lies above end point C of curve S2, we extend the two curves to B0 and C0 on the dividing line. If formula (9) is satisfied, we use point B of S1 and points C and D of S2 to determine the new curve segment after merging. T_d2 is the merging threshold. In practice, however, the threshold selection has a large impact; it fluctuates across different test sections and is not easy to set manually.
Actually, the lane-line model design makes the idealized assumption that the lane lines in the mid- and near-field regions are approximately linear. This greatly reduces connection failures at the corners.

Datasets and Validation.
Our experiment was designed mainly to detect the real road information in real time, so the training and test datasets consisted of real road information. In this paper, we used the TuSimple lane dataset and TSD-MAX traffic scene dataset to verify the effectiveness of our method.
We labeled the lanes of the test dataset and checked the accuracy on the IPM pictures. We define the criteria for the lane line as follows: an input discrete point inside the ground-truth area is regarded as a positive checkpoint; TP is the total length of correctly detected lane marking; FN is the total length of missed marking, equal to the total marked ground truth minus TP; and FP is the total length of false detections, equal to all extracted lane-mark points minus TP. For the Mask R-CNN detection accuracy, the intersection over union (IoU) in the following formula (13) also applies.
S_IoU = TP / (TP + FP + FN). (13)

In the driving process of an intelligent vehicle, false and missed detections often occur due to the complexity and diversity of the driving environment. Our proposed deep-learning-based lane-marker extraction algorithm shields the extraction results from these uncertain factors well, and so adapts to the complex and varied real road environment. Figure 9 shows the results of training with our improved model; as the figure shows, the model clearly identifies all the information. Table 1 compares our improved method with several semantic segmentation methods on the TuSimple dataset. As shown in Table 2, the average accuracy was 97.61%, higher than that of the other fitting algorithms; combining linear and polynomial fitting of feature points in different fields of view yields higher precision than the other algorithms, so the detection accuracy verifies the robustness of the proposed algorithm. Table 3 shows the detection accuracy in different environments, where S is the total number of frames, MPR is the missed positive rate, FPR is the false positive rate, and TPR is the true positive rate; the calculation formulas of FPR and TPR are (14) and (15), respectively. Results in complicated situations are shown in Figure 15: our fitting algorithm fits the real lane line well, including at corners. Figure 16 shows the response-time comparison between our method and the traditional method. The results show that our method responds faster, indicating that 5G-V2X greatly improves the perception of complex roads and meets the real-time requirements.
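The evaluation quantities can be computed as below. S_IoU follows formula (13) and TPR uses the standard definition TP/(TP+FN); since formulas (14) and (15) are not reproduced in the text, the FPR definition here, FP/(FP+TN), is an assumption.

```python
def lane_metrics(tp, fp, fn, tn=None):
    """Evaluation metrics from the paper's length counts.

    tp, fp, fn: total lengths of correctly detected, falsely detected,
    and missed lane marking. tn (optional) is needed only for FPR.
      S_IoU = TP / (TP + FP + FN)    -- formula (13)
      TPR   = TP / (TP + FN)         -- standard definition
      FPR   = FP / (FP + TN)         -- assumed definition
    """
    s_iou = tp / (tp + fp + fn)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn) if tn is not None else None
    return s_iou, tpr, fpr
```

For example, tp=90, fp=5, fn=5 gives S_IoU = 0.9, matching formula (13) term by term.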

Conclusions
In this article, we propose a real-time road perception method based on deep learning and 5G-V2X. Compared with the traditional method, this method has higher road-perception ability and a faster response time. Using linear and polynomial fitting of feature points in different fields of view, the lane markings can be extracted robustly under various complex practical conditions, and the optimal parameter equations of the lane lines can be obtained. The experimental results show that the method adapts well to various types of road scenes: the algorithm achieves good detection in different scenarios, fast processing, good fitting, wide applicability, and strong robustness. In future work, we will focus on optimizing the fitting algorithm to further improve the real-time performance of the proposed method in dense traffic scenarios.

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.