Recently, self-driving cars have become a major focus of the automobile industry. The DARPA challenges, which introduced designs for self-driving systems that can be classified as SAE Level 3 or higher, drove further attention to self-driving cars. Building on these design models, many companies subsequently started to develop self-driving cars. Various sensors, such as radar, high-resolution cameras, and LiDAR, are essential for a self-driving car to sense its surroundings. LiDAR acts as the eye of a self-driving vehicle, offering 64 scanning channels, a 26.9° vertical field of view, and a high-precision 360° horizontal field of view in real time. The LiDAR sensor can provide 360° environmental depth information with a detection range of up to 120 meters. In addition, the left and right cameras further assist by capturing front image information. In this way, an accurate model of the self-driving car's surrounding environment can be obtained, which supports the self-driving algorithm in route planning. It is very important for a self-driving car to avoid collisions; LiDAR provides both horizontal and vertical fields of view and helps in collision avoidance. Publicly available datasets provide different kinds of data, such as point clouds and color images, that can be used for object recognition. In this paper, we used two publicly available datasets, namely, KITTI and PASCAL VOC. First, the KITTI dataset provides depth data for the LiDAR segmentation (LS) of objects obtained from LiDAR point clouds. The objects segmented from the LiDAR point cloud are used to find regions of interest (ROI) on the images. We then trained the YOLOv4 neural network for object detection on the PASCAL VOC dataset. For evaluation, we used the region-of-interest images as input to YOLOv4. Combining these techniques, we can both segment and detect objects: our algorithm constructs the LiDAR point cloud and, at the same time, performs detection on the image in real time.
As the commercialization of self-driving car technology advances rapidly, researchers are concentrating on the study of self-driving car sensors. Among these sensors, optical radar (LiDAR) and cameras are the most heavily researched. LiDAR can produce 360° real-time depth information and senses up to a distance of 100 meters, making it one of the crucial sensors for self-driving vehicles, while high-resolution cameras provide real-time color images. The purpose of this paper is therefore to use LiDAR point cloud map information together with deep learning to detect objects.
Nowadays, Artificial Intelligence (AI) is progressing rapidly, and AI-based self-driving has received tremendous attention as of late. AI plays a vital role in self-driving technology: it acts as the brain of the car by automatically detecting people, other vehicles, and objects, keeping the car in its lane, switching lanes, and following GPS navigation to reach the final destination. Historically, the first autonomous cars appeared in the 1980s, with the NAVLAB and ALV projects at Carnegie Mellon University starting in 1984 [
Inventors began experimenting with self-driving cars in the 1920s. In those early days, radio-controlled electric cars guided by electromagnetic fields were demonstrated. From the 1950s, trials were conducted to find a feasible approach to self-driving cars. Self-driving cars finally appeared in the 1980s. One example is the Mercedes-Benz robotic van designed by Ernst Dickmanns, which could reach a speed of 39 miles per hour (63 km/h) on streets without traffic; this car was an iconic achievement in self-driving technology [
As described above, self-driving vehicles are now well developed, and LiDAR allows them to accurately model the surrounding environment and avoid collisions with obstacles. LiDAR is a very suitable sensor for identifying vehicles, but it also has some difficulties. As AI becomes more prevalent, deep learning algorithms help to overcome these difficulties.
Light Detection and Ranging (LiDAR) is a remote sensing method; the LiDAR applied in autonomous vehicles is mostly based on Time of Flight (TOF). Emitted light pulses strike an object and are reflected back to the LiDAR system, and the round-trip time is used to calculate the distance between the sensor and the object. The measured results are transformed into 3D point clouds. Using point cloud data, the scanned area can be mapped with high resolution and accurate depth information, and 3D data can be obtained even in the absence of light and in bad weather conditions. There are several different types of LiDAR, known as airborne, terrestrial, and mobile LiDAR.
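As a simple illustration of the time-of-flight principle described above, the following sketch (an illustrative example, not any sensor's actual firmware; all names are ours) converts a measured round-trip time into a range and then into a 3D point using the beam's azimuth and elevation angles.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def tof_to_range(round_trip_time_s: float) -> float:
    """Range to the target: the pulse travels out and back, so divide by 2."""
    return C * round_trip_time_s / 2.0

def range_to_point(r: float, azimuth_rad: float, elevation_rad: float):
    """Convert a range measurement plus beam angles into a 3D point (x, y, z)."""
    x = r * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = r * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = r * math.sin(elevation_rad)
    return x, y, z

# A pulse returning after ~0.8 microseconds corresponds to ~120 m,
# the detection range quoted for the sensor above.
print(tof_to_range(0.8e-6))  # ≈ 119.9 m
```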
In airborne laser scanning systems, the sensor is mounted on aerial vehicles such as aircraft and helicopters equipped with specialized GPS receivers; infrared laser light is transmitted to the ground and returned to the moving airborne LiDAR sensor.
Bathymetric LiDAR uses near-infrared and green light to scan deep terrain and is mainly used for underwater terrain measurement. The commonly used LiDAR wavelength is the near-infrared band at 905 nm, which is suitable for measurements where the medium is air. If the medium is water, this band cannot penetrate normally, so green light with a wavelength of 532 nm is used for underwater measurements.
Terrestrial LiDAR is mostly used to scan the ground for architectural and cultural-heritage surveys, and sometimes to scan forest canopy structures. It is usually fixed at a certain point and used together with a camera: the point cloud it generates is matched with the digital image, and a three-dimensional model is produced. Compared with other methods, this approach can generate the required model in a shorter time, so it is widely used in industry.
Vehicle LiDAR refers to mobile LiDAR carried on top of a vehicle and is one of the most useful configurations for a self-driving car. It can quickly scan 360 degrees in the horizontal field, and its vertical field of scanning reaches 40 degrees. LiDAR enables real-time analysis, helping a self-driving car react quickly and avoid accidents. Most self-driving car companies, such as Google, Uber, and Baidu, use Velodyne LiDAR. In this paper, we used the point cloud information for the object segmentation part of our method.
LiDAR point cloud segmentation algorithms can be roughly divided into two approaches. The first uses labeled point cloud data for training and is mostly applied to find roads or vehicles in point cloud images; this type of algorithm uses point cloud features to locate each individual object. The second approach uses algorithms other than neural networks for segmentation and can itself be divided into two major categories: ground-extraction-oriented segmentation algorithms and two-dimensional-grid algorithms for object segmentation. As mentioned in the literature [
Object detection is an important and difficult field in computer vision. The goal of object detection is to locate all objects in an image and determine their classes. In recent years, deep-learning-based object detection methods have mainly included the Region-based Convolutional Neural Network (RCNN), the Faster Region-based Convolutional Neural Network (Faster-RCNN), and YOLO (You Only Look Once). A traditional convolutional network can only detect a single object in a single image; to overcome this problem, researchers proposed RCNN [
In earlier studies, researchers proposed unified-pipeline framework-based methods, namely the four versions of the YOLO neural network [
In this paper, we used unified-pipeline framework-based methods. Researchers have proposed several versions of YOLO; YOLOv2 introduced anchor boxes [
The proposed system architecture is shown in Figure
Proposed system architecture.
In this paper, the LiDAR segmentation algorithm is improved, as illustrated in Figure
Steps of LiDAR segmentation.
The LiDAR segmentation algorithm is divided into three steps. The first step is layer segmentation: each layer of the point cloud is segmented so that the following step can merge segments belonging to the same object. This step focuses on the horizontal features of the point cloud because the Velodyne HDL-64E scans in 64 horizontal layers; the points of each of the 64 layers are separately divided into preliminary segments. Different colors represent different layers of the computation on the point cloud images in Figure
LiDAR segmentation algorithm: (a) layer segmentation; (b) layer combination; (c) ground extraction.
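The paper's exact criteria appear in the equations referenced below; the following is only a minimal sketch of the layer-segmentation idea, assuming a simple Euclidean gap threshold between consecutive points within one scan ring. The threshold value and helper names are our own assumptions.

```python
import numpy as np

def segment_layer(points: np.ndarray, gap_threshold: float = 0.5) -> list:
    """Split one scan ring (N x 3 array, ordered by azimuth) into clusters.

    A new cluster starts whenever two consecutive points are farther apart
    than gap_threshold (meters) -- an illustrative value, not the paper's.
    """
    clusters, current = [], [points[0]]
    for prev, curr in zip(points[:-1], points[1:]):
        if np.linalg.norm(curr - prev) > gap_threshold:
            clusters.append(np.array(current))
            current = []
        current.append(curr)
    clusters.append(np.array(current))
    return clusters

# Each of the 64 layers is segmented independently:
# preliminary_segments = [segment_layer(layer) for layer in layers]
```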
As shown in the above algorithm, Equations (
Formula (
After hierarchical segmentation, the point cloud image contains many segmented clusters. During the hierarchical merge step, clusters are merged according to the vertical relationship between the initial and final points of each cluster. Different colors represent different hierarchical categories: the first layer is shown in yellow, the second layer in blue-gray, the third layer in blue, and the last layer in green. Merging is controlled by two thresholds. The first threshold limits the point-to-point distance between two separate layers. The distance between the starting point and the end point is shown in Figure
The formula is shown above in Equation (
Equation (
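Since the merge equations are referenced but not reproduced here, the sketch below only conveys the idea of the merge test: two clusters from adjacent layers are merged when the distance between their starting points and the distance between their end points both fall below thresholds. The threshold values and names are our assumptions, not the paper's.

```python
import numpy as np

def should_merge(cluster_a: np.ndarray, cluster_b: np.ndarray,
                 start_thresh: float = 0.4, end_thresh: float = 0.4) -> bool:
    """Merge test for clusters from two adjacent layers.

    cluster_a / cluster_b are (N x 3) arrays ordered along the scan;
    the start/end distance thresholds (meters) are illustrative values.
    """
    start_dist = np.linalg.norm(cluster_a[0] - cluster_b[0])
    end_dist = np.linalg.norm(cluster_a[-1] - cluster_b[-1])
    return start_dist < start_thresh and end_dist < end_thresh
```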
The final step of LiDAR segmentation is ground extraction. This step uses a threshold to filter out ground clusters, allowing objects to be measured from the point cloud image.
If the lowest point
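A minimal sketch of the ground-extraction idea follows: any cluster whose lowest point falls below a height threshold is treated as ground and discarded. The threshold value is an assumption for illustration; the paper's exact condition is the one stated above.

```python
import numpy as np

def remove_ground(clusters: list, ground_z: float = -1.6) -> list:
    """Keep only clusters whose lowest z-coordinate lies above the ground level.

    ground_z approximates the sensor's height above the road (illustrative);
    clusters whose lowest point falls below it are filtered out as ground.
    """
    return [c for c in clusters if c[:, 2].min() > ground_z]
```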
After point cloud clustering is complete, the result can be used to find the target area in the color image. Since the point cloud contains 3D information and the color image contains 2D information, the segmented object regions are mapped through a 3D-to-2D formula to find the specific target area in the color image. The algorithm formula is shown in Equation (
ROI image.
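Since the projection equation is not reproduced here, the sketch below shows the standard KITTI-style projection that maps segmented 3D points onto the color image: a point in LiDAR coordinates is transformed into the rectified camera frame and projected with the camera matrix, and the bounding box of the projected points gives the ROI. The matrix names follow the KITTI calibration convention; this is our illustration, not the authors' exact formula.

```python
import numpy as np

def project_to_image(pts_lidar: np.ndarray, Tr_velo_to_cam: np.ndarray,
                     R0_rect: np.ndarray, P2: np.ndarray) -> np.ndarray:
    """Project (N x 3) LiDAR points to (N x 2) pixel coordinates.

    Tr_velo_to_cam (3x4), R0_rect (3x3), and P2 (3x4) are the KITTI
    calibration matrices supplied with the dataset.
    """
    n = pts_lidar.shape[0]
    pts_h = np.hstack([pts_lidar, np.ones((n, 1))])   # homogeneous (N x 4)
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)        # 3 x N, rectified cam frame
    cam_h = np.vstack([cam, np.ones((1, n))])         # 4 x N
    img = P2 @ cam_h                                  # 3 x N
    return (img[:2] / img[2]).T                       # divide by depth

def roi_from_points(pixels: np.ndarray) -> tuple:
    """Bounding box (x_min, y_min, x_max, y_max) of one segmented object."""
    x_min, y_min = pixels.min(axis=0)
    x_max, y_max = pixels.max(axis=0)
    return x_min, y_min, x_max, y_max
```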
Compared with other schemes, YOLOv4 is among the fastest and most accurate detectors [
YOLOv4 architecture.
We used CSPDarknet-53 as the feature extractor; with the Mish activation function in the backbone, it enhances the learning of the network. YOLOv4 also uses SPP (spatial pyramid pooling) and PAN (path aggregation network) modules as its neck, which takes the backbone output and extracts feature maps using a deep network. The neck sits between the backbone and the head: object detectors consist of a backbone for feature extraction and a head for object detection. To identify objects at multiple scales, the head probes feature maps at different spatial resolutions in a hierarchical structure. Feature maps are fed into the head both from the bottom upstream and from the top downstream, so the input of the head includes rich spatial information from both streams; the component that performs this aggregation is called the neck. The SPP layer takes a slightly different approach to identifying objects at different scales: it replaces the pooling after the last convolution layer with a spatial pyramid pooling layer. PAN introduced a shortcut path which takes just around 10 layers to the top of the
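To make the backbone and neck components concrete, here is a minimal PyTorch sketch of the Mish activation and an SPP block of the kind used in YOLOv4-style networks; the kernel sizes match the common YOLOv4 configuration, but the sketch is illustrative rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation used in the CSPDarknet-53 backbone: x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

class SPP(nn.Module):
    """Spatial pyramid pooling: max-pool the same feature map at several
    kernel sizes (stride 1, 'same' padding) and concatenate the results,
    so features at multiple receptive-field scales reach the head."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)
```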
In the proposed YOLOv4 network in this paper, the high-resolution input image is
This paper uses the LiDAR segmentation algorithm for preprocessing; the next step converts 3D to 2D to find the ROI in the color image, which is sent as input to the R-YOLOv4 network for object detection. This pipeline brings a great improvement in speed and high accuracy. Plain YOLO detection has a lower detection rate than the proposed LS-R-YOLOv4, as shown in Figure
Our proposed method and comparative with YOLO detection: (a) YOLO without region of interest; (b) ROI image input to YOLOv4; (c) proposed LS-R-YOLOv4 final detection.
In this paper, we used two datasets for image classification and object detection: KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) and PASCAL VOC (Pattern Analysis, Statistical Modeling and Computational Learning Visual Object Classes). The KITTI dataset was used for the testing part of object recognition, identifying object categories with the neural network; it provides two kinds of data, 45° LiDAR point clouds and color images taken in real scenes. The PASCAL VOC dataset contains 20 classes, with 9963 (VOC2007) and 23080 (VOC2012) images, respectively [
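For reference, KITTI's Velodyne scans are distributed as flat binary files of float32 values in (x, y, z, reflectance) order; a minimal loader looks like this (the file path is a placeholder).

```python
import numpy as np

def load_kitti_scan(bin_path: str) -> np.ndarray:
    """Read a KITTI Velodyne .bin file into an (N x 4) array:
    columns are x, y, z (meters, sensor frame) and reflectance."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

# scan = load_kitti_scan("training/velodyne/000000.bin")  # placeholder path
```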
The experiments in this paper were performed on Ubuntu and Windows systems; the LS-R-YOLOv4 algorithm was run under the DarkNet framework. The processor was an Intel® Core™ i7-9700, and a GPU at 2.0 GHz was used to accelerate training.
As shown in Figure
Results of LiDAR segmentation and object detection: (a) 45° original point cloud data, (b) original color image as input to detection, (c) object segmentation result, (d) ROI image, and (e) final object detection LS-R-YOLOv4 result.
The advantages of the proposed method are discussed in this section. First, the LiDAR segmentation algorithm helps to remove background information and segments the objects in different colors; this allows matching 3D to 2D information in order to find the regions of interest, whose bounding boxes are sent as input to R-YOLOv4 object detection. The ROI step aims to mitigate unnecessary data processing by concentrating detection on the region-of-interest image, and with this, LS-R-YOLOv4 achieves very good performance in detecting objects, even small ones.
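Putting the pieces together, the overall LS-R-YOLOv4 flow can be sketched as follows; `segment_point_cloud`, `project_to_image`, `roi_from_points`, and `run_yolov4` stand for the stages described above and are hypothetical names for illustration, not the authors' code.

```python
def detect_ls_r_yolov4(point_cloud, image, calib, yolo_model):
    """Hypothetical sketch of the LS-R-YOLOv4 pipeline described in the text."""
    clusters = segment_point_cloud(point_cloud)        # layer segmentation,
                                                       # merging, ground removal
    detections = []
    for cluster in clusters:
        pixels = project_to_image(cluster, *calib)     # 3D -> 2D projection
        x0, y0, x1, y1 = roi_from_points(pixels)       # ROI bounding box
        roi = image[int(y0):int(y1), int(x0):int(x1)]  # crop; background removed
        detections += run_yolov4(yolo_model, roi)      # detect within the ROI only
    return detections
```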
To evaluate the reliability of the recognition rate, we analyze the experimental data and compare the experimental results. Missed detections and false detections can arise in traffic-flow monitoring systems. Precision, recall, and F1-score are used as evaluation metrics in this experiment. Recall (completeness) considers the proportion of correctly detected vehicles with respect to the ground truth. Precision (correctness) measures the proportion of correctly detected vehicles with respect to all detected instances. The F1-measure reflects the overall result, as shown in Table
Comparison between YOLO models and proposed method.
| Method | Precision | Recall | F1-measure |
|---|---|---|---|
| YOLOv2 | 85.5% | 55.2% | 63.3% |
| YOLOv3 | 89.7% | 50.2% | 64.3% |
| YOLOv4 | 92.5% | 78.2% | 84.7% |
| LS-R-YOLOv4 | 97.7% | 92.3% | 95.2% |
In these formulas [
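The three metrics in the table follow their usual definitions; a minimal computation from raw detection counts looks like this.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Standard detection metrics from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```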
The main advantage of the LS-R-YOLOv4 method proposed in this paper is its computational efficiency: the algorithm combines color images and point cloud data for segmentation and uses the color images for identification, which speeds up object detection. Comparing the YOLO and R-YOLO object detection results below, it can be seen that some background pixels are removed during R-YOLO preprocessing and the image is divided into smaller blocks used for detection; the pixels input to the neural network are therefore not disturbed by background pixels, making it easier to identify smaller objects and improving identification quality.
We compared different versions of YOLO (YOLOv2, YOLOv3, and YOLOv4), which have different object detection capacities, with the proposed LS-R-YOLOv4 to evaluate object detection performance on small objects. An example image detected with the different YOLO versions and the proposed LS-R-YOLOv4 model is shown in the following figure.
Regarding algorithm performance, the bounding boxes detected by YOLOv2 are incorrect or cover too little of the region of interest for two objects, and YOLOv2 has a low detection rate for small objects and objects that are far away. YOLOv3 can detect only some of the smaller objects, and many objects are missed. YOLOv4 performs well, but certain objects are still missed. In contrast, our proposed method detects far, small, and irregularly shaped objects without missing them, showing excellent detection results in Figure
Comparison of car detection based on YOLO algorithms: (a) YOLOv2, (b) YOLOv3, (c) YOLOv4, and (d) proposed method LS-R-YOLOv4.
LiDAR acts as the eye of a self-driving vehicle, offering a high-precision 360° horizontal field of view in real time. This paper presents the results of LS-R-YOLOv4 and the benefits of the algorithm based on the LiDAR sensor. In addition, the left and right cameras assist in obtaining front image information at the same time. Exploiting the available model features greatly improves algorithm efficiency, reduces oversegmentation, and achieves the objective of real-time object detection. Regarding the recognition rate of LS-R-YOLOv4, we used it to detect pedestrians and cars, with results significantly better than the YOLO algorithms. In terms of recognition rate for object recognition, our algorithm provides a significant benefit, with 97.7% precision, 92.3% recall, and a 95.2% F1-measure. Overall, LS-R-YOLOv4 has excellent performance in computing speed and recognition rate and is very suitable for unmanned driving and other related applications. Because our proposal reduces oversegmentation, the surrounding environment model of the self-driving car can be obtained accurately, which is convenient for the self-driving algorithm to perform route planning.
As described above, we used the publicly available KITTI and PASCAL VOC datasets for image classification and object detection. The KITTI dataset provides 45° LiDAR point clouds and color images taken in real scenes, and the PASCAL VOC dataset contains 20 classes, with 9963 (VOC2007) and 23080 (VOC2012) images containing 24640 and 54900 objects, respectively. In total, 33043 images and 78540 objects were used to train the neural network.
The authors declare that they have no conflicts of interest.
This work was supported by the Ministry of Science and Technology of Taiwan under Grant MOST 109-2221-E-027-082. The authors gratefully acknowledge the Taiwan Semiconductor Research Institute (TSRI), for supplying the technology models used in IC design.