Road environment prediction is an essential task for intelligent vehicles. In this study, we present a flexible system that focuses on freespace detection and road environment prediction for the host vehicle. The hardware consists of two parts, a binocular camera and a low-power mobile platform, which makes the system flexible and portable across a variety of intelligent vehicles. We put forward a multiscale stereo matching algorithm to reduce the computing cost on the hardware unit. Based on the disparity space and the point cloud, we propose a weighted probability grid map to detect the freespace region and a state model to describe the road environment. The experiments show that the proposed system is accurate and robust, which indicates that the technique is well suited to road environment prediction for intelligent vehicles.
Road environment prediction is an essential task for intelligent vehicles and robotic applications. As the basis of path planning [
In recent years, light detection and ranging (LiDAR) [
In this paper, we propose a flexible and robust road environment perception system. The system consists of a binocular camera and a low-power mobile platform. Because the software functions run on the terminal device rather than transmitting a large amount of data to the cloud [
The traditional stereo matching algorithms, such as [
The traditional road environment perception methods, such as [
The main contributions of this work are summarized as follows: (1) a multiscale stereo matching algorithm is presented to reduce the computing cost and improve the accuracy; (2) based on the disparity map, a weighted probability grid map is proposed to detect the freespace region; (3) a state model is proposed to describe the road environment in front of the host vehicle; and (4) an efficient deployment scheme is put forward to run our system on the low-power mobile platform in real time.
Humenberger et al. [
Yang [
Zhang et al. [
Mao et al. [
However, the above algorithms consume substantial computing resources, which makes real-time processing difficult. In this paper, we propose a multiscale stereo matching fusion algorithm designed to reduce the computing cost on the FPGA and to perform stereo matching in real time.
Qu et al. [
Deilamsalehy and Havens [
Xiao et al. [
Zheng et al. [
Cong et al. [
In this paper, we propose a weighted probability grid map for freespace detection and a state model to describe the road environment. By describing the road state in front of the host vehicle, we predict the future road environment and the estimated direction of the vehicle.
In binocular stereo matching algorithm, feature similarity is an unsupervised matching method, such as census feature [
The SSIM method [
In engineering practice,
In order to adapt to the high dynamic matching cost aggregation method, we define the matching cost based on SSIM as Equation (
We set the penalty item to
First, we provide a one-dimensional Gaussian edge feature map (setting to
An example of the feature map in stereo matching. Left: grey image. Right: feature map.
Based on the feature map, the larger the response value is, the stronger the edge characteristics are, and the greater the possibility of depth change. According to the above method, we infer that the maximum aggregation cost between adjacent pixels is 610 (
Multiscale images pyramid and disparity map. Top: the large-scale layer. Middle: the middle-scale layer. Bottom: the small-scale layer.
The structure features are obvious in the large-scale layer. Correspondingly, the remote details are completely preserved in the small-scale layer. First, we set the disparity search range to 16 pixels in the small-scale layer, which is equivalent to a search range of 64 pixels in the large-scale layer. We then calculate the small-scale disparity map with the SSIM algorithm in the small-scale layer and initialize the middle-scale disparity map by upsampling the small-scale disparity map with linear interpolation. Because the small-scale disparity map has the same size as the small-scale layer,
In the multiscale stereo matching algorithm, the feature map is adopted only in the large-scale layer. The proposed stereo matching method is well suited to FPGA processing for three reasons: (1) many operation modules are reusable because the rules and parameters are the same; (2) there is no need to cache all image data in memory at the same time, so the calculation can proceed in step with data transmission; and (3) the algorithm consists of multiplications and additions on a large amount of fixed-point data.
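The coarse-to-fine initialization described above can be sketched as follows. This is a minimal illustration, not the authors' FPGA implementation: the function name `upsample_disparity` and the factor of 2 between adjacent pyramid layers are assumptions (the factor is suggested by the 16-pixel and 64-pixel search ranges being equivalent across two layer transitions). The sketch upsamples a disparity map with linear interpolation and rescales the disparity values, since a pixel shift measured at a coarse layer grows with the image width.

```python
import numpy as np

def upsample_disparity(disp_small: np.ndarray, factor: int = 2) -> np.ndarray:
    """Bilinearly upsample a disparity map and rescale its values.

    `factor` is an assumed scale ratio between adjacent pyramid layers.
    """
    h, w = disp_small.shape
    # Positions of the fine-grid samples expressed in coarse-grid coordinates.
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :]   # horizontal interpolation weights
    top = disp_small[y0][:, x0] * (1 - wx) + disp_small[y0][:, x1] * wx
    bottom = disp_small[y1][:, x0] * (1 - wx) + disp_small[y1][:, x1] * wx
    # Disparities are pixel shifts, so they scale with the image resolution.
    return factor * (top * (1 - wy) + bottom * wy)

# A disparity of 16 px at the coarsest layer becomes 64 px after two upsamplings.
coarse = np.full((2, 2), 16.0)
fine = upsample_disparity(upsample_disparity(coarse))
```

With such an initialization, the finer layers only need to search a narrow band around the upsampled value instead of the full range, which is where the saving in computing cost would come from.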
In practice, we focus on two issues: (1) where the obstacles are and (2) what the trend of the road is. We propose a grid projection method to predict the road trend. When obstacles occupy the road, trend prediction is hindered, but the freespace region should still be described correctly. In this study, our method consists of three stages: grid projection, boundary search, and shape detection.
Based on the disparity map, we calculate the 3D point cloud coordinate by Equation (
Disparity and 3D point cloud. (a) The left image. (b) Disparity map. (c) 3D point cloud.
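As an illustration of the disparity-to-point-cloud step, the sketch below uses the standard rectified pinhole stereo model, Z = f·B/d, X = (u − c_x)·Z/f, Y = (v − c_y)·Z/f. It is not a reproduction of the paper's own equation. The function name and the parameters `f_px` (focal length in pixels), `cx`, `cy` (principal point), and `min_disp` are assumptions introduced here; only the 0.12 m baseline in the usage lines comes from the hardware table.

```python
import numpy as np

def disparity_to_points(disp, f_px, baseline_m, cx, cy, min_disp=0.5):
    """Back-project valid disparities to 3D points in the camera frame.

    Returns an (N, 3) array of (X, Y, Z) in metres. All intrinsics are
    hypothetical inputs; `min_disp` rejects invalid or near-infinite depths.
    """
    v, u = np.nonzero(disp > min_disp)   # pixel rows and columns with valid disparity
    d = disp[v, u]
    z = f_px * baseline_m / d            # depth along the optical axis
    x = (u - cx) * z / f_px              # lateral offset
    y = (v - cy) * z / f_px              # vertical offset (image-down positive)
    return np.stack([x, y, z], axis=1)

# Usage with an assumed 1000 px focal length and the 120 mm baseline.
disp = np.zeros((4, 6))
disp[2, 3] = 10.0
points = disparity_to_points(disp, f_px=1000.0, baseline_m=0.12, cx=3.0, cy=2.0)
```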
Inspired by pseudolidar data from visual [
The process of probability grid map. (a) Left grey image. (b) Disparity map. (c) Grid projection. (d) Probability grid map.
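The grid projection stage can be sketched as a bird's-eye histogram of the point cloud. The example below is a simplified stand-in under stated assumptions: the grid extent, the 0.5 m cell size, the optional per-point `weights`, and the normalization by the maximum cell value are all hypothetical choices, not the paper's weighting scheme. The 0.1 cut-off mirrors the noise threshold mentioned later in the text.

```python
import numpy as np

def probability_grid(points, x_range=(-10.0, 10.0), z_range=(0.0, 50.0),
                     cell=0.5, weights=None, min_prob=0.1):
    """Project (X, Y, Z) points onto the X-Z plane and normalize to [0, 1].

    Rows index depth (Z) and columns index lateral position (X).
    """
    x, z = points[:, 0], points[:, 2]
    nx = int(round((x_range[1] - x_range[0]) / cell))
    nz = int(round((z_range[1] - z_range[0]) / cell))
    counts, _, _ = np.histogram2d(z, x, bins=(nz, nx),
                                  range=(z_range, x_range), weights=weights)
    peak = counts.max()
    grid = counts / peak if peak > 0 else counts
    grid[grid < min_prob] = 0.0   # suppress sparsely supported cells as noise
    return grid

# Twenty points on one obstacle cell and a single stray point elsewhere.
obstacle = np.tile([0.1, 0.0, 5.1], (20, 1))
stray = np.array([[3.1, 0.0, 10.1]])
grid = probability_grid(np.vstack([obstacle, stray]))
```

In this toy input the obstacle cell keeps probability 1.0, while the stray point (relative support 0.05) is discarded.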
Inspired by the stochastic occupancy grids method [
By transferring the ics (disparity map) to vcs (grid projection), a unified physical scale is helpful to build a more robust mathematical model to solve the problem. In detection space
Road plane and three obstacle plane models in wcs.
These plane models describe obstacles in different states. Plane ② represents obstacles perpendicular to the optical axis on the driving route, such as a vehicle in the same lane, of which only the rear is visible. Plane ③ represents obstacles parallel to the optical axis and adjacent to the driving route, such as fences and barriers, of which only the side is visible. Plane ④ represents obstacles that intersect the optical axis on the driving route, such as guardrails and walls on a curved road. In addition, plane ② and plane ③ are combined to describe adjacent discontinuous obstacles, such as cut-in vehicles, of which both the side and the rear are visible. Under the above model, each obstacle generates a large number of 3D points on its plane. Therefore, many points accumulate in the projection area of the obstacle plane in the grid projection.
Suppose that the basic detection unit on the planes is a square; that is, the obstacle is divided into many square units. In practice, the side length of the unit is consistent with the grid projection size in
In the detection space
As shown in Figure
Directed acyclic graph with dynamic programming method. The black arrow indicates the best path.
The penalty term encodes the assumption that the path between adjacent columns of the DAG is smooth.
The purpose of parameter setting is that the row coordinate changes of adjacent column nodes should be smooth. In this study, we set the
Freespace boundary. (a) Optimal path by DP. (b) Boundary in ics.
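The boundary search by dynamic programming over a column-wise DAG can be sketched as below. This is a hedged illustration rather than the paper's exact formulation: the node score (the grid probability itself), the linear row-jump penalty, and the parameter name `smooth` are assumptions, since the original cost function and parameter values are not reproduced here.

```python
import numpy as np

def boundary_by_dp(grid, smooth=0.5):
    """Pick one row per column that maximizes the summed node score
    minus `smooth` times the absolute row change between adjacent columns."""
    rows, cols = grid.shape
    r = np.arange(rows)
    # jump[i, j]: penalty for moving from row j (previous column) to row i.
    jump = smooth * np.abs(r[:, None] - r[None, :])
    score = grid[:, 0].astype(float)
    back = np.zeros((rows, cols), dtype=int)
    for c in range(1, cols):
        cand = score[None, :] - jump          # cand[i, j]: arrive at i from j
        back[:, c] = np.argmax(cand, axis=1)  # best predecessor for each row
        score = grid[:, c] + cand[r, back[:, c]]
    # Backtrack from the best final node.
    path = np.empty(cols, dtype=int)
    path[-1] = int(np.argmax(score))
    for c in range(cols - 1, 0, -1):
        path[c - 1] = back[path[c], c]
    return path

# A boundary on row 2 with one stronger outlier on row 0 in column 2.
demo = np.zeros((5, 4))
demo[2, :] = 1.0
demo[0, 2] = 1.2
smooth_path = boundary_by_dp(demo, smooth=0.5)
greedy_path = boundary_by_dp(demo, smooth=0.0)
```

With the penalty enabled, the path stays on row 2 and ignores the outlier; with `smooth=0.0` the search degenerates to a per-column maximum and jumps to row 0 in column 2, which illustrates why the smoothness term matters.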
In practice, we employ the multilayer grid projection to handle the case where a near obstacle is lower than the far obstacles. Three layers are divided by height with
Multilayers grid projection. (a) Left image. (b) Disparity map. (c) PGM in
Based on the three obstacle models, we detect geometric constraints on the multilayer grid projection. In practice, we propose to identify the geometry feature by prior. The plane ②, in Figure
To reduce the noise, we discard the grid whose probability is less than 0.1. The result is shown in Figure
The process of shape detection. (a) In curved road. (b) In straight road. (c) With obstacles.
In the middle column of Figure
We describe the shape feature by different plane models so that the complex road environment is classified into finite-state models. Table
Road environment described by the state model.
Panel | Curve road | Straight road | Obstacles |
---|---|---|---|
(a) | 1 | 0 | 1 |
(b) | 0 | 6 | 0 |
(c) | 0 | 0 | 4 |
The proposed system is used for forward sensing of automatic driving or advanced driver assistance system (ADAS). In practice, the system is generally installed in the narrow space between the windshield and the inside rearview mirror. It requires that the space volume of the hardware system must be small. The system hardware design architecture is shown in Figure
System hardware platform.
The lens focal length is
System hardware parameters.
Items | Parameters | Items | Parameters |
---|---|---|---|
Baseline | 120 mm | Focal length | 8.26 mm |
Pixel size | | Resolution | |
Horizontal FOV | 30° | Vertical FOV | 22° |
The algorithm is divided into three parts: stereo matching, probability grid, and state model, as shown in Figure
System software flow framework.
The three modules are processed in parallel on three processing units. The processing frequency is determined by the slowest of the three modules, and the system delay equals the sum of the three module delays. In the proposed system, the FPGA delay is 66 ms, the ARM1 delay is 16 ms, and the ARM2 delay is 21 ms. Therefore, the frequency of the system is 15 fps and the system delay is 103 ms.
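The arithmetic behind these two figures can be checked with a short sketch. The function name `pipeline_timing` is hypothetical; the stage delays are the ones reported above. In a pipeline whose stages run concurrently, throughput is bounded by the slowest stage, while end-to-end latency is the sum of all stages.

```python
def pipeline_timing(stage_delays_ms):
    """Return (frames per second, end-to-end latency in ms) for a
    pipeline whose stages run concurrently on separate units."""
    period_ms = max(stage_delays_ms.values())    # slowest stage sets the frame period
    latency_ms = sum(stage_delays_ms.values())   # a frame passes through every stage
    return 1000.0 / period_ms, latency_ms

fps, latency = pipeline_timing({"FPGA": 66, "ARM1": 16, "ARM2": 21})
print(int(fps), latency)  # prints: 15 103
```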
We evaluate our stereo matching method by two comparative experiments: efficiency and accuracy. By compared on KITTI dataset [
Evaluation of stereo matching in KITTI.
Method | Accuracy (%) | Efficiency |
---|---|---|
Our proposed | 93.22 | 1.00 |
Origin MPV | 94.43 | 3.05 |
SGBM | 92.36 | — |
ELAS | 91.76 | — |
In addition, we test the multiscale stereo matching method on the private dataset, where we focus on the following items: subpixel disparity accuracy, light condition, and point cloud distribution.
The result is shown in Table
Evaluation of stereo matching in private dataset.
Test case | Sunny | Cloudy | Night | Backlight |
---|---|---|---|---|
Accuracy (1.0) | 78.94% | 80.06% | 58.68% | 78.49% |
Accuracy (0.5) | 51.00% | 51.33% | 32.47% | 50.11% |
Accuracy (0.3) | 33.84% | 32.42% | 19.54% | 31.62% |
Accuracy (0.1) | 11.97% | 11.13% | 6.66% | 10.84% |
Convergence (pixel) | 4.66 | 4.15 | 5.73 | 5.37 |
Figure
The WPGM in different road environments. Grey images, disparity space, and weighted probability grid map.
We evaluate the proposed system with the following experiments: freespace detection, obstacle prediction, road environment prediction, and system performance.
Based on the private dataset, we label the ground truth (GT) of freespace on images. We evaluate freespace detection by recall, which measures the similarity between the detection result and the GT. Recall = 100% indicates that the detection result is completely consistent with the true value, while Recall = 0 indicates that it is completely inconsistent. The result is shown in Table
Evaluation of freespace detection in private dataset.
Method | Running time (ms) | Recall (%) |
---|---|---|
Hautiere et al. [ | 30 | 95.0 |
Xin et al. [ | 18360 | 91.7 |
Ours | 16 | 93.5 |
Obstacle prediction is one of the tasks based on the state model. We evaluate the results of obstacle prediction by the number and location of obstacles on our private dataset. The GT of obstacles is labelled manually, where we focus on independent objects such as vehicles and pedestrians. We count the number of obstacles in the state model and compare it with the GT; the average recall of obstacle detection is 98.3% on the private dataset. As shown in Figure
Obstacle prediction in depth and horizontal distance.
We evaluate the state model by precision and recall as Table
Evaluation of state model prediction in private dataset.
Predicted (rows) vs. GT (columns) | Curve road | Straight road | Obstacle | Precision |
---|---|---|---|---|
Curve road | 293 | 7 | 4 | 96.38% |
Straight road | 1 | 591 | 0 | 99.83% |
Obstacle | 5 | 2 | 96 | 93.20% |
Recall | 97.67% | 98.50% | 96.00% | — |
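For reference, the precision and recall entries follow from the confusion matrix by dividing each diagonal element by its row sum and its column sum, respectively. The sketch below recomputes them from the cell values as printed; the function name is hypothetical. With these cells, the curve-road recall evaluates to 293/299, slightly above the tabulated 97.67%, which would correspond to 300 ground-truth curve samples, so one printed cell may differ from the authors' data.

```python
import numpy as np

# Rows: predicted class; columns: ground-truth class
# (order: curve road, straight road, obstacle), as printed in the table.
cm = np.array([[293,   7,  4],
               [  1, 591,  0],
               [  5,   2, 96]])

def precision_recall(confusion):
    """Per-class precision (row-normalized) and recall (column-normalized)."""
    tp = np.diag(confusion).astype(float)
    precision = tp / confusion.sum(axis=1)
    recall = tp / confusion.sum(axis=0)
    return precision, recall

precision, recall = precision_recall(cm)
```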
In addition, we test the system performance by running time and power cost. As shown in Table
Running time of modules.
Hardware unit | FPGA | FPGA | ARM1 | ARM1 | ARM2 | ARM2 |
---|---|---|---|---|---|---|
Module | Stereo matching | Point cloud | WPGM | Path planning | Shape detection | State model |
Running time | 55 ms | 11 ms | 12 ms | 4 ms | 20 ms | 1 ms |
Delay time per unit | 66 ms | 66 ms | 16 ms | 16 ms | 21 ms | 21 ms |
Finally, we test the power cost and the chip temperature. Our system is installed between the front windshield and the rear-view mirror, so it must meet the requirements of GB/T 28046.4, which stipulates that the maximum operating temperature of the system shall not exceed 90°C. Table
Chip temperature test.
Ambient temp (°C) | Chip temp (°C) | Load power (W) |
---|---|---|
-15 | 40∼50 | 6 |
15 | 50∼60 | 6 |
45 | 60∼75 | 6 |
65 | 75∼90 | 6 |
In this study, we propose a low-power road environment prediction system that consists of a binocular camera and a low-power computing unit. Our contribution includes three points. First, a multiscale stereo matching algorithm is proposed for hardware computing. Second, we propose a weighted probability grid map based on the point cloud. Finally, a plane model and a state model are proposed to describe the road environment. Our work shows that existing technology can meet the functional requirements under the low-power constraint. Experiments show that the proposed system is robust and sensitive in road environment prediction and that its performance meets the mandatory standards in practice. In future work, our study can serve as a benchmark for obstacle recognition and path planning.
No data were used to support this study.
The authors declare that there is no conflict of interest regarding the publication of this paper.
This work was supported by the National Key R&D Program of China (Grant No. 2018AAA0103103), the Science and Technology Development Fund, Macao SAR (No. 0024/2018/A1), and the Research Fund of Guangdong-Hong Kong-Macao Joint Laboratory for Intelligent Micro-Nano Optoelectronic Technology (No. 2020B1212030010).