Automatic Roadblock Identification Algorithm for Unmanned Vehicles Based on Binocular Vision

To improve the accuracy of automatic obstacle recognition for driverless vehicles, an automatic obstacle recognition algorithm based on binocular vision is constructed. First, the relevant parameters of the camera are calibrated around a new car coordinate system to determine the position of the obstacle relative to the vehicle. At the same time, the three-dimensional coordinates of obstacle points are obtained by the binocular matching method. Then, the feature points of obstacles in the images captured by the left and right cameras are used to realize obstacle recognition. Finally, the experimental results show that the recognition error of the algorithm is 0.03 m for obstacle 1, 0.02 m for obstacle 2, and 0.01 m for obstacle 3, so the algorithm has a small recognition error. Because the vehicle coordinate system is added in the camera calibration process, the relative position between the vehicle and the obstacle can be measured accurately.


Introduction
With the wide popularity of automobiles, driverless vehicles have become a hot research topic. An unmanned vehicle is a complex mechanical product integrating a variety of technologies, and its key technologies include environment perception, positioning and navigation, path planning, and motion control [1][2][3][4]. To ensure the safe driving of unmanned vehicles, environmental sensing technology is needed so that the vehicle can automatically avoid obstacles on the road [5][6][7][8][9]. As an important means of environmental perception, vision has therefore been studied more and more [10]. Research on visual perception mainly includes vision-based positioning, vision-based road and traffic sign detection and recognition, and vision-based collision avoidance [11,12]. A computer vision system usually requires two or more cameras to image the same scene from two or more angles, which yields a set of images of the scene under different viewing angles. From the parallax between these images, the spatial geometry and position of the target object are calculated. This method is called stereo vision [13,14]. The binocular stereo vision system uses two cameras to obtain two images of the same scene from two different perspectives, namely, the binocular stereo image pair. The grayscale, shape, distance, and other information of the surface of the target object can be recovered by calculating the parallax of the target object in the stereo image pair. The binocular stereo vision system directly simulates the way human eyes process a scene, which has important practical value and broad application prospects.
Compared with the binocular vision ranging system, the monocular vision ranging system cannot accurately obtain the target distance because monocular vision acquires less information. Therefore, binocular ranging is attracting more and more research at home and abroad. Researchers have nevertheless done a great deal of work on measuring the distance to the vehicle ahead with monocular vision. Reference [15] proposes a monocular vision-based method for understanding curved street scenes. The use of autonomous vehicles to deliver medical and emergency supplies is a potential way to avoid unsafe and unpredictable factors, but its implementation has been hampered by several key issues, a major one being the understanding of curved alleys in street scenes. These alleys can be seen as combinations of non-Manhattan structures, which help to estimate their original posture in a three-dimensional scene. The approach bridges the gap between 2D scene understanding and monocular 3D environment reconstruction: angular projections are assigned to clusters, and the curved alley scene is approximated by Manhattan and non-Manhattan fold structures during reconstruction. The algorithm relies on geometric characteristics and does not require prior training or knowledge of the internal parameters of the camera. The results show that it can successfully understand alley scenes that include Manhattan and curved non-Manhattan structures.
Reference [16] proposes a monocular vision-based inter-vehicle distance estimation method built on 3D detection. To improve the accuracy and robustness of the ranging results, the actual area of the rear view of the vehicle and the corresponding projection area in the image are obtained by a 3D detection method. Then, an area-distance geometric model is established to recover the distance according to the camera projection principle. The method shows its potential in complex traffic scenarios when tested on the test set data provided by KITTI, a real-world computer vision benchmark, and the experimental results show better performance than the existing methods. In addition, the accuracy of the ranging results for occluded vehicles can reach about 98%, while the accuracy deviation between vehicles seen from different perspectives is less than 2%.
However, these distance measurement systems are still based on monocular vision, and the accuracy of distance perception from monocular images is low. Therefore, research should focus on the binocular stereo ranging system, which can simulate the function of human eyes and perceive the three-dimensional world. Such a system can effectively improve the accuracy of obstacle detection and accurately measure the position relative to the object in front, and it has certain practical value [17]. On this basis, an automatic roadblock recognition algorithm based on binocular vision is proposed. Binocular vision is an important form of machine vision. Based on the parallax principle, it uses imaging equipment to obtain two images of the measured object from different positions and recovers the three-dimensional geometric information of the object by calculating the position deviation between corresponding image points. Binocular vision fuses the images obtained by the two eyes and observes the differences between them, which gives an obvious sense of depth, establishes the correspondence between features, and associates the image points of the same physical point in different images. The binocular vision method has the advantages of high efficiency, appropriate precision, simple system structure, and low cost. Binocular vision is one of the key technologies of computer vision, and obtaining the distance information of a spatial 3D scene is among the most basic topics in computer vision research. By adding the vehicle coordinate system, the relative position information of the vehicle and the obstacle can be obtained, and the accuracy of obstacle identification can be improved.

Design of Automatic Roadblock Identification Algorithm for Unmanned Vehicle Based on Binocular Vision

Camera Calibration.
Through camera calibration, the mapping relationships between coordinate systems are established. The mapping between world coordinates and image coordinates is expressed by a projection matrix, and the mapping between the left camera image and the right camera image is represented by a homography matrix. For this purpose, the coordinate systems of the whole vision system need to be constructed, including the image coordinate system, the camera coordinate system, and the world coordinate system. They are established as follows.
The Cartesian coordinate system $QOP$ is set as the image pixel coordinate system, where the $Q$ axis counts the columns of image pixels and the $P$ axis counts the rows, so that the pixel $(q, p)$ lies in column $q$ and row $p$ of the image. Because this system locates a pixel only by its row and column numbers and not in physical units, a physical image coordinate system is also needed. It is set as $XO_1Y$, where the origin $O_1$ is the intersection of the camera optical axis with the image plane, and the $X$ and $Y$ axes are parallel to the $Q$ and $P$ axes, respectively. Let the physical size of each pixel in the $XO_1Y$ coordinate system be $dx$ and $dy$, and let $(q_0, p_0)$ denote the pixel coordinates of $O_1$. The following relation can then be obtained:

$$q = \frac{x}{dx} + q_0, \qquad p = \frac{y}{dy} + p_0. \tag{1}$$

Formula (1) is expressed in homogeneous coordinates and matrix form:

$$\begin{bmatrix} q \\ p \\ 1 \end{bmatrix} = \begin{bmatrix} 1/dx & 0 & q_0 \\ 0 & 1/dy & p_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}. \tag{2}$$

The transformation from the image plane coordinate system to the image pixel coordinate system is completed by Formula (2). The coordinate system $O_cX_cY_cZ_c$ is set as the camera coordinate system, which is the coordinate system of a single vision system in binocular vision.
The origin $O_c$ is the optical center of the camera; the $X_c$ and $Y_c$ axes are parallel to the $X$ and $Y$ axes of the image coordinate system $XO_1Y$, respectively; and the $Z_c$ axis coincides with the optical axis of the camera. A reference coordinate system $O_wX_wY_wZ_w$, selected from the space environment, is set as the world coordinate system to represent the position of the camera and the car in the environment.

Wireless Communications and Mobile Computing

The relation between the coordinates of any point $A$ in space in the camera coordinate system and in the world coordinate system is shown in the following formula:

$$\begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0^{\mathsf{T}} & 1 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}. \tag{3}$$

In Formula (3), $R$ represents an orthogonal rotation transformation matrix, $T$ represents a three-dimensional translation vector, $(x_c, y_c, z_c)$ represents the coordinates of point $A$ in the camera coordinate system, and $(x_w, y_w, z_w)$ represents the coordinates of point $A$ in the world coordinate system. Through Formula (3), the space point $A$ is transformed from the world coordinate system to the camera coordinate system. According to the pinhole imaging model, the transformation from the camera coordinate system to the image plane coordinate system can be obtained from geometric relations, as shown in the following formula:

$$x = \frac{f\,x_c}{z_c}, \qquad y = \frac{f\,y_c}{z_c}. \tag{4}$$

In Formula (4), $f$ represents the focal length of the camera. Expressed in homogeneous coordinate matrix form, Formula (4) becomes

$$z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix}. \tag{5}$$

Using the above formulas, the establishment and conversion of the different coordinate systems (the calibration of the left and right cameras individually) are realized. After the left and right cameras have each been calibrated, binocular calibration also requires the relative position of the two cameras. Therefore, two quantities are introduced, namely, the rotation matrix $R_s$ and the translation vector $T_s$ of the left camera relative to the right camera.
Then, the relationship between the two cameras is given by Formula (6). Once camera calibration has fixed the internal and external parameters, the world coordinate system is static relative to the car: if the car moves, the world coordinate system moves with it. Since this system only considers the relative position of the car and the obstacles, a new coordinate system, called the automobile coordinate system $(X, Y, Z)$, is selected, with its origin at the midpoint between the two cameras. The corresponding homogeneous coordinate relationship is given by Formula (8). In Formula (8), $X_{cl}$, $Y_{cl}$, and $Z_{cl}$ represent the position of the obstacle along the $X$, $Y$, and $Z$ axes obtained from the left camera image, and $X_{cr}$, $Y_{cr}$, and $Z_{cr}$ represent the corresponding position obtained from the right camera image. Since the world coordinate system has changed, Formula (2) and Formula (5) are used to convert directly to the camera coordinate system. By calculating the coordinates of the obstacle in the left and right camera images, the relative position between the obstacle and the car is further calculated. Then, image matching is performed based on binocular vision.
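The chain of coordinate transformations described above (world to camera, camera to image plane, image plane to pixel) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calibration code used in the experiments: the principal point `q0`, `p0`, the 8 mm focal length, the 5 µm pixel size, and the identity pose are hypothetical values chosen only to make the arithmetic easy to follow, and lens distortion is ignored.

```python
import numpy as np


def world_to_camera(p_w, R, T):
    """Formula (3): rigid transform of a 3D point from world to camera coordinates."""
    return np.asarray(R, dtype=float) @ np.asarray(p_w, dtype=float) + np.asarray(T, dtype=float)


def camera_to_image_plane(p_c, f):
    """Formula (4): pinhole projection onto the physical image plane (same unit as f)."""
    x_c, y_c, z_c = p_c
    if z_c <= 0:
        raise ValueError("the point must lie in front of the camera (z_c > 0)")
    return np.array([f * x_c / z_c, f * y_c / z_c])


def image_plane_to_pixel(xy, dx, dy, q0, p0):
    """Formulas (1)-(2): physical image coordinates to pixel coordinates (column q, row p)."""
    x, y = xy
    return np.array([x / dx + q0, y / dy + p0])


def project(p_w, R, T, f, dx, dy, q0, p0):
    """Compose the three steps: world point -> pixel coordinates."""
    p_c = world_to_camera(p_w, R, T)
    return image_plane_to_pixel(camera_to_image_plane(p_c, f), dx, dy, q0, p0)


# Hypothetical setup: camera frame aligned with the world frame (R = I, T = 0),
# 8 mm lens, 5 micrometre square pixels, principal point at pixel (640, 360).
R = np.eye(3)
T = np.zeros(3)
pixel = project([0.5, 0.2, 10.0], R, T, f=0.008, dx=5e-6, dy=5e-6, q0=640.0, p0=360.0)
print(pixel)  # approximately [720. 392.]
```

A point 10 m ahead and 0.5 m to the side lands 80 pixels from the assumed principal point, which shows how strongly the pixel size and focal length scale the image position.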

Binocular Matching.
The region matching algorithm creates a window centered on the point to be matched in the reference image and uses the neighboring pixels in the window to characterize that point. For a binocular image pair, a window is created around a point in the left image, and a sliding window of the same size is created around a point on the corresponding epipolar line in the image to be matched. The sliding window is moved along the epipolar line, and the window matching measure is calculated at each displacement. The point with the maximum similarity or minimum difference is selected as the matching point. The specific process is described below.
First, an appropriate matching primitive has to be determined, and here the obstacle target is uncertain. The obstacles may be pedestrians or other objects such as cars, and cars can in turn be divided into trucks, ordinary sedans, and so on. Their shapes differ considerably, and the features within a single image are complex, which makes feature matching difficult [18]. Through image segmentation, the target is separated from the background. The discussion first considers the case of only one target in the image, in which the target gray block in the left image is used as the primitive to search for in the right image. During driving, the position of the obstacle on the horizontal plane directly determines the driving decision, while the vertical height of the obstacle has little influence on it.
When the target is a pedestrian, the pedestrian's position on the horizontal plane determines whether the vehicle stops or detours, and the pedestrian's height has no effect on this. Gray blocks in the two images are therefore compared horizontally. A car forms a continuous gray block in the binary image, and registering the gray areas in the horizontal direction identifies it as a matching target. First, threshold segmentation filters the background information out of the image and highlights the targets. In the processed image, two obvious targets can be matched: the car to the left of the image center and a narrow street line to the left of the car. Next, the gray block of the car target is projected, and its projection line on the abscissa is used as the matching primitive. Because the image contains more than one target gray block, and because of adverse factors such as illumination angle, shadow, and overlap, directly projecting the whole image would fuse multiple targets together. Therefore, the targets should be separated into single targets to facilitate matching. To reduce the amount of data to be processed, the whole image is cut into several banded regions, and the gray-block targets in each banded region are projected. After projection, the targets become line segments in the image. The lengths of the line segments at the same height are calculated, and the segments are matched by similarity of length. The two line segments whose lengths are closest are regarded as the same-name target, and their endpoints are same-name points. The smaller the band interval, the richer the information and the more computation is needed; the larger the interval, the less information is obtained and the less computation is needed.
The appropriate interval is selected according to the characteristics of the target. If the relative speed of the target is high, the interval should be reduced; otherwise, it can be enlarged to improve real-time performance. If the height feature of the target is small, the interval should be reduced; otherwise, it can be enlarged. The projected line segments are then analyzed to obtain their lengths. The segment lengths in the two images are listed as matrices, and the segments with the closest lengths are matched from left to right, which yields three same-name line segments whose endpoints are same-name points. Through triangulation, the depth information of the same-name points is calculated. The length of a line segment represents the width of the target in the real world. Through the above process, matching based on gray regions is transformed into feature matching based on the width of the projected line segments in the banded regions. Three line segments and six same-name points are obtained by matching the images above. Next, the six points are triangulated to calculate their three-dimensional coordinates.
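The band projection and length-based segment matching described above can be illustrated with a short NumPy sketch. It is a simplified reading of the procedure, not the authors' implementation: the function names, the greedy left-to-right matching rule, the `max_len_diff` tolerance, and the synthetic binary images are all assumptions made for illustration.

```python
import numpy as np


def band_segments(binary, row_start, row_end):
    """Project one horizontal band onto the abscissa and return its runs
    of foreground columns as (start, end) pairs with inclusive ends."""
    profile = binary[row_start:row_end].any(axis=0).astype(np.int8)
    # Pad with zeros so that runs touching the image border are closed.
    edges = np.diff(np.concatenate(([0], profile, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1) - 1
    return list(zip(starts.tolist(), ends.tolist()))


def match_segments(left, right, max_len_diff=3):
    """Greedy left-to-right matching: each left segment takes the unused
    right segment whose length is closest (within max_len_diff pixels)."""
    matches, used = [], set()
    for l_start, l_end in left:
        best, best_diff = None, max_len_diff + 1
        for j, (r_start, r_end) in enumerate(right):
            if j in used:
                continue
            diff = abs((l_end - l_start) - (r_end - r_start))
            if diff < best_diff:
                best, best_diff = j, diff
        if best is not None:
            used.add(best)
            matches.append(((l_start, l_end), right[best]))
    return matches


# Synthetic thresholded images: a narrow line and a wider car-like block,
# shifted to the left in the right image by their (hypothetical) disparities.
left = np.zeros((20, 40), dtype=bool)
right = np.zeros((20, 40), dtype=bool)
left[5:15, 4:6] = True
left[5:15, 12:22] = True
right[5:15, 2:4] = True
right[5:15, 8:18] = True

left_segs = band_segments(left, 5, 10)
right_segs = band_segments(right, 5, 10)
pairs = match_segments(left_segs, right_segs)
print(pairs)  # [((4, 5), (2, 3)), ((12, 21), (8, 17))]
```

Each matched pair supplies two same-name points (the segment endpoints), and the column difference between the paired endpoints is the disparity used for triangulation.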

Obstacle Identification.
After the feature points of the obstacles have been acquired in the images captured by the left and right cameras, their position coordinates in the binocular vision system are calculated from the depth information of the points. The first step in calculating the depth of a same-name point is to obtain its coordinates in the images. The two images are taken by the left and right CCD cameras at the same moment, and the position parameters $(x_1, y_1)$ and $(x_2, y_2)$ of the same-name point in the two images are obtained by image processing. The pixel coordinates are then converted into the physical coordinates of the corresponding image points, and the physical point is located by the triangle formed with the two image points. Since one side of this triangle (the line connecting the two cameras) is known, the second step is to calculate, in the relative coordinate system, the angles between the optical axes and the lines joining the image points to the physical point. The constants obtained by measurement include the height of the origin above the ground, the distance between the two cameras, and the image width and height; the parameters obtained from calibration include the focal length; and the positions $(x_1, y_1)$ and $(x_2, y_2)$ of the same-name point are detected in real time from the collected images. The coordinates of the target point in the relative coordinate system are then calculated from the resulting geometric relation. Once the coordinates are available, analysis of the same-name points makes it possible to identify the type, color, and other details of the obstacle. For example, the width of the car in the original image can be calculated by comparing the abscissas of its left and right same-name points. To the left of the car there is a straight white line, and the system can measure its width and vertical height. A series of such points can be matched to describe the most important targets in the image. The flow chart of the driverless vehicle obstacle automatic recognition algorithm is shown in Figure 1.
Through further analysis of the information, the contour information of the target can be obtained, such as whether the target is the car, the pedestrian, or just the railing [19,20]. Through the above process, the recognition of the target obstacle is realized.
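As an illustration of how same-name points yield three-dimensional coordinates and a target width, the sketch below uses the standard disparity form of triangulation for rectified, parallel-axis cameras. This is an assumption: the text above solves the triangle through angles to the optical axes, which is geometrically equivalent only in that idealized configuration. The `StereoRig` class, its field names, and all numerical values are hypothetical; the origin is placed at the midpoint between the two cameras, as in the automobile coordinate system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StereoRig:
    focal_px: float    # focal length expressed in pixels (f / dx)
    baseline_m: float  # distance between the two optical centers, in meters
    cx: float          # principal point column, in pixels
    cy: float          # principal point row, in pixels


def triangulate(rig, left_pt, right_pt):
    """Return (X, Y, Z) in meters for one pair of same-name points.

    X is measured from the midpoint between the cameras (positive to the
    right), Y follows the image rows (positive downward), Z is the depth.
    """
    (x1, y1), (x2, _) = left_pt, right_pt
    disparity = x1 - x2
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = rig.focal_px * rig.baseline_m / disparity
    # The left camera sits half a baseline to the left of the origin.
    x = (x1 - rig.cx) * z / rig.focal_px - rig.baseline_m / 2.0
    y = (y1 - rig.cy) * z / rig.focal_px
    return x, y, z


# Hypothetical rig and two endpoints of one matched line segment.
rig = StereoRig(focal_px=1600.0, baseline_m=0.2, cx=640.0, cy=360.0)
left_edge = triangulate(rig, (700.0, 400.0), (668.0, 400.0))
right_edge = triangulate(rig, (980.0, 400.0), (948.0, 400.0))
width_m = right_edge[0] - left_edge[0]
print(left_edge, right_edge, width_m)
```

With these assumed values both endpoints lie about 10 m ahead, and the difference of their X coordinates gives a target width of about 1.75 m, which is the kind of width cue the segment matching is meant to provide.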

Experiment
The proposed algorithm based on binocular vision is used to make a comparative experiment with two traditional identification algorithms. The specific content is as follows.

Experimental Environment.
In the simulation experiment, a simulation platform with a Pentium 4 2.8 GHz processor and 4 GB of memory is used, running Windows XP SP2. The binocular vision hardware platform is composed of two identical USB 3.0 industrial cameras, a Jetson TX-1, a camera support, and a display. The binocular camera collects the images, and the Jetson TX-1 processes them. Specific parameters are shown in Table 1.
In the above experimental environment, the binocular calibration toolbox provided by MATLAB 2017b is used to calibrate the binocular camera.

Experimental Process.
First, to ensure the smooth progress of the experiment, the binocular camera is calibrated.
Step 1, take the images. During shooting, the camera position is fixed while a group of images is taken with the calibration plate placed in different directions and at different angles. All the corner points of the calibration plate must be included in each image to facilitate the subsequent corner detection.
Step 2, extract the corners. The checkerboard square size of 25 mm is entered, and MATLAB is used to extract the checkerboard corner points.
Step 3, calibrate the monocular cameras. The camera calibration module of MATLAB 2017b is used to calibrate the left and right cameras and obtain the internal parameters, rotation matrix, distortion coefficients, and translation vector of each camera.
Step 4, analyze the calibration error. After the calibration of the monocular cameras is completed, the calibration results are analyzed based on the distribution of the pixel errors.
Step 5, calibrate the binocular camera. The stereo camera is calibrated using the MATLAB 2017b camera calibration module to obtain the rotation matrix and translation vector of the binocular camera. Finally, the calibration results are output, as shown in Figure 2.
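The error analysis in Step 4 can be illustrated by the usual mean reprojection error, assuming that this is the pixel error the calibration module reports. The sketch below is not the MATLAB implementation; the function name and the corner coordinates are hypothetical and serve only to show how an error of about 0.1 pixel arises.

```python
import numpy as np


def mean_reprojection_error(detected, reprojected):
    """Mean Euclidean distance, in pixels, between detected checkerboard
    corners and the same corners reprojected with the calibrated model."""
    detected = np.asarray(detected, dtype=float)
    reprojected = np.asarray(reprojected, dtype=float)
    if detected.shape != reprojected.shape:
        raise ValueError("both corner sets must have the same shape")
    return float(np.linalg.norm(detected - reprojected, axis=1).mean())


# Hypothetical corners: each reprojected point is 0.1 pixel from its detection.
detected = [[100.0, 100.0], [125.0, 100.0], [150.0, 100.0]]
reprojected = [[100.06, 100.08], [125.0, 99.9], [149.92, 100.06]]
error_px = mean_reprojection_error(detected, reprojected)
print(round(error_px, 3))  # 0.1
```

Inspecting the per-corner distances rather than only their mean helps to spot individual calibration images that should be retaken.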
As can be seen from Figure 2, the average calibration errors of both the monocular cameras and the binocular camera are within 0.1 pixel, indicating high calibration accuracy and a good calibration effect. After that, obstacle identification is carried out automatically. The car travels straight ahead at a speed of 30 km/h. A pedestrian in front of the vehicle is selected as obstacle 1, a bicycle as obstacle 2, and a fork in the road as obstacle 3. Obstacles 1, 2, and 3 are set 5 m, 10 m, and 20 m in front of the vehicle, respectively. The lighting condition is natural daylight. The simulation diagram of the experimental obstacles is shown in Figure 3, and the identification results are compared.

Experimental Results.
Effective obstacle avoidance is verified when the vehicle has no collision and can continue to drive normally. Failure to complete the avoidance, or a collision with the obstacle, is counted as ineffective avoidance, and false avoidance is avoidance behavior on a road that is free of obstacles. The recognition results of the proposed binocular vision-based automatic roadblock recognition algorithm for unmanned vehicles, traditional algorithm 1, and traditional algorithm 2 are shown in Table 2.
As shown in Table 2, the recognition error of the proposed algorithm is 0.03 m for obstacle 1, 0.02 m for obstacle 2, and 0.01 m for obstacle 3. In conclusion, the proposed algorithm has a smaller recognition error and therefore higher recognition accuracy. Through analysis, it is found that the proposed obstacle recognition algorithm based ...

Figure 1: Flow chart of automatic obstacle recognition algorithm for driverless vehicle.

Conclusions
Computer vision has broad application prospects, with a place in intelligent robots, medical image processing, graphics search technology, and many other fields. Because images contain abundant information, computer vision has obvious advantages over other types of sensors and will be applied in more and more engineering settings. Binocular stereo vision, as an important research branch of computer vision, has always been one of the focuses and hotspots of computer vision research. It simulates the process of human perception with both eyes and can measure the depth of a target using the parallax generated by multiple viewing angles. In this paper, binocular vision is applied to the automatic identification of roadblocks for unmanned vehicles. By adding a car coordinate system, the relative position information of vehicles and obstacles can be obtained easily and the identification accuracy can be improved. It is hoped that the proposed algorithm can provide some reference value for research in this field. Because of its large information capacity and high complexity, a binocular vision processing system needs to be tailored to different applications. Real-time performance is the bottleneck of its engineering application, and more in-depth research on how to remove this bottleneck is the key future research direction.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.