An Improved VM Obstacle Identification Method for Reflection Road

An obstacle detection method based on VM (a VIDAR and machine learning joint detection model) is proposed to improve the identification accuracy of monocular vision systems. When VIDAR (Vision-IMU-based detection and range method) detects unknown obstacles in a reflective environment, the reflections of the obstacles are themselves identified as obstacles, reducing the accuracy of obstacle identification. We propose an obstacle detection method called improved VM to avoid this situation. The experimental results demonstrate that the improved VM can identify unknown obstacles and eliminate their reflections. Compared with more advanced detection methods, the improved VM obstacle detection method is more accurate, and it can detect unknown obstacles in reflective road environments.


Introduction
Obstacle detection has become a major concern in the field of driver assistance systems due to the complexity of the outdoor environment. Cameras (monocular, binocular, infrared, etc.), lidar, and millimeter-wave radar are all examples of obstacle identification equipment. While lidar and millimeter-wave radar are highly accurate at detecting obstacles, their high cost limits their use in low-end vehicles [1][2][3]. Due to the low cost, high detection accuracy, and speed of vision-based obstacle identification equipment, it has become more suitable for a wide range of vehicles [4,5]. The vision-based sensor used in this study is a camera. The camera, GPS, and IMU constitute an innovative sensor combination. Compared with a single sensor, the application of multisensor information fusion technology can improve the reliability of the whole system, enhance the reliability of the data, improve accuracy, and increase the system's information utilization rate in solving the problems of detection, tracking, and target recognition.
Machine learning is the process of training and identifying images using deep convolutional neural networks. Compared with other image recognition technologies, machine learning has an extremely high recognition rate for specific images. While machine learning is capable of accurate classification, it can only be used to identify known obstacles. While the vehicle is in motion, using machine learning to identify unknown obstacles may result in misidentification, posing a serious risk to the vehicle's safety (Figure 1). As a result, a method for detection and ranging using vision and an IMU (inertial measurement unit), called VIDAR, has been proposed [6]. Given that VIDAR requires more time to run than machine learning, a method combining VIDAR and machine learning to detect obstacles has been proposed (called the VM method). In that method, machine learning is used to identify known obstacles, while VIDAR is used to detect unknown obstacles.
To avoid the situation in which VIDAR detects a reflection as an obstacle when used in a reflective environment (Figure 2) and to improve detection accuracy, a VIDAR-based pseudo-obstacle detection method (called improved VIDAR) has been proposed.
This method's identification procedure is as follows. The rectangle of the obstacle is determined. The width of the obstacle rectangle is calculated using the transformation relationship between pixel coordinates and world coordinates, and the height of the obstacle rectangle is then calculated using the same transformation relationship. A true obstacle is distinguished by the fact that the actual height of the obstacle rectangle remains constant throughout the ego-vehicle's movement. If the obstacle is a real one, tracking is continued.
To accelerate the detection speed of improved VIDAR, we combine it with machine learning (this article uses the faster RCNN algorithm) to identify known obstacles; we refer to this combination as improved VM. The improved VM obstacle detection method can quickly and accurately detect obstacles on reflective roads. The improved VM obstacle detection procedure is as follows: first, machine learning is used to identify known obstacles; second, the identified obstacles are removed from the image as background; and finally, pseudo-obstacles are eliminated through the use of improved VIDAR.

Related Work
As the core of automobile-assisted driving, obstacle detection has emerged as a critical area of research in recent years. Due to its simple ranging principle, the monocular vision sensor has become the primary equipment for obstacle identification. Many scholars have conducted related research to accelerate the identification process. Traditional image object classification and detection algorithms and strategies struggle to meet the requirements of image and video big data in terms of processing efficiency, performance, and intelligence. Deep learning establishes the mapping from low-level signals to high-level semantics by simulating a hierarchical structure similar to the human brain, so as to realize the hierarchical feature expression of data, and has powerful visual information processing capabilities. Therefore, in the field of machine vision, the representative deep learning model, the convolutional neural network (CNN), is widely used [7,8]. The abbreviation CNN is also used for cellular nonlinear networks. Arena et al. have stressed the universal role that cellular nonlinear networks (CNNs) are assuming today; it is shown that the dynamical behavior of 3D CNN-based models allows us to approach new emerging problems and to open new research frontiers [9]. Shustanov and Yakimov proposed an implementation of a traffic sign recognition algorithm using a convolutional neural network; training of the neural network is implemented using the TensorFlow library and the massively parallel CUDA architecture for multithreaded programming, and their experiments prove the high efficiency of this method [10]. Zhu et al. proposed a novel image classification framework that combines a CNN and KELM (kernel extreme learning machines). They extracted feature categories using DenseNet as a feature extractor and a radial basis function kernel ELM as a classifier to improve image classification performance [11]. Wang et al.
proposed the occlusion-free road segmentation network, a fully convolutional neural network. Through foreground objects and visible road layouts, this method can predict roads in the semantic domain [12]. The accuracy of obstacle identification is also continuously improving as new machine learning models such as SegNet, YOLO v5, faster RCNN, BigGAN, and mask RCNN are developed [13][14][15][16][17][18][19][20]. While machine learning is capable of accurate classification, it can only be used to identify known obstacles. Unknown obstacles may be missed while the vehicle is moving, which poses a serious risk to the vehicle's safety.
Generally, obstacles are detected using significant information such as color and prior shape. Zhu et al. proposed a method for detecting vehicles based on their edges and symmetry characteristics [21]. They hypothesized the vehicle's location based on the symmetric regions detected in the image. The vehicle's bounding box is determined using the projected image of the enhanced vertical and horizontal edges. Zhang et al. [21][22][23] used color information for background removal and shadow detection to improve object segmentation and background updating. This method is capable of rapidly and precisely detecting moving objects. Zhang et al. [24] introduced Deep Local Shapes (DeepLS), high-quality 3D shapes that can be encoded and reconstructed without requiring an excessive amount of storage. This local shape decomposition of the scene simplifies the prior distribution that the network must learn and makes obstacle detection faster and more accurate. However, in an environment with reflections, the reflections contain the same significant information as the obstacles, reducing the accuracy of obstacle detection. At night, the vehicle's position is generally determined by highlight information, such as highlighted areas and contour features. Park and Song [25] proposed a front vehicle identification algorithm based on contrast enhancement and vehicle lamp pairing. Lin et al. [26] discovered that the characteristics of headlights were more distinctive than the contours of vehicles and had a greater identification effect, and thus proposed the use of lamps as a sign for vehicle identification at night. The Hough transform was proposed by Dai et al. [27] as a method for intelligent vehicle identification at night. This method divides the extracted lamps into connected domains, extracts the lamps' edges, and then identifies the circle using the Hough transform. Finally, by pairing the lamps, the vehicle's location is determined. Kavya et al.
[28] proposed a method for detecting vehicles based on the color of the brake lamp during braking in the captured color image. In a reflective environment, the feature information required by the above identification methods will include lamp reflections, and reflected lamps may be paired, which reduces vehicle identification accuracy. We use an improved VM to detect obstacles, eliminating obstacles in the reflection and thereby increasing obstacle detection accuracy.

Methodology of Improved VIDAR's Pseudo-Obstacle Detection
The monocular visual identification method based on machine learning is limited to identifying known obstacles. A vehicle collision accident may occur when unknown obstacles are present on the road. When VIDAR is used to detect obstacles, pseudo-obstacles in a reflective environment are mistaken for real obstacles. Thus, to increase the speed and accuracy of obstacle detection, we use an improved VM.

Transformation from World Coordinates to Pixel Coordinates.
The camera can project objects in the three-dimensional world into a two-dimensional image by capturing an image. In effect, the imaging model establishes a projection mapping between three-dimensional and two-dimensional space. A coordinate transformation is required to convert the world coordinate system's coordinates to the camera coordinate system's coordinates. A rigid body transformation is used to convert the world coordinate system to the camera coordinate system; it is determined by the camera's external parameters. The transformation from the camera coordinate system to the pixel coordinate system converts three-dimensional coordinates to two-dimensional plane coordinates, as determined by the camera's internal parameters. Although both the pixel and image coordinate systems are located on the imaging plane, their origins and units of measurement are distinct. The origin of the image coordinate system is the point at which the camera's optical axis intersects the imaging plane, which is typically the imaging plane's midpoint.
Suppose the internal parameter matrix is M. Project Q(X, Y, Z) in the physical world to the image plane point q(x, y). By adding a dimension w to q(x, y), expanding it to q(x, y, w), we obtain w = Z; point q is then in homogeneous coordinates. Combining the rotation matrix R and the offset matrix T gives the external parameter matrix K, where c_x and c_y are the offsets. The coordinate transformation is shown in Figure 3. The internal parameters of the camera are obtained by Zhang Zhengyou's calibration method to determine the transformation relationship between world coordinates and pixel coordinates.
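The world-to-pixel mapping described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the intrinsic values (focal lengths of 800 px, principal point (320, 240)) and the identity extrinsics are assumed example parameters.

```python
import numpy as np

def project_to_pixel(M, R, T, Q):
    """Project a 3-D world point Q into pixel coordinates.

    M: 3x3 intrinsic matrix, R: 3x3 rotation, T: 3x1 translation.
    Returns (u, v) after perspective division by w, which equals the
    camera-frame depth Z, matching the homogeneous expansion in the text.
    """
    Q = np.asarray(Q, dtype=float).reshape(3, 1)
    Qc = R @ Q + T          # rigid body transform: world -> camera
    q = M @ Qc              # intrinsics: camera -> homogeneous pixel coords
    w = q[2, 0]             # w = Z (camera-frame depth)
    return q[0, 0] / w, q[1, 0] / w

# Assumed example intrinsics and extrinsics (not the paper's calibration)
M = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)               # camera axes aligned with the world frame
T = np.zeros((3, 1))
u, v = project_to_pixel(M, R, T, (0.0, 0.0, 10.0))  # point on the optical axis
```

A point on the optical axis projects to the principal point, which provides a quick sanity check of the matrices.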

Obstacle Ranging Model.
The range of obstacles is obtained as follows (Figure 4). Let f be the focal length of the camera; h the installation height of the camera; μ the pixel size; z the camera pitch angle; (x_0, y_0) the intersection of the image plane and the optical axis of the camera, set to (0, 0); and (x, y) the coordinates of the intersection point P of the obstacle and the pavement plane. The horizontal distance d between the object point and the camera is then d = h / tan(z + arctan((y − y_0)μ/f)).
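The ranging relation can be illustrated directly from the symbols above. All numeric parameters here (camera height 1.2 m, 8 mm focal length, 4.65 µm pixels, 5° pitch) are assumed example values, not the paper's calibration.

```python
import math

def horizontal_distance(h, f, mu, pitch, y, y0=0.0):
    """Monocular ground-plane ranging: horizontal distance d from the
    camera to the contact point P of the obstacle and the road plane.

    h: camera mounting height (m), f: focal length (m),
    mu: pixel size (m/px), pitch: camera pitch angle (rad),
    y: image row of P (px), y0: optical-centre row (px).
    """
    return h / math.tan(pitch + math.atan((y - y0) * mu / f))

# Assumed example parameters for illustration
d = horizontal_distance(h=1.2, f=8e-3, mu=4.65e-6,
                        pitch=math.radians(5.0), y=200)
```

Points imaged lower in the frame (larger y) lie closer to the camera, which the monotonicity of the expression reflects.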

Journal of Robotics
Assume that Y_1 is the Y-axis in the previous image and Y_2 is the Y-axis in the subsequent image. When the camera moves from Y_1 to Y_2 on the axis of the imaging plane (Figure 5), let A be the imaging point of the obstacle's top in the previous image, B the imaging point of the same top in the subsequent image, A′ the object point of A, and B′ the object point of B. d_1 is the horizontal distance between A′ and the camera; similarly, d_2 is the horizontal distance between B′ and the camera. d_1 and d_2 can be obtained from Equation (3). The camera moves a certain distance Δd during the time between the previous and subsequent images. Additionally, if the obstacle is moving (as illustrated in Figure 6), the change Δl can also be used for obstacle judgment. The verification process is shown in [6].
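The judgment above reduces to a single predicate: for a point lying on the road plane, the ranged distance shrinks by exactly the ego displacement Δd, while a point with height above the road violates this relation. A minimal sketch follows; the tolerance value is an assumption for illustration.

```python
def is_stereo_obstacle(d1, d2, delta_d, tol=0.05):
    """VIDAR-style height judgment.

    d1: ranged distance to the feature point in the previous image (m),
    d2: ranged distance in the subsequent image (m),
    delta_d: ego-vehicle displacement between the two images (m).
    A road-plane point satisfies d1 - d2 == delta_d; a residual larger
    than the tolerance indicates a point with height (a stereo obstacle).
    """
    return abs((d1 - d2) - delta_d) > tol

# A feature point whose ranged distance shrinks by less than the ego
# displacement must lie above the road plane.
flat = is_stereo_obstacle(10.0, 9.0, 1.0)   # consistent with the road plane
tall = is_stereo_obstacle(10.0, 9.5, 1.0)   # violates the road-plane relation
```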

Static Obstacle Identification Model.
There are two types of static obstacles: static real obstacles and static pseudo-obstacles. Static real obstacles are actual road obstacles. Reflections identified as real obstacles during the identification process are called static pseudo-obstacles. A static pseudo-obstacle is the reflection of a road obstacle and does not affect the vehicle's driving safety. To improve the accuracy of obstacle identification, we must identify and remove static pseudo-obstacles.

Static Real Obstacle Identification.
First, we use VIDAR to detect stereo obstacles and determine the object points on the obstacle that are farthest apart in the horizontal and vertical directions to construct a rectangle (the obstacle rectangle, Figure 7). Let A(x_1, y_1) be the first imaging point of the width of the obstacle rectangle on the road surface, B(x_2, y_1) the other imaging point, A′ the object point of A, and B′ the object point of B. The horizontal distance d_1 between A′ and the camera can be calculated by Equation (2); similarly, the horizontal distance d_2 between B′ and the camera can also be calculated. The width of the obstacle rectangle can then be calculated using the pinhole imaging principle and the geometrical relationship of the camera.
When the camera moves from Y_1 to Y_2 on the axis of the imaging plane (Figure 8), let C(x_3, y_2) be one imaging point of the width of the opposite side of the obstacle, D(x_4, y_2) the other imaging point, C′ the object point of C, and D′ the object point of D. For a real obstacle, the widths obtained from Equations (4) and (5) are equal, and the height of the obstacle rectangle is then calculated.
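Assuming the pinhole lateral-offset relation X = d·x·μ/f for each base imaging point, the width of the obstacle rectangle can be sketched as below. The function name and the use of the planar distance between the two object points are illustrative choices, not the paper's exact Equations (4) and (5).

```python
import math

def rectangle_width(d1, d2, x1, x2, f, mu):
    """Width of the obstacle rectangle from its two base imaging points.

    d1, d2: ranged horizontal distances to object points A' and B' (m),
    x1, x2: image-column coordinates of A and B relative to the optical
    centre (px), f: focal length (m), mu: pixel size (m/px).
    Each lateral offset follows the pinhole relation X = d * x * mu / f;
    the width is the planar distance between A' and B', which also covers
    the case where the two points lie at slightly different depths.
    """
    X1 = d1 * x1 * mu / f
    X2 = d2 * x2 * mu / f
    return math.hypot(X2 - X1, d2 - d1)

# Assumed example: both base points 10 m away, 50 px either side of centre
w = rectangle_width(10.0, 10.0, -50.0, 50.0, f=8e-3, mu=4.65e-6)
```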

Static Pseudo-Obstacle Identification.
The procedure for identifying pseudo-obstacles is similar to the procedure for identifying real obstacles. However, when obstacles are detected using VIDAR, the object points of a pseudo-obstacle differ from their actual positions (Figure 9). We detect pseudo-obstacles using VIDAR and construct a rectangle (the pseudo-obstacle rectangle) from the object points on the pseudo-obstacle that are farthest apart in the horizontal and vertical directions. Let A(x_1, y_1) be the first imaging point for pseudo-obstacle width identification and B(x_2, y_1) the other imaging point, with A′ and B′ their object points. The horizontal distance d_1 between A′ and the camera can be calculated using Equation (2). Similarly, the horizontal distance d_2 between B′ and the camera can also be calculated using Equation (2), and the width of the pseudo-obstacle rectangle can be calculated by the pinhole imaging principle and the geometrical relationship of the camera.
When the camera moves from Y_1 to Y_2 on the axis of the imaging plane (Figure 10), after the pseudo-obstacle has moved, let C(x_3, y_2) be one imaging point of the rectangle width of the pseudo-obstacle and D(x_4, y_2) the other imaging point, with C′ the object point of C and D′ the object point of D. At this point, the width of the pseudo-obstacle's object points changes from W_1 to W_2. W_2 can likewise be solved using the pinhole imaging principle and the geometrical relationship of the camera. The rectangle height of the pseudo-obstacle can then be obtained from Equations (5) and (6) and the triangle similarity principle.

Moving Obstacle Identification Model.
Moving obstacles are classified as either moving real obstacles or moving pseudo-obstacles. Moving real obstacles are actual moving obstacles on the road. Reflections identified as real obstacles during the identification process are referred to as moving pseudo-obstacles. A moving pseudo-obstacle is the reflection of a moving road obstacle and does not affect the vehicle's driving safety. We must identify and remove moving pseudo-obstacles to improve accuracy when identifying obstacles.

Moving Real Obstacle Identification.
The steps for identifying moving real obstacles are identical to those for static real obstacles (Figure 11). VIDAR is used to detect stereo obstacles, construct the obstacle rectangle, and calculate its width. After the ego-vehicle and the obstacle have moved, the width of the obstacle rectangle is recalculated and the obstacle height is then solved using the triangle similarity principle.

Moving Pseudo-Obstacle Identification.
The steps for identifying moving pseudo-obstacles are identical to those for static pseudo-obstacles (Figure 12): detect stereo obstacles with VIDAR, determine the pseudo-obstacle rectangle, and calculate its width. Following the movement of the ego-vehicle and the pseudo-obstacle, the height of the pseudo-obstacle is calculated from the width of the pseudo-obstacle's imaging points.

Removal Model of Pseudo-Obstacles.
The ego-vehicle's movement is used to assess an obstacle's authenticity. Using Equations (3) and (5), the widths of obstacles and pseudo-obstacles when the vehicle moves for the first time are calculated. When the vehicle moves again, the widths of the obstacle and pseudo-obstacle are calculated using Equations (4) and (6), and the heights of the obstacle and pseudo-obstacle are determined from their widths. Detected obstacles are those whose two calculated heights are similar. The process of obstacle identification is as follows: (1) confirm stereo unknown obstacles: machine learning is used to identify known obstacles, images are obtained after removing the known obstacles, and stereo unknown obstacles are then screened out using VIDAR's obstacle detection principle.
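The removal model above reduces to a height-constancy test across two ego-vehicle movements: a real obstacle keeps a constant rectangle height, while a reflection does not. A minimal sketch, with an assumed relative tolerance:

```python
def classify_obstacle(height_move1, height_move2, tol=0.05):
    """Removal-model sketch: compare the rectangle heights computed after
    the first and second ego-vehicle movements.

    A real obstacle yields (approximately) the same height both times;
    a pseudo-obstacle (reflection) does not, so it is eliminated.
    tol is an assumed relative tolerance for measurement noise.
    """
    scale = max(height_move1, height_move2, 1e-9)
    if abs(height_move1 - height_move2) <= tol * scale:
        return "real"
    return "pseudo"

# A consistent height survives; an inconsistent one is removed.
kept = classify_obstacle(1.50, 1.51)
removed = classify_obstacle(1.50, 0.90)
```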

Obstacle Identification Experiment and Effect Analysis
We analyze the identification effect of the VM and improved VM in two environments. The experimental equipment, including the camera unit and IMU, is installed on a movable platform (Figure 16(a)). A scale model of a vehicle is used to simulate a known obstacle. To simulate the unknown obstacle, a beverage bottle cap is used (Figure 16(b)). Polished paper is used to create a reflective road surface (Figure 16(c)). The camera's video, captured at a frame rate of 20 fps, is used to generate an image sequence, and obstacle detection is then performed on the generated image sequence.

Figure 8: Schematic diagram of static pseudo-obstacles after the ego-vehicle moved. Figure 9: Schematic diagram of static pseudo-obstacle. Figure 10: Schematic diagram of static pseudo-obstacle after the ego-vehicle moved. Figure 11: Schematic diagram of moving obstacle. Figure 12: Schematic diagram of moving pseudo-obstacle after the ego-vehicle moved.

Improved VIDAR and Improved VM Simulation Experiments.
A beverage bottle cap is used as the unknown obstacle, and the angular acceleration and acceleration of the ego-vehicle are obtained from the IMU installed in the ego-vehicle. The quaternion method is used to solve the camera attitude and update the camera pitch angle. The image is processed by a fast image region matching method based on MSER. The acceleration is used to calculate the horizontal distance between the vehicle and the obstacles. The height of the obstacle rectangle is calculated by keeping the actual width constant during the movement of the vehicle and obstacles, the authenticity of the identified obstacles is confirmed by checking that the height remains unchanged, and the real obstacles are marked. The previous and subsequent images are used to judge whether the height of the obstacle rectangle has changed (Figure 17).
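The quaternion attitude update mentioned above can be sketched as a first-order integration of the gyro rate, from which the pitch angle is extracted. This is a generic implementation rather than the authors' code; the Hamilton product convention and the test rate are assumptions.

```python
import math
import numpy as np

def quat_mul(q, r):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def update_pitch(q, gyro, dt):
    """One step of the quaternion kinematics q_dot = 0.5 * q (x) (0, omega),
    integrated with forward Euler and renormalized; returns the updated
    quaternion and the extracted pitch angle (rad)."""
    wx, wy, wz = gyro
    q = q + 0.5 * quat_mul(q, np.array([0.0, wx, wy, wz])) * dt
    q = q / np.linalg.norm(q)
    w, x, y, z = q
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w*y - z*x))))
    return q, pitch

# Integrate an assumed constant 0.1 rad/s pitch rate for one second
q = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(100):
    q, pitch = update_pitch(q, (0.0, 0.1, 0.0), dt=0.01)
```

After one second at 0.1 rad/s the extracted pitch should be very close to 0.1 rad, which checks both the product convention and the extraction formula.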
In the VM and improved VM comparison experiments, the faster RCNN is used to identify known obstacles and treat them as background, while VIDAR and improved VIDAR are used to perform secondary detection on the background-removed image to identify unknown obstacles (Figures 18, 19, and 20). The detection of unknown obstacles is shown in Figures 19 and 20. While VIDAR in the VM is capable of identifying the bottle cap, the cap's reflection is also detected as an obstacle, resulting in low obstacle identification accuracy. When the improved VIDAR is used to detect unknown obstacles, the obstacles in the reflection, which have no height, can be eliminated, compensating for unknown obstacles being misdetected in the reflective environment. As a result, the improved VM detects obstacles more precisely than the baseline VM.

Analysis of the Identification Result of Improved VM and Improved VIDAR.
In the experimental test, a pure electric vehicle is used as the test vehicle (Figure 21). An MV-VDF300SC industrial digital camera is used as the monocular vision sensor. This camera adopts the USB 2.0 standard interface and has high resolution, precision, and clarity. The camera's performance parameters are listed in Table 1. The IMU is used to obtain the vehicle's motion status in real time. GPS is used to determine a precise location. Digital maps are utilized to obtain precise road data, such as distance and slope. The computing unit performs real-time data processing. In the calculation process, multisensor data processing is the combination and processing of multisource information, which is rather complicated. Fuzzy logic can deal with complex systems [29]; it can coordinate and combine the acquired information to improve the efficiency of the system and effectively handle the knowledge acquired in the scene. Accurate calibration of camera parameters is a prerequisite for the whole experiment and a very important task for obstacle detection methods. In this paper, Zhang Zhengyou's camera calibration method is adopted to calibrate the DaYing camera. First, the camera is fixed to capture images of a checkerboard at different positions and angles. Then, key points of the checkerboard are selected and used to establish relationship equations. Finally, the internal parameter calibration is realized. The camera calibration process and result are shown in Figure 22.
Camera distortion includes radial distortion, thin lens distortion, and centrifugal distortion. The superposition of the three kinds of distortion results in a nonlinear distortion, whose model can be expressed in the image coordinate system as follows: where s_1 and s_2 are the centrifugal distortion coefficients, k_1 and k_2 are the radial distortion coefficients, and p_1 and p_2 are the distortion coefficients of thin lenses.
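A nonlinear distortion model of this kind can be sketched with the radial and decentering terms of the standard Brown model. Note this is an assumed standard form: the paper's own equation may group terms or name coefficients differently (here k denotes radial and p decentering coefficients).

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply Brown-style nonlinear distortion to ideal image-plane
    coordinates (x, y), returning the distorted coordinates.

    k1, k2: radial coefficients; p1, p2: decentering coefficients.
    The radial term scales the point by 1 + k1*r^2 + k2*r^4; the
    decentering terms shift it asymmetrically.
    """
    r2 = x*x + y*y
    radial = 1.0 + k1*r2 + k2*r2*r2
    xd = x*radial + 2.0*p1*x*y + p2*(r2 + 2.0*x*x)
    yd = y*radial + p1*(r2 + 2.0*y*y) + 2.0*p2*x*y
    return xd, yd

# The optical centre is a fixed point; a purely radial model moves an
# on-axis point straight outward.
centre = distort(0.0, 0.0, 0.1, 0.01, 1e-3, 1e-3)
radial_only = distort(1.0, 0.0, 0.1, 0.0, 0.0, 0.0)
```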
Because the centrifugal distortion of the camera is not considered in this paper, the internal reference matrix of the camera can be expressed accordingly. The calibration of the camera's external parameters can be calculated by taking the edge object points of lane lines.
e calibration results are shown in Table 2.
Due to a lack of reflective road images in public data sets and the fact that different camera parameters would affect ranging accuracy, we created a VIDAR-Reflection Road database (Figure 23) with a total of 2000 images. The MV-VDF300SC camera unit was used to record the experiment in its natural environment. The test roads were Xuezhai Road and Jiefang East Road in Jinan, Shandong Province, and traffic environment images were collected on rainy days from 10:00 to 11:00 and from 19:00 to 20:00. Figure 24 depicts the identification result for two of the images. The VM and improved VM accuracy are compared by counting the number of TP, FP, TN, and FN obstacles in each image frame. Let a_i be an obstacle that is correctly identified as a positive example; b_i an obstacle that is incorrectly identified as a positive example; c_i an obstacle that is correctly identified as a negative example; and d_i an obstacle that is incorrectly identified as a negative example. Then, TP = Σ_{i=1}^{n} a_i, FP = Σ_{i=1}^{n} b_i, TN = Σ_{i=1}^{n} c_i, and FN = Σ_{i=1}^{n} d_i. The comparison of the detection effects of VM and improved VM in the reflection environment is shown in Table 3.
In the analysis of the results, accuracy (A), recall (R), and precision (P) were used as evaluation indices for the two obstacle detection methods, calculated as A = (TP + TN)/(TP + FP + TN + FN), R = TP/(TP + FN), and P = TP/(TP + FP). The accuracy, recall, and precision of the method proposed in this paper are shown in Table 4.
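The evaluation indices can be computed directly from the counts reported in Table 3; the metric definitions are the standard ones, and the counts below are the reported TP/FP/TN/FN values for VM and improved VM.

```python
def metrics(tp, fp, tn, fn):
    """Standard evaluation indices: accuracy A, recall R, precision P."""
    a = (tp + tn) / (tp + fp + tn + fn)
    r = tp / (tp + fn)
    p = tp / (tp + fp)
    return a, r, p

# Counts as reported in Table 3 for the reflection environment
a_vm, r_vm, p_vm = metrics(6591, 365, 279, 392)          # VM
a_ivm, r_ivm, p_ivm = metrics(6832, 224, 252, 243)       # improved VM
```

On these counts the improved VM comes out ahead on all three indices, consistent with the comparison discussed in the text.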
As demonstrated by the experimental results in Table 4, the accuracy of obstacle identification is increased when using the improved VM in a reflective environment. Due to weather and other factors, misidentification and missed identification occasionally occur during the experiment. However, the improved method proposed in this paper improves obstacle identification accuracy. Additionally, we compared our method's detection accuracy to that of other commonly used target detection methods. Table 5 summarizes the detection results. The proposed obstacle detection method clearly outperforms state-of-the-art methods in terms of accuracy. The term "real time" refers to the processing of each image frame as it is collected. In terms of detection speed, 2000 images were processed using improved VIDAR, improved VM, VIDAR, VM, and YOLO v5. Table 6 summarizes the average detection times for the five identification methods.
As shown in Table 6, the improved VM takes longer to determine the authenticity of obstacles than the VM; similarly, improved VIDAR requires more time than VIDAR. Due to the smaller number of feature points, the improved VM detects faster than improved VIDAR and requires less time. As a result, using the improved VM for obstacle detection not only takes advantage of machine learning's speed but also improves identification accuracy.

Conclusion
This paper first proposes an improved VIDAR method based on VIDAR and then combines it with machine learning to propose the improved VM obstacle identification method. On the basis of machine learning to detect known obstacles, VIDAR is used to determine whether an obstacle has height by calculating the position of road imaging points; the obstacle rectangle is determined for non-road obstacles, and the obstacle height (for both real obstacles and pseudo-obstacles) is calculated using the obstacle imaging points of the two frames before and after the vehicle moves. By calculating the height again after further movement, the two heights are compared to determine the authenticity of the obstacle, thereby realizing obstacle detection. This paper aims to show the effect of obstacle detection using the improved VM in an environment with reflections. The experimental results indicate that, compared with the VM, the improved VM method for obstacle detection is more accurate in a reflective environment. Because the method proposed in this paper requires a large amount of calculation, improving its efficiency will be the next research direction. In addition, obstacle detection is a prerequisite for obstacle avoidance, and an improved obstacle avoidance method is also a future research direction.

Data Availability
Data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Table 3: Comparison of the detection effects of VM and improved VM in the reflection environment.

Method        Input  TP    FP   TN   FN
VM            7356   6591  365  279  392
Improved VM   7356   6832  224  252  243