Vehicle Detection Based on Perspective Transformation Using Rear-View Camera

In recent times, the number of vehicles with a rear-view camera has been increasing. The rear-view camera can be utilized as a sensor for monitoring a collision from behind the vehicle in a driving scene. To prevent rear-end collisions, we have been developing a technology that detects approaching vehicles from images obtained using an on-board rear-view camera. Conventional vehicle detection methods often use camera-view images. However, it is difficult to accurately estimate the position of a distant approaching vehicle using such images. In this paper, we propose an improved method to accurately estimate the position of distant approaching vehicles by using virtual top-view images. The displacement of the vehicle in the top-view image is proportional to its speed. Thus, the proposed method can provide the accurate position of the distant vehicle. We describe the details of the proposed method and demonstrate its effectiveness through experiments using actual images.


Introduction
Recently, the use of active safety technologies has increased, and various driver-assistance systems are in practical use. Systems that use on-board sensors such as radars or cameras to minimize the damage caused by rear-end collisions are an example of active safety measures. However, the number of rear-end collisions remains high. Therefore, it is necessary to prevent not only head-on but also rear-end vehicular collisions. To avoid a rear-end collision, we propose a system in which an on-board sensor continuously monitors the rear vehicle and, if there is a possibility of a rear-end collision, warns the driver of the approaching rear vehicle by flashing the hazard lamps. We call this system a rear collision prevention system, and in this paper, we propose a method for detecting an approaching vehicle using an on-board rear-view camera that can be utilized in this system. As the number of vehicles equipped with rear-view cameras for parking has been increasing, camera-based systems are expected to be implemented at a lower cost than radar-based systems; therefore, we decided to use an on-board rear-view camera for detecting rear vehicles. In addition, surround-view monitoring systems equipped with front, right, left, and rear-view cameras are commercially available today. With the increasing number of on-board cameras, research on utilizing these cameras for active safety systems beyond checking the surroundings has become active [1,2].
The study of detecting approaching vehicles from on-board camera images began over 20 years ago [3-5] and raised the possibility of detecting vehicles using a monocular camera. Among the studies on vehicle detection in which only an on-board, single-lens rear-view camera is used as a sensor, there are two approaches to detecting vehicles in an image: one uses camera-view images (as captured by the camera on board the vehicle), and the other uses top-view images (a top view of the road). When the camera-view image is used, the optical flow is often calculated to detect approaching vehicles [6,7]. For a rear collision prevention system, it is sufficient to detect only approaching vehicles, and optical flow is a straightforward way to extract the areas that move in the approaching direction. When monitoring the scene behind the host vehicle, the orientation of the optical flow of an approaching vehicle is opposite to that of the background. A method that uses the property that vehicles have horizontal or vertical edge components has also been proposed [8,9]. These methods find the left and right boundaries of a vehicle using the vertical edges, with the horizontal edges arising from the bumper and from the shadow on the road as clues. By calculating left-right symmetry in parallel, the areas of the vehicle candidates in the image are selected. After the candidates are validated by using pattern recognition or specific knowledge about the driving circumstances, the distance between the host vehicle and the rear vehicle is calculated from the base of the extracted vehicle, and the approaching vehicle can be detected by observing the change of the distance over time. Since the rear-view camera uses a wide-angle lens and is mounted facing downward, the distance resolution is significantly low in the area of the image that projects the far range, whereas it is high in the area that projects the close range.
When the rear vehicle is far from the host vehicle, the optical flow is almost zero even though the rear vehicle is approaching. The horizontal and vertical edges are too short to extract the areas of vehicle candidates. These disadvantages cause missed detections. Even if vehicles can be detected, the distance estimated from such images is not sufficiently accurate to recognize whether the detected vehicle is approaching, following, or moving away from the host vehicle. Therefore, to recognize distant vehicles, a method for detecting vehicles from low-resolution images was proposed [10].
On the other hand, methods for detecting vehicles using a virtual top-view image have also been proposed [1,11-13]. Ito et al. proposed a method for detecting a vehicle by finding the difference between two top-view images: one transformed from the front-view camera image and the other from the rear-view camera image. Song et al. (e.g., [12]) used, at the Hypothesis Generation step (e.g., [3]), the property that the edges in the vertical direction in the camera-view image converge on the camera position in the top-view image, and detected distant vehicles in the camera-view image. The advantages of using top-view images are that the actual distance per pixel is constant in the image and that the motion of an object can be described as an affine transformation. Therefore, the displacement of an object in the top-view image is proportional to the object's velocity, and the distance resolution of the top-view image is virtually higher in the area projecting the distant scene than that of the camera-view image. We detect distant approaching vehicles by using these features of a top-view image obtained by transforming the camera-view image, and we improve the accuracy of the estimated distance between the host vehicle and the rear vehicle.
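The difference in distance resolution can be illustrated with a small numerical sketch. It assumes a level pinhole camera, which is a simplification of the wide-angle, downward-facing camera described above; the focal length `F_PX`, the camera height `CAM_H`, and the 480-row top view are hypothetical values chosen only for illustration.

```python
# Hypothetical parameters for illustration only (not taken from the paper).
F_PX = 400.0   # focal length in pixels
CAM_H = 1.0    # camera height above the road in metres


def row_below_horizon(z_m):
    """Row offset (pixels below the horizon) of a road point z_m metres away,
    for a level pinhole camera: y = f * H / Z."""
    return F_PX * CAM_H / z_m


def metres_per_pixel_camera_view(z_m):
    """Ground distance covered by one image row at range z_m.
    From y = f * H / Z it follows that |dZ/dy| = Z^2 / (f * H)."""
    return z_m ** 2 / (F_PX * CAM_H)


def metres_per_pixel_top_view(range_m=40.0, rows=480):
    """In the top view the ground distance per row is constant."""
    return range_m / rows


for z in (10.0, 20.0, 40.0):
    print(z, metres_per_pixel_camera_view(z), metres_per_pixel_top_view())
```

Under these assumptions, one camera-view row covers sixteen times more ground at 40 m than at 10 m, whereas a top-view row covers the same ground everywhere. The top-view rows at far range are interpolated from few camera pixels, which is why the higher resolution is only virtual.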
The rest of this paper is organized as follows: we first discuss how to transform a camera-view image into a top-view image by perspective transformation and provide the details of the vehicle detection with the top-view image. We then show the experimental results obtained using actual images and conclude the paper with a summary.

Perspective Transformation
The coordinate system is defined as shown in Figure 1. $O_W$-$X_W Y_W Z_W$ is defined as the vehicle coordinate system, and $O_C$-$X_C Y_C Z_C$ is defined as the camera-view coordinate system. When the viewpoint $O_C$, whose location is $(t_X, t_Y, t_Z)$, rotates by $\theta$ about the $X_W$ axis, by $\rho$ about the $Y_W$ axis, and by $\phi$ about the $Z_W$ axis, the relation between the two coordinate systems is a rigid transformation consisting of these three rotations and the translation $(t_X, t_Y, t_Z)$. The transformation between $O_C$-$X_C Y_C Z_C$ and the image plane $xy$ is the perspective projection with the focal length $f$. When the virtual viewpoint $O_{VC}$ is located at $(0, H_{VC}, D_{VC})$, we can calculate the virtual camera-view coordinate system $O_{VC}$-$X_{VC} Y_{VC} Z_{VC}$ from the vehicle coordinate system, and the virtual image plane coordinates $x'y'$ from the virtual camera-view coordinate system, in the same way. Then, a point in the virtual image plane can be calculated from a point in the rear camera-view image plane through the vehicle coordinate system. However, the three coordinates of a point in the vehicle coordinate system cannot be recovered from the two coordinates of a point in the image plane. We therefore assume that all of the objects projected onto the image plane are drawn on the road plane, whose $Y_W$ is zero; that is, we consider these objects to be planar objects. The top-view image can then be obtained by transforming the rear camera-view image under this assumption. Here, $f'$ is the focal length of the virtual camera, and $f'/H_{VC}$ is the constant that determines the range of $X_W$ and $Z_W$ projected onto the top-view image. We prepare two constants, $f'_X/H_{VC}$ and $f'_Y/H_{VC}$, so that the range of $X_W$ is ±6 m and the range of $Z_W$ is 0 to 40 m.
Figure 2 shows the undistorted rear camera-view image and the top-view image obtained by perspective transformation with bilinear interpolation.
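The road-plane mapping can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an undistorted pinhole camera at height `cam_h` that is pitched down by `pitch` with no yaw or roll, places the principal point at the image centre, uses nearest-neighbour sampling instead of bilinear interpolation for brevity, and puts the near range at the bottom row of the top view. The function name `top_view` and all parameter names are hypothetical.

```python
import numpy as np


def top_view(image, f_px, cam_h, pitch, out_shape=(200, 60),
             x_half=6.0, z_max=40.0):
    """Resample a camera-view image onto a road-plane grid.

    Each output pixel (j, i) is assigned a road point (X, 0, Z) with
    X in [-x_half, x_half] and Z in [0, z_max]; that point is projected
    into the camera image and the nearest pixel is copied.
    """
    rows, cols = out_shape
    h, w = image.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0      # assumed principal point

    j, i = np.mgrid[0:rows, 0:cols]
    X = x_half * (2.0 * (i + 0.5) / cols - 1.0)   # lateral position in metres
    Z = z_max * (1.0 - (j + 0.5) / rows)          # range in metres, far at top

    # Road point relative to the camera, expressed in camera axes
    # (x right, y down, z along the optical axis).
    Xc = X
    Yc = cam_h * np.cos(pitch) - Z * np.sin(pitch)
    Zc = cam_h * np.sin(pitch) + Z * np.cos(pitch)

    u = np.rint(cx + f_px * Xc / Zc).astype(int)
    v = np.rint(cy + f_px * Yc / Zc).astype(int)
    valid = (Zc > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    out = np.zeros(out_shape, dtype=image.dtype)
    out[valid] = image[v[valid], u[valid]]       # pixels outside the view stay 0
    return out
```

Because the grid is uniform in $X_W$ and $Z_W$, each output row corresponds to a fixed ground distance (here 40 m / 200 rows = 0.2 m), which is the property the detection step relies on.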

Vehicle Detection
In this method, the base of a vehicle approaching from behind in the same straight driving lane is detected from the top-view image, using the property that a vehicle has many horizontal edge components, for use in the rear collision prevention system. First, the horizontal edge components are calculated by differentiating the undistorted camera-view image with respect to y, and the camera-view image is transformed into the top-view image using a perspective transformation. Second, the driving lane in the image is defined as the area for searching vehicles. In this area, the horizontal edge components are accumulated at each j, which is the axis representing the vertical direction of the image, with the origin at the upper left of the image. Finally, we cluster adjacent j whose cumulative values exceed a certain threshold and determine the cluster of the approaching vehicle by tracking each cluster through successive images.

Generation of Vehicle Candidates.
A vehicle has many horizontal edge components (Figure 3), and this information is useful for vehicle detection. To use this feature, we extract the image of the horizontal edge components $I_H$ from the camera-view image $I$ and transform it into the top-view image $I_{TH}$. Considering the rear collision prevention system, the driving lane in the image is defined as the area for searching vehicles. The horizontal edge components are accumulated at each j in this area, and the cumulative value at j, which is called the Histogram of Horizontal Edge Components (HHEC), is defined as $S(j) = \sum_{i \,:\, (i,j) \in R} I_{TH}(i, j)$, where $R$ is the area for searching vehicles, and i and j are the axes of the horizontal and vertical directions, respectively, with the origin of the image at the upper left. Then, we generate vehicle candidates by forming clusters of adjacent j for which $S(j)$ is over a threshold. The height of a cluster is the number of pixels in the j direction in the cluster. In Figure 4, the centre shows $I_{TH}$, the left shows the HHEC, and the right shows the result of clustering the HHEC. Some clusters are generated by the HHEC of nonvehicle planar objects such as road markings or shadows.
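The candidate generation described above can be sketched in a few lines. The sketch makes several assumptions that the text does not state: the edge strength is taken as the absolute difference between vertically adjacent pixels, the search area R is a rectangular band of columns `i_min` to `i_max`, and the threshold is a free parameter. All function names are hypothetical.

```python
import numpy as np


def horizontal_edges(image):
    """Edge strength from the derivative with respect to y
    (absolute difference between vertically adjacent pixels)."""
    img = image.astype(np.float64)
    edges = np.zeros_like(img)
    edges[1:, :] = np.abs(img[1:, :] - img[:-1, :])
    return edges


def hhec(top_edges, i_min, i_max):
    """S(j): sum of the top-view edge image over columns i_min..i_max-1
    for every row j (the search area is assumed to be rectangular)."""
    return top_edges[:, i_min:i_max].sum(axis=1)


def cluster_rows(s, threshold):
    """Group adjacent rows with S(j) > threshold.
    Returns a list of (top_row, bottom_row) pairs, both inclusive."""
    clusters = []
    start = None
    for j, value in enumerate(s):
        if value > threshold and start is None:
            start = j                        # a new cluster begins
        elif value <= threshold and start is not None:
            clusters.append((start, j - 1))  # the cluster ended on the previous row
            start = None
    if start is not None:                    # cluster reaching the last row
        clusters.append((start, len(s) - 1))
    return clusters
```

In the method described here, `horizontal_edges` would be applied to the camera-view image and the result transformed into the top view before `hhec` is computed; the bottom row of each returned pair plays the role of the cluster's bottom position.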

Removal of Road Markings from Vehicle Candidates.
The rear vehicle is detected by tracking each cluster and observing its motion through the successive images. The top-view image is generated by assuming that all of the objects are planar, that is, marks drawn on the road plane. Therefore, the displacement of the clusters in the image is considered to be proportional to their velocity. When we define a cluster of the image at time t as $C_t$ and the bottom position of $C_t$ as $b(C_t, t)$, the predicted position in the image at t of the cluster at t − 1 can be described as $b(C_{t-1}, t)$. Clusters are tracked by associating $C_{t-1}$ with the $C_t$ near the bottom position $b(C_{t-1}, t)$ predicted from the motion of $C_{t-1}$. We can detect only the clusters of the vehicle because the clusters generated by the HHEC of planar objects move away at the same speed as the host vehicle. As the velocity of the host vehicle can be obtained from the on-board velocity sensor, the moving distance per image frame can be easily calculated. Moreover, since the rear collision prevention system is active only when the host vehicle is moving forward, the clusters of road markings can be recognized by the property that they appear from the bottom area of the image, which the host vehicle has just passed. However, when the rear vehicle is passing over a road marking, the clusters of the vehicle and the nonvehicle object are merged into one cluster since the HHEC of each cluster is almost the same. As a result, the cluster of the vehicle cannot be tracked appropriately, or its tracking is interrupted. To solve these difficulties, we prepare five types of clusters depending on the scene and remove the clusters of planar objects by processing the clusters according to their type. We define the five types of clusters as given in Table 1 and Figure 5, considering that the rear vehicles pass over planar objects such as road markings or shadows. The area for monitoring the appearance of the clusters is set at the bottom of the image in advance. All clusters are initialized as the VEHICLE type at the step of the generation of vehicle candidates. Once a cluster that has not been tracked appears in the monitoring area for the first time at time t, its type changes from VEHICLE ($C^{Vh}_t$) to PLANAR ($C^{P}_t$). The result of the classification between the VEHICLE and the PLANAR types is shown in Figure 6.
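The expected motion of a planar cluster can be sketched as follows, assuming that the near range is at the bottom of the top-view image (so a receding object moves towards smaller row indices) and that the ground distance per row, `metres_per_row`, is known from the top-view geometry. The function names and the tolerance `tol_rows` are hypothetical.

```python
def rows_per_frame(host_speed_mps, frame_interval_s, metres_per_row):
    """Number of top-view rows the road plane shifts between two frames."""
    return host_speed_mps * frame_interval_s / metres_per_row


def predict_planar_bottom(bottom_row, host_speed_mps, frame_interval_s,
                          metres_per_row):
    """Predicted bottom row of a planar cluster at the next frame.
    The cluster recedes, so its row index decreases."""
    shift = rows_per_frame(host_speed_mps, frame_interval_s, metres_per_row)
    return bottom_row - shift


def moves_like_road_plane(prev_bottom, curr_bottom, host_speed_mps,
                          frame_interval_s, metres_per_row, tol_rows=1.0):
    """True if the observed displacement matches the road-plane prediction."""
    predicted = predict_planar_bottom(prev_bottom, host_speed_mps,
                                      frame_interval_s, metres_per_row)
    return abs(curr_bottom - predicted) <= tol_rows
```

For example, at 10 m/s with a frame interval of 0.1 s and 0.2 m per row, a road marking is expected to move up by about five rows per frame, whereas an approaching vehicle moves down.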
Since a PLANAR cluster is considered to move away at the same speed as the host vehicle and the motion of a planar object is described as a translation in the top-view image, the motion of $C^{P}_t$ can be estimated. The quantities $D_h$ and $D_S$ are evaluated between $C^{Vh}_{t+1}$ and the predicted $C^{P}_t$, where $R$ is the overlapping area of $C^{Vh}_{t+1}$ and $C^{P}_t$ placed at the position $b(C^{P}_t, t+1)$, and $N_R$ is the number of j in $R$. Depending on $D_h$ and $D_S$, each cluster type is determined according to the representation given in Figure 7, with the threshold $T_h$ for $D_h$ and the threshold $T_S$ for $D_S$. Whether a cluster is of the LAPPED type or of the HIDDEN type is determined by comparing the bottom positions of $C^{Vh}_{t+1}$ and $C^{P}_t$: the cluster is of the LAPPED type when the condition on these bottom positions holds and of the HIDDEN type otherwise.
In the case of MERGED- and LAPPED-type clusters, we need to estimate the correct bottom position of the rear vehicle area because a MERGED- or LAPPED-type cluster has an HHEC that is derived both from the vehicle and from the planar object. Considering the transition of the cluster type through the successive images, VEHICLE changes to MERGED, and MERGED changes to LAPPED. Therefore, by assuming that a MERGED- or LAPPED-type cluster moves with the same displacement as it did when it was a VEHICLE-type cluster, we can estimate the bottom position of the rear vehicle area. The bottom position $b(C_{t+1}, t+1)$ of the MERGED- or LAPPED-type cluster is replaced by the estimated position $b'(C_{t+1}, t+1)$. By updating the position of the clusters in this way, we can eliminate the influence of a planar object; tracking is performed with the modified clusters at t + 1.
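The bottom-position correction for MERGED- and LAPPED-type clusters can be sketched as a small update rule. The `Track` record, the `update` function, and the choice to leave other cluster types unchanged are assumptions made for illustration; the text does not specify how the displacement is stored.

```python
from dataclasses import dataclass


@dataclass
class Track:
    bottom: float        # bottom row of the cluster at the previous frame
    displacement: float  # rows per frame, measured while the cluster was VEHICLE type


def update(track, observed_bottom, cluster_type):
    """Return the track state after one frame.

    VEHICLE: trust the observed bottom row and refresh the displacement.
    MERGED or LAPPED: ignore the observation, which is contaminated by the
    planar object, and extrapolate with the last VEHICLE displacement.
    Other types are left unchanged in this sketch.
    """
    if cluster_type == "VEHICLE":
        return Track(observed_bottom, observed_bottom - track.bottom)
    if cluster_type in ("MERGED", "LAPPED"):
        return Track(track.bottom + track.displacement, track.displacement)
    return track
```

With this rule, a cluster that moved down by three rows per frame as a VEHICLE keeps advancing by three rows per frame while it overlaps a road marking, regardless of where the merged cluster's bottom is observed.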
In each of the successive images, the clusters other than those of the PLANAR type are treated as vehicle candidates. We determine the actual bottom of the rear vehicle by selecting the cluster that has the largest b among the vehicle candidates. This cluster must have been approaching and must have been tracked over consecutive images. The real distance from the host vehicle can be calculated using the formulas of perspective transformation.
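The final selection and the conversion from a row index to a distance can be sketched as below. The sketch assumes a top view whose bottom edge corresponds to 0 m and whose rows are uniformly spaced over `z_max_m`; the exact offset between the camera and the bottom edge depends on the transformation parameters and is not modelled here. All function names are hypothetical.

```python
def nearest_candidate(bottom_rows):
    """Bottom row of the nearest candidate: the largest row index,
    since the near range is at the bottom of the top view."""
    return max(bottom_rows)


def row_to_distance(bottom_row, rows, z_max_m=40.0):
    """Distance in metres of the lower edge of a top-view row,
    with the bottom edge of the image at 0 m."""
    return (rows - 1 - bottom_row) * z_max_m / rows


def closing_speed(prev_distance_m, curr_distance_m, frame_interval_s):
    """Relative approach speed in m/s (positive when the gap shrinks)."""
    return (prev_distance_m - curr_distance_m) / frame_interval_s
```

For a 400-row top view covering 40 m, each row spans 0.1 m, so a vehicle whose bottom moves down by one row per frame at a frame interval of 0.1 s is closing at about 1 m/s.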

Experiments
Here, we discuss the experimental results of the vehicle detection by the proposed method. For the experiments, we use actual images obtained with the rear-view camera and correct their distortion. The size of all images is 640 × 480 pixels.

Observation from Stationary Host Vehicle.
In this experiment, the vehicle with the rear-view camera is stationary. The other vehicle approaches from a distance behind at a constant velocity; it then slows down and stops. In this experiment, it is assumed that the location of the road marking is known. The estimated distance of the bottom of the detected vehicle is shown in Figure 8. For the sake of comparison, the vehicle is also detected from the camera-view image by using the conventional method (e.g., [8]), since this method uses edges for detecting a vehicle, as does the proposed method. In this method, vehicle detection is limited to the same driving lane. An area in the same driving lane having horizontal edges and surrounded by vertical edges is recognized as a vehicle, and the base of the area is assumed to be the bottom of the vehicle to obtain the distance from the vehicle. We use the distance measured by a laser radar as the true value.
Figure 8 covers the range from 15 m to 40 m behind the host vehicle. In the conventional approach, the estimation of the vehicle position becomes unstable beyond 20 m and departs significantly from the true value beyond 25 m. Although the rear vehicle is approaching at a constant velocity, there are intervals where the estimated distance does not change because the camera-view image has a low distance resolution in the area projecting the far scene. In contrast, the proposed method gives estimates close to the true value at distances of up to 35 m, so distances that cannot be resolved by the conventional approach can be calculated.

Observation from Moving Host Vehicle.
We also applied the proposed method to successive images obtained from a moving host vehicle. The host vehicle has a rear-view camera mounted on it; it is driven in a straight lane, and the rear vehicle approaches the host vehicle from a distance of 50 m. The number of images used for the experiment is 585. We measure the distance between the host vehicle and the rear vehicle by a laser radar for the sake of comparison. For the removal of the road markings, we use the velocity obtained from the relatively accurate speed sensor of the host vehicle, which observes the revolution of the wheel. Some images used for this experiment are shown in Figure 9. The distance of the rear vehicle in frame no. 8070 is 40 m, that in frame no. 8360 is 10 m, and that in frame no. 8535 is 2.3 m; these distances are measured by using the laser radar. As shown in frame no. 8070 in Figure 9, a vehicle at a distance of 40 m appears considerably small in the image of the rear-view camera, which has a wide angle (see the ellipse in Figure 9). Figure 10 shows the distance between the vehicles estimated by the proposed method. The distance of the rear vehicle can be estimated accurately throughout the successive images, even while the vehicle is passing over the road markings. While the rear vehicle is beyond 40 m, it is not detected in most of the images since the top-view image projects the range from 0 m to 40 m. For reference, the clusters that appear in the images while the host vehicle is moving and are recognized as road markings are denoted by + in Figure 10. Although some road markings are detected in Figure 10, this is not important since the purpose of the proposed method is the detection of the vehicle. By the removal of road markings, the bottom position of the rear vehicle can be detected without being affected by the HHEC of the road markings.
In Figure 10, the distance estimation failed in some frames because the cluster that contains the HHEC of the true base of the vehicle was judged to be following, and another approaching cluster was detected as the base of the rear vehicle. This false detection could be reduced by taking into account the moving direction of the past clusters.
At night, the proposed method can recognize only the area illuminated by the headlights of the rear vehicle, since the horizontal edge components of a nighttime image are small. Therefore, we need to introduce a new clue such as the pair of headlights. It is also possible that the shadows of the host vehicle, or those of rear vehicles driving in the same or an adjacent lane, are not recognized as planar objects. In particular, when the shadow of a vehicle in the adjacent lane moves faster than the host vehicle, the proposed method might mistake it for an approaching vehicle. This mistake can be reduced by using the fact that the shadow of a vehicle has fewer horizontal edge components than the front of the vehicle.
Figure 11 shows a comparison of HHECs; the upper row shows frame no. 8070 in Figure 9, and the lower row shows one of the images in which the host vehicle is moving with no vehicle behind. Although the camera-view image in the lower row is not very different from frame no. 8070 in Figure 9, the differences are clearly recognizable once it is transformed into the top-view image and the HHEC is obtained. This result shows that the method can accurately judge whether a vehicle is approaching from a distance or no vehicle is behind. Figure 12 shows examples of the results in the case where the rear vehicle is moving at the same velocity as the host vehicle. The proposed method detects only an approaching vehicle; hence, the rear vehicle is not recognized when it is moving at the same velocity as the host vehicle.

Summary
In this paper, we proposed a method to improve the accuracy of the position estimation of a distant vehicle by using top-view images. By extracting horizontal edge components from the top-view image and by using information about the driving environment, we can distinguish vehicles from planar objects and accurately detect the bottom of a vehicle. The advantage of the proposed method was experimentally verified with actual images. By detecting vehicles using the top-view image, we can estimate the distance of vehicles up to 40 m, which is twice the distance that can be achieved using camera-view images. If the area projected onto the top-view image is extended, we can estimate the distance of even farther vehicles. Although this method is applied only to the area of the driving lane in the experiment, if it is applied to the area of a parallel driving lane, a vehicle in the next lane approaching from behind can also be detected. Therefore, this method can be used as a driving support system before making a lane change, as it detects a vehicle approaching from behind. In our future study, we will attempt to extend the applicable driving scenes to cases such as lane changes or the presence of moving shadows. Currently, this method can be applied only when the vehicles are moving in a straight line; future studies must cover cases where the vehicles are moving along a curved line.

Figure 2: Rear camera-view image (a) and top-view image (b).

Figure 3: Vehicle image (a) and horizontal edge component image (b).

Figure 4: HHEC (a), horizontal edge component image (b), and clustering result (c). The rectangle with the white dashed line denotes the area for searching vehicles, and the rectangles with the white solid line denote the clusters.

Figure 8: Distance estimated by conventional method and proposed method.

Figure 10: Distance estimated by proposed method in the case of moving host vehicle.

Figure 11: Comparison of HHECs between the image with a distant vehicle and the one with no vehicle.