Fast Pedestrian Recognition Based on Multisensor Fusion

A fast pedestrian recognition algorithm based on multisensor fusion is presented in this paper. First, potential pedestrian locations are estimated in world coordinates by laser radar scanning, and their corresponding candidate regions in the image are then located by camera calibration and a perspective mapping model. To avoid the time-consuming training and recognition caused by high-dimensional feature vectors, a feature extraction method based on region-of-interest integral histograms of oriented gradients (ROI-IHOG) is proposed. A support vector machine (SVM) classifier is trained for online recognition on a novel pedestrian sample dataset adapted to the urban road environment. Finally, we test the validity of the proposed approach on several video sequences from realistic urban road scenarios. The multisensor fusion method shows reliable and time-efficient performance.


Introduction
Pedestrians are vulnerable participants in the transportation system when crashes happen, especially those in motion in urban road scenarios [1]. In 2009, the first global road safety assessment report of the World Health Organization found that traffic accidents are one of the major causes of death and injury around the world. Between 41% and 75% of fatal road traffic accidents involve pedestrians, and a pedestrian is 4 times as likely to be killed as a vehicle occupant. Therefore, pedestrian safety protection should be taken seriously [2].

System Architecture
The research on pedestrian recognition is carried out on a multisensor vehicle platform, as shown in Figure 1. This experimental platform is a modified Jetta. It is equipped with a vision sensor, a laser scanner, and two near-infrared illuminators to detect pedestrians within a 90° range in front of the vehicle.
The architecture of the proposed multisensor pedestrian detection system is shown in Figure 2. The system runs on a PC with an Intel Core i5 CPU at 2.27 GHz and 2.0 GB of RAM. It includes offline training and online recognition. For offline training, a novel pedestrian dataset adapted to the urban road environment is established first, and the pedestrian classifier is then trained by SVM. For online recognition, a Sony SSC-ET185P camera installed at the top front of the experimental vehicle captures continuous 320 × 240 images. Potential pedestrian candidate regions are identified in the image from the radar data of a SICK LMS211-S14 laser scanner and the perspective mapping model between world coordinates and image coordinates. For each image, all candidate regions are scaled to 64 × 128 pixels and judged by the classifier trained offline.

Sensor Selection
The Sony SSC-ET185P camera was chosen for several reasons. The camera offers high color reproduction and sharp images. It includes an 18x optical zoom and 12x high-quality digital zoom lens with autofocus, so it can capture high-quality color images during the day. Although the system is currently tested under daylight conditions, two near-infrared illuminators are mounted on both sides of the laser radar at the front of the vehicle. They provide dedicated illumination for object detection, so that the application can be extended to night-time use.
The laser scanner is a SICK LMS211-S14. Its detection capabilities (a scanning angle of 90°, a minimum angular resolution of 0.5°, and a range of up to 81.91 m) are suitable for our goal. The laser scanner scans only a single plane. It ranges by the time-of-flight method: it emits light pulses toward the target and measures their round-trip time to determine the distance. One scan takes 13 ms, which meets the real-time requirement.

Vehicle Setup
The laser scanner and the two near-infrared illuminators are mounted horizontally in the front bumper, as shown in Figure 3(a). The camera is placed at the top front of the vehicle, on the same centerline as the laser scanner, as shown in Figure 3(b). The horizontal distance between the camera and the laser scanner is 2.3 m, and the camera height is 1.6 m. These are two key parameters of the camera calibration.
A MINE V-cap 2860 USB device connects the camera to the PC. An RS-422 industrial serial interface and a MOXA NPort high-speed card provide an easy connection between the laser scanner and the PC. Figure 4 shows the hardware integration of the proposed system.

Potential Pedestrian Location Estimation
Most current pedestrian detection methods depend on visual sensors alone and cannot meet the requirements of real-time applications. In our work, we use the laser radar sensor to detect obstacle locations and estimate potential pedestrian positions in world coordinates, and we then use the camera calibration and the space-image perspective mapping model to mark the pedestrian candidate regions in the image. The pedestrian recognition algorithm proposed later is performed only on the candidate regions instead of the entire image, which effectively reduces the computational cost and makes real-time application feasible.
In our experimental platform, a SICK LMS211-S14 laser scanner is utilized. The scanning angle is 90° in front of the host vehicle, and the minimum angular resolution is 0.5°, as shown in Figure 5. Thus, one scan yields 181 data arrays. Each data array includes two parameters: the angle and the distance between the obstacle and the host vehicle. The data can be denoted as $\{(\rho_i, \theta_i) \mid i = 1, 2, \ldots, m\}$, where $m$ is the total number of arrays and $(\rho_i, \theta_i)$ is the data of the $i$th array. Obviously, a set of laser beams from the same target should have similar distances and similar angles. Based on this, a clustering method is applied to the 181 data arrays to determine which of them belong to the same target. In the clustering criterion, $\varphi$ is the minimum angular resolution of the radar, $r_k$ is the distance of the $k$th array, and $D_1$, $D_2$ are the distance thresholds. Given the installation location of the radar, the scan plane crosses pedestrians at about knee height. Taking into account that a pedestrian's legs may be apart or together, $D_1$ and $D_2$ are set to 10 cm and 70 cm, respectively. Then, the potential pedestrian location parameters (the start data, the end data, and the data amount of each target) are recorded. The target distance is expressed by the average distance of all beams from the target: $\rho_a = (\rho_1 + \rho_2 + \cdots + \rho_n)/n$. Its direction is represented as $\theta_a = (\theta_{j1} + \theta_{j2})/2$, where $\theta_{j1}$ is the first angle value of the target and $\theta_{j2}$ is the last one. Finally, we convert the radar data from polar coordinates to Cartesian coordinates as $x = r\cos\theta$, $y = r\sin\theta$, where $(r, \theta)$ is the data in polar coordinates and $(x, y)$ is the data in Cartesian coordinates, which represents the target location in space.
The possible pedestrian locations are 2D data in world coordinates. Their corresponding regions in the image are then located by a piecewise camera calibration and the perspective space-image mapping model. This map is projected into the image in order to identify the regions and the scale at which to search for pedestrians in the image. The camera height is 1.6 m, which is a parameter of the camera calibration. In the resulting space-image mapping model, $X_w$, $Y_w$, and $Z_w$ are the location parameters in world coordinates, and $u$, $v$ are the corresponding parameters in image coordinates. We divided the detection area into four sections and determined the mapping model parameters for each section separately by the least squares method, which makes the mapping to $u$, $v$ more accurate.
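The laser processing steps described above (grouping neighbouring beams, averaging each target's range, taking the mid-angle of its first and last beam, and converting to Cartesian coordinates) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the grouping rule (a single Euclidean gap threshold of 0.7 m between neighbouring hit points), the start angle of 45°, and the function names `cluster_scan` and `summarize` are assumptions made for the example.

```python
import math

def cluster_scan(ranges_m, start_deg=45.0, step_deg=0.5,
                 max_gap_m=0.7, max_range_m=81.91):
    """Group consecutive laser returns into targets.

    Assumption (not the paper's criterion): two neighbouring beams belong
    to the same target when their hit points are at most max_gap_m apart.
    Beams at or beyond max_range_m are treated as "no return".
    """
    targets, current, prev_xy = [], [], None
    for i, rho in enumerate(ranges_m):
        theta = math.radians(start_deg + i * step_deg)
        if rho >= max_range_m:
            # No obstacle on this beam: close any open cluster.
            if current:
                targets.append(current)
            current, prev_xy = [], None
            continue
        xy = (rho * math.cos(theta), rho * math.sin(theta))
        if prev_xy is not None and math.dist(xy, prev_xy) > max_gap_m:
            # Gap too large: the previous cluster ends here.
            targets.append(current)
            current = []
        current.append((rho, theta))
        prev_xy = xy
    if current:
        targets.append(current)
    return targets

def summarize(target):
    """Return the mean range, the mid-angle, and the Cartesian position."""
    rho_a = sum(rho for rho, _ in target) / len(target)
    theta_a = (target[0][1] + target[-1][1]) / 2.0
    return rho_a, theta_a, (rho_a * math.cos(theta_a), rho_a * math.sin(theta_a))

# Synthetic scan: 181 beams, two obstacles, everything else out of range.
scan = [81.91] * 181
for i in range(88, 93):
    scan[i] = 10.0
for i in range(20, 23):
    scan[i] = 5.0
targets = cluster_scan(scan)
```

With the assumed start angle, beam 90 points straight ahead, so the second target in the synthetic scan lies about 10 m in front of the vehicle.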
In order to detect pedestrians faster and more accurately, we determine the size of the candidate pedestrian imaging region at different distances in front of the vehicle. We assume that the pedestrian template is 2 m high and 1 m wide (a little larger than a real pedestrian). The relationship between the width and height of the imaging region and the pedestrian location in space is found by a calibration experiment. The width and height of the potential pedestrian region in the image can be denoted as $h = 1402\,y^{-0.97}$, $w = 723.2\,y^{-0.99}$, where $y$ is the vertical distance from the target to the host vehicle.
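As an illustration of how the fitted size model could be applied, the sketch below evaluates the two power laws for a given distance and builds a clipped image rectangle. The units (y in metres, w and h in pixels), the choice of (u, v) as the foot point of the target in the image, and the function names are assumptions for this example. The text does not state them.

```python
def candidate_size(y):
    """Width and height of the candidate region for a target at distance y."""
    h = 1402.0 * y ** -0.97
    w = 723.2 * y ** -0.99
    return w, h

def candidate_box(u, v, y, img_w=320, img_h=240):
    """Rectangle (left, top, right, bottom) clipped to the image.

    Assumption: (u, v) is the image position of the target's foot point,
    so the box is centred on u and extends upward from v.
    """
    w, h = candidate_size(y)
    left = max(0, round(u - w / 2))
    right = min(img_w, round(u + w / 2))
    top = max(0, round(v - h))
    bottom = min(img_h, round(v))
    return left, top, right, bottom

box = candidate_box(160, 200, 20.0)
```

Because both exponents are close to −1, the region shrinks roughly in inverse proportion to distance, as expected from a perspective projection.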

Feature Representation
In 2005, Dalal and Triggs [22] proposed grids of histograms of oriented gradients (HOG) descriptors for pedestrian detection. Experimental results showed that HOG feature sets significantly outperformed existing feature sets for human detection. However, the HOG-based algorithm is too time consuming, especially for multiscale object detection. The approach needs further optimization before it is suitable for real-time pedestrian safety protection.
In this paper, for fast pedestrian detection, the region of interest (ROI) of a pedestrian sample is found by calculating the average gradient of all positive samples in the RSPerson dataset described below. The gradient features at the head and limbs of the pedestrian samples are the most pronounced. In contrast, the gradients of the background area in the sample image contribute little to pedestrian detection and may even degrade performance. Therefore, in order to reduce the dimension of the HOG feature vector of a whole image (3780 dimensions), only several important areas of a sample image are taken as ROI for calculating the HOG feature. Accordingly, the computation of the HOG feature is greatly reduced, and pedestrian recognition speed is improved. Through the analysis of the average gradient value of the pedestrian samples, shown in Figure 6, four regions of interest are identified: the head region, the leg region, the left arm region, and the right arm region. These regions may partly overlap each other and together roughly cover the body's contour. For a color image, the gradients of each color channel are calculated, and the gradient with the largest magnitude among the three color channels is selected as the gradient vector of each pixel. The optimal ROI location, width, and height in a sample image are shown in Table 1.
Similar to Dalal's method, to calculate the feature vector of the ROI in a detection window, the cell size is defined as 8 × 8 pixels and the block size as 2 × 2 cells. The window scan step is 8 pixels, the width of a cell. A total of 49 blocks can be extracted in a detection window. For each pixel $(x, y)$ in the image, the gradient vector is denoted as $\nabla g(x, y) = (\partial f(x, y)/\partial x, \partial f(x, y)/\partial y)$. In general, the one-dimensional centrosymmetric template operator $[-1, 0, 1]$ is used to calculate the gradient vector:

$$g_x(x, y) = f(x+1, y) - f(x-1, y), \qquad g_y(x, y) = f(x, y+1) - f(x, y-1). \tag{4.1}$$

Accordingly, the gradient magnitude can be calculated as

$$|\nabla g(x, y)| = \sqrt{g_x(x, y)^2 + g_y(x, y)^2}. \tag{4.2}$$

The gradient orientation is unsigned and is defined as

$$\theta(x, y) = \arctan\frac{g_y(x, y)}{g_x(x, y)} + \frac{\pi}{2}. \tag{4.3}$$
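The per-pixel gradient computation described in this section can be sketched with NumPy as below. It applies the [−1, 0, 1] template along each axis, keeps for every pixel the color channel with the largest gradient magnitude, and returns an unsigned orientation. Two choices are assumptions of this sketch: border pixels are given zero gradient, and the orientation is computed with `arctan2` shifted by π/2 and wrapped to [0, π), which equals arctan(g_y/g_x) + π/2 wherever g_x is nonzero and avoids a division by zero elsewhere.

```python
import numpy as np

def pixel_gradients(image):
    """Gradient magnitude and unsigned orientation per pixel.

    image: H x W (grayscale) or H x W x C (color) array.
    Returns two H x W arrays: magnitude and orientation in [0, pi).
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 2:
        img = img[:, :, None]
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # f(x+1, y) - f(x-1, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # f(x, y+1) - f(x, y-1)
    mag = np.sqrt(gx ** 2 + gy ** 2)

    # For color input, keep the channel with the largest magnitude.
    best = mag.argmax(axis=2)
    rows, cols = np.indices(best.shape)
    gx, gy, mag = gx[rows, cols, best], gy[rows, cols, best], mag[rows, cols, best]

    # Unsigned orientation, wrapped to [0, pi).
    theta = (np.arctan2(gy, gx) + np.pi / 2) % np.pi
    return mag, theta

# A horizontal intensity ramp has a constant gradient along x.
ramp = np.tile(np.arange(8, dtype=float), (6, 1))
mag, theta = pixel_gradients(ramp)
color = np.stack([ramp, 3 * ramp, ramp], axis=2)
mag_c, _ = pixel_gradients(color)
```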
To compute the gradient histogram of a cell, each pixel casts a vote, weighted by its gradient magnitude, for the bin corresponding to its gradient orientation. All gradient orientations are grouped into 9 bins. Thus, every block has a gradient histogram with 36 dimensions, and the ROI-HOG feature vector has 49 × 36 = 1764 dimensions. Furthermore, integral histograms of oriented gradients (IHOG) [23] are utilized to further speed up feature extraction. The histogram of oriented gradients of the pixel $(x, y)$ is expressed as $T(x, y) = [g_1, \ldots, g_i, \ldots, g_9]$, with one component $g_i$ per orientation bin. Integral feature vectors are formed from $T(x, y)$ in the x-orientation and in the y-orientation, and the IHOG of a cell is calculated from them as shown in Figure 7. Accordingly, the IHOG of a block can be calculated as

$$\mathrm{HOG}_{\mathrm{BLOCK}} = [\mathrm{HOG}_{\mathrm{CELL\text{-}1}}, \mathrm{HOG}_{\mathrm{CELL\text{-}2}}, \mathrm{HOG}_{\mathrm{CELL\text{-}3}}, \mathrm{HOG}_{\mathrm{CELL\text{-}4}}]. \tag{4.8}$$
The IHOG method needs to scan the entire image only once and store the integral gradient data. The HOG feature of any area can then be obtained with simple addition and subtraction operations, without recalculating the gradient orientation and magnitude of each pixel.
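The idea of scanning the image once and then reading any region's histogram with a few additions and subtractions can be sketched as a standard integral histogram. The code is a generic illustration rather than the exact formulation used here: hard assignment of each pixel to one of 9 bins, the zero-padded integral array, the omission of block normalization, and the function names are all assumptions of the example.

```python
import numpy as np

def integral_hog(mag, theta, n_bins=9):
    """Build an (H+1) x (W+1) x n_bins integral histogram.

    mag, theta: H x W arrays of gradient magnitude and unsigned
    orientation in [0, pi). Each pixel votes its magnitude into one bin.
    """
    h, w = mag.shape
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    votes = np.zeros((h, w, n_bins))
    rows, cols = np.indices((h, w))
    votes[rows, cols, bins] = mag
    integral = np.zeros((h + 1, w + 1, n_bins))
    integral[1:, 1:] = votes.cumsum(axis=0).cumsum(axis=1)
    return integral

def region_hog(integral, top, left, height, width):
    """Histogram of a rectangle from four corner lookups."""
    b, r = top + height, left + width
    return (integral[b, r] - integral[top, r]
            - integral[b, left] + integral[top, left])

def block_hog(integral, top, left, cell=8):
    """Concatenate the histograms of the 2 x 2 cells of one block."""
    cells = [region_hog(integral, top + dy, left + dx, cell, cell)
             for dy in (0, cell) for dx in (0, cell)]
    return np.concatenate(cells)

# Random gradients for a 64 x 128 window (128 rows, 64 columns).
rng = np.random.default_rng(0)
mag = rng.random((128, 64))
theta = rng.random((128, 64)) * np.pi
ii = integral_hog(mag, theta)
cell_hist = region_hog(ii, 16, 8, 8, 8)
block = block_hog(ii, 0, 0)
```

After the single pass in `integral_hog`, each cell histogram costs four array lookups regardless of the cell size, which is why overlapping blocks and repeated windows add little extra work.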

Sample Selection for Training
For pedestrian recognition in the urban road environment, we build a pedestrian sample dataset called RSPerson (Person Dataset of Road System). In this dataset, the positive samples include walking pedestrians, standing pedestrians, and groups of pedestrians with different sizes, poses, gaits, and clothing. Preliminary experiments have shown that the selection of negative samples is particularly important for reducing false alarms. Thus, tree boles, trash cans, telegraph poles, and bushes, which are likely to be mistaken for pedestrians, as well as ordinary objects such as roads, vehicles, and other infrastructure, are selected to form the negative samples. This selection is highly beneficial for our pedestrian detection system. In the RSPerson dataset, each sample image is normalized to 64 × 128 pixels for training. Figure 8 shows some samples of the RSPerson dataset.

Pedestrian Recognition with SVM
For online recognition, once the potential pedestrian locations have been found by the laser radar, the candidate regions in the image are determined accordingly by the perspective mapping model. Each candidate region is rescaled to the normalized size of 64 × 128 pixels, and its ROI-IHOG feature vector is then extracted. Based on these steps, the classifier trained with SVM judges whether the candidate is a true pedestrian or not.

Experimental Results
To test the validity of the proposed method, the performance of our pedestrian recognition experimental platform is assessed on several video sequences from realistic urban traffic scenarios. First, the pedestrian candidate locations are estimated based on laser radar data processing and the space-image perspective mapping model. Some candidate region segmentation results are shown in Figure 10. In this way, potential pedestrian regions are located in the image, but some other obstacles (poles, shrubs, etc.) are also located as positives.
Second, the proposed ROI-IHOG with SVM algorithm is tested on several video sequences. In this step, pedestrian recognition relies only on ROI-IHOG with SVM applied to the entire image, without fusing the laser information. The recall reaches 93.8% at $10^{-4}$ FPPW. The image size is 320 × 240 pixels. The average detection time is about 600 ms/frame. Some detection results are shown in Figure 11.
Finally, fusing the information from the laser and vision sensors, each detected candidate region is scaled to 64 × 128 pixels, and its ROI-IHOG feature is extracted. The classifier trained with SVM then decides whether the candidate region is a pedestrian or not. With multisensor fusion, the average detection time is about 18 ms per candidate. Thus, if each image of the video sequence contains 5 candidate regions on average, the processing speed is about 11 frames/s, which satisfies the real-time requirement. Several recognition results (Figure 12) indicate that the proposed pedestrian detection approach based on multisensor fusion performs well and can provide effective support for active pedestrian safety protection.
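The frame-rate figure follows directly from the per-candidate cost. The short calculation below reproduces it and compares it with the full-image timing reported above. It assumes that classification dominates the per-frame cost, so laser processing and coordinate mapping are treated as negligible, and the function name is introduced only for this example.

```python
def frames_per_second(candidates_per_frame, ms_per_candidate=18.0):
    """Frame rate when the cost scales with the number of candidates."""
    if candidates_per_frame <= 0:
        raise ValueError("need at least one candidate region per frame")
    return 1000.0 / (candidates_per_frame * ms_per_candidate)

fused_fps = frames_per_second(5)   # 5 x 18 ms = 90 ms per frame
full_image_fps = 1000.0 / 600.0    # full-image scan at 600 ms per frame
break_even = 600.0 / 18.0          # candidates at which both cost the same
```

Under these assumptions the fused system stays faster than the full-image scan as long as a frame contains fewer than about 33 candidate regions.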

Conclusions
A fast pedestrian recognition algorithm based on multisensor fusion is developed in this paper. Potential pedestrian candidate regions are located by laser scanning and the perspective mapping model, and the ROI-IHOG feature extraction method is then proposed to reduce the computational cost. Moreover, SVM is used for online recognition with a novel pedestrian sample dataset adapted to the urban road environment. Pedestrian recognition is tested with radar alone, with vision alone, and with the two sensors fused. Fusion-based pedestrian recognition shows reliable and time-efficient performance. The processing speed reaches 11 frames/s, which satisfies the real-time requirement. In future work, we will further study key technologies for pedestrian safety, such as pedestrian tracking, pedestrian behavior recognition, and conflict analysis between pedestrians and the host vehicle.

Figure 2: Architecture of the proposed multisensor pedestrian recognition system.

Figure 3: (a) Installation location of the laser scanner and near-infrared illuminators. (b) Installation location of the camera.

Figure 5: The sketch of radar scanning.

Figure 10: Pedestrian candidate region estimation results under different urban scenarios.

Table 1: Location of ROI in the sample.
Before recognizing pedestrians online, we construct a classifier trained offline with the SVM algorithm. First, a training dataset and a test dataset are built from the RSPerson dataset. The training dataset includes 2000 pedestrian and 2000 nonpedestrian samples, and the test dataset includes 500 pedestrian and 500 nonpedestrian samples. The training samples are processed, and their features are extracted to form the training vectors. The SVM parameters are selected by cross-validation with a grid search. An RBF kernel is chosen as the kernel function, with the penalty factor C = 1024 and the kernel parameter g = 0.0625. After that, the pedestrian classifier is constructed. Finally, the test samples are used to evaluate the performance of the classifier. We use the DET curve, which contains two indicators, the miss rate and the false positives per window (FPPW), to evaluate the performance of SVM classifiers. The performance of pedestrian recognition based on ROI-IHOG is shown in Figure 9.
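The two DET indicators can be computed from classifier scores as sketched below. The decision rule (a window counts as a pedestrian when its score is at least the threshold), the toy scores, and the function name are assumptions for illustration. They are not taken from the experiments reported here.

```python
def det_points(pos_scores, neg_scores, thresholds):
    """Return (threshold, miss_rate, fppw) for each threshold.

    pos_scores: classifier scores of pedestrian windows.
    neg_scores: classifier scores of nonpedestrian windows.
    """
    points = []
    for t in thresholds:
        misses = sum(s < t for s in pos_scores)        # pedestrians rejected
        false_pos = sum(s >= t for s in neg_scores)    # background accepted
        points.append((t, misses / len(pos_scores), false_pos / len(neg_scores)))
    return points

# Toy scores: four pedestrian windows and four background windows.
pos = [2.0, 1.0, 0.5, -0.2]
neg = [-1.5, -0.5, 0.1, -2.0]
curve = det_points(pos, neg, thresholds=[-1.0, 0.0, 1.0])
```

Sweeping the threshold traces the curve: raising it lowers the FPPW at the cost of a higher miss rate, and recall at a given FPPW is one minus the miss rate at that point.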