Moving Object Classification Using 3D Point Cloud in Urban Traffic Environment

Moving object classification is essential for autonomous vehicles to complete high-level tasks such as scene understanding and motion planning. In this paper, we propose a novel approach for classifying moving objects into four classes of interest using 3D point clouds in urban traffic environments. Unlike most existing work on object recognition, which relies on dense point clouds, our approach combines extensive feature extraction with multiframe classification optimization to solve the classification task when partial occlusion occurs. First, the point cloud of each moving object is segmented by a data preprocessing procedure. Then, efficient features are selected via the Gini index criterion applied to an extended feature set. Next, Bayes Decision Theory (BDT) is employed to incorporate the preliminary results from a posterior-probability Support Vector Machine (SVM) classifier at consecutive frames. Point cloud data acquired from our own LIDAR as well as the public KITTI dataset are used to validate the proposed moving object classification method in the experiments. The results show that the proposed SVM-BDT classifier based on 18 selected features can effectively recognize the moving objects.


Introduction
Autonomous driving has become an increasingly popular domain in intelligent transportation systems [1,2]. Moving object classification is a critical step toward reliable planning of driving trajectories for autonomous vehicles in dynamic environments, and prior knowledge of the category attribute helps to build an appropriate dynamic model for moving objects [3][4][5]. The most commonly used sensors for object recognition are camera and LIDAR. Compared with cameras, LIDAR obtains accurate 3D measurements and is invulnerable to weather and illumination. Extensive research effort has been devoted to object recognition using LIDAR. Conventional techniques can be coarsely divided into two categories. The first category of methods determines the object semantics by calculating the similarity between the scanned object and a predefined template. Simple geometric or motion models are constructed to classify rigid objects, but they struggle to recognize pedestrians. Fang and Duan [6] employ an iterative endpoint fitting algorithm to fit the segmented point cloud and calculate the number and size of line segments to determine whether the object is a vehicle or not. Petrovskaya and Thrun [7] combine the rectangular model of the point cloud in a 2D occupancy grid map with a motion model established by a Rao-Blackwellized particle filter to improve the vehicle classification accuracy when partial occlusion temporarily occurs. The second class of methods mainly focuses on effective feature descriptors of the objects of interest as well as training specific classifiers [8,9]. For vehicle recognition, Yang and Dong [10] calculated geometric features based on the optimal neighborhood size of each point and classified the segments using SVMs. Lee and Coifman [11] checked the shape feature of each point cloud cluster and classified vehicles into six classes. For pedestrian recognition, Kim et al.
[12] used an SVM classifier with 31 layer-based features for pedestrian recognition. Arras et al. [13] defined 14 static features including roundness and compactness to train a pedestrian classifier based on the point cloud of the legs. Although this method generates good recognition results in indoor environments, it is not suitable for outdoor use. For multiclass object recognition, Azim and Aycard [14] used simple ratio characteristics of the 3D bounding box of the point cloud, such as the width-height ratio and length-height ratio, to recognize vehicles and pedestrians. However, frequent occlusion in real traffic environments leads to false size ratios, so the recognition performance is poor. Wang et al. [15] proposed a 120-dimensional feature set including spin images, shape factors, point normal vectors, and Euclidean distances and employed an SVM classifier to recognize multiclass moving objects. Teichman et al. [16] constructed a boosting classifier to recognize moving vehicles, pedestrians, and bicycles by integrating geometric features and motion features. Moreover, occupancy-grid-based methods [17,18] are presented to detect moving objects efficiently, but they only estimate the kinematic state of the object without classifying its category.
Conclusions drawn from the abovementioned related literature can be summarized as follows. First, the existing moving object classification methods based on LIDAR are designed for the relatively dense point cloud returned from the scanned object. Second, temporary or partial occlusion at consecutive frames is seldom considered in the majority of object classification schemes, and the effectiveness of the extracted features has not been analyzed from the perspective of object category. Furthermore, most of the aforementioned classification methods are proposed to recognize common moving objects including vehicles, pedestrians, and bicycles. In real traffic scenarios, pedestrians often appear as independent individuals or a small crowd. When two pedestrians are too close, the returned point cloud is hard to segment cleanly, making it difficult to identify individual pedestrians. As shown in Figure 1, the pedestrians marked as A and B are wrongly segmented as a whole. It is very common that two or three humans get together, and a crowd composed of two or three pedestrians may be regarded as another class of moving object due to point cloud under-segmentation. Motivated by the abovementioned analysis, we propose in this paper a LIDAR-based classification method for four categories of moving objects, namely, vehicle, pedestrian, bicycle, and crowd. A Velodyne HDL-64E LIDAR is adopted to collect the 3D point cloud of the surrounding environment. Our method for moving object classification uses the raw point cloud as follows. First, the points measured on moving objects are segmented from the rest of the 3D point cloud. This process consists of ground segmentation, clustering of nonground points, and moving object detection. Second, both global- and layer-based features are extracted to describe the geometric characteristics, and the Gini index criterion is utilized to select the effective features based on the category attributes of the training samples.
Next, a posterior-probability SVM classifier is employed to obtain the classification result at each frame, and the BDT algorithm is further used to optimize the classification result of the tracked object at consecutive frames. Finally, the proposed SVM-BDT-based classification method is validated using the point cloud dataset collected by our own LIDAR as well as the public KITTI dataset. The contributions of this paper are two-fold. First, we describe a novel approach for classifying moving objects into vehicle, pedestrian, bicycle, and crowd, since the point cloud segment of a crowd may be confused with other types of objects and even reduce the accuracy of object recognition.
This approach makes progress towards the application goal of moving object classification in real traffic environments for autonomous vehicles. Second, we adapt the idea of the SVM-BDT classifier to incorporate multiframe classification results based on the effective features, and moving object classification is transformed into a maximum a posteriori probability estimation problem. The remainder of this paper is organized as follows. Section 2 introduces the point cloud preprocessing. Section 3 presents feature extraction. Section 4 describes the classification method. Section 5 demonstrates experimental results. Finally, Section 6 offers conclusions and future works.

Point Cloud Preprocessing
The point cloud is characterized by coordinates in the world coordinate system, and the LIDAR position is taken as the origin of the world coordinate system. In this section, ground points are removed from the 3D raw point cloud using the ground segmentation method in [19], which combines Markov random field models with the loopy belief propagation algorithm. Then, the nonground points are divided into independent clusters. Since the number of point cloud clusters of moving objects in the surrounding environment is unknown and the density of the point cloud varies with range, the mean shift clustering algorithm in [20] is selected. In order to reduce the influence of a fixed bandwidth on the stability of the clustering results, an improved mean shift clustering algorithm based on adaptive bandwidth is proposed as follows:

(1) The nonground points are denoted as p_i, i = 1, ..., n. Given an initial kernel bandwidth h_0, the Gaussian kernel function G(p), and a tolerance ε, the initial kernel density estimate is calculated by

\hat{f}(p) = \frac{1}{n h_0^d} \sum_{i=1}^{n} G\left(\frac{p - p_i}{h_0}\right),

where d is the dimension of the data space.
(2) The adaptive bandwidth of each point is computed by

h(p_i) = h_0 \left[\lambda / \hat{f}(p_i)\right]^{1/2},

where λ is a proportionality constant calculated by

\log \lambda = n^{-1} \sum_{i=1}^{n} \log \hat{f}(p_i).
(3) The initial kernel centroid is marked as q_1, and the weighted mean at q_j is computed using the kernel function G and the weights h(p_i)^{-(d+2)}:

m_h(q_j) = \frac{\sum_{i=1}^{n} h(p_i)^{-(d+2)} G\left(\frac{q_j - p_i}{h(p_i)}\right) p_i}{\sum_{i=1}^{n} h(p_i)^{-(d+2)} G\left(\frac{q_j - p_i}{h(p_i)}\right)},

where m_h(q_j) is the weighted mean defining the mean shift. Setting q_{j+1} = m_h(q_j), the iteration continues until convergence, i.e., ‖m_h(q_j) − q_j‖ < ε.

Next, in order to detect moving objects, a local grid map is constructed using the 3D occupancy grid algorithm in [6] to divide the surrounding environment into occupied, free, and unknown voxels. When new measurements arrive, dynamic voxels are detected in the consistent grid map based on inconsistencies between occupied space and free space. Then, all the moving clusters in the dynamic voxels are extracted, as shown in Figure 2.
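As a concrete illustration, the three steps above can be sketched as follows. This is a minimal version assuming a Gaussian kernel and an (n, 3) NumPy array of points; the parameter defaults and the h_0-radius mode-merging rule at the end are our simplifications, not part of the original algorithm.

```python
import numpy as np

def adaptive_mean_shift(points, h0=1.0, eps=1e-3, max_iter=100):
    """Adaptive-bandwidth mean shift clustering of non-ground points.

    Mirrors the text: (1) pilot kernel density estimate with fixed
    bandwidth h0, (2) per-point bandwidths h_i = h0 * sqrt(lambda / f_i)
    with log(lambda) = mean(log f_i), (3) iterate the weighted mean with
    weights h_i**-(d+2) until the largest shift falls below eps.
    """
    n, d = points.shape
    # (1) Pilot density estimate with a Gaussian kernel of bandwidth h0.
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    f = np.exp(-0.5 * sq / h0 ** 2).sum(axis=1) / (n * h0 ** d)
    # (2) Proportionality constant and adaptive per-point bandwidths.
    lam = np.exp(np.mean(np.log(f)))
    h = h0 * np.sqrt(lam / f)
    # (3) Move every point to its weighted mean until convergence.
    modes = points.copy()
    for _ in range(max_iter):
        sq = np.sum((modes[:, None, :] - points[None, :, :]) ** 2, axis=-1)
        w = np.exp(-0.5 * sq / h[None, :] ** 2) / h[None, :] ** (d + 2)
        new = (w @ points) / w.sum(axis=1, keepdims=True)
        done = np.linalg.norm(new - modes, axis=1).max() < eps
        modes = new
        if done:
            break
    # Merge points whose modes converged to (almost) the same location.
    labels = -np.ones(n, dtype=int)
    k = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        near = np.linalg.norm(modes - modes[i], axis=1) < h0
        labels[near & (labels < 0)] = k
        k += 1
    return labels
```

The adaptive bandwidth shrinks in dense regions and grows in sparse ones, which is what compensates for the range-dependent density of LIDAR returns.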

Feature Extraction
In general, the height of the point cloud clusters of moving objects including vehicle, pedestrian, bicycle, and crowd ranges from 1 m to 4 m. When partial occlusion occurs, the height of a cluster can be less than 1 m. Since layer-based features describe local shape characteristics at a more detailed level than global features, we divide the point cloud cluster into eight layers along the vertical direction of the horizontal plane. 2D features at each layer are employed to supplement the description of the 3D geometric features and reduce the disturbance of partial occlusion. In addition to collecting existing feature descriptors from the literature, the differences in point cloud characteristics among the four classes of moving objects are analyzed, and number-of-points-based features, shape features, and statistical features are selected, as shown in Tables 1-3.
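The layer division can be made concrete with a short sketch. It assumes the cluster is an (n, 3) NumPy array with z as the vertical axis; the function names are illustrative, and the slope computation corresponds to feature f2 in Table 1.

```python
import numpy as np

def layer_point_counts(cluster, n_layers=8):
    """Cut the cluster into n_layers equal slabs along the vertical (z)
    axis and count the points falling in each layer."""
    z = cluster[:, 2]
    zmin = z.min()
    height = max(z.max() - zmin, 1e-6)   # guard against a flat cluster
    idx = np.minimum(((z - zmin) / height * n_layers).astype(int),
                     n_layers - 1)
    return np.bincount(idx, minlength=n_layers)

def slope_feature(counts):
    """Feature f2: slope k of the line y = k*x + c fitted to the
    per-layer point counts."""
    k, _ = np.polyfit(np.arange(len(counts)), counts, 1)
    return k
```

A pedestrian, for instance, yields roughly uniform per-layer counts (slope near zero), while a vehicle concentrates returns in the lower layers, which is what makes the fitted slope discriminative.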
In order to remove the features that have no significant effect on the object classification results, the Gini index criterion of the CART decision tree algorithm [21] is used for feature selection.
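As a refresher on the criterion itself, the Gini impurity and the weighted Gini index of a candidate split can be computed as below; this is a generic CART-style sketch, not the paper's full forward-search feature selector.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a sample set U: Gini(U) = 1 - sum_k u_k**2,
    where u_k is the proportion of class-k samples in U."""
    _, counts = np.unique(labels, return_counts=True)
    if counts.size == 0:
        return 0.0
    u = counts / counts.sum()
    return 1.0 - float(np.sum(u ** 2))

def gini_index(feature, labels, threshold):
    """Weighted Gini of the two child nodes produced by splitting one
    feature at the given threshold, as CART evaluates a candidate split."""
    left = feature <= threshold
    n = len(labels)
    return (left.sum() / n) * gini(labels[left]) + \
           ((~left).sum() / n) * gini(labels[~left])
```

A split that perfectly separates the classes drives the weighted Gini to zero, so the feature and threshold minimizing this quantity are chosen first.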
The forward search mechanism over feature subsets is combined with a subset evaluation mechanism to select the efficient features in order of priority, so that all samples falling at the subnodes belong to the same category, i.e., the highest purity is achieved at each subnode. Define the proportion of k-class samples in the training sample set U as u_k (k = 1, 2, 3, 4); the Gini index represents the purity of the probability distribution of the sample set U:

Gini(U) = 1 - \sum_{k=1}^{4} u_k^2.

The attribute w with possible values {w^1, w^2, ..., w^V} is used to divide the sample set U, so V branch nodes are obtained. The samples with attribute value w^v at the vth branch node are denoted as U^v, and the weight of the branch node is set as |U^v|/|U|. Given the attribute w, the Gini index of the sample set U is defined by

Gini_index(U, w) = \sum_{v=1}^{V} \frac{|U^v|}{|U|} Gini(U^v).

The attribute w* = arg min_w Gini_index(U, w) with the minimum Gini index is selected as the optimal boundary of the features. Based on the optimal attribute, the features are allocated to the two subnodes generated from the current node. The calculation above is carried out recursively until the Gini index is less than a preset threshold. The category attributes of the training sample sets are divided, respectively, for vehicle, pedestrian, bicycle, and crowd. Four decision trees of the hierarchical features for the four categories of moving objects are obtained, as shown in Figure 3. In this figure, the solid line denotes yes and the dotted line denotes no.

Table 1: Number-of-points-based features.

Index | Description | Formula
f1 | Number of points in the cluster | n
f2 | Slope of the line fitted to the number of points at each layer [13] | y = kx + c_1
f3-f4 | First- and second-order coefficients of the quadratic curve fitted to the number of points at each layer [13] | y = bx^2 + ax + c_2
f5 | Product of the number of points and the minimum distance between the scan points and the origin | n · d_min

Table 2: Shape-based features.
Index | Description | Formula
f6 | Sum of the areas of the rectangles fitted to the horizontal projection of the points at each layer |
f36-f43 | Circle fitting level at each layer (sum of squared residuals of the vertical distances between the points and the least-squares fitted circle) |
f44-f51 | Radius r of the circle fitted to the horizontal projection of the points at each layer | r
f52-f59 | Line fitting level at each layer (sum of squared residuals of the vertical distances between the points and the least-squares fitted line) [12] | (1/n_p) \sum_{i=1}^{n_p} (p_{xy,i} - p_{l,i})^2

The standard SVM classifier only outputs a hard class decision, i.e., a probability of 1 or 0. In order to preserve the sparsity of the support vectors of the SVM classifier and the accuracy of the classification results, a sigmoid function is used to convert the output of the standard SVM into a posterior probability [22]:

P(y = 1 | f) = \frac{1}{1 + \exp(A f + B)},

where P(y = 1 | f) denotes the probability of correct classification when the SVM output is f, and A and B are the parameters to be fitted. Define the training set as (f_i, t_i), where the target probability is t_i = (y_i + 1)/2 and y_i ∈ {−1, +1} is the sample category. The parameters A and B are obtained by minimizing the negative log-likelihood on the training set:

\min_{A,B} -\sum_i \left[ t_i \log p_i + (1 - t_i) \log(1 - p_i) \right], \quad p_i = \frac{1}{1 + \exp(A f_i + B)}.
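The sigmoid fitting step (Platt scaling) can be sketched as follows. This is a simplified version using plain gradient descent on the negative log-likelihood; Platt's original fit uses a Newton-type method and smoothed targets, and the learning rate and iteration count here are illustrative.

```python
import numpy as np

def fit_platt(scores, y, lr=0.01, iters=2000):
    """Fit Platt's sigmoid P(y = 1 | f) = 1 / (1 + exp(A*f + B)).

    scores: raw SVM decision values f_i; y: labels in {-1, +1}.
    Targets are t_i = (y_i + 1) / 2 as in the text.
    """
    t = (y + 1) / 2.0
    A, B = -1.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(A * scores + B))
        # Gradients of the negative log-likelihood w.r.t. A and B.
        gA = np.sum((t - p) * scores)
        gB = np.sum(t - p)
        A -= lr * gA
        B -= lr * gB
    return A, B
```

After fitting, large positive decision values map to probabilities near 1 and large negative values to probabilities near 0, which is exactly the per-frame evidence the BDT stage consumes.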

BDT-Based Classification Optimization
The point cloud of each moving object is associated with a tracker at consecutive frames, and the tracker is updated based on the association result. The location of each moving object at the next frame is predicted using a linear Kalman filter. The moving object model is denoted as {I, L, W, x, y, if, t_I, G_m}, where I denotes the object index; L and W denote the size of the rectangle fitted to the point cloud cluster; x and y denote the center location of the point cloud cluster; if denotes whether the object has an associated tracker (the initial value 0 indicates no tracker); t_I denotes the associated tracker index; and G_m denotes the minimum value of the cost function between the object and the associated tracker. The tracker model is denoted as {I, L, W, x, y, v_x, v_y, ifLost, o_I}, where I denotes the tracker index; L and W denote the size of the rectangle fitted to the moving object matched with the tracker at the last frame; x, y, v_x, and v_y denote the filtered position and speed prediction of the tracker at the current frame; ifLost denotes whether the tracked object is lost at the current frame (the initial value 1 indicates that the tracked object is lost); and o_I indicates the serial number of the moving object corresponding to the tracker at the current frame when the tracked object has not been lost. A deterministic data association algorithm based on the fusion of multiple features is used to associate the moving objects with the trackers. The location and geometric features of the moving object are utilized as the primary and secondary constraints, respectively, and the objects are associated with the trackers by minimizing the cost function.
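The association-by-cost-minimization idea can be sketched as follows. Only the weights g1 = 0.7, g2 = 0.3 and the per-object maximum normalization come from the text; the exact distance forms and the greedy per-object argmin (instead of a full assignment) are our simplifications.

```python
import numpy as np

def associate(objects, trackers, g1=0.7, g2=0.3):
    """Greedy object-tracker association by minimizing a weighted cost.

    objects / trackers: arrays of rows [x, y, L, W]. The cost mixes a
    position term pos(i, j) and a fitting-rectangle size term box(i, j),
    each normalized by its per-object maximum over all trackers, with
    the position weighted higher (g1 + g2 = 1).
    """
    # Position cost: Euclidean distance between centers.
    pos = np.linalg.norm(objects[:, None, :2] - trackers[None, :, :2], axis=-1)
    # Size cost: L1 difference of the fitted rectangle dimensions.
    box = np.abs(objects[:, None, 2:] - trackers[None, :, 2:]).sum(axis=-1)
    # Normalize each component by its per-object maximum over n trackers.
    pos = pos / np.maximum(pos.max(axis=1, keepdims=True), 1e-9)
    box = box / np.maximum(box.max(axis=1, keepdims=True), 1e-9)
    cost = g1 * pos + g2 * box
    return cost.argmin(axis=1)   # chosen tracker index for each object
```

Weighting position above size reflects that the Kalman-predicted location is more stable across frames than the fitted rectangle, which shrinks and grows under partial occlusion.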
Assuming that m moving objects are generated from the point cloud preprocessing procedure at the (t+1)th frame and n trackers exist, a cost equation for the association between the ith moving object and the jth tracker at the (t+1)th frame is established, where pos(i, j) denotes the cost component between the position of the moving object and the position of the tracker; box(i, j) denotes the cost component between the size of the fitting rectangle of the moving object and the size of the tracker; g_1 and g_2 denote the weights of the position and the size, respectively (with g_1 + g_2 = 1; the weight of the position is set higher than that of the size, so g_1 = 0.7); and max_n |·| represents the maximum of the association values between the ith moving object and the n trackers at the (t+1)th frame.

When both the number of points and the shape of the point cloud of the same moving object vary continually at consecutive frames, the single-frame classification result may fluctuate. The result of the category decision for the kth cluster C_k (1 ≤ k ≤ J) is generated as {d_i^k}, i = 1, 2, ..., 4. The category decision vector for the kth cluster tracked by the tracker T_t at the tth frame is denoted as D_t^k = [d_1^k, ..., d_4^k]^T. Assuming that the observation D_t at the tth frame is only related to the given category S_i, the posterior probability that the kth moving object cluster belongs to each category S_i at the tth frame is updated from the state at the (t−1)th frame:

P(S_i | D_1, ..., D_t) \propto P(D_t | S_i) P(S_i | D_1, ..., D_{t-1}),

where the likelihood function at the tth frame is P(D_t | S_i) = sign(d_i^t). Finally, the maximum a posteriori probability is used to estimate the category of the kth cluster tracked by the tracker at the tth frame.
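The recursive fusion of per-frame SVM posteriors can be sketched as a simple Bayes update. This is a minimal illustration with a uniform initial prior and made-up per-frame probabilities; the renormalization step is our addition for numerical clarity.

```python
import numpy as np

def bdt_update(prior, svm_probs):
    """One Bayes-decision step fusing the per-frame SVM posterior.

    prior: category distribution for the tracked cluster from frame t-1;
    svm_probs: posterior-probability SVM output at frame t, used as the
    likelihood P(D_t | S_i). The product is renormalized; argmax of the
    final belief gives the MAP category.
    """
    post = prior * svm_probs
    return post / post.sum()

# Usage: fold the per-frame SVM outputs over a track (illustrative values).
frames = [np.array([0.4, 0.3, 0.2, 0.1]),
          np.array([0.5, 0.2, 0.2, 0.1]),
          np.array([0.6, 0.2, 0.1, 0.1])]
belief = np.full(4, 0.25)            # uniform prior over the 4 classes
for probs in frames:
    belief = bdt_update(belief, probs)
category = int(belief.argmax())      # index of the MAP category
```

Note how evidence accumulates: each frame only mildly favors class 0, yet the fused belief concentrates on it far more sharply than any single-frame output, which is what smooths over frames with partial occlusion.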

Data Collection
In order to test the performance of the proposed moving object classification method, four categories of point cloud samples including vehicle, pedestrian, bicycle, and crowd are collected using the Velodyne HDL-64E LIDAR equipped on our autonomous vehicle (Figure 4). The videos generated by 3 external cameras on the autonomous vehicle are acquired synchronously to manually label the true categories of the samples. Meanwhile, 3D LIDAR data from the public KITTI dataset [23] are also used to supplement the point cloud samples. The point cloud clusters of moving objects are extracted with the data preprocessing procedure. Note that the extracted clusters of moving objects of the same category are sensed from different view directions and distances. Figure 5 shows a few examples of moving objects. All the experiments are processed on an Intel i7-4700, 3.20 GHz core processor with 8 GB RAM using C++ code.

SVM-BDT-Based Classification Results
The framework for moving object classification is tested on the task of calculating the posterior probability that a point cloud cluster belongs to each category of moving objects at consecutive frames. We run the proposed SVM-BDT classification method using the 18 selected features, and the outputs of the posterior probability in multiple scenarios are shown in Figure 6. In each subgraph, the upper picture is the scene image captured synchronously, the rectangle represents one moving object tracked by the 3D LIDAR, and the bottom picture shows the variation of the posterior probability frame by frame. As shown in Figure 6(b), the pedestrian in the red rectangle suddenly appears and goes from partially occluded to completely exposed in the point cloud. The posterior probability that the cluster belongs to the pedestrian class increases rapidly after several initial frames and then stays at the maximum value. Figure 6(c) shows that as the bicycle gradually moves away, the number of points returned from the bicycle decreases, and the posterior probability that the point cloud cluster belongs to the bicycle class decreases correspondingly. As shown in Figure 6(d), multiple pedestrians walk away from the LIDAR. At first, the pedestrians walk so close together that the posterior probability of the crowd category is the highest. Then, as the distances among the pedestrians increase, the point cloud of the crowd is successfully segmented into multiple pedestrians, and the posterior probability of the pedestrian category increases accordingly. Later, the distances among the pedestrians decrease, and the posterior probability of the crowd category increases again.
To evaluate the performance of the proposed SVM-BDT method quantitatively, 2000 groups of point cloud samples are selected for each category of moving object, and 5-fold cross validation is conducted. Each group contains 10 consecutive frames. Figure 7 shows the confusion matrix of the recognition results. We can see that the recognition accuracies of vehicle, pedestrian, bicycle, and crowd are 97%, 95%, 91%, and 90%, respectively. The proposed classification method shows the best recognition performance on moving vehicles, while the point cloud cluster of the crowd is the most likely to be confused with the other types of moving objects. Overall, the average recognition accuracy of the SVM-BDT method is 93.25%, which satisfies the requirements of an autonomous vehicle for the recognition of surrounding moving obstacles. The total running time of the proposed SVM-BDT method increases with the number of moving objects in the traffic scenario, especially the time cost of the BDT-based classification optimization stage, since the point cloud of each moving object is associated with each tracker at consecutive frames and the trackers are updated based on the association result. The quantitative comparison results are listed in Table 4, where the run time denotes the total time cost of both the feature extraction and classification stages. We can see that, for the SVM-BDT-based moving object classification method, the AUC value obtained with 18 features is close to that obtained with 68 features; thus, the characteristics of the four categories of moving objects can be well explained by the 18 selected features. For the same feature set, the SVM-BDT-based classification method outperforms the SVM-based method, which demonstrates that the BDT algorithm using consecutive frames effectively optimizes the classification result at individual frames and even overcomes partial occlusion. Moreover, for the same classifier, the method using 18 features runs in less time than the one using 68 features.
Considering the recognition accuracy, computation complexity, and operational efficiency, it can be concluded that the SVM-BDT classifier based on 18 features is the best choice for the recognition of moving objects including vehicle, pedestrian, bicycle, and crowd. In this paper, the crowd is regarded as a special moving object distinct from the single pedestrian. To further validate the crowd recognition performance of the proposed SVM-BDT-based method, several commonly used recognition methods are compared. The ROC curves of the classification results are shown in Figure 9, and the results clearly show the superiority of the proposed SVM-BDT-based method over the Adaboost algorithm [24], Naive Bayes algorithm [25], and FLDA algorithm [26]. Although the crowd recognition accuracy of the MCI-NN algorithm [27] exceeds that of the proposed SVM-BDT-based method, the MCI-NN algorithm consumes more memory and takes more operation time due to its Markov kernel function. Therefore, considering both recognition accuracy and efficiency, the proposed SVM-BDT-based method shows better crowd recognition performance.

Conclusions and Future Works
In this paper, we propose an approach for moving object classification using 3D point clouds in urban traffic environments.
This approach classifies moving objects into four classes, namely, vehicle, pedestrian, bicycle, and crowd. The accurate modeling of moving object classification using 3D point clouds consists of several procedures that all affect the final classification results. To obtain the effective features of moving objects, unlike approaches based on simple feature descriptions, the Gini index criterion is employed in this work, based on the characteristics of each category of moving objects, to select from the extracted features including number-of-points-based features, shape features, and statistical features. In the classification procedure, unlike previous works where the classifier is modeled with the point cloud at a single frame, the moving object is recognized with the SVM-BDT classifier, which incorporates multiframe classification results. The presented method has three benefits. First, our method can classify the common moving objects in urban environments even when pedestrians walk close together or partial occlusion occurs. Second, this method digs deep into the point cloud distribution based on the category attribute to recognize moving objects efficiently. Moreover, the BDT-based classification optimization is conducted on the results of the posterior-probability SVM classifier at consecutive frames to improve the moving object classification performance. The method is tested using the point cloud dataset collected by our own LIDAR as well as the public KITTI dataset. The results reveal that the proposed SVM-BDT method based on 18 features achieves better classification accuracy for vehicle, pedestrian, bicycle, and crowd than several other methods. Note that the point cloud samples collected in our experiments are within 40 meters, beyond which the declining resolution of the point cloud causes many mistakes.
The proposed method thus has a limitation in classifying moving objects at long range; this challenge is treated as a subject of our future work. Another aspect of future work is a deeper understanding of object behaviour using 3D LIDAR by integrating motion cues with the classification results of the surrounding objects in urban traffic environments.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.