Vehicle Type Recognition Combining Global and Local Features via Two-Stage Classification

1 School of Information and Control, Nanjing University of Information Science & Technology, Nanjing 210044, China
2 Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing 210044, China
3 School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
4 School of Electronic and Information Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China


Introduction
Vehicle type recognition (VTR) is a key component of intelligent transportation systems (ITS) and has a wide range of applications, such as traffic flow statistics, intelligent parking systems, electronic toll collection systems, and access control systems [1]. For example, it can be used to realize automatic fare collection (AFC) according to vehicle type in paid parking lots, or applied to nonstop toll collection systems to realize automatic toll calculation at highway toll stations. Additionally, it can be used in traffic video monitoring to find and locate vehicles that break traffic regulations and flee the accident scene.
With the extensive use of traffic surveillance cameras, image-based methods are attracting increasing attention from VTR researchers. The vehicle face image contains valuable information for the VTR, and extracting features from the vehicle face image leads to a better recognition result. However, illumination change, scale variation, and partial occlusion badly influence the performance of the VTR in real-world traffic environments. In order to improve the performance of the VTR, researchers have proposed many effective methods. These existing methods mainly consist of two key steps, that is, feature extraction and classifier design, which directly determine how well the VTR method works.
There are many typical features that can be applied to the VTR, such as edge based features [2,3], color based features [4], symmetry based features [5][6][7], SIFT descriptor based features [8,9], HOG descriptor based features [10], and Gabor filter based features [11]. The edge based feature extraction methods extract the edge of the vehicle image with a certain edge operator, such as the Sobel operator. The symmetry based methods utilize projection or corner detection algorithms, according to the geometric symmetry of the vehicle face image in spatial profile, to detect and recognize the vehicle. These two kinds of methods are able to extract the geometrical contour of the vehicle image accurately and quickly using small storage space and little computation time. However, these methods are easily influenced by adverse factors such as illumination change, scale variation, and partial occlusion; when these factors occur, their performance in feature extraction degrades. Therefore, these feature extraction methods are commonly used to extract the global contour of the vehicle image, and the extracted features apply only to the preliminary recognition in the VTR.
Mathematical Problems in Engineering
Unlike the two kinds of methods mentioned above, feature extraction methods based on the SIFT descriptor, the HOG descriptor, or Gabor filters can extract structural details of the vehicle image at multiple scales and orientations, and they are insensitive to illumination change and scale variation. Therefore, they are commonly used for precise recognition. However, because they extract multiple features at multiple scales and orientations, these methods generate a large amount of additional feature information compared with the original image, which increases the computational complexity of VTR algorithms.
Intuitively, global information means the holistic geometrical configuration of the vehicle contour, while structural details are embedded in the local variations of vehicle appearance. Therefore, extracting both global geometrical information and local structural details from vehicle images through suitable feature extraction methods, and leveraging the extracted feature information via suitable classifiers, will help improve the performance of the VTR.
In terms of classifier design, typical classifiers include KNN [3,4], SVM [12][13][14], and ANN [15]. The KNN classifier has a simple principle and does not need training in advance. However, when the number of samples in the training set increases, its computation time increases accordingly. The methods based on the SVM or ANN classifier can effectively utilize various vehicle features and obtain good classification performance. However, these methods need to train classifier parameters in advance by collecting many samples of different types of vehicles, and they are prone to falling into a local optimum while training the classifier parameters. The classifier based on sparse representation has been successfully applied to face recognition owing to two attractive characteristics: it involves no complex parameter training, and it only needs to take the original image samples as a dictionary without any additional transformation [16]. Further research finds that if a discriminative dictionary is learned from the original dictionary via a dictionary learning scheme before pattern recognition, then the classification results based on the learned dictionary are more accurate and reliable than those based on the original dictionary [17].
Additionally, the above-mentioned classification methods adopt a single-stage classification strategy; that is, all features are incorporated into one classifier together to recognize the vehicle type. When the number of vehicle types to be recognized increases, the methods based on single-stage classification need many training samples to train many classifier parameters, which inevitably increases the difficulty of classifier design for a given recognition performance [18].
To address the aforementioned limitations, this paper proposes a new VTR method combining global and local features via a two-stage classification, whereby the global feature and local feature are jointly applied to the VTR, and their advantages in expressing vehicle geometrical contour and structural details are leveraged by a proposed two-stage classification strategy. The proposed method enables an accurate and reliable VTR. First, the global feature is used to preliminarily recognize the type of a vehicle from the geometrical contour viewpoint, and the local feature is further used to recognize the specific type from the structural details viewpoint. Second, because a two-stage classification strategy is exploited, the total classification task is appropriately assigned to two different classifiers. Therefore, the design of each classifier is simplified and the design difficulty is lowered accordingly. This improves the overall classification performance of the VTR in accuracy and reliability compared with the methods based on the single-stage classification strategy.
This paper advances the research on VTR by making the following specific contributions. First, an improved Canny edge detection algorithm with smooth filtering and non-maxima suppression abilities is proposed to extract a continuous and complete global feature of the vehicle image. Second, the whole vehicle image is partitioned into four nonoverlapping patches based on the key parts of a vehicle, and the local feature is extracted from the four partitioned key patches by a set of Gabor wavelet kernels with five scales and eight orientations. When the vehicle is partially occluded, it can still be correctly recognized by using the local feature extracted from the other nonoccluded patches. Third, a K-nearest neighbor probability classifier (KNNPC) with the Hausdorff distance measure is proposed to improve the reliability of the first stage of classification, where the vehicle type is preliminarily recognized as a large or small vehicle from the geometrical contour viewpoint. Fourth, a discriminative sparse representation based classifier (DSRC) that adopts a dictionary learning scheme based on the Fisher discrimination criterion is introduced to the second stage of classification, which enables a more specific classification based on the extracted local feature.
The rest of this paper is organized as follows. Section 2 presents the global and local feature extraction methods as well as the image partition method based on the key parts of a vehicle. Section 3 describes a two-stage classification strategy for the VTR. Experiments and analysis are shown in Section 4 to illustrate the effectiveness of the proposed VTR method. The final section summarizes this study and future research directions.

Feature Extraction
As mentioned previously, both the global geometrical contour and local structural details of a vehicle play important roles in the VTR. Therefore, there is a need to extract these features through corresponding feature extraction methods. In this paper, the global geometrical contour is extracted by an improved Canny edge detection algorithm with smooth filtering and non-maxima suppression abilities, and the local structural details are extracted by a set of Gabor wavelet kernels with multiple scales and orientations.

Global Feature Extraction.
The edge of the vehicle image contains rich contour information of the vehicle. Therefore, it is regarded as a global feature to preliminarily recognize the type of a vehicle in this paper.
Commonly, operators such as Sobel, Roberts, Prewitt, and Canny can be used to extract the edge of a vehicle. However, each of these edge detection operators has its own limitations. For example, the Sobel and Prewitt operators detect the edge of an object quickly but cannot produce a thin edge; therefore, they are unsuitable for accurate localization. The Roberts operator is capable of locating the edge accurately but is sensitive to noise; therefore, it cannot effectively suppress the noise existing in the image. The Canny operator is able to smooth a strong edge and suppress noise. It can also extract an accurate and complete edge under good illumination; however, when the illumination becomes poor, it cannot detect a weak edge [19].
In order to achieve a better edge, we propose an edge detection method based on the improved Canny operator to extract the global feature of vehicle images. It exploits a double-threshold algorithm based on OTSU to self-adaptively determine the edge of a vehicle according to illumination changes. Based on non-maxima suppression and double-threshold judgment, the proposed method can find a continuous and complete edge. The detailed steps are as follows.
Step 2 (calculate gradient magnitude). The gradient of each pixel in the smoothed image is determined by applying the Sobel operator. Let $S_x$ and $S_y$ denote the Sobel operators for the $x$ and $y$ directions, respectively. In order to improve real-time performance, the gradient magnitude $M(x, y)$ and gradient direction $\theta(x, y)$ are determined from the directional gradients $G_x(x, y) = S_x * f(x, y)$ and $G_y(x, y) = S_y * f(x, y)$, where $f(x, y)$ denotes the smoothed image and $*$ denotes convolution.
Step 4 (double-threshold judgment). Double thresholds are used to determine strong and weak edges. We set two thresholds $T_{\mathrm{high}}$ and $T_{\mathrm{low}}$.
(i) If $M(x, y) \ge T_{\mathrm{high}}$, then the pixel at $(x, y)$ is determined as an edge pixel and let $M(x, y) = 255$.
(ii) If $M(x, y) \le T_{\mathrm{low}}$, then the pixel at $(x, y)$ is determined as a nonedge pixel and let $M(x, y) = 0$.
(iii) If $T_{\mathrm{low}} < M(x, y) < T_{\mathrm{high}}$, then continue to search in the $3 \times 3$ neighborhood centered on the current pixel $(x, y)$ to find whether there is a pixel whose gradient magnitude is more than $T_{\mathrm{high}}$. If such a pixel exists, then the pixel is also determined as an edge pixel and let $M(x, y) = 255$; otherwise, the pixel is determined as a nonedge pixel and let $M(x, y) = 0$.
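The double-threshold rule of Step 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the gradient magnitude uses the common $|G_x| + |G_y|$ approximation (the exact formula is not recoverable from the text), and the thresholds are passed in by hand rather than derived from OTSU.

```python
import numpy as np

def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| with 3x3 Sobel kernels.

    Assumption: the |Gx| + |Gy| form is a common fast approximation and
    may differ from the formula used in the paper.
    """
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    # Right column minus left column of each 3x3 window (x direction).
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Bottom row minus top row of each 3x3 window (y direction).
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.abs(gx) + np.abs(gy)

def double_threshold(mag, t_low, t_high):
    """Apply rules (i)-(iii): strong pixels, nonedge pixels, and weak pixels
    that are kept only if a 3x3 neighbor exceeds t_high."""
    mag = np.asarray(mag, dtype=float)
    strong = mag >= t_high
    weak = (mag > t_low) & (mag < t_high)
    above = np.pad(mag > t_high, 1, mode="constant", constant_values=False)
    h, w = mag.shape
    near_strong = np.zeros_like(strong)
    for dy in range(3):
        for dx in range(3):
            near_strong |= above[dy:dy + h, dx:dx + w]
    edges = strong | (weak & near_strong)
    return np.where(edges, 255, 0).astype(np.uint8)
```

In the proposed method the two thresholds would be chosen adaptively from the image; here they are left as plain arguments so the rule itself stays visible.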

Local Feature Extraction.
The global feature can be used to recognize the type of a vehicle roughly, such as large or small. In order to further recognize a specific type, such as sedan, van, bus, or truck, other features that represent the local structural details of a vehicle need to be extracted.

Image Partition Based on Key Parts.
Not all parts in a vehicle face image are useful for the VTR; only some key parts with salient features (e.g., vehicle roof, windscreen and rear-view mirror, hood, and license plate) are useful. Additionally, partial occlusion often occurs in real-world traffic environments. If we partition the vehicle face image into several key patches, then even when partial occlusion occurs, we can still recognize the vehicle type through the key parts in the nonoccluded patches. Therefore, we partition the vehicle face image evenly into four key patches from top to bottom: (i) vehicle roof, (ii) windscreen and rear-view mirror, (iii) hood, and (iv) license plate, as shown in Figure 1.
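The even top-to-bottom partition can be illustrated with a short sketch. It assumes the 96 × 96 cropped image size reported in the experiment setup, which yields four patches of 24 rows by 96 columns; the function and patch names are hypothetical.

```python
import numpy as np

# Patch order from top to bottom, as listed in the text.
PATCH_NAMES = ("roof", "windscreen_mirror", "hood", "license_plate")

def partition_vehicle_face(img):
    """Split a vehicle face image into four equal horizontal bands."""
    img = np.asarray(img)
    rows = img.shape[0]
    if rows % len(PATCH_NAMES) != 0:
        raise ValueError("image height must be divisible by 4")
    band = rows // len(PATCH_NAMES)
    return {name: img[i * band:(i + 1) * band, :]
            for i, name in enumerate(PATCH_NAMES)}
```

Because the bands do not overlap, an occluded band can simply be skipped and the remaining bands still provide local features.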

Local Feature Extraction.
Gabor wavelets, whose kernels act very similarly to mammalian visual cortical cells, have strong characteristics of spatial locality and orientation selectivity, making them a suitable choice for image feature extraction in the VTR [22]. Therefore, the Gabor wavelet representation of the vehicle image is introduced to extract the local features in every partitioned patch in this paper, which not only obtains better structural details at multiple scales and orientations but also improves the robustness to illumination change and partial occlusion. The Gabor wavelet kernels can be defined by [22]
$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2} \exp\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2\sigma^2}\right) \left[\exp\left(i\, k_{\mu,\nu} \cdot z\right) - \exp\left(-\frac{\sigma^2}{2}\right)\right], \quad (6)$$
where $\mu$ and $\nu$ define the orientation and scale of the Gabor kernels, respectively, $z = (x, y)$, $\|\cdot\|$ denotes the norm operator, $(x, y)$ represents the pixel coordinates, $\sigma$ controls the width of the Gaussian envelope, and the wave vector $k_{\mu,\nu}$ is defined as
$$k_{\mu,\nu} = k_{\nu} e^{i\phi_{\mu}},$$
where $k_{\nu} = k_{\max}/f^{\nu}$, $\phi_{\mu} = \mu\pi/8$, $k_{\max}$ is the maximum frequency, and $f$ is the spacing factor between kernels in the frequency domain.
For Gabor feature extraction, we convolve the image $I(z)$ with the set of Gabor wavelet kernels defined by (6) at every pixel $(x, y)$:
$$O_{\mu,\nu}(z) = I(z) \otimes \psi_{\mu,\nu}(z),$$
where $z = (x, y)$, $O_{\mu,\nu}(z)$ is the convolution result corresponding to the Gabor wavelet kernel at orientation $\mu$ and scale $\nu$ (called the Gabor feature image in this paper), $I(z)$ expresses the gray level distribution of an image, and $\otimes$ represents the convolution operator. Therefore, the set $S = \{O_{\mu,\nu}(z) : \mu \in \{0, 1, \ldots, 7\}, \nu \in \{0, 1, \ldots, 4\}\}$ forms the Gabor wavelet representation of the image $I(z)$.
Applying the convolution theorem, we can derive every $O_{\mu,\nu}(z)$ via the fast Fourier transform (FFT) [24]:
$$\mathcal{F}\{O_{\mu,\nu}(z)\} = \mathcal{F}\{I(z)\}\, \mathcal{F}\{\psi_{\mu,\nu}(z)\},$$
$$O_{\mu,\nu}(z) = \mathcal{F}^{-1}\{\mathcal{F}\{I(z)\}\, \mathcal{F}\{\psi_{\mu,\nu}(z)\}\},$$
where $\mathcal{F}$ and $\mathcal{F}^{-1}$ indicate the Fourier transform and inverse Fourier transform, respectively.
To leverage the advantage of Gabor wavelets with five scales and eight orientations, we concatenate all the Gabor feature images $O_{\mu,\nu}(z)$ in the set $S$ and derive an augmented feature vector $\chi$. Before the concatenation, we first downsample every $O_{\mu,\nu}(z)$ into $O^{(\rho)}_{\mu,\nu}$ by a factor $\rho$ to reduce the space dimension and normalize it to zero mean and unit variance. We then transform every $O^{(\rho)}_{\mu,\nu}$ into a vector by concatenating its columns. Finally, the reduced Gabor feature vector $\chi^{(\rho)}$ is defined as
$$\chi^{(\rho)} = \left(O^{(\rho)\,T}_{0,0},\ O^{(\rho)\,T}_{0,1},\ \ldots,\ O^{(\rho)\,T}_{7,4}\right)^{T},$$
where $T$ is the transpose operator.
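The extraction pipeline described above (kernel bank, FFT-based convolution, downsampling, normalization, concatenation) can be sketched as follows. Several choices here are assumptions rather than details given in the paper: the parameter values $\sigma = 2\pi$, $k_{\max} = \pi/2$, and $f = \sqrt{2}$ are commonly used defaults, the magnitude of the complex response is taken as the feature image, the convolution is circular, and a factor $\rho = 64$ is interpreted as keeping every 8th pixel along each axis. The function names are hypothetical.

```python
import numpy as np

def gabor_kernel(mu, nu, shape, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Gabor wavelet kernel at orientation mu (0..7) and scale nu (0..4),
    sampled on a grid of the given shape and centered at index (rows//2, cols//2)."""
    rows, cols = shape
    y, x = np.mgrid[-(rows // 2):rows - rows // 2, -(cols // 2):cols - cols // 2]
    k = (k_max / f ** nu) * np.exp(1j * np.pi * mu / 8)  # wave vector as a complex number
    k2 = abs(k) ** 2
    z2 = x ** 2 + y ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

def gabor_features(patch, rho=64):
    """Reduced Gabor feature vector of one patch (5 scales x 8 orientations)."""
    step = int(round(np.sqrt(rho)))
    if step * step != rho:
        raise ValueError("rho is assumed to be a perfect square in this sketch")
    patch = np.asarray(patch, dtype=float)
    patch_fft = np.fft.fft2(patch)
    blocks = []
    for nu in range(5):
        for mu in range(8):
            # Move the kernel center to the origin, then multiply spectra
            # (convolution theorem; this is a circular convolution).
            psi = np.fft.ifftshift(gabor_kernel(mu, nu, patch.shape))
            response = np.fft.ifft2(patch_fft * np.fft.fft2(psi))
            feat = np.abs(response)[::step, ::step]          # downsample
            feat = (feat - feat.mean()) / (feat.std() + 1e-12)  # zero mean, unit variance
            blocks.append(feat.ravel(order="F"))             # concatenate columns
    return np.concatenate(blocks)
```

For a patch of 24 rows by 96 columns and $\rho = 64$, each of the 40 feature images shrinks to 3 × 12 values, so the returned vector has 40 × 12 × 3 = 1440 entries, matching the dimension reported in the experiments.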

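Preliminary Recognition Based on Global Feature and KNNPC.
For two point sets $A$ and $B$, the classical Hausdorff distance (HD) is $H(A, B) = \max(h(A, B), h(B, A))$ with the directed distance $h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\|$, where $\|\cdot\|$ represents the Euclidean norm.
Because the classical HD measure is sensitive to noise and partial occlusion, the scheme of the least trimmed square (LTS) is introduced. In the IHDM, the directed distance $h_{\mathrm{LTS}}(A, B)$ is defined by a linear combination of order statistics:
$$h_{\mathrm{LTS}}(A, B) = \frac{1}{H} \sum_{i=1}^{H} d_B(a)_{(i)},$$
where $d_B(a) = \min_{b \in B} \|a - b\|$ and $d_B(a)_{(i)}$ represents the $i$th distance value in the sorted sequence $d_B(a)_{(1)} \le d_B(a)_{(2)} \le \cdots \le d_B(a)_{(N_A)}$, with $N_A$ the number of points in $A$.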
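A least-trimmed-square directed distance of this kind can be sketched as below. The sketch assumes that the edge images have already been converted to arrays of edge-pixel coordinates, and that the number of retained distances is a fixed fraction of the points in the first set; the fraction parameter `frac` and the function names are hypothetical.

```python
import numpy as np

def directed_lts_hausdorff(A, B, frac=0.8):
    """Mean of the smallest H nearest-neighbor distances from points of A to B,
    where H = floor(frac * len(A)) (at least 1)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # d[i] = distance from A[i] to its nearest point in B (Euclidean norm).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2).min(axis=1)
    H = max(1, int(frac * len(A)))
    return float(np.sort(d)[:H].mean())

def lts_hausdorff(A, B, frac=0.8):
    """Symmetric measure: the larger of the two directed LTS distances."""
    return max(directed_lts_hausdorff(A, B, frac),
               directed_lts_hausdorff(B, A, frac))
```

With one outlying point in `A` (for example a spurious edge pixel caused by occlusion), trimming with `frac=0.8` discards its large distance, whereas `frac=1.0` lets it raise the average.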

Precise Recognition Based on Local Feature and DSRC.
To exploit the Gabor feature of the vehicle image, before the precise recognition we first express all samples using their reduced Gabor feature vectors $\chi^{(\rho)}$, computed by the local feature extraction method proposed in Section 2.2. Then, based on the reduced Gabor feature vectors, we set up the training set and test set to design the DSRC. The core idea of the sparse representation based classification (SRC) methods is to represent a test sample using a sparse linear combination of training samples [27]. Suppose that there are $c$ classes of samples, and let $A = [A_1, A_2, \ldots, A_c]$ be the set of training samples, called the dictionary, where $A_i$ is the subset of training samples from class $i$. Let $y$ be a test sample. The procedure of the SRC is summarized as follows.
(i) Sparsely represent $y$ on $A$ via $\ell_1$-minimization:
$$\hat{\alpha} = \arg\min_{\alpha} \left\{ \|y - A\alpha\|_2^2 + \gamma \|\alpha\|_1 \right\},$$
where $\gamma$ is a scalar constant.
(ii) Implement classification via
$$\mathrm{identity}(y) = \arg\min_{i} \{e_i\},$$
where $e_i = \|y - A_i \hat{\alpha}_i\|_2$, $\hat{\alpha} = [\hat{\alpha}_1; \hat{\alpha}_2; \ldots; \hat{\alpha}_c]$, and $\hat{\alpha}_i$ is the coefficient vector associated with class $i$.
Thus, the SRC method assigns the test sample to the class with the smallest representation residual $e_i$. Subsequent studies find that the employed dictionary plays an important role in sparse representation based image classification. While learning a dictionary from the training data has led to state-of-the-art results in image classification, many dictionary learning models harness only one-sided discriminative information, in either the representation coefficients or the representation residual, which limits their performance. In this paper, we propose a DSRC that adopts a dictionary learning scheme based on the Fisher discrimination criterion. Based on this, a structured dictionary, whose atoms have correspondences to the class labels, is learned, by which both the representation residual and the representation coefficients can be used to distinguish different classes.
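The two SRC steps can be illustrated with a small self-contained sketch. The $\ell_1$-regularized problem is solved here with plain iterative soft thresholding (ISTA), which is a swapped-in solver chosen for brevity and is not necessarily the one used in the paper; the function names, the default `gamma`, and the iteration count are assumptions.

```python
import numpy as np

def ista_l1(A, y, gamma=0.01, n_iter=500):
    """Minimize ||y - A a||_2^2 + gamma * ||a||_1 by iterative soft thresholding."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ a - y)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - gamma / L, 0.0)
    return a

def src_classify(A, labels, y, gamma=0.01):
    """Step (i): sparse-code y over the dictionary A (columns are training samples).
    Step (ii): return the class whose atoms give the smallest residual."""
    labels = np.asarray(labels)
    a = ista_l1(A, y, gamma)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A[:, labels == c] @ a[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

Each column of `A` plays the role of one training sample, and `labels[j]` gives the class of column `j`; in the paper's setting these columns would be reduced Gabor feature vectors.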

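Dictionary Learning Based on Fisher Discrimination Criterion.
Unlike the method based on the shared dictionary, we adopt a dictionary learning scheme based on the Fisher discrimination criterion [17], which learns a structured dictionary $D = [D_1, D_2, \ldots, D_c]$, where $D_i$ is the subdictionary associated with class $i$. Let $G = [G_1, G_2, \ldots, G_c]$ express the set of training samples with $c$ classes, where $G_i$ is the subset of class $i$, and let $X$ be the sparse coefficient matrix of $G$ over $D$; that is, $G \approx DX$. We can write $X$ as $X = [X_1, X_2, \ldots, X_c]$, where $X_i$ is the coefficient matrix of $G_i$ over $D$. Besides requiring that $D$ should have a powerful ability to represent $G$ (i.e., $G \approx DX$), we also require that $D$ should have a powerful ability to distinguish the images in $G$. For this reason, the dictionary learning scheme based on the Fisher discrimination criterion is defined as follows:
$$J_{(D,X)} = \arg\min_{(D,X)} \left\{ r(G, D, X) + \lambda_1 \|X\|_1 + \lambda_2 f(X) \right\}, \quad (17)$$
where $r(G, D, X)$ is the discriminative data fidelity term; $\|X\|_1$ is the sparsity penalty; $\lambda_1$ and $\lambda_2$ are scalar weights; and $f(X)$ is a discrimination term imposed on the coefficient matrix $X$,
$$f(X) = \mathrm{tr}(S_W(X)) - \mathrm{tr}(S_B(X)) + \eta \|X\|_F^2,$$
$$S_W(X) = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - m_i)(x_k - m_i)^T, \qquad S_B(X) = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T,$$
where $m_i$ and $m$ are the mean vectors of $X_i$ and $X$, respectively, $n_i$ is the number of samples in class $G_i$, and $\eta$ is a parameter.
Although the objective function $J_{(D,X)}$ in (17) is not jointly convex in $(D, X)$, it is convex with respect to each of $D$ and $X$ when the other is fixed. Therefore, the objective function $J_{(D,X)}$ can be divided into two subproblems by optimizing $D$ and $X$ alternately: updating $X$ with $D$ fixed and updating $D$ with $X$ fixed. The alternating optimization is implemented iteratively to find the desired dictionary $D$ and coefficient matrix $X$.
Suppose that the dictionary $D$ is fixed; then the objective function in (17) reduces to a sparse coding problem, and $X_i$ is computed class by class:
$$J_{(X_i)} = \arg\min_{X_i} \left\{ r(G_i, D, X_i) + \lambda_1 \|X_i\|_1 + \lambda_2 f_i(X_i) \right\}, \quad (18)$$
with
$$f_i(X_i) = \|X_i - M_i\|_F^2 - \sum_{k=1}^{c} \|M_k - M\|_F^2 + \eta \|X_i\|_F^2,$$
where $M_k$ and $M$ are the mean vector matrices (by taking the mean vector $m_k$ or $m$ as all the column vectors) of class $k$ and all classes, respectively. We can solve (18) to obtain $X_i$ using the improved iterative projection method (IPM) [28].
Then we discuss how to update $D = [D_1, D_2, \ldots, D_c]$ when $X$ is fixed. We also update $D_i$ class by class:
$$J_{(D_i)} = \arg\min_{D_i} \left\{ \|\hat{G} - D_i X^i\|_F^2 + \|G_i - D_i X_i^i\|_F^2 + \sum_{j=1, j \ne i}^{c} \|D_i X_j^i\|_F^2 \right\}, \quad (19)$$
where $\hat{G} = G - \sum_{j=1, j \ne i}^{c} D_j X^j$, $X^i$ is the representation matrix of $G$ over $D_i$, and $X_i^i$ is the representation of $G_i$ over the subdictionary $D_i$. Equation (19) can be efficiently solved to obtain every $D_i$ via an algorithm like that in [29].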

Classification Scheme.
Using the dictionary $D$ obtained by the proposed dictionary learning scheme based on the Fisher discrimination criterion to represent the test sample, both the representation residual and the representation coefficients will be discriminative, and hence we can make use of both of them to achieve more accurate classification results.
Let $y$ express the reduced Gabor feature vector $\chi^{(\rho)}$ of the test sample; then sparsely represent $y$ on $D$ via $\ell_1$-minimization:
$$\hat{\alpha} = \arg\min_{\alpha} \left\{ \|y - D\alpha\|_2^2 + \gamma \|\alpha\|_1 \right\},$$
where $\gamma$ is a constant, $\hat{\alpha} = [\hat{\alpha}_1; \hat{\alpha}_2; \ldots; \hat{\alpha}_c]$, and $\hat{\alpha}_i$ is the coefficient subvector associated with subdictionary $D_i$. By considering the discrimination capability of both the representation residual and the representation vector, we define the following metric for classification:
$$e_i = \|y - D_i \hat{\alpha}_i\|_2^2 + w \|\hat{\alpha} - m_i\|_2^2,$$
where $m_i$ is the mean coefficient vector of class $i$ and $w$ is a preset weight to balance the contribution of the two terms to classification. The classification rule is defined as
$$\mathrm{identity}(y) = \arg\min_{i} \{e_i\}.$$

Experiment Setup.
To validate the proposed method, we constructed a dataset including 6,000 vehicle images.The vehicle images are captured by a camera fixed on an overpass with 640 × 480 pixels and 256 gray scale levels.The proportion of the challenging vehicle images that are partially occluded by other vehicles or captured in a bad illumination condition is about 10% in the whole dataset.The location of each vehicle is adjusted to the center of the whole image and the size is cropped into 96 × 96 pixels by manual operations in advance.Figure 3 shows the example images of the dataset under various conditions.To facilitate the VTR, all vehicle images in the whole dataset are firstly divided into two datasets: large vehicle and small vehicle.The large vehicle dataset consists of two subdatasets: bus and truck.The small vehicle dataset consists of two subdatasets: van and sedan.The numbers of the images in every subdataset are all 1,500.
All the experiments are conducted on a computer with a 3 GHz CPU and 16 GB of memory, and all program codes are run in MATLAB R2014b.

Results of Global Feature Extraction.
In order to verify the advantage of the improved Canny operator, its edge detection results are compared in Figure 4 with those of three other operators: Sobel, Roberts, and Prewitt. As can be seen from Figure 4, the proposed method based on the improved Canny operator in Section 2.1 obtains a more accurate and complete edge than the methods based on the three other operators.
In addition, we compare the global feature extraction method based on the improved Canny operator with the method based on the traditional Canny operator. Comparative results are shown in Figure 5, where the original gray images are in the first column, the detection results based on the traditional Canny operator are in the second column, and the detection results based on the improved Canny operator are in the third column. Additionally, in order to verify the performance of the proposed global feature extraction method under various illumination conditions, Figure 5(a) is captured in the morning on a fine day with good illumination, Figures 5(b) and 5(c) are captured at dusk on a cloudy day, and Figure 5(d) is captured in the afternoon on a fine day, but the bus is partially covered by shadow because the light is blocked by a nearby building. As can be seen from Figure 5, the method based on the improved Canny operator obtains a more continuous and complete edge for different kinds of vehicles than the method based on the traditional Canny operator, even when the illumination condition is poor.

Results of Local Feature Extraction.
Based on the method proposed in Section 2.2.2, we use the Gabor wavelet kernels with five different scales and eight different orientations to extract the Gabor feature of every local patch of the detected vehicle image. Taking the patch of the hood as an example, the Gabor feature images extracted by the set of Gabor wavelet kernels with five different scales and eight different orientations are shown in Figure 6.
As can be seen from Figure 6, the feature extraction method based on the Gabor wavelet kernels can extract many structural details of local patch of vehicle image from multiple scales and multiple orientations, and the extracted Gabor feature images can be regarded as local feature for the VTR.
In this paper, the resolution of every patch is 96 × 24 pixels. After the convolution operation, the dimension of the augmented feature vector $\chi$ reaches 92160 (40 × 96 × 24). The increased dimension results in slow computation and large memory occupation, which is adverse to the subsequent recognition and classification. Therefore, before implementing the VTR, we need to downsample $\chi$ using an appropriate sample factor $\rho$. In order to select an appropriate sample factor, we experiment on the augmented Gabor feature vector $\chi^{(\rho)}$. Based on these experiments, we let $\rho = 64$ in this paper, and the dimension of the augmented Gabor feature vector is reduced to 1440 (40 × 12 × 3) accordingly, which reduces the computational complexity of the VTR while maintaining a high recognition accuracy.

Results of Two-Stage Classification.
In order to demonstrate the performance of the proposed two-stage classification strategy, we introduce three evaluation criteria: precision, recall, and accuracy [30].Their definitions are as follows: precision = TP/(TP + FP), recall = TP/(TP + FN), and accuracy = (TP + TN)/(TP + FN + FP + TN), where TP, FP, FN, and TN are abbreviations for true positives, false positives, false negatives, and true negatives, respectively.
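As a quick illustration of the three criteria, the sketch below evaluates them from confusion-matrix counts. The function name is hypothetical, and the counts in the usage note are made up purely for the example; they are not results from this paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if tp + fp == 0 or tp + fn == 0 or total == 0:
        raise ValueError("counts must give nonzero denominators")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / total
    return precision, recall, accuracy
```

For instance, with the illustrative counts TP = 90, FP = 10, FN = 5, and TN = 95, precision is 90/100 = 0.9, recall is 90/95, and accuracy is 185/200 = 0.925.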
We randomly select 400 samples as training samples and 400 samples as test samples from four vehicle type datasets, bus, truck, van, and sedan, respectively.

Results of the First Stage of Classification.
For the first stage of classification, we experiment on the whole dataset.We randomly select 1200 samples as training samples and 400 samples as test samples.If the type of the test sample is recognized as bus or truck, then the test sample is determined as a large vehicle.Similarly, if the type of the test sample is recognized as van or sedan, then the test sample is determined as a small vehicle.Table 1 shows the experimental results where the test samples are captured under good illumination and no occlusion.Further, Table 2 gives the results under bad illumination or partial occlusion.
As can be seen from Tables 1 and 2, the first stage of classification still has high accuracy and reliability, even though the test samples are captured under bad illumination or partial occlusion.

Results of the Second Stage of Classification.
Based on the result of the first stage of classification, if the test sample is recognized as a large vehicle, the large vehicle dataset including the bus and truck images needs to be used in the following second stage of classification. Similarly, if the test sample is recognized as a small vehicle, the small vehicle dataset including the van and sedan images needs to be used.
We again randomly select 1200 samples as training samples and 400 samples as test samples from the large vehicle dataset or small vehicle dataset in the second stage of classification. Table 3 shows the experimental results where the test samples are captured under good illumination and no occlusion. Table 4 gives the results under bad illumination or partial occlusion.
As can be seen from Tables 3 and 4, although the performance of the second stage of classification slightly degrades compared with the first stage of classification, it still has very good reliability.
To verify that the proposed method exploiting the dictionary learning scheme based on Fisher discrimination criterion is effective, after implementing the first stage of classification, we use the traditional SRC method that does not exploit the dictionary learning scheme based on Fisher discrimination criterion to implement the second stage of classification.The classification results under good illumination and no occlusion are shown in Table 5.
As can be seen from Tables 3 and 5, the proposed classification method that exploits the dictionary learning scheme based on Fisher discrimination criterion is superior to the traditional method in terms of precision, recall, and accuracy.Therefore, exploiting the dictionary learning scheme based on Fisher discrimination criterion in the second stage of classification is very effective for improving recognition performance of classifier for the VTR.
In order to demonstrate the efficacy of the two-stage classification strategy, the proposed KNNPC in Section 3.2 and the DSRC in Section 3.3 are each used as a single-stage classifier to classify the four types of vehicles. We again randomly select 1200 samples as training samples and 400 samples as test samples from the whole dataset. The results of single-stage classification based on the KNNPC and global feature and those based on the DSRC and local feature are shown in Tables 6 and 7, respectively. The proposed two-stage classification strategy clearly outperforms the single-stage classification strategy in terms of precision, recall, and accuracy. Further analysis finds that the extracted global feature, combined with the KNNPC, has an excellent ability to distinguish large vehicles from small vehicles. When the four types of vehicles are mixed together, however, it becomes difficult for the global feature to distinguish buses from trucks in the large vehicle dataset or vans from sedans in the small vehicle dataset. Moreover, when the four types of vehicles are mixed together, the single-stage classification based on the DSRC and local feature needs to train more classifier parameters simultaneously, using more training samples, than when only two types of vehicles are mixed together for a given recognition performance. Therefore, the performance of the single-stage classification based on the DSRC and local feature degrades compared with the proposed two-stage classification strategy.

Comparison of Results with Other Methods.
In order to compare our method with other popular methods, we test our method on the dataset used in [31]. Similar to [31], the experiments on daylight images and nighttime images are performed separately. Before implementing the classification, we first divide the dataset in [31] into two categories: a large vehicle dataset and a small vehicle dataset, where the large vehicle dataset consists of two types of vehicles, bus and truck, and the small vehicle dataset consists of three types of vehicles, passenger car, minivan, and sedan. Our method achieves on average 96.3% classification accuracy on daylight images and 89.5% on nighttime images, better than the results of previous methods, as demonstrated in Table 8. Additionally, we also test our method on the BIT-Vehicle dataset provided in [1]; our method achieves 90.1% classification accuracy, whereas the accuracy of the method used in [1] reaches 88.11%. The underlying reasons are as follows. The proposed Canny edge operator and Gabor wavelet kernels are able to extract discriminative global and local features for the VTR. The proposed two-stage classification strategy can leverage the advantages of the extracted global and local features according to their characteristics; that is, the extracted global feature, which represents the geometrical contour of a vehicle, is applied only to the first stage of classification to determine whether the test sample belongs to the large vehicle or small vehicle category, and the local feature, which represents the structural details of a vehicle, is then applied only to the second stage of classification to determine whether the sample is a bus or truck in the large vehicle dataset, or a van or sedan in the small vehicle dataset. The dictionary learning scheme based on the Fisher discrimination criterion is able to learn a discriminative classifier for precise recognition in the second stage of classification. Extracting the local feature from the four partitioned patches provides strong robustness to partial occlusion.

Table 8
Method                      Daylight    Nighttime
[32]                        78.3%       73.3%
Petrovic and Cootes [33]    84.3%       82.7%
Peng et al. [31]            90.0%       87.6%
Dong and Jia [8]            91.3%       -
Dong et al. [1]             96.1%       89.4%
Ours                        96.3%       89.7%

Conclusions
The two key steps of improving the VTR are the feature extraction and classifier design. Based on the need to recognize the vehicle type accurately and reliably, we propose a VTR method combining global and local features via two-stage classification. The improved Canny edge detection algorithm is capable of extracting the continuous and complete global feature. The employed Gabor wavelet kernels with five scales and eight orientations are able to successfully extract the local feature. The proposed KNNPC is able to realize the preliminary recognition of a large vehicle or small vehicle based on the global feature. Further, the DSRC has a stronger ability in recognizing bus, truck, van, or sedan based on the local feature. As demonstrated by the experiments on the challenging dataset and a compared dataset, the proposed method can solve the VTR problem much more efficiently and outperforms existing state-of-the-art methods.
The study offers the possibility of developing more sophisticated VTR methods. First, this method can be extended to VTR contexts involving more vehicle types.
Second, more effective features and corresponding feature extraction algorithms can be adopted. Third, more discriminative classifiers can be incorporated into the two-stage classification.
Feature and KNNPC. In the first stage of classification, we propose a robust classification method based on the global feature and KNNPC. This method first estimates the cumulative probabilities of the test sample on its $K$-nearest neighbors, which may belong to different classes, and then selects the class with the maximum weight as the classification result. The selection of the $K$-nearest neighbors is based on an improved Hausdorff distance measure (IHDM), and the cumulative probabilities of the test sample are estimated by Gaussian kernel density estimation (KDE).
$w_c(T_E(x, y))$ is the weight of $T_E(x, y)$ belonging to the $c$th class, and $k \mid \hat{b}_E^{(k)} \in c$ indicates that every $\hat{b}_E^{(k)}$ belongs to the same $c$th class. The final classification result is determined by

$$\operatorname{identify}(T_E(x, y)) = \arg\max_{c} \{ w_c(T_E(x, y)) \}. \quad (14)$$
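A minimal sketch of this voting step is given below, assuming the distances from the test sample to all training samples have already been computed. The exact weighting formula is not reproduced here; the sketch assumes an unnormalized Gaussian kernel exp(-d^2 / (2 sigma^2)) summed per class, with the window width set to the largest neighbor distance divided by a coefficient. The names `knnpc_vote`, `k`, and `alpha` are illustrative, not taken from the paper.

```python
import math
from collections import defaultdict


def knnpc_vote(distances, labels, k=3, alpha=1.0):
    """Pick the class with the largest cumulative Gaussian-kernel weight
    over the k nearest training samples (assumed form of the KNNPC vote)."""
    # Indices of the k smallest distances, i.e. the k nearest neighbors.
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Window width: largest neighbor distance divided by the coefficient alpha.
    d_max = max(distances[i] for i in nearest)
    sigma = d_max / alpha if d_max > 0 else 1.0  # guard against zero width
    # Accumulate the kernel response of each neighbor into its class weight.
    weights = defaultdict(float)
    for i in nearest:
        weights[labels[i]] += math.exp(-distances[i] ** 2 / (2 * sigma ** 2))
    best = max(weights, key=weights.get)
    return best, dict(weights)


# Toy distances (hypothetical values) from one test sample to five training samples.
toy_distances = [0.2, 0.3, 1.0, 1.1, 5.0]
toy_labels = ["small", "small", "large", "large", "large"]
predicted, class_weights = knnpc_vote(toy_distances, toy_labels, k=3, alpha=1.0)
print(predicted, class_weights)
```

A larger `alpha` shrinks the window width, so distant neighbors contribute less to their class weight; a smaller `alpha` lets them count almost as much as close ones.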

4.4.2. Results of the Second Stage of Classification. Based on the result of the first stage of classification, if the test sample is recognized as a large vehicle, the large vehicle dataset

Figure 3: Example images under various conditions.

Figure 4: Edge detection results based on improved Canny operator and other operators.

Figure 5: Global feature extraction of four types of vehicles based on traditional and improved Canny operators under various illumination.
depends on the amount of occlusion. The measure $h_{LTS}(P, Q)$ is minimized by keeping the smaller $N_H$ distance values after large distance values are eliminated.

3.2.2. Kernel Density Estimation. Assume that the number of the target classes is $C_E$, and for each class there are $N_c$ ($c = 1, 2, \ldots, C_E$) samples. First, we obtain the $K$-nearest neighbors to the test sample in the training set using the proposed IHDM. Suppose that $T_E(x, y)$ is the point set that consists of the edge points extracted from the test sample by the global feature extraction method proposed in Section 2.1, $T_E^{(i)}(x, y)$ indicates the point set that consists of the edge points extracted from the $i$th training sample in the sample set $S_E$ by the global feature extraction method proposed in Section 2.1, and $S_E = \{T_E^{(1)}(x, y), T_E^{(2)}(x, y), \ldots, T_E^{(N_E)}(x, y)\}$. According to (12), we can calculate the Hausdorff distance between $T_E(x, y)$ and every $T_E^{(i)}(x, y)$, defined as $h_{LTS}(i)$, $i \in \{1, 2, \ldots, N_E\}$. Comparing the values of $h_{LTS}(i)$, we obtain the smallest $K$ values, defined as $\hat{h}_{LTS}(k)$, $k \in \{1, 2, \ldots, K\}$. The $K$ training samples corresponding to the smallest $K$ values are regarded as the $K$-nearest neighbors $\{\hat{b}_E^{(k)} \mid k = 1, 2, \ldots, K\}$ of the test sample. Then, the KDE method [26] is used to estimate the cumulative influences on $T_E(x, y)$ from its $K$-nearest neighbors corresponding to different classes. We use a Gaussian kernel function and set the window width parameter $\sigma_H = \max_{k \in \{1, 2, \ldots, K\}} \hat{h}_{LTS}(k) / \alpha_H$ in the estimation, where $\alpha_H$ is a coefficient used to narrow (larger $\alpha_H$) or expand (smaller $\alpha_H$) the influences of the neighbors with different distances. Finally, we get [...]

[...] $\lambda_1$ and $\lambda_2$ are scalar parameters. Each atom $d_n$ of $D$ is constrained to have a unit $\ell_2$-norm to prevent $D$ from having an arbitrarily large $\ell_2$-norm, which would result in trivial solutions of the coefficient matrix $X$. Further, by means of the Fisher discrimination criterion, $r(A, D, X)$ and $f(X)$ are defined as $r(A, D, X) = \sum_{i=1}^{C} r(A_i, D, X_i)$ and $f(X) = \operatorname{tr}(S_W(X) - S_B(X)) + \eta \|X\|_F^2$, where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, and $S_W(X)$ and $S_B(X)$ indicate the within-class scatter and between-class scatter of $X$, respectively.

[...] is reduced to a sparse representation problem to compute $X = [X_1, X_2, \ldots, X_C]$. We can compute $X_i$ class by class. When computing $X_i$, all $X_j$, $j \neq i$, are fixed. The objective function in (17) is further simplified into

$$\min_{X_i} \{ r(A_i, D, X_i) + \lambda_1 \|X_i\|_1 + \lambda_2 f_i(X_i) \}.$$

[...] $[D_1, D_2, \ldots, D_C]$ class by class. That is, when every $D_i$ is updated, all $D_j$, $j \neq i$, are fixed. The objective function in (17) is reduced to $\min$ [...] $\|d_n\|_2 = 1$, $n = 1, 2, \ldots, N$.

[...] defined in Section 2.2.2 with five different downsampling factors, respectively: $\rho = 16, 32, 64, 128$, or $256$. Experimental results show that the average accuracy rates based on the DSRC proposed in Section 3.3 are 95.8%, 95.9%, 95.9%, 96.8%, 73%, and 34%, respectively, when $\rho = 1, 16, 32, 64, 128$, or $256$. It is very clear that when $\rho = 64$, the DSRC has the highest accuracy rate.
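The effect of discarding large distance values, as described for the LTS-based measure above, can be illustrated with the standard least-trimmed-squares (LTS) partial Hausdorff distance. The sketch below is only an illustration under stated assumptions and is not the improved measure (IHDM) itself: the function name `directed_lts_hausdorff`, the `keep_fraction` parameter, and the toy point sets are hypothetical.

```python
import math


def directed_lts_hausdorff(points_a, points_b, keep_fraction=0.8):
    """Directed LTS Hausdorff distance from points_a to points_b.

    For every point in points_a, take the distance to its nearest point in
    points_b, sort these distances, keep only the smallest fraction, and
    average them. Large distances (e.g. from occluded edges) are dropped.
    """
    nearest = sorted(
        min(math.dist(a, b) for b in points_b) for a in points_a
    )
    kept = max(1, int(keep_fraction * len(nearest)))  # number of values retained
    return sum(nearest[:kept]) / kept


# Toy edge point sets; the last point of edge_a mimics an occlusion outlier.
edge_a = [(0, 0), (1, 0), (2, 0), (3, 0), (100, 0)]
edge_b = [(0, 0), (1, 0), (2, 0), (3, 0)]

trimmed = directed_lts_hausdorff(edge_a, edge_b, keep_fraction=0.8)
untrimmed = directed_lts_hausdorff(edge_a, edge_b, keep_fraction=1.0)
print(trimmed, untrimmed)
```

With trimming, the single outlier no longer dominates the distance, which is why the fraction kept should reflect the expected amount of occlusion.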

Table 1: Results of first stage of classification under good illumination and no occlusion.

Table 2: Results of first stage of classification under bad illumination or partial occlusion.

Table 3: Results of second stage of classification under good illumination and no occlusion.

Table 4: Results of second stage of classification under bad illumination or partial occlusion.

Table 5: Results of second stage of classification without the dictionary learning scheme based on Fisher discrimination criterion.

Table 6: Results of single-stage classification based on the KNNPC and global feature.

Table 7: Results of single-stage classification based on the DSRC and local feature.

Table 8: Comparison between our method's and other methods' results.