Moving Vehicle Detection and Classification Using Gaussian Mixture Model and Ensemble Deep Learning Technique

Vehicle Dataset and the MIOvision Traffic Camera Dataset. In addition, adaptive histogram equalization and the Gaussian mixture model are implemented to enhance the quality of the collected vehicle images and to detect vehicles in the enhanced images. Then, the Steerable Pyramid Transform and the Weber Local Descriptor are employed to extract feature vectors from the detected vehicles. Finally, the extracted features are given as input to an ensemble deep learning technique for vehicle classification. In the simulation phase, the proposed ensemble deep learning technique obtained 99.13% and 99.28% classification accuracy on the MIOvision Traffic Camera Dataset and the Beijing Institute of Technology Vehicle Dataset, respectively. These results compare favorably with the standard existing benchmark techniques on both datasets.


Introduction
In recent times, developing intelligent traffic surveillance systems has become an emerging research topic, as such systems offer innovative tools to improve driver satisfaction, efficiency, and transportation safety [1]. Automatic vehicle classification plays a crucial role in intelligent traffic surveillance systems and supports several applications, such as traffic flow analysis, electronic toll collection, and intelligent parking systems [2,3]. Due to the COVID-19 outbreak and mobility restrictions, citizens were allowed to leave their homes only to procure essential goods from groceries or pharmacies.
Intelligent traffic surveillance systems can track motorists entering the worst-affected regions from low-risk areas.
Automatic vehicle classification is a challenging task when the videos are collected from traffic surveillance cameras [4]. Captured traffic surveillance images have low resolution and are affected by varying weather conditions, illumination conditions, and occlusion [5]. In addition, vehicle types exhibit considerable intraclass and interclass similarities, which affect vehicle classification performance [6]. To address these problems, several machine learning methods and data manipulation techniques have been developed to deal with imbalanced data classification [7][8][9]. Compared to other objects, vehicles have different structural characteristics, larger intraclass variations, and larger interclass distances; these factors make vehicle detection and classification a challenging task [10], and a single classifier in the classification stage is unlikely to handle all of them. Some existing detection mechanisms identify incidents efficiently, while others share the limitations of standard identification methods [11,12]. The motivation of this study is to address the aforementioned issues and to deal with imbalanced data; to this end, a new technique is proposed in this paper for vehicle type classification.
Initially, the surveillance videos or images are collected from the Beijing Institute of Technology (BIT) Vehicle Dataset and the MIOvision Traffic Camera Dataset (MIO-TCD). Next, the visual quality of the collected vehicle images is improved using the Adaptive Histogram Equalization (AHE) method, and the Gaussian Mixture Model (GMM) is then used to detect vehicles in the enhanced images. The GMM offers high detection accuracy, adaptation to image content, simplicity of implementation, and fast computation in vehicle detection. After the vehicles are detected, hybrid feature extraction is performed using the Steerable Pyramid Transform (SPT) and the Weber Local Descriptor (WLD) to extract feature vectors from the detected images. These high-level global descriptors narrow the semantic gap in the extracted feature vectors, which results in better classification, reduced training time, and less overfitting. Finally, the ensemble deep learning technique is used to classify the vehicle types: 11 classes in MIO-TCD and 6 classes in the BIT Vehicle Dataset. The performance of the proposed ensemble deep learning technique is analyzed in terms of the False Discovery Rate (FDR), the False Omission Rate (FOR), recall, precision, and accuracy. The simulation results confirm that the proposed ensemble deep learning technique outperforms state-of-the-art techniques in vehicle type classification. However, one drawback of the ensemble deep learning technique is the vanishing gradient problem, which occurs when a large input space is mapped into a smaller one; this problem will be addressed in future work.

Related Work
Liu et al. [13] used Generative Adversarial Networks (GANs) to classify vehicles in traffic surveillance videos. Their approach consists of three steps. First, a GAN was trained on a collected traffic dataset to generate adversarial samples for the rare classes. Second, an ensemble-based Convolutional Neural Network (CNN) was trained on the imbalanced dataset, and sample selection was carried out to eliminate low-quality adversarial samples. Finally, the selected adversarial samples were used to refine the ensemble model on the augmented dataset. Extensive experiments showed that the GAN-based approach performed effectively in vehicle classification on MIO-TCD in terms of the Cohen kappa score, mean recall, precision, and mean precision. However, the approach suffers from degradation issues as deeper networks begin to converge. Fu

Wireless Communications and Mobile Computing
The performance of the developed technique was evaluated on a real-time dataset containing real scene images captured at crossroads. As a future enhancement, a novel technique is needed to improve the detection of occluded vehicles under different illumination conditions, angles, and image scales. Zhuo et al. [17] developed a CNN model for vehicle classification that includes two important steps: pretraining and fine-tuning. In the pretraining step, GoogLeNet was trained on the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) dataset to obtain an initial model with connection weights. In the fine-tuning step, the initial model was fine-tuned on the vehicle dataset to achieve the final classification. In that study, the collected highway surveillance videos include six vehicle categories: van, minibus, truck, bus, car, and motorcycle. The performance analysis was carried out on the vehicle dataset in terms of accuracy. However, the CNN model is computationally expensive and has a major problem of overfitting. Murugan and Vijaykumar [18] developed a new framework for vehicle type classification that includes six main phases: data preprocessing, vehicle detection, vehicle tracking, structural matching, feature extraction, and vehicle classification. After the traffic surveillance videos were collected, data preprocessing was accomplished using noise removal and color conversion. Further, the Otsu thresholding algorithm and background subtraction were used to detect the vehicles, and the Kalman filter was used to track the moving vehicles. Additionally, the log Gabor filter and the Harris corner detector were used to extract feature vectors, which were then fed to an Adaptive Neuro-Fuzzy Inference System (ANFIS) for vehicle classification.
Extensive experiments showed that the framework achieved significant performance in vehicle classification in terms of error rate and accuracy. However, the framework suffers from a high-dimensionality issue that increases model complexity. Dong et al. [19] implemented a new semisupervised CNN architecture for vehicle type classification. In this architecture, a sparse Laplacian filter was applied to extract rich and discriminative information about the vehicles, and a softmax classifier in the output layer was trained by multitask learning for vehicle type classification. The features learned by the semisupervised CNN architecture were discriminative enough to perform well in complex scenes. Experiments on the BIT Vehicle Dataset and a public dataset were conducted to analyze the efficiency of the architecture in terms of classification accuracy. However, the semisupervised CNN architecture includes several layers, so the training process is time-consuming. Hedeya et al. [20] introduced a new densely connected single-split super learner and applied its variants to vehicle type classification on the BIT Vehicle Dataset and MIO-TCD. The model is simple and does not require any logical reasoning or hand-crafted features to achieve good vehicle type classification performance. On complex datasets, however, the model suffers from the vanishing gradient problem, which is a major concern in that study. Soon et al. [21] implemented a new semisupervised model for vehicle type classification based on the Principal Component Analysis Convolutional Network (PCN). In this model, convolutional filters were used to extract hierarchical and discriminative features.
The simulation results showed that the model performed well in real-time applications due to its robustness against noise contamination, illumination conditions, rotation, and translation. However, the PCN model contains a large number of trainable parameters, which leads to overfitting.
Awang et al. [22] developed the Sparse-Filtered CNN with Layer Skipping (SF-CNNLS) approach for vehicle type classification. In that study, three channels of the SF-CNNLS approach were applied to extract discriminant and rich vehicle features. The global and local features of the vehicles were extracted from the three channels of an image based on their color, brightness, and shape. The performance of the SF-CNNLS approach was validated experimentally on a benchmark dataset. Finally, a softmax regression classifier was used to classify vehicle types such as truck, minivan, bus, passenger, taxi, car, and SUV. Although the network includes higher-level layers, embedding lower-resolution vehicle images may cause a loss of vehicle type information. Nasaruddin et al. [23] developed an attention-based approach and a deep CNN technique for lightweight moving vehicle classification. The performance of their model was validated on a real-time dataset in terms of specificity, precision, and F-score. However, the model's performance was limited in challenging conditions such as the baseline, camera jitter, and bad weather classes. The methods, datasets, advantages, and disadvantages of each reviewed approach to vehicle type classification are summarized above. To address these issues, a new ensemble deep learning technique is proposed in this paper to improve vehicle type classification.
This paper is organized as follows. Methodology introduces two vehicle datasets and their parameters, as well as

Methodology
Vehicle type classification has recently become an emerging research area in intelligent traffic systems due to its wide range of applications, which include intelligent parking systems and traffic flow statistics [24]. Many vehicle type classification approaches have been developed, commonly based on cameras, magnetic induction, and optical fibres [25]. With the extensive use of traffic surveillance cameras, image-based approaches have received great attention in the computer vision community. The flow diagram of the ensemble deep learning technique is given in Figure 1.

Image Collection.
In this study, the performance of the proposed ensemble deep learning technique is tested on the BIT Vehicle Dataset and MIO-TCD. The BIT Vehicle Dataset comprises 9850 vehicle images with resolutions of 1920 × 1080 and 1600 × 1200 pixels, captured by two different cameras at different places and times. The dataset contains six vehicle types, namely, sedan, microbus, SUV, minivan, bus, and truck, with 5922, 883, 1392, 476, 558, and 822 images of each type, respectively [26]. The captured images vary in viewpoint, vehicle surface color, scale, vehicle position, and illumination conditions. Due to the sizes of the vehicles and the capturing delay, the top and bottom parts of the vehicles are not included in the images. The location of every vehicle is pre-annotated in the BIT Vehicle Dataset, because some images contain one or two vehicles. Sample images of the BIT Vehicle Dataset are given in Figure 2. The BIT Vehicle Dataset link is as follows: https://www.programmersought.com/article/7654351045/.
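The per-class counts above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section: it sums the counts and computes the ratio between the largest and smallest class. The counts sum to 10,053, which is 203 more than the 9,850 images; this is consistent with the note that some images contain two vehicles, assuming the counts are per vehicle rather than per image. The sedan-to-minivan ratio of roughly 12 to 1 also illustrates the class imbalance discussed in the Introduction.

```python
# Per-class counts quoted for the BIT Vehicle Dataset.
bit_counts = {"sedan": 5922, "microbus": 883, "SUV": 1392,
              "minivan": 476, "bus": 558, "truck": 822}
num_images = 9850

total = sum(bit_counts.values())
extra = total - num_images                      # counts beyond one vehicle per image
largest = max(bit_counts, key=bit_counts.get)   # majority class
smallest = min(bit_counts, key=bit_counts.get)  # minority class
imbalance = bit_counts[largest] / bit_counts[smallest]

print(total, extra)                             # 10053 203
print(largest, smallest, round(imbalance, 2))   # sedan minivan 12.44
```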
In addition, the MIO-TCD classification dataset comprises 648,959 vehicle images in eleven categories: bicycle, articulated truck, motorcycle, nonmotorized vehicle, bus, car, pedestrian, work van, pickup truck, single-unit truck, and background [27]. The data statistics of the MIO-TCD classification dataset are given in Table 1. Every annotated image in the BIT Vehicle Dataset and MIO-TCD is stored in a structured format. Sample images of the MIO-TCD classification dataset are given in Figure 3. The MIO-TCD classification dataset link is as follows: https://github.com/hakimamarouche/MIO-TCDclassification.

Image Preprocessing and Vehicle Detection.
The collected vehicle images are first enhanced using AHE [28]. The probability of occurrence of a gray level is computed using
$$p_x(i) = p(x = i) = \frac{n_i}{n}, \quad 0 \le i < L \tag{1}$$

where L is the number of image gray levels (256 for 8-bit images, with levels ranging from 0 to 255); n_i is the number of pixels with gray level i; n is the total number of image pixels; and p_x(i) is the normalized histogram value of gray level i, which lies in [0, 1]. Further, the cumulative distribution function (CDF) of p_x is computed using

$$\mathrm{cdf}_x(i) = \sum_{j=0}^{i} p_x(j) \tag{2}$$

Then, a transformation y = T(x) is designed to generate a new image y with a flat histogram. The transformed vehicle images have a linear CDF, which is mathematically stated as

$$\mathrm{cdf}_y(i) = (i + 1)K \tag{3}$$

for a constant K. This is achieved by the transformation

$$y = T(k) = \mathrm{cdf}_x(k) \tag{4}$$

where k lies in the range [0, L) and T maps each gray level into [0, 1]. In the AHE technique, a simple transformation is then applied to map the equalized values back into the original intensity range:

$$y' = y\,\left(\max\{x\} - \min\{x\}\right) + \min\{x\} \tag{5}$$

After image preprocessing, GMM is applied to detect vehicles in the preprocessed images y′. In the field of vehicle type classification, GMM is used for detecting and recognizing moving objects [29]. GMM is a statistical model that describes the spatial distribution and properties of the data in the parameter space. It is a parametric probability density function composed of several Gaussian component functions, which is used to detect vehicles in the images [30] and is mathematically defined in equation (6). Sample preprocessed and vehicle-detected images are shown in Figure 4.

$$p(y') = \sum_{i=1}^{M} \pi_i\, R_i(y' \mid \mu_i, C_i) \tag{6}$$

where M is the number of Gaussian components; R_i(y′ | μ_i, C_i) is the bivariate normal distribution of the ith component with mean vector μ_i and covariance matrix C_i; and π_i is the prior probability (mixing weight) that a data sample is generated by the ith component, with the π_i summing to 1.
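Equations (1), (2), (4), (5), and (6) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, the input is assumed to be an 8-bit grayscale array, and the equalization is applied globally. AHE applies the same mapping within local regions of the image, and that tiling step is omitted here. The GMM function only evaluates the mixture density of equation (6) for given parameters; estimating π_i, μ_i, and C_i (for example with expectation-maximization) and turning the density into a foreground mask are not shown.

```python
import numpy as np

def histogram_equalize(x, L=256):
    """Global histogram equalization of an integer image with L gray levels."""
    x = np.asarray(x)
    n_i = np.bincount(x.ravel(), minlength=L)   # number of pixels per gray level
    p_x = n_i / x.size                          # eq. (1)
    cdf_x = np.cumsum(p_x)                      # eq. (2)
    y = cdf_x[x]                                # eq. (4): y = T(k) = cdf_x(k), in [0, 1]
    lo, hi = float(x.min()), float(x.max())
    return y * (hi - lo) + lo                   # eq. (5): back to the original range

def gmm_density(points, weights, means, covs):
    """Mixture density of eq. (6) for 2-D points: sum_i pi_i * N(y' | mu_i, C_i)."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    density = np.zeros(len(pts))
    for pi_i, mu_i, c_i in zip(weights, means, covs):
        diff = pts - np.asarray(mu_i, dtype=float)
        inv = np.linalg.inv(c_i)
        norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(c_i)))  # bivariate normalizer
        mahal = np.einsum("nj,jk,nk->n", diff, inv, diff)         # squared Mahalanobis distance
        density += pi_i * norm * np.exp(-0.5 * mahal)
    return density

img = np.array([[0, 0], [255, 255]], dtype=np.uint8)
print(histogram_equalize(img))
print(gmm_density([[0.0, 0.0]], [1.0], [[0.0, 0.0]], [np.eye(2)]))
```

For per-pixel background modeling on video, libraries such as OpenCV provide a GMM-based background subtractor (createBackgroundSubtractorMOG2), which may be a more practical starting point than evaluating equation (6) by hand.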

Feature Extraction and Vehicle Classification.
After vehicle detection, SPT and WLD are combined to extract feature vectors from the detected images, which decreases the risk of overfitting, speeds up the training process, and enhances data visualization. SPT is a linear, multiorientation, multiscale image decomposition method that was developed to overcome the limitations of orthogonal separable wavelet decompositions [31]. The SPT first decomposes the detected images into several orientations and then into several scales, using derivative operators in different directions with variable sizes; the orientation bandwidth of each subband is 2π/o, where o is the number of orientations. The resulting subbands are rotation invariant and translation invariant [32].
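The derivative operators behind the oriented band-pass filters are steerable: a filter at any angle θ can be synthesized as a linear combination of a fixed set of basis filters. The sketch below illustrates this property for first-order Gaussian derivative filters, where two basis filters suffice, loosely analogous to two oriented bands such as B0 and B1. It is not a full SPT decomposition; σ, the kernel radius, and the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_derivative_basis(sigma=1.0, radius=3):
    """Return offset grids, the Gaussian, and its x- and y-derivative filters."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)            # xx: horizontal offsets, yy: vertical offsets
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    gx = -xx / sigma**2 * g                 # derivative of g along x (0 rad)
    gy = -yy / sigma**2 * g                 # derivative of g along y (pi/2 rad)
    return xx, yy, g, gx, gy

def steer(gx, gy, theta):
    """Synthesize the first-derivative filter at angle theta from the two basis filters."""
    return np.cos(theta) * gx + np.sin(theta) * gy

sigma = 1.5
xx, yy, g, gx, gy = gaussian_derivative_basis(sigma=sigma)
theta = np.deg2rad(30.0)
steered = steer(gx, gy, theta)
# The same directional derivative built directly along (cos theta, sin theta).
direct = -(xx * np.cos(theta) + yy * np.sin(theta)) / sigma**2 * g
print(np.allclose(steered, direct))  # True
```

Because filtering is linear, the response of an image to the steered filter equals cos θ times the x-derivative response plus sin θ times the y-derivative response, so only the basis subbands need to be computed.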
In the SPT method, the detected images are first decomposed into high- and low-frequency components using the H0 and L0 filters. The low-frequency component is then further decomposed, using the oriented band-pass filters B0 and B1 and the low-pass filter L1, into two oriented band-pass components and a lower-frequency component. Increasing the number of orientations (i.e., the derivative order) and the number of pyramid levels yields finer orientation and scale tuning and thus a more robust image representation. In the SPT method, the filters should satisfy the following conditions:

Next, every subband is processed with the WLD texture descriptor to extract the active features from the images. WLD is a robust local texture descriptor inspired by Weber's law. It comprises two components, differential excitation and gradient orientation, which extract texture features from the vehicle-detected images. The differential excitation component reflects the intensity changes around the current pixel [33][34][35] and is computed as

$$G_{\mathrm{ratio}}(x_a) = \sum_{i=0}^{m-1} \frac{x_i - x_a}{x_a}, \qquad \xi(x_a) = \arctan\left[G_{\mathrm{ratio}}(x_a)\right]$$

where ξ(x_a) is the differential excitation of the current pixel x_a, G_ratio(x_a) is the ratio of the intensity differences between the neighbors and the current pixel to the current pixel intensity, x_i (i = 0, 1, 2, …, m − 1) is the ith neighboring pixel of x_a, and m is the number of neighbors. Further, the gradient orientation component of the current pixel x_a is calculated using

$$\theta(x_a) = \arctan\left(\frac{v_s^{11}}{v_s^{10}}\right)$$

where v_s^{11} and v_s^{10} are the outputs of two filters, filter_11 and filter_10, which compute the differences between the current and neighboring image pixels, and θ lies in the range [−π/2, π/2]. Next, the extracted active feature vectors F are fed to the ensemble deep learning technique for vehicle classification.
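The two WLD components can be computed for a 3 × 3 neighborhood (m = 8) with array slicing. This sketch assumes a grayscale image, adds a small hypothetical constant eps to avoid division by zero at black pixels, and assumes that filter_11 and filter_10 take vertical and horizontal differences across the current pixel, respectively; the original WLD formulation may index the neighbors differently. Border pixels are skipped, so the outputs are two pixels smaller in each dimension.

```python
import numpy as np

def wld_components(img, eps=1e-6):
    """Differential excitation xi and gradient orientation theta for interior pixels."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]                 # current pixels x_a
    # Offsets of the m = 8 neighbours x_i around each current pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    diff_sum = np.zeros_like(center)
    for dy, dx in offsets:
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        diff_sum += neighbour - center
    g_ratio = diff_sum / (center + eps)      # G_ratio(x_a); eps is an assumption
    xi = np.arctan(g_ratio)                  # differential excitation

    # Assumed filter outputs: vertical (v11) and horizontal (v10) differences.
    v11 = img[2:, 1:-1] - img[:-2, 1:-1]
    v10 = img[1:-1, 2:] - img[1:-1, :-2]
    with np.errstate(divide="ignore", invalid="ignore"):
        # theta in [-pi/2, pi/2]; where v10 == 0 use +-pi/2, or 0 if v11 is also 0.
        theta = np.where(v10 != 0, np.arctan(v11 / v10), np.sign(v11) * np.pi / 2)
    return xi, theta

flat = np.full((5, 5), 10.0)
ramp = np.tile(np.arange(1.0, 6.0)[:, None], (1, 5))  # intensity increases down the rows
xi, theta = wld_components(ramp)
```

In the original WLD, the two maps are then quantized into a joint histogram to form the descriptor; that step is not shown.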
An ensemble deep learning technique is proposed for vehicle type classification in traffic surveillance videos. The extracted features F are fed to the ensemble deep learning technique. As shown in Figure 5, the ensemble includes ResNet-152, ResNet-101, and ResNet-50. The proposed ensemble deep learning technique consists of three key phases: initializing the CNNs with good initial parameters, fine-tuning the network parameters, and averaging the models.
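The final averaging phase can be sketched as soft voting: each network's class scores are converted to probabilities and averaged, and the class with the highest mean probability is predicted. The text states only that the models are averaged, so probability averaging is an assumption here, and the logits below are made-up placeholders rather than outputs of ResNet-152, ResNet-101, or ResNet-50.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps exp() numerically stable."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_average(logits_per_model):
    """Average class probabilities over models and return (probabilities, labels)."""
    probs = np.mean([softmax(np.asarray(l, dtype=float)) for l in logits_per_model], axis=0)
    return probs, probs.argmax(axis=1)

# Hypothetical logits for one image and three classes from three models.
model_a = [[5.0, 0.0, 0.0]]   # confident in class 0
model_b = [[0.0, 1.0, 0.0]]   # mildly prefers class 1
model_c = [[0.0, 1.0, 0.0]]   # mildly prefers class 1
probs, labels = ensemble_average([model_a, model_b, model_c])
print(labels)  # [0]
```

Here the averaged probabilities favor class 0 (about 0.47 versus 0.39), whereas a majority vote over the three individual predictions would pick class 1, so the choice of combination rule matters.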
Residual networks (ResNets) are easy to optimize, attain low training error, and gain classification accuracy from large datasets such as the BIT Vehicle Dataset and MIO-TCD. The training error of ResNet-152, ResNet-101, and ResNet-50 on MIO-TCD is shown in Figure 6; as the number of epochs increases, the error gradually decreases for all three networks. Pseudocode of the ensemble deep learning technique is given below.

Experimental Results and Discussion
In this research, the proposed ensemble deep learning technique was implemented in MATLAB 2019a on a system with Windows 10 (64-bit), an Intel Core i9 processor, a 3 TB hard disk, and 16 GB of RAM. The performance of the ensemble deep learning technique is validated by comparison with several benchmark techniques: the GAN-based deep ensemble technique [13], Tiny YOLO with SVM [15], the semisupervised CNN model [19], PCN [21], and the three-channel SF-CNNLS (TC-SF-CNNLS) approach [22]. The primary goal of this study is to classify the vehicle types in the BIT Vehicle Dataset and MIO-TCD. The performance of the proposed ensemble deep learning technique is validated using 10-fold cross-validation. Let TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. Five performance measures are used to analyze the proposed ensemble deep learning technique: accuracy, precision, recall, FDR, and FOR [34]. The mathematical expressions of accuracy, precision,

Figure 7. Similarly, Table 3 reports the performance of the proposed ensemble deep learning technique in terms of FDR and FOR on the BIT Vehicle Dataset. As Table 3 shows, combining the ensemble deep learning technique with hybrid feature extraction achieved the lowest FDR (3.92%) and FOR (1.90%) among the tested combinations. In the BIT Vehicle Dataset, 7,880 vehicle images are used for training and 1,970 vehicle images for testing. The FDR and FOR of the proposed ensemble deep learning technique on the BIT Vehicle Dataset are compared graphically in Figure 8. In addition, the running time of the proposed ensemble deep learning technique on the BIT Vehicle Dataset is 1.6 seconds per frame.
Figure 9.
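The five measures follow the standard confusion-matrix definitions: accuracy = (TP + TN)/(TP + TN + FP + FN), precision = TP/(TP + FP), recall = TP/(TP + FN), FDR = FP/(FP + TP) = 1 − precision, and FOR = FN/(FN + TN). The sketch below computes them for a multiclass confusion matrix by treating each class one-versus-rest. It assumes rows are true classes and columns are predicted classes, reports overall accuracy as the trace divided by the total, and assumes every class is both present and predicted at least once (otherwise some ratios divide by zero). The example matrix is illustrative, not a result from this study.

```python
import numpy as np

def per_class_metrics(cm):
    """One-vs-rest precision, recall, FDR, and FOR per class, plus overall accuracy."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp        # predicted as the class, but truly another class
    fn = cm.sum(axis=1) - tp        # truly the class, but predicted as another class
    tn = cm.sum() - tp - fp - fn
    return {
        "accuracy": np.trace(cm) / cm.sum(),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fdr": fp / (fp + tp),
        "for": fn / (fn + tn),
    }

# Illustrative two-class confusion matrix (rows: true, columns: predicted).
m = per_class_metrics([[50, 10],
                       [5, 35]])
print(m["accuracy"])         # 0.85
print(m["precision"][0])     # 0.9090909090909091
```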
As Table 5 shows, the proposed ensemble deep learning technique achieved the lowest FDR (0.44) and FOR (0.32) among the tested combinations on MIO-TCD. The ensemble deep learning technique effectively maximizes the percentage of correct predictions, which reduces misclassification in both the dominant and minority vehicle categories. The FDR and FOR of the ensemble deep learning technique on MIO-TCD are compared graphically in Figure 10. The running time of the proposed ensemble deep learning technique on MIO-TCD is 1.44 seconds per frame.

Comparative Analysis.
The comparative analysis between the proposed and existing techniques is given in Table 6. Awang et al. [22] developed the TC-SF-CNNLS approach for vehicle type classification and validated its performance on the BIT Vehicle Dataset in terms of recall, precision, and accuracy. The TC-SF-CNNLS approach achieved 93.8% accuracy in classifying vehicle types such as truck, minivan, bus, passenger, taxi, car, and SUV.

Discussion.
As previously discussed, feature extraction and classification are integral parts of vehicle type classification. In this study, hybrid (SPT + WLD) descriptors are used to extract active feature vectors from the vehicle images, which speeds up the training process, reduces overfitting risk, and improves data visualization. The effect of hybrid feature extraction on vehicle type classification is reported in Tables 2, 3, 4, and 5. Additionally, a new ensemble deep learning technique is proposed in this paper that learns from the original dataset in order to classify unknown data. In most existing works, an individual classifier introduces bias through its fixed set of parameters; an ensemble classifier reduces such bias. The performance of the ensemble classifier depends on the accuracy of its constituent classifiers, and the ensemble has stronger generalization ability than the individual classifiers.

Conclusion
In this article, an ensemble deep learning technique is proposed for vehicle type classification, primarily intended for traffic surveillance systems. During the COVID-19 pandemic, video surveillance has been used for additional purposes across the world. Our application uses a deep learning approach that consists of two major phases in vehicle type classification: feature extraction and classification. Hybrid (SPT + WLD) feature descriptors are applied to extract active feature vectors that reduce training time, improve classification accuracy, and diminish overfitting in the ensemble deep learning technique. The ensemble deep learning technique classifies 11 classes in MIO-TCD and 6 classes in the BIT Vehicle Dataset. In the experiments, the ensemble deep learning technique outperformed other classification techniques in terms of precision, recall, accuracy, FDR, and FOR. Compared to existing benchmark techniques such as the GAN-based deep ensemble technique, Tiny YOLO with SVM, the semisupervised CNN model, TC-SF-CNNLS, and PCN with a softmax classifier, the proposed technique improved classification accuracy by up to 11.17%. In future work, a clustering-based segmentation algorithm will be incorporated into the proposed technique to improve vehicle type detection and classification. In addition, three-dimensional modelling, vehicle tracking, and occlusion handling will be investigated for an effective intelligent transportation system.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.