This study investigates the applicability of a Naïve Bayes classifier ensemble for traffic incident detection. The standard Naïve Bayes (NB) classifier has been applied to traffic incident detection and has achieved good results. However, the detection performance of a practically implemented NB classifier depends on the choice of an optimal threshold, which is determined mathematically using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning the parameters, and furthermore to improve the limited classification performance of the NB classifier and enhance detection performance, we propose an NB classifier ensemble for incident detection. In addition, we propose combining Naïve Bayes and a decision tree (NBTree) to detect incidents. In this paper, we discuss extensive experiments performed to evaluate the performance of three algorithms: standard NB, the NB ensemble, and NBTree. The experimental results indicate that the five rules of the NB classifier ensemble perform significantly better than standard NB and slightly better than NBTree in terms of some indicators. More importantly, the performance of the NB classifier ensemble is very stable.
The functionality of automatically detecting incidents on freeways is a primary objective of advanced traffic management systems (ATMS), an integral component of the Nation’s Intelligent Transportation Systems (ITS) [
Incident detection is essentially a pattern classification problem, where the incident and nonincident traffic patterns are to be recognized or classified [
Automated incident detection (AID) systems, which employ an incident detection algorithm to detect incidents from traffic data, aim to improve the accuracy and efficiency of incident detection over a large traffic network. Early AID algorithm development focused on simple comparison methods using raw traffic data [
However, there are drawbacks that limit its applications: the optimal threshold and parameters of the Naïve Bayes (NB) classifier have a great effect on its generalization performance, and setting them is a challenging task. At present, there is no structured method for choosing them; typically, the optimal threshold and parameters have to be chosen and tuned by trial and error. Some studies have applied search techniques to this problem; however, such techniques are themselves computationally demanding, and a large amount of computation time is still involved.
A natural and reasonable question is whether we can increase, or at least maintain, NB performance while avoiding the burden of choosing the optimal threshold and tuning the parameters. Some researchers have proposed classifier ensembles to address this problem. The performance of classifier ensembles has been investigated experimentally, and they appear to consistently give better results [
The remaining part of the paper is structured as follows. Section
A dataset generally consists of feature vectors, where each feature vector is a description of an object by using a set of features. For example, take a look at the synthetic dataset as shown in Figure
The synthetic dataset.
The Naïve Bayes classifier ensemble is a predictive model that we want to construct or discover from the dataset. The process of generating models from data is called learning or training, and it is accomplished by a learning algorithm. In supervised learning, the goal is to predict the value of a target feature on unseen instances, and the learned model is also called a predictor. For example, if we want to predict the color of the synthetic data points, we treat "yellow" and "red" as labels, and the predictor should be able to predict the label of an instance for which the label information is unknown, for example, (0.7, 0.7). If the label is categorical, such as color, the task is called classification and the learner is also called a classifier.
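As an illustration, this classification task can be reproduced in a few lines of scikit-learn. The 2-D points and their coordinates below are invented stand-ins for the synthetic dataset in the figure; only the workflow (train a classifier, then predict an unseen instance such as (0.7, 0.7)) follows the text.

```python
# Minimal supervised-learning sketch on invented 2-D points; the labels
# play the role of the "yellow"/"red" colors described in the text.
from sklearn.naive_bayes import GaussianNB

X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3],   # "yellow" cluster (illustrative)
     [0.8, 0.9], [0.9, 0.8], [0.75, 0.85]]  # "red" cluster (illustrative)
y = ["yellow", "yellow", "yellow", "red", "red", "red"]

clf = GaussianNB().fit(X, y)      # the learning/training step
print(clf.predict([[0.7, 0.7]]))  # predict the label of an unseen instance
```

With these stand-in points, the unseen instance (0.7, 0.7) falls near the "red" cluster and is classified accordingly.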
To classify a test instance
The most widely used probability combination rules [
Product rule: assign the test instance to the class ω_j that maximizes ∏_{i=1}^{L} P_i(ω_j | x), where P_i(ω_j | x) is the posterior probability estimated by the i-th individual classifier and L is the number of individual classifiers.
Sum rule: assign the test instance to the class that maximizes Σ_{i=1}^{L} P_i(ω_j | x).
Min rule: assign the test instance to the class that maximizes min_i P_i(ω_j | x).
Max rule: assign the test instance to the class that maximizes max_i P_i(ω_j | x).
Majority vote rule: each individual classifier votes for its most probable class, and the test instance is assigned to the class that receives the most votes.
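The five combination rules can be expressed compactly in code. The sketch below is an illustrative implementation (the function and variable names are our own); it operates on the per-classifier posterior estimates for a single test instance.

```python
import numpy as np

def combine(probas, rule):
    """Combine per-classifier class-probability estimates for one instance.

    probas: array of shape (n_classifiers, n_classes).
    Returns the index of the winning class under the given rule.
    """
    P = np.asarray(probas, dtype=float)
    if rule == "product":
        scores = P.prod(axis=0)          # product rule
    elif rule == "sum":
        scores = P.sum(axis=0)           # sum rule
    elif rule == "min":
        scores = P.min(axis=0)           # min rule
    elif rule == "max":
        scores = P.max(axis=0)           # max rule
    elif rule == "majority":
        votes = P.argmax(axis=1)         # each classifier votes for its top class
        scores = np.bincount(votes, minlength=P.shape[1])
    else:
        raise ValueError(rule)
    return int(scores.argmax())

# Three classifiers, two classes (nonincident, incident):
P = [[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]
for rule in ("product", "sum", "min", "max", "majority"):
    print(rule, combine(P, rule))
```

For this example all five rules agree on class 1 (incident), but on less consistent posteriors the rules can disagree, which is why the paper evaluates each rule separately.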
The Naïve Bayesian tree (NBTree) algorithm is similar to the classical recursive partitioning schemes, except that the leaf nodes created are naïve Bayesian classifiers instead of nodes predicting a single class [
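As a rough illustration of the NBTree idea, the sketch below partitions the feature space with a shallow decision tree and fits a Gaussian naive Bayes model in each leaf, so leaves carry classifiers instead of single-class predictions. It is a simplified stand-in assembled from scikit-learn parts; the original NBTree algorithm uses its own utility-based split criterion, which is not reproduced here.

```python
# Simplified NBTree-style model: a shallow decision tree routes instances
# to leaves, and each leaf holds its own naive Bayes classifier.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

class SimpleNBTree:
    def __init__(self, max_depth=2, min_samples_leaf=20):
        self.tree = DecisionTreeClassifier(max_depth=max_depth,
                                           min_samples_leaf=min_samples_leaf)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)            # leaf index of each training point
        self.leaf_models = {}
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = GaussianNB().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        leaves = self.tree.apply(X)
        out = np.empty(len(X), dtype=object)
        for leaf in np.unique(leaves):
            m = leaves == leaf                 # route instances to their leaf's NB
            out[m] = self.leaf_models[leaf].predict(X[m])
        return out
```

The design point mirrors the text: the recursive partitioning supplies locally homogeneous regions, within which the naive Bayes independence assumption is less harmful.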
Four primary measures of performance, namely, detection rate (DR), false alarm rate (FAR), mean time to detection (MTTD), and classification rate (CR), are used to evaluate traffic incident detection algorithms. We will quote the definitions from [
DR is defined as the number of incidents correctly detected by the traffic incident detection algorithm divided by the total number of incidents known to have occurred during the observation period:

DR = (number of incidents detected) / (total number of incidents that occurred).
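The count-based measures can be computed as follows. Note two assumptions in this sketch: the paper's DR is counted per incident case, while here it is approximated per instance, and the FAR formulation used (false alarms over nonincident test intervals) is one common choice.

```python
# Sketch of the count-based performance measures; y_true/y_pred follow the
# paper's label convention (-1 = nonincident, 1 = incident).
def detection_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    incident_preds = [p for t, p in pairs if t == 1]
    nonincident_preds = [p for t, p in pairs if t == -1]
    dr = sum(p == 1 for p in incident_preds) / len(incident_preds)         # detection rate
    far = sum(p == 1 for p in nonincident_preds) / len(nonincident_preds)  # false alarm rate
    cr = sum(t == p for t, p in pairs) / len(pairs)                        # classification rate
    return dr, far, cr

dr, far, cr = detection_metrics([1, 1, -1, -1, -1], [1, -1, -1, 1, -1])
print(dr, far, cr)  # → 0.5 0.3333333333333333 0.6
```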
Receiver operating characteristic (ROC) curves illustrate the relationship between the DR and the FAR from 0 to 1. Often the comparison of two or more ROC curves consists of either looking at the area under the ROC curve (AUC) or focusing on a particular part of the curves and identifying which curve dominates the other in order to select the best-performing algorithm. AUC, when using normalized units, is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming "positive" ranks higher than "negative") [
In statistics, the mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error is given by

MAE = (1/n) Σ_{i=1}^{n} |f_i − y_i|,

where f_i is the predicted value, y_i is the true value, and n is the number of samples.
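For concreteness, MAE and the related RMSE reported in the results tables can be computed directly from the predictions and outcomes:

```python
import math

def mae(f, y):
    # mean absolute error: average of |f_i - y_i|
    return sum(abs(fi - yi) for fi, yi in zip(f, y)) / len(y)

def rmse(f, y):
    # root mean squared error: sqrt of the average squared deviation
    return math.sqrt(sum((fi - yi) ** 2 for fi, yi in zip(f, y)) / len(y))

print(mae([1, -1, 1], [1, 1, -1]))  # → 1.3333333333333333
```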
To describe the experiments clearly, we first present the definitions for all parameters and symbols used in experiments. Then, we describe the experiment procedures in detail.
Some parameters are adopted to make the procedures of the experiments more automatic and optimized. In addition, some symbols are used to denote specified conceptions. For clarity, we have presented the definitions of each parameter and symbol in Table
Definition of Symbols and Parameters.
Symbol | Definition
---|---
— | The whole dataset
— | Training set
— | Test set
— | The …
— | The …
— | The …
— | The …

Parameter | Definition
---|---
— | The total number of the training subsets
— | The ratio of …
— | The number of samples in the test set
— | The number of samples in the training set
— | The total number of samples in the whole dataset
— | The number of samples in each training subset (each training subset has the same number of samples)
— | The number of incident samples in the whole dataset
— | The number of non-incident samples in the whole dataset
The traffic data mentioned here refer to three basic traffic flow parameters, namely, volume, speed, and occupancy. Incidents are detected on a section basis, which means that the traffic data collected from the upstream and downstream detection stations are usually used as model inputs in AID systems. In Figure
Construction of traffic datasets.
One instance consists of at least the following items: the speed, volume, and occupancy of the upstream detector; the speed, volume, and occupancy of the downstream detector; and the traffic state (incident or nonincident),
where the item "traffic state" is a label. The value of the label is −1 or 1, referring to nonincident or incident, respectively, as determined by the incident dataset. Typically, the model is fit on part of the data (the training set), and the quality of the fit is judged by how well it predicts the other part of the data (the test set). The entire dataset, in which each row is composed of one observation, was divided into two parts: a training set that was used to build the model and a test set that was used to test the model's detection ability.
The experiments were performed according to the procedures as shown in Figure
A freeway incident detection model based on Naïve Bayes classifier ensemble.
Divide the whole dataset into a training set and a test set.
Perform sampling with replacement from the training set to generate the training subsets.
Use each training subset to train an individual NB classifier, and use all of the individual NB classifiers to construct the NB ensemble.
Test the performances of the individual NB classifiers, the NB ensemble under the five combination rules, and the NBTree classifier on the test set.
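Under stated assumptions (bootstrap subsets of the training set, Gaussian NB members, and sum-rule combination), the experimental procedure can be sketched as follows. Dataset loading is omitted, and the function names and the choice of scikit-learn's GaussianNB are our own.

```python
# Sketch of the procedure: bootstrap training subsets, one NB classifier
# per subset, sum-rule combination, evaluation on a held-out test set.
# X_train/y_train stand in for loop-detector feature vectors and -1/1 labels.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_nb_ensemble(X_train, y_train, n_subsets=20, subset_ratio=0.05, seed=0):
    rng = np.random.RandomState(seed)
    size = max(2, int(subset_ratio * len(X_train)))
    members = []
    for _ in range(n_subsets):
        while True:
            idx = rng.randint(0, len(X_train), size)  # sampling with replacement
            if len(np.unique(y_train[idx])) > 1:      # keep both classes in the subset
                break
        members.append(GaussianNB().fit(X_train[idx], y_train[idx]))
    return members

def predict_sum_rule(members, X_test):
    # Average the members' posterior estimates (sum rule) and pick the
    # class with the highest combined score.
    mean_proba = np.mean([m.predict_proba(X_test) for m in members], axis=0)
    return members[0].classes_[mean_proba.argmax(axis=1)]
```

The other four combination rules would replace only the averaging step; the training loop is unchanged.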
We then proceeded to real-world data. These data were collected by Petty et al. from the I-880 Freeway in the San Francisco Bay area, California, USA. This is the most recent and probably the most well-known freeway incident dataset collected, and these data have been used in many studies related to incident detection. Loop detector data, with and without incidents, were collected from a 9.2 mile (14.8 km) segment of the I-880 Freeway between the Marina and Wipple exits, in both directions. There were 18 loop detector stations in the northbound direction and 17 stations in the southbound direction. The data collected included traffic volume, occupancy, and speed, averaged across all lanes in 30 s intervals at each station. In summary, the training dataset has 45,518 training instances, of which 2,100 are incident instances (from 22 incident cases). The testing dataset has 45,138 instances in all, including 2,036 incident instances (from 23 incident cases). Thus, incident examples are very rare in this dataset: only approximately 4.6% and 4.5% of the examples in the training set and the testing set, respectively, are incident examples. Each instance has seven features: the measurements of speed, volume, and occupancy collected at both the upstream detector station and the downstream detector station, plus the class label, −1 for the nonincident state and 1 for the incident state.
To divide
Parameter Setting of Experiments.
Parameter Setting of Experiments for I-880 Dataset

Parameter | Training subsets | Sampling ratio | Test-set samples | Training-set samples | Samples per subset | Total samples | Incident samples | Non-incident samples
---|---|---|---|---|---|---|---|---
Value | 20 | 0.05 | 45138 | 45518 | 2275 | 90656 | 4136 | 86520

Parameter Setting of Experiments for AYE Dataset

Parameter | Training subsets | Sampling ratio | Test-set samples | Training-set samples | Samples per subset | Total samples | Incident samples | Non-incident samples
---|---|---|---|---|---|---|---|---
Value | 20 | 0.05 | 16000 | 13500 | 675 | 29500 | 6000 | 23500
In our experiments, we constructed 20 individual NB classifiers, 20 NB ensemble classifiers, and 20 NBTree classifiers. We tested the performances of the five rules of each classifier on the I-880 dataset. Then, we calculated the averages and variances of the performances of the 20 individual NB classifiers, 20 NB ensemble classifiers, and 20 NBTree classifiers. The summarized results are in Table
Experimental Results of NB, the Five Ensemble Rules, and NBTree as Applied to the I-880 Dataset (performances are presented in the form average ± variance).
Algorithm | DR | FAR | MTTD | CR | AUC | Kappa | MAE | RMSE | EC
---|---|---|---|---|---|---|---|---|---
Naïve Bayes classifier | — | 0.0398 ± 6.32 | — | 0.9540 ± 6.43 | 0.8915 ± 6.50 | 0.59479 ± 2.63 | 0.01619 ± 3.11 | 0.17995 ± 9.47 | 0.99184 ± 8.03
Product rule ensemble | 0.8404 ± 2.91 | 0.0380 ± 3.03 | 1.7615 ± 1.04744 | 0.9557 ± 2.22 | 0.9012 ± 7.78 | 0.61383 ± 2.28 | 0.01602 ± 5.33 | 0.17898 ± 1.60 | 0.99192 ± 1.38
Sum rule ensemble | — | 0.0297 ± 5.87 | 1.6906 ± 0.89473 | 0.9636 ± 3.66 | — | 0.6932 ± 1.91 | — | — | —
Max rule ensemble | 0.8960 ± 5.58 | 0.0297 ± 1.62 | 1.6236 ± 0.84537 | 0.9636 ± 1.63 | 0.9331 ± 1.33 | 0.69268 ± 6.81 | 0.01606 ± 8.09 | 0.17924 ± 2.52 | 0.99190 ± 2.09
Min rule ensemble | 0.8962 ± 2.29 | 0.0304 ± 2.99 | 1.6193 ± 0.85452 | 0.9629 ± 3.13 | 0.9329 ± 2.53 | 0.68845 ± 1.28 | 0.01603 ± 1.06 | 0.17902 ± 3.28 | 0.99192 ± 2.73
MV rule ensemble | 0.8193 ± 6.25 | 0.0410 ± 2.72 | 1.7802 ± 1.15158 | 0.9527 ± 2.02 | 0.8892 ± 1.29 | 0.58639 ± 5.16 | 0.01630 ± 5.09 | 0.18052 ± 1.52 | 0.99178 ± 1.32
NBTree classifier | 0.8143 ± 0.00112 | — | 1.3622 ± 0.00798 | — | 0.9027 ± 2.7953 | — | 0.02626 ± 6.3487 | 0.22894 ± 1.202 | 0.98669 ± 1.67
Experimental results of five rules for the Naïve Bayes ensembles as applied to the I-880 dataset: (a) performance with DR; (b) performance with FAR; (c) performance with MTTD; (d) performance with CR; (e) performance with AUC; (f) performance with Kappa.
Bar chart comparison of five rules for Naïve Bayes classifier ensembles as applied to the I-880 dataset. The red bar is MAE, the green bar is RMSE, and the blue bar is EC.
The traffic data used in this study for the development of the incident detection models were generated by a simulated traffic system. A 5.8 km section of the Ayer Rajah Expressway (AYE) in Singapore was selected to simulate incident and nonincident conditions. This site was selected for the incident detection study because its diverse geometric configurations can cover a variety of incident patterns [
The simulation system generated volume, occupancy, and speed data at upstream and downstream sites for both incident and nonincident traffic conditions. The traffic dataset consisted of 300 incident cases that had been simulated based on AYE traffic. The simulation of each incident case consisted of three parts. The first part was the nonincident period, which lasted for 5 min and followed a 5 min warm-up time; during the warm-up time, the data contain noise. The second part was the 10 min incident period. This was followed by a 30 min postincident period. Each input pattern included traffic volume, speed, and lane occupancy accumulated at 30 s intervals and averaged across all lanes, as well as the traffic state. The value of the traffic state label is −1 or 1, referring to the nonincident or incident state, respectively.
As we used a new dataset to perform the experiments, the parameter values of the experiments needed to be updated. The updated parameter values can be seen in Table
As mentioned in Section
Experimental Results of NB, the Five Ensemble Rules, and NBTree as Applied to the AYE Dataset (performances are presented in the form average ± variance).
Algorithm | DR | FAR | MTTD | CR | AUC | Kappa | MAE | RMSE | EC
---|---|---|---|---|---|---|---|---|---
Naïve Bayes classifier | 0.7112 ± 0.00348 | 0.0499 ± 8.16 | 1.394 ± 0.08869 | — | 0.8663 ± 4.39 | 0.6247 ± 9.14 | 0.16769 ± 3.07 | 0.57895 ± 5.31 | 0.90854 ± 1.83
Product rule ensemble | 0.6457 ± 0.00127 | 0.0557 ± 2.40 | 1.8685 ± 1.24139 | 0.8732 ± 1.64 | 0.8619 ± 4.85 | 0.7479 ± 3.58 | 0.16690 ± 1.04 | 0.57775 ± 3.11 | 0.90895 ± 3.68
Sum rule ensemble | 0.7923 ± 5.57 | 0.0500 ± 6.73 | 1.4384 ± 0.63868 | 0.8774 ± 3.55 | — | 0.7439 ± 1.29 | 0.16739 ± 4.94 | 0.57861 ± 1.47 | 0.90866 ± 1.75
Max rule ensemble | 0.7869 ± 1.79 | 0.0500 ± 2.21 | 1.4571 ± 0.72272 | 0.8771 ± 1.25 | 0.8663 ± 2.02 | — | 0.16803 ± 4.96 | 0.57969 ± 1.46 | 0.90828 ± 1.77
Min rule ensemble | — | 0.0498 ± 2.76 | 1.4646 ± 0.73539 | 0.8781 ± 1.10 | — | 0.6052 ± 1.86 | — | 0.57684 ± 9.83 | 0.90926 ± 1.15
MV rule ensemble | 0.6246 ± 1.15 | 0.0572 ± 2.29 | 1.8358 ± 1.19124 | 0.8720 ± 3.84 | 0.7837 ± 1.18 | 0.7341 ± 5.31 | 0.16687 ± 2.27 | 0.57769 ± 6.87 | —
NBTree classifier | 0.7275 ± 3.25 | — | — | 0.9031 ± 3.52 | 0.8512 ± 1.04 | 0.7085 ± 3.26 | 0.1711 ± 6.42 | — | 0.90553 ± 2.06
Experimental results of five rules for Naïve Bayes ensemble as applied to the AYE dataset: (a) performance with DR; (b) performance with FAR; (c) performance with MTTD; (d) performance with CR; (e) performance with AUC; (f) performance with Kappa.
Bar chart comparison of five rules for Naïve Bayes classifier ensembles as applied to the AYE dataset. The red bar is MAE, the green bar is RMSE, and the blue bar is EC.
In this subsection, we have evaluated the performances of all three algorithms, standard NB, NB ensemble, and NBTree, using the I-880 dataset and the AYE dataset with noisy data. In Figures
From Figures
In Table
From Figures
In Tables
Therefore, we should avoid drawing noisy data into the NB ensemble. The standard Naïve Bayes and NBTree are both individual classifiers, and each needs to be trained only once. In contrast, the NB ensemble requires training many individual NB classifiers to construct the ensemble, so its training time is relatively long. From Figures
The Naïve Bayes classifier ensemble is a type of ensemble classifier based on Naïve Bayes for AID. In contrast to Naïve Bayes, the NB classifier ensemble algorithm trains many individual NB classifiers to construct the classifier ensemble and then uses this ensemble to detect traffic incidents, thereby avoiding the burden of choosing the optimal threshold and tuning the parameters. In our research, we treat the traffic incident detection problem as a binary classification problem based on the ILD data and use the NB ensemble to divide the traffic patterns into two groups: incident traffic patterns and nonincident traffic patterns. In this paper, we have performed two groups of experiments to evaluate the performance of three algorithms: standard Naïve Bayes, the NB ensemble, and NBTree. In the first group of experiments, we applied all three algorithms to the I-880 dataset without noisy data. The results indicate that the five rules of the NB ensemble perform significantly better than standard Naïve Bayes and slightly better than NBTree in terms of some indicators. More importantly, the NB ensemble's performance is very stable. To further test the stability of the three algorithms, in the second group of experiments we applied them to the AYE dataset with noisy data. The experimental results indicate that the NB ensemble has the best ability among the three algorithms to tolerate noisy data. After analyzing the experimental results, we found that if the average accuracy of the individual NB classifiers is low, the average accuracy of the ensemble classifiers constructed from these individual classifiers can become even lower than that of the individual NB classifiers. To obtain good results with the NB ensemble classifier, we should avoid drawing noisy data into the ensemble.
NBTree is an individual classifier that needs to be trained only once, whereas the NB ensemble needs to train many individual NB classifiers to construct the ensemble. As a result, compared with the NB ensemble, the NBTree algorithm reduces the time cost. The contribution of this paper is the development of a freeway incident detection model based on the Naïve Bayes classifier ensemble algorithm. The NB ensemble not only improves the performance of traffic incident detection but also enhances the stability of that performance as the number of classifiers increases. The advantage of NBTree is that its MTTD value is better than that of the NB ensemble algorithm. We believe that the NB ensemble algorithm and NBTree can be successfully utilized in traffic incident detection and other classification problems. In a future study, we will concentrate on constructing an NB ensemble and scaling up the accuracy of NBTree to detect traffic incidents.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is part of an ongoing research project supported by the National High Technology Research and Development Program of China under Grant no. 2012AA112304 and the Research and Innovation Project for College Graduates of Jiangsu Province no. CXZZ13_0119. Qingchao Liu gratefully acknowledges all the members of the Traffic Engineering Lab at Southeast University, China, for their help and many useful suggestions in this study.