Learning to Detect Traffic Incidents from Data Based on Tree Augmented Naive Bayesian Classifiers

This study develops a tree augmented naive Bayesian (TAN) classifier based incident detection algorithm. Compared with the Bayesian network based detection algorithms developed in previous studies, this algorithm is less dependent on experts' knowledge. The structure of the TAN classifier for incident detection is learned from data, and the discretization of continuous attributes is performed automatically using an entropy-based method. A simulation dataset for a section of the Ayer Rajah Expressway (AYE) in Singapore is used to demonstrate the development of the proposed algorithm, including wavelet denoising, normalization, entropy-based discretization, and structure learning. The performance of the TAN based algorithm is compared with that of the previously developed Bayesian network (BN) based and multilayer feed forward (MLF) neural network based algorithms using the same AYE data. The experimental results show that the TAN based algorithm performs better than the BN classifiers and comparably to the MLF based algorithm. However, the TAN based algorithm should have a broader scope of application because the theory of TAN classifiers is much less complicated than that of MLF networks. It was also found in the experiment that the TAN classifier based algorithm is significantly faster in model training and calibration than MLF.


Introduction
Traffic congestion is a challenging problem in most big cities all over the world. Congestion leads to increasing traffic delays, higher fuel consumption, and negative environmental effects. The cost of the total delay caused by traffic congestion in rural and urban areas is estimated to be around $1 trillion per year in the United States [1]. Traffic congestion can be classified into two categories: recurrent congestion generated by excess demand and nonrecurrent congestion caused by incidents. Some studies have estimated that around 60% of all traffic congestion on highways is caused by incidents [2]. Due to the large losses caused by nonrecurrent congestion, the incident management system (IMS) has become a key component of advanced traffic management systems.
As a core technique in IMS, automatic incident detection (AID) algorithms have become an interesting and active research topic. A considerable amount of research has addressed this problem, and several techniques have been developed over the last few decades. Depending on the methodology, an algorithm is usually classified into one of five major categories: comparative algorithms, statistical algorithms, time-series and filtering based algorithms, traffic theory based algorithms, and advanced algorithms.
The California algorithms are classic examples of comparative algorithms and were among the first widely implemented incident detection algorithms [3]. They are usually used as benchmarks for evaluating the performance of other algorithms. Examples of statistical algorithms include the standard normal deviate (SND) algorithm [4] and the Bayesian algorithms [5], which use standard statistical techniques to identify sudden changes in the traffic flow parameters. Time-series and filtering algorithms treat the traffic flow parameters as time series, and the deviation from the modeled time-series behavior is used to indicate incidents. The moving average (MA) algorithm [6] and the exponential smoothing based algorithms [7] belong to this category. A classic example of traffic theory based algorithms is the McMaster algorithm, which is based on catastrophe theory: it determines the state of traffic from its position in the flow-density plot and detects incidents from the transition of the point from one state to another [8]. In order to improve detection performance, the latest trend has been the development of advanced algorithms based on advanced mathematical formulations. Neural network based algorithms [9, 10], which perform attractively in the laboratory environment, fall into this category.
A review of the existing incident detection algorithms shows that most of them, despite performing well in the laboratory environment, have not been used in practice. The main reason is that it is usually difficult to transfer these algorithms from site to site while maintaining good performance. To meet the universality requirements called for by advanced traffic management systems, Bayesian networks have been used to develop universal algorithms [11, 12]. According to the testing results, these algorithms demonstrate very stable performance and strong transferability. However, in the previous studies, the Bayesian network based algorithms strongly depend on the knowledge of experts. For instance, the determination of the Bayesian network structure and the discretization of continuous attributes are both manually predetermined in these studies.
In this study, in order to reduce the dependency on experts' knowledge, a tree augmented naive Bayesian (TAN) classifier, a special form of Bayesian network, is chosen to develop an incident detection algorithm. The TAN classifier reduces this dependency because its structure and parameters are both learned from the data, and an entropy-based method that depends entirely on the data is proposed for the discretization of continuous variables. The performance of the algorithm is evaluated using a simulation dataset.
The remaining part of the article is structured as follows: Section 2 gives a brief introduction to the TAN classifier. Section 3 presents the data preprocessing procedure. Section 4 discusses the development and implementation of the TAN classifier for incident detection. In Section 5, an experiment with a simulation dataset is carried out. Finally, Section 6 gives some conclusions and remarks.

TAN Classifier
2.1. Bayesian Networks. Since the TAN classifier is a special case of Bayesian networks, it is necessary to introduce Bayesian networks first. Bayesian networks are directed acyclic graphs that allow efficient and effective representation of the joint probability distribution over a set of random variables [20]. Formally, a Bayesian network for a set of random variables U = {X_1, ..., X_n} is a pair B = (G, Θ). The first component, G, is a directed acyclic graph whose vertices correspond to the random variables X_1, ..., X_n and whose edges represent direct dependencies between the variables. The graph G, which is also called the structure of the Bayesian network, encodes an independence assumption: each variable X_i is independent of its nondescendants given its parents in G. Because of this independence assumption, in a Bayesian network for incident detection, the class variable INC must have links to all the attribute variables, so that every piece of information about the attribute variables can be used to update the incident probability. Therefore, the structure of this Bayesian network cannot be learned using unrestricted learning algorithms; in previous studies, it was manually determined according to experts' knowledge. Figure 1(a) presents a possible structure of a Bayesian network for incident detection.

2.2. Naive Bayesian (NB) Classifier. As analyzed above, in a Bayesian network for incident detection, the class variable INC must have links to all attribute variables. Therefore, in order to determine the structure of this Bayesian network, the only remaining problem is how to add the edges between the attribute variables. The simplest solution is to assume that every attribute A_1, ..., A_n is independent of the rest of the attributes given the value of the class variable. Thus, there are no edges between any pair of attribute variables. This form of Bayesian network is often referred to as the naive Bayesian (NB) classifier. The structure of the NB classifier for incident detection is shown in Figure 1(b).

2.3. TAN Classifier. Although the performance of NB classifiers is surprisingly good in some practical cases, the main assumption behind them is clearly unrealistic. Accordingly, a class of network structures based on the NB classifier, called tree augmented naive Bayesian (TAN) classifiers, is selected to develop the incident detection algorithm in this paper. In a TAN classifier, as shown in Figure 1(c), the class variable INC has no parent, and each attribute has the class variable and at most one other attribute as parents. This ensures that, in the learned network, the probability P(INC | A_1, ..., A_n) takes all attributes into account, while the strong independence assumption of the NB classifier is relaxed. The edges between attribute variables can be learned from the training dataset.
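The TAN factorization described above can be made concrete with a small numerical sketch. The network below is hypothetical: two binary attributes A1 (parent: INC) and A2 (parents: INC and A1) with made-up conditional probability tables; it only illustrates how the tree structure combines the evidence from all attributes.

```python
# Toy TAN over {INC, A1, A2}; all probabilities are illustrative, not from the paper.
p_inc = {0: 0.9, 1: 0.1}                 # P(INC)
p_a1 = {(0, 0): 0.7, (0, 1): 0.3,        # P(A1 | INC), key = (inc, a1)
        (1, 0): 0.2, (1, 1): 0.8}
p_a2 = {(0, 0, 0): 0.6, (0, 0, 1): 0.4,  # P(A2 | INC, A1), key = (inc, a1, a2)
        (0, 1, 0): 0.5, (0, 1, 1): 0.5,
        (1, 0, 0): 0.3, (1, 0, 1): 0.7,
        (1, 1, 0): 0.1, (1, 1, 1): 0.9}

def posterior_inc(a1, a2):
    """P(INC = 1 | A1 = a1, A2 = a2) via the TAN factorization."""
    joint = {inc: p_inc[inc] * p_a1[(inc, a1)] * p_a2[(inc, a1, a2)]
             for inc in (0, 1)}
    return joint[1] / (joint[0] + joint[1])
```

Because the class variable is a parent of every attribute, the posterior is obtained by summing the factored joint over the two values of INC only, which is what makes inference in TAN classifiers cheap.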

Data Preprocessing
The raw traffic measurements collected by the detectors cannot be used to develop and implement detection algorithms without a data preprocessing procedure. This procedure contains three parts in this study: wavelet denoising, normalization, and discretization.

3.1. Wavelet Denoising.
The purpose of wavelet denoising is to eliminate random fluctuation from the raw traffic measurements. The random fluctuation is caused by the randomness of drivers' behaviors and the measurement errors of detectors. In most of the previous studies, the raw traffic measurements are smoothed by the moving average method. Although this method is very simple in theory and easy to implement, it increases the mean time to detection (MTTD) because the sudden changes in the traffic flow parameters caused by incidents are also smoothed away. Accordingly, wavelet denoising is chosen instead of smoothing in this study.
The general wavelet denoising procedure is as follows.
Step 1. Select an appropriate wavelet function and apply the wavelet transform to the noisy signal, producing noisy wavelet coefficients down to a level that meets the requirements.
Step 2. Select an appropriate threshold limit at each level and a thresholding method (hard or soft) to best remove the noise.
Step 3. Apply the inverse wavelet transform to the thresholded wavelet coefficients to obtain the denoised signal.
In this procedure, there are three things that should be decided first for the wavelet denoising process: wavelet function, level of decomposition, and threshold method.
When applied in the detection algorithm, the raw traffic measurements obtained from the previous n − 1 time intervals up to the present time interval t, a sequence of n elements (n = 16 in this study), are wavelet denoised. At the next time interval t + 1, the raw measurement obtained at time interval t − (n − 1) is removed from the sequence and the measurement obtained at interval t + 1 is added, forming a new sequence that still contains n elements, which is then wavelet denoised again.
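As a rough illustration of Steps 1-3 applied to one 16-sample window, the sketch below implements multilevel denoising with the simple Haar wavelet and a sqtwolog-style universal threshold in plain Python. This is a simplification for illustration only: the study itself uses the sym8 wavelet via the MATLAB wavelet toolbox, and a practical implementation would rely on such a library. The window length is assumed divisible by 2^level.

```python
import math

def haar_dwt(x):
    """One level of the Haar transform: (approximation, detail) coefficients."""
    approx = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
    return out

def soft_threshold(coeffs, thr):
    """Soft thresholding: shrink toward zero by thr, zeroing small coefficients."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise_window(window, level=3):
    """Multilevel Haar denoising of one sliding window (n = 16 in the paper)."""
    details, approx = [], list(window)
    for _ in range(level):                      # Step 1: decompose
        approx, d = haar_dwt(approx)
        details.append(d)
    # sqtwolog-style universal threshold sqrt(2 ln n) * sigma,
    # sigma estimated from the median of the finest-level details.
    sigma = sorted(abs(c) for c in details[0])[len(details[0]) // 2] / 0.6745
    thr = sigma * math.sqrt(2 * math.log(len(window)))
    details = [soft_threshold(d, thr) for d in details]  # Step 2: threshold
    for d in reversed(details):                 # Step 3: reconstruct
        approx = haar_idwt(approx, d)
    return approx
```

In the sliding-window scheme described above, `denoise_window` would be called once per time interval on the most recent 16 measurements.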

3.2. Normalization.
After the wavelet denoising process, the denoised traffic measurements should be normalized, that is, converted to the value range [0, 1]. The normalization process mainly enhances the transferability of the algorithm. The same traffic measurements collected at different sites are usually in different value ranges. For instance, the average traffic speeds on a freeway may lie in the range [0, 100] km/h, while the speeds on an arterial road cannot exceed 60 km/h because of the speed limit. Therefore, if the denoised traffic measurements were used to develop the detection algorithm directly, the algorithm could not be applied to different roads.
Additionally, the different traffic measurements collected by detectors at the same site are also in different ranges. For instance, the occupancies are in the range [0, 1], while the maximum speed collected by detectors may exceed 100 km/h. Their different orders of magnitude would also be inconvenient for calculation.
The normalization process is conducted by

x_norm = (x - x_min) / (x_max - x_min),  (2)

where x_norm is the normalized traffic measurement and x is the denoised traffic measurement. x_min and x_max are the minimum and maximum values of the denoised traffic measurements, respectively, or are manually decided according to the frequency histograms of x in the training dataset.
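Assuming the usual min-max form x_norm = (x - x_min)/(x_max - x_min), a minimal sketch of this normalization step follows; clipping out-of-range values to [0, 1] is our assumption for measurements that fall outside a manually chosen range.

```python
def normalize(x, x_min, x_max):
    """Min-max normalization to [0, 1], clipping values outside the chosen range."""
    x_norm = (x - x_min) / (x_max - x_min)
    return min(max(x_norm, 0.0), 1.0)
```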

3.3. Discretization.
All of the attribute variables in the TAN classifier should be discrete, whereas the value ranges of the traffic parameters are all continuous. Therefore, the normalized traffic parameters are compared against predefined split points to determine their states (e.g., volume is high, medium, or low); this process is called discretization. Instead of depending on the experience of experts, an entropy-based method is adopted in this paper. Entropy is one of the most commonly used discretization measures. Entropy-based discretization is a supervised, top-down splitting technique. It uses class distribution information in the calculation and determination of split points (data values for partitioning an attribute range). To discretize a normalized traffic measure A, the method selects the value that minimizes the entropy of the training dataset as a split point and recursively partitions the resulting intervals to arrive at a hierarchical discretization. Let D consist of data tuples defined by a set of attributes and a class-label attribute. The class-label attribute provides the class information per tuple. The basic method for entropy-based discretization of an attribute A within the set is as follows [21].
Step 1. Since the traffic measures are normalized, the value range of A is [0, 1]. From 0 to 1, values are taken at intervals of 0.01 as potential split points (denoted split_point) to partition the range of A. That is, a split point for A can partition the tuples in D into two subsets satisfying the conditions A ≤ split_point and A > split_point, respectively, thereby creating a binary discretization. The set of all potential split points is denoted by S.
Step 2. Entropy-based discretization, as mentioned above, uses information about the class label of the tuples. In this case, the labels are the states of traffic, so the tuples can be divided into two classes: the incident state class (denoted by C_1) and the incident-free class (denoted by C_2). Suppose we want to classify the tuples in D by partitioning on attribute A at some split point. Ideally, this partitioning would result in an exact classification of the tuples; that is, all of the tuples of class C_1 would fall into one partition and all of the tuples of class C_2 into the other. However, this is unlikely. For example, the first partition may contain many tuples of C_1 but also some of C_2. How much more information would we still need for a perfect classification after this partitioning? This amount is called the expected information requirement for classifying a tuple in D based on partitioning by A. It is given by

Info_A(D) = (|D_1| / |D|) Entropy(D_1) + (|D_2| / |D|) Entropy(D_2),

where D_1 and D_2 correspond to the tuples in D satisfying the conditions A ≤ split_point and A > split_point, respectively.
Here |D| is the number of tuples in D, and similarly for |D_1| and |D_2|. The entropy function for a given set is calculated based on the class distribution of the tuples in the set. For incident detection, given only the two classes C_1 and C_2, the entropy of D_1 is

Entropy(D_1) = -(p_1 log2 p_1 + p_2 log2 p_2),

where p_i is the probability of class C_i in D_1, determined by dividing the number of tuples of class C_i in D_1 by |D_1|, the total number of tuples in D_1. The entropy of D_2 is computed in the same way. Therefore, when selecting a split point for attribute A, we find the split point in the set S that minimizes Info_A(D). Using this split point, the range of A is partitioned into two intervals, corresponding to A ≤ split_point and A > split_point.
Step 3. The process of determining a split point is recursively applied to each partition obtained (starting with the partition with the larger entropy), until the number of intervals reaches a predefined threshold, max_interval.
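The split-point search in Steps 1-2 can be sketched in a few lines of Python. This is an illustrative version (the helper names `entropy` and `best_split` are ours): it scans the candidate split points 0.00, 0.01, ..., 1.00 and returns the one minimizing the expected information requirement Info_A(D).

```python
import math

def entropy(labels):
    """Entropy of a list of class labels (0 = incident-free, 1 = incident)."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result

def best_split(values, labels):
    """Step 1-2: pick the split in {0.00, 0.01, ..., 1.00} minimizing Info_A(D)."""
    n = len(values)
    best_info, best_sp = float("inf"), None
    for k in range(101):
        sp = k / 100
        left = [l for v, l in zip(values, labels) if v <= sp]
        right = [l for v, l in zip(values, labels) if v > sp]
        info = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if info < best_info:
            best_info, best_sp = info, sp
    return best_sp
```

Step 3 would then call `best_split` recursively on each resulting partition until the desired number of intervals (three states in this study) is reached.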
After the wavelet denoising, normalization, and discretization processes, the raw traffic measurements are converted to traffic cases that can be input to the TAN classifier.

TAN Classifier Based Detection Algorithm
4.1. Learning and Inference. According to the introduction in Section 2, the structure of the TAN classifier for incident detection is not fixed in advance; the structure with optimal parameter setting and maximal likelihood should be learned from data. Let D be a dataset over the variables {A_1, ..., A_n, INC}. Using a slight modification of the Chow-Liu algorithm [22], a TAN of maximal likelihood can be constructed as follows.
Step 1. Compute the conditional mutual information I(A_i; A_j | INC) between each pair of attributes A_i and A_j, i ≠ j.
Step 2. Build a complete undirected graph on the attribute nodes A_1, ..., A_n, and weight the edge between A_i and A_j by I(A_i; A_j | INC).
Step 3. Build a maximal-weight spanning tree for the complete MI-weighted graph.
Step 4. Direct the resulting tree by choosing any variable as a root and setting the directions of the links to be outward from it.
Step 5. Add the node INC and a directed link from INC to each attribute node.
Step 6. Learn the parameters.
In Step 6, the parameter learning algorithm used in this paper is the maximum likelihood estimation (MLE) algorithm, which chooses the parameter estimate θ̂ that maximizes the likelihood

θ̂ = arg max_θ ∏_{i=1}^{N} P(d_i | θ),

where D = {d_1, d_2, ..., d_N} is the training dataset and d_i is an instance of the variables contained in the TAN classifier.
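Under the usual Chow-Liu/TAN formulation, the structure and parameter learning steps can be sketched as below. This is an illustrative Python version, not the BNT implementation used in the study; the conditional mutual information I(A_i; A_j | INC) and the conditional probability tables are estimated by simple frequency counting over discrete data tuples whose last element is the class label, and the function names are ours.

```python
import math
from collections import Counter

def cond_mutual_info(data, i, j):
    """Estimate I(A_i; A_j | INC) from tuples whose last element is the class."""
    n = len(data)
    n_xyz = Counter((r[i], r[j], r[-1]) for r in data)
    n_xz = Counter((r[i], r[-1]) for r in data)
    n_yz = Counter((r[j], r[-1]) for r in data)
    n_z = Counter(r[-1] for r in data)
    return sum((c / n) * math.log2(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
               for (x, y, z), c in n_xyz.items())

def learn_tan_tree(data, n_attrs):
    """Steps 1-4: maximal-weight spanning tree over the attributes (Prim's
    algorithm), directed outward from attribute 0 as the root."""
    w = {(i, j): cond_mutual_info(data, i, j)
         for i in range(n_attrs) for j in range(i + 1, n_attrs)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_attrs:
        _, u, v = max((w[min(u, v), max(u, v)], u, v)
                      for u in in_tree for v in range(n_attrs) if v not in in_tree)
        edges.append((u, v))  # directed away from the root
        in_tree.add(v)
    return edges

def mle_cpt(data, child, parents):
    """Step 6: maximum-likelihood estimate of P(child | parents) by counting."""
    joint = Counter(tuple(r[p] for p in parents) + (r[child],) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    return {k: c / marg[k[:-1]] for k, c in joint.items()}
```

Step 5 then simply adds a directed link from INC to every attribute node, which requires no further computation.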
The purpose of inference is to compute the posterior probability P(INC = 1 | A_1, ..., A_n). The junction tree algorithm is used for inference in this paper. Once the TAN classifier is transformed into a junction tree, inference involves the following steps [23].
Step 1. Each item of evidence is incorporated into the junction tree potentials: for each item of evidence, an evidence function is multiplied onto an appropriate clique potential.
Step 2. Some clique is selected. This clique is referred to as the root of the propagation.
Step 3. Messages are passed toward the selected root through the separators of the junction tree (i.e., along the links of the tree). These messages cause the potentials of the receiving cliques and separators to be updated. This phase is known as CollectInformation.
Step 4. Messages are then passed in the opposite direction (i.e., from the root toward the leaves of the junction tree). This phase is known as DistributeInformation.
Step 5. When the calls are completed, the table containing the updated probability distribution of each node is normalized so that it sums to one.
This paper does not describe further details of the learning and inference algorithms. Kevin Murphy's Bayes Net Toolbox (BNT) for MATLAB is used in this research to develop the application programs for incident detection.

4.2. Incident Report.
When the TAN classifier is implemented in an IMS, A_1, ..., A_6 are input in real time. After inference, the posterior probability P(INC = 1 | A_1, ..., A_n) is obtained. If this posterior probability were compared with a predefined threshold directly, a high false alarm rate (FAR) would result. Therefore, in this study, a smoothing method is used to reduce the FAR.
The posterior probability updated at time interval t is denoted by P(t); the final estimate of the incident probability for incident report, Î(t), can be calculated by

Î(t) = α P(t) + (1 - α) Î(t - 1),  (7)

where α ∈ [0, 1] is a coefficient that adjusts the performance of the detection algorithm. A smaller α causes a lower FAR but a higher MTTD, and vice versa. After each time interval, Î(t) is updated and compared with a predefined threshold. If Î(t) exceeds the decision threshold, an incident alarm is raised. A larger threshold causes a lower FAR but also a lower detection rate (DR), and vice versa.
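Assuming the standard exponential-smoothing form Î(t) = αP(t) + (1 − α)Î(t − 1) for the update in (7), and an initial estimate Î(0) = 0 (our assumption), the incident-report logic can be sketched as:

```python
def update_incident_estimate(p_hat_prev, p_t, alpha=0.7):
    """Exponential smoothing of the posterior incident probability, Eq. (7) style."""
    return alpha * p_t + (1 - alpha) * p_hat_prev

def detect(posteriors, alpha=0.7, threshold=0.5):
    """Return per-interval alarm flags for a sequence of TAN posteriors P(t)."""
    p_hat, alarms = 0.0, []  # assumed initial estimate I-hat(0) = 0
    for p in posteriors:
        p_hat = update_incident_estimate(p_hat, p, alpha)
        alarms.append(p_hat > threshold)
    return alarms
```

With this form, a smaller α gives the smoothed estimate more inertia, which is consistent with the lower FAR and higher MTTD described above.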

Data Description and Variable Selection.
The dataset used in this study is the AYE dataset. It was produced by a traffic simulation system and has been used in several studies related to incident detection [10, 24]. A 5.8 km section of the Ayer Rajah Expressway (AYE) in Singapore was selected to simulate incident and incident-free conditions. The simulation system generated volume, occupancy, and speed data upstream and downstream. The traffic dataset consisted of 300 incident cases simulated based on AYE traffic. The simulation of each incident case consisted of three parts. The first part was the incident-free period, which lasted for 5 min and followed a 5 min warmup time. The second part was the 10 min incident period. This was followed by a 30 min postincident period. The 300 incidents were split into two partitions, a training dataset and a testing dataset. Each dataset had 3000 input patterns for the incident state and 10500 patterns corresponding to the incident-free state. Each input pattern included traffic volume, speed, and lane occupancy accumulated at 30 s intervals and averaged across all lanes, as well as the label of the incident state. In this research, the AYE dataset is used to illustrate the procedure of data preprocessing, model development, and performance evaluation of the TAN based detection algorithm.
To develop the TAN classifier for incident detection, the variables contained in the model should be decided first. According to the contents of the datasets and previous research, eight variables are chosen as the nodes of the TAN classifier; the details are listed in the Abbreviations section.

Wavelet Denoising.
According to the description in Section 3.1, three things should be predetermined for the wavelet denoising process: the wavelet function, the level of decomposition, and the threshold method. In this study, using the wavelet toolbox of MATLAB, the sym8 wavelet function and the sqtwolog soft-thresholding rule are chosen manually to denoise the raw traffic measurements. The remaining question is the level of decomposition. The raw traffic measurements are wavelet denoised with the decomposition level assigned from 1 to 5, and partial results are shown in Figure 2.
It can be seen that a higher level of decomposition gives better denoising performance but also increases the computation time and the level of distortion. According to the experimental results, the optimal decomposition level is 3 in this study. Both the training and testing datasets are wavelet denoised according to the procedure described in Section 3.1.

Normalization.
The next step of data preprocessing is normalization. According to the frequency histograms of the denoised traffic measures in the training dataset, x_min and x_max in (2) are decided manually and are shown in Table 1. Then both the training and testing datasets are normalized using (2).

Discretization.
Since increasing the number of variable states leads to exponential growth in the parameter size of a TAN classifier, all of the traffic flow parameters are divided into three states. The split points, decided using the training dataset according to the entropy-based discretization procedure described in Section 3.3, are shown in Table 2. Then both the training and testing datasets are discretized by comparing the traffic parameters with the split points.

Structure Learning.
Using BNT, the structure of the TAN classifier for incident detection is learned from the preprocessed training dataset and is shown in Figure 3. As an example, the parameters of one of the attribute nodes are shown in Table 3.
The mean time to detection is computed as

MTTD = (1/n) ∑_{i=1}^{n} t_i,

where t_i is the length of time between the start of the incident and the time the alarm is initiated and n is the number of incident cases detected successfully. If a single incident instance is identified within the period of actual occurrence of one incident case, which often consists of many continuous incident instances, the incident case is regarded as detected.
There are several different formulas for FAR; the following is the one used in this study:

FAR = number of false alarm cases / total number of input instances.
The number of false alarm cases is computed by counting each instance misclassified as an incident instance as one false alarm case.
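These measures can be computed from per-interval alarm flags; the helper below is an illustrative sketch whose conventions follow the descriptions above (an incident case counts as detected if any instance within its actual occurrence period is flagged, and each misclassified incident-free instance is one false alarm case). The function name and signature are ours.

```python
def performance_measures(truth_cases, predictions, interval_s=30):
    """DR, FAR, and MTTD from per-interval alarm flags.

    truth_cases: list of (start_idx, end_idx) index ranges of incident cases;
    predictions: per-interval 0/1 alarm flags (30 s intervals in the AYE data).
    """
    detected, times = 0, []
    for start, end in truth_cases:
        hits = [t for t in range(start, end) if predictions[t]]
        if hits:  # case detected: time to detection from the first flagged instance
            detected += 1
            times.append((hits[0] - start) * interval_s)
    incident_idx = {t for s, e in truth_cases for t in range(s, e)}
    false_alarms = sum(1 for t, p in enumerate(predictions)
                       if p and t not in incident_idx)
    dr = detected / len(truth_cases)
    far = false_alarms / len(predictions)
    mttd = sum(times) / len(times) if times else float("inf")
    return dr, far, mttd
```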
Both the DR and FAR measure the effectiveness of an algorithm, and the MTTD reflects its efficiency. The final estimate of the incident probability for incident report, Î(t), is calculated by (7) using the output of the TAN classifier. The value of α is set to 0.7 in this study. Figure 4 shows a typical output of the incident detection algorithm.

Experimental Results with AYE
The performance measures are calculated from the outputs of the algorithm. The TAN based algorithms are tested with different values of the incident report threshold. The testing results are shown in Table 4, compared with the neural network based algorithm and the Bayesian network (BN) based method from a previous study [11], which were trained and tested using the same dataset.
The neural network models mainly focused on the application of multilayer feed forward (MLF) neural networks to incident detection. Since the MLF performed excellently in previous research, it is used as a benchmark for comparison in this paper. The MLF based algorithms used here were developed and evaluated by Wang et al. [24] using the same AYE data, and this paper refers to their results. In their research, 10 network classifiers were trained using the training dataset containing 50.00% incident instances and tested on the testing dataset. The networks fall into two different structures: one has three layers with six neurons in the input layer, three neurons in the hidden layer, and one neuron in the output layer; the other has the same numbers of input and output neurons but six neurons in the hidden layer. The output indicates the traffic state by comparison with a predefined threshold, here set to zero; that is, an output larger than 0 indicates the occurrence of an incident for this instance, and otherwise no incident. The training parameters were set as follows: the learning rate is 0.1.
According to the evaluation results, the TAN based algorithms perform better than the previously proposed BN based method. A possible reason is that the structure of TAN is learned from the data, while that of BN is based on experts' experience. The performance of the TAN classifier based algorithm is also comparable to that of the MLF based algorithm. Although the TAN algorithm has a lower DR, it is superior to the MLF based algorithm on FAR. The performance of the two algorithms on MTTD is comparable. However, it should be noted that MLF is much more complicated than the TAN classifier in theory. MLF is a black box model (it is difficult to explain the knowledge hidden in it), while the TAN classifier is based on the theory of probability and statistics, which is widely recognized. Additionally, it is much more difficult to train MLF networks than a TAN classifier; the training process of MLF networks needs more data and time. In this study, the training time of the TAN classifier is less than 2 seconds, while the average training time of the MLF networks for incident detection is 344.70 seconds. From the analysis above, compared with MLF, the TAN based incident detection algorithm should have a broader scope of application.

Conclusions
In this paper, a special form of Bayesian networks, the TAN classifier, is used to develop an incident detection algorithm. Bayesian network based detection algorithms developed and evaluated in previous studies have shown advantages in performance and transferability. However, the development of these previous algorithms strongly depends on the knowledge of experts. To reduce this dependency, the structure of the TAN classifier for incident detection is learned from data, and the discretization of continuous attributes is performed automatically using an entropy-based method. A simulation dataset for a section of the Ayer Rajah Expressway (AYE) in Singapore is used to demonstrate the development of the TAN classifier based detection algorithm, including wavelet denoising, normalization, discretization, and structure learning.
The performance of the TAN based algorithm is evaluated against the MLF based algorithm and the previously developed BN based algorithm using the same AYE data. The experimental results show that the TAN based algorithms perform better than the previously proposed BN based method. A possible reason is that the structure of TAN is learned from the data, while that of BN is based on experts' experience. The TAN classifier based algorithm performs comparably to the MLF based algorithm. However, the TAN based algorithm should have a broader scope of application because the theory of TAN classifiers is much less complicated than that of MLF networks. It was also found in the experiment that the TAN classifier based algorithm has a significant advantage in the speed of model training and calibration.
Because of the lack of appropriate datasets, the transferability of the TAN based algorithm is not evaluated in this paper. In the future, a more comprehensive and detailed evaluation of the TAN based algorithm should be carried out. In future work, we will also improve the methodology by treating the traffic flow parameters as continuous variables, implementing various prior distributions, or using dynamic TAN classifiers.

Figure 1: Typical structures of the Bayesian network, NB classifier, and TAN classifier for incident detection.

Figure 2: Wavelet denoising results at different levels of decomposition.

Figure 3: The structure of the TAN classifier for incident detection learned from the training dataset.

5.4. Evaluation and Analysis

5.4.1. Performance Measures. The common performance measures used to evaluate incident detection algorithms include the detection rate (DR), the false alarm rate (FAR), and the mean time to detection (MTTD). DR is written as

DR = number of incident cases detected / total number of incident cases.

5.4.2. Detection with Testing Data. The preprocessed instances in the testing dataset are input to the TAN classifier developed in Section 4, and the real-time incident probability is updated at each time interval.

Figure 4: A typical output of the TAN classifier based incident detection algorithm.
The second component of the pair, Θ, represents the set of parameters that quantifies the network. It contains a parameter θ_{x_i | Π_{x_i}} = P_B(x_i | Π_{x_i}) for each possible value x_i of X_i and each configuration Π_{x_i} of Π_{X_i}, where Π_{X_i} denotes the set of parents of X_i in G. A Bayesian network B defines a unique joint probability distribution over U given by P_B(X_1, ..., X_n) = ∏_{i=1}^{n} P_B(X_i | Π_{X_i}). INC = 1 means that an incident has happened, and A_1, ..., A_n denote traffic flow parameters. When the network is implemented in an IMS, A_1, ..., A_6 are input in real time, and the Bayesian network is used to update the posterior probability P(INC = 1 | A_1, ..., A_n).

Table 1: The values of x_min and x_max in (2) according to the training dataset.

Table 2: The split points of the continuous variables in the TAN classifier.

Table 3: The parameters of one of the attribute nodes in the learned TAN classifier.

Table 4: Comparison between TAN and MLF with AYE data.