Application of the Empirical Bayes Method with the Finite Mixture Model for Identifying Accident-Prone Spots



Introduction
For the purpose of prioritizing safety improvements on a roadway network, identifying sites with consistently elevated accident risk, often referred to as hotspots or black spots, is of critical importance. To address this need, a number of analytical methods for hotspot identification (HSID) have been developed over the last several decades, with the overarching objective of optimizing the allocation of limited funding. An inaccurate HSID method will result in inefficient allocation of safety treatment resources, with potentially serious costs in terms of the overall safety performance of the network. The need for accurate methods to identify and prioritize accident-prone locations is underscored by the 2012 U.S. federal Moving Ahead for Progress in the 21st Century Act (MAP-21), which emphasizes data-driven crash risk analysis and safety treatment prioritization. Further, the performance reporting requirements outlined in MAP-21 provide an additional layer of incentive for public agencies to maximize the impact of safety spending by selecting and treating sites with high improvement potential.
A number of past studies have focused on accident frequency (AF) or accident rate (AR) based HSID methods, which rely on observed accident counts as the primary measure of accident risk. Because sites are ranked and identified based on observed accident data only, there is no mechanism for identifying sites with elevated risk (due to some combination of geometric and traffic characteristics) but few accidents. Further, these methods cannot distinguish between actual high risk locations and those with a higher occurrence of accidents due to random fluctuations. The empirical Bayes (EB) HSID method addresses these issues by combining two clues: the historical crash record of the entity and the expected number of crashes obtained from a safety performance function (SPF) for similar entities. This approach is less sensitive to random fluctuations in accident frequency and in theory can identify truly high risk locations with greater accuracy. Building on the EB approach, additional methods have been developed based on estimated accident reduction potential (ARP). Such methods attempt to quantify the difference between the actual accident count at the location of interest (as estimated using the EB method) and the expected accident count for similar locations, under the supposition that this difference represents the potential for improvement.
Unfortunately, the definition of "similar" sites is a somewhat open question, and the accuracy of the EB method depends largely on the selection of the reference population. When estimating the SPF, the studied crash data are often collected from different geographic locations to ensure an adequate sample size for valid statistical estimation. Since crash data observed in different geographic locations may exhibit different characteristics, the aggregation of these crash data may result in heterogeneity. For the aggregated crash data, it is reasonable to assume that sites with different combinations of characteristics (i.e., geometric design features) constitute distinct subpopulations [1]. Under this assumption, applying the EB method to the entire crash dataset may become inappropriate, because the second clue of the EB method requires homogeneity of the crash data. Therefore, as proposed by some previous studies [2][3][4][5][6][7][8], it is reasonable to assume that the crashes on highway entities (i.e., road segments or intersections) are generated from a certain number of hidden subpopulations. Since entities are heterogeneous across but homogeneous within the subpopulations, the EB method can be applied to the crash data in each subpopulation.
The primary objectives of this research are (1) to propose three different classification-based EB methods to identify accident-prone sites and (2) to compare the proposed methods with commonly used HSID approaches (i.e., the AF, AR, EB, and ARP methods). To accomplish these objectives, the finite mixture of NB regression models with fixed or varying weight parameter is considered to classify the crash data into different subgroups. The effectiveness of the proposed HSID approaches is examined using crash data collected on 4-lane undivided rural highways in Texas, and performance is evaluated using the criteria proposed by Cheng and Washington [9].

Background
2.1. Hotspot Identification Methods. AF-based HSID methods have been in use for many years. Such approaches typically rank locations or segments along a highway by observed accident count over a specified time interval and define hotspots as those exceeding some critical value [10]. Road segments or intersections are ranked by accident count among similar locations (such as along a relatively homogeneous section of highway), to ensure that the identified hotspots represent specific opportunities for remediation rather than some inherent characteristic of a particular roadway class or driver population. One criticism often raised with regard to the AF method is that this approach lacks the ability to differentiate between actual hotspots and locations with increased accident frequency attributable to the randomness of traffic accidents [9,10].
It is readily apparent that, all else being equal, a segment with higher traffic volume can be expected to have a higher accident count, and so hotspots identified using AF methods tend to overrepresent high volume locations that may or may not be amenable to remediation efforts [11]. In response, AR methods have been developed which rely on accident count per unit traffic volume for HSID, typically in units of accidents per million vehicle miles traveled. Similar to the AF methodology, sites are ranked by accident rate and those exceeding a critical value are identified as hotspots. Implicit in this approach is the assumption that accident count and exposure are linearly related, which is often not the case. In addition, by normalizing accident count by entering traffic volume, locations with very low traffic volume are sometimes overrepresented [11,12].
The EB method for traffic accident HSID was introduced by Abbess et al. [13] to address issues with existing methodologies, most notably regression-to-the-mean (RTM) bias and low precision due to limited accident history. It has since been refined and widely used in a range of safety performance modeling applications [14][15][16]. In the EB crash modeling procedure, the expected number of crashes at a location is estimated by combining two pieces of information: (1) the accident count at the location of interest and (2) the expected accident count at locations determined to be similar based on traffic and roadway characteristics [17]. It is assumed that the actual accident count for the location of interest is available, and the expected accident count for similar locations is generally estimated from the SPF. The SPF describing accident counts as a function of traffic volume, lane width, and so forth is typically fitted using the negative binomial (NB) regression model. The observed accident count for a given roadway segment is combined with the expected value estimate as

\mathrm{EB}_i = w_i \hat{\mu}_i + (1 - w_i) y_i, \quad (1)

where \mathrm{EB}_i is the EB estimate of the expected number of crashes per year for site i; \hat{\mu}_i is the number of crashes per year predicted by the SPF for site i (estimated using an NB model); w_i = 1/(1 + \alpha \hat{\mu}_i) is the weight factor, estimated as a function of \hat{\mu}_i and the dispersion parameter \alpha; and y_i is the observed number of crashes per year at site i.
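To make the combination step concrete, the following is a minimal Python sketch of the EB shrinkage in (1); the function name and the example numbers are illustrative, not from the paper.

```python
def eb_estimate(mu_hat, y, alpha):
    """EB estimate: shrink the observed count y toward the SPF prediction mu_hat.

    mu_hat -- crashes/year predicted by the SPF for the site
    y      -- observed crashes/year at the site
    alpha  -- NB dispersion parameter from the fitted SPF
    """
    w = 1.0 / (1.0 + alpha * mu_hat)   # weight factor w_i
    return w * mu_hat + (1.0 - w) * y  # EB_i = w*mu_hat + (1 - w)*y

# A site with 9 observed crashes where the SPF predicts 4 crashes/year
# (alpha = 0.5) is pulled part of the way back toward the prediction:
print(eb_estimate(4.0, 9.0, 0.5))  # ~7.33, between 4 and 9
```

Note that a larger dispersion parameter shrinks the weight w_i, placing more trust in the site's own record and less in the SPF for similar sites.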
Another measure often used in HSID is ARP. It was originally suggested that ARP be estimated as the difference between the observed accident count at the site of interest and the expected count as estimated from a set of reference sites. More recently, it has been proposed that the observed accident count at the site of interest be replaced by the EB-estimated accident count. This approach can account for random fluctuations in accident frequency and so gives a better estimate of the true safety of the location of interest. Using the EB-estimated accident count, the ARP is calculated as

\mathrm{ARP}_i = \mathrm{EB}_i - \hat{\mu}_i, \quad (2)

where \mathrm{ARP}_i is the accident reduction potential for site i.
Persaud et al. [12] suggested that a better estimate of the true ARP can be derived using a full predictor set in the EB-estimated accident count and a subset of available regressors (i.e., those not describing a correctable site-specific geometric feature) in the expected accident count model. This way, the estimated ARP is a measure of the difference between the EB-estimated "true" safety and the expected safety of what could be considered a base scenario.
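A hedged sketch of the ARP computation under this refinement: `eb_full` stands for the EB estimate built on the full predictor set, and `mu_base` for the expected count from a base-scenario SPF that omits correctable geometric features. Both names and all numbers are hypothetical.

```python
def accident_reduction_potential(eb_full, mu_base):
    """ARP_i = EB_i - mu_i: excess crashes relative to the base scenario."""
    return eb_full - mu_base

# Sites are then ranked by ARP, largest (most improvable) first:
sites = {"A": accident_reduction_potential(7.3, 4.0),
         "B": accident_reduction_potential(5.1, 4.8)}
ranking = sorted(sites, key=sites.get, reverse=True)
print(ranking)  # ['A', 'B']
```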

Negative Binomial Model.
In highway safety, the dispersion parameter of NB models refines the estimate of the predicted mean when the EB method is used. To date, the NB distribution is the model most frequently used by transportation safety analysts to generate SPFs [18]. The NB model has the following structure: the number of crashes y_i during some time period is assumed to be Poisson distributed,

P(y_i \mid \mu_i) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!}, \quad (3)

where \mu_i is the mean response of observation i.
The NB distribution can be viewed as a mixture of Poisson distributions in which the Poisson rate is gamma distributed. For the complete derivation of the NB model, the reader is referred to Hilbe [19]. The probability density function (PDF) of the NB is defined as

P(y_i; \alpha, \mu_i) = \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!} \left(\frac{1}{1 + \alpha \mu_i}\right)^{1/\alpha} \left(\frac{\alpha \mu_i}{1 + \alpha \mu_i}\right)^{y_i}, \quad (4)

where \mu_i is the mean response of observation i and \alpha is the dispersion parameter.
Compared to the Poisson distribution, the NB distribution allows for overdispersion, with variance \mu_i + \alpha \mu_i^2 exceeding the mean.
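The overdispersion property can be checked numerically. The sketch below implements the NB PMF in the (\mu, \alpha) parameterization of (4), purely for illustration, and confirms that the implied variance is \mu + \alpha\mu^2, which exceeds the mean:

```python
import math

def nb_pmf(y, mu, alpha):
    """NB probability of count y, with mean mu and dispersion alpha."""
    phi = 1.0 / alpha  # inverse dispersion (gamma shape in the mixture view)
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (phi + mu))
             + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)

mu, alpha = 4.0, 0.5
probs = [nb_pmf(y, mu, alpha) for y in range(500)]  # tail beyond 500 is negligible
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(mean, 3), round(var, 3))  # ~4.0 and ~12.0 (= mu + alpha*mu**2)
```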

Finite Mixture of NB Regression Models.
This study adopts a K-component finite mixture of NB regression models (termed the FMNB-K model) to classify heterogeneous crash data. For the FMNB-K model, the marginal distribution of y_i is assumed to follow a mixture of NB distributions:

f(y_i \mid \Theta) = \sum_{k=1}^{K} w_k\, \mathrm{NB}(y_i; \mu_{i,k}, \alpha_k), \quad (5)

where w_k is the weight of component k (the weight parameter), with w_k > 0 and \sum_{k=1}^{K} w_k = 1; K is the number of components; \mu_{i,k} = \exp(\mathbf{x}_i' \boldsymbol{\beta}_k) is the mean rate of component k; \mathbf{x}_i is a vector of covariates; \boldsymbol{\beta}_k is the vector of regression coefficients for component k; \alpha_k is the dispersion parameter of component k; and \Theta = \{(\boldsymbol{\beta}_1, \alpha_1), \ldots, (\boldsymbol{\beta}_K, \alpha_K), \mathbf{w}\} collects the parameters of all components.
The weight w_k is assumed to be fixed across sites in the FMNB-K model. Alternatively, instead of estimating a fixed weight parameter, a generalized FMNB-K model (GFMNB-K model) can be derived by modeling a varying weight w_{i,k} as a function of covariates, as shown in (8). The GFMNB-K model allows each entity to have a different weight that depends on the site's attributes (i.e., covariates). Previous studies [1] have shown that the GFMNB-K model can provide more reasonable classification results than the FMNB-K model. Note that the GFMNB-K model has the same PDF shown in (5), but the weight factors are calculated using

w_{i,k} = \frac{\exp(\mathbf{x}_i' \boldsymbol{\gamma}_k)}{\sum_{l=1}^{K} \exp(\mathbf{x}_i' \boldsymbol{\gamma}_l)}, \quad (8)

where w_{i,k} is the estimated weight of component k at segment i and \boldsymbol{\gamma}_k = (\gamma_0, \gamma_1, \ldots, \gamma_m)' is the vector of estimated coefficients for component k, m being the number of covariates.
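Equation (8) is a multinomial-logit (softmax) form, so each segment's covariates produce its own weight vector. A small sketch, with hypothetical coefficient vectors:

```python
import math

def gfmnb_weights(x, gammas):
    """w_{i,k} = exp(x'gamma_k) / sum_l exp(x'gamma_l) for one segment.

    x      -- covariate vector for the segment (first entry 1 for the intercept)
    gammas -- one coefficient vector per mixture component
    """
    scores = [sum(gj * xj for gj, xj in zip(g, x)) for g in gammas]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two components, x = (1, curve_density); coefficient values are made up:
w = gfmnb_weights([1.0, 2.5], [[0.0, 0.0], [-1.0, 0.8]])
print(w)  # positive weights summing to 1; higher curve density favors component 2
```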

Classification-Based EB Methods.
Since the aggregated crash data may contain heterogeneity, the FMNB-K and GFMNB-K models are proposed to classify the crash data into different subpopulations. Besides the mixture models, a simple mean-based grouping method is also considered and compared with the FMNB-K and GFMNB-K models. After separating the aggregated crash data into subgroups, the EB estimates are calculated from the crash data in each individual subpopulation. The three grouping methods and the procedure for prioritizing hotspots are described as follows.
The first classification method assumes the crash data are generated from two subpopulations (i.e., one subpopulation contains accident-prone sites and the other consists of low-risk sites). To separate the sites into two groups, the mean number of crashes across the entire dataset is calculated. Sites with an observed number of crashes greater than the mean are labeled as the accident-prone group, and sites with an observed number of crashes smaller than the mean are labeled as the low-risk group. The mean-based classification method for hotspot identification consists of three steps. First, separate the entire crash dataset into two subgroups using the crash mean as the threshold value. Second, estimate an NB regression model using the crash data in the accident-prone group and calculate the corresponding EB estimates; likewise, estimate an NB regression model using the crash data in the low-risk group and obtain its EB estimates. Third, aggregate and rank the two sets of EB estimates obtained in step two; the hotspots are then identified based on the aggregated EB estimates. Hereinafter, we denote this HSID approach as the mean-based EB method.
The second and third classification methods assume that the aggregated crash data contain heterogeneity. Heterogeneity implies that crash data are generated from different subpopulations (i.e., crash data in the same subpopulation share common characteristics, while crash data across subpopulations may exhibit different characteristics). Given the advantages of finite mixture models in describing heterogeneity in crash data, the FMNB-K and GFMNB-K models are used to classify the crash data into K components based on site characteristics. The FMNB-based or GFMNB-based classification method for hotspot identification consists of four steps. First, fit the FMNB-K or GFMNB-K model to the entire crash dataset, and select the number of components using the Bayesian information criterion (BIC). Second, after determining the number of components, separate the entire crash dataset into K subgroups using the FMNB-K or GFMNB-K model. Third, estimate an NB regression model using the crash data in each subgroup and obtain the corresponding EB estimates. Fourth, aggregate and rank the K sets of EB estimates obtained in step three; the hotspots are then identified based on the aggregated EB estimates. Hereinafter, the second and third HSID approaches are denoted as the FMNB-based EB method and the GFMNB-based EB method, respectively.
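All three classification-based methods share the same skeleton: split the sites into groups, fit a per-group SPF and compute EB estimates, then aggregate and rank. A minimal runnable sketch using the mean-based split (the simplest of the three); the per-group SPF fit is deliberately omitted, so this illustrates only the flow of the procedure, not the paper's actual estimation.

```python
def mean_based_split(counts):
    """Step 1: split sites into accident-prone and low-risk groups by the mean."""
    m = sum(counts) / len(counts)
    prone = [i for i, c in enumerate(counts) if c > m]
    low = [i for i, c in enumerate(counts) if c <= m]
    return prone, low

def rank_hotspots(eb_estimates, n_hotspots):
    """Final step: aggregate EB estimates across groups and rank sites."""
    order = sorted(range(len(eb_estimates)),
                   key=lambda i: eb_estimates[i], reverse=True)
    return order[:n_hotspots]

counts = [1, 2, 3, 10, 8, 0]          # illustrative crash counts, mean = 4
prone, low = mean_based_split(counts)
print(prone, low)                     # [3, 4] [0, 1, 2, 5]
```

In the full procedure, an NB regression model would be fitted within each of the two index sets and the resulting EB estimates fed to `rank_hotspots`.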

Hotspot Identification Method Evaluation Criteria
With the objective of prioritizing safety treatments, the performance of an HSID method must be described in terms of both its ability to identify truly high risk sites and its ability to accurately rank sites by accident risk. In combination, the three tests proposed by Cheng and Washington [9] address both of these performance considerations.

Site Consistency Test.
Cheng and Washington [9] introduced the Site Consistency Test (SCT) as a measure of an HSID method's consistency over subsequent time periods. It is based on the idea that truly high risk sites will experience consistently elevated accident frequencies, which means that a desirable HSID method should identify as hotspots those sites that can be expected to have poor safety performance in subsequent time periods, assuming no safety treatments or other changes have been applied. The SCT considers the sites identified as hotspots by method j during time period t and compares methods based on the sum of accidents at those sites during the future time period t + 1. The optimal HSID method as determined by the SCT is the one whose identified high risk sites accumulate the greatest number of accidents during time period t + 1. Out of n sites, a threshold is defined for each method such that n \times \eta are designated as high risk. For example, with n = 200 total sites and \eta = 0.05, 10 sites will be selected by each method under comparison. With sites ranked in order of accident risk, the test statistic for each method is defined by Cheng and Washington [9] as shown in (9); the best method according to the SCT is the one producing the highest T_j^{SC}:

T_j^{SC} = \sum_{i \in H_{j,t}} C_{i,t+1}, \quad (9)

where H_{j,t} is the set of n \times \eta sites designated high risk by method j during time period t; C_{i,t+1} is the accident count for site i during time period t + 1; \eta is the fraction of all n sites designated high risk; j is the HSID method being compared; and t is the time period of observation.
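As a sketch, the SCT statistic in (9) reduces to a sum of next-period counts over each method's flagged set; the site indices and counts below are illustrative.

```python
def site_consistency_score(hotspots_t, counts_t1):
    """T^SC: period-(t+1) crashes accumulated at sites flagged in period t."""
    return sum(counts_t1[i] for i in hotspots_t)

# Method A flags sites {0, 3}; method B flags {1, 2}. Period-(t+1) counts:
counts_next = [6, 1, 2, 9]
print(site_consistency_score({0, 3}, counts_next))  # 15 -> A scores higher
print(site_consistency_score({1, 2}, counts_next))  # 3
```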

Method Consistency Test. Similar to the SCT, the Method Consistency Test (MCT) is based on the notion that truly high risk sites will consistently experience poor safety performance, assuming that operating conditions are similar and that no safety treatments have been applied. However, the MCT estimates the performance of an HSID method by the extent to which the same sites are identified as hotspots over two consecutive time periods. Given n \times \eta hotspots identified by each method j, the best performing method is the one that identifies the greatest number of hotspots common to time period t and time period t + 1. From Cheng and Washington [9], the performance of method j is computed as

T_j^{MC} = \left| H_{j,t} \cap H_{j,t+1} \right|, \quad (10)

where H_{j,t} and H_{j,t+1} are the sets of n \times \eta high risk sites identified by method j during time periods t and t + 1, respectively.

Total Rank Differences Test. Similar to the MCT, the Total Rank Differences Test (TRDT) is a measure of an HSID method's consistency across multiple time periods, again assuming that none of the sites has undergone safety treatment. However, the TRDT explicitly takes into account the safety performance ranking assigned by each method and estimates performance based on the ranking consistency between subsequent time periods. While the MCT would assign a high score to a method that consistently identifies the same sites in two time periods as being in the top \eta \times 100\% in terms of accident risk, the TRDT considers the relative rankings of all sites identified in time period t. The performance of each method is calculated as the sum of differences between the rank assigned to the n \times \eta high risk sites in time period t and the rank assigned to the same sites in time period t + 1. Note that the sites identified in time period t are compared by rank in the two time periods whether or not they are identified as high risk in time period t + 1. The performance measure for the TRDT is computed as described by Cheng and Washington [9]:

T_j^{TRD} = \sum_{i \in H_{j,t}} \left| R(i, j, t) - R(i, j, t+1) \right|, \quad (11)

where R(i, j, t) is the rank of site i from method j for time period t.
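Both consistency tests are straightforward set and rank computations; a sketch with illustrative site labels and ranks:

```python
def method_consistency_score(hot_t, hot_t1):
    """T^MC: number of sites flagged as hotspots in both periods (higher is better)."""
    return len(set(hot_t) & set(hot_t1))

def total_rank_difference(hot_t, rank_t, rank_t1):
    """T^TRD: sum of |rank shift| for the period-t hotspots (lower is better)."""
    return sum(abs(rank_t[i] - rank_t1[i]) for i in hot_t)

rank_t = {"s1": 1, "s2": 2, "s3": 3}   # period-t ranks (1 = highest risk)
rank_t1 = {"s1": 2, "s2": 1, "s3": 9}  # period-(t+1) ranks
hot = ["s1", "s2"]                     # sites flagged in period t
print(method_consistency_score(hot, ["s2", "s3"]))  # 1 site in common
print(total_rank_difference(hot, rank_t, rank_t1))  # |1-2| + |2-1| = 2
```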

Data Description
The crash dataset used for comparing the different HSID methods was collected on 4-lane undivided rural segments in Texas as part of the NCHRP 17-29 research project. The dataset records crashes on 1,499 undivided rural segments over a five-year period from 1997 to 2001. In order to compare the HSID methods using the three criteria, the five-year crash data were divided into two time periods: Period 1 (years 1997 and 1998) and Period 2 (years 1999, 2000, and 2001). Table 1 provides the summary statistics for individual road segments in the Texas data for Periods 1 and 2.

Modeling Results
This section describes the modeling results for the NB, FMNB-K, and GFMNB-K models. For the NB model, a mean function of the following form is adopted:

\hat{\mu}_i = L_i \exp\!\left(\beta_0 + \beta_1 \ln F_i + \beta_2 LW_i + \beta_3 SW_i + \beta_4 CD_i\right),

where \hat{\mu}_i is the estimated number of crashes at segment i over the study period; L_i is the segment length in miles for segment i; F_i is the traffic flow (average daily traffic over the study period) traveling on segment i; LW_i is the lane width in feet for segment i; SW_i is the total shoulder width in feet for segment i; CD_i is the curve density (curves per mile) for segment i; and \beta_0, \beta_1, \beta_2, \beta_3, and \beta_4 are estimated coefficients.
The NB model is applied to the Texas data, and the parameters were estimated separately for Periods 1 and 2. The results of the fitted NB models are shown in Table 2, along with the NB dispersion parameter (\alpha) estimated for each time period. The estimated coefficients are reasonable and consistent between the two time periods. This fitted NB model is used in the EB and ARP HSID methods only.
For the FMNB-K and GFMNB-K models, the mean functional form for each component is adopted as follows:

\hat{\mu}_{i,k} = L_i \exp\!\left(\beta_{k,0} + \beta_{k,1} \ln F_i + \beta_{k,2} LW_i + \beta_{k,3} SW_i + \beta_{k,4} CD_i\right),

where \hat{\mu}_{i,k} is the estimated number of crashes at segment i for component k and \beta_{k,0}, \beta_{k,1}, \beta_{k,2}, \beta_{k,3}, and \beta_{k,4} are the estimated coefficients for component k.
For the GFMNB-K model, the weight parameter is modeled using all available explanatory variables:

w_{i,k} = \frac{\exp(\mathbf{x}_i' \boldsymbol{\gamma}_k)}{\sum_{l=1}^{K} \exp(\mathbf{x}_i' \boldsymbol{\gamma}_l)},

where w_{i,k} is the estimated weight of component k at segment i and \boldsymbol{\gamma}_k = (\gamma_0, \gamma_1, \ldots, \gamma_m)' is the vector of estimated coefficients for component k, m being the number of covariates. To evaluate the performance of the classification-based EB methods, the crash data in Periods 1 and 2 are separated into subgroups using the mean-based classification method and the FMNB-K and GFMNB-K models. For each period, an NB regression model is estimated using the crash data in each subgroup and the corresponding EB estimates are calculated. For each period, the hotspots are identified based on the EB estimates obtained from each subgroup.
To determine the number of components for the FMNB-K and GFMNB-K models, an approach suggested by Park et al. [20] is applied to the crash data in Periods 1 and 2. Specifically, this approach fits a series of models with an increasing number of components and selects the most plausible model using model choice criteria. As discussed by Eluru et al. [21], compared to the Akaike information criterion (AIC), the BIC imposes a higher penalty on overfitting with excess parameters. Thus, the BIC is preferred for selecting the optimal number of components. For the crash data in Periods 1 and 2, we applied the FMNB-K and GFMNB-K models with an increasing number of components K = 2, 3, 4. As shown in Table 3, K = 2 is preferred for both models, as it provides the smallest BIC value. Overall, based on the reported BIC values, we selected K = 2 components and use the FMNB-2 and GFMNB-2 models to group the crash data. Compared to the standard NB model, the GFMNB-2 model has a significantly smaller BIC value. The goodness-of-fit statistics suggest that the crash data may contain two subpopulations with different characteristics, rather than a single population.
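The component-selection step is a small computation once the candidate models are fitted. A sketch, with hypothetical log-likelihoods and parameter counts standing in for fitted FMNB-K models:

```python
import math

def bic(log_lik, n_params, n_obs):
    """BIC = -2 ln L + p ln n; the ln n term penalizes extra parameters."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def select_components(candidates, n_obs):
    """Pick the K whose fitted model minimizes BIC."""
    return min(candidates, key=lambda K: bic(*candidates[K], n_obs))

# Hypothetical fits for K = 2, 3, 4: (log-likelihood, number of parameters).
# The small log-likelihood gains for K > 2 do not offset the BIC penalty.
fits = {2: (-2010.0, 13), 3: (-2005.0, 20), 4: (-2003.0, 27)}
print(select_components(fits, n_obs=1499))  # 2
```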
The parameter estimation results for Periods 1 and 2 are provided in Table 4, which shows that some models include insignificant covariates. Note that, for the FMNB-2 model in Period 1, the estimated coefficient for the curve density variable in component 1 is counterintuitive, which may indicate that the FMNB-2 model provides an unreasonable group classification.

Grouping Results.
The FMNB-2 and GFMNB-2 models were used to classify the crash data observed in Period 1 (years 1997 and 1998). Based on the modeling results of the FMNB-2 and GFMNB-2 models, the 1,499 road segments were classified into two groups by assigning each site to the component with the highest posterior probability. The posterior probability gives the probability that observation y_i comes from component k [1]; it is the conditional expectation of the component indicator variable z_{i,k}. In the EM algorithm, at iteration s + 1, the posterior probability \varepsilon_{i,k}^{(s+1)} that observation y_i is from component k, given y_i and \Theta^{(s)}, is defined as [22]

\varepsilon_{i,k}^{(s+1)} = \frac{\hat{w}_{i,k}^{(s)} f_k(y_i \mid \theta_k^{(s)})}{\sum_{l=1}^{K} \hat{w}_{i,l}^{(s)} f_l(y_i \mid \theta_l^{(s)})},

where \hat{w}_{i,k}^{(s)} is the prior probability that observation y_i is from component k, given \Theta^{(s)}, estimated at iteration s, and f_k(\cdot \mid \theta_k^{(s)}) is the NB density of component k.
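The assignment step reduces to normalizing prior-weighted component likelihoods and taking the argmax; a sketch with hypothetical weights and likelihood values:

```python
def posterior_probs(prior_w, component_likelihoods):
    """E-step responsibilities: eps_k proportional to w_k * f_k(y_i), normalized over k."""
    num = [w * f for w, f in zip(prior_w, component_likelihoods)]
    total = sum(num)
    return [v / total for v in num]

def assign_component(prior_w, component_likelihoods):
    """Hard assignment to the component with the highest posterior probability."""
    eps = posterior_probs(prior_w, component_likelihoods)
    return max(range(len(eps)), key=lambda k: eps[k])

# Hypothetical segment: component 2 explains the observed count far better.
eps = posterior_probs([0.6, 0.4], [0.02, 0.09])
print([round(e, 3) for e in eps])                  # [0.25, 0.75]
print(assign_component([0.6, 0.4], [0.02, 0.09]))  # 1 (second component)
```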
The grouping results from the mean-based classification method and the FMNB-2 and GFMNB-2 models are provided in Table 5 for Periods 1 and 2. Note that the means and standard deviations of the variables are calculated for each group. For Periods 1 and 2, the two components generated by the mean-based classification method show a significant difference in the mean number of crashes; on the other hand, the differences in the mean values of the other variables are not very noticeable. For Periods 1 and 2, the two components of the GFMNB-2 model demonstrate a remarkable difference in the mean values of segment length and curve density, whereas the differences in the mean values of the variables are not significant for the FMNB-2 model. For each period (i.e., Period 1 or 2), an NB regression model is estimated using the crash data in each subgroup and the corresponding EB estimates are obtained. For the mean-based EB, FMNB-based EB, and GFMNB-based EB methods, the hotspots are identified based on the EB estimates and the three criteria are calculated.

Testing Results
This section describes the HSID method performance evaluation and comparison results, which were obtained using the tests described in Section 3. For each test, seven HSID methods are compared over the two time periods identified in Table 1. The following high risk cutoff points are used to compare the methods: \alpha = \{0.90, 0.95, 0.99\}. The \alpha value describes the quantile above which sites are identified as high risk, so the top (1 - \alpha) fraction of sites is flagged. For example, the selected dataset contains 1,499 highway segments, so approximately 150 high risk sites are identified as hotspots given \alpha = 0.90.
As described previously, the SCT is a measure of an HSID method's ability to identify truly high risk sites, which it does by comparing the safety performance of sites identified as hotspots during an out-of-sample observation period. Table 6 shows the accident count during time period 2 for hotspots identified by each method during time period 1. It is clear from these results that the EB with GFMNB-based classification method performs best for all high risk cutoff points \alpha. The worst performing subpopulation EB method is the FMNB-based classification in all cases.
The MCT is a measure of an HSID method's consistency over two subsequent time periods. That is, the best performing method identifies the largest number of sites that are consistent between the two observation periods. The results in Table 7 show that the EB with GFMNB-based classification method performs best for all cutoff values (except for \alpha = 0.90, where the standard EB method is slightly better). The least consistent subpopulation EB method is again the FMNB-based classification.
The TRDT compares the accident risk rankings over two subsequent observation periods, with the best method producing the smallest sum of rank differences between time periods 1 and 2. Table 8 shows the results of the TRDT, in which, unlike the SCT and MCT, a lower value indicates better performance. As with the SCT and MCT, the GFMNB-based classification method performs best overall, with the lowest sum of rank differences for all values of \alpha.
To summarize the results of the three HSID performance tests, the EB with GFMNB-based classification method performed best overall in all three tests. The most substantial relative performance advantage offered by the GFMNB-based method is in the TRDT, more so at higher values of \alpha. The regular EB and mean-based classification EB methods performed somewhat comparably in all tests, and both outperformed the AF, AR, ARP, and FMNB-based classification methods. Generally, the EB methods (i.e., the EB, mean-based EB, FMNB-based EB, and GFMNB-based EB methods) perform better than the other methods (i.e., AF, AR, and ARP), which is consistent with previous studies [9,[23][24][25].

Discussion
The HSID results in Section 5.3 suggest that the EB with GFMNB-based classification method provides the most accurate ranking in identifying crash-prone locations for the Texas dataset. A possible explanation is that the EB method is based on two clues: the historical crash record of the entity and the expected number of crashes obtained from a safety performance function for similar entities. If the aggregated crash data contain heterogeneity (i.e., the crash data are collected from different regions with different characteristics), the requirement for the second clue is not satisfied. As a result, although the EB method increases the precision of estimation and corrects for the RTM bias, the advantage of the EB method may be restricted to some extent when analyzing heterogeneous crash data. Therefore, the GFMNB-K model is adopted to separate the aggregated crash data into a certain number of homogeneous subgroups, and the EB method is applied to the sites in each subpopulation. Since the entities in the same subgroup share common characteristics, the proposed EB with GFMNB-based classification method is capable of analyzing heterogeneous crash data.
There are several other important points worth mentioning. First, the results in this study support the findings of some previous studies [1,26] that modeling the weight parameter is necessary when using the finite mixture of NB regression models to analyze crash data. For example, as shown in Section 5.3, the EB with FMNB-based classification method provides worse HSID results than even the simple EB with mean-based classification method. Second, since some crash datasets may contain a small number of observations, one important issue associated with the EB with GFMNB-based classification method is that this approach may suffer from the small sample bias problem. As discussed by Lord [27], data characterized by small sample size can result in biased estimated coefficients for the NB regression models. If only a small number of observations (e.g., 100 sites or fewer) are assigned to one subgroup, the estimated coefficients for this subgroup may be unreliable and erroneous inferences may be drawn from the EB method. Thus, when applying the EB with GFMNB-based classification method for HSID, transportation safety analysts should examine the size of the data sample for each subpopulation to ensure reliable parameter estimation and sound statistical inferences.

Summary and Conclusions
This study proposed three different classification-based EB methods to identify hotspots, namely, the mean-based, FMNB-based, and GFMNB-based classification methods. The new classification-based EB methods were evaluated against four conventional HSID methods (i.e., AF, AR, EB, and ARP) using the criteria proposed by Cheng and Washington [9]. The important findings can be summarized as follows. First, for the considered Texas crash dataset, the EB with GFMNB-based classification method yields better results in identifying hotspots than the standard EB and other methods. This implies that HSID accuracy can possibly be improved by properly classifying roadway segments based on the heterogeneity in crash data. Second, caution should be taken when classifying roadway segments: inappropriate classification can produce worse results. Finally, the EB methods generally perform better than the other methods, which is consistent with previous studies. In future work, accident datasets collected at other locations should be used to further examine the performance of the GFMNB-based EB method.

Table 1 :
Summary statistics of characteristics for individual road segments in the Texas data for Periods 1 and 2.

Table 2 :
Modeling results of NB models for Periods 1 and 2.
* Not significant at the 5% significance level; SE: standard error.

Table 5 :
Summary statistics of each component for Periods 1 and 2.

Table 6 :
Accumulated results of Site Consistency Test of various methods.

Table 7 :
Accumulated results of Method Consistency Test of various methods.
Number in bold indicates the best result under each cutoff level.

Table 8 :
Accumulated results of Total Rank Differences Test of various methods.
Number in bold indicates the best result under each cutoff level.