Comparative Analysis of Performance Measures for Network Screening: A Case Study of Brazilian Urban Areas

The overall effectiveness of the roadway safety management process relies on a robust method for identifying and ranking sites with major potential for safety improvements. In Brazil, guidelines for hotspot identification are usually based only on crash frequency and Crash Rate as safety performance measures. This study presents a comparative analysis of safety performance measures, considering their limitations of applicability, in a sample of signalized intersections from the city of Fortaleza, Brazil. The ability of each measure to rank the sample intersections was assessed through the rank difference between each safety performance measure and the Excess Expected Average Crash Frequency with Empirical Bayes Adjustment (EEB). In addition, a temporal analysis was carried out based on the consistency of the safety performance measures over subsequent time periods. The results suggest a reasonable match between the most comprehensive safety performance measure (EEB) and very simple safety performance measures such as crash frequency and Crash Rate. It is recommended that the consistency of these results be investigated over a longer observation period and in other Brazilian jurisdictions.


Introduction
Every year, the lives of almost 1.24 million people are cut short by traffic crashes. These events cause substantial economic losses for the victims, their families, and nations as a whole [1]. In Brazil, approximately 41,000 road crash fatalities were recorded in 2010 [2].
Establishing effective safety improvement programs frequently relies on a good network screening process. The complete sequence of activities, from identifying sites with promise, to investigating contributing factors, to selecting countermeasures, to developing and comparing safety projects, encompasses what is known as the roadway safety management process [3].
Selecting locations with potential for crash reduction is the first step of the roadway safety management process (RSMP). Errors in the identification of critical sites may produce false negatives ("genuinely" dangerous sites designated as "safe") and false positives (relatively safe locations identified as hazardous). Such errors lead to an inefficient use of resources for safety improvements and may reduce the overall effectiveness of the RSMP. Therefore, the correct identification of critical sites is essential to the successful implementation of any road safety plan [4][5][6].
Several network screening methods are presented in the literature. These methods are usually based on safety performance measures ranging from observed crash frequency to more elaborate measures obtained from statistical modeling or composite indexes [3, 4, 7-9]. They differ in their data requirements, in the modeling skills needed to apply them, and in how well they account for the rare and random nature of crashes.
The main network screening methods presented in Brazilian guidelines do not properly account for the stochastic nature of crashes [10,11]. Data limitations are recognized as one of the main impediments to applying advanced network screening safety performance measures, especially in developing-country jurisdictions. Exploring the differences in the network screening process produced by different safety performance measures in a developing-country jurisdiction should therefore broaden the scope of the RSMP methodology and improve our knowledge of the usefulness of the main safety performance measures.
The main objective of this paper is to provide a comparative analysis of the safety performance measures proposed by the HSM, applied to urban signalized intersections, taking into account both the suitability of these measures to the Brazilian urban road environment and the effort required to acquire the information and develop the models needed to implement them. The findings of this work are intended to support improvements to accident data systems as well as investments to improve practitioners' overall modeling skills.

The Network Screening Process
The overall effectiveness of the RSMP relies on a robust method for identifying and ranking sites with major potential for safety improvements. The Highway Safety Manual (HSM) provides decision makers and engineers with information and tools to improve roadway safety performance. Network screening is the first activity of the cyclical roadway safety management process proposed by the HSM. It comprises five major steps: (i) establishing the focus, (ii) identifying the network and establishing reference populations, (iii) selecting safety performance measures (SPM), (iv) selecting the screening method, and (v) screening and evaluating results. The HSM proposes 13 SPM for hotspot identification, which require different levels of completeness and accuracy of accident data systems as well as different modeling skills from safety staff. Therefore, jurisdictions at an early stage of the RSMP commonly apply less informative SPM.
In Brazil, methods for identifying hotspots are presented in two road safety manuals from national agencies [10,11] and consist basically of the following steps: (i) data collection, (ii) application of an SPM, and (iii) listing of the critical sites. Table 1 shows the SPM proposed by the HSM, the data needed to apply each of them, and which ones are included in the Brazilian manuals. The necessary information includes crash data by location and date (1), traffic volume (2), crash data by severity and location (3), crash costs by crash severity (4), crash data by type and location (5), crash costs by crash type (6), geometric and operational characteristics of the road (7), and calibrated safety performance functions (SPFs) and overdispersion parameters (8).
To implement the methods proposed by the HSM, a major challenge for state and local agencies is collecting the necessary roadway information along thousands of miles of highway. Collecting roadway asset inventory data often incurs significant but unknown costs [12]. Applying more complex SPM can yield better estimates of the expected crash frequency; however, it requires more data and more specialized transportation professionals. For example, estimating an SPM that requires an SPF calls for a considerable amount of information on road crashes, roadway characteristics, and traffic conditions, which adds a further layer of difficulty for developing countries.
In addition to the methodology proposed by the HSM, other approaches have been proposed, such as a probabilistic method [13] and a composite safety performance index [5].
The probabilistic method consists of the following steps: (i) establishing reference populations, (ii) developing a binary regression model, and (iii) producing a list of critical locations. To develop the binary regression model, Couto and Ferreira [13] proposed simulating a database based on a crash sample from the city of Porto, Portugal. A simulated database provides a larger sample than the real database, that is, a more substantial sample for calibrating and validating the binary regression model. The response variable of the model takes the value 1 for critical locations and 0 for safe locations. It is important to note that the random generation of the so-called "independent variables" can yield combinations of geometric and operational attributes that are not representative of any intersection on the network.
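For illustration only, step (ii) can be sketched as a binary logistic regression fitted by batch gradient descent, with step (iii) reduced to sorting sites by their predicted probability of being critical. This is a minimal sketch under our own assumptions: the logistic link, the function names, and the one-feature toy data are hypothetical, and neither Couto and Ferreira's [13] model specification nor their database simulation is reproduced.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 3000) -> Tuple[List[float], float]:
    """Fit P(critical) = sigmoid(b + w.x) by batch gradient descent on the mean log-loss."""
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            # residual between predicted probability and the 0/1 label
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data: one hypothetical scaled site attribute; 1 = critical, 0 = safe.
X = [[0.1], [0.2], [0.8], [0.9]]
y = [0, 0, 1, 1]
sites = ["s1", "s2", "s3", "s4"]
w, b = fit_logistic(X, y)
# Step (iii): critical-location list ordered by predicted probability.
order = sorted(range(len(X)), key=lambda i: predict_proba(w, b, X[i]), reverse=True)
ranking = [sites[i] for i in order]
print(ranking)  # ['s4', 's3', 's2', 's1']
```

In a real application the features would be scaled site attributes (for example AADT or number of lanes), and a statistics library would normally replace the hand-written optimizer.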
Coll et al. [5] proposed a method that combines different SPM into a single index called the composite safety performance index (CSPI). The index is built in three steps: (i) selecting the road safety indicators to be aggregated, (ii) performing pairwise comparisons of the indicators, and (iii) developing the CSPI. A comparison made by Coll et al. suggests that, in temporal terms, the ranking lists produced by the new aggregation method are very similar to those obtained with a more basic aggregation technique. The study also pointed out that the choice of adequate indicators and of their respective weights in the composite index is still open to discussion.
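The pairwise-comparison and aggregation steps can be illustrated with the geometric-mean approximation of Analytic Hierarchy Process (AHP) weights followed by a weighted sum of min-max normalized indicators. This is our assumption of a typical scheme, not necessarily the exact procedure of Coll et al. [5]; it also assumes every indicator is oriented so that higher values mean a less safe site. Function names and numbers are hypothetical.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Geometric-mean approximation of AHP priorities from a reciprocal pairwise matrix.

    pairwise[i][j] states how much more important indicator i is than indicator j.
    """
    n = len(pairwise)
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


def min_max(values: List[float]) -> List[float]:
    """Rescale one indicator to [0, 1]; a constant indicator maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def cspi(indicators: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted sum of normalized indicators; indicators[k][i] is indicator k at site i."""
    normed = [min_max(col) for col in indicators]
    n_sites = len(indicators[0])
    return [sum(w * col[i] for w, col in zip(weights, normed)) for i in range(n_sites)]


weights = ahp_weights([[1, 3], [1 / 3, 1]])  # indicator 1 judged 3x as important as indicator 2
print([round(x, 3) for x in weights])  # [0.75, 0.25]
print([round(v, 3) for v in cspi([[1, 2, 3], [3, 2, 1]], weights)])  # [0.25, 0.5, 0.75]
```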

Safety Performance Measures Comparison Methods
Several studies comparing different SPM are available. According to Cheng and Washington [9], comparison criteria based on false positives and false negatives are not sufficient to explore variations in the ranking of critical sites. The tests developed by Cheng and Washington assume that road sections are in the same or similar underlying operational states and that their expected safety performance remains virtually unaltered over the two periods. The tests are the site consistency test (T1), the method consistency test (T2), the total rank differences test (T3), and the Poisson mean differences test. More details on these tests are given in the methodology section.
The basic premise of the tests proposed by Cheng and Washington [9] is that a good method is one that consistently identifies critical locations over subsequent periods of time. In their application, a simple ranking by crash frequency was used to identify critical sites; since this SPM is subject to regression-to-the-mean (RTM) bias, the results could be misleading.
Montella [4] developed a single index combining the tests proposed by Cheng and Washington [9], the total score test (TST), which makes the evaluation of a given method more practical. The TST provides a relative measure of effectiveness that combines the results of each test proposed by Cheng and Washington [9], normalized by the maximum and minimum values obtained in each test. Ferreira and Martins [6] compared the binary model proposed by Couto and Ferreira [13] with other methods for identifying critical sites by applying the temporal consistency tests discussed above. For the full sample of intersections (without a reference population), the binary model performed better, while for the sample of signalized intersections the EEB method was more efficient.

Methodological Procedure
The methodological approach adopted to meet this objective can be divided into the four stages presented in Figure 1. The proposed method will be implemented and evaluated through a case study of signalized intersections in the city of Fortaleza, Brazil.

Selection of Reference Population.
To compare the estimated SPM, it is necessary to define a reference population, because sites with very atypical or specific characteristics can have an expected crash frequency that differs from that of other sites solely because of those characteristics. The years selected to implement the SPM should be those in which road sections are in the same or similar underlying operational states (similar traffic volumes, geometric designs, weather fluctuations, etc.).
The following criteria will be used to select the reference population: traffic light implementation date, availability of vehicular flow information, and the number of legs and geometric configuration of the intersection (skew angle). Figure 2 shows an example of an intersection removed from the sample because of its nonrepresentative configuration.
Journal of Advanced Transportation
Estimating the SPM requires consolidating at least two sources of information: one related to crash frequency and severity and another related to traffic flow. Additionally, some SPM also need road inventory information as well as crash type and cost data. One of the main challenges in estimating SPM capable of handling crash overdispersion and the regression-to-the-mean phenomenon is the development of SPF. Raw accident data for 2009, 2010, and 2011 were collected from the municipal accident data system (SIAT-FOR). SIAT-FOR is a georeferenced database that compiles traffic accident records, including date, time, location (street name and number/reference), crash type, vehicles involved, and severity (without casualties, with injured victims, and with fatalities), as well as information on victims (gender, age, and type: driver, passenger, pedestrian, cyclist, etc.) [14].
The annual average daily traffic (AADT) was estimated from the monthly average daily traffic (October 2010) provided by the database of CTAFOR (Traffic Control Center), whose traffic signals are controlled by the SCOOT (Split Cycle Offset Optimization Technique) system.
The safety performance function used to estimate the model-based SPM was calibrated by Cunto et al. [14] as follows: where Y is the expected number of accidents at signalized intersections in 2009, AADT is the annual average daily traffic in 2009, and lanes is the number of lanes.
It should be noted that the above SPF was calibrated with 2009 data. In this application, the SPF calibration coefficients were assumed not to vary across years. The SPF introduces a variable (number of lanes) that was not available in the databases used; the number of lanes at each intersection was obtained from aerial photographs in Google Earth ©.

SPM Estimation.
The SPM investigated in this study were selected based on two criteria: data availability and the possibility of producing comparable rankings. It is also important to compare SPM that refer to the same crash type and severity level; the present analysis therefore used SPM that estimate values in terms of total accidents for all collision types. A further requirement for comparing the rankings is that each SPM can rank all sites from the highest to the lowest value.
Of the 13 SPM proposed by the HSM, seven will be assessed: Average Crash Frequency (ACF), Crash Rate (CR), Equivalent Property Damage Only Average Crash Frequency (EPDO), Level of Service of Safety (LOSS), Excess Predicted Average Crash Frequency Using Safety Performance Functions (SPFs), Expected Average Crash Frequency with EB Adjustments (EB), and Excess Expected Average Crash Frequency with EB Adjustment (EEB).
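The quantities behind several of these SPM can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the HSM procedure verbatim: it works with period totals (no yearly calibration factors), uses total entering AADT as the exposure for CR, takes the EPDO weights as inputs, and uses the HSM-style Empirical Bayes weight w = 1/(1 + k·ΣN_predicted), where k is the SPF overdispersion parameter. The `Site` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """Hypothetical per-site inputs for one screening period (names are illustrative)."""
    name: str
    observed: List[float]   # observed total crashes in each year of the period
    predicted: List[float]  # SPF-predicted crashes for the same years
    entering_aadt: float    # total entering vehicles per day (assumed exposure measure)
    pdo: int = 0            # property-damage-only crashes in the period
    injury: int = 0         # injury crashes in the period
    fatal: int = 0          # fatal crashes in the period


def average_crash_frequency(s: Site) -> float:
    """ACF: mean observed crashes per year."""
    return sum(s.observed) / len(s.observed)


def crash_rate(s: Site) -> float:
    """CR: observed crashes per million entering vehicles over the period."""
    mev = s.entering_aadt * 365 * len(s.observed) / 1e6
    return sum(s.observed) / mev


def epdo_frequency(s: Site, w_injury: float, w_fatal: float) -> float:
    """EPDO: severity-weighted crash count; weights are cost ratios relative to a PDO crash."""
    return s.pdo + w_injury * s.injury + w_fatal * s.fatal


def excess_predicted(s: Site) -> float:
    """Excess predicted average crash frequency: mean observed minus mean predicted."""
    return average_crash_frequency(s) - sum(s.predicted) / len(s.predicted)


def eb_expected(s: Site, k: float) -> float:
    """EB-adjusted expected crashes for the whole period."""
    n_pred = sum(s.predicted)
    w = 1.0 / (1.0 + k * n_pred)  # weight placed on the SPF prediction
    return w * n_pred + (1.0 - w) * sum(s.observed)


def eeb_excess(s: Site, k: float) -> float:
    """Excess expected crashes with EB adjustment: EB expected minus SPF predicted."""
    return eb_expected(s, k) - sum(s.predicted)


site = Site("X", observed=[4, 6], predicted=[3.0, 3.0], entering_aadt=20000, pdo=7, injury=3)
print(average_crash_frequency(site), eb_expected(site, k=0.5), eeb_excess(site, k=0.5))  # 5.0 9.0 3.0
```

With k = 0.5 and six predicted crashes, the EB weight is 0.25, so the estimate leans toward the ten observed crashes; the EEB of 3.0 is the part of that estimate not explained by the SPF.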

Comparative Analysis of Safety Performance Measures.
The comparative analysis of the SPM will be based on two approaches: the first on differences between site rankings and the second on the temporal consistency of the rankings. Both approaches compare a given SPM (subject SPM) with a reference SPM, chosen as the measure best able to identify the sites with the greatest potential for crash reduction because it best incorporates the stochastic nature of crashes. In this exercise, the Excess Expected Average Crash Frequency with EB Adjustment (EEB) was chosen as the reference SPM.

Comparative Analysis Based on Rank Positions.
For this analysis, three tests were proposed: the root mean square error (RMSE) between the EEB ranking and the ranking of the tested SPM, the difference in ranking position (DRP), and the Number of Sites Identified as Critical (NSIC).
Initially, each subject SPM will be evaluated according to the average difference between its ranking list and the EEB ranking, using the root mean square error [15]:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i^{\mathrm{ref}} - R_i^{\mathrm{sub}}\right)^{2}}$$
where RMSE is the root mean square error, $R_i^{\mathrm{ref}}$ is the rank position of site $i$ under the reference SPM, $R_i^{\mathrm{sub}}$ is the rank position of site $i$ under the performance measure being compared, and $n$ is the number of sites considered.
The analysis is divided into blocks of 10 positions, so that the first position corresponds to the location ranked most critical by the reference measure, and so on. The smaller the RMSE, the closer the subject SPM ranking is to that of the reference SPM.
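A minimal sketch of the block-wise rank RMSE, assuming each SPM is given as a mapping from site identifier to score (higher = more critical) and that ties are broken by site identifier. The function names and toy scores are hypothetical.

```python
import math
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank sites so that 1 = highest score (most critical); ties broken by site id."""
    order = sorted(scores, key=lambda s: (-scores[s], s))
    return {site: pos + 1 for pos, site in enumerate(order)}


def block_rmse(ref: Dict[str, float], sub: Dict[str, float],
               block: int = 10, n_blocks: int = 4) -> List[float]:
    """Rank RMSE between a subject SPM and the reference SPM for each block of reference ranks."""
    r_ref, r_sub = rank_positions(ref), rank_positions(sub)
    by_ref = sorted(r_ref, key=r_ref.get)  # sites in reference-rank order
    result = []
    for b in range(n_blocks):
        sites = by_ref[b * block:(b + 1) * block]
        if not sites:
            break
        mse = sum((r_ref[s] - r_sub[s]) ** 2 for s in sites) / len(sites)
        result.append(math.sqrt(mse))
    return result


eeb = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
cr = {"a": 3.0, "b": 4.0, "c": 2.0, "d": 1.0}
print(block_rmse(eeb, cr, block=2, n_blocks=2))  # [1.0, 0.0]
```

With the defaults (block = 10, n_blocks = 4) the function covers the first 40 reference positions.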
Next, the difference in ranking position (DRP) of the 30 most critical sites is assessed visually. DRP can be expressed as
$$\mathrm{DRP}_i = R_i^{\mathrm{sub}} - R_i^{\mathrm{ref}}$$
where $R_i^{\mathrm{sub}}$ is the rank position of site $i$ determined by the subject SPM and $R_i^{\mathrm{ref}}$ is the rank position determined by the reference SPM. The DRP analysis will be done by plotting a graph in which each performance measure is compared with the EEB. The smaller the absolute DRP, the smaller the difference between the rank positions of a given location under the two performance measures.
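A DRP series of the kind plotted against the EEB ranking can be computed as below, using the convention DRP = subject rank minus reference rank, so a positive value means the subject SPM places the site lower (less critical) than the EEB. The helper names and toy scores are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank sites so that 1 = highest score (most critical); ties broken by site id."""
    order = sorted(scores, key=lambda s: (-scores[s], s))
    return {site: pos + 1 for pos, site in enumerate(order)}


def drp_top(ref: Dict[str, float], sub: Dict[str, float], top: int = 30) -> List[Tuple[int, int]]:
    """(reference rank, DRP) pairs for the `top` most critical sites according to the reference."""
    r_ref, r_sub = rank_positions(ref), rank_positions(sub)
    top_sites = sorted(r_ref, key=r_ref.get)[:top]
    return [(r_ref[s], r_sub[s] - r_ref[s]) for s in top_sites]


print(drp_top({"a": 5, "b": 4, "c": 3}, {"a": 3, "b": 5, "c": 4}))  # [(1, 2), (2, -1), (3, -1)]
```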
Finally, the Number of Sites Identified as Critical (NSIC) will be estimated. This evaluation identifies critical sites according to the number of performance measures that flag them as critical. The higher the NSIC, the greater the likelihood that a particular location is truly critical.
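NSIC can be sketched as a count, for each site, of how many SPM place it among their top n sites. The cut-off n, the function names, and the toy scores are hypothetical.

```python
from typing import Dict, List


def top_sites(scores: Dict[str, float], n: int) -> List[str]:
    """The n highest-scoring sites (ties broken by site id)."""
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]


def nsic(spm_scores: Dict[str, Dict[str, float]], n_critical: int) -> Dict[str, int]:
    """For every site, the number of SPM that place it among their n_critical top sites."""
    counts: Dict[str, int] = {}
    for scores in spm_scores.values():
        for site in top_sites(scores, n_critical):
            counts[site] = counts.get(site, 0) + 1
    # Sites never flagged by any SPM are reported with a count of zero.
    all_sites = set().union(*(scores.keys() for scores in spm_scores.values()))
    return {s: counts.get(s, 0) for s in sorted(all_sites)}


spm = {"ACF": {"a": 5, "b": 1, "c": 3}, "CR": {"a": 2, "b": 4, "c": 3}}
print(nsic(spm, n_critical=2))  # {'a': 1, 'b': 1, 'c': 2}
```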

Temporal Consistency Tests.
The temporal consistency tests were adapted from the two-period temporal comparison proposed by Montella [4], which combines three temporal indicators proposed by Cheng and Washington [9] into a single index. The first time period ($t_1$) corresponds to the earliest year in the set of years available for study. The second time period ($t_2$) can be obtained by averaging the years after the first period.
For the first test (T1), the ranking list of the subject SPM is produced for $t_1$, identifying the critical sites. For each of these sites, the reference measure is calculated for $t_2$. The result of the first test is the sum of the values of the reference measure in $t_2$. When the test is applied to the available set of performance measures, the best measure is the one with the highest sum.
For the second test (T2), the ranking list is first produced for $t_2$, and the "truly critical sites" are those identified by the reference SPM. Then, the ranking list of the subject SPM under analysis is calculated for $t_1$. The result of this test is the number of "truly critical sites" identified in $t_1$ by the measure under study. The best performance measure is the one with the highest value.
For the third test (T3), the ranking list for $t_1$ is first calculated for the measure being evaluated. Next, the ranking list for $t_2$ is calculated for the reference SPM. The value of this test is the sum of the absolute differences between the positions in the ranking lists obtained by the measure being evaluated and by the reference measure. The best performance measure is the one with the lowest value. The formulation of the index proposed by Montella [4] is as follows: where $j$ denotes the performance measure being evaluated, and max and min correspond to the highest and lowest values obtained in a given test, respectively.
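The three tests can be sketched as follows, assuming each SPM is a site-to-score mapping for a given period and that the critical list is the top n sites. The T3 sum runs over the subject SPM's critical sites, which is our reading of the description. Because the index formula is not reproduced here, `total_score` uses a plain min-max normalization that gives 100 to a measure that is best in all three tests; Montella's [4] formulation may normalize differently. All names and values are hypothetical.

```python
from typing import Dict, List, Tuple


def top_sites(scores: Dict[str, float], n: int) -> List[str]:
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    return {s: i + 1 for i, s in enumerate(top_sites(scores, len(scores)))}


def t1(sub_t1, ref_t2, n):
    """Sum of reference-SPM values in t2 over the subject SPM's n critical sites in t1."""
    return sum(ref_t2[s] for s in top_sites(sub_t1, n))


def t2(sub_t1, ref_t2, n):
    """Number of 'truly critical' sites (reference top n in t2) also flagged by the subject in t1."""
    return len(set(top_sites(sub_t1, n)) & set(top_sites(ref_t2, n)))


def t3(sub_t1, ref_t2, n):
    """Sum of |subject rank in t1 - reference rank in t2| over the subject's n critical sites."""
    r_sub, r_ref = rank_positions(sub_t1), rank_positions(ref_t2)
    return sum(abs(r_sub[s] - r_ref[s]) for s in top_sites(sub_t1, n))


def total_score(results: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """Hypothetical min-max total score; T1 and T2 are higher-is-better, T3 lower-is-better."""
    def norm(values, v, higher_better):
        lo, hi = min(values), max(values)
        if hi == lo:
            return 1.0
        return (v - lo) / (hi - lo) if higher_better else (hi - v) / (hi - lo)

    cols = list(zip(*results.values()))
    flags = (True, True, False)
    return {m: 100.0 * sum(norm(cols[j], r[j], flags[j]) for j in range(3)) / 3.0
            for m, r in results.items()}


ref_t2 = {"a": 5, "b": 1, "c": 4, "d": 2}  # reference SPM in t2
sub_t1 = {"a": 3, "b": 4, "c": 1, "d": 2}  # subject SPM in t1
print(t1(sub_t1, ref_t2, 2), t2(sub_t1, ref_t2, 2), t3(sub_t1, ref_t2, 2))  # 6 1 4
print(total_score({"A": (10, 5, 20), "B": (6, 3, 40)}))  # {'A': 100.0, 'B': 0.0}
```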

Results and Discussions
Table 2 presents a descriptive analysis of the variables used to estimate the performance measures.
The proposed method was applied to a sample of signalized intersections using the years 2009, 2010, and 2011. The reference population was defined as four-leg signalized intersections with traffic lights implemented before July 2008, orthogonal approaches (skew angle between 75 and 90 degrees), and more than 50 m spacing between intersections.
Applying these criteria to define the reference population, a total of 106 signalized intersections were selected.

Comparative Analysis Based on the Rank Positions.
Table 3 presents the rank RMSE for the six performance measures, with EEB as the benchmark. The rank RMSE was estimated for blocks of 10 intersections each, up to the 40th-ranked intersection.
The SPF measure yields the lowest RMSE in the top ten (Block 1) as well as in the other blocks. LOSS has the second-lowest RMSE in most blocks; in Block 1 it is third, only slightly behind CR. It is worth noting that, despite its more elaborate formulation, the EB performance measure presented the highest RMSE in all blocks. This is probably due to the influence of traffic exposure when comparing different sites.
The second test, also based on differences between rank positions (DRP), is presented in Figures 3 and 4. The y-axis shows the difference between the ranks and the x-axis shows the thirty locations ranked as most critical by EEB; for example, 1 on the x-axis corresponds to the most critical site according to the EEB, and 2 to the second most critical location in the EEB ranking list.
Figures 3 and 4 show a general trend of rank underestimation for the vast majority of performance measures when compared with the EEB; that is, there is a clear tendency to produce false negatives. This is especially significant for EB, EPDO, and ACF. Another important finding, at least for this sample, is that rank deviations relative to EEB tend to become larger after the 10th-ranked intersection.
As in the analysis of Table 3, the EB again differs considerably from the EEB. This may be because EB estimates crash frequency by weighting together the observed crashes and those predicted by the SPF, so that the predictive variables included in the SPF, such as AADT and number of lanes, tend to prioritize sites with higher exposure at the expense of sites with relatively lower traffic flow. In Figure 3, for example, the valleys at sites 5 and 7 correspond to top positions for EEB: although their characteristics predict a low crash frequency for the year under analysis, their EB-estimated frequency was relatively high, and thus their potential for improvement is greater than that of the other sites.
The ACF, EPDO, and EB curves follow the same peaks and valleys because high ACF values resulted in high EPDO and EB values. EPDO applies severity-based weighting factors to the crash frequency, so the more crashes, the higher the EPDO value. An excess of crashes of a specific severity type could make the EPDO rank differ from the ACF rank, as occurred at sites 12 and 17. As for the EB, in this sample most of the sites with a high observed crash frequency also had a high number of lanes and high volumes, so greater ACF values resulted in higher EB values.
CR shows little variation in the top positions, with a more marked deviation only at the 13th position. CR may prioritize sites with low volume, as can be seen at position 19. For the Level of Service of Safety (LOSS), the degree of deviation from the predicted average crash frequency is divided into four classes. Thus, several locations share the same rank position, as can be seen for sites 12 to 26.
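The tie effect can be illustrated with a LOSS classifier. This sketch assumes HSM-style thresholds at the predicted frequency plus or minus 1.5 standard deviations, with σ = sqrt(k·N_predicted²), where k is the SPF overdispersion parameter; these details should be checked against the HSM before use. Sites in the same level receive the same (competition) rank. Names and numbers are hypothetical.

```python
import math
from typing import Dict


def loss_level(observed: float, predicted: float, k: float) -> int:
    """LOSS level 1-4 (I-IV) from the deviation of observed vs. SPF-predicted crashes.

    Assumes sigma = sqrt(k * predicted**2) and limits at predicted +/- 1.5 * sigma.
    """
    sigma = math.sqrt(k * predicted ** 2)
    if observed < predicted - 1.5 * sigma:
        return 1
    if observed < predicted:
        return 2
    if observed < predicted + 1.5 * sigma:
        return 3
    return 4


def rank_by_level(levels: Dict[str, int]) -> Dict[str, int]:
    """Competition ranking: sites sharing a LOSS level share a rank (1 = most critical)."""
    return {s: 1 + sum(1 for v in levels.values() if v > lv) for s, lv in levels.items()}


# Hypothetical sites, all with 4 predicted crashes and k = 0.25 (so sigma = 2).
obs = {"A": 0, "B": 3, "C": 5, "D": 6, "E": 8}
levels = {s: loss_level(n, predicted=4.0, k=0.25) for s, n in obs.items()}
print(levels)                 # {'A': 1, 'B': 2, 'C': 3, 'D': 3, 'E': 4}
print(rank_by_level(levels))  # {'A': 5, 'B': 4, 'C': 2, 'D': 2, 'E': 1}
```

Sites C and D differ in observed crashes but share rank 2, which is the loss of resolution noted above.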
The SPF measure performed best relative to the EEB because both measures are based on an excess: the excess predicted average crash frequency using SPFs is the difference between the observed and predicted crash frequencies, while the EEB is the difference between the EB-adjusted expected and the predicted crash frequencies.
The third test for rank comparison is presented in Table 4. This table shows the first 20 sites ranked by EEB, with check marks for the sites that would also have been included in the list of the 20 most dangerous sites by the other measures. The results indicate that, regardless of rank differences, the list of the 20 most dangerous sites would match for EB and EPDO. For CR, 16 out of 20 sites would match. The worst performance was observed for EB, with only 10 out of 20 sites. Owing to its formulation, the EEB performance indicator can take negative values for specific sites. This means that the expected crash frequency for the site is lower than the predicted frequency for sites with similar characteristics; however, this does not preclude the site from being critical. To simplify the interpretation of the results, the absolute value of the lowest EEB estimate was added to all values to avoid negative numbers. This linear transformation does not alter the ranking.
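Both operations in this paragraph, counting how many of the EEB top-20 sites another SPM also flags and shifting the EEB values so none is negative, are easy to check in code. Adding the same constant to every value preserves the ordering, so the ranking is unchanged. The helper names and toy values are hypothetical.

```python
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """The k highest-scoring sites (ties broken by site id)."""
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]


def matches_in_top_k(ref: Dict[str, float], sub: Dict[str, float], k: int = 20) -> int:
    """How many of the reference SPM's top-k sites also appear in the subject SPM's top k."""
    return len(set(top_k(ref, k)) & set(top_k(sub, k)))


def shift_nonnegative(scores: Dict[str, float]) -> Dict[str, float]:
    """Add |min| to every value when the minimum is negative; the order is preserved."""
    lowest = min(scores.values())
    offset = -lowest if lowest < 0 else 0.0
    return {s: v + offset for s, v in scores.items()}


eeb = {"a": -1.5, "b": 2.0, "c": 0.5}
shifted = shift_nonnegative(eeb)
print(shifted)                             # {'a': 0.0, 'b': 3.5, 'c': 2.0}
print(top_k(eeb, 3) == top_k(shifted, 3))  # True
print(matches_in_top_k({"a": 3, "b": 2, "c": 1}, {"a": 1, "b": 3, "c": 2}, k=2))  # 1
```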

Temporal Consistency Tests.
After the three tests were computed, the total score test (TST) was obtained for the three pairs of periods (Table 5). As discussed previously, the TST has the advantage of being a single indicator, which facilitates the analysis: a measure with the best performance in all three tests (T1, T2, and T3) scores 100. According to the TST, the SPF and EEB performance measures had the best results, as expected, since both are based on the excess crash frequency. The LOSS performance measure also achieved high TST values; however, as in the previous analysis, the temporal consistency tests are based on differences between ranking lists, and LOSS is not a continuous indicator (it has only four levels), which makes such comparisons less straightforward.
When comparing ACF and CR, CR performed consistently better than ACF. Normalizing by AADT in the CR formulation did not appear to reduce its ability to identify sites with promise. This may be due to a possibly linear relationship between traffic flow and crashes within the range of traffic flow in the sample. Despite being subject to random fluctuations, ACF performed better than EB for the sample under study. The reasons for the performance of EPDO and EB are similar to those discussed in the comparative analysis based on the ranking lists.

Concluding Remarks
The choice of method for identifying sites with potential for safety improvements is a major factor in the effectiveness of the road safety management process. The Highway Safety Manual provides a set of 13 performance measures that can be applied individually or in combination in the network screening process. HSM SPM require different data acquisition and modeling efforts from the safety analyst, which may not be feasible, especially in developing-country jurisdictions. This paper focused on providing a quantitative comparison of site rankings among the main SPM suggested by the HSM when applied to a set of signalized intersections in an urban area in Brazil. The reference population (106 signalized four-leg intersections) was selected according to crash data availability as well as traffic flow and geometric attributes provided by the municipal traffic department (AMC). To evaluate the performance measures proposed by the HSM in the Brazilian context, seven SPM were compared using the EEB as a benchmark. Sites were ranked according to each SPM, and these rankings were compared with the EEB ranking using the root mean square error. The individual stability of each SPM relative to the EEB was evaluated graphically using a comparative rank plot showing the rank difference between EEB and the specific SPM for the reference population. A final test checked the consistency of the 20 most dangerous intersections according to the EEB across the other SPM.
The SPF measure presented the best performance, followed by LOSS and CR. The LOSS ranking is based on a standard deviation that requires a calibrated SPF. In addition, many sites share the same rank position because there are only four Levels of Service of Safety, which may impair the prioritization of critical locations. CR performed well throughout the analysis, but in the Brazilian context this SPM may prove more difficult to apply because it requires traffic flow data.
The measures with the worst performance were EB and EPDO. Despite being a robust performance measure, EB performed poorly when compared with EEB. EPDO may overemphasize locations with a low frequency of severe crashes, depending on the weighting factors used.
Furthermore, for the sample used in this exercise, the results of the three evaluations suggest that simple measures such as ACF and CR were not outperformed by more comprehensive measures such as EB and EPDO. This finding should be interpreted with caution since, as is well known in the literature, ACF and CR do not appropriately account for the rare and stochastic nature of crashes. It appears that effects such as RTM bias were not apparent in the data because of the sampling procedure (none of these sites was selected to receive a traffic signal or other intervention because of a high crash record in a previous year). To broaden our understanding of SPM and of the network screening process applied to the Brazilian road environment, it is recommended that the proposed methodology be applied to other jurisdictions, with appropriate adjustments and a longer observation period.

Figure 1: Flowchart for comparative analysis of safety performance measures for hotspot identification.

Figure 2: Example atypical intersection removed from the sample.

Figure 4: Difference between the EEB ranks and the ACF, EB, and EPDO ranks.
The temporal consistency analysis was performed for three combinations of periods: (1) 2009 and 2010, (2) 2010 and 2011, and (3) 2009 and 2011. The truly critical sites were defined as the 30 locations in the second period with the highest expected crash frequency according to the EB. The total score test was estimated for 30 critical locations according to the three individual tests. The reference performance indicator applied was the Excess Expected Average Crash Frequency with EB Adjustment (EEB).

Table 1: HSM and SPM used in Brazilian safety manuals.

Table 2: Descriptive statistics of variables.

Table 3: Root mean square error of performance measures.
Figure 3: Difference between the EEB ranks and the CR, LOSS, and SPF ranks.

Table 4: Critical sites for the EEB performance measure.

Table 5: Total score test results for the three time periods.