Predicting Work Zone Collision Probabilities via Clustering: Application in Optimal Deployment of Highway Response Teams

This paper proposes a clustering approach to predict the probability of a collision occurring in the proximity of planned road maintenance operations (i.e., work zones). The proposed method is applied to over 54,000 short-term work zones in the state of Maryland and demonstrates an ability to predict work zone collision probabilities. One of the key applications of this work is using the predicted probabilities at the operational level to help allocate highway response teams. To this end, a two-stage stochastic program is used to locate response vehicles on the Maryland highway network in order to minimize expected response times.


Introduction
Work zone collisions accounted for approximately 1.2 to 1.7 percent of all 2006-2013 crashes in the United States, amounting to 67,523 incidents in 2013 [1]. These collisions resulted in numerous injuries (e.g., over 47,000 in 2013), as well as fatalities that occurred in about 1 percent of the cases. Mohan and Gautam [2] performed a thorough analysis of work zone accidents in the US between 1995 and 1997 and estimated the direct costs resulting from worker and passenger injuries to be approximately $6.2 billion, which composes only part of the overall cost. Although this value may be somewhat lower now given that annual work zone collisions have mirrored the gradual downward trend of overall collisions [1], there are still enormous safety and financial benefits that can be achieved by avoiding or lessening the severity of such accidents. Here we consider two ways in which transportation agencies may mitigate the impact of work zone collisions: (i) Optimally allocating response teams. Transportation agencies could allocate response vehicles in accordance with anticipated risks at work zones and thereby reduce the time it takes to reach the scene of an accident. This would improve their ability to assist injured motorists, clear blocked traffic lanes, and help prevent additional secondary accidents.
(ii) Adjusting work zone characteristics. Transportation agencies could identify risky work zone projects in the planning phase and adjust work zone characteristics in order to improve their safety and operational performance.
Both of these applications depend on the ability to quantify how work zone characteristics affect collision occurrence risk in work zones, which is precisely the focus of this paper. We briefly describe how this fits into the broader context of work zone research and focus in particular on two types of approaches which may be employed to model this relationship: statistical modeling and machine learning. Existing work zone research tends to be oriented towards mobility or safety applications. The mobilityoriented research generally focuses on the effect of work 2 Journal of Advanced Transportation zones on traffic capacity, queuing delays, and other performance metrics [3][4][5][6][7][8], as well as optimizing controllable work zone parameters [9][10][11][12][13][14]. Research focused on safety commonly describes how work zones affect collision frequency [15][16][17][18], severity [19][20][21][22][23], or both [24][25][26][27][28]. In particular, this paper examines how work zone characteristics (e.g., duration, time of day, and number of lanes) impact the likelihood of work zone collisions occurring. Here we review the two prevailing methodologies used to quantify this relationship, considering each approach separately.
(i) Statistical Modeling. The majority of literature in this area uses regression techniques to model the relationship between explanatory variables describing work zones and the frequency with which incidents occur, employing traditional maximum likelihood and Bayesian methods to estimate the parameters. However, least squares regression is not ideal for representing the relationship between work zone characteristics and crash frequency, because it assumes normally distributed errors and the dependent variable (i.e., number of crashes) is not continuous [29].
While isolated examples of linear regression models exist in the literature (e.g., [30]), researchers have found that discrete models with Poisson-distributed responses are better suited for modeling crash frequency. Accordingly, the Poisson and related models (e.g., Negative Binomial (NB), Poisson-lognormal, and zero-inflated Poisson and NB) are more prevalent in the literature [29]. The Poisson model's primary deficiency is that it has poor estimation performance when the variance is greater than the mean (i.e., overdispersion), which the NB model addresses by specifying a gamma distribution for the error term [29]. There are numerous applications of the NB model [31][32][33][34][35][36], focusing on different aspects of crash frequency modeling (e.g., location within a work zone, time of day, duration, fatal/injury/property damage crash classifications). For example, Venugopal and Tarko [32] estimate work zone crash frequency based on traffic levels, work zone lengths, work duration, on/off ramps, and classification of work zones, observing that there is not a statistically significant difference in crash rate for day and nighttime work zones. From a slightly different perspective, Srinivasan et al. [33] use the NB models to predict crash modification factors, while Yang et al. [37] incorporate concepts of measurement error in the NB modeling framework. However, NB models are susceptible to underdispersion for small datasets, which Poisson-lognormal models seek to counter [24,25,38]. Finally, zero-inflated models attempt to address the fact that work zone datasets do not contain many incidents (i.e., the response variable primarily takes the value of zero), which can cause overdispersion. The zero-inflated approach generally uses two states: one where accidents can occur and the other where they cannot. Binary choice models (e.g., binary probit or logit) are often used to determine which system the state is in, and Poisson and NB approaches are used to model crash frequency for the state in which accidents are allowed [39][40][41][42][43].
Another class of advanced statistical techniques includes random parameter [44][45][46][47], generalized additive [48,49], Markov switching [50][51][52], and hierarchical models [53,54]. We do not spend extensive time discussing these methods here, primarily because existing literature in this area focuses on nonwork zone crash frequency modeling applications, which is not the focus of this paper.
The models described thus far are primarily useful for predicting the number of crashes that will occur in a work zone over a period of time. As Yang, Ozbay, Xie, and Bartin [55] point out that sometimes it is useful to frame the problem in terms of crash probability (also referred to as crash risk) instead, where the goal is to predict the probability that a work zone will encounter a collision based on its characteristics. In this case, the response variable is binary (i.e., accident or no accident) rather than a positive integer (i.e., number of collisions). Logistic regression is a popular approach for handling binary responses, examples of which include Al-Ghamdi [56], Harb et al. [57], and Bham et al. [58]. However, when modeling crash risk for short work zone durations, the estimates are often biased because there are rarely crashes during the specified time period [55]. Being able to estimate crash probabilities for short durations is one of the core objectives of this paper and has not been studied extensively in the literature. To the best of our knowledge, the only paper which addresses this problem for short durations is Yang, Ozbay, Xie, and Bartin [55], which implements a rare event logistic regression model to account for the biases. They illustrate this approach by considering 466 work zones along a 25mile section of the New Jersey Turnpike.
(ii) Machine Learning. An alternative approach to model relationships between work zone characteristics and incident risks is through machine learning techniques, including neural and Bayesian neural networks, support vector machines, and clustering. However, existing work zone research employing these techniques tends to focus on characterizing work zone capacity, delays, incident duration, and other metrics rather than quantifying crash risks (e.g., [59][60][61][62][63][64]). For example, Jiang and Adeli [61] use neural network models to assess work zone capacity, concluding that they can estimate the WZ capacity with less than 10% error with this approach. In terms of work zone safety, the literature deals primarily with crash counts (not crash probability) for generic roadway incidents that do not consider the specific characteristics of work zones. Examples of this are Xie et al. [65] and Abdulhafedh [66], both of which compare neural network approaches to traditional NB models, noting that the neural networks show superior predictive capability in their case studies.
Li et al. [67] perform a similar comparison using support vector machines (SVM) and find that the SVM approach better estimates crash frequency in comparison to an NB model. They also point out that although neural networks and SVMs provide similar prediction capabilities, the SVM approach does not overfit the data. From a clustering approach, Ma and Kockelman [20] group road segments with similar characteristics and use a linear regression model to estimate crash frequency within each cluster. Other clustering applications include roadway crashes [68,69] and safety projects [70], but to our knowledge current literature does not include papers which cluster work zones based their characteristics in order to infer crash risk.
Despite many different modeling approaches and perspectives, the majority of literature focuses on either work zones or crash frequency independently, with comparatively little research dedicated to modeling collision frequency/risk based on work zone characteristics. Within this niche body of research, most papers conclude that increased vehicle demand, work zone length, and duration all increase the number of collisions that occur at a work zone over a period of time [71]. However, we are more interested in determining the probability of a collision rather than the number of collisions that occur. This poses problems when considering short time durations with classical statistical approaches due to the fact that accidents occur so infrequently. The one paper that addresses this issue (i.e., the rare event logistic regression approach by Yang, Ozbay, Xie, and Bartin [55]) showed good performance in a case study, although the model was tested on a single corridor with less than 500 work zones. While machine learning models appear promising based on closely related studies, none of them have been applied specifically to estimating collision probabilities based on work zone characteristics. Accordingly, we focus on this underresearched area and present a clustering approach to estimate the probabilities of work zone collisions, which is useful for determining where to best locate response teams and identifying work zone characteristics that can be modified to reduce collision probabilities. We make the following contributions: (i) We propose a scalable, unsupervised learning approach to predict the probability of a collision occurring at a short-term work zone. In contrast to most classical regression approaches which do not explicitly calculate crash risk/probability and are not appropriate for short-term durations, this machine learning approach clusters all work zones based on salient characteristics, calculates a collision probability for each cluster, and assigns new work zones to existing ones with similar features. Furthermore, it yields collision predictions without having access to current traffic volumes or work zone lengths and would easily scale to larger work zone datasets (e.g., see [72] for a clustering application involving 1 billion data points).
(ii) We present an integrated approach which combines both work zone collision risk predictions and actionable response recommendations, which is intuitive and applicable for practitioners. This approach takes historic information about work zones and returns the optimal allocation of highway response teams, which could be readily implemented by an agency such as the Coordinated Highways Action Response Team [73], an example of which is illustrated in the case study using the previously described approach and a stochastic optimization model.
The remainder of this paper is organized in the following manner. We begin by discussing the historical work zone and collision dataset that is used for model development in subsequent sections. Next, we describe clustering methods and explain how we determine the number of clusters and quantify model performance. We then apply this clustering framework to the work zone data set, noting how different approaches affect the model performance. Afterwards, we discuss example applications, focusing on optimally locating highway response vehicles by using the work zone incident probabilities as an input to a stochastic optimization model. Finally, we draw conclusions and suggest future steps to extend the research.

Data.
Data describing work zones (WZs) and collisions that take place in their proximity (i.e., within 1 mile distance) were collected from the Regional Integrated Transportation Information System [77] and Coordinated Highways Action Response Team [73], whose joint objective is to improve operations of Maryland's highway system. We focus on those WZs that were set up and cleared in less than one day, which account for about 99% of all the WZs observed during 2010-2015. The resulting dataset considered in our analysis includes 54,463 WZs ( Figure 1) and 380 WZ collisions. In most cases, a WZ had one or two lanes closed, and those closures took place along either the main or shoulder lanes ( Figure 2). Furthermore, maintenance work lasted for 6 hours on average ( Figure 3). In order to minimize negative effects on mobility and safety, only about 7% of the maintenance work was conducted during the peak-periods (7-9 AM and 4-6 PM). It is worth noting that about 72% of the work was done in daylight ( Figure 3) and that WZs were typically set up on weekdays ( Figure 4) The observed 380 WZ collisions occurred at a very small subset of all the recorded WZs. There were hardly any WZs with more than one collision; however, a few observed as many as three collisions ( Figure 5). The highest number of WZ collisions took place on Thursdays, while far fewer occurred on Saturdays and Sundays ( Figure 6), which is unsurprising due to lower maintenance activity over the weekends (Figure 4). The seasonal distribution of WZ collisions ( Figure 6) implies that most took place in summer and fall, which is expected due to higher maintenance activities during these seasons ( Figure 4). Additionally, Figure 7 shows a visual representation of Average Annual  Daily Traffic (AADT) on the Maryland road network in 2013, indicating that a wide range of AADT values are observed, and highlighting the spatial distribution of high-volume roads. Based on this descriptive analysis of available data, we selected twelve input variables ( Table 1) that might affect safety levels in the proximity of a WZ and thereby influence occurrence of a collision. The correlation between a subset of these variables and the number of WZ collisions is provided in Figure 8 and indicates relatively weak correlation between the WZ collisions and other variables.

Data
Processing. Data preprocessing can significantly improve performance of the clustering algorithm presented in the following section. Categorical variables were modeled using their binary counterparts, and the peak and day duration of a WZ were divided by its overall duration because we are interested in predicting the one-hour probability of a collision. The off-peak and night durations were removed from the model as redundant information (e.g., relative peak and relative off-peak add up to one), and all the variables were normalized in order to enhance performance of the clustering approach presented in the next section. The WZ features used for clustering are presented in Table 2.
Note that WZ lengths were not available for the 54,000 WZs considered in this study, but since WZ lengths have been shown to influence crash frequency [71], we expect that including this variable in the future would improve the model accuracy. Additionally, a variety of other temporal variables would likely enhance model performance (e.g., speed variance and weather), but these characteristics would be unavailable ahead of time, and the proposed approach focuses on predicting the crash probability at a future work zone. Along these lines, it may be worth considering using hourly volume profiles [78] instead of AADT, as it would allow the model to account for the average hourly volumes during the specific work hours (e.g., 7 AM to 9 AM). Of course, like the AADT, the volume profiles would only help give a rough idea about the future volumes, because these volumes would be affected by the presence of the work zone.       Figure 9: Graphical description of the methodology via a trivial example including 2 dimensions, 3 clusters, and 41 data points (i.e., WZs). After clustering historical WZs, for each cluster, we compute the number of WZs with at least one collision over the total number of WZs within the cluster. A newly scheduled WZ is attributed to the cluster with the most similar features. The predicted probability of a collision occurring at this WZ corresponds to the ratio computed for the cluster it was attributed to. a future WZ (e.g., road maintenance work scheduled for the next week), based on the underlying assumption that WZs with similar features should have comparable safety levels. Accordingly, we take the set of 54,000 WZs observed in the past (Section 2.1) and partition it into clusters of WZs with similar features (Table 2). After grouping WZs with similar characteristics, we calculate the collision probability for each cluster by counting the number of WZs where at least one collision occurred and dividing it by the total number of WZs within that cluster. Thus, the probability of a collision occurring at a future WZ can be predicted by assigning it to the closest cluster (in terms of its features) and looking up that cluster' observed collision probability, a process that is illustrated in Figure 9. The relation between predicted overall collision probability denoted by and onehour predicted collision probability delivered by the model (i.e., ℎ ) is computed as where is duration of the observed WZ. The historical data can be partitioned into clusters of WZs with similar features via classical -means clustering [79,80]. Given a set of observations (x 1 , . . . , x ), where observation x is a -dimensional vector of variables describing th WZ, the -means clustering seeks to partition these observations into sets S = { 1 , . . . , } so as to minimize the within-cluster sum of squares. More formally, WZs are clustered by solving an optimization problem arg min where is the mean of points in . Since finding an exact solution to the above problem is NP-hard, many efficient heuristic techniques have been devised over the past 50 years. We apply -means++ algorithm [81], which employs a randomized seeding technique to select initial cluster centers in a way that reduces the chance of the algorithm getting stuck in a suboptimal solution (Algorithm 1). However, even the -means++ algorithm is not guaranteed to find an optimal solution in instances that include many dimensions and data points. To alleviate this problem, we perform clustering 100 times using different initial cluster centers selected by the -means++ technique and choose the one with the smallest cost function for further analysis.
To evaluate the model, the dataset is first separated into training and testing datasets. Upon selecting the number of clusters (discussed below), the -means++ algorithm is used to cluster the training data into discrete groups. Each point in the test dataset is then associated with its most similar cluster from the training dataset in order to compute the Procedure -means++ Select centers for clusters by using randomized seeding technique repeat Assign each point to the closest cluster center Compute new cluster centers using assigned points until Convergence (cluster centers did not change during the last step) Algorithm 1: Outline of the -means++.
forecasted probability of a collision. These points are then ordered from lowest to highest probability (i.e., decreasing in safety) and separated into a specified number of groups, referred to as quantiles. For quantile , the mean of the forecasted probability can be calculated (denoted ), which can be compared to the actual probability of collision (found by computing the ratio of work zones from quantile which had at least one collision to the number of work zones in , denoted ). More specifically, the symmetric mean absolute percentage error ( ) is used to quantify the error between the actual and forecasted probabilities across all quantiles, which is computed as wherẽis the number of quantiles.
In order to determine the number of clusters to use, we employ three different methods: Elbow, Silhouette, and Cross-Validation. In the Elbow method, the number of clusters is increased until the improvement in the objective function becomes marginal. The Silhouette method [82] requires computing each point's similarity to both its own cluster and the others. This measure is computed as where represents the average dissimilarity of data point with all other data within the same cluster, and is the lowest average dissimilarity of the data point to any cluster of which is not a member. A value of close to 1 implies that the point was appropriately clustered; values close to 0 mean that the points are on the border of two clusters, whereas negative values imply misclustering. The third approach is based on cross-validation, where the training dataset is divided into training and cross-validation sets. The model is then trained using the new, reduced training dataset, and evaluated by using the cross-validation dataset. After selecting the number of clusters, the model is retrained using the entire (undivided) training set.
Additionally, these three methods are helpful in determining the relevant model scenarios. In the base scenario, we assume that all the variables are of equal importance and thus are normalized from 0 to 1. To explore the influence of individual variables, we inflate their normalized values and study how this affects the clustering results. Thus, the Elbow, Silhouette, and Cross-Validation methods help provide insight into the number of clusters and appropriate variables to use.

Results and Discussion
The proposed clustering approach may overfit the data (i.e., memorize data points rather than detect patterns). In order to check for possible overfitting, it is helpful to train models with various sizes of training data and then check the behavior of the test error. When overfitting occurs, the error should significantly decrease as the training dataset size increases, but if the model is able to generalize well (i.e., does not overfit), further changes in training dataset size should not result in a meaningful reduction of test error. The results for the base scenario are presented in Figure 10 and imply that the model does not overfit the data for training sets greater than 50% of the entire dataset. Thus, the employed clustering approach may be used to predict WZ collision probabilities.

Number of Clusters.
Selecting the number of clusters is essential for obtaining satisfactory results. After some preliminary experiments, the lower and upper bound on the number of clusters were set to 8 and 21, respectively. It should be noted that if a model includes few clusters, the WZs in each cluster may be quite diverse. This results in relatively small differences among mean collision probabilities in each cluster. Consequently, even if the accuracy of clustering is very high, the model cannot be employed for predictions. On the other hand, having many clusters should (in theory) improve the prediction accuracy. However, if some clusters do not include enough data points with collisions (due to relatively few collisions in the entire dataset), then the estimated collision probabilities may be inaccurate for these clusters. Therefore the upper bound on the number of clusters should correspond to the size of the training set and the number of collisions in the entire dataset.
As argued before, the authors considered three methods to select the number of clusters: Elbow, Silhouette, and Cross-Validation. The accuracy of each was estimated using a relative difference between the error of the model indicated by the method and error of the best model selected using a test set. The results for each method were computed for different scenarios and for different training dataset sizes (due to the overfitting analysis only the results for training dataset sizes greater or equal of 50% of entire dataset size were taken into consideration), which are summarized in Table 3.  The results indicate that the Silhouette method has the best accuracy and smallest standard deviation amongst the three measures, meaning that it is less prone to changes of scenario or training dataset size and, consequently, more reliable. Accordingly, the Silhouette method was chosen to determine the number of clusters.

Choosing a Scenario.
The base scenario assumes that all the features are equal, but in actuality some may have a larger impact on the predicted values and others. In order to verify this, different scenarios were created and tested. In each scenario some features are more (or less) important than others, and the proximity in dimensions corresponding to these features has a greater impact on the attribution of WZs to certain clusters. A list of tested scenarios is presented in Table 4, while errors associated with the performance of the best model in each scenario are shown in Figure 11. Scenarios 1-7 imply that AADT can deteriorate model accuracy. Specifically, increasing the importance of AADT inflates errors regardless of the data size. Conversely, scenarios 4, 6, and 7, which correspond to lanes, day/night, and peak/off-peak, are characterized by relatively smaller errors. Scenarios 8 and 9 and to some extent 11 show that AADT has no significant impact on model accuracy. In fact, reducing the significance of AADT or completely removing it does not affect the accuracy. This is unsurprising because AADT represents an average annual measure, which may be a poor indicator of traffic volumes during particular hours of road maintenance work. Moreover, scenarios 10-12 indicate that WZ hours of operations are highly relevant, as models that emphasize day/night and peak/off-peak values provide more accurate predictions. Finally, scenarios 13 and 14 show that the number of lanes, day/night, and peak/off-peak data have a crucial impact on prediction accuracy. However, performance of all these models is comparable, so we use the Silhouette method to select both the number of clusters and the best scenario.

The Best Model.
The error for the model whose specifications are determined using the Silhouette method for 3 quantiles is 2.95%, indicating a high level of model accuracy. The accuracy for each quantile is shown in Figure 12(a), while the errors based on the number of quantiles for the selected model are shown in Figure 12(b). While the number of quantiles (the resolution of the model) is increasing, the errors also increase, meaning accuracy of the model is reduced. It is worth noticing that for 4, 5, 6 and 7 quantiles the errors are almost constant (from 9.84% to 10.88% with 10.5% mean), thus implying that if 11% error is acceptable, it is possible to increase the resolution of the model up to 7 quantiles. When the resolution of the model increases above 7 quantiles, the error grows to an unacceptable level as a result of an insufficient amount of training data. Since the predicted probability of collision in each quantile is computed by dividing the number of work zones with a collision occurrence by the total number of WZs in the quantile, when there are not enough WZs in a quantile, even a single collision can significantly change predictions. Base scenario, all features equally important 1 Importance of seasons increased 2 Importance of road type increased 3 Importance of weekday/weekend increased 4 Importance of number of lanes increased 5 Importance of AADT increased 6 Importance of peak/off-peak increased 7 Importance of day/night increased 8 Importance of AADT decreased 9 AADT data removed 10 Importance of peak/off-peak and day/night increased 11 Importance of peak/off-peak and day/night increased, AADT data removed 12 Uses only peak/off-peak and day/night data 13 Importance of number of lanes, peak/off-peak and day/night increased 14 Uses only number of lanes, peak/off-peak and day/night data Accordingly, it is possible to increase the accuracy of the model for higher resolutions by increasing the size of the training dataset.

Example Application.
Using the previously proposed clustering approach to predict the probability of a collision occurring within a WZ, we now provide an illustrative application of the proposed model. This hypothetical case study pertains to the jurisdiction of the Coordinated Highways Action Response Team, whose objective is to improve operations of Maryland's highway system. Suppose that this agency has a list of planned maintenance work for the following day and is interested in deploying a fixed number of response units to tackle collisions that may happen within these WZs. Clearly, we can assign these WZs to the clusters derived in the previous section and consequently estimate collision probability for each of the WZs scheduled for the following day ( Figure 9). Once these probabilities are computed, the allocation of response units becomes a stochastic facility location problem, which has been tackled extensively in the operations research literature [83,84]. In order to solve this response team allocation problem based on collision probabilities obtained from clustering, we formulate a two-stage stochastic program [85], the details of which are described in the Appendix.
In this illustrative example we randomly sample the 40 WZs that were used to test the clustering methodology and pretend they represent the maintenance work scheduled for the following day. Consequently, we assign the 40 WZs to clusters built based on historical WZs (2010-2015), in order to compute the collision probabilities associated with each of the 40 newly-scheduled WZs (Figure 13(a)). Finally, using the optimization model provided in the Appendix, we determine the optimal location of the response units on the Maryland highway network for the observed time period. Figure 13(b) visualizes the optimal allocation when 5 response units are available, and additional allocations are provided in the Appendix (Figure 14) as the number of available response vehicles is perturbed from this value. The optimal allocation can be updated as the situation in the field evolves (e.g., one WZ is cleared and another on is set up), by simply reapplying the model given a new set of inputs (i.e., new configuration of WZs and corresponding collision probabilities). Alternatively, the proposed optimization model can be extended into a multiperiod stochastic program [85], which would better address dynamic relocation of response vehicles. This extended model would also require the collision probabilities estimated in this paper as inputs.
In addition to optimal allocation of highway response units, the proposed clustering method can be used to determine or adjust WZ parameters. Specifically, the presented model could help modify WZ parameters (e.g., lanes closed, day/night, and peak/off-peak duration) in order to meet certain safety levels (e.g., keep the collision probability below a specified threshold). For example, the easternmost WZ in Figure 13(b) has higher collision probability than others, so its features may be modified to improve its safety level. A government agency might also introduce regulations to keep accident probabilities below certain level or provide financial incentives for those contractors who perform roadwork while maintaining high safety levels.

Conclusions
This paper proposed a clustering approach to predict the probability of a collision occurring in the proximity of planned maintenance work, which is important for allocation of highway response units. We presented the first application of clustering in the analysis of WZ collisions, which involved a large dataset of over 54,000 WZs in Maryland. The model showed good prediction accuracy, and its potential application was illustrated by optimally allocating response units in the Maryland highway network. Namely, collision probabilities determined via clustering were used as inputs to a two-stage stochastic program to optimally deploy highway response teams. Additionally, the proposed clustering approach can be used to adjust features of WZs to meet specified safety levels.
The proposed clustering method has certain limitations corresponding to the number of quantiles, so in some cases it may be used for classification of WZs rather than to predict the exact collision probabilities. Including more data would allow for additional quantiles to be used (i.e., preparation of higher-resolution models) and including additional WZ features may be useful as well. It would also be interesting to test clustering algorithms other than the -means++, such as -medoids or -means and compare their performance. For the allocation of highway response teams, one could consider a multiperiod extension of the stochastic model used in this paper.

Appendix
The problem of allocating highway response units is modeled as a two-stage stochastic integer program, with the goal to locate units in a way that would minimize the expected response time to WZs where collisions occur. In the first stage, we need to decide the location of response units, which may be placed at any of the WZ locations (or in their proximity). In the second stage, we assign these response units to WZs depending on the collision occurrences in each scenario and observe response times. More formally, let denote a set of WZs indexed by and . Let be a random parameter indicating the probability of a collision occurring at WZ ∈ , which was estimated using the proposed clustering approach. A particular realization of the aforementioned random parameters is denoted by ∈ Ω. Define as an integer variable indicating the number of response units located in the proximity of WZ ∈ . We define ( ) ⊆ as a set of WZ locations where collisions occurred in a particular realization of random parameters ∈ Ω. Let ( ) be a binary variable which equals 1 if a response unit located at WZ ∈ is dispatched to WZ ∈ ( ), given a particular realization of random parameters ∈ Ω. Let ( ) be the distance (or travel time) from WZ ∈ to WZ ∈ ( ).
We formulate the first-stage problem as where ( , ( )) is the solution to the following second-stage problem: ( , ( )) = min As argued above, the first-stage problem seeks to allocate teams in order to minimize the expected response time subject to vehicle availability constraint. The response time for a particular realization of random parameters is computed in the second stage. Constraints (A.4) ensure that vehicles are dispatched only from locations where they are available, while (A.5) guarantee that a response team is directed to each WZ collision. To ensure feasibility of the second state, we let ≥ | ( )|, ∀ ∈ Ω, which implies that the number of response teams is greater or equal than the total number of locations at which collisions occur simultaneously. This condition, however, can be relaxed by extending the model to consider priorities in serving collision locations. Finally, the proposed two-stage stochastic model is implemented in GAMS and solved in the extensive form using CPLEX solver. We observe computation times of several seconds for the case study involving 40 WZs, 5 response teams and 1,000 scenarios (i.e., 1,000 realizations of ). The corresponding optimal allocation of response teams is provided in Figure 13(b), with additional allocations shown in Figure 14 for different numbers of available response teams.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure
This research did not receive specific funding, but was performed as part of employment at the Center for Advanced Transportation Technology, University of Maryland.