Research Article Risk Measurement Model for Vehicle Group Based on Temporal and Spatial Similarities

Vehicle rear-end collisions are primarily caused by tight car following in a continuous traffic flow, as well as a driver's incorrect perception of the traffic environment ahead and delayed response. To facilitate the investigation of rear-end collision mechanisms and to measure the risk accurately, the concept of a vehicle group is introduced herein. A risk measurement model for a vehicle group (RMVG) based on temporal and spatial similarities is proposed. First, vehicles are categorized based on their temporal and spatial similarities. Risk measurement metrics are defined based on the traffic composition, movement state, and conflict extent. Subsequently, vehicle group risk identification and risk measurement models based on an isolation forest are established. The rear-end collision risk of the vehicle groups is analyzed both qualitatively and quantitatively. Finally, the RMVG is tested using the vehicle trajectory data set of Longpan South Road, Nanjing City, Jiangsu Province, China, and the results are compared with those of a support vector machine and a local outlier factor. The results show that the accuracy of the RMVG is higher than that of the other models: its accuracy rate and specificity are 95.68% and 88.89%, respectively, whereas its false alarm rate is only 3.47%.


Introduction
Owing to the continuously increasing demand for travel and car ownership, traffic collisions have become more prevalent, thus posing severe safety risks to road users. According to the Global Status Report on Road Safety, road traffic crashes, which are the eighth leading cause of death, caused 1.3 million fatalities worldwide in 2016 [1]. A significant proportion of road traffic crashes involve rear-end collisions. According to the National Highway Traffic Safety Administration, rear-end collisions in the United States constituted 32.5% of all crashes in 2019 [2]. In Shanghai, China, approximately 20% of all road crashes were rear-end collisions; the share reached 49% of collisions on elevated expressways and 67% of collisions in tunnels [3]. Therefore, rear-end collisions must be prevented to improve road traffic safety. However, most existing studies focus on the macroevaluation of traffic flow [4,5] or the microrecognition of individual vehicles [6,7]. These methods cannot simultaneously consider the effects of individual behaviors and of interactions among surrounding vehicles on driving safety. Owing to the significant increase in traffic volume, close car following and the group response of some traffic flows to random interference have become more evident. These factors contribute significantly to vehicle rear-end collisions and pile-ups and cannot be treated as merely microscopic or macroscopic traffic flow phenomena. Therefore, comprehensive investigations must be performed from a new perspective. Herein, a risk measurement model for a vehicle group (RMVG) based on temporal and spatial similarities is proposed. This model considers a vehicle group as the object for the identification and quantification of rear-end collision risk.

Conventional Traffic Safety Analysis Methods.
Traffic safety has been investigated extensively, and the results obtained have been effectively applied to solve practical engineering problems [8,9]. Conventional methods primarily use crash data and statistical techniques for traffic safety analysis. Generally, the crash rate or death rate is combined with possible influencing factors, including speed [10], population [11], traffic volume [12], and land use [13], to analyze traffic safety. For instance, the Smeed model uses regression analysis to combine population, motor vehicle ownership, and crash fatalities to analyze traffic safety [14]. Similarly, Ng used regression analysis to predict traffic collisions and identify high risks [15]. Rabbani established a time-series model to predict collision rates based on seasonality in historical collision data [16]. Dong used a mixed logit model to analyze the effects of traffic flow and road environment on single-vehicle and multivehicle collisions [17].

Traffic Safety Analysis Method Based on Surrogate Safety Measures (SSMs).
Many studies involving traffic safety analysis have been conducted based on historical collision data. However, the use of collision data poses several problems, including difficult access, data scarcity, and a long data acquisition period [8]. Therefore, SSMs were proposed for safety evaluation. Hayward introduced the time-to-collision (TTC) concept [18]. Similarly, Hydén proposed the concept of postencroachment time [19]. Because the TTC concept does not apply to cases with a speed difference of 0 km/h, Balas presented the inverse time-to-collision (TTC⁻¹) concept [20,21]. Tarko investigated the causal relationship between conflicts and collisions, and the results showed that SSMs can be used as a basis for safety evaluation [22,23]. SSMs based on temporal logic are widely used in investigations pertaining to rear-end collisions [24,25] and lane-changing safety [26,27]. Meanwhile, some SSMs are based on distance logic, in which the safety distance is used as a risk evaluation factor. Wu used the stopping sight distance (SSD) to establish a rear-end collision risk index and identified the risk of vehicles in a foggy environment [6]. Hema evaluated road traffic safety by comparing the SSDs of preceding and following vehicles [28].

Modern Traffic Safety Analysis Methods.
Owing to the continuous development of big data technology and machine learning, several researchers have used a significant amount of microtraffic data combined with intelligent learning and SSMs to identify and quantify traffic safety hazards. Zhanyong used the support vector machine (SVM) method based on the maximum classification interval to train and optimize a complex traffic accident black spot model [29]. Torok used a one-class SVM method to detect human-related emergencies during driving; this system can assist self-driving cars in generating risk warnings [30]. Djenouri used the local outlier factor (LOF) algorithm to analyze the effects of events, particular weather conditions, or planning decisions on traffic flow in an urban area [31]. Elassad established a real-time collision-prediction fusion framework that integrates Bayesian learners, k-nearest neighbors, an SVM, and a multilayer perceptron to predict traffic collisions [32]. Shen combined Bayesian deep learning and Gaussian mixture clustering based on SSMs to predict the risk of road traffic collisions [33]. In addition, back propagation neural networks [34,35], generative adversarial networks [36,37], convolutional neural networks [38], XGBoost [39], long short-term memory [40], and random forests [41] have been widely used to measure driving risks.
In summary, traffic safety analysis has received significant attention from researchers and the industry. A series of important results have been obtained through basic theoretical research and technical applications. In existing analysis methods, the data sources used primarily include historical collision data and conflict data based on SSMs. Although methods based on collision data are reliable, the data acquisition process involved is difficult and time-consuming. By contrast, indirect evaluation methods based on SSMs can be used more widely for traffic safety analyses. Generally, these methods exhibit high accuracy, flexibility, and stability. However, they depend significantly on field data to derive the various conflict indicators. The research scope of these methods primarily includes macroanalysis based on traffic flow and microanalysis based on individual vehicles. The analysis methods based on traffic flow provide average results for the entire traffic scenario without considering the effect of individual vehicle behavior on safety. Analyses based on individual vehicles primarily consider the front and rear vehicles and disregard the effects of interactions among surrounding vehicles on driving safety. Therefore, driving safety has also been analyzed from the perspective of vehicle groups [42], focusing on the correlation between the main vehicle and its surrounding vehicles. This correlation considers the preceding and following cars in the same lane as well as other surrounding vehicles. Most existing safety analysis methods employ statistical regression or machine learning to analyze traffic or driving risks [43]. In the absence of collision data, data attributes must be manually annotated when supervised learning methods are used, a process that is highly dependent on experience. By contrast, unsupervised learning methods do not require advance data labeling.

Contributions and Framework.
The contributions of this study are as follows:
(1) A vehicle group categorization rule is proposed to categorize vehicles based on temporal and spatial characteristics, through which continuous vehicles with mutual influence can be separated.
(2) An RMVG is proposed. The RMVG comprehensively reflects individual behaviors and group effects during the driving process. Using this model, the source of risk can be considered more comprehensively while the scope of risk investigation is reduced.
(3) A rear-end collision risk quantification method that considers both the possibility and the severity of collisions is established. Conflict probability and severity represent the risk level in different dimensions, and considering both improves the effectiveness of risk measurement.
The overall framework of the RMVG is presented in Figure 1. The remainder of this paper is organized as follows: Section 3 introduces the data sources used in this study. Section 4 describes the vehicle group categorization method based on temporal and spatial similarities. Section 5 describes a risk evaluation index system and the proposed RMVG. Section 6 presents a validation of the proposed model using a real trajectory data set. Finally, Section 7 concludes the paper.

Data Preparation
The vehicle trajectory data used to analyze the rear-end collision risk of the vehicle groups were provided by the Southeast University Intelligent Traffic System (ITS) laboratory. These data were obtained from an elevated section of the expressway on Longpan South Road, Nanjing, China. The acquisition began at 7:30 a.m. on April 23, 2018 (Monday), and the weather was clear at that time. The study section was 427 m long and oriented east-west; the east and west sections were two-way eight-lane and six-lane roads, respectively, as shown in Figure 2. The data were obtained continuously for 4 min and 15 s using a DJI Mavic 2 drone at an altitude of 310 m with a frame rate of 24 frames per second, yielding 498,266 trajectory data points from 921 vehicles. The ITS researchers extracted the complete vehicle trajectory data from the video and manually verified them to ensure that the public data support complete vehicle identification and tracking. This data set provides trajectory information with a time accuracy of 0.1 s, including the speed, acceleration, lane, and driving distance of each vehicle. The data formats are listed in Table 1. The data set used in this study was smoothed via Kalman filtering to reduce noise.

Vehicle Group Categorization Rule Based on Temporal and Spatial Similarities
Most rear-end collisions are caused by vehicles trailing extremely closely; in such cases, accurate traffic information cannot be obtained in a timely manner. To measure the risk of rear-end collisions more conveniently, a categorization rule is proposed. Vehicles that trail closely and show a group response to random interference factors are categorized into the same vehicle group. Rear-end collisions and pile-ups can be expressed as a process in which the stable, close car-following state of the vehicle group is interrupted by random interference factors. After the categorization, the vehicles in a group are only slightly affected by vehicles outside the group, and the collision risk arises primarily from other vehicles in the same group, as shown in Figure 3. Close car following implies that the following distance within a vehicle group should be less than a critical value. However, the car-following distance alone cannot completely characterize the mutual influence between vehicles. Short spatial and temporal distances between vehicles indicate a significant level of mutual influence. Therefore, both time and distance parameters must be considered to classify the vehicles effectively. In this study, the time headway and the distance along the lane line (vehicle position) were used to represent the temporal and spatial characteristics. The vehicle positions reflect the distances between all vehicles on the road. The hierarchical clustering method was employed to classify the vehicle groups, as shown in Figure 4. The classification was based on a specified threshold and the distance among clusters, so the number of clusters need not be determined in advance. The smaller the distance, the more likely the vehicles will be classified into the same group. The distance and time headway between adjacent vehicles are smaller than those between nonadjacent vehicles. Therefore, vehicles in the same group are guaranteed to be adjacent to each other, as shown in Figure 3. The vehicle group composition should be dynamic because the driving state of a vehicle changes dynamically. Therefore, the vehicle groups were categorized in real time in this study, which may cause a vehicle to be classified into different groups in different time slices. The frame rate of the data set was 24 frames per second, which enables the real-time categorization of the vehicle groups.
The temporal and spatial attributes of a vehicle were defined as X_i = (x_i, y_i) in this study, where x_i and y_i are the time headway and the distance along the lane line, respectively. In the selected data set, the headway and distance distributions were relatively concentrated. The maximum-minimum method is a classic normalization method that is widely used in traffic safety research [44] and in data processing prior to clustering [45]. Compared with other normalization methods, this method distributes the selected data set more evenly over the interval [0, 1] and maintains the relative linear relationships of the values [46]. Therefore, to eliminate the effects of dimensional differences while preserving the data set structure, the maximum-minimum normalization method was used to normalize the data to the interval [0, 1]. Subsequently, the similarity between vehicles was measured using the Euclidean distance d(X_1, X_2), and the proximity of clusters a and b was measured by the average distance d(a, b) between them. The similarity between vehicles can be measured as follows:

$$ d(X_1, X_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} $$

The similarity between clusters can be obtained as follows:

$$ d(a, b) = \frac{1}{n_a n_b} \sum_{p \in a} \sum_{p' \in b} d(p, p') $$

where n_a and n_b are the numbers of samples in clusters a and b, respectively, and p and p' are the data in a and b, respectively. Statistical methods typically used for determining the threshold include the 85% quantile and interquartile range methods [47]. In this study, the 25% quantile (D_25%), median (D_50%), 75% quantile (D_75%), and 85% quantile (D_85%) of the temporal and spatial similarities were used as candidate thresholds to categorize the vehicle groups. The composition of the vehicle groups changed dynamically over time.
The results at a certain time are shown in Figure 5, where numbers 1-6 in the legend represent the different vehicle groups. All vehicles in the data set were categorized into groups. The results show that D_50% is the preferable threshold. When D_25% was used as the threshold, the categorization conditions were extremely strict, which made it difficult to categorize vehicles with high proximity into the same group; consequently, the driving risk could not be confined within a group. When D_75% or D_85% was used as the threshold, the classification conditions were extremely lenient; all vehicles could easily be categorized into the same vehicle group, which resulted in significantly different degrees of interaction among the internal vehicles. Hence, an appropriate categorization could not be achieved. By contrast, when D_50% was used as the threshold, vehicles with proximate time headways and positions were classified into the same cluster, whereas the time headways and positions of different clusters differed significantly. In other words, vehicles with strong interactions were classified into the same vehicle group more easily. The effect between vehicle groups was insignificant, and the risk was associated with the vehicles within the group. Therefore, the median D_50% was used as the threshold in this study.
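The categorization rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the default threshold assumes that D_50% is the median of all pairwise distances after normalization, which the text does not state explicitly.

```python
import math
from statistics import median


def min_max(values):
    """Scale values linearly to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def average_distance(a, b, pts):
    """Mean Euclidean distance between all members of clusters a and b."""
    total = sum(math.dist(pts[p], pts[q]) for p in a for q in b)
    return total / (len(a) * len(b))


def group_vehicles(headways, positions, threshold=None):
    """Merge the two closest clusters until their distance exceeds the threshold."""
    pts = list(zip(min_max(headways), min_max(positions)))
    n = len(pts)
    clusters = [[i] for i in range(n)]
    if n < 2:
        return clusters
    if threshold is None:
        # Assumption: D_50% is the median of all pairwise distances.
        threshold = median(
            math.dist(pts[i], pts[j]) for i in range(n) for j in range(i + 1, n)
        )
    while len(clusters) > 1:
        d, i, j = min(
            (average_distance(clusters[i], clusters[j], pts), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Hypothetical time headways (s) and positions along the lane line (m).
headways = [1.0, 1.1, 1.2, 4.0, 4.1]
positions = [10.0, 20.0, 30.0, 200.0, 210.0]
print(group_vehicles(headways, positions, threshold=0.3))
```

Because merging stops at a distance threshold, the number of groups does not have to be fixed in advance, which matches the motivation given for hierarchical clustering.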

Selection of Rear-End Collision Risk Measurement Metrics.
Rear-end collision risks are associated closely with various factors, such as traffic composition, driving status, drivers' risk perception, traffic conflict, and road conditions. Therefore, the risk measurement metrics Q_i(t) were extracted in this study while considering three aspects: traffic composition, vehicle driving status, and conflict degree, as listed in Table 2.
In equation (3), which defines the SDI, SSD_{i-1}(t) and SSD_i(t) denote the stopping sight distances of the preceding vehicle (PV) and the following vehicle (FV) at time t, respectively; d(t) represents the distance between these two vehicles at time t; and l_{i-1} is the length of the PV.
In the SSD expression, v(t) is the vehicle speed at time t; f is the road friction coefficient, which is set to 0.6 according to the friction coefficient standard for dry pavement; g is the road gradient, which is provisionally set to 0; and t_r is the driver's perception-reaction time.
In equation (5), which defines the DRAC, v_{i-1}(t) and v_i(t) denote the speeds of the PV and FV at time t, respectively; x_{i-1}(t) and x_i(t) denote the positions of the PV and FV at time t, respectively; and l_{i-1} is the length of the PV.
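As an illustration of how such conflict indicators are typically computed from trajectory data, the sketch below uses the commonly cited kinematic forms of TTC⁻¹ and DRAC. The exact equations of this paper are not reproduced in the text above, so these formulas, the function names, and the assumption of SI units (m, m/s) are illustrative only; the 0.25/s threshold for TTC⁻¹ is the one quoted in Table 2.

```python
def net_gap(x_pv, x_fv, len_pv):
    """Bumper-to-bumper gap (m): PV position minus FV position minus PV length."""
    return x_pv - x_fv - len_pv


def inverse_ttc(x_pv, x_fv, v_pv, v_fv, len_pv):
    """Inverse time-to-collision (1/s); stays defined when the speeds are equal."""
    return (v_fv - v_pv) / net_gap(x_pv, x_fv, len_pv)


def drac(x_pv, x_fv, v_pv, v_fv, len_pv):
    """Deceleration (m/s^2) the FV needs to match the PV speed within the gap."""
    closing = v_fv - v_pv
    if closing <= 0:
        return 0.0  # the FV is not closing in, so no braking is required
    return closing ** 2 / (2 * net_gap(x_pv, x_fv, len_pv))


def unsafe_proportion(values, threshold):
    """Share of vehicles in a group whose indicator exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Hypothetical pair: PV at 30 m doing 10 m/s, FV at 0 m doing 15 m/s, PV 5 m long.
print(inverse_ttc(30.0, 0.0, 10.0, 15.0, 5.0))
print(drac(30.0, 0.0, 10.0, 15.0, 5.0))
```

Using TTC⁻¹ instead of TTC avoids a division by zero when the two vehicles travel at the same speed, which is the motivation given earlier for the inverse form.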

Correlation Analysis of Metrics.
Multiple variables were selected for the RMVG. However, redundant features reduce the accuracy of the results without providing any new information to the model. Therefore, the correlation among the metrics in Q_i(t) must be analyzed, and overlapping information must be removed. Reshef proposed the maximum information coefficient (MIC), which can be used to measure the linear and nonlinear relationships between variables in big data, as well as to determine their nonfunctional dependencies [50]. In this study, an analysis was performed to determine whether correlations exist among the variables and how strong they are. Although the correlation between the variables is dynamic, the average correlation between them can be determined by calculating the MIC when the sample size is sufficiently large. The mutual information for random variables X and Y can be calculated as follows:

$$ I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} $$

where p(x, y) is the joint probability distribution of X and Y and p(x) and p(y) are the marginal probability distributions of X and Y, respectively.
In calculating the MIC, the sample data are first placed in a two-dimensional space, and a grid is laid over it. Random variables X and Y are selected from data set Q to form set D, and their ranges are divided equally into x and y intervals, respectively. The probability of each grid cell (x_i, y_j) is calculated as follows:

$$ p(x_i, y_j) = \frac{n(x_i, y_j)}{n} $$

where n(x_i, y_j) is the number of data points in cell (x_i, y_j) and n is the total number of data points. Similarly, p(x_i) and p(y_j) can be calculated. The probability distribution under the current partition is denoted as D|_{x*y}, and the mutual information I(D|_{x*y}) can be calculated using equation (6). The maximum mutual information value over all partitions at the same segmentation scale is max I(D|_{x*y}). Next, let I'[D(x, y)] = max I(D|_{x*y}), and standardize it.
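The grid-based calculation described above can be illustrated with a short sketch. It evaluates the mutual information on a single equal-width grid and normalizes it by log2 min(x, y), the standardization used in Reshef's definition of the MIC. It is only a building block: the search over grid partitions and segmentation scales that the full MIC requires is not implemented here, and the function names are hypothetical.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Index of the equal-width bin that contains value."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def grid_mutual_information(xs, ys, x_bins, y_bins):
    """Mutual information (bits) of the x_bins-by-y_bins equal-width grid."""
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cells = Counter(
        (bin_index(x, x_lo, x_hi, x_bins), bin_index(y, y_lo, y_hi, y_bins))
        for x, y in zip(xs, ys)
    )
    px, py = Counter(), Counter()
    for (i, j), count in cells.items():
        px[i] += count  # marginal counts per x interval
        py[j] += count  # marginal counts per y interval
    return sum(
        (count / n) * math.log2((count / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), count in cells.items()
    )


def normalized_grid_mi(xs, ys, x_bins, y_bins):
    """Grid mutual information scaled to [0, 1] by log2(min(x_bins, y_bins))."""
    return grid_mutual_information(xs, ys, x_bins, y_bins) / math.log2(min(x_bins, y_bins))


print(normalized_grid_mi([0, 1, 2, 3], [0, 1, 2, 3], 2, 2))
```

A value near 1 indicates that one variable is almost fully determined by the other on that grid, whereas a value near 0 indicates no detectable dependence.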
Subsequently, the MIC of random variables X and Y is obtained as the maximum of the standardized values over the different segmentation scales. The MIC between the variables in Q_i(t) is shown in Figure 6. The calculated MIC indicates the existence of correlations between the variables. Principal component analysis was therefore performed to extract effective information and simplify the calculation of the model. A popular method for determining the number of principal components is based on eigenvalues: components with eigenvalues greater than 1 are identified as principal components. Another widely used approach is to select principal components based on the cumulative percent variance according to the amount of information to be retained [51]. In this study, even a slight difference in the variables affected the effectiveness of risk identification. Therefore, the principal components should be selected so as to preserve effective information to the greatest extent. To retain most of the information while reducing the dimensions, the cumulative percent variance of the principal components must be greater than 90% [52]. The variance of each component in Q_i(t) is presented in Table 3. Because the cumulative percent variance exceeded 90%, the first six components U_ik(t) = [u_i1(t), u_i2(t), ..., u_i6(t)] were selected in this study. The eigenvectors E_k = [e_1, e_2, ..., e_6] of U_ik(t) are listed in Table 4, and the calculation formulas are presented in equation (10).
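The two selection rules mentioned above can be compared with a small sketch. The eigenvalues below are hypothetical and are not taken from Table 3; the function names are likewise illustrative.

```python
def components_by_cumulative_variance(eigenvalues, target=0.90):
    """Smallest number of leading components whose cumulative variance share exceeds target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total > target:
            return k
    return len(eigenvalues)


def components_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Number of components whose eigenvalue is greater than the cutoff."""
    return sum(value > cutoff for value in eigenvalues)


# Hypothetical eigenvalues of a correlation matrix with four variables.
eigenvalues = [2.0, 1.1, 0.6, 0.3]
print(components_by_cumulative_variance(eigenvalues))
print(components_by_eigenvalue(eigenvalues))
```

In this example the eigenvalue rule keeps fewer components than the 90% cumulative-variance rule, which shows why the latter is the more conservative choice when small differences in the variables matter.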

Development of Rear-End Risk Measurement Model.
During the driving process, vehicles are affected by several factors, including road and traffic conditions, and may exhibit abnormal driving behaviors, such as trailing extremely closely or decelerating rapidly. Generally, abnormal driving behavior causes a single-vehicle collision; however, it may result in a pile-up if it significantly affects the surrounding vehicles. Collisions are more likely to occur when the driving state is abnormal. The rear-end collision risk measurement for a vehicle group is used to determine the possibility of a collision and to quantify its risk level. The RMVG model is realized via two procedures: first, the safety state of the vehicle group is determined based on trajectory data; second, the risk level is quantified based on the possibility and severity of the collision.

Possibility Identification Model for Vehicle Group Rear-End Collision Based on Isolation Forest (IF).
IF is an unsupervised machine learning method that isolates outliers by continuously segmenting the data set [53]. This algorithm uses a binary tree structure called an isolation tree (iTree). By randomly selecting features and split values, the data set is segmented continuously until each sample is isolated. Because outliers are few, distinct, and sparsely distributed, their paths are extremely short during isolation. Therefore, abnormal points are isolated close to the root of the tree, whereas normal points are isolated in deeper regions of the tree. Compared with other methods, the IF can assign an abnormal probability to each sample, thus reflecting the possibility of a collision [54]. The dimension-reduced metrics U_i(t) from Section 5.2 are the input variables of the identification model based on the IF. The output results indicate the possibility and assessment of collisions within the vehicle group. The realization process is shown in Figure 7.
(1) Training Phase. The model was trained to build isolation trees (iTrees) and an isolation forest (iForest).
Step 1: Randomly select φ subsamples from the data set without replacement.
Step 2: Randomly select the characteristic attribute q as the starting node. Subsequently, select a split value p between the maximum and minimum values of q.
Step 3: Assign subsamples with attribute values less than p to the left branch of the binary tree; otherwise, assign them to the right branch.
Step 4: Repeat steps 2 and 3 until the segmentation is completed or the desired tree depth is reached. The depth limit is calculated using l = ceiling(log_2 φ).
Step 5: Repeat steps 1-4 until the number of iTrees reaches the limit. These iTrees are joined to form an iForest = [iTree_1, iTree_2, ..., iTree_n].
(2) Testing Phase. After the construction is completed, the iForest can be used to identify data abnormalities based on the abnormal scores.
Step 6: Allow the test sample to traverse iTrees in iForest and compute the average path length when the traversal halts.
Step 7: Calculate the abnormal score of the sample and determine whether it is abnormal. The formula to calculate the abnormal score is

$$ S(x, n) = 2^{-E(h(x)) / c(n)} $$

where E(h(x)) is the average path length required to isolate sample x in the iForest and c(n) is the average path length of a tree built from n samples, which is calculated as follows:

$$ c(n) = 2H(n - 1) - \frac{2(n - 1)}{n}, \qquad H(k) \approx \ln k + 0.5772156649 $$

When E(h(x)) approaches c(n), S tends to 0.5, and the sample is regarded as normal; in this case, the model outputs a judgment result of "1." When E(h(x)) approaches 0, S tends to 1, and the sample is regarded as abnormal; in this case, the model outputs a judgment result of "−1."
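The scoring step can be checked numerically with the short sketch below. It follows the standard isolation forest definitions of the depth limit, the normalizing term c(n), and the abnormal score; the function names are illustrative, and the special cases for n ≤ 2 follow the usual convention of the original algorithm rather than anything stated in this paper.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def depth_limit(subsample_size):
    """Maximum iTree depth: ceiling(log2(subsample size))."""
    return math.ceil(math.log2(subsample_size))


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def abnormal_score(mean_path_length, n):
    """S = 2 ** (-E(h(x)) / c(n)); close to 1 is abnormal, around 0.5 is normal."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


n = 256  # hypothetical subsample size
print(depth_limit(n))
print(abnormal_score(average_path_length(n), n))
print(abnormal_score(0.0, n))
```

The last two lines reproduce the limiting cases described in Step 7: a sample whose average path length equals c(n) scores 0.5, and a sample isolated at the root scores 1.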

Quantification Model for Vehicle Group Rear-End Collision Risk.
In the typically used quantification method, the risk value is calculated based on the probability and severity of a collision. The abnormality degree reflects the possibility of a collision in a vehicle group. Meanwhile, the risk severity depends on the coupling between various influencing factors. Therefore, the abnormal score, as calculated in Section 5.3.1, was adopted in this study to reflect the possibility of a collision, and the risk metrics Q_i(t) were used to reflect the severity.
Table 2 (continued): Risk measurement metrics.
Standard deviation of acceleration: acceleration is taken as the absolute value of each car's acceleration in the vehicle group.
Conflict degree:
Proportion of unsafe TTC⁻¹: the proportion of vehicles in the vehicle group whose TTC⁻¹ is greater than the safety threshold (0.25/s) [48]. The calculation method of TTC is shown in equation (15).
q_i9(t), proportion of unsafe stopping distance index (SDI): the proportion of vehicles in the vehicle group whose SDI is less than the safety threshold (0) [49]. The calculation method of SDI is shown in equation (3).
q_i10(t), proportion of unsafe deceleration rate to avoid a crash (DRAC): the proportion of vehicles in the vehicle group whose DRAC is greater than the maximum available deceleration rate [26]. The calculation method of DRAC is shown in equation (5).
q_i11(t), proportion of following vehicle (FV) speed greater than preceding vehicle (PV) speed: the proportion of vehicles in the vehicle group whose speed is greater than that of the preceding vehicle.

For vehicle group i at time t, the risk value H(t)_i is quantified from the abnormal score s(t)_i and the driving risk metrics after normalization, q(t)_ij′. After calculating the risk values, K-means clustering was adopted to separate the risk levels and thresholds [55].
(1) Calculate the silhouette coefficient for different cluster numbers to select the most appropriate number of risk levels n.
(2) Determine the cluster centers [c_1, c_2, ..., c_n]. For each pair of adjacent centers (c_i, c_{i+1}), vary a candidate threshold e between them with a step size of 0.01, classify the samples, and calculate the accuracy rate. Select the e with the highest accuracy rate as the categorization threshold.
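Step (2) amounts to a one-dimensional grid search, which can be sketched as follows. The sketch assumes that "accuracy" means agreement between the threshold rule and the cluster membership of the samples in the two adjacent clusters; the text does not define it more precisely, and the function name and sample values are hypothetical.

```python
def best_threshold(values, in_lower_cluster, c_low, c_high, step=0.01):
    """Scan thresholds between two adjacent cluster centers and keep the most accurate one.

    values: risk values of the samples in the two adjacent clusters.
    in_lower_cluster: True where the sample belongs to the lower-risk cluster.
    """
    best_e, best_accuracy = None, -1.0
    n_steps = int(round((c_high - c_low) / step))
    for k in range(n_steps + 1):
        e = round(c_low + k * step, 10)  # rounding avoids accumulated float drift
        correct = sum((v < e) == low for v, low in zip(values, in_lower_cluster))
        accuracy = correct / len(values)
        if accuracy > best_accuracy:  # strict '>' keeps the first best threshold
            best_e, best_accuracy = e, accuracy
    return best_e, best_accuracy


# Hypothetical risk values around two cluster centers at 1.05 and 1.35.
values = [1.0, 1.1, 1.3, 1.4]
in_lower_cluster = [True, True, False, False]
print(best_threshold(values, in_lower_cluster, 1.05, 1.35))
```

When several thresholds reach the same accuracy, this sketch returns the smallest one; other tie-breaking rules, such as the midpoint of the tied range, would be equally defensible.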

Experimental Analysis
6.1. Data Processing.
First, a Kalman filter was adopted to denoise the original data set. Subsequently, based on the categorization rules in Section 4, all vehicles on the road segment were categorized into different vehicle groups. Finally, the rear-end collision risk measurement metrics were calculated based on the data set, and the results are listed in Table 5.

Empirical Analysis of RMVG.
In this study, 537 vehicle groups were selected from the data set, including 12,075 trajectory data points. The empirical analysis comprised three stages: (1) identifying the possibility of rear-end collisions in vehicle groups based on the RMVG, (2) quantifying the degree of rear-end collision risk based on the RMVG and classifying the risk level, and (3) analyzing the feasibility of the model via accuracy evaluation.

Vehicle Group Rear-End Collision Possibility Identification.
A vehicle group rear-end collision identification model was established based on the IF algorithm in Python. To construct the iForest, 70% of the data were randomly selected as the training set, and the remaining 30% were used as the test set. This model automatically identifies the collision possibility and outputs either "−1" or "1." The results are listed in Table 6, where "1" and "−1" represent vehicle groups with lower and higher probabilities of collision, respectively. The higher the abnormal score, the greater the possibility of an anomaly.
In the aforementioned analysis, both time and space risks were identified during vehicle group driving; a small distance between cars and a short TTC may cause collisions. Therefore, the TTC and margin to collision (MTC) can be used as time and space risk evaluation indicators, respectively, to identify whether a vehicle is susceptible to a rear-end collision [56]. The TTC is the predicted time of a collision between the PV and FV when the two vehicles maintain the current relative velocity. Collisions are more likely to occur when the TTC is between 0 and 5 s [48]. Furthermore, researchers have shown that when the TTC is less than 5 s, drivers tend to feel nervous and perform more incorrect actions [57]. The TTC is calculated as

$$ TTC_i(t) = \frac{x_{i-1}(t) - x_i(t) - l_{i-1}}{v_i(t) - v_{i-1}(t)}, \qquad v_i(t) > v_{i-1}(t) \qquad (15) $$

where x_{i-1}(t) and x_i(t) denote the positions of the PV and FV at time t, respectively; v_{i-1}(t) and v_i(t) denote the speeds of the PV and FV at time t, respectively; and l_{i-1} is the length of the PV. The MTC indicates the final relative position of the PV and FV when the two vehicles decelerate abruptly. An MTC of less than 1 indicates that the stopping distance of the FV is greater than the sum of the intervehicular distance and the stopping distance of the PV. In this case, a collision may occur between the vehicles. The lower the MTC, the higher the probability of collision.
In the MTC expression, D(t) is the distance between the vehicles at time t; a is the braking deceleration, which was set as 6.86 m/s² in this study [58]; v_{i-1}(t) and v_i(t) denote the speeds of the PV and FV at time t, respectively; and t_0 is the driver's reaction time, which was set as 1.5 s in this study [58]. The vehicle group is treated as a whole. When the proportion of serious conflicts is high, the internal driving situation is chaotic. In this situation, conflicts significantly affect the internal vehicles and are more likely to cause crashes. Therefore, in this study, the collision probability of the vehicle group was measured based on the proportion of severe conflicts. The proportions of TTC less than 5 s and MTC less than 1 in the vehicle group were calculated, and the results are listed in Table 6. Vehicle groups with a high proportion of abnormal TTC or MTC were identified as anomalous and had high abnormal scores. This indicates that the RMVG model has effective risk identification capabilities.
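The two indicators and the group-level proportions can be sketched as follows. The TTC follows the definition given above. The MTC expression is an assumption reconstructed from the verbal description (intervehicular distance plus PV braking distance, divided by the FV stopping distance including the reaction time); it should not be read as the paper's exact equation. Function names and sample values are hypothetical, and SI units are assumed.

```python
import math


def ttc(x_pv, x_fv, v_pv, v_fv, len_pv):
    """Time-to-collision (s); infinite when the FV is not faster than the PV."""
    closing = v_fv - v_pv
    if closing <= 0:
        return math.inf
    return (x_pv - x_fv - len_pv) / closing


def mtc(distance, v_pv, v_fv, decel=6.86, reaction_time=1.5):
    """Margin to collision (assumed form); a value below 1 suggests a possible collision."""
    pv_stop = v_pv ** 2 / (2 * decel)  # PV braking distance
    fv_stop = v_fv * reaction_time + v_fv ** 2 / (2 * decel)  # FV reaction plus braking
    if fv_stop == 0:
        return math.inf
    return (distance + pv_stop) / fv_stop


def severe_conflict_shares(ttc_values, mtc_values):
    """Proportions of vehicles in a group with 0 < TTC < 5 s and with MTC < 1."""
    ttc_share = sum(0 < t < 5 for t in ttc_values) / len(ttc_values)
    mtc_share = sum(m < 1 for m in mtc_values) / len(mtc_values)
    return ttc_share, mtc_share


# Hypothetical group of three following vehicles.
ttcs = [ttc(30.0, 0.0, 10.0, 18.0, 5.0), ttc(60.0, 0.0, 12.0, 12.0, 5.0), 7.5]
mtcs = [mtc(10.0, 0.0, 10.0), mtc(100.0, 10.0, 10.0), mtc(40.0, 15.0, 14.0)]
print(severe_conflict_shares(ttcs, mtcs))
```

Returning infinity for a non-closing pair keeps such vehicles out of the "TTC below 5 s" count without a special case in the proportion.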

Accuracy of Rear-End Collision Possibility Identification Method.
It is considered that the vehicle group has a comparatively higher possibility of collision when its TTC proportion for less than 5 s or its MTC proportion for less than 1 exceeds 25%. Among the 537 selected vehicle groups, 51 indicated a high possibility of collision, including 183 trajectory data points. A total of 486 vehicle groups exhibited a low collision probability, including 11, 892 trajectory data points. Vehicle group rear-end collision identification models were established based on the RMVG, SVM, and LOF by selecting 70% of the data as the training set and 30% as the test set. e model evaluation indicators were calculated as follows: where TN refers to observations correctly identified as unsafe, TP is the correct prediction of safe conditions, FN is the incorrect labeling of safe samples as unsafe, and FP is the incorrect prediction of unsafe samples as safe. e accuracy levels determined by calculating the confusion matrix of the prediction results are presented in Table 7.
The RMVG exhibited the highest accuracy rate and specificity and the lowest false alarm rate. The receiver operating characteristic (ROC) curves of the three algorithms based on their sensitivity and specificity are shown in Figure 8. Comparative analysis shows that the area under the curve (AUC) of the RMVG was 0.93, which was higher than those of the SVM (AUC = 0.74) and LOF (AUC = 0.90). Thus, the RMVG demonstrated better recognition ability than the other models under the same data conditions.
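For readers unfamiliar with the AUC, it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name and the toy scores are hypothetical; treating the high-possibility groups as the positive class and a higher score as "more anomalous" is an assumption about the score direction, not something the text specifies.

```python
def roc_auc(scores, is_positive):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy anomaly scores (invented): higher means more anomalous.
scores = [0.9, 0.4, 0.7, 0.3, 0.2]
high_possibility = [True, False, True, True, False]
auc = roc_auc(scores, high_possibility)
print(round(auc, 4))  # 0.8333
```

The pairwise form is quadratic in the sample count, which is acceptable for a small illustration; a rank-based computation would be preferable for data sets of the size used in the study.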

Vehicle Group Rear-End Collision Risk Quantification.
The method discussed in Section 6.2.1 can only be used to identify the possibility of rear-end collisions in the vehicle group; the degree of risk remains ambiguous. Therefore, the rear-end collision risk was quantified based on the quantitative model proposed in Section 5.3.2, and the results are presented in Table 6. The silhouette coefficient results obtained using the risk classification method presented in Section 5.3.2 are shown in Figure 9. When the cluster numbers were 2, 3, 4, and 5, the silhouette coefficients were 0.597, 0.558, 0.576, and 0.581, respectively. Two clusters yielded the highest coefficient; however, if the number of risk levels is low, then the difference between risk values is difficult to define. Therefore, the rear-end collision risk of the vehicle group was categorized into five levels, which gave the next-highest coefficient; the higher the risk quantification value, the greater the risk. The thresholds were e_1 = 1.12, e_2 = 1.51, e_3 = 1.87, and e_4 = 2.32, and the accuracy rates were 99.70%, 99.72%, 100%, and 100%, respectively. The classification results are listed in Table 8.
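To make the silhouette comparison concrete, the following sketch computes the mean silhouette coefficient for one-dimensional values with given cluster labels. It assumes the clustered quantity is a scalar risk value and uses absolute difference as the distance; the function name and the toy data are hypothetical and do not reproduce the coefficients reported above.

```python
def silhouette_score_1d(values, labels):
    """Mean silhouette coefficient for scalar values with cluster labels."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        # (the distance of a point to itself is 0, so it adds nothing).
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(v - u) for u in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(values)


# Toy risk values (invented) with a good and a poor labeling.
risk_values = [1.0, 1.1, 2.0, 2.1]
good = silhouette_score_1d(risk_values, [0, 0, 1, 1])
poor = silhouette_score_1d(risk_values, [0, 1, 0, 1])
print(round(good, 3), round(poor, 3))
```

A coefficient close to 1 indicates compact, well-separated clusters, which is why the labeling that groups neighboring values scores higher than the interleaved one.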
As shown in Tables 6 and 8, the risk level increases with the risk value. However, the risk level combines the collision probability and severity. Therefore, vehicle groups that the RMVG identifies as having a high collision probability may still fall into different risk levels.
For example, the RMVG recognizes that vehicle group 16250 has a high collision probability; however, its risk value is 1.213. Its internal characteristics are as follows: the number of internal vehicles, 3; maximum speed difference, 1.61 km/h; maximum acceleration, 0.72 m/s²; and the proportion of unsafe SDI, 0.67. This shows that severe traffic conflicts occurred within the vehicle group, that is, the probability of collision is high. However, because the dispersion of speed and acceleration is low and the number of internal vehicles is small, the collision severity is low. By contrast, the RMVG recognizes that the collision probability of vehicle group 16781 is low; however, its risk value is 1.556. The characteristics of this group are as follows: the number of vehicles, 45; maximum speed difference, 24.289 km/h; and maximum acceleration, 1.4972 m/s². The proportions of unsafe TTC⁻¹, SDI, and DRAC are 0.09, 0.11, and 0.02, respectively. This indicates that the degree of traffic conflict is relatively low. However, owing to the large dispersion of acceleration and speed, the driving stability of the vehicle group can degrade easily. Once a collision occurs, the severity is comparatively great, and it may appear as a multivehicle rear-end collision.
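The mapping from a risk value to one of the five levels can be sketched with the four thresholds reported earlier in this section. The level numbering from 1 (lowest) to 5 (highest) and the treatment of values that fall exactly on a threshold are assumptions; the function name is hypothetical.

```python
from bisect import bisect_right

# Thresholds e_1 to e_4 as reported in the text.
RISK_THRESHOLDS = [1.12, 1.51, 1.87, 2.32]


def risk_level(risk_value):
    """Map a risk value to a level from 1 (lowest) to 5 (highest)."""
    # Assumption: a value exactly equal to a threshold is assigned
    # to the higher of the two adjacent levels.
    return bisect_right(RISK_THRESHOLDS, risk_value) + 1


print(risk_level(1.213))  # vehicle group 16250 -> 2
print(risk_level(1.556))  # vehicle group 16781 -> 3
```

Under this numbering, the group with the high collision probability (16250) lands one level below the group with the low collision probability (16781), which mirrors the point made above that the risk level reflects severity as well as probability.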
For the road section with a length of 427 m in this study, all the vehicles were categorized into multiple groups. Owing to the different driving characteristics within each group, the risks at various spatial locations along the road segment were different. Moreover, the composition of a vehicle group changes dynamically with the vehicle position and driving status. Therefore, the risk is dynamic and varies over time and space, as shown in Figure 10. Based on the real-time categorization of vehicle groups, dynamic risk detection was realized in this study for different spatial positions in long road sections.

Conclusions
A risk measurement model for vehicle groups was proposed herein based on temporal and spatial similarities. In contrast to conventional macrorisk identification, which focuses on traffic flow, or microrisk identification, which focuses on individual vehicles, the research object of this study was a vehicle group. This choice narrows the recognition range and allows the effects of individual behaviors and of the interaction among surrounding vehicles on rear-end collisions to be considered together. First, vehicles that trailed closely and showed group responses to random interference factors were categorized into the same vehicle group. Rear-end collisions can be expressed as a process in which the stable and close car-following state in the vehicle group is discontinued. After the categorization, the effect of external vehicles on the vehicles inside a group becomes less significant. The risk primarily arises from internal vehicles within groups. Considering vehicle groups as research objects can provide a new perspective for investigating traffic safety problems. Subsequently, based on the IF, an RMVG was established, which considers the probability and severity of a collision to identify and quantify the risk of rear-end collisions. Additionally, the k-means clustering algorithm was used to determine the risk levels and thresholds. Finally, the RMVG was tested using a vehicle trajectory data set published by the ITS Laboratory at Southeast University. The results showed that the AUC, accuracy, specificity, and false alarm rate of the RMVG were 0.93, 95.68%, 88.89%, and 3.47%, respectively.
This study provides a theoretical basis and technical support for the effective prevention of rear-end collisions, thereby reducing traffic crashes and economic losses. As connected vehicles and holographic road technologies are further developed, the results of this study can provide useful suggestions for drivers and road traffic management authorities. Vehicle trajectory data can be obtained from holographic roads, whereas vehicle communication can be accomplished via a connected vehicle environment. Combining these technologies with the RMVG allows rear-end conflicts to be monitored in real time. For road traffic management, the RMVG considers both the individual behavior and group responses of vehicles, which makes it more effective than a single strategy for identifying driving risks. For drivers, receiving traffic management information in real time allows them to actively avoid risks, thereby improving driving safety. The limitations of the proposed method are as follows: (1) It is currently impossible to validate the risk clusters against the ground truth owing to insufficient data. (2) It may not be suitable for a traffic congestion state because the categorization method becomes invalid. When the traffic flow is smooth, this method can flexibly identify vehicles with a high degree of mutual influence, thereby effectively categorizing them into groups. However, when the traffic is congested, all vehicles are categorized into the same group, and categorizing the vehicle groups becomes meaningless. Consequently, a possible research direction would be to improve the method for these scenarios. (3) The penetration rate of large vehicles, vehicle driving status, and conflict degree were used in this study to quantify the risk of rear-end collisions; however, these metrics have certain limitations. Therefore, future research can also attempt to quantify the severity of crashes based on kinetic energy loss. The change in kinetic energy can be calculated from the velocities and collision angles based on more comprehensive data. (4) Driving safety is affected by weather and road conditions, in addition to driving conditions and surrounding vehicles. In future studies, weather conditions, road linearity, and other factors should be integrated to form a more comprehensive risk measurement metric system to improve accuracy. Moreover, the extreme value theory (EVT) framework is widely used for crash prediction and has been demonstrated to be effective. Hence, EVT could be combined with machine learning in the future to improve the accuracy of collision prediction.

Data Availability
The data set used to support the findings of this study was obtained from the ITS Laboratory of Southeast University. The data set is publicly available and can be downloaded and accessed from http://seutraffic.com/#/download.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.