A Vision-Based Video Crash Detection Framework for Mixed Traffic Flow Environment Considering Low-Visibility Condition

In this paper, a vision-based crash detection framework was proposed to quickly detect various crash types in mixed traffic flow environments, considering low-visibility conditions. First, the Retinex image enhancement algorithm was introduced to improve the quality of images collected under low-visibility conditions (e.g., heavy rain, fog, and dark nights with poor lighting). Then, a Yolo v3 model was trained to detect multiple objects in images, including fallen pedestrians/cyclists, vehicle rollovers, moving/stopped vehicles, moving/stopped cyclists/pedestrians, and so on. Next, a set of features was developed from the Yolo outputs, and a decision model was trained on these features for crash detection. An experiment was conducted to validate the framework. The results showed that the proposed framework achieved a high detection rate of 92.5% with a relatively low false alarm rate of 7.5%. Useful findings include: (1) the proposed model outperformed empirical rule-based detection models; (2) the image enhancement method can largely improve crash detection performance under low-visibility conditions; (3) the accuracy of object detection (e.g., bounding box prediction) can affect crash detection performance, especially for minor motor-vehicle crashes. Overall, the proposed framework can be considered a promising tool for quick crash detection in mixed traffic flow environments under various visibility conditions. Some limitations are also discussed in the paper.


Introduction
Emergency response to roadway crashes is very important for traffic management. On the one hand, people injured in a crash need to be sent to the nearest hospital as soon as possible to prevent their health condition from worsening. On the other hand, serious crashes often cause nonrecurrent congestion if emergency response or clearance is not carried out in time. To mitigate these negative impacts, roadway crashes need to be detected quickly.
Crash detection can be conducted by analyzing traffic flow data from roadway detectors, such as loop and microwave detectors. However, such methods are often inaccurate due to systematic errors caused by both the algorithms and data quality [1][2][3][4][5]. Thus, in practice, crashes are often detected by human observers through CCTV in Traffic Management Centers (TMC). The advantage of CCTV is that it can directly capture crash scenes within its range. With the development of intelligent transportation systems (ITS), more and more CCTV cameras have been deployed in big cities and on highways. Although human observation through CCTV can be reliable, it is sometimes too labor-intensive and time-consuming. Thus, it is meaningful to develop reliable automatic crash detection methods based on CCTV [6,7].
In recent years, computer vision technologies have developed rapidly and have been widely used in the transportation field [8,9], thanks to the increasing power of computers and deep learning methods. The performance of vision-based object detection based on deep learning has improved significantly. Thus, researchers have focused on developing crash detection models based on complex deep learning frameworks [10,11]. Their results also showed the capability of computer vision in crash detection. However, a complex deep learning framework can require high computational costs and be difficult to implement in practice.
Notably, previous literature mainly focused on detecting crashes in motorized traffic environments in developed countries. In developing countries, larger numbers of pedestrians and cyclists may share roadways with automobiles. Thus, crash detection in mixed traffic flow environments could be an even more important task for those countries. Moreover, to be used in practice, a vision-based crash detection model needs to be robust to various conditions, especially low-visibility ones such as heavy rain, fog, and poor lighting. Even deep learning-based vision algorithms sometimes do not perform well in such conditions [12][13][14] because of relatively low image quality. Thus, additional efforts, such as image enhancement methods, are often made to improve detection performance [15][16][17][18][19].
Considering these issues, a vision-based crash detection framework was developed for mixed traffic flow environments in this study. To handle low-visibility conditions, an image enhancement method was introduced to improve image quality so that a deep learning algorithm can better identify moving objects. To enable quick crash detection, a Yolo v3 model was employed to extract features from images, and a decision tree model was trained on these features to detect the various crash types that can occur in mixed traffic flow environments. The paper is organized as follows. Section 2 discusses previous literature related to vision-based crash detection and image enhancement. Section 3 introduces the Retinex algorithm, Yolo v3, and the decision tree-based crash detection framework. Section 4 discusses the results of an experiment. Section 5 concludes the findings of this research.

Literature Review
In the past twenty years, researchers have conducted many studies on vision-based traffic crash detection, which can be classified into three categories: (1) modeling of traffic flow patterns; (2) modeling of vehicle interactions; and (3) analysis of vehicle activities [10]. The first method compares vehicle trajectories with typical vehicle motion patterns learned from large data samples. In this framework, a trajectory that is not consistent with typical trajectory patterns can be considered a traffic incident [20][21][22]. However, it is not easy to identify whether such an incident is a crash, because only limited crash trajectory data can be collected in the real world. The second method determines crash occurrence based on speed change information, applying the social force model and the intelligent driver model to model interactions among vehicles. This method requires a large number of training samples. The third method largely depends on trackers because it needs to continuously calculate vehicle motion features (e.g., distance, acceleration, and direction) [23][24][25][26][27]. In this way, aberrant behaviors [28,29] related to traffic incidents can be detected. However, this method is often difficult to use in practice because of high computational costs and unsatisfactory tracking performance in congested traffic environments [30]. In general, fruitful results have been achieved for vision-based crash detection. However, most literature focused on motor vehicle crashes instead of crashes involving nonmotorized modes, such as bicycle-related and pedestrian-related crashes [7,23,31]. Moreover, many models are computationally intensive because they construct complicated deep learning structures.
Another practical issue for crash detection methods is the ability to deal with low-visibility conditions (e.g., fog, heavy rain, and dark nights). Image enhancement methods are usually used to make video detection more robust under such conditions. These methods adjust digital images so that key features can be identified more easily [32,33]. Such technology has also been used to provide better image quality and thereby improve crash detection performance [34,35].
There are two major types of image enhancement methods: physical model-based methods and tensile transformation. The first type usually develops a physical model of fog formation, but it can be difficult to guarantee sufficient accuracy under various conditions. The second type normally uses histogram equalization [36], wavelet transform [37], or homomorphic filtering [38] to enhance low-quality images (e.g., those affected by raindrops and fog). The robustness of such methods can be limited in some conditions, since they require a large number of parameters and thresholds to be tuned.
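As an illustration of the tensile-transformation family mentioned above, global histogram equalization can be sketched in a few lines of Python with NumPy. This is a minimal sketch that assumes an 8-bit grayscale input. The function name `equalize_histogram` is hypothetical, and this is not the enhancement method used in this study.

```python
import numpy as np


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image (illustrative sketch)."""
    hist = np.bincount(gray.ravel(), minlength=256)  # pixel count per gray level
    cdf = hist.cumsum()                              # cumulative distribution
    cdf_min = cdf[cdf > 0][0]                        # CDF value at the first non-empty level
    total = gray.size
    if total == cdf_min:                             # constant image: nothing to stretch
        return gray.copy()
    # Map each gray level so that the output CDF is approximately uniform over 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]
```

A low-contrast patch with values 100 to 103 is spread over the full 0 to 255 range. Such a method has no notion of illumination, which is one reason the study turns to Retinex instead.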

Methods
In this study, a crash detection framework was proposed for mixed traffic flow environments. The framework has three major components. First, the Retinex image enhancement algorithm was introduced to enhance image quality. Second, Yolo v3 was used to detect moving objects, such as vehicles, pedestrians, and bicyclists/motorcyclists. Third, a decision tree-based framework was proposed to determine various crash scenarios in mixed traffic flow environments.

Retinex Image Enhancement Algorithms.
Retinex is an image enhancement algorithm proposed by Edwin H. Land. Its basic theory is that the color of an object is determined by the object's ability to reflect long-wave (red), medium-wave (green), and short-wave (blue) light, rather than by the absolute intensity of the reflected light. The color of an object is therefore not affected by nonuniform illumination but remains consistent. Unlike traditional linear and nonlinear algorithms that only enhance a certain type of image, the Retinex algorithm can balance dynamic range compression, edge enhancement, and color constancy. Thus, it can be used for the adaptive enhancement of various image types, which makes it a feasible choice for this research. Figure 1 illustrates the Retinex theory: a given image $S(x, y)$ can be decomposed into two different images, a reflected image $R(x, y)$ and a luminance image (also called the incident image) $L(x, y)$.
Journal of Advanced Transportation
The image can be formulated as:
$$S(x, y) = R(x, y) \cdot L(x, y)$$
Converting it into the logarithmic domain gives:
$$\log S(x, y) = \log R(x, y) + \log L(x, y)$$
Estimating the luminance by smoothing the image with a surround function, the output can be written as:
$$r(x, y) = \log R(x, y) = \log S(x, y) - \log\left[F(x, y) * S(x, y)\right]$$
where $r(x, y)$ is the output image, $*$ is the convolution operator, and $F(x, y)$ is the surround function, given as:
$$F(x, y) = \lambda \, e^{-(x^2 + y^2)/c^2}$$
where $c$ is the scale that controls the extent of the surround, and $\lambda$ normalizes the surround so that $\iint F(x, y)\,dx\,dy = 1$. Mathematically, solving for $R(x, y)$ is an ill-posed (singular) problem, so it can only be approximated. The steps of Retinex are as follows:
Step 1. Read in the initial image $S(x, y)$ and separate its R, G, and B channels;
Step 2. Convert the pixel values of each channel from integers to floats and transform them into the logarithmic domain;
Step 3. Input the scale $c$ and calculate $\lambda = 1 / \iint e^{-(x^2 + y^2)/c^2}\,dx\,dy$;
Step 4. Calculate $r(x, y)$ for each channel;
Step 5. Convert $r(x, y)$ from the logarithmic domain back to the real domain;
Step 6. Stretch the result linearly and output it in the corresponding format.
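The six steps above can be sketched as a single-scale Retinex in Python with NumPy. This is a minimal sketch, not the authors' implementation. The following are all assumptions: the default scale `c = 80`, the kernel truncation radius of `3c`, the `+1` offset that avoids `log(0)`, edge padding at the borders, and the helper names. Because the Gaussian surround is separable, the 2-D convolution $F * S$ is computed as two 1-D convolutions. Normalizing the discrete kernel to sum to 1 plays the role of $\lambda$.

```python
import numpy as np


def gaussian_blur(channel: np.ndarray, c: float, radius: int) -> np.ndarray:
    """Compute F * S for the surround F(x, y) = lam * exp(-(x^2 + y^2) / c^2)."""
    t = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(t ** 2) / c ** 2)
    kernel /= kernel.sum()  # discrete normalization, i.e. lambda (Step 3)
    padded = np.pad(channel, radius, mode="edge")
    # The Gaussian is separable: blur rows, then columns ("valid" trims the padding).
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def single_scale_retinex(image: np.ndarray, c: float = 80.0) -> np.ndarray:
    """Single-scale Retinex on an H x W x 3 uint8 image; returns a uint8 image."""
    img = image.astype(np.float64) + 1.0  # Step 2: integers -> floats; +1 avoids log(0)
    radius = int(np.ceil(3 * c))          # assumed truncation of the Gaussian surround
    out = np.empty_like(img)
    for ch in range(img.shape[2]):        # Step 1: process the R, G, B channels separately
        s = img[..., ch]
        r = np.log(s) - np.log(gaussian_blur(s, c, radius))  # Step 4
        real = np.exp(r)                  # Step 5: back from the logarithmic domain
        lo, hi = real.min(), real.max()
        # Step 6: linear stretch of each channel to the 0-255 output range
        out[..., ch] = 0.0 if hi == lo else (real - lo) / (hi - lo) * 255.0
    return np.round(out).astype(np.uint8)
```

A smaller `c` emphasizes local detail, and a larger `c` preserves more of the overall tonality. The value used in the study is not stated here, so `c` should be treated as a tuning parameter.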

Yolo v3.
YOU ONLY LOOK ONCE (YOLO) is a state-of-the-art, real-time object detection system. The core idea of Yolo v3 is to take the whole image as the network input and to regress the positions of the bounding boxes and their categories (e.g., vehicles, trees, or pedestrians) directly in the output layer. The overall stages of Yolo v3, which consist of four parts, are described below.

Bounding Box Prediction.
A sum of squared error loss is used for the coordinate predictions, so the error can be calculated rapidly. Yolo v3 predicts an objectness score for each bounding box using logistic regression. Each bounding box needs four values to represent its position in the image, $(x, y, w, h)$. These are, respectively, the x coordinate of the center point, the y coordinate of the center point, the width of the bounding box, and the height of the bounding box. They are decoded from the network outputs as:
$$b_x = \sigma(t_x) + c_x, \quad b_y = \sigma(t_y) + c_y, \quad b_w = p_w e^{t_w}, \quad b_h = p_h e^{t_h}$$
where $c_x, c_y$ are the coordinate offsets of the grid cell, $p_w, p_h$ are the side lengths of the preset anchor box, and $\sigma$ is the sigmoid function. The resulting box coordinates are $b_x, b_y, b_w, b_h$, and the network learning targets are $t_x, t_y, t_w, t_h$.
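The box decoding described above maps raw network outputs to a box. The following Python sketch applies it to a single prediction. It assumes that the cell offsets and anchor sizes are expressed in grid-cell units (implementations differ on units). The name `decode_box` is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_box(t, cell, anchor):
    """Turn raw outputs (t_x, t_y, t_w, t_h) into (b_x, b_y, b_w, b_h).

    cell   = (c_x, c_y): offsets of the grid cell that made the prediction
    anchor = (p_w, p_h): side lengths of the preset anchor box
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = anchor
    b_x = sigmoid(t_x) + c_x   # the sigmoid keeps the center inside its own cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # the exponential keeps width and height positive
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# Zero outputs place the center in the middle of cell (3, 5), with the anchor's size.
print(decode_box((0.0, 0.0, 0.0, 0.0), (3, 5), (2.0, 4.0)))  # (3.5, 5.5, 2.0, 4.0)
```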
If a bounding box prior overlaps a ground truth object by more than any other bounding box prior, its objectness target is 1. If a prior is not the best one but its overlap with a ground truth object exceeds a threshold (set to 0.5), its prediction is ignored and incurs no loss.
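The assignment rule above can be sketched in Python. This is a simplified illustration: it assumes a single ground-truth box and full-box IoU between each prior and the object, whereas the original Yolo v3 matches priors per grid cell. The helper names `iou` and `objectness_targets` are hypothetical. `None` marks an ignored prediction that contributes no objectness loss. The same `iou` measure can also quantify the overlap between two vehicle boxes, as in the IOU crash feature used later in the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_center, y_center, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def objectness_targets(priors, ground_truth, ignore_thresh=0.5):
    """Target 1 for the best-overlapping prior, None (ignored) for other priors
    whose IoU exceeds ignore_thresh, and 0 for all remaining priors."""
    scores = [iou(p, ground_truth) for p in priors]
    best = max(range(len(scores)), key=lambda i: scores[i])
    targets = []
    for i, s in enumerate(scores):
        if i == best:
            targets.append(1)
        elif s > ignore_thresh:
            targets.append(None)
        else:
            targets.append(0)
    return targets


gt = (0.0, 0.0, 2.0, 2.0)
priors = [(0.0, 0.0, 2.0, 2.0), (0.2, 0.0, 2.0, 2.0), (5.0, 5.0, 1.0, 1.0)]
print(objectness_targets(priors, gt))  # [1, None, 0]
```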

Class Prediction.
To classify different kinds of objects, independent logistic classifiers are used instead of a softmax. During training, binary cross-entropy loss is used for the class predictions.
After training the logistic regression classifier, there is a set of weights $w_0, w_1, \ldots, w_n$, and the features of each sample can be written as $x_1, x_2, \ldots, x_n$. When test samples are input, their features are combined with the weights linearly:
$$z = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$
The sigmoid function is:
$$g(z) = \frac{1}{1 + e^{-z}}$$
The prediction probability can then be expressed as:
$$P(y = 1 \mid x) = g(w_0 + w_1 x_1 + \cdots + w_n x_n)$$
where $g(\cdot)$ is the sigmoid function.
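The class-prediction step can be sketched in Python. Each class gets its own logistic classifier, so the probabilities are computed independently and need not sum to 1. Training uses binary cross-entropy. The class names and weight values below are made up for illustration and are not taken from the study.

```python
import math


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, features):
    """P(y = 1 | x) = g(w_0 + w_1 x_1 + ... + w_n x_n)."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Sum of per-class BCE terms; eps guards against log(0)."""
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss


# Hypothetical per-class weights [w_0, w_1, w_2] for a two-feature input.
class_weights = {"person": [0.5, 1.0, 0.0], "fallen person": [-0.5, 0.0, 2.0]}
x = [1.0, 1.0]
probs = {name: predict_proba(w, x) for name, w in class_weights.items()}
print(probs)  # both classes can exceed 0.5 at once, unlike with a softmax
print(binary_cross_entropy(list(probs.values()), [1, 1]))
```

With a softmax, the two labels would compete for a single unit of probability. Independent sigmoids let overlapping labels coexist on the same box.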

Predictions Across Scales.
Yolo v3 predicts boxes at three different scales. It uses a feature pyramid network (FPN) to extract features at these scales and finally predicts a 3-D tensor containing the bounding box information, objectness information, and class information.

Feature Extractor.
Yolo v3 uses a deeper network for feature extraction, which has 53 convolutional layers and is called Darknet-53. This network is much more powerful than Darknet-19 but still more efficient than ResNet-101 or ResNet-152. The loss function of YOLO combines the coordinate, objectness, and class prediction terms described above. The flow chart of YOLO is shown in Figure 2.

Decision-Tree Based Crash Detection Framework.
In mixed traffic flow environments, crashes can occur between motorists and nonmotorists. Thus, a motion-based method (e.g., modeling of vehicle interactions or analysis of vehicle activities) may not be fully capable of detecting such types of crashes. Moreover, since the computational cost of object detection and tracking is already high, an even more complex framework integrating other deep learning models (e.g., recursive neural networks) would become too compute-intensive. Thus, in this paper, we consider a simplified framework for quick crash detection that can be implemented in practice.
A decision tree model was considered for crash classification, based on features obtained from Yolo v3. It has several advantages: (1) the cost of using the tree (i.e., predicting data) is logarithmic in the number of training samples; (2) it requires little data preparation and can handle both numerical and categorical data; (3) it is simple to understand and interpret.
Given training features $X_i$ and labels $y$, a decision tree recursively partitions the feature space. Let $Q$ represent the data at node $m$. A candidate split $\theta = (j, t_m)$ consists of a feature $j$ and a threshold $t_m$, and $Q_{left}(\theta)$ and $Q_{right}(\theta)$ are the subsets into which the decision tree partitions the data at node $m$:
$$Q_{left}(\theta) = \{(x, y) \mid x_j \le t_m\}, \qquad Q_{right}(\theta) = Q \setminus Q_{left}(\theta)$$
The impurity at node $m$ is calculated with an impurity function $H(\cdot)$, the choice of which depends on the task being considered:
$$G(Q, \theta) = \frac{n_{left}}{N_m} H\left(Q_{left}(\theta)\right) + \frac{n_{right}}{N_m} H\left(Q_{right}(\theta)\right)$$
For a classification task with outcomes $k = 0, 1, \ldots, K-1$, where node $m$ represents a region $R_m$ with $N_m$ observations, let
$$p_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k)$$
be the proportion of class $k$ observations in node $m$. With the entropy criterion, the impurity is
$$H(X_m) = -\sum_{k} p_{mk} \log(p_{mk})$$
where $X_m$ is the training data in node $m$. The split parameters are selected such that the impurity is minimized:
$$\theta^* = \arg\min_{\theta} G(Q, \theta)$$
The framework is shown in Figure 3.

Experimental Evaluation
To validate the framework, an experiment was conducted on a computer with an Intel(R) Core(TM) i5-4200 CPU @ 2.50 GHz (4 CPUs), 8 GB of RAM, and an NVIDIA GeForce 840M graphics card.

Dataset Used.
Since there is no public database for crash detection, we collected a large number of CCTV videos online. Figure 4 shows samples of the video data. In general, a video clip records 10-20 s before and after a crash. Our dataset has 127,362 frames, of which 45,214 contain crash scenes and 82,148 are normal frames. Various crash types were observed, including multi-vehicle crashes, pedestrian-vehicle crashes, and cyclist-vehicle crashes. Moreover, many low-visibility conditions were included in the dataset, such as dark nights with poor lighting, heavy rain, and foggy days. In this study, 15,000 crash frames and 40,000 normal frames were used to create training samples, while the remaining frames were used for model testing.

Results and Discussion.
First, Retinex was used to improve image quality. Figure 5 provides some examples of image enhancement, in which more image details become visible after enhancement. After image enhancement, Yolo v3 was used to detect objects in the images. For the training dataset, crash samples, including fallen people, fallen bicycles/motorcycles, and vehicle rollovers, were extracted from the videos. These samples were then distorted and scaled to further enlarge the crash sample size. Normal people, bicycles, motorcycles, and vehicles were also collected as normal samples. Figure 6 provides some examples from the training dataset.
After 5000 iterations, the model converged. Figure 7 shows the real-time detection performance of Yolo v3. The training and testing accuracy of the Yolo model are shown in Figure 8. According to the graph, the trained model has no overfitting issue. Three crash types were observed in the current video dataset:
(1) Pedestrian/cyclist-related crash: If this type occurs, fallen people, fallen cyclists, stopped vehicles, stopped people, and stopped cyclists could be detected in the scene.
(3) Serious motor-vehicle crash: If this type occurs, vehicle rollover, stopped vehicles, and stopped people/cyclists could be detected in the scene.
To detect these three crash types, a set of features was developed from the Yolo v3 outputs. The features are: number of moving vehicles; number of stopped vehicles; number of moving people (the number of moving pedestrians and cyclists); number of stopped people (the number of stopped pedestrians and cyclists); fallen people (the number of fallen people); vehicle rollover (the number of vehicle rollovers); intersection over union (IOU); and IOU duration. IOU is often used to measure the overlap between two bounding boxes (e.g., two vehicles). Note that in this study, IOU represents the maximum IOU value that remains unchanged over the observation period, while IOU duration indicates the longest time period during which the IOU remains unchanged. A decision tree was trained using these features as inputs, as shown in Figure 9. The average precision is 0.95. According to the entropy and information gain of the tree model, several features were found to be important: fallen people, IOU duration, stopped people, stopped vehicles, and vehicle rollover. Based on these findings, three empirical rule-based models were also developed as follows:
Rule 1: If fallen people or fallen nonmotorized vehicles are continuously detected during a period (e.g., 10 s), the condition is determined to be a crash.
Rule 2: If two vehicles are detected as overlapping during a period (e.g., 10 s), and stopped people are also detected around the vehicles, the condition is determined to be a crash.
Rule 3: If a car rollover is detected during a period (e.g., 2 s), the condition is determined to be a crash.
The Rule 1 model can detect crash types related to pedestrians and cyclists (e.g., on bicycles or motorcycles). A relatively long detection period may avoid misdetecting people who only occasionally fall off. The Rule 2 model was designed for nonserious crash types, including minor multivehicle and single-vehicle crashes. In those situations, vehicles may not be seriously damaged, and no fallen objects may be detected. According to previous literature, such types could be detected by analyzing …
The performances of all models were compared, as shown in Figure 10. Figure 10(a) provides the ROC curves of all the models with Retinex enhancement in the mixed traffic flow environment. These curves show the relationship between sensitivity (true positive rate) and 1 - specificity (false positive rate). As Figure 10(a) shows, the decision tree model performed best among all models according to the ROC curves, and the combined rule model outperformed each single-rule model. According to Figure 10(b), Retinex enhancement considerably improved crash detection performance for the decision tree model. Without Retinex, the overall performance of the decision tree model appeared to be lower than that of the combined rule-based model with Retinex.
Overall, the proposed framework correctly detected 92.5% of crashes in the testing dataset, with a false alarm rate of 7.5%. The AUC values for all crash detection models are listed in Table 1.
In general, the decision tree-based model appeared to be better than the empirical rule-based models. Although the proposed framework achieved relatively high detection accuracy, some issues remain. (1) Fallen pedestrians/cyclists can sometimes be occluded by other objects, increasing the false alarm rate. (2) In highly congested mixed traffic flow environments, false crash alarms can also occur, which could be due to inaccurate detection by the Yolo v3 model (e.g., bounding box prediction). (3) Retinex can handle most low-visibility conditions in this study. However, when video quality is too low, crashes can still be missed or falsely alarmed. For example, according to our observation, fast-moving vehicles could sometimes be falsely detected as fallen vehicles.

Comparison with the Existing Methods.
Due to the lack of a public database, limited research has been identified on this topic. Since most studies were based on private datasets that cannot be accessed, their results are not directly comparable. Nevertheless, we list them here. Yun [39] achieved a detection rate of 0.8950 for crash detection. RTADS [20] reported a hit rate of 92% and a false alarm rate of 0.77%. ARRS [21] presented a true positive rate of 0.63 with a false alarm rate of 0.06. Singh [27] reported a hit rate of 77.5% and a false alarm rate of 22.5%. Sadek [40] achieved a detection rate of 99.6% and a false alarm rate of 5.2%.
First, although some studies have reported high detection rates, they could encounter overfitting issues due to limited sample sizes. Second, some studies created complicated deep learning structures requiring high computational capability.
Third, limited literature has focused on crash detection in mixed traffic flow environments under low-visibility conditions.

Conclusions
The paper proposed a vision-based crash detection framework for mixed traffic flow environments considering low-visibility conditions. The Retinex algorithm was introduced to enhance image quality under low-visibility conditions, such as night, fog, and rain. A deep learning model (i.e., Yolo v3) was trained to detect objects in mixed traffic flow environments. A decision tree model was then developed for crash detection, considering various crash scenarios between motorized and nonmotorized traffic. The proposed method achieved a hit rate of 92.5% and a false alarm rate of 7.5%. Interesting findings include: (1) the proposed model outperformed empirical rule-based detection models; (2) the image enhancement method can largely improve crash detection performance under low-visibility conditions; (3) the accuracy of object detection (e.g., bounding box prediction) can affect crash detection performance, especially for minor motor-vehicle crashes.
Overall, the results are encouraging and the framework is promising. Admittedly, some issues can still be addressed. First, different image enhancement methods could be tried to improve the overall performance. Second, other deep learning methods can be used and compared with the original Yolo v3 model. Third, more complex deep learning structures can be examined and compared with the current framework in terms of accuracy and computational speed.

Data Availability
The data used in this study consist of a large number of video clips.

Conflicts of Interest
The authors declare that they have no conflicts of interest.