Spatial and Temporal Distribution Analysis of Traffic Accidents Using GIS-Based Data in Harbin



Introduction
The casualties and property damage caused by road traffic accidents are serious. According to the World Health Organization [1], approximately 1.3 million deaths and injuries occur as a result of traffic accidents each year worldwide. Road traffic-related deaths are the 8th leading contributor to human deaths worldwide. In 2016, 63,093 people died in traffic accidents in China, 1.7 times more than in the United States. The death rate per 10,000 vehicles reached 2.72, 2.1 times that in the United States [2]. To decrease the number of traffic accidents, it is necessary to determine where and when accidents occur frequently. Previous studies have shown that accidents occur with certain spatial and temporal patterns. Areas with a high frequency and severity of crashes vary over time and occur in different areas within cities. Therefore, the frequency and severity of accidents are combined to understand the spatial and temporal distribution patterns of accidents. This information helps traffic managers take targeted preventive measures to reduce fatalities and injuries.
Most of the previous traffic accident data analyses used mathematical statistical methods [3][4][5]. These methods mainly use accident frequency to determine the location of accident hot spots [6]. Their operation and expression are relatively simple, but these methods have many limitations, such as the lack of visualization and the inability to connect space and time. In contrast, spatial statistics methods can be used to visualize the spatial distribution characteristics of traffic accidents through GIS technology. Compared with traditional mathematical statistics, spatial statistics fully utilize the advantages of GIS in spatial data processing. On the one hand, the distribution of traffic accidents can be visualized through GIS visualization technology [7][8][9]. On the other hand, by using a variety of spatial analysis tools in GIS, scholars can explore the spatial distribution characteristics of traffic accidents and the spatial relationship between different traffic accidents from a variety of perspectives [10][11][12]. The most common spatial statistical methods in GIS are density analysis and cluster analysis. Density analysis accomplishes spatial visualization of accidents through kernel density and point density methods [13][14][15][16]. Cluster analysis can classify the spatial distribution of traffic accidents as aggregated, dispersed, or random by the nearest neighbor distance and Ripley's K function methods [17][18][19], and it can identify traffic accident hotspot areas by hotspot analysis [20][21][22] and spatial autocorrelation analysis [23][24][25].
In general, existing studies have reported a series of results in the analysis of traffic accident data, but there are still some shortcomings. First, in previous studies on accident density analysis, only the accident density was considered, and the influence of road network density on the accident density was not considered. Second, the measure of traffic safety level includes not only the frequency of traffic accidents but also the consequences of the severity of the accidents themselves. Although previous studies have examined the spatial distribution characteristics of regions with higher accident severity, few have linked accident frequency with the spatial distribution characteristics of accident severity. Therefore, in view of the shortcomings of previous studies, accidents were first spatially located by geocoding methods in this paper. Second, accidents were divided into four categories according to the season in which they occurred. Then, using density analysis and cluster analysis, areas with a high frequency and severity of traffic accidents were identified. Finally, the results of the two analysis methods were combined to determine accident-prone areas of different severities.
This paper aims to investigate the spatial and temporal patterns of accidents from two perspectives: accident frequency and accident severity. The remainder of the article is arranged as follows. Section 2 describes the traffic accident data. Section 3 shows the main methods used in this study. The results of density analysis and cluster analysis are presented in Section 4. Finally, Section 5 presents the conclusions.

Data Processing
This study focused on the city of Harbin, located in northeastern China. Harbin covers an area of approximately 53,100 km² and had a population of 9.55 million in 2017. This study focused on only the main urban area of the city. Each data point contains basic accident information, such as time of occurrence, location of occurrence, accident casualties, road type, and weather data. In this research, some of this information (shown in Table 1) was selected for the study.
In GIS, the location of a traffic accident is generally marked by latitude and longitude coordinates. However, latitude and longitude were not included in the raw data. Therefore, the longitude and latitude coordinates needed to be determined from the description of the traffic accident location. Then, the spatial location of the accident was finalized. This process is called geocoding [26]. Tian et al. [27] evaluated the quality of four mainstream geocoding services in China (Baidu, Gaode, Sogou, and Tencent). The service quality of Tencent's geocoding API was considered relatively high, with higher data quality and more complete address data than the other services. Therefore, Tencent's service was used to convert the accident locations to coordinates, and the converted data were then imported into GIS. When the Tencent API returns the coded result, it also returns the reliability of the result. Reliability values range from 1 (low reliability) to 10 (high reliability). A result is considered credible when it has a reliability score of 7 or above, so only these results were retained in this study. After the geocoding process, 5850 accidents were identified for further study. Figure 1 illustrates the overall spatial distribution of traffic accidents from 2016 to 2018. The figure suggests that the accidents occurred mostly in the central urban area.
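The retention rule described above can be sketched in a few lines of Python. This is only an illustration: the record type `GeocodedAccident` and its field names are hypothetical and do not reflect the actual response format of any geocoding service; only the threshold of 7 comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MIN_RELIABILITY = 7  # from the text: scores of 7 and above are treated as credible


@dataclass
class GeocodedAccident:
    """Hypothetical container for one geocoded accident record."""
    accident_id: str
    longitude: Optional[float]  # None when no match was returned
    latitude: Optional[float]
    reliability: int            # 1 (low reliability) to 10 (high reliability)


def keep_credible(records: Iterable[GeocodedAccident],
                  min_reliability: int = MIN_RELIABILITY) -> List[GeocodedAccident]:
    """Keep records that have coordinates and a reliability at or above the threshold."""
    return [
        rec for rec in records
        if rec.longitude is not None
        and rec.latitude is not None
        and rec.reliability >= min_reliability
    ]
```

Records that fail the check are dropped rather than corrected, which matches the description that only credible results were retained.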

Comap Method.
The comap technique can help us to recognize how the locations of traffic accidents change over time. It has been widely used in temporal-spatial integration [28,29]. In this paper, traffic accident data from 3 years were divided into four subsets by season in accordance with Harbin's climatic conditions. Then, density analysis and cluster analysis were applied to calculate the intensity of each subset. Finally, the results were arranged sequentially in a graphic to show the spatial distribution of traffic accidents over time. According to the suggestion of related literature [30], class boundaries should overlap. As shown in Table 2, accidents were divided into four subsets by season. There is some overlap between subsets to avoid sharp temporal boundaries.
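The overlapping split can be illustrated with a short Python sketch. The month windows below are placeholders chosen only to show the overlap mechanism; the boundaries actually used in the study are those listed in Table 2, and the names `SEASON_MONTHS`, `seasons_for`, and `split_by_season` are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical month windows; adjacent windows share a month so that
# class boundaries overlap. Replace with the boundaries from Table 2.
SEASON_MONTHS: Dict[str, Tuple[int, ...]] = {
    "spring": (3, 4, 5),
    "summer": (5, 6, 7, 8),
    "autumn": (8, 9, 10),
    "winter": (10, 11, 12, 1, 2, 3),
}


def seasons_for(day: date, windows: Dict[str, Tuple[int, ...]] = SEASON_MONTHS) -> List[str]:
    """Return every season whose window contains the month of `day`."""
    return [season for season, months in windows.items() if day.month in months]


def split_by_season(dates: Iterable[date],
                    windows: Dict[str, Tuple[int, ...]] = SEASON_MONTHS) -> Dict[str, List[int]]:
    """Map each season to the indices of the accidents that fall in its window."""
    subsets: Dict[str, List[int]] = {season: [] for season in windows}
    for index, day in enumerate(dates):
        for season in seasons_for(day, windows):
            subsets[season].append(index)
    return subsets
```

An accident dated in a shared month is assigned to both neighboring subsets, so each seasonal map is computed from a slightly wider window than a strict partition would give.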

Density Analysis Method.
This study used point density and line density to identify spatial patterns of traffic accidents. The former is the number of accidents per unit area, and the latter is the length of road sections per unit area. The density analysis was performed in GIS with the neighborhood method. For example, when calculating the density of the accident points, the study area was divided into small square cells with side length $d$. Each cell ultimately corresponds to a pixel in the output map. The accident density in the region where cell $k$ is located is $D^{k}_{\mathrm{accident}}$, and the radius of the neighborhood is set to $r$. $N_k(r)$ is the number of accidents within a neighborhood centered at the center of cell $k$ and with radius $r$. The point density is calculated as follows:

$$D^{k}_{\mathrm{accident}} = \frac{N_k(r)}{\pi r^{2}} \tag{1}$$

The road network density in the area where cell $k$ is located is $D^{k}_{\mathrm{road}}$. Similarly, $L_k(r)$ is the length of the road within the same neighborhood. The specific formula is as follows:

$$D^{k}_{\mathrm{road}} = \frac{L_k(r)}{\pi r^{2}} \tag{2}$$

In addition, $d$ and $r$ need to be determined in the actual calculation. In GIS, they are usually derived from the smaller of the output image's height and width: $r$ is 1/30th and $d$ is 1/250th of this minimum. The study area is between 126.15 and 127.15 east longitude and between 45.52 and 46.09 north latitude. The minimum output image width was obtained after longitude and latitude were converted to the actual distance. A cell length of 230 m and a neighborhood radius of 1900 m were selected for the density analysis.
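The neighborhood method for point density can be sketched with NumPy as follows. This is a brute-force illustration under stated assumptions, not the GIS tool used in the study: coordinates are assumed to be projected and in meters, the neighborhood is assumed to be circular, and the function names are hypothetical. The 1/250 and 1/30 defaults follow the rule described in the text.

```python
import numpy as np


def default_cell_and_radius(width_m: float, height_m: float):
    """Derive d (cell size) and r (radius) from the shorter side of the extent."""
    shorter = min(width_m, height_m)
    return shorter / 250.0, shorter / 30.0


def point_density(points_xy, x_min, y_min, n_cols, n_rows, d, r):
    """Accidents per unit area within radius r of each cell center.

    Returns an array of shape (n_rows, n_cols); row index follows y, column index follows x.
    """
    points_xy = np.asarray(points_xy, dtype=float)
    # Cell centers of a regular grid with square cells of side d.
    xs = x_min + (np.arange(n_cols) + 0.5) * d
    ys = y_min + (np.arange(n_rows) + 0.5) * d
    cx, cy = np.meshgrid(xs, ys)
    centers = np.column_stack([cx.ravel(), cy.ravel()])
    # Squared distance from every cell center to every accident (brute force).
    diff = centers[:, None, :] - points_xy[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    counts = (dist2 <= r * r).sum(axis=1)          # N_k(r)
    return counts.reshape(n_rows, n_cols) / (np.pi * r ** 2)
```

The pairwise distance matrix grows with the number of cells times the number of accidents, so a spatial index would be preferable for a full city grid. The road network density could be approximated in the same way by cutting road lines into short pieces and summing the lengths of the pieces whose midpoints fall inside each neighborhood.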

Cluster Analysis Method.
Cluster analysis is a more rigorous data analysis process. Spatial clustering analysis divides collections of physical or abstract objects in spatial data into similar classes. In turn, the spatial patterns of similar classes of data are obtained [31]. In this paper, outlier analysis was performed in GIS to explore the spatial pattern of accident severity. First, this approach is in contrast to traditional cluster analysis approaches, such as hierarchical or partition-based clustering methods. These methods can determine only whether a sample belongs to a certain category [32]. However, outlier analysis also identifies samples that do not belong to any category. This provides a more comprehensive analysis of the spatial pattern of accidents. Second, the calculation of the outlier analysis is based on the attributes of the individual accident sample points. Original data attributes are largely preserved. This facilitates an in-depth study of the accidents.
Outlier analysis is performed by calculating the local Moran's I of each accident. It measures the correlation between the attribute of each accident point and the values of its neighboring points. The statistic is calculated as follows:

$$I_i = \frac{x_i - \bar{X}}{S_i^{2}} \sum_{j=1,\, j \neq i}^{n} \omega_{i,j}\,(x_j - \bar{X}) \tag{3}$$

where $I_i$ is the local Moran's I statistic of data point $i$, $n$ is the total number of accidents, $x_i$ and $x_j$ are the attributes of data points $i$ and $j$ (which correspond to the accident severity in this paper), $\bar{X}$ is the global mean of the attribute, $\omega_{i,j}$ is the spatial weight between $i$ and $j$, and $S_i^{2}$ is the second-order sample moment of the attributes of the data points. Formally, $S_i^{2}$ can be expressed as

$$S_i^{2} = \frac{\sum_{j=1,\, j \neq i}^{n} (x_j - \bar{X})^{2}}{n - 1} \tag{4}$$

The $z_{I_i}$ score for the data point is calculated as

$$z_{I_i} = \frac{I_i - E[I_i]}{\sqrt{V[I_i]}} \tag{5}$$

where $E[I_i]$ and $V[I_i]$ can be expressed as

$$E[I_i] = -\frac{\sum_{j=1,\, j \neq i}^{n} \omega_{i,j}}{n - 1}, \qquad V[I_i] = E[I_i^{2}] - E[I_i]^{2} \tag{6}$$

There were five types of statistical results of the outlier analyses: high-high clustering (H-H), high-low clustering (H-L), low-high clustering (L-H), low-low clustering (L-L), and nonsignificant. In general, a 95% confidence level was used to signify statistical significance. In other words, a result was considered statistically significant when the p value was less than 0.05. The corresponding z-score then falls outside the range from −1.96 to +1.96 under the normal distribution. If a result was statistically significant and $I_i > 0$, then the data point had the same level of high or low attributes as the adjacent points. The attribute value of the point was compared with the average attribute value of all the data points to determine whether the result indicated H-H or L-L clustering. If $I_i < 0$, then the attribute of the data point differs significantly from those of adjacent points, and the point is an outlier.
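A minimal NumPy sketch of the local Moran's I computation and the five-way labeling is given below. Two choices are assumptions made for illustration and are not stated in the paper: the spatial weights are binary within a fixed distance band (`bandwidth`), and significance is estimated with a conditional permutation pseudo p-value instead of the analytical z-score. The function names are hypothetical.

```python
import numpy as np


def local_morans_i(xy, values, bandwidth, n_perm=199, seed=0):
    """Return (I_i, spatial lag, deviation from mean, pseudo p-value) per point.

    Assumes the values are not all identical (otherwise S_i^2 is zero).
    """
    xy = np.asarray(xy, dtype=float)
    x = np.asarray(values, dtype=float)
    n = x.size
    dist = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
    w = (dist <= bandwidth).astype(float)   # assumed binary distance-band weights
    np.fill_diagonal(w, 0.0)                # a point is not its own neighbor

    dev = x - x.mean()
    s2 = (np.sum(dev ** 2) - dev ** 2) / (n - 1)   # S_i^2, excluding point i
    lag = w @ dev                                  # sum_j w_ij (x_j - mean)
    local_i = dev / s2 * lag

    rng = np.random.default_rng(seed)
    p_values = np.ones(n)
    for i in range(n):
        k = int(w[i].sum())            # number of neighbors of point i
        others = np.delete(dev, i)     # value at i stays fixed, the rest are shuffled
        sims = np.array([rng.choice(others, size=k, replace=False).sum()
                         for _ in range(n_perm)])
        sims = dev[i] / s2[i] * sims
        if local_i[i] >= 0:
            extreme = np.sum(sims >= local_i[i])
        else:
            extreme = np.sum(sims <= local_i[i])
        p_values[i] = (extreme + 1) / (n_perm + 1)
    return local_i, lag, dev, p_values


def classify(local_i, lag, dev, p_values, alpha=0.05):
    """Label each point HH, LL, HL, LH, or NS (nonsignificant)."""
    labels = np.full(local_i.shape, "NS", dtype=object)
    sig = p_values < alpha
    labels[sig & (dev > 0) & (lag > 0)] = "HH"
    labels[sig & (dev < 0) & (lag < 0)] = "LL"
    labels[sig & (dev > 0) & (lag < 0)] = "HL"
    labels[sig & (dev < 0) & (lag > 0)] = "LH"
    return labels
```

Points with the same sign for their own deviation and for the neighborhood sum have a positive statistic and are labeled H-H or L-L; opposite signs give a negative statistic and an outlier label, which mirrors the rule described above.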

Density Analysis
The frequency of accidents per unit area is an indicator used to measure the level of traffic safety on urban roads. In this section, point density analysis was used to calculate this value and to generate the temporal and spatial distribution of traffic accidents. This approach helps to determine whether an accident hot spot is subject to temporal fluctuations. In addition, the maximum value normalization method was used to normalize the density values, which facilitates the classification and comparison of accident density intervals.
According to relevant literature [28], the accident density results were classified into 3 levels. Density values of 0.5 and 0.8 were used as the two cutoff points. Density values between 0 and 0.5 indicate a low-density area of accidents, values between 0.5 and 0.8 indicate an intermediate-density area of accidents, and values between 0.8 and 1 indicate a high-density area of accidents. The density calculation results are shown in Figure 2.
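The normalization and three-level classification can be sketched as follows. The cutoffs 0.5 and 0.8 come from the text; assigning a value that lies exactly on a cutoff to the higher level is an assumption, since the text does not say which side the boundaries belong to. The function name is hypothetical.

```python
import numpy as np


def normalize_and_classify(density, cutoffs=(0.5, 0.8)):
    """Scale densities by their maximum and bin them into levels 0, 1, 2.

    Level 0 = low, 1 = intermediate, 2 = high density.
    """
    density = np.asarray(density, dtype=float)
    peak = density.max()
    normalized = density / peak if peak > 0 else np.zeros_like(density)
    # np.digitize returns 0 below the first cutoff, 1 between cutoffs, 2 at or above the last.
    levels = np.digitize(normalized, cutoffs)
    return normalized, levels
```

Because each seasonal map is divided by its own maximum, the levels describe the relative concentration within a season and do not compare absolute accident counts between seasons.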
As shown in Figure 2, the number of accidents was similar in spring and fall. The locations of intermediate to high densities of accidents were similar in these two seasons. In addition, the number of accidents was higher in summer than in the other seasons. The areas with intermediate to high densities of accidents were also more widely distributed in summer. The high-density accident area statistics are shown in Table 3. The high-density accident areas were found in the same administrative divisions in spring, summer, and autumn. In winter, these areas were located only in the Daoli District. The high-density area was the largest in summer, at approximately 5 km². The accident rates in high-density areas were highest in the fall.
Statistics of areas with an intermediate density of accidents are shown in Table 4. The areas with intermediate densities of accidents were found in the same administrative divisions in spring, summer, and autumn, mainly in the Daoli, Daowai, Xiangfang, and Nangang Districts. In winter, these areas also included the Pingfang District. The area with an intermediate density of accidents was largest in summer, at approximately 35.22 km². The accident rates in areas with an intermediate density of accidents were highest in winter.
In conclusion, areas with intermediate to high densities of accidents were concentrated near large shopping malls, schools, and hospitals (yellow circles), especially in the vicinity of the First Hospital and Ha Station. These areas were larger in the summer and winter and smaller in the spring and fall. In addition, the comap technique showed that the accident density in some areas (yellow circles) did not change significantly in space and time. However, the density in some areas fluctuated in space and time. For example, the area in the green box in Figure 2(d) was not identified in any of the other seasons. Notably, the southern area that had a high density of accidents in the winter (highlighted by the box in Figure 2(b)) was not identified as a high-density area in any of the other seasons.

Distribution of Accident Point Density considering Road Network Density.
The density of accident points per unit area was determined in the previous section. However, that analysis did not consider the density of the road network, so the spatial-temporal pattern of accidents was not fully reflected. Therefore, a new spatial-temporal accident pattern was obtained by dividing the point density by the road network density ($D^{k}_{\mathrm{accident}} / D^{k}_{\mathrm{road}}$), which gives the accident frequency per unit road length. The density values were classified with the same cutoffs as above. The new pattern, which considers the road network density, is shown in Figure 3.
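The cell-by-cell division can be sketched as follows. When both densities are computed over the same circular neighborhood, the area term cancels, so the ratio reduces to the number of accidents divided by the road length in the neighborhood. Setting cells without any road to zero is an assumption made here to avoid division by zero; the paper does not state how such cells were handled, and the function name is hypothetical.

```python
import numpy as np


def accidents_per_road_length(accident_density, road_density):
    """Divide accident density by road density, cell by cell.

    Cells whose road density is zero are set to 0 (assumed convention).
    """
    acc = np.asarray(accident_density, dtype=float)
    road = np.asarray(road_density, dtype=float)
    ratio = np.zeros_like(acc)
    np.divide(acc, road, out=ratio, where=road > 0)
    return ratio
```

With accident density in accidents per km² and road density in km per km², the result is in accidents per km of road.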
A significant change is shown in Figure 3. The spatial and temporal patterns of accidents differed significantly by season. In addition, new areas with intermediate to high densities of accidents were identified. The statistics of new high-density areas of accidents are shown in Table 5. A comparison of the accident density with the administrative divisions indicated that, in spring and fall, accidents occurred in the Daowai and Xiangfang Districts, while in summer and winter, they occurred in the Daoli District. A comparison of accident density and road networks indicated that, in spring and autumn, accidents were concentrated on Nanzhi Road, Pioneer Road, Gongbin Road, and Hongqi Street. In summer and winter, accidents were concentrated on Pioneer Road and South Straight Road. The area with a high density of accidents in autumn was the largest, at approximately 14.11 km². Accident rates in the high-density areas were largest in spring. The statistics of the updated intermediate-density areas of accidents are shown in Table 6.
In summary, the areas with a high density of accidents were smaller in spring and winter, while those in summer and autumn were larger and more widely distributed. In addition, a comap was generated to demonstrate temporal-spatial patterns of accidents in each season. Some areas (marked by boxes 1, 2, and 3 in Figure 4) [...]
Compared with the results shown in Figure 2, both analyses demonstrated relatively similar areas of intermediate and high densities of accidents, although significant changes were observed in some areas. Some findings were obtained after considering the road network density. First, the color of some areas was lighter in Figure 3 than in Figure 2. For example, the areas in the yellow circle in Figure 2 and the green box in Figure 2(d) became areas with low or intermediate densities of accidents. This indicates that the density results in these areas were due to an overly dense road network. Second, new areas with intermediate to high densities of accidents were identified in each season. The frequency of accidents per unit road length is higher in boxes 4 and 5 in Figure 3(b) and box 4 in Figure 3(b). Finally, there were no significant changes in one area (box 3 in Figure 3(b)). Whether considering the frequency of accidents per unit area or per unit road length, intermediate- to high-density areas of accidents were identified in this area.

Cluster Analysis.
The previous section presents the results obtained when the frequency of accidents was analyzed without considering the severity of accidents. Traffic managers assess traffic safety by considering not only the frequency of accidents but also the property damage and casualties caused by the accidents. In fact, if an area occasionally has a particularly serious accident, it deserves more attention than areas where minor accidents occur frequently. Therefore, according to the relevant literature and the data, the accident severity was classified into 3 levels in this section (as shown in Table 7).
There are certain patterns in the distribution of traffic accidents of differing severity in time and space. In this analysis, accident severity was used as a factor to evaluate the results of the spatial clustering of accidents. The clustering results are shown in Figure 4. The position of the box in Figure 4 corresponds to that in Figure 3. The dark red points (H-H) represent the high-severity accident class. The dark blue points (L-L) represent the low-severity accident class. The light red points (H-L) indicate a few high-severity accident points contained within the spatial extent occupied by many low-severity accident points. The light blue points (L-H) indicate a few low-severity accident points within the spatial area occupied by many high-severity accident points. The gray points indicate that the incident points had no obvious clustering features.
As shown in Figure 4, different types of clustering features were identified in each season. Some areas of Pingfang District showed a tendency for high-severity accidents. Likewise, this pattern was also observed in the southwestern part of the study area in spring. Most of these areas were concentrated in peripheral and suburban areas away from urban centers. In contrast, most accidents are dispersed within urban centers. To a certain degree, traffic accidents can cause casualties and property damage, although most accidents have a low severity. In addition, the traffic accidents in the northeastern area of Daoli District in winter were in the L-L class, indicating that this area was a hot spot for low-severity accidents.

Combined Density Analysis and Clustering Analysis.
This approach allows us to clearly understand the results of the combined clustering and density analyses. By combining these analyses, accident severity in areas with intermediate and high densities of accidents was identified. More importantly, these areas can be ranked, so traffic managers can target individual areas for better management and regulation. For example, in spring, there were three areas with intermediate to high densities of accidents (surrounded by boxes in Figure 4). Let $M_m$ be the number of accidents in Region 1 after the cluster analysis, with $m = 1, 2, 3, 4, 5$ representing accidents in the H-H, H-L, L-H, and L-L classes and nonsignificant accidents, respectively. The proportion $P_m$, which indicates the likelihood that this region will eventually exhibit a certain clustering feature, is calculated as

$$P_m = \frac{M_m}{\sum_{l=1}^{5} M_l} \tag{7}$$

Similarly, the proportion of clustering results for each season was calculated using equation (7). The calculation results are shown in Table 8. The clustering results for some regions were nonsignificant and are not presented in this table. As shown in Table 8, accidents in the L-L class account for a relatively large proportion of accidents in Region 1 (Figure 4(a)), in Regions 1 and 4 (Figure 4(b)), and in Regions 1 and 2 (Figure 4(d)).
This result indicated that although the frequency of accidents per unit length of roadway is greater in these areas, the severity of accidents is lower. In contrast, accidents of higher severity account for a relatively large proportion of accidents in Region 5 (Figure 4(c)). This result indicated that the frequency of accidents per unit road length is higher in this area and the severity of accidents is also higher.
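The share of each clustering class within one region can be computed with a few lines of Python. The label strings and the function name are hypothetical; the input is assumed to be the list of class labels of the accidents that fall inside one region.

```python
from collections import Counter
from typing import Dict, Iterable

CLASSES = ("HH", "HL", "LH", "LL", "NS")  # NS = nonsignificant


def cluster_proportions(labels: Iterable[str]) -> Dict[str, float]:
    """Share of each clustering class among the accidents of one region."""
    counts = Counter(labels)
    total = sum(counts[c] for c in CLASSES)
    if total == 0:
        return {c: 0.0 for c in CLASSES}
    return {c: counts[c] / total for c in CLASSES}
```

Comparing the L-L and H-H shares of a high-density region then indicates whether frequent accidents there are mostly minor or mostly severe.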

Conclusion
(1) In this paper, density analysis was combined with comap technology to study the spatial and temporal patterns of traffic accidents by season from the perspective of accident frequency. The results show that the accident density distribution is more diffuse in summer and winter when the road network density is not considered. After considering road network density, the accident density distribution is more diffuse in summer and autumn.
(2) Accident severity is divided into three levels: accidents that cause property damage, accidents that lead to injury, and fatal accidents. Based on these three levels, cluster analysis was used to explore the spatial and temporal patterns of accidents. The following conclusions were drawn. Traffic accidents in urban centers mostly show characteristics of the H-L and L-L classes. Traffic accidents in the central part of the city are generally low-severity accidents. Conversely, the results show that accidents with characteristics of the H-H class were mainly found in the outer urban areas and suburban areas. This indicates that most traffic accidents in these areas are high-severity accidents.
(3) Density analysis is a regional analysis. This method reflects a coarse picture of the spatial-temporal pattern of accidents. Clustering analysis can be accurate at the level of the accident points. Urban areas prone to accidents of different severities were identified by combining the two methods. The results show that the accident severity was lower in Region 1 in spring, summer, and winter, although the frequency of accidents was higher. In Region 5 in the fall, not only was the frequency of accidents greater but also the severity of accidents was generally higher.
(4) Due to data limitations, further analysis of causal factors of accidents leading to spatial patterns is needed. The next step will be to conduct an in-depth study of the causes of various types of traffic accidents, taking into account road characteristics and infrastructure, with a view to providing a more detailed basis for traffic safety management.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.

Journal of Advanced Transportation