
This study presents a spatial approach to macrolevel traffic crash analysis based on point-of-interest (POI) data and other related data from open sources. Spatial autocorrelation is explored using Moran's I with three spatial weight features: (a) Rook, (b) Queen, and (c) Euclidean distance. The traditional Ordinary Least Squares (OLS) model, the Spatial Lag Model (SLM), the Spatial Error Model (SEM), and the Spatial Durbin Model (SDM) were developed to describe the spatial correlations among 2,114 Traffic Analysis Zones (TAZs) of Tianjin, one of the four municipalities in China. The results indicate that the SDM with the Rook spatial weight feature is the optimal spatial model for characterizing the relationship between the explanatory variables and crashes. In the local TAZ, population density, consumption density, intersection density, and road density have significantly positive influences on traffic crashes, whereas company density, hotel density, and residential density have significant negative effects. The spillover coefficients of population density and road density are positive, indicating that increases in these variables in the surrounding TAZs lead to more crashes in the target zone. The impacts of company density and hotel density are the opposite. Overall, the findings can help transportation planners and managers better understand the general characteristics of traffic crashes and improve traffic safety.

In 2013, 1.25 million people were killed in road traffic crashes worldwide, and more than 50 million were injured [

In traffic safety research, microlevel studies focus on the specific factors that influence traffic crashes and casualties. Their purpose is to propose targeted measures for improving vehicles, roads, and the environment, and the correlation between these direct contributing factors and crashes is easy to understand. Macrolevel studies, by contrast, focus on the relationship between traffic crashes and society, the economy, and the environment. Compared with microlevel safety research, macrolevel safety analysis can identify safety problems more effectively over a larger area. This makes it more useful for establishing long-term planning policies to improve traffic safety [

Although great progress has been made, obtaining data on traffic crashes and related influencing factors remains the main obstacle for crash analysis in less developed countries [

More importantly, with the continual development of data mining technology, open source data have attracted increasing attention in recent years. Point-of-interest (POI) data provide more specific land-use information with exact locations. They are expected to be closely related to user characteristics and to traffic crashes at both the macro- and microlevel [

The remainder of the paper is organized as follows. In Section

Previous studies have included a wide variety of exposure variables in traffic crash models. These factors can be divided into five groups:

Because crash counts are discrete, nonnegative, and random, much of the previous literature on collision models relies on Poisson regression. The Poisson model requires the variance of the data to equal the mean, which is difficult to achieve in practice; real crash counts are usually overdispersed. Therefore, the Poisson lognormal (PLN) model and the Negative Binomial (NB) regression model have been proposed to overcome this shortcoming [
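The equidispersion requirement can be screened informally before choosing a count model. The sketch below computes the sample variance-to-mean ratio. The zone counts are made up for illustration and are not the Tianjin data, and the check is a quick screen rather than a formal dispersion test. A ratio near 1 is compatible with Poisson; a ratio far above 1 is the usual sign of overdispersion that motivates NB or PLN models.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance-to-mean ratio of nonnegative crash counts.

    A Poisson model implies a ratio close to 1; values well above 1
    indicate overdispersion.
    """
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    return var / mean

# Hypothetical crash counts for twelve zones (illustrative only).
crashes = [0, 1, 0, 3, 12, 2, 0, 7, 1, 25, 4, 0]
print(round(dispersion_ratio(crashes), 2))  # prints 11.84
```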

Many studies have used spatial models to examine the spatial correlation of collisions (spatial effects), since the causes of crashes are often related to influencing factors in nearby areas. There are several ways to incorporate spatial effects into models for count data, such as Bayesian hierarchical models and spatial econometric models [

Finally, spatial econometric models are statistical models that take full account of population characteristics, social behavior, road traffic, and land attributes. They are therefore an important method for evaluating safety levels in a given area and are regarded as scientific and reasonable [

This study analyzes macrolevel traffic crashes at the TAZ level using spatial econometric models. POI data are used to estimate the spatial spillover effects of the factors that influence traffic crashes. By identifying the key POI features and quantitatively analyzing spatial spillover effects on crash occurrence, we aim to offer recommendations for improving safety through traffic control and policy management.

This study presents an empirical analysis of Tianjin municipality, China. Tianjin is one of the four municipalities in China and the industrial center of North China. It has a land area of about 11,000 square kilometers and a population of 15 million. By the end of 2017, Tianjin had 16 municipal districts comprising a total of 245 township-level divisions. It is adjacent to Beijing, the capital of China, as shown in Figure

Spatial distribution of total crashes across the TAZs of Tianjin.

The Traffic Analysis Zone (TAZ) scale has proven reasonable and reliable in previous studies. For urban traffic planning and management, the Traffic Planning Department divides urban land into TAZs according to land attributes and other principles [

Using the Amap Application Programming Interface (API), the longitude and latitude coordinates of crash locations and POI locations were extracted. As shown in Figure

POI point density calculation: (a) connecting the point layer to the surface (polygon) layer; (b) calculating the total number of points in each TAZ.
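The two steps in the figure can be sketched in plain Python: join points to the zones that contain them, then divide each zone's count by its area. This is a minimal illustration, not the GIS workflow used in the study. The following are all our assumptions:

* the ray-casting point-in-polygon test;
* the shoelace area formula;
* the non-overlapping zones;
* coordinates in a projected system (for example kilometres), so that areas are meaningful.

The zone shapes and POI points are hypothetical.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is an ordered list of (x, y) vertices; points exactly on an
    edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygon_area(polygon):
    """Shoelace formula: absolute polygon area in squared coordinate units."""
    s = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def poi_density(pois, tazs):
    """Spatial join of POIs to TAZs, then POI count divided by TAZ area."""
    counts = Counter()
    for x, y in pois:
        for taz_id, poly in tazs.items():
            if point_in_polygon(x, y, poly):
                counts[taz_id] += 1
                break  # assumes TAZs do not overlap
    return {taz_id: counts[taz_id] / polygon_area(poly) for taz_id, poly in tazs.items()}

# Two hypothetical 1 km x 1 km zones in projected kilometres.
tazs = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
pois = [(0.2, 0.3), (0.7, 0.8), (1.5, 0.5), (0.5, 0.5)]
print(poi_density(pois, tazs))  # {'A': 3.0, 'B': 1.0}
```

With the study's data, the resulting densities would then be log-transformed before modeling.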

Multicollinearity among the explanatory variables can make the estimated independent effect of each variable unreliable. Thus, correlation tests and linear regression were used in SPSS 25 to diagnose multicollinearity before model parameter estimation. Retail store density, restaurant density, and entertainment venue density are strongly correlated. Principal component analysis was therefore first used to reduce these three variables to a single consumption place density variable. All data were then log-transformed for three purposes: to reduce heteroscedasticity and skewness, to make the model less sensitive to extreme values, and to narrow the range of the variables [
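The two diagnostics described above can be illustrated with numpy on synthetic data.

* **VIF.** For each variable, $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing variable $j$ on the others.
* **Principal component.** The first principal component of the three correlated consumption-related densities is computed.

This is a sketch, not the SPSS 25 procedure. Two choices are our assumptions because the paper does not report its PCA settings: standardizing the inputs and keeping only the first component. The data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x p, no intercept column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out

def first_principal_component(X):
    """Scores on the first principal component of the standardized columns of X,
    plus the share of total variance that component explains."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))  # ascending order
    w = eigvecs[:, -1]
    if w.sum() < 0:  # eigenvector sign is arbitrary; make loadings positive
        w = -w
    return Z @ w, eigvals[-1] / eigvals.sum()

rng = np.random.default_rng(0)
common = rng.normal(size=500)
# Hypothetical retail, restaurant, and entertainment densities sharing one factor.
consumption = np.column_stack([common + 0.3 * rng.normal(size=500) for _ in range(3)])
others = rng.normal(size=(500, 2))  # two unrelated hypothetical covariates
print(vif(np.column_stack([consumption, others])).round(1))
score, share = first_principal_component(consumption)
print(round(float(share), 2))
```

The three correlated columns show inflated VIFs, while the unrelated columns stay near 1. The single component score that replaces them has mean zero by construction.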

Descriptive statistics of the variables.

| Variables | Minimum | Maximum | Mean | S.D. | VIF |
|---|---|---|---|---|---|
| Ln_Population_Density | -0.298 | 4.959 | 3.050 | 1.067 | 1.814 |
| Ln_Admin_Dept._Density | -1.921 | 2.410 | 0.171 | 0.581 | 2.669 |
| Ln_School_Density | -1.672 | 2.594 | 0.206 | 0.613 | 3.518 |
| Ln_Company_Density | -1.459 | 2.235 | 0.731 | 0.641 | 2.658 |
| Ln_Hospital_Density | -2.107 | 2.167 | 0.244 | 0.608 | 4.899 |
| Ln_Consumption_Density | -7.834 | 8.728 | 0.000 | 2.791 | 7.716 |
| Ln_Hotel_Density | -2.015 | 1.799 | 0.096 | 0.452 | 2.020 |
| Ln_Residential_Density | -2.107 | 2.241 | 0.337 | 0.669 | 4.110 |
| Ln_Intersection_Density | -1.882 | 2.016 | 0.580 | 0.686 | 4.458 |
| Ln_Road_Density | -2.930 | 1.345 | 0.350 | 0.521 | 3.656 |
| Ln_Total_Crashes | 0.000 | 2.632 | 0.652 | 0.603 | |

In this paper, the global Moran's I is applied to measure the spatial autocorrelation between each regional unit and its adjacent units at the TAZ scale. Moran's I is given in formula (

where

The spatial weight matrix expresses the adjacency relationship between spatial units. Previous work has pointed out that the choice of spatial weight matrix affects the degree of spatial autocorrelation observed in geographic studies.

The spatial weight matrix generally refers either to an adjacency spatial weight matrix (0–1 swf) or to a distance spatial weight matrix (GCD swf). The 0–1 swf has two types, "Rook" and "Queen". Under Rook adjacency, $w_{ij} = 1$ if $\text{TAZ}_i$ and $\text{TAZ}_j$ share a common boundary, and $w_{ij} = 0$ otherwise. Queen adjacency also counts zones that share only a common point. The GCD swf uses distances (e.g., Euclidean distance or Manhattan distance) to reflect the correlation between zones, so $w_{ij}$ takes a different form under each type of weight. The Rook, Queen, and Euclidean distance weight matrices are used in this study. Each spatial weight matrix is row-standardized so that the elements of each row sum to one before modeling.
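The weight specifications above can be illustrated on a hypothetical grid of square zones, together with the global Moran's I. Because formula ( ) did not survive extraction, we assume it is the standard statistic:

$I = \frac{n}{S_0}\,\frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, with $z_i = x_i - \bar{x}$ and $S_0 = \sum_i \sum_j w_{ij}$.

The grid geometry and function names are ours. For the Euclidean-distance option we assume a distance-band rule (neighbors within a threshold distance), which is one common variant; the paper's exact specification is not given.

```python
import numpy as np

def grid_weights(rows, cols, kind="rook", band=2.0):
    """Row-standardized spatial weights for a hypothetical rows x cols grid of zones.

    kind="rook":     neighbors share an edge
    kind="queen":    neighbors share an edge or a corner
    kind="distance": neighbors have centroids within `band` cell widths
    """
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    if kind == "rook":
        W = (diff.sum(axis=2) == 1).astype(float)
    elif kind == "queen":
        W = (diff.max(axis=2) == 1).astype(float)
    elif kind == "distance":
        d = np.sqrt((diff ** 2).sum(axis=2))
        W = ((d > 0) & (d <= band)).astype(float)
    else:
        raise ValueError(f"unknown weight kind: {kind}")
    return W / W.sum(axis=1, keepdims=True)  # row-standardize: each row sums to one

def morans_i(x, W):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = x - mean(x)."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

# Hypothetical 6 x 6 study area.
rook = grid_weights(6, 6, "rook")
checker = np.array([(r + c) % 2 for r in range(6) for c in range(6)], dtype=float)
trend = np.array([r + c for r in range(6) for c in range(6)], dtype=float)
print(round(morans_i(checker, rook), 3))  # -1.0: every rook neighbor has the opposite value
print(morans_i(trend, rook) > 0)          # True: similar values cluster together
```

The checkerboard pattern gives perfect negative autocorrelation under Rook weights. Under Queen weights, the diagonal neighbors share the same value, so the result changes. This illustrates why the choice of weight matrix matters.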

Spatial econometric models are used to address spatial dependence. The general spatial econometric model, which includes all types of interaction effects, is as follows [

where $Wu$ refers to the interaction effects among the disturbance terms of the different units, and $\iota_n$ is a vector of ones associated with the constant term parameter
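For reference, the general nesting spatial (GNS) model that these definitions appear to describe can be written as follows. The original equation did not survive extraction. This is therefore the standard form used in the spatial econometrics literature (including Elhorst's framework), and the symbols $\alpha$, $\beta$, $\theta$, $\rho$, and $\lambda$ are our assumed notation:

$$
\begin{aligned}
y &= \rho W y + \alpha \iota_n + X\beta + W X \theta + u, \\
u &= \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n).
\end{aligned}
$$

Each model in this study is a restriction of this form:

* $\theta = 0$ and $\lambda = 0$ give the SAR (spatial lag) model;
* $\rho = 0$ and $\theta = 0$ give the SEM;
* $\lambda = 0$ gives the SDM;
* $\rho = \theta = \lambda = 0$ reduces to OLS.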

After data processing, a series of spatial econometric models were built to obtain the regression results. The main steps are shown as Steps

Before the regression analysis, Moran's I was calculated to explore the spatial autocorrelation of crashes under different spatial weight features.

The traditional OLS model is built. The overall model is checked by the F-test, and the individual variables are evaluated by the t-test. The Lagrange Multiplier (LM) test and the Robust LM test are used to judge whether spatial lag or spatial error terms exist in the model.

If the tests above indicate spatial effects, the spatial lag (SAR) and SEM models are built. The log-likelihood, Akaike's Information Criterion (AIC), and Schwarz information criterion (SIC) [

The SDM is then used to describe the direct, indirect, and total effects of the explanatory variables on traffic crashes.

In this study, parameters were estimated by the maximum likelihood (ML) method based on the above tests. All spatial regression models and tests were conducted using Elhorst's spatial econometrics MATLAB toolbox and GeoDa 1.12.
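The LM diagnostics in the steps above can be sketched from OLS residuals. The sketch uses the standard LM-lag and LM-error statistics of Anselin and their robust counterparts, each compared with a $\chi^2(1)$ distribution. It is a numpy illustration, not the GeoDa or Elhorst-toolbox code. The grid, the true $\rho = 0.6$, and the helper names are hypothetical.

As a small consistency check, the helper `chi2_1_sf` gives a p value of about 0.026 for the Robust LM (error) statistic of 4.952, matching the LM table.

```python
import math
import numpy as np

def chi2_1_sf(x):
    """Upper-tail probability of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

def rook_grid(side):
    """Row-standardized rook weights for a hypothetical side x side grid of zones."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def lm_tests(y, X, W):
    """LM and robust LM statistics for spatial lag / error, from OLS residuals.

    y: (n,) response; X: (n, k) design matrix with an intercept column;
    W: (n, n) row-standardized weights. Returns {name: (statistic, p value)}.
    """
    n = len(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / n                        # ML estimate of the error variance
    T = np.trace(W.T @ W + W @ W)
    WXb = W @ (X @ beta)
    M = np.eye(n) - X @ XtX_inv @ X.T     # residual-maker matrix
    D = (WXb @ M @ WXb) / s2 + T          # "nJ" in Anselin's notation
    d_lag = e @ (W @ y) / s2
    d_err = e @ (W @ e) / s2
    stats = {
        "LM (lag)": d_lag ** 2 / D,
        "Robust LM (lag)": (d_lag - d_err) ** 2 / (D - T),
        "LM (error)": d_err ** 2 / T,
        "Robust LM (error)": (d_err - T / D * d_lag) ** 2 / (T - T ** 2 / D),
    }
    return {name: (float(s), chi2_1_sf(float(s))) for name, s in stats.items()}

# Synthetic data with a true spatial lag (rho = 0.6) on a 20 x 20 grid.
rng = np.random.default_rng(1)
W = rook_grid(20)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ np.array([1.0, 0.5]) + rng.normal(size=n))
for name, (stat, p) in lm_tests(y, X, W).items():
    print(f"{name:18s} {stat:8.3f}  p = {p:.3f}")
```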

The spatial autocorrelation of total crashes was analyzed under different spatial weight matrices. Moran's I was calculated in GeoDa 1.12 and is shown in Figure

Moran's I under different spatial weight matrices: (a) Rook; (b) Queen; (c) Euclidean distance.

As shown in Tables

Results of Lagrange Multiplier tests.

| LM test | Value | p value |
|---|---|---|
| LM (lag) | 72.093 | 0.001 |
| Robust LM (lag) | 37.998 | 0.001 |
| LM (error) | 39.047 | 0.001 |
| Robust LM (error) | 4.952 | 0.026 |

Note:

Estimation results of the OLS, SAR, and SEM models.

| Variables | OLS | SAR | SEM |
|---|---|---|---|
| Ln_Population_Density | 0.184 | 0.161 | 0.171 |
| Ln_Company_Density | -0.667 | -0.619 | -0.642 |
| Ln_Consumption_Density | 0.089 | 0.086 | 0.087 |
| Ln_Hotel_Density | -0.095 | -0.091 | -0.090 |
| Ln_Residential_Density | -0.071 | -0.077 | -0.071 |
| Ln_Intersection_Density | 0.066 | 0.072 | 0.073 |
| Ln_Road_Density | 0.239 | 0.216 | 0.219 |
| $\rho$ / $\lambda$ | | 0.219 | 0.195 |
| Adjusted $R^2$ | 0.342 | 0.436 | 0.434 |
| Log-likelihood | -1485.7 | -718.872 | -733.734 |
| AIC | 2987.690 | 2740.370 | 2765.440 |
| SIC | 3032.940 | 2791.280 | 2810.690 |

Note:

Firstly, the results show that the OLS model passed through

Secondly, LM-lag and LM-error both reject the null hypothesis at the 1% level. Robust LM-lag is significant at the 1% level and Robust LM-error at the 5% level. These diagnostic tests indicate clear spatial autocorrelation.

Thirdly, the adjusted $R^2$ of the OLS model is 0.34, which shows reasonable explanatory power for crash occurrence. However, unlike some studies, we note that goodness-of-fit values such as adjusted $R^2$ cannot be used as a basis for comparing and selecting spatial models [
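The log-likelihood-based criteria used instead can be made concrete with their textbook definitions: AIC = -2 ln L + 2k and SIC = -2 ln L + k ln n, where lower values are preferred. Because SIC - AIC = k(ln n - 2) does not involve the likelihood, a reported (AIC, SIC) pair also reveals the parameter count k. A quick arithmetic check on the estimation table with n = 2,114 gives about 8 for OLS (seven covariates plus an intercept) and about 9 for SAR (plus $\rho$). Reading these counts this way is our interpretation, not something the paper states.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * k

def sic(loglik, k, n):
    """Schwarz (Bayesian) information criterion: -2 ln L + k ln n."""
    return -2.0 * loglik + k * math.log(n)

# SIC - AIC = k (ln n - 2), so the parameter count can be recovered from a reported pair.
n = 2114
ols_k = (3032.940 - 2987.690) / (math.log(n) - 2)
sar_k = (2791.280 - 2740.370) / (math.log(n) - 2)
print(round(ols_k, 2), round(sar_k, 2))  # 8.0 9.0
```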

At the same time, the outcomes indicate that the coefficient

As is shown in Table

Estimation results of the SDM model.

| Variables | Coefficient | Variables | Coefficient |
|---|---|---|---|
| Ln_Population_Density | 0.147 | W × Ln_Population_Density | 0.058 |
| Ln_Company_Density | -0.612 | W × Ln_Company_Density | -0.056 |
| Ln_Consumption_Density | 0.086 | W × Ln_Consumption_Density | 0.002 |
| Ln_Hotel_Density | -0.089 | W × Ln_Hotel_Density | -0.027 |
| Ln_Residential_Density | -0.084 | W × Ln_Residential_Density | 0.009 |
| Ln_Intersection_Density | 0.075 | W × Ln_Intersection_Density | -0.080 |
| Ln_Road_Density | 0.208 | W × Ln_Road_Density | 0.124 |
| $\rho$ | 0.183 | | |
| Log-likelihood | -713.783 | | |

Note:

In addition, the direct, spillover (indirect), and total effects of the explanatory variables are examined. As shown in Table

Direct, indirect, and total effects of the SDM model.

| Variables | Direct effect (coef.) | Direct effect (t) | Indirect effect (coef.) | Indirect effect (t) | Total effect (coef.) | Total effect (t) |
|---|---|---|---|---|---|---|
| Ln_Population_Density | 0.154 | 1.951 | 0.102 | 4.387 | 0.256 | 2.544 |
| Ln_Company_Density | -0.620 | -16.946 | -0.199 | -2.485 | -0.819 | -8.503 |
| Ln_Consumption_Density | 0.089 | 2.297 | 0.023 | 0.252 | 0.112 | 1.102 |
| Ln_Hotel_Density | -0.091 | -5.641 | -0.052 | -1.736 | -0.143 | -4.285 |
| Ln_Residential_Density | -0.084 | -2.509 | -0.004 | -0.072 | -0.088 | -1.402 |
| Ln_Intersection_Density | 0.070 | 2.248 | -0.079 | -1.145 | -0.009 | -0.121 |
| Ln_Road_Density | 0.214 | 6.667 | 0.187 | 2.408 | 0.401 | 4.631 |

Note:
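The direct, indirect, and total effects follow from the SDM's matrix of partial derivatives:

$S = (I - \rho W)^{-1}(\beta I + \theta W)$

The scalar summaries are defined as follows:

* **Direct effect:** the mean of the diagonal of $S$.
* **Total effect:** the mean row sum of $S$.
* **Indirect (spillover) effect:** the difference between the two.

The sketch below is our own, not the Elhorst toolbox code. It uses a hypothetical ring of zones and the population-density coefficients from the SDM table, assuming the unlabeled 0.183 there is $\rho$.

```python
import numpy as np

def sdm_effects(beta, theta, rho, W):
    """Average direct, indirect, and total effects of one covariate in an SDM.

    S = (I - rho W)^{-1} (beta I + theta W) holds the partial derivatives
    d y_i / d x_j. Direct = mean of diag(S); total = mean row sum of S;
    indirect (spillover) = total - direct.
    """
    n = W.shape[0]
    S = np.linalg.solve(np.eye(n) - rho * W, beta * np.eye(n) + theta * W)
    direct = np.trace(S) / n
    total = S.sum() / n
    return direct, total - direct, total

# Hypothetical ring of 50 zones, each bordering the previous and next zone.
n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

# Population-density coefficients from the SDM table (0.183 assumed to be rho).
direct, indirect, total = sdm_effects(beta=0.147, theta=0.058, rho=0.183, W=W)
print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

With a row-standardized W, the total effect always equals $(\beta + \theta)/(1 - \rho)$, here about 0.251, whatever the neighbor structure. The split into direct and indirect parts depends on W, so this toy ring is not expected to reproduce the table's values.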

The spillover coefficients of population density and road density are positive, indicating that increases in these variables in surrounding areas lead to more crashes in the target area. The effects of company density and hotel density are the opposite. The result for consumption density is consistent with other studies [

As illustrated in the impact analysis diagram of traffic network conditions, the target zone is adjacent to five traffic zones. If the adjacent zones contain many intersections, drivers passing through them to reach the target zone face more traffic conflicts and risks, which discourages trips into the target zone. Therefore, an increase in intersections in adjacent areas is associated with fewer crashes in the target zone. However, the indirect and total effects of intersection density are not significant, so this negative spillover effect should not be overinterpreted.

Impact analysis diagram of traffic network conditions.

Although intersection density and road density both represent road network complexity, they act differently across zones. Road network density in adjacent areas reflects accessibility to the target zone. The greater the road density in adjacent areas, the more easily drivers reach the target zone, which leads to more crashes there. Thus, intersection density in adjacent areas inhibits vehicles from reaching the target zone, so its spillover effect is negative. Road network density in adjacent areas promotes vehicle arrivals, so its spillover effect is positive.

This study proposed a spatial econometric approach to evaluate the direct, spillover, and total effects of crash-influencing factors based on open source POI data and related data. Our main findings are as follows.

Using multivariate regression and spatial analyses, the results confirm that a clear spatial association exists among traffic crashes in Tianjin, China. High population density and road density in a zone and its adjacent areas lead to more crashes. More interestingly, the direct and spillover effects of company density and hotel density are negative. This indicates that increases in company and hotel density are associated with fewer crashes in both the target area and adjacent areas. One possible explanation for the company result is that most companies impose strict work rules. Another is that many companies operate free shuttle buses between workplaces and employees' homes, and the professional shuttle drivers may reduce crashes. For hotel density, a possible reason is that most hotel guests are visitors who travel by bus or taxi rather than driving themselves, which lowers the probability of crashes.

The findings of this empirical study have some important and direct implications for transport policy. First, the results can support traffic safety planning based on POI data. Traffic safety planning requires that safety be considered at all levels of traffic planning, emphasizing the prediction and planning of safety levels from the macro- to the microlevel. The macrolevel crash prediction model may help transportation agencies incorporate safety considerations more proactively into the long-term transportation planning process [

Although this study reveals some important findings, it has several limitations. First, POI densities reflect only the number of points of each land-use type, not their size. More detailed investigations are therefore needed to obtain more specific data on different types of POI. Second, different types of traffic crashes have different influencing factors. For example, drunk-driving crashes may be more closely related to restaurant density, so typical crash types should be studied separately in future work. Third, the spatial heterogeneity of crashes should not be ignored.

This empirical research investigated different techniques for estimating the correlation of POI data and other related data with traffic crashes at the macrolevel. Four types of models are discussed in the paper: OLS, SAR, SEM, and SDM. Data from 2,114 TAZs in Tianjin were used to develop macrolevel crash models that incorporate covariates related to land use and the environment. The results indicate that spatial effects, especially spillover effects, should be considered when building crash models, and that spatial models can perform much better than the traditional OLS model. Among the spatial regression models, the Spatial Durbin Model achieves the best fit.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This research was funded by the National Key Research and Development Program of China, grant number 2017YFC0803903, the Key Project of the Natural Science Foundation of Tianjin city, grant number 16JCZDJC38200, and the Major Projects in Artificial Intelligence Science and Technology of Tianjin city, grant number 17ZXRGGX00070.