Analysis of Traffic Accident Based on Knowledge Graph



Introduction
The prosperity of the transportation industry brings benefits to society but also negative impacts that are difficult to reconcile [1]. Among them, the frequent occurrence of road traffic accidents has become one of the important factors inhibiting the steady development of cities. Faced with the severe road traffic safety situation, research on road safety analysis and on the active prevention and control of road risks is urgently needed.
The traffic management departments have accumulated a large amount of accident data in their daily management work. How to discover and reuse the potential value of these data, and to uncover the underlying patterns and inducing factors of accidents, has become a major research topic [2]. In 2017, "the 13th Five-Year Plan for Road Traffic Safety" issued by the State Council pointed out that the comprehensive collection of traffic accident data can promote traffic safety big data and thereby provide a data basis and theoretical analysis support for improving traffic safety [3]. At present, the domestic traffic control departments have accumulated a large amount of original accident data by adopting standardized accident information collection technology, including specific data on the "people, vehicles, roads, and environment" related to each accident. However, the value of the data has not been fully exploited, and analysis remains at the level of descriptive statistics of the four accident indicators. How to mine the hidden value of traffic accident data to prevent and reduce traffic accidents, and how to combine it with relevant advanced traffic safety technology [4][5][6][7][8] so that it is put into practical application instead of remaining "empty talk," has become one of the key directions of current research.
There are many ways to mine traffic accident data. At present, there are three main research angles.
(1) Descriptive analysis of accident data based on traditional statistical analysis. The road traffic accident information system of China contains more than 60 items of accident data, which can describe the situation at the time of an accident objectively and comprehensively. Simple statistical analysis of the data collected in the system is the basic reference for China to formulate traffic safety management planning [9]. In addition, in 2005, the Shanghai Traffic Police Corps, together with Tongji University and the German Volkswagen Group, carried out research on road traffic accidents in China, statistically analyzed the national standardized accident data collected, and summarized the mechanism of traffic accidents [10].
(2) Deep mining of the hidden value of accident data with big data algorithms, from the point of view of data association and data collision. The literature [11] combined data mining with a multicriteria decision-making method to mine the French traffic accident database BAAC, ranked the importance of the mined association rules with the ELECTRE method, and selected the higher-ranked association rules as the basis for formulating accident prevention strategies and policies, so as to improve the road safety environment in France. The literature [12][13][14] used the Apriori algorithm or the FP-growth algorithm to mine association rules among various factors in traffic accidents to provide data decision support for accident early warning and traffic safety management.
(3) Visual analysis of accident data based on data understanding. The combination of human cognition and machine cognition through the human-computer interface improves the human ability to understand huge and complex data. The literature [15] designed and implemented a multiview association visualization analysis system based on the spatial semantic enhancement model of traffic accident data, revealing the spatial semantic patterns of traffic accidents to users and contributing to in-depth analysis of the causes of traffic accidents.
Knowledge graph (KG) [16] is a new research method of data mining, which represents the mutually independent entities of the objective world and their relationships in the structured form of a graph, whose basic unit is the "entity-relation-entity" triple. The relationship is the link that connects entities, and both entities and relationships can have attributes, thus forming a network structure [17,18]. In short, through the graph structure of triples [19], KG transforms knowledge about the objective world into a structure that can be understood and processed by machines and that can be displayed intuitively and visually so that it can be understood and reused by human beings.
The literature [20] established a traffic knowledge graph based on multisource heterogeneous data, which combines the four elements of "people, vehicles, roads, and environment." At the same time, based on the relationships among three kinds of traffic events (traffic accident, traffic congestion, and traffic feedback), a traffic reason graph was established, and the recognition of traffic incidents in text and in Weibo posts was realized by using the knowledge graph and the reason graph. This provided a solution for finding traffic problems and for early warning of traffic accidents. The literature [21] (2021) used the Word2vec word vector model to extract and classify the keywords (accident features and accident cause attributes) in traffic accident text, generated a knowledge graph of the traffic accident domain based on Neo4j, and realized its visual analysis based on Gephi. Traffic accident data, as the basic data of traffic safety research, can provide multidimensional data support for traffic safety management decision-making. Integrating traffic accident knowledge with KG can bring the value of data resources into full play and inject knowledge support into traffic safety management decision-making, which has important research value for improving traffic safety.
Based on structured traffic accident case data, this paper establishes a graph-structured, visual traffic accident knowledge graph that integrates "people, vehicles, roads, and environment." The knowledge is stored in the Neo4j graph database. Multidimensional and multilevel analysis of accidents is realized with Cypher statements, including accident portraits, accident classification, accident statistics, and accident association path analysis.

The Definition of Knowledge Graph and Its Constituent Elements
The meaning of "knowledge graph" can be analyzed by taking the term apart. First of all, "knowledge," from a philosophical perspective, is what human beings have gained from their various activities of life and production; it comes from the objective world and is the systematic understanding that human beings reach by summarizing, refining, and sublimating all kinds of facts, descriptions, and information. According to the data-information-knowledge-wisdom system described by Rowley, that is, the DIKW system [22], shown in Figure 1, "knowledge" is formed through a transformation from data to information and then to knowledge; it is the cognition human beings obtain after processing data. The ultimate destination of "knowledge" is the "wisdom" used by human beings, that is, the application of knowledge. Second, "graph" means a network structure in the form of a graph. In graph theory, the term "graph" was first proposed by Sylvester in 1878 [23]. A graph is composed of multiple nodes and multiple edges; Figure 2 shows a simple graph composed of five nodes and six edges. To sum up, a knowledge graph expresses knowledge in the form of a graph. The nodes of the graph represent concepts or entities, and the edges represent the relationships between nodes. Taking the objective fact that "Taiwan is a provincial administrative region of China" as an example, the semantic relationship of this fact can be expressed in a machine-understandable form as "China, provincial administrative region, Taiwan." Here, "China" and "Taiwan" are two nodes, and the edge indicates that the relationship between the two nodes is "provincial administrative region." Some domestic experts and scholars have also described the definition of knowledge graph.
The literature [24] defined it as: "The knowledge graph describes concepts, entities and their relationships in the objective world through a structured way, expressing the information of the Internet into a form closer to the human cognitive world. It provides a better ability to organize, manage and understand the massive information on the Internet." The literature [25] defined it as: "Knowledge graph is essentially a knowledge base of semantic network [26], that is, a knowledge base with directed graph structure, in which the nodes of the graph represent entities or concepts, and the edges of the graph represent various semantic relationships between entities or concepts." No single definition of knowledge graph has been agreed on in academic circles, but what these definitions have in common is that entities and relations are the basic elements of a knowledge graph.
(1) Entity: it refers to the concrete things that exist in the objective world, corresponding to the ontology in the semantic network. For example, in a traffic accident, the entity can be the name of the person involved in the accident, an illegal behavior, a license plate, a certain road, a rainy day, and so on.
(2) Concept: it is also known as the type of entity, which is an abstract generalization of things that share common characteristics. For example, the concept of "person" is a summary of "the name, age, and illegal behavior of the person involved in the accident," in which the person's name and illegal act are entities of the concept "person."
(3) Relationship: it refers to a connection between different concepts or entities. For example, there is a "party to the accident" relationship between "accident" and "person."
In addition, attributes can be used to describe certain characteristics of an entity or relationship. For example, attributes such as age and gender can be attached to the entity "name."
General KG and domain KG are the two major categories of current KG. The former, as its name implies, is a general-purpose, common-sense knowledge graph, which mainly serves web search and encyclopedic question answering, such as DBpedia [28], Yago [29], and Freebase [30], while the latter is oriented to a specific vertical professional field with professional knowledge, such as the financial knowledge graph [31], the medical knowledge graph [32], and so on. The traffic accident knowledge graph constructed in this paper is a domain knowledge graph aimed at the vertical professional field of transportation.
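The four elements described above (entity, concept, relationship, attribute) can be made concrete with a small data structure. The following is only a minimal sketch in Python under stated assumptions: the class names, the relationship label `Party_to`, and all sample values are hypothetical illustrations, not identifiers taken from the paper's schema tables.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    concept: str                     # entity type (concept), e.g. "PEOPLE"
    name: str                        # the concrete entity
    attributes: dict = field(default_factory=dict)  # e.g. age, gender


@dataclass
class Relation:
    head: Entity
    label: str                       # relationship label
    tail: Entity

    def as_triple(self):
        # "entity-relation-entity" triple, the basic unit of the graph
        return (self.head.name, self.label, self.tail.name)


# Hypothetical sample values, for illustration only.
accident = Entity("ACCIDENT", "accident_0001",
                  {"accident_time": "2017-01-01 01:25"})
person = Entity("PEOPLE", "person_A",
                {"person_age": 35, "person_gender": "male"})
party = Relation(accident, "Party_to", person)  # assumed label name

print(party.as_triple())
```

Keeping attributes on the entity rather than as separate nodes mirrors the description above, where age and gender characterize an entity instead of forming entities of their own.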

Construction of Traffic Accident Knowledge Graph
The traffic accident knowledge graph is a knowledge graph oriented to the professional field of transportation, and its construction process and objectives need to be determined according to the requirements of professional knowledge. Following the construction principle of "beginning with demand and ending with application," this paper divides the life cycle of constructing the traffic accident knowledge graph into five stages, that is, knowledge demand, knowledge modeling, knowledge extraction, knowledge storage, and knowledge application, as shown in Figure 3. Among them, knowledge modeling, knowledge extraction, and knowledge storage are the core links of constructing the knowledge graph.
Knowledge modeling means that based on the analysis of data sources, concepts and relational patterns are selected to form a knowledge structure that meets knowledge demand, that is, concept and relational schema design. Knowledge extraction is to extract knowledge elements from multisource data. Knowledge storage is to store the acquired knowledge in the database for the application of knowledge.
As the data of this study are structured road traffic accident case data, the main problem is how to map the entities, relationships, and attributes contained in a large amount of data into the database completely. Based on the above life cycle diagram, the specific process of constructing the traffic accident knowledge graph is described as follows: first, according to the existing data sources, the knowledge demand for traffic accident analysis is determined. Then, knowledge extraction of entities, relationships, and attributes is carried out from the structured data. All elements are written into the Neo4j graph database to complete the construction of the traffic accident knowledge graph. Finally, based on this KG, multidimensional visualization analysis of accident data is realized with Cypher query statements.

Knowledge Demand Analysis of Traffic Accidents.
The application value of knowledge graph in traffic accidents is mainly reflected in the following aspects:
(1) Knowledge graph can play a comprehensive data supporting role in accident analysis. On the basis of the knowledge graph of traffic accidents, multilevel and multidimensional accident analysis can be carried out according to the knowledge semantic relation network, which broadens the ideas of accident analysis and accident prevention.
(2) The accident knowledge network formed by the knowledge graph can present all kinds of accident query results in a visual way. Given an entity, it can search for another entity along its relationship path and finally display the accident query results as a network diagram composed of entities and relationships, which is a visual approach to accident knowledge interpretation.
(3) Knowledge graph can realize accident risk analysis with the human as the core. The accident risk database of the driver is established by integrating the driver's accident record, traffic violations, driving vehicle, age, driving age, and other information into the graph. Through the selection of risk characteristics and the quantitative construction of an accident risk early warning model, accident early warning for drivers, especially long-distance truck drivers, can reduce the occurrence of serious accidents.
At present, the knowledge graph is still in the exploratory research stage with respect to traffic accident knowledge integration and accident prevention. With the continuous development of artificial intelligence and big data technology, knowledge graph is expected to show clear advantages and wide application prospects in accident analysis, accident risk prediction, traffic management decision support, and other aspects.
The data used in this paper are traffic accident case data, which are structured relational data. The data cover the period from January to December 2017, with 9,941 records. Each row of data represents the information recorded by the traffic police after an accident, including the accident number, the location of the accident (road), the time point of the accident, the type of road, the jurisdiction to which the road belongs, the cause of the accident, the form of the accident, the identity of the parties involved in the accident, negligence and illegal behavior, the license plate information of the vehicle involved in the accident, and the type of vehicle.
Combining the above advantage analysis and data source structure, the traffic accident knowledge graph constructed in this paper aims to effectively mine the value information contained in the traffic accident case data. A graph database containing traffic accident characteristic factor information and accident result information is established, based on which multidimensional and multilevel and visual accident analysis is realized, such as accident portrait, accident distribution, accident statistics, and so on.

Knowledge Modeling Design of Traffic Accident.
Knowledge modeling design includes conceptual pattern and relational pattern, which is for the abstract mapping of the concepts and relationships of real things. A concept is an abstract description of a certain type of entity in the objective world. Entities of the same type may have different attributes. Relationship is to explain the existence of some kind of link between entities, which is diverse.
According to the demand for accident knowledge, this paper initially forms the core concepts of traffic accidents and then combines professional knowledge to expand the relationships between entities, so as to lay a good foundation for knowledge extraction [33]. The core concepts in the field of traffic accidents established in this paper are shown in Table 1. The field relationships of traffic accidents are shown in Table 2.
In order to distinguish concepts, relations, and attributes, this paper adopts different marks to express them. The concept labels are in English and uppercase. The relationship labels are in English with the content words capitalized. The attribute labels are in English and lowercase. Examples are as follows: concept tags: ACCIDENT, PEOPLE, VEHICLE, ENVIRONMENT, etc.; relationship tags: Located_in, Jurisdiction_over, Weather, etc.; attribute tags: accident_time, person_age, person_gender, etc. According to the concept and relationship description in the field of traffic accidents, the conceptual and relational pattern structure diagram of the traffic accident knowledge graph finally formed is shown in Figure 4.
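The labelling convention above (uppercase concepts, capitalized relationships, lowercase attributes) is mechanical enough to check automatically. The following is a minimal sketch, assuming labels consist only of letters and underscores; the function name `classify_label` is a hypothetical helper, not part of the paper's tooling.

```python
def classify_label(label: str) -> str:
    """Guess the kind of schema label from its letter case."""
    letters = [ch for ch in label if ch.isalpha()]
    if not letters:
        return "unknown"
    if all(ch.isupper() for ch in letters):
        return "concept"        # e.g. ACCIDENT, PEOPLE
    if all(ch.islower() for ch in letters):
        return "attribute"      # e.g. accident_time, person_age
    if label[0].isupper():
        return "relationship"   # e.g. Located_in, Jurisdiction_over
    return "unknown"


for tag in ["ACCIDENT", "Located_in", "accident_time"]:
    print(tag, "->", classify_label(tag))
```

Such a check could be run over the node and relationship files before import, so that a mislabelled column is caught early rather than after the graph is built.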

Knowledge Extraction of Traffic Accident.
An entity is an objective thing in the real world, a concrete instance of the concept layer, and a key knowledge element in the knowledge graph. Entity extraction is a process of recognition, that is, entities with specific meaning are identified in some way. The entity scope in the field of traffic accidents studied in this paper is mainly the concepts listed in the conceptual pattern design.
In general, entities and relationships have strong individual features, and they are constantly increasing and updating, so knowledge extraction needs to be carried out according to the given corpus and features. Knowledge extraction technology refers to the technology of obtaining entities and relationships from multisource heterogeneous data, which is not only the basis of constructing a knowledge graph but also a derivative of big data technology. With the explosive growth of data, how to obtain useful knowledge from the data is a current technical difficulty. According to the structure of the data source, knowledge extraction can be divided into extraction from structured data, semistructured data, and unstructured data. The corresponding extraction methods are shown in Figure 5.
The accident data used in this paper are a structured traffic accident data table, which is mapped into RDF triples by the direct mapping method, so as to realize the knowledge extraction of entities and relations. According to the concepts and relationship patterns designed above, a total of 17,182 entities and 51,992 relationships are extracted. The extraction results for different types of entities and relationships are shown in Table 3.
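The idea of mapping a structured accident table to triples can be illustrated as follows. This is only a sketch under stated assumptions: the column names, the column-to-relationship mapping `COLUMN_MAP`, and the sample row are hypothetical, and the code does not reproduce the authors' extraction script.

```python
# Hypothetical mapping: table column -> (relationship label, tail concept).
COLUMN_MAP = {
    "road": ("Located_in", "ROAD"),
    "weather": ("Weather", "ENVIRONMENT"),
}
# Hypothetical columns stored as attributes of the ACCIDENT entity.
ATTRIBUTE_COLUMNS = ["accident_time"]


def row_to_triples(row):
    """Map one accident record to (head, relation, tail) triples."""
    subject = ("ACCIDENT", row["accident_id"])
    triples = []
    for column, (relation, concept) in COLUMN_MAP.items():
        value = row.get(column)
        if value:  # skip empty cells
            triples.append((subject, relation, (concept, value)))
    attributes = {c: row[c] for c in ATTRIBUTE_COLUMNS if row.get(c)}
    return triples, attributes


# Illustrative row; the values are made up for this example.
sample_row = {
    "accident_id": "A0001",
    "road": "Donghuan Road",
    "weather": "rainy",
    "accident_time": "2017-01-01 01:25",
}
triples, attributes = row_to_triples(sample_row)
for triple in triples:
    print(triple)
```

Because each cell value becomes the name of a tail entity, identical values in different rows (the same road, for instance) collapse into one entity once duplicates are removed, which is why the number of entities is much smaller than the number of relationships.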

Storage and Visualization of Traffic Accident Knowledge Graph Based on Neo4j.
Neo4j is a graph database based on Java, which is used to store graph-structured data of entities and relationships. Two entities and the relationship between them form a knowledge unit; the relationship connects the two nodes and is directional. Both entities and relationships can have attributes, which have names and various values. Tags are used to distinguish between different types of entities and relationships.
The CREATE statement, LOAD CSV, and Neo4j import are currently the main methods to import triple data into Neo4j in batches [33]. The running speed, advantages and disadvantages, and scope of application of the three batch import methods are shown in Table 4. This paper mainly uses the Neo4j import method to realize the batch import of data; that is, the py2neo package in Python is used to import the CSV files rapidly and to create nodes and relationships. Finally, the traffic accident domain knowledge graph, mainly involving accident cases, is formed as shown in Figure 6.
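As an illustration of the preparation step before a batch import, the sketch below writes node and relationship CSV text from a list of triples, using the `:ID`, `:LABEL`, `:START_ID`, `:END_ID`, and `:TYPE` header fields that Neo4j's bulk import tool reads. It is a sketch under stated assumptions, not the authors' import script: it assumes entity names are unique across concepts, the function name is hypothetical, and it only produces the files rather than calling py2neo or Neo4j.

```python
import csv
import io


def write_import_files(triples):
    """Return (nodes_csv, relationships_csv) as text."""
    nodes = {}  # entity name -> concept label (names assumed unique)
    for (h_concept, h_name), _relation, (t_concept, t_name) in triples:
        nodes[h_name] = h_concept
        nodes[t_name] = t_concept

    node_buf, rel_buf = io.StringIO(), io.StringIO()
    node_writer = csv.writer(node_buf, lineterminator="\n")
    node_writer.writerow(["name:ID", ":LABEL"])
    for name, concept in nodes.items():
        node_writer.writerow([name, concept])

    rel_writer = csv.writer(rel_buf, lineterminator="\n")
    rel_writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for (_hc, h_name), relation, (_tc, t_name) in triples:
        rel_writer.writerow([h_name, t_name, relation])
    return node_buf.getvalue(), rel_buf.getvalue()


# Illustrative triple; the accident identifier is made up.
sample_triples = [
    (("ACCIDENT", "A0001"), "Located_in", ("ROAD", "Donghuan Road")),
]
nodes_csv, rels_csv = write_import_files(sample_triples)
print(nodes_csv)
print(rels_csv)
```

Separating nodes from relationships in this way lets each entity be written once even when it takes part in many relationships.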

Case Analysis of Traffic Accidents Based on Knowledge Graph
The knowledge application of a domain knowledge graph should be based on the needs of the field. This paper constructs a knowledge network around the knowledge elements of "people, vehicles, roads, environment, and accident results," which makes the accident analysis results visual. Through the analysis of traffic accidents, the occurrence patterns of accidents are summarized and the multidimensional classification and statistics of accidents are realized, which provides data support for improving the level of traffic safety management.
Cypher is the declarative query language of Neo4j, which can query and update data efficiently. The commonly used clauses include MATCH (used to match a pattern in the graph), WHERE (used with MATCH to add constraints to the pattern), and RETURN (used to determine the results returned, which can be entities and relationships of the graph structure, or tables). The traffic accident analysis process based on the knowledge graph is shown in Figure 7.

Accident Portrait.
Enterprise portraits, customer portraits, product portraits, and so on are among the major applications of knowledge graph. In this kind of portrait, the knowledge graph fuses multisource data to give a more comprehensive description of the characteristics of an entity object and presents it in a visual way. This paper takes a single road traffic accident as the center and comprehensively describes the accident situation through its associated entities, relationships, and attributes, which constitutes an "accident portrait." The traffic accident portrait based on the knowledge graph can describe the location, time, and cause of the accident, the basic information of the accident parties, their negligence and illegal behavior, the final handling of the accident (namely, the determination of responsibility), and so on. An example of an accident portrait is shown in Figure 8. At 1:25 on January 1, 2017, an intervehicle
Table 4 (fragment): Neo4j must be disabled during the import process; new data cannot be reimported using import in an established database; over ten million nodes.
Journal of Advanced Transportation

Accident Classification.
All accident cases can be queried by class according to a certain classification standard. This classification standard can be called the accident characteristic dimension, including accident location, accident form, accident cause, and so on. According to the relationships between the entities associated with the feature dimension, a query statement conforming to Cypher is constructed.
In the traffic accident knowledge graph, in order to query accidents by location, namely, the name of the road, each name in the concept "ROAD" is regarded as an entity. There is a relationship "Located_in" between "ACCIDENT" and "ROAD." The returned result is all the relationship paths between the two concepts. As a result, the Cypher query statement is as follows: "MATCH p = (n:ACCIDENT)-[r:Located_in]->(m:ROAD) RETURN p" The returned results, which visually show the distribution of accident locations, are shown in Figure 9. As can be seen from the picture, there are many accidents on Donghuan Road. Therefore, it is necessary to further investigate the hidden dangers of this road.
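Since the classification query differs between feature dimensions only in the two concepts and the relationship that connects them, the statement can be generated from those three labels. The following is a minimal sketch; `build_path_query` is a hypothetical helper, and the identifier check is an added safeguard rather than something described in the paper.

```python
def build_path_query(head_concept: str, relation: str, tail_concept: str) -> str:
    """Build a Cypher statement returning all head-[relation]->tail paths."""
    for label in (head_concept, relation, tail_concept):
        # Reject anything that is not a plain identifier, so that labels
        # cannot smuggle extra Cypher into the statement.
        if not label.isidentifier():
            raise ValueError(f"unsafe label: {label!r}")
    return (
        f"MATCH p = (n:{head_concept})-[r:{relation}]->(m:{tail_concept}) "
        "RETURN p"
    )


query = build_path_query("ACCIDENT", "Located_in", "ROAD")
print(query)
```

Other feature dimensions, such as accident form or accident cause, would reuse the same helper with the corresponding concept and relationship labels from the schema.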

Accident Statistics.
Accident statistical indicators usually comprise statistical indicators and classification indicators. Statistical indicators usually refer to the four indicators commonly used in traffic accident data statistics (the number of accidents, the number of deaths, the number of injuries, and direct economic losses). The classification indicators include the location of the accident, the time of the accident, and so on. Users can select indicators as required to construct Cypher query statements. The result returned by "RETURN" is defined by using the "COUNT" and "SUM" functions alone or in combination. If the result to be returned is a table, the "ORDER BY" clause can be used to sort the query results.

The Location Analysis of Accident-Prone Places.
The number of accidents is selected as the statistical index and the road as the classification index. The name of the road is selected as an entity. There is a relationship "Located_in" between the concepts "ACCIDENT" and "ROAD." The results need to be returned as "the name of road" and "the type of road," and the number of accidents is counted and sorted in descending order. From this, the Cypher query statement can be constructed, and the returned results are displayed in tabular form as shown in Figure 10. It can be seen that the top 10 roads with the most accidents are Xinghu Street, Donghuan Road, Modern Avenue, Jinji Lake Avenue, Fengting Avenue, Zhongxin Avenue East, Zhongyuan Road, Songtao Street, Loujiang Avenue, and Weisheng Road.
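The counting and descending sort described above can be mirrored outside the database for a quick check. The sketch below is an in-memory analogue in Python, not the Cypher statement used in the paper; the field names `road_name` and `road_type` and the toy records are assumptions made for illustration.

```python
from collections import Counter

# Toy records standing in for ACCIDENT -[Located_in]-> ROAD pairs.
# All values are hypothetical and unrelated to the paper's data.
accidents = [
    {"accident_id": "A1", "road_name": "Road A", "road_type": "urban arterial"},
    {"accident_id": "A2", "road_name": "Road B", "road_type": "branch road"},
    {"accident_id": "A3", "road_name": "Road A", "road_type": "urban arterial"},
]


def rank_roads(records, top_n=10):
    """Count accidents per (road name, road type), most frequent first."""
    counts = Counter((r["road_name"], r["road_type"]) for r in records)
    return [(name, kind, n) for (name, kind), n in counts.most_common(top_n)]


ranking = rank_roads(accidents)
for name, kind, n in ranking:
    print(name, kind, n)
```

Grouping by the pair (road name, road type) corresponds to returning both fields next to the count, and `most_common` plays the role of the descending sort with a top-10 cut-off.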

Analysis of Time Elements of Accident-Prone Places.
The time elements of accident-prone places can be year, month, and day, or they can be subdivided into hours, minutes, and seconds. Since the graph constructed in this paper stores the time element as an attribute of the accident, the Cypher statement for querying it needs to be constructed according to the characteristics of the attribute. This section selects the month, day of the week, and hour of the accident for analysis. The number of accidents is sorted in descending order according to the attributes of each entity in the concept "ACCIDENT," including month, week, and time. The results to be returned are "month," "week," "hour," and "the number of accidents." The number of accidents is counted, and the results are shown in Figure 11.
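Deriving the month, day of the week, and hour from a stored accident time and counting accidents in each bucket can be sketched as follows. This is an illustrative analogue only: the timestamp format `%Y-%m-%d %H:%M` and the sample timestamps are assumptions, and the paper performs the equivalent grouping with Cypher on the accident attributes.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def time_buckets(timestamps):
    """Count accidents per month, per day of the week, and per hour."""
    by_month, by_weekday, by_hour = Counter(), Counter(), Counter()
    for text in timestamps:
        moment = datetime.strptime(text, "%Y-%m-%d %H:%M")  # assumed format
        by_month[moment.month] += 1
        by_weekday[WEEKDAYS[moment.weekday()]] += 1
        by_hour[moment.hour] += 1
    return by_month, by_weekday, by_hour


# Hypothetical accident times, for illustration only.
sample_times = ["2017-01-01 01:25", "2017-03-06 08:10", "2017-03-07 17:45"]
by_month, by_weekday, by_hour = time_buckets(sample_times)
print(by_month.most_common())
print(by_weekday.most_common())
print(by_hour.most_common())
```

Storing the three derived values as separate attributes at import time would let the same grouping be done directly in the graph query instead of in post-processing.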
From the monthly statistics of traffic accidents (Figure 11(a)), March, May, and April were the top three months, each with more than 750 accidents and an average of more than 25 accidents per day. This is mainly attributed to the fact that March to May is a period of rapid economic activity after the Spring Festival. Economic development is inseparable from the transportation industry, and the resulting heavy traffic flow leads to more accidents. The lowest number of accidents occurred in August, which is mainly attributed to the extreme heat and low road traffic flow.
From the weekly statistics of traffic accidents (Figure 11(b)), the number of accidents on each weekday (Monday to Friday) basically remained above 700, while the number of traffic accidents on Sundays was the smallest. This is mainly due to the large commuting traffic on weekdays.
From the hourly statistics of traffic accidents (Figure 11(c)), traffic accidents are mainly concentrated in the morning and evening peaks, which correspond to the periods from 7:00 to 9:00 and from 17:00 to 19:00. This is mainly because, during the morning and evening rush hours, drivers and pedestrians in a hurry tend to ignore travel safety, while traffic flow, passenger flow, and the conflicts between them increase, which can easily lead to accidents. In addition, the morning and evening peaks coincide with the transition between day and night, and the switching between street lighting and natural light can easily distort the driver's perspective and line of sight and cause some traffic conditions to be misjudged. Wrong driving decisions are also a main cause of accidents.

Analysis of the Characteristics of Accident Parties.
The characteristics of accident parties mainly refer to gender and age. According to the goal of accident analysis, the Cypher statement can be constructed as follows: "MATCH (n:PEOPLE) RETURN n.person_gender AS Gender, n.person_age_group AS Age, COUNT(*) AS Population ORDER BY Population DESC" The results returned are shown in Figure 12. Excluding people whose gender was not recorded, male accident parties aged 18 to 50 form the largest group, more than twice as many as female accident parties between 18 and 50 years old. Although male drivers made up a high proportion of all drivers, their greater risk of accidents was also related to their aggressive driving behavior.

Analysis of Accident Association Paths.
There may be one or more intermediate entities between different entities in the knowledge graph, and the relationships that connect them form an association path. Through the analysis of the association paths of accidents, groups of related accidents can be found. By focusing on the analysis of intermediate entities, problems existing in traffic management can be found. The length of an association path can be expressed by the number of intermediate entities, as shown in the following formula:
$$L = N + 1 \tag{1}$$
In the formula, L represents the length of the association path and N represents the number of intermediate entities. If the length of an association path is 2, there is one entity in the middle. The concepts of the entities at the beginning and end of the path are "ACCIDENT" and "DEPARTMENT," respectively, and the concept of the intermediate entity is "ROAD." From this, the Cypher statement can be constructed as follows: "MATCH p = (n1:ACCIDENT)-[r1]-(n2:ROAD)-[r2]-(n3:DEPARTMENT) RETURN p" The results returned are shown in Figure 13. It can be clearly seen from the picture that the number of accidents associated with the road "Zhongxin Avenue East" is the largest, and the "Hudong Squadron," which has jurisdiction over the area to which the road belongs, should conduct a special investigation of its traffic safety risks.
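The association path through an intermediate road can be traced in a few lines of Python. This is a sketch under stated assumptions: it takes the path length to be the number of relationships, that is, one more than the number of intermediate entities (consistent with the statement that a path of length 2 has one entity in the middle), and the accident identifiers are hypothetical.

```python
def two_hop_paths(located_in, jurisdiction):
    """List ACCIDENT - ROAD - DEPARTMENT paths.

    located_in:   list of (accident, road) pairs
    jurisdiction: dict mapping road -> list of departments
    """
    paths = []
    for accident, road in located_in:
        for department in jurisdiction.get(road, []):
            paths.append((accident, road, department))
    return paths


def path_length(path):
    """Length L of a path given as a sequence of entities."""
    intermediate = len(path) - 2      # N: entities between the two ends
    return intermediate + 1           # assumed relation: L = N + 1


# Hypothetical accident identifiers; road and department names from the text.
located_in = [("A1", "Zhongxin Avenue East"), ("A2", "Zhongxin Avenue East")]
jurisdiction = {"Zhongxin Avenue East": ["Hudong Squadron"]}

paths = two_hop_paths(located_in, jurisdiction)
for p in paths:
    print(p, "length", path_length(p))
```

Counting how many paths share the same intermediate road gives the size of each related accident group, which is how a road such as the one above stands out in the visual result.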

Conclusion and Prospect
To study analysis methods for massive traffic accident data, this paper constructs a traffic accident knowledge graph that integrates the four elements of "people, vehicles, roads, and environment." By using the knowledge graph, the knowledge hidden in the large amount of structured traffic accident case data stored by the traffic management department is effectively acquired and reused, and multidimensional and multilevel analysis of traffic accident data is realized. The visual network graph directly shows the relationships among all kinds of traffic accident knowledge. The research results not only help researchers understand the characteristics of traffic accidents and the relationships between causative factors in a more intuitive way but also provide direct and effective knowledge support and a decision-making basis for traffic management departments to implement reasonable traffic management measures. They contribute to the prevention of traffic accidents and the overall improvement of the traffic safety environment and therefore have important application value. In addition, the method system adopted in constructing the traffic accident knowledge graph enriches the theory of traffic data mining and has a certain theoretical research significance.
In the follow-up research, the following points are worth paying attention to:
(1) The knowledge scope of traffic accidents is wide. This paper mainly takes the structured traffic accident case data of specific regions provided by the traffic management department as the knowledge source and constructs the knowledge graph centering on "people, vehicles, roads, and environment." However, due to the single source of knowledge, the constructed KG has certain limitations and does not have universal applicability. In addition to structured data, traffic accident-related knowledge is carried by a large number of unstructured text records, web pages, pictures, and videos. Knowledge extraction from multisource heterogeneous data is the focus of research in the next stage.
(2) A traffic knowledge graph of superior quality can provide comprehensive and reliable knowledge support for various decision-making needs, such as travel decisions, safety management decisions, and so on. Its application scenarios and value are considerable. This paper studies only one branch of the traffic field, namely traffic accidents, and studies the application value of knowledge graph only from the angle of accident analysis. The traffic branches studied with knowledge graph and the application value of knowledge support need to be further expanded and explored.
(3) Because the data source contains no latitude and longitude coordinates, accident-prone roads are determined only by the accident frequency calculated from the traffic accident knowledge graph, which has certain limitations. In the follow-up, the geographical coordinates corresponding to the accident location will be integrated into the knowledge graph, and graph algorithms [34][35][36] will be used to further improve the accuracy of accident-prone road determination.