Examining the Environmental, Vehicle, and Driver Factors Associated with Crossing Crashes of Elderly Drivers Using Association Rules Mining

In an aging society, reducing vehicle crashes caused by elderly drivers has become a crucial issue. To find effective methods for reducing these crashes, it is necessary to gain insights into the characteristics of the vehicle crashes and traffic violations of elderly drivers. However, the multiple significant factors associated with crossing crashes due to elderly drivers have not been extensively examined in previous studies. To fill this research gap, this study identifies the crash pattern and examines the environmental, vehicle, and driver factors associated with crossing crashes due to elderly drivers. Five years of crash data from Toyota City, Japan, are used for the empirical analysis. An emerging data mining method called association rules mining is applied to discover the factors associated with crossing crashes of elderly and nonelderly drivers, respectively. The significant findings indicate that (1) elderly drivers are more likely than nonelderly drivers to cause crossing or right-turn crashes; and (2) more factors are associated with the large proportion of crossing crashes due to elderly drivers, including crash location (intersection without signal), lighting (daylight), road condition (dry and other), weather condition (clear and raining), vehicle type (light motor truck), and traffic violation (fail to confirm safety). The findings of this study can be used by traffic safety professionals to implement countermeasures that reduce crossing crashes due to elderly drivers.


Introduction
Vehicle crashes due to elderly drivers have been a significant concern for roadway traffic safety in Japan. The proportion of vehicle crashes due to elderly drivers has increased by up to 20%, although the total number of vehicle crashes tended to decrease from 2005 to 2015 [1]. Moreover, the population distribution in Japan is reportedly shifting toward a larger share of elderly people. The proportion of elderly people (≥65 years) is estimated to reach 31.6% in 2030, compared with 26.8% in 2013 [2]. The number of elderly drivers is therefore expected to increase continuously over the next two decades.
Some incentive measures have been implemented in Japan to ensure the driving safety of elderly drivers. For example, some local governments distribute discount coupons for public facilities or free bus tickets to elderly drivers who have voluntarily returned their licenses. However, elderly drivers are reportedly unwilling to return their licenses when public transportation near their homes is insufficient and private cars are indispensable for their daily lives. As a result, license return rates in metropolises such as Tokyo and Osaka are higher than those in local cities in Japan, where the public transportation system is insufficient and many residents live in suburban areas.
To reduce vehicle crashes due to elderly drivers, the Japan National Police Agency has revised the Road Traffic Law to require drivers older than 74 years who have committed certain types of traffic violations to visit a hospital for a cognitive ability check, in order to judge whether they are still fit to drive safely [3]. These types of traffic violations are related to cognitive problems of elderly drivers. However, the number of doctors reportedly cannot meet the massive demand for cognitive ability diagnoses of elderly drivers, and this demand will keep increasing over the next decade.
As a traditional measure for preventing and reducing vehicle crashes due to elderly drivers, an education program is considered an ideal and effective approach. To make such a program more effective, it is necessary to understand the distinctive crash pattern of elderly drivers compared with nonelderly drivers.
This study aimed to identify the distinctive crash pattern of elderly drivers compared with nonelderly drivers and to examine the environmental, vehicle, and driver factors associated with crossing crashes due to elderly drivers. Here, a crossing crash is a broadside collision in which the side of one vehicle is impacted by the front or rear of another vehicle; this crash type accounted for the largest share of vehicle crashes due to elderly drivers. Five years of vehicle crash data (2009 to 2013) from Toyota City, Japan, are used for the empirical analysis. An emerging data mining method called association rules mining is applied to discover the factors associated with crossing crashes of elderly and nonelderly drivers, respectively. Based on the findings of this study, knowledge of crash characteristics such as environmental, vehicle, and driver factors can guide the design of countermeasures to improve the driving safety of elderly drivers. The remainder of this article is organized as follows. Section 2 gives a brief literature review of crash pattern analysis of elderly drivers and of data mining methodologies, including classification trees and association rules mining. Section 3 introduces the association rules mining methodology implemented in this study. Section 4 describes the dataset used for the empirical study and the results of a basic statistical analysis of the different crash patterns of elderly and nonelderly drivers. Section 5 reports the results of association rules mining and discusses the different characteristics of the association rules for elderly and nonelderly drivers. Finally, Section 6 concludes the study.

Literature Review
To propose effective countermeasures to reduce vehicle crashes due to elderly drivers, it is essential to understand the crash types in which they are involved and the circumstances that lead to their crashes. Elderly drivers are known to be overinvolved in angle, overtaking, merging, and intersection crashes, especially when they are turning left [4]. Elderly drivers are also significantly overrepresented in intersection-related crashes. For example, between 48% and 55% of fatal crashes involving drivers aged 80 years or older reportedly occurred at intersections, more than twice the share for drivers aged 50 or younger (23%) [5]. This might result from age-related cognitive, visual, and physical changes that impair their ability to perform driving tasks and to navigate the complicated roadway situations in which crashes of elderly drivers often occur [6].
To propose an effective education program that helps elderly drivers prevent vehicle crashes, it is crucial to understand the distinctive crash pattern of elderly drivers compared with nonelderly drivers. Previous studies have indicated that the crash pattern of elderly drivers differs from that of nonelderly drivers [7,8]. Elderly drivers are more likely to be involved in crashes at intersections without signals, and crossing crashes account for the largest proportion among crash types. Crossing crashes are well recognized to cause severe injuries. To understand the causes of crossing crashes, previous studies have investigated their associated factors using crash data [8][9][10]. Based on these findings, countermeasures can be proposed to prevent crossing crashes due to elderly drivers.
However, these previous studies were based on traditional statistical methodology, which has a limited ability to reveal the relations between multiple factors and crash patterns. In this study, we aim to investigate the various factors associated with crossing crashes due to elderly drivers rather than the frequency of crossing crashes. Previous studies using count data models, such as the Poisson regression or negative binomial regression model, are therefore not reviewed here, because this study focuses on data mining methodologies for crash pattern analysis. A vehicle crash is defined as a rare, random, multifactor event always preceded by a state in which road users fail to cope with the current environment, and a crash results from a series of directly or indirectly associated events [11]. Therefore, compared with traditional statistical methodology, emerging data mining methodologies can yield valuable insights into vehicle crash patterns by performing knowledge discovery on large vehicle crash datasets.
There are mainly two types of data mining methodologies used in previous studies for crash pattern analysis, namely classification trees and association rules analysis. Research using classification trees can be found in previous studies [12,13]. A recent study by Montella et al. indicated that, from a methodological point of view, both classification trees and association rules analysis were useful in revealing nontrivial and unsuspected relations in vehicle crash analysis. That study concluded that the classification tree structure allowed a straightforward understanding of the phenomenon under study, whereas association rules analysis provided new information hidden in the sample data [14]. Therefore, association rules mining is a suitable methodology, because it might help us discover new dependencies between various factors and crash patterns in vehicle crash data.
The association rules mining method has been applied to roadway traffic safety problems in several previous studies. For example, Pande and Abdel-Aty derived closely associated crash characteristics in the form of rules using association rules mining [15]. Mirabadi and Sharifian applied this methodology to extend knowledge discovery and reveal association patterns of railway crashes in Iran [16]. Montella applied it to investigate the contributing factors of different crash patterns at urban roundabouts [11]. Based on the literature review, we found that this methodology has seldom been applied to crash pattern analysis of elderly drivers.
To find countermeasures that reduce vehicle crashes due to elderly drivers, it is necessary to investigate the characteristics of their crash pattern from various viewpoints using association rules mining, which can help us discover the knowledge behind the crash dataset. Motivated by this, the present study applies association rules mining to investigate the various factors related to crossing crashes of elderly drivers, which have not been extensively investigated in most previous studies. The findings of this study can give insights into the significant factors associated with crossing crashes, which can be used in the education program for elderly drivers when they renew their driver licenses.

Methodology
This study used association rules mining to perform the empirical analysis. This methodology has recently become prevalent and has been applied in traffic safety research [17,18]. A brief introduction to the methodology is given here; a more detailed introduction can be found in the study by Hahsler et al. [19]. Data mining of transaction data using association rules was proposed by Agrawal et al. [20].
This methodology is an association discovery approach used to discover the relative frequency of sets of items (e.g., crossing crash in this study) occurring alone and together in a given event (i.e., a crash observation in this study). The rules have the form "X ⟶ Y", in which X is the antecedent and Y is the consequent. Each association rule can be described by three indexes: support, confidence, and lift. Support is the percentage of observations in the dataset in which the rule holds. Confidence is the ratio of the support of the rule to the support of the antecedent. Lift quantifies the statistical dependence of a rule as the ratio of its confidence to the support of the consequent. These indexes are computed as follows:

$$S(X) = \frac{\sigma(X)}{N}, \qquad S(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N},$$

$$C(X \rightarrow Y) = \frac{S(X \rightarrow Y)}{S(X)}, \qquad L(X \rightarrow Y) = \frac{C(X \rightarrow Y)}{S(Y)},$$

where S(X) is the support of the antecedent X, σ(X) is the number of observations with the antecedent X, σ(X ∪ Y) is the number of observations with both the antecedent X and the consequent Y, and N is the total number of observations in the dataset. The lift of a rule compares the observed co-occurrence of the antecedent and the consequent with the co-occurrence expected if they were independent. A value equal to one indicates independence, a value smaller than one indicates negative dependence, and a value greater than one indicates positive dependence; the higher the lift, the greater the dependence [21].
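To make the three indexes concrete, the following minimal Python sketch computes S, C, and L directly from the formulas above. It assumes each crash record is encoded as a set of "attribute=value" items; this encoding and the toy records are hypothetical and are not the Toyota City data. It is an illustration of the definitions, not the arules implementation used in the study.

```python
from typing import FrozenSet, List

# Hypothetical encoding: each crash record is a set of "attribute=value" items.
Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """S(.) = sigma(.) / N: fraction of records containing every item of itemset."""
    sigma = sum(1 for t in transactions if itemset <= t)
    return sigma / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Transaction]) -> float:
    """C(X -> Y) = S(X -> Y) / S(X), where S(X -> Y) is the support of X u Y."""
    s_x = support(x, transactions)
    if s_x == 0:
        return 0.0  # antecedent never occurs; treat the rule as having no confidence
    return support(x | y, transactions) / s_x


def lift(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Transaction]) -> float:
    """L(X -> Y) = C(X -> Y) / S(Y); >1 positive, =1 independent, <1 negative dependence."""
    return confidence(x, y, transactions) / support(y, transactions)


# Toy records made up for illustration only.
records = [
    frozenset({"violation=disobey_stop_sign", "type=crossing"}),
    frozenset({"violation=disobey_stop_sign", "type=crossing"}),
    frozenset({"violation=fail_to_confirm_safety", "type=crossing"}),
    frozenset({"violation=inattention", "type=rear_end"}),
    frozenset({"violation=inattention", "type=rear_end"}),
]
X = frozenset({"violation=disobey_stop_sign"})
Y = frozenset({"type=crossing"})

print(support(X | Y, records), confidence(X, Y, records), lift(X, Y, records))
```

On these five toy records the rule X ⟶ Y has support 0.4, confidence 1.0, and lift of about 1.67, because crossing crashes make up 60% of all records but 100% of the records with the violation.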
The association rules in this study may involve multiple explanatory variables as antecedents. As a result, they can reveal valuable relations between single or multiple factors and crossing crashes due to elderly drivers. A rule with one antecedent and one consequent is defined as a 2-product rule; likewise, a rule with two antecedents and one consequent is defined as a 3-product rule.
For example, in the rule "violation = disobey stop sign ⟶ crossing crash" (support = 2%, confidence = 70%, lift = 3.5), support indicates that observations involving both the violation "disobey stop sign" and a crossing crash make up 2% of the whole dataset; confidence indicates that 70% of the observations involving the violation "disobey stop sign" are crossing crashes; and lift indicates that the violation "disobey stop sign" is positively associated with crossing crashes, because crossing crashes are 3.5 times as frequent among these observations as in the dataset overall.
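Rearranging the definitions of confidence and lift recovers the two marginal supports implied by this illustrative rule, which shows how the three indexes fit together:

$$S(X) = \frac{S(X \rightarrow Y)}{C(X \rightarrow Y)} = \frac{0.02}{0.70} \approx 0.029, \qquad S(Y) = \frac{C(X \rightarrow Y)}{L(X \rightarrow Y)} = \frac{0.70}{3.5} = 0.20.$$

That is, about 2.9% of observations would involve the violation and 20% would be crossing crashes; the lift of 3.5 is simply the ratio of the 70% crossing share among observations with the violation to the 20% share overall.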
To implement this data mining technique, this study applies the Apriori algorithm proposed by Agrawal and Srikant, a level-wise, breadth-first algorithm that counts transactions [22]. The free statistical software R provides a package called "arules" for association rules mining with this algorithm.
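The level-wise, breadth-first idea of Apriori can be sketched in a few lines of Python. This is a simplified illustration written for this explanation, not the optimized implementation in arules: it builds frequent k-itemsets from frequent (k-1)-itemsets, prunes any candidate with an infrequent subset, and makes one counting pass over the records per level. The record encoding and toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori_frequent_itemsets(
    transactions: List[FrozenSet[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support (simplified Apriori)."""
    n = len(transactions)

    # Level 1: count single items in one pass.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        candidates = set()
        # Join step: combine pairs of frequent (k-1)-itemsets into size-k candidates.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every (k-1)-subset of a candidate must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: one breadth-first pass over the records for this level.
        current = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy records (hypothetical).
records = [
    frozenset({"loc=no_signal", "type=crossing"}),
    frozenset({"loc=no_signal", "type=crossing", "light=day"}),
    frozenset({"loc=segment", "type=rear_end"}),
    frozenset({"loc=no_signal", "light=day"}),
]
freq = apriori_frequent_itemsets(records, min_support=0.5)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In the toy run, the 3-itemset {no_signal, crossing, day} is never counted, because its subset {crossing, day} is infrequent; this pruning is what keeps the level-wise search tractable. Rules are then generated from the frequent itemsets and filtered by confidence and lift.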

Data Preparation
This study used 5 years of vehicle crash records (2009-2013) obtained from the Traffic Safety and Crime Prevention Division, Social Affairs Department of Toyota City. The data were stored in Microsoft Excel worksheets, sorted by time of occurrence. The vehicle crash records in this study are injury crash data, in which at least one person involved was injured. In the sample data, each vehicle crash occupies two rows of a table, each row recording one actor in the crash. Here, an actor is a driver, a pedestrian, or an object. The two records are ordered by degree of fault: the first actor bears the greater fault, followed by the second actor. Each crash record had many attributes describing the timestamp, environmental factors, traffic conditions, and driver characteristics.
This study prepared a dataset including only the crash records of the driver at greater fault, i.e., records of the first actor.
Data of the second actor were excluded from the sample because this study aimed to investigate the main contributor (the driver with the higher fault level) to a vehicle crash. In addition, only crashes occurring at intersections or on road segments were used, because crossing crashes rarely occur in other locations such as parking lots or squares. The sample contains 9,706 crashes (2009-2013), including 1,313 crashes due to elderly drivers (≥65 years old) and 8,393 crashes due to nonelderly drivers (<65 years old). Figure 1 illustrates the spatial distribution of vehicle crashes in Toyota City due to elderly and nonelderly drivers, respectively. It shows that most crashes occurred in the urban area of Toyota City. This might reflect the higher level of social activity in densely populated areas, which leads to an increased risk of accidents [23].
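The record selection described above can be sketched as a simple filter. This is a minimal Python illustration under stated assumptions: the field names actor_order, location, and age, and the location labels, are hypothetical and do not come from the Toyota City worksheets.

```python
from typing import Dict, List, Tuple

# Hypothetical location labels; only intersections and road segments are kept.
KEPT_LOCATIONS = {"intersection_with_signal", "intersection_without_signal", "segment"}


def prepare(rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Keep first-actor rows at intersections or segments, then split by age 65."""
    # The first actor is the driver with the greater fault (hypothetical field name).
    first_actors = [r for r in rows if r["actor_order"] == 1]
    in_scope = [r for r in first_actors if r["location"] in KEPT_LOCATIONS]
    elderly = [r for r in in_scope if r["age"] >= 65]
    nonelderly = [r for r in in_scope if r["age"] < 65]
    return elderly, nonelderly


# Toy rows (hypothetical): each crash appears as two rows, one per actor.
rows = [
    {"crash_id": 1, "actor_order": 1, "location": "intersection_without_signal", "age": 72},
    {"crash_id": 1, "actor_order": 2, "location": "intersection_without_signal", "age": 40},
    {"crash_id": 2, "actor_order": 1, "location": "segment", "age": 35},
    {"crash_id": 2, "actor_order": 2, "location": "segment", "age": 66},
    {"crash_id": 3, "actor_order": 1, "location": "parking_lot", "age": 80},
    {"crash_id": 3, "actor_order": 2, "location": "parking_lot", "age": 30},
]
elderly, nonelderly = prepare(rows)
print([r["crash_id"] for r in elderly], [r["crash_id"] for r in nonelderly])
```

Crash 3 is dropped because it occurred in a parking lot, and the second-actor rows are ignored even when the second actor is elderly, mirroring the study's focus on the main contributor.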
The vehicle crash database contains many attributes describing the details of crashes. We conducted a detailed literature review to identify significant factors associated with traffic violations and crash types. Vehicle crashes of elderly and nonelderly drivers were compared in terms of the frequency of the location, environmental, vehicle, and driver factors involved, to determine which factors characterize the crash pattern of elderly drivers. Table 1 lists descriptive statistics of significant variables.
Location: Not surprisingly, elderly drivers were significantly more likely to crash at intersections without signals. This is consistent with the fact that intersections are dangerous parts of the network: they present drivers with many points of possible conflict with other road users, often at high speeds and with minimal time to respond, and they lack adequate in-vehicle crashworthiness opportunities [24]. By contrast, nonelderly drivers were significantly more involved in crashes on road segments. This might indicate that nonelderly drivers drive over a broader region than elderly drivers, which increases their risk of crashes on segments.
Environmental factors: Elderly drivers were significantly more likely to crash in daylight, whereas nonelderly drivers were more likely to crash at night. There were no significant differences between elderly and nonelderly drivers in the road or weather conditions present when crashes occurred.
Vehicle factor: Elderly drivers were significantly more likely to cause crashes while driving light motor trucks, whereas nonelderly drivers were significantly more likely to cause crashes while driving ordinary motor trucks. This difference might arise because ordinary motor trucks are mainly used to transport industrial commodities, a task elderly drivers seldom perform after retirement. By contrast, elderly drivers are presumably more likely than nonelderly drivers to drive light motor trucks for agricultural work in the suburban areas of Toyota.
Driver factor: Regarding the traffic violation identified as the cause of a crash, elderly drivers were significantly more likely to fail to confirm safety, whereas nonelderly drivers were more likely to be inattentive. These differences might indicate that elderly drivers do pay attention while driving but, due to aging effects, are likely to fail to confirm safety. Regarding crash type, elderly drivers were significantly more likely to cause crossing or right-turn crashes, whereas nonelderly drivers were more likely to cause rear-end crashes, consistent with a previous study [8].
To summarize, the crashes of elderly and nonelderly drivers differed in location, lighting, vehicle type, traffic violation, and crash type. Elderly drivers are more likely to crash at intersections without signals and in daylight. They are also more likely to cause crashes while driving light motor trucks, to commit the traffic violation of failing to confirm safety, and to be involved in crossing and right-turn crashes.
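The article does not state which test underlies "significantly" in Table 1. As one common option (an assumption, not the authors' stated method), a two-proportion z-test can compare the crossing-crash shares reported in Table 1, 29.0% of 1,313 elderly-driver crashes versus 23.0% of 8,393 nonelderly-driver crashes. The z statistic squared equals the uncorrected chi-square statistic of the corresponding 2×2 table.

```python
import math
from typing import Tuple


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """Two-sided two-proportion z-test using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Crossing-crash shares from Table 1 and sample sizes from the data description.
z, p = two_proportion_z(0.290, 1313, 0.230, 8393)
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

With these inputs z is roughly 4.7, so the difference in crossing-crash shares is well beyond conventional significance levels; because the shares are rounded to one decimal place, the statistic is approximate.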

Results and Discussion
This study used the "arules" package in the open-source statistical software R to conduct the association analysis [19]. To understand the differences between elderly and nonelderly drivers, we applied this methodology to the sample data of each group. The association rules linking environmental, vehicle, and driver factors with crossing crashes are extracted from the rules generated by the Apriori algorithm. Creating association rules for elderly and nonelderly drivers involves five steps: (1) generate rules with two or more items, (2) determine threshold values, (3) eliminate the rules with lift values outside the threshold, (4) eliminate the rules whose support and confidence values are lower than the thresholds, and (5) eliminate redundant rules with respect to the items of the antecedent. To find the association rules most strongly related to crossing crashes of elderly and nonelderly drivers, the support threshold is set to 1% and the confidence threshold to 70%. The association rules between environmental, vehicle, and driver factors and crossing crashes for elderly and nonelderly drivers are listed in Table 2. In the first rule for elderly drivers, the traffic violation "disobey stop sign" was highly associated with crossing crashes (support = 0.023, confidence = 1.000, lift = 3.446). This rule means that 2.3% of vehicle crashes involved disobeying a stop sign and were crossing crashes; that 100% of the crashes involving the violation "disobey stop sign" were crossing crashes; and that the proportion of crossing crashes among crashes involving this violation was 3.446 times the proportion of crossing crashes in the complete dataset.
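The selection steps (1) and (3) to (5) can be sketched as a filter over already generated rules. This Python sketch rests on several assumptions: the lift threshold is not reported in the text, so min_lift = 1.0 is a hypothetical placeholder; a rule is assumed to be retained only if it meets both the 1% support and the 70% confidence thresholds; and "redundant" follows the common definition (used, for example, by arules) that a rule is redundant when a rule with the same consequent and a strictly smaller antecedent has equal or higher confidence. Apart from the first rule, which repeats the values reported above, the rules are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    lhs: FrozenSet[str]  # antecedent items
    rhs: str             # consequent item
    support: float
    confidence: float
    lift: float


def select_rules(
    rules: List[Rule],
    target: str = "type=crossing",
    min_support: float = 0.01,
    min_confidence: float = 0.70,
    min_lift: float = 1.0,  # hypothetical: the study does not report its lift threshold
) -> List[Rule]:
    # Step (1): at least two items (non-empty antecedent plus consequent), consequent = target.
    kept = [r for r in rules if r.rhs == target and len(r.lhs) >= 1]
    # Step (3): drop rules outside the lift threshold.
    kept = [r for r in kept if r.lift > min_lift]
    # Step (4): keep rules meeting both the support and confidence thresholds (assumed reading).
    kept = [r for r in kept if r.support >= min_support and r.confidence >= min_confidence]

    # Step (5): a rule is redundant if a more general rule with the same consequent
    # is at least as confident.
    def is_redundant(r: Rule) -> bool:
        return any(o.rhs == r.rhs and o.lhs < r.lhs and o.confidence >= r.confidence for o in kept)

    return [r for r in kept if not is_redundant(r)]


rules = [
    Rule(frozenset({"violation=disobey_stop_sign"}), "type=crossing", 0.023, 1.000, 3.446),
    # Hypothetical rules below, for illustration only.
    Rule(frozenset({"violation=disobey_stop_sign", "light=day"}), "type=crossing", 0.015, 1.000, 3.446),
    Rule(frozenset({"loc=no_signal"}), "type=crossing", 0.20, 0.55, 1.9),
    Rule(frozenset({"vehicle=light_truck"}), "type=crossing", 0.005, 0.80, 2.8),
]
selected = select_rules(rules)
for r in selected:
    print(sorted(r.lhs), "->", r.rhs, r.support, r.confidence, r.lift)
```

Only the stop-sign rule survives: the second rule adds "daylight" without raising confidence, the third fails the confidence threshold, and the fourth fails the support threshold.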
For elderly drivers, two rules had the highest lift value (3.446): "Violation = Disobey stop sign ⟶ Crossing crash" and "Violation = Disobey traffic lights, Location = Intersection with signal, Lighting = Daylight ⟶ Crossing crash." These two rules identify the single factor and the combination of factors with the largest proportion of crossing crashes among the crash types of elderly drivers. For nonelderly drivers, the highest lift value (lift = 4.355) is found for a 2-product rule: "Violation = Disobey stop sign ⟶ Crossing crash," indicating that the proportion of crossing crashes among crashes involving this violation is more than four times the overall proportion of crossing crashes for nonelderly drivers.
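Because L(X ⟶ Y) = C(X ⟶ Y)/S(Y), the reported values can be cross-checked against the crossing-crash shares in Table 1 (29.0% for elderly and 23.0% for nonelderly drivers), writing Y for "crossing crash":

$$S(Y)_{\text{elderly}} = \frac{C}{L} = \frac{1.000}{3.446} \approx 0.290, \qquad C_{\text{nonelderly}} = L \cdot S(Y) = 4.355 \times 0.230 \approx 1.00.$$

The elderly-driver rule thus reproduces the 29.0% base rate, and the nonelderly-driver stop-sign rule implies a confidence of about 100%; the latter is an inference from the reported lift, not a value stated in the text.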
Compared with the data mining results for nonelderly drivers, the factors associated with a relatively large proportion of crossing crashes among elderly drivers included location (intersection without signal), lighting (daylight), road condition (dry or other), weather (clear or raining), vehicle type (light motor truck), and traffic violation (fail to confirm safety). These factors might reflect different characteristics of elderly and nonelderly drivers, in that crossing crashes caused by elderly drivers are associated with more factors than those caused by nonelderly drivers. These findings can help design countermeasures, such as driver education, to improve the traffic safety of elderly drivers. An interesting finding was that the traffic violation "fail to confirm safety" was highly associated with crossing crashes on some occasions; these occasions should be set as education targets for elderly drivers, because elderly drivers were likely to commit this violation, as shown in Table 1. The reasons for the different association rules extracted from the crash data of elderly and nonelderly drivers are as follows:
(1) The proportion of crossing crashes among crashes caused by elderly drivers (29.0%) is significantly higher than that for nonelderly drivers (23.0%), as shown in Table 1. Because the confidence threshold in this study is set at 70%, a rule for nonelderly drivers must raise the share of crossing crashes further above its lower base rate to pass the threshold. Therefore, many of the corresponding association rules for nonelderly drivers could not be extracted from the dataset.
(2) Elderly drivers are likely to cause vehicle crashes in daylight, which might reflect the life and activity patterns of elderly people, who tend to go shopping or pursue leisure activities in the daytime, as reported in a previous study [25].
(3) Elderly drivers have a higher ratio of vehicle crashes involving light motor trucks, because households with elderly owners are more likely than those with nonelderly owners to own light motor trucks [26]. This might be because light motor trucks are useful for farm or transportation work in Toyota City.
(4) Elderly drivers have a large proportion of the traffic violation "fail to confirm safety," and this violation is highly related to crossing crashes, as concluded in a previous study [8].
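Reason (1) can be quantified. Since C(X ⟶ Y) = L(X ⟶ Y) · S(Y), a rule passes the 70% confidence threshold only if its lift reaches 0.70/S(Y), with Y denoting "crossing crash" and S(Y) taken from Table 1:

$$L_{\min} = \frac{0.70}{S(Y)}: \qquad \frac{0.70}{0.290} \approx 2.41 \ \text{(elderly)}, \qquad \frac{0.70}{0.230} \approx 3.04 \ \text{(nonelderly)}.$$

A factor combination among nonelderly drivers therefore needs a noticeably stronger association with crossing crashes to appear as a rule, which is consistent with fewer rules being extracted for that group.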
The strength of this study is that it extends knowledge of the driving safety issues of elderly drivers. It identified crucial factors leading to crossing crashes of elderly drivers that were not observed in most previous studies. The findings have policy implications for the safety of elderly drivers. To reduce crossing crashes due to elderly drivers, an education program for elderly drivers could highlight risk factors such as location, time period, and traffic violation. Meanwhile, the development of advanced driver assistance systems is crucial to compensate for failures to confirm safety, a violation highly related to crossing crashes. Such systems might be even more necessary for drivers of light motor trucks, because this vehicle type is related to crossing crashes according to the association rules for elderly drivers.

The limitations of this study are as follows. First, the vehicle crash data used in this study include only crashes in which at least one person (a passenger or a driver) was injured. Property-damage-only crashes were therefore not included in the sample, so the results might differ from those of studies that include both property-damage-only and injury crashes. We interviewed a researcher at the National Research Institute of Police Science, Japan, to understand why property-damage-only crashes are not included in the electronic data. The answer was that such crashes are so numerous that police officers did not record them in the Microsoft Office Excel worksheets.
Second, this study did not consider regional characteristics, such as the difference between urban and suburban areas. This factor might be related to crossing crashes in the sample data, because this type of crash usually occurs at intersections without signal control. In this study, association rules mining was applied to vehicle crash data collected in the region of Toyota. Figure 1 illustrates that the east part of Toyota has a sparse road network and the west part of Toyota has a […] Therefore, results based on vehicle crash data in general cannot reflect the impact of regional characteristics on vehicle crash types.

Conclusions and Future Tasks
The current study used five years (2009-2013) of crash data for elderly and nonelderly drivers in Toyota City to identify the crash pattern and investigate the significant environmental, vehicle, and driver factors associated with crossing crashes of elderly drivers. A data mining technique called association rules mining was applied, which can identify valid and understandable patterns underlying a large crash dataset. The association rules mining was implemented with the "arules" package in the statistical software R.
The results of the basic statistical analysis indicated that elderly drivers are more likely to crash at intersections without signals and in daylight. They are also more likely to cause crashes while driving light motor trucks, to commit the traffic violation of failing to confirm safety, and to be involved in crossing and right-turn crashes. The results of association rules mining indicated that more factors are associated with crossing crashes of elderly drivers than with those of nonelderly drivers. These factors include crash location (intersection without signal), lighting (daylight), road condition (dry and other), weather condition (clear and raining), vehicle type (light motor truck), and traffic violation (fail to confirm safety). These results might reveal characteristics of the crash pattern of elderly drivers that stem from aging effects.
As a future task, we will incorporate regional characteristics, i.e., the difference between urban and suburban areas noted among the limitations of this study. We will therefore divide the sample data into two categories, urban and suburban, and apply association rules mining to each category separately. The extracted association rules are expected to differ between the two categories, because crossing crashes are more likely to occur in urban areas with a high density of intersections than in suburban areas.
Data Availability
The vehicle crash data were provided by the Traffic Safety and Crime Prevention Division, Social Affairs Department of Toyota City. These data for Toyota City were collected and recorded by the Aichi Prefectural Police in Japan.

Disclosure
Two earlier versions of this study have been presented at the 12th International Conference of the Eastern Asia Society for Transportation Studies in Ho Chi Minh City, Vietnam, and the 98th Annual Meeting of the Transportation Research Board in Washington, D.C., USA, respectively.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.

Note. One 6-product rule related to elderly drivers is not shown in Table 2.
Journal of Advanced Transportation