Predictive Probability Models of Road Traffic Human Deaths with Demographic Factors in Ghana

Road traffic carnage is a global concern and is seemingly on the rise in Ghana. Several risk factors have been studied in association with road traffic fatalities. However, inadequate road traffic fatality (RTF) data and inconsistent probability outcomes for RTF remain major challenges. The objective of this study is to illustrate and estimate probability models that can predict road traffic fatalities. We relied on 66,159 recorded casualties involved in road traffic accidents (RTAs) in Ghana from 2015 to 2019. Three generalized linear models, namely logistic regression, probit regression and the linear probability model, were used for the analysis. We found that gender and age group have significant effects in predicting the probability of road traffic fatality in all three models. Through a likelihood ratio test, however, it was determined that the logit regression model produced consistent probabilities of traffic fatalities that are very close to the actual probability values across age groups and gender, compared to the other two models. Thus, we recommend an intensified campaign for the use of seat belts in vehicles, targeted at the aged and male users of road transport, to reduce the possibility of death in any RTA.


Introduction
Reducing the risk of road traffic fatalities (RTF) remains the ultimate objective of many road safety regulations and studies. Deaths resulting from road traffic crashes have become a serious global threat: available statistics indicate that the world loses close to 1.35 million people to road accidents each year, the majority of whom are young people aged 5 to 29 years (Ullah, Farooq & Shah, 2021). The repercussions are severe given the immense losses they bring to victims' families and communities. Further, it is estimated that road traffic crashes cost several countries about three to five percent of their gross domestic product (GDP) (World Health Organization [WHO], 2018). The situation persists, as the world missed Sustainable Development Goal (SDG) target 3.6, to halve road traffic deaths by 2020 (United Nations [UN] GA/10920, 2020).
Notwithstanding this, the UN General Assembly, in its resolution 74/299 on improving global road safety, re-emphasized the target of halving the global number of deaths and injuries from road traffic crashes by 2030 (UN, 2020). To this end, examining the risk of road traffic fatalities by victims' characteristics is crucial to understanding the impact of the various risk factors contributing to these fatalities, so that appropriate safety interventions can be identified and implemented to reduce the number of deaths from these crashes. According to WHO (1979), road traffic accident fatalities include only deaths that occur within 30 days following a road accident.
Statistics from WHO (2018) suggest significant differences in the rate of road traffic deaths per 100,000 people across regions of the world. Among these regions (Africa, the Americas, Eastern Mediterranean, Europe, South-East Asia and Western Pacific), Africa had the highest rate of road traffic deaths (26.6 per 100,000 people), compared with Europe (9.3 per 100,000 people). Further, OECD (2020) reported a road traffic mortality rate of 22.4 per 100,000 people for South Africa in 2018. These figures reflect limited progress in fighting road traffic carnage in Africa. In Sub-Saharan Africa in particular, Aga et al. (2021) explained that the road traffic accident situation is severe, with the region having the highest road traffic death rate and significant property damage resulting from road traffic accidents.
The evidence in Ghana is alarming: Blankson and Lartey (2020) reported that deaths resulting from road traffic accidents constitute 62 percent of all emergency cases reported at designated referral hospitals for accident victims in Ghana. This is corroborated by statistics from WHO (2020), which indicate that an average of 8 persons out of every 100,000 in Ghana have died from RTAs annually over the past decade (Blankson & Lartey, 2020). According to Konlan et al. (2020), many of these RTA-related deaths are caused by the road traffic behavior of motorcyclists. Predictor variables such as age, alcohol influence, excessive speeding, bad roads, overloading and disregard for road regulations have been found to be significant risk factors for road fatalities in Ghana (Agyemang, Abledu, & Semevoh, 2013; Siaw, Duodu, & Sarkodie, 2013; Nyamuame, Aglina, Akple, Philip & Klomegah, 2015; Asare & Mensah, 2020). However, these studies lack rigorous probability models that attempt to predict road traffic human deaths in Ghana. For instance, Asare and Mensah (2020) only applied an ordinal regression model to identify factors that contribute to accident severity in Ghana.

Literature Review
This section discusses the risk factors contributing to RTF and the methodological approaches used in estimating them. It further presents a brief empirical review of RTF studies across the globe. Extensive research has already provided many insights into risk factors that influence road traffic crashes. The literature has focused on six areas: demographic factors, human factors, road factors, vehicle factors, circumstantial factors and environmental factors. Demographic characteristics of drivers such as gender, age, education, employment sector and income dominate recent studies (Mehdizadeh, Shariat-Mohaymany & Nordfjaern, 2018; Machado-Leon et al., 2016), but deaths from road traffic crashes are not limited to the drivers of vehicles. Other studies (Regev et al., 2018; Mishra et al., 2010) show that RTF cuts across a broad spectrum of road users with different demographic backgrounds. Regev et al. (2018) and Melchor et al. (2015) noted that men are more likely to die in a road accident because they engage in the transport and distribution business more frequently than women. In relation to age, Vaa (2003) observed that the biological and psychological systems of a person deteriorate faster as they age. This makes the aged (60 years and above) more likely to die in a road accident. Vaa's (2003) observation contrasts with Hesse and Ofosu's (2014) study, which found that, over the period 2001 to 2010, persons aged 26 to 35 accounted for the largest share of road traffic fatalities in Ghana.
With respect to human factors, Wu and Xu (2017), Rolison et al. (2018) and Abele et al. (2018) highlighted that driver behaviors such as speeding, drunk driving, fatigue, the safety measures adopted and risk-taking are the most influential causes of traffic-related casualties. Febres et al. (2019) associated risky driving behaviors with young drivers. Mazankova (2017) maintained that these human behaviors are the main cause of about 70 percent of road traffic fatalities. Zhang et al. (2009) and Yau, Zhang and Li (2016) found that road-related factors such as increased motorization, lane changing and overtaking have negative effects on traffic safety. According to Febres et al. (2019), failure to use appropriate safety accessories in vehicles contributes to a higher probability of human deaths on motorways. Altwaijri et al. (2011) found that environmental factors, including road type, nighttime travel and weather conditions, amplified the risk of exposure to fatal road injuries.
The literature presents a plethora of statistical models used in predicting human road traffic fatalities and injuries given some risk factors. In Farooq and Moslem's (2020) study, the analytic network process was used to conclude that driving without alcohol and obeying speed limits were more significant than other factors in road traffic injuries in Hungary. This conclusion is problematic because the study's sample was limited: twenty drivers were used without regard to any probabilistic sampling approach; besides, the analytic network process is only a decision analysis tool. The analytic network process functions like the artificial neural networks applied in similar previous studies such as Delen et al. (2006) and Chimba and Sando (2009). A related study by Febres et al. (2019) used a Bayesian network to conclude that failure to use appropriate safety accessories, high-speed violations, distractions and errors have a higher probability of predicting fatal injuries for drivers in Spain. Febres et al.'s (2019) study used secondary data in which 66,253 drivers were selected by systematic sampling, whereas Farooq and Moslem's (2020) study gathered data through a questionnaire survey.
The use of Bayesian statistics in predicting the risk of RTF is increasing. Approaches such as the Bayesian ordered probit and the Bayesian hierarchical binomial logit are common in the literature (Xie et al., 2009; Huang et al., 2008). For instance, Hesse et al. (2014) used Bayesian analysis to confirm that population and the number of registered vehicles were the predominant factors influencing road traffic fatalities in Ghana. The main weakness of the Bayesian approach, however, lies in its subjective choice of priors.
Logistic regression analysis is another statistical approach common in the RTF literature, and the linear regression model is also a commonly used probabilistic model in RTF data analysis. Previous studies from Ghana (Agyemang et al., 2013; Siaw et al., 2013) mainly applied multiple regression analysis to conclude that road traffic accidents were surging, and at a faster rate, in Ghana. However, Shankar, Milton and Mannering (1997) criticized linear regression models as inappropriate for making probabilistic statements about the occurrence of vehicular accidents on the road, because they lack the ability to establish a precise relationship between the dependent and independent variables for small sample sizes. In such circumstances, Abdullah and Zamri (2012) proposed the use of fuzzy linear regression models to analyze the factors responsible for RTF.
In other related studies (Prasetijo & Musa, 2016; Naji, Xue, Zheng & Lyu, 2020; Sami, Amin, & Butt, 2022), Poisson regression models were fitted to RTF data under the basic assumption that the mean and variance of the counts are equal. However, Shaik and Hossain (2020) faulted the use of the Poisson regression model because, in many instances, RTF data tend to have a variance larger than the mean, that is, they are overdispersed. This indicates that the Poisson regression model is not suitable for the analysis of either under-dispersed or over-dispersed data sets.
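Whether count data meet the Poisson equal mean and variance assumption can be screened with a simple variance-to-mean ratio (the index of dispersion). The Python sketch below is illustrative only: the function name `dispersion_index` and the sample counts are hypothetical, not taken from the cited studies or from NRSA data, and a formal test or a fitted dispersion parameter would be needed in practice.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean for a list of counts.

    Under a Poisson model the mean equals the variance, so the ratio should be
    close to 1; values well above 1 suggest overdispersion and values well
    below 1 suggest underdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts is zero; ratio is undefined")
    return variance(counts) / m  # statistics.variance uses the n - 1 denominator

# Hypothetical daily fatality counts, for illustration only.
counts = [0, 3, 1, 7, 0, 2, 9, 1, 0, 5]
print(f"variance/mean = {dispersion_index(counts):.2f}")
```

A ratio noticeably above 1 for such counts is the situation Shaik and Hossain (2020) describe, in which a negative binomial or quasi-Poisson specification is usually considered instead.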
With the knowledge gained from the reviewed literature, we proceed to present statistical methods suitable for predicting RTF in Ghana.

Model formulation
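In a generalized linear model (GLM), it is assumed that a linear relationship exists between a function of the mean of the dependent (outcome) variable $Y$ and $k$ independent variables $x_1, \dots, x_k$. For the $i$-th observation, with $\mathbf{x}_i = (1, x_{i1}, \dots, x_{ik})'$, the linear predictor is
$$\eta_i = \mathbf{x}_i'\boldsymbol{\beta} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}.$$
For a binary outcome, $Y_i \sim \text{Bernoulli}(\pi_i)$, where $\pi_i$ is the probability of success, and a link function $g$ relates the mean to the linear predictor:
$$g(\pi_i) = \eta_i.$$
Since $E(Y_i) = \pi_i$, it follows that $\eta_i$ is a function of $\pi_i$. In subsequent sections, we present three different link functions that can be used to estimate $\pi_i$.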
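The relationship between a link $g$, its inverse and the linear predictor $\eta_i = \mathbf{x}_i'\boldsymbol{\beta}$ can be made concrete with a short Python sketch. The names `LINKS`, `linear_predictor` and `fitted_probability` are hypothetical; the three links are the standard logit, probit and identity functions, and this is not the authors' R code.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Each entry holds (g, g_inverse): g maps a probability pi in (0, 1) to eta,
# and g_inverse maps eta back to pi, so that E(Y_i) = pi_i = g^{-1}(eta_i).
LINKS = {
    "logit": (lambda p: math.log(p / (1.0 - p)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (lambda p: _STD_NORMAL.inv_cdf(p),
               lambda eta: _STD_NORMAL.cdf(eta)),
    # The identity link can return values outside [0, 1] for extreme eta.
    "identity": (lambda p: p,
                 lambda eta: eta),
}

def linear_predictor(beta, x):
    """eta_i = x_i' beta, where x[0] == 1.0 carries the intercept."""
    return sum(b * xj for b, xj in zip(beta, x))

def fitted_probability(beta, x, link="logit"):
    """pi_i obtained by applying the inverse link to the linear predictor."""
    _, inverse = LINKS[link]
    return inverse(linear_predictor(beta, x))

print(fitted_probability([-1.0, 0.5], [1.0, 2.0], "logit"))   # eta = 0, so pi = 0.5
print(fitted_probability([-1.0, 0.5], [1.0, 2.0], "probit"))  # eta = 0, so pi = 0.5
```

The same coefficient vector therefore implies different probabilities under different links, which is why the three models must be estimated separately.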

Logistic regression model
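The link function for the logistic regression model is the logit, or log-odds, function, defined according to McCullagh and Nelder (1989) as
$$\operatorname{logit}(\pi_i) = \ln\left(\frac{\pi_i}{1-\pi_i}\right) = \mathbf{x}_i'\boldsymbol{\beta}, \qquad \text{so that} \qquad \pi_i = \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1+\exp(\mathbf{x}_i'\boldsymbol{\beta})}.$$
For $n$ independent observations $y_1, \dots, y_n$, the likelihood function is given by
$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i},$$
with log-likelihood
$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i\,\mathbf{x}_i'\boldsymbol{\beta} - \ln\left(1+\exp(\mathbf{x}_i'\boldsymbol{\beta})\right) \right].$$
The maximum likelihood estimates of the components of the vector $\boldsymbol{\beta}$ are the values of $\beta_0, \dots, \beta_k$ that maximize $\ell(\boldsymbol{\beta})$. The partial derivatives are
$$\frac{\partial \ell}{\partial \beta_j} = \sum_{i=1}^{n} (y_i - \pi_i)\,x_{ij}, \qquad j = 0, 1, \dots, k,$$
where $x_{i0} = 1$. Setting each partial derivative equal to zero gives $k+1$ score equations, which have no closed-form solution and are therefore solved iteratively.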
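The score equations $\sum_i (y_i - \pi_i)x_{ij} = 0$ can be solved by Newton-Raphson, using the information matrix $\sum_i \pi_i(1-\pi_i)\,x_{ij}x_{il}$. The pure-Python sketch below is a minimal illustration under that standard approach, not the authors' R program; `solve`, `fit_logit` and the toy data are hypothetical. For one binary predictor, the maximum likelihood estimates are known in closed form (intercept $\ln(1/3)$ and slope $2\ln 3$ for the toy data), which gives a check.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Logistic regression by Newton-Raphson.

    X: list of rows, each starting with 1.0 for the intercept.
    y: list of 0/1 outcomes (1 = fatality, 0 = injury).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        pi = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
        # Score: dl/dbeta_j = sum_i (y_i - pi_i) x_ij
        score = [sum((yi - p) * row[j] for yi, p, row in zip(y, pi, X)) for j in range(k)]
        # Information: sum_i pi_i (1 - pi_i) x_ij x_il
        info = [[sum(p * (1.0 - p) * row[j] * row[c] for p, row in zip(pi, X))
                 for c in range(k)] for j in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical toy data: 1 fatality out of 4 when x = 0, 3 out of 4 when x = 1.
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
y = [1, 0, 0, 0, 1, 1, 1, 0]
beta = fit_logit(X, y)
print(beta)  # close to [-ln 3, 2 ln 3], roughly [-1.0986, 2.1972]
```

Because the logit is the canonical link for binary data, the observed and expected information coincide, so Newton-Raphson and Fisher scoring give the same iterations here.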

Probit regression model
The probit regression method uses the cumulative distribution function of the standard normal distribution to link the probability to the linear predictor. In the probit model, the inverse of the standard normal cumulative distribution function, evaluated at the probability, is modeled as a linear combination of the predictors (Fox, 2008; Garson, 2013). Thus, the link function for the model is
$$\operatorname{probit}(\pi_i) = \Phi^{-1}(\pi_i) = z_i = \mathbf{x}_i'\boldsymbol{\beta},$$
where $\Phi(z_i) = \pi_i$ and $\Phi$ is the distribution function of the standard normal distribution.
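Because the probit is not the canonical link for binary data, its score equation carries the weight $\phi(\eta_i)/[\pi_i(1-\pi_i)]$, where $\phi$ is the standard normal density. A minimal Fisher-scoring sketch in pure Python follows; it is an illustration of the standard method, not the authors' R program, and `fit_probit`, `solve` and the toy data are hypothetical. With one binary predictor the fit is saturated, so the estimates must satisfy $\Phi(\beta_0) = 0.25$ and $\Phi(\beta_0+\beta_1) = 0.75$.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is the inverse link, PHI.pdf its derivative

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_probit(X, y, tol=1e-10, max_iter=100):
    """Probit regression by Fisher scoring.

    Assumes fitted probabilities stay strictly inside (0, 1); a very large
    |eta| (roughly above 8) would make pi * (1 - pi) numerically zero.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, row))
            pi, dens = PHI.cdf(eta), PHI.pdf(eta)
            var = pi * (1.0 - pi)
            for j in range(k):
                # U_j = sum_i (y_i - pi_i) phi(eta_i) x_ij / [pi_i (1 - pi_i)]
                score[j] += (yi - pi) * dens * row[j] / var
                for c in range(k):
                    # I_jc = sum_i phi(eta_i)^2 x_ij x_ic / [pi_i (1 - pi_i)]
                    info[j][c] += dens * dens * row[j] * row[c] / var
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical toy data: 1 fatality out of 4 when x = 0, 3 out of 4 when x = 1.
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
y = [1, 0, 0, 0, 1, 1, 1, 0]
beta = fit_probit(X, y)
print(beta)  # roughly [-0.6745, 1.3490]
```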
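For the linear probability model, the identity link is used, so the identity regression model can be written as:
$$\pi_i = \mathbf{x}_i'\boldsymbol{\beta}.$$
The log-likelihood function is
$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \ln(\mathbf{x}_i'\boldsymbol{\beta}) + (1-y_i)\ln(1-\mathbf{x}_i'\boldsymbol{\beta}) \right].$$
The first derivative of $\mathbf{x}_i'\boldsymbol{\beta}$ with respect to $\beta_j$ is $x_{ij}$; thus:
$$\frac{\partial \ell}{\partial \beta_j} = \sum_{i=1}^{n} \frac{(y_i - \pi_i)\,x_{ij}}{\pi_i(1-\pi_i)}.$$
To assess the fitted models, the likelihood ratio statistic $G = -2\ln\left(L_0 / L_1\right)$ is used, where $L_0$ is the likelihood of the model without the $k$ predictors and $L_1$ the likelihood of the model with them. If the null hypothesis $\beta_1 = \cdots = \beta_k = 0$ is true, then $G$ has approximately the chi-square distribution with $k$ degrees of freedom (Hosmer et al., 1989). It can be shown that a […]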
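The likelihood ratio statistic $G = 2(\ell_1 - \ell_0)$ and its chi-square p-value can be computed from the Bernoulli log-likelihoods of two fitted models. The Python sketch below is illustrative: `loglik_bernoulli`, `chi2_sf` and `likelihood_ratio_test` are hypothetical helper names, and the chi-square upper tail for integer degrees of freedom is obtained from the standard recurrence $Q(a+1, x/2) = Q(a, x/2) + (x/2)^a e^{-x/2}/\Gamma(a+1)$ for the regularized upper incomplete gamma function, starting from the exact one- and two-degree-of-freedom cases. The log-likelihood helper accepts any fitted probabilities strictly inside (0, 1), including those of a linear probability model.

```python
import math

def loglik_bernoulli(y, pi):
    """Bernoulli log-likelihood: sum_i [y_i ln pi_i + (1 - y_i) ln(1 - pi_i)]."""
    return sum(yi * math.log(p) + (1 - yi) * math.log(1.0 - p) for yi, p in zip(y, pi))

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1, x >= 0."""
    if df % 2 == 1:
        s, a = math.erfc(math.sqrt(x / 2.0)), 0.5   # exact tail for df = 1
    else:
        s, a = math.exp(-x / 2.0), 1.0               # exact tail for df = 2
    # Step up two degrees of freedom at a time (a = df / 2).
    while 2 * a < df:
        s += (x / 2.0) ** a * math.exp(-x / 2.0) / math.gamma(a + 1.0)
        a += 1.0
    return s

def likelihood_ratio_test(ll_full, ll_reduced, df):
    """G = -2 ln(L_reduced / L_full) and its chi-square(df) p-value."""
    g = 2.0 * (ll_full - ll_reduced)
    return g, chi2_sf(g, df)

# Hypothetical toy data: maximum likelihood fits of a model with one binary
# predictor (0.25 when x = 0, 0.75 when x = 1) versus an intercept-only model (0.5).
y = [1, 0, 0, 0, 1, 1, 1, 0]
ll_full = loglik_bernoulli(y, [0.25] * 4 + [0.75] * 4)
ll_reduced = loglik_bernoulli(y, [0.5] * 8)
g, p = likelihood_ratio_test(ll_full, ll_reduced, df=1)
print(f"G = {g:.4f}, p = {p:.4f}")
```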

Study Setting
The National Road Safety Authority (NRSA) of Ghana, which is responsible for ensuring road safety and compliance with road regulations, recently revealed that a total of 2,924 persons died in road crashes in 2021. In addition, 15,972 road crashes were recorded within the same period, resulting in 13,048 injuries. This retrospective study covers road accidents that occurred in Ghana from January 2015 to December 2019. Secondary data in the form of recorded road traffic casualties were obtained from the registry of the NRSA for this study. A total of 66,159 recorded casualties were included in the current study.

Data Processing and Analysis
The secondary data procured from the NRSA were checked for completeness. This was done by first entering all values into Microsoft Excel for cleaning, editing and interpolating missing values. Thereafter, numerical methods were applied to estimate the proportion of road traffic fatalities across age groups and gender. The model parameters were estimated using the R statistical software; the respective R programs can be found in Appendices A1 and A2. The dependent variable for our models was the outcome of each road traffic casualty in Ghana, measured on a nominal (binary) scale. It is named Casualty ($y$) and coded as Fatality ≡ 1 and Injury ≡ 0. A human death includes passengers, pedestrians, road users and any other person whose death was due to a road traffic crash. Two main independent variables were used to assess the robustness of our model, namely age group ($x_1$) and gender ($x_2$).
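The coding of the response and the computation of sample fatality proportions by age group and gender can be sketched as follows. This is a hypothetical Python illustration (the record layout, the labels and the names `code_casualty` and `fatality_proportions` are assumptions), not the authors' Excel or R procedure. Each cell's proportion is simply the mean of the binary response $y$ within that cell.

```python
from collections import defaultdict

OUTCOME_CODES = {"Fatality": 1, "Injury": 0}  # Casualty (y): Fatality = 1, Injury = 0

def code_casualty(outcome):
    """Map a recorded outcome label to the binary response y."""
    if outcome not in OUTCOME_CODES:
        raise ValueError(f"unexpected outcome label: {outcome!r}")
    return OUTCOME_CODES[outcome]

def fatality_proportions(records):
    """Sample proportion of fatalities (mean of y) in each (age group, gender) cell."""
    deaths = defaultdict(int)
    totals = defaultdict(int)
    for age_group, gender, outcome in records:
        cell = (age_group, gender)
        deaths[cell] += code_casualty(outcome)
        totals[cell] += 1
    return {cell: deaths[cell] / totals[cell] for cell in totals}

# Hypothetical records (age group, gender, outcome); not NRSA data.
records = [
    ("over 65", "Male", "Fatality"),
    ("over 65", "Male", "Injury"),
    ("over 65", "Female", "Injury"),
    ("26-35", "Male", "Fatality"),
    ("26-35", "Male", "Injury"),
    ("26-35", "Male", "Injury"),
    ("26-35", "Male", "Injury"),
]
props = fatality_proportions(records)
print(props)
```

Rejecting unknown outcome labels, rather than silently coding them as 0, keeps data-entry errors from being counted as injuries.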

Descriptive Statistics
It is observed from Table 1 that, for the period 2015-2019, male fatalities outnumbered female fatalities by a ratio of approximately 4:1 (78% vs. 21%).
Given that the national population is split slightly in favour of females, this indicates that males are highly over-represented in road traffic fatalities. A similar conclusion was reached by Regev et al. (2018) and Melchor et al. (2015), who noted that men are more likely to die in a road accident because they use motor transport for economic activities more than women do.
From Table 2, it can be observed that, over the 5-year period, the 'over 65' age group had the highest national fatality rate. That is, about 32% of all road traffic casualties aged over 65 years lost their lives, while 27% of casualties aged 5 years or less died as a result of road traffic accidents. This finding is consistent with the works of McCoy, Johnston and Duthie (1989), Vaa (2003) and Etehad et al. (2015). Vaa's (2003) findings indicate that people aged 60 years and above may experience faster deterioration of their biological and psychological systems and are therefore more likely to die in a road accident.
A similar explanation could hold for those aged 5 years and below, whose biological and psychological make-up is still weak or immature, making them more prone to death in a road accident. Table 2 shows the fatality indices of each of the eight age groups, computed from road traffic accident data in Ghana from 2015 to 2019.

Estimated Models
Three models were fitted to the data described in Table 1, where $\pi_i = P(\text{the } i\text{-th road traffic casualty dies})$, $x_{i1}$ is the age group of the $i$-th casualty, $x_{i2}$ is the gender of the $i$-th casualty and $x_{i1}x_{i2}$ is the interaction of the two variables. The objective is to predict the probability of road traffic fatality as a function of age group ($x_1$) and gender ($x_2$), as stated in Model B, and then to compare it with Model A, which has only age group as a predictor. The analysis using Model B assumes that there is no interaction between age group and gender. Therefore, to test for interaction, we compare the full model (Model C) with Model B. The coefficient estimates for the three models and the corresponding standard errors, together with their p-values, are given in Table 3. This result supports the recent findings of Islam and Mannering (2021) and Useche et al. (2021). Islam and Mannering's (2021) study confirmed significant differences between genders in predicting the likelihood of a road traffic fatality. A similar conclusion was reached by Useche et al. (2021), whose study supports the influence of gender in predicting risky road accidents resulting in death. Appendix A3 presents an R function used in estimating the model parameters given in Table 3. Results from Table 3 […]
We fitted the identity regression model using the R code specified in Appendix A4. The coefficient estimates for Model E and the corresponding standard errors, together with the estimates of Model B and Model D, are given in Table 3. Like the logit and probit regression models, the linear probability model is significant in predicting road traffic fatality in Ghana.
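How the nested Models A, B and C differ, and how many degrees of freedom the interaction test uses, depends on how age group enters the design matrix. The Python sketch below assumes, hypothetically, that age group is a categorical variable with eight levels coded as dummy variables against a reference level and that gender is coded 1 for male and 0 for female; the parameterization behind Table 3 may differ. The names `design_row` and `AGE_LEVELS` are illustrative only.

```python
def design_row(age_group, male, age_levels, model):
    """Hypothetical dummy-coded design row for Model A, B or C.

    age_levels: ordered age-group labels; the first is the reference level.
    male: 1 for male, 0 for female (an assumed coding).
    """
    if age_group not in age_levels:
        raise ValueError(f"unknown age group: {age_group!r}")
    if model not in ("A", "B", "C"):
        raise ValueError(f"unknown model: {model!r}")
    age_dummies = [1.0 if age_group == level else 0.0 for level in age_levels[1:]]
    row = [1.0] + age_dummies                      # Model A: intercept + age group
    if model in ("B", "C"):
        row.append(float(male))                    # Model B: adds gender
    if model == "C":
        row.extend(d * male for d in age_dummies)  # Model C: adds age x gender
    return row

# Hypothetical labels for eight age groups; the actual cut-points are in Table 2.
AGE_LEVELS = [f"group {k}" for k in range(1, 9)]
n_params = {m: len(design_row("group 1", 1, AGE_LEVELS, m)) for m in "ABC"}
# Degrees of freedom for the likelihood ratio test of Model C against Model B.
df_interaction = n_params["C"] - n_params["B"]
print(n_params, df_interaction)
```

Under this coding, the test of interaction compares a 16-parameter model with a 9-parameter model, so its likelihood ratio statistic would be referred to a chi-square distribution with 7 degrees of freedom.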
Comparing the linear probability Model E with the logit Model B via the likelihood ratio test, the value of the test statistic is 53.69624 with a p-value less than 0.001, indicating that the logistic regression model is preferred over the linear probability model. Table 4 shows the sample proportions of road traffic fatalities and the fitted values for the logistic, probit and linear probability regression models across age groups and gender. The fitted values for the logistic regression model, shown in Table 4, are similar to those for the linear probability and probit regression models. When the values are rounded to two decimal places, the probit and logistic regression models provide the same estimates.
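The comparison of fitted values with sample proportions described above can be summarized by the largest absolute gap per model and by a check of whether two models agree after rounding to two decimal places. The Python sketch below uses hypothetical numbers for two cells and does not reproduce Table 4; `max_abs_gap` and `agree_when_rounded` are illustrative names.

```python
def max_abs_gap(observed, fitted):
    """Largest |sample proportion - fitted probability| over all cells."""
    return max(abs(observed[cell] - fitted[cell]) for cell in observed)

def agree_when_rounded(fitted_a, fitted_b, ndigits=2):
    """True if two models give identical fitted values after rounding."""
    return all(round(fitted_a[c], ndigits) == round(fitted_b[c], ndigits) for c in fitted_a)

# Hypothetical values for two (age group, gender) cells; not the paper's Table 4.
observed = {("over 65", "Male"): 0.33, ("5 and below", "Female"): 0.25}
fitted = {
    "logit":  {("over 65", "Male"): 0.331, ("5 and below", "Female"): 0.252},
    "probit": {("over 65", "Male"): 0.329, ("5 and below", "Female"): 0.251},
    "linear": {("over 65", "Male"): 0.36,  ("5 and below", "Female"): 0.22},
}
for model, values in fitted.items():
    print(model, round(max_abs_gap(observed, values), 3))
print(agree_when_rounded(fitted["logit"], fitted["probit"]))
```

A descriptive summary of this kind complements, but does not replace, the likelihood ratio test, because it ignores the number of casualties in each cell.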

Conclusion and Recommendation
In this study, we formulated and estimated three generalized linear models that showed strong statistical significance in predicting road traffic fatalities given some demographic factors in Ghana. First, the estimated logit regression model proved to be robust when fitted with the predictors gender and age group. However, the interaction of gender and age group did not show any significant effect in predicting the probability of road traffic fatality. Second, when compared with a probit regression model via a likelihood ratio test, the logit model gave better estimates. Finally, the logit model was preferred over a fitted linear probability model with the same predictor variables when a likelihood ratio test was conducted to compare the two models. These findings illustrate that the logistic regression model produced consistent probabilities of traffic fatalities that are very close to the actual probability values across age groups and gender. The findings also imply that gender and age group have significant effects in predicting the probability of road traffic fatality. In particular, the probability of fatality in an RTA was higher for males than for females, and the aged (over 65 years) were more likely to die in a road accident than other age categories.
Thus, we recommend that road traffic regulators consider the possibility of death based on victims' gender and age group when formulating road safety interventions and regulations to reduce the growing number of deaths on the road. Road safety education campaigns through the media should be targeted at male users of transport and the aged, particularly on the use of seat belts in vehicles, to reduce the possibility of death in any RTA.

Conflict of interest
The authors declare that there is no conflict of interest.

Data Availability Statement
The dataset used in this study is referenced in the paper. The datasets used and/or analysed during the current study may be made available by the corresponding author upon reasonable request.

Funding
Not applicable