The Importance of Exercise and General Mental Health on Prediction of Property-Damage-Only Accidents among Taxi Drivers in Tehran: A Study Using ANFIS-PSO and Regression Models

The rate of traffic accidents in Iran is high, and the majority of the causes that must be investigated are human factors. The present study examined the effects of exercise and general health, as human factors, on the prediction of crash likelihood using data collected from taxi drivers in Tehran. The data were collected using the General Health Questionnaire and a form containing items on the duration of daily exercise and sociodemographic information. An adaptive neuro-fuzzy inference system combined with particle swarm optimization (ANFIS-PSO) was used to tune the membership function parameters of the fuzzy model applied for this prediction. This system was compared with more conventional methods, namely multiple regression and Poisson regression. To avoid overfitting, the data were divided into 70% for training and 30% for validation. The root-mean-square error (RMSE) was used to compare the goodness of fit of ANFIS-PSO and the regression methods. The findings indicated that the number of minutes of daily exercise and mental health significantly influence property-damage-only (PDO) accidents of taxi drivers in Tehran, Iran. Furthermore, the hybrid model (ANFIS-PSO) not only had a better fit but also produced results that differed from those of the traditional regression models, which may be useful in policymaking aimed at reducing PDO accidents. Based on the results, performing daily exercise for more than 10 minutes would substantially reduce PDO accidents among taxi drivers in Tehran. The findings showed that ANFIS-PSO could be effectively implemented in studies addressing accident frequency. Consequently, policy makers could adopt interventions that encourage taxi drivers to perform daily exercise, which not only improves their wellbeing but also reduces the risk of PDO accidents.


Introduction
The rate of traffic accidents in Iran is much higher than the global average. Accordingly, the country suffers from the extensive consequences of traffic injuries, mortalities, and costs imposed on society. Iranian experts attribute this anomaly to psychological and health problems in Iran [1]. Therefore, further investigation of the probable relationships between these problems and traffic accidents may help in designing an appropriate policy aimed at reducing the rate of traffic accidents in Iran.
There is a growing trend toward investigating the relationships between accident predictions and traffic operating characteristics, such as road environment, traffic, and weather conditions [2]. Moreover, an increasing number of studies investigate the causes of crashes that are due to human factors, the most important elements of traffic accidents [3,4]. As traffic accidents are the leading cause of disability and the second leading cause of mortality in Iran [5], it is vital to identify the factors influencing these accidents. Therefore, the main aim of this study was to focus on the effect of two less investigated factors, daily exercise and general mental health, on the prediction of PDO crashes among taxi drivers in Tehran.
Previously, several studies used crash severity data to model accident severity with many methods, such as regressions, logit models, artificial neural networks, fuzzy models, and time series models [8-11]. A study of the relationship between mental health and driving behaviors of taxi drivers in Kerman, the largest province in Iran, based on the Manchester Driving Behavior Questionnaire (MDBQ) and the General Health Questionnaire (GHQ), revealed a meaningful positive relationship between mental health and driving behavior. This finding indicated that if drivers were in proper mental health, their driving behavior would also be appropriate. In addition, physical activity was shown to reduce hypertension among taxi drivers in Brazil [12].
Furthermore, there are studies focused on determining the epidemiology of urban traffic accidents in Tehran [13] and the crash generation concept in Mashhad, Iran [14]. More recently, a study investigating the social determinants of risky driving on the intercity roads of Tehran province showed that people with driving, chronic disease, poor socioeconomic status, family dispute, nonreligious attitudes, being under medical supervision, secondary education, female gender, and drug usage had more road traffic accidents. The same study also concluded that, among all these significant factors, those related to socioeconomic status, health condition, and family structure had a greater role [15].
Previous studies considered many vehicle, road, environmental, and human factors in the prediction of traffic accidents. To the best of our knowledge, few studies have addressed the influence of exercise and mental health on accident frequencies. Accordingly, the main objective of the present study was to examine the relationships of these human factors with property-damage-only (PDO) accidents among taxi drivers in Tehran. This target population was selected because, like ordinary drivers, they are routinely exposed to crashes on specific routes. As the taxi drivers of Tehran are rarely involved in more severe accidents and most of their accidents are PDO accidents, PDO accidents could be a good indicator of the safety performance of this population.

2.1. Subjects. This descriptive-analytical cross-sectional study was performed during October and November 2017. All the taxi drivers of Tehran were considered as the target population. A total of 294 taxi drivers in six major taxi stations (located in the north, south, east, and west of Tehran, plus two stations in the center) were chosen using the proportional allocation sampling technique. Proportional allocation is a technique for dividing a sample among strata in a stratified sample survey, where the sample survey gathers data from a population in order to estimate population characteristics. Furthermore, the participants at each taxi station were selected through the systematic random sampling method. The study objectives were verbally explained to the participants in a face-to-face interview. They were assured that the gathered information would remain confidential and anonymous. The subjects completed the questionnaires on site and returned them immediately after completion.
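As a rough illustration of proportional allocation, the sketch below splits a fixed sample size across strata in proportion to stratum size. The station populations are hypothetical placeholders (the study does not report them), and the largest-remainder rounding rule is an assumption made here so that the allocations sum exactly to the target.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size.

    Rounding uses the largest-remainder rule (an assumption of this sketch)
    so that the allocations add up exactly to total_sample.
    """
    population = sum(stratum_sizes.values())
    exact = {k: total_sample * v / population for k, v in stratum_sizes.items()}
    alloc = {k: int(share) for k, share in exact.items()}
    leftover = total_sample - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical numbers of drivers per station (NOT taken from the study).
stations = {"north": 900, "south": 1200, "east": 800, "west": 1000,
            "center_1": 1100, "center_2": 1000}
sample = proportional_allocation(stations, 294)
print(sample)
```

Larger stations receive proportionally more of the 294 questionnaires; within each station, a systematic draw would then pick the individual drivers.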

Data Collection Tools.
The data were collected using the General Health Questionnaire and a form containing items on demographic information (e.g., age and educational level) and clinical data, including the history of physical diseases (e.g., brain and cardiovascular diseases, liver and kidney diseases, gastrointestinal disease, and musculoskeletal disorders), disabilities, smoking status, and the number of minutes of daily exercise. Moreover, the number of traffic accidents the drivers had during the past six months was enquired about alongside the General Health Questionnaire (GHQ-12).

General Health Questionnaire. The 12-item General Health Questionnaire (GHQ-12), which is the shortest version of this questionnaire (the original GHQ has 60 items), is a widely used screening instrument developed by Goldberg in the 1970s to assess the current (over the past few weeks) mental health of individuals [16]. This version has been used in many countries and languages [17]. Each item is rated on a four-point scale (less than usual, not more than usual, rather more than usual, or much more than usual) and scored using one of the two most common methods, namely, dichotomous (0-0-1-1) or Likert type (0-1-2-3) [17]. In the present study, the Likert-type scoring method was used, giving a total score in the range of 0-36. The minimum value (0) indicates an individual with no mental health problems, and the maximum value indicates serious symptoms of mental health problems. Table 1 tabulates the items of the GHQ-12.
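The two scoring schemes can be sketched as follows. This is a minimal illustration, assuming each of the 12 responses is already coded 0 to 3 in the order of the four response options; the function name and the example respondent are hypothetical.

```python
def ghq12_scores(responses):
    """Return (likert_total, dichotomous_total) for 12 responses coded 0-3."""
    if len(responses) != 12:
        raise ValueError("GHQ-12 needs exactly 12 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2, or 3")
    likert = sum(responses)                            # 0-1-2-3 scoring, range 0-36
    dichotomous = sum(1 for r in responses if r >= 2)  # 0-0-1-1 scoring, range 0-12
    return likert, dichotomous


# Hypothetical respondent, for illustration only.
example = [0, 1, 1, 2, 3, 0, 1, 2, 1, 0, 1, 1]
likert_total, dichotomous_total = ghq12_scores(example)
print(likert_total, dichotomous_total)
```

Only the Likert total corresponds to the 0-36 score used in this study; the dichotomous total is shown for comparison.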

Regression Method.
The regression method investigates the relationship between a dependent variable and independent variables. In other words, the dependent variable is modeled as a function of the independent variables as follows [18]:

$$Y = f(X, \beta) + \varepsilon$$

where $Y$ is a dependent variable, $X$ are independent variables, $\beta$ are unknown parameters, and $\varepsilon$ is an error term. If the regression function is unknown, the function must first be estimated and a trial-and-error process must be applied to find the best function. The regression function can be linear, exponential, power, logarithmic, polynomial, and so on [18].

Poisson Regression Model.
Poisson regression, or the log-linear model, is a generalized linear form of regression analysis used to predict a dependent variable that consists of count data. Count data are data in which the observations can take only nonnegative integer values. Poisson regression assumes that the response variable $Y$ has a Poisson distribution and that the logarithm of its expected value can be modeled by a linear combination of unknown parameters [18].
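For concreteness, the log-linear assumption can be written out for a model with two predictors. The symbols $x_1$, $x_2$, $\beta_0$, $\beta_1$, and $\beta_2$ are illustrative notation introduced here, not the paper's own.

```latex
% Poisson probability mass function with mean \lambda
P(Y = k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots

% Log link: the log of the expected count is linear in the predictors
\log \lambda = \log \mathrm{E}(Y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2
\quad\Longleftrightarrow\quad
\lambda = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)
```

Under this form, a one-unit increase in $x_1$ multiplies the expected count by $e^{\beta_1}$, which is why fitted Poisson surfaces change exponentially rather than linearly with the predictors.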
The mean accident frequency is $\lambda = E(Y)$, which can be described by the Poisson distribution function. The mean and variance of the Poisson distribution are equal, as follows [18]:

$$E(Y) = \mathrm{Var}(Y) = \lambda$$

Since the mean is equal to the variance, any factor that affects one will also affect the other. Therefore, the negative binomial model is used if the observed variance is greater than the mean (known as overdispersion). Moreover, Poisson and negative binomial models are not applicable if many zero accidents are observed. In this case, zero-altered probability processes, such as the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model, must be used [18].
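A quick descriptive check of these two conditions (variance versus mean, and the share of zero counts) might look like the sketch below. The accident counts are made up for illustration, and the summary is only a screening aid, not a formal test.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize mean, sample variance, their ratio, and the share of zeros."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 in the denominator)
    return {
        "mean": m,
        "variance": v,
        "ratio": v / m,  # values well above 1 hint at overdispersion
        "zero_share": counts.count(0) / len(counts),
    }


# Hypothetical accident counts for 20 drivers (NOT the study data).
counts = [0] * 14 + [1] * 4 + [2] * 2
summary = dispersion_summary(counts)
print(summary)
```

A ratio near 1 is compatible with a plain Poisson model, while a large share of zeros is the situation in which a zero-inflated model is worth comparing against it.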

Fuzzy and Artificial Neuro-Fuzzy Inference System (ANFIS).
The concept of a fuzzy set was introduced in 1964 by Zadeh [19,20], who was working on the problem of computer understanding of natural language, which is not easily translated into the absolute terms of 0 and 1. Fuzzy logic is therefore intended to model logical reasoning with vague or imprecise statements. The most common fuzzy methodology is the Mamdani fuzzy inference method, proposed in 1975 by Ebrahim Mamdani [21]. The purpose of this method was to control a combination of steam engine and boiler by synthesizing a set of linguistic control rules obtained from experienced human operators. Figure 1(a) illustrates the general structure of a typical fuzzy logic system.
In safety and reliability analysis, a membership function $\mu(x)$ is defined by a typical convex function, such as the triangular, trapezoidal, rectangular, or Gaussian type, that defines how each point in the input space is mapped to a membership value between 0 and 1. The shape of the membership function commonly does not affect the final results. The Gaussian type was used in this study because it may be more appropriate for the two variables considered, namely, the number of minutes of exercise and mental health.
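A Gaussian membership function can be sketched in a few lines. The centers and widths below are hypothetical values chosen only to illustrate the mapping to the interval [0, 1]; in ANFIS-PSO these parameters are the quantities being tuned.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership value of x for a fuzzy set with the given center and width."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Hypothetical fuzzy sets for minutes of daily exercise (illustrative parameters).
low_exercise = gaussian_mf(10.0, center=0.0, sigma=10.0)
high_exercise = gaussian_mf(10.0, center=60.0, sigma=20.0)
print(round(low_exercise, 3), round(high_exercise, 3))
```

A driver who exercises 10 minutes a day belongs partly to both sets, with a much higher degree of membership in the "low exercise" set under these assumed parameters.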
ANFIS is an adaptive system based on the Takagi-Sugeno fuzzy inference system [22]. It has the ability to approximate nonlinear functions [7]. Figure 1(b) shows the structure of the Sugeno fuzzy inference system. The rules in the Sugeno system are as follows:

$$R_i:\ \text{IF } x_1 \text{ is } A_{i1} \text{ AND } x_2 \text{ is } A_{i2} \text{ THEN } y = f_i(x_1, x_2), \qquad \hat{y} = \frac{\sum_i w_i f_i}{\sum_i w_i}$$

where $x_1$ and $x_2$ are the input variables, $A_{ij}$ refers to the linguistic variables, and $f_i$ is the consequent part of the ith rule. Furthermore, $w_i$ denotes the weights used for calculating $\hat{y}$, which is the estimate of $y$ [20]. For more detailed reading about the Takagi-Sugeno structure, refer to Ismail et al. [20].
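The weighted-average inference step of a Sugeno system with two inputs can be sketched as follows. This is an illustrative sketch under stated assumptions: Gaussian antecedents, the product as the AND operator, first-order linear consequents, and two made-up rules. It is not the fitted model from this study.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership value of x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_predict(x1, x2, rules):
    """Weighted average of rule consequents (first-order Sugeno inference)."""
    numerator = 0.0
    denominator = 0.0
    for rule in rules:
        # Firing strength: product of the two antecedent memberships (assumed AND).
        w = gaussian_mf(x1, *rule["mf1"]) * gaussian_mf(x2, *rule["mf2"])
        p, q, r = rule["coeffs"]
        f = p * x1 + q * x2 + r  # linear consequent of this rule
        numerator += w * f
        denominator += w
    return numerator / denominator


# Two hypothetical rules: (center, sigma) per input and consequent coefficients.
rules = [
    {"mf1": (0.0, 10.0), "mf2": (30.0, 8.0), "coeffs": (0.0, 0.0, 1.5)},
    {"mf1": (40.0, 20.0), "mf2": (5.0, 8.0), "coeffs": (0.0, 0.0, 0.2)},
]
# x1 = minutes of exercise, x2 = GHQ-12 score (illustrative inputs).
estimate = sugeno_predict(5.0, 25.0, rules)
print(round(estimate, 3))
```

Because the first rule fires far more strongly for this input, the estimate lies close to that rule's consequent value.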

Use of ANFIS-PSO.
A limitation of the fuzzy inference system is its lack of flexibility in determining the fuzzy numbers and fuzzy shapes that can best represent expert experience [23]. The adaptive network-based fuzzy inference system (ANFIS) was introduced to address this; its learning procedure, borrowed from artificial neural networks (ANN), interleaves the optimization of the parameters of the antecedent and conclusion parts [23]. Some researchers have reported a hybrid learning algorithm based on particle swarm optimization (PSO) for training the antecedent part [24].
Particle swarm optimization (PSO) is a search algorithm designed to simulate the social behavior of flocks of birds and schools of fish [25]. It is initialized with a population of random solutions and searches for optima by updating generations; the potential solutions, or particles, move through the problem space by following the current optimum particles [23]. In the present study, hybrid ANFIS-PSO was implemented in MATLAB R2016a following the study by Basser et al. [24], to which the reader should refer for more detailed information regarding ANFIS-PSO. The obtained results were compared with the results of the conventional regression models.
It must be noted that ANFIS-PSO needs an objective function to minimize. This objective function is the root-mean-square error (RMSE), given by the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (5)$$

where $y_i$ is the observed value for the ith observation and $\hat{y}_i$ is the predicted (estimated) output based on the fuzzy model. In addition, $n$ is the number of training data pairs [20].
To overcome the overtraining issue, the subjects were randomly divided into 70% (181) for training and 30% (78) for validation. RMSE was then calculated separately for each group.
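The particle update described above can be illustrated with a minimal global-best PSO. The sketch assumes a common textbook variant (an inertia weight with cognitive and social coefficients) and a simple quadratic test objective. It is not the MATLAB implementation used in the study, and all parameter values are illustrative.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def shifted_sphere(p):
    """Toy objective with its minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best, best_value = pso_minimize(shifted_sphere, [(-10.0, 10.0), (-10.0, 10.0)])
print(best, best_value)
```

In an ANFIS-PSO setting, each particle would instead encode the membership function parameters, and the objective would be the training error of the resulting fuzzy model.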
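The RMSE criterion and the random 70/30 split can be sketched as below. The helper names are hypothetical and the records are placeholder indices; only the group sizes (181 and 78) come from the text.

```python
import math
import random


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)


def train_validation_split(records, train_fraction=0.7, seed=0):
    """Randomly split records into training and validation subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


records = list(range(259))  # placeholder subject indices (181 + 78)
train, validation = train_validation_split(records)
print(len(train), len(validation))
print(rmse([0, 0], [3, 4]))
```

Computing RMSE separately on the two subsets shows whether a model that fits the training data well also predicts the held-out drivers well.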

Descriptive Demographic Findings.
Table 2 tabulates the main descriptive demographic findings of the sample of taxi drivers in Tehran. It should be noted that all participating taxi drivers were male. The mean and standard deviation of exercise duration and GHQ values were 11.66 ± 17.89 and 11.62 ± 6.24, respectively.

Validity and Factor Structure of General Health Questionnaire-12 in Iranian Samples and Taxi Drivers. The validity and factor structure of the GHQ-12 have been widely investigated in different Iranian samples. In one study [26], the validity, reliability, and factor structure of the GHQ-12 were examined among university students in Iran, and it was shown that this questionnaire has a two-factor structure.
The construct validity of this instrument was also examined by calculating the correlations between the subscales of the questionnaire and the total score, which indicated a significant correlation. In that study, Cronbach's alpha coefficient and the Spearman-Brown split-half reliability were estimated at 0.92 and 0.91, respectively. In the present study, the same results were obtained for factor structure and construct validity, with a Cronbach's alpha coefficient of 0.839 and a Spearman-Brown split-half reliability of 0.799. Furthermore, the data contained 183 zero values for the dependent variable (property damage only), slightly more than 70% of the sample size. Therefore, a zero-inflated Poisson regression model was calibrated to investigate whether this model type was necessary. For this purpose, we calculated the Vuong test, which tests the zero-inflated model against the standard Poisson model. The Z value of this test was 0.28, indicating that the zero-inflated Poisson model is not better than the standard Poisson model [27]. Accordingly, only the results of the multiple regression analysis and the standard Poisson regression model are reported in this study.

Findings of Multiple Regression Analysis and Standard Poisson Regression Model. Table 3 summarizes the results of calibrating the multiple and Poisson regression models. A correlation analysis was conducted between the exercise and general health variables to check that these two variables are independent. The Pearson correlation coefficient for these two variables was -0.1, indicating only a weak linear association, so they were treated as independent variables. The variance inflation factor (VIF) was computed to be 1.01, indicating the lack of serious multicollinearity issues in the model. Figure 2 illustrates the residual plots for multiple and Poisson regression. Checking the residual plots is important in testing model adequacy. Residual plots are used to assess the data for nonnormality, nonrandom variation, nonconstant variance, and outliers. In the normal plot of residuals, the residuals followed a straight line, which is indicative of the normal distribution of the errors; in Figures 2(a) and 2(b), the histogram of the residuals is bell-shaped, indicating the absence of skewness and outliers. In the residuals versus fits plot (log of fits for standard Poisson regression), the residuals scatter randomly around zero; hence, there is no evidence of nonconstant variance. The residuals in the residuals versus order plot fluctuate in a random pattern about the center line, and therefore no evidence exists that the error terms are correlated with one another. Therefore, the residual analysis indicates that the model fits the data adequately.
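With only two predictors, the VIF follows directly from their Pearson correlation as 1 / (1 - r^2). The sketch below shows that the reported VIF of 1.01 is consistent with the reported correlation of -0.1. The function names are illustrative, and the two-predictor shortcut is the only case this formula covers.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(r):
    """VIF of either predictor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)


vif = vif_two_predictors(-0.1)  # correlation reported in the text
print(round(vif, 2))
```

A VIF this close to 1 means the standard errors of the two coefficients are essentially unaffected by the correlation between exercise and general health.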

Findings of ANFIS-PSO Model. Figures 3(a) and 3(b) depict the prediction performance of ANFIS-PSO for the training (70%) and testing (30%) data, respectively. In each part, the mean and standard deviation of the residuals, the RMSE (see (5)), and the differences between the targets and the output of ANFIS-PSO were plotted for comparison.

Comparisons between Poisson Regression and ANFIS-PSO Forecasts. Figure 4 depicts the three-dimensional (3D) surface plots used to compare Poisson regression and ANFIS-PSO.

Discussion
The major objective of the present study was to investigate how sociodemographic variables, general health, and the number of minutes of daily exercise relate to the PDO traffic accidents experienced by taxi drivers in Tehran, using an instrument consisting of the GHQ-12 and some sociodemographic items. For this purpose, the study used multiple and Poisson regression analyses and the ANFIS-PSO method and compared their goodness of fit and prediction power regarding traffic safety.
Based on the demographic variables, the majority of taxi drivers (79.9%) were aged ≥ 41 years. The mean age of 48.5 years is consistent with the aging problem among taxi drivers [28,29]. Only 11.2% of the subjects had an academic degree, and 38.6% reported a current or previous prominent known disease. The rate of smoking among the taxi drivers of Tehran (32.8%) is high compared with the 10.6% prevalence of cigarette smoking in Tehran [30].
After the calibration of many models, the results showed that variables such as age, education level, history of current or previous disease, smoking, and physical disabilities were not significant in the prediction of PDO traffic accidents among taxi drivers in Tehran. This finding is contrary to the results of a number of previous studies [15,31]. It might be because taxi drivers spend much of their time in traffic and at the wheel and are accordingly expert in this field. In this regard, socioeconomic variables such as age may be more influential in other traffic accidents involving injury and death. Further investigations are therefore required to fully understand this phenomenon.
Despite the nonsignificance of the socioeconomic variables, the results of this study revealed that the number of minutes of daily exercise and general health played a significant role in the prediction of PDO accidents. The results of multiple regression and Poisson regression demonstrated that, with increasing minutes of daily exercise and decreasing mental health problems, the number of PDO accidents decreases among taxi drivers. This finding is in line with previous studies indicating that driving performance is affected by health-related changes [31]. In addition, physical activity was shown to be a protective factor for hypertension among taxi drivers in Brazil, when considering the deleterious effect of time spent as a taxi driver [12]. It is noteworthy that the overall goodness of fit based on $R^2$ was low for the two calibrated regressions (only 8.51% for multiple regression and 11.07% for Poisson regression). Despite the low fit criterion, if only two variables, namely, exercise and general health, can explain nearly 10% of the variation in PDO accidents, they are considerable, especially when these variables are considered alongside other factors identified in previous studies. However, as depicted in Figure 2, the residuals were not distributed normally; therefore, the basic assumptions of regression were not fulfilled.
The ANFIS-PSO model is not subject to the regression assumptions; moreover, its results were better in the prediction of PDO accidents, as shown by its lower RMSE. In this regard, the RMSE of the ANFIS-PSO model was 0.51, while those of multiple and Poisson regression were 0.56 and 0.88, respectively. This finding is in line with previous research demonstrating that fuzzy logic systems and artificial intelligence techniques produce favorable results in highway safety studies [32]. Furthermore, it is interesting that the output of ANFIS-PSO was relatively different from that of Poisson regression. With regard to Figure 4, Poisson regression suggested a continuous relationship in which the number of PDO accidents increased exponentially with fewer minutes of exercise and higher values of general health problems. In contrast, the results of ANFIS-PSO demonstrated that these two variables exerted a low linear impact, except for less than 10 minutes of exercise and a general health problem score greater than 20.
The results of ANFIS-PSO could have important implications for policy making. For instance, doing daily exercise for more than 10 minutes would substantially decrease the risk of PDO accidents among taxi drivers. The literature contains similar results regarding the positive effects of exercise on medical conditions, such as hypertension [12].
A number of strategies, such as raising awareness of risky behaviors and periodic training, have been suggested for the prevention of taxi accidents in a study performed in China [33]. Another strategy, suggested by the present research, is the provision of policies encouraging taxi drivers to exercise in order to stay healthy and reduce property damage accidents.

Conclusions and Future Directions
The frequency of traffic accidents is a good measure for studying traffic accidents. Many studies have tried to identify the factors influencing traffic accidents using different types of models [34].
The findings showed that ANFIS-PSO could be effectively implemented in accident frequency studies. Policy makers may adopt an intervention to encourage taxi drivers, who spend much of their time in traffic, to perform more daily exercise. Daily exercise would not only improve the wellbeing of this population but also reduce the risk of PDO accidents. In addition, improving mental health by providing training programs for taxi drivers may result in fewer traffic crashes. Taxi drivers are vulnerable to many risks and need more attention.
Given the impact of culture and driving style in Iran and Tehran on many aspects of human factors [1], further studies are recommended that consider cultural approaches in addition to mental and physical health. Moreover, based on the evidence, the number of accidents in Tehran varies across the seasons of the year [11]. It is therefore appropriate to perform more studies addressing this issue in different seasons and to draw conclusions based on the differences in seasonal trends.
The data of this study are based on self-reported accidents; therefore, the findings may be specific to these data. Moreover, physical activity and general health may be influenced by age, educational level, income, and other demographic variables that need further investigation. Finally, this study focused on PDO accidents; more investigations are required to study the other types of traffic accidents, which may yield different findings.
Figure 1: (a) The general structure of a typical fuzzy logic system and (b) Sugeno-type fuzzy inference system [6].
Figure 2: (a) Residual plots for multiple regression and (b) for standard Poisson regression.

Figure 3: (a) General structure of a typical fuzzy logic system and (b) Sugeno-type fuzzy inference [7].
Modelling. Collected data were analyzed using Stata 14 (for zero-inflated Poisson regression), Minitab 17 (for multiple regression analysis and standard Poisson regression), and MATLAB R2016a (for the ANFIS-PSO model). These models were compared based on root-mean-square error (RMSE) values, where the lowest RMSE indicates the best model in terms of prediction.

Table 2
zero-inflated Poisson regression models. After attempts to calibrate several models, it was found that only two variables, namely, the number of minutes of exercise and the general health values, were significant in the models. Therefore, only these two variables remained in the final models.
a The taxi drivers had not experienced injury or fatal accidents during the past few months.