Analysis of Work-Zone Crashes Using the Ordered Probit Model with Factor Analysis in Egypt

Work-zones, due to their nature, are predisposed to hazardous situations. This is a consequence of conducting construction work within, or near, vehicular traffic. The exposure to danger underscores the need for a proper understanding of the occurrence of work-zone crashes, as well as the risk factors associated with them. This paper aims mainly to develop a hybrid approach that combines a factor analysis method and an ordered probit model to carry out a comprehensive analysis of work-zone crashes. The paper presents an analysis of work-zone crash data from 2010 to 2015 for crashes that occurred in Egyptian long-term highway maintenance and rehabilitation projects. Factor analysis is used to identify the main and common factors that influence work-zone crashes and to calculate the weight of every factor. The ordered probit model is developed, based on the factor analysis scores, to examine the contribution of the common factors to the severity of work-zone crashes. The most influential factors that have contributed to work-zone crashes are weather condition, number of lane closures, type of surface construction, road character, day of week, and so forth. In addition, the results indicate that four common factors significantly affect the severity of work-zone crashes in Egypt.


Introduction
For the last few years, Egypt has recorded such a large number of traffic crashes that "death on the road" is frequently seen on signs as a way of describing highway safety in Egypt. Road traffic crashes caused 10,466 fatalities in 2013, which had a great impact on the emerging economy [1]. As highway infrastructure ages, there is a growing need for regular maintenance and rehabilitation. This leads to an increase in the number of work-zones on highways, with the attendant disruption and interruption of traffic flow, resulting in traffic safety problems. Closing highways to traffic while maintenance and rehabilitation works are ongoing is usually unrealistic. Instead, highways are only partially closed, with traffic redirected into alternative lanes and routes. Such scenarios are typical and create work-zones with constricted space for traffic. As a result, the safety of work-zones is of extreme importance in Egypt and the Middle East in general, and the pressure on government and road safety managers to institute measures that can reduce the frequency of road crashes and the severity of injuries is increasing. However, no previous research about work-zone crashes in Egypt has been published. Research on crash severity may therefore reveal factors that influence changes in severity and that may be beneficial in developing traffic controls to minimize the incidence of severe highway crashes.
This paper mainly aims to develop a hybrid approach, which combines factor analysis and the ordered probit model, to carry out a comprehensive analysis of work-zone crashes. In the first stage, the factor analysis method is applied to the work-zone crash data elements to determine the weight of each factor, classify the main factors, and identify the common factors. In the second stage, the severity model is developed based on the common factor scores to determine the relative effect of each common factor. An ordered probit model is appropriate for identifying the characteristics that contribute to crash severity, because crash severity, as a dependent variable, is ordered in nature [2][3][4].
Ordered probit models have been used to predict injury level for crashes at construction zones involving trucks and to identify characteristics that contribute to the severity of accidents involving older drivers, while [2] applied this technique to show the similarities and differences in factors contributing to crash injury severity on roadway sections, signalized intersections, and toll plazas. That study concluded that the model produced the best results and was also worthwhile because of its simplicity, and it therefore recommended using the ordered probit approach to model driver injury severity. Additionally, the ordered probit model was used for crash severity level in [21], and as a result, that study encouraged brief reporting of occurrences, as this made their retention and storage in crash databases easier. Similarly, [4] used the same model to investigate the risk of different injury levels sustained in two-vehicle and single-vehicle crashes, concluding that pickups and sports vehicles were less safe than passenger cars. The study also concluded that males and younger drivers, in newer vehicles at lower speeds, suffered less severe injuries. The same technique was applied by [22] to examine factors influencing the severity of injury sustained by motor-vehicle occupants. According to the authors, rural areas recorded more serious crashes than urban areas, and women were more likely than men to be involved in crashes with serious or fatal injuries. Qi et al. [23] conducted a study on rear-end crashes in work-zones, using the stepwise elimination selection method to calibrate the ordered probit model. A limitation of their methodology is that the severity model lacks vital information, such as various vehicle characteristics, external factors such as light and weather conditions, and the drivers' age and sex. According to [24], surface conditions contributed to the severity of work-zone crashes; for instance, crashes under dry conditions were more severe than those under wet ones. Furthermore, middle-aged drivers were more susceptible to work-zone crashes, according to [25].
Xu [26] applied decision tree analysis, correlation analysis, and cluster analysis as data mining methods to analyze traffic crash data. Sun et al. [27] developed an Analytic Hierarchy Process (AHP) model considering causes, environment, road, and the influence of people on traffic crashes. Cheng [28] analyzed the correlation between traffic crashes and different factors. The study concluded that traffic crashes had a significant correlation with drivers, number of vehicles, and population. Chen et al. [29] used factor analysis to study the major factors contributing to road traffic crashes. The researchers concluded that the major factors in crashes, ranked by their weights, are faulty behavior, driving experience, the condition of the vehicle, the purpose of the vehicle, and so on. Similarly, Yuan et al. [30] used the same technique to determine the weight of the main factors through the factor score matrix.
This study is an attempt to bridge the gap in knowledge of work-zone crashes in Egypt. First, to our knowledge, no previous research about work-zone crashes in Egypt has been published, so this study is among the first to investigate the factors affecting the severity of work-zone crashes on Egyptian highways. Second, the results and conclusions of this paper can provide insightful information for stakeholders in the planning and management of work-zones.

Data Collection
This paper focuses on work-zone crashes recorded during highway maintenance and rehabilitation projects, on both urban and rural highway facilities, for long-term projects (duration greater than one year) with both day and night work. In Egypt, the Ministry of the Interior has a traffic department, with a research unit responsible for managing the database of national road crashes. A subsidiary of the Ministry of Transport, the General Authority for Roads, Bridges and Land Transport (GARBLT), also regularly collects crash data, especially for crashes that occur on federal highways. Crash data for about 345 cases, spanning 2010 through 2015, were used in this paper. Crash variables extracted from the database for each incident included information on the driver, vehicle information, time, characteristics of the road, work-zone information, and environmental conditions. Six tables were used in presenting the data that was obtained. These included the fault drivers table and five other tables.

Methodology
Factor Analysis. Factor analysis [31] is a statistical method that aims to find the most important factors behind an event among a number of plausible causes. It is a collection of methods used to study the underlying constructs that may influence the responses in the observed data. Its main aim is to determine the most important and common factors that influence a set of observed measures. In this study, we apply the principal component method to extract the maximum variance from the set of work-zone crash variables and group them into common factors that serve as indices for the analysis. The general model of factor analysis, according to [32], is given by
$$X = AF + B + \varepsilon$$
Here $X$ is the observable random vector (i.e., the original observed variables); $F$ is the common factor of $X$, $A$ is the coefficient of $F$ (factor loading matrix), $r_{ij}$ is the correlation coefficient, $\varepsilon$ is the error factor, and $B$ is the special factor of $X$ (usually ignored when analyzing). $r_{ij}$ determines the degree or strength of the linear relationship between the variables in row $i$ and column $j$ of the matrix. The higher the correlation is, the stronger that relationship will be.
The process of factor analysis involves normalizing the $X$ matrix so that its mean value is 0 and its variance is 1. We then assume that $F$ and $\varepsilon$ are independent of each other. Here, $X$ has $m$ common factors, and the resulting model is known as the factor model. After that, the correlation coefficient matrix, i.e., $R = (r_{ij})_{p \times p}$, and its latent roots, i.e., $\lambda$, are obtained. The last step is to determine the number of common and relevant factors, add up the common variance, i.e., $h_i^2$, and finally obtain the common factors by rotating the loading matrix.
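The extraction steps described above (standardization, correlation matrix, latent roots, common variance) can be sketched as follows. This is a minimal illustration on synthetic data, not the SPSS procedure used in the paper; the function name `pc_factor_extraction` and the demo data are assumptions, and rotation is left out.

```python
import numpy as np

def pc_factor_extraction(data, n_factors):
    """Principal-component extraction of unrotated factor loadings."""
    # Standardize each column to mean 0 and variance 1.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Correlation matrix R = (r_ij) of the standardized variables.
    r = np.corrcoef(z, rowvar=False)
    # Latent roots (eigenvalues) and eigenvectors, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: leading eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    # Common variance h_i^2: row sums of squared loadings.
    communalities = (loadings ** 2).sum(axis=1)
    return eigvals, loadings, communalities

# Synthetic demo data (hypothetical, not the crash database).
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 4))
demo[:, 1] += demo[:, 0]  # induce correlation between two columns
eigvals, loadings, communalities = pc_factor_extraction(demo, n_factors=2)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by that number gives the share of variance attributed to the corresponding factor.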
Ordered Probit Model. The ordered probit model is the other statistical modeling methodology used in this paper. This model is built on the idea of a latent factor underlying the injury risk propensity from road crashes, which is used to determine the observed ordinal injury outcomes of crashes [33]. The general specification of the model is given by [34]
$$y_i^* = \beta' x_i + \varepsilon_i$$
Here $y_i^*$ is the latent risk propensity for crash victim $i$, $\beta$ is the estimated vector of parameters, and $x_i$ is the vector of observed nonrandom explanatory variables, which measures the various attributes of crash victim $i$; $\varepsilon_i$ is the random error term, which follows a standard normal distribution. Hence, the mean is 0 and the variance is 1 for $\varepsilon_i$.
The standard ordinary least squares regression technique cannot be applied here, as the dependent variable of this model, $y_i^*$, is unobserved. It is also reasonable to assume that a high risk of severity is related to the observed injury outcome (e.g., a fatality), denoted by $y_i$. This relationship has been described in the following equations [35,36]:
$$y_i = \begin{cases} 1 & \text{if } y_i^* \le \mu_1 \\ k & \text{if } \mu_{k-1} < y_i^* \le \mu_k \\ K & \text{if } y_i^* > \mu_{K-1} \end{cases}$$
Here $\mu = \{\mu_1, \ldots, \mu_k, \ldots, \mu_{K-1}\}$ represents the threshold values for crashes that define $y_i$, corresponding to integer ordering, and $K$ is the highest-ordered severity level. The probability of crash victim $i$ facing a level $k$ severity is equal to the probability that $y_i^*$ (i.e., the latent risk propensity) assumes a value between two fixed thresholds. Given the value of $x_i$, the probability that the severity faced by crash victim $i$ belongs to each severity level is given by
$$P(y_i = 1) = \Phi(\mu_1 - \beta' x_i), \quad P(y_i = k) = \Phi(\mu_k - \beta' x_i) - \Phi(\mu_{k-1} - \beta' x_i), \quad P(y_i = K) = 1 - \Phi(\mu_{K-1} - \beta' x_i)$$
Here $\Phi$ is the cumulative normal distribution function. For estimation, it can be written as
$$P(y_i = k) = \Phi(\mu_{k+1} - \beta' x_i) - \Phi(\mu_k - \beta' x_i)$$
where, in this form, $\mu_k$ depicts the lower threshold and $\mu_{k+1}$ the upper threshold for severity level $k$ (that is, the thresholds are indexed from the lower bound of each level). For all the probabilities to be positive, the threshold values must satisfy the restriction $\mu_1 < \cdots < \mu_k < \cdots < \mu_{K-1}$. Understanding the effect of individual estimated parameters involves the computation of these probabilities. SPSS was used to develop the crash model. Crash severity was considered at three levels in the severity models in this paper: (i) no injury, (ii) injury, and (iii) fatal injury.
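As a small illustration of how the severity probabilities follow from the thresholds, the sketch below evaluates them for the three severity levels used in this paper. The linear predictor value and the two thresholds are hypothetical placeholders, not estimates from Table 8, and the function names are assumptions.

```python
import math

def std_normal_cdf(z):
    """Standard normal cumulative distribution function (Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def severity_probabilities(xb, thresholds):
    """Return P(y = k) for k = 1..K, given the linear predictor xb = beta'x
    and K-1 strictly increasing thresholds."""
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Pad with -inf and +inf so every level lies between two cut points.
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [std_normal_cdf(c - xb) for c in cuts]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]

# Hypothetical inputs: order is (no injury, injury, fatal injury).
probs = severity_probabilities(xb=0.3, thresholds=[-0.5, 1.0])
```

The three probabilities sum to one, and increasing `xb` shifts probability mass from the lowest level toward the highest, which is why a positive coefficient is read as raising the likelihood of fatal injury.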

Results and Discussion
Factor Analysis. Factor analysis is a statistical approach used to reduce data dimensionality. In this study, factor analysis was conducted on the high-dimensional work-zone data to simplify the process of understanding the data. The factor analysis model was specified using the SPSS statistics program. To verify the suitability of the data for the analysis, a fitness test was carried out using the KMO (Kaiser-Meyer-Olkin) measure and the Bartlett test of sphericity. The KMO test measures the sampling adequacy for each variable in the model and for the complete model. Its value ranges between 0 and 1. A KMO value approaching 1 implies a high correlation between variables and indicates that the data is suitable for factor analysis. In this paper, the KMO value of 0.67 suggests that the data is suited for factor analysis [37,38]. For the Bartlett test of sphericity, the Chi-square value of 1786.156, with 78 degrees of freedom, was found to be significant (p < 0.001). This result demonstrates a good correlation between variables and confirms that the data is suitable for factor analysis. To ensure reliable results, we apply three tests, i.e., the eigenvalue, scree plot, and parallel analysis tests, to identify the number of factors that reasonably explains the greater percentage of the data variability.
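For readers who want to reproduce the two suitability checks outside SPSS, the following sketch computes the Bartlett statistic and the overall KMO measure from a correlation matrix using their standard textbook definitions. The function names and the small demo matrix are assumptions. For 13 variables the degrees of freedom are 13 × 12 / 2 = 78, which matches the value reported above.

```python
import numpy as np

def bartlett_sphericity(r, n_obs):
    """Bartlett's test of sphericity: chi-square statistic and degrees of freedom."""
    p = r.shape[0]
    _, logdet = np.linalg.slogdet(r)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo_overall(r):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(r)
    # Partial correlations are obtained from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(r.shape[0], dtype=bool)
    sum_r2 = (r[off_diag] ** 2).sum()
    sum_p2 = (partial[off_diag] ** 2).sum()
    return sum_r2 / (sum_r2 + sum_p2)

# Hypothetical 3x3 correlation matrix with all correlations equal to 0.5.
demo_r = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
demo_chi2, demo_dof = bartlett_sphericity(demo_r, n_obs=345)
demo_kmo = kmo_overall(demo_r)
```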
The eigenvalue (EV) represents the proportion of variance accounted for by each factor. In Table 2, from the initial eigenvalues, the percentage of variance explained by the first principal factor was 24.466%. The number of common factors can also be determined by the parallel analysis test and the scree plot. Figure 1 shows the scree plot and parallel analysis test for the 13 factors. From the fifth factor onward, the eigenvalue of each factor declined more slowly. Table 3 shows the results of the parallel analysis test for each factor. According to the results of these test methods, we select the first five factors as common factors. One of the motives for conducting factor analysis in this study is to reduce the large number of variables that describe a complex phenomenon, such as work-zone crash severity, to a limited number of interpretable latent variables known as factors. In essence, this section seeks to identify a smaller number of interpretable factors that explain the maximum amount of variability in the data.
The principal component method was used to extract the factor loading matrix. Varimax orthogonal rotation was then applied to separate the common factors. After rotation, the loadings in the factor loading matrix were more clearly differentiated from each other. High loadings were concentrated close to the matrix diagonal, which suggests that each common factor is mainly associated with a few variables. Each common factor is named according to the shared idea of the influential variables that load on it. In this way, we can better comprehend the functional meaning of each common factor. As shown in Table 4, the four variables with high loadings on factor one (number of lane closures = .901, type of surface construction = .830, road character = .830, and road class = .813) relate to work-zone information, which is a feasible and functional countermeasure for reducing the severity of crashes occurring in work-zones. Similarly, the three variables with high loadings on factor two (day of week = .887, month = .852, and time = .843) reveal information about visibility and traffic condition. Factor three reveals information about the crash characteristics (crash type = .869, number of vehicles = -.603, and vehicle type = .593). In the case of factor four, the variables with high loadings are gender and age, both related to driver skills and the ability to operate safely under different work-zone configurations. Finally, there is the weather condition factor. The names assigned to the five common factors are shown in Table 5.
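The parallel analysis test mentioned above compares the observed eigenvalues with those obtained from random data of the same size. The sketch below is one common variant (normal random data, 95th percentile reference); the paper does not state which variant was used, so the function name, defaults, and synthetic demo data are all assumptions.

```python
import numpy as np

def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Count leading factors whose eigenvalue exceeds the random-data reference."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.normal(size=(n, p))
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[i] = np.sort(eig)[::-1]
    reference = np.percentile(simulated, percentile, axis=0)
    n_keep = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break  # stop at the first factor that random data can match
        n_keep += 1
    return observed, reference, n_keep

# Synthetic demo: six variables driven by two latent factors (hypothetical data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
demo = latent[:, [0, 0, 0, 1, 1, 1]] + 0.5 * rng.normal(size=(300, 6))
observed, reference, n_keep = parallel_analysis(demo)
```

Factors are retained only while the observed eigenvalue stays above the reference curve, which is the same comparison a scree plot with an overlaid parallel analysis line presents graphically.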
The factor score matrix was extracted using the regression method. Each influential factor has a score on every common factor. The weights of the common factors were calculated by normalizing the variance percentages of the five common factors. The normalized weight Ai is estimated from the rotated sums of squared loadings. The absolute value Ak (k is the index of the influential factor) of the score of each influential factor on its common factor was extracted, as shown in Table 6, and then normalized. The score Ak after normalization represents the weight of the influential factor within the relevant common factor. The weight Wk of each influential factor of work-zone crashes was then deduced. The weights of the main influential factors are shown in Table 7. Figure 2 shows the weights Wk of the 13 influential factors.
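The weighting steps above can be sketched in a few lines. The text does not state how Wk is obtained from Ai and Ak, so this sketch assumes the product Wk = Ai × Ak, with Ak normalized within each common factor. The function name, the variable-to-factor assignment, and all numbers are hypothetical.

```python
def influential_factor_weights(variance_pct, score_matrix, assignment):
    """Weights W_k under the assumption W_k = A_i * A_k.

    variance_pct: rotated variance percentage of each common factor
    score_matrix: factor score coefficients, rows = variables, columns = common factors
    assignment:   index of the common factor each variable belongs to
    """
    total = sum(variance_pct)
    a_i = [v / total for v in variance_pct]  # normalized common-factor weights
    weights = []
    for k, factor in enumerate(assignment):
        # Normalize |score| over the variables assigned to the same common factor.
        members = [j for j, f in enumerate(assignment) if f == factor]
        denom = sum(abs(score_matrix[j][factor]) for j in members)
        a_k = abs(score_matrix[k][factor]) / denom
        weights.append(a_i[factor] * a_k)
    return weights

# Hypothetical example: four variables, three common factors.
demo_weights = influential_factor_weights(
    variance_pct=[30.0, 20.0, 10.0],
    score_matrix=[[0.5, 0.0, 0.0], [0.3, 0.1, 0.0], [0.0, -0.4, 0.1], [0.1, 0.0, 0.9]],
    assignment=[0, 0, 1, 2],
)
```

Under this assumption the weights sum to one, and a variable that is the only member of its common factor inherits that factor's entire weight.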
The results showed that weather condition is the most important influential factor, explaining a high amount of the variability in the original 13 variables, followed by number of lane closures, type of surface construction, road character, day of week, and so forth. The first five influential factors had a noticeable effect on work-zone crashes because their individual weights are significant (i.e., ≥ 8%). The results in this section are consistent with findings in the literature. The impact of weather condition on work-zone crash severity is consistent with the research in [39]. Regarding traffic safety at freeway work-zones, Ozturk et al. [40] and WuBiao et al. [41] concluded that the number of lane closures has a significant relationship with safety. Road character and day of week were found to be significantly associated with work-zone severity [9,10,37,39]; these conclusions are consistent with the findings in this paper.
Estimations of Ordered Probit Model. The ordered probit model was specified using SPSS. The analysis consisted of 345 observations of work-zone crashes on 13 factors. The factor analysis identified five factors, which were used as inputs for the ordered probit model. The factors were tested for statistical significance (p value < 0.05) to find the factors that significantly influence the injury severity of crashes at highway construction work-zones in Egypt. The results of the ordered probit model are presented in Table 8. Four factors were identified as significant factors associated with crash injury severity in work-zones.
As estimated in the ordered probit model, the crash characteristics factor has a better safety performance in terms of reducing the probability of the average injury risk, as compared with the advanced information factor, driver skills factor, and weather condition factor. In other words, crashes that occurred under the influence of the crash characteristics factor are more likely to result in a nonfatal injury than crashes under the other factors. Based on the result, the visibility and traffic factor does not have a significant impact on the injury severity of work-zone traffic crashes. The coefficient of the advanced work-zone information factor is positive, which indicates that inadequate advanced information about construction work-zones on Egyptian highways tends to increase the probability of fatal injury crashes in work-zones. In this regard, [42] demonstrated that road traffic crashes occur for one of three key reasons. The first reason, related to information flow, is a perceptual error. This occurs, for instance, when work-zone conditions are not perceived early enough to allow for sufficient driver perception-reaction time.
An increase in the driver skills factor will decrease the probability of fatal injury in work-zone crashes. One possible reason may be the lack of sufficient driving skills for safe maneuvering under unfavourable traffic and road conditions. Several studies have investigated the factors relating to driver skills that are influential in predicting fatal injury crashes. A study by [43] found that young drivers are overrepresented in road crashes and fatalities and that one approach to improving their safety is the provision of advanced training. Beanland et al. [44] reviewed the literature on the efficacy of advanced driver training with the aim of assessing its effectiveness as a means of reducing young driver crash involvement. Their report indicates that various forms of prelicense and postlicense training have been found to be effective for skill acquisition and enhanced driving performance. The results also revealed that crashes that occurred under the influence of the weather condition factor have a higher probability of being fatal. This finding is consistent with previous studies [15,39] and in contrast with the previous study [13].
The probability of being involved in a no injury or injury crash rather than a fatal crash will increase significantly if the crash occurred under the influence of the crash characteristics factor. That may be explained by the fact that crashes involving heavy vehicles often result in multiple-vehicle crashes, as opposed to crashes without heavy vehicles, because heavy vehicles have reduced braking abilities. This is explained by the excessive weight carried on trucks in Egypt; more than 96% of Egypt's goods are transported on trucks [45].
Hence, the influence of the factors on the risk of the crash can be ranked by comparing the size of the estimated coefficients of the factors. In that regard, it can be concluded that the advanced information factor has the greatest impact on injury level for a crash occurring at a work-zone (with $\beta$ = .161), whereas the weather condition factor appears to have a relatively low impact ($\beta$ = .132). In terms of magnitude, for example, the estimated effect for crashes that occurred under the influence of the advanced information factor is 1.22 times that for crashes that occurred under the influence of the weather condition factor, assuming all other factors remain equal.
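The factor of 1.22 quoted above is the ratio of the two estimated coefficients. As a worked check, assuming the two values .161 and .132 are the ordered probit coefficients of the respective factors:

```latex
\frac{\beta_{\text{advanced information}}}{\beta_{\text{weather condition}}}
  = \frac{0.161}{0.132} \approx 1.22
```

This is a ratio of coefficients on the latent propensity scale, so it compares the relative strength of the two factors rather than the ratio of crash probabilities.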

Conclusion and Recommendations
In this paper, a hybrid approach, which combines an ordered probit model and the factor analysis method, was developed to carry out a comprehensive analysis of the injury severity of crashes in highway construction work-zones in Egypt. The results of the factor analysis revealed that the main factors that are influential in determining crash severity were weather condition, number of lane closures, type of surface construction, road character, day of the week, and so forth. The advanced work-zone information factor, visibility and traffic factor, crash characteristic factor, driver skills factor, and weather condition factor are the common factors. The percentage of variance explained by the five factors is 73.278%. We applied the results of the factor analysis method to calibrate the ordered probit model and examine the influence of these factors in predicting the injury severity of work-zone crashes. The model estimation results showed that four factors (advanced work-zone information factor, crash characteristic factor, driver skills factor, and weather condition factor) are significant risk factors associated with work-zone crashes. In addition, the results showed that the weather condition factor had a significant influence on severe work-zone crashes in Egypt.
Based on the results obtained in this study, four factors were found to have a strong influence on severe injury work-zone crashes. These four factors explain 55.87% of the variance in work-zone risk. Thus, this paper offers the following suggestions that can help to reduce the frequency and severity of work-zone crashes, for improved traffic safety in highway construction work-zones in Egypt.
Driver skills training: programmes designed to offer regular skills training to prelicensed and postlicensed drivers are required to reduce the incidence of bad driving behavior, for improved road safety.Besides, existing countermeasures that work well in other countries can be adapted to Egypt, such as graduated driving licensing programmes [46][47][48] and additional precincts to minimize exposure of novice drivers to work-zone crashes [49,50].
Provision of advanced information: the fact that most of the crashes that occurred were associated with human errors implies that advanced knowledge of road and traffic conditions will help drivers to prepare and take adequate action to avoid crashes. In this regard, we recommend the development of effective education programmes for the general public, who also have to travel through work-zones. In addition, ITS technologies such as Dynamic Message Signs (DMS), placed at a reasonable distance ahead of the work-zone, may be an effective means of compensating for adverse design, human factors, and roadway conditions.
This paper has some limitations, which may affect the results and interpretations; some information (seat belt use, alcohol, etc.) was not taken into account, due to the lack of such information in the database. It is recommended that all of this information be collected in the work-zone crash database and used for model calibration in the future.
The other five data tables are the environmental table, work-zone information table, road character table, vehicle and crash characteristics table, and crash severity table. Table 1 illustrates the frequency distribution and descriptive statistics of the influential factors.

Table 1: Characteristics and descriptive statistics of factors.

Table 3: Parallel analysis results.

Table 5: Name of the common factors.

Table 7: The weight of each influential factor.

Table 8: Ordered probit model estimation results.