Predicting Free Flow Speed and Crash Risk of Bicycle Traffic Flow Using Artificial Neural Network Models

Free flow speed is a fundamental measure of traffic performance and has been found to affect the severity of crash risk. However, previous studies lack analysis and modelling of the factors that affect bicycles' free flow speed. The main focus of this study is to develop multilayer back propagation artificial neural network (BPANN) models for predicting free flow speed and crash risk on separated bicycle paths. Four models considering different combinations of input variables (e.g., path width, traffic condition, bicycle type, and cyclists' characteristics) were developed. A total of 459 field data samples were collected from eleven bicycle paths in Hangzhou, China; 70% of the samples were used for training, 15% for validation, and 15% for testing. The results show that including bicycle types and characteristics of cyclists as input variables effectively improves the accuracy of the prediction models. Moreover, the bicycle type parameters have a more significant effect on predicting the free flow speed of bicycles than the parameters describing cyclists' characteristics. The findings could contribute to the evaluation, planning, and management of bicycle safety.


Introduction
Traffic safety and the crash risk of both motorized vehicles and bicycles are high-priority issues for traffic engineers and researchers [1][2][3][4]. Recently, with the rapid growth of bicycles (including classic bicycles and electric bicycles) in developing countries such as Vietnam, Malaysia, Indonesia, and China, bicycle traffic flow has faced many efficiency and safety problems. Although cycling offers significant environmental, climate, congestion, and public health benefits, bicycle crashes remain a serious issue [5]. According to statistical data from the Ministry of Public Security in China [6], the shares of cyclist deaths and injuries among all travel modes have been increasing, reaching around 15% and 17%, respectively. In 2012, nearly 9,000 people died in bicycle traffic crashes in China. Therefore, improving bicycle safety is very important and urgent for both traffic engineers and researchers.
Speed is a fundamental measure of the traffic performance of a highway system. It is widely used both to describe traffic flow conditions and as an input for determining travel time, delay, and level of service [7]. Speed is also an important factor in road safety. Many studies have found that speed not only affects the severity of a crash but is also related to the risk of being involved in one [8]. For motorized vehicles, crash risk is strongly related to speed under free flow conditions [9], and similar conclusions have been reached for bicycle traffic flow [10]. Therefore, modelling and analysing the factors that affect bicycle free flow speed and crash risk is useful and can provide a basis for improving bicycle traffic safety.
Previous studies on bicycle speed focus on determining bicycle free flow speed and speed distributions. Liu et al. [11] reported that the mean observed bicycle free flow speed was approximately 14 kph. Wei et al. [12] reported that the peak-hour free flow speeds of bicycles on paths with and without separating barriers were 18.2 kph and 13.9 kph, respectively. According to Allen et al. [13], the bicycle free flow speed appears to fall between 10 kph and 28 kph, with most observations between 12 kph and 20 kph. Cherry [14] found that free flow speeds in Shanghai were 18.2 kph and 13.0 kph for electric bicycles and classic bicycles, respectively; free flow speeds in Kunming were similar, at 17.9 kph for electric bicycles and 12.8 kph for classic bicycles. Lin et al. [15] found that the free flow speeds of electric bicycles and classic bicycles in Kunming were 21.86 kph and 14.81 kph, respectively. Similar results (21.86 kph for electric bicycles and 14.81 kph for classic bicycles) have also been found in Hangzhou by Jin et al. [16]. In terms of bicycle speed distribution, Dey et al. [17] proposed a speed distribution curve model under mixed traffic conditions, including both fast-moving vehicles (e.g., cars/jeeps, trucks/buses, two-wheelers, and three-wheelers) and slow-moving vehicles (e.g., bicycles and tractors). Lin et al. [15] used the lognormal distribution to fit heterogeneous bicycle speed data. Wang et al. [18] analysed the impact of various factors on the speed of heterogeneous bicycle flow and used the normal distribution to fit the bicycle speed samples. Most studies of free flow speed focus on modelling its relationship with factors such as geometric features, traffic characteristics, traffic control, environmental features, weather conditions, and drivers' experience and characteristics [19][20][21][22][23]. However, most existing models are applicable only to predicting the speed of cars [23]. The factors affecting the free flow speed of motorized vehicles differ significantly from those affecting bicycle traffic. To the best of our knowledge, little research has focused on modelling the factors that affect bicycles' free flow speed. The authors believe that this research will be helpful in evaluating and improving the safety of bicycle traffic flow, particularly at locations with highly heterogeneous bicycle flow.
The contribution of this paper is the development of artificial neural network (ANN) models that predict the free flow speed of bicycle traffic while considering several impact factors. Four models, namely Model 1, Model 2, Model 3, and Model 4, were developed considering different categories of contributing factors, and their characteristics were analysed and compared. The developed models are expected to be useful for predicting bicycles' free flow speed or crash risk under different cycleway features, traffic conditions, bicycle types, and/or characteristics of cyclists.

Data Collection
2.1. Model Parameters. Selecting model parameters is a critical task in modelling the relationship between bicycle free flow speed and its contributing factors. Based on the literature review and analysis of bicycle traffic flow [16], the input parameters of the proposed models can be divided into four groups: cycleway features (e.g., cycleway width, pavement condition, and geometric features), traffic conditions (e.g., flow, speed, and density), bicycle types (e.g., electric bicycles), and characteristics of cyclists (e.g., age, gender, and alcohol consumption). Because the cycleways at the survey sites are in good condition and separated from motorized vehicles by barriers, pavement conditions and geometric features have little effect on cyclists. Therefore, only cycleway width (CW) was considered in the proposed models. The traffic conditions category includes only bicycle flow per hour per meter (BF). Bicycles in China typically fall into three categories: classic bicycle (CE), bicycle-style electric bicycle (BSEB), and scooter-style electric bicycle (SSEB). The bicycle type parameters therefore include the percentage of BSEBs (PBS) and the percentage of SSEBs (PSS); the percentage of classic bicycles follows from these two, since the three shares sum to 100%. Considering the difficulty of bicycle data collection, four characteristic parameters of cyclists were selected in this paper: percentage of male cyclists (PMC), percentage of young cyclists (PYC), percentage of middle-aged cyclists (PMAC), and percentage of loaded cyclists (PLC). The selected input parameters of the models are listed in Table 1.
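All eight inputs are aggregate statistics over an observation sample. The Python below is a rough sketch of how they could be derived from manually coded video records, using the age groups defined in the data survey. The per-cyclist record `Cyclist` and the helper `input_parameters` are hypothetical. Expressing shares in percent and computing BF from the observation duration and cycleway width are also assumptions, not details given in the paper.

```python
from dataclasses import dataclass


@dataclass
class Cyclist:
    """Hypothetical coding of one observed cyclist (not the authors' format)."""
    bike_type: str  # "CE", "BSEB" or "SSEB"
    male: bool
    age_group: str  # "young" (<40), "middle" (40-60) or "elderly" (>60)
    loaded: bool    # carrying an object or a person


def input_parameters(cyclists, width_m, duration_h):
    """Eight model inputs for one observation period (hypothetical helper)."""
    n = len(cyclists)

    def pct(condition):
        # share of observed cyclists satisfying `condition`, in percent
        return 100.0 * sum(1 for c in cyclists if condition(c)) / n

    return {
        "CW": width_m,
        "BF": n / duration_h / width_m,  # bicycles per hour per meter of width
        "PBS": pct(lambda c: c.bike_type == "BSEB"),
        "PSS": pct(lambda c: c.bike_type == "SSEB"),
        "PMC": pct(lambda c: c.male),
        "PYC": pct(lambda c: c.age_group == "young"),
        "PMAC": pct(lambda c: c.age_group == "middle"),
        "PLC": pct(lambda c: c.loaded),
    }
```

The elderly share and the classic-bicycle share are deliberately not returned, because each follows from the other percentages in its group.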

Data Survey.
Field bicycle data used in this study were collected from eleven cycleways in Hangzhou, China. The cycleway widths range from 2.27 to 4.60 m. All of the survey sites are straight, have low gradients, are located at least 100 m away from intersections, and are separated from motorized vehicle lanes. Video cameras were set up at the roadside of each cycleway to record the operation of bicycle traffic. A video surveillance application recorded the movement of bicycles, and flow and speed were calculated from it automatically. The other parameters (e.g., bicycle type and the age and gender of cyclists) were recorded and coded manually. In this paper, bicycle type consists of three categories: CE, BSEB, and SSEB. Cyclists' genders were easily distinguished and recorded by the investigators. By age, cyclists were divided into three groups: young (under 40), middle-aged (between 40 and 60), and elderly (over 60). A loaded cyclist is a cyclist who is carrying something (an object or a person) on his/her bicycle. The descriptive statistics of the model parameters obtained from the collected bicycle data are given in Table 1.

2.3. Estimation of Free Flow Speed for Bicycle Traffic. The free flow speed of bicycle flow is the speed of bicycles under low volumes and low densities. It is the most important parameter for cycleway capacity estimation, level of service (LOS), and speed limit setting. Because it is difficult to determine which traffic conditions have low volumes and densities, the 85th percentile speed is usually used as the free flow speed [24]. The 85th percentile speed of bicycles is the speed below which 85 percent of cyclists travel, and it is the statistic most frequently used for speed limit design.
The TRB special report also shows that the 85th percentile speed is an important descriptive statistic in evaluating road safety [25].Therefore, in this study, we use the 85th percentile speed of bicycle flow as the free flow speed and the crash risk indicator for the evaluation of bicycle safety.
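As a minimal sketch, the 85th percentile speed of a sample can be computed by linear interpolation between order statistics, which is the same rule as NumPy's default `numpy.percentile`. The paper does not say which percentile definition was used, so this interpolation rule is an assumption. The function name `percentile_speed` is illustrative.

```python
import math


def percentile_speed(speeds_kph, p=85.0):
    """p-th percentile of observed speeds, interpolating linearly between
    the two nearest order statistics (numpy.percentile's default rule)."""
    if not speeds_kph:
        raise ValueError("need at least one speed observation")
    s = sorted(speeds_kph)
    pos = (len(s) - 1) * p / 100.0   # fractional rank in the sorted sample
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


# Five spot speeds: 85th percentile = 16 + 0.4 * (18 - 16), about 16.8 kph
print(percentile_speed([14.0, 10.0, 18.0, 12.0, 16.0]))
```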

Artificial Neural Network Models
Artificial neural networks are a family of statistical learning models inspired by biological neural networks and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown [26].ANN models are widely used in modelling free flow speed of motorized vehicles [27][28][29][30][31].

Description of Models. An ANN model can include multiple input variables to predict multiple output variables.
In this study, four ANN models were developed, each including or excluding different input variables such as bicycle types and characteristics of cyclists. Different input variables were included in the modelling of bicycles' free flow speed and crash risk to assess whether the choice of input variables affects the performance of the developed ANN models. Table 2 lists the four categories of input parameters used in each model. Cycleway features and traffic flow parameters have been shown to have an important effect on free flow speed; hence, these two categories are included in all four models. Model 2 and Model 3 additionally include the input parameters of bicycle types and characteristics of cyclists, respectively, and Model 4 includes both. The marks in Table 2 indicate which input parameters are included in the modelling process as input variables.
The four developed ANN models provide flexibility in selecting suitable input variables for predicting the free flow speed of bicycle flow.
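The four input sets can be written down compactly. The sketch below encodes the configurations implied by the text: all models use CW and BF, Model 2 adds PBS and PSS, Model 3 adds PMC, PYC, PMAC, and PLC, and Model 4 adds both groups. It then derives the n-10-1 layer sizes. The names `MODEL_INPUTS`, `select_inputs`, and `architecture`, and the order of the variables, are hypothetical.

```python
BASE = ["CW", "BF"]                        # cycleway width, bicycle flow
BICYCLE_TYPES = ["PBS", "PSS"]             # shares of BSEBs and SSEBs
CYCLISTS = ["PMC", "PYC", "PMAC", "PLC"]   # cyclists' characteristics

MODEL_INPUTS = {
    1: BASE,
    2: BASE + BICYCLE_TYPES,
    3: BASE + CYCLISTS,
    4: BASE + BICYCLE_TYPES + CYCLISTS,
}


def select_inputs(record, model):
    """Input vector for `model` from a dict holding all eight parameters."""
    return [record[name] for name in MODEL_INPUTS[model]]


def architecture(model, hidden=10, outputs=1):
    """Layer sizes as an 'n-l-m' string."""
    return f"{len(MODEL_INPUTS[model])}-{hidden}-{outputs}"


print([architecture(k) for k in MODEL_INPUTS])
# ['2-10-1', '4-10-1', '6-10-1', '8-10-1']
```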

Network Architectures.
A back propagation ANN (BPANN) model, shown in Figure 1, was adopted for modelling bicycle free flow speed. The BPANN is one of the best-known ANN models and has been applied in many areas [26]. The backpropagation algorithm was developed as a way to train multilayer neural networks so that they learn internal representations suitable for approximating an arbitrary mapping from inputs to outputs.
The three-layer BPANN architecture used in this study is shown in Figure 1. A multilayer BPANN is a layered parallel processing system consisting of an input layer, a hidden layer, and an output layer [32]. According to Figure 1, $i$, $h$, and $o$ are the subscripts for the input, hidden, and output layers, respectively. The numbers of input parameters, output parameters, and hidden nodes are $n$, $m$, and $l$, respectively. The numbers of nodes in the input and output layers ($n$ and $m$) correspond to the numbers of input and output variables. The number of nodes in the hidden layer ($l$) is chosen by the network designer based on the numbers of input and output variables. The weight factors for the hidden layer and the output layer are $w_{ih}$ and $w_{ho}$, respectively. The values of $n$ and $m$ are problem dependent. In this study, $n$ is 2, 4, 6, and 8 for Models 1, 2, 3, and 4, respectively. The output parameter is the free flow speed, so $m$ is one.
The number of nodes in the hidden layer has a significant effect on the performance of BPANN models. According to previous research, the number of hidden nodes should satisfy an empirical condition that depends on the numbers of input and output nodes and on a constant that is an integer between 0 and 10. The four models have different numbers of input variables, so the same number of hidden nodes is used in all of them to keep their comparison consistent. Therefore, the number of nodes in the hidden layer was set to 10 for all models.
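One commonly cited empirical rule of this kind is $l = \sqrt{n + m} + a$ with integer $a$. It is used here only for illustration and is not necessarily the condition applied in the paper. Under that assumption, a quick check shows that $l = 10$ is admissible for all four models ($n$ = 2, 4, 6, 8; $m$ = 1), and all four lines print `True`. The helper name `hidden_node_range` is hypothetical.

```python
import math


def hidden_node_range(n_inputs, n_outputs=1, a_min=0, a_max=10):
    """Bounds on l under the ASSUMED rule l = sqrt(n + m) + a, a in [a_min, a_max]."""
    base = math.sqrt(n_inputs + n_outputs)
    return base + a_min, base + a_max


for n in (2, 4, 6, 8):  # input counts of Models 1-4
    lo, hi = hidden_node_range(n)
    print(f"n={n}: {lo:.2f} <= l <= {hi:.2f}, l = 10 admissible: {lo <= 10 <= hi}")
```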

BPANN Algorithm Process.
The backpropagation learning algorithm for ANN can be divided into two phases: propagation and weight update.The detailed algorithm processes are listed as follows.

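Calculating the Inputs and Outputs in Hidden and Output Layers.
The outputs of the hidden and output neurons are calculated as
$$y_h = f\left(\sum_{i=1}^{n} w_{ih}\, x_i - \theta_h\right), \qquad h = 1, \dots, l$$
$$y_o = f\left(\sum_{h=1}^{l} w_{ho}\, y_h - \theta_o\right), \qquad o = 1, \dots, m$$
where $x_i$ is the $i$th input, $y_h$ and $y_o$ are the outputs of the hidden and output neurons, and $\theta_h$ and $\theta_o$ are the critical values (thresholds) of neurons in the hidden layer and in the output layer, respectively. $f(\cdot)$ is the logarithmic sigmoid transfer function, given by
$$f(x) = \frac{1}{1 + e^{-x}} \tag{4}$$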
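A minimal pure-Python sketch of this forward pass follows. It assumes that the thresholds are subtracted from the weighted sums and that the log-sigmoid is used in both the hidden and the output layer. The weight layout (`w_ih[i][h]`, `w_ho[h][o]`) and the function names are illustrative, not the authors' MATLAB code.

```python
import math


def logsig(x):
    """Logarithmic sigmoid transfer function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_ih, theta_h, w_ho, theta_o):
    """Forward pass of an n-l-m BPANN.

    x: n inputs; w_ih[i][h]: input-to-hidden weights; theta_h: l hidden thresholds;
    w_ho[h][o]: hidden-to-output weights; theta_o: m output thresholds.
    Returns (hidden outputs y_h, network outputs y_o).
    """
    y_h = [logsig(sum(x[i] * w_ih[i][h] for i in range(len(x))) - theta_h[h])
           for h in range(len(theta_h))]
    y_o = [logsig(sum(y_h[h] * w_ho[h][o] for h in range(len(y_h))) - theta_o[o])
           for o in range(len(theta_o))]
    return y_h, y_o
```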

Calculating Partial Derivative of Error Function.
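The partial derivative of the error function for each neuron in the output layer can be expressed as follows:
$$\delta_o = -\frac{\partial E}{\partial \mathrm{net}_o} = (d_o - y_o)\, y_o\, (1 - y_o)$$
The partial derivative of the error function for each neuron in the hidden layer can also be expressed as follows:
$$\delta_h = -\frac{\partial E}{\partial \mathrm{net}_h} = y_h\, (1 - y_h) \sum_{o=1}^{m} \delta_o\, w_{ho}$$
where $E = \frac{1}{2}\sum_{o=1}^{m}(d_o - y_o)^2$ is the error for the current training sample, $d_o$ is the desired (observed) output, $\mathrm{net}_o$ and $\mathrm{net}_h$ are the weighted input sums of the output and hidden neurons minus their thresholds, and $y_o$ and $y_h$ are the corresponding neuron outputs.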
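The two error terms translate directly into code. This sketch assumes the per-sample squared error $E = \frac{1}{2}\sum_o (d_o - y_o)^2$ and log-sigmoid neurons, so that $f'(\mathrm{net}) = y(1 - y)$. The function names are hypothetical.

```python
def output_deltas(d_o, y_o):
    """delta_o = (d_o - y_o) * y_o * (1 - y_o), i.e. -dE/dnet_o for log-sigmoid outputs."""
    return [(d - y) * y * (1.0 - y) for d, y in zip(d_o, y_o)]


def hidden_deltas(y_h, delta_o, w_ho):
    """delta_h = y_h * (1 - y_h) * sum_o delta_o * w_ho[h][o] (error propagated back)."""
    return [y * (1.0 - y) * sum(delta_o[o] * w_ho[h][o] for o in range(len(delta_o)))
            for h, y in enumerate(y_h)]
```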

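Adjustment of Interconnecting Weights.
The adjustment of interconnecting weights for the hidden layer and the output layer is expressed as
$$w_{ih}(t+1) = w_{ih}(t) + \Delta w_{ih}(t), \qquad \Delta w_{ih}(t) = \eta\, \delta_h\, x_i$$
$$w_{ho}(t+1) = w_{ho}(t) + \Delta w_{ho}(t), \qquad \Delta w_{ho}(t) = \eta\, \delta_o\, y_h$$
where $\Delta w_{ih}$ and $\Delta w_{ho}$ are the changes of weight values for the hidden and output layers; $\delta_h$ and $\delta_o$ are the error terms of the hidden and output neurons; $t$ is the number of iterations; and $\eta$ is the learning rate, a parameter selected to control the magnitude of change in the interconnecting weights.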
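A sketch of one weight-adjustment step with learning rate `eta` follows, given the error terms $\delta_h$ and $\delta_o$ for the current sample. It returns new weight lists rather than modifying the inputs in place. Threshold updates are not described in the text and are omitted here. The function name is hypothetical.

```python
def update_weights(w_ih, w_ho, x, y_h, delta_h, delta_o, eta=0.1):
    """One gradient-descent step: w(t+1) = w(t) + eta * delta * (signal feeding that weight)."""
    new_w_ho = [[w_ho[h][o] + eta * delta_o[o] * y_h[h] for o in range(len(delta_o))]
                for h in range(len(y_h))]
    new_w_ih = [[w_ih[i][h] + eta * delta_h[h] * x[i] for h in range(len(delta_h))]
                for i in range(len(x))]
    return new_w_ih, new_w_ho
```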

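Calculating the Total Error. The total error of all training samples can be calculated as
$$E = \frac{1}{2}\sum_{p=1}^{P}\sum_{o=1}^{m}\left(d_{po} - y_{po}\right)^2$$
where $p$ is the serial number of training samples, $P$ is the number of training samples, and $d_{po}$ and $y_{po}$ are the desired and actual outputs of output neuron $o$ for sample $p$.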

Iteration Termination Conditions.
If $E < \varepsilon$, where $\varepsilon$ is the preset error goal, or the number of iterations exceeds the preset maximum number of learning iterations $T_{\max}$, stop the algorithm and output the results. Otherwise, return to the second step and begin the next learning iteration.
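Putting the steps together, the following is a compact sketch of per-sample (online) back propagation for an n-l-1 network. It includes the total error $E$ and both stopping rules ($E$ < `eps`, or `t_max` iterations). Several choices are assumptions: random initial weights in [-0.5, 0.5], zero initial thresholds, a fixed learning rate, threshold updates with the opposite sign to the weights, and one pass over the training set per iteration. MATLAB's training functions, including the adaptive learning mentioned in the conclusions, behave differently. The name `train_bpann` is hypothetical.

```python
import math
import random


def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_bpann(X, D, n_hidden=10, eta=0.5, eps=1e-3, t_max=5000, seed=0):
    """Online back propagation for an n-l-1 network with log-sigmoid neurons.
    X: normalized input vectors; D: targets in [0, 1]. Returns (E, iterations)."""
    rng = random.Random(seed)
    n = len(X[0])
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n)]
    w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    th_h = [0.0] * n_hidden
    th_o = 0.0
    for t in range(1, t_max + 1):
        E = 0.0
        for x, d in zip(X, D):
            # forward pass through hidden and output layers
            y_h = [logsig(sum(x[i] * w_ih[i][h] for i in range(n)) - th_h[h])
                   for h in range(n_hidden)]
            y_o = logsig(sum(y_h[h] * w_ho[h] for h in range(n_hidden)) - th_o)
            E += 0.5 * (d - y_o) ** 2
            # error terms, computed before any weight changes
            d_o = (d - y_o) * y_o * (1.0 - y_o)
            d_h = [y_h[h] * (1.0 - y_h[h]) * d_o * w_ho[h] for h in range(n_hidden)]
            # weight updates; thresholds move opposite to the weights
            for h in range(n_hidden):
                w_ho[h] += eta * d_o * y_h[h]
                th_h[h] -= eta * d_h[h]
                for i in range(n):
                    w_ih[i][h] += eta * d_h[h] * x[i]
            th_o -= eta * d_o
        if E < eps:  # error goal reached
            break
    return E, t
```

With `eps` set very large, the loop stops after the first iteration. Otherwise it runs until the error goal or `t_max` is reached.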

Results and Discussion
The BPANN models were implemented in MATLAB, a commercial software package. The field bicycle data for training, validation, and testing were collected from eleven bicycle paths in Hangzhou, China [33]. A total of 459 samples were collected, of which 70% (321 samples) were used for training, 15% (69 samples) for validation, and 15% (69 samples) for testing. Detailed descriptive statistics of the field data can be found in [16] and Table 1. BPANNs with 2-10-1, 4-10-1, 6-10-1, and 8-10-1 architectures were trained and validated for Models 1-4, respectively. The trained models were then tested on the 69 samples that were not used in the training and validation stages.
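For illustration, a random 70/15/15 split of 459 samples reproduces the 321/69/69 sizes when the training and validation sizes are rounded and the testing set takes the remainder. Truncation would instead give 321/68/70. The helper `split_indices` is hypothetical; the paper does not state how MATLAB partitioned the samples.

```python
import random


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly partition sample indices into training, validation and testing sets.
    Training and validation sizes are rounded; testing takes the remainder."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


tr, va, te = split_indices(459)
print(len(tr), len(va), len(te))  # 321 69 69
```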
Before training, it is often useful to scale the field input variables so that they fall within a specified range, which improves the training performance of the BPANN. Therefore, in this study, the field sample data were normalized to the range [0, 1] using the following formula:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original value of a variable, $x'$ is its normalized value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that variable in the field data.
The strength of each training, validation, and testing stage was evaluated by calculating the error and the regression coefficient $R$. Learning performance plots of the four BPANN models are shown in Figure 2, and the regression analysis plots of the four models for training, validation, and testing are presented in Figures 3-6.
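Min-max scaling and its inverse can be sketched as below. The inverse is needed to convert network outputs back to kph. This sketch assumes that `lo` and `hi` are fitted on the training data and reused for the validation and testing samples; in that case, those samples may fall slightly outside [0, 1]. The function names are hypothetical.

```python
def normalize(values, lo=None, hi=None):
    """Scale values to [0, 1] with x' = (x - x_min) / (x_max - x_min).
    Pass lo/hi from the training set to scale other data consistently."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        raise ValueError("variable is constant; cannot min-max scale")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(scaled, lo, hi):
    """Map network outputs in [0, 1] back to physical units (e.g. kph)."""
    return [s * (hi - lo) + lo for s in scaled]
```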
Two performance indicators proposed in [34], the mean absolute percentage error (MAPE) and the root mean square error (RMSE), were calculated for the testing samples. They are given by
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{k=1}^{N}\frac{\left|V_p(k) - V_o(k)\right|}{V_o(k)}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(V_p(k) - V_o(k)\right)^2}$$
where $V_p(k)$ is the predicted free flow speed of bicycles for the $k$th testing sample, $V_o(k)$ is the observed free flow speed for the $k$th testing sample, and $N$ is the number of testing samples.
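The two error measures are short to compute. This sketch assumes paired lists of predicted and observed speeds in kph, with no observed speed equal to zero.

```python
import math


def mape(predicted, observed):
    """Mean absolute percentage error, in percent."""
    return 100.0 / len(observed) * sum(abs(p - o) / o for p, o in zip(predicted, observed))


def rmse(predicted, observed):
    """Root mean square error, in the units of the speeds (kph)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


print(rmse([11.0, 19.0], [10.0, 20.0]))  # 1.0
```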
The squared correlation coefficients ($R^2$), MAPEs, and RMSEs of the four models are listed in Table 3, and the observed and predicted free flow speeds are illustrated in Figure 7. From the figure and the table, we have the following findings:
(1) All four BPANN models predict free flow speed with small errors; the absolute speed differences are less than 2 kph. These results indicate that all of the models perform well in predicting the free flow speed. Even Model 1, which uses the fewest input variables, performs well.
(2) Model 2 and Model 3 both achieve higher accuracy than Model 1, showing that including bicycle types (Model 2) or characteristics of cyclists (Model 3) improves model performance. Unlike those of motorized vehicle drivers, the characteristics of cyclists can be directly observed and analysed. Model 3 shows that using cyclists' characteristics as input variables yields slightly higher accuracy than Model 1.
(3) Model 2 performs better than Model 3. This implies that bicycle type has a more significant effect on bicycles' free flow speed and crash risk than the characteristics of cyclists. Because electric bicycles travel faster, the free flow speed and bicycle crash risk are significantly correlated with the percentage of electric bicycles. Therefore, managing electric bicycles and setting speed limits for them are important for improving the safety of bicycle paths.
(4) Model 4, which considers both bicycle types and characteristics of cyclists, performs best on the testing dataset, with a MAPE of 4.13% and an RMSE of 1.09 kph. This model provides a basis for analysing the factors that affect the free flow speed and crash risk of bicycle traffic flow.
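The $R^2$ values in Table 3 can be computed as follows, assuming they are squared Pearson correlation coefficients between observed and predicted speeds (the usual output of a regression plot). The function name is hypothetical.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed speeds."""
    n = len(observed)
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


print(r_squared([3.0, 5.0, 7.0], [1.0, 2.0, 3.0]))  # 1.0 (perfect linear agreement)
```

Because $R^2$ is insensitive to a constant bias or scale error in the predictions, it is best read alongside MAPE and RMSE, as in Table 3.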

Conclusions
The free flow speed of bicycle traffic flow is an important parameter for setting cycleway speed limits and evaluating the crash risk of bicycle traffic flow. In this paper, four BPANN models, each including or excluding impact factors such as bicycle types and characteristics of cyclists, were developed to predict the free flow speed and crash risk of heterogeneous bicycle traffic flow. These models are expected to be a useful and robust tool to help traffic engineers improve the safety of bicycle traffic flow.
The BPANN models were trained, validated, and tested in MATLAB. For the testing datasets, the squared correlation coefficients ($R^2$) between the predicted and expected outputs of the four models, obtained using adaptive learning, were 0.72, 0.85, 0.82, and 0.87, respectively. These results imply that the proposed ANN models predict the free flow speed of bicycles with acceptable accuracy, and that considering bicycle types and characteristics of cyclists effectively improves the accuracy of the prediction models. This study is limited to predicting the free flow speed using only four categories of factors. Other parameters, such as the percentage of passing, geometric features, and environmental features, may be included in future work.

Figure 1: Structure and notations in a three-layer BPANN model.

Figure 2: Learning performance of four BPANN models.

Figure 7: Observed and predicted free flow speed for different models.

Table 1: Descriptive statistics of model parameters.

Table 2: Selected input parameters for different ANN models.

Table 3: Prediction errors for different models.