Estimation of Hourly Traffic Flows from Floating Car Data for Vehicle Emission Estimation

In order to model air quality in heavy pollution days, a dynamic emission monitoring system is implemented in the Beijing road network, which requires the input of hourly traffic flows. Floating car data (FCD) is increasingly employed for flow estimation based on the fundamental diagrams to supplement data provided by stationary detectors. However, existing studies often used a typical fundamental diagram without considering the hysteresis phenomena and the uncertainty of traffic flow estimation. +is study aims to develop a multiperiod fundamental diagram for the traffic flow estimation from FCD considering the hysteresis phenomena. +e result shows that the proposed multiperiod fundamental diagram can improve the accuracy of flow estimation. +e uncertainty of traffic flow estimation at both 10 minutes and 1 hour is also quantified, and the result indicates that the variation of the estimation uncertainty at 1 hour is lower than that at 10 minutes, with an average 7% reduction of the range of 95% confidence interval (CI). But there is no significant difference in magnitudes of the estimation uncertainty at 1 hour compared with that at 10 minutes. Moreover, the uncertainty for congested flows is lower than that for free flows. In the case study, the proposed model is employed to develop the spatial and temporal distributions of flows and emissions for the metropolitan area in Beijing.


Introduction
Heavy air pollution days have occurred frequently in some megacities of China in the past decades [1,2]. In order to model air quality and cope with heavy pollution days, one of the important tasks is monitoring emissions of major pollution sources, including industrial, agricultural, domestic, and mobile pollution. And emissions of these pollution sources should be monitored at a uniform interval, according to the Bulletin on the Second National Census of Pollution Sources released by China's Ministry of Ecology and the Environment [3,4].
Emissions of stationary pollution sources, such as industrial, agricultural, and domestic pollution sources, can be dynamically monitored at 1 hour on a small scale. However, it is not practically easy to conduct dynamic monitoring of mobile pollution sources on the road network, which requires the hourly traffic flow and travel speed of each road link [5][6][7].
And it is difficult to collect reliable traffic flows for the whole road network, which has become one of the problems to be solved in most cities to carry out dynamic traffic emission estimation. is is primarily attributed to the insufficient coverage of traffic detectors on the road network [8]. Conventional traffic monitoring systems use stationary detectors, such as inductive loops, traffic cameras, radio frequency identification, and remote traffic microwave sensor (RTMS) to collect traffic flows [9,10]. e stationary detectors remain quite expensive to install and maintain. erefore, these technologies are too costly to cover the traffic flows of metropolitan areas completely at high temporal-spatial resolution.
With the development of mobile traffic monitoring technology, the real-time speed data collection of the largescale road network has been achieved, covering various types of urban roads, such as expressways, major arterial roads, and minor arterial roads. For example, the floating car system in cities such as Beijing [11,12], Shanghai, Guangzhou, and Hangzhou, as well as the navigation software of Internet companies such as AutoNavi and Baidu, can provide real-time speed data with high coverage in a simple and cost-efficient way. In this context, floating car data (FCD), also referred to as probe vehicle data, is widely employed to provide traffic information as a supplement to traditional traffic monitoring techniques because of its low cost and high coverage [13,14]. e FCD consists of position, direction, and speed information of probe vehicles equipped with on-board Global Positioning System (GPS). e data is transmitted to a central processing unit that processes data for all probe vehicles to calculate average speeds of road links [11,15].
In this context, for the road links without traffic detectors, deriving traffic flows from the FCD speed based on the macroscope fundamental diagram has important practical significance for the dynamic estimation of vehicle emissions.
However, the accuracy and uncertainty of this method has been questioned in practical applications, which has a great impact on the results of vehicle emission estimation. At a specific speed, the observed flow tends to fluctuate around the estimated flow in a range. e uncertainty of the flow estimation can be illustrated as the difference between the estimated flow and the observed flow [16]. Besides, the estimated traffic flow is often aggregated to 1 hour or even larger time intervals. erefore, the object of this study is to address the issue: how much uncertainty is the hourly flow estimation from the FCD for vehicle emission estimation?

Literature Review
FCD or probe vehicle data is obtained from GPS-equipped probe vehicles, and it can be provided by private companies. Many cities such as Beijing, Shanghai, and Berlin have established taxi-FCD systems, which have the advantages of the long operation time, wide coverage, and centralized management [17]. Meanwhile, private companies such as AutoNavi, Baidu, INRIX, and TomTom also collect linkbased speed and travel time data from private vehicles. Gühnemann et al. [9] reported that taxis could be used as cheap probe vehicles for an extensive FCD base without additional expenses for on-board equipment of vehicles or software at the server site of the taxi headquarters, and their utilization and coverage of cities were much higher than private vehicles. Llorca et al. [18] reported that collecting commercial FCD from private cars arose privacy issues due to possible liability for speed limit violations. us, FCD providers typically choose not to report speeds more than posted speed limits to protect their customers, resulting in the speed truncation at the posted speed limits [15]. Sun et al. [17] studied a data fusion framework based on multisource FCD from the taxi, Baidu, and bus. e method described in this paper utilizes GPS data from the taxi dispatching system of taxi companies. As the penetration rate of probe vehicles has increased and FCD has become more and more available, the proposed method in this paper can also be applied.
FCD has been widely used in traffic analysis, such as travel time estimation [19,20], traffic state estimation [21], and congestion detection [22]. However, the quality of FCD and its true potential to be used in traffic analysis have been concerned about for a long time, as the FCD only monitors a limited sample of the total traffic [23]. Some studies discussed the penetration of probe vehicles required for FCD use. Sunderrajan et al. [24] concluded that a minimum of 5%-10% probe vehicles was needed to achieve reliable traffic state estimating, depending on the prevailing traffic conditions. Houbraken et al. [25] examined the potential of FCD to be used for real-time traffic management, which was collected by a county-wide FCD system in the Netherland covering 6-8% of all vehicles. e results showed that FCD presented a valuable alternative for loop data in dynamic traffic management systems. e research of Kerner et al. [26] showed that high-quality reconstruction of the actual travel times in the road network could be achieved when the penetration rate of probe vehicles was 1.5% among all vehicles. Talebpour et al. [27] investigated the fundamental diagrams under different penetration rates of connected vehicles.
Some studies attempt to investigate the potential of FCD use by comparing commercial FCD speeds with ground truth (GT) speeds. Altintasi et al. [15] evaluated the quality of FCD speeds provided by Be-Mobile in Turkey, which was compared with video-based GT speeds collected on a specific urban arterial road segment. e results showed that there was a nonlinear relation with a high correlation coefficient of 0.82. However, Hu et al. [28] reported that the magnitude of correlation varied with GT speeds for the arterials, ranging from −0.236 to 0.360, and the highest correlation was observed when the speed was between 40 and 45 mph. Chase et al. [29] reported that the speed differences were normally distributed with a mean speed difference greater than 5 mph, and the nonsystematic differences were found to be caused by the time lag in recovery from slow speeds, which had an impact on the speed-flow relationship. Kim et al. [30] evaluated INRIX speeds over a two-month period, including recurrent and nonrecurrent events. ey found that the reported INRIX speeds tended to log the loop detector measurements by almost 6 min, and the effective sampling periods were between 3 and 5 min. Based on these findings, FCD speed data needs to be filtered and quality controlled before use to reduce nonsystematic error.
As the FCD data can provide the travel speed of the road link, it is widely used in fundamental diagram derivation. Bussion et al. [31] studied the fundamental diagram for Toulouse based on the flow and occupancy data of the loop detectors and found that the fundamental diagram was affected by the uniformity of the detector distribution, road types, and road congestion. Ambühl et al. [32] developed a methodology for estimating fundamental diagrams by fusing FCD and loop detector data. e results showed that the fusion algorithm using two data sources simultaneously could reduce the fundamental diagram estimation error to 0.04 with an estimated 25% of all drivers generating FCD. Ji et al. [33,34] determined a fundamental diagram by combining loop detector data and GPS data from probe vehicles. ey found that data from 30% of the links could also generate a fundamental diagram, similar to the one derived from data on the whole network.
ere have been many studies that attempt to estimate traffic flows from the FCD typically based on the fundamental diagrams [9,24,[35][36][37]. Gühnemann et al. [9] derived traffic flows from FCD speed based on the relationship between the flow and space mean speed on the Berlin networks. Sunderrajan et al. [24] estimated the traffic state constituting average speed, density, and flow using FCD. ey also investigated the impact of probe penetration and the sampling period on the estimation accuracy. Anuar et al. [35] estimated traffic flows based on the FCD speed by utilizing four fundamental diagrams including Greenshield, Underwood, Northwestern, and Van Aerde. It was concluded that Van Aerde's fundamental diagram provided the least error compared to the others. Neumann et al. [36] proposed a more detailed representation of the fundamental diagram based on stochastic Bayesian networks for estimating hourly traffic flows from probe vehicle data.
As for the uncertainty of traffic flow estimation, most of the existing studies concentrated on the statistical characteristics of the estimation errors. Anuar et al. [35] investigated the estimation errors from different speed-flow fundamental diagrams and FCD aggregation intervals (5 minutes, 10 minutes, and 15 minutes). e result showed that with increasing aggregation intervals of FCD, traffic flows could be more accurately estimated [35]. Moreover, the estimation of traffic flows during congested traffic conditions was found to be more accurate than that during free-flow conditions [35]. Mulligan et al. [37] analyzed the uncertainty of the traffic flow estimation using the movingobserved method, which was calculated using Monte Carlo simulations for each combination of controlled variables. Deng et al. [38] constructed several linear equations that mapped the traffic measurements as a function of cumulative vehicle counts on both ends of a traffic segment based on multiple data sources. en, they quantified the uncertainty and value of estimation of traffic state by the mean and variance of estimated cumulative vehicle counts. Davis et al. [39] employed empirical Bayes methods to compute quantiles of the predictive probability distribution of the traffic total at a highway site based on a sample of daily traffic volumes from that site, to quantify the uncertainty of traffic totals estimation. Guo et al. [40] predicted short-term flow rate and quantified the uncertainty as kickoff percentage and width to flow ratio.
In summary, the speed-flow fundamental diagrams are widely used to estimate traffic flows from the FCD. A similar concept is also applied in this study. However, existing work has one limitation since they often employed typical fundamental diagrams without considering the spatial and temporal variations of traffic state. Besides, the distribution characteristics of the uncertainty of hourly traffic flow estimation for emission estimation have not been investigated thoroughly in the existing studies.
In this study, an improved fundamental diagram considering the spatial and temporal variations of traffic state between the traffic flow and the travel speed is developed for traffic flow estimation from the FCD. e uncertainty of the flow estimation is defined, and the impact of aggregating the FCD and traffic flow data to the 1-hour interval is investigated, by employing bootstrapping to generate the distribution of the estimation uncertainty.
us, the distribution characteristics of the hourly traffic flow estimation uncertainty from the FCD for vehicle emission estimation are quantified.

Materials and Methods
To overcome the limitations of existing studies, the general methodology used in this research includes four steps: (1) e FCD speed and the corresponding traffic flow from RTMS are prepared, both of which are matched geographically at 10 minutes (2) Based on the field data, a multiperiod fundamental diagram between the FCD speed and the traffic flow is developed with improved accuracy, compared with the typical Van Aerde model (3) e uncertainty of the flow estimation is defined and quantified. e effect of aggregation intervals on the uncertainty at different speed bins is also analyzed in this paper (4) As a case study, traffic flows that have been extracted from the FCD are further exploited for developing a dynamic emission model in Beijing

Data Sources and Preparation.
In this study, two field data types are collected, including the FCD and RTMS data.
e FCD only provides the travel speed without a direct measurement of traffic flows. In this case, a data fusion between FCD and traffic flow data from other sources, e.g., RTMS, can provide the needed information. e FCD is collected by more than 60,000 taxis equipped with GPS devices in Beijing, which provides the travel speed at 5 minutes for road links. And the FCD system has been proven to be an available source of travel speed data in the previous study, with a penetration rate of 2.5% among all vehicles in Beijing [41][42][43]. e corresponding RTMS data is collected by 44 detectors on the ring expressways in Beijing, which provides reliable traffic flow data at 2 minutes [41]. In order to ensure the data quality of each detector, a data preparation method is designed, and the outlier data records were deleted. More detailed information about the data quality control and the potential of the FCD used in this paper has been illustrated in previous studies, which are investigated by comparing FCD speeds with RTMS speeds [11,41]. en, the FCD data of road links and RTMS data is matched geographically to ensure the temporal-spatial matching of the two data sources [41]. In this study, 8

Estimation of Traffic Flows Based on FCD Speed.
To estimate traffic flows based on the relationship between the travel speed and the traffic flow for the whole network, it is necessary to develop the macroscopic fundamental diagram, which should be categorized into road types, speed limits, and other factors of traffic flow characteristics [44]. erefore, the macroscopic fundamental diagram of the second ring expressway in Beijing is developed considering these influencing factors.
Besides, according to the Beijing traffic management policy, the truck can be allowed on the Beijing second ring only from 23:00 to 6:00, which makes the composition of the traffic flows during this period different from that during other time periods. Meanwhile, the traffic state during this period is in a state of free flow, and the speed is nearly constant and independent of the travel density. As a result, it is not reliable to estimate traffic flows from speed between 23:00 and 6:00; thus, this study only covers the period from 6:00 to 23:00 of workdays.

Existence of Hysteresis Phenomena on Fundamental Diagram.
is paper analyzes the speed from FCD and corresponding flows from RTMS on the second ring expressway in Beijing, which entails a limiting speed of 80 km/ h and six lanes in two directions. Figure 1(a) shows the temporal variation characteristics of average flow and average speed. It is observed that the average flow remains at a high level between 7:00 and 23:00, and the temporal variation in the average speed is more significant all day. From 6:00 to 7:00, the traffic flow is at the onset of the congestion. From 7:00 to 18:00, the traffic flow is in a state of congestion with high flow and low speed. At 18: 00, the traffic congestion reaches a peak at the minimum speed. After 18:00, the traffic flow is at the offset of congestion and reaches another peak at 21:00.
Note that the values of flows can be different although at the same speed during the day. For example, at the speed of 40 km/h, the values of flows are different for the onset and offset of the congestion [45]. To make it more explicit, this study plots the flow-density pairs for the whole network as shown in Figure 1(b). It is observed that the flow-density curves follow clockwise hysteresis loops, and the values of the maximum flow are different at the day and night peaks.
is phenomenon indicates that it is necessary to develop a multiperiod fundamental diagram that can describe the onset and offset of congestion separately. us, the traffic flow can be estimated more accurately, compared with typical flow estimation without considering this phenomenon.
Based on the analysis of the phenomenon of the hysteresis loops [45][46][47][48][49], the shape and characteristics of the fundamental diagrams are different in these two time periods. In this case, a multiperiod fundamental diagram is developed for different time periods (from 6:00 to 18:00 and from 18:00 to 23:00) by averaging the data of all the detectors.

Development of Speed-Flow Fundamental Diagram.
To derive traffic flows from the travel speed, it is necessary to investigate the relationship between the two variables. As discussed earlier, it is necessary to divide the data into two time periods according to the shape and characteristics of fundamental diagrams. However, in practice, the speed-flow relationship is typically developed without considering the difference between different time periods. In order to determine the accuracy improved by the multiperiod fundamental diagram between the speed and the flow, the all-day speed-flow relationship is also developed for comparison purposes.
In the subsequent analysis, the RTMS data at 10 minutes is converted into the number of vehicles per hour per lane. By employing the Van Aerde single-regime model, four essential parameters including free-flow speed (FS), the speed at capacity (SC), traffic capacity (CAP), and jam density (JD) for the road network are calibrated by the regression [41,50,51]. Besides, the absolute percentage error (APE) is used as a performance indicator to reflect the deviation between the estimated flows and the ground truth measurements, as shown in where Q v is the observed flow at the FCD speed v and q v is the estimated flow derived from FCD speed v.

Uncertainty Analysis of Flow Estimation with Bootstrap.
In the subsequent analysis, the uncertainty of the flow estimation, also referred to simply as the estimation uncertainty, is defined as the relative differences between the observed and the estimated flow. e uncertainty of traffic flow estimation can be illustrated as in where U v is the uncertainty of flow estimation at stated (α), usually at 5%. To investigate the distribution of estimation uncertainty for different aggregation intervals and speed bins, large samples of the relative differences between the observed and the estimated flows are required. In this study, bootstrapping is taken as a Monte Carlo style simulation method to obtain a 95% CI of the estimation uncertainty [52]. e 95% CI of the estimation uncertainty means that at the same speed, upon repeating to observe the traffic flow a hundred times, only 5% samples of the observed flow are expected to deviate from the estimated flow by more than U v . e uncertainty of traffic flow estimation is quantified using the following steps. First, the FCD speed data is divided into bins at 5 km/h interval, such as [5,10), [10,15), . . ., [60, 65), and bootstrap data sets are developed for each 4 Journal of Advanced Transportation speed bin, containing the speed and flow data of two aggregation intervals including 10 minutes and 1 hour. Second, the relative difference between the observed and the estimated flow of different aggregation intervals is calculated for each iteration. ird, the distribution characteristic of the estimation uncertainty is investigated, including the averages and 95% CI. e lower and upper bounds of the 95% CI are obtained by the 2.5% and 97.5% percentiles of the uncertainties of all iterations [52]. For each speed bin, to generate bootstrap data sets at 10 minutes, the original data pairs of the speed and flow data at 10 minutes of size N are taken as the data pool. Data pairs of size N are randomly sampled from the data pool with a replacement for each time.
e routine is repeated 1,000 times. For each speed bin, to generate bootstrap data sets at 1 hour, the aggregated speed and flow data of size M are taken as the data pool. Data pairs of size M are randomly sampled from the aggregated data pool with a replacement for each time. e routine is repeated 1,000 times.

Comparison of Fundamental Diagram for Different Times of the Day.
e flow-speed fundamental diagrams of different times for the second ring road expressways in Beijing are shown in Figures 2(a) and 2(b), and the parameters of speed-flow fundamental diagrams for different times of the day are shown in Table 1. It can be observed that the regression curves can fit the speed-flow relationship well. However, when the speed is around the speed at capacity, observed flows are not always close to the theoretical capacity, and this separation of data points is caused by special characteristics of the ring road expressways in Beijing, which has been observed in the previous study [46]. And compared with the multiperiod model, this dispersion of data points is more significant in the all-day model.
In this paper, the estimated flows based on these different fundamental diagrams are compared with the observed flows from RTMS, as illustrated in Figures 3(a) and 3(b). It can be observed that the estimated errors of the multiperiod model range from −300 to 300, while the estimated errors of the all-day model range from −600 to 600. Moreover, Figure 3(c) shows the average and deviation of APE for the different models of fundamental diagrams. According to Figure 3(c), the average APE (corresponding with the symbol "×") of the multiperiod model has decreased to 11%, and the value of the all-day model is 13%. Besides, the interquartile range and maximum of APE of the multiperiod model are reduced by 3% and 7%, which suggests that the variation of the estimated errors is reduced. From Figure 3(c), a key point that can be inferred is that a better performance appears in using the multiperiod speed-flow fundamental diagram.

Uncertainty of Traffic Flow Estimation.
For each speed bin, the estimation uncertainty is calculated at different aggregation intervals, including 10 minutes and 1 hour. Figure 4 illustrates the distribution of the estimation uncertainty from data sets generated by bootstrapping. Figure 4 shows that the variation of the estimation uncertainty is reduced as the time aggregation increases from 10 minutes to 1 hour. According to equation (2), the central tendency of the estimation uncertainty at 1 hour can be explained as the offset of positive and negative estimated errors at 10 minutes, resulting in less randomicity in the process of aggregating the traffic conditions from 10 minutes to 1 hour. is rule is universal in different speed bins and especially significant in both high-and low-speed bins of ≥60 km/h and <15 km/h. Moreover, the variation of the estimation uncertainty is different across the speed bins. As seen in Figure 4, the estimation uncertainty presents a smaller size of the boxes and whiskers in the speed bins of 30-50 km/h, which suggests a smaller variation of estimation uncertainty. As Figure 2(a) shows, the data points are clustered around the fitting curve in the speed bins of around the speed at capacity (around 40 km/h). e traffic state in these speed bins is during the congested state. Besides, the variation of the estimation uncertainty of the two aggregation intervals is significantly high in the speed bins of ≥ 60 km/h. e traffic state in these speed bins is in the state of free flow, and the vehicle activities are much more complicated than those in other speed bins. Combined with Figure 2(a), the observed speed fits a wide range of flows in these speed bins.
To quantify the impact of time aggregation intervals for different speed bins on the estimation uncertainty, the averages and 95% CI of the uncertainty are obtained, which are presented in Figure 5 and Table 2.
According to Figure 5 and Table 2, key findings regarding the characteristics of the estimation uncertainty are summarized as follows.
(1) ere is no significant difference in the average of the estimation uncertainty with an increase in the aggregation level of the FCD and flow data.
(2) For both of the two aggregation intervals, the average of the estimation uncertainty is relatively smaller during the transition from low-to high-speed bins, compared with that in both high-and low-speed bins of ≥60 km/h and <15 km/h. (3) In terms of variation of the estimation uncertainty, the range of 95% CI at 1 hour is smaller than 2% for every speed bin, while the range of 95% CI at 10 minutes is higher than 12% in both high-and lowspeed bins of ≥60 km/h and <15 km/h. (4) e variation of the estimation uncertainty at 1 hour is relatively lower compared with that at 10 minutes for each speed bin, with a 7% reduction of the range of 95% CI. is reduction of 95% CI is especially significant in the speed bins of ≥60 km/h and <15 km/h, with 22% and 12% respectively.

Case Study
e general purpose is to estimate hourly vehicle emissions from transportation networks in Beijing. erefore, traffic flows are estimated from FCD speeds based on the speedflow fundamental diagrams.
A case study is conducted for the metropolitan area in Beijing. e traffic flow estimation and the dynamic vehicle emission estimation are developed based on FCD on 20 June 2016. Subsequently, the speed-flow fundamental diagrams of the Beijing road network are developed, which are categorized by road types, speed limits, and other traffic conditions to derive traffic flow of road links. e result of the hourly traffic flows from 8:00 to 9:00 is depicted in Figure 6(a).
FCD speed and the estimated traffic flows are further used to develop a dynamic vehicle emission model. is model is linked to a Geographic Information System (GIS) for visualization and further analysis of spatial effects [9]. e vehicle emissions are estimated as illustrated in the previous studies [53,54]. Figure 6(b) depicts the spatial distribution of CO emissions from 8:00 to 9:00, which covers the metropolitan area of Beijing. Compared with conventional emission models, the dynamic emission model combined with the temporal resolution of FCD speeds can effectively identify emission hotspots at the high spatial and temporal resolution.

Conclusions
Hourly traffic flow estimation from FCD is vital for dynamic estimation of vehicle emissions on the road network, but the uncertainty of hourly flow estimation using a fundamental diagram is often questioned.
To improve the accuracy of traffic flow estimation, this study investigates the existence of hysteresis phenomena on the fundamental diagrams and proposes a multiperiod speed-flow fundamental diagram. Further, the uncertainty of traffic flow estimation using the improved model is quantified, and the impact of aggregating the FCD and traffic flow data to the 1-hour interval is investigated at different speed bins. e main conclusions can be summarized as follows: (1) e multiperiod speed-flow fundamental diagram considering the hysteresis phenomena can improve the accuracy of the flow estimation effectively.
(2) ere is no significant difference in magnitudes of the estimation uncertainty at 1 hour compared with that at 10 minutes.
(3) e variation of the estimation uncertainty is reduced in the process of aggregating the time intervals from 10 minutes to 1 hour, with an average 7% reduction of the range of 95% CI. is reduction of 95% CI is especially significant in the speed bins of ≥60 km/h and <15 km/h, with 22% and 12% respectively. It is demonstrated that hourly traffic flow estimation presents a better performance compared with that at 10 minutes. (4) Besides, the traffic flow estimation performs better during congested states (with a higher emission rate) compared with that at the free-flow state (with a lower emission rate). e average and variation of the estimation uncertainty are relatively smaller in the speed bins around the speed at capacity (40 km/h), compared with those in both high-and low-speed bins of ≥60 km/ h and <15 km/h. It should be noted that this research has limitations. First, more data fusion should be collected to validate the accuracy of the proposed method. Second, the study investigates the uncertainty on a ring expressway in Beijing. An uncertainty analysis covering other road types is recommended.
Data Availability e matched FCD and RTMS data used to support the findings of this study are from the cited literature [41].

Conflicts of Interest
e authors declare that they have no conflicts of interest.  Journal of Advanced Transportation 9