Map Matching for Fixed Sensor Data Based on Utility Theory

Map matching can provide useful traffic information by aligning the observed trajectories of vehicles with the road network on a digital map. It has an essential role in many advanced intelligent traffic systems (ITSs). Unfortunately, almost all current map-matching approaches were developed for GPS trajectories generated by probe sensors mounted in a few vehicles and cannot deal with the trajectories of massive vehicle samples recorded by fixed sensors, such as camera detectors. In this paper, we propose a novel map-matching model termed Fixed-MM, which is designed specifically for fixed sensor data. Based on two key observations from real-world data, Fixed-MM considers (1) the utility of each path and (2) the travel time constraint to match the trajectories of fixed sensor data to a specific path. Meanwhile, with the laws derived from the distribution of GPS trajectories, a path generation algorithm was developed to search for candidates. The proposed Fixed-MM was examined with field-test data. The experimental results show that Fixed-MM outperforms two types of classical map-matching algorithms regarding accuracy and efficiency when fixed sensor data are used. The proposed Fixed-MM can identify 68.38% of the links correctly, even when the spatial gap between the sensor pair is increased to five kilometers. The average computation time spent by Fixed-MM on one point is only 0.067 s, and we argue that the proposed method can be used online for many real-time ITS applications.


Introduction
Map matching is the process of correctly identifying the path on which a vehicle is travelling [1]. It provides a promising opportunity to upgrade the service level of various intelligent traffic system (ITS) applications [2][3][4]. However, current map-matching algorithms are generally designed for satellite-based GPS points provided by probe sensors mounted on probe vehicles. These probe vehicles provide spatial traffic information and direct measurements of travel time to monitor the traffic conditions in a citywide road network.
However, probe sensor data have limitations. The cost of purchasing GPS units and transferring data can severely limit the scale of probe samples. Only a biased estimation of the traffic information can be obtained because the probe data are usually collected from one type of vehicle, such as taxis. Additionally, a probe sensor system imposes an enormous computational burden on the system administration owing to its high polling frequency and positional noise [5].
Fixed sensor data show the potential to overcome the issues of probe sensor data. Fixed sensors, such as cameras, loops, and microwave detectors, are widely used in urban traffic monitoring and management. With the development of ITS technology, camera sensors have improved in accuracy, cost, and ease of use; therefore, the fixed sensor data considered in this paper refer specifically to observations collected through camera-based sensors. The transit information of every vehicle approaching a fixed sensor station is captured. Consequently, the movement patterns of almost all vehicles running on a road network with fixed sensors can be recorded. This provides opportunities to reduce the estimation bias in traffic information. The fixed sensor system may also improve the efficiency of the map-matching process through a reduced polling frequency and more accurate location records, even for a large-scale urban traffic system.
Many map-matching methods have been developed, and reviews can be found in [1,6]. Quddus et al. classified the methods into four categories: geometric, topological, probabilistic, and advanced. However, such approaches perform well only with high-frequency GPS data and may become less effective for low-frequency trajectory data [6]. In recent years, two groups of methods, namely, HMM-based algorithms and ST-Matching algorithms, have been developed to deal with the sparsity of low-polling-frequency trajectory data.
(i) HMM-based algorithms: Newson and Krumm [7] introduced a map-matching algorithm for sparse GPS trajectories based on a hidden Markov model, called the HMM algorithm. First, this method finds a set of candidate links for each GPS point and defines a measurement probability to describe how well the GPS point is aligned with each candidate link. Then, it connects each pair of consecutive candidate links with the shortest path to generate the candidate graph. Next, a transition probability defines the likelihood of the tracked vehicle moving along each candidate path. Finally, the best matching path sequence is identified using the Viterbi algorithm. The experimental results show that even with sampling intervals of 30 s, the accuracy of this algorithm is barely degraded. However, it has high computational complexity and becomes slow when working with long trajectories and extended search radii. Mohamed et al. [8] employed three filters (i.e., speed, direction, and α-trimmed mean filters) to reduce the candidate sets and thereby improve the efficiency of the map-matching process. Koller et al. [9] proposed a fast-HMM algorithm that replaces the Viterbi algorithm with the bidirectional Dijkstra algorithm to determine the optimal map-matching solution. This algorithm can avoid up to 45% of the costly routing operations without negatively affecting the map-matching result. Han et al. [10] partitioned road networks into approximate segments and then indexed the approximate segments in an optimised packed R-tree to reduce the road-network search time. It has also been argued that mobility in a road network is non-Markovian. Jagadeesh and Srikanthan [11] complemented the HMM algorithms with the concept of drivers' route choice. The results show that this further improves matching accuracy, especially at high levels of noise.
(ii) ST-Matching algorithms: Lou et al. [12] introduced a map-matching algorithm for low-polling-frequency GPS trajectories based on both spatial and temporal analysis, called ST-Matching. It models the temporal analysis using speed and travel time data to improve accuracy. The experimental results show that ST-Matching is more robust to a decrease in sampling rate than a map-matching algorithm using only spatial information, indicating that temporal constraints are indeed useful in map matching with sparse trajectory data. Considering that this method cannot handle matching errors well at junctions, Hsueh and Chen [13] added directional analysis to ST-Matching, yielding STD-Matching. It employs real-time directional motion with a directional analysis function to reflect the influence of a user's true movement on the GPS trajectories. The experimental results demonstrate that the STD-Matching algorithm significantly improves the matching accuracy. Liu et al. [14] proposed a spatial and temporal conditional random field map-matching method called the ST-CRF algorithm. The ST-CRF model considers both spatial and temporal accessibility between two GPS points, in addition to consistency in the direction of travel. A series of experiments showed that the ST-CRF method has better performance and robustness and solves the "label-bias" problem in the HMM algorithm.
The above-mentioned map-matching algorithms are mainly designed for low-frequency probe sensor data, such as GPS trajectories. They may become less effective for fixed sensor data because fixed sensor data differ from probe sensor data in at least two aspects:
(a) The fixed sensor data are much sparser than the probe sensor data. As shown in Figure 1(a), the distance between consecutive points recorded by fixed sensors is usually dozens of times that recorded by the probe sensor. Hence, there are too many possible paths to be matched between neighbouring fixed sensors. If only the shortest path length is considered (as in the current map-matching algorithms developed for probe sensors), the realistic paths may not be adequately evaluated.
(b) The positions provided by the fixed sensors are fixed and accurate, while the probe sensors move along with the probe vehicle and generate GPS points with random errors [15]. Figure 1(b) presents a microscopic view of the trajectories between the fixed sensors 20200906 and 10203801. The fixed sensor data (green points) are located accurately on the road links, whereas the probe sensor data (red points) are always positioned several meters away from the true path.
In this study, we developed a map-matching algorithm designed specifically for fixed sensor data, called Fixed-MM. For convenience, the conventional map-matching models for probe sensor data are abbreviated as Probe-MM. The contributions of Fixed-MM can be summarised as follows: (a) It combines both route choice preferences and temporal constraints to identify the true path of the fixed sensor data. The experimental results show that the proposed method significantly improves the matching accuracy. (b) Fixed-MM includes a candidate-path generation algorithm that searches for realistic paths by relaxing the assumption that the location of each point is noisy. In this manner, the time-consuming candidate-path generation process can be conducted separately and in parallel, and the average computation time of the matching process for a point is reduced to 0.067 s. The remainder of this paper is organised as follows. The problem definition and an overview of the framework are presented in the Preliminaries section. Then, the Fixed-MM algorithm and the candidate-path set generation algorithm are proposed in the Methodology section.
The Experiment section details the process and presents the experimental results. Finally, we conclude the paper in the last section.

Preliminaries

Definition 1. Road network: a road network, RN, is a set of road links connected in a graph format. Each road link, l, is a directed edge with two terminal points, a length (l.len), a level (l.lev) (e.g., an expressway, a primary road, or a secondary road), a direction (l.di) (e.g., one-way or bidirectional), and the number of lanes (l.lan).

Definition 2. Path: a path, P, is a sequence of connected road links in an RN, $P: l_1, l_2, \ldots, l_x, \ldots, l_X$.

Definition 3. Fixed sensor trajectory: a fixed sensor trajectory, Tr, is a sequence of time-ordered points, $Tr: F_{id(1)}, F_{id(2)}, \ldots, F_{id(j)}, \ldots$, each recorded by a fixed sensor.

Definition 4. Sensor pair: a sensor pair is two neighbouring points in a Tr, namely, $(F_{id(j)}, F_{id(j+1)})$, where $F_{id(j)}$ is the origin fixed sensor point and $F_{id(j+1)}$ is the destination fixed sensor point.

Definition 5. Candidate path set: the candidate path set, $\Phi_j$, consists of all paths with a nonzero probability of matching between a given sensor pair $(F_{id(j)}, F_{id(j+1)})$, while all unrealistic paths have a probability of zero.

Now the problem of Fixed-MM is defined as follows.

Problem 1. Given a fixed sensor trajectory Tr and a road network RN, for each sensor pair $(F_{id(j)}, F_{id(j+1)})$ in Tr, find the path $P_{i,j}$ from $\Phi_j$ with the highest probability of being the matched path.
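The definitions above can be made concrete with a small data model. The following is a minimal sketch, not the authors' implementation: the class names, the field names, and the `link_id` and `timestamp` fields are illustrative assumptions, and the timestamp unit (seconds) is assumed as well. It shows how a fixed sensor trajectory decomposes into the neighbouring sensor pairs on which Problem 1 is posed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RoadLink:
    """Directed road link; fields are stand-ins for l.len, l.lev, l.di, l.lan."""
    link_id: str     # hypothetical identifier, not defined in the text
    length: float    # l.len
    level: str       # l.lev, e.g. "expressway", "primary", "secondary"
    direction: str   # l.di, e.g. "one-way" or "bidirectional"
    lanes: int       # l.lan


@dataclass(frozen=True)
class SensorPoint:
    """One fixed-sensor record: which sensor saw the vehicle, and when."""
    sensor_id: str
    timestamp: float  # assumed to be in seconds


def sensor_pairs(trajectory: List[SensorPoint]) -> List[Tuple[SensorPoint, SensorPoint]]:
    """Split a trajectory into neighbouring (origin, destination) sensor pairs."""
    ordered = sorted(trajectory, key=lambda p: p.timestamp)  # enforce time order
    return list(zip(ordered, ordered[1:]))
```

A trajectory with J points yields J - 1 sensor pairs, and each pair can then be matched independently.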

2.2. Framework. The framework of Fixed-MM is illustrated in Figure 2. Three types of datasets, including fixed sensor data, probe sensor data, and road network data, are used as inputs. The trajectory of the fixed sensor data is first decomposed into separate sensor pairs. The probe sensor data are also matched with a specific path based on the Probe-MM algorithm. Meanwhile, a candidate path generation algorithm is used to search for possible paths for each sensor pair. Then, the matching probability for each candidate path is calculated, and the matching result is obtained by finding the candidate path with the highest matching probability.

Characteristics of the Data.
The key to Fixed-MM is finding the most likely path to connect the sensor pair. In this section, we provide two key observations of the true trajectories that lead to the proposed approach.

Observation 1. Drivers prefer to travel along the path with high utility.

Example 1. Consider path A, path B, and path C visualised in Figure 3(a), with their attributes summarised in Table 1. Sixty-eight percent of the samples travel along path A, while only 32% of the samples travel along the other two. Thus, it is reasonable to infer that drivers prefer paths with less travel time, fewer intersections, and more high-level road links, which indicates that the higher the utility of a path, the more attractive the path is to the driver.

Observation 2. The observed travel time tends to be close to the expected travel time of the true path.

Example 2. Based on the Probe-MM algorithm, the GPS trajectories can be matched to the three paths. The histograms of the observed travel times for the three paths are shown in Figure 3(b). The histograms fit the normal distribution well, which means that a path's observed travel time tends to be close to its expected travel time (average travel time). If the observed travel time of a sample is 18 min, we may infer that this trip is very likely to be matched with path C.

Based on the above observations, we propose a novel map-matching algorithm for fixed sensor data, namely, Fixed-MM, which incorporates both (1) the utility of each route and (2) the travel time constraint to identify the path with the highest probability in the candidate path set as the matched path. Details of the utility model, travel time constraint, and candidate path set generation algorithm are described in the following subsections.

Utility Model. Similar to the route choice model, the travel behaviour preference reflected in Observation 1 is modelled with utility theory. It assumes that the driver's preference for a path is captured by a value called utility, and the driver selects the path in the candidate set with the highest utility [16].

Let $U_{i,j}$ be the utility of the $i$th path $P_{i,j}$ belonging to the candidate set $\Phi_j$ of the sensor pair $(F_{id(j)}, F_{id(j+1)})$. It consists of a deterministic term $V_{i,j}$ and a random term $\varepsilon_{i,j}$ such that

$$U_{i,j} = V_{i,j} + \varepsilon_{i,j}. \tag{1}$$

The random term $\varepsilon_{i,j}$ is assumed to be independent and identically distributed (i.i.d.) following a Gumbel distribution. The deterministic term is assumed to have a linear relationship with the path attributes, such that

$$V_{i,j} = \beta_{FTT}\, x^{FTT}_{i,j} + \beta_{NSL}\, x^{NSL}_{i,j} + \beta_{PE}\, x^{PE}_{i,j}, \tag{2}$$

where $x^{FTT}_{i,j}$, $x^{NSL}_{i,j}$, and $x^{PE}_{i,j}$ are vectors of the observed path attributes and $\beta_{FTT}$, $\beta_{NSL}$, and $\beta_{PE}$ are vectors of coefficients that represent drivers' preferences on path attributes. The descriptions of the path attributes are presented in Table 2.

Based on the above definitions of path utility, the matching probability of a candidate path $P_{i,j}$ is given by [16]

$$\Pr(P_{i,j}) = \frac{\exp(V_{i,j})}{\sum_{P_{i',j} \in \Phi_j} \exp(V_{i',j})}. \tag{3}$$

Equation (3) can also be transformed as

$$\Pr(P_{i,j}) = \frac{1}{\sum_{P_{i',j} \in \Phi_j} \exp(V_{i',j} - V_{i,j})}. \tag{4}$$

It is easy to see that the larger the difference between the utility $V_{i,j}$ and the other $V_{i',j}$, the higher the matching probability $\Pr(P_{i,j})$. This means that the candidate path with higher utility is more likely to be matched, which corresponds to the rule reflected in Observation 1.
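The logit form of the matching probability can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the authors' code: it treats each attribute (free travel time, number of signal lights, proportion of expressway) as a single scalar, and the function names are hypothetical. The maximum utility is subtracted before exponentiation, which leaves the probabilities unchanged but avoids overflow.

```python
import math
from typing import List, Sequence


def deterministic_utility(x_ftt: float, x_nsl: float, x_pe: float,
                          beta_ftt: float, beta_nsl: float, beta_pe: float) -> float:
    """Linear-in-attributes utility V (scalar attributes are an assumption)."""
    return beta_ftt * x_ftt + beta_nsl * x_nsl + beta_pe * x_pe


def logit_probabilities(utilities: Sequence[float]) -> List[float]:
    """Matching probability of each candidate path: exp(V_i) / sum_k exp(V_k)."""
    v_max = max(utilities)                       # shift for numerical stability
    exps = [math.exp(v - v_max) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `logit_probabilities([1.0, 0.0, 0.0])` assigns the largest probability to the first path, and the three values sum to one.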

Temporal Constraint.
To consider Observation 2, the temporal constraint between the observed travel time and the expected travel time of a candidate path must be modelled. Their definitions are as follows. The observed travel time $t_{j,n}$ is the time spent by the $n$th sample when travelling between the sensor pair $(F_{id(j)}, F_{id(j+1)})$ and can be obtained by calculating the difference between the transit timestamps recorded by $F_{id(j)}$ and $F_{id(j+1)}$:

$$t_{j,n} = \tau_{id(j+1),n} - \tau_{id(j),n}, \tag{5}$$

where $\tau_{id(j),n}$ denotes the transit timestamp of the $n$th sample recorded by $F_{id(j)}$. The expected travel time $t_{i,j}$ is the average travel time of the candidate path $P_{i,j}$, where $P_{i,j} \in \Phi_j$. This can be calculated based on probe sensor data:

$$t_{i,j} = \sum_{l_x \in P_{i,j}} \frac{1}{N_x} \sum_{n=1}^{N_x} t_{x,n}, \tag{6}$$

where $t_{x,n}$ is the travel time spent by the $n$th sample on road link $l_x$, and $N_x$ is the total number of probe vehicles traversing road link $l_x$. The temporal constraint can be calculated based on the deviation $t_{j,n} - t_{i,j}$ between the observed travel time $t_{j,n}$ and the expected travel time $t_{i,j}$. This deviation is attributed to a combination of the natural variation in travel times and the error in the travel time estimate. The deviations of the three sample paths are shown in Figures 4-6 in Appendix A, respectively. The travel time varies significantly on different paths depending on the time of day, and all the histograms of $t_{j,n} - t_{i,j}$ during the morning peak fit the normal distribution well. Therefore, we can assume that the deviations have a Gaussian distribution, $t_{j,n} - t_{i,j} \sim N(\mu_s, \sigma_s)$, where $\mu_s$ and $\sigma_s$ are the mean and variance of $t_{j,n} - t_{i,j}$ for the candidate path $P_{i,j}$ during period $s$. Then, the temporal constraint $q(t_{j,n} - t_{i,j})$ can be defined as the Gaussian density of the deviation divided by a normalising denominator (equation (7)). The denominator normalises the temporal constraint to one.
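The two travel times and the parameters of their deviation can be computed directly from timestamps and probe records. This is a minimal sketch under stated assumptions: link identifiers are plain strings, `link_times` maps each link to the probe travel times observed on it, and the spread is estimated with the population standard deviation (the text does not say which estimator was used). All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def observed_travel_time(ts_origin: float, ts_destination: float) -> float:
    """Difference between the transit timestamps of a sensor pair."""
    return ts_destination - ts_origin


def expected_travel_time(path: Sequence[str],
                         link_times: Dict[str, List[float]]) -> float:
    """Sum over the links of a path of the mean probe travel time on each link."""
    return sum(mean(link_times[link]) for link in path)


def deviation_parameters(deviations: Sequence[float]) -> Tuple[float, float]:
    """Mean and (population) standard deviation of observed-minus-expected times."""
    return mean(deviations), pstdev(deviations)
```

In use, `expected_travel_time(["a", "b"], {"a": [10, 20], "b": [30]})` averages each link first and then sums along the path.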
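We added the temporal constraint as a correction term for the utility function. Then, the matching probability can be rewritten as

$$\Pr(P_{i,j}) = \frac{\exp\big(V_{i,j} + \alpha \ln q(t_{j,n} - t_{i,j})\big)}{\sum_{P_{i',j} \in \Phi_j} \exp\big(V_{i',j} + \alpha \ln q(t_{j,n} - t_{i',j})\big)}, \tag{8}$$

where $\alpha$ is a scale parameter. The correction term $\alpha \ln q(t_{j,n} - t_{i,j})$ in equation (8) describes the likelihood of compliance between the observed travel time $t_{j,n}$ and the expected travel time $t_{i,j}$. When the deviation $t_{j,n} - t_{i,j}$ is smaller (the observed travel time is closer to the expected travel time), $q(t_{j,n} - t_{i,j})$ becomes larger.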

Journal of Advanced Transportation
According to equation (4), the matching probability of $P_{i,j}$ then increases. This is also in line with Observation 2 in the previous section.
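The effect of the temporal correction term can be seen in a short numerical sketch. Several choices here are assumptions rather than details confirmed by the text: the normalising denominator of q is taken to be the sum of Gaussian densities over the candidate set, `sigma` is treated as a standard deviation, q is floored at a tiny positive value so that its logarithm stays finite, and all names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def temporal_constraint(observed: float, expected_times: Sequence[float],
                        mu: float, sigma: float) -> List[float]:
    """q for each candidate path, normalised over the candidate set (assumed form)."""
    densities = [gaussian_pdf(observed - t, mu, sigma) for t in expected_times]
    total = sum(densities)
    if total == 0.0:                      # every density underflowed: no information
        return [1.0 / len(densities)] * len(densities)
    return [d / total for d in densities]


def matching_probabilities(utilities: Sequence[float], q_values: Sequence[float],
                           alpha: float) -> List[float]:
    """Logit probabilities with the correction term alpha * ln q added to V."""
    scores = [v + alpha * math.log(max(q, 1e-300))   # floor avoids log(0)
              for v, q in zip(utilities, q_values)]
    s_max = max(scores)
    exps = [math.exp(s - s_max) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With three equally attractive paths whose expected travel times are, say, 600 s, 780 s, and 1080 s, an observed travel time of 1080 s shifts almost all of the probability to the third path, mirroring the reasoning of Example 2. Setting `alpha` to zero recovers the utility-only model.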

Generating Candidate Path Set.
Finding all possible paths that connect each sensor pair as candidates is another key step for Fixed-MM. The candidate path set is usually large when the distance between the paired sensors is long and the urban road network is dense. In addition, preferential and realistic paths should be included because comparing a path to a set of highly unattractive and unrealistic candidates would not provide much useful information [17]. In this study, we develop a protocol for generating a realistic candidate path set based on the following observations:

Observation 3. There may be certain detours on the candidate paths.

Example 3. Figure 7(a) illustrates the GPS trajectories of 620 samples that travel between sensors $F_{20507301}$ and $F_{20507302}$ near the Bao'an International Airport in Shenzhen, China. Based on the map-matching algorithm designed for the probe data, each GPS point was projected onto a specific link. The observed number of samples on each link is represented by different colours in Figure 7(b). Most (92%) of the samples have a large offset against the shortest path, and the departure platform of the airport was chosen as a destination on the way. This indicates that there may be certain detours on these popular paths. These circuitous paths may be considered unattractive alternatives by route choice models. However, they are popular candidates in the context of map-matching algorithms.

Observation 4. Trajectories captured by a sensor pair will not pass the links monitored by other fixed sensors.

Example 4. As shown in Figure 7(a), the road link monitored by the fixed sensor $F_{20507403}$ has never been travelled by any vehicle captured by the sensor pair $(F_{20507301}, F_{20507302})$. The reason for this phenomenon is that if a vehicle had travelled on the link where $F_{20507403}$ is located, the pass information would have been recorded, and the sensor pair $(F_{20507301}, F_{20507302})$ would have been decomposed into two sensor pairs, namely, $(F_{20507301}, F_{20507403})$ and $(F_{20507403}, F_{20507302})$.
In this paper, we believe that historical GPS trajectories contain useful information about the composition of popular candidates. Because the candidate path does not necessarily conform to behavioural assumptions but must be realistic, we use a biased random walk algorithm, first proposed in [17], to generate the candidate set. It draws a candidate path through a succession of random turns. The pseudocode of the candidate set generation algorithm is presented in Algorithm 1. The key to this algorithm is how the probability of turning is defined. In contrast to the original random walk algorithm, we set the turning probability of the links where other fixed sensors are located to 0 to satisfy the rule contained in Observation 4. In other situations, the turning probability is calculated based on field-test probe sensor data rather than the shortest path assumption. In this manner, the candidate path with the destination described in Observation 3 can be generated. Based on the above analysis, the turning probability is defined as

$$\Pr(l_x, l_y) = \begin{cases} 0, & l_y \in \Phi_{FS} \text{ and } l_y \neq l_s, l_e, \\ \dfrac{N_{xy}}{\sum_{l_{y'} \in \Phi_x} N_{xy'}}, & \text{otherwise}, \end{cases} \tag{9}$$

where $\Phi_{FS}$ is the set of links monitored by the fixed sensors, $l_s$ is the start link where the origin fixed sensor is located, $l_e$ is the end link where the destination fixed sensor is located, $N_{xy}$ is the number of GPS trajectories traversing from link $l_x$ to $l_y$, and $\Phi_x$ is the set of outgoing links connected to link $l_x$.
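The turning probability and the biased random walk described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: links are plain string identifiers, `transition_counts` holds the number of GPS trajectories per link-to-link turn, the probabilities are renormalised over the permitted links so that they sum to one, and the `max_steps` cap and the fixed random seed are safeguards added here, not part of the description in the text.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

TurnProbs = Dict[str, Dict[str, float]]


def turning_probabilities(transition_counts: Dict[Tuple[str, str], int],
                          out_links: Dict[str, List[str]],
                          sensor_links: Set[str],
                          l_s: str, l_e: str) -> TurnProbs:
    """Turn probabilities from GPS counts; other sensors' links are blocked."""
    probs: TurnProbs = {}
    for l_x, successors in out_links.items():
        weights = {}
        for l_y in successors:
            blocked = l_y in sensor_links and l_y not in (l_s, l_e)
            weights[l_y] = 0.0 if blocked else float(transition_counts.get((l_x, l_y), 0))
        total = sum(weights.values())
        probs[l_x] = {l_y: (w / total if total > 0 else 0.0) for l_y, w in weights.items()}
    return probs


def random_walk_path(probs: TurnProbs, l_s: str, l_e: str,
                     rng: random.Random, max_steps: int = 1000) -> Optional[List[str]]:
    """Draw one path from l_s to l_e by a succession of random turns."""
    path, current = [l_s], l_s
    while current != l_e:
        options = [(l, p) for l, p in probs.get(current, {}).items() if p > 0]
        if not options or len(path) > max_steps:
            return None                   # dead end or runaway walk: discard the draw
        links, weights = zip(*options)
        current = rng.choices(links, weights=weights, k=1)[0]
        path.append(current)
    return path


def generate_candidates(probs: TurnProbs, l_s: str, l_e: str,
                        draws: int, seed: int = 0) -> Set[Tuple[str, ...]]:
    """Repeat the walk `draws` times and collect the distinct paths found."""
    rng = random.Random(seed)
    candidates: Set[Tuple[str, ...]] = set()
    for _ in range(draws):
        path = random_walk_path(probs, l_s, l_e, rng)
        if path is not None:
            candidates.add(tuple(path))   # set union removes duplicate paths
    return candidates
```

Because the turning probabilities depend only on historical probe data and on the fixed sensor locations, the candidate sets of different sensor pairs can be generated offline and independently of one another.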

Experimental Dataset.
To examine the proposed Fixed-MM algorithm, both fixed and probe sensor data were used together with a basic digital road network.

Road network: the shapefile of the road network in Shenzhen, China, was used [18]. The network graph contained 237,440 vertices and 215,771 road links. As shown in Figure 8, the road network covers a 40 × 50 km spatial area, with a total length of 21,985 km.

Fixed sensor dataset: a fixed sensor dataset generated by 715 cameras in Shenzhen from September 1 to October 31, 2016, was used. The transit information of vehicles was recorded, including license plate, timestamp, and detector ID.

Probe sensor dataset: we used a GPS trajectory dataset generated by 14,230 taxicabs during the same time range (from September 1 to October 31, 2016) as the probe sensor dataset. The GPS records include license plates, timestamps, and coordinates. The average sampling interval was 15 seconds per point.

Using identical license plate information, we can extract the probe and fixed sensor data of the same taxicab as observed samples to train and test our model.

In the implementation, we removed noncontinuous driving trips. The main reason is that the noncontinuous part of a sample trip contains great uncertainty and would increase the estimation error of Fixed-MM. Finally, 1,485,476 samples were extracted as a training dataset for estimation, while 156,192 samples were used as the testing dataset for evaluation. The estimation and evaluation of Fixed-MM are introduced in the following sections.

Model Estimation.
The coefficients of Fixed-MM reflect the sensitivity of the matching results to the variables. The values of the unknown parameters must be identified from the training dataset. In this study, we use the most widely used estimation procedure: the maximum likelihood technique [19].
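Maximum likelihood estimation for the utility-only logit model can be sketched in a few lines. The following is an illustration under stated assumptions, not the estimation procedure used in the paper: it omits the temporal constraint term, each observation is a hypothetical pair of (attribute rows of the candidate paths, index of the true path), and plain gradient ascent with a fixed step size stands in for whatever optimiser was actually used.

```python
import math
from typing import List, Sequence, Tuple

Observation = Tuple[List[List[float]], int]   # (attributes per candidate, true index)


def _probabilities(beta: Sequence[float], rows: List[List[float]]) -> List[float]:
    """Logit probabilities of the candidates for one observation."""
    v = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    v_max = max(v)
    exps = [math.exp(u - v_max) for u in v]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(beta: Sequence[float], observations: List[Observation]) -> float:
    """Sum over observations of the log probability of the true path."""
    return sum(math.log(_probabilities(beta, rows)[chosen])
               for rows, chosen in observations)


def fit(observations: List[Observation], n_params: int,
        lr: float = 0.5, iters: int = 500) -> List[float]:
    """Gradient ascent on the average log-likelihood (a stand-in optimiser)."""
    beta = [0.0] * n_params
    for _ in range(iters):
        grad = [0.0] * n_params
        for rows, chosen in observations:
            probs = _probabilities(beta, rows)
            for k in range(n_params):
                expected = sum(p * row[k] for p, row in zip(probs, rows))
                grad[k] += rows[chosen][k] - expected   # observed minus expected attribute
        beta = [b + lr * g / len(observations) for b, g in zip(beta, grad)]
    return beta
```

With one attribute such as travel time, a dataset in which the faster of two paths is the true path three times out of four yields a negative coefficient, which is the sign pattern expected for a cost-like attribute.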
Given the high number of sensor pairs, it is impossible to present detailed estimation results for each pair. Therefore, we only provide the detailed estimation results of the example sensor pair: (F 2010002 , F 1010403 ). The GPS trajectories of the samples between this sensor pair are shown in Figure 9 in Appendix B. The candidate path set generated by the algorithm proposed in this paper is illustrated in Figure 10.
Both the Fixed-MM model without temporal constraints (defined by equation (3)) and the Fixed-MM model with temporal constraints (defined by equation (8)) are estimated. The estimation results of the two models are presented in Table 3, and several findings can be obtained.
Finding 1: as expected, the estimated parameters of "free travel time" and "number of signal lights" have a negative sign and that of "proportion of expressway" has a positive sign in each case. The negative sign and t-statistic of $\beta_{NSL}$ and $\beta_{FTT}$ suggest that the longer the free travel time and the more signal lights a path has, the less likely it is to be matched. The positive sign and t-statistic of $\beta_{PE}$ imply that a path with a higher proportion of expressways will be more attractive to travellers.
Finding 2: the temporal constraint parameter, $\alpha$, is very large, which means that the correction term has a significant effect on the matching results.
Finding 3: when the temporal constraint term, $\alpha \ln q(t_{j,n} - t_{i,j})$, was considered, the Fixed-MM model with temporal constraints had a much lower loglikelihood. Thus, we can infer that it has a better model fit and is closer to the true model.

ALGORITHM 1: Candidate set generation algorithm.
Input: The road network RN and the link pair $(l_s, l_e)$, where $l_s$ and $l_e$ are the links where the origin fixed sensor $F_{id(j)}$ and destination fixed sensor $F_{id(j+1)}$ are located.
Output: The candidate set $\Phi_j$ for sensor pair $(F_{id(j)}, F_{id(j+1)})$.
Initialization
    Set the candidate set: $\Phi_j \leftarrow \emptyset$
    Set the size of the candidate set: DN
    Set the counter: $n \leftarrow 0$
Turning Probability
    For $l_x$ in road network RN:
        Calculate the turning probability $\Pr(l_x, l_y)$ based on equation (9).
Random Walk
    While $n < DN$ do
        $l_x \leftarrow l_s$
        $P \leftarrow [l_s]$
        While $l_x \neq l_e$ do
            Randomly select a next link $l_y$ based on the turning probability $\Pr(l_x, l_y)$
            Update the generated path: P.append($l_y$)
            Update the current link: $l_x \leftarrow l_y$
        End while
        $n \leftarrow n + 1$
        Update the candidate set: $\Phi_j \leftarrow \Phi_j \cup \{P\}$
    End while

Figure 10: (a-p) Generated candidate paths between the example sensor pair: (F 2010002 , F 10100403 ).

Model Evaluation.
In this section, we evaluate our algorithm on the testing dataset. Two classical Probe-MM algorithms are used as benchmarks, as follows:
HMM algorithm [7]: because the positions of the fixed sensors are known without noise, the measurement probability is set to 1 and only the transition probability is considered.
ST-Matching algorithm [12]: similar to the HMM-based algorithm, the observation probability in the spatial analysis of this method is set to 1 because of the accurate positions of the fixed sensors.
In this study, two indexes were used to express matching accuracy. One is the accuracy length ratio of paths (ALRP) index, defined as

$$\mathrm{ALRP} = \frac{\sum_{l_x \in P_{i,j}} \delta_x \, l_x.len}{P_{true}.len},$$

where $l_x.len$ is the length of link $l_x$ in the matched path $P_{i,j}$, $P_{true}.len$ is the total length of the true path, and $\delta_x = 1$ if $l_x$ is also in the true path and 0 otherwise. The other index is the accuracy number ratio of paths (ANRP) index, which is defined as

$$\mathrm{ANRP} = \frac{\sum_{l_x \in P_{i,j}} \delta_x}{N_x},$$

where $N_x$ is the total number of links in the true path $P_{true}$. Figures 11(a) and 11(b) show the ALRP and ANRP of the proposed Fixed-MM algorithm and the two classical Probe-MM algorithms with regard to the spatial gap between fixed sensors. Fixed-MM clearly outperforms both HMM and ST-Matching. Meanwhile, the performance of the two Probe-MM algorithms degrades sharply as the spatial gap increases, while Fixed-MM is more robust to changes in the spatial gap. The proposed Fixed-MM can correctly identify 68.38% of the links, even when the spatial gap between the sensor pair increases to 5 km.
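The two accuracy indexes are simple to compute once the matched and true paths are available as link sequences. The sketch below assumes links are identified by strings and that `lengths` maps each link to its length; the function names and the sample values in the usage note are illustrative only.

```python
from typing import Dict, Sequence


def alrp(matched: Sequence[str], true_path: Sequence[str],
         lengths: Dict[str, float]) -> float:
    """Length of correctly matched links divided by the length of the true path."""
    true_links = set(true_path)
    correct = sum(lengths[link] for link in matched if link in true_links)
    return correct / sum(lengths[link] for link in true_path)


def anrp(matched: Sequence[str], true_path: Sequence[str]) -> float:
    """Number of correctly matched links divided by the number of true-path links."""
    true_links = set(true_path)
    correct = sum(1 for link in matched if link in true_links)
    return correct / len(true_path)
```

For a matched path `["a", "b", "x"]` against a true path `["a", "b", "c"]` with lengths 100, 200, 100, and 50 for a, b, c, and x, two of the three true links are recovered, covering 300 of the 400 length units of the true path.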
Because the candidate generation process and the model training process can be conducted separately and in parallel, a comparison of the latency of the matching process is more meaningful for online applications. In this study, the average computation time for one point (ACTOP) was used to measure the computational latency of the map-matching algorithm.
As shown in Figure 12, the ACTOP of the two Probe-MM approaches increases dramatically as the spatial gap between the fixed sensors increases. Conversely, the ACTOP of Fixed-MM increases slowly. The reason can be traced to two factors. The HMM and ST-Matching algorithms assume that the position of the sensor is stochastic and noisy, so the candidate set must be regenerated for every sensor pair. This involves several shortest path computations between states at the previous and current time steps, which consume most of the computation time. Conversely, the candidate set generation of the proposed method can be run in parallel and does not increase the computation time because the projection of the fixed sensor data is known and fixed.

Conclusions
In this paper, we proposed a new map-matching algorithm called Fixed-MM to match vehicle trajectories recorded by fixed sensors onto a digital map. First, utility theory was employed to model the traveller's behaviour preference. Second, Fixed-MM was modified by adding a travel-time constraint term based on the observed and expected travel times. Moreover, a candidate path generation algorithm was designed for Fixed-MM. Fixed sensor data and probe sensor data were collected as the experimental dataset. Both Fixed-MM without a temporal constraint and Fixed-MM with a temporal constraint were estimated. The statistical results of the estimated parameters show that the path attributes correlate significantly with the true path and that Fixed-MM with the temporal constraint has a better model fit. The Fixed-MM algorithm was also compared with two classical Probe-MM algorithms in terms of matching accuracy and computational efficiency. Fixed-MM outperforms the two Probe-MM algorithms in both the number (ANRP) and length (ALRP) accuracy indexes. Meanwhile, Fixed-MM is more robust to changes in the spatial gap between fixed sensors. Fixed-MM also offers a large improvement in computing efficiency and exhibits potential for online applications. The experimental results demonstrate that the proposed Fixed-MM algorithm is both effective and efficient.
More research is needed in the future to determine the potential application value of Fixed-MM. Although the travel time and speed can also be estimated by the Probe-MM algorithm with probe sensor data, Fixed-MM provides a more diverse and credible estimation of travel time and speed. This is because the fixed sensor data cover almost all types of vehicles using the road network, while the probe sensor data can only be collected from one type of vehicle, for example, taxicabs. Meanwhile, with the application of Fixed-MM, more traffic information can be mined from the fixed sensor data. If all the observed trips of every fixed sensor can be matched to the road network, the traffic volumes of each path or link can be estimated, which is a key input for traffic planning and management. Thus, our next research focus is to utilise Fixed-MM to mine more reliable and accurate traffic state information from fixed sensor data. Moreover, since the fare gates in an AFC system are fixed, applying the proposed map-matching algorithm to learn the route choice behaviour of subway passengers [20,21] also has great practical value and is worthy of further study.