A Bayesian Network Model for Origin-Destination Matrices Estimation Using Prior and Some Observed Link Flows

Lin Cheng (http://orcid.org/0000-0002-6138-5130), Senlai Zhu (http://orcid.org/0000-0002-6230-1955), Zhaoming Chu, and Jingxu Cheng

School of Transportation, Southeast University, Nanjing 210096, China (seu.edu.cn)

Research Article. Discrete Dynamics in Nature and Society (DDNS), vol. 2014, Article ID 192470, Hindawi Publishing Corporation. ISSN 1607-887X, 1026-0226. doi:10.1155/2014/192470

Academic Editor: Wuhong Wang

Received 1 November 2013; Accepted 11 March 2014; Published 13 April 2014

Copyright © 2014 Lin Cheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper presents a Bayesian network model for estimating origin-destination (OD) matrices. Most existing Bayesian methods rely on a prior OD matrix, which is often difficult to obtain. Since transportation systems normally store large amounts of historical link flows, a Bayesian network model that uses these prior link flows is proposed. The estimates are then updated with a set of observed link flows. Under a normal distribution assumption, the proposed Bayesian network model considers the level of total traffic flow, the variability of link flows, and the violation of the traffic flow conservation law. The model provides both point estimates and the corresponding probability intervals. To solve the Bayesian network model, a specific procedure that avoids matrix inversion is proposed. Finally, a numerical example is given to illustrate the proposed Bayesian network method. The results show that the proposed method is accurate and practically applicable.

1. Introduction

Information about traffic demand, commonly expressed as origin-destination (OD) matrices, has traditionally been used by transportation planning agencies to evaluate the impact of various strategic transportation plans. Real-time OD matrices are also essential for real-time traffic applications, especially in intelligent transportation systems (ITS), such as real-time route guidance via dynamic traffic assignment or the evaluation of various ITS deployment alternatives [1, 2].

Various methods have been proposed to estimate OD matrices from aggregate data such as OD demand counts and/or a set of traffic counts observed on the links. Traffic counts are attractive because they are cheap, easy to collect, and immediately available. However, these data do not determine a unique OD matrix: in large-scale transportation networks the number of OD pairs is much larger than the number of links, so infinitely many solutions satisfy the conservation law.

To obtain a unique solution that is close to the actual one, more information is needed. Usually a prior OD matrix is used, for example an old, out-of-date matrix or a subjectively guessed one. The methods for estimating OD matrices can be classified as (1) least squares and generalized least squares methods, (2) entropy or information based methods [13, 14], and (3) statistical methods.

The most important advantage of the statistical methods is that they provide information about the variability of the traffic flow estimates. Other methods normally give only particular values of the OD and link flows, while statistical methods can also provide the corresponding probability intervals. The statistical methods can be categorized as follows. (1) Classical methods: the traffic flows are assumed to be multivariate random variables from some parametric family, such as Poisson, Gamma, or multivariate normal. The problem then reduces to estimating the parameters and becomes a standard statistical problem. (2) Bayesian methods: these methods also consider parametric families of distributions, but the parameters are themselves treated as random variables. Among the Bayesian methods, a Bayesian network makes the relationships among all the variables (link flows and OD flows) explicit and thereby simplifies the calculation.

The main difference between Bayesian methods and classical statistical methods is whether prior information (historical information or experience) is used. In the Bayesian methods, the prior distribution of some parameters or variables is determined from prior information. The prior is then updated with sample information (observed information) to derive the posterior distribution, which is the fundamental inferential tool of the Bayesian methods.

Generally, the quality of the prior information can affect the accuracy of the estimation when using the Bayesian methods. The prior information used by almost all existing Bayesian methods for estimating OD matrices is a prior OD matrix. However, it is difficult to guarantee the accuracy of the prior OD matrix, which is outdated or subjectively guessed. Moreover, it is even impossible to get a prior OD matrix in some cases, especially in a newly developed city.

In reality, large amounts of historical link flow data are usually stored in the databases of cities’ transportation systems. Compared with a prior OD matrix, prior (historical) link flows are more accurate, as they were obtained by traffic detectors or manual investigation. Therefore, in this paper, we propose a Bayesian network (BN) method that estimates OD matrices from prior link flows and a set of newly observed link flows. Based on the prior link flows, we derive the prior distribution of link flows and OD flows. Then, by updating with a set of observed link flows, we modify the means and reduce the variances of the remaining variables. These updated means and variances give the posterior distribution of all the variables, from which both the point estimates and the corresponding probability intervals can be provided.

Note that the level of total traffic flow varies randomly and deterministically in similar situations (vacation, peak hour, special weather conditions, etc.) [20, 24]. So the proposed BN model also considers the level of total traffic flow, which is very useful for many real-time traffic applications. In addition, the BN model also considers the variability of link flows and the violation of the conservation law.

The rest of the paper is organized as follows. Section 2 briefly introduces Bayesian networks and Gaussian Bayesian networks. Section 3 describes the proposed BN model for estimating OD matrices and its main assumptions. Section 4 proposes a specific procedure for estimating OD matrices with the Bayesian network model. Section 5 provides a numerical example to illustrate the proposed model and clarify some of its implementation details. Finally, Section 6 gives some conclusions.

2. Bayesian Network and Gaussian Bayesian Network

In this section, we briefly review the Bayesian network and Gaussian Bayesian network, which are the basic tools of this paper.

Definition 1 (Bayesian network).

A Bayesian network is a pair $(G, P)$, where $G$ is a directed acyclic graph (DAG) defined on a set of nodes $X$, $P = \{p(x_1 \mid \pi_1), \ldots, p(x_n \mid \pi_n)\}$ is a set of $n$ conditional probability densities (CPDs), and $\pi_i$ is the set of parents of node $X_i$ in $G$. The set $P$ defines the associated joint probability density (JPD) as

$$p(X) = \prod_{i=1}^{n} p(x_i \mid \pi_i). \tag{1}$$

The graph $G$ contains all the qualitative information about the relationships among the variables. As a supplement, the probabilities in $P$ quantify the qualitative information in graph $G$.

In Bayesian networks, the factorization of the JPD implied by (1) is normally very simple, and the conditional independence relations among variables can be inferred directly from the graph $G$, which makes evidence propagation easy. Due to these advantages, Bayesian network models have been used widely to solve a large variety of practical problems [25, 26].

Bayesian networks can be applied to many distributions. For the sake of illustration, we consider the important and particular case of Gaussian Bayesian networks, in which the traffic flows distribution is supposed to be a normal distribution. A normal distribution for traffic flows is reasonable, because these random variables are the sum of a great number of independent Bernoulli experiments in which the users decide where to travel and which routes to choose. In the literature, Gaussian Bayesian networks have been used frequently [24, 27].

Definition 2 (Gaussian Bayesian network).

A Bayesian network $(G, P)$ is said to be a Gaussian Bayesian network if and only if the joint probability distribution (JPD) associated with its variables $X$ is a multivariate normal distribution, $N(\mu, \Sigma)$, that is, with joint probability density function

$$f(X) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} (X - \mu)^{\top} \Sigma^{-1} (X - \mu) \right\}, \tag{2}$$

where $\mu$ is the mean vector, $\Sigma$ is the $n \times n$ covariance matrix, $|\Sigma|$ is the determinant of $\Sigma$, and $\mu^{\top}$ is the transpose of $\mu$.

The JPD of the variables in a Gaussian Bayesian network can be specified as in (1) by the product of a set of CPDs, each of which is a normal density:

$$f(x_i \mid \pi_i) \sim N\left( \mu_i + \sum_{j=1}^{i-1} \beta_{ij} (x_j - \mu_j),\ \psi_i^2 \right), \tag{3}$$

where $\beta_{ij}$ is the regression coefficient of $X_j$ in the regression of $X_i$ on its parents $\pi_i$.

The conditional variance of $X_i$ given its parents is

$$\psi_i^2 = \Sigma_i - \Sigma_{i\pi_i} \Sigma_{\pi_i}^{-1} \Sigma_{i\pi_i}^{\top}, \tag{4}$$

where $\Sigma_i$ is the unconditional variance of $X_i$, $\Sigma_{i\pi_i}$ is the covariance matrix between $X_i$ and the variables in $\pi_i$, and $\Sigma_{\pi_i}$ is the covariance matrix of $\pi_i$.

3. Proposed Bayesian Network Model

Given the advantages of Bayesian networks introduced in Section 2, in this section we propose a Bayesian network (BN) model to reproduce the probabilistic structure of link and OD flows.

3.1. Model Assumptions

Assuming we have some prior (historical) link flows, in order to give the prior distribution of the link flows, we make the following assumptions.

Assumption 3.

The link flows are given by

$$V = K U + \eta. \tag{5}$$

Assumption 4.

The variable $U$ is a normal random variable with mean $\mu_U$ and variance $\sigma_U^2$; it measures the level of total mean flow and reflects that traffic flows vary randomly and deterministically in similar situations (vacation, peak hour, special weather conditions, etc.). $K$ is a vector whose elements measure the relative weights of link flows with respect to the total traffic flow; $\eta$ is a vector of independent normal random variables with zero mean; and $\eta_a$ measures the discrepancy of the flow of link $a$ with respect to its mean.

Note that traffic flows vary randomly and deterministically in similar situations (vacation, peak hour, special weather conditions, etc.) [20, 24]. Assumption 3 takes this into account: the distribution of $U$ varies with the situation. The distribution of $U$ and the initial vector $K$ are determined from the prior link flows recorded in similar situations. The prior distribution of link flows then follows easily, as shown later. Assumption 4 is a normality assumption, which is also adopted by Maher, Hazelton, Castillo et al., and others.

To give the prior distribution of OD flows, we first consider the well-known conservation law equation:

$$V_a = \sum_i \left( \sum_k T_i\, p_{ik}\, \delta_{ak}^{i} \right) = \sum_i T_i \left( \sum_k p_{ik}\, \delta_{ak}^{i} \right), \tag{6}$$

where $T_i$ and $V_a$ are the flows of OD pair $i$ and link $a$, respectively. $\delta_{ak}^{i}$ is the incidence element; that is, it takes value 1 if link $a$ belongs to route $k$ of OD pair $i$ and 0 otherwise. $p_{ik}$ is the proportion of users from OD pair $i$ choosing route $k$. In this paper, the route choice proportions are defined by a logit model as follows:

$$p_{ik} = \frac{\exp(-\theta c_k^{i})}{\sum_{l \in R_i} \exp(-\theta c_l^{i})}, \tag{7}$$

where $\theta$ is a parameter measuring travelers’ sensitivity to the cost difference between routes, $c_k^{i}$ is the cost associated with route $k$ of OD pair $i$, and $R_i$ is the set of routes of OD pair $i$.

Equation (6) can be written as

$$V_a = \sum_i \left( \sum_k p_{ik}\, \delta_{ak}^{i} \right) T_i. \tag{8}$$

Set $d_{ai} = \sum_k p_{ik}\, \delta_{ak}^{i}$ to represent the proportion of users from OD pair $i$ choosing link $a$. Then (8) can be rewritten in matrix form as

$$V = D T. \tag{9}$$
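The logit proportions in (7) and the link-use proportions $d_{ai}$ can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the two route costs, and the three-link incidence pattern are hypothetical values chosen only for the example.

```python
import math

def logit_proportions(route_costs, theta):
    """Route choice proportions p_ik of one OD pair, following (7)."""
    # Subtracting the minimum cost leaves the ratios unchanged and
    # avoids underflow when costs are large.
    c_min = min(route_costs)
    weights = [math.exp(-theta * (c - c_min)) for c in route_costs]
    total = sum(weights)
    return [w / total for w in weights]

def link_proportions(p, delta):
    """d_ai = sum_k p_ik * delta_ak^i for one OD pair i.

    delta[a][k] is 1 if link a belongs to route k and 0 otherwise.
    """
    return [sum(p_k * d_ak for p_k, d_ak in zip(p, row)) for row in delta]

# Hypothetical OD pair with two routes costing 10 and 12, and theta = 1.0.
p = logit_proportions([10.0, 12.0], theta=1.0)

# Hypothetical incidence: link 0 lies on both routes, link 1 only on
# route 0, link 2 only on route 1.
delta = [[1, 1], [1, 0], [0, 1]]
d = link_proportions(p, delta)
print(p, d)
```

Stacking the vectors `d` of all OD pairs as columns gives the matrix $D$ in (9).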

Matrix $D$ is not necessarily invertible, because it is not necessarily square. So we premultiply (9) by $D^{\top}$:

$$D^{\top} V = D^{\top} D\, T. \tag{10}$$

If matrix $D^{\top} D$ is of full rank, it is invertible. Then, (10) can be written as

$$T = (D^{\top} D)^{-1} D^{\top} V. \tag{11}$$

Set $\beta = (D^{\top} D)^{-1} D^{\top}$. According to (11), we make the following assumption.

Assumption 5.

The OD flows are given by

$$T = \beta V + \varepsilon, \tag{12}$$

where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$ are mutually independent normal random variables with mean $E(\varepsilon_i)$ and variance $\psi_i^2$. The variables $\varepsilon$ represent OD flows apart from those using links of the considered network. Setting all the variables of $\varepsilon$ to be null or evaluating their values, the conservation law equation can be satisfied.
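Equations (9) to (12) can be checked numerically on a small case. The sketch below assumes a hypothetical matrix $D$ with three links and two OD pairs and sets $\varepsilon = 0$; the helper functions are illustrative and written in plain Python so that the example stays self-contained.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix; fails if it is not of full rank."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("D^T D is not of full rank")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical link-use proportions d_ai (3 links x 2 OD pairs).
D = [[1.0, 0.0],
     [0.6, 1.0],
     [0.4, 0.0]]
T_true = [[40.0], [80.0]]          # hypothetical OD flows (column vector)

V = matmul(D, T_true)              # conservation law (9): V = D T
Dt = transpose(D)
beta = matmul(inverse_2x2(matmul(Dt, D)), Dt)   # beta = (D^T D)^-1 D^T
T_rec = matmul(beta, V)            # (12) with epsilon = 0
print(T_rec)
```

Because $V$ was generated exactly from (9), `T_rec` recovers the assumed OD flows up to rounding; with noisy link flows the same $\beta$ gives the least squares solution.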

3.2. The Complete Model

Based on Definitions 1 and 2, in order to complete our BN model, we need to define an associated graph $G$. For example, consider the simple network shown on the left of Figure 1, which has 2 nodes, 2 links, and 1 OD pair 1-2. The right of Figure 1 shows the associated Bayesian network. The link flow node $v_a$ has as parents the corresponding nodes $U$ and $\eta_a$. The OD flow node $t_i$ has as parents the corresponding nodes $v_a$ and $\varepsilon_i$.

Figure 1: Bayesian network model.

Then, according to Assumptions 3 and 4, we get the variance-covariance matrix of $V$:

$$\Sigma_V = \begin{pmatrix} K & I \end{pmatrix} \Sigma_{(U,\eta)} \begin{pmatrix} K^{\top} \\ I \end{pmatrix} = \sigma_U^2 K K^{\top} + D_\eta, \tag{13}$$

where $\Sigma_{(U,\eta)}$ and $D_\eta$ are diagonal matrices. $D_\eta$ is the variance-covariance matrix of $\eta$.

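Based on Assumption 5, we get

$$\begin{pmatrix} V \\ T \end{pmatrix} = \begin{pmatrix} I & 0 \\ \beta & I \end{pmatrix} \begin{pmatrix} V \\ \varepsilon \end{pmatrix}. \tag{14}$$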

Then, the mean $E[(V, T)]$ is

$$E[(V, T)] = \begin{pmatrix} E(U) K \\ E(U) \beta K + E(\varepsilon) \end{pmatrix}. \tag{15}$$

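The variance-covariance matrix of $(V, T)$ is

$$\Sigma_{(V,T)} = \begin{pmatrix} \Sigma_V & \Sigma_V \beta^{\top} \\ \beta \Sigma_V & \beta \Sigma_V \beta^{\top} + D_\varepsilon \end{pmatrix}, \tag{16}$$

where $D_\varepsilon$ is the variance matrix of $\varepsilon$.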

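In summary, all random variables involved in our model are related by the linear expression

$$\begin{pmatrix} U \\ \eta \\ \varepsilon \\ V \\ T \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \\ K & I & 0 \\ \beta K & \beta & I \end{pmatrix} \begin{pmatrix} U \\ \eta \\ \varepsilon \end{pmatrix}. \tag{17}$$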

The mean $E[(U, \eta, \varepsilon, V, T)]$ is

$$\begin{pmatrix} E(U) \\ E(\eta) \\ E(\varepsilon) \\ E(V) \\ E(T) \end{pmatrix} = \begin{pmatrix} E(U) \\ 0 \\ E(\varepsilon) \\ E(U) K \\ E(U) \beta K + E(\varepsilon) \end{pmatrix}. \tag{18}$$

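The variance-covariance matrix $\Sigma_{(U, \eta, \varepsilon, V, T)}$ is

$$\Sigma = \begin{pmatrix}
\sigma_U^2 & 0 & 0 & \sigma_U^2 K^{\top} & \sigma_U^2 K^{\top} \beta^{\top} \\
0 & D_\eta & 0 & D_\eta & D_\eta \beta^{\top} \\
0 & 0 & D_\varepsilon & 0 & D_\varepsilon \\
\sigma_U^2 K & D_\eta & 0 & \sigma_U^2 K K^{\top} + D_\eta & (\sigma_U^2 K K^{\top} + D_\eta) \beta^{\top} \\
\sigma_U^2 \beta K & \beta D_\eta & D_\varepsilon & \beta (\sigma_U^2 K K^{\top} + D_\eta) & \beta (\sigma_U^2 K K^{\top} + D_\eta) \beta^{\top} + D_\varepsilon
\end{pmatrix}. \tag{19}$$

The lower-right block equals $\beta \Sigma_V \beta^{\top} + D_\varepsilon$, in agreement with (13) and (16).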

Then, the prior distribution (joint probability density function) of all the variables can be given as

$$f(V_1, V_2, \ldots, V_m, T_1, T_2, \ldots, T_n) = f_{N(\mu_V, \Sigma_V)}(V_1, V_2, \ldots, V_m) \prod_{i=1}^{n} p(T_i \mid \pi_i). \tag{20}$$

4. Estimating OD Matrices Using the Proposed BN Model

In this section, using the proposed BN model, we describe how to estimate the OD matrices when some new observed link flows are available.

Since we have obtained the prior distribution of all the variables, we can use the following equations to update the mean and the covariance matrix of the variables [23, 24] based on some observed variables. Note that one only needs to consider the unobserved variables conditioned on the observed variables and then update the expected values and covariances of the remaining variables. These equations are

$$\mu_{Y \mid Z = z} = \mu_Y + \Sigma_{YZ} \Sigma_{ZZ}^{-1} (z - \mu_Z), \tag{21}$$

$$\Sigma_{Y \mid Z = z} = \Sigma_{YY} - \Sigma_{YZ} \Sigma_{ZZ}^{-1} \Sigma_{ZY}, \tag{22}$$

$$\mu_{Z \mid Z = z} = z, \tag{23}$$

$$\Sigma_{Z \mid Z = z} = 0, \tag{24}$$

where $Y$ and $Z$ are the sets of unobserved and observed variables, respectively; $\mu_Y$ and $\Sigma_{YY}$ are the mean vector and covariance matrix of $Y$; $\mu_Z$ and $\Sigma_{ZZ}$ are the mean vector and covariance matrix of $Z$; and $\Sigma_{YZ}$ is the covariance matrix of $Y$ and $Z$.

Given a set of evidential nodes $Z$ whose values are known to be $Z = z$, (21) and (22) give the mean vector and covariance matrix of the unobserved nodes in $Y$, and thus the conditional distribution of $Y$. Equations (23) and (24) state that the expected values of the observed variables coincide with their observed values and that their variances and covariances are null. To simplify the calculation, we can use an incremental method, that is, update the evidence from $Z$ one variable at a time. No matrix inversion is then needed, because $\Sigma_{ZZ}$ degenerates to a scalar: $\Sigma_{YZ}$ is a column vector and $\Sigma_{ZZ}^{-1} = 1 / \sigma_{ZZ}$.
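The incremental update can be sketched as follows. The function below applies (21) to (24) for a single observed variable, so the only "inversion" is a scalar division. The three-variable mean vector and covariance matrix are hypothetical numbers used purely for illustration, and the function name is not taken from the paper.

```python
def condition_on_scalar(mu, cov, j, z):
    """Update mean and covariance after observing X_j = z.

    Implements (21)-(24) for one observed variable, so Sigma_ZZ is the
    scalar cov[j][j] and no matrix inversion is required.
    """
    n = len(mu)
    s_jj = cov[j][j]
    if s_jj <= 0.0:
        raise ValueError("variable j is already known (zero variance)")
    col = [cov[i][j] for i in range(n)]          # Sigma_YZ as a column
    new_mu = [mu[i] + col[i] * (z - mu[j]) / s_jj for i in range(n)]
    new_cov = [[cov[i][k] - col[i] * col[k] / s_jj for k in range(n)]
               for i in range(n)]
    # (23) and (24): the observed variable is fixed at z with zero variance.
    new_mu[j] = z
    for i in range(n):
        new_cov[i][j] = 0.0
        new_cov[j][i] = 0.0
    return new_mu, new_cov

# Hypothetical prior for three variables.
mu = [10.0, 20.0, 30.0]
cov = [[4.0, 2.0, 1.0],
       [2.0, 9.0, 3.0],
       [1.0, 3.0, 16.0]]

# Observe X_0 = 12 first, then X_1 = 19.
mu1, cov1 = condition_on_scalar(mu, cov, 0, 12.0)
mu2, cov2 = condition_on_scalar(mu1, cov1, 1, 19.0)
print(mu2[2], cov2[2][2])
```

Applying the two scalar updates in sequence gives the same mean and variance for the third variable as conditioning on both observations at once with the $2 \times 2$ matrix $\Sigma_{ZZ}$.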

To give the point estimation as well as the corresponding probability intervals, we can solve the following maximum posterior distribution problem, whose solution is normally the vector of conditional means:

$$\text{Maximize } f_{N(\mu_T, \Sigma_T)}(t_1, t_2, \ldots, t_n) \prod_{a=1}^{m} p(v_a \mid \pi_a) \Big|_{Z}, \tag{25}$$

where $Z$ is the set of the observed variables, including the observed link flows and/or OD flows.

In summary, the specific procedure for estimating OD matrices and those unobserved link flows is given as follows.

Step 0.

Initialize the model. According to Assumptions 3 and 4, based on prior (historical) link flows, we can determine the distribution of $U$ and the initial matrix $K$. Then we can obtain the initial link flows $V = (V_1, V_2, \ldots, V_m) = K E(U)$. The initial route choice proportion $p_{ik}$ is then calculated as follows:

$$h_a(V_a) = h_a^{o} \left[ 1 + \alpha_a \left( \frac{V_a}{C_a} \right)^{\beta_a} \right], \tag{26}$$

$$c_k^{i} = \sum_{a \in A} h_a(V_a)\, \delta_{ak}^{i}, \tag{27}$$

$$p_{ik} = \frac{\exp(-\theta c_k^{i})}{\sum_{l \in R_i} \exp(-\theta c_l^{i})}, \tag{28}$$

where (26) is the link cost function, in which $h_a^{o}$ is the cost associated with free flow conditions, $C_a$ is the link capacity, and $\alpha_a$ and $\beta_a$ are constants defining how the cost increases with traffic flow; (27) is the route cost function; and (28) calculates the route choice proportion defined in (7).

Step 1.

Solve the BN model. According to the model assumptions and using the initial route choice proportion matrix $P$, we obtain the prior distribution of the traffic flows (prior means and variances) from the following formulas:

$$\beta = \left[ (P\delta)^{\top} P\delta \right]^{-1} (P\delta)^{\top}, \tag{29}$$

$$E[V] = K E[U], \tag{30}$$

$$E[T] = \beta E[V] + E[\varepsilon], \tag{31}$$

$$D_\eta = \mathrm{Diag}\left( (\nu E[v_a])^2 \right), \tag{32}$$

$$\Sigma_{VV} = \sigma_U^2 K K^{\top} + D_\eta, \tag{33}$$

$$\Sigma_{VT} = \Sigma_{VV} \beta^{\top}, \tag{34}$$

$$\Sigma_{TV} = \Sigma_{VT}^{\top}, \tag{35}$$

$$\Sigma_{TT} = \beta \Sigma_{VV} \beta^{\top} + D_\varepsilon, \tag{36}$$

where (29) calculates the regression coefficient matrix given in (12). Equations (30) and (31) calculate the means of $V$ and $T$ given in (18). Equation (32) defines the diagonal variance matrix of $\eta$; that is, $\mathrm{Var}(\eta_a) = (E(v_a) \times \nu)^2$, where $\nu$ is the coefficient of variation. Equations (33) to (36) define the variance-covariance matrix in (19).

Step 2.

Update with the observed link flows, using the formulas

$$E[Y \mid Z = z] = E[Y] + \Sigma_{YZ} \Sigma_{ZZ}^{-1} (z - E[Z]), \tag{37}$$

$$\Sigma_{Y \mid Z = z} = \Sigma_{YY} - \Sigma_{YZ} \Sigma_{ZZ}^{-1} \Sigma_{ZY}, \tag{38}$$

$$E[Z \mid Z = z] = z, \tag{39}$$

$$\Sigma_{Z \mid Z = z} = 0, \tag{40}$$

$$T = E[Y \mid Z = z] \big|_{(Y, Z) = T}, \tag{41}$$

where (37) and (38) update the means and variance-covariance matrix of the unobserved variables, and $Y$ and $Z$ refer to the unobserved and observed components of $(T, V)$, respectively. Equations (39) and (40) state that the expected values of the observed variables are their observed values and their variances and covariances are zero, as given in (23) and (24). Equation (41) takes the conditional means as the point estimation for the OD and link flows, as the result of the maximum posterior distribution problem given in (25).

Step 3.

Calculate the new route choice proportions. Since link flows are updated in Step 2, the route choice proportions also need to be updated. Given that the matrix $T$ is obtained by (41), the new route choice proportion $p_{ik}^{*}$ is calculated using the following expressions:

$$V_a = \sum_{i \in I} T_i \sum_k p_{ik}\, \delta_{ak}^{i}, \tag{42}$$

$$h_a(V_a) = h_a^{o} \left[ 1 + \alpha_a \left( \frac{V_a}{C_a} \right)^{\beta_a} \right], \tag{43}$$

$$c_k^{i} = \sum_{a \in A} h_a(V_a)\, \delta_{ak}^{i}, \tag{44}$$

$$p_{ik}^{*} = \frac{\exp(-\theta c_k^{i})}{\sum_{l \in R_i} \exp(-\theta c_l^{i})}, \tag{45}$$

where (42) is the conservation law equation given in (6).

Step 4.

Test convergence. If $\sum_{i,k} (p_{ik} - p_{ik}^{*})^2 < \xi$, where $\xi$ is a small number to control convergence of the process, then stop the process and return the OD flow $T_i$, the link flow $V_a$, and the route choice proportion $p_{ik}^{*}$. Otherwise, continue with Step 5.

Step 5.

Update route choice proportions and the matrix $K$, using the expressions

$$p_{ik} = \rho\, p_{ik}^{*} + (1 - \rho)\, p_{ik} \quad \forall i, k, \tag{46}$$

$$K = \frac{V}{E(U)}, \tag{47}$$

and go to Step 1, where (46) updates the route choice proportion matrix, with relaxation factor $\rho$, $0 < \rho < 1$, and (47) updates the matrix $K$. The values of the variables $V$ are obtained by (42).
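The route-choice part of the procedure (the cost functions of Steps 0 and 3, the convergence test of Step 4, and the relaxation of Step 5) can be sketched as below. This is only an illustrative fragment under stated assumptions: the one-OD-pair network, its link flows, and the values of $\theta$, $\rho$, and $\xi$ are hypothetical, and the Bayesian update of Step 2 is not included.

```python
import math

def link_cost(v, h0, cap, alpha=0.15, beta=4.0):
    """Link cost function (26)/(43)."""
    return h0 * (1.0 + alpha * (v / cap) ** beta)

def route_costs(costs, routes):
    """Route costs (27)/(44); routes[k] lists the link indices of route k."""
    return [sum(costs[a] for a in links) for links in routes]

def logit(costs, theta):
    """Logit route choice proportions (28)/(45) for one OD pair."""
    c_min = min(costs)
    w = [math.exp(-theta * (c - c_min)) for c in costs]
    total = sum(w)
    return [x / total for x in w]

def squared_gap(p, p_star):
    """Left-hand side of the convergence test in Step 4."""
    return sum((a - b) ** 2 for a, b in zip(p, p_star))

def relax(p, p_star, rho):
    """Relaxation update (46) with 0 < rho < 1."""
    return [rho * b + (1.0 - rho) * a for a, b in zip(p, p_star)]

def update_K(V, mean_U):
    """Update (47): K = V / E(U)."""
    return [v / mean_U for v in V]

# Hypothetical data: one OD pair, two routes sharing link 0.
h0 = [7.0, 8.0, 9.0]
cap = [900.0, 700.0, 700.0]
routes = [[0, 1], [0, 2]]
V = [75.0, 45.0, 30.0]      # link flows, e.g. as returned by (42)
p = [0.5, 0.5]              # current route choice proportions
xi = 1e-6                   # hypothetical convergence threshold

costs = [link_cost(v, h, c) for v, h, c in zip(V, h0, cap)]
p_star = logit(route_costs(costs, routes), theta=1.0)
gap = squared_gap(p, p_star)
if gap >= xi:               # not converged: apply Step 5
    p = relax(p, p_star, rho=0.5)
K = update_K(V, mean_U=50.0)
print(gap, p, K)
```

A smaller relaxation factor $\rho$ changes the proportions more cautiously between iterations, at the cost of more passes through Steps 1 to 5.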

5. Example: The Nguyen-Dupuis Network

In this section, we illustrate the proposed methods using the well-known Nguyen-Dupuis network, shown in Figure 2. It consists of 13 nodes, 19 links, and 4 OD pairs: 1-2, 1-3, 4-2, and 4-3.

Figure 2: The Nguyen-Dupuis network.

The network data are shown in Table 1, and the associated parameters in (26) are assumed to be $\alpha_a = 0.15$ and $\beta_a = 4$ for every link.

Table 1: Network parameters of the Nguyen-Dupuis network ($h_a^{o}$ is the free-flow cost and $C_a$ the capacity of link $a$).

Link h_a^o C_a
1 7 900
2 8 700
3 9 700
4 14 900
5 5 800
6 9 600
7 5 900
8 13 500
9 5 300
10 9 400
11 10 700
12 10 700
13 9 600
14 8 700
15 9 700
16 8 700
17 7 300
18 15 700
19 11 700

The assumed true OD matrices, which are used later for testing the quality of the estimation, are shown in Table 4 under the heading “True flow.” The true link flows are obtained by solving the multinomial logit assignment model with parameter $\theta = 1.0$ for the stochastic loading.

The prior information is assumed as follows: the expected value of the level of total traffic flow $U$ and its standard deviation are $E(U) = 50$ and $\sigma(U) = 10$, respectively, and the initial matrix $K$ is

$$K = (1.5, 0.9, 0.9, 0.7, 1.7, 0.8, 1.8, 0.1, 0.8, 1.0, 1.5, 0.3, 1.2, 0.3, 0.5, 0.8, 0.1, 0.7, 1.2)^{\top}. \tag{48}$$

The observed link flows are assumed to be $V_5 = 82.57$, $V_7 = 87.38$, $V_{10} = 48.07$, $V_{13} = 58.66$, and $V_{18} = 37.12$, and they are supposed to become known in this order. Since they are observed, their values are equal to the true link flows (as shown in Table 4).

Step 0-Step 1. Initialize and give the prior distribution of all the variables.

Based on the prior information, we can get the prior distribution of the traffic flows. The prior means and variances are shown in the second columns (headed 0) of Tables 2 and 3, respectively. To simplify the calculation, in this example, the expectation and variance-covariance of $\varepsilon$ are assumed to be null (i.e., there is no uncertainty in the conservation law). In addition, to obtain the variance-covariance matrix $D_\eta$, we have selected $\nu = 0.1$ in (32).
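The prior link-flow moments follow directly from (30), (32), and (33) with the values given above. The short sketch below is only an illustration, with variable names chosen for this example; it reproduces the link-flow entries of the columns headed 0 in Tables 2 and 3 (for instance, mean 75.00 and variance 281.25 for link 1).

```python
# Prior information from the example: K from (48), E(U) = 50,
# sigma(U) = 10, and coefficient of variation nu = 0.1.
K = [1.5, 0.9, 0.9, 0.7, 1.7, 0.8, 1.8, 0.1, 0.8, 1.0,
     1.5, 0.3, 1.2, 0.3, 0.5, 0.8, 0.1, 0.7, 1.2]
mean_U, sd_U, nu = 50.0, 10.0, 0.1

# (30): prior means of the link flows.
prior_mean = [k * mean_U for k in K]

# (32): Var(eta_a) = (nu * E[v_a])^2 on the diagonal of D_eta.
var_eta = [(nu * m) ** 2 for m in prior_mean]

# (33): Sigma_VV = sigma_U^2 K K^T + D_eta.
m_links = len(K)
cov_V = [[sd_U ** 2 * K[a] * K[b] + (var_eta[a] if a == b else 0.0)
          for b in range(m_links)]
         for a in range(m_links)]
prior_var = [cov_V[a][a] for a in range(m_links)]

for a in (0, 4, 7):
    print(f"V{a + 1}: mean {prior_mean[a]:.2f}, variance {prior_var[a]:.2f}")
```

The off-diagonal entries of `cov_V` are all positive because every link shares the common factor $U$; this is what lets one observed link flow update the estimates of the others.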

Table 2: Point estimation of traffic flows after updating observed link flows one by one. Column 0 holds the prior means; each later column is headed by the index of the link whose observed flow has just been added.

Item    0       5       7       10      13      18
T1      37.16   39.02   39.05   39.86   40.11   39.88
T2      82.88   77.95   78.32   78.63   80.51   81.22
T3      68.37   59.64   59.94   59.23   60.85   59.95
T4      12.68   22.48   22.29   23.28   19.54   19.06
V1      75.00   74.64   74.92   75.26   76.98   77.55
V2      45.00   42.36   42.48   43.28   43.69   43.61
V3      45.00   46.30   46.32   45.73   46.22   46.03
V4      35.00   35.82   35.92   36.78   34.14   32.80
V5      85.00   82.57   82.57   82.57   82.57   82.57
V6      40.00   38.37   38.62   38.37   40.69   41.17
V7      90.00   87.24   87.38   87.38   87.38   87.38
V8      5.00    1.43    1.45    1.55    1.51    1.54
V9      40.00   39.38   39.40   39.31   39.31   39.31
V10     50.00   47.86   47.92   48.07   48.07   48.07
V11     75.00   75.64   75.73   76.24   76.58   76.43
V12     15.00   15.14   15.42   15.44   16.20   15.43
V13     60.00   59.05   59.12   59.71   58.66   58.66
V14     15.00   16.57   16.87   16.99   17.71   16.84
V15     25.00   23.06   23.31   22.89   24.42   23.57
V16     40.00   41.37   41.49   42.22   41.34   41.34
V17     5.00    6.10    6.16    6.29    6.40    6.51
V18     35.00   36.26   36.32   36.40   37.29   37.12
V19     60.00   59.05   59.12   59.71   58.66   58.66

Table 3: Variances of traffic flows after updating observed link flows one by one. Column 0 holds the prior variances; each later column is headed by the index of the link whose observed flow has just been added.

Item    0        5        7       10      13      18
T1      65.81    25.98    20.88   16.94   14.35   2.93
T2      294.56   54.98    29.91   20.46   15.66   15.66
T3      203.36   22.72    7.59    5.41    5.40    3.30
T4      18.56    16.98    15.59   14.10   3.24    3.16
V1      281.25   100.33   81.13   69.99   67.48   65.57
V2      101.25   32.31    26.08   23.15   21.73   20.73
V3      101.25   38.61    31.01   25.85   24.33   23.10
V4      61.25    23.11    18.65   16.72   13.27   11.73
V5      361.25   0        0       0       0       0
V6      80.00    26.51    21.55   18.19   18.85   18.48
V7      405.00   137.07   0       0       0       0
V8      1.25     0.04     0.03    0.03    0.03    0.03
V9      80.00    27.93    22.44   0       0       0
V10     125.00   41.25    33.19   0       0       0
V11     281.25   103.05   82.88   71.83   66.79   0
V12     11.25    4.13     3.44    2.95    2.99    2.60
V13     180.00   62.81    50.52   44.05   0       0
V14     11.25    4.94     4.11    3.57    3.57    0
V15     31.25    9.58     7.85    6.47    6.79    0
V16     80.00    30.82    24.88   22.03   0       0
V17     1.25     0.67     0.55    0.50    0.47    0.46
V18     61.25    23.68    19.07   16.91   15.83   0
V19     180.00   62.81    50.52   44.05   0       0

Table 4: The true flow and the point estimation from the proposed method.

Item    True flow   Proposed method   Relative error (%)
T1      40          39.88             0.30
T2      80          81.22             1.53
T3      60          59.95             0.08
T4      20          19.06             4.70
V1      76.64       77.55             1.19
V2      43.36       43.61             0.58
V3      46.13       46.03             0.22
V4      33.87       32.80             3.16
V5      82.57       82.57             0
V6      40.21       41.17             2.39
V7      87.38       87.38             0
V8      1.42        1.54              8.45
V9      39.31       39.31             0
V10     48.07       48.07             0
V11     76.43       76.43             0
V12     15.41       15.43             0.13
V13     58.66       58.66             0
V14     16.84       16.84             0
V15     23.57       23.57             0
V16     41.34       41.34             0
V17     6.24        6.51              4.33
V18     37.12       37.12             0
V19     58.66       58.66             0

Step 2–Step 5. Give the posterior distribution by updating the observed link flows one by one.

Table 2 shows how the means of the traffic flows change as the observed link flows are added one by one. After each update, the point estimation of the link flows $\{V_1, V_2, \ldots, V_{19}\}$ and the OD flows $\{T_1, T_2, \ldots, T_4\}$ is provided. Once $V_7$ and $V_{10}$ are known and updated, the point estimation of $V_9$ in Table 2 remains unchanged and its variance in Table 3 becomes zero (boldfaced in the table), because flow conservation at node 6 determines $V_9$ from $V_7$ and $V_{10}$. Similarly, $V_{19}$ becomes known once $V_{13}$ is given; $V_{16}$ becomes known once $V_{19}$ is given; $V_{11}$ becomes known once $V_9$ and $V_{18}$ are given; $V_{15}$ becomes known once $V_{11}$ is given; and $V_{14}$ becomes known once $V_{10}$, $V_{15}$, and $V_{16}$ are given. In other words, due to the conservation law, the link flows become known in this order: $V_5 = 82.57$, $V_7 = 87.38$, $V_{10} = 48.07$, $V_9 = 39.31$, $V_{13} = 58.66$, $V_{19} = 58.66$, $V_{16} = 41.34$, $V_{18} = 37.12$, $V_{11} = 76.43$, $V_{15} = 23.57$, and $V_{14} = 16.84$.

Table 3 shows how the variances of the traffic flows change as the observed link flows are added one by one. Once some link flows are known (the observed link flows and those derived from them through the conservation law), their means remain constant and their variances become null (boldfaced in the tables). The variances of the unknown variables (the OD flows and the unobserved link flows) normally decrease with each update. The smaller the variances, the higher the precision of the estimation, so the estimation becomes more precise after a series of updates. This suggests a method to determine how many links and which links need to be observed when estimating traffic flows with the Bayesian network model, that is, the network sensor location problem (NLSP). Note that the variance updating equation (22) does not depend on the values of the observed link flows, so the NLSP can be solved without observing any link. First, the Bayesian network model gives the prior distribution of all the variables shown in (19). Next, the link whose observation reduces the variance of the OD flows the most is taken as the first observed link. Then we update the variances of the traffic flows, determine the second observed link, and iterate until the variances meet the required estimation precision or the budget constraint is reached.
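The greedy selection described above can be sketched as follows. Because (22) involves only covariances, the code never needs observed values. The function names, the stopping rule based on a maximum number of sensors, and the three-variable covariance matrix (one OD flow and two candidate links) are assumptions made for this illustration and are not taken from the paper.

```python
def observe(cov, j):
    """Covariance matrix after observing variable j, using (22) and (24)
    with the scalar Sigma_ZZ = cov[j][j]."""
    n = len(cov)
    s = cov[j][j]
    new = [[cov[i][k] - cov[i][j] * cov[j][k] / s for k in range(n)]
           for i in range(n)]
    for i in range(n):
        new[i][j] = 0.0
        new[j][i] = 0.0
    return new

def variance_reduction(cov, targets, a, tol=1e-12):
    """Total variance removed from the target variables by observing a."""
    s = cov[a][a]
    if s <= tol:
        return 0.0                      # link a is already determined
    return sum(cov[t][a] ** 2 for t in targets) / s

def greedy_sensor_order(cov, targets, candidates, max_sensors, tol=1e-12):
    """Pick links one by one, each time taking the candidate that reduces
    the summed variance of the target (OD flow) variables the most."""
    cov = [row[:] for row in cov]
    remaining = list(candidates)
    chosen = []
    while remaining and len(chosen) < max_sensors:
        best = max(remaining,
                   key=lambda a: variance_reduction(cov, targets, a, tol))
        if variance_reduction(cov, targets, best, tol) <= tol:
            break                       # no candidate is informative any more
        cov = observe(cov, best)
        chosen.append(best)
        remaining.remove(best)
    return chosen, [cov[t][t] for t in targets]

# Hypothetical prior covariance of (T, V_a, V_b); index 0 is the OD flow.
cov = [[10.0, 6.0, 2.0],
       [6.0, 4.0, 1.0],
       [2.0, 1.0, 5.0]]
order, final_var = greedy_sensor_order(cov, targets=[0],
                                       candidates=[1, 2], max_sensors=2)
print(order, final_var)
```

In a budgeted setting, `max_sensors` plays the role of the budget constraint; a precision requirement could be added by stopping as soon as every entry of the returned target variances falls below a chosen threshold.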

By the updated means (point estimation) and variances, we can obtain the posterior distribution of OD flows and those unobserved link flows. Figure 3 illustrates how the marginal densities of OD flows and those unobserved link flows evolve from their initial form to their final form (boldfaced) by updating the observed link flows one by one. It can be seen that the variances of the unknown variables are normally decreasing with each update.

Figure 3: Conditional distributions of the OD flows and the unobserved link flows.

In summary, according to Tables 2 and 3 and Figure 3, once some variables are observed, their means remain constant and their variances become null. For the remaining (unobserved) variables, the variances normally decrease with each update. The proposed method can provide a control of the conservation law as well. The final forms (boldfaced) in Figure 3 are the posterior densities of the OD flows and the unobserved link flows. These posterior densities supply complete statistical information about the unknown variables, from which we can provide the point estimation as well as the corresponding probability intervals.
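As an illustration of how probability intervals follow from the posterior normal densities, the sketch below takes the final means and variances of the OD flows from the last columns of Tables 2 and 3 and computes central intervals. The 95% level and the function name are assumptions chosen for this example; the paper does not state a level.

```python
from statistics import NormalDist

def probability_interval(mean, variance, level=0.95):
    """Central probability interval of a normal posterior density."""
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# Posterior (mean, variance) of the OD flows: column 18 of Tables 2 and 3.
posterior = {"T1": (39.88, 2.93), "T2": (81.22, 15.66),
             "T3": (59.95, 3.30), "T4": (19.06, 3.16)}
# True OD flows from Table 4, used only to check coverage.
true_flow = {"T1": 40.0, "T2": 80.0, "T3": 60.0, "T4": 20.0}

intervals = {name: probability_interval(m, v)
             for name, (m, v) in posterior.items()}
for name, (low, high) in intervals.items():
    covered = low <= true_flow[name] <= high
    print(f"{name}: [{low:.2f}, {high:.2f}] contains true flow: {covered}")
```

For the values in this example, each assumed true OD flow lies inside the corresponding 95% interval.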

In order to test the quality of the estimation, Table 4 compares the true flow and the point estimation of the proposed BN model. Because $V_5$, $V_7$, $V_9$, $V_{10}$, $V_{11}$, $V_{13}$, $V_{14}$, $V_{15}$, $V_{16}$, $V_{18}$, and $V_{19}$ are observed or follow from the observed flows through the conservation law, their values are equal to the true flow. For the OD pairs and the unobserved links (boldfaced in the table), the estimation and the true flow are close: the maximum relative error of the OD flow estimation is only 4.70%, and the largest relative error among the unobserved link flows is 8.45%, for $V_8$, whose true flow is only 1.42. This illustrates that the proposed BN model has a high accuracy.

6. Conclusions

In this paper, we use a Bayesian network model to estimate origin-destination matrices based on prior link flows and a set of observed link flows. Normally, large amounts of historical link flows are stored in the cities’ transportation system. Compared with an outdated or subjectively guessed prior OD matrix, prior link flows are more accurate as they are obtained by traffic detectors or manual investigation. The proposed Bayesian network model can make use of these historical link flows and also consider the level of total traffic flow, which is really useful for many real-time traffic applications, especially in the ITS.

Updating the Bayesian network model with the observed variables (the observed link flows and those derived from them through the conservation law) modifies the means and reduces the variances of the remaining variables. These updated means and variances give the posterior distribution of the unobserved variables conditional on the observed ones. Thus, the method provides not only point estimation but also the corresponding probability intervals. In addition, an incremental procedure is developed for solving the Bayesian network model without the intensive computation of matrix inversion, which makes the model easy to apply to large-scale networks.

Moreover, in this paper, a normal distribution for traffic flows is assumed. It is reasonable because these random variables are the sum of a great number of independent Bernoulli experiments in which the users decide where to travel and which routes to choose. For future research, it is worthwhile to relax the normal distribution assumption.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (no. 51178110 and no. 51378119) and Graduate Innovation Project of Jiangsu Province (no. CXZZ12_0113). Comments provided by anonymous referees are much appreciated.

References

[1] Ben-Akiva M., Bierlaire M., Bottom J., Koutsopoulos H., Mishalani R. Development of a route guidance generation system for real-time application. Proceedings of the IFAC Transportation Systems Conference, Beijing, China, 1997.
[2] Mahmassani H. S. Dynamic network traffic assignment and simulation methodology for advanced system management applications. Networks and Spatial Economics, 2001, 1(3-4), 267–292. doi:10.1023/A:1012831808926
[3] Bell M. G. The estimation of origin–destination matrix from traffic counts. Transportation Science, 1983, 17(2), 198–217.
[4] Cascetta E., Nguyen S. A unified framework for estimating or updating origin/destination matrices from traffic counts. Transportation Research B: Methodological, 1988, 22(6), 437–455.
[5] Yang H. Heuristic algorithms for the bilevel origin-destination matrix estimation problem. Transportation Research B: Methodological, 1995, 29(4), 231–242.
[6] Doblas J., Benitez F. G. An approach to estimating and updating origin-destination matrices based upon traffic counts preserving the prior structure of a survey matrix. Transportation Research B: Methodological, 2005, 39(7), 565–591. doi:10.1016/j.trb.2004.06.006
[7] Nie Y., Zhang H. M. A variational inequality formulation for inferring dynamic origin-destination travel demands. Transportation Research B: Methodological, 2008, 42(7-8), 635–662. doi:10.1016/j.trb.2008.01.001
[8] Nie Y. M., Zhang H. M. A relaxation approach for estimating origin-destination trip tables. Networks and Spatial Economics, 2010, 10(1), 147–172. doi:10.1007/s11067-007-9059-y
[9] Shen W., Wynter L. A new one-level convex optimization approach for estimating origin-destination demand. Transportation Research B: Methodological, 2012, 46(10), 1535–1555. doi:10.1016/j.trb.2012.07.005
[10] Carey M., Revelli R. Constrained estimation of direct demand functions and trip matrices. Transportation Science, 1986, 20(3), 143–152. doi:10.1287/trsc.20.3.143
[11] Bell M. G. H. The estimation of origin-destination matrices by constrained generalised least squares. Transportation Research B: Methodological, 1991, 25(1), 13–22.
[12] Parry K., Hazelton M. L. Estimation of origin-destination matrices from link counts and sporadic routing data. Transportation Research B: Methodological, 2012, 46(1), 175–188. doi:10.1016/j.trb.2011.09.009
[13] van Zuylen H. J., Willumsen L. G. The most likely trip matrix estimated from traffic counts. Transportation Research B: Methodological, 1980, 14(3), 281–293.
[14] Xie C., Kockelman K. M., Waller S. T. A maximum entropy-least squares estimator for elastic origin-destination trip matrix estimation. Transportation Research B: Methodological, 2011, 45(9), 1465–1482. doi:10.1016/j.trb.2011.05.018
[15] Lo H. P., Zhang N., Lam W. H. K. Estimation of an origin-destination matrix with random link choice proportions: a statistical approach. Transportation Research B: Methodological, 1996, 30(4), 309–324. doi:10.1016/0191-2615(95)00036-4
[16] Vardi Y. Network tomography: estimating source-destination traffic intensities from link data. Journal of the American Statistical Association, 1996, 91(433), 365–377.
[17] Hazelton M. L. Estimation of origin-destination matrices from link flows on uncongested networks. Transportation Research B: Methodological, 2000, 34(7), 549–566. doi:10.1016/S0191-2615(99)00037-5
[18] Maher M. J. Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transportation Research B: Methodological, 1983, 17(6), 435–447.
[19] Li B. Bayesian inference for origin-destination matrices of transport networks using the EM algorithm. Technometrics, 2005, 47(4), 399–408. doi:10.1198/004017005000000283
[20] Hazelton M. L. Statistical inference for time varying origin-destination matrices. Transportation Research B: Methodological, 2008, 42(6), 542–552. doi:10.1016/j.trb.2007.11.003
[21] Hazelton M. L. Statistical inference for transit system origin-destination matrices. Technometrics, 2010, 52(2), 221–230. doi:10.1198/TECH.2010.09021
[22] Tebaldi C., West M. Bayesian inference on network traffic using link count data. Journal of the American Statistical Association, 1998, 93(442), 557–573.
[23] Sun S., Zhang C., Yu G. A Bayesian network approach to traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 2006, 7(1), 124–133. doi:10.1109/TITS.2006.869623
[24] Castillo E., Menéndez J. M., Sánchez-Cambronero S. Predicting traffic flow using Bayesian networks. Transportation Research B: Methodological, 2008, 42(5), 482–509. doi:10.1016/j.trb.2007.10.003
[25] Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufman, San Mateo, Calif, USA, 1988.
[26] Jensen F. V. Bayesian Networks and Decision Graphs. Springer, 2001.
[27] Shachter R., Kenley C. Gaussian influence diagrams. Management Science, 1989, 35(5), 527–550. doi:10.1287/mnsc.35.5.527
[28] Zhu S. L., Cheng L., Chu Z. M., Anthony C., Chen J. X. Identification of network sensor locations for traffic flow estimation. Accepted by Journal of the Transportation Research Board.