This paper presents a Bayesian network model for estimating origin-destination (OD) matrices. Most existing Bayesian methods rely on a prior OD matrix, which is often difficult to obtain. Since transportation systems normally store large amounts of historical link flows, a Bayesian network model that uses these prior link flows is proposed, and its estimates are updated from a set of observed link flows. Under a normal distribution assumption, the proposed model accounts for the level of total traffic flow, the variability of link flows, and violations of the traffic flow conservation law. The model provides both point estimates and the corresponding probability intervals. To solve the model, a specific procedure that avoids matrix inversion is proposed. Finally, a numerical example illustrates the method. The results show that the proposed method is accurate and practically applicable.
1. Introduction
Information about traffic demand, usually expressed as origin-destination (OD) matrices, has traditionally been used by transportation planning agencies to evaluate the impact of strategic transportation plans. Real-time OD matrices are also essential for real-time traffic applications, especially in intelligent transportation systems (ITS), such as real-time route guidance via dynamic traffic assignment or the evaluation of ITS deployment alternatives [1, 2].
Various methods have been proposed to estimate OD matrices from aggregate data such as OD demand counts and/or traffic counts observed on the links. Traffic counts are attractive because they are cheap, easy to collect, and immediately available. However, traffic counts alone do not determine a unique OD matrix: in large-scale transportation networks the number of OD pairs is much larger than the number of links, so infinitely many OD matrices satisfy the conservation law.
To obtain a unique solution that is close to the actual one, more information is needed. Normally, a prior OD matrix is used, for example an old, out-of-date matrix or a subjectively guessed one. Methods for estimating OD matrices can be classified as (1) least squares [3–9] and generalized least squares [10–12] methods, (2) entropy or information based methods [13, 14], and (3) statistical methods.
The most important advantage of the statistical methods is that they provide information on the variability of the traffic flow estimates. Other methods normally give only particular values of the OD and link flows, whereas statistical methods can also provide the corresponding probability intervals. The statistical methods can be categorized as follows. (1) Classical methods [15–17]: the traffic flows are assumed to be multivariate random variables from some parametric family, such as Poisson, Gamma, or multivariate normal. The problem then reduces to estimating the parameters and becomes a standard statistical problem. (2) Bayesian methods [18–21]: these methods also consider parametric families of distributions, but the parameters are themselves treated as random variables. In particular, among the Bayesian methods, a Bayesian network [22–24] makes the relationships among all the variables (link flows and OD flows) explicit and thereby simplifies the calculation.
The main difference between Bayesian methods and classical statistical methods is whether prior information (historical information or experience) is used. In the Bayesian methods, the prior distribution of some parameters or variables is determined from prior information. Updating it with sample (observed) information then yields the posterior distribution, which is the fundamental inferential tool of the Bayesian methods.
Generally, the quality of the prior information can affect the accuracy of the estimation when using the Bayesian methods. The prior information used by almost all existing Bayesian methods for estimating OD matrices is a prior OD matrix. However, it is difficult to guarantee the accuracy of the prior OD matrix, which is outdated or subjectively guessed. Moreover, it is even impossible to get a prior OD matrix in some cases, especially in a newly developed city.
In reality, large amounts of historical link flow data are usually stored in the databases of cities’ transportation systems. Compared with a prior OD matrix, prior (historical) link flows are more accurate because they were obtained by traffic detectors or manual surveys. Therefore, in this paper we propose a Bayesian network (BN) method that estimates OD matrices from prior link flows and a set of newly observed link flows. From the prior link flows, we derive the prior distribution of the link flows and OD flows. Then, by updating with a set of observed link flows, we modify the means and reduce the variances of the remaining variables. These updated means and variances give the posterior distribution of all the variables, from which both point estimates and the corresponding probability intervals can be obtained.
Note that the level of total traffic flow varies randomly and deterministically in similar situations (vacation, peak hour, special weather conditions, etc.) [20, 24]. So the proposed BN model also considers the level of total traffic flow, which is very useful for many real-time traffic applications. In addition, the BN model also considers the variability of link flows and the violation of the conservation law.
The rest of the paper is organized as follows. Section 2 briefly introduces Bayesian network and Gaussian Bayesian networks. In Section 3, the proposed BN model for estimating OD matrices and its main assumptions are described. In Section 4, using the Bayesian network model, a specific procedure for estimating OD matrices is proposed. In Section 5, a numerical example is provided to illustrate the proposed model and clarify some of its implementation details. Finally, some conclusions are provided in Section 6.
2. Bayesian Network and Gaussian Bayesian Network
In this section, we briefly review the Bayesian network and Gaussian Bayesian network, which are the basic tools of this paper.
Definition 1 (Bayesian network).
A Bayesian network is a pair $(G, P)$, where $G$ is a directed acyclic graph (DAG) defined on a set of nodes $X$, $P = \{p(x_1 \mid \pi_1), \ldots, p(x_n \mid \pi_n)\}$ is a set of $n$ conditional probability densities (CPDs), and $\pi_i$ is the set of parents of node $X_i$ in $G$. The set $P$ defines the associated joint probability density (JPD) as
$$p(X) = \prod_{i=1}^{n} p(x_i \mid \pi_i). \tag{1}$$
The graph G contains all the qualitative information about the relationships among the variables. As a supplement, the probabilities in P quantify the qualitative information in graph G.
In Bayesian networks, the factorization of JPD implied by (1) is normally very simple and the conditional independence relations among variables can be inferred directly from the graph G, which makes the evidence propagation easy. Due to these advantages, Bayesian network models have been used widely to solve a large variety of practical problems [25, 26].
Bayesian networks can be applied to many distributions. For the sake of illustration, we consider the important particular case of Gaussian Bayesian networks, in which the traffic flows are assumed to be normally distributed. This assumption is reasonable because these random variables are sums of a great number of independent Bernoulli experiments in which the users decide where to travel and which routes to choose. Gaussian Bayesian networks have been used frequently in the literature [24, 27].
Definition 2 (Gaussian Bayesian network).
A Bayesian network $(G, P)$ is said to be a Gaussian Bayesian network if and only if the joint probability distribution (JPD) associated with its variables $X$ is a multivariate normal distribution $N(\mu, \Sigma)$, that is, with joint probability density function
$$f(X) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2}(X-\mu)^{\top} \Sigma^{-1} (X-\mu)\right\}, \tag{2}$$
where $\mu$ is the mean vector, $\Sigma$ is the $n \times n$ covariance matrix, $|\Sigma|$ is the determinant of $\Sigma$, and $(X-\mu)^{\top}$ is the transpose of $X-\mu$.
The JPD of the variables in a Gaussian Bayesian network can be specified as in (1) by the product of a set of CPDs, each of which is a normal density:
$$f(x_i \mid \pi_i) \sim N\Big(\mu_i + \sum_{j=1}^{i-1} \beta_{ij}(x_j - \mu_j),\ \psi_i^2\Big), \tag{3}$$
where $\beta_{ij}$ is the regression coefficient of $X_j$ in the regression of $X_i$ on its parents $\pi_i$.
The conditional variance of $X_i$ given its parents is
$$\psi_i^2 = \Sigma_i - \Sigma_{i\pi_i} \Sigma_{\pi_i}^{-1} \Sigma_{i\pi_i}^{\top}, \tag{4}$$
where $\Sigma_i$ is the unconditional variance of $X_i$, $\Sigma_{i\pi_i}$ is the vector of covariances between $X_i$ and the variables in $\pi_i$, and $\Sigma_{\pi_i}$ is the covariance matrix of $\pi_i$.
3. Proposed Bayesian Network Model
Given the advantages of Bayesian networks introduced in Section 2, in this section we propose a Bayesian network (BN) model to reproduce the probabilistic structure of link and OD flows.
3.1. Model Assumptions
Assuming we have some prior (historical) link flows, in order to give the prior distribution of the link flows, we make the following assumptions.
Assumption 3.
The link flows are given by
$$V = KU + \eta. \tag{5}$$
Assumption 4.
The variable $U$ is a normal random variable with mean $\mu_U$ and variance $\sigma_U^2$ that measures the level of total mean flow. $K$ is a vector whose elements measure the relative weights of the link flows with respect to the total traffic flow; $\eta$ is a vector of independent normal random variables with zero mean, whose element $\eta_a$ measures the discrepancy of the flow of link $a$ with respect to its mean.
Note that traffic flows vary randomly and deterministically in similar situations (vacation, peak hour, special weather conditions, etc.) [20, 24]. Assumption 3 takes this into account: the distribution of $U$ varies with the situation. Based on the prior link flows and the situation considered, the distribution of $U$ and the initial vector $K$ are determined. The prior distribution of the link flows then follows easily, as shown later. Assumption 4 is a normality assumption, which is also adopted in Maher [18], Hazelton [20], Castillo et al. [24], and so forth.
To give the prior distribution of OD flows, we first consider the well-known conservation law equation:
$$V_a = \sum_i \Big(\sum_k T_i\, p_{ik}\, \delta_{ak}^{i}\Big) = \sum_i T_i \Big(\sum_k p_{ik}\, \delta_{ak}^{i}\Big), \tag{6}$$
where $T_i$ and $V_a$ are the flows of OD pair $i$ and link $a$, respectively; $\delta_{ak}^{i}$ is the incidence element, which takes the value 1 if link $a$ belongs to route $k$ of OD pair $i$ and 0 otherwise; and $p_{ik}$ is the proportion of users from OD pair $i$ choosing route $k$. In this paper, the route choice proportions are defined by a logit model as follows:
$$p_{ik} = \frac{\exp(-\theta c_k^{i})}{\sum_{l \in R_i} \exp(-\theta c_l^{i})}, \tag{7}$$
where $\theta$ is a parameter measuring travelers’ sensitivity to the cost difference between routes, $c_k^{i}$ is the cost associated with route $k$ of OD pair $i$, and $R_i$ is the set of routes of OD pair $i$.
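As an illustration of the logit model in (7), the following minimal sketch computes the route choice proportions of a single OD pair. The function name `logit_route_proportions` and the three route costs are assumptions made for this example only; they are not taken from the paper.

```python
import math


def logit_route_proportions(route_costs, theta):
    """Return the logit choice proportions of (7) for the routes of one OD pair."""
    # Shifting all costs by the minimum leaves the ratio in (7) unchanged
    # and avoids underflow in exp() when costs are large.
    c_min = min(route_costs)
    weights = [math.exp(-theta * (c - c_min)) for c in route_costs]
    total = sum(weights)
    return [w / total for w in weights]


# ASSUMPTION: hypothetical costs of three routes of one OD pair.
proportions = logit_route_proportions([30.0, 31.0, 33.0], theta=1.0)
```

A larger value of `theta` concentrates the proportions on the cheapest route, while `theta = 0` spreads users evenly over all routes.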
Equation (6) can be written as
$$V_a = \sum_i \Big(\sum_k p_{ik}\, \delta_{ak}^{i}\Big) T_i. \tag{8}$$
Set $d_{ai} = \sum_k p_{ik}\, \delta_{ak}^{i}$ to represent the proportion of users from OD pair $i$ choosing link $a$. Then (8) can be rewritten in matrix form as
$$V = DT. \tag{9}$$
Matrix $D$ is not necessarily invertible because it is not necessarily square. We therefore premultiply (9) by $D^{\top}$:
$$D^{\top} V = D^{\top} D\, T. \tag{10}$$
If the matrix $D^{\top} D$ has full rank, it is invertible, and (10) can be written as
$$T = (D^{\top} D)^{-1} D^{\top} V. \tag{11}$$
Set $\beta = (D^{\top} D)^{-1} D^{\top}$. According to (11), we make the following assumption.
Assumption 5.
The OD flows are given by
$$T = \beta V + \varepsilon, \tag{12}$$
where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$ are mutually independent normal random variables with mean $E(\varepsilon_i)$ and variance $\psi_i^2$. The variables $\varepsilon$ represent OD flows apart from those using links of the considered network. By setting all the elements of $\varepsilon$ to zero, or by evaluating their values, the conservation law equation can be satisfied.
3.2. The Complete Model
Based on Definitions 1 and 2, in order to complete our BN model, we need to define an associated graph $G$. For example, consider the simple network shown on the left of Figure 1, which has 2 nodes, 2 links, and 1 OD pair 1-2. The right of Figure 1 shows the associated Bayesian network. Each link flow node $v_a$ has as parents the node $U$ and the corresponding node $\eta_a$. Each OD flow node $t_i$ has as parents the corresponding nodes $v_a$ and $\varepsilon_i$.
Figure 1: Bayesian network model.
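Then, according to Assumptions 3 and 4, the variance-covariance matrix of $V$ is
$$\Sigma_V = \begin{pmatrix} K & I \end{pmatrix} \Sigma_{(U,\eta)} \begin{pmatrix} K^{\top} \\ I \end{pmatrix} = \sigma_U^2 K K^{\top} + D_\eta, \tag{13}$$
where $\Sigma_{(U,\eta)}$ and $D_\eta$ are diagonal matrices and $D_\eta$ is the variance-covariance matrix of $\eta$.
Based on Assumption 5, we get
$$\begin{pmatrix} V \\ T \end{pmatrix} = \begin{pmatrix} I & 0 \\ \beta & I \end{pmatrix} \begin{pmatrix} V \\ \varepsilon \end{pmatrix}. \tag{14}$$
Then, the mean $E[(V,T)]$ is
$$E[(V,T)] = \begin{pmatrix} E(U) K \\ E(U) \beta K + E(\varepsilon) \end{pmatrix}. \tag{15}$$
And the variance-covariance matrix of $(V,T)$ is
$$\Sigma_{(V,T)} = \begin{pmatrix} \Sigma_V & \Sigma_V \beta^{\top} \\ \beta \Sigma_V & \beta \Sigma_V \beta^{\top} + D_\varepsilon \end{pmatrix}, \tag{16}$$
where $D_\varepsilon$ is the variance matrix of $\varepsilon$.
In summary, all random variables involved in our model are related by the linear expression
$$\begin{pmatrix} U \\ \eta \\ \varepsilon \\ V \\ T \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \\ K & I & 0 \\ \beta K & \beta & I \end{pmatrix} \begin{pmatrix} U \\ \eta \\ \varepsilon \end{pmatrix}. \tag{17}$$
The mean $E[(U,\eta,\varepsilon,V,T)]$ is
$$\begin{pmatrix} E(U) \\ E(\eta) \\ E(\varepsilon) \\ E(V) \\ E(T) \end{pmatrix} = \begin{pmatrix} E(U) \\ 0 \\ E(\varepsilon) \\ E(U) K \\ E(U) \beta K + E(\varepsilon) \end{pmatrix}. \tag{18}$$
And the variance-covariance matrix $\Sigma_{(U,\eta,\varepsilon,V,T)}$ is
$$\Sigma = \begin{pmatrix}
\sigma_U^2 & 0 & 0 & \sigma_U^2 K^{\top} & \sigma_U^2 K^{\top} \beta^{\top} \\
0 & D_\eta & 0 & D_\eta & D_\eta \beta^{\top} \\
0 & 0 & D_\varepsilon & 0 & D_\varepsilon \\
\sigma_U^2 K & D_\eta & 0 & \sigma_U^2 K K^{\top} + D_\eta & (\sigma_U^2 K K^{\top} + D_\eta) \beta^{\top} \\
\sigma_U^2 \beta K & \beta D_\eta & D_\varepsilon & \beta (\sigma_U^2 K K^{\top} + D_\eta) & \beta (\sigma_U^2 K K^{\top} + D_\eta) \beta^{\top} + D_\varepsilon
\end{pmatrix}. \tag{19}$$
Then, the prior distribution (joint probability density function) of all the variables can be given as
$$f(V_1, V_2, \ldots, V_m, T_1, T_2, \ldots, T_n) = f_{N(\mu_V, \Sigma_V)}(V_1, V_2, \ldots, V_m) \prod_{i=1}^{n} p(T_i \mid \Pi_i = \pi_i). \tag{20}$$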
4. Estimating OD Matrices Using the Proposed BN Model
In this section, using the proposed BN model, we describe how to estimate the OD matrices when some new observed link flows are available.
Since we have obtained the prior distribution of all the variables, we can use the following equations to update the mean and the covariance matrix of the variables [23, 24] based on some observed variables. Note that one only needs to consider the unobserved variables conditioned on the observed variables and then update the expected values and covariance of the remaining variables. These equations are
$$\mu_{Y \mid Z=z} = \mu_Y + \Sigma_{YZ} \Sigma_{ZZ}^{-1} (z - \mu_Z), \tag{21}$$
$$\Sigma_{Y \mid Z=z} = \Sigma_{YY} - \Sigma_{YZ} \Sigma_{ZZ}^{-1} \Sigma_{ZY}, \tag{22}$$
$$\mu_{Z \mid Z=z} = z, \tag{23}$$
$$\Sigma_{Z \mid Z=z} = 0, \tag{24}$$
where $Y$ and $Z$ are the sets of unobserved and observed variables, respectively; $\mu_Y$ and $\Sigma_{YY}$ are the mean vector and covariance matrix of $Y$; $\mu_Z$ and $\Sigma_{ZZ}$ are the mean vector and covariance matrix of $Z$; and $\Sigma_{YZ}$ is the covariance matrix of $Y$ and $Z$.
Given a set of evidential nodes $Z$ whose values are known to be $Z = z$, (21) and (22) give the mean vector and covariance matrix of the unobserved nodes in $Y$, and thus the conditional distribution of $Y$. Equations (23) and (24) state that the expected values of the observed variables coincide with their observed values and that their variances and covariances are zero. To simplify the calculation, we can use an incremental method, that is, update the evidence in $Z$ one variable at a time. No matrix inversion is then needed, because $\Sigma_{ZZ}$ degenerates to a scalar: $\Sigma_{YZ}$ is a column vector and $\Sigma_{ZZ}^{-1} = 1/\sigma_{ZZ}$.
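The incremental update can be sketched as follows. This is a minimal illustration of (21) to (24) for a single observed variable, written with plain Python lists; the function name `condition_on_scalar` and the three-variable mean vector and covariance matrix are hypothetical and serve only to show that no matrix inversion is required.

```python
def condition_on_scalar(mean, cov, k, z):
    """Condition a multivariate normal on X[k] = z, using (21)-(24) with scalar Sigma_ZZ."""
    n = len(mean)
    var_k = cov[k][k]
    if var_k <= 0.0:
        # Variable k is already known (zero variance): nothing to update.
        return list(mean), [row[:] for row in cov]
    # Sigma_YZ / sigma_ZZ for every variable; no matrix inverse is needed.
    gain = [cov[i][k] / var_k for i in range(n)]
    new_mean = [mean[i] + gain[i] * (z - mean[k]) for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * cov[k][j] for j in range(n)]
               for i in range(n)]
    # Enforce (23) and (24) exactly, free of rounding error.
    new_mean[k] = z
    for i in range(n):
        new_cov[i][k] = 0.0
        new_cov[k][i] = 0.0
    return new_mean, new_cov


# ASSUMPTION: a hypothetical prior for three variables.
mean = [10.0, 20.0, 30.0]
cov = [[4.0, 2.0, 1.0],
       [2.0, 9.0, 3.0],
       [1.0, 3.0, 16.0]]

# Update the evidence one variable at a time: first X0 = 12, then X1 = 20.
mean, cov = condition_on_scalar(mean, cov, 0, 12.0)
mean, cov = condition_on_scalar(mean, cov, 1, 20.0)
```

Applying the two scalar updates in sequence gives the same posterior for the third variable as conditioning on both observations at once with the $2 \times 2$ matrix $\Sigma_{ZZ}$.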
To give the point estimates as well as the corresponding probability intervals, we can solve the following maximum posterior distribution problem, whose solutions are normally the conditional means:
$$\text{Maximize}\quad f_{N(\mu_T, \Sigma_T)}(t_1, t_2, \ldots, t_n) \prod_{a=1}^{m} p(v_a \mid \pi_a) \Big|_{Z}, \tag{25}$$
where $Z$ is the set of observed variables, including the observed link flows and/or OD flows.
In summary, the specific procedure for estimating OD matrices and those unobserved link flows is given as follows.
Step 0.
Initialize the model. According to Assumptions 3 and 4, based on prior (historical) link flows, we can determine the distribution of $U$ and the initial vector $K$. Then we can obtain the initial link flows $V = (V_1, V_2, \ldots, V_m) = K E(U)$. The initial route choice proportions $p_{ik}$ are then calculated as follows:
$$h_a(V_a) = h_a^{o} \left[1 + \alpha_a \left(\frac{V_a}{C_a}\right)^{\beta_a}\right], \tag{26}$$
$$c_k^{i} = \sum_{a \in A} h_a(V_a)\, \delta_{ak}^{i}, \tag{27}$$
$$p_{ik} = \frac{\exp(-\theta c_k^{i})}{\sum_{l \in R_i} \exp(-\theta c_l^{i})}, \tag{28}$$
where (26) is the link cost function, in which $h_a^{o}$ is the cost associated with free flow conditions, $C_a$ is the link capacity, and $\alpha_a$ and $\beta_a$ are constants defining how the cost increases with traffic flow; (27) is the route cost function; and (28) calculates the route choice proportions defined in (7).
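The cost functions (26) and (27) can be sketched as below. The function names are assumptions for this illustration. The free-flow costs, capacities, and the defaults 0.15 and 4 are the values used in the example of Section 5, but the route made of links 1 and 5 is hypothetical and is not claimed to be an actual route of that network.

```python
def link_cost(flow, free_flow_cost, capacity, alpha=0.15, beta=4.0):
    """Link cost function (26): cost grows with the flow-to-capacity ratio."""
    return free_flow_cost * (1.0 + alpha * (flow / capacity) ** beta)


def route_cost(route_links, costs_by_link):
    """Route cost (27): sum of the costs of the links that belong to the route."""
    return sum(costs_by_link[a] for a in route_links)


# Free-flow costs and capacities of links 1 and 5 as listed in Table 1, with
# the initial flows 75 and 85 obtained from K * E(U) in Section 5.
costs_by_link = {
    1: link_cost(75.0, free_flow_cost=7.0, capacity=900.0),
    5: link_cost(85.0, free_flow_cost=5.0, capacity=800.0),
}

# ASSUMPTION: a route consisting of links 1 and 5 is hypothetical; the real
# link-route incidence depends on the network topology.
example_route_cost = route_cost([1, 5], costs_by_link)
```

Because the flows here are far below capacity, each link cost is only marginally above its free-flow value; the route costs would then be passed to the logit model (28).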
Step 1.
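Solve the BN model. According to the model assumptions, using the initial route choice proportion matrix $P$, we can get the prior distribution of the traffic flows (prior means and variances) using the following formulas:
$$\beta = \left[(P\delta)^{\top} P\delta\right]^{-1} (P\delta)^{\top}, \tag{29}$$
$$E[V] = K E[U], \tag{30}$$
$$E[T] = \beta E[V] + E[\varepsilon], \tag{31}$$
$$D_\eta = \operatorname{Diag}\big((\nu E[V_a])^2\big), \tag{32}$$
$$\Sigma_{VV} = \sigma_U^2 K K^{\top} + D_\eta, \tag{33}$$
$$\Sigma_{VT} = \Sigma_{VV} \beta^{\top}, \tag{34}$$
$$\Sigma_{TV} = \Sigma_{VT}^{\top}, \tag{35}$$
$$\Sigma_{TT} = \beta \Sigma_{VV} \beta^{\top} + D_\varepsilon, \tag{36}$$
where (29) calculates the regression coefficient matrix used in (12). Equations (30) and (31) calculate the means of $V$ and $T$ given in (18). Equation (32) defines the diagonal variance matrix of $\eta$; that is, $\operatorname{Var}(\eta_a) = (\nu E(V_a))^2$, where $\nu$ is the coefficient of variation. Equations (33) to (36) define the blocks of the variance-covariance matrix in (16) and (19).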
Step 2.
Update the observed link flows, using the formulas
$$E[Y \mid Z=z] = E[Y] + \Sigma_{YZ} \Sigma_{ZZ}^{-1} (z - E[Z]), \tag{37}$$
$$\Sigma_{Y \mid Z=z} = \Sigma_{YY} - \Sigma_{YZ} \Sigma_{ZZ}^{-1} \Sigma_{ZY}, \tag{38}$$
$$E[Z \mid Z=z] = z, \tag{39}$$
$$\Sigma_{Z \mid Z=z} = 0, \tag{40}$$
$$T = E[Y \mid Z=z]\,\Big|_{(Y,Z)=T}, \tag{41}$$
where (37) and (38) update the means and the variance-covariance matrix of the unobserved variables, and $Y$ and $Z$ refer to the unobserved and observed components of $(T,V)$, respectively. Equations (39) and (40) state that the expected values of the observed variables are their observed values and that their variances and covariances are zero, as given in (23) and (24). Equation (41) takes the conditional means as point estimates of the OD and link flows, which are the solution of the maximum posterior distribution problem given in (25).
Step 3.
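Calculate the new route choice proportions. Since the link flows are updated in Step 2, the route choice proportions also need to be updated. Given the OD flows $T$ obtained by (41), the new route choice proportions $p_{ik}^{*}$ are calculated using the following expressions:
$$V_a = \sum_{i \in I} T_i \sum_k p_{ik}\, \delta_{ak}^{i}, \tag{42}$$
$$h_a(V_a) = h_a^{o} \left[1 + \alpha_a \left(\frac{V_a}{C_a}\right)^{\beta_a}\right], \tag{43}$$
$$c_k^{i} = \sum_{a \in A} h_a(V_a)\, \delta_{ak}^{i}, \tag{44}$$
$$p_{ik}^{*} = \frac{\exp(-\theta c_k^{i})}{\sum_{l \in R_i} \exp(-\theta c_l^{i})}, \tag{45}$$
where (42) is the conservation law equation given in (6).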
Step 4.
Test convergence. If $\sum_{i,k} (p_{ik} - p_{ik}^{*})^2 < \xi$, where $\xi$ is a small number that controls the convergence of the process, then stop and return the OD flows $T_i$, the link flows $V_a$, and the route choice proportions $p_{ik}^{*}$. Otherwise, continue with Step 5.
Step 5.
Update the route choice proportions and the vector $K$, using the expressions
$$p_{ik} = \rho\, p_{ik}^{*} + (1 - \rho)\, p_{ik} \quad \forall i, k, \tag{46}$$
$$K = \frac{V}{E(U)}, \tag{47}$$
and go to Step 1. Here (46) updates the route choice proportion matrix, where $\rho$, $0 < \rho < 1$, is a relaxation factor, and (47) updates the vector $K$. The values of the variables $V$ are obtained by (42).
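Steps 4 and 5 can be sketched as follows. The helper names, the dictionary keyed by `(i, k)`, and the numerical proportions are assumptions made for this illustration; only the formulas themselves, the convergence test of Step 4 and the updates (46) and (47), come from the procedure above.

```python
def has_converged(p_old, p_new, xi):
    """Step 4: sum of squared changes in route choice proportions below xi."""
    return sum((p_old[key] - p_new[key]) ** 2 for key in p_old) < xi


def relax_proportions(p_old, p_new, rho):
    """Update (46): convex combination of new and old proportions, 0 < rho < 1."""
    return {key: rho * p_new[key] + (1.0 - rho) * p_old[key] for key in p_old}


def update_k(link_flows, mean_u):
    """Update (47): relative weights K = V / E(U)."""
    return [v / mean_u for v in link_flows]


# ASSUMPTION: hypothetical proportions for one OD pair (i = 1) with two routes.
p_old = {(1, 1): 0.6, (1, 2): 0.4}
p_new = {(1, 1): 0.7, (1, 2): 0.3}

if not has_converged(p_old, p_new, xi=1e-4):
    p_next = relax_proportions(p_old, p_new, rho=0.5)
    k_next = update_k([75.0, 45.0], mean_u=50.0)
```

Since (46) is a convex combination, the relaxed proportions of each OD pair still sum to one whenever both the old and the new proportions do.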
5. Example: The Nguyen-Dupuis Network
In this section, we illustrate the proposed methods using the well-known Nguyen-Dupuis network, shown in Figure 2. It consists of 13 nodes, 19 links, and 4 OD pairs: 1-2, 1-3, 4-2, and 4-3.
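Figure 2: The Nguyen-Dupuis network.
The network data are shown in Table 1, and the parameters in (26) are assumed to be $\alpha_a = 0.15$ and $\beta_a = 4$ for every link.
Table 1: Network parameters of the Nguyen-Dupuis network.
| Link | $h_a^{o}$ | $C_a$ |
|---|---|---|
| 1 | 7 | 900 |
| 2 | 8 | 700 |
| 3 | 9 | 700 |
| 4 | 14 | 900 |
| 5 | 5 | 800 |
| 6 | 9 | 600 |
| 7 | 5 | 900 |
| 8 | 13 | 500 |
| 9 | 5 | 300 |
| 10 | 9 | 400 |
| 11 | 10 | 700 |
| 12 | 10 | 700 |
| 13 | 9 | 600 |
| 14 | 8 | 700 |
| 15 | 9 | 700 |
| 16 | 8 | 700 |
| 17 | 7 | 300 |
| 18 | 15 | 700 |
| 19 | 11 | 700 |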
The assumed true OD flows, which are used later to test the quality of the estimation, are shown in Table 4 under the heading “True flow.” The true link flows are obtained by solving the multinomial logit assignment model with parameter $\theta = 1.0$ for the stochastic loading.
The prior information is assumed as follows: the expected value of the level of total traffic flow $U$ and its standard deviation are $E(U) = 50$ and $\sigma(U) = 10$, respectively, and the initial vector $K$ is
$$K = (1.5, 0.9, 0.9, 0.7, 1.7, 0.8, 1.8, 0.1, 0.8, 1.0, 1.5, 0.3, 1.2, 0.3, 0.5, 0.8, 0.1, 0.7, 1.2)^{\top}. \tag{48}$$
The observed link flows are assumed to be $V_5 = 82.57$, $V_7 = 87.38$, $V_{10} = 48.07$, $V_{13} = 58.66$, and $V_{18} = 37.12$, and they are supposed to become known in this order. Since they are observed, their values are equal to the true link flows (as shown in Table 4).
Step 0-Step 1. Initialize and give the prior distribution of all the variables.
Based on the prior information, we can obtain the prior distribution of the traffic flows. The prior means and variances are shown in the second columns (headed 0) of Tables 2 and 3, respectively. To simplify the calculation, the mean and the variance-covariance matrix of $\varepsilon$ are set to zero in this example (i.e., there is no uncertainty in the conservation law). In addition, to obtain the variance-covariance matrix $D_\eta$, the coefficient of variation in (32) is set to $\nu = 0.1$.
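The prior moments of the link flows in this example can be checked with a short sketch of (30), (32), and (33), using the values of $E(U)$, $\sigma(U)$, $K$, and the coefficient of variation stated above. The variable and function names are assumptions for this illustration, and the OD flow rows are not covered because they also require the matrix $\beta$.

```python
# Prior information stated in Section 5.
K = [1.5, 0.9, 0.9, 0.7, 1.7, 0.8, 1.8, 0.1, 0.8, 1.0,
     1.5, 0.3, 1.2, 0.3, 0.5, 0.8, 0.1, 0.7, 1.2]
MEAN_U = 50.0   # E(U)
SD_U = 10.0     # sigma(U)
CV = 0.1        # coefficient of variation used in (32)

# (30): prior means of the link flows.
prior_mean = [k * MEAN_U for k in K]

# (32): diagonal of D_eta, Var(eta_a) = (CV * E[V_a])^2.
d_eta = [(CV * m) ** 2 for m in prior_mean]


def prior_cov(a, b):
    """Entry (a, b) of Sigma_VV in (33); a and b are 0-based link indices."""
    value = SD_U ** 2 * K[a] * K[b]
    if a == b:
        value += d_eta[a]
    return value


# Prior variances of V_1, ..., V_19.
prior_var = [prior_cov(a, a) for a in range(len(K))]

# Prior covariance between V_5 and V_7 (0-based indices 4 and 6).
cov_v5_v7 = prior_cov(4, 6)
```

For the link flow rows, `prior_mean` and `prior_var` agree with the columns headed 0 in Tables 2 and 3, for instance a mean of 75 and a variance of 281.25 for $V_1$.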
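Table 2: Point estimation of traffic flows after updating observed link flows one by one. Column 0 gives the prior means; each later column is headed by the number of the link observed at that update.
| Item | 0 | 5 | 7 | 10 | 13 | 18 |
|---|---|---|---|---|---|---|
| T1 | 37.16 | 39.02 | 39.05 | 39.86 | 40.11 | 39.88 |
| T2 | 82.88 | 77.95 | 78.32 | 78.63 | 80.51 | 81.22 |
| T3 | 68.37 | 59.64 | 59.94 | 59.23 | 60.85 | 59.95 |
| T4 | 12.68 | 22.48 | 22.29 | 23.28 | 19.54 | 19.06 |
| V1 | 75.00 | 74.64 | 74.92 | 75.26 | 76.98 | 77.55 |
| V2 | 45.00 | 42.36 | 42.48 | 43.28 | 43.69 | 43.61 |
| V3 | 45.00 | 46.30 | 46.32 | 45.73 | 46.22 | 46.03 |
| V4 | 35.00 | 35.82 | 35.92 | 36.78 | 34.14 | 32.80 |
| V5 | 85.00 | 82.57 | 82.57 | 82.57 | 82.57 | 82.57 |
| V6 | 40.00 | 38.37 | 38.62 | 38.37 | 40.69 | 41.17 |
| V7 | 90.00 | 87.24 | 87.38 | 87.38 | 87.38 | 87.38 |
| V8 | 5.00 | 1.43 | 1.45 | 1.55 | 1.51 | 1.54 |
| V9 | 40.00 | 39.38 | 39.40 | 39.31 | 39.31 | 39.31 |
| V10 | 50.00 | 47.86 | 47.92 | 48.07 | 48.07 | 48.07 |
| V11 | 75.00 | 75.64 | 75.73 | 76.24 | 76.58 | 76.43 |
| V12 | 15.00 | 15.14 | 15.42 | 15.44 | 16.20 | 15.43 |
| V13 | 60.00 | 59.05 | 59.12 | 59.71 | 58.66 | 58.66 |
| V14 | 15.00 | 16.57 | 16.87 | 16.99 | 17.71 | 16.84 |
| V15 | 25.00 | 23.06 | 23.31 | 22.89 | 24.42 | 23.57 |
| V16 | 40.00 | 41.37 | 41.49 | 42.22 | 41.34 | 41.34 |
| V17 | 5.00 | 6.10 | 6.16 | 6.29 | 6.40 | 6.51 |
| V18 | 35.00 | 36.26 | 36.32 | 36.40 | 37.29 | 37.12 |
| V19 | 60.00 | 59.05 | 59.12 | 59.71 | 58.66 | 58.66 |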
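Table 3: Variances of traffic flows after updating observed link flows one by one. Column 0 gives the prior variances; each later column is headed by the number of the link observed at that update.
| Item | 0 | 5 | 7 | 10 | 13 | 18 |
|---|---|---|---|---|---|---|
| T1 | 65.81 | 25.98 | 20.88 | 16.94 | 14.35 | 2.93 |
| T2 | 294.56 | 54.98 | 29.91 | 20.46 | 15.66 | 15.66 |
| T3 | 203.36 | 22.72 | 7.59 | 5.41 | 5.40 | 3.30 |
| T4 | 18.56 | 16.98 | 15.59 | 14.10 | 3.24 | 3.16 |
| V1 | 281.25 | 100.33 | 81.13 | 69.99 | 67.48 | 65.57 |
| V2 | 101.25 | 32.31 | 26.08 | 23.15 | 21.73 | 20.73 |
| V3 | 101.25 | 38.61 | 31.01 | 25.85 | 24.33 | 23.10 |
| V4 | 61.25 | 23.11 | 18.65 | 16.72 | 13.27 | 11.73 |
| V5 | 361.25 | 0 | 0 | 0 | 0 | 0 |
| V6 | 80.00 | 26.51 | 21.55 | 18.19 | 18.85 | 18.48 |
| V7 | 405.00 | 137.07 | 0 | 0 | 0 | 0 |
| V8 | 1.25 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 |
| V9 | 80.00 | 27.93 | 22.44 | 0 | 0 | 0 |
| V10 | 125.00 | 41.25 | 33.19 | 0 | 0 | 0 |
| V11 | 281.25 | 103.05 | 82.88 | 71.83 | 66.79 | 0 |
| V12 | 11.25 | 4.13 | 3.44 | 2.95 | 2.99 | 2.60 |
| V13 | 180.00 | 62.81 | 50.52 | 44.05 | 0 | 0 |
| V14 | 11.25 | 4.94 | 4.11 | 3.57 | 3.57 | 0 |
| V15 | 31.25 | 9.58 | 7.85 | 6.47 | 6.79 | 0 |
| V16 | 80.00 | 30.82 | 24.88 | 22.03 | 0 | 0 |
| V17 | 1.25 | 0.67 | 0.55 | 0.50 | 0.47 | 0.46 |
| V18 | 61.25 | 23.68 | 19.07 | 16.91 | 15.83 | 0 |
| V19 | 180.00 | 62.81 | 50.52 | 44.05 | 0 | 0 |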
Table 4: The true flow and the point estimation from the proposed method.
| Item | True flow | Proposed method | Relative error (%) |
|---|---|---|---|
| T1 | 40 | 39.88 | 0.30 |
| T2 | 80 | 81.22 | 1.53 |
| T3 | 60 | 59.95 | 0.08 |
| T4 | 20 | 19.06 | 4.70 |
| V1 | 76.64 | 77.55 | 1.19 |
| V2 | 43.36 | 43.61 | 0.58 |
| V3 | 46.13 | 46.03 | 0.22 |
| V4 | 33.87 | 32.80 | 3.16 |
| V5 | 82.57 | 82.57 | 0 |
| V6 | 40.21 | 41.17 | 2.39 |
| V7 | 87.38 | 87.38 | 0 |
| V8 | 1.42 | 1.54 | 8.45 |
| V9 | 39.31 | 39.31 | 0 |
| V10 | 48.07 | 48.07 | 0 |
| V11 | 76.43 | 76.43 | 0 |
| V12 | 15.41 | 15.43 | 0.13 |
| V13 | 58.66 | 58.66 | 0 |
| V14 | 16.84 | 16.84 | 0 |
| V15 | 23.57 | 23.57 | 0 |
| V16 | 41.34 | 41.34 | 0 |
| V17 | 6.24 | 6.51 | 4.33 |
| V18 | 37.12 | 37.12 | 0 |
| V19 | 58.66 | 58.66 | 0 |
Step 2–Step 5. Give the posterior distribution by updating the observed link flows one by one.
Table 2 shows how the means of the traffic flows change as the observed link flows are updated one by one. After each update, the point estimates of the link flows $\{V_1, V_2, \ldots, V_{19}\}$ and the OD flows $\{T_1, T_2, \ldots, T_4\}$ are provided. Once $V_7$ and $V_{10}$ are known and updated, the point estimate of $V_9$ in Table 2 remains unchanged and its variance in Table 3 becomes zero. The reason is flow conservation at node 6: once $V_7$ and $V_{10}$ are known, $V_9$ is determined. Similarly, $V_{19}$ becomes known once $V_{13}$ is given; $V_{16}$ becomes known once $V_{19}$ is given; $V_{11}$ becomes known once $V_9$ and $V_{18}$ are given; $V_{15}$ becomes known once $V_{11}$ is given; and $V_{14}$ becomes known once $V_{10}$, $V_{15}$, and $V_{16}$ are given. In other words, due to the conservation law, the link flows become known in this order: $V_5 = 82.57$, $V_7 = 87.38$, $V_{10} = 48.07$, $V_9 = 39.31$, $V_{13} = 58.66$, $V_{19} = 58.66$, $V_{16} = 41.34$, $V_{18} = 37.12$, $V_{11} = 76.43$, $V_{15} = 23.57$, and $V_{14} = 16.84$.
Table 3 shows how the variances of the traffic flows change as the observed link flows are updated one by one. Once a link flow is known (either observed or derived from the observed link flows and the conservation law), its mean remains constant and its variance becomes zero. The variances of the unknown variables (OD flows and unobserved link flows) normally decrease with each update. Since a smaller variance means a more precise estimate, the estimation becomes more precise after a series of updates. This suggests a method for deciding how many and which links need to be observed when estimating traffic flows with the Bayesian network model, that is, for the network sensor location problem (NSLP) [28]. Note that the variance updating equation (22) does not depend on the values of the observed link flows, so the NSLP can be solved without observing any link. First, the Bayesian network model gives the prior distribution of all the variables, as in (19). Next, the link whose observation reduces the variance of the OD flows the most is chosen as the first observed link. Then the variances of the traffic flows are updated, the second observed link is determined, and the process iterates until the variances meet the required estimation precision or the budget is exhausted.
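The greedy selection just described can be sketched as follows. This is only one possible reading of it: the criterion used here (the reduction in the sum of the OD flow variances), the function name `greedy_sensor_order`, and the small covariance matrix are assumptions made for this illustration and are not taken from [28]. The sketch relies only on the fact that the variance update (22) does not involve the observed values.

```python
def greedy_sensor_order(cov, candidates, targets, n_sensors):
    """Greedily pick variables to observe so that the summed variance of the
    target variables drops as much as possible at each step, using (22)."""
    cov = [row[:] for row in cov]          # work on a copy
    n = len(cov)
    remaining = list(candidates)
    chosen = []
    for _ in range(n_sensors):
        best, best_gain = None, 0.0
        for c in remaining:
            if cov[c][c] <= 1e-12:
                continue                   # already determined, no information
            # Reduction of sum_t Var(t) if c is observed (scalar Sigma_ZZ).
            gain = sum(cov[t][c] ** 2 for t in targets) / cov[c][c]
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break                          # no candidate reduces the variance
        chosen.append(best)
        remaining.remove(best)
        var_b = cov[best][best]
        col = [cov[i][best] for i in range(n)]
        # Rank-one variance update (22); the observed value is not needed.
        cov = [[cov[i][j] - col[i] * col[j] / var_b for j in range(n)]
               for i in range(n)]
    return chosen, [cov[t][t] for t in targets]


# ASSUMPTION: hypothetical joint covariance of two link flows (indices 0, 1)
# and one OD flow (index 2).
cov = [[4.0, 1.0, 2.0],
       [1.0, 9.0, 6.0],
       [2.0, 6.0, 16.0]]
order, od_variances = greedy_sensor_order(cov, candidates=[0, 1],
                                          targets=[2], n_sensors=2)
```

In this toy case the second link is selected first, because observing it removes more of the OD flow variance than observing the first link. A stopping rule based on a variance threshold or a budget could replace the fixed `n_sensors`.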
By the updated means (point estimation) and variances, we can obtain the posterior distribution of OD flows and those unobserved link flows. Figure 3 illustrates how the marginal densities of OD flows and those unobserved link flows evolve from their initial form to their final form (boldfaced) by updating the observed link flows one by one. It can be seen that the variances of the unknown variables are normally decreasing with each update.
Figure 3: Conditional distributions of the OD flows and the unobserved link flows.
In summary, according to Tables 2 and 3 and Figure 3, once variables are observed with the proposed Bayesian network method, their means remain constant and their variances become zero. For the remaining (unobserved) variables, the variances normally decrease with each update. The proposed method also provides a control of the conservation law. The final forms (boldfaced) in Figure 3 are the posterior densities of the OD flows and the unobserved link flows. These posterior densities supply complete statistical information about the unknown variables, from which we can provide the point estimates as well as the corresponding probability intervals.
To test the quality of the estimation, Table 4 compares the true flows with the point estimates of the proposed BN model. Because $V_5$, $V_7$, $V_9$, $V_{10}$, $V_{11}$, $V_{13}$, $V_{14}$, $V_{15}$, $V_{16}$, $V_{18}$, and $V_{19}$ are observed or derived from the observed flows through the conservation law, their values are equal to the true flows. For the OD pairs and the remaining unobserved links, the estimates and the true flows are close. The relative errors are small: the maximum is 4.70% for the OD flows and 8.45% for the unobserved link flows (link 8, whose true flow is only 1.42). This illustrates that the proposed BN model is accurate in this example.
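As a sketch of how a probability interval follows from a posterior mean and variance under the normality assumption, the snippet below uses the final values for $T_1$ from Tables 2 and 3 (mean 39.88, variance 2.93). The function name and the 95% level are assumptions chosen for illustration; the paper does not state which level it uses.

```python
from statistics import NormalDist


def probability_interval(mean, variance, level=0.95):
    """Central probability interval of a normal posterior with given moments."""
    if variance <= 0.0:
        # Observed or fully determined variable: the interval is a point.
        return mean, mean
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Final posterior mean and variance of T1 (last columns of Tables 2 and 3).
low, high = probability_interval(39.88, 2.93)
```

With these inputs the interval is roughly 36.5 to 43.2, which contains the true flow of 40 listed in Table 4.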
6. Conclusions
In this paper, we use a Bayesian network model to estimate origin-destination matrices based on prior link flows and a set of observed link flows. Normally, large amounts of historical link flows are stored in the cities’ transportation system. Compared with an outdated or subjectively guessed prior OD matrix, prior link flows are more accurate as they are obtained by traffic detectors or manual investigation. The proposed Bayesian network model can make use of these historical link flows and also consider the level of total traffic flow, which is really useful for many real-time traffic applications, especially in the ITS.
Using the Bayesian network model, updating the observed variables (including the observed link flows and those derived from the observed link flows and the conservation law) modifies the means and reduces the variances of the remaining variables. These updated means and variances give the posterior distribution of the unobserved variables conditioned on the observed ones. Thus, the method provides not only point estimates but also the corresponding probability intervals. In addition, an incremental procedure is developed for solving the Bayesian network model without computationally intensive matrix inversion, which makes the model easier to apply to large-scale networks.
Moreover, in this paper, a normal distribution for traffic flows is assumed. It is reasonable because these random variables are the sum of a great number of independent Bernoulli experiments in which the users decide where to travel and which routes to choose. For future research, it is worthwhile to relax the normal distribution assumption.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (no. 51178110 and no. 51378119) and Graduate Innovation Project of Jiangsu Province (no. CXZZ12_0113). Comments provided by anonymous referees are much appreciated.
References
[1] Ben-Akiva M., Bierlaire M., Bottom J., Koutsopoulos H., Mishalani R. Development of a route guidance generation system for real-time application. Proceedings of the IFAC Transportation Systems Conference, 1997, Beijing, China.
[2] Mahmassani H. S. Dynamic network traffic assignment and simulation methodology for advanced system management applications.
[3] Bell M. G. The estimation of origin–destination matrix from traffic counts.
[4] Cascetta E., Nguyen S. A unified framework for estimating or updating origin/destination matrices from traffic counts.
[5] Yang H. Heuristic algorithms for the bilevel origin-destination matrix estimation problem.
[6] Doblas J., Benitez F. G. An approach to estimating and updating origin-destination matrices based upon traffic counts preserving the prior structure of a survey matrix.
[7] Nie Y., Zhang H. M. A variational inequality formulation for inferring dynamic origin-destination travel demands.
[8] Nie Y. M., Zhang H. M. A relaxation approach for estimating origin-destination trip tables.
[9] Shen W., Wynter L. A new one-level convex optimization approach for estimating origin-destination demand.
[10] Carey M., Revelli R. Constrained estimation of direct demand functions and trip matrices.
[11] Bell M. G. H. The estimation of origin-destination matrices by constrained generalised least squares.
[12] Parry K., Hazelton M. L. Estimation of origin-destination matrices from link counts and sporadic routing data.
[13] van Zuylen H. J., Willumsen L. G. The most likely trip matrix estimated from traffic counts.
[14] Xie C., Kockelman K. M., Waller S. T. A maximum entropy-least squares estimator for elastic origin-destination trip matrix estimation.
[15] Lo H. P., Zhang N., Lam W. H. K. Estimation of an origin-destination matrix with random link choice proportions: a statistical approach.
[16] Vardi Y. Network tomography: estimating source-destination traffic intensities from link data.
[17] Hazelton M. L. Estimation of origin-destination matrices from link flows on uncongested networks.
[18] Maher M. J. Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach.
[19] Li B. Bayesian inference for origin-destination matrices of transport networks using the EM algorithm.
[20] Hazelton M. L. Statistical inference for time varying origin-destination matrices.
[21] Hazelton M. L. Statistical inference for transit system origin-destination matrices.
[22] Tebaldi C., West M. Bayesian inference on network traffic using link count data.
[23] Sun S., Zhang C., Yu G. A Bayesian network approach to traffic flow forecasting.
[24] Castillo E., Menéndez J. M., Sánchez-Cambronero S. Predicting traffic flow using Bayesian networks.
[25] Pearl J.
[26] Jensen F. V.
[27] Shachter R., Kenley C. Gaussian influence diagrams.
[28] Zhu S. L., Cheng L., Chu Z. M., Anthony C., Chen J. X. Identification of network sensor locations for traffic flow estimation. Accepted by Journal of the Transportation Research Board.