Empirical Approximation for the Stochastic Fundamental Diagram of Traffic Flow on Signalized Intersection

The widely scattered nature of the fundamental diagram (FD) in observed flow-density data may be associated with the dynamics of the traffic flow process, especially at signalized intersections. To describe the uncertainty of the FD, in this work we establish a stochastic fundamental diagram (SFD), which is defined by the distributions of shockwave speed. Our approach is based on a two-level stochastic process of the traffic flow system, expressed in terms of the dynamics of traffic density and of the state mode associated with signal phases, and known as a switching linear dynamical system (SLDS). A variational Bayesian learning method is then adopted to compute the distributions of the SFD parameters, which approximate the empirical distributions of shockwave speed calculated from the observed flow-density data. Using traffic flow data from the NGSIM program, the verification results demonstrate that the SFD captures the main features of the widely scattered flow-density data better than the FD. With the shockwave speed sampled from the SFD, the SLDS can describe the dynamic characteristics of traffic flow and be applied to the maximum likelihood estimation of traffic density or flow rate. Because it is simple and can be calculated automatically, the SFD provides an alternative description of the fundamental diagram and its uncertainty in traffic flow.


Introduction
Model-based traffic simulation and traffic state estimation require stochastic models to describe the dynamical phenomena of the traffic flow system when traffic flow management is applied on freeways and signalized arterials. In contrast to microscopic simulation models, which rely on the randomness of driver behavior, macroscopic simulation models can introduce stochastic elements in several ways, including the boundary conditions [1], the flow-density relationship or fundamental diagram (FD) [2], and the traffic flow evolution process [3]. Under freeway conditions, the scatter of the FD in observed flow-density data may be associated with the dynamics of the traffic flow system, and the uncertainty of the FD can mostly be expressed in terms of the variance of the FD parameters [2,4,5]. The problem is then how to determine numerically the variance of the FD parameters, which can be derived from a macroscopic simulation model [4,5] or a stochastic differential equation [2]. For traffic flow at signalized intersections, the widely scattered FD is more complicated than under freeway conditions, because it may be affected not only by the traffic flow system itself but also by the traffic signals [6].
This research follows the abovementioned principle to explore the stochastic fundamental diagram (SFD) of traffic flow at signalized intersections. First, whether applied to freeways or to signalized intersections, a common feature of the SFD is that the flow-density data are divided into two regions, lower vehicle density and higher vehicle density, which correspond to "free" and "congested" flow. Once either the density or the flow rate is determined, the other can be obtained from the "forward" or "backward" shockwave speed in the corresponding region. So, in our research, the SFD is defined by the empirical distributions of traffic flow shockwave speed, which are simple to obtain and apply. Second, a macroscopic traffic simulation model, such as the cell transmission model (CTM) [7,8], can describe traffic evolution under many different traffic conditions by combining the traffic flow density with the FD. The traffic model can therefore be employed to derive an SFD that is not tied to a specific traffic scenario or to particular observed data. Alternatively, the SFD may be used as an input, together with the boundary conditions, to determine the temporal evolution of the traffic flow system. Furthermore, in our research, the influence of traffic signals must be considered in the traffic flow evolution process. Under freeway conditions, the complex traffic flow evolution process can be regarded as a jump Markov linear system in which the traffic state transitions between different modes with a constant FD [9,10]. Further, a stochastic cell transmission model (SCTM) has been proposed, with stochastic demand and supply introduced in a switching-mode model [3]. This type of SCTM has been applied to traffic flow at signalized intersections [11] by ignoring the effect of signal control on traffic flow. In this research, we develop a dynamical model that describes the changing process of traffic flow at a signalized intersection by means of switching linear dynamical systems (SLDS). SLDS are useful in describing dynamical phenomena, particularly structural changes over time. SLDS can be viewed as an extension of hidden Markov models (HMMs) [12]. Each HMM mode can be associated with a pattern of traffic flow transition, such as on a freeway [3] or at a signalized intersection [13]. The patterns or modes of traffic flow can be defined according to the characteristics of the shockwaves propagating during the signal phases, whose speeds can be sampled from the SFD. When the traffic density evolves according to the triangular FD, conditioned on a given mode sequence, the SLDS should ideally be consistent with the SCTM.
After constructing the SLDS model, the key point is to find the exact expressions for the approximate parameter distributions of the SFD. In our SLDS with time series data, the traffic state varies in time and is structured by the SFD conditioned on the mode sequence. Given a framework with a mode sequence and observations, the distributions of shockwave speed can be expressed by the posterior probability distribution, which is found approximately by the variational Bayesian method [14].
The organization of this paper is as follows. In Section 2, we discuss our proposed approach for modeling traffic flow at a signalized intersection. In Section 3, the SFD is constructed by a local linear model with stochastic variables. Section 4 details the variational Bayesian learning method for inferring the distributions of the SFD parameters. Section 5 provides numerical verification of the model and algorithm presented in Sections 3 and 4, which are validated using the Next Generation Simulation (NGSIM) data. Section 6 concludes the paper and briefly discusses future research directions. Appendix A derives the posterior distribution of the SFD, which is proved to be the approximating distribution in Appendix B.

The Proposed Model for Traffic Flow on Signalized Intersection
2.1. The Minimal Representation of Road Segment. Consider the simple setting of a homogeneous roadway with signal control, as shown in Figure 1(a). The road segment between adjacent intersections can be divided into the two simplest cells (link cells), where Cell(2) represents the road area, Cell(1) represents the intersection area, and Cell(1, 2) represents the signalized link shown in Figure 1(b). If the adjacent intersections are far apart, the number of cells can be increased, but too many cells will reduce the efficiency of calculation. The cell lengths must satisfy the constraint condition $l_x/\Delta t \le v_{\max}$, where $l_x$ is the length of Cell(x), $v_{\max}$ denotes the maximum transit speed of the flow in the cell, and $\Delta t$ is the time interval. The input and output traffic flow can be considered as the flow from or to a "boundary" cell, which is named a "virtual" cell in our research, as shown in Figure 1(b). The virtual cell can be defined with any length and capacity corresponding to the sending and receiving flow associated with signal control. For a better explanation, we define the traffic states located at the three sections as input, output, and transition traffic flow.

2.2. State Mode of Traffic Flow on Signalized Intersection.
For the simplest signalized link case, the state mode of traffic flow can be defined by the shockwave propagation assumed to be located at the boundary between Cell(1) and Cell(2), as on a freeway [3]. Because of cyclic signal phase changes, the traffic flow wave propagating at a signal-controlled intersection has a cyclic pattern [13], as shown in Figure 2. In our research, we define the state mode of traffic flow by considering both the transition wave at the boundary between Cell(1) and Cell(2) and the cyclically changing boundary condition. As indicated in Figure 2(a), at the beginning of a red light (assuming there is no residual queue at the beginning of a cycle), the shockwave   (1) is generated and propagates backward. In this period, the traffic state changes from "Free flow-Free flow" in Cell(1) and Cell(2) to "Free flow-Congestion1" due to the queuing process in Cell(1). In the red light period, if the shockwave   (1) does not reach the boundary between Cell(1) and Cell(2), the traffic state will remain in "Free flow-Congestion1" until the end of the queue discharge, as shown in Figure 2(a). This situation arises when the capacity of Cell(1) is very large or the input flow is particularly small, resulting in a very short queue length.
If a queue does not fully discharge by the end of the cycle, a residual queue is formed, as shown in Figure 2(b). After the red phase of the next cycle starts, the shockwave    (1) propagates backward with the same speed as the shockwave   . The other processes are then the same as those described above.
Because signal control makes the shockwave pattern repeat from cycle to cycle, shockwaves can be used to identify the traffic state modes on a signalized link. The state modes can therefore be summarized by the shockwave pattern and the boundary conditions associated with the signal phase, as in Table 1.

2.3. State Mode-Based Model of Traffic Flow on Signalized Intersection. Consider a road segment with flow through its boundaries at time $t$. By combining FDs, which express the mean flow rate as a function of mean traffic density, the numerical solution of the conservation law is obtained by recursively computing the following:
$$\rho(x, t+\Delta t) = \rho(x, t) + \frac{\Delta t}{l_x}\left[q(x-1, t) - q(x, t)\right] \quad (1)$$
where $\rho(x, t)$ denotes the mean traffic density in Cell(x) at time $t$, $l_x$ is the length of the road segment, and $\Delta t$ is the time interval, while $q(x-1, t)$ and $q(x, t)$ denote the mean flow rates through the upstream and downstream boundaries of the link Cell(x-1, x), respectively. The FD can then be written as
$$q(x, t) = \mathcal{F}(\rho(x, t)) \quad (2)$$
In the simple case of a concave flux function, (2) could be given by the CTM [7,8]:
$$\mathcal{F}(\rho) = \min\{v\rho,\; q_{\max},\; w(\rho_J - \rho)\} \quad (3)$$
where $v$ is the forward speed, $w$ is the backward wave speed, $\rho_J$ is the jam density, and $q_{\max}$ is the maximum flow. In this case, the FD is assumed to be a triangular function, as shown in Figure 3.
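As a rough illustration of the recursion described above, the following Python sketch applies one density update with a triangular FD. It uses the standard CTM sending and receiving rule at each boundary, which is an assumption about the flux and not necessarily the exact form used in this paper. The function names, the cell ordering (upstream first), and all numerical values except the 5 s interval and the cell lengths are hypothetical.

```python
def boundary_flow(rho_up, rho_down, v, w, rho_jam, q_max):
    """Flow across a boundary: min of upstream sending and downstream receiving flow."""
    sending = min(v * rho_up, q_max)
    receiving = min(w * (rho_jam - rho_down), q_max)
    return max(0.0, min(sending, receiving))


def ctm_step(rho, lengths, dt, demand, discharge_cap, v, w, rho_jam, q_max):
    """One conservation-law update for cells ordered from upstream to downstream.

    demand        : flow offered by the upstream virtual cell (veh/s)
    discharge_cap : flow accepted by the downstream virtual cell (0 during red)
    """
    n = len(rho)
    flows = [0.0] * (n + 1)
    # Inflow is limited by demand and by the receiving capacity of the first cell.
    flows[0] = max(0.0, min(demand, q_max, w * (rho_jam - rho[0])))
    for i in range(1, n):
        flows[i] = boundary_flow(rho[i - 1], rho[i], v, w, rho_jam, q_max)
    # Outflow is limited by the sending flow of the last cell and by the signal.
    flows[n] = min(v * rho[-1], q_max, discharge_cap)
    new_rho = [rho[i] + dt / lengths[i] * (flows[i] - flows[i + 1]) for i in range(n)]
    return new_rho, flows


if __name__ == "__main__":
    # Hypothetical parameters (feet, seconds, vehicles); red light at the stop line.
    rho, flows = ctm_step([0.02, 0.02], [170.0, 167.0], 5.0,
                          demand=0.5, discharge_cap=0.0,
                          v=30.0, w=15.0, rho_jam=0.14, q_max=1.4)
    print(rho, flows)
```

During red, `discharge_cap=0.0` blocks the downstream boundary, so vehicles accumulate in the downstream cell while the total vehicle count changes only by the inflow.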
With the triangular FD, the traffic flow rate through the upstream or downstream boundary is determined by the speed and direction (forward or backward) of wave propagation, which depend on the traffic densities of the adjacent cells. In this paper, we introduce a stochastic process to describe the stochastic wave propagation direction and define the SFD to compute distributions of the traffic state. So (1) is rewritten as
$$\rho(x, t+\Delta t) = \rho(x, t) + \frac{\Delta t}{l_x}\left[\mathcal{F}(\rho(x-1, t)) - \mathcal{F}(\rho(x, t))\right] + \varepsilon_{s_t} \quad (4)$$
where $\rho(x, t)$ is a stochastic variable denoting the traffic density in Cell(x) at time $t$, $\mathcal{F}(\rho(x, t))$ is the SFD, and $\varepsilon_{s_t}$ is mode-specific Gaussian process noise with mean zero and covariance $\Sigma$.
Owing to cyclic signal phases and the random arrival and departure of traffic flow, the traffic state changes dynamically from one mode to another, as described in Section 2.2. Using (4), the dynamic evolution of density can be modeled by a two-level system of linear models whose variables are the traffic mode and the density, respectively. This is known as an SLDS, as shown in Figure 4.
At each time $t$, in the first-level process, a switch variable, the traffic state mode $s_t \in \{1, \ldots, s, \ldots\}$, describes which one of a set of linear dynamical systems (LDS) is to be used. In the second-level process, the transition dynamics of the continuous traffic density $\rho(t)$ are linear. The switching-mode model is then used to transform the CTM's nonlinear state equations into a set of piecewise linear equations, similar to the SCTM for freeways [3,9], and (4) becomes (5), with the flow term given by
$$\mathcal{F}(\rho(t)) = A_{s_t}\rho(t) + B_{J,s_t}\rho_J + B_{Q,s_t}q_{\max} \quad (6)$$
where $\rho(t)$ is the vector of traffic density states at time $t$ with dimension $n$, $\rho_J$ is the vector of jam densities, and $q_{\max}$ is the vector of maximum flow rates. When the noise in (5) is added to the conservation law, the density must be constrained to $[0, \rho_J]$. The switch variable $s_t$ selects a single transition matrix from each of the available sets $\{A_1, A_2, \ldots, A_s\}$, $\{B_{J,1}, B_{J,2}, \ldots, B_{J,s}\}$, and $\{B_{Q,1}, B_{Q,2}, \ldots, B_{Q,s}\}$. The dynamics of $s_t$ itself are Markovian, with transition probability $p(s_{t+\Delta t} = j \mid s_t = i)$ (7). Considering the SLDS model consisting of the four cells indicated in Section 2.1, we specify the transition matrices in (6) under each mode of the SLDS as listed in Table 2. Note that, given the input flow rate, the upstream virtual cell density is given by (8). Because the downstream virtual cell flow rate is zero during the red light, in this model the shockwave   (1) cannot be derived directly.
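The two-level structure described above (a Markov chain over modes, and linear Gaussian density dynamics selected by the current mode) can be sketched as follows. This is a generic SLDS simulation, not the paper's model: the matrices, offsets, transition probabilities, and noise level are hypothetical placeholders rather than the entries of Table 2, and the clipping to $[0, \rho_J]$ is one simple way to respect the density bound.

```python
import random


def next_mode(mode, trans_prob, rng):
    """First level: draw the next mode from row `mode` of the Markov transition matrix."""
    u, acc = rng.random(), 0.0
    for candidate, p in enumerate(trans_prob[mode]):
        acc += p
        if u < acc:
            return candidate
    return len(trans_prob[mode]) - 1  # guard against rounding in the row sum


def slds_step(rho, mode, systems, noise_std, rho_jam, rng):
    """Second level: linear density update of the selected mode plus Gaussian noise."""
    A, b = systems[mode]
    out = []
    for i, row in enumerate(A):
        mean = sum(a * r for a, r in zip(row, rho)) + b[i]
        # Clip so that the density stays within [0, rho_jam].
        out.append(min(rho_jam, max(0.0, mean + rng.gauss(0.0, noise_std))))
    return out


def simulate_slds(rho0, mode0, trans_prob, systems, noise_std, rho_jam, steps, seed=0):
    """Simulate a mode sequence and the density path it drives."""
    rng = random.Random(seed)
    rho, mode = list(rho0), mode0
    modes, path = [mode], [rho]
    for _ in range(steps):
        mode = next_mode(mode, trans_prob, rng)
        rho = slds_step(rho, mode, systems, noise_std, rho_jam, rng)
        modes.append(mode)
        path.append(rho)
    return modes, path


if __name__ == "__main__":
    # Hypothetical two-mode, two-cell example (index 0 = upstream cell).
    systems = {
        0: ([[0.90, 0.00], [0.10, 0.90]], [0.004, 0.0]),  # discharging-like mode
        1: ([[0.90, 0.00], [0.10, 1.00]], [0.004, 0.0]),  # queuing-like mode
    }
    trans_prob = [[0.9, 0.1], [0.2, 0.8]]
    modes, path = simulate_slds([0.02, 0.02], 0, trans_prob, systems,
                                noise_std=0.002, rho_jam=0.14, steps=20)
    print(modes)
```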

Stochastic Fundamental Diagram (SFD)
In this section, we describe the SFD. According to (5)-(8), this stochastic process can be described by an HMM with the stochastic variables of traffic density and traffic state mode.
From time $t$ to $t + \Delta t$, the traffic density of Cell(x) changes from $\rho_{s_t}(x, t)$ to $\rho_{s_{t+\Delta t}}(x, t+\Delta t)$, and the corresponding flow rates through Cell(x) are $q(x-1, t)$ and $q(x, t)$. If the FD is nonlinear, as in Figure 5, and cannot be modeled completely by a single tractable function, an alternative approach is to apply a linear regression model locally to approximate the real SFD [15]. Given a nonlinear function $\mathcal{F}(\cdot)$ that could be any curve, as shown in Figure 5, we construct a first-order Taylor series approximation in any local area of $\mathcal{F}(\cdot)$ for a given mode $s_t = s$ as
$$\mathcal{F}(\rho_s(x, t)) \approx \mathcal{F}(\rho_s^0(x, t)) + w_s\left(\rho_s(x, t) - \rho_s^0(x, t)\right) \quad (9)$$
where $\mathcal{F}(\rho_s^0(x, t))$ is the SFD's function value at $\rho_s^0(x, t)$, $w_s$ is the wave's propagation speed, and $w_s = \nabla\mathcal{F}(\rho_s(x, t))|_{\rho_s(x,t)=\rho_s^0(x,t)}$. The SFD can be approximated by $n$ tangents passing through the point $(\rho_s^0, \mathcal{F}(\rho_s^0))$, constrained by the limit of the maximum flow rate $q_{\max}$, as shown in Figure 5; this gives (10), in which $\mathcal{F}_s(\rho_s(x, t))$ denotes the local approximation function and $[\cdot]_n$ indicates $n$ instances. If $w_s$ is a stochastic variable subject to a certain distribution, a more general definition of the SFD is given as follows.
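Definition 1. Given the stochastic function variable $\rho_s(x, t)$ and the stochastic parameter variables $w_s$ belonging to the $s$th mode, the variables $[w_s]_n$ are independent and identically distributed (IID) according to a distribution $R$. The SFD can then be written as (11), with the parameter distribution given in (12). With Definition 1 applied, the SFD is the distribution of the parameter $w_s$. When the wave propagates forward, the density data group is $\rho_s \in [0, \rho_c)$, where $0 < \rho_c < \rho_J$. Let $w_s^f > 0$; then if the Taylor series is expanded at the point $(0, 0)$, meaning $\rho_s^0(x, t) = 0$ and $\mathcal{F}_s(\rho_s^0(x, t)) = 0$, the SFD is approximated by $n$ rays emanating from the point $(0, 0)$, as shown in Figure 6(a). When the wave propagates backward, the density data group is $\rho_s \in (\rho_c, \rho_J]$, where $0 < \rho_c < \rho_J$. Let $w_s^b > 0$, so that the Taylor series expansion at the point $(\rho_J, 0)$ has $\rho_s^0(x, t) = \rho_J$ and $\mathcal{F}_s(\rho_s^0(x, t)) = 0$. The SFD is then approximated by $n$ rays emanating from the point $(\rho_J, 0)$, as shown in Figure 6(a). The SFD can therefore be written as
$$\mathcal{F}(\rho_s(x, t)) = \begin{cases} [w_s^f \rho_s(x, t)]_n, & \rho_s \in [0, \rho_c) \\ [w_s^b(\rho_J - \rho_s(x, t))]_n, & \rho_s \in (\rho_c, \rho_J] \end{cases} \quad (13)$$
So (3) can be recognized as a special solution of (13), and (13) can be regarded as the family of triangular FDs with different values of density generated by the added parameters $[w_s^f, w_s^b]_n$. If we assume that the parameters $[w_s^f]_n$ and $[w_s^b]_n$ are independent and identically Gaussian distributed, $\mathcal{N}(w_s^f; \mu_s^f, \sigma_s^f)$ and $\mathcal{N}(w_s^b; \mu_s^b, \sigma_s^b)$, as shown in Figure 6(b), then (6) can be sampled from $\mathcal{F}(\rho(t))$ in (13) conditioned on $s_t = s$, which gives (14).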
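To make the idea of an SFD as a family of rays concrete, the sketch below draws a wave speed from a Gaussian and evaluates the corresponding forward ray through $(0, 0)$ or backward ray through $(\rho_J, 0)$, capped at the maximum flow. The function name, the branch labels, and every numerical value except the jam density of 0.14 veh/feet are hypothetical. With zero standard deviation the draw reduces to a single triangular FD, which mirrors the statement that the triangular FD is a special case.

```python
import random


def sample_sfd_flow(rho, branch, params, rho_jam, q_max, rng):
    """Draw one wave speed for the given branch and return (flow, speed).

    params maps "forward" / "backward" to (mean, std) of the wave speed.
    """
    mu, sigma = params[branch]
    speed = rng.gauss(mu, sigma)
    if branch == "forward":
        flow = speed * rho              # ray through (0, 0)
    else:
        flow = speed * (rho_jam - rho)  # ray through (rho_jam, 0)
    # Keep the flow physically meaningful: non-negative and below capacity.
    return min(q_max, max(0.0, flow)), speed


if __name__ == "__main__":
    rng = random.Random(0)
    params = {"forward": (30.0, 3.0), "backward": (15.0, 2.0)}  # feet/s, hypothetical
    rays = [sample_sfd_flow(0.03, "forward", params, 0.14, 1.4, rng) for _ in range(5)]
    for flow, speed in rays:
        print(f"speed={speed:.2f} ft/s  flow={flow:.3f} veh/s")
```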

Variational Bayesian (VB) Learning for SFD
Once the traffic flow on a freeway has been modeled by the SCTM with an FD [9,16], the parameters $w_s^f, w_s^b$ can be obtained by maximum likelihood estimation (MLE) given the independent observation sequence $\{\rho(t)\}_{t=1:T}$ and the state mode sequence $\{s_t\}_{t=1:T}$ [17], which is equivalent to parameter calibration under freeway conditions [9,16]. Here, however, we want to obtain the distribution of the parameters $w_s^f, w_s^b$ in (12) on the basis of Definition 1 for traffic flow at a signalized intersection, which can be computationally approximated as the posterior distribution by the variational Bayes (VB) learning method for SLDS [18]. The basic idea of variational Bayesian learning is to approximate the intractable joint distribution over both the hidden states and the parameters of discrete graphical models with a simpler distribution, usually by assuming that the hidden states and parameters are independent [18]. In our problem, the hidden states of the discrete graphical model are the traffic density and the traffic state mode of the SLDS. Given the independent traffic density sequence $\{\rho(t)\}_{t=1:T}$ and the state mode sequence $\{s_t\}_{t=1:T}$, the unknown variables of the SLDS are the parameters $w_s^f, w_s^b$ and the covariance $\Sigma_s$. Let the training dataset be
$$D = \{X_s, Y_s\} \quad (15)$$
where $(X_s, Y_s)$ are $(T-1)$-dimensional vectors: $X_s = \{\rho(t)\}_{t=1:T-1}$ and $Y_s = \{\rho(t)\}_{t=2:T}$. For the parameter set $\Theta = \{w_s^f, w_s^b\}$, the posterior of the parameters $p(\Theta \mid D)$ is computed by Bayes' rule as
$$p(\Theta \mid D) = \frac{p(D \mid \Theta)\,p(\Theta)}{p(D)} \quad (16)$$
which can be rewritten as (17), where $p(\Theta)$ or $p(w_s)$ is the prior of the parameters $w_s^f, w_s^b$, and $p(D \mid \Theta)$ or $p(D_s \mid w_s, \Sigma_s)$ is the likelihood, identical to the one used in MLE [17]. If the prior of $w_s$ is chosen to be Gaussian, and $\{\rho(t)\}_{t=1:T}$ are IID Gaussian, the conjugate prior of the covariance $\Sigma_s$ is the inverse-Wishart distribution, denoted by IW [19].
Considering (6), the matrix $[A_{s_t}, B_{J,s_t}]$ is composed of the same parameters $w_s$ for the different modes $s_t$. The matrix $[A_{s_t}, B_{J,s_t}]$ can be converted to a vector $V_s$. In Table 2, we find that the matrix $[A_{s_t}, B_{J,s_t}]$ is sparse, so $V_s$ is also sparse. We therefore consider automatic relevance determination (ARD), which encourages driving matrix elements to zero if their presence is not supported by the model or data [14]. The parameter priors are then given by (18) for the wave parameters and, for the covariance, by
$$p(\Sigma_s) = \mathrm{IW}(\Sigma_s; n_0, S_0) \quad (19)$$
Finally, the parameter posterior is given by (20) (detailed in Appendix A).
Remark 1. Given the parameter priors as in (18), the covariance prior as in (19), and the likelihood as in (A.6), the distribution $R$ of the parameters $[w_s^f]_n$, $[w_s^b]_n$ in Definition 1 has the approximate form in (20) (proof in Appendix B). As the optimal form of the parameter distribution is given, the concrete distribution can be obtained by Gibbs sampling conditioned on the observation data [20,21]. The interdependencies among the model parameters, variables, and hyperparameters are shown in Figure 7. Here, we use the Parameter Gibbs Sampling in Algorithm 1 to obtain the approximated posterior distributions using (20).
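The following sketch illustrates the kind of Gibbs sampler described above on a deliberately simplified scalar problem: one wave speed $w$ in the regression $y = w x + \text{noise}$, an ARD-style Gaussian prior on $w$ with Gamma-distributed precision, and an inverse-gamma prior on the noise variance (the one-dimensional case of the inverse-Wishart). It is not the paper's Algorithm 1. The function name, the hyperparameter names `a0`, `b0`, `n0`, `s0`, and their default values are assumptions made for illustration.

```python
import random


def gibbs_wave_speed(x, y, a0=1.0, b0=0.01, n0=2.0, s0=0.01,
                     iters=1000, burn=200, seed=0):
    """Posterior draws of w in y = w*x + e, e ~ N(0, var).

    Priors (assumed): w | alpha ~ N(0, 1/alpha), alpha ~ Gamma(a0, rate=b0),
    var ~ Inv-Gamma(n0/2, s0/2).
    """
    rng = random.Random(seed)
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    w, alpha, var = 0.0, 1.0, 1.0
    draws = []
    for it in range(iters):
        # w | alpha, var, data: Gaussian with precision alpha + sxx/var.
        prec = alpha + sxx / var
        w = rng.gauss((sxy / var) / prec, prec ** -0.5)
        # alpha | w: Gamma(a0 + 1/2, rate = b0 + w^2/2); gammavariate takes a scale.
        alpha = rng.gammavariate(a0 + 0.5, 1.0 / (b0 + 0.5 * w * w))
        # var | w, data: Inv-Gamma((n0 + n)/2, (s0 + SSE)/2).
        sse = sum((yi - w * xi) ** 2 for xi, yi in zip(x, y))
        var = 1.0 / rng.gammavariate(0.5 * (n0 + n), 2.0 / (s0 + sse))
        if it >= burn:
            draws.append(w)
    return draws


if __name__ == "__main__":
    # Synthetic free-flow data with a hypothetical true speed of 30 feet/s.
    gen = random.Random(1)
    x = [gen.uniform(0.0, 0.04) for _ in range(200)]
    y = [30.0 * xi + gen.gauss(0.0, 0.05) for xi in x]
    draws = gibbs_wave_speed(x, y)
    print(sum(draws) / len(draws))
```

The retained draws approximate the posterior of the wave speed, so their mean and spread play the role of the fitted distribution parameters in this toy setting.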

Empirical Verification
5.1. Data Introduction. The data used in this research were collected on a segment of Peachtree Street in Atlanta, Georgia, USA, on November 8, 2006, as part of the Federal Highway Administration (FHWA) NGSIM program [22]. The data include detailed individual vehicle trajectories with time and location, vehicle velocity, and headway (distance) between vehicles, from which the flow-density relationship can be calculated. The Peachtree Street segment is approximately 2,100 feet in length, with five intersections and two or three through-lanes in each direction. Intersections 1, 2, 3, and 5 are signal-controlled, while Intersection 4 is controlled by a stop sign. The Peachtree Street data consist of two 15-minute time periods, 12:45 p.m. to 1:00 p.m. and 4:00 p.m. to 4:15 p.m. The traffic volume from 4:00 p.m. to 4:15 p.m. is larger than that from 12:45 p.m. to 1:00 p.m., so our study focuses on the segment between Intersections 2 and 3 from 4:00 p.m. to 4:15 p.m. Intersection 2 is a two-phase signal-controlled intersection (green, red, and amber lights). During the study period, the cycle length is constant (102 s), with a g/C ratio of 0.43.

5.2. Observations and Empirical Distributions.
As shown in Figure 8, the segment between Intersections 2 and 3 is divided into two cells, with lengths of 167 feet and 170 feet for Cell(1) and Cell(2), respectively. The average traffic flow rate and density in each cell are calculated for every 5 s interval. The average density and the jam density in each cell are estimated from the definition "vehicles per distance." The average volume of traffic flow is calculated from the vehicles passing through the boundary. The traffic state sequence, cell density, and flow rate are shown in Figure 9. "Input" is the traffic flow rate at the upstream boundary of the segment, and "Output" is the traffic flow rate at the downstream boundary of the segment. "Trans" is the traffic flow rate from Cell(2) to Cell(1).
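A minimal sketch of how such per-interval observations might be computed from trajectory samples is given below. It assumes a hypothetical record layout of `(vehicle_id, time, position)` sampled at a fixed step; this is a simplification and not the actual NGSIM file format. Density is the time-averaged number of vehicles in the cell divided by the cell length, and flow is the number of boundary crossings divided by the interval length.

```python
def cell_density(samples, cell_start, cell_end, t0, dt, sample_step):
    """Average density (veh per unit length) in [cell_start, cell_end) over [t0, t0 + dt)."""
    hits = sum(1 for _vid, t, pos in samples
               if t0 <= t < t0 + dt and cell_start <= pos < cell_end)
    mean_vehicles = hits * sample_step / dt  # time-averaged vehicle count
    return mean_vehicles / (cell_end - cell_start)


def boundary_flow_rate(samples, boundary, t0, dt):
    """Flow rate (veh per unit time) of vehicles crossing `boundary` during [t0, t0 + dt)."""
    last_pos = {}
    crossings = 0
    for vid, t, pos in sorted(samples, key=lambda s: (s[0], s[1])):
        prev = last_pos.get(vid)
        # A crossing is attributed to the first sample at or beyond the boundary.
        if prev is not None and prev < boundary <= pos and t0 <= t < t0 + dt:
            crossings += 1
        last_pos[vid] = pos
    return crossings / dt


if __name__ == "__main__":
    # One hypothetical vehicle moving at 10 feet/s, sampled every second.
    samples = [("veh1", float(t), 10.0 * t) for t in range(10)]
    print(cell_density(samples, 0.0, 50.0, 0.0, 5.0, 1.0))
    print(boundary_flow_rate(samples, 50.0, 5.0, 5.0))
```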
The widely scattered flow-density relationship for the selected road segment, shown in Figure 10, clearly cannot be modeled completely by a single function. We use the flow-density data to fit the Gaussian empirical distribution of shockwave speed (the red line), as shown in Table 3 and Figure 11. For comparison with the triangular FD, we use the flow-density data to fit the linear model in (3) with $\rho_J = 0.14$ (veh/feet), with the results shown in Table 4 and Figure 12.
To test the assumption of a Gaussian empirical distribution of shockwave speed statistically, a hypothesis test using the $t$-test is conducted for each of the traffic state modes, as shown in Table 5. This test determines whether there is evidence against the null hypothesis $H_0$ that the shockwave speed comes from a Gaussian distribution with mean $\mu_0$ and unknown variance. At a significance level of $\alpha = 0.05$, the test rejects $H_0$ if $p < 0.05$ and does not reject it otherwise. The ci is the confidence interval for the true population mean.
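The one-sample test described above can be sketched with the Python standard library as follows. Because the standard library has no Student $t$ distribution, the two-sided critical value is passed in by the caller (for example, about 2.776 for 4 degrees of freedom at the 0.05 level), so the function reports a reject decision and a confidence interval rather than a $p$-value. The function name and the sample values are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0, t_crit):
    """One-sample t-test of H0: mean == mu0 with unknown variance.

    t_crit is the two-sided critical value for len(sample) - 1 degrees of freedom.
    Returns (t statistic, confidence interval for the mean, reject flag).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (mean - mu0) / std_err
    ci = (mean - t_crit * std_err, mean + t_crit * std_err)
    return t_stat, ci, abs(t_stat) > t_crit


if __name__ == "__main__":
    speeds = [10.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical wave speeds (feet/s)
    print(one_sample_t(speeds, mu0=14.0, t_crit=2.776))
```

A hypothesized mean that lies inside the returned interval is not rejected, which matches the way the ci is read alongside the test decision.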
5.3. Numerical Verification. We examine the posterior distribution of the wave speed to analyze the SFD, given the state sequence and cell density shown in Figure 9. Here, we select the expansion in (13) for the SFD. We use $G(a_0, b_0)$ as the prior of the precision parameter $\alpha_s$ of the wave priors in (18) and $\mathrm{IW}(n_0, S_0)$ as the prior of the covariance $\Sigma_s$ in (19). We choose the settings $a_0 = 1$, $b_0 = 0.01$ and $n_0 = 10$, $S_0 = 10^4$. The resulting parameter posteriors are listed in Table 6, and the posterior PDFs (green line) are shown in Figure 11.
The posterior distributions approximate the fitted empirical distributions of shockwave speed, and the means of the inferred posterior distributions fall within the ci of the hypothesis test. In Figure 13, the triangular FD shockwave speeds in every state mode group are almost equal to the mean of the distribution of the SFD shockwave speed. Given the distribution of the SFD shockwave speed, we can obtain an unlimited number of lines whose slope is the wave speed. The family of lines of the SFD forms a fan-shaped area that encompasses all the data according to the probability.
Figure 14 displays the normal quantile-quantile plots of the NGSIM (NG) wave speed versus the sample data, which are 50 shockwave speeds sampled from the posterior. The plot produces an approximately straight line, suggesting that the sampled speeds and the NG speeds follow the same distribution.
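As an illustration of what a normal quantile-quantile comparison computes, the sketch below pairs the sorted sample with the quantiles of a Gaussian fitted to it and summarizes straightness by the correlation of the pairs. It is a generic construction rather than the exact procedure behind Figure 14; the function names, the plotting positions $(i + 0.5)/n$, and the synthetic sample are assumptions.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_qq(sample):
    """Return (theoretical quantile, ordered sample value) pairs for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    theo = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theo, xs))


def qq_correlation(pairs):
    """Pearson correlation of the Q-Q pairs; values near 1 indicate a straight line."""
    a = [p[0] for p in pairs]
    b = [p[1] for p in pairs]
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    speeds = [rng.gauss(30.0, 3.0) for _ in range(50)]  # hypothetical sampled speeds
    print(qq_correlation(normal_qq(speeds)))
```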
Figure 15 displays the likelihood $p(q, \rho_s \mid s, \Theta) = p(q(t), \mathcal{F}(\rho_s(t) \mid s_t = s), \Sigma_s)_{t=1:T}$, computed with the inferred parameter $\Theta$ and covariance $\Sigma_s$; the flow-density data are the same as those shown in Figure 10(b). We find that most of the flow-density data are distributed in the region in which the likelihood is highest.
The traffic flow rate transmitted from Cell(2) to Cell(1) is determined by the adjacent cells' densities and the state mode, either in the SLDS with the SFD or in the SCTM with a calibrated FD [9]. Using the cell densities and state modes of the NGSIM data in Figure 9, 100 random simulations are carried out to calculate the transition traffic flow rate, with the shockwave speeds sampled from the SFD in the order of the traffic mode sequence in each simulation. The box plot of the transition traffic flow rate (with zero values removed) in each time interval over all SLDS model simulations is shown in Figure 16, together with the NGSIM data (red line) and the simulation result of the SCTM with calibrated FD (blue line). The NGSIM data and the SCTM simulation result both lie within the range of the SLDS model results, so each can be seen as a special case of simulation with the SFD. The SLDS model with the SFD thus gives a broader description of the dynamic characteristics of traffic flow than the SCTM with the FD.
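The repeated-simulation procedure described above can be sketched as follows: for each run and each time interval, a wave speed is drawn from the Gaussian of the current mode, the transition flow is evaluated, and the per-interval results are summarized by box-plot statistics. The mode labels, the assignment of modes to forward or backward waves, and all numerical values are hypothetical, and the flow rule is the simple ray form rather than the full SLDS update.

```python
import random
from statistics import quantiles


def simulate_transition_flow(densities, modes, sfd, rho_jam, q_max, runs=100, seed=0):
    """Monte Carlo transition flow per interval.

    densities : list of (upstream density, downstream density) per interval
    modes     : mode label per interval
    sfd       : mode label -> (kind, mean speed, std), kind is "forward" or "backward"
    Returns one list of `runs` flow values per interval.
    """
    rng = random.Random(seed)
    results = [[] for _ in densities]
    for _ in range(runs):
        for k, ((rho_up, rho_down), mode) in enumerate(zip(densities, modes)):
            kind, mu, sigma = sfd[mode]
            speed = rng.gauss(mu, sigma)
            if kind == "forward":
                flow = speed * rho_up
            else:
                flow = speed * (rho_jam - rho_down)
            results[k].append(min(q_max, max(0.0, flow)))
    return results


def box_stats(values):
    """Minimum, quartiles, and maximum, as drawn in a box plot."""
    q1, med, q3 = quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)


if __name__ == "__main__":
    densities = [(0.02, 0.03), (0.03, 0.10), (0.04, 0.12)]
    modes = ["I", "II", "II"]
    sfd = {"I": ("forward", 30.0, 3.0), "II": ("backward", 12.0, 2.0)}
    for interval in simulate_transition_flow(densities, modes, sfd, 0.14, 1.4):
        print(box_stats(interval))
```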

Conclusion, Limitations, and Future Research
In this work, the traffic flow at a signalized intersection is first modeled by an SLDS model with an SFD, which is defined by the distributions of shockwave speed. A variational Bayesian learning method is then adopted to compute the distributions of the SFD parameters, which approximate the empirical distributions of shockwave speed calculated from the observed flow-density data. Tests on traffic flow data from Peachtree Street in the NGSIM program show that the SFD captures the main features of the widely scattered flow-density data better than the FD. With the wave speed sampled from the SFD, the SLDS can describe the dynamic characteristics of traffic flow and be used for the maximum likelihood estimation of traffic density or flow rate. Furthermore, the SFD in this paper is also suitable for freeway traffic flow, provided that the SLDS's traffic state modes are defined for freeway traffic flow, as in the SCTM [3,9]. Once obtained, the SFD may be used as an input, together with the boundary conditions, to determine the dynamical evolution of traffic flow at a signalized intersection. The SFD proposed in this paper could therefore be applied to model-based signal control optimization [23].
One limitation of this model is that it does not consider scenarios in which the road segment of the intersection arterial is divided into a straight lane group and a turning lane group, which could potentially lead to the lane blockage phenomenon [24]. The model should account for lane groups by being expanded to the form of a network [8], and the traffic state modes should account for traffic flow spillover [13]. Another key point is that, when using the Bayesian framework to infer the SFD, we assume that the sufficient statistics are equal to the observations, including the state mode sequence, the density sequence, and the cell boundary flow rate sequence. In practice, however, it is not easy to obtain the state mode sequence and the density sequence directly. Therefore, for the SLDS model, the state mode and density could be treated as latent variables and inferred when the traffic flow rates at the upstream and downstream boundaries of the road are taken as the observation data, as in hybrid systems [24]. In this procedure, the SFD could be inferred automatically by the variational Bayesian learning method, in the same way as automatic calibration [16].

Figure 1: Cell representation of the road segment: (a) the signalized road segment, (b) the link cells of the road segment with virtual cells.

Figure 2: State mode of traffic flow corresponding to shockwave propagation: (a) without residual queues, (b) with residual queues.

Figure 3: Triangular FD for the generalized CTM.

Figure 5: Local model to approximate the SFD.

Figure 6: SFD defined by family of the triangular FDs: (a) family of the triangular FDs, (b) the parameters of SFD defined by Gaussian distribution: any triangular FD in the family can be calculated by the Gaussian distribution.

Figure 7: The interdependencies of the parameters, variables, and hyperparameters of graphical model.

Figure 8: Selected target sections of NGSIM: (a) Peachtree Street in NGSIM, (b) the cell representation of the segment between Intersections 2 and 3.

Figure 9: The NGSIM (NG) data for verification (time interval 5 s): (a) state sequence $s_t$, (b) the cell density, and (c) the flow rate on three boundaries.

Figure 10: The flow-density relationship: (a) without condition on the mode, (b) with condition on the mode.

Figure 12: Fitted line of the four state modes' FD: (a) the wave   (1) of mode I, (b) the wave   of mode II, (c) the wave   (2) of mode IV, and (d) the wave   (2) of mode V.

Figure 13: Fitted line of FD and family line of SFD: (a) the wave   (1) of mode I, (b) the wave   of mode II, (c) the wave   (2) of mode IV, and (d) the wave   (2) of mode V. The color bar represents the probability of a wave sampled from the SFD. The color from deep to shallow indicates the probability from low to high.

Figure 14: Normal quantile-quantile plot of wave speed: (a) the wave   (1) of mode I, (b) the wave   of mode II, (c) the wave   (2) of mode IV, and (d) the wave   (2) of mode V.

Figure 15: Likelihood of traffic flow density conditioned on traffic modes: (a) mode I, (b) mode II, (c) mode IV, and (d) mode V. The color bar represents the likelihood. The color from deep to shallow indicates probability from low to high in [0, 1].

Table 1: State modes of traffic flow in the two-cell link of an intersection arterial.

Table 2: Transition matrices of (6) for the state modes.

Table 3: Fitted Gaussian distribution of shockwave speed (feet/s).

Table 5: The test for the assumption of a Gaussian shockwave speed distribution (feet/s).

Table 6: Fitted Gaussian and posterior distributions of wave speed (feet/s).