Towards a Scalable Distributed Framework for Urban Congestion Traffic Patterns Warehousing

We put forward the architecture of a framework for integrating data from moving objects in an urban transportation network. Most of this research relies on GPS outdoor geolocation technology and uses a distributed cloud infrastructure with a big data NoSQL database. A network of intelligent mobile sensors distributed over the urban network produces congestion traffic patterns. Congestion predictions are based on an extended simulation model. This model computes traffic indicators, which are fused with the GPS data to estimate traffic states across the whole network. The discovery of congestion patterns uses the semantic trajectories metamodel given in our previous works. The challenge of the proposed solution is to store traffic patterns in order to support surveillance and intelligent real-time control of the network, reducing congestion and avoiding its consequences. Fusing real-time data from GPS-enabled smartphones with the data provided by existing traffic systems improves knowledge of traffic congestion and generates new information for soft operational control, providing intelligent added value for the deployment of transportation systems.


Introduction
Knowledge data discovery and big data are concepts that are revolutionizing modern information technology. Big data refers to the large volume of data recorded in various "digital" activities that are directly involved in the data mining process. Decision-makers in public transport systems are aware of the role that knowledge discovery and big data can play in delivering the promised benefits [1].
Urban transport, as part of "Urban Computing," is now considered one of the most striking applications of big data and knowledge discovery. These applications range from urban traffic management, with computerized processing of massive amounts of traffic and geographic data, through complex signaling and traffic assignment or control systems, to communications, vehicle tracking, and traveler information operations using increasingly common technologies such as GPS [2], Wi-Fi, and cellular phone systems. In the field of urban transportation, the road traffic state can be represented at any time by analyzing the information collected on all vehicles, which supports the development of descriptive models, in the form of formulas and rules, that reproduce the complex dynamics of traffic. There are several techniques for measuring these data, such as pneumatic tubes, radars, cameras, electromagnetic loops, GPS, RFID, and Bluetooth. Local detectors are practical for point measurements. The extracted data correspond directly to the number of vehicles detected and are used to calculate macroscopic variables such as traffic flow, density, and average speed. Data from the various sensors should be stored in large databases, and the full history of these data will be exploited by data mining techniques for efficient traffic management.
Traffic congestion has been a major concern in most cities. Congestion dramatically affects people's mobility and generates significant stress; it wastes time and energy and causes pollution. For general planning and traffic surveillance, several studies collect velocity data in the field using GPS devices. These data relate to the speed and the number of vehicles in circulation; they can be used to assemble velocity profiles and travel time indicators per period, and to identify congested areas. To quantify the severity of congestion, Global Positioning System (GPS) applications have been used to collect travel time and delay data per period for many transportation networks. Data provided by GPS technology have proved to be at least as accurate as the data provided by other sensors.
The main contributions of this work can be summarized as follows: (i) a model of congestion that supports a measurement process based on GPS technology; (ii) a macroscopic traffic simulation model, extending Daganzo's model, for predicting the spread of congestion on a network; and (iii) a distributed infrastructure encompassing various components that provide warehousing services and discovery of the patterns inherent in traffic phenomena on a transportation network. Big data and mobile computing technologies are the foundation of this infrastructure.
This paper is organized as follows. The next section reviews the state of the art in macroscopic traffic models and then introduces a network traffic simulation model based on the Cell Transmission Model. Congestion modelling and management are presented in Section 3. Section 4 introduces the conceptual modeling of congestion using the trajectories' metamodel.
Section 5 outlines the global system architecture. Finally, Section 6 states the main conclusions of our work.

Traffic Modelling and Simulation
One of the main societal and economic problems related to transportation in many countries is the occurrence of congestion in traffic flow. In this context, understanding the various traffic flow operations is important for managing congestion and traffic on road networks. Several questions therefore arise: How can we define congestion? What causes congestion? How can we measure congestion? How does congestion propagate through networks? What determines the location and time of traffic breakdown? To answer some of these questions, many traffic flow theories and models have been developed. These models can be deductive, applying physical laws and theories to predict and explain traffic operations; inductive, analyzing available real data to fit generic mathematical structures; or intermediate, developing basic mathematical model structures that are then fitted using real data (see Figure 1). Various criteria are used to classify the developed models, such as the scale of application and of the independent variables, operationalization, the representation of processes, and especially the level of detail with which they describe vehicular traffic flow. The scale of the independent variables distinguishes between two time scales for describing the traffic system's variables, which can be either stochastic or deterministic. The developed models can be used either as analytical solutions of sets of equations or as simulation models in a specific area of application.
The level-of-detail criterion considers which traffic entities (vehicles and drivers) are distinguished and at what level they are described in the respective flow models. Microscopic models describe the latter by considering both the space-time behavior of individual drivers and vehicles and their interactions at a high level of detail. Some works differentiate between microscopic and submicroscopic models, both of which provide a high level of detail: submicroscopic models additionally describe the functioning of vehicles' subunits and their interaction with their surroundings, while microscopic models distinguish and trace individual entities.
Mesoscopic models neither distinguish nor trace the system's individual entities (vehicles or drivers) but specify the behavior of small groups of entities, whose activities and interactions are described at a low level of detail. Macroscopic models describe the collective flow using the analogy between vehicles in traffic flow and particles in a fluid. Several researchers have favored the macroscopic modeling approach on the assumption that it provides a correct description of traffic compared with the other two categories. In the following paragraph, we present traffic flow modeling approaches based on the fundamental diagram.

Fundamental Diagrams of Traffic Flow.
Fundamental diagrams of traffic flow are curves representing the relations between flow and density, density and speed, and speed and flow (Figure 2). These diagrams are essential tools for analyzing the fundamental relationships of traffic flow [3][4][5][6][7].
Flow and density vary with time and space. When the density is zero, the flow is also zero, since there are no vehicles on the road; as the number of vehicles increases, both density and flow increase. Traffic reaches its jam state when vehicles cannot move because their density has reached its maximum; at jam density, the flow is zero because the vehicles are not moving. When the density lies between zero and the critical density at which flow is maximal, traffic is in a free-flow state; between the critical density and the jam density, it is congested. Note that the same flow can occur at two different densities, one on each branch of the diagram, with different corresponding speeds.
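The relationships above can be illustrated with a small sketch. It assumes the linear Greenshields speed-density law $v = v_f(1 - k/k_j)$, which is one common choice rather than the only valid fundamental diagram, and illustrative parameters (60 km/h free-flow speed, 120 veh/km jam density); the function names are hypothetical. It shows flow vanishing at zero and jam density and a single flow value being carried at two densities with different speeds.

```python
# Illustrative Greenshields fundamental diagram (an assumed model, not the paper's).
# Units: density k in veh/km, speed in km/h, flow in veh/h.


def speed(k: float, v_f: float, k_j: float) -> float:
    """Speed falls linearly from the free-flow speed v_f (k = 0) to 0 (k = k_j)."""
    return v_f * (1.0 - k / k_j)


def flow(k: float, v_f: float, k_j: float) -> float:
    """Fundamental relation q = k * v."""
    return k * speed(k, v_f, k_j)


def densities_for_flow(q: float, v_f: float, k_j: float) -> tuple[float, float]:
    """Return the (free-flow, congested) densities that both carry flow q.

    Solves v_f * k * (1 - k / k_j) = q; the two roots are symmetric
    around the critical density k_c = k_j / 2.
    """
    q_max = v_f * k_j / 4.0  # capacity, reached at the critical density
    if not 0.0 <= q <= q_max:
        raise ValueError("flow must lie between 0 and capacity")
    k_c = k_j / 2.0
    spread = (1.0 - q / q_max) ** 0.5
    return k_c * (1.0 - spread), k_c * (1.0 + spread)


if __name__ == "__main__":
    v_f, k_j = 60.0, 120.0
    k_free, k_cong = densities_for_flow(1350.0, v_f, k_j)
    print(k_free, speed(k_free, v_f, k_j), flow(k_free, v_f, k_j))  # 30.0 45.0 1350.0
    print(k_cong, speed(k_cong, v_f, k_j), flow(k_cong, v_f, k_j))  # 90.0 15.0 1350.0
```

The same 1350 veh/h is carried at 30 veh/km moving at 45 km/h and at 90 veh/km crawling at 15 km/h; only the density tells the two regimes apart.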

Traffic Simulation Model.
Various approaches have been proposed to capture the mechanism of traffic congestion propagation [8][9][10][11]. In this work, we use the cell transmission model with some extensions and apply it to simulate the formation and dissipation of congestion at the semimacroscopic level. The cell transmission model (CTM) proposed by Daganzo [3][4][5] is a discrete approximation to the LWR model [7]. CTM (Figure 4) assumes that the road is divided into similar cells whose lengths equal the distance traveled by free-flowing traffic in one time interval (Figure 3). The system state is represented discretely by the number of vehicles $n_i(t)$ in each cell $i$ at time $t$. Two further parameters characterize each cell: $N_i(t)$, the maximum number of vehicles that can be present in cell $i$, defined as the product of the cell's length and its jam density; and $Q_i(t)$, the maximum number of vehicles that can flow into cell $i$ between time steps $t$ and $t+1$, defined as the product of the clock interval and the cell's capacity. The key idea in CTM is that the length of a cell is the distance covered at free-flow speed during one discrete time step, so that $\Delta x = v_f\,\Delta t$.
If cells are numbered consecutively and $y_i(t)$ denotes the number of vehicles moving from cell $i-1$ to cell $i$ during time interval $t$, then the recursive relationship of the CTM can be expressed as

$$n_i(t+1) = n_i(t) + y_i(t) - y_{i+1}(t),$$

where $n_i(t)$ denotes the number of vehicles contained in cell $i$ at time $t$. The inflow $y_i(t)$ to cell $i$ during the interval $[t, t+1]$ is given by

$$y_i(t) = \min\left\{ n_{i-1}(t),\; Q_i(t),\; \delta\,\left[N_i(t) - n_i(t)\right] \right\},$$

with $\delta = w/v_f$. If the flow $q$ and density $k$ are uniform across two cells during a time interval, then

$$q = \min\left\{ v_f\,k,\; q_{\max},\; w\,(k_j - k) \right\}.$$

Unlike most traffic models, the CTM adopts a trapezoidal fundamental diagram (Figure 4), defined by four properties: the free-flow speed $v_f$, the capacity $q_{\max}$, the jam density $k_j$, and the speed $w$ with which disturbances propagate backward when traffic is congested. The inflow to a given cell has the same trapezoidal form. The uncongested case corresponds to $y_i(t) = n_{i-1}(t)$, with waves propagating downstream. If $y_i(t) = Q_i(t)$, vehicles flow at capacity. When $y_i(t) = (w/v_f)\left[N_i(t) - n_i(t)\right]$, the number of vehicles that can enter the cell is restricted by the number of vehicles that still fit at jam density.
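As a rough illustration, the recursion can be coded directly for a single link. This is a minimal sketch of Daganzo's formulation under our own assumptions: a free exit at the downstream end and illustrative values (60 km/h free-flow speed, 10 s clock tick, 120 veh/km jam density, 1800 veh/h capacity, $w/v_f = 1/3$) that give cells holding 20 vehicles and admitting 5 vehicles per tick. The function names are hypothetical.

```python
# Minimal sketch of Daganzo's cell transmission model on a single link.
# Assumptions (not from the paper): the last cell discharges freely up to its
# capacity, and all numbers in the demo are illustrative.


def cell_parameters(v_f, dt, k_j, q_max):
    """Cell length dx = v_f * dt, holding limit N = k_j * dx, inflow limit Q = q_max * dt."""
    dx = v_f * dt
    return dx, k_j * dx, q_max * dt


def ctm_step(n, N, Q, delta, demand):
    """Advance every cell by one clock tick.

    n[i]    vehicles in cell i at time t
    N[i]    max vehicles cell i can hold (jam density x cell length)
    Q[i]    max vehicles that can enter cell i per tick (capacity x tick)
    delta   w / v_f, backward-wave to free-flow speed ratio (must be <= 1)
    demand  vehicles waiting upstream to enter cell 0 this tick
    Returns (n at t + 1, vehicles entered, vehicles exited).
    """
    cells = len(n)
    y = [0.0] * (cells + 1)  # y[i]: vehicles moving into cell i; y[cells]: exit flow
    y[0] = min(demand, Q[0], delta * (N[0] - n[0]))
    for i in range(1, cells):
        y[i] = min(n[i - 1], Q[i], delta * (N[i] - n[i]))
    y[cells] = min(n[-1], Q[-1])  # assumed free exit
    # Conservation: n_i(t+1) = n_i(t) + y_i(t) - y_{i+1}(t)
    new_n = [n[i] + y[i] - y[i + 1] for i in range(cells)]
    return new_n, y[0], y[cells]


def simulate(steps, N, Q, delta, demand):
    """Run the model from an empty link; return final counts and totals in/out."""
    n = [0.0] * len(N)
    entered = exited = 0.0
    for _ in range(steps):
        n, e, x = ctm_step(n, N, Q, delta, demand)
        entered += e
        exited += x
    return n, entered, exited


if __name__ == "__main__":
    dx, n_max, q_cap = cell_parameters(60.0, 10 / 3600, 120.0, 1800.0)
    N = [n_max] * 6
    Q = [q_cap] * 6
    Q[4] = 2.0  # cell 4 is a bottleneck admitting 2 vehicles per tick
    n, entered, exited = simulate(30, N, Q, delta=1 / 3, demand=4.0)
    print(round(dx, 3), [round(v, 1) for v in n])
```

With the bottleneck in place, vehicles accumulate in the cells upstream of cell 4 while the cell downstream of it stays lightly loaded. Requiring $\delta \le 1$ keeps every cell between empty and full, because a cell never receives more than $N_i - n_i$ vehicles or sends more than it holds.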

Proposed Model.
In this section, we present an extended cell transmission model. Consider a road link divided into homogeneous, numbered cells of equal length, as shown in Figure 3.
The following notation is adopted to represent the model: $q$ = the flow rate of traffic along a lane segment, in vehicles per hour; $v$ = the (average) speed of the traffic, in km per hour; $k$ = the density of traffic, in vehicles per km.
These quantities satisfy the fundamental relationship $q = k\,v$ (see Figure 2). The specific relationship between speed and density is the linear law $v = v_f\,(1 - k/k_j)$, where $v_f$ is the free-flow speed and $k_j$ the jam density (see Figure 2).
Moreover, the inflow to the $i$th cell, expressed in (6), is bounded by the trapezoidal fundamental diagram of the CTM, which can be represented as shown in Figure 4. The basic cell transmission model states that if the density in cell $i$ at time step $t$ is greater than the critical density $k_c$, the inflow to cell $i$ is given by the difference between the jam density and the current cell density, weighted by the constant $w$. If the traffic system is in its free state, the inflow to cell $i$ is expressed by the fundamental relation of traffic flow. Otherwise, the inflow to cell $i$ equals the capacity.
If we consider the variation of speed over time, the inflow to the $i$th cell can be expressed using the current vehicle density and speed instead of the constants $v$ and $w$. Equation (6) is modified accordingly, with the speed given by the fundamental relation between free-flow speed and jam density:

$$v(i,t) = v_f\left(1 - \frac{k(i,t)}{k_j}\right).$$

Hence, if the density in cell $i$ at time step $t$ is greater than half the jam density, the inflow to cell $i$ is given by the difference between the jam density and the current cell density, weighted by the speed of vehicles in the same cell at the same time step. Otherwise, when the traffic system is in its free state and the density of the $i$th cell is less than half the jam density, the inflow to cell $i$ is expressed by the fundamental relation of traffic flow. The model is subject to $0 \le v(i,t)\,\Delta t \le \Delta x$, $0 \le k(i,t) \le k_j$, $0 \le v(i,t) \le v_f$, and $0 \le q(i-1 \to i, t) \le q_{i,\max}$. One can show that this discretization method is stable if the Courant-Friedrichs-Lewy (CFL) condition is satisfied:

$$\frac{v_f\,\Delta t}{\Delta x} \le 1.$$

The extended CTM given above allows an arbitrary network to be constructed (Figure 5). In the generalized model for an urban network, $\Gamma$ denotes the set of successor cells. The traffic simulation model formulated in this part is of great help for predicting congestion on the transport network. The next section deals with congestion and provides definitions and measurement tools.
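The extended rule can be sketched in density-speed form as follows. The sketch is ours, not the authors' implementation: it assumes the linear speed law $v = v_f(1 - k/k_j)$, evaluates the free-state inflow with the upstream cell's $k\,v$, additionally caps every inflow by the upstream sending flow and by the capacity $v_f k_j/4$ so that densities stay within $[0, k_j]$, lets the last cell discharge freely, and rejects time steps that violate the CFL condition. Names and numbers are illustrative.

```python
# Sketch of the extended CTM: inflows use each cell's current density and speed
# instead of constant v and w. Assumptions (ours, not the paper's):
#  - speed follows v(i,t) = v_f * (1 - k(i,t) / k_j);
#  - the free branch uses the upstream cell's k * v;
#  - every inflow is also capped by the upstream sending flow and by capacity,
#    as in Daganzo's min() formulation, so densities stay in [0, k_j];
#  - the last cell discharges freely.


def cell_speed(k, v_f, k_j):
    return v_f * (1.0 - k / k_j)


def cfl_ok(v_f, dt, dx):
    """Courant-Friedrichs-Lewy condition v_f * dt / dx <= 1."""
    return v_f * dt <= dx


def extended_ctm_step(k, v_f, k_j, dt, dx, q_in):
    """Return densities (veh/km) at t + 1; q_in is the upstream demand (veh/h)."""
    if not cfl_ok(v_f, dt, dx):
        raise ValueError("CFL condition violated: reduce dt or enlarge cells")
    q_max = v_f * k_j / 4.0  # capacity implied by the linear speed law
    v = [cell_speed(ki, v_f, k_j) for ki in k]
    cells = len(k)
    q = [0.0] * (cells + 1)  # q[i]: flow from cell i-1 into cell i (veh/h)
    for i in range(cells + 1):
        send = q_in if i == 0 else v[i - 1] * k[i - 1]
        if i < cells and k[i] > k_j / 2.0:
            # congested: (k_j - k) weighted by the speed in the receiving cell
            receive = v[i] * (k_j - k[i])
        else:
            receive = q_max  # free state (or the free exit when i == cells)
        q[i] = max(0.0, min(send, receive, q_max))
    return [k[i] + dt / dx * (q[i] - q[i + 1]) for i in range(cells)]


if __name__ == "__main__":
    v_f, k_j, dx, dt = 60.0, 120.0, 0.1, 0.001  # km/h, veh/km, km, h
    k = [20.0, 20.0, 110.0, 110.0, 20.0, 20.0]  # a jam in cells 2 and 3
    for _ in range(50):
        k = extended_ctm_step(k, v_f, k_j, dt, dx, q_in=600.0)
    print([round(x, 1) for x in k])
```

Under the CFL condition, a cell can never send more than it contains in one step, which is what keeps densities non-negative in this sketch.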

Congestion Modelling and Management
One of the major issues facing most countries is traffic congestion because, in both perception and reality, this phenomenon affects people and society. To handle this problem, many researchers have developed models to evaluate or predict the congestion status of road networks. There are three standard models of traffic congestion. The first states that trip cost increases with traffic flow, approaching infinity as capacity is reached, while the second describes congestion as a deterministic queue behind a bottleneck of given flow capacity [12]. The third model is based on macroscopic characteristic variables represented by the so-called "fundamental diagram" and has been the subject of extensive debate in the literature. Traffic congestion can be studied at a microscopic level, for instance by using queuing theory [13], at a macroscopic level where vehicles are treated as a fluid-like continuum [14], or at an intermediate level [15].
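The second standard model can be illustrated with a deterministic fluid queue: whenever demand exceeds the bottleneck capacity, the excess accumulates, and the queue drains once demand falls below capacity. The sketch below is a generic illustration under that assumption, with hypothetical demand values; it is not the specific formulation of [12].

```python
def bottleneck_queue(arrival_rates, capacity, dt):
    """Deterministic fluid queue behind a bottleneck.

    arrival_rates: demand in each interval (veh/h); capacity: veh/h;
    dt: interval length (h). Returns the queue (veh) at the end of each interval.
    """
    queue = 0.0
    history = []
    for rate in arrival_rates:
        arrivals = rate * dt
        served = min(capacity * dt, queue + arrivals)  # cannot serve more than is present
        queue += arrivals - served
        history.append(queue)
    return history


if __name__ == "__main__":
    # 15-minute intervals: demand exceeds the 1800 veh/h capacity for 30 minutes.
    print(bottleneck_queue([1500, 2400, 2400, 1200, 1200], capacity=1800, dt=0.25))
    # prints: [0.0, 150.0, 300.0, 150.0, 0.0]
```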

Congestion Definition.
There is no consensus on how to define congestion because it is a complex physical phenomenon arising from the behavior of drivers whose vehicles hinder the progress of other vehicles as demand for limited road space approaches full capacity [16].
In common usage, congestion is the condition in which there is too much traffic on the road. Other, more productive approaches consider how congestion influences the transportation system and interacts with socioeconomic objectives and geogovernance.
The U.S. Federal Highway Administration [17,18] notes that congestion is essentially a complex phenomenon related to a mismatch between the performance of the road transportation system and the expectations of the network's users.

Congestion Measures Index.
Traffic congestion can be understood as a factor in the level of traffic service. Three factors are therefore used to characterize congestion: the perception of congestion by roadway users, the streams of the road network, and time, because of the temporal nature of congestion phenomena (Figure 6).
Recurrent congestion is usually precipitated by events that regularly affect the transportation system, while nonrecurring congestion is unpredictable. Measuring congestion is a necessary step towards better congestion outcomes. At the local level, managers of urban transport networks need congestion measures that allow them to meet the operational concerns of incident management and regulation. For this purpose, road managers and engineers rely on indicators collected from roadway sensors; free-flow speeds, however, should not be used as a direct point of reference. These sensors capture the extent, relative scale, and evolution of congestion. Some indicators, such as the predictability of travel times and system reliability, are highly relevant to road users, while others, namely speed and flow on the network links, are relevant to road system operators.
Nevertheless, these measures are difficult to aggregate and do not directly address the concerns of the managers and users of the urban transport network. System managers need to understand how well the entire network works; they are concerned with how the volume of vehicles on the network affects travel time. Roadway users, on the other hand, are more often concerned with trip-based measures, such as how much time they need to reach their destinations, which highlights travel time reliability and the variability of travel conditions.
There is no simple measure of congestion that is useful for all purposes and situations; knowing how much time one must plan to get from one place to another will not necessarily help an engineer better time traffic signals in the central business district.
Road indicators are grouped as follows.
(i) Speed-based indicators do not adequately capture congestion effects but can serve as a benchmark for reliability measures.
(ii) Delay-based indicators depend on a baseline value for calculating the start of "delayed" travel. This concept becomes misleading at peak hours.
(iii) Temporal indicators are based on both the travel time index and the travel rate. They also depend on the identification of a baseline value signaling the start of congested conditions.
(iv) Spatial indicators also depend on threshold values in terms of the median or average speeds achieved, or on free-flow speeds.
(v) Service level/capacity indicators typically reference the design capacity of roadway links and are implicitly used to maximize their throughput; these indicators have been favored by roadway managers.
(vi) Reliability-based indicators try to capture how road users typically make trip decisions on congested networks.
(vii) Economic cost/efficiency-based indicators measure the cost caused by congestion.
(viii) Other indicators may capture either population exposure to congested road conditions or fuel consumption.
Link: Volume/Capacity Ratio. The volume/capacity ratio, $V/C$, varies from a low of 0 (free flow) to values sometimes greater than 1.0 (severely congested). Freeways are considered severely congested when the $V/C$ ratio exceeds 1.0; for quite short periods of time, roads can carry more traffic than their rated capacities.
In the Highway Capacity Manual, the "level of service" (LOS) delivered by a facility reflects both the amount of traffic and the quality of traffic flow. Table 1 summarizes the descriptions of the levels of service, which range from "A" (free-flow, uncongested travel) to "F" (severely congested flow).
Intersections: Delay.For signalized intersections, the Highway Capacity Manual measures congestion in terms of average delay per vehicle, and "levels of service" are defined based on the average amount of delay.
The Bureau of Public Roads (BPR) link performance function relates link travel time to flow:

$$LP(a) = t_a(0)\left[1 + \alpha\left(\frac{q_a}{q_{a,\max}}\right)^{\beta}\right],$$

where $t_a(0)$ is the free-flow travel time on link $a$; $q_a$ is the flow attempting to use link $a$ per unit of time; $q_{a,\max}$ is the capacity of link $a$ per unit of time; and $LP(a)$ is the average travel time for a vehicle on link $a$. The BPR function is commonly used for computing an optimum traffic assignment. Values for $\alpha$ and $\beta$ are measured empirically and may differ between types of road; typical values, based on empirical data on highways, are 0.15 and 4, respectively [25,26].
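As a quick numerical illustration of the BPR function, the sketch below uses the typical values $\alpha = 0.15$ and $\beta = 4$ quoted above together with the $V/C$ ratio; the 10-minute free-flow travel time and the 1800 veh/h capacity are hypothetical.

```python
def volume_capacity_ratio(volume, capacity):
    """V/C ratio: 0 is free flow, values above 1.0 indicate severe congestion."""
    return volume / capacity


def bpr_travel_time(t0, flow, capacity, alpha=0.15, beta=4.0):
    """BPR link performance: LP(a) = t_a(0) * (1 + alpha * (q_a / q_max) ** beta)."""
    return t0 * (1.0 + alpha * volume_capacity_ratio(flow, capacity) ** beta)


if __name__ == "__main__":
    t0 = 10.0  # free-flow travel time on the link, in minutes (illustrative)
    for q in (900.0, 1800.0, 2160.0):  # veh/h on a link with 1800 veh/h capacity
        vc = volume_capacity_ratio(q, 1800.0)
        print(f"V/C = {vc:.2f}  travel time = {bpr_travel_time(t0, q, 1800.0):.2f} min")
```

With these numbers, travel time barely changes at half capacity (about 10.1 minutes) but reaches 11.5 minutes at capacity and about 13.1 minutes at $V/C = 1.2$, reflecting the steep fourth-power term.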
Travel Time and Speed Definitions. Travel time is commonly defined as "the time required for traversing a route between two points of interest." It can be measured directly along the road(s) connecting two or more points of interest. Details of the calculation of these variables are given in [19,22-24].

Congestion Trajectory Meta Model
Congestion is a space-time event; therefore, the evolution of congestion is a trajectory. Figure 10 illustrates the concepts directly related to congestion. A means of transport operates in space-time; it has a trajectory that materializes spatiotemporal events relating to the positions it occupies, as provided by sensors (GPS, for instance). Traffic variables such as speed and travel time are computed by applications embedded in smartphones. Virtual sensors in the form of lines or polygons are placed at discrete locations on the sections and junctions of the urban network [14,15]. These dynamic virtual sensors are stored in a spatial database server. Our previous work on trajectories [27] is recalled below.
Trajectories Modeling Overview. Mobile phone networks, GPS-equipped devices, and other indoor and outdoor localization technologies generate a huge amount of spatiotemporal data coming from many different, heterogeneous fields. Providing location-based services (LBS) raises multiple challenges, such as scalability, performance, query processing, high-precision positioning, and privacy preservation. The growth of LBS therefore calls for a unified model to handle and explore the captured data and meet the expectations of several application areas [2,28-31]. In the following, we present the existing representations of trajectories.
(ii) Structured trajectory [30] is defined as a raw trajectory divided into segments corresponding to significant steps in the trajectory trace (e.g., travel).
(iii) Semantic trajectory [30] provides a semantic view of trajectories, which enables applications to associate whatever semantics they want with them. However, this approach only applies to transactional schemas. Indeed, no work has been published that uses trajectories as semantic objects with activities in multidimensional data modeling.
(iv) Trajectory based on Region of Interest: other recent approaches describe trajectories in composed spatial and temporal contexts based on Region of Interest [32,33] by defining spatial neighborhood and temporal acceptance.
(v) Space Time Path: the "aquarium" [34] of the relevant time-space unit describes anything having spatial and temporal extent (for instance, people, plants, and animals) as a path.
The Unified Moving Object Trajectories' Metamodel [27] describes a general metamodel that can be used by different application domains; it follows an object approach and integrates the previous trajectory models described in the literature [27-30,32-36]. Using the space-time event ontology, the metamodel models space according to the OGC Spatial Data Model [31,34,37-39]; the observation domain of the trajectory according to the OGC Sensor Meta Model and OGC Feature Type; the physical and virtual activities between the beginning and the end of the Space Time Path [27]; the sensors used for collecting the moving object's traces; and movement patterns, using composite Regions of Interest. The metamodel, as proposed in the class diagram (see Figure 10), expresses congestion as a spatiotemporal event. Congestion is measured by a sensor network based on GPS technology, in accordance with the sensor metamodel proposed by the OGC [38]. Spatiotemporal markers are introduced to control the collection of speed and travel time measurements. The marking technique that we propose generalizes the one suggested in [20,21].
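To make the metamodel's core idea concrete (congestion as a spatiotemporal event on a trajectory), the sketch below models a few of its concepts as Python dataclasses. The class names, fields, and the speed threshold used to flag congestion are hypothetical simplifications; the actual metamodel (Figure 10) follows the OGC models and is considerably richer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sensor:
    sensor_id: str
    kind: str  # e.g. "GPS" or "virtual-line" (illustrative values)


@dataclass(frozen=True)
class Observation:
    """Spatiotemporal event: where and when a sensor saw the moving object."""
    sensor: Sensor
    lon: float
    lat: float
    time: datetime
    speed_kmh: float


@dataclass
class Trajectory:
    moving_object_id: str
    observations: list = field(default_factory=list)

    def add(self, obs: Observation) -> None:
        self.observations.append(obs)
        self.observations.sort(key=lambda o: o.time)  # keep the space-time path ordered

    def duration(self) -> timedelta:
        if not self.observations:
            return timedelta(0)
        return self.observations[-1].time - self.observations[0].time

    def congestion_episodes(self, threshold_kmh: float):
        """Group consecutive slow observations into (start, end) congestion events."""
        episodes, start, last = [], None, None
        for obs in self.observations:
            if obs.speed_kmh < threshold_kmh:
                if start is None:
                    start = obs.time
                last = obs.time
            elif start is not None:
                episodes.append((start, last))
                start = None
        if start is not None:
            episodes.append((start, last))
        return episodes


if __name__ == "__main__":
    gps = Sensor("phone-1", "GPS")
    t0 = datetime(2024, 1, 1, 8, 0)
    traj = Trajectory("car-42")
    for minute, v in enumerate([50.0, 12.0, 8.0, 45.0, 5.0]):
        traj.add(Observation(gps, -7.6, 33.58, t0 + timedelta(minutes=minute), v))
    print(traj.duration())  # 0:04:00
    print(len(traj.congestion_episodes(15.0)))  # 2
```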
Sensor-enabled mobile phones and smartphones are a good choice because they enable many useful location applications. For instance, the Android sensor framework provides access to many types of sensors.
An Android service can collect different sensor data (accelerometer, GSM, Wi-Fi, network and GPS location, light, temperature, etc.) on an Android device. The data is stored in a local database and can be transmitted periodically to a remote host (serialized to XML and optionally packed into an RSA-encrypted archive). The service can also broadcast the collected sensor data to make it available to other applications. Moving objects can be located by combining GSM cell tower location with GPS. A real-time traffic monitoring system based on smartphones with integrated GPS takes advantage of the network coverage delivered by telecom operators, the accuracy of the position and velocity measurements provided by GPS devices, and the existing telecommunication network infrastructure (Figure 7).
Figure 7 provides visualizations based on the collected data, including cellular coverage maps that show how strong the signal is in any particular area of Casablanca.

Global System Architecture
The architecture is given in Figure 8. It is founded on the specifications delivered by the Open Geospatial Consortium (OGC); in particular, the OGC addresses this requirement through its Sensor Web Enablement (SWE) specification series [37,39]. Figure 9 describes the GeoMobility server, which is integrated with the other elements of the location-based services (LBS) architecture [31-34]. The GeoMobility server delivers content such as maps, directions, points of interest, and traffic information. It can also access other databases of local content on the Internet. The system involves vehicles equipped with GPS-enabled smartphones, a near-real-time big data collection infrastructure, a traffic patterns engine, and an information visualization system.
The software components architecture distinguishes four abstraction levels. The first covers the collection of data and traffic patterns from mobile sensors (GPS-enabled smartphones). The second involves computing measures and travel times and supports the congestion simulator. Finally, a monitoring component of the urban traffic network is served by the congestion index and specific pattern measures (see Figure 11). Relational databases are not suitable for handling the volume, velocity, and variety of the dynamically collected spatiotemporal datasets required to support such services when performance is favored over guaranteed data writes. In the following, we present the technologies used in the proposed architecture to provide a powerful and scalable framework for collecting and visualizing moving objects' trajectory data.
NoSQL Databases. The acronym NoSQL stands for "not only SQL" [40]. NoSQL databases store data in a much simpler, flatter, nonrelational manner that allows data repositories to scale out. A NoSQL database has no fixed schema, so heterogeneous spatiotemporal data and activities generated by different kinds of location sensors can be stored in the same entity. In addition, NoSQL databases are often open source, nonrelational, and distributed, and they often do not provide the ACID guarantees (atomicity, consistency, isolation, and durability) of relational databases. A relational database scales up by moving to faster hardware and adding memory, whereas a NoSQL database can scale out by spreading the load across multiple servers. Our choice of MongoDB as a NoSQL database is motivated by the need for a document-oriented store for visualizing trajectories on the map using JSON documents.
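As an illustration of the schemaless storage described above, the sketch below prepares two observations from different kinds of location sensors as JSON documents. The field names and values are hypothetical; the only convention assumed is GeoJSON for positions (coordinates in [longitude, latitude] order), which MongoDB can index with a `2dsphere` index. Only the shared fields would be queried uniformly, while each document keeps its own sensor-specific fields.

```python
import json

# Two observations from different kinds of location sensors (illustrative values).
# Both share a GeoJSON "loc" field ([longitude, latitude] order), but each carries
# sensor-specific fields; a schemaless document store accepts both as-is.
documents = [
    {
        "object_id": "vehicle-17",
        "sensor": "GPS",
        "ts": "2024-01-01T08:15:00Z",
        "loc": {"type": "Point", "coordinates": [-7.62, 33.59]},
        "speed_kmh": 23.5,
        "hdop": 1.2,
    },
    {
        "object_id": "phone-4",
        "sensor": "GSM-cell",
        "ts": "2024-01-01T08:15:05Z",
        "loc": {"type": "Point", "coordinates": [-7.61, 33.58]},
        "cell_id": "cell-1234",
        "signal_dbm": -85,
    },
]


def shared_fields(docs):
    """Fields present in every document, i.e. what can be indexed and queried uniformly."""
    keys = set(docs[0])
    for doc in docs[1:]:
        keys &= set(doc)
    return keys


payload = json.dumps(documents)  # the JSON that would be sent to the document store
print(sorted(shared_fields(documents)))  # ['loc', 'object_id', 'sensor', 'ts']
```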
MongoDB Database. MongoDB is a scalable, high-performance, open source, document-oriented NoSQL database developed by 10gen in 2009 [41]. It is implemented in C++ and offers document-oriented storage, full index support, rich document-based queries, and flexible aggregation and data processing. A MongoDB server can host several databases. Through its JavaScript shell and JSON-style query documents, MongoDB supports both simple and complex queries. Because MongoDB stores JSON documents, the basic document format of many modern geospatial applications, it is easy to build such applications on top of it. MongoDB supports ascending, descending, unique, and geospatial indexes. To improve performance, MongoDB stores JSON documents in the binary BSON format [42]. To scale performance across a cluster of servers, MongoDB uses a technique called sharding, which splits the data evenly across the cluster to parallelize access. This is implemented by dividing the MongoDB deployment into a set of front-end routing servers (mongos) that route operations to a set of back-end data servers (mongod).
MongoDB queries examine one record at a time, which means that queries across multiple records must be implemented on the client or use MongoDB's built-in MapReduce (MR).Though MongoDB's MR can be executed in parallel at each shard, there are two major drawbacks [43]: (i) the language for MR scripts is JavaScript, which is slow and has poor analytics libraries and (ii) the SpiderMonkey JavaScript implementation used by MongoDB is not thread-safe, so only one MapReduce program can run at a time.
Hadoop Distributed Framework. Hadoop is a scalable, fault-tolerant, distributed big data storage and processing system [44]. The two main components of the Hadoop ecosystem are (i) HDFS, a distributed file system that provides efficient access to application data, and (ii) Hadoop MapReduce, a framework designed to process terabytes of data and more in a scalable way.
Hadoop has been designed to run on multiple servers simultaneously. In practice, the data is spread across different servers, and Hadoop manages a replication system to ensure high data availability even when one or more servers fail. The strength of Hadoop is that it harnesses the computational power of multiple servers grouped in a cluster. MapReduce manages the parallelized processing: it distributes the computation across the servers and then aggregates the elementary results into an overall result.
MapReduce plays a major role in processing large quantities of data. Distributing the data across many servers enables the parallel processing of multiple tasks, each involving pieces of files. The Map function performs a specific operation on each element. The Reduce operation combines the elements according to a particular algorithm and outputs the result. The principle of delegation may be recursive: the nodes assigned to tasks can also delegate operations to other nodes.
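The Map, shuffle, and Reduce phases can be sketched in plain Python for a traffic-style job: the mean speed per road link computed from GPS samples. This only simulates the data flow sequentially; it is not Hadoop's API, and the record layout `(link_id, speed_kmh)` is a hypothetical choice.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: emit (link_id, (speed_sum, count)) for one GPS speed sample."""
    link_id, speed_kmh = record
    yield link_id, (speed_kmh, 1)


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine partial (sum, count) pairs, then average."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return key, total / count


def mean_speed_per_link(splits):
    # Each split would be mapped on a different node; here they run sequentially.
    intermediate = (pair for split in splits for record in split for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    splits = [
        [("L1", 30.0), ("L2", 50.0)],
        [("L1", 20.0), ("L2", 40.0), ("L3", 10.0)],
    ]
    print(mean_speed_per_link(splits))  # {'L1': 25.0, 'L2': 45.0, 'L3': 10.0}
```

Emitting partial `(sum, count)` pairs rather than averages keeps the reduce step associative, so partial results computed on different nodes can be combined in any order.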
In traditional applications, the built-in aggregation functionality provided by MongoDB is sufficient for analyzing data [40]. However, storing and analyzing the collected spatiotemporal trajectory data requires more complex aggregation. This is why we use Hadoop as a powerful framework for complex analytical queries in our system architecture, which leads to the architecture shown in Figure 12.
The first stage collects spatiotemporal trajectory data, as GPX, OV2, or CSV files, from different GPS-enabled devices using asynchronous .NET sockets. The collected data is then processed using data reducer, error measure, reverse geocoding, and activity recognition services. After that, the data is stored in the MongoDB database and processed within Hadoop via one or more MapReduce jobs. The output of these MapReduce jobs can then be written back to MongoDB for later querying and ad hoc analysis. Finally, we export the resulting data as JSON documents so that it can be visualized rapidly using the Google Maps API.
Data Warehousing. In the urban traffic engineering context, data warehousing services ensure that huge volumes of real-time traffic data from a variety of data sources and locations are recorded and maintained for traffic management, traffic information, traffic analysis, and decision support. Real-time and historical data are integrated to monitor network status, manage traffic to reduce congestion, improve air quality, and manage noise impacts. The historical data in the data warehouse include the trapezoidal fundamental-diagram profile of each link, the travel time of each link, and the average speed observed by each moving object. Other patterns resulting from the combination of data from several sensors are also considered (see Figure 12).
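The first stages of the collection pipeline described above (reading a GPX trace, deriving per-segment speeds, and exporting JSON for a map client) can be sketched with the Python standard library. The sketch assumes GPX 1.1 track points with timestamps, great-circle (haversine) distances on a spherical Earth of radius 6371 km, and GeoJSON as the exported JSON format; the function names and the sample trace are hypothetical, and the storage and MapReduce stages are omitted.

```python
import json
import math
import xml.etree.ElementTree as ET
from datetime import datetime

GPX_NS = "{http://www.topografix.com/GPX/1/1}"
EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def parse_gpx(text):
    """Return [(lat, lon, datetime)] for every track point in a GPX 1.1 document."""
    root = ET.fromstring(text)
    points = []
    for pt in root.iter(GPX_NS + "trkpt"):
        stamp = pt.find(GPX_NS + "time").text.replace("Z", "+00:00")
        points.append((float(pt.get("lat")), float(pt.get("lon")), datetime.fromisoformat(stamp)))
    return points


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def segment_speeds_kmh(points):
    """Average speed on each segment between consecutive fixes."""
    speeds = []
    for (la1, lo1, t1), (la2, lo2, t2) in zip(points, points[1:]):
        hours = (t2 - t1).total_seconds() / 3600.0
        speeds.append(haversine_km(la1, lo1, la2, lo2) / hours if hours > 0 else 0.0)
    return speeds


def to_geojson(points, speeds):
    """GeoJSON Feature (coordinates in [lon, lat] order) ready for a map client."""
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": [[lon, lat] for lat, lon, _ in points]},
        "properties": {"segment_speeds_kmh": [round(s, 1) for s in speeds]},
    }


SAMPLE = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="demo">
<trk><trkseg>
<trkpt lat="33.5700" lon="-7.6000"><time>2024-01-01T08:00:00Z</time></trkpt>
<trkpt lat="33.5800" lon="-7.6000"><time>2024-01-01T08:01:00Z</time></trkpt>
</trkseg></trk></gpx>"""

if __name__ == "__main__":
    pts = parse_gpx(SAMPLE)
    print(json.dumps(to_geojson(pts, segment_speeds_kmh(pts))))
```

GeoJSON is one JSON format that web map clients, including the data layer of the Google Maps JavaScript API, can load directly.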

Conclusion
In this paper, we have presented a global architecture for traffic congestion management. The main part of our contribution lies in the proposal of a novel congestion model aligned with our previous trajectories' metamodel. Furthermore, we developed a CTM-like traffic simulation model for congestion prediction. We also addressed the challenges of warehousing traffic patterns in a NoSQL database within a distributed infrastructure. The next step is to develop the corresponding software solution and carry out large-scale field tests. This paper also provides an overview of the integration and management of data provided by GPS network sensors. These data will be used to derive significant information for real-time intelligent transport systems [45].

Figure 1: Categorization of traffic flow models.
Travel Time Index (TTI). This index conveniently relates congestion to peak travel times, which people experience every day and can understand:

$$\mathrm{TTI} = \frac{\text{average travel time in peak hours}}{\text{average travel time in off-peak hours}} \tag{15}$$
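The index, together with the average travel speed implied by a measured travel time, can be computed as follows. The route length and travel times in this sketch are hypothetical.

```python
from statistics import mean


def travel_time_index(peak_times, off_peak_times):
    """TTI = mean peak-hour travel time / mean off-peak travel time, as in (15)."""
    return mean(peak_times) / mean(off_peak_times)


def average_travel_speed(length_km, travel_time_min):
    """Route length divided by travel time, in km/h."""
    return length_km * 60.0 / travel_time_min


if __name__ == "__main__":
    peak = [18.0, 22.0, 20.0]  # minutes over the same 5 km route (illustrative)
    off_peak = [10.0, 9.0, 11.0]
    print(travel_time_index(peak, off_peak))  # 2.0
    print(average_travel_speed(5.0, 20.0), average_travel_speed(5.0, 10.0))  # 15.0 30.0
```

A TTI of 2.0 means a peak-hour trip takes twice as long as the same trip off-peak, which corresponds here to the average speed dropping from 30 km/h to 15 km/h.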