A Novel Multisensor Traffic State Assessment System Based on Incomplete Data

A novel multisensor system is presented for traffic state assessment with incomplete data. The system comprises probe vehicle detection sensors, fixed detection sensors, and a traffic state assessment algorithm. First, the validity of the traffic flow data is checked as a preprocessing step. Then a new method based on historical data is proposed to fuse and recover the incomplete data. Because the data from the probe vehicle detector and the fixed detector are spatially complementary, a space-matching fusion model is presented to estimate the mean travel speed of the road. Finally, the traffic flow data, which include flow, speed, and occupancy rate and were detected between the Beijing Deshengmen bridge and the Drum Tower bridge, are fused to assess the traffic state of the road using the fusion decision model of rough sets and cloud. The accuracy of the experimental results exceeds 98%, and the results agree with the actual road traffic state. The system assesses the traffic state effectively and is suitable for urban intelligent transportation systems.


Introduction
With rapid urbanization, motor vehicle ownership and road traffic flow are increasing quickly, and traffic congestion has become a common problem worldwide [1]. An accurate and scientific assessment of the current traffic state can therefore provide the basis for traffic guidance and traffic control systems, optimize traffic management programs, and reduce the incidence of traffic congestion. It is an important part of maximizing the social and economic benefits of transportation resources [2]. With the rapid development of sensor technology, more and more types of vehicle detectors, such as coil detectors [3,4], probe vehicles [5,6], microwave detectors, and video detectors [7,8], are employed to collect traffic information.
Many researchers have attempted to develop more efficient assessment models in order to obtain better results. Bachmann et al. [9] investigated multisensor data fusion-based estimation techniques that fuse data from loop detectors and probe vehicles to accurately estimate freeway traffic speed. Bachmann et al. [10] studied fusion techniques with Bluetooth and loop detectors to improve the accuracy of traffic speed estimates. Berkow et al. [11] used traffic signal and probe vehicle data to estimate real-time transit location data and the online implementation of arterial travel time information. Guo et al. [12] used a Kalman filter approach to estimate speed from single loop detector measurements under congested conditions. Kong et al. [13] proposed a fusion-based system for real-time traffic state surveillance, which realizes real-time traffic state estimation for over 10,000 bidirectional road sections. To estimate the state of the road network traffic, Kong et al. [14] also combined the Kalman filter with evidence theory as a fusion platform and estimated the speed of the road network based on traffic wavelet theory. El Bantli [15] proposed, on a theoretical basis, optimal linear estimation and a weighted least squares method for incomplete traffic data, and applied the method to estimate road travel time. Klein et al. [16] applied data fusion and D-S theory to the decision support system of advanced traffic management. Li and McDonald [17] put forward a method of link travel time estimation using a single GPS-equipped probe vehicle. Cheu et al. [18] put forward a fusion model based on a neural network and tested its effects using simulation data. Smith and Conklin [19] used local lane distribution patterns to estimate missing data values from traffic monitoring systems.
The methodology used time-of-day lane distribution patterns at a particular location to estimate missing detector data, and the results showed that the error ranged from 6% to 8%. Chen et al. [20] proposed a method that uses historical data to detect bad data samples and to impute missing or bad samples, and it gave better estimates than previous methods. Treiber et al. [21] presented an advanced interpolation method for estimating smooth spatiotemporal profiles of local highway traffic variables such as flow, speed, and density, in order to fuse the traffic data and obtain dynamic traffic information. Sumner [22] used fuzzy logic to fuse the detected traffic data, quantify traffic conditions, and make a comprehensive assessment of the traffic state. Chang [23] applied a neural network in the Brainmaker project, which matched the current traffic state against historical information, and it improved computer traffic monitoring and automatic incident detection.
The traditional methods of urban road traffic state assessment are usually based on complete data obtained by the detection sensors. However, during data acquisition, transmission, and processing, the following factors cause incomplete data [24,25]: (i) incorrect installation and calibration of the sensors; (ii) abnormal weather or environmental conditions, which cause occasional data exceptions or missing data; (iii) abnormal operation of the detection sensors; (iv) hardware or software failure of the traffic management center system; (v) communication interruptions between the traffic detection sensors, the regional controller, and the traffic management center; (vi) insufficient evaluation and sustained maintenance of the system.
These factors have a great impact on the effective and accurate assessment of the traffic state. Incomplete data often appear as irregular data collection, large data acquisition errors, missing data, and so on [26,27]. The Texas Transportation Institute (TTI) showed that the data completeness rate of traffic management systems ranges from 16% to 93%, with an average of 67% [28]. Incomplete data is therefore one of the outstanding problems in traffic management systems. Improving the effectiveness and completeness of traffic flow data, and making the assessment of the road traffic running state more reasonable and accurate, is thus highly significant for the development of urban intelligent transportation.
In this paper, a new multisensor traffic state assessment system is developed. The system obtains traffic data, which may include incomplete data, from coil detection sensors, microwave detection sensors, and probe vehicle detection sensors. The data are processed by a novel algorithm based on the fusion decision model of rough sets and cloud. The algorithm first checks the data validity. After this preprocessing, the selected incomplete traffic data are fused and recovered using historical information. Then, a space-matching fusion model is proposed to estimate the average travel speed. Finally, a fusion decision model of rough sets and cloud is presented to assess the traffic state using the flow, speed, and occupancy acquired from the multiple sensors. Experimental results show that the system is suitable for traffic state assessment with incomplete traffic data.

Multisensor Traffic State Assessment System
In view of the problem that traffic flow data are incomplete, a method of traffic state assessment based on multiple sensors is proposed.
The system obtains multisource traffic data from the fixed detectors (coil detection sensors and microwave detection sensors) and the probe vehicle detector (floating car detection sensors). The testing environment of traffic flow data and the main system elements are shown in Figure 1.
The three main sensors of this system are as follows.

Coil Detection Sensors.
Generally, square coil detection sensors are laid under the road surface, as shown in Figure 1. When a vehicle passes over a coil, the inductance of the coil loop changes, which causes a change in frequency. The sensor uses this change to judge whether a vehicle has passed. This kind of sensor can detect the traffic flow, speed, queue length, and other traffic parameters. Its advantages are low cost, high reliability, and high detection precision. However, when the distance between vehicles is less than 3 meters, the detection precision is greatly reduced by magnetic field interference.

Microwave Detection Sensors.
Microwave detectors, shown in Figure 1, detect traffic data by microwave transmission. They emit microwaves toward the road under test and derive the traffic parameters from the received frequency and reception time. Microwave detection sensors can detect traffic information such as flow, occupancy, speed, and direction. These sensors work in all kinds of bad weather and have strong anti-interference ability, but their detection accuracy drops considerably when vehicles move slowly.

Probe Vehicle Detection Sensors.
[...] time and travel speed indirectly. GPS data have strong continuity and a wide acquisition range. However, the probe vehicle detection precision is affected by the GPS positioning accuracy, and data communication is susceptible to electromagnetic interference.
The assessment method is also one of the important elements of our multisensor traffic assessment system. The overall flow chart of the assessment method is shown in Figure 2.

Validity Check of Multisensor Data
The purpose of the validity check is to screen incomplete data out of the traffic flow information and to reduce interference during traffic state assessment. The three traffic flow parameters and the mechanism of traffic flow are used so that the check can handle different types of incomplete data.
The method includes the following four steps.
Step 1 (basic data screening). Before the macroscopic data screening, the data are checked for negative or missing values [21]. Three basic traffic parameters, flow $q$, speed $V$, and occupancy $o$, are considered. By analyzing the relation among the three parameters, incorrect data can be screened out. The approach is listed in Table 1.
Step 2 (threshold inspection). The threshold test determines the upper and lower thresholds of a single parameter from statistical data. If the tested value is outside the range between the lower and upper thresholds, it is considered erroneous. Taking a single lane as an example, the flow has a maximum limit value and a minimum value of 0, and the occupancy has a maximum of 100% and a minimum of 0%.
Step 3 (mechanism inspection of traffic flow). The mechanism inspection relies on the basic characteristics of traffic flow and the functional relation among its three parameters. If the data do not conform to the inherent rules of traffic flow theory, the data set is considered wrong and should be deleted or recovered.
Step 4 (abnormal inspection). Under normal traffic conditions, the change in the network traffic flow is a stationary random process, and the amplitude of the traffic data should vary within a certain range. When a traffic incident occurs, however, a large deviation appears. This paper uses the mean value $\mu$ and variance $\sigma^2$ of the data from the preceding moments to identify fault data. When $\mu - 2\sigma \le x \le \mu + 2\sigma$ is satisfied for the current value $x$, the data is normal; otherwise it is abnormal [22].
The above four steps can deal with almost all possible data errors. Taking traffic flow data as an example, the fault data are filtered out by the validity check, and the result is shown in Figure 3.
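As an illustration, Steps 1, 2, and 4 can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the limits `MAX_FLOW` and `MAX_SPEED` are hypothetical placeholders for thresholds that would come from site statistics, the function names are invented for this example, and the relational rules of Table 1 and the mechanism inspection of Step 3 are not covered.

```python
from statistics import mean, pstdev

# Hypothetical per-lane limits; real thresholds would come from site statistics.
MAX_FLOW = 60        # vehicles per sampling interval (assumed)
MAX_SPEED = 120.0    # km/h (assumed)


def basic_and_threshold_ok(flow, speed, occupancy):
    """Steps 1-2: reject missing or negative values and values above the limits."""
    values = (flow, speed, occupancy)
    if any(v is None or v < 0 for v in values):
        return False
    return flow <= MAX_FLOW and speed <= MAX_SPEED and occupancy <= 100.0


def is_abnormal(value, history, k=2.0):
    """Step 4: flag a value outside mean +/- k * std of the preceding samples."""
    mu = mean(history)
    sigma = pstdev(history)
    return not (mu - k * sigma <= value <= mu + k * sigma)
```

A record would typically be passed through `basic_and_threshold_ok` first, and only the surviving values compared against their recent history with `is_abnormal`.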

Traffic State Assessment Method
The traffic state assessment method includes three stages. First, the incomplete traffic flow data are restored. Second, the speed value is fused and estimated. Third, the fusion decision model is built.

Restoration of Incomplete Traffic Flow Data.
Traditional restoration algorithms for incomplete data include the linear interpolation algorithm, the historical trend restoration algorithm, the restoration method based on spatial correlation, and the restoration method based on the BP neural network [23]. The advantages and disadvantages of these methods are shown in Table 2.
Because of the heavy traffic on the road, the traffic flow data fluctuate little and show an obvious time correlation, so the historical data should be used for fusion estimation. This paper proposes a traffic data restoring algorithm based on the generation of area geometry, which analyzes the historical traffic flow data and the connection between the geometric area formed by adjacent historical data and the data at the present moment. The area of the geometric region formed by historical data reflects the changing trend and the oscillation range of the traffic flow data, so this area is used to restore the incomplete traffic flow data at the present moment. The recovered data thus preserve the volatility of the traffic flow data.
Take flow as an example. As shown in Figure 4, the flow data $q_{t-5}$, $q_{t-4}$, $q_{t-3}$, $q_{t-2}$, and $q_{t-1}$ are obtained by the traffic detector at the moments $t-5$, $t-4$, $t-3$, $t-2$, and $t-1$, respectively. Due to a sensor or transmission fault, the flow data $q_t$ at the moment $t$ is incomplete. The area of the triangle $\Delta q_{t-3} q_{t-2} q_{t-1}$ is defined as $S_{t-1}$, and it reflects the degree of nonlinearity of $q_{t-3}$, $q_{t-2}$, and $q_{t-1}$. When $S_{t-1}$ is large, the oscillation amplitude of these data is large. When $S_{t-1}$ is 0, the data change linearly with time. The data $q_t$ is correlated with the historical data and their nonlinear trend, so $S_t$, the area of $\Delta q_{t-2} q_{t-1} q_t$, is connected with $S_{t-1}$. To make the restored value more reliable, the area $S_{t-2}$ of $\Delta q_{t-4} q_{t-3} q_{t-2}$ and the area $S_{t-3}$ of $\Delta q_{t-5} q_{t-4} q_{t-3}$ are also taken into account. The three triangles are given different weights to determine the area $S_t$ of $\Delta q_{t-2} q_{t-1} q_t$, where $w_1$, $w_2$, and $w_3$ are the weights of $S_{t-1}$, $S_{t-2}$, and $S_{t-3}$. The weights $w_i$ ($i = 1, 2, 3$) are calculated through intermediate variables $k_i$, whose definition also involves $S_{t-4}$, the area of $\Delta q_{t-6} q_{t-5} q_{t-4}$. Therefore, once the geometric area formed by the incomplete data and its two preceding neighbors is fixed, the incomplete data can be determined.
Figure 4: Sketch map of triangle area geometry by traffic flow data.
We assume that adjacent moments are two units apart. The height of the triangle $\Delta q_{t-2} q_{t-1} q_t$ is then computed from $S_t$ (Formula (4)), and the line through $q_{t-2}$ and $q_{t-1}$ is described by a linear equation (Formula (5)). According to Formulas (4) and (5), two candidate flow values at the moment $t$ are obtained, one larger than the other. The solving process is shown in Figure 5. One candidate is taken as the flow value at the moment $t$ when $q_{t-1} < q_{t-2}$, and the other when $q_{t-1} > q_{t-2}$. This ensures that the restored data for the moment $t$ reflects both the historical data trend and the oscillation amplitude.
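The geometry of this restoration can be sketched in Python. This is a hedged illustration rather than the authors' algorithm: it assumes that the area for the missing moment is a weighted sum of the three historical triangle areas, the weight values are hypothetical placeholders (the paper derives them adaptively from the historical areas), and the function names are invented here. With samples spaced `dt` apart, the triangle through three consecutive samples has area `0.5 * dt * |q2 - 2*q1 + q0|`, so fixing the area leaves exactly two candidates, one on each side of the linear extrapolation.

```python
def triangle_area(q0, q1, q2, dt=2.0):
    """Area of the triangle through (0, q0), (dt, q1), (2*dt, q2)."""
    return 0.5 * dt * abs(q2 - 2.0 * q1 + q0)


def restore_candidates(history, weights=(0.5, 0.3, 0.2), dt=2.0):
    """Return the two candidate values (high, low) for the missing sample.

    history holds q[t-5], ..., q[t-1]; weights are hypothetical placeholders
    for the weights of S[t-1], S[t-2], S[t-3].
    """
    q5, q4, q3, q2, q1 = history[-5:]
    s1 = triangle_area(q3, q2, q1, dt)   # S[t-1]
    s2 = triangle_area(q4, q3, q2, dt)   # S[t-2]
    s3 = triangle_area(q5, q4, q3, dt)   # S[t-3]
    w1, w2, w3 = weights
    s_t = w1 * s1 + w2 * s2 + w3 * s3    # assumed weighted sum for S[t]
    trend = 2.0 * q1 - q2                # point on the line through q[t-2], q[t-1]
    offset = 2.0 * s_t / dt              # vertical distance that yields area S[t]
    return trend + offset, trend - offset
```

Choosing between the two candidates according to the sign of the recent trend is left to the caller, since this sketch does not assume which candidate belongs to which case.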

Fusion and Estimation of Speed Based on Space Matching.
In order to improve the effectiveness and accuracy of traffic flow data, a fusion and estimation model of speed based on space matching is proposed in this paper. This method uses the mean speed information from the probe vehicle detector and the coil detector, sets up the fusion model of road speed, and trains the weights and deviation of the model by Newton method, to obtain the final speed data. The flow chart is shown in Figure 6.

Speed Fusion and Estimation Model.
The speed fusion and estimation model based on the probe vehicle detector and the coil detector [29] is built as shown in Figure 7.
In the model, the whole road is divided into an upstream part and a downstream part, labeled 1 and 2, respectively. On the downstream side of the road, traffic queues up because of the traffic lights, so the downstream part cannot provide effective information about the mean travel speed of the sections. This paper therefore selects the upstream road sections as the research object for estimating the mean speed. The upstream part of the road is divided into $n$ sections of equal length. The $n$th section is adjacent to the downstream part of the road, and the coil is placed on the $n$th section, so that vehicle parameters such as flow, speed, and occupancy can be obtained at that cross section. There is no fixed detector in sections 1 to $n - 1$, and the dotted boxes represent the spots of the cross sections. The data for these sections come from the probe vehicle detector, and this model is mainly used to access the speed data of the probe vehicle detector. In this paper, the probe vehicle detector data are treated as coil detector data.

Speed Fusion Method.
The probe vehicle detector is itself a traffic participant, whereas the coil detector can only collect the spot speed, so neither alone estimates the mean speed well. It is therefore necessary to space-match the data of the probe vehicle detector and the coils, in other words, to eliminate the difference between the probe vehicle data and the coil data through data correction. According to Figure 7, the mean speed in every section affects the arterial mean speed, so the arterial mean speed can be estimated as the weighted sum of the section mean speeds:

$$V = \sum_{i=1}^{n} w_i V_i + b,$$

where $V$ is the arterial mean speed (km/h), $V_i$ is the mean speed of the $i$th section (km/h), $w_i \in [0, 1]$ is the weight of the corresponding section, and $b$ is the deviation, which is used to correct the fusion result. The total error function is

$$E = \sum_{m=1}^{N} \left( \hat{V}(m) - V(m) \right)^2,$$

where $\hat{V}(m)$ is the estimated mean speed of the $m$th sample, $V(m)$ is the actual mean speed of the $m$th sample, and $N$ is the total number of samples.
To find the weights and deviation that minimize the total error, the fusion model must be trained. The weights are trained by the Newton method, a fast optimization method based on the second-order Taylor series. The Newton method is defined as

$$x_{k+1} = x_k - A_k^{-1} g_k,$$

where $x_{k+1}$ is the $(k+1)$th weight or deviation, $x_k$ is the previous weight or deviation, $g_k$ is the gradient, and $A_k^{-1}$ is the inverse of the Hessian matrix obtained from the error performance function at the current weights and threshold value. The basic idea of the Newton method is to approximate $E(x)$ locally by a quadratic function and then find the minimum of the approximating function. The Hessian matrix can be expressed as [30]

$$\nabla^2 E(x) = 2 J^T(x) J(x) + 2 S(x),$$

where $J(x)$ is the Jacobian matrix with elements $[J(x)]_{m,j} = \partial v_m(x) / \partial x_j$, and $v(x)$ is the error vector with elements $v_m(x) = \hat{V}(m) - V(m)$. When $S(x)$ is small, the Hessian matrix is approximately

$$\nabla^2 E(x) \approx 2 J^T(x) J(x).$$

If $E(x)$ has the sum-of-squares form above, the gradient can be expressed as

$$\nabla E(x) = 2 J^T(x) v(x), \qquad [\nabla E(x)]_j = 2 \sum_{m=1}^{N} v_m(x) \frac{\partial v_m(x)}{\partial x_j}.$$

Taking the second derivative of the gradient, the $(l, j)$th element of the result is

$$[\nabla^2 E(x)]_{l,j} = 2 \sum_{m=1}^{N} \left\{ \frac{\partial v_m(x)}{\partial x_l} \frac{\partial v_m(x)}{\partial x_j} + v_m(x) \frac{\partial^2 v_m(x)}{\partial x_l \, \partial x_j} \right\},$$

in which the first term corresponds to $J^T(x) J(x)$ and the second to $S(x)$. So the Newton method is expressed as

$$x_{k+1} = x_k - \left[ J^T(x_k) J(x_k) \right]^{-1} J^T(x_k) v(x_k).$$

The Newton method converges quickly and always finds the minimum of a quadratic function in one step, so it can be used to train the weights and deviation of the fusion model. When the probe vehicle data and the coil detector data are fused with this method, training time and the consumption of computer resources are reduced, which also guarantees the real-time performance of the fusion algorithm.
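The update rule can be illustrated with a short Python sketch. It is a minimal example under stated assumptions and not the authors' code: the function names are invented for this illustration, the parameter vector is ordered as the section weights followed by the deviation, the constraint that each weight lies in [0, 1] is not enforced, and the small linear solver assumes a nonsingular system. Because the fusion model is linear in its parameters, each Jacobian row is simply the section speeds followed by 1, and a single step reaches the least-squares solution.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gauss_newton_step(params, sections, targets):
    """One update x <- x - (J^T J)^-1 J^T v for V_hat = sum(w_i * V_i) + b."""
    rows = [list(s) + [1.0] for s in sections]            # Jacobian rows
    resid = [sum(p * r for p, r in zip(params, row)) - t
             for row, t in zip(rows, targets)]             # error vector v
    n = len(params)
    jtj = [[sum(row[i] * row[j] for row in rows) for j in range(n)]
           for i in range(n)]
    jtv = [sum(row[i] * e for row, e in zip(rows, resid)) for i in range(n)]
    delta = solve(jtj, jtv)
    return [p - d for p, d in zip(params, delta)]
```

Calling `gauss_newton_step([0.0] * (n + 1), sections, targets)` with one tuple of section speeds per sample returns the fitted weights and deviation.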

Fusion Decision Model of Rough Sets and Cloud.
After the restoration and estimation, a fusion decision model of rough sets and cloud is presented in this paper to assess the road traffic state.

Cloud Model Review.
Assume $U$ is the quantitative domain represented by exact values and $C$ is a qualitative concept on $U$. If the quantitative value $x \in U$ is a random realization of the concept $C$, then $\mu(x) \in [0, 1]$, the membership grade of $x$ in $C$, is a random number with a stable tendency. The distribution of $x$ on $U$ is called the cloud model, and each $x$ is called a cloud drop, as shown in Figure 8.
There are three digital features of a cloud [31,32]: expected value Ex, entropy En, and hyper entropy He. Ex is the center of all cloud drops in the domain $U$; it is the coordinate in the number domain that best represents the concept. En is the fuzziness measure of the qualitative concept; it reflects the range of values in the number domain that the linguistic value can accept. He is the dispersion of the entropy En, that is, the entropy of En; it reflects how tightly the cloud drops cohere.
If the membership grade $\mu(x)$ of $x$ in $C$ satisfies the following equation [33]:

$$\mu(x) = \exp\left( -\frac{(x - \mathrm{Ex})^2}{2\,\mathrm{En}'^2} \right),$$

where $x \sim N(\mathrm{Ex}, \mathrm{En}'^2)$ and $\mathrm{En}' \sim N(\mathrm{En}, \mathrm{He}^2)$, then the distribution of $x$ on $U$ is called a normal cloud [34].

Cloud Generator Review.
There are mainly two kinds of cloud generators, the forward cloud generator and the backward cloud generator [35-37]. The forward cloud generator is the algorithm that generates a number of cloud drops drop($x$, $\mu$) of the normal cloud model from the three digital characteristics (Ex, En, He), as shown in Figure 9.
The forward cloud generator algorithm is described as follows.
Step 1. Generate a Gaussian random number $\mathrm{En}'$ with expected value En and variance $\mathrm{He}^2$.
Step 2. Generate a Gaussian random number $x$ with expected value Ex and standard deviation $\mathrm{En}'$.
Step 3. Calculate the membership grade $\mu = \exp\left( -(x - \mathrm{Ex})^2 / (2\,\mathrm{En}'^2) \right)$.
Step 4. Take $x$ with membership grade $\mu$ as a specific quantitative value of the concept $C$, called a cloud drop.
Repeat Steps 1 to 4 until $N$ cloud drops are produced.
The backward cloud generator is the inverse process of the forward cloud generator. It transforms a given data sample into a qualitative concept expressed by the digital characteristics of the cloud {Ex, En, He}; it is a mapping from a data sample to a concept, as shown in Figure 10.
The back cloud generator algorithm description is as follows.
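As an illustration, both generators can be sketched in Python with the standard library. The forward generator follows the steps listed above. For the backward generator, the sketch uses a commonly cited moment-based variant that needs no membership grades; this choice, the function names, and the use of an absolute value to keep He real are assumptions of this example rather than the authors' stated procedure.

```python
import math
import random


def forward_cloud(ex, en, he, n, rng=random):
    """Generate n cloud drops (x, mu) of the normal cloud (Ex, En, He)."""
    drops = []
    for _ in range(n):
        en_prime = rng.gauss(en, he)          # En' ~ N(En, He^2)
        x = rng.gauss(ex, abs(en_prime))      # x ~ N(Ex, En'^2)
        if en_prime == 0.0:
            mu = 1.0                          # degenerate case: x equals Ex
        else:
            mu = math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
        drops.append((x, mu))
    return drops


def backward_cloud(samples):
    """Estimate (Ex, En, He) from samples without using membership grades."""
    n = len(samples)
    ex = sum(samples) / n                                  # sample mean
    en = math.sqrt(math.pi / 2.0) * sum(abs(x - ex) for x in samples) / n
    var = sum((x - ex) ** 2 for x in samples) / (n - 1)    # sample variance
    he = math.sqrt(abs(var - en ** 2))                     # abs() guards var < En^2
    return ex, en, he
```

Passing a seeded `random.Random` instance as `rng` makes the generated drops reproducible, which is convenient when comparing fitted cloud parameters across runs.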

Rough Set Theory Review.
The main idea of rough set theory [38] is to partition the given space according to an equivalence relation while preserving the equivalence property of knowledge. Attribute reduction is an important part of rough set theory: it deletes redundant or unimportant condition attributes and attribute values while keeping the classification ability unchanged, and then derives the rules that relate the condition attributes to the decision attribute. The method is simple and does not need any prior information, so it can be applied to the generation of fusion decision rules. Because it describes the uncertainty of a problem objectively, it is well suited to traffic state assessment.

Proposed Fusion Decision Model.
When rough set theory is used to analyze actual data and knowledge, each attribute value of the decision table must be discrete. Traffic flow data fluctuate, but within a local scope they have a certain continuity. In this paper, we therefore use the cloud model to discretize the traffic flow data.
The fusion decision model of rough sets and cloud is mainly based on cloud model theory. The algorithm steps are as follows.
Step 1. For multiple parameters of traffic detector, select the qualitative concept, respectively, and determine the scope of its quantitative value.
Step 2. According to the cloud model theory, generate a different qualitative concept of cloud, respectively, and make the continuous values of traffic flow data discrete.
Step 3. Regard the discrete traffic parameters of the samples as condition attributes, obtain the status value of every moment as decision attribute according to the expert system, and establish a decision table.
Step 4. Delete duplicate objects in the decision table.
Step 5. Calculate the importance degree of each condition attribute for the decision attribute and delete the condition attributes whose importance degree is 0.
Step 6. According to the knowledge reduction method of rough set, delete redundant condition attributes.
Step 7. Delete the redundant attribute values for each object and obtain the final decision rules.
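Steps 4 and 5 can be sketched in Python using the standard rough-set dependency degree. This is a minimal sketch under stated assumptions: each object is a tuple of discretized condition values followed by the decision value, the importance of an attribute is taken as the drop in dependency when that attribute is removed, and the function names are hypothetical. The reductions of Steps 6 and 7 are not covered.

```python
from collections import defaultdict


def deduplicate(rows):
    """Step 4: drop repeated objects while keeping the first occurrence."""
    return list(dict.fromkeys(rows))


def dependency(rows, cond_idx):
    """Share of objects whose selected condition values fix the decision."""
    groups = defaultdict(set)
    for *conds, decision in rows:
        groups[tuple(conds[i] for i in cond_idx)].add(decision)
    consistent = sum(
        1 for *conds, _ in rows
        if len(groups[tuple(conds[i] for i in cond_idx)]) == 1
    )
    return consistent / len(rows)


def significance(rows, n_conds):
    """Step 5: importance of each condition attribute for the decision."""
    full = list(range(n_conds))
    base = dependency(rows, full)
    return [base - dependency(rows, [i for i in full if i != a]) for a in full]
```

Attributes whose entry in the returned list is 0 are the candidates for deletion in Step 5.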

Result of Data Restoration.
The traffic flow data acquired between the Beijing Deshengmen bridge and the Drum Tower on June 19, 2009, were taken as the original data. To test the effectiveness of the incomplete data restoration algorithm, the original data were modified to create some incomplete data artificially. The proposed algorithm was then applied to the incomplete data, and the result is shown in Figure 11. Figure 11 shows that the algorithm detects the 8 incomplete data points accurately. To further illustrate the effectiveness of the algorithm, we compare it with two other algorithms: the linear interpolation algorithm and the historical trend restoration algorithm. The result is shown in Table 3. Taking sample number 158 as an example, the modified data means that the original value was changed from 23 to 81. Comparing the relative errors of the different algorithms, the mean relative error of the proposed algorithm is 1.85%, while that of the historical trend restoration algorithm is 14.78% and that of the linear interpolation algorithm is 11.90%. The proposed algorithm is therefore considerably more accurate than the other two methods.

Result of Speed Fusion Experiments.
To verify the reliability of the algorithm, the weighted average method, the Kalman filtering method, and the BP neural network method were included in the experimental analysis. The analysis of the experimental data with the three methods is shown in Figure 12.
After matching the data detected by the probe vehicle detector, we extracted the speed in each section of the road and took its average value as the input of the fusion model. Here the road is divided into six subsections, and the data detected by the coil detector is constant. Sixty percent of the data is taken as the training sample with 10-fold cross validation, and the Newton method is used to determine the weight of each speed value. The remaining 40% of the data is then tested with the steps described above. The result of the space-matching fusion method and its error curve are shown in Figure 13.
We assess the strengths and weaknesses of these methods with the following indicators: mean absolute error (MAE), mean square error (MSE), mean absolute percentage error (MAPE), mean square percentage error (MSPE), and the maximum error (MAXERR (%)). The evaluation results are shown in Table 4.
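For reference, these indicators can be computed as in the following Python sketch. The formulas are the common textbook definitions and are an assumption here, since the exact definitions are not stated in the text; in particular, MAXERR is taken as the largest absolute relative error in percent, and the function name is hypothetical.

```python
def error_metrics(estimated, actual):
    """Return common error indicators for paired estimated and actual speeds."""
    errors = [e - a for e, a in zip(estimated, actual)]
    rel = [err / a for err, a in zip(errors, actual)]   # actual must be nonzero
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        "MAPE": sum(abs(r) for r in rel) / n,
        "MSPE": sum(r * r for r in rel) / n,
        "MAXERR": max(abs(r) for r in rel) * 100.0,     # percent (assumed)
    }
```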
The comparison shows that the space-matching fusion method performs much better than the weighted average fusion method and the Kalman filtering method, and about as well as the neural network method. However, the computation of the neural network method is relatively complex. In conclusion, the fusion and estimation of speed based on space matching not only guarantees timeliness but also improves the reliability and validity of the data.

Analysis of Traffic State Assessment Result.
The qualitative concepts of the traffic flow parameters are given as follows: traffic flow = {very low, low, normal, high, very high}, speed = {very slow, slow, normal, fast, very fast}, and occupancy = {very low, low, normal, high, very high}. We use 0, 1, 2, 3, and 4 to represent the qualitative concepts, respectively. Table 5 lists the threshold values of the qualitative concepts of flow, speed, and occupancy.
The cloud models $(\mathrm{Ex}_i, \mathrm{En}_i, \mathrm{He}_i)$ are shown in Table 6. The collected traffic flow data and the clouds listed in Table 6 are then substituted into (18). If $\mu_i$ is the maximum among $\mu_j$ ($j = 0, 1, \ldots, n$), then the traffic flow parameter value belongs to the cloud $C_i$. Table 7 lists part of the identification results of the traffic flow parameters based on cloud theory.
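The maximum-membership rule can be sketched in Python as follows. This is a hedged illustration: it evaluates the membership deterministically with En in place of a randomly sampled entropy, which is a simplifying assumption, and the cloud parameters in the usage note are hypothetical values rather than those of Table 6.

```python
import math


def expected_membership(x, ex, en):
    """Membership of x in a normal cloud, using En in place of a sampled En'."""
    return math.exp(-((x - ex) ** 2) / (2.0 * en ** 2))


def discretize(x, clouds):
    """Return the index of the cloud (Ex, En, He) with the largest membership."""
    grades = [expected_membership(x, ex, en) for ex, en, _he in clouds]
    return max(range(len(clouds)), key=grades.__getitem__)
```

For example, with five hypothetical speed clouds centered at 10, 25, 40, 55, and 70 km/h, `discretize(27.0, clouds)` selects the cloud centered at 25 km/h, which corresponds to the qualitative concept coded as 1.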
The final decision rules obtained with rough set theory are shown in Table 8. Figure 14 shows the results of the traffic state assessment. There are four states: 1 represents smooth traffic, 2 represents slight congestion, 3 represents moderate congestion, and 4 represents heavy congestion.
To demonstrate the effectiveness of the algorithm, we use the congestion identification rate (IR) and the false identification rate (FIR) to test it. The test result is shown in Table 9. It shows that the identification rate is over 98% and the misjudgment rate is low.
Experimental results show that the restoring algorithm based on self-adaptive generation of area geometry and the fusion and estimation model of speed based on space matching improve the completeness and effectiveness of the traffic flow data. The fusion decision model of rough sets and cloud can be used to assess the traffic state and achieve the desired results.

Conclusion
In this paper, a multisensor traffic state assessment system was developed. Because the sensors often acquire incomplete traffic data, the system provides novel and robust algorithms to solve this problem. The results of the restoring algorithm based on self-adaptive generation of area geometry are consistent with the real data, and the mean relative error is only 1.85%, which greatly improves the reliability of the data. With the speed fusion estimation model based on space matching, the estimation precision is above 90%, which improves the effectiveness and accuracy of the speed data. Finally, the traffic state assessment based on the fusion decision model of rough sets and cloud is applied to actual road traffic conditions, and the evaluation accuracy is above 98%. The experimental results show that the proposed system is feasible, effective, and accurate, and it is of great significance to the development of urban intelligent transportation systems.