A Road Network Traffic State Identification Method Based on Macroscopic Fundamental Diagram and Spectral Clustering and Support Vector Machine

Accurate identification of road network traffic status is the key to improving the efficiency of urban traffic control and management. Both data mining methods and methods based on the macroscopic fundamental diagram (MFD) can divide the traffic state of a road network, but each has its own advantages and disadvantages. Data mining methods work directly on traffic data with high efficiency, but they can only discriminate traffic status at the microlevel. The MFD of a road network can discriminate traffic status at the macrolevel, but problems remain: the MFD-based equal points discrimination method lacks theoretical support, and the traffic status cannot be subdivided. If data mining methods and the road network's MFD are combined, the accuracy of road network traffic state identification can be greatly improved. In addition, research shows that combining an unsupervised clustering method (such as the spectral clustering algorithm) with a supervised machine learning algorithm (such as the support vector machine (SVM)) is more accurate in traffic state identification. Therefore, a traffic state identification method based on MFD, spectral clustering, and SVM is proposed, combining the advantages of the spectral clustering algorithm and the SVM algorithm. Firstly, the spectral clustering algorithm is used to classify the traffic state of the road network's MFD. Secondly, an SVM multiclassifier is trained with the partitioned MFD parameters of the road network, and an accuracy evaluation method for the classification results based on the confusion matrix is given. Finally, a connected-vehicle network simulation platform is built for empirical analysis. The results show that, compared with the K-means algorithm, the classification results of the spectral clustering algorithm are closer to the theoretical values, and the accuracy of the SVM multiclassifier is 96.3%. The proposed algorithm can therefore identify the road network traffic state more effectively at the macrolevel.


Introduction
The traffic state of a road network objectively reflects its traffic operation, and identifying it accurately is the key to improving the efficiency of urban traffic control and management. Traffic state identification for road networks has always been a research hotspot in the field of intelligent transportation systems (ITS). The methods can generally be divided into two categories: methods based on data mining and methods based on the traffic flow fundamental diagram.
1.1. The Methods Based on Data Mining. The methods based on data mining use machine learning algorithms such as neural networks, deep learning, clustering algorithms, support vector machines, and Bayesian methods to mine data, so as to automatically identify the traffic state of the road network. For example, Mehdi et al. (2010) [1] used three clustering methods, namely K-means, fuzzy c-means (FCM), and clustering large applications (CLARA), to classify expressway traffic flow. The results showed that the K-means clustering results were the most consistent with the Highway Capacity Manual (HCM) classification. Montazeri-Gh et al. (2011) [2] used the K-means algorithm to classify data sets such as average speed, acceleration, and percentage of idle time collected by floating cars, so as to discriminate the actual road network traffic state. Xia et al. (2012) [3] used a Bayesian algorithm to identify the highway traffic state based on traffic flow, speed, and occupancy data. Antoniou et al. (2013) [4] proposed a dynamic data-driven local traffic state estimation method, which used machine learning algorithms to cluster and classify traffic flow data. Yang et al. (2014) [5] used the FCM clustering method to discriminate the traffic state of an expressway based on the change in speed of large and small vehicles. Bing et al. (2015) [6] used the projection pursuit technique and a dynamic clustering method to construct a projection index function between traffic parameters and traffic state in order to improve the accuracy of traffic state discrimination. The projection direction of the projection index function was optimized by the hybrid frog leaping algorithm, and the threshold of traffic state discrimination was calibrated with simulation data. Zhang et al. (2016) [7] took speed, traffic flow, and traffic density as data sets, defined the relationship between multidimensional attribute information based on grey relational analysis and rough set theory, established a grey relational clustering model, and then introduced the grey relational membership ranking algorithm (GMRC) to determine the clustering priority, so as to analyze the degree of road network congestion. Shang et al. (2017) [8] constructed a traffic state discrimination model based on spectral clustering and the random subspace ensemble K-nearest neighbor (RS-KNN) method in order to improve the accuracy of urban expressway traffic state discrimination.

1.2. The Methods Based on Traffic Flow Fundamental Diagram.
The methods based on the traffic flow fundamental diagram use the relationship between traffic flow and density, which presents a single-peak parabola (called the fundamental diagram), to distinguish the traffic state of the road network [9]. According to the fundamental diagram, the traffic state of the road network can be divided into two phases, i.e., congested flow and noncongested flow. On the basis of the two-phase traffic flow, Kerner et al. (2002) [10,11] divided the traffic state of the road network into three phases according to measured data: free flow, synchronized flow, and wide moving jam. Guan et al. (2007) [12] divided traffic flow into four steady-state phases, namely free flow, harmonic flow, synchronized flow, and congestion, based on an analysis of the general distribution characteristics of traffic flow speed and density in an urban expressway network. Traffic state identification based on the fundamental diagram has developed slowly. After the macroscopic fundamental diagram (MFD) was revealed, some scholars used the MFD to discriminate road network traffic at the macroscopic level. For example, Wang et al. (2012) [13] proposed a traffic condition discrimination method based on equal points of MFD parameters (referred to as the "MFD equal points discrimination method"). This method determined the road network's MFD through simulation data and divided the traffic state of the road network into five congestion levels by equal points, i.e., free flow, stable flow, unstable flow, restricted flow, and forced flow. However, it divided the traffic status directly by equal points, which lacked theoretical support. Zhu et al. (2012) [14] used measured and simulated data to establish the actual road network's MFD, calibrated the traffic flow parameters of the road network, and then studied how the distribution of average traffic flow and average density differed across time periods, so as to determine the traffic state of the road network, i.e., free flow, congested flow, and oversaturated flow. However, these three traffic conditions cannot be further subdivided. Xu et al. (2013) [15] divided the traffic state of the road network into free flow, cumulative flow, and congested flow according to the observed MFD. However, there are again only three traffic conditions. Yue et al. (2014) [16] established the actual road network's MFD model with the help of data from remote traffic detectors, designed a macroscopic traffic state index for expressways based on the speed mileage distribution, and divided the traffic state of the expressway into five grades: smooth, basically smooth, mild congestion, moderate congestion, and congestion. However, this method only discussed and verified the feasibility of expressway traffic state discrimination; it did not discuss whether it was suitable for the traffic state division of the whole road network. Ding et al. (2018) [17] estimated the MFD of an expressway network based on floating car detection data and initially divided the expressway network into five states according to the MFD equal points discrimination method [13]. The state parameters of the network were then further modified by a clustering algorithm according to the real-time data of the network. Like [13], this method divided the traffic status directly by equal points, which lacked theoretical support.
In summary, both data mining methods and MFD-based methods can divide the traffic state of a road network, but each has its own advantages and disadvantages. Data mining methods work directly on traffic data with high efficiency, but they can only discriminate traffic status at the microlevel. The MFD of a road network can discriminate traffic status at the macrolevel, but problems remain: the MFD-based equal points discrimination method lacks theoretical support [13,17], the traffic status cannot be subdivided [14,15], or the application scope of the method is limited [16]. If data mining methods and the MFD can be combined, the accuracy of road network traffic state identification can be greatly improved. With the rapid development of big data mining algorithms, K-means, FCM, spectral clustering, and other clustering analysis methods have emerged and have been widely used in traffic state discrimination. At the same time, supervised machine learning algorithms, represented by the artificial neural network (ANN) and the support vector machine (SVM), are also applied in traffic condition discrimination. Recent studies have shown that combining a clustering analysis method with a supervised machine learning algorithm is more accurate in traffic state identification [4,18]. Clustering analysis provides the necessary prior information for the supervised learning algorithm, and the supervised learning algorithm supports real-time traffic status discrimination [8]. Among these methods, spectral clustering is a relatively new clustering method based on spectral graph theory. It has been widely used in speech recognition, video segmentation, image segmentation, VLSI design, web page partitioning, text mining, and so on, but its application in the field of transportation has just begun. The SVM method, which has a rigorous mathematical basis, has become a popular technology in the fields of clustering analysis, pattern recognition, state discrimination, prediction, and regression analysis and has attracted wide attention from scholars at home and abroad. Both the spectral clustering algorithm and the SVM algorithm have strong mathematical support, and many examples have demonstrated their good performance in practice. Therefore, in view of the fact that the MFD equal points discrimination method lacks theoretical support and that the traffic status cannot be subdivided, this paper proposes a road network traffic state identification method based on MFD, spectral clustering, and SVM, combining the advantages of the spectral clustering algorithm and the SVM algorithm. Its basic ideas are as follows.
Firstly, the floating car data (FCD) method is used to estimate the road network's MFD, and then the spectral clustering algorithm is used to partition the road network's MFD. The traffic state of the road network is divided into four states, i.e., smooth, stable, congested, and oversaturated. Then, an SVM multiclassifier is trained with the partitioned MFD parameters of the road network, and the accuracy of the classification results is evaluated by the confusion matrix. Finally, the intersection group of the core road network of Guangzhou is taken as the research area. A microscopic traffic simulation model is built with the VISSIM traffic simulation software, and the validity of the algorithm is verified. The specific process is shown in Figure 1.

Traffic State Identification Method Based on MFD
The concept of MFD was first proposed by Godfrey (1969) [19], but its theoretical principle was not revealed until 2007, by Daganzo and Geroliminis [20,21]. These two scholars hold that the MFD is an inherent attribute of a road network, which objectively reflects the general relationship between the weighted traffic flow and the weighted traffic density of the road network. If the floating cars are evenly distributed in the road network and their coverage is known, then the road network's MFD can be estimated by using the floating car data (FCD). The formula is as follows [22]:
$$k^w = \frac{\sum_{j=1}^{m} t_j}{\rho \, T \sum_{i=1}^{n} l_i}, \qquad q^w = \frac{\sum_{j=1}^{m} d_j}{\rho \, T \sum_{i=1}^{n} l_i},$$
where $k^w$ and $q^w$ are the weighted traffic density (veh/km) and the weighted traffic flow (veh/h) of the road network obtained by the FCD estimation method, respectively; $\rho$ is the ratio of floating cars; $T$ is the acquisition cycle (s); $n$ is the total number of sections of the road network; $l_i$ is the length of road section $i$ (m); $m$ is the number of floating cars during the acquisition cycle $T$ (veh); $t_j$ is the driving time of the j-th floating car during the acquisition cycle (s); and $d_j$ is the driving distance of the j-th floating car during the acquisition cycle (m).
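The FCD estimate above can be illustrated with a minimal Python sketch. It assumes the estimator takes the usual form (total floating-car travel time, or travel distance, divided by the product of floating-car ratio, acquisition cycle, and total network length); the class name `FloatingCarRecord`, the function name `estimate_mfd_point`, and the explicit conversion from veh/m and veh/s to veh/km and veh/h are illustrative assumptions, not part of the original method description.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class FloatingCarRecord:
    """One floating car observed during a single acquisition cycle (hypothetical container)."""
    travel_time_s: float      # driving time within the cycle (s)
    travel_distance_m: float  # driving distance within the cycle (m)


def estimate_mfd_point(
    records: Sequence[FloatingCarRecord],
    section_lengths_m: Sequence[float],
    cycle_s: float,
    floating_car_ratio: float,
) -> Tuple[float, float]:
    """Return (weighted density in veh/km, weighted flow in veh/h) for one cycle."""
    if not 0.0 < floating_car_ratio <= 1.0:
        raise ValueError("floating_car_ratio must lie in (0, 1]")
    network_length_m = sum(section_lengths_m)
    # Common denominator: ratio * cycle * total network length (units: s * m).
    denom = floating_car_ratio * cycle_s * network_length_m
    density_veh_per_m = sum(r.travel_time_s for r in records) / denom
    flow_veh_per_s = sum(r.travel_distance_m for r in records) / denom
    # Unit conversion (assumed here): veh/m -> veh/km and veh/s -> veh/h.
    return density_veh_per_m * 1000.0, flow_veh_per_s * 3600.0


if __name__ == "__main__":
    # Toy data: 42 floating cars, each driving 600 m during a full 120 s cycle.
    cars = [FloatingCarRecord(120.0, 600.0) for _ in range(42)]
    k_w, q_w = estimate_mfd_point(cars, [2500.0] * 4, cycle_s=120.0, floating_car_ratio=0.42)
    print(f"density = {k_w:.2f} veh/km, flow = {q_w:.2f} veh/h")
```

Calling the function once per acquisition cycle yields one scatter point of the MFD; the ratio of the two outputs is the network space-mean speed in km/h, which gives a quick sanity check on the inputs.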
The road network's MFD can monitor the traffic state of the road network at the macrolevel, as shown in Figure 2. When there are few vehicles in the road network, both the weighted traffic density and the weighted traffic flow are small. As the number of vehicles increases, the weighted traffic density increases and the weighted traffic flow increases with it. At this time, the traffic state of the road network is defined as the smooth state. As traffic continues to increase, the weighted traffic density and the weighted traffic flow keep rising, but the traffic in the road network remains relatively smooth. At this time, the traffic state of the road network is defined as the stable state. As traffic keeps entering the road network, the weighted traffic density rises further and vehicles begin to interfere with each other, while the weighted traffic flow continues to increase with the weighted traffic density. When the weighted traffic density rises to a certain range, vehicle speeds in the road network become unstable and the traffic flow of the road network reaches its maximum; the traffic state of the road network is then defined as the congestion state. When vehicles continue to pour into the road network, the weighted traffic density keeps increasing while the speed and the weighted traffic flow decline; the traffic demand exceeds the supply capacity of the road network, and the efficiency of traffic operation keeps falling. At this time, the traffic state of the road network is defined as the oversaturated state. As vehicles continue to enter, the traffic density keeps increasing until it reaches the jam density of the road network; the speed and traffic flow approach zero, and the whole road network is paralyzed.
Mathematical Problems in Engineering
According to the road network's MFD, the traffic state of the road network is divided into four grades: smooth, stable, congested, and oversaturated. In order to quantify the traffic state of the road network, Wang et al. (2012) [13] analyzed the relationship among flow $q$, density $k$, and speed $v$ based on the road network's MFD and solved for the extreme value of the road network operation index. The critical point between stable flow and unstable flow is determined as $v_m = (2/3) v_f$, $k_m = (1/3) k_j$, where $v_m$ is the critical speed, $k_m$ is the critical density, $v_f$ is the free flow speed, and $k_j$ is the jam density. The traffic state of the road network is then divided directly into five grades by equal division of speed and density: free flow, stable flow, unstable flow, restricted flow, and forced flow. The corresponding evaluation indicators are shown in Table 1.
In theory, it is feasible to divide the traffic state of the road network into five grades according to the traffic density or speed of the road network.However, this method directly divides the traffic density or speed of the road network by equal points, lacking theoretical support, and whether it is reasonable needs further verification.

Road Network Traffic State Identification Method Based on MFD and Spectral Clustering and SVM
The common partition criteria for spectral clustering (SC) are minimum cut, average cut, normalized cut, min-max cut, ratio cut, MN cut, and so on. According to the partition criterion, SC algorithms can be divided into two categories: two-way SC algorithms and multiway SC algorithms. Two-way SC algorithms include the PF algorithm, SM algorithm, SLH algorithm, KVV algorithm, M cut algorithm, and so on. Multiway SC algorithms include the Ng-Jordan-Weiss (NJW) algorithm [23] and the MS algorithm. Among the various SC algorithms, the NJW algorithm gives the best results in application. The basic idea of the NJW algorithm is to first calculate the similarity matrix of the sample data set, then transform the similarity matrix into the Laplace matrix, then construct a new vector space $R^K$ from the eigenvectors corresponding to the $K$ largest eigenvalues of the Laplace matrix, and build a matrix $Y$ whose rows correspond one-to-one to the original data in the new space $R^K$. Finally, the rows of $Y$ are clustered by the K-means algorithm [23]. Therefore, this paper uses the NJW algorithm to partition the road network's MFD; the specific process is as follows.
(1) Define the sample data set and determine the number of clusters $K$; here $K = 4$ (i.e., smooth state, stable state, congestion state, and oversaturated state). The formula is as follows:
$$S = \{s_1, s_2, \ldots, s_n\}, \qquad s_i = (s_{i1}, s_{i2}),$$
where $S$ is the sample data set; $s_i$ is the i-th scatter point on the road network's MFD; $s_{i1}$ is the i-th weighted traffic density of the road network; and $s_{i2}$ is the i-th weighted traffic flow of the road network.
(2) Normalize the sample data. The formula is as follows:
$$s'_{ij} = \frac{s_{ij} - s_j^{\min}}{s_j^{\max} - s_j^{\min}},$$
where $s_j^{\max}$ and $s_j^{\min}$ are the maximum and minimum values of the j-th feature vector (the j-th column of the sample data), respectively; $s_{ij}$ is the initial value of the i-th element of the j-th feature vector; and $s'_{ij}$ is the normalized value of the i-th element of the j-th feature vector.
(3) Calculate the similarity matrix $A$. The formula is as follows:
$$A_{ij} = \exp\left(-\frac{\|s_i - s_j\|^2}{2\sigma^2}\right) \ (i \neq j), \qquad A_{ii} = 0,$$
where $\|s_i - s_j\|$ is the Euclidean distance between sample $s_i$ and sample $s_j$ and $\sigma$ is the standard deviation of the samples; its value is 0.9 in this algorithm.
(4) Calculate the Laplace matrix $L$. The formula is as follows:
$$L = D^{-1/2} A D^{-1/2},$$
where $D$ is the diagonal matrix of the similarity matrix $A$, which satisfies the following condition:
$$D_{ii} = \sum_{j=1}^{n} A_{ij}.$$
(5) Calculate the $K$ largest eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_K)$ of the Laplace matrix $L$ and the corresponding eigenvectors $(u_1, u_2, \ldots, u_K)$. The eigenvectors are arranged in descending order of their eigenvalues to construct the matrix $U = \{u_1, u_2, \ldots, u_K\} \in R^{n \times K}$.
(6) Normalize the row vectors of matrix $U$ to obtain matrix $Y$. The formula is as follows:
$$Y_{ij} = \frac{U_{ij}}{\left(\sum_{j=1}^{K} U_{ij}^2\right)^{1/2}}.$$
(7) Take every row vector $y_i \in R^K$ $(i = 1, 2, \ldots, n)$ of the matrix $Y$ as a point and cluster the points $y_i$ with the K-means algorithm, which yields $K$ clusters, expressed as $C_1, C_2, \ldots, C_K$.
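The seven steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation (the paper reports using Matlab): the similarity is taken to be the Gaussian kernel with zero diagonal, the Laplace matrix is the symmetric normalization used by NJW, and the small K-means routine with farthest-point initialization is a stand-in chosen only to keep the sketch deterministic. The function names are hypothetical.

```python
import numpy as np


def _kmeans(y: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd iterations with farthest-point initialization (an assumption)."""
    centers = [y[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        dist = np.min([((y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(y[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    labels = np.full(len(y), -1)
    for _ in range(n_iter):
        d = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = y[labels == j].mean(axis=0)
    return labels


def njw_spectral_clustering(points, n_clusters: int, sigma: float = 0.9) -> np.ndarray:
    """Cluster (density, flow) points following the NJW steps; returns integer labels."""
    x = np.asarray(points, dtype=float)
    # Step 2: min-max normalization of each feature column.
    span = x.max(axis=0) - x.min(axis=0)
    span[span == 0] = 1.0
    x = (x - x.min(axis=0)) / span
    # Step 3: Gaussian similarity matrix with zero diagonal.
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    a = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(a, 0.0)
    # Step 4: L = D^{-1/2} A D^{-1/2}, with D the diagonal matrix of row sums.
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    lap = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Step 5: eigenvectors of the K largest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(lap)
    u = vecs[:, -n_clusters:]
    # Step 6: normalize every row of U to unit length.
    y = u / np.linalg.norm(u, axis=1, keepdims=True)
    # Step 7: K-means on the rows of Y.
    return _kmeans(y, n_clusters)
```

For the road network in this paper the call would be `njw_spectral_clustering(mfd_points, n_clusters=4)`, where `mfd_points` is a hypothetical list of (weighted density, weighted flow) pairs. The cluster indices carry no order, so mapping them to smooth, stable, congested, and oversaturated (for example by sorting clusters by mean density) is a separate step.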

Design of SVM Multiple Classifiers. SVM was proposed by Cortes and Vapnik in 1995 and is mainly used to solve data classification problems. The method has a strict theoretical basis and belongs to supervised machine learning. It has been widely used in many fields, such as pattern recognition, data classification, data mining, computational intelligence, prediction and analysis, traffic condition identification, traffic sign recognition, vehicle type recognition, traffic demand prediction, traffic accident severity, pedestrian detection, and so on.
The SVM multiclassifier is implemented with the LIBSVM software package developed by Professor Chih-Jen Lin of National Taiwan University. This package uses the one-versus-one approach, whose basic principle is to pair the $K$ classes of samples in all possible combinations and to train one SVM for each pair, which gives $K(K-1)/2$ SVMs in total. When an unknown sample is classified, each SVM decides the category of the sample and casts a vote for that category; the category with the most votes is taken as the category of the unknown sample. The basic idea of each SVM classifier is to find an optimal hyperplane that separates the positive and negative examples of the sample data with the maximum classification margin. The algorithm flow is as follows [24].
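The one-versus-one voting described above can be sketched as follows. The pairwise classifiers here are hypothetical nearest-centre rules on a single density value, used only as stand-ins for trained binary SVMs, and the class centres are made-up numbers; breaking ties in favour of the earlier class is also an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Tuple

PairwiseModels = Dict[Tuple[str, str], Callable[[float], str]]


def build_toy_pairwise(centres: Dict[str, float]) -> PairwiseModels:
    """Build one toy binary decision rule per class pair (stand-in for trained SVMs)."""
    pairwise: PairwiseModels = {}
    for a, b in combinations(centres, 2):
        def decide(x: float, a: str = a, b: str = b) -> str:
            # Pick the class whose (hypothetical) centre is nearer to x.
            return a if abs(x - centres[a]) <= abs(x - centres[b]) else b
        pairwise[(a, b)] = decide
    return pairwise


def one_vs_one_predict(sample: float, classes: List[str], pairwise: PairwiseModels) -> str:
    """Let every pairwise classifier vote and return the class with the most votes."""
    votes: Counter = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](sample)] += 1
    # max() returns the first maximal element, so ties go to the earlier class.
    return max(classes, key=lambda c: votes[c])


if __name__ == "__main__":
    centres = {"smooth": 10.0, "stable": 30.0, "congested": 50.0, "oversaturated": 80.0}
    models = build_toy_pairwise(centres)
    print(len(models), "pairwise classifiers")  # K(K-1)/2 with K = 4
    print(one_vs_one_predict(45.0, list(centres), models))
```

With four traffic states the scheme needs six binary classifiers, which matches the $K(K-1)/2$ count given in the text.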
(1) When the training sample set is linearly separable, there exists a plane which can divide the training samples into positive and negative categories. This plane is called a hyperplane, and its classification function is as follows:
$$f(x) = w^T x + b,$$
where $w$ is the normal vector that determines the slope of the hyperplane and $w^T$ is its transpose; $b$ is a constant; and $x$ is a multidimensional vector. The classification margin is then $2/\|w\|$, so the margin is largest when $\|w\|$ is smallest. For ease of solution, the problem is transformed into a minimization problem. The objective function and constraints are established as follows:
$$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \ldots, n,$$
where $x_i$ is the i-th training sample and $y_i \in \{+1, -1\}$ is its category label.
(2) When the training sample set is linearly inseparable, a kernel function $K$ is introduced to map the low-dimensional sample set to a high-dimensional space, such that the inner products needed in the high-dimensional space can be computed directly in the low-dimensional space. The linearly inseparable problem is thereby transformed into a linearly separable one. The classification function is as follows:
$$f(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b,$$
where $K$ is a kernel function (common kernels are polynomial kernels, Gauss kernels, linear kernels, string kernels, etc.) and $\alpha_i$ is the Lagrange coefficient.
There may be data points that are far from their normal position, or outlier data points of Category 2 may be mixed into the Category 1 area. In order to deal with this situation, individual data points are allowed to deviate from the hyperplane to some extent; i.e., a nonnegative relaxation variable $\xi_i$ is introduced. The original objective function and constraints are transformed into
$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, 2, \ldots, n,$$
where $C$ is the penalty coefficient, indicating the importance of the loss caused by outlier sample data points. The smaller the value of $C$, the smaller the penalty for misclassification and the smaller the loss added to the objective function.
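To make the role of the relaxation variables and the penalty coefficient concrete, the following sketch evaluates the soft-margin objective for a given linear hyperplane. It assumes the standard formulation in which the smallest admissible relaxation for a sample is the hinge value max(0, 1 - y(w·x + b)); the function name and the toy samples are hypothetical, and no optimization is performed.

```python
from typing import List, Sequence, Tuple


def soft_margin_objective(
    w: Sequence[float],
    b: float,
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    penalty: float,
) -> Tuple[float, List[float]]:
    """Return (objective value, per-sample relaxation values) for a fixed (w, b)."""
    def dot(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(ui * vi for ui, vi in zip(u, v))

    # Smallest nonnegative relaxation that satisfies y * (w.x + b) >= 1 - xi.
    slacks = [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(samples, labels)]
    objective = 0.5 * dot(w, w) + penalty * sum(slacks)
    return objective, slacks


if __name__ == "__main__":
    w, b = (1.0, 0.0), 0.0
    xs = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
    ys = [1, -1, 1, -1]  # the last sample is an outlier on the wrong side
    for c in (1.0, 0.1):
        value, slacks = soft_margin_objective(w, b, xs, ys, penalty=c)
        print(f"C = {c}: objective = {value:.3f}, relaxations = {slacks}")
```

Lowering the penalty shrinks the contribution of the two violating samples to the objective, which is the trade-off between margin width and tolerated outliers described in the text.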
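According to the confusion matrix, the classification accuracy of the SVM multiclassifier can be evaluated with five indicators: accuracy, recall, omission, precision, and commission. Let $n_{ij}$ denote the number of test samples whose actual category is $i$ and whose predicted category is $j$, let $K$ be the number of categories, and let $N$ be the total number of test samples.
(1) Accuracy. The accuracy rate is the proportion of correctly classified samples among all test samples; the formula is as follows:
$$\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{K} n_{ii}.$$
(2) Recall. The actual recall rate is the proportion of correctly predicted samples in the i-th actual category to the total number of samples in that actual category; the formula is as follows:
$$\text{Recall}_i = \frac{n_{ii}}{\sum_{j=1}^{K} n_{ij}}.$$
(3) Omission. The actual omission rate is the proportion of wrongly predicted samples in the i-th actual category to the total number of samples in that actual category; the formula is as follows:
$$\text{Omission}_i = \frac{\sum_{j \neq i} n_{ij}}{\sum_{j=1}^{K} n_{ij}} = 1 - \text{Recall}_i.$$
(4) Precision. The prediction accuracy rate is the proportion of correctly predicted samples in the j-th predicted category to the total number of samples in that predicted category; the formula is as follows:
$$\text{Precision}_j = \frac{n_{jj}}{\sum_{i=1}^{K} n_{ij}}.$$
(5) Commission. The commission rate is the proportion of wrongly predicted samples in the j-th predicted category to the total number of samples in that predicted category; the formula is as follows:
$$\text{Commission}_j = \frac{\sum_{i \neq j} n_{ij}}{\sum_{i=1}^{K} n_{ij}} = 1 - \text{Precision}_j.$$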
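The five indicators can be computed from a confusion matrix as in the short sketch below. It assumes the usual convention that rows are actual categories and columns are predicted categories; the 3-by-3 matrix in the example is a made-up toy matrix, not data from this paper, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Union

Metric = Union[float, List[float]]


def confusion_metrics(matrix: Sequence[Sequence[int]]) -> Dict[str, Metric]:
    """Compute accuracy plus per-class recall, omission, precision, and commission.

    matrix[i][j] is the number of samples of actual class i predicted as class j.
    Every class is assumed to occur at least once as actual and as predicted.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(k)]                       # actual counts
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted counts
    recall = [matrix[i][i] / row_totals[i] for i in range(k)]
    precision = [matrix[j][j] / col_totals[j] for j in range(k)]
    return {
        "accuracy": sum(matrix[i][i] for i in range(k)) / total,
        "recall": recall,
        "omission": [1.0 - r for r in recall],
        "precision": precision,
        "commission": [1.0 - p for p in precision],
    }


if __name__ == "__main__":
    toy = [
        [9, 1, 0],
        [0, 8, 2],
        [0, 0, 10],
    ]
    for name, value in confusion_metrics(toy).items():
        print(name, value)
```

Because omission and commission are the complements of recall and precision, reporting recall and precision per category together with the overall accuracy already determines all five indicators.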

Empirical Analysis
4.1. Experimental Platform Construction. The intersections of the core road network of Tianhe District in Guangzhou are chosen as the experimental area [25], as shown in Figure 3.
A microscopic traffic simulation platform for the connected-vehicle network is built with the VISSIM simulation software, based on the road network layout, the actual lane layouts, the intersection plane layouts, the traffic flow data, the intersection signal control schemes, the traffic organization, and other information.
After many simulations, it is found that the MFD estimation accuracy of the road network can reach 97% when the coverage of connected vehicles reaches 42%. Therefore, the coverage rate of connected vehicles is set to 42%, and the data of each connected vehicle are read every 15 seconds. The statistical period of the data is 120 seconds, and the simulation time is 3240 seconds. The connected-vehicle data files (*.fzp) produced by the simulation are imported into Excel, and the FCD estimation method is implemented by VBA macro programming. Finally, 270 sets of the weighted traffic flow ($q^w$) and the weighted traffic density ($k^w$) are obtained, and the MFD of the simulation network is drawn, as shown in Figure 4.
The data points of the road network's MFD are fitted by a function, and the critical weighted traffic density ($k_c$) and the critical weighted traffic flow ($q_c$) are calculated, as shown in Table 3.
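The fitting step can be illustrated with a short NumPy sketch. The functional form of the fitted MFD is not reproduced in this text, so the cubic polynomial used here is purely an assumption, as are the function name and the synthetic data; the critical point is taken as the flow maximum of the fitted curve inside the observed density range.

```python
import numpy as np


def critical_point_from_fit(density, flow, degree: int = 3):
    """Fit flow = p(density) by least squares and return (k_c, q_c) at the flow maximum.

    The polynomial degree is an assumption of this sketch, not the paper's choice.
    """
    poly = np.poly1d(np.polyfit(density, flow, degree))
    # Stationary points of the fitted curve: real roots of the derivative.
    roots = poly.deriv().roots
    real_roots = roots[np.isreal(roots)].real
    lo, hi = min(density), max(density)
    # Only consider stationary points within the observed range, plus the endpoints.
    candidates = [k for k in real_roots if lo <= k <= hi] + [lo, hi]
    k_c = max(candidates, key=poly)
    return float(k_c), float(poly(k_c))


if __name__ == "__main__":
    # Synthetic scatter generated from a made-up cubic MFD (not simulation data).
    k = [5.0 * i for i in range(21)]
    q = [0.003 * x ** 3 - 0.9 * x ** 2 + 60.0 * x for x in k]
    k_c, q_c = critical_point_from_fit(k, q)
    print(f"critical density = {k_c:.2f} veh/km, critical flow = {q_c:.1f} veh/h")
```

With real MFD scatter the fit quality should be checked (for example with the coefficient of determination) before the critical values are used as a reference for the partition results.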

Analysis of Experimental Results
(1) The Partition Result of Road Network's MFD Based on NJW Spectral Clustering Algorithm. The NJW algorithm is programmed in MATLAB. The MFD parameters of the road network are input into the program, and the result of the traffic state division is output, as shown in Figure 5. In order to better analyze the effect of the spectral clustering algorithm, the K-means algorithm is selected as a comparison to cluster the MFD parameters of the road network, as shown in Figure 6.
As shown in Figures 5 and 6, the partition of the road network's MFD based on the K-means algorithm only considers the size of the MFD parameters and does not consider the specific location of the MFD scatter points. It incorrectly assigns some congested scatter points to the oversaturated state and individual oversaturated scatter points to the congested state. The partition of the road network's MFD based on spectral clustering is more reasonable: it distinguishes the four traffic states, i.e., smooth, stable, congested, and oversaturated, more clearly. The ranges of the traffic state partitions of the two clustering methods are obtained from Figures 5 and 6. At the same time, according to the MFD fitting function of the whole road network in Table 3, the equal-point density values of the traffic state partition of the road network are calculated by using the MFD equal points discrimination method [13], as shown in Table 4.
As shown in Table 4, compared with the K-means-based traffic state partition results, the road network traffic state partition results based on spectral clustering are closer to the equal-point values. It shows that the results of the spectral clustering algorithm and the equal points method are feasible, but …
(3) Accuracy Evaluation of SVM Multiclassifier. The classification results of the classifier are arranged into a confusion matrix, and the accuracy evaluation indexes of the classifier are calculated, as shown in Table 5. Table 5 shows that, for Category 1, the precision is 100% and the recall is 90.3%; for Category 2, the precision is 97.2% and the recall is 97.2%; for Category 3, the precision is 97.1% and the recall is 97.1%; and for Category 4, the precision is 91.9% and the recall is 100%. Overall, the accuracy of the SVM multiclassifier is 96.3%. Therefore, the results of the SVM multiclassifier are satisfactory.

Conclusion
In view of the fact that the MFD equal points discrimination method lacks theoretical support and that the traffic status cannot be subdivided, this paper proposes a road network traffic state identification method based on MFD, spectral clustering, and SVM, combining the advantages of the spectral clustering algorithm and the SVM algorithm. Based on the results of the empirical analysis, this paper draws the following conclusions:
(1) Compared with the K-means-based traffic state partition results, the spectral clustering-based traffic state partition results are closer to the theoretical values.
(2) The accuracy of the SVM multiclassifier reaches 96.3%, which indicates that the classification results are satisfactory.
(3) This paper combines the advantages of the spectral clustering algorithm and the SVM algorithm, which makes it possible to identify the traffic state of the road network more effectively and automatically at the macrolevel on the basis of the MFD.
(4) It is noteworthy that this paper estimates the road network's MFD by using data from a connected-vehicle simulation environment and assumes that the coverage rate of connected vehicles is as high as 42% and that the vehicles are evenly distributed in the road network. In an actual road network, such a high coverage rate cannot be obtained, and connected vehicles are unevenly distributed. Therefore, in practical application, a large amount of traffic data is needed to accurately estimate the road network's MFD.

Figure 3: Road network layout diagram of vehicle networking simulation platform.

Figure 5: The division results of road network's MFD based on spectral clustering.

Figure 6: The division results of road network's MFD based on K-means.
[Confusion matrix axis labels: the classification of the actual test sets; the classification of the predictive test sets.]
The spectral clustering (SC) algorithm is a research hotspot in the field of machine learning. It is a point-to-point clustering algorithm based on spectral graph theory, which transforms the data clustering problem into the optimal graph partition problem. Compared with typical clustering algorithms such as K-means clustering and FCM clustering, the SC algorithm is suitable for spatial clustering problems with arbitrary shapes, and it can obtain global optimal solutions. The partition criterion based on graph theory is most closely related to the spectral clustering results.

Table 1: The identification of traffic state and the classification of congestion level.

Table 2: The confusion matrix of the classification results of SVM multiple classifiers.
The Accuracy Evaluation of SVM Multiple Classifiers. The test classification results of the SVM classifier are transformed into the confusion matrix, as shown in Table 2.

Table 3: The fitting function of MFD.

Table 4: The scope of traffic state parameters.

Table 5: The confusion matrix and the accuracy evaluation index of the classification results by SVM multiclassifiers.