An Empirical Study on GAN-Based Traffic Congestion Attack Analysis: A Visualized Method

With the development of the emerging intelligent traffic signal (I-SIG) system, congestion-related security issues are drawing the attention of researchers and developers to the vulnerabilities introduced by connected vehicle technology, which enables vehicles to communicate with the surrounding environment, such as road-side infrastructure and traffic control units. A congestion attack on the controlled optimization of phases (COP) algorithm of I-SIG was recently revealed. Unfortunately, such analysis still lacks a timely visualized prediction of the later congestion when an initial attack is launched. In this paper, we argue that traffic image feature-based learning can capture the relation between an attack and the congestion it causes, and we propose a novel analysis framework based on the cycle generative adversarial network (CycleGAN). Based on phase order, we first extract four-direction road images of one intersection and perform phase-based composition to generate new sample images for training. We then design a weighted L1 regularization loss that considers both the last-vehicle attack and the first-vehicle attack, to improve the training of CycleGAN with two generators and two discriminators. Experiments on simulated traffic flow data from the VISSIM platform show the effectiveness of our approach.


Introduction
With the development of the Internet of Things (IoT), transportation systems are being transformed by various smart sensing devices and connected vehicle (CV) technology [1,2]. Based on communication and collaboration among vehicles, road-side units (RSUs) [3], and signal systems, such intelligent transportation systems offer desirable efficiency and effectiveness in mobility and safety. As a typical case, in September 2016 the USDOT (U.S. Department of Transportation) launched a Pilot Program [1] to deploy and test a CV-based intelligent transportation system for the first time in three states, including California, Florida, and New York. Unfortunately, an algorithm-level attack on the controlled optimization of phases (COP)-based [4,5] intelligent signal system (I-SIG) [6] was exposed in 2018. Through data spoofing of a vehicle's GPS location and speed, an attacker can compromise the vehicle-side unit of a last vehicle at quite low attack cost and then mislead the traffic control decisions at the proper timing, causing unexpected heavy traffic congestion. In the worst case, one single attack vehicle is able to cause total congestion 14 times higher [7]. This is very surprising, since the I-SIG system uses an optimal signal control algorithm, COP, that is meant to decrease the congestion degree at an intersection. Thus, it is highly important to analyze the traffic congestion attack caused by only one malicious vehicle rather than by many vehicles, helping to provide effective defenses before wide deployment.
Although the previous work [7] reveals the existence of the congestion attack on COP and analyzes why COP decisions are influenced, it still lacks detailed guidance on defense or even prediction of such an attack. Thus, we aim to study the prediction of the I-SIG congestion attack in this work. Attack-based congestion prediction is totally different from traditional congestion prediction, because classical traffic flow theory, such as traffic wave distribution, does not fit well. Because the attack spoofs data at a chosen time to exploit the vulnerability of COP, the congestion occurs unexpectedly in a way that seems impossible under normal signal planning, with a nonlinear traffic delay increase of 200% in a short time. There are several urgent questions: (1) What features can be used to characterize the attack? (2) Is there any correlation between the attack and the congestion degree? (3) Is the congestion degree able to reflect the details of the attack consequence? To the best of our knowledge, no similar work focuses on the above questions. Thus, given the difficulty of feature representation and extraction for quantifying the attack, we aim to realize an approach to congestion prediction under attack based on unsupervised learning from attack image to congestion image, so as to explore a new visualized analysis method that reveals detailed attack results in each phase of the intersection. This is our motivation, aiming to benefit all stakeholders of I-SIG, including experts in transportation and security.
In this work, targeting the I-SIG congestion attack, we are the first to predict the congestion caused by a spoofing attack based on the generative adversarial network (GAN) [8], an unsupervised machine learning technique, by directly utilizing high-level image features of traffic instead of basic features such as the location, speed, and delay of vehicles. We first perform phase-based image processing, using background filtering, splitting, and joining operations to form a new image that represents a global image of the intersection according to a certain phase order. We then take image pairs of the initial attack time and the congestion time 30 minutes later as batch training inputs to CycleGAN [9]. We design a weighted L1 regularization loss to learn and distinguish fine differences between the last-vehicle attack and the first-vehicle attack. In addition, we also use early stopping to improve CycleGAN performance.
We implement the I-SIG system and run experiments through visualized simulation in PTV VISSIM [10]. The experiments show the effectiveness of our approach compared to the pix2pix [11] framework. With training fixed at 200 epochs and a learning rate of 0.0002, our CycleGAN-based approach outputs visualized results with satisfactory prediction compared to the real values: the MAE and RMSE of the capacity ratio are 0.0267 and 0.0340, respectively, and the MAE and RMSE of the congestion degree are 1.1250 and 1.5882, respectively. For common use, we suggest training with 200 epochs and a learning rate of 0.0002 as a baseline reference without further tuning effort. In dynamic training with different numbers of epochs from 200 to 500, we find that 200 epochs can effectively prevent mode collapse during training, and we obtain the best results when starting a linear learning rate decay at the 150th epoch: the MAE and RMSE of the capacity ratio are 0.0114 and 0.0134, respectively, and the MAE and RMSE of the congestion degree are 0.5333 and 0.6245, respectively.
We summarize our contributions as follows:
(i) We perform the first study to predict the congestion caused by a spoofing attack based on the generative adversarial network (GAN), by directly utilizing high-level image features of traffic. This is a novel visualized approach to the I-SIG congestion attack that reveals the relation between the attack and the congestion 30 minutes later.
(ii) We propose a CycleGAN-based prediction approach, in which we design a weighted L1 regularization loss to learn and distinguish fine differences between the last-vehicle attack and the first-vehicle attack. Such an approach not only enables a prediction from the attack to the corresponding consequence but also provides an explanation from the congestion back to the initial traffic of the attack phase.
(iii) We evaluate our approach empirically on the real COP algorithm through VISSIM and collect 4476 high-quality image samples for the experiments, which show the effectiveness of our approach compared to the ground truth. We also find that 200 epochs can effectively prevent mode collapse during training in our approach and give satisfactory performance as a baseline.
The remainder of this paper is organized as follows. Section 2 introduces the backgrounds. In Section 3, we propose our CycleGAN-based prediction approach. Experiments and detailed analysis are reported in Sections 4 and 5. Section 6 discusses defense, and Section 7 discusses the related work. Finally, we conclude the paper in Section 8.

Backgrounds
2.1. Dataflow of I-SIG. The dataflow of the I-SIG system is shown in Figure 1. The on-board unit (OBU) [3] of each vehicle sends basic safety messages (BSMs) [3] to the RSU for real-time trajectory collection. This data is then preprocessed to form an arrival table, which is the input to signal planning. Signal planning consists of the COP module and the estimation of location and speed (EVLS) [5] module. If the penetration rate (PR) of OBUs among vehicles is less than 95%, the arrival table is sent to EVLS for an update. Otherwise, it is sent directly to COP for planning. According to the results of COP, a signalling command is sent down to the phase signal controller. After each stage of signal control, the next signal status is returned as feedback for continuous COP planning.
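The routing decision described above can be sketched as follows. This is only an illustration of the 95% penetration-rate rule. The function name `route_arrival_table` and the module labels are hypothetical, and the assumption that the EVLS-updated table is subsequently passed to COP is inferred from the dataflow description rather than stated explicitly.

```python
PR_THRESHOLD = 0.95  # penetration-rate threshold stated in the text


def route_arrival_table(penetration_rate):
    """Return the ordered list of modules the arrival table passes through.

    Hypothetical helper: it models only the routing rule, not EVLS or COP.
    """
    if not 0.0 <= penetration_rate <= 1.0:
        raise ValueError("penetration_rate must be in [0, 1]")
    if penetration_rate < PR_THRESHOLD:
        # Low penetration: EVLS updates the table first (assumed to feed COP next).
        return ["EVLS", "COP"]
    # Otherwise the table goes directly to COP for planning.
    return ["COP"]


low_pr_path = route_arrival_table(0.50)
high_pr_path = route_arrival_table(0.95)
```

Under this rule the EVLS path is the one exposed to the indirect spoofing attack, since it is only taken when the penetration rate is below the threshold.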
As shown in Figure 1, there are 8 phases in the I-SIG environment. The EVLS fills the blank areas of the monitoring segment on each phase and inserts the estimated vehicle data between equipped vehicles. The key is to estimate the queued vehicles: the queue length is estimated based on Wiedemann's car-following model. Since it is assumed that a queue always begins at the stop bar, the last vehicle in the queue needs to be found to determine the queue length. However, while it effectively supports low penetration rates, such estimation also introduces a new threat of data spoofing attacks on COP.

Threat Model.
In the I-SIG congestion attack, the threat model characterizes the spoofing attack as the input and the congestion as the output, and studies the corresponding causal relation. Based on the attack goal of creating congestion at the intersection, the data spoofing attack has been experimentally proved feasible on CV-based intelligent transportation systems. As shown in Figure 1, the dataflow of the I-SIG system involves data from both vehicle-side devices (the OBUs) and infrastructure-side devices (RSUs and signal controllers). Ghena et al. [12] have pointed out the weakness of the infrastructure-side devices. In comparison, without considering the weakness of the infrastructure-side devices, we aim to realize the attack from the vehicle-side devices (the OBUs), in which the attacker sends malicious BSM messages to the OBUs to disrupt the signal plan. More specifically, we focus on a single intersection, and the attacker is able to run the I-SIG system on a personal computer with a general configuration. Assuming that the attacker has previously investigated the system structure and road conditions, after obtaining a set of BSM messages, the attacker can run the I-SIG system to get the prior and subsequent signal planning produced by the COP algorithm. To maximize the realism of the threat model, we mainly explore the effectiveness of an attack by a single attack vehicle, which is a challenging task since the signal planning of the I-SIG system is based on all vehicles in an intersection.

Congestion Attack on I-SIG.
In this paper, two attack strategies of data spoofing have been proposed for I-SIG: the first is a direct attack on the arrival table without considering the penetration rate, and the second is an indirect attack on EVLS when the penetration rate is less than 95%, called the "last-vehicle attack." In the second attack strategy, an attacker adds a spoofing vehicle with speed v = 0 at the end of a phase, as Figure 2(a) shows. The purpose of this strategy is to extend the queue length estimated by the EVLS algorithm by changing the location and speed values in the BSM message. The last-vehicle spoofing can cause the EVLS to make a maximally wrong estimation of the queue length. Such an attack further increases the duration of the green light allocated by the COP algorithm to the current phase. As a result, it eventually delays the next green light start time of all the phases and increases the delay for vehicles to pass. As shown in Figure 2(b), the last-vehicle attack causes heavy traffic congestion after just 30 minutes, and the traffic delay is increased by 200%. Correspondingly, we experiment with the "first-vehicle attack" shown in Figure 2(c), in which the attacker adds a spoofing vehicle with speed v > 0 in front of the original vehicle queue to minimize the queue length. This attack causes EVLS to estimate the minimum queue length, which reduces the duration of the green light allocated by the COP algorithm to the current phase and finally increases the delay for vehicles to pass. For all the follow-up phases, it causes an early start of the green light. In simulation on VISSIM, the first-vehicle attack also causes traffic congestion at the intersection, as shown in Figure 2(d).

CycleGAN Framework Construction
3.1. Sample Image Processing. Figure 3 shows the process of producing the samples that form the GAN's training dataset, which includes three main steps: (a) collecting original traffic images from VISSIM; (b) extracting road traffic by background filtering; and (c) forming a novel rectangular image of road traffic by splitting and joining. Following the phase order phase(4,7), phase(8,3), phase(2,5), phase(6,1), the four resulting images are joined from top to bottom to form one sample image.
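The joining step can be illustrated with a minimal sketch that stacks four road strips from top to bottom in the stated phase order. Images are represented here as plain lists of pixel rows, and the names `compose_sample` and `make_strip` are hypothetical. Background filtering and the extraction of strips from VISSIM screenshots are not modelled.

```python
# Phase order for the vertical join, as given in the text.
PHASE_ORDER = [(4, 7), (8, 3), (2, 5), (6, 1)]


def compose_sample(strips):
    """Stack four road strips top to bottom following PHASE_ORDER.

    `strips` maps a phase pair such as (4, 7) to an image stored as a list
    of pixel rows. All strips must share the same width.
    """
    rows = []
    width = None
    for pair in PHASE_ORDER:
        for row in strips[pair]:
            if width is None:
                width = len(row)
            elif len(row) != width:
                raise ValueError("all strips must have the same width")
            rows.append(list(row))  # copy so the input strips stay untouched
    return rows


def make_strip(value, height=2, width=4):
    """Build a toy strip filled with one pixel value (illustrative data only)."""
    return [[value] * width for _ in range(height)]


# Toy strips whose pixel value encodes their position in the phase order.
demo_strips = {pair: make_strip(i) for i, pair in enumerate(PHASE_ORDER)}
sample = compose_sample(demo_strips)
```

In the toy data the first column of `sample` reads 0, 0, 1, 1, 2, 2, 3, 3 from top to bottom, confirming that the strips appear in phase order.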
There are two domains, $X$ and $Y$, which denote the source domain and the target domain of the framework. Each $x \in X$ is a processed traffic image at the spoofing time, and each $y \in Y$ is the processed real congestion traffic image that corresponds to $x$ 30 minutes later. One training sample is a pair of images $(x, y)$ with $x \in X$ and $y \in Y$. Figure 4 illustrates the architecture of the CycleGAN framework.

CycleGAN Framework.
The CycleGAN framework is composed of two generators ($G$ and $F$) and two discriminators ($D_X$ and $D_Y$). In the forward direction, given a real image $x$, the generator $G$ generates a fake image $\tilde{y}$ similar to the images in $Y$, i.e., $G: X \to Y$. In the backward direction, given a real image $y$, the generator $F$ generates a fake image $\tilde{x}$ similar to the images in $X$, i.e., $F: Y \to X$. The adversarial discriminator $D_X$ aims at distinguishing whether the input image is real and outputs the corresponding probability $P(x)$ as a decision.

Wireless Communications and Mobile Computing
Similarly, $D_Y$ aims at discriminating whether the input image is real and outputs the corresponding probability $P(y)$.
The CycleGAN framework has two transform directions that compose a cycle. For $x \in X$, $x \to G(x) \to F(G(x)) \approx x$ is called forward cycle consistency. Similarly, for $y \in Y$, $y \to F(y) \to G(F(y)) \approx y$ is called backward cycle consistency. Thus, there are two kinds of losses in the original CycleGAN framework: adversarial loss and cycle-consistency loss.
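3.2.1. Adversarial Loss. In the forward direction, based on the generator $G: X \to Y$ and the discriminator $D_Y$, the adversarial loss, in the standard CycleGAN form [9], can be calculated as follows:
$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right], \tag{1}$$
where $D_Y(y)$ is responsible for determining the probability that $y$ belongs to the real $Y$, and the generator $G$ is used to generate fake images close to the real ones. Thus, we have the objective
$$\min_{G} \max_{D_Y} \mathcal{L}_{GAN}(G, D_Y, X, Y).$$
Similarly, for the backward direction, we have the loss and the corresponding objective function as follows:
$$\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D_X(x)\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\log\left(1 - D_X(F(y))\right)\right],$$
$$\min_{F} \max_{D_X} \mathcal{L}_{GAN}(F, D_X, Y, X),$$
where $D_X(x)$ is responsible for determining the probability that $x$ belongs to the real $X$, and the generator $F$ is used to generate fake images close to the real ones.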
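As a numerical illustration of the adversarial loss, the sketch below estimates it from discriminator outputs, assuming the standard GAN log-loss form used by CycleGAN [9], that is, the mean of log D(real) plus the mean of log(1 − D(fake)). The function name and the toy probabilities are hypothetical, and no network is trained here.

```python
import math


def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Estimate E[log D(real)] + E[log(1 - D(fake))] from sample averages.

    d_real: discriminator probabilities on real images, e.g. D_Y(y)
    d_fake: discriminator probabilities on generated images, e.g. D_Y(G(x))
    eps guards against log(0) for saturated outputs.
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# The discriminator separates real from fake well: the loss is close to 0.
separated = adversarial_loss(d_real=[0.90, 0.95], d_fake=[0.10, 0.05])
# The generator fools the discriminator: the loss becomes strongly negative.
fooled = adversarial_loss(d_real=[0.90, 0.95], d_fake=[0.90, 0.95])
```

The discriminator tries to push this value up while the generator tries to push it down, which is why `separated` is larger than `fooled`.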
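[Figure 2: (a) the traffic at the beginning of the last-vehicle attack ($v = 0$); (b) attack results 30 minutes later; (c) the traffic at the beginning of the first-vehicle attack ($v > 0$); (d) attack results 30 minutes later.]
The complete adversarial loss function is defined as
$$\mathcal{L}_{GAN}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X).$$
3.2.2. Cycle-Consistency Loss. Cycle-consistency loss is designed to push $G$ and $F$ to be consistent with each other, denoted as $F(G(x)) \approx x$ and $G(F(y)) \approx y$. The cycle-consistency loss can be calculated as follows:
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\left[\left\| F(G(x)) - x \right\|_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\left\| G(F(y)) - y \right\|_1\right],$$
in which $\|\cdot\|_1$ is the 1-norm.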
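A minimal sketch of the cycle-consistency idea follows, assuming the usual 1-norm form in which the loss is the average of ‖F(G(x)) − x‖₁ plus the average of ‖G(F(y)) − y‖₁. The toy "generators" below are simple pixel shifts chosen only to make the arithmetic easy to follow. They are hypothetical stand-ins, not the networks of this paper.

```python
def l1_distance(img_a, img_b):
    """1-norm of the pixel-wise difference between two equally sized images."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    )


def cycle_consistency_loss(xs, ys, G, F):
    """Average forward plus backward reconstruction error."""
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def shift(delta):
    """Return a toy generator that adds `delta` to every pixel."""
    return lambda img: [[p + delta for p in row] for row in img]


G = shift(1.0)        # toy mapping X -> Y
F_exact = shift(-1.0)  # exactly inverts G
F_off = shift(-0.5)    # inverts G only partially

xs = [[[0.0, 1.0], [2.0, 3.0]]]
ys = [[[1.0, 2.0], [3.0, 4.0]]]

perfect = cycle_consistency_loss(xs, ys, G, F_exact)
imperfect = cycle_consistency_loss(xs, ys, G, F_off)
```

With the exact inverse the loss vanishes, while the partial inverse leaves a residual of 0.5 per pixel in each direction, giving a total of 4.0 for the two 2 × 2 images.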

Weighted L1 Regularization Loss.
For samples of the first-vehicle attack and the last-vehicle attack, we divide the processed datasets $X$, $Y$ into two parts, denoted as $X = \{x_1, x_2\}$ and $Y = \{y_1, y_2\}$. For all $a, b \in x_1$ and all $c \in x_2$, the goal of the generators $G$ and $F$ is to minimize the difference between $a$ and $b$ as well as to maximize the difference between $a$ and $c$. The objective can be denoted as an $\arg\min$ over the generators. The weighted L1 regularization loss is calculated from two terms: $L_s(F, G)$, which reflects the image difference within the same attack type, and $L_d(F, G)$, which reflects the image difference between different attack types. $\alpha$ and $\beta$ are weights.
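The whole objective of our CycleGAN framework combines the adversarial loss, the cycle-consistency loss, and the weighted L1 regularization loss, where $\lambda$ and $\mu$ are parameters that control the relative importance of the different objectives, with $\lambda \geq 1$ and $\mu \in (0, 1]$.
Writing the whole objective as $\mathcal{L}(G, F, D_X, D_Y)$, the optimal $G^*$, $F^*$ can be achieved as follows:
$$G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y).$$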
3.3. Build Generator and Discriminator. Figure 5 illustrates the structures of the generator and the discriminator. The two generators $G$, $F$ share the same structure. Specifically, a generator network contains an encoder, a transformer, and a decoder. The encoder network includes one 7 × 7 Convolution-InstanceNorm-ReLU layer and two 3 × 3 Convolution-InstanceNorm-ReLU layers. The transformer network has 9 residual blocks for 256 × 256 images, each of which contains two 3 × 3 convolutional layers. The decoder network consists of two 3 × 3 fractional-strided-Convolution-InstanceNorm-ReLU layers and one 7 × 7 Convolution-InstanceNorm-ReLU layer. The two discriminators $D_X$, $D_Y$ have the same structure. The discriminator networks use the architecture of 70 × 70 PatchGANs [11]. The discriminator architecture includes four 4 × 4 Convolution-InstanceNorm-LeakyReLU layers, which transform the input image into a set of feature maps, and finally outputs a 1-dimensional decision.
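The "70 × 70" in the PatchGAN name refers to the receptive field of one output unit of the discriminator. The sketch below checks that figure with the usual receptive-field recurrence. The stride values are an assumption taken from the commonly used PatchGAN configuration (strides 2, 2, 2, 1 for the four 4 × 4 layers, followed by a final 4 × 4 convolution with stride 1 that produces the decision). The text itself does not state the strides.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of conv layers.

    `layers` lists (kernel_size, stride) pairs from the input side to the
    output side. Walking backwards from a single output unit, each layer
    widens the field according to rf = (rf - 1) * stride + kernel_size.
    """
    rf = 1
    for kernel_size, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel_size
    return rf


# Assumed strides: four 4x4 feature layers plus a final 4x4 decision layer.
PATCHGAN_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

patch_size = receptive_field(PATCHGAN_LAYERS)
```

Under these assumed strides the recurrence yields 4, 7, 16, 34, and finally 70, so each decision of the discriminator depends on a 70 × 70 patch of the 256 × 256 input.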
3.4. Training Process. There are two training directions in the CycleGAN framework. Adam (Adaptive Moment Estimation) [13] is chosen as the optimizer for training. It is an adaptive optimization method that dynamically updates network ...

...
4: Generate fake image $G(x)$ and reconstructed image $F(G(x))$.
5: Calculate $G$, $F$ loss.
6: Update the gradient of $G$, $F$.
...
11: Generate fake image $F(y)$ and reconstructed image $G(F(y))$.
12: Calculate $G$, $F$ loss.
13: Update the gradient of $G$, $F$.
14: Discriminate fake image $D_X(F(y))$ and real image $D_X(x)$.
15: Calculate $D_X$ loss.
16: Update the gradient of $D_X$.

Setup.
We run the I-SIG system and VISSIM simulations to get the original image datasets $X_0$, $Y_0$. The platform and experimental environment configuration are shown in Table 1.

Datasets and Initial Network.
Both the training and test datasets are composed of two parts: the processed traffic image dataset $X$ at the spoofing time and the corresponding congestion image dataset $Y$. Table 2 shows the sample datasets for training and test. The size of every image is 256 × 256 pixels. In addition to the CycleGAN parameters described in the section above, we also set up a comparable GAN model named pix2pix [11], whose parameters are described in Table 3.

Evaluation
Our method can be directly compared to the NDSS 2018 work on the same I-SIG system. In addition, there is some similar work to discuss. Reporting road traffic congestion can be challenging, as there is no standard way of measurement that fits each specific occasion. A series of methods have been proposed to evaluate traffic congestion. Lu and Cao [14] proposed a method in which the level of congestion is considered a continuous variable from free flow to traffic jam. However, since the source domain and the target domain of our visualized prediction method are both composed of traffic images, it is hard to extract the high-level image features using traditional text features such as the location, speed, and delay of vehicles. Pongpaibool et al. [15] proposed a method based on a deep network that uses image processing technology to deal with the whole image. In comparison, we aim to explore the effectiveness of different attack strategies, which requires an accurate analysis of each phase instead of the whole region, so the former traditional methods are not suitable. Thus, we propose a phase-based evaluation method to quantitatively analyze the congestion results. We first define the evaluation metrics, and we further evaluate them based on the mean absolute error (MAE) and root mean squared error (RMSE), respectively.

Evaluation Metric
(1) Vehicle capacity ratio (CR). $C_k^{max}$ is the maximum vehicle capacity of each phase, in which $k$ denotes the $k$th phase, and $C_k^{max}$ is a constant. For a 300-meter-long road in any phase, the maximum vehicle capacity is 75, assuming that the average vehicle length is 3 meters (75 vehicles over 300 meters corresponds to 4 meters of road per vehicle, which presumably includes the gap between vehicles). $C_{total}^{max}$ is the vehicle capacity of all 8 phases; it can be denoted as $C_{total}^{max} = \sum_{k=1}^{8} C_k^{max}$, and it is also a constant with value 600. For the total vehicles of all 8 phases at an intersection, the vehicle capacity ratio can be calculated as follows:
$$CR = \frac{\sum_{k=1}^{8} N_k}{C_{total}^{max}},$$
in which $N_k$ is the number of vehicles in the $k$th phase.
(2) Phase congestion degree (PCD). PCD reflects the ratio of the queuing length to the normal queuing length. For the $k$th phase, its $PCD_k$ can be calculated by
$$PCD_k = \frac{Q_k}{Q_{normal}},$$
where $Q_k$ is the number of queuing vehicles and $Q_{normal}$ is a constant that we set to $Q_{normal} = 10$.
(3) Intersection congestion degree (ICD). ICD reflects the global congestion degree for an intersection.
For testing on $N$ samples, we further evaluate CR, PCD, and ICD from a statistical view based on the mean absolute error (MAE) and root mean squared error (RMSE):
$$\mathrm{MAE}_{CR} = \frac{1}{N} \sum_{i=1}^{N} \left| CR_i - \widetilde{CR}_i \right|, \qquad \mathrm{RMSE}_{CR} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( CR_i - \widetilde{CR}_i \right)^2},$$
in which $CR_i$ is the real value and $\widetilde{CR}_i$ is the estimated value. Similarly, we have $\mathrm{MAE}_{PCD_k}$, $\mathrm{RMSE}_{PCD_k}$, $\mathrm{MAE}_{ICD}$, and $\mathrm{RMSE}_{ICD}$.
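The CR and PCD metrics and their error statistics can be sketched as follows, assuming CR is the total vehicle count divided by the total capacity of 600, PCD is the queue length divided by the normal queue length of 10, and MAE and RMSE take their usual definitions. The vehicle counts below are made-up toy values, not results from the experiments, and ICD is omitted because its formula is not given here.

```python
import math

C_MAX_TOTAL = 600  # total capacity of all 8 phases (8 x 75), from the text
Q_NORMAL = 10      # normal queue length constant, from the text


def capacity_ratio(vehicles_per_phase):
    """CR: total vehicles over all phases divided by the total capacity."""
    return sum(vehicles_per_phase) / C_MAX_TOTAL


def phase_congestion_degree(queued_vehicles):
    """PCD_k: queued vehicles in one phase divided by the normal queue length."""
    return queued_vehicles / Q_NORMAL


def mae(real, estimated):
    """Mean absolute error between real and estimated values."""
    return sum(abs(r - e) for r, e in zip(real, estimated)) / len(real)


def rmse(real, estimated):
    """Root mean squared error between real and estimated values."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(real, estimated)) / len(real))


# Toy per-phase vehicle counts for two test samples (illustrative only).
real_counts = [[30, 40, 20, 50, 10, 60, 45, 45], [20, 20, 20, 20, 20, 20, 15, 15]]
generated_counts = [[33, 40, 20, 50, 10, 60, 45, 48], [20, 20, 20, 20, 17, 20, 15, 12]]

real_cr = [capacity_ratio(c) for c in real_counts]
generated_cr = [capacity_ratio(c) for c in generated_counts]

mae_cr = mae(real_cr, generated_cr)
rmse_cr = rmse(real_cr, generated_cr)
```

In this toy case each generated image is off by 6 vehicles out of a capacity of 600, so both the MAE and the RMSE of CR come out at about 0.01.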

Visualization Results.
Figure 6 shows the congestion traffic images generated by CycleGAN and pix2pix, respectively. The first column is the original image $x$, the second column is the generated congestion image $G(x)$, and the real congestion image $y$ is given in the third column. The comparison of the generated $G(x)$ indicates that our approach trains a more satisfactory generator than pix2pix, with a higher accuracy compared to the ground truth.

Quantitative Analysis.
We quantitatively analyze the performance of CycleGAN and pix2pix. Tables 4-6 show the MAE and RMSE values of CycleGAN and pix2pix under different settings of epoch and learning rate.
As Table 4 shows, when epoch = 1000, CycleGAN and pix2pix both have the best performance of CR and ICD prediction, with very small MAE and RMSE values: for CycleGAN, we have $\mathrm{MAE}_{CR} = 0.0137$, $\mathrm{RMSE}_{CR} = 0.0177$, $\mathrm{MAE}_{ICD} = 0.7000$, and $\mathrm{RMSE}_{ICD} = 0.9930$, and for pix2pix, they are 0.0862, 0.0870, 1.4000, and 1.4509, respectively. Figure 7(a) shows the trend of the MAE and RMSE values of the capacity ratio for both CycleGAN and pix2pix, and Figure 7(b) shows the trend of the MAE and RMSE values of the intersection congestion degree. We can see that when epoch = 200, both CycleGAN and pix2pix already perform well, and the performance improvement at epoch = 1000 is not obvious. Thus, considering the balance between performance and training cost, we suggest a 200-epoch early stop. Table 5 shows the performance under different learning rate (LR) settings. For example, 400/100 means that in the first 400 epochs, the LR is kept at 0.0002, and in the following 100 epochs, we perform a linear decay. We can see that when the learning rate setting is 50/150, CycleGAN has the best performance of CR and ICD prediction, with quite small MAE and RMSE values: $\mathrm{MAE}_{CR} = 0.0114$, $\mathrm{RMSE}_{CR} = 0.0134$, $\mathrm{MAE}_{ICD} = 0.5333$, and $\mathrm{RMSE}_{ICD} = 0.6245$. For pix2pix, the best performance occurs when the learning rate setting is 100/100: $\mathrm{MAE}_{CR} = 0.0929$, $\mathrm{RMSE}_{CR} = 0.0985$, $\mathrm{MAE}_{ICD} = 2.4750$, and $\mathrm{RMSE}_{ICD} = 2.9441$. CycleGAN with the 50/150 LR setting is better than pix2pix with the 100/100 LR setting. Figure 8(a) shows the trend of the MAE and RMSE values of the capacity ratio for both CycleGAN and pix2pix, and Figure 8(b) shows the trend of the MAE and RMSE values of the intersection congestion degree. Across different compositions within totals of 100, 200, and 500 epochs, we can see that for CycleGAN the LR has a relatively small influence, while for pix2pix the LR's influence is bigger and the best setting is within 200 epochs.
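The "constant/decay" notation for learning rate settings can be made concrete with a small sketch. It assumes 0-indexed epochs and a linear decay that heads towards zero. The text only says the decay is linear, so the end point and the exact per-epoch values are assumptions, and the function name is hypothetical.

```python
BASE_LR = 0.0002  # initial learning rate stated in the text


def learning_rate(epoch, n_constant, n_decay, base_lr=BASE_LR):
    """Learning rate at a 0-indexed epoch for an "n_constant/n_decay" setting.

    The rate stays at base_lr for the first n_constant epochs and then
    decreases linearly (assumed: towards zero) over the next n_decay epochs.
    """
    total = n_constant + n_decay
    if not 0 <= epoch < total:
        raise ValueError("epoch must be in [0, n_constant + n_decay)")
    if epoch < n_constant:
        return base_lr
    remaining = total - epoch  # equals n_decay at the first decay epoch
    return base_lr * remaining / n_decay


# The "400/100" setting from the text: 400 constant epochs, 100 decay epochs.
lr_start = learning_rate(0, 400, 100)
lr_mid_decay = learning_rate(450, 400, 100)
lr_last = learning_rate(499, 400, 100)
```

With this convention the rate is still 0.0002 at epoch 399, has halved to 0.0001 at epoch 450, and is 0.000002 at the final epoch. The "50/150" setting would be `learning_rate(epoch, 50, 150)`.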
We further report the detailed per-phase MAE and RMSE values of the congestion degree in Table 6. We set the number of training epochs to 200, and the LR settings for CycleGAN and pix2pix are 50/150 and 100/100, respectively. Comparing the values over the 8 phases for CycleGAN and pix2pix, we can see that for CycleGAN the best results occur at k = 3, which has the lowest MAE and RMSE values (0.1500 and 0.1958). Similarly, for pix2pix, the best results are at k = 3, with MAE and RMSE values of 0.2250 and 0.2398.
We also give bar charts for the MAE and RMSE of the 8-phase congestion degree in Figure 9. In Figure 9(a), the smaller average MAE value belongs to CycleGAN, with value 0.3438. In Figure 9(b), we have similar results for RMSE: the average values of CycleGAN and pix2pix are 0.3845 and 1.5507, respectively. The MAE and RMSE of CycleGAN are smaller than those of pix2pix, which indicates better robustness of CycleGAN compared with pix2pix.

Defense Discussion
For the relationship between the evaluation metric and the defense of attack, we have the following suggestions.
6.1. Attack Strategy Detection. In the signal planning stage of the I-SIG system, the COP algorithm generates a reasonable green light duration based on the queue length of each phase estimated by the EVLS algorithm. As shown in our evaluation, the vehicle capacity ratio (CR) reflects the total number of vehicles at the intersection. For defense, comparing the queue length estimated by EVLS with the immediate evaluation metric CR is an efficient way to determine whether an attack vehicle has been placed in the corresponding phase. For instance, if the phase has a long estimated queue with a low CR index, a last-vehicle attack may have occurred; on the contrary, if the phase has a short estimated queue with a high CR index, a first-vehicle attack may have occurred. This can bring feasible defense and improve system robustness.
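The detection rule suggested above can be sketched as a simple consistency check between the EVLS queue estimate and the CR index. All threshold values and the function name below are hypothetical placeholders chosen for illustration. The text does not specify what counts as a "long" queue or a "low" CR, so a real deployment would need to calibrate them.

```python
def flag_attack(estimated_queue, capacity_ratio,
                long_queue=20, short_queue=5, low_cr=0.3, high_cr=0.7):
    """Flag a mismatch between the estimated queue length and the CR index.

    All four thresholds are illustrative assumptions, not values from the text.
    """
    if estimated_queue >= long_queue and capacity_ratio <= low_cr:
        # Long estimated queue although few vehicles are present.
        return "possible last-vehicle attack"
    if estimated_queue <= short_queue and capacity_ratio >= high_cr:
        # Short estimated queue although many vehicles are present.
        return "possible first-vehicle attack"
    return "no flag"


long_queue_low_cr = flag_attack(estimated_queue=30, capacity_ratio=0.2)
short_queue_high_cr = flag_attack(estimated_queue=2, capacity_ratio=0.8)
consistent = flag_attack(estimated_queue=10, capacity_ratio=0.5)
```

The check only signals an inconsistency between the two indicators and does not by itself confirm that an attack took place.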

Related Work
7.1. Spoofing Attack Analysis. I-SIG is exposed to a data spoofing attack that causes heavy congestion. Such an attack belongs to the position faking attacks of GPS spoofing but differs from the tunnel attack. In a tunnel attack, each vehicle of a vehicular ad hoc network (VANET) [16-18] is equipped with a positioning system (receiver), and the attack can be achieved using a transmitter that generates localization signals stronger than those generated by the real satellites [19,20]; the victim could then be waiting for a GPS signal after leaving a physical tunnel or a jammed-up area. In comparison, in the position spoofing attack on I-SIG, an authenticated vehicle only sends a wrong position to affect the COP algorithm, which has a lower attack cost and is easier to implement. In such an attack, the spoofing is just a contributing factor, while the mechanism of the COP algorithm is the key. Compared with work on GPS spoofing attacks, our work focuses on revealing the algorithm-level security consequences caused by spoofing, not on the security of GPS spoofing or context-aware sensing [21-24] itself.
The previous work [7] mainly reveals the existence of such a congestion attack on COP, analyzes why COP decisions are influenced (the so-called last-vehicle advantage), and also explains how to use data spoofing to launch an attack. However, it does not consider the potential features or the quantified correlation between the attack and the congestion degree. In comparison, we demystify the attack on I-SIG and the corresponding congestion from a machine learning perspective, by exploring different kinds of features based on unsupervised learning from attack image to congestion image via image search [25,26], so as to explore a new visualized analysis method that reveals detailed attack results in each phase of the intersection. In addition, as the first use of image features in congestion attack analysis, our work can provide a visualization for better understanding.

Congestion Prediction.
Traffic congestion prediction has been studied extensively. Traditional traffic feature-based methods [27-29] are generally used in traffic congestion prediction, in which the traffic scenario is usually described by manually set features such as the location, speed, and delay of vehicles. Early research focused on single-site prediction based on one-dimensional traffic time series, such as the ARIMA model [30] and the nearest neighbour method [31]. Recently, the trend has shifted to prediction based on spatial-temporal correlations between traffic flows [32-34], for instance, the vector ARMA model, which incorporates both spatial and temporal correlations, and the spatial econometrics models, which focus on congestion propagation over adjacent links. The core of the existing methods is as follows: they try to predict traffic congestion at one site based on the spatially and temporally correlated information from the sensors distributed on nearby roads, where the number of such sensors contributing to the prediction is referred to as the data dimensionality.
Recently, an LSTM model-based approach [35] was proposed for region-wide congestion prediction. In comparison, attack-based congestion prediction is totally different, because classical traffic flow theory of spatial and temporal correlation does not fit well. Thus, this work does not focus on traditional traffic features. Even for image features, we perform phase-based reprocessing and produce novel images for training; this is a different method for I-SIG congestion prediction under a COP attack.

Conclusion
Through spoofing of connected vehicle technology, a congestion attack has been revealed on the COP algorithm of I-SIG. Because visualized congestion analysis and attack phase explanation are lacking, we focus on the prediction of the congestion attack. Such attack-based congestion prediction is totally different from traditional congestion prediction, because classical traffic flow theory of spatial and temporal correlation does not fit well. We perform the first study to predict the congestion caused by a spoofing attack based on the generative adversarial network, by directly utilizing high-level image features of traffic.
In this paper, we propose a CycleGAN-based prediction approach, in which we design a weighted L1 regularization loss to learn and distinguish fine differences between the last-vehicle attack and the first-vehicle attack. We evaluate our approach empirically on the real COP algorithm through VISSIM and collect 4476 high-quality image samples for the experiments, which show the effectiveness of our approach compared to the ground truth. We also find that 200 epochs can effectively prevent mode collapse during training in our approach and give satisfactory performance as a baseline. This work is expected to inspire a series of follow-up studies on the security of I-SIG, including but not limited to (1) more machine learning-based approaches and (2) more multimodal feature fusion for visualized analysis of the congestion caused by spoofing attacks.

Data Availability
All data generated or analyzed during this study are owned by all the authors and will be used in our further research. The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.