A Higher-Order Motif-Based Spatiotemporal Graph Imputation Approach for Transportation Networks



Introduction
With the rapid growth of urbanization, intelligent transportation systems (ITS) are widely adopted for urban management and traffic control [1,2]. ITS rely on the availability of traffic data to evaluate traffic status and system performance. If traffic information is collected comprehensively, urban commuters can adapt to the traffic conditions of urban roads and grasp the patterns of traffic flow changes, which in turn promotes the development of urban transportation.
In recent years, emerging information technologies, such as fifth-generation networks [3] and edge computing [4], have made traffic data collection more convenient, and the collected data is usually mobile, multisource, and real time. Unfortunately, due to the frequent occurrence of various types of failures (e.g., power malfunction, device maintenance, and network issues), the collected data are frequently incomplete [5]. Moreover, due to the high cost of construction and maintenance, it is difficult for the equipment to cover the entire traffic network [6]. Therefore, the loss of traffic data during data collection is inevitable.
Missing information significantly weakens data quality, limits the study of transportation networks (e.g., traffic management, urban planning, and route choice), and, worse, may result in false decisions [6][7][8]. Thus, handling missing values is a prerequisite for traffic data mining and analysis [9], and accurate imputation has become an important research topic in ITS [10].
The key to data imputation is to discover the hidden spatiotemporal information in the neighbouring data [11]. For instance, as shown in Figure 1, imputation makes use of the spatiotemporal neighbouring values of the missing data, thereby improving its accuracy. Bae et al. proposed two cokriging methods that exploited the spatiotemporal dependency in traffic data to impute high-resolution traffic speed under different random data missing scenarios [12]. Li et al. developed a combined deep neural model, which extracted spatiotemporal features to estimate missing values [13]. To characterize the hidden patterns in spatiotemporal traffic data, Chen et al. incorporated the truncated nuclear norm (TNN) into a low-rank tensor completion (LRTC) framework and obtained a better solution for data imputation [14]. Considering the case of continuous data missing, Zhang et al. utilized the temporal neighbouring values of a given period and employed the long short-term memory network (LSTM) to recover missing data [15].
Although many methods achieve promising imputation accuracy, some limitations remain. The first question to be addressed is how to effectively capture the spatial dependencies. Existing methods usually consider the directly adjacent road segments (i.e., upstream and downstream) but ignore the global information. It is therefore worthwhile to introduce imputation models that sense both the global and local variations of spatial information [10]. The second limitation is continuous data missing, in which the values at some consecutive timestamps of a road segment are missing. In this circumstance, no data are available during a period of time, so stable inputs cannot be provided for a model.
To address these issues, we propose a spatiotemporal imputation approach for traffic data via motif-based graph aggregation (named MGIA), which combines motif-based spatial aggregation with multitime dimension fusion by bidirectional LSTM (Bi-LSTM). To the best of our knowledge, our work is the first attempt to apply a motif-based spatial method to traffic data imputation. The contributions of this paper are summarized as follows:
(1) We propose a higher-order graph aggregation model based on motifs. It aggregates the attributes of the segments correlated with the missing data segments to capture the higher-order spatial correlations in a road network, using motif-based search and aggregation based on a graph convolution network (GCN).
(2) We develop a Bi-LSTM approach based on multitime dimension fusion to improve the accuracy in the case of continuous data missing. It incorporates the recent, daily-periodic, and weekly-periodic dependencies to ensure that there are enough historical data to complete the temporal imputation.
(3) We perform extensive experiments on a real-world dataset to evaluate the performance of our approach. The experimental results confirm the advantages of the proposed approach over state-of-the-art imputation approaches under various missing patterns.
The remainder of the paper is organised as follows: Section 2 introduces the related work in traffic data imputation. Section 3 describes the proposed approach. Section 4 shows the experimental results and analysis. Section 5 concludes this paper.

Related Work
In this section, we introduce the related work regarding the approaches for conventional and deep learning-based imputation.
2.1. Conventional Imputation. In the past decades, traffic data imputation has attracted widespread attention. The imputation methods mainly include predictive, interpolation, and statistical methods.
The predictive methods utilize historical data to predict the missing values. Typical methods include Bayesian networks [16] and support vector regression [17]. Ahn The hybrid approach based on fuzzy C-means (FCM) is another example of such predictive methods; it integrates optimized FCM parameters and genetic algorithms to build prediction models [19,20]. These methods focus on historical traffic data for filling missing values and fail to consider the imputation of continuously missing data. Moreover, they ignore the spatial relationships, which also provide crucial information for imputation.
Interpolation methods use the average value of the neighbouring data or historical data to impute the missing values. Using traffic data from the same sensor during the same period in neighbouring days, Yin et al. took the average value of these known data to impute the missing values [21]. Chang et al. utilized the k-nearest neighbours (KNN) and local least squares to consider the relationship between similar traffic flow patterns and enhanced the interpolation effects [22]. Kriging interpolation [12,23] focused on determining the weights of historical values and considered the spatiotemporal information to capture the characteristics of traffic data. Although these methods can achieve promising imputation results in a short time, they focus only on average calculation and do not consider complicated changes caused by other factors such as global data attributes and random events.
The statistical methods aim to develop a data distribution that best fits the data to be imputed. To reflect the uncertainty between the imputation parameters, Audigier et al. designed a multiple imputation method based on Bayesian principal component analysis (BPCA) to cope with incomplete continuous data [24]. To exploit the spatiotemporal correlation of traffic network data, Wang et al. developed a low-rank matrix factorization-based approach to reconstruct the missing traffic data [25]. Moreover, some researchers have expanded the two-dimensional matrix into high-dimensional tensors, as in the tensor decomposition models [26,27]. However, the accuracy of these approaches relies mainly on the prior assumption about the data distribution, and a mismatch with the unknown actual distribution may cause errors.

2.2. Deep Learning-Based Imputation. Recently, the boom in deep learning [28,29] has inspired new ideas for data imputation. Autoencoder models [30] were developed to hierarchically train on the full set of traffic data and extract the spatiotemporal features of the hidden layers, demonstrating the effectiveness of data imputation [31]. Pathak et al. converted spatiotemporal trajectory data into images, used the powerful feature extraction of the convolutional neural network (CNN), and then combined it with the autoencoder model to impute the missing spatiotemporal trajectory data [32]. The generative adversarial network (GAN) provides a class of generative models for adversarial training; it uses actual data or parallel data to learn the true data distribution, so that the imputation quality is improved [33,34]. By incorporating the reversibility of the generative imputer into GAN, Kazemi and Meidani proposed an iterative GAN architecture to evaluate the imputation of missing traffic data [35]. These state-of-the-art models work well when dealing with the data correlation across different road segments, since they adopt the powerful feature extraction capabilities of deep neural networks to impute spatiotemporal data. However, most of them ignore the spatial dependencies in the traffic network, and their imputation effect depends on a massive amount of training data.
As some graph-based methods consider strong relations of data structures [36,37], it is feasible to capture global information to improve imputation performance. Chen and He proposed a heterogeneous graph embedding framework, which constructed a travel heterogeneous information network to find the best matched vehicles for the missing records [38]. By incorporating the spectral graph convolution operation, Cui et al. developed the graph Markov network to handle missing values for short-term traffic forecasting [39]. Graph representation learning is one crucial category of deep learning that has been widely used for traffic data imputation [40]. Such methods view the observations and features as data nodes in a bipartite graph, or construct a sample self-representation strategy and further require the neighbouring missing samples. Motifs are small connected components in a graph and help in understanding the higher-order relations and global spatial graph principles [41,42]. Some imputation approaches based on motif discovery have been introduced [43,44], but they are rarely adopted in transportation. The graph-based methods solve the imputation problems by capturing global spatial information from historical and neighbouring data, but they ignore the time series results in reflecting the spatial dependencies.
Inspired by the above viewpoints, we propose a spatiotemporal imputation approach based on motif-based graph aggregation. By adopting motifs, the proposed approach captures the higher-order spatial correlations in traffic networks. In addition, the proposed approach focuses on the imputation issue in the case of continuous data missing. Finally, we list the limitations of the existing methods in Table 1.

Category | Limitation
Predictive methods | These methods fail to consider the case of continuous data missing.
Interpolation methods | They focus on the average calculation without considering data global attributes.
Statistical methods | The accuracy of these methods degrades owing to the unknown values of data distribution.
Adversarial learning | Most methods ignore global spatial dependencies in traffic network.
Graph-based methods | They only capture global spatial information from neighbouring data.

Wireless Communications and Mobile Computing

Methodology
According to the analysis of related work, the existing methods have the following limitations: (1) they ignore the spatial dependencies in large-scale areas, and (2) they rarely consider imputation in the case of continuous data missing. The MGIA aims to improve the imputation accuracy from the perspectives of continuous data missing and higher-order spatial correlations in traffic networks. The framework of the MGIA is shown in Figure 2. It works in the following steps:
(1) We adopt motifs to define the graph-based structure of traffic networks and search for all associated road segments that meet the motif gain condition of the missing data segment. On this basis, GCN is utilized to gradually aggregate the nonmissing features of each associated segment, and the spatial aggregated value of the missing data segment is determined.
(2) The multitime dimension imputation based on Bi-LSTM deals with the problem of continuous data missing by incorporating the recent, daily-periodic, and weekly-periodic dependencies.

3.1. The Spatial Imputation of Motif-Based Graph Aggregation. In order to capture the data correlation and global spatial characteristics between the road segments in traffic networks, motif-based graph aggregation is employed to impute the missing data of road segments.

3.1.1. The Associated Node Search Based on Motif Discovery. Motifs are nonisomorphic connected graph structures with three or more nodes that occur frequently in a network; triangle and quadrangle motifs are shown in Figure 3. As motifs consider higher-order correlations of data structures, they make it feasible to capture global information and improve the performance of node feature aggregation [45]. Triangles are traffic network motifs that play important roles in higher-order connectivity [46]. Thus, in combination with traffic theory, we select the triangle motif $M_{32}$ as the research object, in which nodes denote road segments and edges denote the connection between two adjacent segments. We design a motif-based search method that uses the motif gain to adjust the fitness function. If the motif gain no longer increases or no neighbouring node exists, the search phase stops.
First, a road segment set $Q_v = \{v_1, v_2, \ldots, v_n\}$ is given. Assuming that the road segment with missing data in $Q_v$ is a node $v_t$, a target node set $Q_t = \{v_t\}$ is determined. Second, the motif-based local optimization algorithm searches all adjacent nodes of node $v_t$ to form the adjacent node set $Q_n = \{v_{n_1}, v_{n_2}, \ldots\}$ and calculates the motif gain obtained by adding a node $v_{n_i}$ in $Q_n$ to $Q_t$. If the motif gain is greater than zero (i.e., there is a spatial correlation between the node $v_t$ and its adjacent node), the node $v_{n_i}$ is added to the target node set, and $Q_t = \{v_t, v_{n_i}\}$ is updated. Third, the remaining adjacent nodes of $Q_t$ are added in turn, the motif gain is calculated for each, and the nodes that meet the condition are joined to the node set, so that $Q_t = \{v_t, v_{n_i}, \ldots\}$ is updated at last. By analogy, the eligible nodes continue to join the target node set $Q_t$. When the motif gain calculation of all nodes in the node set $Q_v = \{v_1, v_2, \ldots, v_n\}$ is completed, the associated node search for node $v_t$ ends. The associated node search for node $v_t$ is shown in Figure 4.
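The local motif rate avoids counting the same motif repeatedly [47]. For the current node set $Q$, and when an adjacent node $v_{n_i}$ is joined, the local motif rate is calculated as follows:
$$R_M(Q) = \frac{N_{in}(Q)}{\left(N_{in}(Q) + N_{out}(Q)\right)^{\alpha}} \qquad (1)$$
$$R_M(Q \cup v_{n_i}) = \frac{N_{in}(Q) + E_{in}(v_{n_i})}{\left(N_{in}(Q) + N_{out}(Q) + E_{out}(v_{n_i})\right)^{\alpha}} \qquad (2)$$
where $N_{in}(\cdot)$ is the number of motifs within the current node set, $N_{out}(\cdot)$ is the number of motifs between the current node set and the external node set, $E_{in}(\cdot)$ is the number of motifs between the current node set and the new nodes, $E_{out}(\cdot)$ is the number of motifs between the external node set and the new nodes, and $\alpha$ is the control parameter.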
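The arithmetic behind the motif gain can be checked with a few lines of Python. This is a minimal sketch that assumes the rate takes the form implied by the worked example of Figure 5(a), namely (N_in + E_in) / (N_in + N_out + E_out)^α after a node is joined; the function names `local_motif_rate` and `motif_gain` are illustrative and do not come from the paper.

```python
def local_motif_rate(n_in, n_out, alpha=1.0):
    """Rate of motifs inside the node set relative to all motifs touching it.

    Assumed form: n_in / (n_in + n_out) ** alpha.
    """
    total = n_in + n_out
    if total == 0:
        return 0.0  # no motif touches the node set yet
    return n_in / total ** alpha


def motif_gain(n_in, n_out, e_in, e_out, alpha=1.0):
    """Change in the local motif rate when one candidate node is joined.

    e_in:  motifs between the current node set and the candidate node
    e_out: motifs between the external node set and the candidate node
    """
    after = (n_in + e_in) / (n_in + n_out + e_out) ** alpha
    before = local_motif_rate(n_in, n_out, alpha)
    return after - before


# Numbers quoted for Figure 5(a): N_in = 4, N_out = 3, E_in = 1, E_out = 2.
gain_v1 = motif_gain(4, 3, 1, 2, alpha=1.0)
print(gain_v1 < 0)  # the candidate v1 is rejected because the gain is negative
```

With α = 1 the gain is 5/9 − 4/7, which is slightly negative, so the candidate is not joined to the node set.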
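The motif gain of the node $v_t$ is calculated as follows:
$$R_M^{v_{n_i}} = \frac{N_{in}(Q) + E_{in}(v_{n_i})}{\left(N_{in}(Q) + N_{out}(Q) + E_{out}(v_{n_i})\right)^{\alpha}} - \frac{N_{in}(Q)}{\left(N_{in}(Q) + N_{out}(Q)\right)^{\alpha}} \qquad (3)$$
where $R_M^{v_{n_i}} > 0$ indicates that the node $v_{n_i}$ is spatially related to the missing data node $v_t$. As shown in Figure 5, we provide two examples of calculating the motif gain $R_M^{v_{n_i}}$. $N_{in}(Q)$ and $N_{out}(Q)$ are equal to 4 and 3, respectively, and $R_M(Q)$ is $4/(4+3)^{\alpha}$. In Figure 5(a), $E_{in}(v_1)$ and $E_{out}(v_1)$ are equal to 1 and 2, respectively, and $R_M(Q \cup v_1)$ is $(4+1)/(4+3+2)^{\alpha}$ from Equation (3). With the default control parameter $\alpha = 1$, $R_M^{v_1} = 5/9 - 4/7 < 0$; thus, the node $v_1$ cannot be joined to the current node set $Q$. In Figure 5(b), $R_M^{v_2} > 0$; this indicates that there is a spatial correlation between the missing data node $v_t$ and the node $v_2$.

3.1.2. The Spatial Aggregated Method Based on GCN. All nodes associated with the missing data node $v_t$ are determined by Equation (3), and the associated node data is aggregated by GCN to obtain the estimated values of the missing data node $v_t$. A road network is described by a graph $G = (V, E, W)$ with $n$ nodes, where nodes $v_i \in V$ denote road segments, edges $(v_i, v_j) \in E$ denote the directed connection from node $v_i$ to node $v_j$, and $W \in \mathbb{R}^{n \times n}$ denotes the weighted adjacency matrix. The graph $G$ is represented by its corresponding Laplacian matrix, and the properties of the graph structure can be obtained by analyzing the Laplacian matrix and its eigenvalues:
$$L = I_N - D^{-1/2} A D^{-1/2} \qquad (4)$$
$$g_\theta \star x = U g_\theta U^{T} x \qquad (5)$$
where $L$ is the normalized Laplacian matrix, and $D$, $A$, and $I_N$ are the degree matrix, adjacency matrix, and identity matrix, respectively. $g_\theta$ is the filter function applied to the eigenvalues of $L$, $U$ is the matrix of eigenvectors of $L$, and $x$ is the feature value of the input node.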
According to Equations (4) and (5), the eigenvalue decomposition of $L$ is represented as follows:
$$L = U \Lambda U^{T} \qquad (6)$$
where $\Lambda$ is the diagonal matrix composed of the eigenvalues of $L$. In order to reduce the time complexity when the scale of the graph is large, the Chebyshev polynomials are adopted to approximate the solution:
$$g_\theta \star x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L})\, x \qquad (7)$$
$$T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x) \qquad (8)$$
where $T_k(x)$ and $\theta_k$ are the Chebyshev polynomials and coefficients, respectively; $\tilde{L} = (2/\lambda_{max})L - I_N$; $\lambda_{max}$ represents the maximum eigenvalue of $L$; $T_0(x) = 1$; and $T_1(x) = x$.
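The Chebyshev recurrence with $T_0(x) = 1$ and $T_1(x) = x$ can be evaluated iteratively. The sketch below works on scalars only, to show the recurrence and the rescaling of an eigenvalue into [-1, 1]; the helper names `chebyshev` and `rescale_eigenvalue` are illustrative.

```python
def chebyshev(k, x):
    """Evaluate T_k(x) via T_k = 2x * T_{k-1} - T_{k-2}, T_0 = 1, T_1 = x."""
    if k == 0:
        return 1.0
    t_prev, t_curr = 1.0, x
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


def rescale_eigenvalue(lam, lam_max):
    """Scalar counterpart of (2 / lam_max) * L - I, mapping [0, lam_max] to [-1, 1]."""
    return 2.0 * lam / lam_max - 1.0


print(chebyshev(2, 0.5))             # T_2(x) = 2x^2 - 1
print(rescale_eigenvalue(2.0, 2.0))  # the largest eigenvalue maps to 1
```

Truncating the expansion at a small order keeps each filter local, which is what the simplification to k = 1 in the next step relies on.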
As can be seen from Equations (7) and (8), the approximate solution with Chebyshev polynomials is equivalent to applying a convolution kernel centered on each node of the graph to extract the feature values of its neighboring nodes. In order to simplify the calculation, we limit $k$ to 1 and scale the eigenvalues of $L$ so that $\lambda_{max} = 2$; Equation (7) is then expressed as follows:
$$g_\theta \star x \approx \theta_0 x + \theta_1 (L - I_N)\, x \qquad (9)$$
According to Equation (4), letting $\theta_0 = -\theta_1 = \theta$, Equation (9) is transformed as follows:
$$g_\theta \star x \approx \theta \left(I_N + D^{-1/2} A D^{-1/2}\right) x \qquad (10)$$
In order to avoid numerical and gradient instability problems, let $\tilde{A} = I_N + A$ with degree matrix $\tilde{D}$; Equation (10) is transformed as follows:
$$g_\theta \star x \approx \theta\, \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} x \qquad (11)$$
All associated nodes that meet the motif gain conditions are extracted according to Equation (11), and the final aggregated value of the missing data node $v_t$ is expressed as follows:
$$y_s^{(l)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}\, y_s^{(l-1)} W\right) \qquad (12)$$
where $y_s^{(l)}$ is the final aggregated value of the missing data node $v_t$, i.e., the spatial aggregated value $y_s^{imputed}$ in Figure 2. $l$, $W$, and $\sigma$ are the number of associated nodes, the parameters to be trained, and the activation function, respectively. The initial value $y_s^{(0)}$ is the feature value of the first associated node that meets the motif gain condition.
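The renormalized aggregation step can be illustrated on a three-segment triangle motif. This is a minimal pure-Python sketch, not the authors' TensorFlow implementation: it assumes scalar node features, a scalar trainable weight, ReLU as the activation, and a placeholder of 0.0 for the missing node. The names `normalized_adjacency` and `aggregate` are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D~^(-1/2) (A + I) D~^(-1/2) for a dense adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # self-loops keep every degree > 0
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def aggregate(adj, features, weight=1.0):
    """One propagation step: ReLU(A_hat @ features * weight)."""
    a_hat = normalized_adjacency(adj)
    n = len(features)
    return [max(0.0, weight * sum(a_hat[i][j] * features[j] for j in range(n)))
            for i in range(n)]


# Triangle motif: node 0 is the segment with missing data (placeholder 0.0),
# nodes 1 and 2 are associated segments with observed traffic indices.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
values = aggregate(triangle, [0.0, 0.6, 0.9])
print(values[0])  # aggregated estimate for the missing segment
```

In this toy case every entry of the normalized matrix is 1/3, so the estimate for the missing segment is the mean of the three node values including its own placeholder; a real model would learn the weight and might mask the placeholder.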
3.1.3. The Computational Complexity. We discuss the time complexity of the proposed approach. In the motif-based node search, the node $v_t$ with missing data needs to search the other $n - 1$ nodes, and the nodes in the target node set $Q_t$ also search each node that is not in $Q_t$, so the computational complexity is $O(n^2)$. In the GCN-based aggregation, which is optimized by the Chebyshev polynomials, the time complexity is reduced to $O(k|E|)$. Since we limit $k$ to 1 to simplify the calculation, the time complexity of the GCN-based aggregation is approximately $O(n)$. The two processes run sequentially, so their total cost is $O(n^2 + n)$, i.e., the overall computational complexity is $O(n^2)$.
The process of motif-based graph aggregation is depicted in Algorithm 1.

3.2. The Multiple Dimension Imputation Based on Bi-LSTM. Existing temporal imputation methods obtain good estimates of missing data in the case of random data missing. When a large number of values are missing at consecutive timestamps, however, the imputation performance degrades [15]. Recovering continuous missing data is therefore a more challenging task.
Processing long-term time-series data is an essential task when there is a large amount of continuous missing data. Deep learning, which learns complex feature representations directly from input data, may generate high-performing results in dynamic and challenging contexts [48]. LSTM and the gated recurrent unit (GRU) [49], which are key methods of deep learning, have been employed for time-series applications with temporal dependencies. LSTM has three gates: the input gate and forget gate control the update of the memory cell, and the output gate passes the output information. LSTM has recently been employed for missing data imputation, such as in Refs. [10,13,15]. GRU is a variant of LSTM that comprises only an update gate and a reset gate and utilizes the reset gate to control the information from the previous point. GRU's reliance on the previous point could restrict improvement in the case of continuous data missing. Besides, on large datasets, LSTM performs better than GRU, as verified by [50]. Therefore, an LSTM-based model is adopted to efficiently capture the temporal dependencies.

Algorithm 1: The process of motif-based graph aggregation. Input: the node set $Q_v = \{v_1, v_2, \ldots, v_n\}$, the target node $v_t$, and the feature value of the first associated node $y_s^{(0)}$.

It is known that the missing data points are closely related to the adjacent points in both temporal directions. Bi-LSTM is capable of training in both forward and backward directions [51]. Meanwhile, enlarging the window based on different time granularities can benefit prediction performance by allowing the temporal dependencies in the time series data to be exploited [29]. Thus, we adopt time-series imputation based on Bi-LSTM, which incorporates the recent, daily-periodic, and weekly-periodic dependencies of the historical data. That is to say, among the adjacent time-series data of the current missing values in the three dimensions, normal data exist in at least one dimension, which ensures that there are enough historical data to complete the temporal imputation. For instance, when $x_t^{miss}$ represents the data from 7 a.m. to 7:10 a.m. on 19 September 2018, some adjacent time series data in three dimensions are shown in Figure 6.
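The three time dimensions can be made concrete by listing the timestamps adjacent to a missing slot. The sketch below is an illustration under assumed window sizes (three recent slots on each side, three days and two weeks on each side); the paper does not state these sizes here, and the function name `adjacent_windows` is hypothetical.

```python
from datetime import datetime, timedelta


def adjacent_windows(t, interval_min=10, n_recent=3, n_daily=3, n_weekly=2):
    """Timestamps around slot t in the recent, daily, and weekly dimensions.

    Window sizes are illustrative assumptions, not values from the paper.
    Both earlier and later slots are listed, since Bi-LSTM reads both directions.
    """
    step = timedelta(minutes=interval_min)
    recent = [t + k * step for k in range(-n_recent, n_recent + 1) if k != 0]
    daily = [t + timedelta(days=k) for k in range(-n_daily, n_daily + 1) if k != 0]
    weekly = [t + timedelta(weeks=k) for k in range(-n_weekly, n_weekly + 1) if k != 0]
    return {"recent": recent, "daily": daily, "weekly": weekly}


# The slot used as an example in the text: 7:00-7:10 a.m. on 19 September 2018.
windows = adjacent_windows(datetime(2018, 9, 19, 7, 0))
for name, stamps in windows.items():
    print(name, [s.strftime("%m-%d %H:%M") for s in stamps])
```

If a long outage wipes out the recent slots, the daily and weekly lists still point at observed data, which is the reason for fusing the three dimensions.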

3.2.1. Bi-LSTM Encoder. The Bi-LSTM extends the generic LSTM. For each of the recent, daily, and weekly periods, it utilizes the previous and future points by processing the missing data $x_t^{miss}$ from both forward and backward directions with two separate LSTMs. The one-way LSTM of each period includes forget, input, and output gates.
In each time dimension, the hidden vector $h_t$ in time period $t$ is updated by the LSTM gate equations (Equations (13)-(18)), where $f_t$ and $o_t$ refer to the forget and output gates, respectively, $i_t$ and $a_t$ are input gates, $x_t^{adj}$ is the adjacent historical data in time period $t$, and $C$ is the cell vector. $\sigma$, $\tanh$, and $\otimes$ represent the sigmoid function, the tanh activation function, and element-wise multiplication, respectively. The previous and future hidden vectors generated by Equations (13)-(18) are then combined in time period $t$, where $h_{t,p}$ and $h_{t,f}$ represent the previous and future hidden vectors, respectively, and $\theta$ is the impact weight.
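For reference, the textbook LSTM cell can be written in terms of the symbols defined above. This is the standard formulation rather than a reproduction of the paper's Equations (13)-(18); the weight matrices $W_f, W_i, W_a, W_o$ and biases $b_f, b_i, b_a, b_o$ are notation assumed here, and the paper's ordering and numbering may differ.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t^{adj}] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t^{adj}] + b_i\right) \\
a_t &= \tanh\left(W_a\,[h_{t-1}, x_t^{adj}] + b_a\right) \\
C_t &= f_t \otimes C_{t-1} + i_t \otimes a_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t^{adj}] + b_o\right) \\
h_t &= o_t \otimes \tanh(C_t)
\end{aligned}
```

The forward LSTM applies these updates from earlier to later slots and the backward LSTM from later to earlier slots, which yields the two hidden vectors $h_{t,p}$ and $h_{t,f}$.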

3.2.2. LSTM Decoder with Attention. In this part, we adopt an LSTM decoder with the attention mechanism. The encoded hidden vectors of adjacent periods are fused into an attention context vector, where $c_t$ is the attention context vector in time period $t$, $h(\cdot)$ is the updated hidden vector in the encoding stage, $s$ is the decoded hidden vector, $f$ is the forward LSTM trained by the decoder components, and $\alpha$ and softmax are the weight coefficient and activation function, respectively. A dense layer with a linear activation function is added on top of the decoder layer to generate the predictions in Figure 2 and outputs the decoded values with the backward LSTM, where $y$ is the output value decoded by LSTM, $[s_t; c_t]$ is the concatenation of the decoder hidden vector $s_t$ and the attention context vector $c_t$, Linear is the linear activation function, and $W$ and $b$ are the linear parameters mapped to the decoder hidden states.
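The attention step can be sketched as softmax weights over the encoder hidden vectors followed by a weighted sum. The scoring function is not given in the text, so this example assumes plain dot-product scores between the decoder state and each encoder state; `softmax` and `attention_context` are illustrative helper names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Return (weights, context) using assumed dot-product attention scores."""
    scores = [sum(s * h for s, h in zip(decoder_state, h_vec))
              for h_vec in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h_vec[d] for w, h_vec in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Hidden vectors for two adjacent periods and one decoder state (toy values).
weights, context = attention_context([0.5, 0.5], [[1.0, 2.0], [1.0, 2.0]])
print(weights, context)
```

The context vector would then be concatenated with the decoder hidden vector and passed through the linear layer to produce the decoded value.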

3.2.3. Multicomponent Time Dimension Fusion. The input vectors of the three time dimensions are trained by the above encoder-decoder processes, and the estimated values obtained from the adjacent data of the missing data in the recent, daily, and weekly periods are merged into the fusion-value $y_t^{imputed}$.

3.3. The Spatiotemporal Fusion of Missing Data. The spatial aggregated value $y_s^{imputed}$ and the multitime dimension fusion-value $y_t^{imputed}$ in time period $t$ are combined using a weight $a(t)$, where $y^{imputed}(t)$ is the final result of spatiotemporal imputation in time period $t$, and the initial value of the weight is $a(t) = 1/2$.
In order to prevent the poor effect caused by excessive weight fluctuation, an inertia factor $\beta$ is introduced. The size of $\beta$ reflects the size of the weight inertia: $\beta$ is inversely proportional to the volatility of the weight $a$.
Supposing that the error of the spatial imputation in time period $t$ is $e_{1,t}$ and the error of the multitime dimension imputation in time period $t$ is $e_{2,t}$, $a(t)$ is updated according to these errors and the inertia factor. That is to say, when the imputation error of one component is relatively large, its influence is reduced in the next time period, and vice versa. A suitable value of $\beta$ can be obtained through simulation experiments.
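The behaviour described above can be illustrated with one possible weighting scheme. Both formulas in this sketch are assumptions made for illustration and are not the paper's equations: the fusion is taken to be a convex combination with weight a, and the update moves a toward the share of error attributed to the other component, damped by the inertia factor β. The names `fuse` and `update_weight` are hypothetical.

```python
def fuse(y_spatial, y_temporal, a):
    """Assumed convex combination of the spatial and temporal estimates."""
    return a * y_spatial + (1.0 - a) * y_temporal


def update_weight(a_prev, e_spatial, e_temporal, beta=0.8):
    """Assumed inertia-damped update of the spatial weight.

    A larger spatial error lowers the target weight of the spatial estimate;
    a larger beta keeps the new weight closer to the previous one.
    """
    total = e_spatial + e_temporal
    target = 0.5 if total == 0 else e_temporal / total
    return beta * a_prev + (1.0 - beta) * target


a = 0.5                                   # initial weight stated in the text
print(fuse(1.0, 3.0, a))                  # equal weighting of both estimates
a = update_weight(a, e_spatial=0.2, e_temporal=0.1)
print(a)                                  # the spatial weight drops below 0.5
```

Under this assumed rule, the qualitative behaviour matches the text: the component with the larger error loses influence in the next period, and a larger β makes the weight change more slowly.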

Results and Discussion
In this section, we introduce the dataset, baselines, and evaluation metrics, and then verify the superiority of the MGIA in the case of random and continuous data missing.
In the random data missing scenario, missing data appear randomly in the traffic dataset owing to temporary failures (e.g., network or power outages). Continuous data missing means that data are missing continuously due to the long-term failure of traffic data collectors, so that values are missing at consecutive timestamps or at multiple intersections in a road network.
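4.1. Data Preparation. We validate our approach with the traffic index dataset provided by the Chengdu branch of Didi Chuxing, China. In Figure 7, we select 30.66°N∼30.73°N, 104.02°E∼104.10°E as the geographic area, and 74 road segments make up the road network in this work. The time span of the dataset is from September to October 2018. All the programs are developed based on Python 3.7 and TensorFlow 1.13.1. The dataset was filtered to the normal traffic flow hours (6 am-8:50 pm), and the time intervals are set to 10 min and 30 min, respectively. The traffic index is defined in terms of the following quantities: $C_i^t$ is the traffic index of road segment $i$ during timeslot $t$, $v_i^t$ is the average speed, and $v_i^{max}$ and $v_i^{min}$ are the maximum speed and minimum speed of road segment $i$ in the historical data, respectively.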
In the tests, we manually remove a certain amount of the processed dataset and then impute the removed data with the proposed approach and the state-of-the-art methods. We assume that the dataset contains both random and continuous data missing patterns and obtain the accuracy of these methods by comparing the imputed results with the ground truth data. The data missing rate in these two patterns ranges from 10% to 60% in steps of 10%.
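The two removal patterns can be sketched as boolean masks over the timeslots of one road segment. This is an illustrative setup, not the authors' procedure: it assumes 90 ten-minute slots per day (6:00 am to 8:50 pm), one contiguous gap for the continuous pattern, and a fixed seed for repeatability. The names `random_mask` and `continuous_mask` are hypothetical.

```python
import random


def random_mask(n_steps, rate, seed=0):
    """Mark round(n_steps * rate) slots as missing at random positions."""
    rng = random.Random(seed)
    k = round(n_steps * rate)
    missing = set(rng.sample(range(n_steps), k))
    return [i in missing for i in range(n_steps)]


def continuous_mask(n_steps, rate, seed=0):
    """Mark one contiguous run of round(n_steps * rate) slots as missing."""
    rng = random.Random(seed)
    k = round(n_steps * rate)
    start = rng.randint(0, n_steps - k)
    return [start <= i < start + k for i in range(n_steps)]


slots_per_day = 90  # 6:00 am to 8:50 pm at 10 min intervals
for rate in (0.1, 0.3, 0.6):
    r_mask = random_mask(slots_per_day, rate)
    c_mask = continuous_mask(slots_per_day, rate)
    print(rate, sum(r_mask), sum(c_mask))
```

Values at masked positions are hidden from the model and later compared with the ground truth to compute the errors.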

4.2. Experimental Settings
4.2.1. Baselines. The MGIA is compared with the following state-of-the-art methods for traffic data imputation:
(i) LRTC-TNN [14] is a low-rank tensor completion framework with truncated nuclear norm, which extracts spatiotemporal features from traffic data.
(ii) SSIM [15] is a sequence-to-sequence imputation model, which is designed to impute missing data by utilizing the LSTM from both the past and future time indexes.
(iii) MGIA is our proposed approach, which combines the motif-based graph aggregation method with the multitime dimension fusion method based on Bi-LSTM to impute missing data.
(iv) GIA is a comparison variant of MGIA, which combines the graph aggregation method with the multitime dimension fusion method based on Bi-LSTM but does not include the motif-based component.
4.2.2. Evaluation Metrics. To evaluate the performance of the MGIA, we employ three widely used performance metrics: root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
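The three metrics follow their usual definitions and can be computed over the imputed positions as in the sketch below. The function names are illustrative, and MAPE is returned as a percentage and assumes that no actual value is zero.

```python
import math


def rmse(imputed, actual):
    """Root mean square error between imputed and actual values."""
    n = len(actual)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(imputed, actual)) / n)


def mae(imputed, actual):
    """Mean absolute error between imputed and actual values."""
    n = len(actual)
    return sum(abs(y - x) for y, x in zip(imputed, actual)) / n


def mape(imputed, actual):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((y - x) / x) for y, x in zip(imputed, actual)) / n


imputed_values = [2.0, 4.0]
actual_values = [1.0, 5.0]
print(rmse(imputed_values, actual_values),
      mae(imputed_values, actual_values),
      mape(imputed_values, actual_values))
```

Lower values are better for all three metrics; RMSE penalizes large individual errors more heavily than MAE.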
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i^{r}\right)^2}, \qquad MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - x_i^{r}\right|, \qquad MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - x_i^{r}}{x_i^{r}}\right|$$
where $n$ is the total number of missing data, $y_i$ is the $i$th imputed value, and $x_i^{r}$ is the corresponding actual value of $y_i$.
4.3. Comparison and Result Analysis. Figures 8 and 9 present the average errors at different missing rates in the case of random and continuous data missing, respectively. The MGIA shows superior performance over the baselines w.r.t. all three metrics: every metric of MGIA is lower than that of the other three approaches. LRTC-TNN and SSIM are recent approaches, and the margin of MGIA over these two algorithms on all three metrics is large, which means that MGIA achieved good performance. Meanwhile, MGIA significantly outperforms the comparison approach GIA, which means that the application of motifs in spatial imputation is feasible and effective.
As shown in Figures 8 and 9, the overall trends of the four approaches are similar, i.e., all metrics of these approaches increase with the missing rate. In Figure 8, when the missing rate is less than 40%, the error growth of these approaches is steady except for SSIM. Once the missing rate exceeds 40%, the errors grow faster than they do at missing rates below 40%. In Figure 9, the error growth of these approaches remains stable across the missing rates from 10% to 60%. The reason could be that, under random data missing, continuous gaps start to appear when the missing rate exceeds 40%; this differs from the assumed imputation pattern, and the error growth increases significantly.
LRTC-TNN achieved a better performance than SSIM. This is because LRTC-TNN makes use of low-rank tensor decomposition and is not affected by consecutive missing data. On the contrary, SSIM is susceptible to consecutive missing data: its error varies considerably depending on whether data are missing in the preceding period or in the following period. GIA extends SSIM from one-dimensional to three-dimensional time horizons and employs graph aggregation to perform spatiotemporal fusion for data imputation, so it performs better than SSIM. MGIA adds motif-based imputation on the basis of GIA and balances the contributions of higher-order spatial correlations and periodicity when merging the imputed values.
When the time interval is widened from 10 min to 30 min, the imputation performance becomes worse in Figures 8 and 9, but MGIA still performs better than the other three approaches on all three metrics.
To show the advantage of MGIA under continuous data missing, we compare its metrics with those under random data missing in Tables 2 and 3. For both the 10 min interval and the 30 min interval, MGIA performs better under continuous data missing than under random data missing. This is because MGIA adopts the time series method based on Bi-LSTM, which widens the horizon from one dimension to three dimensions and pays more attention to temporal continuity.
Overall, MGIA outperforms the other baselines because it exploits the spatiotemporal characteristics and higher-order spatial correlations. Moreover, its imputation performance under continuous data missing is better than under random data missing.

Conclusion
In this paper, a novel spatiotemporal imputation approach for traffic data (MGIA), which utilizes motif-based graph aggregation, is proposed. This approach addresses the issues of (1) traffic spatial imputation for large-scale areas and (2) poor imputation performance, especially in the case of continuous data missing. Based on MGIA, we capture the higher-order spatial correlations in traffic networks and solve the problem of spatiotemporal data imputation. Experiments are performed on a real-world traffic dataset from Chengdu, China. The experimental results show that the MGIA outperforms all other compared methods in the case of random and continuous data missing and remains stable across missing rates ranging from 10% to 60%.
In the future, we will further evaluate the MGIA with regard to other factors (such as weather and events). Besides, we plan to incorporate adversarial learning into the proposed approach to improve the imputation accuracy in the case of continuous data missing with longer time intervals. On this basis, we aim to improve the accuracy at missing rates ranging from 60% to 80%. Additionally, we plan to apply the proposed approach to other tasks in ITS, such as traffic prediction and causal discovery of congestion propagation patterns.

Figure 1: Spatiotemporal representation of missing data. The data labelled in red with a question mark and the data labelled in white denote the missing values and normal values, respectively.

Figure 2: Framework of the MGIA. The bottom of the figure indicates the data input, which includes the higher-order spatial structure and time series data. The upper part of the figure indicates the specific method of the MGIA: the left module represents the spatial imputation of motif-based graph aggregation, the right module represents the multiple dimension imputation based on Bi-LSTM, and these two modules are integrated to obtain the results.

Figure 4: The process of motif-based search. Steps shown in the figure: input the current node set Q and the triangle motif; search the adjacent nodes of the current node set Q; calculate the motif gain and output the new node set Q.

Figure 6: The construction of adjacent time series data in three dimensions.

Figure 7: Study area description. (a) Geographical area in Chengdu, China. (b) The road network in this work.

Figure 8: Metrics in the case of random data missing. (a-c) The results of RMSE, MAE, and MAPE with 10 min time interval, respectively. (d-f) The results of RMSE, MAE, and MAPE with 30 min time interval, respectively. The abscissa represents the missing rate, and the ordinate is the corresponding result.


Table 1: The limitations of these existing methods.

Figure 3: Examples of motifs. $M_{31}$ and $M_{32}$ denote triangles, and $M_{41}$, $M_{42}$, $M_{43}$, $M_{44}$, $M_{45}$, and $M_{46}$ denote quadrangles.
Figure 9: Metrics in the case of continuous data missing. (a-c) The results of RMSE, MAE, and MAPE with 10 min time interval, respectively. (d-f) The results of RMSE, MAE, and MAPE with 30 min time interval, respectively. The abscissa represents the missing rate, and the ordinate is the corresponding result.

Table 2: Analysis of the imputation with MGIA (10 min time interval). The acronyms "r-missing" and "c-missing" represent the patterns of random data missing and continuous data missing, respectively.

Table 3: Analysis of the imputation with MGIA (30 min time interval). The acronyms "r-missing" and "c-missing" represent the patterns of random data missing and continuous data missing, respectively.