Traffic volume data are already collected and used for a variety of purposes in intelligent transportation systems (ITS). However, the collected data may be abnormal due to outliers caused by malfunctions in the data collection and recording systems. To fully analyze and exploit the collected data, it is necessary to develop a valid method for recovering the outlier data. Many existing algorithms address the outlier recovery problem with time series methods. In this paper, a multiway tensor model is proposed for constructing traffic volume data based on its intrinsic multilinear correlations, such as day to day and hour to hour. Then, a novel tensor recovery method, called ADMM-TR, is proposed for recovering the outliers in traffic volume data. The proposed method is evaluated on synthetic data and real world traffic volume data. Experimental results demonstrate the practicability, effectiveness, and advantage of the proposed method, especially on real world traffic volume data.
1. Introduction
In order to alleviate the traffic congestion problem and facilitate mobility in metropolises, large amounts of traffic information are collected as part of intelligent transportation systems (ITS), such as CVIS (Cooperative Vehicle Infrastructure System) in China. These collected traffic data have a wide range of applications. Real time traffic information is provided to travelers to support their decision-making process on the optimal route choice [1]. As shown by the work of Kim et al. [2], real time information can contribute to reducing the operation cost and maximizing resource utilization. In addition to these applications, the collected data could be applied to maximize the utilization of the infrastructure for smooth traffic flow. One such application of real time traffic data is traffic signal control [3]. On the other hand, several data mining techniques have been applied to mine time-related association rules from traffic databases, and their results have been used for traffic prediction, such as the works of Williams et al. [4] and Xu et al. [5]. From the above discussion, it is concluded that the collected traffic data are essential for many potential applications in ITS.
In the real world, the collected data are often corrupted by noise, especially outlier values, which may be caused by detector failures, communication problems, or other hardware/software related problems. The presence of outlier data in the database would significantly degrade the quality and reliability of the data and might impede the effectiveness of ITS applications. Therefore, it is essential to fill the gaps caused by outlier data in order to fully explore the applicability of the data and realize the ITS applications.
While many different kinds of traffic data such as traffic volume, speed, and occupancy are collected, the focus of this research is on the traffic volume outlier data recovery. It is supposed that the detectors collecting traffic information are set up at road sections and the collected values represent the traffic volume for those road sections. The aim of this research is to recover the traffic volume outlier data for road sections.
A literature survey in the related field shows that several filtering recovery techniques have been applied to recover outlier traffic data [6–8]. Filtering methods include techniques such as singular value decomposition, wavelet analysis, immune algorithms, and spectrum subtraction. These methods formulate the traffic volume as a time series model and smooth the traffic waveform, recovering the outlier data day by day through spectrum analysis and feature extraction. However, traffic data at the same location are highly similar from day to day, a characteristic these approaches cannot exploit. Pei and Ma [6] show that similarity is an important factor affecting recovery performance. Since the above methods consider only one mode of similarity, the recovery performance depends mainly on the smoothing threshold. Unfortunately, the smoothing threshold is determined empirically.
In order to improve the recovery performance and account for the multidimensional characteristics of traffic data, mining multimode similarities can make a great contribution to recovering outlier values. Our approach is based on utilizing the multimode correlations of traffic data; that is, traffic data have different correlations on different modes, such as the week mode, day mode, and hour mode. More concretely, the feature of the proposed method is to recover the outlier value using traffic volume information from the different modes. But the problem is not so simple, because traffic volumes from many days might be corrupted by outliers simultaneously. In order to handle multiple outlier traffic volumes, we use a tensor to model the traffic volume.
In order to solve the traffic volume outlier data problem, we formulate the traffic volume recovery problem as a data recovery problem based on the assumption that the essential traffic volume is low-n-rank/low rank and the outliers are sparse. That is, the corrupted traffic volumes can be formulated as
(1)𝒜=ℒ+𝒮,
where 𝒜 is the observed (corrupted) traffic volume, ℒ is the recovered traffic volume, and 𝒮 represents the outliers. In this problem, the locations of the corrupted entries are unknown. One straightforward solution is to optimize the following problem under the assumption that the n-rank of ℒ is small and the corrupting outliers are sparse or bounded:
(2) min_ℒ ∑_i μ_i rank_i(ℒ)  s.t. ∥𝒜 − ℒ − 𝒮∥_F ≤ δ.
The tensor recovery problem of (2) has been studied in recent years, which will be detailed in Section 3. In this paper, a new data recovery method based on tensor model called Alternating Direction Method of Multipliers for Tensor Recovery (ADMM-TR) is proposed to handle the outlier traffic volumes.
This paper makes three main contributions. (1) We use a tensor to model the traffic volume and take advantage of the multiway characteristics of tensors, which can explore the multiple correlations of different modes in traffic data; (2) we formulate the problem of traffic volume outlier data recovery as a tensor recovery problem; (3) we propose the ADMM-TR algorithm by extending ADMM from the matrix to the tensor case to solve the formulated tensor recovery problem for traffic volume, and the convergence of ADMM-TR is proved. It should also be noted that the proposed ADMM-TR method is different from [9], which extends ADM for tensor completion, where the data are missing and the locations of the missing entries are known. In this research, the objective is to recover data that are corrupted, whether missing or noised, at unknown locations.
The paper is organized as follows. We present the review of tensor model in Section 2. Section 3 briefly reviews the related data recovery methods. In Section 4, tensor model for traffic volume is constructed, traffic data recovery problem is formulated, and an efficient algorithm is proposed to solve the formulation. Also a simple convergence guarantee for the proposed algorithm is given. In Section 5, we evaluated the proposed method on synthetic data and real world traffic volume data. Finally, we provide some concluding remarks in Section 6.
2. Notation and Review of Tensor Models
In this section, we adopt the nomenclature of Kolda and Bader’s review on tensor decomposition [10] and partially adopt the notation in [11].
A tensor is the generalization of a matrix to higher dimensions. We denote scalars by lowercase letters (a, b, c, …), vectors by bold lowercase letters (a, b, c, …), and matrices by uppercase letters (A, B, C, …). Tensors are written as calligraphic letters (𝒜, ℬ, 𝒞, …).
An N-mode tensor is denoted as 𝒜 ∈ ℝ^{I1×I2×⋯×IN}. Its elements are denoted as a_{i1⋯ik⋯iN}, where 1 ≤ i_k ≤ I_k for 1 ≤ k ≤ N. The mode-n unfolding (also called matricization or flattening) of a tensor 𝒜 ∈ ℝ^{I1×I2×⋯×IN} is defined as unfold(𝒜, n) := A_(n). The tensor element (i1, i2, …, iN) is mapped to the matrix element (i_n, j), where
(3) j = 1 + ∑_{k=1, k≠n}^{N} (i_k − 1) J_k,  with  J_k = ∏_{m=1, m≠n}^{k−1} I_m.
Therefore, A(n)∈ℝIn×J, where J=∏k=1,k≠nNIk. Accordingly, its inverse operator fold can be defined as fold(A(n),n):=𝒜.
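As an illustration, the unfold and fold operators can be sketched in Python with NumPy. The Fortran-order reshape matches the column ordering in (3), where earlier remaining indices vary fastest; the function names are ours, not part of the paper.

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding: move mode n to the front, then flatten the
    remaining modes in Fortran order to match the index map in (3)."""
    return np.reshape(np.moveaxis(A, n, 0), (A.shape[n], -1), order="F")

def fold(M, n, shape):
    """Inverse operator: reshape the unfolding back and move mode n home."""
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(M, full, order="F"), 0, n)
```

For a tensor of shape (2, 3, 4), `unfold(A, 1)` has shape (3, 8), and folding it back recovers the original tensor.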
The n-rank of an N-dimensional tensor 𝒜∈ℝI1×I2×⋯×IN, denoted by rn, is the rank of the mode-n unfolding matrix A(n):
(4)rn=rankn(𝒜)=rank(A(n)).
The inner product of two same-size tensors 𝒜,ℬ∈ℝI1×I2×⋯×IN is defined as the sum of the products of their entries, that is,
(5) ⟨𝒜, ℬ⟩ = ∑_{i1} ∑_{i2} ⋯ ∑_{iN} a_{i1 i2 ⋯ iN} b_{i1 i2 ⋯ iN}.
The corresponding Frobenius norm is ∥𝒜∥_F = √⟨𝒜, 𝒜⟩. Besides, the l0 norm of a tensor 𝒜, denoted by ∥𝒜∥_0, is the number of nonzero elements in 𝒜, and the l1 norm is defined as ∥𝒜∥_1 := ∑_{i1 ⋯ iN} |a_{i1 i2 ⋯ iN}|. It is clear that ∥𝒜∥_F = ∥A_(n)∥_F, ∥𝒜∥_0 = ∥A_(n)∥_0, and ∥𝒜∥_1 = ∥A_(n)∥_1 for any 1 ≤ n ≤ N.
The n-mode (matrix) product of a tensor 𝒜∈ℝI1×I2×⋯×IN with a matrix M∈ℝJ×In is denoted by 𝒜×nM and is size I1×⋯×In-1×J×In+1×⋯×IN. In terms of flattened matrix, the n-mode product can be expressed as
(6) 𝒴 = 𝒜 ×_n M ⟺ Y_(n) = M A_(n).
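The n-mode product of (6) can be computed directly by contracting mode n of the tensor with the columns of the matrix; a minimal NumPy sketch (our own helper, not the paper's code):

```python
import numpy as np

def mode_n_product(A, M, n):
    """n-mode product A x_n M: contract mode n of A with the columns
    of M, then move the new mode back into position n, as in (6)."""
    return np.moveaxis(np.tensordot(M, A, axes=(1, n)), 0, n)
```

For A of shape (2, 3, 4) and M of shape (5, 3), `mode_n_product(A, M, 1)` has shape (2, 5, 4), consistent with the size rule stated above.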
3. Review of Data Recovery Methods
Recently, the problem of recovering the sparse and low-rank components with no prior knowledge about the sparsity pattern of the sparse matrix, or the rank of the low-rank matrix, has been well studied. The authors of [12] proposed the concept of "rank-sparsity incoherence" and solved the problem with an interior point solver after reformulating it as a semidefinite program. However, although interior point methods normally take very few iterations to converge, they have difficulty handling large matrices. This limitation prevents the use of the technique in computer vision and in the traffic volume recovery considered in this research.
To solve the problem for large scale matrices, Wright et al. [13] adopted the iterative thresholding technique and obtained good scalability. Lin et al. proposed an accelerated proximal gradient (APG) algorithm [14] and applied techniques of augmented Lagrange multipliers (ALM) [15] to solve the problem. Yuan and Yang [16] utilized the alternating direction method (ADM), which can be regarded as a practical version of the classical ALM method, to solve the matrix recovery problem. The ADM method has been proven to have a pleasing convergence speed, and the results in [16] demonstrate its excellent performance.
Inspired by the idea of [16], this paper extends the sparse and low-rank recovery problem to tensor case, which is due to the fact that the multidimensional traffic data can be formulated into the form of tensor.
4. ADMM-TR for Traffic Volume Outlier Recovery
In this section, we show the solution of problem (2). The tensor model is firstly constructed for traffic volume in Section 4.1. Then we present the tensor recovery problem in Section 4.2. In Section 4.3, the classical ADMM approach is introduced. In Section 4.4, we convert the original problem into a constrained convex optimization problem which can be solved by the extended ADMM approach and present the details of the proposed algorithm. Also the convergence guarantees of the proposed algorithm are given in this section.
4.1. Tensor Model for Traffic Volume
The correlations of traffic volume data are critical for recovering the corrupted traffic volume data. Traditional methods mostly exploit part of correlations, such as historical or temporal neighboring correlations. The classic methods usually utilize the temporal correlations of traffic data from day to day. For the single detector data, multiple correlations contain the relations of traffic data from day to day, hour to hour, and so forth. In addition, the spatial correlations exist in multiple detectors data.
In this paper, a quantitative analysis of traffic data correlation is conducted based on traffic volume data downloaded from http://pems.dot.ca.gov/. The correlation coefficient used to measure the data correlation is given by [17]
(7) s = [∑_{n ≥ i > j ≥ 1} R(i, j)] / [n(n − 1)/2],
where n is the number of data points and R(i, j) is the correlation coefficient matrix. Table 1 gives the correlation coefficients of four modes: hour, day, week, and link.
Table 1: The similarity coefficient of four modes.

Mode | Size  | Similarity coefficient
Hour | 6×12  | 0.9670
Day  | 7×288 | 0.8654
Week | 7×288 | 0.9153
Link | 4×288 | 0.8497
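The mode similarity measure in (7) can be sketched as follows: arrange one mode of the data as the rows of a matrix (e.g., one day per row), compute the correlation coefficient matrix, and average its strictly upper-triangular entries. This is our own illustrative helper, assuming (7) averages R(i, j) over all distinct pairs.

```python
import numpy as np

def mode_similarity(X):
    """Average pairwise correlation of the rows of X, as in (7):
    the mean of R(i, j) over all pairs i > j, where R = corrcoef(X)."""
    R = np.corrcoef(X)            # n x n correlation matrix of the rows
    n = R.shape[0]
    iu = np.triu_indices(n, k=1)  # strictly upper-triangular pairs
    return R[iu].mean()
```

Rows that are perfectly correlated (e.g., positive scalings of each other) yield s = 1, the upper bound of the coefficients reported in Table 1.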
Conventional methods usually use a day-to-day matrix pattern to model the traffic data. Although each mode of traffic data has very high similarity, these methods do not utilize the multimode correlations, namely "Day × Hour," "Week × Hour," and "Link × Hour," simultaneously and thus may yield poor recovery performance.
To make full use of the multimode correlations and the spatial-temporal traffic information, the traffic data need to be constructed as a multiway data set. Fortunately, a tensor pattern can model such multiway traffic data well; it preserves the original structure and employs sufficient spatial-temporal traffic information.
4.2. The Tensor Recovery Problem
The problem in (2) is NP-hard since it is not convex. We therefore use the approximate formulation shown in (8):
(8) min_{ℒ,𝒮} ∥ℒ∥_* + η∥𝒮∥_1  s.t. ∥𝒜 − ℒ − 𝒮∥_F ≤ δ,
where 𝒜 ∈ ℝ^{I1×I2×⋯×IN} is the given tensor to be recovered, ℒ ∈ ℝ^{I1×I2×⋯×IN} is the low-rank component of 𝒜, and 𝒮 ∈ ℝ^{I1×I2×⋯×IN} is the sparse component of 𝒜. Compared with (2), (8) relaxes the constraint, recovering a low-n-rank tensor from high-dimensional data despite both small entry-wise noise and gross sparse errors.
Recently, Liu et al. [18] have proposed the definition of the nuclear norm of an n-mode tensor:
(9) ∥𝒳∥_* := (1/n) ∑_{i=1}^{n} ∥X_(i)∥_*.
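Definition (9) can be computed directly: take each mode-i unfolding, sum its singular values, and average over the modes. A small NumPy sketch (the unfolding order does not affect singular values, so a plain C-order reshape suffices here):

```python
import numpy as np

def tensor_nuclear_norm(X):
    """Tensor nuclear norm from (9): the average of the matrix nuclear
    norms (sums of singular values) of all mode-i unfoldings."""
    n = X.ndim
    total = 0.0
    for i in range(n):
        Xi = np.reshape(np.moveaxis(X, i, 0), (X.shape[i], -1))  # mode-i unfolding
        total += np.linalg.norm(Xi, "nuc")  # sum of singular values
    return total / n
```

For a rank-1 tensor every unfolding has a single nonzero singular value equal to the Frobenius norm, so the tensor nuclear norm coincides with ∥𝒳∥_F in that case.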
Based on this definition, the optimization in (8) can be written as
(10) min_{ℒ,𝒮} (1/n) ∑_{i=1}^{n} λ_i ∥L_(i)∥_* + (1/n) ∑_{i=1}^{n} η_i ∥S_(i)∥_1  s.t. ∥𝒜 − ℒ − 𝒮∥_F ≤ δ.
In order to recover (ℒ̂, 𝒮̂), instead of directly solving (10), we solve the following penalized problem:
(11) min_{ℒ,𝒮} (1/2γ)∥𝒜 − ℒ − 𝒮∥_F² + ∑_{i=1}^{n} λ_i ∥L_(i)∥_* + ∑_{i=1}^{n} η_i ∥S_(i)∥_1.
The problem in (11) is still difficult to solve due to the interdependent nuclear norm and l1 norm terms. To simplify it, the problem can be reformulated as follows:
(12) min_{ℒ,𝒮,M_i,N_i} (1/2γ)∥𝒜 − ℒ − 𝒮∥_F² + ∑_{i=1}^{n} λ_i ∥M_i∥_* + ∑_{i=1}^{n} η_i ∥N_i∥_1  s.t. P_iℒ = M_i, P_i𝒮 = N_i, ∀i,
where P_i is the matrix representation of the mode-i unfolding (note that P_i is a permutation matrix; thus P_i^T P_i = I), and M_i and N_i are auxiliary matrices of the same size as the mode-i unfolding of ℒ (or 𝒮).
4.3. The Classical ADMM Approach
The classical alternating direction method of multipliers (ADMM) is for solving structured convex programs of the form:
(13) min_{x∈C_x, y∈C_y} f(x) + g(y)  s.t. Ax + By = c,
where f and g are convex functions defined on closed subsets C_x and C_y, and A, B, and c are matrices and a vector of appropriate sizes. The augmented Lagrangian function of (13) is
(14) L_A(x, y, w) = f(x) + g(y) + ⟨w, Ax + By − c⟩ + (β/2)∥Ax + By − c∥₂²,
where w is a Lagrangian multiplier vector and β>0 is a penalty parameter.
The approach performs one sweep of alternating minimization with respect to x and y individually and then updates the multiplier w; at iteration k the steps are given by [18, Equations (4.79)–(4.81)]:
(15) x^(k+1) ← argmin_{x∈C_x} L_A(x, y^(k), w^(k)),
     y^(k+1) ← argmin_{y∈C_y} L_A(x^(k+1), y, w^(k)),
     w^(k+1) ← w^(k) + ρβ(Ax^(k+1) + By^(k+1) − c),
where ρ is the step length. A convergence result for the above ADMM algorithm is given as follows.
Theorem 1 (See [19, Proposition 5.2]).
Assume that the optimal solution set X* of (13) is nonempty. Furthermore, assume that C_x is bounded or that the matrix A^T A is invertible. Then the sequence {x^(k), y^(k), w^(k)} generated by (15) is bounded, and every limit point of {x^(k)} is an optimal solution of the original problem (13).
4.4. ADMM Extension to Tensor Recovery
We observe that (12) is well structured in the sense that a separable structure emerges in both the objective function and the constraints. Thus, we propose an algorithm based on an extension of the classical ADMM approach to solve the tensor recovery problem by taking advantage of this favorable structure.
The augmented Lagrangian of (12) is
(16) L_A(ℒ, 𝒮, M_i, N_i) = (1/2γ)∥𝒜 − ℒ − 𝒮∥_F²
     + ∑_{i=1}^{n} (λ_i∥M_i∥_* + ⟨Y_i, P_iℒ − M_i⟩ + (α_i/2)∥P_iℒ − M_i∥_F²)
     + ∑_{i=1}^{n} (η_i∥N_i∥_1 + ⟨Z_i, P_i𝒮 − N_i⟩ + (β_i/2)∥P_i𝒮 − N_i∥_F²),
where Yi, Zi are Lagrangian multipliers and αi, βi>0 are penalty parameters.
We can now directly apply ADMM to this augmented Lagrangian function.
Computing M_i. The optimal M_i can be solved, with all other variables held constant, via the following subproblem:
(17) min_{M_i} λ_i∥M_i∥_* + ⟨Y_i, P_iℒ − M_i⟩ + (α_i/2)∥P_iℒ − M_i∥_F².
As shown in [20], the optimal solution of (17) is given by
(18) M̂_i = U_i D_{λ_i/α_i}(Λ) V_i^T,
where UiΛViT is the singular value decomposition given by
(19) U_i Λ V_i^T = P_iℒ + Y_i/α_i,
and the “shrinkage” operator Dτ(x) with τ>0 is defined as
(20) D_τ(x) = { x − τ, if x > τ;  x + τ, if x < −τ;  0, otherwise }.

Computing N_i. The optimal N_i can be solved, with all other variables held constant, via the following subproblem:
(21) min_{N_i} η_i∥N_i∥_1 + ⟨Z_i, P_i𝒮 − N_i⟩ + (β_i/2)∥P_i𝒮 − N_i∥_F².
By the well-known l1 minimization [21], the optimal solution of (21) is
(22) N̂_i = D_{η_i/β_i}(P_i𝒮 + Z_i/β_i),
where Dτ is the “shrinkage” operation.
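The two closed-form updates above can be sketched in a few lines of NumPy: soft thresholding implements D_τ from (20) and (22), and singular value thresholding implements the M_i update of (18)–(19). The function names and the argument layout are our own.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding operator D_tau from (20)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def update_M(PiL, Yi, lam_i, alpha_i):
    """Closed-form M_i update of (18)-(19): singular value thresholding
    of P_i L + Y_i / alpha_i at level lam_i / alpha_i."""
    U, s, Vt = np.linalg.svd(PiL + Yi / alpha_i, full_matrices=False)
    return U @ np.diag(shrink(s, lam_i / alpha_i)) @ Vt
```

When the threshold λ_i/α_i exceeds the largest singular value, the update returns the zero matrix, mirroring how the nuclear norm term suppresses small singular values.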
Computing ℒ. Now we fix all variables except ℒ and minimize LA over ℒ. The resulting minimization problem is the minimization of a quadratic function:
(23) min_ℒ L_A(ℒ) = (1/2γ)∥𝒜 − ℒ − 𝒮∥_F² + ∑_{i=1}^{n} (⟨Y_i, P_iℒ − M_i⟩ + (α_i/2)∥P_iℒ − M_i∥_F²).
The objective function is differentiable, so the minimizer ℒmin is characterized by (∂LA(ℒ))/∂ℒ=0. Thus, we obtain
(24) ℒ_min = (𝒜 − 𝒮 − γ ∑_{i=1}^{n} P_i^T (Y_i − α_i M_i)) / (1 + γ ∑_{i=1}^{n} α_i).

Computing 𝒮. Now we fix all variables except 𝒮 and minimize L_A over 𝒮. The resulting problem is again the minimization of a quadratic function:
(25) min_𝒮 L_A(𝒮) = (1/2γ)∥𝒜 − ℒ − 𝒮∥_F² + ∑_{i=1}^{n} (⟨Z_i, P_i𝒮 − N_i⟩ + (β_i/2)∥P_i𝒮 − N_i∥_F²).
The objective function is also differentiable, so the minimizer 𝒮min is characterized by (∂LA(𝒮))/∂𝒮=0. Thus, we have
(26) 𝒮_min = (𝒜 − ℒ − γ ∑_{i=1}^{n} P_i^T (Z_i − β_i N_i)) / (1 + γ ∑_{i=1}^{n} β_i).
For comparison with RSTD [22], we also use the change of ℒ and 𝒮 between successive iterations, measured against a certain tolerance, as the stopping criterion. The pseudocode of the proposed ADMM-TR algorithm is summarized in Algorithm 1.
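Putting the updates (18)–(26) together, one possible compact sketch of the ADMM-TR loop is given below. The helper operators are restated so the sketch is self-contained; the initializations, dense SVD, and interface are our illustrative assumptions (the paper's implementation uses a Lanczos-based partial SVD), not the authors' exact code.

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def fold(M, n, shape):
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(M, full, order="F"), 0, n)

def shrink(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def admm_tr(A, lam, eta, alpha, beta, gamma, max_iter=500, tol=1e-6):
    """Sketch of ADMM-TR: alternately apply updates (18)-(26)."""
    n = A.ndim
    L, S = np.zeros_like(A), np.zeros_like(A)
    M = [unfold(L, i) for i in range(n)]
    N = [unfold(S, i) for i in range(n)]
    Y = [np.zeros_like(M[i]) for i in range(n)]
    Z = [np.zeros_like(N[i]) for i in range(n)]
    for _ in range(max_iter):
        L_old, S_old = L, S
        for i in range(n):  # M_i update: singular value thresholding, (18)-(19)
            U, s, Vt = np.linalg.svd(unfold(L, i) + Y[i] / alpha[i], full_matrices=False)
            M[i] = U @ np.diag(shrink(s, lam[i] / alpha[i])) @ Vt
        for i in range(n):  # N_i update: soft thresholding, (22)
            N[i] = shrink(unfold(S, i) + Z[i] / beta[i], eta[i] / beta[i])
        # L update, (24); fold implements the P_i^T mapping
        acc = sum(fold(Y[i] - alpha[i] * M[i], i, A.shape) for i in range(n))
        L = (A - S - gamma * acc) / (1.0 + gamma * sum(alpha))
        # S update, (26)
        acc = sum(fold(Z[i] - beta[i] * N[i], i, A.shape) for i in range(n))
        S = (A - L - gamma * acc) / (1.0 + gamma * sum(beta))
        for i in range(n):  # multiplier updates
            Y[i] += alpha[i] * (unfold(L, i) - M[i])
            Z[i] += beta[i] * (unfold(S, i) - N[i])
        # stop when L and S barely change between iterations
        if max(np.linalg.norm(L - L_old), np.linalg.norm(S - S_old)) < tol:
            break
    return L, S
```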
Theorem 2.
Assume that the optimal solution set X* of (11) is nonempty. Then the sequence {ℒ^(k), 𝒮^(k), M_i^(k), N_i^(k), Y_i^(k), Z_i^(k)} generated by the proposed ADMM-TR algorithm is bounded, and every limit point of {ℒ^(k), 𝒮^(k)} is an optimal solution of the original problem (11).
Proof.
We check the assumptions of Theorem 1. C_x is not bounded, but P_i^T P_i = I is a constant multiple of the identity operator, so the invertibility assumption holds. Thus Theorem 1 also applies to ADMM-TR, and Theorem 2 follows.
5. Numerical Experiments
This section evaluates the empirical performance of the proposed algorithm on synthetic data and compares the results with RSTD (Rank Sparsity Tensor Decomposition) [22]. In addition, experiments on traffic volume outlier recovery illustrate the efficiency of the proposed method in the traffic research field.
We use the Lanczos algorithm for computing the singular value decomposition and adopt the same rule for predicting the dimension of the principal singular space as [22]. The parameters are set as α = β = [I_1/I_max, I_2/I_max, …, I_n/I_max]^T and γ = 1/sum([I_1/I_max, I_2/I_max, …, I_n/I_max]) for all experiments, where I_max = max{I_i}. η is set to 1/I_max as suggested in [23].
All the experiments are conducted and timed on the same desktop with a Pentium(R) Dual-Core 2.50 GHz CPU and 4 GB memory, running Windows 7 and MATLAB.
5.1. Synthetic Data
ADMM-TR and RSTD are tested on synthetic data of size 40×40×40. We generate a "core tensor" 𝒞 ∈ ℝ^{r×⋯×r} filled with Gaussian distributed entries (~𝒩(0,1)). Then, we generate matrices U^(1), …, U^(N), with U^(i) ∈ ℝ^{n_i×r}, whose elements are also i.i.d. Gaussian random variables (~𝒩(0,1)), and set
(27) ℒ_0 = 𝒞 ×_1 U^(1) ×_2 ⋯ ×_N U^(N).
The entries of the sparse tensor 𝒮_0 are independently distributed, each taking the value 0 with probability 1 − spr and an impulsive value with probability spr. We apply the proposed algorithm to the tensor 𝒜_0 = ℒ_0 + 𝒮_0 to recover ℒ and 𝒮 and compare with RSTD. For these experiments, two cases of n-rank are investigated: n-rank = [5, 5, 5] and n-rank = [10, 10, 10]. Tables 2 and 3 present the average results (across 30 instances) for different spr.
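The synthetic data construction of (27) can be sketched as follows. The magnitude of the impulsive outlier values is not specified in the text, so the `magnitude` parameter below is an assumption of ours, as are the function name and seed handling.

```python
import numpy as np

def make_synthetic(dims, r, spr, magnitude=10.0, seed=0):
    """Low-n-rank ground truth L0 as in (27) plus a sparse outlier
    tensor S0 whose entries are impulsive with probability spr."""
    rng = np.random.default_rng(seed)
    L0 = rng.standard_normal((r,) * len(dims))            # core tensor C
    for n, dim in enumerate(dims):                        # C x_1 U1 ... x_N UN
        U = rng.standard_normal((dim, r))
        L0 = np.moveaxis(np.tensordot(U, L0, axes=(1, n)), 0, n)
    mask = rng.random(dims) < spr                         # outlier locations
    S0 = mask * rng.choice([-magnitude, magnitude], size=dims)
    return L0, S0, L0 + S0
```

By construction every mode-n unfolding of ℒ_0 has rank at most r, which is what makes the recovery problem well posed.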
The quality of recovery is measured by the relative square error (RSE) to ℒ0 and 𝒮0, defined to be
(28) RSE_ℒ0 = ∥ℒ̂ − ℒ_0∥_F / ∥ℒ_0∥_F,  RSE_𝒮0 = ∥𝒮̂ − 𝒮_0∥_F / ∥𝒮_0∥_F.
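The RSE of (28) is a one-line computation; a small helper (our own naming) makes the evaluation reproducible:

```python
import numpy as np

def rse(X_hat, X0):
    """Relative square error of (28): ||X_hat - X0||_F / ||X0||_F."""
    return np.linalg.norm(X_hat - X0) / np.linalg.norm(X0)
```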
Tables 2 and 3 show that the proposed algorithm (ADMM-TR) is about 10 percent faster than the RSTD algorithm proposed in [22] and achieves better accuracy in terms of relative square error. Though both algorithms compute an SVD per iteration, we observe that the proposed algorithm takes far fewer iterations than RSTD to converge to the optimal solution.
Table 2: Comparison of ADMM-TR with RSTD for synthetic data where 𝒜_0 ∈ ℝ^{40×40×40}, n-rank = [5, 5, 5]. RSE values are ×10⁻³.

     |               ADMM-TR               |                RSTD
spr  | RSE_ℒ0 | RSE_𝒮0 | # iter | Time (s) | RSE_ℒ0 | RSE_𝒮0 | # iter | Time (s)
0.05 |  4.3   |  4.3   |  133   |  13.7    |  4.6   |  4.6   |  208   |  17.8
0.15 |  9.8   |  4.2   |  162   |  20.8    | 10.5   |  4.7   |  235   |  33.6
0.25 | 12.3   |  4.3   |  514   |  44.9    | 15.9   |  5.2   |  676   |  51.0
0.35 | 56.5   |  9.5   |  654   |  53.3    | 67.5   | 11.4   |  737   |  57.5
Table 3: Comparison of ADMM-TR with RSTD for synthetic data where 𝒜_0 ∈ ℝ^{40×40×40}, n-rank = [10, 10, 10]. RSE values are ×10⁻³.

     |               ADMM-TR               |                RSTD
spr  | RSE_ℒ0 | RSE_𝒮0 | # iter | Time (s) | RSE_ℒ0 | RSE_𝒮0 | # iter | Time (s)
0.05 |  4.4   |  2.0   |  236   |  26.4    |  4.7   |  2.5   |  338   |  29.9
0.15 |  4.7   |  2.1   |  417   |  42.1    |  5.2   |  2.5   |  603   |  50.8
0.25 |  8.9   |  2.9   |  664   |  60.8    | 12.0   |  4.2   |  663   |  53.6
0.35 | 17.3   |  5.2   |  981   |  80.6    | 21.1   |  6.5   | 1110   |  87.7
The more impulsive entries are added, that is, the higher the value of spr, the harder the tensor recovery problem becomes. In addition, the problem becomes harder when the n-rank is higher for ground truth tensors of the same size. In Tables 2 and 3, different spr values are set for the two tensor cases, and the results show that the recovery accuracy decreases as spr grows. In particular, the recovery accuracy for the tensor 𝒜_0 ∈ ℝ^{40×40×40} with n-rank = [5, 5, 5] decreases sharply when spr approaches 40%, while for the tensor with n-rank = [10, 10, 10] this occurs when spr is about 25%. This is because relatively low rank and a low sparsity ratio are preconditions of the tensor recovery problem.
5.2. Traffic Volume Data
To evaluate the performances of the proposed method in traffic volume data outlier recovery, a complete traffic volume data set is used as the test data set. We use the data of a fixed point in Sacramento County which is downloaded from http://pems.dot.ca.gov/. The traffic volume data are recorded every 5 minutes. Therefore, a daily traffic volume series for a loop detector contains 288 records, and the whole period of the data lasts for 16 days, that is, from August 2 to August 17, 2010.
Based on multiple correlations of the traffic volume data, we model the data set as a tensor model of size 16×24×12 which stands for 16 days, 24 hours in a day, and 12 sample intervals (i.e., recorded by 5 minutes) per hour. The ratios of outlier data are set from 5% to 15% and the outlier data are produced randomly. All the results are averaged by 10 instances.
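The tensor construction described above is a plain reshape of the detector's 5-minute series into a day × hour × interval array; a minimal sketch (helper name ours):

```python
import numpy as np

def build_traffic_tensor(volume_series, days=16, hours=24, samples=12):
    """Reshape a flat 5-minute volume series (one detector, 16 days of
    288 records each) into the 16 x 24 x 12 tensor used in the text."""
    v = np.asarray(volume_series, dtype=float)
    assert v.size == days * hours * samples
    return v.reshape(days, hours, samples)
```

Element [d, h, s] is then the volume of the s-th 5-minute interval of hour h on day d, so the day, hour, and interval correlations each occupy one tensor mode.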
For the real world data, we mainly pay attention to the correctness of the recovered traffic volume data. Thus the quality of recovery is measured by the relative square error (RSE) and the Mean Absolute Percentage Error (MAPE) with respect to ℒ_0, defined as
(29) RSE_ℒ0 = ∥ℒ̂ − ℒ_0∥_F / ∥ℒ_0∥_F,  MAPE_ℒ0 = (1/M) ∑_{m=1}^{M} |(t_r(m) − t_e(m)) / t_r(m)|,
where t_r(m) and t_e(m) are the mth known real value and recovered value, respectively, and M denotes the number of recovered traffic volumes.
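The MAPE of (29) averages the absolute relative error over the recovered entries; a small helper (our own naming) mirroring the definition:

```python
import numpy as np

def mape(true_vals, rec_vals):
    """Mean Absolute Percentage Error of (29) over M recovered entries."""
    t = np.asarray(true_vals, dtype=float)
    e = np.asarray(rec_vals, dtype=float)
    return np.mean(np.abs((t - e) / t))
```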
Table 4 presents the relative errors of the traffic volume outlier data before and after recovery by ADMM-TR. The results show that the RSE_ℒ0 and MAPE_ℒ0 of the traffic volume data corrupted by outliers are about 5 times those of the data recovered by ADMM-TR. Figures 1, 2, and 3 present the profiles of traffic volume data for one day. The results show that ADMM-TR recovers the traffic volume outlier data with excellent performance.
Table 4: Traffic volume outlier data before and after being recovered.

     |       Before recovery        |        After recovery
spr  | RSE_ℒ0 (e-3) | MAPE_ℒ0 (e-3) | RSE_ℒ0 (e-3) | MAPE_ℒ0 (e-3)
0.05 |   0.5005     |   0.2136      |   0.0961     |   0.0314
0.10 |   0.7066     |   0.5008      |   0.1417     |   0.1027
0.15 |   0.8850     |   0.8777      |   0.2221     |   0.2351
Figure 1: Comparison of the raw traffic volume data, the data corrupted by outliers at a 5% ratio, and the data recovered by ADMM-TR.
Figure 2: Comparison of the raw traffic volume data, the data corrupted by outliers at a 10% ratio, and the data recovered by ADMM-TR.
Figure 3: Comparison of the raw traffic volume data, the data corrupted by outliers at a 15% ratio, and the data recovered by ADMM-TR.
6. Conclusions
In this paper, we concentrated on the mathematical problem of traffic volume outlier data recovery and proposed a novel tensor recovery method based on the alternating direction method of multipliers (ADMM). The proposed algorithm can automatically separate the low-n-rank tensor data and the sparse part. The experiments show that the proposed method is more stable and accurate in most cases and has an excellent convergence rate. Experiments on real world traffic volume data demonstrate the practicability and effectiveness of the proposed method in the traffic research domain.
In the future, we would like to investigate how to automatically choose the parameters in our algorithm and explore additional applications of our method in traffic research domain.
Acknowledgments
This research was supported by NSFC (Grants no. 61271376, 61171118, and 91120015) and the Beijing Natural Science Foundation (4122067). The authors would like to thank Professor Bin Ran from the University of Wisconsin-Madison and Yong Li from the University of Notre Dame for helpful discussions.
References
[1] K. Tamura and M. Hirayama, "Toward realization of VICS—vehicle information and communications system," in Proceedings of the IEEE-IEE Vehicle Navigation and Information Systems Conference, October 1993.
[2] S. Kim, M. E. Lewis, and C. C. White III, "Optimal vehicle routing with real-time traffic information," IEEE Transactions on Intelligent Transportation Systems, vol. 6, no. 2, pp. 178–188, 2005.
[3] P. Mirchandani and L. Head, "A real-time traffic signal control system: architecture, algorithms, and analysis," Transportation Research Part C, vol. 9, no. 6, pp. 415–432, 2001.
[4] B. M. Williams, P. K. Durvasula, and D. E. Brown, "Urban freeway traffic flow prediction: application of seasonal autoregressive integrated moving average and exponential smoothing models," Transportation Research Record, no. 1644, pp. 132–141, 1998.
[5] J. Xu, X. Li, and H. Shi, "Short-term traffic flow forecasting model under missing data," vol. 30, no. 4, pp. 1117–1120, 2010.
[6] Y. Pei and J. Ma, "Real-time traffic data screening and reconstruction," 2003.
[7] S. Chen, W. Wang, and W. Li, "Noise recognition and noise reduction of real-time traffic data," vol. 36, no. 2, pp. 322–325, 2006.
[8] D. Boto-Giralda, F. J. Díaz-Pernas, D. González-Ortega, et al., "Wavelet-based denoising for traffic volume time series forecasting with self-organizing neural networks," Computer-Aided Civil and Infrastructure Engineering, vol. 25, no. 7, pp. 530–545, 2010.
[9] S. Gandy, B. Recht, and I. Yamada, "Tensor completion and low-n-rank tensor recovery via convex optimization," Inverse Problems, vol. 27, no. 2, 025010, 2011.
[10] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[11] A. S. Lewis and G. Knowles, "Image compression using the 2-D wavelet transform," IEEE Transactions on Image Processing, vol. 1, no. 2, pp. 244–250, 1992.
[12] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
[13] J. Wright, Y. Peng, Y. Ma, and A. Ganesh, "Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), Vancouver, Canada, pp. 2080–2088, 2009.
[14] A. Ganesh, Z. Lin, J. Wright, L. Wu, M. Chen, and Y. Ma, "Fast algorithms for recovering a corrupted low-rank matrix," in Proceedings of the 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP '09), pp. 213–216, December 2009.
[15] Z. Lin, M. Chen, L. Wu, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," 2009.
[16] X. Yuan and J. Yang, "Sparse and low-rank matrix decomposition via alternating direction methods," Department of Mathematics, Hong Kong Baptist University, 2009.
[17] L. Qu, Y. Zhang, J. Hu, L. Jia, and L. Li, "A BPCA based missing value imputing method for traffic flow volume data," in Proceedings of the IEEE Intelligent Vehicles Symposium (IV '08), pp. 985–990, June 2008.
[18] J. Liu, P. Musialski, P. Wonka, and J. Ye, "Tensor completion for estimating missing values in visual data," in Proceedings of the International Conference on Computer Vision (ICCV '09), 2009.
[19] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, 1989.
[20] J.-F. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
[21] E. T. Hale, W. Yin, and Y. Zhang, "Fixed-point continuation for ℓ1-minimization: methodology and convergence," SIAM Journal on Optimization, vol. 19, no. 3, pp. 1107–1130, 2008.
[22] Y. Li, J. Yan, Y. Zhou, and J. Yang, "Optimum subspace learning and error correction for tensors," in Proceedings of the 11th European Conference on Computer Vision (ECCV '10), Crete, Greece, 2010.
[23] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," Journal of the ACM, vol. 58, no. 3, article 11, 2011.