Mixture Augmented Lagrange Multiplier Method for Tensor Recovery and Its Applications

The problem of data recovery in multiway arrays (i.e., tensors) arises in many fields such as computer vision, image processing, and traffic data analysis. In this paper, we propose a scalable and fast algorithm for recovering a low-n-rank tensor with an unknown fraction of its entries arbitrarily corrupted. In the new algorithm, the tensor recovery problem is formulated as a mixture convex multilinear Robust Principal Component Analysis (RPCA) optimization problem that minimizes a sum of the nuclear norm and the ℓ1-norm. The problem is well structured in both the objective function and the constraints, and we apply the augmented Lagrange multiplier method, which exploits this structure to solve the problem efficiently. In the experiments, the algorithm is compared with a state-of-the-art algorithm on both synthetic data and real data, including traffic data, image data, and video data.


Introduction
A tensor is a multidimensional array. It is the higher-order generalization of vectors and matrices, with many applications in information sciences, computer vision, graph analysis [1], and traffic data analysis [2][3][4]. In the real world, the size and redundancy of data are growing rapidly, and nearly all existing high-dimensional real-world data either have the natural form of a tensor (e.g., multichannel images) or can be grouped into tensor form (e.g., tensor faces [5], the traffic data tensor model [2][3][4], and videos); as a result, challenges arise in many scientific areas when one confronts such high-dimensional data. In many tasks, one wants to capture the underlying low-dimensional structure of the tensor data or to detect its irregular sparse patterns, as in image compression [6], foreground segmentation [7], saliency detection [8], and traffic data completion [2,3]. It is therefore desirable to develop algorithms that can capture the low-dimensional structure or the irregular sparse patterns in high-dimensional tensor data.
In the two-dimensional case, that is, the matrix case, "rank" and "sparsity" are the most useful tools for matrix-valued data analysis. Chandrasekaran et al. [9] proposed the concept of "rank-sparsity incoherence" to characterize the fundamental identifiability of recovering the low-rank and sparse components. Wright et al. [10] and Candès et al. [11] demonstrated that if the irregular sparse matrix S is sufficiently sparse (relative to the rank of L), one can accomplish the sparse and low-rank recovery by solving the following convex optimization problem:

min_{L,S} ‖L‖_* + λ‖S‖_1  subject to  D = L + S,

where D ∈ R^{m×n} is the given matrix to be recovered; L ∈ R^{m×n} is the low-rank component of D; S ∈ R^{m×n} is the sparse component of D; ‖·‖_* denotes the nuclear norm, defined as the sum of all singular values; ‖·‖_1 denotes the sum of the absolute values of the matrix entries; and λ is a positive weighting parameter. This optimization method is called Robust Principal Component Analysis [10,11] (RPCA) or Principal Component Pursuit (PCP) because of its ability to exactly recover the underlying low-rank matrix even when it is corrupted by large entries or outliers.
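As an illustration, the convex program above can be solved by an inexact augmented Lagrange multiplier scheme, alternating singular value thresholding for the nuclear norm with soft thresholding for the ℓ1 norm. The following numpy sketch is ours, not the authors' implementation; the choices λ = 1/√max(m, n), the initial penalty, and the growth factor 1.5 are common heuristics and should be treated as assumptions.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding (shrinkage) operator."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca_alm(D, lam=None, tol=1e-7, max_iter=500):
    """Sketch of min ||L||_* + lam*||S||_1  s.t.  D = L + S  via inexact ALM."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.norm(D, 2)       # heuristic initial penalty
    mu_max = mu * 1e7                      # cap the penalty growth
    Y = np.zeros_like(D)                   # Lagrange multiplier for D = L + S
    S = np.zeros_like(D)
    norm_D = np.linalg.norm(D, 'fro')
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)      # nuclear-norm subproblem
        S = shrink(D - L + Y / mu, lam / mu)   # l1-norm subproblem
        R = D - L - S                          # constraint residual
        Y = Y + mu * R                         # multiplier update, step size mu
        mu = min(mu * 1.5, mu_max)             # grow the penalty
        if np.linalg.norm(R, 'fro') / norm_D < tol:
            break
    return L, S
```

On synthetic low-rank-plus-sparse matrices this simple loop typically recovers both components to high accuracy within a few dozen iterations.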
Although the low-rank matrix recovery problem has been well studied, there is not much work on tensors. Li et al. [12] derived a method for the optimal rank-(r1, r2, r3) tensor decomposition model. For a real N-mode tensor A ∈ R^{I1×I2×⋯×IN}, the best rank-(r1, ..., rN) approximation is the tensor A* ∈ R^{I1×I2×⋯×IN} with prespecified ranks rank(A*_(n)) = r_n that minimizes the least-squares cost function ‖A − A*‖_F². The n-rank conditions imply that A* has a Tucker decomposition [13] of the form A* ≈ Z ×_1 U_1 ×_2 U_2 ⋯ ×_N U_N. For the application, they applied the model to high-dimensional tensor-like visual data by decomposing the observed tensor into a low-dimensional structure plus unbounded but sparse irregular patterns: A = L + S. Under the assumption that the n-rank of L is small and the corruption S is sparse, they formulated a tensor counterpart of the matrix recovery problem (their problem (3)) and extended the matrix robust PCA problem to the tensor case. A relaxation technique was used to separate the dependent relationships, and the block coordinate descent (BCD) method was used to solve the low-n-rank tensor recovery problem; the result is the rank sparsity tensor decomposition (RSTD) algorithm. In fact, their algorithm can be seen as a basic version of the Lagrange multiplier method. Although simple and provably correct, RSTD requires a very large number of iterations to converge, and it is difficult to choose its parameters for speedup. Moreover, owing to the properties of the basic Lagrange multiplier method, the accuracy of its results leaves room for improvement.
In this paper, a new algorithm for low-n-rank tensor recovery, termed the Mixture Augmented Lagrange Multiplier Method for Tensor Recovery (MALM-TR), is proposed. In the new algorithm, analogous to RSTD [12], we convert the tensor recovery problem into a mixture convex optimization problem by adopting a relaxation strategy that eliminates the interdependent trace norm and ℓ1 norm constraints. The elements involved in the resulting problem are all matrices, so it can be treated as a multilinear extension of the RPCA problem and subsumes the matrix RPCA problem as a special case. Lin et al. [14] proved that the matrix RPCA problem can be solved by ALM with higher precision, lower storage/memory demands, and a pleasing Q-linear convergence speed. Inspired by these merits, we extend the augmented Lagrange multiplier method (ALM) to the multilinear RPCA problem and show that ALM is suitable not only for the matrix RPCA problem but also for the multilinear RPCA problem.
The algorithm is then applied to real-world data recovery tasks, including traffic data recovery, image restoration, and background modeling.
In the traffic data analysis area, detector and communication malfunctions often corrupt traffic data with noise, especially outlier noise, which has a great impact on the performance of Intelligent Transportation Systems (ITS). It is therefore essential to handle outlier data in order to fully exploit the data and realize ITS applications. In the application part of this paper, we use the tensor form to model traffic data, which encodes the multimode (e.g., week, day, record) correlations of the traffic data simultaneously and preserves its multiway nature. For example, suppose a loop detector collects traffic volume data every 15 minutes, giving 96 records per day. If we have 20 weeks of traffic volume data, these data can be formed into a tensor of size 20 × 7 × 96. The proposed tensor-based method, which mines the multimode correlations of the traffic data, is then used to remove outlier noise from the data.
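The (week, day, record) tensor described above is just a reshape of the chronological record stream. A minimal numpy sketch, using a stand-in array in place of real detector data:

```python
import numpy as np

# Hypothetical flat record stream: 20 weeks of volume counts, one value per
# 15-minute interval, i.e. 20 * 7 * 96 = 13440 readings in chronological order.
flat = np.arange(20 * 7 * 96, dtype=float)   # stand-in for real detector data

# Reshape into the (week, day, record) tensor described in the text.
traffic = flat.reshape(20, 7, 96)

# Entry [w, d, t] is the volume in week w, day d, 15-minute slot t.
print(traffic.shape)   # (20, 7, 96)
```

Because the reshape is row-major, entry [w, d, t] corresponds to the (w·7·96 + d·96 + t)-th chronological reading.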
A multichannel image can likewise be seen as a tensor with multiple dimensions. For example, an RGB image has three channels, Red, Green, and Blue, and can therefore be represented as a width × height × 3 three-dimensional tensor. For this application, the proposed method is used to remove noise from images. Though the method would not be reasonable for some natural images, it has many applications for visual data such as structured images (e.g., the façade image), CT/MRI data, and multispectral images. Besides images, video data can be grouped into tensor form. For example, a video with 300 gray frames, each of size 200 × 200, forms a tensor of size 200 × 200 × 300. For the video application, the proposed method is used for background modeling.
The rest of the paper is organized as follows. Section 2 presents some notations and states some basic properties of tensors. Section 3 discusses the detailed process of our proposed algorithm. Section 4 tests the algorithm in different settings, varying from simulated data to applications in computer vision, image processing, and traffic data recovery. Finally, some concluding remarks are provided in Section 5.
The n-rank of an N-dimensional tensor A ∈ R^{I1×I2×⋯×IN}, denoted r_n, is the rank of the mode-n unfolding matrix A_(n). If the n-ranks are small relative to the size of the tensor, we call it a low-n-rank tensor.
The inner product of two same-size tensors A, B ∈ R^{I1×I2×⋯×IN} is defined as the sum of the products of their entries, ⟨A, B⟩ = Σ_{i1,...,iN} a_{i1⋯iN} b_{i1⋯iN}. The corresponding Frobenius norm is ‖A‖_F = √⟨A, A⟩. The ℓ0 norm of a tensor A, denoted ‖A‖_0, is the number of nonzero elements of A, and the ℓ1 norm is ‖A‖_1 = Σ_{i1,...,iN} |a_{i1⋯iN}|. The n-mode (matrix) product of a tensor A ∈ R^{I1×I2×⋯×IN} with a matrix U ∈ R^{J×In} is denoted by A ×_n U and is of size I1 × ⋯ × I(n−1) × J × I(n+1) × ⋯ × IN. In terms of the flattened matrices, the n-mode product can be expressed as Y = A ×_n U ⇔ Y_(n) = U A_(n).
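These definitions translate directly into code. A small numpy sketch of the mode-n unfolding, its inverse, the n-mode product via the flattened identity, and the n-rank (the helper names are ours):

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding A_(n): mode n becomes the rows, the rest the columns."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold: rebuild a tensor of the given shape from M = A_(n)."""
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(full), 0, n)

def mode_product(A, U, n):
    """n-mode product A x_n U, using (A x_n U)_(n) = U @ A_(n)."""
    out_shape = A.shape[:n] + (U.shape[0],) + A.shape[n + 1:]
    return fold(U @ unfold(A, n), n, out_shape)

def n_rank(A):
    """The n-rank: the rank of each mode-n unfolding."""
    return tuple(np.linalg.matrix_rank(unfold(A, n)) for n in range(A.ndim))
```

Note that `fold(unfold(A, n), n, A.shape)` recovers A exactly, which is a convenient sanity check for the index ordering.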

MALM-TR
This section is separated into two parts. In Section 3.1, we convert the low-n-rank tensor recovery problem into a multilinear RPCA problem. Section 3.2 briefly introduces the ALM approach, extends it to solve the multilinear RPCA problem, and presents the details of the proposed algorithm.

The Multilinear RPCA Problem. The derivation starts
with the general version [10] of the matrix recovery problem:

min_{L,S} rank(L) + λ‖S‖_0  subject to  D = L + S,  (8)

where D ∈ R^{m×n} is the given matrix to be recovered; L ∈ R^{m×n} is the low-rank component of D; S ∈ R^{m×n} is the sparse component of D; rank(L) denotes the rank of L; ‖·‖_0 denotes the number of nonzero matrix entries; and λ is a positive weighting parameter. The higher-order tensor recovery problem can be generated from the matrix (i.e., 2nd-order tensor) case by utilizing the form of (8), leading to the following formulation:

min_{L,S} rank_CP(L) + λ‖S‖_0  subject to  A = L + S,  (9)

where L, S, A are N-mode tensors of identical size in each mode; A is the observed tensor data; L and S represent the corresponding structured part and irregular sparse part, respectively; and rank_CP(X) is the minimum number of rank-1 tensors that generate X as their sum [15,16]. However, (9) is intractable: there is no straightforward algorithm to determine the CP-rank of a given tensor, and the ℓ0 norm is highly nonconvex. When the given tensor has low n-rank, we can use the n-ranks of the unfoldings of A instead of the CP-rank to capture the global information of the tensor; that is, we can minimize the n-ranks rather than the CP-rank. Obviously, ‖S‖_0 is equal to ‖S_(n)‖_0 for every mode n. As a result, a function that minimizes the n-ranks of the given tensor replaces (9):

min_{L,S} Σ_{n=1}^{N} rank(L_(n)) + λ‖S_(n)‖_0  subject to  A = L + S.  (10)

In order to utilize the information of each mode as much as possible, the n-rank minimization problems of the modes are combined with weighted parameters, and the rank and ℓ0 terms are replaced by their convex surrogates, the trace (nuclear) norm and the ℓ1 norm, as defined in [17,18]. The tensor recovery problem thus becomes

min_{L,S} Σ_{n=1}^{N} ( α_n‖L_(n)‖_* + λ_n‖S_(n)‖_1 )  subject to  A = L + S.  (12)

Problem (12) is still hard to solve because the trace norm and ℓ1 norm terms are interdependent. To simplify it, we introduce additional auxiliary matrices M_(n) = L_(n) and N_(n) = S_(n) and treat the equalities as constraints to be relaxed; replacing the hard equality with a bound of the form ‖A_(n) − L_(n) − S_(n)‖_F ≤ ε corresponds to the stable Principal Component Pursuit (sPCP) in the matrix case [19]. Finally, we obtain the relaxed form of (12):

min_{L, S, M_(n), N_(n)} Σ_{n=1}^{N} ( α_n‖M_(n)‖_* + λ_n‖N_(n)‖_1 )  subject to  M_(n) = L_(n),  N_(n) = S_(n),  A_(n) = L_(n) + S_(n),  n = 1, ..., N.  (13)

Optimization Process.
In [20], the general method of augmented Lagrange multipliers is introduced for solving constrained optimization problems of the kind

min f(x)  subject to  h(x) = 0,

where f : R^n → R and h : R^n → R^m. The augmented Lagrange function is defined as

L(x, y, μ) = f(x) + ⟨y, h(x)⟩ + (μ/2)‖h(x)‖²,

where μ is a positive scalar; the optimization problem can then be solved via the method of augmented Lagrange multipliers (see [21] for more details).
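To make the scheme concrete, here is a toy sketch of the method on a hypothetical problem of our own choosing, min x1² + x2² subject to x1 + x2 − 1 = 0 (optimum x* = (0.5, 0.5)), with plain gradient descent as the inner solver:

```python
import numpy as np

# Toy instance: f(x) = x1^2 + x2^2, h(x) = x1 + x2 - 1; optimum x* = (0.5, 0.5).
def f_grad(x):
    return 2.0 * x

def h(x):
    return x[0] + x[1] - 1.0

def h_grad(x):
    return np.ones(2)

def alm(max_outer=40, inner_steps=500):
    x = np.zeros(2)
    y = 0.0      # Lagrange multiplier estimate
    mu = 1.0     # penalty parameter
    for _ in range(max_outer):
        # The augmented Lagrangian's Hessian here is 2I + mu*ones*ones^T,
        # so a step of 1/(2 + 2*mu) keeps gradient descent stable.
        lr = 1.0 / (2.0 + 2.0 * mu)
        for _ in range(inner_steps):
            # grad_x L = grad f + (y + mu*h(x)) * grad h
            x = x - lr * (f_grad(x) + (y + mu * h(x)) * h_grad(x))
        y += mu * h(x)   # multiplier update with step size mu
        mu *= 1.5        # gradually strengthen the penalty
        if abs(h(x)) < 1e-8:
            break
    return x
```

The pattern (approximately minimize the augmented Lagrangian, update the multiplier with step size μ, grow μ) is exactly the structure the proposed algorithm applies to problem (13), with the gradient inner loop replaced by closed-form subproblem solutions.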
It is observed that (13) is well structured: a separable structure emerges in both the objective function and the constraint conditions. We therefore convert (13) into its augmented Lagrange function (17). The core idea of solving the optimization problem in (17) is to optimize one group of variables while fixing the other groups. The variables M_(1), ..., M_(N), N_(1), ..., N_(N), L, S can be divided into 2N + 2 groups. To achieve the optimal solution, the method estimates M_(n), N_(n), S, L sequentially, followed by certain refinements in each iteration.
Computing M_(n). The optimal M_(n) with all other variables fixed is the solution to the following subproblem:

min_{M_(n)} α_n‖M_(n)‖_* + (β_n/2)‖M_(n) − Q_(n)‖_F²,  (18)

where Q_(n) collects the terms of the augmented Lagrangian involving M_(n). As shown in [22], the optimal solution of (18) is given by

M_(n) = U D_{α_n/β_n}(Λ) V^T,

where U Λ V^T is the singular value decomposition of Q_(n) and D_τ is the "shrinkage" operator. For τ > 0, the shrinkage operator is defined as

D_τ(x) = sgn(x) · max(|x| − τ, 0),

and it extends to matrices or tensors by applying it to each element.
Computing N_(n). The optimal N_(n) with all other variables fixed is the solution to the following subproblem:

min_{N_(n)} λ_n‖N_(n)‖_1 + (γ_n/2)‖N_(n) − P_(n)‖_F²,  (22)

where P_(n) collects the fixed terms. By well-known ℓ1 norm minimization [23], the optimal solution of (22) is N_(n) = D_{λ_n/γ_n}(P_(n)).

Computing S. The optimal S with all other variables fixed is the solution to a quadratic subproblem (24), which admits a closed-form solution.

Computing L. The optimal L with all other variables fixed is likewise the solution to a quadratic subproblem (26) with a closed-form solution.

The pseudocode of the proposed MALM-TR algorithm is summarized in Algorithm 1.
Under rather general conditions, when the penalty parameters {β_n, γ_n, η_n} form an increasing sequence and both the objective function and the constraints are continuously differentiable, it has been proven in [20] that the Lagrange multipliers produced by Algorithm 1 converge Q-linearly to the optimal solution when the penalty sequence is bounded, and super-Q-linearly when it is unbounded. Another merit of MALM-TR is that the optimal step size for updating the multipliers is proven to be the chosen penalty parameters {β_n, γ_n, η_n}, which makes parameter tuning much easier. A third merit is that the algorithm converges to the exact optimal solution even without requiring the penalty parameters to approach infinity [20].

Experiments
In this section, using both numerical simulations and real-world data, we evaluate the performance of the proposed algorithm and compare the results with RSTD on the low-n-rank tensor recovery problem. In all experiments, the Lanczos bidiagonalization algorithm with partial reorthogonalization [24] is adopted to obtain a few singular values and vectors in each iteration. A major challenge of our algorithm is the selection of parameters. We simply set the parameters α = β = γ = [r_1/r_max, r_2/r_max, ..., r_N/r_max]^T for all experiments, where r_max = max{r_n}. Similarly, we choose λ = 1/√I_max as suggested in [11], where I_max is the largest mode dimension, and tune it according to the data. For comparison with RSTD [12], we also use the difference of L and S in successive iterations against a certain tolerance as the stopping criterion. All experiments are conducted and timed on the same desktop with a Pentium(R) Dual-Core 2.50 GHz CPU and 4 GB memory, running Windows 7 and Matlab.

Numerical Simulations.
A low-n-rank tensor L0 is generated as follows. The N-way Tensor Toolbox [25] is used to generate a third-order tensor of size I1 × I2 × I3 with relatively small n-ranks [r1, r2, r3]. The generated tensor follows the Tucker model [13]: L0 = C ×_1 U1 ×_2 U2 ×_3 U3. To impose the rank conditions, C ∈ R^{r1×r2×r3} is a core tensor whose entries are sampled independently from a standard Gaussian distribution N(0, 1), and U1, U2, U3 are I1 × r1, I2 × r2, I3 × r3 factor matrices generated by drawing each entry from N(0, 1); without loss of generality, we make the factor matrices orthogonal. One major difference from the matrix case is that the n-ranks can differ along each mode, while the column rank and row rank of a matrix are always equal. For simplicity, in this paper we set the mode-n ranks to the same value.
The entries of the sparse tensor S0 are independently distributed, each taking the value 0 with probability 1 − spr and an impulsive value with probability spr. The observed tensor A0 is generated as A0 = L0 + S0.
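A sketch of this generation procedure in plain numpy (the helper names are ours, and plain numpy stands in for the Tensor Toolbox call mentioned in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def unfold(A, n):
    """Mode-n unfolding."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_product(A, U, n):
    """n-mode product A x_n U, computed through the mode-n unfolding."""
    out_shape = A.shape[:n] + (U.shape[0],) + A.shape[n + 1:]
    full = [out_shape[n]] + [s for i, s in enumerate(out_shape) if i != n]
    return np.moveaxis((U @ unfold(A, n)).reshape(full), 0, n)

def make_instance(dims=(40, 40, 40), ranks=(5, 5, 5), spr=0.05, mag=10.0):
    """Low-n-rank Tucker tensor L0 plus sparse impulsive corruption S0."""
    L0 = rng.standard_normal(ranks)                       # Gaussian core C
    for n, (I, r) in enumerate(zip(dims, ranks)):
        Q, _ = np.linalg.qr(rng.standard_normal((I, r)))  # orthogonal factor
        L0 = mode_product(L0, Q, n)
    S0 = np.where(rng.random(dims) < spr,
                  mag * rng.standard_normal(dims), 0.0)   # impulsive entries
    return L0 + S0, L0, S0
```

With probability one the resulting L0 has mode-n unfolding rank exactly r_n, so the n-rank of the instance is controlled directly by the `ranks` argument.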
Tables 1 and 2 present the average results (across 10 instances) for different sparse ratios. The results demonstrate that our proposed algorithm MALM-TR outperforms RSTD in both efficiency and accuracy.

Image Restoration.
One straightforward application of our algorithm is image restoration. As pointed out in [12], our algorithm also assumes the image to be well structured. Though the assumption would not be reasonable for some natural images, it has many applications for visual data such as structured images (e.g., the façade image), CT/MRI data, and multispectral images. In the experiments, we apply the algorithm to restoring the façade image, which is also used in [12,17]. We add different percentages of random impulsive noise to the image and compare MALM-TR with RSTD. The results produced by both algorithms are shown in Figure 1.

Background Modeling.
Another application of our algorithm is to estimate a good model for the background variations in a scene (i.e., background modeling). In this situation, it is natural to model the background variations as approximately low rank. Foreground objects generally occupy only a fraction of the image pixels and hence can be treated as the sparse part.
We test our algorithm using an example from [26] and compare with RSTD [12]. Visual comparisons of the background modeling are shown in Figure 2. It is observed that our algorithm is effective in separating the background, even in a dynamic scene. The results are also comparable to RSTD.

Traffic Data Recovery.
In our previous work [3,4], we proposed two tensor-based methods for traffic data applications. In [3], a tensor imputation method based on Tucker decomposition was developed to estimate missing values. In that setting, the exact coordinates and the number of missing entries in the tensor can be observed directly: a missing element simply has no value, so it is easy to recognize. This paper, in contrast, recovers a low-n-rank tensor that is arbitrarily corrupted by a fraction of noise via trace norm and ℓ1-norm optimization; the number and the coordinates of the corrupted entries are unknown or hard to obtain, because corrupted entries do have values and can hardly be separated from the correct data. The two papers therefore solve different problems. Paper [4] addresses the same traffic data recovery problem considered in this section. The main difference between the two proposed methods is how the constraint A = L + S is used. Reference [4] puts the constraint into the minimized function with only one parameter, which leads to an objective containing not only tensors but also matrices. However, since the size and structure of each mode of the given tensor are not always the same, the contribution of each mode to the final result may differ. In order to utilize the information of the constraint as much as possible, this paper unfolds the constraint along each mode and uses weighted parameters η_n to obtain the matrix-form constraint terms Σ_{n=1}^{N} (η_n/2)‖A_(n) − L_(n) − S_(n)‖_F², which are put into the minimized function using the augmented Lagrange multiplier strategy. With different objective functions, the optimization processes also differ. More details can be found in [4].
In the fourth part of the experiment section, we apply the proposed algorithm to traffic data recovery. The data used in the experiment were collected by a fixed loop detector in Sacramento County and downloaded from http://pems.dot.ca.gov/. The data span 77 days, from March 14 to May 29, 2011. Traffic volume is recorded every 5 minutes, so a daily traffic volume series for a loop detector contains 288 records. To perform traffic data recovery with the proposed algorithm, the first step is to convert the raw traffic data into tensor form. Here, we choose 8 weeks of complete traffic volume data from the 77 days and form them into a tensor of size 8 × 7 × 288, as Figure 3 shows. In this model, "8" stands for the 8 weeks, "7" for the seven days in one week, and "288" for the 288 records in one day.
In our previous work [3], the similarity coefficient [27] was used to analyze the high multimode correlation ("link" mode, "week" mode, "day" mode, and "hour" mode) of traffic data from the point of view of statistical characteristics. Because of these high multimode correlations, the traffic tensor of size 8 × 7 × 288 can be approximated by a low-n-rank tensor.
According to the above description, the traffic data are reasonably converted into a tensor form that can be approximated by a low-n-rank tensor. In the traffic data recovery experiment, it is assumed that a random subset of the entries of the traffic data tensor is corrupted by impulsive noise. The noise ratios are set from 5% to 25% in steps of 5%. We then compare the proposed method with the RSTD algorithm using the relative square error (RSE) as the criterion. Table 3 tabulates the RSEs under sparse impulsive noise of different ratios on the traffic data. In particular, the "unrecovered" column presents the RSE between the corrupted data and the original data. From the table, it is observed that the RSEs obtained by MALM-TR and RSTD are much smaller than those of the unrecovered data, which means that both algorithms improve the quality of the corrupted data. Moreover, the RSEs of MALM-TR are smaller than those of RSTD. The curves in Figure 4 show clearly that our method performs better than RSTD.
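The RSE criterion is a one-line computation; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def rse(recovered, original):
    """Relative square error: Frobenius norm of the error over that of the truth."""
    return np.linalg.norm(recovered - original) / np.linalg.norm(original)
```

For multiway arrays, `np.linalg.norm` with its default arguments returns the 2-norm of the flattened array, which coincides with the tensor Frobenius norm used throughout the paper.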

Conclusion
In this paper, we extend the matrix recovery problem to low-n-rank tensor recovery and propose an efficient algorithm, MALM-TR, to solve it.
The simulated tensors used in the experiments are of size 40 × 40 × 40, with varying n-rank r and sparse ratio spr. The parameters are adjusted according to the different r and spr. The quality of recovery is measured by the relative square errors (RSE) with respect to L0 and S0, defined as RSE_L0 = ‖L̂ − L0‖_F / ‖L0‖_F and RSE_S0 = ‖Ŝ − S0‖_F / ‖S0‖_F.

Figure 2: Background modeling. Top: original video sequence of a scene. Middle: foreground objects recovered by MALM-TR. Bottom: foreground objects recovered by RSTD [12]. The results are highlighted for both RSTD and MALM-TR.

Figure 4: Comparison of RSE curves on traffic data.
RSE = ‖recovered data − original data‖_F / ‖original data‖_F.

Table 3: Comparison of RSE on traffic data.