Scene Flow Estimation Based on Adaptive Anisotropic Total Variation Flow-Driven Method

Scene flow estimation based on disparity and optical flow is a challenging task. We present a novel method based on adaptive anisotropic total variation flow-driven method for scene flow estimation from a calibrated stereo image sequence. The basic idea is that diffusion of flow field in different directions has different rates, which can be used to calculate total variation and anisotropic diffusion automatically. Brightness consistency and gradient consistency constraint are employed to establish the data term, and adaptive anisotropic flow-driven penalty constraint is employed to establish the smoothness term. Similar to the optical flow estimation, there are also large displacement problems in the estimation of the scene flow, which is solved by introducing a hierarchical computing optimization. The proposed method is verified by using the synthetic dataset and the real scene image sequences. The experimental results show the effectiveness of the proposed algorithm.


Introduction
Scene flow was first introduced by Vedula et al. [1] in 1999.It is defined as a dense 3D motion field that describes the change of structured light in the image surfaces, which can be estimated from a calibrated stereo image sequence.We can obtain stereo image sequence by using binocular stereo vision sensor, like BumblebeeX3.As we all know, optical flow can be seen as the projection of the moving object on the retina or the camera's sensor.So scene flow can be regarded as a 3D extension of optical flow.Reliable 3D motion flow field can be used in many applications.For example, Menze and Geiger [2] focus on the autonomous driving with 3D scene flow estimation.Lenz et al. [3] used scene flow for moving object detection in urban environments.Waizenegger et al. [4] present a real-time approach for 3D structure estimation of human bodies.In general, scene flow estimation techniques can be divided into two categories.One is variational minimization method of scene flow based on the optical flow and the disparity from stereo image sequence [5].The other is 3D point cloud parametrization method based on the 3D structure to estimate the scene flow, which allows us to directly estimate the desired unknowns [6].Under these two frameworks, a lot of researches on optical flow estimation can be used for scene flow estimation.
In the estimation of optical flow, many regularization approaches have been derived to solve the ill-posed problem.These methods can be done by minimizing an energy function min ∫ Ω Ψ data (  ⋅  +   ⋅ V +   ) + Ψ smooth (, V)  , (1) where Ψ smooth (, V) is a regularization term which penalizes variations  and V. Ω refers to the domain of the image plane.Ψ data and Ψ smooth are convex regularization functions and used to guarantee convexity and differentiability of the energy function.
Many smoothing regularizations are characterized as below.Homogenous regularization is named by Bruhn et al. [7], because the approach is a linear diffusion equation which is defined by Horn and Schunck [8].Inspired by image filtering, Alvarez and Esclarin [9] use the squared magnitude of the image gradient to control the smoothness.Perona and Malik [10] propose an anisotropic smoothness method, which can preserve flow edges for a large image gradient and reduce the smooth strength in the consistency area.Nagel and Enkelmann [11] only smooth flow along the image edge and reduce the smooth strength in the direction perpendicular to image edge.In other words, this is the "oriented smoothness" method, which is driven by image gradient.Weickert and 2 Mathematical Problems in Engineering Schnörr [12] propose a systematic classification of rotation invariant convex regularizers, which provides a unifying framework for image-driven and flow-driven methods, as well as isotropic and anisotropic regularizers.
Inspired by optical flow regularizers, we propose a scene flow estimation method, which can take the advantages of anisotropic total variation and adaptive regularizer.Different from the structure tensor, our anisotropic diffusion applies robust penalty function to the component of gradient terms in different direction.A weight function used by Blomgren and Chan [13] for image segmentation is introduced to reduce diffusion of flow field in the large gradient area.

Related Work
Binocular camera can be used to build the stereo vision system that is used to obtain the stereo image sequences.Li and Sclaroff [14] first estimated scene flow under binocular setting as a joint framework including optical flow estimation and stereo matching.Huguet and Devernay [5] proposed a common variational framework for scene flow estimation by coupling optical flow and stereo matching.A rotating sphere dataset with ground truth is provided publicly in this paper, which allows us to quantitatively evaluate the accuracy of the estimation.After that, KITTI benchmark was published for autonomous driving [2], and scene flow estimation can also be evaluated using this dataset.400 dynamic scenes were annotated with detailed 3D CAD models and revealed great challenges for scene flow algorithms.Wedel et al. [15] modified Huguet and Devernay's method [5] by decoupling the stereo and motion estimation.Optical flow consistency in both views and the stereo consistency in time  + 1 are taken as data term, while the smoothness term of disparity change was separated from the regularization term.With GPU's development, they achieved the promising accuracy results with frame-rate at 5 fps.Basha et al. [6] proposed a 3D point cloud parametrization method which enforces multiview geometric consistency based on brightness consistency and piecewise smoothness assumption.Menze and Geiger [2] used a novel model and dataset for scene flow estimation, which represents each element in the scene by its rigid motion parameters.And each element can be achieved by super pixel segmentation.Vogel et al. [16] proposed a piecewise rigid model, which combines rigidly moving 3D planes constraint and segmentation regularization, and their method yields better results.

Proposed Method of Scene Flow Estimation
3.1.Variational Scene Flow Framework.Our goal is unify the data term and smooth term in a variational framework and obtain the scene flow and 3D structure.Our method follows the variational formulation proposed by Huguet and Devernay [5].It has the following common form: (2) 3.2.Data Term.We first obtain a set of stereo image sequences, whose disparity is only in the horizontal direction.Two image sequences between two time steps are shown in Figure 1.According to the time and space constraint from 4 images, data term can be divided into 4 parts where Ω usually refers to the domain of the image plane.
is an occlusion factor, which equals 0 when the pixel is occluded and equals 1 otherwise.Each part contains brightness constancy assumption and gradient constancy assumption to overcome the nonlinear illumination problem.
In other words, the brightness and the gradient of the pixel are nearly constant in the short time and in different views.According to the corresponding relationship, brightness and gradient constancy assumption are given by Mathematical Problems in Engineering 3 where ∇ is the vector of gradient.We introduce the following abbreviation to explain these 4 terms.(5)

Δ (𝐼
The 4 components of the data term can be written as where   and   are the left and the right data term energy function of optical flow, respectively.  0 is the disparity term that represents the stereo matching at time  0 .   is the disparity term at time  + 1. Penalty function (Ψ( 2 ) = √  2 +  2 ,  = 0.001) is introduced to guarantee convexity and differentiability of the whole data term energy function and remove the outliers at the same time.Outliers are the pixels that are not consistent with the pixels affected by noise, illumination change, and occlusion. can be determined according to the intensity of illumination change or just rely on the experiences.The most obvious thing should be noted here is that if there is an error in any term of , V,  0 , and   , the error would be propagated to others.We minimize the 4 data terms together to prevent the error from this situation.

Smooth Term Based on Anisotropic Total Variation.
Huguet and Devernay [5] used flow-driven regularization that uses a robust and convex function to keep convexity and differentiability of the smooth term Here Ψ is used to remove the outliers and preserve the discontinuities of the flow field.The magnitude of the gradient is We further derive its scene flow form by minimizing the following problems.In order to simplify the equation, we only consider the 2D case It corresponds to the diffusion equation as ) . ( In the directions of  and V, the weights are the same and are related to the sum of the squares of the gradient.We propose a new scheme that regularization includes information about the direction of the flow and we define it as anisotropic total variation smoothness term.In the 2D case, But for outliers, robustness function still should be used.Here we show that it is anisotropic.Similarly, in the 2D case, we define the following equation: We minimize it and obtain the following equation: Obviously, the weights only depend on the direction of each gradient of   or V  and are not related to other direction, which can be regarded as anisotropic diffusion of flow field.
In this way, we can control the rate of diffusion according to the gradient of flow field so that we can preserve the sharp edge at the position of nonflat area.And in the flat region, it has the same strength of smooth term.Then a new smooth term can be defined as where  and  are empirical parameters.

Smooth Term Based on Adaptive Anisotropic Total Variation Flow-Driven Method.
A major deficiency of the total variation that Rudin et al. [17] pointed out is staircase in the steady state solution.Marquina and Osher [18] proposed a modified method in 2000, but they just relieved this phenomenon.Different from total variation, there is no such problem with 2 norm.Blomgren and Chan [13] used the following norm in the image restoration: where  function monotonously decreases and meets the following requirement: At the edge position, we have In the flat region, we have In other words, this norm can choose from 1 to 2 according to the gradient magnitude automatically.To achieve this, a rational form can be chosen as follows: The Euler-Lagrange equation for (15) is Define a function to represent the equation of the bracket in (20): Then (20) can be simplified as Equation ( 23) can be derived from (21): In the smooth region of flow, the regularization is similar to linear smoothing, and total variation model will be used closing to the flow edge.Combining the robust function ]  .
(24) ]               . (28) We define the  1 function as follows: and then (28) can be simplified as The partial derivatives of  with respect to V,  0 −   , and  0 produce similar equations as given below: (33)

Multiresolution Strategy.
In order to solve the problem of large displacements in the scene and avoid local minima, we introduce the multiresolution method.Referring to Brox et al. [19], the sampling factor is set to 0.9 to avoid a larger span of adjacent layers and local minima.At the same time, we also have a serious problem in the scene flow estimation.
If we start to calculate from low resolution layer, there will be a false disparity which is passed to the high resolution layer.
For this reason, we start to calculate scene flow from middle resolution layer.A schematic diagram that uses the pyramid hierarchy is given in Figure 2.
In each level of the pyramid, we use the warping method.For the two frames of the  level,   0 is obtained according to a certain sampling standard.   can be obtained by motion compensation based on the image  −1  in upper level.We only need to minimize the residual of each layer of the scene flow and the error generated by the large displacement can be avoided.The evaluation of scene flow requires minimizing partial differential equations (32)-(33).This is done by using Euler-Lagrange equations.We use a nested fixed point iteration scheme suggested by Brox et al. [19] and linearize equations in each layer of pyramid.The inner loop is used to linearize brightness constancy assumption and outer loop linearizes the robust penalty function Ψ.We use SOR (Super Relaxation Iteration) as an iterative method.The full algorithm for scene flow estimation is given in the algorithm scheme (see Algorithm 1).

Experiments and Results
In our method, scene flow can be represented by optical flow (, V) and disparity ( 0 ,   ).We used data sets from Middlebury website to evaluate the accuracy of scene flow estimation.The datasets are captured from 8 calibrated cameras in each scene.The images from cameras 2 and 6 are taken as the stereo pairs at time , while the images from cameras 4 and 8 are taken as the corresponding stereo pairs at time  + 1.We evaluate the accuracy by calculating the RMS error in terms of optical flow, disparity map at time , and disparity map at  + 1 as well as the average angular error (AAE).
We compared our results with Huguet and Devernay [5] and Basha et al. [6] using Venus, Cones, and Teddy from Middlebury datasets as shown in Figures 3, 4, and 5. Table 1 summarizes the RMS errors for optical flow, disparity at time , and disparity at time  + 1 to measure the deviation with the ground truth.AAE errors are computed to measure the deviation with the standard flow field.
RMS errors of optical flow (O.F.), disparity at time  (Disp.), disparity at time +1 (Dip.+1), and AAE errors of optical flow are computed to evaluate the accuracy of our method.
Table 1 shows that the results of our method are more accurate than those of Huguet and Devernay [5] for RMS of optical flow and disparity.Comparing with Basha et al. [6], AAE errors of our method are more accurate.For the RMS errors in disparity , our method is better than Huguet    Figure 6 shows the performance on the Venus dataset quantitatively.In terms of the component U, our method keeps more smooth and sharp boundary when comparing with Huguet and Devernay [5].
Cones, Teddy, and Venus datasets have complex edges in the scenes.Traditional scene flow methods are suffering from the isotropic diffusion and receive lower flow accuracy.The proposed adaptive anisotropic diffusion smooth term can smooth scene flow along the direction of motion edges and choose 1 or 2 regularization automatically, which can receive high flow accuracy at the position of edges and preserve motion edges.
In order to compare our method with other scene flow methods, we used a rotating sphere dataset which includes relatively complex motion and consists of the hemispheres in opposite directions.
Figure 7 shows the left and right views of the sphere.The 3D motion , V and disparity are also shown for comparison.Figure 8 shows results computed by our method.Table 2 summarizes the RMS errors of sphere sequence.
Table 2 shows that RMS error of our method in optical flow is the same as that of Vogel et al. [16] and better than others.For the RMS error in disparity at time , our method produces a preferable result.
Rotating sphere dataset has opposite directions motion, which is a very challenging task.The proposed method can          The RMS errors of Huguet and Devernay [5], Wedel et al. [15], Vogel et al. [16], and our method are compared using optical flow (O.F.) and disparity map (Disp.).
smooth flow field along the motion edge and can also reduce the rate of diffusion at the position of motion edges.In this way, our method can preserve motion edges better.

Conclusion
In this article, we proposed an adaptive anisotropic total variation flow-driven regularization method in the variational framework for scene flow estimation.According to the corresponding relationship of the stereo image sequence, data term is divided into 4 parts and meets the brightness and gradient constancy assumption.Regularization term has the advantages of anisotropic smoothing, which applies different weights in the different directions.1 and 2 norm can be chosen automatically according to the gradient of the flow field to preserve the boundary of scene flow and reduce the staircase effect at the same time.By minimizing the energy function, we obtain dense optical flow and disparity map.Pyramid optimization method and warping technology are also referenced to deal with large displacement problems.
In the section of experiments, RMS errors of our method for Middlebury datasets Cones and Teddy are better than those of Huguet and Devernay [5] and close to those of Basha et al. [6].For the rotation sphere dataset, our method's result is better than those of Huguet and Devernay [5] and Wedel et al. [15] and has the same accuracy as that of Vogel et al. [16].The experimental results show the effectiveness of our method.

Figure 1 :
Figure 1: Data term in the two-view case.Motion of a single pixel point in two consecutive frames of the image sequences.Brightness and gradient can be considered equal approximately in every image.

Figure 2 :
Figure 2: Multiresolution strategy for scene flow computation.The left side pyramid is the image at time .The right side pyramid is the image at time  + 1.

Ensure:
The images at time  and  + 1 are in the same level Require:  >  ≥  ≥ 0 for level =  to  do Left flow (  , V  ) adaptive anisotropic flow-driven Right flow (  , V  ) adaptive anisotropic flow-driven end for stereo ( 0 ,   ) for level =  to  do scene flow (  , V  ,  0 ,   ) adaptive anisotropic total variation flow-driven end for Algorithm 1: Scene flow estimation.

Figure 3 :
Figure 3: Venus of Middlebury stereo datasets.(a) and (b) are left and right image in time .(c) and (d) are ground truth of disparity maps.(e) and (f) are left and right image in time  + 1. (g) is ground truth of optical flow in horizontal direction.(h) is ground truth of optical flow in vertical direction.

Figure 4 :
Figure 4: Cones of Middlebury stereo datasets.(a) and (b) are left and right image in time .(c) and (d) are ground truth disparity maps.(e) and (f) are left and right image in time  + 1. (g) is ground truth of optical flow in horizontal direction.(h) is ground truth of optical flow in vertical direction.

Figure 5 :
Figure 5: Teddy of Middlebury stereo datasets.(a) and (b) are left and right image in time .(c) and (d) are ground truth disparity maps.(e) and (f) are left and right image in time  + 1. (g) is ground truth of optical flow in horizontal direction.(h) is ground truth of optical flow in vertical direction.

Figure 6 :
Figure 6: Comparison of our U component and Huguet and Devernay's [5].(a) is the U component of Huguet and Devernay.(b) is our U component.

Figure 7 :
Figure 7: Rotating sphere which consists of the hemispheres in opposite directions.(a) and (b) are left and right images in time.(e) and (f) are left and right images in time.(c), (d), (g), and (h) are ground truth of  0 ,   , , and V, respectively.

Figure 8 :
Figure 8: Rotating sphere which consists of the hemispheres in opposite directions.(a) and (b) are  0 ,   computed by our method.(c) and (d) are , V computed by our method.

Table 1 :
Comparison of scene flow on Middlebury data set.
and Devernay and Basha et al. 's method.This benefits from proposed smooth term based on adaptive anisotropic total variational method.