Epipolar Plane Image Rectification and Flat Surface Detection in Light Field

Flat surface detection is one of the most common geometry inferences in computer vision. In this paper we propose a method for distinguishing printed photos from original scenes that fully exploits the angular information of the light field and the characteristics of a flat surface. Unlike previous methods, our method does not need a prior depth estimation. The algorithm first rectifies the disordered epipolar lines in the epipolar plane image (EPI). Feature points are then extracted from the light field data and used to compute an energy ratio in the depth distribution of the scene. Based on the energy ratio, a feature vector is constructed, from which we obtain robust detection of flat surfaces. Apart from flat surface detection, the core rectification algorithm in our method can be extended to inclined plane refocusing and continuous depth estimation for flat surfaces. Experiments on public datasets and our own collections demonstrate the effectiveness of the proposed method.


Introduction
With the rapid development of light field theory [1,2], light field cameras such as Lytro [3] and Raytrix [4] are now available for consumer and industrial use. Unlike a 2D image captured by a traditional camera, a light field camera records extra angular information of the real world, which opens up more possibilities for many traditional computer vision tasks [5][6][7][8].
Flat surface detection is one such prominent task: inferring planar structure from natural scenes. One potential application is the detection of printed photos in face-based verification [9]. Face identification has been widely applied in industry. However, a common problem of such systems is that they cannot tell whether the face is a real one or just a printed photo of the authorized face. The main reason is the loss of depth information when the camera records the real world. Traditional methods either assume that printed faces contain detectable texture patterns or require user interaction to solve this problem [10]. However, these methods are unreliable or inefficient. Depth estimation [11] before authentication is another option, but it brings its own problems, such as occlusion [12] and shading [13].
In this paper, we analyze the variant and invariant features of a flat surface in the EPI representation and propose an algorithm to detect the flat surface without depth estimation, which fully exploits the angular information of the light field and the characteristics of the flat surface. Our main contributions are as follows:
(i) An algorithm to rectify the EPI for a flat surface, which can also be extended to other tasks such as refocusing on an inclined plane.
(ii) A framework to detect the flat surface in light field by a two-stage algorithm without depth estimation.
The rest of this paper is organized as follows. In Section 2, we review the background and previous work on flat surface detection. In Section 3, the algorithm is described in detail. Experimental results are given in Section 4. Our conclusion and future work are presented in Section 5. Proofs of some related properties are provided in the Appendix.

Background and Related Work
A light field is a function defined in 4D space, (x, y, u, v) [1], that describes light rays in the physical world, where (x, y) and (u, v) describe the distribution of the light in spatial and angular space, respectively. Under the two-plane parametrization, when we fix one spatial dimension and one angular dimension, (x*, u*) (or (y*, v*)), the EPI appears. For each point in the real world, there is a corresponding epipolar line in the EPI, and the slope of this line has a linear relationship with the depth of this point.
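As a minimal sketch of how an EPI is obtained in practice, assume the light field is stored as a NumPy array indexed as `lf[v, u, y, x]`. The storage order and the helper name `extract_epi` are assumptions made for illustration and do not come from the paper. Fixing one spatial and one angular coordinate leaves a 2D slice over the remaining pair.

```python
import numpy as np

def extract_epi(lf, y_star, v_star):
    """Return the EPI over (u, x) for a fixed image row y_star and view row v_star.

    Assumes the light field array is indexed as lf[v, u, y, x]; this
    storage order is an assumption made for the example.
    """
    return lf[v_star, :, y_star, :]

# Synthetic light field: 3 x 5 views, each an 8 x 16 image.
lf = np.arange(3 * 5 * 8 * 16, dtype=float).reshape(3, 5, 8, 16)
epi = extract_epi(lf, y_star=4, v_star=1)
print(epi.shape)  # one row per horizontal view u, one column per pixel x
```

Each scene point traces a line across the rows of `epi`, and the algorithm in Section 3 works on the slopes of these lines.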
As a basic problem in computer vision, flat surface detection has been researched for decades, but it is still not well developed. Most techniques detect flat surfaces using a prior depth estimation. Zhao et al. [11] proposed to detect flat surfaces using the disparity map of the scene; however, this method depends on the accuracy of the disparity estimation and is sensitive to its alignment errors. Raghavendra et al. [15] proposed a similar strategy. Instead of the disparity map used in [11], they obtain a rough depth map by using a focus measure. This method suffers from the same problem, namely, the inaccuracy of disparity estimation. Ghasemi and Vetterli [14] proposed to extract an energy feature vector based on the change of the gradient of the EPI and to distinguish flat surfaces from nonflat ones by using a Bayes classifier. The method computes the slopes of all epipolar lines and then takes their variance, so it is still a depth-based method.
Different from the traditional "depth map to plane fitting" strategy [11,15,16], we detect Lisad-1 feature points [17] in the 4D light field and then fit the function of the flat surface directly from several of these feature points; finally, the robustness of the fitted function is tested on several other feature points. If the scene is a flat surface, the function built from some feature points should also fit the other points, and vice versa.

The Proposed Approach for Detecting Flat Surface
It is well observed that, for a flat surface which is parallel to the camera plane, all epipolar lines in the EPI have the same slope since they are at the same depth. However, this invariant property no longer holds when the flat surface is tilted. By analyzing the properties of the flat surface function, we notice that no matter what angle the flat surface is tilted to, the difference of slope stays the same (this property is discussed in Section 3.3 and its proof can be found in the Appendix). Based on this invariant property, we first rectify the slopes of the epipolar lines in the EPI to a common target slope and then detect the Lisad-1 feature points [17] of the rectified EPI. The most important advantage of Lisad-1 feature points is that they provide the slope at each feature point, and the extraction of their depth does not suffer from occlusion or shading problems, as they are salient points. The slope of each feature point ought to be equal to the target slope if the plane function is correct. Finally, we combine the energy ratios from different EPIs into an energy vector and employ a classifier to distinguish flat surfaces from natural scenes.

Epipolar Plane Image
For a flat surface, the depth is treated as a linear function of the image coordinates. When a point lies in an EPI with a fixed y*, its depth can therefore be expressed as a linear function of x alone (Figure 1), whose value is the depth of the point (x, y*). The slope at each point then follows from Equation (2), which involves the disparity of the point (x, y*), the slope at the point x in the EPI with fixed y*, the focal length of the lens, and the baseline between two views. As the two camera parameters are constants, they are ignored, and Equation (2) can be rewritten without them. Once the linear function is determined, the slope at each point in the EPI is known, and the slopes of the epipolar lines in the EPI can be normalized to a single value. Consider a point (x, u) in the original EPI; its slope is the slope function evaluated at a position x*, which is fixed by the epipolar line passing through (x, u). If the slope we hope to normalize to is the target slope, shearing moves (x, u) to a target point with the same angular coordinate u and a new spatial coordinate. In brief, every point in the rectified EPI has a corresponding point in the original EPI. Figure 2 illustrates the rectification procedure, and Figure 3 gives an example of an original EPI and its rectified version. Only one slope remains in the EPI after the rectification.
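The rectification can be sketched in code under an assumed model, since the equations are not available here. The sketch assumes that the slope along the EPI is a linear function s(x) = k·x + b of the central-view position, that the epipolar line through x* in the central view (u = 0) obeys u = s(x*)·(x − x*), and that s is nonzero over the image width. The symbols `k`, `b`, `s0` and the function name `rectify_epi` are illustrative choices, not the paper's notation.

```python
import numpy as np

def rectify_epi(epi, k, b, s0):
    """Resample an EPI so that lines of slope k*x + b all get slope s0.

    Assumed model (a reconstruction, not taken from the source): the
    epipolar line through x_star in the central view (u = 0) obeys
    u = s(x_star) * (x - x_star) with s(x) = k * x + b, and s is nonzero
    over the image width.
    """
    n_u, n_x = epi.shape
    us = np.arange(n_u) - (n_u - 1) / 2.0   # angular coordinate, centre at 0
    xs = np.arange(n_x, dtype=float)
    out = np.empty((n_u, n_x), dtype=float)
    for row, u in enumerate(us):
        x_star = xs - u / s0                    # central-view position of each target pixel
        x_src = x_star + u / (k * x_star + b)   # where that line sits in the original EPI
        out[row] = np.interp(x_src, xs, epi[row])  # linear interpolation, clamped at borders
    return out

# Fronto-parallel test pattern: every line has slope 1 (one pixel per view).
g = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0])
epi = np.stack([np.roll(g, u) for u in (-2, -1, 0, 1, 2)])
rectified = rectify_epi(epi, k=0.0, b=1.0, s0=np.inf)
```

In this toy case the interior columns of every row of `rectified` reproduce `g`, so all lines have become vertical. For a tilted plane `k` would be nonzero, and the per-pixel shift would vary along x.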
By solving the four constraints in (7), the search space of the two parameters of the line function is obtained (labeled in red in Figure 4). Candidate parameter pairs are then generated by dividing the search space into a discrete grid, and we rectify the EPI with each candidate pair. A Lisad-1 feature [17] is a local extremum in scale-depth space, obtained by convolving the EPIs with kernels of varying scale. As Lisad-1 feature points provide slope information, the feature points are extracted and we compute the ratio of feature points whose slope equals the target slope to all feature points.
This ratio is taken over the set of Lisad-1 feature points extracted from the rectified EPI: the denominator is the size of this set, and the numerator counts the feature points whose slope matches the target slope. For a flat surface, if the correct line-function parameters are known, all epipolar lines should have the target slope after rectification (see Figure 3). We therefore select the parameter pair that yields the largest ratio.
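The ratio and the grid search can be illustrated with a short sketch. Lisad-1 detection itself is not implemented here: `slopes_for` is a hypothetical callback that stands in for "rectify the EPI with this candidate pair and return the slopes of its feature points". The tolerance `tol` is also an assumption, since measured slopes are never exactly equal.

```python
def slope_ratio(slopes, s0, tol=0.05):
    """Fraction of feature-point slopes that lie within tol of the target slope s0."""
    if not slopes:
        return 0.0
    hits = sum(1 for s in slopes if abs(s - s0) <= tol)
    return hits / len(slopes)

def select_parameters(grid, slopes_for, s0, tol=0.05):
    """Return the candidate (k, b) pair that gives the largest ratio.

    slopes_for(k, b) is a hypothetical callback returning the feature-point
    slopes measured in the EPI rectified with the candidate pair (k, b).
    """
    return max(grid, key=lambda kb: slope_ratio(slopes_for(*kb), s0, tol))

# Toy stand-in for the detector: slopes hit the target only at (0.1, 2.0).
def fake_slopes(k, b):
    return [1.0 + abs(k - 0.1) + abs(b - 2.0)] * 5

grid = [(k, b) for k in (0.0, 0.1, 0.2) for b in (1.0, 2.0, 3.0)]
best = select_parameters(grid, fake_slopes, s0=1.0)
ratio = slope_ratio([1.0, 1.02, 0.7, 1.0], s0=1.0)
```

In the experiments the grid is 11 × 11; the 3 × 3 grid here only keeps the example small.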

The Proposed Strategy.
In practice, we cannot use the ratio of a single EPI to detect the flat surface, since one EPI is too local to represent the whole scene. We take the following two properties into consideration to solve the problem:
(i) The coefficient of x in the line function should be the same constant for every fixed y* if the scene is a flat surface.
(ii) The constant term of the line function should have a linear relationship with the variable y if the scene is a flat surface.
These two properties are obvious and can be proved easily (see Appendix).With these two properties, we formulate our strategy as a plane fitting stage and a feature extracting stage.
In the plane fitting stage, the plane function (the common coefficient of x and the dependence of the constant term on y) is fitted from the line-function parameters calculated from several EPIs. In the feature extracting stage, the parameters of each remaining EPI are computed from the fitted plane function. If the scene is a flat surface, the plane function built from the first set of EPIs should also fit the other EPIs; that is, the slope in all rectified EPIs should equal the target slope, and vice versa. This is the core idea of our strategy. A detailed description is given in Algorithm 1.
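A minimal sketch of the plane fitting stage follows, assuming each fitted EPI yields a pair (k, b) for its row y. Averaging the k values and fitting b against y by least squares are illustrative choices; the paper's Algorithm 1 may differ. The function names are hypothetical.

```python
import numpy as np

def fit_plane_function(ys, ks, bs):
    """Fit a common k and a linear model b(y) = m * y + c from per-EPI estimates."""
    k_common = float(np.mean(ks))      # property (i): k is shared by all EPIs
    m, c = np.polyfit(ys, bs, 1)       # property (ii): b is linear in y
    return k_common, float(m), float(c)

def predict_parameters(plane, y):
    """Line-function parameters (k, b) predicted for the EPI at row y."""
    k_common, m, c = plane
    return k_common, m * y + c

# Per-EPI estimates from three rows of a synthetic tilted plane.
ys = np.array([10.0, 50.0, 90.0])
ks = np.array([0.20, 0.21, 0.19])
bs = 0.01 * ys + 3.0
plane = fit_plane_function(ys, ks, bs)
k30, b30 = predict_parameters(plane, 30.0)
```

In the feature extracting stage, the predicted pair for each held-out row would be used to rectify that EPI, and the resulting ratio becomes one entry of the feature vector.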

Expand to Inclined Plane Refocus.
The traditional method achieves refocusing by shearing the EPI [3,18]. The displacement is the same for every point, because the plane to be refocused on is parallel to the camera plane, so all points on it are at the same depth.
In that case, the input EPI is sheared by a single displacement value to produce the sheared EPI.
Under the framework of our algorithm, we obtain the line function of each EPI after the plane fitting stage, and the input EPI is instead sheared according to the two parameters of that line function.
In other words, we just need to set the target slope to +∞, so that each epipolar line is perpendicular to the horizontal axis. Two refocus results of our method can be seen in Figure 5. As the data we captured is a 3D light field (one angular dimension), there is defocus blur only in the horizontal direction and none in the vertical direction.
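The refocusing step can be sketched as "make every line vertical, then average over the views". The sketch assumes a slope model s(x) = k·x + b with the central view at u = 0; this model, the symbols `k` and `b`, and the function name `refocus_epi` are assumptions made for illustration.

```python
import numpy as np

def refocus_epi(epi, k, b):
    """Refocus one image row by aligning the views and averaging them.

    Assumes lines in the EPI have slope s(x) = k * x + b with s nonzero;
    the target slope is +inf, so each view is shifted by u / s(x).
    """
    n_u, n_x = epi.shape
    us = np.arange(n_u) - (n_u - 1) / 2.0   # angular coordinate, centre at 0
    xs = np.arange(n_x, dtype=float)
    aligned = [np.interp(xs + u / (k * xs + b), xs, epi[row])
               for row, u in enumerate(us)]
    return np.mean(aligned, axis=0)

# Synthetic EPI in which every line has slope 1 (one pixel per view).
g = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0])
epi = np.stack([np.roll(g, u) for u in (-2, -1, 0, 1, 2)])
sharp = refocus_epi(epi, k=0.0, b=1.0)     # correct parameters
blurred = refocus_epi(epi, k=0.0, b=1e6)   # wrong parameters, almost no shear
```

With the correct parameters the interior of `sharp` reproduces the signal, while wrong parameters average misaligned views and blur it. A single global shear corresponds to `k = 0`; an inclined plane needs a nonzero `k`, so the shift changes with x.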

Depth Estimation for Inclined Plane.
Similarly, if the scene is a flat surface, we can estimate its depth from the common coefficient and the linear function of y obtained in the plane fitting stage. The procedure is as follows:
(i) Calculate the two parameters of the line function for each EPI.
(ii) Calculate the disparity of each point in the EPI from its line function.
We detect the flat surface by using a small set of EPIs and fit the function of this plane at the same time. With the function of the flat surface, the depth map is obtained by substituting the coordinates into the plane function. Unlike traditional multilabeling methods [19], our depth estimation results are continuous, since we know the function of the flat surface. Two of our results are shown in Figure 6.
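Substituting coordinates into the plane function can be sketched as follows. The parametrization depth(x, y) = k·x + m·y + c is an assumed form of the fitted plane, and the result is a relative depth: converting it to metric units would need the focal length and baseline, which the text treats as ignored constants.

```python
import numpy as np

def plane_depth_map(k, m, c, height, width):
    """Evaluate an assumed plane function k*x + m*y + c at every pixel."""
    ys, xs = np.mgrid[0:height, 0:width]
    return k * xs + m * ys + c

depth = plane_depth_map(k=0.02, m=0.01, c=3.0, height=4, width=6)
```

Because the map is evaluated from a closed-form function, neighbouring pixels differ by a constant step instead of jumping between a fixed set of depth labels.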

Experimental Setup.
To better analyze the performance of our algorithm, we select two different datasets. The HCI light field dataset [20] and a printed version of it are used to analyze the properties of the energy ratio. Note that the printed photos are tilted to different angles in order to better evaluate our algorithm. The experimental environment and printed light fields can be seen in Figure 7. In addition, the EPFL light field dataset [14] is used for comparison with previous work. As this dataset was captured by a Lytro camera, the experimental results on it better reflect the pros and cons of the algorithm. We implemented the algorithm in Matlab 2014b on OS X 10.11.1 with 8 GB of RAM and a 2.7 GHz processor. The running time of our implementation for a 9 × 9 × 768 × 768 × 3 light field is on the order of seconds and does not exceed the time complexity of [14]. This time can be reduced to microseconds by using a GPU.
In the stage of determining the line function, we divide the search space of its two parameters into an 11 × 11 grid. The target slope is not important in our rectification; it can be an arbitrary value. In the plane fitting stage, we select 11 EPIs to obtain the function of the plane, and we select another 15 EPIs in the feature extracting stage. An SVM classifier is employed to distinguish flat surfaces from natural scenes.

Analysis of the Energy Ratio.
We compute the energy ratio for each EPI in the first dataset and plot the distribution in Figure 9. The horizontal axis is the line number, and the vertical axis is the energy ratio of each EPI. The energy ratio of natural scenes occasionally reaches a high value, but mostly it is very small and far below that of the flat surface. Moreover, the energy ratio of a flat surface is not only large but also stable, whereas natural scenes do not have these properties. Note that there are always some EPIs with large energy ratios in original scenes (the first half of the blue curve in Figure 8). The main reason is that these EPIs are the ones selected to fit the plane function, so the fitted plane function suits these areas better. However, it may not suit other EPIs (the second half of the blue curve in Figure 8), which is why we select other EPIs to build the feature vectors.

Further Experiments and Comparison.
In order to better evaluate the accuracy of our algorithm on real data captured by a light field camera, we run our code on another public light field dataset, captured by Ghasemi and Vetterli [14]. The dataset consists of 50 light fields of printed photos and 50 light fields of natural scenes.
To test and verify our algorithm in a classification setting, we used an SVM model with cross validation [21]. The results can be seen in Table 1.
Our detection precision is clearly superior to that of Ghasemi and Vetterli's method [14]. The improvement is especially prominent in the detection of natural scenes, where 7 natural scenes are misclassified as flat surfaces in [14] and only 3 in our method. We further analyze these failure cases and find that most textures of the scene lie on one continuous depth plane, with only a few textures elsewhere. The feature points that come from this continuous plane play a more important role and lead to a higher energy ratio. In Figure 8(a), most feature points lie on the cardboard and a few points on the foreground; the fitted plane is close to the cardboard plane, which leads to a high energy ratio (the blue curve in Figure 8(e)). For the flat surfaces, the number of misclassified samples is also 3. By analyzing these samples, we notice that the number of feature points is too small to estimate the plane function accurately. In Figure 8(c), there are more feature points at the bottom of the scene and only a few at the top, which leads to a wrong fit of the parameters in Section 3.3. Figures 8(e) and 8(f) show the energy ratio distributions of these failed samples.

Conclusion
In this paper, we propose an algorithm to rectify the EPI of a flat surface, which normalizes the disordered slopes of the epipolar lines in the EPI to a single slope. This algorithm can be easily extended to inclined plane refocusing and continuous depth estimation for flat surfaces. We then propose a framework for flat surface detection, which learns a function of the flat surface from several EPIs and tests this function on several other EPIs. The results show that our algorithm performs well for most scenes, and the more complex the scene, the better the performance.
Future work may study the limits of the algorithm, for example, low-texture scenes, which yield insufficient feature points.

Figure 1: The relationship between the horizontal axis x in the image and its depth is linear if the scene is a flat surface.

Figure 2: The relationship between two points in the original EPI and the rectified EPI. The black line refers to the location of the epipolar line in the original EPI, and the red dashed line refers to the location of the epipolar line in the rectified EPI. After the rectification, the slope of the red dashed line is the same as the slope of the right black line.

Figure 4: With the 4 constraints mentioned in (7), the solution space of the two line-function parameters is obtained and is labeled in red.

Figure 5: (a) is the refocus result obtained with the correct inclined plane parameters, and it is an all-in-focus image. (b) is the result obtained with wrong parameters; note that there is defocus blur only in the horizontal direction.

Figure 7: The top-left image shows our experimental environment; the others are printed light fields. Note that the scene in the top left is just one of several setups, and our datasets contain 5 different tilt angles.

Figure 8: (a)-(d) are four failed samples of our algorithm, where (a) and (b) are misclassified as flat surfaces, and (c) and (d) are misclassified as natural scenes. The green points are the Lisad-1 points. (e) and (f) are the corresponding distributions of the energy ratio.

Table 1: The detection accuracy comparison.