The projective reconstruction of 3D structure from 2D images is a central problem in computer vision. Existing methods are usually nonlinear or indirect: previous direct methods require solving systems of nonlinear equations, which are complicated and hard to implement, while previous linear indirect methods are usually imprecise. This paper presents a linear and direct method for deriving the projective structure of 3D points from their 2D images. Algorithms are given for computing projective invariants from two, three, and four images. The method is clear, simple, and easy to implement. For the first time in the literature, we present explicit linear formulas for this problem.
The recovery of the geometric structure of a set of 3D points from its 2D projection images is fundamental in computer vision. It is known that the structure of a 3D point set generally cannot be recovered from a single image [
There are three main types of methods for the structure-from-motion problem. The first type computes the projective invariants of the 3D points. Several groups of researchers have studied the problem of computing 3D projective invariants of a point set from its 2D images [
The second type of methods recovers 3D shape indirectly: tensors relating multiple images of the 3D scene are estimated first. A second-order tensor, usually called the fundamental matrix, captures the geometry between two views of a 3D scene; a third-order tensor, usually called the trifocal tensor, captures the geometry among three views. Once these tensors of multiple views of a scene are known, many algorithms can recover the 3D geometric structure of the scene from them [
The third type of methods estimates the structure and motion through the projective factorization technique [
In the literature, there is a well-known result demonstrated by Carlsson and Weinshall that the projective reconstruction with
For the fundamental projective reconstruction problem, it is generally accepted that only the cases of two views, three views, and four views need to be considered. Although the quadrifocal tensor is the most complicated and controversial tensor in multiple-view geometry, we will demonstrate that six points and four views is the most natural configuration for deriving 3D projective invariants.
This paper presents linear methods for directly computing projective invariants of 3D points from their 2D images. A 3D point structure can be parameterized by first choosing four reference points as a basis and then representing the other points under this basis; the cross ratios of the coordinates of the other points under this basis are projective invariants. From the images we then derive systems of bilinear equations. Traditional methods for solving nonlinear multivariable equations are very complicated. The main contribution of this paper is to show that these systems of bilinear equations can be easily transformed into systems of linear equations, whose solutions we present in explicit form.
The rest of the paper is organized as follows. First, we review some previous work. In Section
We review a few related works in this section.
A camera is a device that maps a 3D scene onto an image plane. We use the pinhole camera model, which represents a linear projection from 3D space onto the image plane. In this paper, 3D world points are represented by homogeneous 4-vector
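For concreteness, a minimal sketch of the pinhole model (in Python/NumPy rather than the Mathematica of the appendix; the toy camera matrix and point are illustrative assumptions, not values from the paper):

```python
import numpy as np

# A pinhole camera is a 3x4 projection matrix P acting on homogeneous
# 4-vectors; the resulting image point is a homogeneous 3-vector
# defined only up to scale.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])  # toy camera: projection along z

X = np.array([2.0, 4.0, 2.0, 1.0])    # homogeneous 3D point (x, y, z, 1)
x = P @ X                             # homogeneous image point, up to scale
u, v = x[0] / x[2], x[1] / x[2]       # inhomogeneous image coordinates
print(u, v)                           # -> 1.0 2.0
```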
The projective geometry between two views of a 3D scene is completely captured by the epipolar geometry. Let
The fundamental matrix has rank two and is defined up to a scale factor. It encodes all the geometric information between two views when no additional information is available. Numerous algorithms have been designed to estimate this matrix; the best known are the linear eight-point algorithm and the nonlinear seven-point algorithm [
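A minimal sketch of the linear eight-point algorithm on synthetic noise-free data (Python/NumPy; the two cameras and the eight points are assumed for illustration and are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical cameras for a synthetic two-view setup.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.array([[0.9, -0.3, 0.2, 1.0],
               [0.3, 1.0, -0.1, 0.2],
               [-0.2, 0.1, 1.0, 0.1]])

# Eight 3D points in general position, projected into both views.
X = np.vstack([rng.uniform(-1.0, 1.0, (2, 8)),
               rng.uniform(4.0, 6.0, (1, 8)),
               np.ones((1, 8))])
x1 = P1 @ X; x1 /= x1[2]
x2 = P2 @ X; x2 /= x2[2]

# Linear eight-point algorithm: each correspondence x2^T F x1 = 0 gives
# one linear equation in the nine entries of F; solve A f = 0 by SVD.
A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
              for (u1, v1), (u2, v2) in zip(x1[:2].T, x2[:2].T)])
F = np.linalg.svd(A)[2][-1].reshape(3, 3)

# Enforce the rank-two constraint by zeroing the smallest singular value.
U, S, Vt = np.linalg.svd(F)
F = U @ np.diag([S[0], S[1], 0.0]) @ Vt

# With noise-free data every correspondence satisfies the epipolar constraint.
residuals = [abs(x2[:, j] @ F @ x1[:, j]) for j in range(8)]
print(max(residuals))
```

In practice the normalized variant (translating and scaling the image points before building A) is preferred for numerical stability.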
Hartley proposed a method to recover the 3D scene from the fundamental matrix [
The 3D scene point
Quan proposed an algorithm to compute projective invariants of six 3D points from three projection images [
We then normalize the known point locations in the three images accordingly. They correspond to
From these relations, a homogeneous nonlinear equation of the form
Quan tried to solve the system of bilinear equations (
Heyden presented a similar method to compute projective invariants of six 3D points from three views [
As the procedure described above shows, the method proposed by Quan is hard for ordinary users to implement and inconvenient for real applications. In [
There are also methods to compute projective invariants of 3D points from two view images [
In the literature, it is generally noted that the minimal number of point correspondences needed for projective reconstruction from two images is seven, and from three images it is six. This does not mean that a definite reconstruction can be obtained from the minimal number of points alone; more points are needed to obtain a unique solution.
In this section, we will first define the form of the 3D projective invariants. We then derive the basic relations of projective invariants among multiple views.
Suppose that a set of
Six 3D points in general position have 18 degrees of freedom, seven points have 21, and eight points have 24. A 3D projective transformation has 15 degrees of freedom. So six 3D points have three independent projective invariants, seven points have six, and eight points have nine. Projective invariants can take many forms; it is known that the cross ratios of coefficients in (
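The cross-ratio invariants can be illustrated numerically: express points five and six in the projective basis formed by the first four points, and take cross ratios of the resulting coordinates. The Python/NumPy sketch below mirrors the 3D-side computation in the appendix; the point values and the transformation matrix are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Six 3D points in general position as homogeneous 4-vectors (rows).
X = np.ones((6, 4))
X[:, :3] = rng.uniform(-1.0, 1.0, (6, 3))

def invariants(P):
    """Cross ratios of the coordinates of points 5 and 6 in the
    projective basis formed by points 1-4."""
    B = P[:4].T                    # basis points as columns
    A = np.linalg.solve(B, P[4])   # coordinates of point 5 in the basis
    C = np.linalg.solve(B, P[5])   # coordinates of point 6 in the basis
    return np.array([(A[0] * C[k]) / (A[k] * C[0]) for k in (1, 2, 3)])

inv = invariants(X)

# Apply an arbitrary invertible 4x4 matrix (a 3D projective transformation)
# and rescale each homogeneous point individually; the cross ratios survive.
H = np.array([[2, 1, 0, 1],
              [0, 1, 3, 0],
              [1, 0, 1, 2],
              [1, 1, 0, 3]], dtype=float)   # det = 17, so invertible
Y = X @ H.T * np.array([2, 3, 4, 5, 6, 7])[:, None]
inv_h = invariants(Y)

print(np.allclose(inv, inv_h))   # -> True
```

The per-point rescaling makes the check nontrivial: the basis coordinates themselves change under it, but the cross ratios cancel every scale factor.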
In the rest of this paper, the symbols
The set of projective invariants has the property that when an invariant equals one, four of the 3D points are coplanar. This is easy to prove. For example, if
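As a quick numerical check of this property (a Python/NumPy sketch with assumed point values and mixing weights), forcing four of the six points to be coplanar drives the corresponding cross-ratio invariant to one:

```python
import numpy as np

rng = np.random.default_rng(2)

# Six homogeneous 3D points; points 3, 4, 5, 6 are forced to be coplanar
# by placing point 6 in the span of points 3, 4, and 5 (the weights sum
# to 1 so the last homogeneous coordinate stays 1).
X = np.ones((6, 4))
X[:, :3] = rng.uniform(-1.0, 1.0, (6, 3))
X[5] = 0.5 * X[4] + 0.3 * X[2] + 0.2 * X[3]

B = X[:4].T                      # points 1-4 as the projective basis
A5 = np.linalg.solve(B, X[4])    # coordinates of point 5 in the basis
A6 = np.linalg.solve(B, X[5])    # coordinates of point 6 in the basis

# Here A6 = 0.5*A5 + 0.3*e3 + 0.2*e4, so the first two coordinates of
# points 5 and 6 are proportional and the cross ratio collapses to one.
inv1 = (A5[0] * A6[1]) / (A5[1] * A6[0])
print(inv1)   # -> 1.0 (up to rounding)
```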
In [
In the case of four images and six points, from (
Let
Although
Next, we will derive the solutions of (
Applying constraints (
The solution
The solution of
As the preceding derivation shows, four images of six 3D points form the simplest configuration for computing 3D projective invariants. By contrast, it is very hard to estimate the quadrifocal tensor of four images: it requires solving a system of 81 multilinear equations.
In this section, we derive formulas to compute the 3D projective invariants of seven 3D points from three images linearly. To our knowledge, no similar method has been reported. There are nonlinear methods to compute the 3D projective invariants of six 3D points from three images [
In the case of three images and seven points, from (
Since the system of equations in (
The total number of equations in (
Rewriting the system of (
In this section, we will derive explicit formulas to compute the 3D projective invariants of eight 3D points from their two images linearly.
Since the system of equations in (
Let
Similarly, as in the previous section, we have the solutions of
We have presented a novel method for deriving 3D projective invariants of 3D points from their 2D images. We have shown that eight point correspondences are needed to derive the invariants linearly from two images, seven correspondences from three images, and six correspondences from four images. This study gives a deeper understanding of the structure-and-motion problem. Since it is known to be very hard to estimate the quadrifocal tensor from four images, it is somewhat surprising that the configuration of four images and six points is the most natural one for deriving 3D projective invariants. The proposed method is clear, simple, and easy to implement, since explicit formulas are given. Future research will apply this idea to robust estimation of the invariants in the presence of noise.
We have tested the proposed projective invariants with the following Mathematica program.

X = RandomReal
M = RandomReal
T = RandomReal
x = RandomReal
a = RandomReal
b = RandomReal
u = RandomReal
v = RandomReal
X[[1,4]] = 1; X[[2,4]] = 1; X[[3,4]] = 1; X[[4,4]] = 1;
X[[5,4]] = 1; X[[6,4]] = 1; X[[7,4]] = 1;
For[i = 1, i <= 3, i++, For[j = 1, j <= 7, j++,
  x[[i,j]] = M[[i]] . X[[j]];
  u[[i,j]] = x[[i,j,1]]/x[[i,j,3]];
  v[[i,j]] = x[[i,j,2]]/x[[i,j,3]];
]];
For[i = 1, i <= 3, i++, For[j = 5, j <= 7, j++,
  a[[i,j,1]] = u[[i,1]] - u[[i,j]];
  a[[i,j,2]] = u[[i,2]] - u[[i,j]];
  a[[i,j,3]] = u[[i,3]] - u[[i,j]];
  a[[i,j,4]] = u[[i,4]] - u[[i,j]];
  b[[i,j,1]] = v[[i,1]] - v[[i,j]];
  b[[i,j,2]] = v[[i,2]] - v[[i,j]];
  b[[i,j,3]] = v[[i,3]] - v[[i,j]];
  b[[i,j,4]] = v[[i,4]] - v[[i,j]];
]];
XT = Transpose[
AA = LinearSolve[XT, X[[5]]];
BB = LinearSolve[XT, X[[6]]];
CC = LinearSolve[XT, X[[7]]];
Inv1 = (AA[[1]] BB[[2]])/(AA[[2]] BB[[1]]);
Inv2 = (AA[[1]] BB[[3]])/(AA[[3]] BB[[1]]);
Inv3 = (AA[[1]] BB[[4]])/(AA[[4]] BB[[1]]);
Inv4 = (AA[[1]] CC[[2]])/(AA[[2]] CC[[1]]);
Inv5 = (AA[[1]] CC[[3]])/(AA[[3]] CC[[1]]);
Inv6 = (AA[[1]] CC[[4]])/(AA[[4]] CC[[1]]);
Print["The six invariants computed from 3D point locations: ",
  Inv1, ";", Inv2, ";", Inv3, ";", Inv4, ";", Inv5, ";", Inv6];
For[i = 1, i <= 3, i++,
  T[[i,1]] = {
    (a[[i,5,4]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,4]]) a[[i,6,2]] a[[i,7,1]],
    (a[[i,5,2]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,2]]) a[[i,6,3]] a[[i,7,1]],
    (a[[i,5,3]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,3]]) a[[i,6,4]] a[[i,7,1]],
    (a[[i,5,3]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,3]]) a[[i,6,1]] a[[i,7,2]],
    (a[[i,5,4]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,4]]) a[[i,6,1]] a[[i,7,3]],
    (a[[i,5,2]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,2]]) a[[i,6,1]] a[[i,7,4]],
    (a[[i,5,1]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,1]]) a[[i,6,2]] a[[i,7,3]],
    (a[[i,5,3]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,3]]) a[[i,6,2]] a[[i,7,4]],
    (a[[i,5,4]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,4]]) a[[i,6,3]] a[[i,7,2]],
    (a[[i,5,1]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,1]]) a[[i,6,3]] a[[i,7,4]],
    (a[[i,5,1]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,1]]) a[[i,6,4]] a[[i,7,2]],
    (a[[i,5,2]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,2]]) a[[i,6,4]] a[[i,7,3]]};
  T[[i,2]] = {
    (a[[i,5,4]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,4]]) a[[i,6,2]] b[[i,7,1]],
    (a[[i,5,2]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,2]]) a[[i,6,3]] b[[i,7,1]],
    (a[[i,5,3]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,3]]) a[[i,6,4]] b[[i,7,1]],
    (a[[i,5,3]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,3]]) a[[i,6,1]] b[[i,7,2]],
    (a[[i,5,4]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,4]]) a[[i,6,1]] b[[i,7,3]],
    (a[[i,5,2]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,2]]) a[[i,6,1]] b[[i,7,4]],
    (a[[i,5,1]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,1]]) a[[i,6,2]] b[[i,7,3]],
    (a[[i,5,3]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,3]]) a[[i,6,2]] b[[i,7,4]],
    (a[[i,5,4]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,4]]) a[[i,6,3]] b[[i,7,2]],
    (a[[i,5,1]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,1]]) a[[i,6,3]] b[[i,7,4]],
    (a[[i,5,1]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,1]]) a[[i,6,4]] b[[i,7,2]],
    (a[[i,5,2]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,2]]) a[[i,6,4]] b[[i,7,3]]};
  T[[i,3]] = {
    (a[[i,5,4]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,4]]) b[[i,6,2]] a[[i,7,1]],
    (a[[i,5,2]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,2]]) b[[i,6,3]] a[[i,7,1]],
    (a[[i,5,3]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,3]]) b[[i,6,4]] a[[i,7,1]],
    (a[[i,5,3]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,3]]) b[[i,6,1]] a[[i,7,2]],
    (a[[i,5,4]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,4]]) b[[i,6,1]] a[[i,7,3]],
    (a[[i,5,2]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,2]]) b[[i,6,1]] a[[i,7,4]],
    (a[[i,5,1]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,1]]) b[[i,6,2]] a[[i,7,3]],
    (a[[i,5,3]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,3]]) b[[i,6,2]] a[[i,7,4]],
    (a[[i,5,4]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,4]]) b[[i,6,3]] a[[i,7,2]],
    (a[[i,5,1]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,1]]) b[[i,6,3]] a[[i,7,4]],
    (a[[i,5,1]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,1]]) b[[i,6,4]] a[[i,7,2]],
    (a[[i,5,2]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,2]]) b[[i,6,4]] a[[i,7,3]]};
  T[[i,4]] = {
    (a[[i,5,4]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,4]]) b[[i,6,2]] b[[i,7,1]],
    (a[[i,5,2]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,2]]) b[[i,6,3]] b[[i,7,1]],
    (a[[i,5,3]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,3]]) b[[i,6,4]] b[[i,7,1]],
    (a[[i,5,3]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,3]]) b[[i,6,1]] b[[i,7,2]],
    (a[[i,5,4]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,4]]) b[[i,6,1]] b[[i,7,3]],
    (a[[i,5,2]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,2]]) b[[i,6,1]] b[[i,7,4]],
    (a[[i,5,1]] b[[i,5,4]] - a[[i,5,4]] b[[i,5,1]]) b[[i,6,2]] b[[i,7,3]],
    (a[[i,5,3]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,3]]) b[[i,6,2]] b[[i,7,4]],
    (a[[i,5,4]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,4]]) b[[i,6,3]] b[[i,7,2]],
    (a[[i,5,1]] b[[i,5,2]] - a[[i,5,2]] b[[i,5,1]]) b[[i,6,3]] b[[i,7,4]],
    (a[[i,5,1]] b[[i,5,3]] - a[[i,5,3]] b[[i,5,1]]) b[[i,6,4]] b[[i,7,2]],
    (a[[i,5,2]] b[[i,5,1]] - a[[i,5,1]] b[[i,5,2]]) b[[i,6,4]] b[[i,7,3]]};
];
CoF =
  T[[2,2]], T[[2,3]], T[[2,4]], T[[3,1]], T[[3,2]]
CoF = Transpose[CoF];
I1 = Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,7] + Part[CoF,8]], Part[CoF,6]]] /
     Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,8]], Part[CoF,7]]];
I2 = Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,10] + Part[CoF,9]], Part[CoF,6]]] /
     Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,10]], Part[CoF,9]]];
I3 = Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,12] + Part[CoF,11]], Part[CoF,5]]] /
     Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,12]], Part[CoF,11]]];
I4 = Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,11] + Part[CoF,9]], Part[CoF,3]]] /
     Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,11]], Part[CoF,9]]];
I5 = Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,12] + Part[CoF,7]], Part[CoF,3]]] /
     Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,12]], Part[CoF,7]]];
I6 = Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,10] + Part[CoF,8]], Part[CoF,2]]] /
     Det[Prepend[Prepend[Delete[CoF,
       Part[CoF,10]], Part[CoF,8]]];
Print["The six invariants computed from 2D projections: ",
  I1, ";", I2, ";", I3, ";", I4, ";", I5, ";", I6];
The authors declare that there is no conflict of interest regarding the publication of this paper.
This work is supported by the National Science Foundation for Distinguished Young Scholars of China under Grants no. 61225012 and no. 71325002; the National Natural Science Foundation of China under Grant no. 61572123 and Grant no. 61572117; and Liaoning BaiQianWan Talents Program under Grant no. 2013921068.