Projective Invariants from Multiple Images: A Direct and Linear Method

The projective reconstruction of 3D structures from 2D images is a central problem in computer vision. Existing methods for this problem are usually either nonlinear or indirect. The previous direct methods usually require solving a system of nonlinear equations, which makes them complicated and hard to implement. The previous linear indirect methods are usually imprecise. This paper presents a linear and direct method to derive projective structures of 3D points from their 2D images. Algorithms to compute projective invariants from two images, three images, and four images are given. The method is clear, simple, and easy to implement. For the first time in the literature, we present explicit linear formulas to solve this problem. Mathematica code is provided to demonstrate the correctness of the formulas.


Introduction
The recovery of the geometric structure of 3D points from their 2D projection images is fundamental in computer vision. It is known that the structure of a 3D point set cannot, in general, be recovered from a single image [1]. When two or more images are available, the 3D point structure of a scene can be recovered up to an unknown projective transformation. The projective reconstruction of camera parameters and 3D scene structure from multiple uncalibrated views is also called projective structure from motion [1][2][3][4][5][6]. The problem has been well studied over decades of research. However, a definitive solution is still lacking.
There are mainly three types of methods to solve the structure from motion problem. The first type computes the projective invariants of the 3D points. Several groups of researchers have studied, in different ways, the problem of computing 3D projective invariants of a point set from its 2D images [7][8][9][10]. Previous methods to compute 3D projective invariants of six 3D points from three uncalibrated images can be found in [7,9]. A previous method to compute 3D projective invariants of seven 3D points from two images can be found in [10]. However, these methods are very complicated: solutions of polynomial equations up to the eighth degree are needed. Wang et al. proposed an explicit method to derive projective invariants of six 3D points from three uncalibrated images [11]. All these methods produce three possible solutions for the reconstruction problem, so further information is needed to select the unique solution. In the literature, it is not quite clear how to determine the unique solution.
The second type of methods recovers 3D shapes indirectly. Tensors of multiple images of the 3D scene are estimated first. A second-order tensor, usually called the fundamental matrix, captures the geometry between two views of a 3D scene. A third-order tensor, usually called the trifocal tensor, captures the geometry among three views of a 3D scene. When these tensors of multiple views of a scene are known, there are many algorithms to recover the 3D geometric structure of the scene from them [12][13][14][15][16][17]. There are two kinds of methods to estimate these tensors: nonlinear methods and linear methods. The problem with the nonlinear methods is that they produce multiple solutions. For example, the seven-point nonlinear method to estimate the fundamental matrix produces up to three solutions. The problem with the linear methods is that they do not produce the precise solution. For example, the eight-point linear algorithm to derive the fundamental matrix generally produces a matrix that does not satisfy the rank constraint.
Mathematical Problems in Engineering
The third type of methods estimates the structure and motion through the projective factorization technique [4,14,15,16]. This technique organizes a set of constraints into a single $3m \times n$ matrix $W$, where $m$ is the number of images and $n$ is the number of points. When all the projective depths in $W$ are known, it is possible to factor $W$ into motion and shape matrices using the rank constraint. The general method to factor $W$ is through singular value decomposition. Since there is no mathematical proof that the motion and shape matrices derived by this technique are the real motion and shape matrices, we do not discuss this type of methods further in this paper.
In the literature, there is a well-known result demonstrated by Carlsson and Weinshall that the projective reconstruction with $n$ points and $m$ images is equivalent to that with $m+4$ points and $n-4$ images. We would like to emphasize that, for this theorem to be true, the number of points should be no less than six, and the number of images should be no less than two. So the minimal number of point correspondences for projective reconstruction from two images is seven. The minimal number of point correspondences for projective reconstruction from three images is six. The minimal number of point correspondences for projective reconstruction from four images is also six. It is generally impossible to projectively recover a 3D structure from fewer than six point correspondences, no matter how many images are available.
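The bookkeeping behind this duality can be sketched in a few lines of Python. This is only an illustration: the function name `dual_configuration` is ours, and the guard simply encodes the conditions stated above (at least six points and at least two images).

```python
def dual_configuration(num_points, num_images):
    """Return the (points, images) pair that is dual to the given one.

    Follows the rule quoted above: n points in m images correspond to
    m + 4 points in n - 4 images.
    """
    if num_points < 6 or num_images < 2:
        raise ValueError("duality requires at least six points and two images")
    return num_images + 4, num_points - 4


# Hypothetical driver: list the duals of a few small configurations.
for points, images in [(7, 2), (7, 3), (6, 4)]:
    print((points, images), "<->", dual_configuration(points, images))
```

Under this mapping, six points in four images correspond to eight points in two images, and seven points in three images correspond to themselves. These are the three configurations for which linear methods are derived in this paper.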
For the fundamental projective reconstruction problem, it is generally accepted that we need to consider only the cases of two views, three views, and four views. Although the quadrifocal tensor is the most complicated and controversial tensor in multiple view geometry, we will demonstrate that the configuration of six points and four views is the most natural configuration for deriving 3D projective invariants.
This paper presents linear methods for computing projective invariants of 3D points directly from their 2D images. A 3D point structure can be configured by first choosing four reference points as a basis and then representing the other points under this basis. The cross ratios of the coordinates of the other points under this basis are projectively invariant. Systems of bilinear equations are then derived. Traditional methods to solve nonlinear multivariable equations are very complicated. The main contribution of this paper is to show that these systems of equations can be easily transformed into systems of linear equations. We present the solutions of the systems of linear equations in explicit form.
The rest of the paper is organized as follows. In Section 2, we review some of the previous works. In Section 3, we define the form of the 3D projective invariants and derive the basic relations of projective invariants among multiple views. In Section 4, we present a linear method to compute 3D projective invariants of six points from four images. In Section 5, we present a linear method to compute 3D projective invariants of seven points from three images. In Section 6, we present a linear method to compute 3D projective invariants of eight points from two images. The final section is the Conclusion. Mathematica code demonstrating the correctness of the method is given in the Appendix.

Previous Works
We review a few related works in this section.
A camera is a device that maps a 3D scene onto an image plane. A pinhole camera model is used to represent the linear projection from 3D space onto each image plane. In this paper, 3D world points are represented by homogeneous 4-vectors $\mathbf{X}_i = (X_i, Y_i, Z_i, 1)^T$. The projection of the $i$th 3D point is represented by a homogeneous 3-vector $\mathbf{x}_i = (u_i, v_i, 1)^T$. The relationships among the 3D points $\mathbf{X}_i$ and their 2D projections are

$$\lambda_i^j \mathbf{x}_i^j = P_j \mathbf{X}_i,$$

where $P_j$ is the projection matrix (which is $3 \times 4$ and is also called the camera matrix) of the $j$th camera, $\lambda_i^j$ is a nonzero scale factor called the projective depth, and $\mathbf{x}_i^j$ is the $j$th projection of the $i$th 3D point. Suppose that $m$ perspective images of a set of $n$ 3D points are given. The structure and motion problem is to recover the 3D point locations and camera parameters from the image measurements. When the cameras are uncalibrated and no additional geometric information about the point set is available, the reconstruction is determined only up to an unknown projective transformation. For any 3D projective transformation matrix $H$, $P_j H^{-1}$ and $H \mathbf{X}_i$ produce an equally valid reconstruction.
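The projective ambiguity can be checked numerically. The following Python sketch uses a hypothetical camera matrix `P`, point `X`, and transformation `H` (none of them taken from the paper) together with exact rational arithmetic. It confirms that the transformed camera and the transformed point yield the same image point as the original pair.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    """Product of a matrix and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dehomogenize(x):
    """Divide out the last coordinate (the projective depth for a projected point)."""
    return [Fraction(c, x[-1]) for c in x[:-1]]


# Hypothetical 3 x 4 camera matrix and homogeneous 3D point.
P = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 1]]
X = [2, 4, 1, 1]

# Hypothetical projective transformation H and its inverse (chosen by hand).
H = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]
H_inv = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [-1, 0, 0, 1]]

P_alt = matmul(P, H_inv)   # transformed camera, P H^{-1}
X_alt = matvec(H, X)       # transformed point, H X

image_original = dehomogenize(matvec(P, X))
image_alternative = dehomogenize(matvec(P_alt, X_alt))
print(image_original, image_alternative)
```

Both reconstructions project to the same image coordinates, so image measurements alone cannot distinguish between them.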
The projective geometry between two views of a 3D scene is completely captured by the epipolar geometry. Let $\mathbf{x}_1$ and $\mathbf{x}_2$ be images of a 3D point $\mathbf{X}$ observed by two cameras with optical centers $C_1$ and $C_2$. The epipolar constraint says that if $\mathbf{x}_1$ and $\mathbf{x}_2$ are images of the same 3D point $\mathbf{X}$, then $\mathbf{x}_2$ must lie on the epipolar line associated with $\mathbf{x}_1$. That is,

$$\mathbf{x}_2^T F \mathbf{x}_1 = 0,$$

where $F$ is a $3 \times 3$ matrix called the fundamental matrix. The fundamental matrix is of rank two and is defined up to a scalar factor. It encodes all the geometric information between two views when no additional information is available. Numerous algorithms have been designed to estimate this matrix. The most famous are the linear eight-point algorithm and the nonlinear seven-point algorithm [1,3,13,17,18]. The input to those methods is a set of point correspondences between the two images. The eight-point algorithm is simple, fast, and easy to implement. However, the fundamental matrix estimated by the eight-point algorithm is usually of full rank and therefore violates the rank-two constraint.
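As a small illustration of the epipolar constraint, the Python sketch below assumes a special camera pair that is not taken from the paper: the first camera is $[I \mid 0]$ and the second is $[I \mid \mathbf{t}]$, for which the fundamental matrix is the skew-symmetric matrix of $\mathbf{t}$. The vector `t` and the point `X` are hypothetical values.

```python
def skew(t):
    """Skew-symmetric matrix [t]_x, so that skew(t) applied to v equals t x v."""
    return [[0, -t[2], t[1]],
            [t[2], 0, -t[0]],
            [-t[1], t[0], 0]]


def matvec(A, v):
    """Product of a matrix and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def epipolar_residual(F, x1, x2):
    """Value of x2^T F x1; it is zero for a true correspondence."""
    return sum(a * b for a, b in zip(x2, matvec(F, x1)))


def det3(M):
    """Determinant of a 3 x 3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


t = [1, 2, 3]                 # hypothetical translation of the second camera
X = [2, -1, 4, 1]             # hypothetical homogeneous 3D point
F = skew(t)                   # fundamental matrix for cameras [I | 0] and [I | t]

x1 = X[:3]                                        # projection by [I | 0]
x2 = [X[k] + t[k] * X[3] for k in range(3)]       # projection by [I | t]

print(epipolar_residual(F, x1, x2))               # residual of the true match
print(epipolar_residual(F, x1, [3, 1, 8]))        # residual of a mismatched point
print(det3(F))                                    # F is singular
```

The residual vanishes for the true correspondence and not for the perturbed one, and the determinant of `F` is zero, in line with the rank constraint mentioned above.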
Hartley proposed a method to recover the 3D scene from the fundamental matrix [12].Two camera matrices  1 and  2 with different projection centers uniquely determine the fundamental matrix .On the other hand, the camera matrices  1 and  2 are not uniquely determined by the fundamental matrix .If the fundamental matrix is factored into then a realization of the fundamental matrix  is where  is 3×3 identity matrix,  is a 3×3 nonsingular matrix, and ) . ( The 3D scene point  is then determined by the two camera matrices,  1 and  2 , and the two projections of ,  1 and  2 .
Quan proposed an algorithm to compute projective invariants of six 3D points from three projection images [9].Given any six 3D points, the author selected five points as the standard projective basis.The six unknown points in 3D space are projective equivalent to the following normalized points: It is also noticed that Since six 3D points have 18 degrees of freedom and a 3D projective transformation has 15 degrees of freedom, six points in 3D space can have 3 independent projective invariants.There are many forms of projective invariants.It is noticed that the ratios of , , , and  in ( 6) are projective invariant.The three independent such invariants can be So the goal is to compute these unknown 3D projective invariants from three of the 2D images.
Quan tried to solve the system of bilinear equations (8) using the classical resultant technique.After eliminating the variable , he obtained two homogeneous polynomial equations of the third degree in three variables: Eliminating  again will result in a homogeneous polynomial equation in  and  of degree eight.After that, a third degree polynomial equation can be derived numerically through polynomial factorization of the following form: Heyden presented a similar method to compute projective invariants of six 3D points from three views [7].
As we can see from the procedure described above, the method proposed by Quan is hard to implement by ordinary users and inconvenient for real applications.In [11], Wang et al. proposed a method to eliminate variables  and  in a single step.A third degree polynomial equation in a single variable  was given explicitly.
There are also methods to compute projective invariants of 3D points from two view images [10].
In the literature, it is generally noted that the minimal number of point correspondences needed for projective reconstruction from two images is seven. The minimal number of points needed for projective reconstruction from three images is six. This does not mean that we can obtain a definite reconstruction from the minimal number of points only. More points are needed to get a unique solution.

Relations of Projective Invariants among Multiple Views
In this section, we will first define the form of the 3D projective invariants. We then derive the basic relations of projective invariants among multiple views.
Suppose that a set of  3D points labeled   ,  = 1, . . ., , is given.The geometric structure of it is unknown.The point set is projected into view images by  unknown camera matrices   ,  = 1, . . ., .The relationships between them are where  = 1, . . .,  and  = 1, . . ., .The only information available is the point locations in the images and point correspondences between the projections where  = 1, . . .,  and ,  = 1, . . ., .It is often supposed that no four points in space are coplanar and no three points in the images are collinear.Otherwise, the problem is much simpler.We can select  1 ,  2 ,  3 , and  4 as a basis of the vector space.Other points can be represented as the linear combinations of  1 ,  2 ,  3 , and  4 : where  = 5, . . ., .Since points  1 ,  2 ,  3 , and  4 are linearly independent, this representation is unique.Since no four points are coplanar, all the coefficients in ( 16) are nonzero.Six 3D points in general position have 18 degrees of freedom.Seven 3D points in general position have 21 degrees of freedom.Eight 3D points in general position have 24 degrees of freedom.The 3D projective transformation has 15 degrees of freedom.So six 3D points have three independent projective invariants, seven 3D points have six independent projective invariants, and eight 3D points have nine independent projective invariants.There are many forms of projective invariants.It is known that the cross ratios of coefficients in ( 16) are projective invariant.A set of independent invariants of this form are In the rest of this paper, the symbols   ,  = 1, . . ., 9, will always denote these invariants.Since all the coefficients in ( 16) are nonzero, all the invariants in ( 17) cannot be zero.
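The construction above can be illustrated with a short Python sketch. The notation is ours and is an assumption: `a5` and `a6` hold the coefficients of the fifth and sixth points with respect to the basis formed by the first four points, and the invariant computed is one cross ratio of these coefficients (the particular index pattern is chosen for illustration). The points, the transformation `H`, and the per-point scale factors are hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]


def coefficients(basis, X):
    """Coefficients of X as a linear combination of the four basis points."""
    A = [[basis[k][r] for k in range(4)] for r in range(4)]
    return solve(A, X)


def invariant(points):
    """One cross ratio of the coefficients of points 5 and 6 (assumed form)."""
    a5 = coefficients(points[:4], points[4])
    a6 = coefficients(points[:4], points[5])
    return (a5[0] * a6[1]) / (a6[0] * a5[1])


def num_invariants(n):
    """3 degrees of freedom per point minus 15 for the projective transformation."""
    return 3 * n - 15


def transform(H, X, scale):
    """Apply H to X and rescale; the scale is immaterial for homogeneous points."""
    return [scale * sum(h * x for h, x in zip(row, X)) for row in H]


points = [[0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1],
          [1, 2, 3, 1], [2, 1, 5, 1]]
H = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 0, 2]]
scales = [2, -1, 3, 5, -2, 7]
moved = [transform(H, X, s) for X, s in zip(points, scales)]

print(invariant(points), invariant(moved))
print([num_invariants(n) for n in (6, 7, 8)])
```

The cross ratio is unchanged by the transformation and by the arbitrary rescaling of each homogeneous point, while the individual coefficients are not. The counts for six, seven, and eight points reproduce the numbers of independent invariants stated above.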
The set of projective invariants have the property that when an invariant equals one, four of the 3D points are coplanar.This can be proved easily.For example, if  1 = 1, then  1  5  2 6 =  1 6  2 5 .From ( 16) we have Subtracting ( 18) from ( 19), we get Since  1 5 and  1  6 are not zero, we have a nontrivial linear combination of points  3 ,  4 ,  5 , and  6 .So they are coplanar.On the other hand, if points  3 ,  4 ,  5 , and  6 are coplanar, there are  3 ,  4 ,  5 ,  6 such that Substituting  5 and  6 using ( 16) into (21), we obtain Since points  1 ,  2 ,  3 , and  4 are not coplanar, we have From ( 23) we obtain Next, we will derive the basic relations of projective invariants among multiple views.Multiplying each side of ( 16) by the projection matrices   , we have where  = 5, . . .,  and  = 1, . . ., .That is, where  = 5, . . .,  and  = 1, . . ., .Applying variable eliminations to (26), we get Dividing each side of (27) by  1    1 , we have Rewriting (29) in another form, we obtain where (31) Since the system of equations in (30) has a nontrivial solution (1,   ,   ,   ), the determinants of every four rows of the coefficients matrix in (30) must be rank deficient.We will use these constraints to derive the 3D projective invariants.
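The coplanarity property proved at the start of this section can also be checked numerically. In the Python sketch below, the coefficient lists are hypothetical and written in our own notation (coefficients of the fifth and sixth points with respect to the first four points). They are chosen so that the cross ratio of the first two coefficients equals one in the first case and differs from one in the second. Four homogeneous points are coplanar exactly when the determinant of the matrix formed by them is zero.

```python
def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def combine(coeffs, basis):
    """Linear combination of the basis points with the given coefficients."""
    return [sum(c * X[r] for c, X in zip(coeffs, basis)) for r in range(4)]


# Hypothetical basis of four non-coplanar points in homogeneous coordinates.
basis = [[0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
X3, X4 = basis[2], basis[3]

X5 = combine([1, 2, 3, 4], basis)
X6_unit = combine([2, 4, 1, 5], basis)    # 1 * 4 == 2 * 2, so the cross ratio is 1
X6_other = combine([2, 3, 1, 5], basis)   # 1 * 3 != 2 * 2, so the cross ratio is not 1

print(det([X3, X4, X5, X6_unit]))    # vanishes: the four points are coplanar
print(det([X3, X4, X5, X6_other]))   # does not vanish: general position
```

In the first case the sixth point minus twice the fifth point lies in the span of the third and fourth points, which is the nontrivial linear combination used in the proof.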

Projective Invariants from Four Views
In [9], the author notes that it is possible to compute projective invariants of six 3D points from five images linearly.
In this section, we will derive formulas to compute the 3D projective invariants of six 3D points from four images linearly. The result was first presented in [19].
In the case of four images and six points, from ( 30 where  = 1, 2, 3, 4. Let  denote the coefficients matrix of the system of equations in (33).It is a 4 × 6 matrix.The first column of  corresponds to the coefficients of  1 , the second column of  corresponds to the coefficients of  2 , and so forth.Let   denote the th column of the matrix ,  = 1, . . ., 6.It is checked that Although  1 = 0 or  2 = 0 or  3 = 0 is possible solution of the system of bilinear equations (33), we discard these solutions since the invariants cannot be zero by definition.
Next, we will derive the solutions of (33) such that   is not zero, where  = 1, 2, 3. Rewriting (33) in another form, we have From ( 35), we can obtain This is a second-degree polynomial equation in variable  1 .
Applying constraints (34) to (36), we obtain The solutions of (37) are  1 = 1 and The solution  1 = 1 corresponds to the condition that four of the 3D points are coplanar.So it is discarded if we assume that no four of the 3D points are coplanar.Similarly, we can obtain the solutions of  2 and  3 linearly.The solution of  2 is The solution of  3 is As we can see from the previous derivation, four images of six 3D points are the simplest configuration to compute 3D projective invariants.On the contrary, it is very hard to estimate the quadrifocal tensor of four images.It requires the solution of a system of 81 multilinear equations.

Projective Invariants from Three Views
In this section, we will derive formulas to compute the 3D projective invariants of seven 3D points from three images linearly. To our knowledge, no similar method has been reported. There are nonlinear methods to compute the 3D projective invariants of six 3D points from three images [7,9].
Since the system of equations in (41) has a nontrivial solution (1,   ,   ,   ), the determinant of the coefficients matrices of every four equations in (41) must be zero.From these constraints, we obtain the following system of equations: where  = 1, 2, 3.
The total number of equations in (42) is 12.We choose the first ten equations as the system of equations to compute the 3D projective invariants.Let  denote the coefficients matrix of this system of equations.It is a 10 × 12 matrix.The first column of  corresponds to the coefficients of  1 , the second column of  corresponds to the coefficients of  2 , and so forth.Let   denote the th column of the matrix ,  = 1, . . ., 12. Let Γ ,,, denote the submatrix of the matrix  with its th, th, th, and th columns deleted.It is checked that Let us denote Rewriting the system of (42) in the concise form, we have

Conclusion
We have presented a novel method to derive 3D projective invariants of 3D points from their 2D images. We have shown that, for two images, eight point correspondences are needed to derive the 3D projective invariants linearly. For three images, seven point correspondences are needed. For four images, six point correspondences are needed. This study gives a deeper understanding of the structure and motion problem. It is known that the quadrifocal tensor is very hard to estimate from four images. It is therefore somewhat surprising that the configuration of four images and six points is the most natural configuration for deriving 3D projective invariants. The proposed method is clear and simple, and it is easy to implement since explicit formulas are given. Future research will use this idea to obtain robust estimates of the invariants in the presence of noise.

We then normalize the known point locations in the three images accordingly. They correspond to