A Flexible Online Camera Calibration Using Line Segments

To let general users carry out vision tasks more flexibly and easily, this paper proposes a new solution to the problem of camera calibration from correspondences between model lines and their noisy image lines in multiple images. The proposed method uses common planar items of standard size and structure that are readily at hand as calibration objects. It consists of a closed-form solution based on homography optimization, followed by a nonlinear refinement based on the maximum likelihood approach. To recover the camera parameters linearly and automatically, we present a robust homography optimization method based on the edge model, obtained by redesigning the classic 3D tracking approach. In the nonlinear refinement procedure, the uncertainty of the image line segment is encoded in the error model, taking the finite nature of the observations into account. By developing a new error model between the model line and the image line segment, the camera calibration problem is expressed in a probabilistic formulation. Simulated data are used to compare this method with the widely used planar pattern based method. Real image sequences are also used to demonstrate the effectiveness and flexibility of the proposed method.


Introduction
Camera calibration has always been an important issue in computer vision, since it is a necessary step for extracting metric information from 2D images. The goal of camera calibration is to recover the mapping between 3D space and the image plane, which can be separated into two transformations. The first maps 3D points in the scene to 3D coordinates in the camera frame and is described by the extrinsic parameters of the camera model. The second maps 3D points in the camera frame to 2D coordinates in the image plane. This mapping is described by the intrinsic parameters, which model the geometric and optical features of the camera. In the general case, these two transformations can be expressed by the ideal pinhole camera model.
Up to now, much work on camera calibration has been done to accommodate various applications. These approaches can be roughly grouped into two categories according to whether they require a calibration object. The first type is called metric calibration, which resolves the camera model with the help of metric information from a reference object. Camera calibration is performed by observing a calibration object whose geometric dimensions are known with very high precision. The calibration object can be a 3D object with several planes orthogonal to each other [1,2]. Sometimes a 2D plane undergoing a precisely known translation [3] or free movement [4] is utilized. Recently, a 1D object [5][6][7][8] with three or more markers has been used for camera calibration. In [6], it was proved that a 1D object undergoing a planar motion is essentially equivalent to a 2D planar object. With such methods, calibration can be done very efficiently and accurately. However, a calibration pattern still needs to be prepared, even though in [4] the setup is very easy and only a planar object with an attached chessboard is required. The second type is called self-calibration, which does not use any metric information from the scene or any calibration object. Such methods are also regarded as 0D approaches, because only image feature correspondences are required. Since two constraints on the intrinsic parameters of the camera can be obtained from image information alone [9], three images are sufficient to recover the camera parameters, both internal and external, and to reconstruct the 3D structure of the scene up to a similarity [10,11]. The problem with such methods is that a large number of parameters need to be estimated, resulting in very unstable solutions. If the camera rotation is known, more stable and accurate results can be obtained [12,13]. However, it is not always easy to obtain the camera rotation with very high accuracy. In general, metric calibration methods can provide better results than self-calibration methods [14].
Our current research is focused on smartphone vision systems, since the potential for using such systems is large. Smartphones are now ubiquitous in our daily life. To let members of the general public who are not experts in computer vision do vision tasks easily, the setup for camera calibration should be flexible enough. The method developed in [4] is considered the most flexible technique; however, as the angle between the model plane and the image plane increases, foreshortening makes the corner detection less precise and may even cause it to fail. Moreover, the planar pattern has to be prepared, which is still inconvenient for general smartphone users. Therefore, it would be best to use an item at hand as the calibration object. The camera calibration technique described in this paper was designed with these considerations in mind. Compared with the classical techniques, the proposed technique does not require a prepared planar pattern and is considerably more flexible. The calibration objects employed by the proposed method are common items from daily life, such as a sheet of A4 paper or even a standard IC card.
Our approach exploits the line/edge features of such handy objects to calibrate both the internal and external parameters of the camera, since these features are largely stable under illumination and viewpoint changes and offer some resilience to harsh imaging conditions such as noise and blur. The first challenge addressed in this paper is to estimate the homography automatically and to establish the correspondences between model and image features. To this end, we redesigned the model based tracking method [15][16][17][18] to robustly estimate the homography for a common planar object in a cluttered scene. An advantage of such methods is that they can handle occlusion and large changes in illumination and viewpoint. With a series of homographies from the planar object to the image plane, the initial camera parameters can be solved linearly. The second challenge is to optimize the camera parameters by developing an effective objective function and by making full use of the finite nature of the observations extracted from the images. In this paper, the error function for the model and image lines, which encodes the length of the image line segment and the information of the midpoint, is derived from the noisy image edge points in the least squares sense.
The remainder of the paper is organized as follows. Section 2 gives the procedure of the proposed camera calibration algorithm. Section 3 presents an overview of the redesigned homography tracking method based on the edge model. Section 4 derives the error model between image and model lines and expresses the camera calibration problem in a probabilistic formulation using the maximum likelihood approach. Section 5 details how to solve the camera calibration problem with a nonlinear technique. Experimental results are given in Section 6.

Algorithm
The proposed algorithm is summarized in this section.
Step 1. Optimize the homography between the model plane and image plane according to our model based homography tracking approach.
Step 2. Fit the image line segment from the image edge points obtained by 1D search along the normal direction of the corresponding model line.
Step 3. Calculate the initial camera parameters linearly with a series of homography matrices (at least three orientations).
Step 4. Estimate the camera parameters by minimizing the sum of the distance between the finite image line segments and the model lines in the maximum likelihood approach.
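Step 2 above fits an image line segment to the edge points found by the 1D search. The sketch below shows one way such a fit could look. It is only an illustration under stated assumptions: it uses an orthogonal (total) least squares fit through the centroid, and the function name `fit_line_segment` and the choice of endpoints as the extreme projections of the edge points are introduced here, not taken from the paper.

```python
import math


def fit_line_segment(points):
    """Fit a segment to 2D edge points by orthogonal least squares.

    Returns the two endpoints obtained by projecting the extreme
    points onto the fitted line (an assumption of this sketch).
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    # Direction of the principal axis of the point scatter.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Signed positions of the points along the fitted line.
    ts = [(p[0] - cx) * ux + (p[1] - cy) * uy for p in points]
    t_min, t_max = min(ts), max(ts)
    return ((cx + t_min * ux, cy + t_min * uy),
            (cx + t_max * ux, cy + t_max * uy))


# Noise-free edge points on the line y = 2x + 1.
edge_points = [(float(x), 2.0 * x + 1.0) for x in range(5)]
e1, e2 = fit_line_segment(edge_points)
```

For noise-free points the recovered endpoints coincide with the first and last edge points; with noisy points they are the projections of the extreme points onto the fitted line.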

Model Based Homography Tracking
As can be seen in Figure 1, the 2D model edge is projected to the image plane using the prior homography of the planar object. Instead of handling the line segment itself, we sample the projected line segment (black solid line in Figure 1) at a series of points (brown points in Figure 1). A visibility test is then performed for each sample point, since some of them may lie outside the camera's field of view. For each visible sample point, a 1D search along the normal direction of the projected model line is used to find the edge point with the strongest gradient or the closest location as its correspondence. Finally, the sum of the errors between the sample points and their corresponding image points is minimized to solve for the homography between frames.
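The sampling and 1D search described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the nearest-pixel lookup, the central-difference gradient, and the search half-range are all assumptions introduced here, and only the strongest-gradient criterion is shown.

```python
import math


def sample_segment(p1, p2, n_samples):
    """Return n_samples points evenly spaced on the segment p1-p2."""
    return [
        (p1[0] + (p2[0] - p1[0]) * t / (n_samples - 1),
         p1[1] + (p2[1] - p1[1]) * t / (n_samples - 1))
        for t in range(n_samples)
    ]


def unit_normal(p1, p2):
    """Unit normal of the projected model line through p1 and p2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx, dy)
    return (-dy / norm, dx / norm)


def search_along_normal(image, point, normal, half_range):
    """Return the signed offset (pixels) of the strongest gradient.

    image is a list of rows of grey values. None is returned when no
    candidate is inside the image or no gradient is found.
    """
    height, width = len(image), len(image[0])
    best_offset, best_grad = None, 0.0
    for k in range(-half_range, half_range + 1):
        # Pixels just before and just after the candidate along the normal.
        xa = int(round(point[0] + (k - 1) * normal[0]))
        ya = int(round(point[1] + (k - 1) * normal[1]))
        xb = int(round(point[0] + (k + 1) * normal[0]))
        yb = int(round(point[1] + (k + 1) * normal[1]))
        if not (0 <= xa < width and 0 <= xb < width
                and 0 <= ya < height and 0 <= yb < height):
            continue  # visibility test: skip candidates outside the image
        grad = abs(image[yb][xb] - image[ya][xa])
        if grad > best_grad:
            best_offset, best_grad = k, grad
    return best_offset


# Synthetic 20 x 20 image with a soft horizontal edge centred on row 12.
image = [[0.0 if y < 12 else (0.5 if y == 12 else 1.0) for _ in range(20)]
         for y in range(20)]
p1, p2 = (3.0, 10.0), (16.0, 10.0)  # projected model line, two rows off
normal = unit_normal(p1, p2)
offsets = [search_along_normal(image, p, normal, 4)
           for p in sample_segment(p1, p2, 14)]
```

In this toy setup every sample point finds the edge two pixels away along the normal, which is the kind of normal distance that the homography update then tries to reduce.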
where $H$ is the homography between the model plane and the image plane. Suppose $p_1, p_2, \ldots, p_N$ is the set of projected sample points and $\tilde{p}_1, \tilde{p}_2, \ldots, \tilde{p}_N$ are their corresponding image points, observed with noise along the normal direction. We can then define a function that measures the normal distance between a projected sample point $p_i$ and its noisy observation $\tilde{p}_i$:
$$d_i = n_i^T (\tilde{p}_i - p_i), \tag{2}$$
where $n_i$ is the unit normal vector at the projected sample point $p_i$. Assuming a Gaussian distribution for $d_i$, we have
$$d_i \sim N(0, \sigma^2). \tag{3}$$
The conditional density of $p_i$ given $\tilde{p}_i$ can be given by
$$p(p_i \mid \tilde{p}_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{d_i^2}{2\sigma^2}\right). \tag{4}$$
Therefore, with the assumption that the observation errors for different sample points are statistically independent, a maximum likelihood estimate of the homography is
$$\hat{H} = \arg\max_H \prod_{i=1}^{N} p(p_i \mid \tilde{p}_i) = \arg\min_H \sum_{i=1}^{N} d_i^2, \tag{5}$$
where $N$ is the number of model points. The proposed approach therefore obtains the maximum likelihood estimate of the homography by minimizing the sum of the squared normal distances.
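As a small worked example of the criterion above, the following sketch evaluates the sum of squared normal distances for a given homography. The helper names are hypothetical, and the unit normals are passed in directly for simplicity, whereas in the tracking method they would come from the projected model line.

```python
def apply_homography(H, P):
    """Map a model-plane point P = (X, Y) to the image with a 3 x 3 homography."""
    X, Y = P
    w = H[2][0] * X + H[2][1] * Y + H[2][2]
    u = (H[0][0] * X + H[0][1] * Y + H[0][2]) / w
    v = (H[1][0] * X + H[1][1] * Y + H[1][2]) / w
    return (u, v)


def normal_distance(normal, observed, projected):
    """Signed distance from the projected point to the observation along the normal."""
    return (normal[0] * (observed[0] - projected[0])
            + normal[1] * (observed[1] - projected[1]))


def sum_of_squared_normal_distances(H, model_points, normals, observations):
    """Cost minimised by the maximum likelihood homography estimate."""
    total = 0.0
    for P, n, p_obs in zip(model_points, normals, observations):
        d = normal_distance(n, p_obs, apply_homography(H, P))
        total += d * d
    return total


# A pure translation by (2, 3) used as a toy homography.
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]
model_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
normals = [(0.0, 1.0)] * 3
observations = [(2.0, 3.5), (3.0, 2.5), (4.0, 3.0)]
cost = sum_of_squared_normal_distances(H, model_points, normals, observations)
```

Because only the component along the normal is measured, sliding an observation along the line leaves its distance unchanged, which is why exact point-to-point correspondences are not needed.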

Interaction Matrix-Distance between Points.
The derivation of the interaction matrix for the proposed approach is based on the distance between the projection of the sample point $p_i$ and its corresponding image point $\tilde{p}_i$. The motion velocity of the object is then related to the velocity of these distances in the image.
Assume that we have a current estimate of the homography $H_k$. The posterior homography $H_{k+1}$ can be computed from the prior homography $H_k$ given the incremental motion $\Delta H$, which is represented in terms of generating motions.
The motion in the image is related to the twist in model space by computing the partial derivative of the normal distance with respect to the $j$th generating motion at the current homography. In the corresponding Jacobian matrices, $g_j$ is a $9 \times 1$ unit vector with the $j$th item equal to 1 and $P_{k,i} = H_k P_i$.

Robust Minimization.
The error vector $d$ is obtained by stacking the normal distances of all sample points:
$$d = (d_1, d_2, \ldots, d_N)^T.$$
The optimization problem (5) can be solved according to the following equation:
$$W J h = W d, \tag{11}$$
where $h$ is the motion vector, $J$ is the Jacobian matrix which links $d$ to $h$, and $W = \mathrm{diag}(w_1, w_2, \ldots, w_i, \ldots, w_N)$ is the weight matrix (refer to [17]).
Then, the solution of (11) can be given by
$$\hat{h} = (WJ)^{+} W d, \qquad (WJ)^{+} = \left((WJ)^T (WJ)\right)^{-1} (WJ)^T.$$
Finally, the new homography is computed from $\hat{h}$ according to (7). With a series of homography matrices (at least three orientations), the camera parameters can be solved linearly by the method of [4].
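The weighted pseudo-inverse solution can be illustrated with a small iteratively reweighted least squares loop. The sketch below is not the weighting scheme of [17]: Huber weights, the threshold `c`, the fixed iteration count, and all function names are assumptions made for illustration, and a linear toy problem stands in for the re-linearized homography update.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def weighted_solve(J, d, w):
    """Return h = (WJ)^+ W d with W = diag(w), assuming full column rank."""
    rows, m = len(J), len(J[0])
    WJ = [[w[i] * J[i][c] for c in range(m)] for i in range(rows)]
    Wd = [w[i] * d[i] for i in range(rows)]
    A = [[sum(WJ[i][a] * WJ[i][b] for i in range(rows)) for b in range(m)]
         for a in range(m)]
    rhs = [sum(WJ[i][a] * Wd[i] for i in range(rows)) for a in range(m)]
    return solve_linear(A, rhs)


def huber_weight(residual, c):
    """Weight 1 for small residuals, c / |r| for large ones (assumed scheme)."""
    return 1.0 if abs(residual) <= c else c / abs(residual)


def irls(J, d, c=1.0, iterations=20):
    """Alternate between reweighting the residuals and solving for h."""
    h = weighted_solve(J, d, [1.0] * len(d))
    for _ in range(iterations):
        residuals = [d[i] - sum(J[i][k] * h[k] for k in range(len(h)))
                     for i in range(len(d))]
        w = [huber_weight(r, c) for r in residuals]
        h = weighted_solve(J, d, w)
    return h


# Toy problem: d = 2 + 3 x with one gross outlier at x = 5.
xs = list(range(10))
J = [[1.0, float(x)] for x in xs]
d = [2.0 + 3.0 * x for x in xs]
d[5] += 50.0
h_ls = weighted_solve(J, d, [1.0] * 10)   # plain least squares
h_robust = irls(J, d)                     # reweighted solution
```

The plain least squares estimate is pulled far from the true parameters by the single outlier, while the reweighted estimate stays close to them, which is the behaviour the weight matrix is meant to provide for wrongly matched edge points.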

Maximum Likelihood Estimation of the Camera Parameters
In this paper, the camera calibration problem is formulated in terms of a conditional density function that measures the probability of the image observations predicted from the camera parameters given the actual image observations. This section describes how to construct this conditional density function.

Probabilistic Formulation for the Camera Calibration.
where $\Gamma(\cdot)$ is the projection function, which takes the camera parameters and the 3D line segment and returns the corresponding edge in the image; $\theta$ and $\xi_i$ are the intrinsic and extrinsic parameters of the camera in image $i$, respectively; and $p(\cdot)$ denotes the conditional density. The maximum likelihood estimate of the camera parameters $\theta$, $\xi_i$ is obtained by maximizing the conditional density function $P$:
$$(\hat{\theta}, \hat{\xi}_i) = \arg\max_{\theta, \xi_i} P.$$
By taking the negative logarithm, the problem of maximizing a product is converted into the minimization of a sum, which is given as follows:
$$(\hat{\theta}, \hat{\xi}_i) = \arg\min_{\theta, \xi_i} \left( -\sum_{i} \sum_{j} \ln p\left(\Gamma(L_j, \theta, \xi_i) \mid \tilde{l}^i_j\right) \right), \tag{16}$$
where $i$ indexes the images and $j$ indexes the line segments. The intrinsic parameters of the camera $\theta$ are represented as $[f_x, f_y, u_0, v_0, k_1]^T$, where $f_x$ and $f_y$ are the equivalent focal lengths, $(u_0, v_0)$ is the principal point of the camera, and $k_1$ is the radial distortion coefficient. The extrinsic parameters of the camera $\xi_i$ in image $i$ are represented in the usual manner by a translation vector $T_i \in R^3$ and a rotation matrix $R_i \in SO(3)$. In the remainder of this section, the elements of (16) are discussed in more detail.

Perspective Projection of 3D Model Line Segment.
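Throughout this paper, the perspective projection model is utilized. The relationship between a 3D world point $P$ and its 2D image point $p = (u, v)^T$ can be given as
$$u = f_x (x + \delta_x) + u_0, \qquad v = f_y (y + \delta_y) + v_0, \qquad x = \frac{(P_c)_x}{(P_c)_z}, \quad y = \frac{(P_c)_y}{(P_c)_z}, \tag{17}$$
where $P_c = RP + T$ is the coordinate of $P$ in the camera frame and $(\cdot)_z$ is the $z$-coordinate. $f_x$ and $f_y$ are the equivalent focal lengths, and $u_0$ and $v_0$ are the coordinates of the principal point. $\delta_x$ and $\delta_y$ are the radial distortion terms, which are modeled by a first-order polynomial:
$$\delta_x = k_1 x (x^2 + y^2), \qquad \delta_y = k_1 y (x^2 + y^2), \tag{18}$$
where $(x, y)^T$ corresponds to the projection ray from the focal point to the image point $p$.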
As shown in Figure 2, the line segment $M$ and its projection $m$ in the image plane are represented by their endpoints $(P_1, P_2)$ and $(p_1, p_2)$. The line segments $M$ and $m$ lie on the infinite lines $L$ and $l$, respectively. The perspective projection of a 3D line segment can be given by the projection of its two endpoints:
$$m = \Gamma(M, \theta, \xi) = \left(\Gamma(P_1, \theta, \xi), \Gamma(P_2, \theta, \xi)\right). \tag{19}$$
When noise is present in the measured data, we denote by $\tilde{p}$ the noisy observation of the projection of the 3D point $P$ and by $\tilde{l}$ the noisy observation of the projection of the 3D model line $L$.
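The projection of a model line segment through its two endpoints can be sketched as below. The exact form of the distortion term is an assumption of this sketch (a single coefficient `k1` applied to the normalized coordinates before scaling by the focal lengths), and the function names are hypothetical.

```python
def project_point(P, R, T, fx, fy, u0, v0, k1):
    """Project a 3D point with a pinhole model and one radial distortion term."""
    # Camera-frame coordinates P_c = R P + T.
    Pc = [R[r][0] * P[0] + R[r][1] * P[1] + R[r][2] * P[2] + T[r]
          for r in range(3)]
    # Normalized coordinates on the projection ray.
    x, y = Pc[0] / Pc[2], Pc[1] / Pc[2]
    r2 = x * x + y * y
    # Assumed distortion model: delta = k1 * (x, y) * r^2.
    u = fx * (x + k1 * x * r2) + u0
    v = fy * (y + k1 * y * r2) + v0
    return (u, v)


def project_segment(segment, R, T, fx, fy, u0, v0, k1):
    """Project a 3D segment by projecting its two endpoints."""
    P1, P2 = segment
    return (project_point(P1, R, T, fx, fy, u0, v0, k1),
            project_point(P2, R, T, fx, fy, u0, v0, k1))


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 2000.0]                            # plane 2 m in front of the camera
a4_edge = ((0.0, 0.0, 0.0), (210.0, 0.0, 0.0))    # short edge of an A4 sheet, in mm
m = project_segment(a4_edge, IDENTITY, T, 1814.8, 1814.8, 300.0, 300.0, 0.0)
```

With a zero distortion coefficient the first endpoint lands on the principal point and the second is shifted horizontally by the focal length times 210/2000; a positive `k1` pushes the second endpoint further outward.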

4.3. Error Model for the Observation of the Line Segment.
Let $\tilde{p}_1 = (x_1, y_1)$, $\tilde{p}_2 = (x_2, y_2)$, ..., $\tilde{p}_n = (x_n, y_n)$ be a series of image edge points with observation noise perpendicular to the line. For convenience, we assume that the true position of the line is parallel to the horizontal axis, so that the vertical coordinates $y_i$ deviate from the line by noise terms $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_i, \ldots, \varepsilon_n$, which are mutually independent Gaussian random variables with $E[\varepsilon_i] = 0$ and $\mathrm{Var}[\varepsilon_i] = \sigma^2$. Let the noises for the endpoints of the fitted line along the vertical direction be $s_1$ and $s_2$, respectively, and $s = (s_1, s_2)^T$. It can be derived that these two noises are negatively correlated. Since the observation noises are Gaussian random variables, the joint density of the random variables $s_1$ and $s_2$ is a Gaussian PDF with covariance matrix $\Sigma$, which can be given by
$$p(s) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} s^T \Sigma^{-1} s\right).$$
Supposing that the length of the line segment is $l$ and the intervals of the edge points are all $l_0$, we have $x_1 = l_0$, $x_2 = 2 l_0$, ..., $x_n = n l_0$. When the number $n$ is large enough, that is, $l_0 \ll l$, it is easy to obtain
$$\Sigma \approx \frac{2 \sigma^2 l_0}{l} \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}. \tag{25}$$
From (25), it can be seen that the error model encodes the measurement error of the image edge points ($\sigma$) explicitly and makes the impact of the image line length directly visible. Moreover, long line segments are located more accurately than short ones, and a small $\sigma$ yields higher confidence in the line location.
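The dependence of the endpoint uncertainty on the segment length can be checked numerically. The sketch below assumes that the line is fitted to the edge points by ordinary least squares on the vertical offsets, in which case the covariance of the fitted values at two abscissas $a$ and $b$ is $\sigma^2 (1/n + (a - \bar{x})(b - \bar{x}) / S_{xx})$. The asymptotic form used for comparison follows from that assumption for equally spaced points; the function names are hypothetical.

```python
def endpoint_covariance(n, l0, sigma):
    """Exact covariance of the fitted-line errors at the first and last edge point."""
    xs = [i * l0 for i in range(1, n + 1)]
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)

    def cov(a, b):
        # Covariance of least squares predictions at abscissas a and b.
        return sigma ** 2 * (1.0 / n + (a - xbar) * (b - xbar) / sxx)

    x1, xn = xs[0], xs[-1]
    return [[cov(x1, x1), cov(x1, xn)],
            [cov(xn, x1), cov(xn, xn)]]


def asymptotic_covariance(length, l0, sigma):
    """Large-n approximation: (2 sigma^2 l0 / l) * [[2, -1], [-1, 2]]."""
    c = 2.0 * sigma ** 2 * l0 / length
    return [[2.0 * c, -c], [-c, 2.0 * c]]


n, l0, sigma = 200, 0.5, 1.0
exact = endpoint_covariance(n, l0, sigma)
approx = asymptotic_covariance(n * l0, l0, sigma)
short = endpoint_covariance(20, l0, sigma)   # a ten times shorter segment
```

The off-diagonal term is negative, so the two endpoint errors are negatively correlated, and the variances shrink as the segment gets longer or as $\sigma$ gets smaller.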

Maximum Likelihood (ML) Estimation.
The measurement noise in the localization of the 2D line segments can be decomposed into two components: noise perpendicular to the line and noise along the length of the line. The first is modelled as a Gaussian random variable related to the orientation error, and its noise model has been derived in the last section, whilst the second is assumed to follow an arbitrary distribution (not necessarily Gaussian) related to line fragmentation.
As can be seen in Figure 3, both the projection of the 3D line segment $m$ and its noisy observation $\tilde{l}$ are represented by their endpoints, $(p_1, p_2)$ and $(\tilde{p}_1, \tilde{p}_2)$, respectively. The noise vector $s$ perpendicular to the line and the noise vector $h$ along the line are expressed as follows:
$$s = (s_1, s_2)^T, \qquad h = (h_1, h_2)^T,$$
where the components of $s$ are the distances between the endpoints of $\tilde{l}$ and $m$ along the direction perpendicular to $m$. The components of $h$ are the distances between the endpoints of the two line segments along the direction of $m$.
It is assumed that the two random vectors $s$ and $h$ are statistically independent. We can then approximate the conditional density of $\tilde{l}$ given $m$ as
$$p(\tilde{l} \mid m) \approx p(s)\, p(h).$$
In [19], it is proved that the conditional density of the projection of the 3D model line $l$ given its observed noisy image line segment $\tilde{l}$ depends only on the noise perpendicular to the line $l$. Therefore, with the assumption that the observation errors for different line segments are statistically independent, (16) can be converted into the minimization of an objective function $E$ that measures the disparity between the actual image observations and the ones predicted by the current camera parameters. Here $s$ corresponds to the distances from the endpoints of the image line segment to the projected model line.
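To make the role of the perpendicular distances concrete, the sketch below computes them for one segment and evaluates a quadratic cost $s^T \Sigma^{-1} s$. It is a hedged illustration only: it assumes the large-$n$ covariance $\Sigma = (2\sigma^2 l_0 / l)\,[[2, -1], [-1, 2]]$ obtained for a least squares line fit, and the function names are hypothetical.

```python
import math


def line_through(p1, p2):
    """Return (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx, dy)
    a, b = -dy / norm, dx / norm
    return (a, b, -(a * p1[0] + b * p1[1]))


def signed_distance(line, p):
    """Signed perpendicular distance from point p to the line."""
    a, b, c = line
    return a * p[0] + b * p[1] + c


def segment_cost(model_p1, model_p2, obs_p1, obs_p2, sigma, l0):
    """Quadratic cost s^T Sigma^{-1} s under the assumed covariance."""
    line = line_through(model_p1, model_p2)
    s1 = signed_distance(line, obs_p1)
    s2 = signed_distance(line, obs_p2)
    length = math.hypot(obs_p2[0] - obs_p1[0], obs_p2[1] - obs_p1[1])
    # Inverse of (2 sigma^2 l0 / l) [[2, -1], [-1, 2]] is
    # (l / (6 sigma^2 l0)) [[2, 1], [1, 2]].
    scale = length / (6.0 * sigma ** 2 * l0)
    return scale * (2.0 * s1 * s1 + 2.0 * s1 * s2 + 2.0 * s2 * s2)


# Observed segment one pixel above the projected model line y = 0.
cost_long = segment_cost((0.0, 0.0), (10.0, 0.0), (0.0, 1.0), (10.0, 1.0), 1.0, 1.0)
cost_short = segment_cost((0.0, 0.0), (10.0, 0.0), (0.0, 1.0), (5.0, 1.0), 1.0, 1.0)
```

Under this assumption the bracketed term equals $s_1^2 + 4 s_0^2 + s_2^2$ with $s_0 = (s_1 + s_2)/2$ the distance at the midpoint, so the cost combines the two endpoint distances and the midpoint distance and grows in proportion to the length of the observed segment.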
If the image line segment is fitted by LST and the intervals of the edge points are fixed for all of the image line segments, then we have

Nonlinear Technique for the Optimization of Camera Parameters
In this section, we describe how to employ a nonlinear technique to solve the camera calibration problem defined in the previous section. The initial camera parameters are provided by a method similar to [4], except that the homography matrices are calculated by the method discussed in Section 3 rather than from chessboard corners. At each iteration, the linearized error function is minimized to obtain the interframe motion vector for the intrinsic and extrinsic parameters. The camera parameters are then updated until the objective function converges to a minimum. The distance from a point of the image line segment to the projection of the model line is defined following [20].
Assume that we have a current estimate of the rotation $R_k$ at time $k$. The posterior rotation $R_{k+1}$ can be computed from the prior rotation $R_k$ given the incremental rotation $\exp(\hat{\omega})$, where $\hat{\omega}$ is the skew-symmetric matrix of the vector $\omega = (\omega_1, \omega_2, \omega_3)^T$:
$$\hat{\omega} = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}.$$
The transformation from the reference frame to the camera frame can be rewritten in terms of $\omega$ and the unit vectors $g_1 = (1, 0, 0)^T$, $g_2 = (0, 1, 0)^T$, and $g_3 = (0, 0, 1)^T$. The partial derivative of the error function is then expressed in terms of $s^i_j = (s_0, s_1, s_2)$, the distance vector from the midpoint and endpoints of the image line segment to the projected model line.
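The incremental rotation can be evaluated in closed form with the Rodrigues formula. In the sketch below, the helper names are hypothetical and the left-multiplication order in `update_rotation` is an assumption, since the composition order cannot be recovered from the text above.

```python
import math


def skew(w):
    """Skew-symmetric matrix of a 3-vector."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]


def matmul(A, B):
    """Product of two 3 x 3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def exp_so3(w):
    """Matrix exponential of skew(w) via the Rodrigues formula."""
    theta = math.sqrt(w[0] ** 2 + w[1] ** 2 + w[2] ** 2)
    K = skew(w)
    if theta < 1e-12:
        a, b = 1.0, 0.5          # limits of the two coefficients
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def update_rotation(R, w):
    """Apply the increment on the left (assumed order): R_new = exp(skew(w)) R."""
    return matmul(exp_so3(w), R)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_new = update_rotation(IDENTITY, (0.0, 0.0, math.pi / 2.0))
```

Parameterizing the update by the three components of $\omega$ keeps the estimate on the rotation group, so no re-orthogonalization of the matrix is needed after each step.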
If the incremental motion vector has been calculated, the new camera parameters can be computed as follows: (40)

Experimental Results
The proposed algorithm has been tested on simulated data generated by computer and on real image data captured with our smartphone. The closed-form solution is obtained by the approach of [4], except that the homography matrices are estimated by the proposed method. The nonlinear refinement within the IRLS algorithm takes 5 to 8 iterations to converge.

Computer Simulations.
The simulated perspective camera is assumed to be 2 m from the planar object. The resolution of the virtual camera is 640 × 640. The simulated camera has the following properties: $f_x = f_y = 1814.8$ and $u_0 = v_0 = 300$. The model plane is a checker pattern printed on A4 paper (210 mm × 297 mm) with 11 × 14 corners. The images are taken from different orientations in front of the virtual camera. The normal vector of the plane is parallel to the rotation axis represented by a 3D vector r, whose magnitude is equal to the rotation angle. The position of the plane is represented by a 3D vector t (in millimetres). In the experiment, the proposed method is compared with the widely used chessboard corners based method [4] (referred to as the corners based method; the implementation follows the camera calibration function of OpenCV [21]). For the corners based method, 154 corners are utilized. In our method, we use 25 lines fitted from the noisy corners by the LST. The reprojection error, reported as RMS, is the root of the mean squared distance in pixels between the detected image corners and the projected ones. When only the four edges of the plane pattern are utilized, the proposed method is referred to as the 4-line based method. Zero-mean Gaussian noise is added to the projected image points, with the standard deviation $\sigma$ ranging from 0.1 pixels to 2.0 pixels in steps of 0.1 pixels. At each noise level, 100 independent trials are generated. The estimated camera parameters are then compared with the ground truth and RMS errors are measured. Moreover, for the 154 points, the RMS reprojection error between the real projections and the recovered projections is also calculated. Figures 4(a) and 4(b) display the relative errors of the intrinsic parameters, which are measured with respect to $f_x$, while Figure 4(c) shows the reprojection errors of the two methods.
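The RMS reprojection error used in these experiments can be computed as in the following sketch. The corner grid, the random seed, and the function names are hypothetical and serve only to illustrate the measure; this is not the simulation code used for the reported results.

```python
import math
import random


def rms_reprojection_error(detected, projected):
    """Root of the mean squared point-to-point distance, in pixels."""
    squared = [(u1 - u2) ** 2 + (v1 - v2) ** 2
               for (u1, v1), (u2, v2) in zip(detected, projected)]
    return math.sqrt(sum(squared) / len(squared))


def add_gaussian_noise(points, sigma, rng):
    """Add independent zero-mean Gaussian noise to both image coordinates."""
    return [(u + rng.gauss(0.0, sigma), v + rng.gauss(0.0, sigma))
            for u, v in points]


rng = random.Random(0)
# Hypothetical 11 x 14 grid standing in for the 154 projected corners.
corners = [(40.0 * c, 40.0 * r) for r in range(11) for c in range(14)]
noisy = add_gaussian_noise(corners, 0.5, rng)
rms = rms_reprojection_error(noisy, corners)
```

Since the noise is added independently to both coordinates, the RMS distance between noisy and noise-free points is expected to be close to $\sqrt{2}\,\sigma$, which gives a reference level for the reprojection errors at a given noise level.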
From Figure 4, we can see that both the relative errors of the intrinsic parameters and the reprojection errors increase almost linearly with the noise level. The proposed method performs on par with the corners based method, since the image lines are fitted from the noisy image corners. When 4 lines (the smallest set for homography estimation) are utilized, the errors of the proposed method are larger than those of the corners based method. For $\sigma < 0.5$, there is little difference between the 4-line based method and the corners based method.
In addition, we vary the number of sample points used to fit each line segment to validate the performance of the 4-line based method with $\sigma = 0.5$. From the results in Figure 5, we can see that the errors decrease significantly when more sample points are utilized. When the number is above 40, so that more than 160 points are used to fit the 4 line segments, the performance of the 4-line based method is nearly the same as that of the 154-corner based method.

Performance with respect to the Number of Planes.
In this experiment, we investigate the performance of the proposed method with respect to the number of images of the model plane. In the first three images, we use the same orientations and positions of the model plane as in the last subsection. For the following images, the rotation axes are randomly chosen in a uniform sphere with the rotation angle fixed to 30°, and the positions are randomly selected around [−105, −145, 2000]. The number of model plane images ranges from 3 to 17. For each number of images, 100 trials with independent plane orientations are generated, with the noise level for the image points fixed to 0.5 pixels. The errors, including the relative errors in the camera intrinsic parameters and the reprojection errors for the two methods, are shown in Figure 6. The errors decrease when more images are used. From 3 to 7 images, the errors decrease significantly. Moreover, the reprojection errors of the proposed method stay around 0.7 as the number of images varies.

Performance with respect to the Number of Lines.
This experiment examines the performance of the proposed method with respect to the number of lines used to recover the camera parameters. For our method, at least 4 lines should be employed. We vary the number of lines from 4 to 25. Three images of the model plane are again used, with the same orientations and positions as in the last subsection. For each number of lines, 100 independent trials are conducted with the noise level fixed to 0.5 pixels. The results are shown in Figure 7. When more lines are used, the errors decrease. In particular, from 4 to 15 lines, the errors decrease significantly.

Performance with respect to the Orientation of the Model Plane.
This subsection investigates the influence of the orientation of the model plane with respect to the image plane. In the experiment, three images are used, with two of them similar to the last two planes in Section 6.1.1. The initial rotation axis of the third plane is parallel to the image plane, and the orientation of the planes is randomly chosen from a uniform sphere with the rotation angle varying from 5° to 75°. The noise level is fixed to 0.5 pixels. The results are displayed in Figure 8. The best performance seems to be achieved with an angle around 40°.

Real Images.
For the experiment with real data, the proposed algorithm is tested on several image sequences captured from the camera of the smartphone.

Homography Tracking Performance.
In the experiment, three image sequences are captured with the smartphone at a resolution of 720 × 480. In the first image sequence, a chessboard containing 10 × 13 interior corners is printed on A4 paper and put on the desk. About 1500 frames are taken at different orientations. For each image, the homography from the model plane to the image plane is optimized by the proposed method using the four edges of the A4 paper.
[...] parameters is very small, about 5 pixels, with respect to the corners based method. The last column of Table 1 shows the reprojection RMS of the three methods. When all of the 23 lines are utilized, the proposed method provides almost the same reprojection error as the corners based method. The 4-line based method returns a slightly larger reprojection error, since only the minimum number of model lines is utilized.
In order to further investigate the stability of the proposed method, we vary the number of lines from 4 to 23. The results are shown in Figure 12. The intrinsic parameters recovered by the proposed method stay close to the values estimated by the corners based method, with only a small deviation. The reprojection errors of the proposed method decrease significantly from 4 to 17 lines. When the number is above 17, the reprojection error is very close to that of the corners based method.

Application to Image-Based Modelling.
In this subsection, we apply the proposed method to two image sequences. In the first image sequence, a card with a size of 54.0 mm × 85.6 mm is utilized as the model object. A sheet of A4 paper with a size of 210 mm × 297 mm is chosen as the model object for the second image sequence. In the experiment, a series of images is sampled from the videos to calibrate the camera intrinsic parameters, and then the camera pose is optimized for each image frame. After that, the structure from motion pipeline developed in [22][23][24] was run on the image sequences to build complete models of the toys, including Luffy and Hulk. In

Figure 1: 1D search from the model line to the image line.

3.1. Probabilistic Formulation for Homography Tracking.
The relationship between a model point $P$ and its image point $p$ can be given as
$$p = \Psi(P, H) \simeq HP, \qquad P = (X, Y, 1)^T, \quad p = (u, v, 1)^T, \tag{1}$$

Figure 2: Perspective projection of 3D line and point.

Figure 3: Relation between projection of the 3D line segment and its noisy observation.
the distances from the two endpoints and the midpoint of the image line segment $\tilde{l}^i_j$ to the projected model line $l^i_j$. It is clear that the error function between the 3D model line and the 2D image line is weighted by the length of the image line segment.
denotes the location of the origin of the camera frame in the world frame. Let $v \in R^3$ represent the motion velocities corresponding to translation in the $x$, $y$, and $z$ directions between the prior translation $T^i_k$ and the posterior translation $T^i_{k+1}$. Equation (31) can then be rewritten, and the partial derivative of the error function with respect to the $j$th motion velocity can be computed as

Figure 4: Errors versus the noise level of the image points.

Figure 5: Errors versus the number of sample points for 4-line based method.

Figure 7: Errors versus the number of the lines.

Figure 8: Errors versus the orientations of the plane.
Figure 13, Figures (A), (B), (C), and (D) are some sampled images from the image sequences. The recovered camera poses by the proposed method are shown in Figure (E). The left one of Figure (E) shows the camera poses for the whole image sequence, while the right one corresponds to the sampled views for the following reconstruction. By recovering the whole motion trajectory of the camera, we can easily choose a subset of the frames which are suitable and adequate for modeling. Two rendered views of the reconstructed objects are shown in Figure (F). From Figure 13, we can see that the complete model of the objects has been reconstructed by moving the camera around the objects. For the size of the

Camera Calibration. Consider the case where there are $N$ images of a static scene containing $K$ straight line segments. Let $\{(L_j, \tilde{l}^i_j)\}_{j=1}^{K}$ be the matched set of 3D model lines and 2D image lines in image $i$, which can be established automatically according to the homography optimization in this paper. With the assumption that the observation errors for different line segments are statistically independent, the conditional density function $P$ of the camera parameters can now be defined as follows:
$$P = \prod_{i=1}^{N} \prod_{j=1}^{K} p\left(\Gamma(L_j, \theta, \xi_i) \mid \tilde{l}^i_j\right).$$

The error vector $d$ is obtained by stacking all of the normal distances of each image point as follows:
$$d = \left((s^1_1)^T, \ldots, (s^i_j)^T, \ldots, (s^N_K)^T\right)^T,$$