A Flexile and High Precision Calibration Method for Binocular Structured Light Scanning System

Three-dimensional (3D) structured light scanning systems are widely used in reverse engineering, quality inspection, and related fields. Camera calibration is the key to scanning precision. Currently, a finely machined two-dimensional (2D) or 3D calibration reference object is usually used to achieve high calibration precision, but such an object is difficult to operate and expensive. In this paper, a novel calibration method is proposed that uses a scale bar and some artificial coded targets placed randomly in the measuring volume. The method is based on hierarchical self-calibration and bundle adjustment. Initial intrinsic parameters are obtained from the images. Initial extrinsic parameters are estimated in projective space by factorization and then upgraded to Euclidean space, using the orthogonality of the rotation matrix and the rank-3 property of the absolute quadric as constraints. Finally, all camera parameters are refined through bundle adjustment. Real experiments show that the proposed method is robust and reaches the same precision level as calibration with a delicate artificial reference object, while the hardware cost is very low compared with the calibration methods currently used in 3D structured light scanning systems.


Introduction
Binocular structured light scanning systems (BSLSS) are widely used in reverse engineering [1], inspection [2], medical analysis [3], and human body motion analysis [4] because they are noncontact, fast, and precise [5]. The system principle is based on binocular stereo vision, and the system is usually composed of two digital cameras and a commercial DLP projector [6]. Before scanning, the camera parameters must be calibrated. These include intrinsic parameters (effective focal length, principal point, and lens distortion coefficients) and extrinsic parameters (orientation of the two camera coordinate frames relative to a certain world coordinate frame). Calibration is not only the first step of scanning but also has a great influence on scanning precision. It is critical to the overall system and has been studied extensively in computer vision, and new techniques have been proposed recently [7][8][9]. In BSLSS, an elaborate 3D or 2D reference object is usually required to reach high precision [10][11][12]. The 3D calibration object usually consists of two or three mutually orthogonal planes, and the 2D calibration reference object is usually a plane with patterns on it, such as the circles and chessboard shown in Figure 1. The distance between adjacent circle centers or chessboard corners should be known in advance. During calibration, the planar pattern should be imaged at several different orientations; in the Aots system, for example, 18 orientations are necessary. Although calibration algorithms based on a 3D or 2D object can achieve a precision that meets the needs of industrial measurement, they have two drawbacks. First, high precision calibration reference objects need fine processing, so they are usually expensive. Second, they are not suitable for a large field of view.
In this paper, we put forward a novel camera calibration method based on hierarchical self-calibration and bundle adjustment. Hierarchical self-calibration means that the extrinsic parameters are estimated through different spaces. More specifically, we first get the initial extrinsic parameters of the cameras in projective space using the factorization method and then upgrade them to Euclidean space, using the orthogonality of the rotation matrix and the rank-3 property of the absolute quadric as constraints. Bundle adjustment is an optimization technique that is widely used in photogrammetry. In this paper, we use it to refine all the parameters so that high precision calibration can be achieved. The novel method needs no fine processing and no complex mechanical control setup. Only a few small patterns printed on PVC membrane and a standard scale bar are needed, which are very easy to produce and cost much less than the current planar board. Moreover, the proposed algorithm is suitable for calibration over a large field of view. Real experiments demonstrate the reliability and validity of this method.

Hardware Used.
Like the traditional methods proposed by Zhang and Tsai, the method in this paper also needs to establish the relationship between space points and their projections in the image. So the space points and their projections should be defined first. In this paper, patterns with a unique ID number are designed as space object points. Each pattern is a white circle with circular sections around it, as shown in Figure 2(a). The center of the white circle is used as a space object point, and its projection in the image can be acquired by circle center detection. The circular sections encode the ID of the circle. The scale bar is a bar with two coded circles whose distance is measured in advance by a separate (third-party) measuring system.

Binocular Vision Model and Parameters to be Calibrated.
This section describes the binocular vision model and defines the calibration parameters. Figure 3(a) illustrates the pinhole imaging model of one camera. A 3D space point is denoted by $A = [X\ Y\ Z\ 1]^{T}$, and its corresponding image point is denoted by $a(u, v)$. $O_w X_w Y_w Z_w$ is the world coordinate system, and $O_c X_c Y_c Z_c$ is the camera coordinate system. $O_0 u v$ is the pixel coordinate system. $O_c$ is the optical center of the camera. The mathematical model between $A$ and $a$ can be formulated as

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P A, \tag{1}$$

where $(u, v)$ is the coordinate of $a$ in the image plane, $P = K(R\ T)$, $\lambda$ represents the projection depth, and $K$ is the camera intrinsic parameter matrix defined as

$$K = \begin{bmatrix} f_u & s & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{2}$$

where $f_u$ and $f_v$ are the scale factors along the image $u$ and $v$ axes; $s$ is the parameter describing the skewness of the two image axes and is often zero; $(u_0, v_0)$ is the coordinate of the principal point, which would ideally be in the centre of the image; $R$ is the rotation matrix of the rigid body transformation from the world coordinate system to the camera 3D coordinate system; and $T$ is the translation vector. $R$ and $T$ are the extrinsic parameters.
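As a minimal sketch of the pinhole model in (1) and (2), the following Python function projects one world point to pixel coordinates. The numeric values of K, R, and T are hypothetical and chosen only to make the arithmetic easy to follow; lens distortion is ignored here.

```python
def project_point(K, R, T, A_world):
    """Project a 3D world point using lambda * [u, v, 1]^T = K (R | T) A."""
    # Camera-frame coordinates: X_c = R * X_w + T
    Xc = [sum(R[r][c] * A_world[c] for c in range(3)) + T[r] for r in range(3)]
    # Homogeneous pixel coordinates: K * X_c
    h = [sum(K[r][c] * Xc[c] for c in range(3)) for r in range(3)]
    lam = h[2]  # projection depth lambda
    return h[0] / lam, h[1] / lam, lam


# Hypothetical intrinsics: f_u = f_v = 1000 px, zero skew, principal point (640, 512)
K = [[1000.0, 0.0, 640.0],
     [0.0, 1000.0, 512.0],
     [0.0, 0.0, 1.0]]
# Hypothetical extrinsics: world frame coincides with the camera frame
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]

u, v, lam = project_point(K, R, T, [100.0, 50.0, 1000.0])
```

With these values the projection depth equals the Z coordinate of the point in the camera frame, which is what the factor $\lambda$ in (1) represents for a normalized last row of $P$.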
The binocular vision geometric model is illustrated in Figure 3(b). $O_1$ and $O_2$ are the two optical centers, and $a_1$ and $a_2$ are the corresponding image points of the space point $A$ in the left and right images. The left and right camera imaging equations are

$$\lambda_l \begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} = K_l (R_l\ T_l) A, \qquad \lambda_r \begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} = K_r (R_r\ T_r) A, \tag{3}$$

where $(u_l, v_l)$ and $(u_r, v_r)$ are the pixel coordinates of $a_1$ and $a_2$. After the projection depths are eliminated, the two sets of equations give four independent equations in the three unknown coordinates, so the 3D coordinate of a space object point can be solved with a least squares estimator. We can assume that the world coordinate system and the left camera coordinate system are the same, which means $R_l = I$ and $T_l = 0$, where $I$ is an identity matrix and $0$ is a zero vector. Therefore, in order to solve the 3D coordinate of a space object point, the parameters to be calibrated are the intrinsic parameters of the left and right cameras, $K_l$ and $K_r$, and the extrinsic parameters of the right camera, $R_r$ and $T_r$.
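The least squares solution mentioned above can be sketched as follows. Each camera contributes two linear equations in (X, Y, Z) once the projection depth is eliminated, and the resulting 4 x 3 system is solved through its normal equations. This is an illustrative sketch only: the function names, the projection matrices, and the 100 mm baseline are assumptions, and the paper does not state which least squares solver it uses.

```python
def solve3(M, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def triangulate(P_l, uv_l, P_r, uv_r):
    """Least squares 3D point from two 3x4 projection matrices and two pixel observations."""
    rows, rhs = [], []
    for P, (u, v) in ((P_l, uv_l), (P_r, uv_r)):
        for coord, i in ((u, 0), (v, 1)):
            # coord * (P[2] . A) - (P[i] . A) = 0 with A = [X, Y, Z, 1]
            rows.append([coord * P[2][c] - P[i][c] for c in range(3)])
            rhs.append(P[i][3] - coord * P[2][3])
    # Normal equations (M^T M) x = M^T b of the 4x3 system
    MtM = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    Mtb = [sum(r[a] * y for r, y in zip(rows, rhs)) for a in range(3)]
    return solve3(MtM, Mtb)


# Hypothetical setup: identical intrinsics, left camera at the world origin,
# right camera shifted so that T_r = (-100, 0, 0)
P_l = [[1000.0, 0.0, 640.0, 0.0], [0.0, 1000.0, 512.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P_r = [[1000.0, 0.0, 640.0, -100000.0], [0.0, 1000.0, 512.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

X_est = triangulate(P_l, (740.0, 562.0), P_r, (640.0, 562.0))
```

For noisy observations the four equations are no longer consistent, and the normal equations return the point that minimizes the algebraic residual rather than the reprojection error.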

The Algorithm Flow.
Our aim is to get the intrinsic parameters and relative position relationship of two cameras used in BSLSS; the proposed algorithm flow is illustrated in Figure 4.
The left and right cameras are first calibrated separately, each under its own world coordinate system $O_w X_w Y_w Z_w$, and are then absolutely oriented to a global world coordinate system, which is the left camera coordinate system. The calibration of a single camera can be divided into four steps. The first step is to estimate the initial intrinsic parameters from the images. The second step is to estimate the initial extrinsic parameters and the 3D object points based on hierarchical self-calibration. The third step is to estimate the initial distortion parameters from corresponding image points and 3D space points. Finally, all the parameters are refined through bundle adjustment.

Estimating Initial Intrinsic Parameters of One Camera.
The intrinsic parameters encompass the focal length, image format, and principal point. They are intrinsic properties of the camera and do not change, so they can be computed in advance. Here, we provide two methods to get the intrinsic parameters. The first uses a panel. When a panel is imaged, there is a homography matrix linking the image of the panel and the panel in space, and the intrinsic parameters can be estimated from this homography matrix. The principle of this method can be found in the paper [11]. The second method is very simple and assumes that there is little error in the camera assembly process. Since the intrinsic parameters describe intrinsic properties of the camera, they can be obtained from the camera operation manual provided by the camera manufacturer. More specifically, the parameters are set as follows: $s = 0$, $u_0 = \mathrm{width}/2$, $v_0 = \mathrm{height}/2$, and $f_u = f_v = (f/d_x + f/d_y)/2$, where $f$ is the focal length, $d_x$ and $d_y$ are the center-to-center distances between adjacent sensor elements in the X and Y directions, width is the width of the image in pixels, and height is the height of the image in pixels.
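The second (datasheet-based) initialization can be written in a few lines. In the sketch below, the 6 mm focal length and the 1280 x 1024 image size match the experimental setup reported later in the paper, while the 5 micrometre pixel pitch is a hypothetical value used only for illustration.

```python
def initial_intrinsics(f_mm, dx_mm, dy_mm, width_px, height_px):
    """Initial K from nominal datasheet values: zero skew, centered principal point."""
    f_pix = (f_mm / dx_mm + f_mm / dy_mm) / 2.0  # f_u = f_v, in pixels
    return [[f_pix, 0.0, width_px / 2.0],
            [0.0, f_pix, height_px / 2.0],
            [0.0, 0.0, 1.0]]


# Pixel pitch of 0.005 mm is an assumed value, not taken from the paper
K0 = initial_intrinsics(f_mm=6.0, dx_mm=0.005, dy_mm=0.005,
                        width_px=1280, height_px=1024)
```

These values are only a starting point; the later bundle adjustment step is what brings the intrinsic parameters to their final precision.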

Estimating Initial Extrinsic Parameters of One Camera.
The process of obtaining the initial values of the camera extrinsic parameters can be divided into three steps. The first step is to get the camera motion matrix and the scene shape matrix in projective space by factorization. The second step is to get the camera motion matrix and the scene shape matrix in Euclidean space. The third step is to get the initial values of the camera extrinsic parameters by decomposing the camera motion matrix.
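In the first step, supposing that $n$ 3D space points are visible in $m$ images, the rescaled measurement matrix $W_{3m \times n}$ can be formed, and $W_{3m \times n}$ has rank at most 4. $W_{3m \times n}$ can be factored into $\tilde{Q}$ and $\tilde{X}$ as described in (4), where a tilde marks a quantity defined in projective space, $\lambda_{ip}$ is the projective depth, $\tilde{X}_p$ are the unknown homogeneous coordinate vectors of the 3D points (together forming the shape matrix $\tilde{X}$), $\tilde{P}_i$ are the unknown $3 \times 4$ image projection matrices (together forming $\tilde{Q}$, also called the motion matrix in some other articles), and $x_{ip}$ are the measured homogeneous coordinate vectors of the image points, where $p = 1, \dots, n$ labels points and $i = 1, \dots, m$ labels images. Each object is defined only up to an arbitrary nonzero rescaling. Consider

$$W_{3m \times n} = \begin{bmatrix} \lambda_{11} x_{11} & \cdots & \lambda_{1n} x_{1n} \\ \vdots & \ddots & \vdots \\ \lambda_{m1} x_{m1} & \cdots & \lambda_{mn} x_{mn} \end{bmatrix} = \begin{bmatrix} \tilde{P}_1 \\ \vdots \\ \tilde{P}_m \end{bmatrix} \begin{bmatrix} \tilde{X}_1 & \cdots & \tilde{X}_n \end{bmatrix} = \tilde{Q} \tilde{X}. \tag{4}$$

Through SVD, $\tilde{Q}$ and $\tilde{X}$ can be obtained, provided that the projective depth of each image point is known. There are two methods to estimate the projective depths: one is based on fundamental matrices and epipoles, and the other is based on minimizing the rank of $W_{3m \times n}$ to 4. The details of estimating the projective depths can be found in the papers [13,14]. In this paper, we estimate the projective depths with the algorithm proposed in the paper [13]. The principle of this method is based on epipolar geometry, and $\lambda_{ip}$ can be solved through

$$\lambda_{ip} = \frac{(e_{ij} \times x_{ip}) \cdot (F_{ij} x_{jp})}{\left\| e_{ij} \times x_{ip} \right\|^{2}} \lambda_{jp}, \tag{5}$$

where $F_{ij}$ is the fundamental matrix between the $i$th and $j$th images, which satisfies the constraint $x_{ip}^{T} F_{ij} x_{jp} = 0$; $e_{ij}$ is the epipole on the $i$th image; and $x_{ip}$ and $x_{jp}$ are the corresponding image point coordinates on the $i$th and $j$th images, respectively. When estimating $\lambda_{ip}$, we first assume that the projective depth of every point in the reference image is 1, that is, $\lambda_{1p} = 1$, and then substitute it into (5) to get the projective depths in the other images. In the second step, in Euclidean space, the projection matrix of one image can be expressed as $P_i = K_i (R_i \mid T_i)$, and considering $m$ images gives

$$\begin{bmatrix} \rho_1 K_1 (R_1 \mid T_1) \\ \vdots \\ \rho_m K_m (R_m \mid T_m) \end{bmatrix} = \begin{bmatrix} \tilde{P}_1 \\ \vdots \\ \tilde{P}_m \end{bmatrix} H, \tag{6}$$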
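The factorization of the rescaled measurement matrix can be sketched with NumPy as follows. To keep the example short, the projective depths are taken from synthetic ground truth instead of being estimated from fundamental matrices and epipoles, so the matrix has rank 4 exactly. The function name `factorize` and the random test data are assumptions made for illustration.

```python
import numpy as np


def factorize(W):
    """Split a rescaled 3m x n measurement matrix into motion (3m x 4) and shape (4 x n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    Q = U[:, :4] * s[:4]   # absorb the four largest singular values into the motion matrix
    X = Vt[:4, :]          # projective shape matrix
    return Q, X, s


# Synthetic data: m = 3 random projection matrices and n = 8 random points
rng = np.random.default_rng(0)
m, n = 3, 8
P_true = rng.normal(size=(m, 3, 4))
X_true = np.vstack([rng.normal(size=(3, n)), np.ones((1, n))])

# Each 3-row block holds lambda_ip * x_ip for image i, with the true depths built in
W = np.vstack([P_true[i] @ X_true for i in range(m)])

Q, X, s = factorize(W)
```

The recovered factors differ from the true projection matrices and points by an unknown 4 x 4 transformation, which is why a separate upgrade to Euclidean space is required.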
where $\tilde{P}_i$ is the projection matrix in projective space and $\rho_i$ is an arbitrary nonzero number. $H$ is a $4 \times 4$ transformation matrix. The main task in the second step is to solve $H$, so that $P_i$ can be obtained. Because the rotation matrix is an orthogonal matrix, which means $R_i R_i^{T} = I$, the following can be deduced:

$$\tilde{P}_i H_0 H_0^{T} \tilde{P}_i^{T} = \rho_i^{2} K K^{T}, \quad i = 1, \dots, m, \tag{7}$$

where $H_0$ is the first three columns of $H$ and $K_1 = \cdots = K_m = K$. Set $\mathrm{Tran} = H_0 H_0^{T}$; Tran is a $4 \times 4$ symmetric matrix with 10 unknowns. One unknown factor is added for each added image, so at least three images are needed to solve Tran [15]. Meanwhile, Tran represents the dual absolute quadric, which has rank three, so $\det(\mathrm{Tran}) = 0$. This is a constraint for solving Tran. In order to guarantee that the rank of Tran is three, the SVD of Tran is computed and the fourth singular value is set to zero. As shown in (8), $H_0$ can then be obtained, where $\sigma_1, \sigma_2, \sigma_3$ are the three largest singular values and $A_0$ is the first three columns of $A$. Consider

$$\mathrm{Tran} = A\, \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3, 0)\, A^{T}, \qquad H_0 = A_0\, \mathrm{diag}\left(\sqrt{\sigma_1}, \sqrt{\sigma_2}, \sqrt{\sigma_3}\right). \tag{8}$$

The fourth column of $H$ is solved as follows. Assuming $\tilde{X}_0$ is one of the points in the projective shape matrix $\tilde{X}$ and is also the origin of the Euclidean coordinate system, then $H^{-1} \tilde{X}_0 = (0\ 0\ 0\ 1)^{T}$ can be obtained, yielding $H_4 = \tilde{X}_0$. At this point, all the elements of $H$ are solved. The motion and shape matrices in Euclidean space are

$$Q = \tilde{Q} H, \qquad X = H^{-1} \tilde{X}. \tag{9}$$

In the above derivation, we do not make any assumptions on the intrinsic parameters, and the rank-3 property of the dual absolute quadric is used as a constraint, which guarantees the robustness of the solving process.
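The rank-3 enforcement on Tran and the recovery of the first three columns of H can be sketched with NumPy as below. The function name `h0_from_tran` and the synthetic test matrix are assumptions. The sketch also assumes that the estimated Tran is close to a positive semidefinite matrix, so that the left and right singular vectors of its three dominant components coincide.

```python
import numpy as np


def h0_from_tran(Tran):
    """Enforce rank 3 on a symmetric 4x4 matrix and return a 4x3 factor H0 with H0 H0^T ~ Tran."""
    Tran = (Tran + Tran.T) / 2.0            # remove any asymmetry caused by noise
    A, sigma, _ = np.linalg.svd(Tran)
    sigma[3] = 0.0                          # set the fourth singular value to zero
    H0 = A[:, :3] * np.sqrt(sigma[:3])      # A_0 * diag(sqrt(sigma_1..3))
    return H0


# Synthetic example: an exact rank-3 matrix plus a small symmetric perturbation
rng = np.random.default_rng(1)
H0_true = rng.normal(size=(4, 3))
noise = 1e-6 * rng.normal(size=(4, 4))
Tran_noisy = H0_true @ H0_true.T + (noise + noise.T) / 2.0

H0 = h0_from_tran(Tran_noisy)
```

The factor is unique only up to a 3 x 3 orthogonal matrix applied on the right, which corresponds to the free choice of orientation of the Euclidean coordinate system.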
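In the third step, with $P_i = \tilde{P}_i H$ and $X_p = H^{-1} \tilde{X}_p$ (a tilde marks the projective-space quantity obtained by factorization), $R_i$ and $T_i$ can be solved by decomposing $P_i$ through QR decomposition.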
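One common way to carry out this decomposition is an RQ factorization of the left 3 x 3 block of the projection matrix, built here from NumPy's QR routine. This is a sketch under assumptions: the paper only says "QR decomposition", so the RQ variant, the sign handling, and the function name `decompose_projection` are choices made for this example.

```python
import numpy as np


def decompose_projection(P):
    """Split P ~ rho * K (R | T) into upper-triangular K, rotation R, and translation T."""
    P = np.asarray(P, dtype=float)
    if np.linalg.det(P[:, :3]) < 0:        # P is defined up to sign; make det(K R) positive
        P = -P
    J = np.flipud(np.eye(3))               # exchange matrix used to turn QR into RQ
    Qt, Rt = np.linalg.qr((J @ P[:, :3]).T)
    K = J @ Rt.T @ J                       # upper triangular factor
    R = J @ Qt.T                           # orthogonal factor
    D = np.diag(np.sign(np.diag(K)))       # flip signs so that diag(K) is positive
    K, R = K @ D, D @ R
    T = np.linalg.solve(K, P[:, 3])        # K still carries the scale rho here
    return K / K[2, 2], R, T


# Hypothetical ground truth used to check the decomposition
c, s_ = np.cos(0.1), np.sin(0.1)
R_true = np.array([[c, 0.0, s_], [0.0, 1.0, 0.0], [-s_, 0.0, c]])
K_true = np.array([[1200.0, 0.0, 640.0], [0.0, 1200.0, 512.0], [0.0, 0.0, 1.0]])
T_true = np.array([10.0, -20.0, 1000.0])
P = 2.5 * K_true @ np.hstack([R_true, T_true[:, None]])

K, R, T = decompose_projection(P)
```

The arbitrary scale factor of the projection matrix is removed by normalizing K so that its bottom-right entry is 1.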

Estimating Initial Distortion Coefficients of One Camera.
An image captured by a digital camera does not exactly satisfy the pinhole camera model, so the lens distortion of the camera should be considered. In this section, we only consider the first two terms of radial distortion. The coefficients $k_1$ and $k_2$ of the radial distortion can be solved through

$$\begin{bmatrix} (u - u_0)(x^{2} + y^{2}) & (u - u_0)(x^{2} + y^{2})^{2} \\ (v - v_0)(x^{2} + y^{2}) & (v - v_0)(x^{2} + y^{2})^{2} \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} \breve{u} - u \\ \breve{v} - v \end{bmatrix}, \tag{10}$$

where $(u, v)$ represents the ideal pixel image coordinates, which can be obtained through (1); $(\breve{u}, \breve{v})$ represents the corresponding real observed image coordinates, which can be obtained through an image feature detection algorithm; $(x, y)$ denotes the ideal normalized image coordinates, whose solution can be found in the paper [11]; and $(u_0, v_0)$ denotes the principal point.
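Stacking the two equations contributed by every observed point gives an overdetermined linear system in the two radial coefficients, which can be solved in closed form through its 2 x 2 normal equations. The sketch below assumes zero skew and uses synthetic data; the function name and all numeric values are hypothetical.

```python
def estimate_radial_distortion(ideal_px, observed_px, ideal_norm, principal_point):
    """Least squares estimate of (k1, k2) from ideal and observed pixel coordinates."""
    u0, v0 = principal_point
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (u, v), (ud, vd), (x, y) in zip(ideal_px, observed_px, ideal_norm):
        r2 = x * x + y * y
        # One row for the u equation and one row for the v equation
        for c, delta in ((u - u0, ud - u), (v - v0, vd - v)):
            row = (c * r2, c * r2 * r2)
            a11 += row[0] * row[0]
            a12 += row[0] * row[1]
            a22 += row[1] * row[1]
            b1 += row[0] * delta
            b2 += row[1] * delta
    det = a11 * a22 - a12 * a12
    k1 = (a22 * b1 - a12 * b2) / det
    k2 = (a11 * b2 - a12 * b1) / det
    return k1, k2


# Synthetic check with assumed values f = 1000 px, (u0, v0) = (640, 512)
f, u0, v0 = 1000.0, 640.0, 512.0
k1_true, k2_true = -0.2, 0.05
norm_pts = [(0.1, 0.2), (-0.3, 0.1), (0.25, -0.35), (-0.2, -0.15), (0.4, 0.3)]
ideal = [(u0 + f * x, v0 + f * y) for x, y in norm_pts]
observed = []
for (u, v), (x, y) in zip(ideal, norm_pts):
    r2 = x * x + y * y
    d = k1_true * r2 + k2_true * r2 * r2
    observed.append((u + (u - u0) * d, v + (v - v0) * d))

k1, k2 = estimate_radial_distortion(ideal, observed, norm_pts, (u0, v0))
```

Because the system is linear in the coefficients, no initial guess is needed; the result only serves as a starting value for the subsequent refinement.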

Refine All Parameters of One Camera Using Bundle Adjustment.
In Sections 2.4.1 to 2.4.3 above, the initial intrinsic parameters, extrinsic parameters, and distortion coefficients are solved. Usually, their precision is not very high, so bundle adjustment is adopted to refine all the parameters and achieve high calibration precision. Bundle adjustment is an optimization technique originally conceived in the field of photogrammetry and has increasingly been used by vision researchers during the last decade. The objective function to be minimized is

$$\min \sum_{i=1}^{m} \sum_{p=1}^{n} \left\| x_{ip} - \hat{x}_{ip}\left(K, k_1, k_2, \mathrm{Angle}_i, t_i, Q_p\right) \right\|^{2}, \tag{11}$$

where $\hat{x}_{ip}$ is the reprojected point of the $p$th space point on the $i$th camera, $\mathrm{Angle}_i$ is a vector that represents the Euler angles of the $i$th camera, $t_i$ is the translation vector, $Q_p$ is the $p$th control point, and $x_{ip}$ is the $p$th image point on the $i$th camera. More details about bundle adjustment (BA) can be found in the paper [16].
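The reprojection cost that bundle adjustment minimizes can be evaluated as in the sketch below. It only computes the cost for given parameters and does not perform the minimization. The Euler angle convention (rotations composed as Rz Ry Rx), the function names, and all numeric values are assumptions, since the paper does not specify them.

```python
import math


def euler_to_R(angle):
    """Rotation matrix from Euler angles (rx, ry, rz), composed as Rz * Ry * Rx (assumed convention)."""
    rx, ry, rz = angle
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [[cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
            [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
            [-sy, cy * sx, cy * cx]]


def reproject(K, k1, k2, angle, t, Q):
    """Reprojected pixel of control point Q with two-term radial distortion."""
    R = euler_to_R(angle)
    Xc = [sum(R[r][c] * Q[c] for c in range(3)) + t[r] for r in range(3)]
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]      # ideal normalized coordinates
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2         # radial distortion factor
    xd, yd = x * d, y * d
    return (K[0][0] * xd + K[0][1] * yd + K[0][2],
            K[1][1] * yd + K[1][2])


def ba_cost(K, k1, k2, angles, ts, points, observations):
    """Sum of squared reprojection errors; observations[i][p] is (u, v) or None if unseen."""
    cost = 0.0
    for i, obs_i in enumerate(observations):
        for p, obs in enumerate(obs_i):
            if obs is None:
                continue
            u, v = reproject(K, k1, k2, angles[i], ts[i], points[p])
            cost += (obs[0] - u) ** 2 + (obs[1] - v) ** 2
    return cost


# Hypothetical scene: three control points seen in two images
K = [[1200.0, 0.0, 640.0], [0.0, 1200.0, 512.0], [0.0, 0.0, 1.0]]
points = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 100.0, 20.0)]
angles = [(0.0, 0.0, 0.0), (0.05, -0.1, 0.02)]
ts = [(0.0, 0.0, 1000.0), (-50.0, 10.0, 1000.0)]
obs = [[reproject(K, -0.2, 0.05, a, t, Q) for Q in points] for a, t in zip(angles, ts)]

cost_true = ba_cost(K, -0.2, 0.05, angles, ts, points, obs)
cost_off = ba_cost(K, 0.0, 0.0, angles, ts, points, obs)
```

In practice this cost would be passed to a nonlinear least squares solver (for example a Levenberg-Marquardt routine) that updates the intrinsic parameters, distortion coefficients, poses, and control points together.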

Unifying Extrinsic Parameters of Left and Right Camera to Global Coordinate.
The analysis above only solves the parameters of a single camera, and the world coordinate systems of the two cameras are not the same. In this section, we give the algorithm for unifying the two different coordinate systems of the two cameras into one global coordinate system.
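Assume that $X_l$ and $X_r$ are the 3D points in the left and right camera world coordinate systems and that the corresponding projection matrices are $P_l$ and $P_r$, respectively. According to the principle of pinhole imaging, the image points and the object points satisfy the following equations, where $(u_l, v_l)$ and $(u_r, v_r)$ are the image points in the left and right images:

$$\lambda_l \begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} = P_l X_l, \qquad \lambda_r \begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} = P_r X_r. \tag{12}$$

The relationship between $X_l$ and $X_r$ is given in (13), where $G_{4 \times 4}$ is a rigid transformation matrix. Consider the following:

$$X_r = G_{4 \times 4} X_l. \tag{13}$$

From (12) and (13), we can calculate $P_r' = P_r G_{4 \times 4}$, which is the right-camera projection matrix expressed in the left world coordinate system. A unified coordinate system is then obtained, and the calibration is completed.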
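The effect of composing a projection matrix with a rigid transformation can be checked numerically. The sketch below assumes that G maps coordinates in the left world frame to the right world frame, and that the right-camera projection matrix is multiplied by G on the right; the matrices and the test point are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def to_pixel(P, X):
    """Project a homogeneous 3D point X with a 3x4 projection matrix P."""
    h = [sum(P[r][c] * X[c] for c in range(4)) for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


# Hypothetical right-camera projection matrix, defined in its own world frame
P_r = [[1000.0, 0.0, 640.0, 0.0],
       [0.0, 1000.0, 512.0, 0.0],
       [0.0, 0.0, 1.0, 0.0]]

# Hypothetical rigid transformation: 90-degree rotation about Z plus a translation
G = [[0.0, -1.0, 0.0, 10.0],
     [1.0, 0.0, 0.0, 20.0],
     [0.0, 0.0, 1.0, 500.0],
     [0.0, 0.0, 0.0, 1.0]]

X_l = [30.0, 40.0, 500.0, 1.0]                                      # point in the left world frame
X_r = [sum(G[r][c] * X_l[c] for c in range(4)) for r in range(4)]   # same point, right world frame

P_unified = matmul(P_r, G)            # right camera expressed in the left world frame
pix_direct = to_pixel(P_r, X_r)
pix_unified = to_pixel(P_unified, X_l)
```

Both projections give the same pixel, which shows that the unified matrix lets the two cameras share a single set of 3D coordinates.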

Experimental Results
The experiments for testing the proposed method are presented in this section. The binocular structured light system, which is shown in Figure 5, consists of two digital cameras with a resolution of 1280 × 1024 pixels and an LCD projector with a resolution of 1024 × 768. The distance between the two cameras is about 30 mm, and the measuring volume depends on the focusing capability and the field of view of the cameras. In our system, a 6 mm lens is chosen, and the measuring volume is about 400 mm × 300 mm × 300 mm. Artificial square targets with a side length of 5 mm and a scale bar are placed randomly in the measurement field and are captured simultaneously by the two cameras. The distance between the targets and the two cameras is about 1000 mm. Twenty images taken from different positions and angles are used to solve the parameters, as shown in Figure 6 (only 10 images are shown due to limited space).
The method proposed by Zhang [11] is the most widely used calibration method in current BSLSS. So, the performance of the proposed method is evaluated by comparing its results with those of the classical panel calibration described by Zhang. The panel board used in Zhang's algorithm is shown in Figure 7. The coordinates of the circle centers are measured by a third-party measurement, and the distance between two circle centers is known in advance.
The panel board used in Zhang's algorithm is also used as a standard object. When calibration is completed, the calibration parameters are used to calculate the 3D distance between two circle centers. Let the $i$th true distance be $d_i$, and let the same distance calculated by the proposed method and by Zhang's method be $d_i^{p}$ and $d_i^{z}$, respectively. It can

In order to reduce random error, ten groups of experiments are carried out, and the results are given in Table 1. To analyze the relationship between the number of images and the calibration precision, different numbers of images are tested, which is also shown in Table 1. From Table 1, we can see that when the number of images is small (7 images), the calibration precision of the method in this paper is slightly lower than that of Zhang's method. When the number of images is increased to 14 or 20, the calibration precision is almost the same. One of the experimental results is shown in Table 2. From Table 2, we can see that the calibration results of Zhang's method and the proposed method differ only slightly.

Conclusions
Because the current camera calibration methods for BSLSS (binocular structured light scanning systems) have a high cost, this paper puts forward a novel calibration method that does not rely on a complex calibration reference object; the only hardware needed is some small artificial targets and a scale bar. Because this hardware does not need strict industrial processing, the cost is lower than that of a traditional elaborate 2D or 3D object. Besides, the hardware used in this paper is very small, which makes it more flexible than a 2D or 3D calibration reference object. Real experimental results show that the calibration precision is the same as that of the traditional method when enough calibration images are used.