A High-Precision Registration Technology Based on Bundle Adjustment in Structured Light Scanning System

The multiview 3D data registration precision decreases as the number of registrations increases when measuring a large scale object using structured light scanning. In this paper, we propose a high-precision registration method based on multiple view geometry theory in order to solve this problem. First, a multiview network is constructed during the scanning process. The bundle adjustment method from digital close range photogrammetry is used to optimize the multiview network to obtain high-precision global control points. After that, the 3D data under the local coordinate of each scan are registered with the global control points. The method overcomes the error accumulation in the traditional registration process and reduces the time consumption of the subsequent global optimization of the 3D data. The multiview 3D scan registration precision and efficiency are increased. Experiments verify the effectiveness of the proposed algorithm.


Introduction
As one of the most effective methods of obtaining three-dimensional (3D) surface data of objects, the structured light scanning system based on binocular stereo vision has been widely applied in the fields of reverse engineering, relic preservation, biomedicine, and so forth [1,2]. Due to the sensors' limited field of view, only partial surface data of the object can be obtained when it is scanned from a single view. In order to obtain the complete 3D surface data of an object, the object should be scanned from multiple views. At each scanning view, the local 3D data is in its corresponding local coordinate, which means that the 3D data in different views are in different coordinates. Thus, the key to obtaining complete surface 3D data is to integrate all 3D data in different local coordinates into a unified coordinate. Currently, there are several typical methods. (1) Find the two point clouds' transformation matrix with the help of precision hardware such as rotary tables, robotic arms, and robots [3,4]. This kind of registration method requires high-precision equipment. The measuring range is limited by the working range of the equipment. Additionally, the cost of the equipment is high. (2) Find the transformation matrix by using the two-point 3D invariants [5][6][7]. The advantage of this method lies in the fact that the 3D characteristics of the object surface are used directly, without any preparation of the object to be scanned. The method performs better for objects with complex shapes but can fail for planes and spheres.
(3) Find the transformation matrix by artificial markers, serving as control points, that are pasted on the surface of the object [8]. Compared to method (1), this method has a larger measurement range and a lower equipment cost. Except for the few cases where markers are not allowed (e.g., relic scanning), it can be used for objects with any curvature and shape. This is the most widely used registration method in structured light scanning systems. The method usually considers the local coordinate of the first scan as the global coordinate. From the second scan, the transformation matrix between the current scanning local coordinate and the global coordinate is calculated using the common control points that lie in the overlapping area of the two views, and the point cloud and control points under the current local coordinate are then integrated into the global coordinate through the calculated transformation matrix. Although this registration approach is simple, it performs registration between every two consecutive scans, which leads to error accumulation among all scans. Due to the error in the transformation matrix (we call it transformation error), when the nonhomonymous control points in the current local scanning coordinate are oriented to the global coordinate, they have a larger error than they had in the local coordinate because the last transformation error has propagated to them. In the next scan, if they are used to calculate a new transformation matrix, their error will be propagated to the new transformation matrix. Thus, as the number of scans increases, the transformation error grows due to error propagation from scan to scan, which finally causes stratification between two registered point clouds, as shown in Figure 1.
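The accumulation effect described above can be illustrated with a deliberately simplified simulation. The sketch below is not the error model of this paper: it assumes a one-dimensional offset, independent Gaussian errors of equal size for every registration step, and a hypothetical function name `last_scan_rms_error`. It compares the error of the last scan when scans are chained one after another with the error when the last scan is registered directly to global control points.

```python
import random


def last_scan_rms_error(num_scans=20, sigma_mm=0.02, trials=2000, seed=0):
    """Return (sequential_rms, global_rms) of the last scan's offset error in mm.

    Toy 1D model (an assumption, not the paper's error model): every
    registration step contributes an independent Gaussian offset error.
    """
    rng = random.Random(seed)
    seq_sq_sum = 0.0
    glob_sq_sum = 0.0
    for _trial in range(trials):
        # Sequential: scan k is registered to scan k-1, so step errors add up.
        drift = 0.0
        for _step in range(num_scans - 1):
            drift += rng.gauss(0.0, sigma_mm)
        seq_sq_sum += drift ** 2
        # Global: the last scan is registered directly to the global control
        # points, so it carries only its own registration error.
        glob_sq_sum += rng.gauss(0.0, sigma_mm) ** 2
    return (seq_sq_sum / trials) ** 0.5, (glob_sq_sum / trials) ** 0.5


if __name__ == "__main__":
    seq, glob = last_scan_rms_error()
    print(f"sequential RMS: {seq:.4f} mm, global RMS: {glob:.4f} mm")
```

Under these assumptions the chained error grows roughly with the square root of the number of chained registrations, while the error of a scan registered directly to global points stays at the single-registration level.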
The stratification phenomenon is very common in most structured light scanning systems. We therefore address this problem in this paper and propose a stepwise high-precision registration algorithm that avoids the accumulated error which causes stratification between two point clouds in the traditional registration method. In the first step, initial global coordinates of the control points are calculated based on multiple view geometry theory and refined by bundle adjustment. In the second step, the 3D data of each scan are registered with the global control points, as shown in Figure 2. The key of the proposed method is that the local 3D data from each scan is always registered with the global control points. Each registration is independent of the others, so the registration error does not propagate from one scan to the next, and the registration precision can be guaranteed. Central to our method is the computation of all the control points with high accuracy in a global coordinate.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 gives the details of the proposed method, focusing mainly on how to obtain global control points with high precision. Section 4 gives the scanning and registration process. Section 5 presents experimental results that verify the effectiveness of the algorithm. Conclusions are discussed in Section 6.

Figure 2 (diagram labels): global control points; the ith scanning point cloud; the (i + 1)th scanning point cloud.

Related Work
In order to solve this problem, Yunbo et al. [9] used common control points with high precision in the calculation of the transformation matrix during each registration to reduce the error in a single registration. The method was proved to be effective when measuring small scale objects. However, it is by nature still a cumulative registration method. The point cloud stratification will be inevitable when measuring large scale objects due to error propagation. Zhang et al. [10] and Li et al. [11] introduced theodolites and laser trackers, respectively, to unify all the measurements in the global coordinate. Although this method has high registration precision, the measuring system cost is high due to the expensive theodolites and laser trackers. Li et al. [11] and Gong et al. [12] proposed an approach where the global control points of the object are first reconstructed and then the scanned local data are registered with the global control data. The approach relies on a third party photogrammetric system in order to get the global control points. Although the registration precision is high with this method, the system cost increases dramatically, for the same reason as with the methods that introduce theodolites and laser trackers.

The Proposed Algorithm
As described in the second paragraph in Section 1, the core of our method is computing the 3D coordinates of global control points in the scanning process. We solve this problem using the following method. When the structured light scanner scans the object from different views, the images acquired at different positions form a multiview image network. If the matching points of the control points as well as the internal and external parameters of the camera under a global coordinate are known, the 3D coordinates of the control points can be obtained based on the multiple view geometry theory. Thus, the main tasks in this paper are image control point matching and camera calibration at each scan location.

Image Control Points Matching among All Images
The matching of corresponding image control points in each image is the key to the construction of the multiview network.
The structured light scanner generally uses the centers of circles or concentric circles as control points. The circles pasted on the surface of an object are not distinguishable. Therefore, a unique ID needs to be assigned to each circle.
Assuming that the scanned point clouds of the $i$th and $(i-1)$th scans are $Q_i$ and $Q_{i-1}$, respectively, the transformation matrix between the two viewpoints is then

$$M_i = \begin{bmatrix} R_{ts} & T_{ts} \\ 0^{T} & 1 \end{bmatrix},$$

where $R_{ts}$ is the rotation matrix between $Q_i$ and $Q_{i-1}$ and $T_{ts}$ is the translation vector between the two. When three or more pairs of common object points exist in $Q_i$ and $Q_{i-1}$, the transformation matrix can be calculated. The method for finding the common object points between two 3D point sets can be found in [14], and the solution method for the transformation matrix can be found in [15]. The control points in the first scan are labeled as $1, 2, \ldots, n_1$. Suppose that $Q_2$ is the set of coordinates of control points in the second scan; after converting the coordinates of $Q_2$ to the coordinate system of the first scan by $Q'_2 = M_2 Q_2$, the distance between each point in $Q'_2$ and each point in $Q_1$ is then calculated; we call it transformation error in this paper. If the transformation error is less than a given threshold value, 1 mm in this paper, it means that the control point has already been scanned in the first round; otherwise, the control point is new to the second scan. The new control points are then labeled as $n_1 + 1, n_1 + 2, \ldots, n_2$. The labeling method for the new control points in the third scan follows the same rule.
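The labeling rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `label_control_points` and the dictionary layout are hypothetical, the points of the new scan are assumed to have already been converted into the coordinate system of the first scan, and each point is compared with its nearest already-labeled point.

```python
import math


def label_control_points(labeled, new_points, threshold_mm=1.0):
    """Assign IDs to the control points of a newly converted scan.

    labeled: dict {id: (x, y, z)} of control points labeled so far, with
        contiguous IDs 1..n; it is updated in place with newly found points.
    new_points: list of (x, y, z) already expressed in the first scan's frame.
    Returns the list of IDs, one per entry of new_points.
    """
    ids = []
    for p in new_points:
        # Find the nearest control point that already has a label.
        best_id, best_dist = None, float("inf")
        for pid, q in labeled.items():
            d = math.dist(p, q)
            if d < best_dist:
                best_id, best_dist = pid, d
        if best_dist < threshold_mm:
            # Transformation error below the threshold: already scanned.
            ids.append(best_id)
        else:
            # Otherwise the point is new and receives the next free ID.
            new_id = len(labeled) + 1
            labeled[new_id] = p
            ids.append(new_id)
    return ids
```

For example, labeling a first scan with three points into an empty dictionary yields IDs 1 to 3, and a second scan that sees two of them again plus one unseen point yields the two old IDs and the new ID 4.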

Camera Calibration in Each Scan Location
The structured light scanning system includes two cameras, which have already been calibrated before scanning. When the scanner moves from one position to another during the scanning, we assume that the internal parameters of the cameras remain unchanged. This assumption can be easily satisfied by using fixed-focus lenses. So, only the external parameters of the cameras at different positions need to be calibrated.
In this paper, we give a very simple method to solve the external parameters under a global coordinate of each image at any scanning position. The scanning process is shown in Figure 3. First, images in each scan position are divided into several groups by scanning sequence; each group has a unique coordinate system (highlighted in pink in Figure 3; we call it group coordinate system). Second, images in different groups are oriented to a unified coordinate system. The second step can easily be accomplished when the number of common 3D points between two different groups is equal to or more than three, as described in Section 3.1. So, in the following, we only focus on the problem in the first step.
The left camera coordinate at the first scan position is defined as the first group coordinate. From the second scan, the transformation matrix between the current local coordinate and the first group coordinate is calculated. If the transformation error is larger than a given threshold value (0.1 mm is chosen in this paper), images in the current scan position are divided into a new group; otherwise, they are divided into the last group. As shown in Figure 3, there are three groups. For each group, the initial external parameters of images can be deduced with the following method.
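The grouping rule can be sketched in a few lines. The sketch assumes that the transformation error of every scan from the second one onward has already been computed against the coordinate system of the most recent group; the function name `assign_groups` and the precomputed error list are illustrative assumptions, not part of the described system.

```python
def assign_groups(errors_mm, threshold_mm=0.1):
    """Return a 1-based group index for every scan position.

    errors_mm: transformation errors (mm) of scans 2, 3, ..., n, each
        assumed to be measured against the current (latest) group.
    The first scan always defines group 1.
    """
    groups = [1]
    for err in errors_mm:
        if err > threshold_mm:
            # Error too large: the scan opens a new group.
            groups.append(groups[-1] + 1)
        else:
            # Otherwise the scan joins the last group.
            groups.append(groups[-1])
    return groups
```

With the errors `[0.03, 0.05, 0.4, 0.06, 0.9]`, six scan positions are divided into three groups: `[1, 1, 1, 2, 2, 3]`.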
For the first group, variables are defined as follows. $R_1$, $T_1$, $R_2$, and $T_2$ are the external parameters (rotation matrix and translation vector) of the left and right cameras in each local coordinate, which can be calculated with high precision through a camera calibration method before scanning. Usually, the local coordinate is established on the left camera coordinate, which means $R_1 = I$ and $T_1 = 0$. We also define the left camera coordinate at the first scan position as the first group coordinate, as highlighted in pink in Figure 3. Assume that a 3D point is denoted by $P_1 = [X_1\ Y_1\ Z_1\ 1]^T$ and $P_2 = [X_2\ Y_2\ Z_2\ 1]^T$ (in homogeneous coordinates) under the first group coordinate and the second local scan coordinate, respectively. Its corresponding image point coordinates on the left camera at the first and second scan positions are denoted by $m_1 = [u_1\ v_1\ 1]^T$ and $m_2 = [u_2\ v_2\ 1]^T$, respectively. The relationship between the image point and space point is shown by formula (1) with the pinhole imaging principle, where $K_1$ is the internal parameter matrix of the left camera and $s_1$, $s_2$ are scale factors. $M_2$ is the transformation matrix between $P_1$ and $P_2$ and is denoted by formula (2); here it is written in the direction that maps the first group coordinate to the second local scan coordinate. With formulas (1) and (2), formula (3) can be derived. Consider

$$s_1 m_1 = K_1 \begin{bmatrix} I & 0 \end{bmatrix} P_1, \qquad s_2 m_2 = K_1 \begin{bmatrix} I & 0 \end{bmatrix} P_2, \tag{1}$$

$$P_2 = M_2 P_1, \qquad M_2 = \begin{bmatrix} R_{ts2} & T_{ts2} \\ 0^{T} & 1 \end{bmatrix}, \tag{2}$$

$$s_2 m_2 = K_1 \begin{bmatrix} R_{ts2} & T_{ts2} \end{bmatrix} P_1. \tag{3}$$

Formula (3) describes the relationship between the image point on the left camera at the second scan position and the space point coordinate under the first group coordinate, so the external parameters of the left camera at the second scan position under the first group coordinate are $R_{ts2}$ and $T_{ts2}$. The external parameters of the right camera at the second scan position can be deduced with the same rule: a point in the left camera coordinate is mapped to the right camera coordinate by $R_2$ and $T_2$, so the parameters are $R_2 R_{ts2}$ and $R_2 T_{ts2} + T_2$. For the rest of the scan positions, the external parameters of the left camera are $R_{tsi}$, $T_{tsi}$, and the external parameters of the right camera are $R_2 R_{tsi}$, $R_2 T_{tsi} + T_2$, where $R_{tsi}$ and $T_{tsi}$ are the rotation matrix and translation vector between common 3D points under the $i$th scan local coordinate and the first group coordinate. For images in other groups, the computation method is the same as described above.

From Sections 3.1 and 3.2, we obtain the matching points of the control points in the multiview network and the external parameters of each image in the network. The internal parameters and the distortion factor can be obtained by calibration before the scan. As a result, the 3D coordinates of the control points can be solved. Because of the errors in the collected images, there are also errors in the calculated image external parameters and the 3D coordinates of the control points. Therefore, the results need to be optimized. In this paper, the bundle adjustment method in photogrammetry is used to optimize the external parameters and the coordinates of object points.
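The composition of external parameters for the right camera in Section 3.2 can be checked numerically. The sketch below assumes that the pair (R_ts, T_ts) maps group coordinates to the left camera coordinate at a scan position and that (R2, T2) maps left camera coordinates to right camera coordinates; under these assumptions the right camera parameters in the group coordinate are R2·R_ts and R2·T_ts + T2. All function names and numeric values are illustrative.

```python
import math


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transform(R, T, P):
    """Apply the rigid transformation P' = R P + T to a 3D point."""
    return [sum(R[i][k] * P[k] for k in range(3)) + T[i] for i in range(3)]


def right_camera_extrinsics(R2, T2, R_ts, T_ts):
    """Compose group->left (R_ts, T_ts) with left->right (R2, T2)."""
    R = matmul(R2, R_ts)
    T = transform(R2, T2, T_ts)  # equals R2 * T_ts + T2
    return R, T


def rot_z(deg):
    """Rotation matrix about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    """Rotation matrix about the y axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


if __name__ == "__main__":
    R_ts, T_ts = rot_z(30.0), [50.0, -20.0, 10.0]   # illustrative values
    R2, T2 = rot_y(-10.0), [-200.0, 0.0, 15.0]      # illustrative values
    P1 = [100.0, 40.0, 800.0]                        # point in group frame
    R, T = right_camera_extrinsics(R2, T2, R_ts, T_ts)
    print(transform(R, T, P1))
```

Mapping a point in two steps (group to left camera, then left to right camera) gives the same coordinates as applying the composed parameters once, which is the property the derivation relies on.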

Refine 3D Coordinates of All the Control Points and Image External Parameters Using Bundle Adjustment
The bundle adjustment method originated from analytic photogrammetry. Using the collinearity equations as the mathematical model, the observed values of the image plane coordinates of the image point are nonlinear functions of the unknowns. The least squares method is then applied after the nonlinear functions are linearized. The principle of the method is as follows. The relation between the image point and the object point can be expressed by (4), where $(u, v)$ is the coordinate of the image point, $f_u$ and $f_v$ are the effective focal lengths, $(u_0, v_0)$ is the principal point coordinate, $X, Y, Z$ are the three components of the object coordinate, $[r_1 \cdots r_9]$ represents the rotation matrix between the coordinate system of the camera and the global coordinate, and $[t_x\ t_y\ t_z]$ is the translation vector between the coordinate system of the camera and the global coordinate. $f_u$, $f_v$, and $(u_0, v_0)$ are the internal parameters of the camera, which can be obtained by calibration:

$$u = u_0 + f_u \frac{r_1 X + r_2 Y + r_3 Z + t_x}{r_7 X + r_8 Y + r_9 Z + t_z}, \qquad v = v_0 + f_v \frac{r_4 X + r_5 Y + r_6 Z + t_y}{r_7 X + r_8 Y + r_9 Z + t_z}. \tag{4}$$

In (4), $(u, v)$ is a nonlinear function of $X, Y, Z$, $r_1 \sim r_9$, $t_x, t_y, t_z$. Take the first-order approximation of the Taylor expansion to linearize the perspective projection equation and we have (5), where $\varphi, \omega, \kappa$ are the Euler angles that parameterize $r_1 \sim r_9$ in the rotation matrix, $(u)$ and $(v)$ are the approximations of (4) evaluated at the current values of the unknowns, and $\Delta p$ is the correction of the unknown $p$:

$$u = (u) + du, \quad v = (v) + dv, \qquad du = \sum_{p} \frac{\partial u}{\partial p} \Delta p, \quad dv = \sum_{p} \frac{\partial v}{\partial p} \Delta p, \quad p \in \{X, Y, Z, \varphi, \omega, \kappa, t_x, t_y, t_z\}. \tag{5}$$

The $u$ and $v$ obtained from the image are the observed values, which involve errors. The corresponding correction values $V_u$, $V_v$ should be added. According to the fact that observed value + correction of the observed value = approximation + correction of the approximation, we have (6):

$$u + V_u = (u) + du, \qquad v + V_v = (v) + dv. \tag{6}$$

Substituting (5) into (6), we then have (7):

$$V_u = \sum_{p} \frac{\partial u}{\partial p} \Delta p - \bigl[u - (u)\bigr], \qquad V_v = \sum_{p} \frac{\partial v}{\partial p} \Delta p - \bigl[v - (v)\bigr]. \tag{7}$$

Minimizing $V V^{T}$, where $V = [V_u\ V_v]$ collects the corrections of all observations, we can find the correction value of each unknown. Each correction value is compared with a set tolerance value; if the correction is smaller, the iterations are terminated; otherwise, the new value of the unknown is used as the approximation in the next iteration, until the tolerance is satisfied.
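The iterate-until-tolerance scheme described above can be illustrated on a much reduced problem. The sketch below is not the full adjustment of this paper: it refines only the coordinates of one object point, keeps the cameras fixed with identity rotation, and uses a numerical Jacobian instead of analytic partial derivatives. The function names, camera dictionaries, and numeric values are all assumptions made for illustration.

```python
def project(point, cam):
    """Pinhole projection with identity rotation (simplifying assumption)."""
    xc = point[0] + cam["t"][0]
    yc = point[1] + cam["t"][1]
    zc = point[2] + cam["t"][2]
    return (cam["u0"] + cam["fu"] * xc / zc, cam["v0"] + cam["fv"] * yc / zc)


def predicted(point, cams):
    """Stack the projected (u, v) of the point in every camera."""
    return [c for cam in cams for c in project(point, cam)]


def jacobian(point, cams, h=1e-6):
    """Forward-difference Jacobian: rows are observations, columns X, Y, Z."""
    base = predicted(point, cams)
    cols = []
    for k in range(3):
        shifted = list(point)
        shifted[k] += h
        cols.append([(a - b) / h for a, b in zip(predicted(shifted, cams), base)])
    return [[cols[k][i] for k in range(3)] for i in range(len(base))]


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def refine_point(initial, cams, observed, tol=1e-8, max_iter=50):
    """Iterate linearized least squares until every correction is below tol."""
    p = list(initial)
    for _ in range(max_iter):
        # l = observed value minus approximation at the current estimate.
        l = [o - a for o, a in zip(observed, predicted(p, cams))]
        A = jacobian(p, cams)
        n = len(A)
        # Normal equations (A^T A) delta = A^T l.
        N = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(3)]
             for r in range(3)]
        g = [sum(A[i][r] * l[i] for i in range(n)) for r in range(3)]
        delta = solve3(N, g)
        p = [a + d for a, d in zip(p, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return p


if __name__ == "__main__":
    left = {"fu": 2000.0, "fv": 2000.0, "u0": 640.0, "v0": 512.0, "t": (0.0, 0.0, 0.0)}
    right = dict(left, t=(-200.0, 0.0, 0.0))
    truth = (10.0, -5.0, 800.0)
    obs = predicted(truth, [left, right])
    print(refine_point((12.0, -4.0, 790.0), [left, right], obs))
```

In a full bundle adjustment the unknown vector would also contain the Euler angles and translations of every image, and the normal equations would be correspondingly larger; the termination logic stays the same.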

Scan and Registration Process
The scanning and registration process proposed in this paper is shown in Figure 4. During each scan, the 3D point cloud under the local coordinate system is saved. In the first scan, the control points are labeled. After the second scan, the transformation matrix $M_2$ between the two viewpoints is calculated; at the same time, the initial external parameters of the images at the second scan position are obtained with the method described in Section 3.2. The 3D point cloud of the second scan is sampled and oriented to the coordinate system of the first scan. The new control points are labeled as described in Section 3.1. The goal of sampling is to let the operators monitor the scanned area and at the same time to reduce the memory cost. The sample rate in this paper is 5%. Then, the following scans are performed until all scans are done. After the scans, all the control points should be labeled. According to the methods described in Sections 3.1 and 3.2, the initial values of the 3D control points and external parameters are solved. The coordinate of each control point is then optimized using the bundle adjustment method. Finally, the 3D data under each local coordinate system are registered with the global control points. In order to further increase the precision of the registration, the iterative closest point (ICP) algorithm is used to optimize the entire data.

Experimental Results
The following experiments were designed in order to verify the effectiveness of the proposed algorithm. The structured light scanning system is shown in Figure 5; it includes two cameras and a projector. The two cameras and the projector are placed on a platform fixed on the top of a tripod. The resolution of each camera was 1280 × 1024 pixels. The focal length of the lens was 12 mm. The model of the projector was Acer k132. The scan distance was about 800 mm. The sensor's field of view was about 300 mm × 250 mm. The computer system configuration was an Intel(R) Core i3-3110M CPU @ 2.40 GHz with 4.0 GB of memory. The accuracy of a single scan is about 0.03-0.05 mm, obtained by computing the difference between the surface 3D data measured with our structured light system and with third party measuring equipment.
Experiment One: Verification of the Algorithm Precision. A pair of standard balls, as shown in Figure 6, was used. The standard balls were scanned multiple times. The regression diameters and the error of the distance between the centers of the two balls were both calculated using the method with the multiview constraint (the proposed method) and the method without the constraint (the method in [8]). The diameters of the standard balls and the distance between the centers were checked by Shenzhen Academy of Metrology and Quality Inspection.
The diameter of ball A was 30.0060 mm and the diameter of ball B was 30.0064 mm. The distance between the centers was 60.4344 mm. In order to avoid accidental errors, six experiments were performed under the same conditions. The process of one of the experiments is shown in Figure 6. The standard balls were scanned from six different angles. The white dots around the two balls are the reference points for registration.
In order to have a clear visual analysis of the registration precision, the point cloud of each scan is shown in a different color, as shown in Figure 7. In the place where two point clouds overlap, the more aliasing between the two colors, the better the registration of the two point clouds. If only one color can be seen, there is stratification between the two point clouds and the registration precision is low. The results of [8] are shown in Figure 7(a). There is less aliasing between the green color and the other colors in the center part, indicating that the green point cloud is floating above the other point clouds, whereas, in the results of the proposed algorithm (Figure 7(c)), different colors are displayed uniformly in the overlapping areas. Different colors are mixed together, indicating that the registration precision of the proposed method is higher than that of [8]. The sampled display results during the scanning are shown in Figure 7(b).
The above conclusion is qualitative. The quantitative comparisons between the results of the proposed method and the method in [8] are shown in Tables 1 and 2. For the six experiments, the average error of the method in [8] for the diameter of ball A is 0.0853 mm and the average error for the diameter of ball B is 0.0855 mm. The error in the center distance is 0.0353 mm. The corresponding errors for the proposed method are 0.0268 mm, 0.0253 mm, and 0.0094 mm, respectively, which shows that the proposed method is more precise than the method in [8].

Experiment Two: Verification of System Time Consumption.
There is no doubt that the proposed registration method needs to construct the multiview network first and then solve for the coordinates of the control points, which requires additional time. However, after normal 3D scans, the entire data need to be globally optimized using ICP, and the time consumption of ICP depends on the initial precision of the registration of the different point clouds. We therefore compare the time consumption of the proposed method and the method in [8] under the same ICP iteration conditions. The time consumption of the construction of the multiview network is denoted by $t_1$. For the same termination condition, the ICP time consumption of the proposed method is $t_2$ and the time consumption of [8] is $t_3$. Therefore, the total time consumption of the proposed method is $t_1 + t_2$. The total time consumption of the method in [8] is $t_3$. The iteration termination condition was set to reaching the maximum number of iterations, the norm of the difference between the calculated rotation matrix and the identity matrix being smaller than $1 \times 10^{-6}$, and the norm of the difference between the translation vector and the zero vector being smaller than $1 \times 10^{-6}$. During the ICP iterations, the sampling rate was set to 30%. The results are shown in Table 3. It can be seen that the proposed method spends more time in the registration, but the registration precision is increased accordingly. Therefore, the ICP iteration costs less time and the overall efficiency of the scan system is increased.
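The termination test used for the ICP comparison can be written down as a small helper. The sketch makes several assumptions that the text does not fix: the Frobenius norm is used for the rotation deviation, the Euclidean norm for the translation, and the iteration stops when either the maximum number of iterations is reached or both deviations fall below the tolerance. The function name `icp_should_stop` is hypothetical.

```python
import math


def icp_should_stop(iteration, max_iterations, R, T, tol=1e-6):
    """Decide whether an ICP loop should terminate.

    R, T: incremental rotation (3x3 nested list) and translation (3-vector)
        estimated in the current iteration.
    """
    # Frobenius norm of (R - I); the choice of norm is an assumption.
    rot_dev = math.sqrt(sum((R[i][j] - (1.0 if i == j else 0.0)) ** 2
                            for i in range(3) for j in range(3)))
    # Euclidean norm of (T - 0).
    trans_dev = math.sqrt(sum(t * t for t in T))
    converged = rot_dev < tol and trans_dev < tol
    return iteration >= max_iterations or converged
```

A better initial registration makes the incremental transformation approach the identity in fewer iterations, which is why the ICP time of the proposed method can be shorter even though constructing the multiview network costs additional time.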
Experiment Three: Large Size Object Scanning. We chose a dragon model and a Buddha model with sizes of 1500 mm × 400 mm × 500 mm and 500 mm × 400 mm × 1500 mm, respectively, to test the validity of the proposed method. For the dragon test, by the 22nd scan with the method proposed in [8], two sets of 3D data at the beaten part of the dragon are stratified, as shown in Figure 8(b). We use the same image data to calculate the whole 3D data of the dragon with the proposed method; the results are illustrated in Figure 8(c). There are no stratified data, which indicates that the proposed method has higher precision. Another example is the scanning of the Buddha model. The scanning results with the method in [8] and with the method in this paper are shown in Figure 9.
In Figure 9(b), at the head part of the Buddha, there are two pieces of 3D data which are not aligned; specifically, there are two noses on the face, while the 3D data obtained with the proposed method are aligned well, and there is no stratification. Figures 8(a) and 9(a) illustrate one view scanning of the experiments.

Conclusions
The precision of multiview point cloud registration is one of the major indices that measure the performance of a structured light 3D scanner system. A consecutive registration approach cannot avoid error propagation and accumulation. When scanning large scale objects, the registration tends to fail as the number of registrations increases. In this paper, the point cloud data under local coordinates during scans were saved. Additionally, all image information was used to construct a multiview network, and the bundle adjustment method was used to optimize the precision of all the control points. Finally, the saved point clouds under each local coordinate were registered with the global control points. Experiments showed that the proposed method effectively increased the registration precision as well as the system efficiency. The proposed method has already been applied to the design of structured light scan systems.

Figure 1: Example of stratification between two point clouds during registration.

Figure 3: The scanning process diagram.

Figure 6: Standard balls from six scan angles.

Figure 7: Point cloud in different views with different colors.

Table 2: Experiment error with proposed method.

Table 3: Computation time with the proposed method and method in [8].