A Novel Method for Automatic Extrinsic Parameter Calibration of RGB-D Cameras

Calibration of the extrinsic parameters of an RGB-D camera is useful in many fields, such as 3D scene reconstruction, robotics, and target detection. Many calibration methods employ a specific calibration object (e.g., a chessboard or cuboid) to calibrate the extrinsic parameters of the RGB-D color camera without using the depth map. As a result, the calibration process is difficult to simplify, and the color sensor gets calibrated instead of the depth sensor. To this end, we propose a method that employs the depth map to perform extrinsic calibration automatically. In detail, the depth map is first transformed to a 3D point cloud in the camera coordinate system, and the planes in the 3D point cloud are then detected automatically using the Maximum Likelihood Estimation Sample Consensus (MLESAC) method. After that, according to the constraint relationship between the ground plane and the world coordinate system, all planes are traversed and screened until the ground plane is obtained. Finally, the extrinsic parameters are calculated using the spatial relationship between the ground plane and the camera coordinate system. The results show that the mean roll angle error of extrinsic parameter calibration was −1.14°, the mean pitch angle error was 4.57°, and the mean camera height error was 3.96 cm. The proposed method can accurately and automatically estimate the extrinsic parameters of a camera. Furthermore, after parallel optimization, it can achieve real-time performance for automatically estimating a robot's attitude.


Introduction
RGB-D cameras, such as the Kinect [1-6], PrimeSense, and Asus Xtion Pro, are traditional RGB cameras with an added infrared camera. Figure 1 shows the structure of a Kinect camera, which includes a color camera, an infrared camera, and an infrared illuminator. The color camera outputs a color image, the infrared camera outputs a depth image, and the infrared illuminator emits infrared light used to compute the depth image. With the emergence of low-cost RGB-D cameras such as the Kinect, these devices are increasingly used for tasks such as 3D scene reconstruction and navigation [8, 9], target recognition and tracking [10, 11], 3D measurement [12-15], and even social networks [16], for which the extrinsic parameters of the camera often must be calibrated.
For example, in target detection and recognition [11], the extrinsic parameters of the camera are calibrated first, and pedestrians in the scene are then tracked. If the extrinsic parameters of the camera can be obtained, the negative effects of perspective can be eliminated. In addition, the detection algorithm can be unified and the recognition process simplified, thus accelerating recognition.
However, studies of the calibration of RGB-D cameras mainly focus on the extrinsic parameters of the color camera relative to the infrared camera [17,18]. Calibrating the extrinsic parameters of an RGB-D camera is often similar to the calibration of the color camera, and the method of chessboard calibration is generally used to calibrate the color camera [19]. Munaro et al. [10] used a chessboard to calibrate the extrinsic parameters of multiple cameras and carried out pedestrian detection based on this.
This calibration method did not make full use of the depth information provided by the RGB-D camera, and the calibration results were essentially the extrinsic parameters of the color camera. If it were used directly for 3D reconstruction of the depth map, there would be a large error. Shibo and Qing [20] designed a calibration board for RGB-D infrared camera recognition, which had holes at regular intervals, and calibrated the infrared camera by automatically identifying the holes. The need to design special calibration objects increased the difficulty of calibration. Liao et al. [17] divided calibration methods into three categories: calibration that requires a calibration object, calibration that requires human intervention, and fully automatic calibration. The proposed method belongs to the third category.
Liao et al. [17] divided the extrinsic parameter calibration methods for RGB-D cameras into three categories. (1) The first is to calibrate the extrinsic parameter T_{C'} of the color camera and then use the transformation T^D_{C'} from the color camera to the infrared camera to obtain the extrinsic parameter T_D of the infrared camera as T_D = T^D_{C'} T_{C'}. This method can directly use color image calibration methods, but it needs the extrinsic parameters of the color camera relative to the infrared camera and does not use the information provided by the depth map, so the process is complicated. (2) The second is to detect features on the depth map provided by the infrared camera by designing a specific calibration object, and to obtain the extrinsic parameters of the camera from those features (such as chessboard corners). (3) The third uses the depth map itself to calibrate the extrinsic parameters of the camera: calibration is carried out by detecting a target on the depth map and using the relationship between the target and the world coordinate system. This approach directly uses the depth information, which greatly simplifies the process and improves the efficiency of calibration. Our proposed method, which we call the ground plane calibration method, automatically calibrates the extrinsic parameters from ground plane detection and belongs to the third category. It can directly calibrate the infrared camera of an RGB-D camera using its depth information.

Extrinsic Parameter Calibration
RGB-D cameras comprise a color camera and an infrared camera. For this study, the extrinsic parameters of the infrared camera were calibrated; thus, extrinsic parameter calibration refers to that of the infrared camera. The extrinsic parameter of the camera is

T^W_C = [ R^W_C  t^W_C ; 0  1 ],  (1)

where T^W_C ∈ R^{4×4}, R^W_C ∈ R^{3×3} is the rotation matrix of the camera, t^W_C ∈ R^{3×1} is the translation vector of the camera, and (·)^W_C represents the transformation from the camera coordinate system to the world coordinate system. Figure 2 shows the flowchart of extrinsic parameter calibration. After the establishment of a world coordinate system, the depth image is first obtained from the RGB-D camera.
Then, the depth image is transformed into a 3D point cloud in the camera coordinate system using the internal parameters of the infrared camera. After that, the ground plane in the point cloud is found by iteration. Finally, the extrinsic parameters of the camera are obtained by calculation.
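To make the block structure of the extrinsic matrix in (1) concrete, the following sketch (NumPy; the rotation angle and camera height used in the example are illustrative values of ours, not calibrated ones) assembles T from R and t and inverts the rigid transform in closed form:

```python
import numpy as np

def make_extrinsic(R, t):
    """Assemble the 4x4 homogeneous transform of (1) from a 3x3
    rotation R and a 3-vector translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_extrinsic(T):
    """Closed-form inverse of a rigid transform: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

# Example: a 30-degree rotation about Z and a 1.5 m translation.
th = np.deg2rad(30.0)
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0,         0.0,        1.0]])
T = make_extrinsic(R, np.array([0.0, 0.0, 1.5]))
```

The closed-form inverse avoids a general 4×4 matrix inversion and is the identity used later when converting between T^W_C and T^C_W.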

Establishment of the Camera Coordinate System and World Coordinate System.
The camera coordinate system is a 3D coordinate system with the infrared camera at its origin, as shown in Figure 3. The infrared camera is the origin O_C of the camera coordinate system. The X_C axis is along the transverse direction of the Kinect. The Z_C axis is perpendicular to the Kinect body and points in the shooting direction. The world coordinate system can generally be established at will. To facilitate the calculation of the extrinsic parameters of the camera, as shown in Figure 4, the world coordinate system should meet the following requirements.
(1) The origin of the world coordinate system is the projection point of the origin of the camera coordinate system on the ground plane.
(2) The Y_W axis is the projection of the Z_C axis of the camera coordinate system on the ground.
(3) The Z_W axis is perpendicular to the ground plane, pointing downward.
(4) The coordinate system is a right-handed system.
In this way, it is convenient to find the point pairs corresponding to camera coordinates and world coordinates, which simplifies the calibration process.

Calculation of Extrinsic Parameters of the Camera.
The matrix of transformation from the camera coordinate system to the world coordinate system, that is, the extrinsic parameter T^W_C of the camera, is a 4 × 4 matrix with 12 unknown parameters, as shown in (1). First, the transformation matrix T^C_W from the world coordinate system to the camera coordinate system is calculated; the extrinsic parameter then follows by inversion:

T^W_C = (T^C_W)^{-1}.  (2)

A point in the camera coordinate system is P^(i)_C, and the corresponding point in the world coordinate system is P^(i)_W, as shown in Figure 5. Four special points are selected in the world coordinate system:

P^(0)_W = (0, 0, 0), P^(1)_W = (1, 0, 0), P^(2)_W = (0, 1, 0), P^(3)_W = (0, 0, 1),  (3)

and the corresponding points in the camera coordinate system satisfy, in homogeneous coordinates,

[P^(0)_C  P^(1)_C  P^(2)_C  P^(3)_C] = T^C_W [P^(0)_W  P^(1)_W  P^(2)_W  P^(3)_W].  (4)

The ground plane in the camera coordinate system is ax + by + cz + d = 0. Its normal vector perpendicular to the ground is n_C = (a, b, c)^T with ||n_C||_2 = 1. As the origin of the world coordinate system is the projection of the origin of the camera coordinate system on the plane, the coordinates corresponding to the origin P^(0)_W of the world coordinate system in the camera coordinate system are

P^(0)_C = -d (a, b, c)^T.  (5)

Since the point P^(3)_W = (0, 0, 1) is on the Z_W axis of the world coordinate system and O_C, P^(0)_C, and P^(3)_C are collinear, with Z_W pointing away from the camera,

P^(3)_C = P^(0)_C + P^(0)_C / ||P^(0)_C||_2.  (6)

Because the Y_W axis is the projection of the Z_C axis on the plane,

P^(2)_C = P^(0)_C + n_y,  (7)

where n_y is the unit vector from P^(0)_C to P_{Z_C}, and P_{Z_C} is the projection onto the plane of the point (0, 0, 1) in the camera coordinate system. Solving the projection equations gives the projection point as

P_{Z_C} = p - ((p - p_0) · n_C) n_C,  p = (0, 0, 1),  (9)

where a, b, c, and d are the parameters of the plane equation and p_0 = (x_0, y_0, z_0) is any point on the plane. After P^(0)_C, P^(2)_C, and P^(3)_C are obtained, P^(1)_C can be obtained by a vector cross product that keeps the coordinate system right-handed:

P^(1)_C = P^(0)_C + n_y × n_z,  n_z = P^(3)_C - P^(0)_C.  (10)

By solving (5)-(7) and (10), the coordinates in the camera coordinate system of the four points selected from the world coordinate system are obtained. Then, the transformation matrix from world coordinate points to camera coordinate points is obtained by solving (4).
Finally, the extrinsic parameter matrix of the infrared camera is obtained by solving (2).
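Putting this construction together, the following sketch (NumPy; the function name and the example plane are ours for illustration, and it assumes Z_C is not parallel to the plane normal) recovers the extrinsic parameter from a detected ground plane via the four corresponding points of (3)-(10):

```python
import numpy as np

def extrinsic_from_ground_plane(a, b, c, d):
    """Recover the camera->world extrinsic T^W_C from a ground plane
    a*x + b*y + c*z + d = 0 detected in camera coordinates."""
    n = np.array([a, b, c], float)
    nrm = np.linalg.norm(n)
    n, d = n / nrm, d / nrm
    p0 = -d * n                           # (5): world origin on the plane
    nz = p0 / np.linalg.norm(p0)          # (6): Z_W through O_C and p0
    p3 = p0 + nz
    zc = np.array([0.0, 0.0, 1.0])        # point on the camera Z_C axis
    proj = zc - np.dot(zc - p0, nz) * nz  # (9): its projection onto the plane
    ny = (proj - p0) / np.linalg.norm(proj - p0)
    p2 = p0 + ny                          # (7): Y_W direction
    p1 = p0 + np.cross(ny, nz)            # (10): X_W keeps the frame right-handed
    T_wc = np.eye(4)                      # (4): world basis -> camera points
    T_wc[:3, 0] = p1 - p0
    T_wc[:3, 1] = p2 - p0
    T_wc[:3, 2] = p3 - p0
    T_wc[:3, 3] = p0
    return np.linalg.inv(T_wc)            # (2): invert to get T^W_C

# Illustrative plane (ours, not measured): ground 1.5 m below a camera
# whose Y_C axis points straight down.
T = extrinsic_from_ground_plane(0.0, 1.0, 0.0, -1.5)
```

With this plane, the world origin lies at camera coordinates (0, 1.5, 0), and the camera sits at world Z = −1.5, consistent with Z_W pointing downward.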

Ground Plane Estimation
To facilitate plane detection, the depth map is first transformed into a 3D point cloud in the camera coordinate system using the internal parameters. Then, the maximum likelihood estimation sample consensus (MLESAC) method [21] is used to extract a plane. The planes are traversed iteratively to obtain candidate extrinsic parameters. Next, the extrinsic parameters and the point cloud are used to determine whether the plane is the ground plane, and we finally obtain the extrinsic parameters of the camera, as shown in Figure 6.
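As an illustration of the plane-extraction step, here is a simplified MLESAC-style sketch (our own simplification: a fixed inlier mixing weight gamma rather than the EM re-estimation of Torr and Zisserman's original formulation; all parameter values are illustrative):

```python
import numpy as np

def fit_plane_3pts(pts):
    """Plane (a, b, c, d) through three points, with unit normal."""
    n = np.cross(pts[1] - pts[0], pts[2] - pts[0])
    norm = np.linalg.norm(n)
    if norm < 1e-12:
        return None                        # degenerate (collinear) sample
    n /= norm
    return np.append(n, -np.dot(n, pts[0]))

def mlesac_plane(cloud, iters=200, sigma=0.01, nu=0.5, gamma=0.5, seed=0):
    """Sample minimal sets and keep the hypothesis minimizing the negative
    log-likelihood of a Gaussian-inlier / uniform-outlier mixture."""
    rng = np.random.default_rng(seed)
    best, best_cost = None, np.inf
    for _ in range(iters):
        plane = fit_plane_3pts(cloud[rng.choice(len(cloud), 3, replace=False)])
        if plane is None:
            continue
        r = cloud @ plane[:3] + plane[3]   # signed point-to-plane residuals
        lik = (gamma * np.exp(-r**2 / (2 * sigma**2))
               / (np.sqrt(2 * np.pi) * sigma) + (1 - gamma) / nu)
        cost = -np.log(lik).sum()
        if cost < best_cost:
            best, best_cost = plane, cost
    inliers = np.abs(cloud @ best[:3] + best[3]) < 2 * sigma
    return best, inliers
```

Unlike plain RANSAC, which only counts inliers, the MLESAC cost rewards hypotheses whose residuals are small even among the inliers, which tends to pick better-aligned planes.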

Transformation of the Depth Map to a 3D Point Cloud in the Camera Coordinate System.

Figure 7 shows the imaging model of the camera. The coordinate axes of the image plane are X_I and Y_I. The imaging point on the image plane of a point P_C = (X_C, Y_C, Z_C) in the camera coordinate system is p_I = (u_I, v_I). The focal length of the camera is f. The coordinates of the intersection of the optical axis and the image plane are c_x and c_y. The length and width of a pixel are d_x and d_y, respectively. The value of the pixel is d_p, and s is the scale factor relating pixel values to Z coordinates in the camera coordinate system. Then, (u_I, v_I, d_p) can be transformed to (X_C, Y_C, Z_C) as

Z_C = s d_p,  X_C = (u_I - c_x) d_x Z_C / f,  Y_C = (v_I - c_y) d_y Z_C / f.  (11)
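The back-projection of (11) vectorizes directly over the whole depth image. A minimal sketch (NumPy; the intrinsic values below are illustrative, not calibrated ones, with d_x = d_y = 1 so that f is expressed in pixels):

```python
import numpy as np

def depth_to_cloud(depth, f, dx, dy, cx, cy, s):
    """Back-project a depth image into the camera frame, following (11):
    Z = s*d_p, X = (u - cx)*dx*Z/f, Y = (v - cy)*dy*Z/f."""
    v, u = np.indices(depth.shape)        # pixel rows (v) and columns (u)
    Z = s * depth.astype(float)
    X = (u - cx) * dx * Z / f
    Y = (v - cy) * dy * Z / f
    cloud = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    return cloud[cloud[:, 2] > 0]         # drop invalid (zero-depth) pixels

# Illustrative call: a flat 240x320 depth image of raw value 1000,
# scale s = 0.001 m per unit, i.e., everything 1 m away.
cloud = depth_to_cloud(np.full((240, 320), 1000, dtype=np.uint16),
                       f=575.0, dx=1.0, dy=1.0, cx=160.0, cy=120.0, s=0.001)
```

Zero depth values mark pixels where the sensor returned no measurement, so they are filtered out before plane fitting.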

Ground Plane Estimation.
Ground plane estimation is the basis of the subsequent extrinsic parameter calculation of the camera; its accuracy determines the accuracy of the extrinsic camera parameters. In a scene, multiple planes will be fitted. Whether a fitted plane is the ground plane is then determined according to the following conditions:

|θ| ≤ 45°,  median(P^{Z_W}_I) ≥ max(P^{Z_W}_O) - ε,  (12)

where θ is the angle between the X_C axis of the camera and the plane; median is the operation of taking the median value; P^{Z_W}_I is the set of Z-values, in world coordinates, of the interior points of the fitted plane model; P^{Z_W}_O is the set of Z-values of the exterior points of the fitted plane model; and ε is a set tolerance value. Equation (12) represents two conditions: (1) the inclination angle of the camera relative to the plane does not exceed 45°, and (2) after the extrinsic parameters are calculated from the plane and the point cloud is transformed to the world coordinate system, the point set on the plane has the largest Z-values.
According to this definition, the planes were screened to calculate the qualified ground plane. e flowchart is shown in Figure 6.
First, the 3D point cloud in the camera coordinate system was input, and a plane was fitted using the MLESAC method. The set of interior points satisfying the plane model was recorded. Using this plane and the method in Section 2.2, the extrinsic parameters of the camera were calculated. Combined with the internal parameters of the camera, the 3D point cloud in the camera coordinate system was transformed to the world coordinate system, and it was judged whether condition (12) was met. If not, the recorded set of interior points was removed from the point cloud, and plane fitting continued. Otherwise, the plane was taken as the ground plane, no further fitting was conducted, and the extrinsic parameters of the camera were output.
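The acceptance test inside this loop can be sketched as follows (the helper name and tolerance are ours; `cloud_w` is the point cloud already transformed to world coordinates, `inliers` a boolean mask from the plane fit, and `theta_deg` the camera-to-plane inclination):

```python
import numpy as np

def is_ground_plane(cloud_w, inliers, theta_deg, eps=0.05):
    """Screening conditions of (12): inclination within 45 degrees, and the
    plane's interior points have the largest Z_W values (Z_W points down,
    so the ground sits at the maximum Z_W in the world frame)."""
    if abs(theta_deg) > 45.0:
        return False
    z_in = np.median(cloud_w[inliers, 2])     # interior-point Z_W median
    z_out = cloud_w[~inliers, 2]              # exterior-point Z_W values
    return z_out.size == 0 or z_in >= z_out.max() - eps
```

A wall or tabletop candidate fails the second test because points below it (toward the true ground) have larger Z_W values than its own interior points.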

Experiment Process.
In this experiment, a PrimeSense camera was used to collect video data, and MATLAB was used for simulation to validate the proposed algorithm. To facilitate the accuracy comparison, the chessboard calibration results of the color camera were used as the reference data. Because matrices are not convenient to compare directly, the extrinsic parameters were transformed into camera height (H), roll angle (θ), and pitch angle (ϕ), and the calibration accuracy was measured by comparing these three quantities. Following the world coordinate system established in Section 2, the values can be obtained as

H = ||P^(0)_C||_2,  θ = 90° - arccos(P^(0)_c · P^(2)_w),  ϕ = 90° - arccos(P^(2)_c · P^(2)_w),  (13)

where P^(2)_c is the unit vector of the camera Z_C axis expressed in world coordinates, P^(0)_c is the unit vector of the camera X_C axis in world coordinates, and P^(2)_w is the world Z_W axis.
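Under the conventions above, H, θ, and ϕ can be read directly off the extrinsic matrix. This is a sketch; the sign conventions are our assumptions consistent with Section 2 (Z_W perpendicular to the ground, angles measured between the camera axes and the ground plane), not the paper's exact formulas:

```python
import numpy as np

def attitude_from_extrinsic(T):
    """Height H, roll theta, and pitch phi from the camera->world
    extrinsic T. The rotation columns are the camera axes expressed
    in the world frame; their Z_W components give the angles to the
    ground plane, and |t_z| is the distance to the Z_W = 0 ground."""
    R, t = T[:3, :3], T[:3, 3]
    H = abs(t[2])
    x_c, z_c = R[:, 0], R[:, 2]            # camera X_C and Z_C in world frame
    theta = np.degrees(np.arcsin(np.clip(x_c[2], -1.0, 1.0)))  # roll
    phi = np.degrees(np.arcsin(np.clip(z_c[2], -1.0, 1.0)))    # pitch
    return H, theta, phi

# Example: a level camera 1.5 m above the ground (Z_W points down,
# so the camera sits at world Z = -1.5).
T = np.eye(4)
T[:3, :3] = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
T[:3, 3] = [0.0, 0.0, -1.5]
H, theta, phi = attitude_from_extrinsic(T)
```

For a level camera both angles vanish, and any rotation of the camera about the vertical leaves all three quantities unchanged, which is why (H, θ, ϕ) is a convenient comparison triple.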
The experiment was carried out according to the following steps.
(1) A PrimeSense camera was used to collect video data (each frame of the video contained a clear chessboard), including color video and depth video.
(2) N (N > 20) frames were selected randomly from the color video as the input to Zhang's camera calibration method [19], and the internal camera parameters were estimated.
(3) Each frame of the color video was traversed. The camera's extrinsic parameters were estimated using the internal parameters and the chessboard corners detected in the current frame, and the camera attitude (H_chessboard, ϕ_chessboard, and θ_chessboard) of each color frame was obtained.
(4) Each frame of the depth video was traversed. The ground plane was detected using the proposed method, and the extrinsic parameters of the camera were estimated to obtain the camera attitude (H_plane, ϕ_plane, and θ_plane) of each depth frame. The camera height difference (ΔH), roll angle difference (Δθ), and pitch angle difference (Δϕ) of each frame were then calculated as

ΔH = H_plane - H_chessboard,  Δθ = θ_plane - θ_chessboard,  Δϕ = ϕ_plane - ϕ_chessboard.  (14)

The video formats collected in Step 1 are shown in Table 1.
Each frame of the video contained a chessboard, as shown in Figure 8. In the experiment, the chessboard was used to estimate the camera's extrinsic parameters, which were taken as the reference data. Because different selections of chessboard frames yield different internal parameters when calibrating with Zhang's method [19], Steps 1 to 3 were executed 86 times. Each video frame could thus yield 86 estimates of the extrinsic parameters, and their medians were taken as the reference data.
The size of the chessboard pattern was 7 × 9, and both the length and width of each square were 40 mm. The camera's extrinsic parameters, as obtained through chessboard calibration, are shown in Figure 8.
In Figure 9, θ_chessboard, ϕ_chessboard, and H_chessboard are the medians of the camera roll angle, pitch angle, and height, respectively, as measured using the chessboard method, and θ_plane, ϕ_plane, and H_plane are the corresponding medians measured using the proposed method. Figure 10 shows the experimental errors, and Table 2 shows the final measured results.

Error Analysis.
Because there was no high-precision instrument to measure the extrinsic parameters of the camera in the experiment, these were measured using the chessboard calibration method and taken as the reference data for evaluating the accuracy of the proposed method. The factors affecting the accuracy of this experiment are as follows.
(1) First is the quantization error of corner detection. Because the video frames were 240 × 320 pixels, the quantization error of corner detection was quite large when calculating the camera's internal and extrinsic parameters. (2) The chessboard method and the proposed method measured the extrinsic parameters of different sensors: the RGB-D camera's color camera and its infrared camera, respectively.
(3) The noise in the RGB-D camera's scan of the scene would affect the detection of the ground plane, thus affecting the accuracy of the camera's extrinsic parameters. (4) The parameters used to stop the MLESAC iteration in ground plane detection would also affect the results of this experiment.

Influence of Scene Noise on Ground Plane Detection.
RGB-D cameras have many noise sources, such as temperature, the incident angle and intensity of ambient light, and texture [22]. The MLESAC method used in this study can deal with small amounts of noise, but not with scenes with too much noise or data loss.
(1) Strong sunlight would cause too many noise points on the ground plane, resulting in inaccurate estimation of the plane parameters and a slight influence on the camera height and roll angle. (2) If the reflectivity of the ground was too high (for example, a mirror placed on the ground), the data in that area would be lost. If too much data were missing, the ground plane could not be detected, and the extrinsic parameters of the camera could not be estimated.
To verify the influence of noise on camera attitude, Figure 11 shows how the camera attitude error changes as the Gaussian noise variance increases for the 929th frame. The mean value of the Gaussian noise was 0, and its variance was δ_g. It can be seen that as the noise variance increases, the height error increases, and the stability of the pitch angle and roll angle decreases. In addition, when the variance is greater than 0.25, the plane cannot be correctly estimated.
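This kind of noise experiment can be imitated on synthetic data. The following is an illustrative sketch (our own synthetic scene, not the 929th-frame data; for brevity a total-least-squares fit stands in for MLESAC):

```python
import numpy as np

def plane_svd(pts):
    """Total least-squares plane through points: the direction of least
    variance (last right-singular vector) is the unit normal n, and the
    plane is n.x + d = 0 through the centroid."""
    c = pts.mean(axis=0)
    n = np.linalg.svd(pts - c)[2][-1]
    return n, -np.dot(n, c)

rng = np.random.default_rng(0)
# Synthetic ground plane 1.5 m below the camera (Y_C points down).
base = np.column_stack([rng.uniform(-2, 2, 2000),
                        np.full(2000, 1.5),
                        rng.uniform(0.5, 4, 2000)])
heights = []
for var in [0.0, 0.05, 0.25]:
    noisy = base + rng.normal(0.0, np.sqrt(var), base.shape)
    n, d = plane_svd(noisy)
    heights.append(abs(d))   # origin-to-plane distance = camera height
```

As the variance grows, the fit and the recovered height degrade, mirroring the trend reported in Figure 11.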

Conclusion
During pedestrian detection based on RGB-D cameras, environmental vibration can shift an RGB-D camera from its original position, changing the extrinsic parameters greatly and directly affecting the detection accuracy. With the automatic extrinsic parameter calibration method proposed in this study, the extrinsic parameters can be corrected automatically when no pedestrian is present, solving this problem. The proposed method can be applied to the automatic adjustment of the extrinsic parameters of 3D cameras (speckle, TOF, and binocular cameras) and is highly parallelizable. After parallel implementation, it runs in real time and can be used for the automatic calibration of the extrinsic parameters of 3D cameras on mobile robots.
In this method, we extracted planes from the 3D point cloud and used the positional relationship between each plane and the world coordinate system to obtain the camera's extrinsic parameters, which were in turn used to determine whether the current plane was the ground plane. If not, the next plane was used to calculate the extrinsic parameters, until the ground plane was found and the extrinsic parameters of the infrared camera were obtained. In this study, conditions characterizing the ground plane were given, which guarantee the correctness of the established world coordinate system and combine plane detection with extrinsic parameter estimation, achieving automatic extrinsic parameter calibration. The method does not require an additional calibration object, it calibrates the infrared camera directly, and its results are reliable. The current method has the limitation that there must be a ground plane in the scene; if the ground plane cannot be detected, the calibration cannot be carried out. Two problems still must be solved: (1) improving the accuracy of calibration by using targets such as pedestrians in scenes for fine calibration and (2) calibrating the extrinsic parameters of multiple cameras automatically according to the common area viewed by the cameras. If no common area is shared by two cameras, a simple calibration object should be designed so that automatic calibration can be carried out according to the geometric size of the calibration object.

Data Availability
The MATLAB source code and the video data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.