The ability to reliably measure the depth of an object surface is important in a range of high-value industries. With the development of 3D vision techniques, RGB-D cameras have been widely used to perform 6D pose estimation of target objects for robotic manipulators. Many applications require accurate shape measurements of the objects for 3D template matching. In this work, we develop an RGB-D camera based on the structured light technique with gray-code coding. The intrinsic and extrinsic parameters of the camera system are determined by a calibration process, and 3D reconstruction of the object surface is based on the ray triangulation principle. We construct an RGB-D sensing system with an industrial camera and a digital light projector. In the experiments, real-world objects are used to test the feasibility of the proposed technique, and an evaluation carried out with planar objects demonstrates the accuracy of our RGB-D depth measurement system.
1. Introduction
In recent years, 3D imaging has attracted great interest in industrial and consumer applications. Machine vision systems developed with 3D imaging allow faster and more accurate measurement of components at manufacturing sites. Nowadays, RGB-D cameras, such as Microsoft Kinect and Asus Xtion, are very popular due to their ability to provide depth information directly. However, they are limited in accuracy and thus are not suitable for applications that require accurate shape measurements [1–3]. As a result, the development of real-time RGB-D cameras still receives much attention from researchers and practitioners. The objective is to provide highly accurate RGB-D sensing techniques with more effective implementation approaches in terms of the density of acquired point clouds, time consumption, working environment, noise level, etc.
3D reconstruction based on the structured light technique has been investigated for the past few decades due to its popularity in manufacturing applications. Structured light systems are suitable for 3D scanning, reconstruction, and sensing with accurate shape measurements [4, 5]. Structured light refers to the process of projecting predesigned, known patterns on the scene and capturing the images to calculate depth for 3D surface reconstruction. It has been an important contribution to the development of 3D measurement systems. The patterns projected on the scene can be generated by a projector or other devices [6], and the relationship between the light source and the camera is a crucial factor. The accuracy of 3D reconstruction depends on the correctness of the calibration, which provides the relative pose between the camera and the light source projector.
In recent literature, several works have presented structured light systems for 3D reconstruction and proposed different approaches to deal with the related problems [7–10]. Scharstein et al. [11] proposed a method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light. Some previous works such as [12–15] described various methods to perform 3D reconstruction and obtained satisfactory results. However, those techniques require precalibrated cameras to find the 3D world coordinates of the projected pattern; they therefore depend heavily on the accuracy of the camera calibration and may transfer its error to the projector calibration. In [16], Huang and Tang described a method to perform fast 3D reconstruction using one-shot spatial structured light. Although the method can provide relatively accurate results, the evaluation and analysis were not carried out comprehensively, and their experiments reveal restrictions when testing on complex object surfaces. Cui and Dai [17] proposed a simple and efficient 3D reconstruction algorithm using structured light. However, their approach has limitations when measuring inclined objects, and the 3D information cannot be recovered for shadowed areas.
In this work, we develop an RGB-D camera system based on the structured light technique. A system flowchart is shown in Figure 1. The encoding method is based on gray-code coding [5], and the 3D reconstruction is achieved by the ray triangulation principle with the estimation of intersection points. The accuracy and density of the obtained point clouds are both high, making the system suitable for applications such as accurate shape measurement, 3D object recognition, and pose estimation for robotic manipulation.
The overview of our RGB-D camera system with the structured light technique. The encoding stage produces a gray-code pattern sequence; the acquisition stage captures a camera image for each pattern in the sequence; the decoding stage produces a coded map.
This article is organized as follows: Section 2 presents a general overview of the structured light system and an accurate calibration method to derive the parameters of the camera-projector system. Section 3 describes how the patterns are encoded and the captured images decoded, and presents the ray triangulation principle for 3D computation by point intersection. Section 4 provides the experimental results, including the experimental setup, results with several different objects, and the evaluation of the accuracy of the object reconstruction. Finally, Section 5 gives the conclusion.
2. Background

2.1. Structured Light Technique
Currently, the development of structured light systems is in high demand. The structured light technique is based on the principle described in Figure 2. In general, the process of a structured light system can be divided into three basic steps:
Encoding. The information is encoded into a sequence of patterns in the temporal domain. The length of the pattern sequence depends on the parameters of the system and the resolutions of the projector and the camera.
Acquisition. The sequence of patterns is projected on the scene by a data projector, and a camera is used to continuously capture the images.
Decoding. The captured pattern-coded images are processed with the recognition of projected patterns to find the corresponding points associated with the projector and the camera.
The overview of three basic steps in the process of a structured light system [1].
In the implementation, there might be additional steps depending on the system design. The overall procedure typically produces range images, point clouds, or mesh models, and may integrate several decoded coordinate maps with calibration and the triangulation principle. Calibration determines the intrinsic and extrinsic parameters of the camera and the projector, and reconstruction is usually based on the ray triangulation principle, computing the intersection point of corresponding rays.
2.2. Calibration
Calibration is an important step that greatly affects the accuracy of the results [18]. In the proposed technique, we first find the parameters of the system using the calibration method of Moreno and Taubin [6], a simple and accurate method for calibrating projector-camera systems. In this method, the projected corner locations are estimated with subpixel precision by fitting a local homography around each corner in the images, as illustrated in Figure 3. It includes three main steps as follows:
The camera calibration step determines the intrinsic parameters of the camera. It involves collecting a sequence of images of a planar checkerboard pattern; the intrinsic parameters are then estimated using the perspective camera model [19]. We find the coordinates in the camera image plane of all checkerboard corners captured under different pattern orientations, using OpenCV's findChessboardCorners() function [20] to locate the corners automatically, and refine them to subpixel accuracy. Finally, OpenCV's calibrateCamera() function is used to derive the calibrated camera parameters.
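As background, the perspective camera model [19] that the calibration estimates can be sketched as follows. This is a minimal illustration, not our calibrated parameters: the intrinsic matrix K and the pose (R, t) below are assumed example values.

```python
import numpy as np

def project_points(pts_w, K, R, t):
    """Project Nx3 world points into the image plane of a pinhole camera.

    K    : 3x3 intrinsic matrix (focal lengths and principal point)
    R, t : extrinsic rotation (3x3) and translation (3,), world -> camera
    Returns Nx2 pixel coordinates.
    """
    pts_c = pts_w @ R.T + t            # world -> camera frame
    uvw = pts_c @ K.T                  # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division

# Example intrinsics: 800-pixel focal length, principal point (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

pts = np.array([[0.0, 0.0, 2.0]])      # a point on the optical axis
print(project_points(pts, K, R, t))    # projects to the principal point
```

Calibration inverts this relation: given known checkerboard corner positions and their observed pixel coordinates, it solves for K, R, and t.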
The projector calibration step determines the intrinsic parameters of the projector. The mathematical model of the projector is the same as that of the camera, but the projector cannot capture images from its own viewpoint to find the checkerboard corners. Instead, we know the relation between projector pixels and image pixels extracted from the structured light sequences, so we can estimate the checkerboard corner locations in projector pixel coordinates using a local homography [6], as illustrated in Figure 3.
The stereo system calibration step derives the extrinsic parameters of the system, which consist of a rotation matrix and a translation vector. We use OpenCV's stereoCalibrate() function with the previously found checkerboard corner coordinates and their projections. The stereo parameters are the rotation matrix R and the translation vector T relating the projector-camera pair.
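To illustrate what the stereo parameters mean, the sketch below maps a 3D point between the camera and projector coordinate frames via x_p = R x_c + T. The rotation and translation used here are hypothetical example values, not our calibrated R and T.

```python
import numpy as np

# Hypothetical extrinsics: projector rotated 10 degrees about the Y axis
# and offset 150 mm along X relative to the camera.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([150.0, 0.0, 0.0])

def cam_to_proj(x_c):
    """Map a 3D point from camera coordinates to projector coordinates."""
    return R @ x_c + T

def proj_to_cam(x_p):
    """Inverse mapping: projector coordinates back to camera coordinates."""
    return R.T @ (x_p - T)

x_c = np.array([10.0, -20.0, 500.0])
print(np.allclose(proj_to_cam(cam_to_proj(x_c)), x_c))  # round trip: True
```

The same R and T later place the two centers of projection for ray triangulation.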
An illustration of the structured light system calibration. The captured image is one of a set of calibration images, in which the projector projects the patterns onto the checkerboard and the camera captures the result. The projected corner locations are estimated with subpixel precision using local homographies around each corner in the captured images [1].
3. RGB-D Sensing Based on Structured Light

3.1. Encoding and Decoding Patterns
The gray-code pattern [4] is a sequence of images with black and white stripes created for encoding the scene from the camera viewpoint. The pattern sequence has two types, horizontal stripes and vertical stripes, as illustrated in Figure 4. All patterns are projected onto a scene or an object as shown in Figure 5. Each type consists of 10 pattern images, which together represent a 10-bit value for each pixel. The first pattern is half black and half white and represents the most significant bit; the remaining patterns switch between black and white at progressively finer stripe widths. After combining all 10 patterns of one type into a single image, each stripe position carries a unique 10-bit code.
The gray-code patterns for the RGB-D camera used by the structured light technique [1].
The acquisition of the projected patterns on an object. The gray-code pattern is projected by the projector, and the scene is captured by the camera [1].
Structured light encoding depends on the resolution of the projector. The information is encoded into a sequence of patterns performed in the temporal domain. Commonly used approaches include gray-code coding and binary-code coding. Gray codes can be calculated by first computing the binary representation of a number and then converting it using the following process: copy the most significant bit as it is, and replace the remaining bits (taking one bit at a time) with the result of an XOR operation of the current bit, with the previous bit of higher significance in the binary form.
For binary-code coding, only two illumination levels are used, encoded as 0 and 1. Gray-code coding is an alternative to the binary representation in which only one bit changes between any two adjacent numbers. If any single bit is misread, the decoded value will never be off by more than one unit. In our system, we use a projector with a resolution of 1024×768 and decode the patterns with 10 bits (2¹⁰ = 1024), where the number of vertical patterns is log₂(1024) = 10 and the number of horizontal patterns is ⌈log₂(768)⌉ = 10.
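The conversion process described above reduces to a single XOR of a number with itself shifted right by one bit. A minimal sketch, including the pattern-count calculation for our 1024×768 projector:

```python
from math import ceil, log2

def binary_to_gray(n: int) -> int:
    """Convert binary to gray code: keep the MSB, then XOR each remaining
    bit with the next more significant bit of the binary form."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Invert the gray coding by cascading XORs from the MSB downward."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Number of stripe patterns needed for a 1024x768 projector
num_vertical = ceil(log2(1024))    # 10 patterns encode the 1024 columns
num_horizontal = ceil(log2(768))   # 10 patterns encode the 768 rows

# Adjacent code words differ in exactly one bit, so a misread stripe
# boundary shifts the decoded value by at most one unit.
codes = [binary_to_gray(i) for i in range(1024)]
print(all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:])))  # True
```

Each bit of the gray code for a column index determines whether that column is black or white in the corresponding pattern image.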
The camera captures the images of the projected patterns, and the decoding step converts each pixel in the captured images into the decimal number representing its projector column and row. This is used to create a coded map, as shown in Figure 1, which gives the corresponding points between the projector and the camera.
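The decoding step can be sketched as follows: the captured images are thresholded into bit planes, packed into a gray-code value per pixel, and converted back to a binary stripe index. This is a minimal illustration with a tiny synthetic input, not our full decoding pipeline.

```python
import numpy as np

def decode_coded_map(bit_images):
    """Decode a stack of thresholded pattern images into a coded map.

    bit_images : (B, H, W) array of 0/1 values; bit_images[0] holds the
    most significant gray-code bit for every pixel.
    Returns an (H, W) integer map of decoded stripe indices.
    """
    bits = np.asarray(bit_images, dtype=np.uint32)
    gray = np.zeros(bits.shape[1:], dtype=np.uint32)
    for plane in bits:                 # pack bit planes, MSB first
        gray = (gray << 1) | plane
    binary = gray.copy()               # gray -> binary by cascading XOR
    shift = gray >> 1
    while shift.any():
        binary ^= shift
        shift >>= 1
    return binary

# Tiny synthetic example: a 1x4 image whose pixels saw gray codes 00,01,11,10
bits = np.array([[[0, 0, 1, 1]],       # MSB plane
                 [[0, 1, 1, 0]]])      # LSB plane
print(decode_coded_map(bits))          # -> [[0 1 2 3]]
```

A real system would obtain the bit planes by comparing each captured image against its inverse pattern or against a per-pixel intensity threshold.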
3.2. 3D Reconstruction
With a robust projector-camera calibration step, we obtain the location and orientation of the camera and projector with respect to the world coordinate frame. In the pattern encoding and decoding step, we determine a pixel in image Ic and its corresponding pixel in image Ip. Our reconstruction is based on the ray triangulation principle with the estimation of intersection points [4]. To compute the direction vector of a ray, two points are needed: the first is the center of projection, determined from the extrinsic parameters of the structured light system, and the second is the point corresponding to the pixel through which the ray passes. One ray passes through Oc and pc of the camera image (Ic), and the other passes through Op and pp of the projector image (Ip), as shown in Figure 6. Here, O denotes a center of projection and p a pixel in the corresponding image. The 3D point cloud is obtained from the intersection point P, the midpoint of the shortest segment between the rays.
The intersection point of two rays from the pixels in the camera and projector coordinates [1].
To estimate the intersection point, we consider two rays M and N in 3D space passing through points $p_c$ and $p_p$ with direction vectors $\vec{x}$ and $\vec{y}$, respectively. Let the two closest points on the lines be $m$ and $n$, as defined in (1) and (2), where $g$ and $k$ are scalar values:

$$m = p_c + g\,\vec{x} \quad (1)$$

$$n = p_p + k\,\vec{y} \quad (2)$$
The segment $mn$ connecting rays M and N is perpendicular to both rays, and therefore the dot products vanish:

$$(m - n)\cdot\vec{x} = 0 \quad (3)$$

$$(m - n)\cdot\vec{y} = 0 \quad (4)$$

Substituting (1) and (2) into (3) and (4) gives

$$\vec{r}\cdot\vec{x} + g\,(\vec{x}\cdot\vec{x}) - k\,(\vec{y}\cdot\vec{x}) = 0 \quad (5)$$

$$\vec{r}\cdot\vec{y} + g\,(\vec{y}\cdot\vec{x}) - k\,(\vec{y}\cdot\vec{y}) = 0 \quad (6)$$

where

$$\vec{r} = p_c - p_p \quad (7)$$

From (5) and (6), the scalar values are calculated by

$$g = \frac{(\vec{r}\cdot\vec{x})(\vec{y}\cdot\vec{y}) - (\vec{y}\cdot\vec{x})(\vec{r}\cdot\vec{y})}{(\vec{y}\cdot\vec{x})(\vec{y}\cdot\vec{x}) - (\vec{y}\cdot\vec{y})(\vec{x}\cdot\vec{x})} \quad (8)$$

$$k = \frac{(\vec{y}\cdot\vec{x})(\vec{r}\cdot\vec{x}) - (\vec{x}\cdot\vec{x})(\vec{r}\cdot\vec{y})}{(\vec{y}\cdot\vec{x})(\vec{y}\cdot\vec{x}) - (\vec{y}\cdot\vec{y})(\vec{x}\cdot\vec{x})} \quad (9)$$

The midpoint of the shortest segment is then estimated by

$$P = \frac{(p_c + g\,\vec{x}) + (p_p + k\,\vec{y})}{2} \quad (10)$$
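Equations (7)–(10) translate directly into code. The following is a minimal sketch of the midpoint triangulation, verified on a pair of rays that intersect exactly:

```python
import numpy as np

def triangulate_midpoint(pc, x, pp, y):
    """Midpoint of the shortest segment between ray pc + g*x and
    ray pp + k*y, following equations (7)-(10)."""
    pc, x, pp, y = (np.asarray(v, dtype=float) for v in (pc, x, pp, y))
    r = pc - pp                                                    # eq. (7)
    denom = np.dot(y, x) ** 2 - np.dot(y, y) * np.dot(x, x)
    g = (np.dot(r, x) * np.dot(y, y)
         - np.dot(y, x) * np.dot(r, y)) / denom                    # eq. (8)
    k = (np.dot(y, x) * np.dot(r, x)
         - np.dot(x, x) * np.dot(r, y)) / denom                    # eq. (9)
    return 0.5 * ((pc + g * x) + (pp + k * y))                     # eq. (10)

# Two rays that intersect exactly at (1, 1, 0)
P = triangulate_midpoint([0.0, 0.0, 0.0], [1.0, 1.0, 0.0],
                         [2.0, 0.0, 0.0], [-1.0, 1.0, 0.0])
print(P)  # -> [1. 1. 0.]
```

For skew rays that never meet (the usual case with noisy correspondences), the same formula returns the midpoint of the shortest connecting segment.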
4. Experiments
In a structured light system, the quality of the captured images is important for obtaining good pattern data for calibration, decoding, and reconstruction. Hence, the resolution of the camera is usually higher than the resolution of the projector, and the projection field of view is adjusted to lie inside the field of view of the camera. In the experiments, we use a Flea3 FL3-U3-32S2C camera from Point Grey Research with an image resolution of 2080×1552. The digital light projector is a DLP LightCrafter 4500 from Texas Instruments with a resolution of 1024×768. Their focal length, resolution, zoom, and direction were selected prior to calibration according to the target of the system. All devices are connected to a host computer. After the system is calibrated, no part of the system can be moved; the distance and orientation between the projector and the camera must be kept intact, otherwise a recalibration is required.
The settings of the camera and projector should be adapted to the lighting in the scene. Other light sources shining directly on the scene should be eliminated; otherwise, the calibration and reconstruction results will be affected. The system was calibrated with 12 sets of acquired projected checkerboard patterns, where an acquisition set comprises the images captured by the camera for each pattern in the sequence. After the system is calibrated as described in Section 2.2, the calibration result is stored in a .yml file.
For reconstruction, our system performs three main steps. First, it loads the calibration parameters and projects one acquisition set of patterns onto the objects. Second, decoding the captured pattern-coded images produces a coded map that stores the corresponding points between the projector and the camera. Finally, with the calibrated parameters and the coded map, we apply the ray triangulation principle to obtain the 3D points, which are combined with one color image to create an XYZRGB point cloud. In Figures 7 and 8, we present the 3D reconstruction results of several objects; the system successfully measures objects that reflect light for some of the projected colors. After the reconstruction, the 3D information of the reconstructed objects is saved in a .txt file.
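The final step above, assembling an XYZRGB point cloud and writing it out as text, can be sketched as follows. The coordinates and colors below are made-up illustrative values, and the text layout (one XYZRGB row per line) is an assumption about the output format:

```python
import io
import numpy as np

def make_xyzrgb(points, colors):
    """Stack Nx3 triangulated points with their Nx3 RGB samples into an
    Nx6 XYZRGB point cloud, one row per decoded pixel."""
    return np.hstack([np.asarray(points, dtype=float),
                      np.asarray(colors, dtype=float)])

# Two reconstructed points colored from the registered RGB image
cloud = make_xyzrgb([[10.0, 5.0, 500.0], [12.0, 5.0, 498.0]],
                    [[255, 0, 0], [0, 255, 0]])

# Serialize one XYZRGB row per line, as for a .txt point cloud file
buf = io.StringIO()
np.savetxt(buf, cloud, fmt="%.3f")
print(buf.getvalue().splitlines()[0])  # 10.000 5.000 500.000 255.000 0.000 0.000
```

In the real pipeline, the points come from the ray triangulation and the colors are sampled from the camera image at each coded-map pixel.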
3D reconstruction results with several objects (a Lego box, three plastic pipes, and a big box); these objects have low reflectivity.
3D image of the front view
3D image of the upper view
3D reconstruction results with multiple objects (a cup, a bottle, a small box, a Lego box, and some plastic pipes); the bottle and cup are highly reflective.
3D image of the front view
3D image of the upper view
The evaluation of the proposed technique is performed by measuring the dimensions of a reconstructed checkerboard pattern and its corner points. The checkerboard has dimensions of 399×285 mm², and each small square has a size of 28.5×28.5 mm², as shown in Figure 9. After performing the 3D reconstruction of this checkerboard, we examine it in 3ds Max Design or MeshLab, as shown in Figure 10, using a distance measurement tool to measure the checkerboard dimensions. The accuracy is presented in Table 1 together with the errors at the corner points, which show that our system can measure objects with high accuracy. Compared with the algorithm proposed by Moreno and Taubin [6], which estimates the image coordinates of 3D points in the projector image plane and calibrates both projector and camera with a maximum error of 0.8546%, our method achieves a maximum error of 0.1240% and thus provides better 3D reconstruction results.
Measuring the accuracy of a reconstructed checkerboard (CB) as shown in Figure 10.
Name                               | Real CB (mm) | Reconstructed CB (mm) | Error (%)
-----------------------------------|--------------|-----------------------|----------
Width of top checkerboard (AB)     | 399.000      | 399.382               | 0.0957
Width of bottom checkerboard (CD)  | 399.000      | 399.495               | 0.1240
Height of left checkerboard (AC)   | 285.000      | 284.746               | 0.0891
Height of right checkerboard (BD)  | 285.000      | 285.298               | 0.1045
Max. error                         |              |                       | 0.1240
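The Error (%) column of Table 1 follows from the relative error |reconstructed − real| / real × 100. A quick check reproduces the reported values (the table appears to round or truncate to four decimal places):

```python
# (real, reconstructed) checkerboard dimensions from Table 1, in mm
measurements = {
    "AB": (399.000, 399.382),
    "CD": (399.000, 399.495),
    "AC": (285.000, 284.746),
    "BD": (285.000, 285.298),
}

# Relative error in percent: |reconstructed - real| / real * 100
errors = {name: abs(rec - real) / real * 100.0
          for name, (real, rec) in measurements.items()}

for name, err in errors.items():
    print(f"{name}: {err:.4f}%")         # matches the Error (%) column
print(f"Max. error: {max(errors.values()):.4f}%")
```

The maximum relative error occurs for the bottom width (CD), at about 0.124%.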
Each small square in the reconstructed checkerboard shown in Figure 10 has a real size of 28.5×28.5 mm².
The 3D reconstruction of a checkerboard shown in MeshLab. We used MeshLab's measuring tool to measure it.
5. Conclusion
In this work, we have developed an RGB-D camera system based on the structured light technique. It combines a camera and a projector to perform accurate shape measurements with high-density point cloud output. 3D reconstruction of multiple objects and a performance evaluation of the system were carried out in a real-world environment, and the experiments with different objects confirmed both the quality of the reconstructed surfaces and the measurement accuracy. The results demonstrate that the proposed technique is feasible for dense 3D measurement applications.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request (https://github.com/luantran07/data-for-a-structured-light-rgb-d-camera-system).
Disclosure
This publication is an extended version of 2017 International Conference on System Science and Engineering (ICSSE) [1].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The support of this work in part by the Ministry of Science and Technology of Taiwan under Grant MOST 104-2221-E-194-058-MY2 is gratefully acknowledged.
References

[1] V.-L. Tran and H.-Y. Lin, "Accurate RGB-D camera based on structured light techniques," in Proceedings of the 2017 International Conference on System Science and Engineering (ICSSE 2017), Vietnam, July 2017, pp. 235–238.
[2] O. Wasenmüller and D. Stricker, "Comparison of Kinect V1 and V2 depth images in terms of accuracy and precision," Lecture Notes in Computer Science, vol. 10117, Springer International Publishing, Cham, 2017, pp. 34–45.
[3] J. Smisek, M. Jancosek, and T. Pajdla, "3D with Kinect," in Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV '11), Barcelona, Spain, November 2011, pp. 1154–1160.
[4] Q. Gu, K. Herakleous, and C. Poullis, "3DUNDERWORLD-SLS: an open-source structured-light scanning system for rapid geometry acquisition," 2016.
[5] D. Moreno, W. Y. Hwang, and G. Taubin, "Rapid hand shape reconstruction with Chebyshev phase shifting," in Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, October 2016, pp. 157–165.
[6] D. Moreno and G. Taubin, "Simple, accurate, and robust projector-camera calibration," in Proceedings of the 2nd Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT 2012), Switzerland, October 2012, pp. 464–471.
[7] C.-Y. Chen, P.-S. Huang, S.-W. Huang, J.-H. Zhang, and B. R. Chang, "Structured light 3D face scanning system," in Proceedings of the 2nd IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW 2015), Taiwan, June 2015, pp. 344–345.
[8] M. Massot-Campos, G. Oliver-Codina, H. Kemal, Y. Petillot, and F. Bonin-Font, "Structured light and stereo vision for underwater 3D reconstruction," in Proceedings of the MTS/IEEE OCEANS 2015 - Genova, Italy, May 2015.
[9] M. Piccirilli, G. Doretto, A. Ross, and D. Adjeroh, "A mobile structured light system for 3D face acquisition," IEEE Sensors Journal, vol. 16, no. 7, pp. 1854–1855, 2016.
[10] D. Lanman, AIT Computer Vision Wiki, Brown University, 2014.
[11] D. Scharstein and R. Szeliski, "High-accuracy stereo depth maps using structured light," in Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, USA, June 2003, I/202.
[12] J. Wang, C. Zhang, W. Zhu, Z. Zhang, Z. Xiong, and P. A. Chou, "3D scene reconstruction by multiple structured-light based commodity depth cameras," in Proceedings of the 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Japan, March 2012, pp. 5429–5432.
[13] C. Bourgeois-Republique, A. Dipanda, and A. Koch, "A structured light system encoding for an uncalibrated 3D reconstruction based on evolutionary algorithms," in Proceedings of the 2013 International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Kyoto, Japan, December 2013, pp. 124–129.
[14] S. Huang, L. Xie, Z. Wang, Z. Zhang, F. Gao, and X. Jiang, "Accurate projector calibration method by using an optical coaxial camera," Applied Optics, vol. 54, no. 4, pp. 789–795, 2015.
[15] K. Herakleous and C. Poullis, "Stripe boundary codes for real-time structured-light range scanning of moving objects," in Proceedings of the Eighth IEEE International Conference on Computer Vision, vol. 2, Vancouver, Canada, pp. 359–366.
[16] B. Huang and Y. Tang, "Fast 3D reconstruction using one-shot spatial structured light," in Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2014), USA, October 2014, pp. 531–536.
[17] H. Cui, N. Dai, W. Liao, and X. Cheng, "An accurate reconstruction model using structured light of 3-D computer vision," in Proceedings of the 7th World Congress on Intelligent Control and Automation (WCICA'08), China, June 2008, pp. 5095–5099.
[18] M. Fu, Y. Leng, and H. Zhang, "A calibration method for structured light systems based on a virtual camera," in Proceedings of the 8th International Congress on Image and Signal Processing (CISP 2015), China, October 2015, pp. 57–63.
[19] S. Zhang and P. S. Huang, "Novel method for structured light system calibration," Optical Engineering, vol. 45, no. 8, 083601, 2006.
[20] G. Bradski, "The OpenCV Library," 2000.