Research on Crop 3D Model Reconstruction Based on RGB-D Binocular Vision

Taking maize seedlings as the object, this research investigates the implementation of crop 3D reconstruction based on RGB-D binocular vision and the selection of some key parameters. First, multiple images are taken from different angles around the target. By mapping the coordinates of the maize seedling region, obtained with the Otsu algorithm and global threshold segmentation, to the corresponding depth image, the depth data of the maize seedling region can be obtained accurately. An improved mean filter is proposed to adaptively fill the holes in the depth image. Then, the point clouds of the maize seedling taken at a fixed step angle are registered and fused. Finally, after the fused point cloud is simplified, the 3D model of the crop can be reconstructed. Experimental results show that the simplification effect of the octree algorithm is better than that of the voxel grid filter. Among all the step angles, the reconstruction error at the 60° step angle is the smallest. Under this condition, the height error between the model and the maize seedling is 2.22%, and the error in stem diameter is 11.67%.


Introduction
As technology continues to evolve, the development of smart agriculture has brought many new technologies and solutions to modern agriculture, and 3D reconstruction technology [1,2] is one of them. 3D reconstruction of crops can be used not only to measure the phenotypic parameters of the target but also to visualize the object in virtual 3D space [3]. Crop 3D reconstruction technology [4,5] has gradually become a research hotspot in this field. At present, the main methods of crop 3D reconstruction are rule-based [6], 3D scanner-based [7,8], digitizer-based [9,10], and vision-based image reconstruction [11,12]. Compared with the methods based on scanners and digitizers, vision-based 3D reconstruction offers low cost with a certain guarantee of accuracy, and it has been extensively studied and applied in recent years [13].
Many innovative companies and research teams, both domestic and international, have emerged in the field of smart agriculture. Gibbs et al. [14] proposed an active vision cell, which consists of a camera-mounted robot arm, a combined software interface, and a novel surface reconstruction algorithm. Because of its active vision framework and the automatic selection of key parameters for surface reconstruction, this pipeline can be applied to any plant species or morphology. Han and Burks [15] took images of an orange canopy from multiple angles with a camera and reconstructed the canopy surface by using the Plücker coordinate system. Shirazi et al. [16] used an active stereo vision-based 3D perception system to acquire 3D models of human body parts and surgical tools. To evaluate the proposed system, they performed a 3D scan of a cardboard box as the object. The results showed errors in height and width measurements of 9.4 and 23 μm, respectively, compared to the 3D scan results. Peng et al. [17] developed an SFM method based on binocular vision to acquire the physical parameters of plants and build the 3D model of the plant. Their experimental results show that the mean errors of the measured sizes are all less than 2%. Dong et al. [18] discussed a sea wave measurement method with binocular vision. Using a binocular camera, an accurate 3D model of the ocean waves captured by the camera can be established without contacting the waves themselves. The metric 3D coordinates of the waves can also be measured in both the world coordinate system and that of the actual camera. The experimental results show that this method can achieve the expected results. Wu et al. [19] combined deep learning and classical image processing algorithms to calculate the number of banana bunches in two periods and designed software for estimating the weight of banana fruits during the harvest period. The results show that during the bud removal period, the target segmentation MIoU is 0.878, the average pixel accuracy is 0.936, and the final bunch detection accuracy reaches 86%. During the harvest period, bunch detection is very challenging, with an accuracy of 76%, and the final overall bunch counting accuracy is 93.2%. Tang et al. [20] explored techniques for improving fruit detection methods in complex environments. For the common types of complex backgrounds in outdoor orchards, the improvement measures are divided into two categories: optimization before and after image sampling. By comparing the results of these methods, the future development trend of fruit detection optimization techniques in complex backgrounds is described.
This paper focuses on the implementation of crop 3D model reconstruction [21] based on RGB-D binocular vision and the selection of some key parameters in the process. Two sets of depth images and RGB images were taken from a fixed step angle around the target by the camera. The 3D point clouds were obtained through preprocessed depth images and RGB images. Furthermore, the point clouds of different angles of maize seedlings were coarsely and finely registered and simplified. The best set of model reconstruction parameters was obtained by comparing the errors of each step angle.

Materials and Methods
Vision-based 3D model reconstruction uses visual sensors to obtain images, which are then processed by a computer to obtain a 3D model [22]. An Intel RealSense D435 RGB-D binocular camera [23], which has RGB and depth modules [24], was utilized in the experiment. Its two cameras have the same intrinsic parameters and are arranged in parallel with identical focal lengths. The parallax depth image was obtained directly by using the depth module. As shown in Figure 1, the experimental platform includes a PC, the binocular camera, the experimental crop, and some other auxiliary devices. The software platforms and kits are VS2019, MATLAB 2019, PCL 1.11.0, OpenCV 4.5.0, and CloudCompare 2.12.1.
The 3D reconstruction process based on binocular vision is illustrated with the modeling of a maize seedling and includes image acquisition, image preprocessing, point cloud construction, and crop modeling. The implementation process is shown in Figure 2.

Image Acquisition.
The depth images and RGB images of the maize seedling were obtained. An origin alignment operation was required because the spatial coordinate systems of the RGB image and the depth image are different. To make a full-view reconstruction of the maize seedling, multiple images had to be taken from different view angles around it. The experiment adopted the method of rotating the target in front of a fixed camera. Comparative experiments were carried out at four step angles, 90°, 60°, 45°, and 30°, using the aligned camera.

Image Preprocessing.
Image preprocessing mainly prepares for the acquisition of point clouds. By mapping the coordinates of the maize seedling region, obtained with the Otsu algorithm [25] and global threshold segmentation, to the corresponding depth image, the depth data of the maize seedling region could be obtained accurately. The environment might cause some data to be missing in the depth image. To reduce this effect, an improved zero-point culling mean filter was proposed to adaptively fill the holes in the depth image.

Point Cloud Construction.
After image preprocessing, the 3D point cloud in each angle could be recovered by converting all pixels in the crop region and their corresponding depth values.

Crop Modeling.
The binocular camera could not capture complete information about the maize seedling from a single angle because of the inevitable occlusion by leaves. Therefore, the point clouds of different angles needed to be registered and fused [26,27]. Because the fused point cloud contains a large number of points, it was necessary to simplify and smooth it. Finally, the surface could be reconstructed from the final point cloud through the greedy projection triangulation algorithm [28,29].

Implementation and Results of 3D Model Reconstruction

Image Acquisition.
Both RGB images and depth images are obtained by an RGB-D camera. In order to reduce the influence of external factors, the images of maize seedlings are taken in front of a white wall. The camera is about half a meter away from the maize seedling. The maize seedling is placed on a plate marked with a circle divided into eight equal sectors. Then, a set of RGB images and depth images is captured from eight angles by rotating the maize seedling in front of the fixed camera, and the image size is 640 × 480. Taking the 45° step angle as an example, the RGB images captured at 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315° are shown in the accompanying figures.
The maize seedling region is segmented with the Otsu algorithm, which separates the gray levels into two classes, C_0 and C_1, at a threshold t. The gray-level probability distributions of the two classes give the means of C_0 and C_1 and the total mean of gray levels, denoted by μ, from which the between-class variance is computed. The optimal threshold t is chosen by maximizing the between-class variance.
The RGB image before segmentation is shown in Figure 4(a). After the best threshold t is obtained, the maize seedling region is easily extracted with the global threshold. The pixels lower than the threshold value are set to zero, and the remaining pixels are set to 255, as shown in Figure 4(b).
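The threshold search described above can be sketched in a few lines. The following is a minimal illustration in plain Python, not the implementation used in the experiment: the function names `otsu_threshold` and `binarize` are hypothetical, the image is assumed to be a flat list of 8-bit gray values, and class C_0 is assumed to contain the gray levels up to and including t.

```python
def otsu_threshold(gray_values, levels=256):
    """Return the gray level t that maximizes the between-class variance."""
    hist = [0] * levels
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n0 = 0    # number of pixels in class C_0 (gray level <= t)
    sum0 = 0  # sum of gray levels in class C_0
    for t in range(levels):
        n0 += hist[t]
        sum0 += t * hist[t]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, the variance is undefined
        mu0 = sum0 / n0
        mu1 = (total_sum - sum0) / n1
        # Between-class variance up to the constant factor 1 / total**2,
        # which does not change the position of the maximum.
        var_between = n0 * n1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray_values, t):
    """Global thresholding: class C_0 (<= t) becomes 0, the rest 255.

    Assumption: a pixel equal to t is assigned to C_0.
    """
    return [0 if v <= t else 255 for v in gray_values]
```

For a 640 × 480 image the same routine would be applied to the flattened gray channel, and the resulting binary mask would then select the pixels of the aligned depth image that belong to the seedling.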

Adaptive Filling of Missing Depth Data.
Because the depth value of the pixels on the template may be zero, using the mean filter in the experiment directly will cause bigger errors in the process of hole-filling. Furthermore, the depth of other nonhole pixels will also be changed. In this paper, an improved mean filter is proposed to fill such holes.
In traditional mean filtering, the gray value g(x_i, y_j) of the current point is the average of all pixels in the template. In the improved mean filtering, the number of zero points in the template is denoted by t, and the gray value g(x_i, y_j) of a zero point is the average taken over only the remaining nonzero pixels of the template.
The depth image of the region where the target is identified is traversed. As shown in Figure 6, hole pixels with zero depth are first detected and removed from the template of the hole pixel, and a new value is then assigned to the hole pixel by using the average of the remaining pixels, which considerably reduces the errors.
As shown in Figure 7(a), after the mean filtering, the depth image becomes blurred, and some pixels in the nontarget region are filled with values. So, hole-filling by mean filter does not work well. Compared with the method of mean filter, the improved adaptive mean filter effectively fills the holes without affecting the depth of other pixels and protects the image details very well, as shown in Figure 7(b).
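The zero-excluding mean filter can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the experiment: the function name `fill_holes`, the square template of half-width `k`, and the representation of the depth image and the segmentation mask as nested lists are all hypothetical choices made for this example.

```python
def fill_holes(depth, mask, k=1):
    """Fill zero-depth pixels inside the target region.

    depth: 2D list of depth values, where 0 marks a hole.
    mask:  2D list of 0/255 values from the segmentation step.
    k:     half-width of the square template, (2k + 1) x (2k + 1).
    """
    rows, cols = len(depth), len(depth[0])
    filled = [row[:] for row in depth]  # non-hole pixels are left untouched
    for i in range(rows):
        for j in range(cols):
            if mask[i][j] == 0 or depth[i][j] != 0:
                continue  # only hole pixels of the target region are changed
            # Collect the nonzero depths of the template; zero points are
            # culled so that they do not pull the average toward zero.
            valid = [
                depth[y][x]
                for y in range(max(0, i - k), min(rows, i + k + 1))
                for x in range(max(0, j - k), min(cols, j + k + 1))
                if depth[y][x] != 0
            ]
            if valid:
                filled[i][j] = sum(valid) / len(valid)
    return filled
```

Because the averages are read from the original image and written to a copy, a filled hole never influences the value assigned to a neighboring hole, and pixels outside the mask stay at zero.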

Point Cloud Construction.
The 3D coordinates of all target points together form the point cloud. They can be calculated directly from the depth and RGB images. This process is the conversion from a 2D depth image to a 3D point cloud, and the transformation relationship is given by Formula (1).
Figure 8 shows the point cloud of a processed depth image.
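The conversion from a depth image to a point cloud can be sketched with the standard pinhole camera model. The sketch below is an assumption about the general form of this conversion, not a reproduction of Formula (1): the function name `depth_to_points`, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the `depth_scale` factor (raw depth units to meters) are hypothetical inputs that would come from the camera calibration.

```python
def depth_to_points(depth, mask, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project masked depth pixels to 3D camera coordinates.

    depth: 2D list of raw depth values (0 means no measurement).
    mask:  2D list of 0/255 values marking the target region.
    fx, fy, cx, cy: assumed pinhole intrinsics in pixels.
    depth_scale: assumed size of one raw depth unit in meters.
    """
    points = []
    for v, row in enumerate(depth):      # v is the pixel row
        for u, d in enumerate(row):      # u is the pixel column
            if d == 0 or mask[v][u] == 0:
                continue
            z = d * depth_scale
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each retained pixel contributes one point, so the size of the resulting cloud equals the number of valid depth pixels inside the segmented region.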

Point Clouds Registration and Fusion.
The point cloud from one angle provides only partial information about the maize seedling. The images taken from adjacent locations have an overlapping region. In order to obtain the complete information of the maize seedling, the point clouds from the same step angle should be registered to a unified coordinate system. In the experiment, there is a fixed angle difference between the point clouds of adjacent views at the same step angle. However, fine registration requires the two point clouds to be sufficiently close in distance and position. The registration of point clouds therefore includes two steps: coarse registration and fine registration. The two point clouds are spatially close to each other after coarse registration. Then the coarsely registered point clouds can be finely registered with lower error by using the iterative closest point (ICP) algorithm.
The process of registration and fusion is shown in Figure 9. The source clouds come from the preprocessed depth images taken at n angles that divide the circumference equally (n = 4, 6, 8, 12). Taking the 45° step angle as an example, the eight point clouds from 0° to 315° are referred to as angle 1, angle 2, …, and angle 8, respectively. The point clouds of angle 1 and angle 2 are registered and fused into a new point cloud called fusion point cloud 1; then fusion point cloud 1 and the point cloud of angle 3 are processed by coarse registration, fine registration, and fusion. As shown in Figure 10, fusion point cloud 7 is finally obtained by sequentially merging the eight point clouds.
In the process of point cloud registration [26,27], the key is to solve the transformation matrix parameters between the two point clouds, and the important prerequisite for solving them is to obtain the sets of corresponding point pairs. In coarse registration, at least three pairs of corresponding feature points are manually selected as registration primitives, and the registration transformation matrix is calculated from them. ICP registration iterates from an initial value until the parameter sequence converges to meet the minimum requirement of the objective function. The ICP algorithm solves a nonlinear least squares problem. The transformation matrix is obtained by selecting an appropriate threshold, number of iterations, and point cloud overlap.
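How a transformation matrix follows from three corresponding point pairs can be shown with a small sketch. It builds an orthonormal frame from each triple of points and maps one frame onto the other. This construction is an assumption chosen for illustration and is exact only for noise-free, non-collinear pairs; with more pairs or noisy picks, a least squares estimate would normally be used instead. All function names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    if n == 0.0:
        raise ValueError("points are coincident or collinear")
    return (a[0] / n, a[1] / n, a[2] / n)


def frame(p0, p1, p2):
    """Right-handed orthonormal frame attached to three points."""
    e1 = normalize(sub(p1, p0))
    e3 = normalize(cross(e1, sub(p2, p0)))
    e2 = cross(e3, e1)
    return (e1, e2, e3)


def apply(R, t, p):
    """Apply the rigid transform p -> R p + t."""
    return tuple(R[r][0] * p[0] + R[r][1] * p[1] + R[r][2] * p[2] + t[r]
                 for r in range(3))


def rigid_from_three_pairs(src, dst):
    """Rotation R and translation t that map the src triple onto dst."""
    fs, ft = frame(*src), frame(*dst)
    # R is the sum of outer products ft[k] * fs[k]^T, so R maps each
    # source frame axis onto the matching target frame axis.
    R = [[sum(ft[k][r] * fs[k][c] for k in range(3)) for c in range(3)]
         for r in range(3)]
    t = sub(dst[0], apply(R, (0.0, 0.0, 0.0), src[0]))
    return R, t
```

The resulting R and t would serve as the initial value from which ICP iterates, which is why coarse registration only needs to bring the two clouds close together rather than align them exactly.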
As shown in Figure 11(a), the final point cloud usually has some obvious errors due to camera depth errors and point cloud registration errors. It is necessary to manually clip and delete the data with depth errors in the registered point clouds to reduce the measurement errors. The processed point cloud is shown in Figure 11(b).

Point Cloud Simplification.
In the experiment, the final point cloud has more than 50,000 points after registration and fusion. It includes many redundant points that are not needed for measurement. Two typical voxel methods, VoxelGrid and octree, are selected to simplify the point cloud for comparison.
Because the point cloud is unordered, the maximum and minimum values of the point cloud along the X, Y, and Z axes are found first, and the minimum bounding cube is constructed from them to partition the point cloud. The octree divides the minimum bounding cube into eight nodes, as shown in Table 1. The VoxelGrid simplification method divides the point cloud into small cloud blocks and replaces all points in each block with their barycenter, as shown in Table 2.
As shown in Figures 12(a) and 12(b), the point cloud is more regular after the octree method. Therefore, the method of octree voxel center approximation is used to simplify the scattered point cloud.
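The difference between the two simplification rules can be made concrete with a short sketch. It is a simplification under stated assumptions, not the PCL implementation used in the experiment: both rules are applied to the same uniform grid of cubic cells with edge length `leaf`, which corresponds to the leaves of an octree cut at a fixed depth, and the function name `voxel_simplify` and its `mode` argument are hypothetical.

```python
import math
from collections import defaultdict


def voxel_simplify(points, leaf, mode="centroid"):
    """Keep one representative point per occupied voxel.

    mode="centroid": barycenter of the points in the voxel (VoxelGrid rule).
    mode="center":   geometric center of the voxel (voxel center rule).
    """
    # Minimum corner of the bounding volume along X, Y, and Z.
    mins = tuple(min(p[a] for p in points) for a in range(3))

    # Group points by the integer index of the voxel that contains them.
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor((p[a] - mins[a]) / leaf) for a in range(3))
        cells[key].append(p)

    result = []
    for key in sorted(cells):
        members = cells[key]
        if mode == "centroid":
            rep = tuple(sum(p[a] for p in members) / len(members)
                        for a in range(3))
        elif mode == "center":
            rep = tuple(mins[a] + (key[a] + 0.5) * leaf for a in range(3))
        else:
            raise ValueError("mode must be 'centroid' or 'center'")
        result.append(rep)
    return result
```

With the voxel center rule every output point lies on a regular lattice, which is one way to read the observation that the octree result looks more regular, whereas barycenters follow the local distribution of the input points.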

Crop Construction.
Finally, the greedy projection-based triangulation algorithm [28] is used to reconstruct the final crop model. Greedy projection triangulation quickly establishes the topological relationship of the original point cloud through a triangular grid structure. When the problem is solved with a growth-based greedy algorithm [29], the overall result is obtained through locally optimal processing.
In 3D space of the point cloud, the topology cannot be directly established, so the point cloud needs to be projected to a 2D plane, and then a topological relationship is established on it. The point cloud in 3D space and K neighbors of the point cloud are projected into a plane; that is, it maps the point cloud from 3D to 2D, and it uses the 2D Delaunay triangulation growth algorithm to connect the triangular grids for the 2D point cloud mapped to the plane.
Assuming the normal vector at the point N_0(x_0, y_0, z_0) is m = (A, B, C), a point N(x, y, z) lies on the tangent plane Π through N_0 when A(x − x_0) + B(y − y_0) + C(z − z_0) = 0.
In order to project points in space into the 2D tangent plane Π, the projection matrix method is generally used. The projection of the 3D points on the tangent plane is obtained by a series of operations such as translation and rotation.
The projection matrix T_MΠ and the translation transformation matrix T_c together give the projection of the point q(x_i, y_i, z_i) on the tangent plane Π.
The method selects a sample triangular patch as the initial surface, then continuously expands the boundary of the surface, and finally constructs a complete triangular mesh surface so that the point cloud in 2D space has a topological relationship. The topological relationship of the 2D point cloud is then mapped back to the original 3D point cloud, and the topological relationship of the original point cloud in 3D space is constructed. Once a complete mesh topological relationship is constructed, the 3D reconstruction of the object surface is realized, as shown in Figures 13(a) and 13(b).

Discussion
Repeating the above steps yields the final point clouds for the step angles of 90°, 60°, 45°, and 30°, which are shown in Figures 14(a)-14(d), respectively. In the 90° step angle scheme, the point cloud data are incomplete, the phenotypic information of the maize seedling is missing, and the error is relatively large. The phenotypic information of the maize seedling at the 60°, 45°, and 30° step angles is relatively complete.
Phenotypic measurements at the 60°, 45°, and 30° step angles are compared. The plant height and stem diameter of the maize seedling are measured, and the experimental results are then compared with the manual measurements, as shown in Table 3.

Conclusions
(1) In this paper, the implementation of crop 3D reconstruction based on RGB-D binocular vision and the selection of some key parameters are investigated. The 3D point clouds are obtained through preprocessed depth images and RGB images. Furthermore, the point clouds are coarsely and finely registered and simplified. Finally, the greedy projection-based triangulation algorithm is used to reconstruct the final crop model. The best set of model reconstruction parameters was obtained by comparing the errors of each step angle.
(2) The improved adaptive mean filter effectively fills the holes without affecting the depth of other pixels.
(3) Of the four step angle schemes compared, the 90° step angle scheme loses much of the phenotypic information, and the 30° step angle has a large stem diameter error. Compared with the 45° step angle, the error of the 60° step angle is relatively small; therefore, the 60° step angle has the highest accuracy of the four 3D reconstruction schemes in the experiment. Under this condition, the height error between the model and the maize seedling is 2.22%, and the error of stem diameter is 11.67%. This satisfies the research needs of 3D reconstruction of maize seedlings.
(4) The 3D reconstruction method discussed in this article is effective for modeling single plants under simple experimental conditions. However, for complex crop populations in field conditions, its effectiveness is limited by textures or colors in the background that are similar to those of the crops, as well as by depth errors caused by lighting and occlusion. Going forward, research will focus on exploring 3D reconstruction techniques for crop populations in complex growth environments to gain insights into crop growth.
While the 3D reconstruction method described in this paper can reconstruct a model for a specific moment in a crop's growth cycle, it has not yet achieved full modeling of the entire growth process of the target crop or established a corresponding relationship with the crop growth model. Subsequent research will be required to achieve full 3D reconstruction throughout the plant's lifecycle. This entails further development of a crop's 3D model based on timeseries data combined with plant growth rules, which will aid agricultural managers in monitoring and managing crops more efficiently.

Data Availability
The data used to support the findings of this study can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.