Automatic Segmentation for Plant Leaves via Multiview Stereo Reconstruction

This paper presented a new method for automatic plant point cloud acquisition and leaves segmentation. Quasi-dense point cloud of the plant is obtained frommultiview stereo reconstruction based on surface expansion. In order to overcome the negative effects from complex natural light changes and to obtain a more accurate plant point cloud, the Adaptive Normalized Cross-Correlation algorithm is used in calculating the matching cost between two images, which is robust to radiometric factors and can reduce the fattening effect around boundaries. In the stage of segmentation for each single leaf, an improved region growing method based on fully connected conditional random field (CRF) is proposed to separate the connected leaves with similar color. The method has three steps: boundary erosion, initial segmentation, and segmentation refinement. First, the edge of each leaf point cloud is eroded to remove the connectivity between leaves. Then leaves will be initially segmented by region growing algorithm based on local surface normal and curvature. Finally an efficient CRF inference method based on mean field approximation is applied to remove small isolated regions. Experimental results show that ourmultiview stereo reconstructionmethod is robust to illumination changes and can obtain accurate color point clouds. And the improved region growing method based on CRF can effectively separate the connected leaves in obtained point cloud.


Introduction
In greenhouse crop cultivation, the three-dimensional structures of plants convey the real-time continuous information of the whole process of crop growth in a most direct way.Also, it is the most intuitive feature for plant phenotype analysis [1].Therefore acquiring the 3D structure of plants by an automated, high-throughput, accurate, and rapid method not only helps to monitor crop growth, but also provides visual basis for optimizing the intelligent greenhouse environment control system; and finally, it has a significant meaning for plant phenotypic and genome-environment-phenotype synergistic analysis [2][3][4][5][6].Owing to noninvasive and highthroughput characteristics, automated 3D-digitization technology has greatly promoted the accurate modeling of 3D structures of plants.
To date, there are several approaches for 3D structure reconstruction.For example, Microsoft Kinect provides depth images by utilizing coded infrared light patterns.
However, Kinect has limited range and can only be used indoors.Laser Scanning can obtain highly accurate and complete 3D structure but the devices are quite expensive for agricultural usage.Therefore the passive stereo vision technology, which puts low requirements on equipment and environment [7][8][9][10], has received wide attention.Biskup et al. [8] used a binocular stereo vision system to measure the structural parameters of plant canopies.Binocular stereo vision system needs to manually calibrate the camera and its imaging angle is relatively fixed.Hence the method is sometimes inefficient and it is difficult for it to reveal the complex 3D structure of plants.Compared to binocular stereo, the multiview stereo reconstruction based on an image sequence is used more widely because it can perform automatic calibration of the camera and generate more intact 3D structure information.In the literature, multiview stereo algorithms are roughly categorized into three classes.The first class operates by first computing a cost function on a 3D volume and then extracting a surface from this volume [11][12][13][14].The second class is an image-space methodology that operates by fusing multiple depth maps [15][16][17][18].The third class is surface expansion method which builds a quasi-dense point cloud from a set of matches [19][20][21][22].The accuracy of volume-based method depends on the density of the grid, and therefore the computational and memory costs of volumetric grids are very high.The following two methods are both able to produce good results.In this paper, we propose a multiview stereo reconstruction method based on surface expansion which iteratively expands a sparse set of initial matches into a quasi-dense point cloud that represents surfaces of the scene.In order to overcome the negative effects from complex natural light changes and obtain more accurate plant point clouds, the Adaptive Normalized Cross-Correlation algorithm (ANCC) [23] is used in calculating the matching cost between points of interest in two images, which is robust to radiometric factors such as illumination direction, illuminant color, and imaging device changes.The ANCC method also can reduce the fattening effect that object boundaries are not reconstructed correctly; and this effect clearly deteriorates results of the Zero-mean Normalized Cross-Correlation (ZNCC)-the method used in comparison.
We then segment the plant leaves after getting the plant point cloud from the above multiview stereo reconstruction method.Generally, the segmentation of 3D point cloud can be roughly categorized into five classes [24]: edge-based methods, region-based methods, attributes-based methods, model-based growing methods, and graph-based methods.References [25][26][27] segmented a plant into two sets, one for the stems and the other for the leaves according to surface feature histograms, followed by a classification or a clustering method.Yin et al. [28] proposed an approach to accurately segment and reconstruct plant.However their approach requires manually cutting the plant leaves and scanning each leaf individually.Hétroy-Wheeler et al. [29] used spectral clustering to realize the segmentation of leaf point cloud, but it needed to manually specify the number of leaves before segmentation.
Based on the characteristics of the subject plants, an improved region growing method based on conditional random field (CRF) is proposed to segment the leaf point cloud.In a canopy point cloud, there may exist some connections between leaves, which easily disturb the segmentation.Therefore we perform some erosion operations around the leaf edge before segmentation.In the initial stage of segmentation, considering that different leaves usually have different surface normals and the points on the same leaf surface usually have similar normals, we use region growing algorithm based on local surface normal and curvature to segment leaves [30].Usually the segmentation result contains some small isolated regions; therefore a fully connected CRF model defined with the complete set of points is built to further classify these regions to where they ought to belong.A highly efficient approximate inference algorithm based on mean field approximation is applied to the CRF model in which the pairwise edge potentials are defined by Gaussian kernel [31].

Multiview Stereo Reconstruction
The proposed multiview stereo approach consists of two steps: structure from motion (SFM: from images to sparse points) and dense matching (from sparse points to quasidense points).The SFM step is to obtain 3D sparse feature points location of the scene and camera poses (location and orientation).In the step of dense matching, we use the ANCC algorithm as a robust and accurate correspondence measure to match points of interest, which is robust to radiometric factors and can reduce the fattening effect.

Structure from
Motion.An overview of the SFM process is presented in Figure 1.The first step is to find feature points in each image, where the SIFT key point detector [32] is used because of its invariance to image transformations.Other feature detectors could also be used instead.For instance, several detectors were evaluated in [33].In addition to the key point locations themselves, SIFT provides a local descriptor for each key point.For each pair of images, key point descriptors are matched by using the approximate nearest neighbors package [34].Then we robustly estimate a fundamental matrix for the pair using RANSAC.During each iteration of RANSAC, a candidate fundamental matrix is computed using the eight-point algorithm [35], followed by nonlinear refinement based on Levenberg-Marquardt algorithm.Afterwards, we convert the fundamental matrix to the essential matrix and then obtain camera poses [35].At last, the 3D positions of features points can be acquired through the triangulation calculation.(The scale factor is not considered during triangulation calculation because in the following stage leaf segmentation can be performed without the knowledge of a scale factor.)[35].
After generation of the point cloud of features points between each pair of images, we should merge these point  clouds because they are not in the same coordinate system, as shown in Figure 2. Finally, we get 3D positions of all the feature points in a single coordinate system with respect to all point clouds.

Dense Matching.
In this stage, we expand a sparse set of initial matches into a quasi-dense point cloud representing the surfaces of the scene based on the best-first expansion strategy by Lhuillier and Quan [36], which is robust to initial seed match outliers and is efficient in terms of spatial-temporal complexity.In order to overcome the negative effects from complex natural light environments and to obtain a more accurate plant point cloud, the ANCC algorithm is utilized in the process as the matching cost to match points of interest instead of ZNCC.It is noteworthy that the ANCC method is robust to radiometric factors such as illumination direction, illuminant color, and imaging device changes.Moreover, it can reduce the fattening effect from which the Zero-mean Normalized Cross-Correlation (ZNCC) suffers.The best-first expansion strategy is presented in the table of Algorithm 1.
Neighborhood of x in image 1 Neighborhood of x  in image 2 Figure 3: Possible matches (,   ) and (V, V  ) around a seed match (,   ) come from its 5 × 5 neighbor () and (  ) as the smallest size for discrete 2D disparity gradient limit.The match candidates for  (resp., V  ) are within the 3 × 3 (black framed) centered at   (resp., V).
In Algorithm 1, ANCC(,   ) is similarity measure to match points of interest in two images.Generally, the color image formation model at pixel  can be represented as follows: where the brightness factor () depends on the angle between the direction of light and the direction of surface normal at that point.The global scale factors , ,  depend on illumination color and  depends on camera.The wellknown similarity measure ZNCC that uses raw stereo images does not generate satisfactory result for two reasons.One is that various radiometric changes caused by , , , , and  are not taken into consideration.The other is that the object boundaries as the supporting windows from the left and right images do not appear exactly because of the view changes.
To keep the color values away from the impact of radiometric factors, we transform the nonlinear relationship between the corresponding pixels to the affine-transformed relationship that can be solved effectively by the NCC framework.First, we take logarithm to transform this nonlinear relationship into a linear one.Then, the color value can be represented by (use the red channel to describe) To delete the item log   (), we subtract the average of the transformed color values in , ,  channels.Then each color value becomes We can see that   () and   (  ) are not dependent on , , , , and .
In order to reduce the fattening effect caused by outliers in the matching windows, each pixel  in × window () around the center pixel  is allocated with different weights.The weight is computed by using the bilateral filter as follows: The weighted sum using the bilateral filter for the center pixel  is defined by the following equation, in which () is the normalizing constant: Then we subtract () for each channel, thereby resulting in the removal of .
The similarity between  and   is defined by the following equation that is close to NCC's framework: Because the intensity information of each channel is normalized in log-chromaticity, the discriminability is lower than the original color.We therefore combine log-chromaticity and original color for keeping the discriminability.The final similarity equation is defined as follows: where  ∈ {log Chrom , log Chrom , log Chrom },  ∈ {, , }. is a balancing factor between the log-chromaticity color and the original RGB color.To obtain the disparity map of the two images in different lighting conditions, we define stereo matching as a minimization problem of the following energy in the MAP-MRF framework: (  ,   ) , (11) where () is the neighborhood of pixel  and  is the disparity value we are trying to find for each pixel .  (  ) is the data cost function which is defined by   (  ) = 1 − ANCC(, +  ).  (  ,   ) is the smoothness cost function based on truncated quadratic model which is defined by   (  ,   ) =  * min(|  −   | 2 ,  max ).The total energy () was optimized using the Graph Cuts (GC) algorithm [37] in order to find the disparity map .Under different lighting conditions, it can be seen from Figure 4 that the algorithm using ANCC as the matching cost function is very robust to illumination changes compared with results of two classic algorithms SGBM and Graph Cuts by OpenCV library.
The point clouds from multiview stereo reconstruction in different matching method are illustrated in Figures 5 and  6.From the results we can see that many points are not reconstructed correctly (marked in red circles) based on the ZNCC matching method because of the fattening effect.And more accurate points are acquired with our reconstruction method based on ANCC in Figure 6.

Leaves Segmentation from Point Clouds
To segment each individual leaf from point clouds, an improved region growing method based on fully connected CRF is applied in this section.The method can be divided into three steps: boundary erosion, initial segmentation, and refinement.At the beginning, the edge of each leaf point cloud is eroded to remove the connectivity between leaves.Then leaves will be initially segmented by region growing algorithm based on local surface normal and curvature.Finally an efficient CRF inference method based on mean field approximation is used to remove small isolated regions.The details are further explained as follows.

Boundary Erosion and Initial Segmentation.
Because of the similarity of the colors of leaves, the color information is not reliable for segmentation.Considering the fact that the points on the same leaf surface usually have similar normals, and different leaves usually have different mean normals, the region growing algorithm is used.This method can find smoothly connected areas in point cloud based on point normals.First, the normal for each point is estimated by fitting a plane to some neighboring points.The neighborhood can be specified in two different methods, namely,  nearest neighbors (KNN) and fixed distance neighbors (FDN).The next is the region growing algorithm based on normal and curvature, which is presented in Algorithm 2. This algorithm is based on two constraints-local connectivity and surface smoothness.For local connectivity, the points in a segment should be locally connected.This can be realized by using only the neighboring points (through KNN or FDN) during region growing.For surface smoothness, the points in a segment should create a local smooth surface whose normals do not vary much.This constraint can be realized with a threshold  th on the angles between the current seed point and the points added to the region.Additionally, a threshold on curvature values  th can ensure that smooth areas are broken on the edges.The result of region growing segmentation without boundary erosion for a canopy point cloud is shown in Figure 7.
As it can be seen from Figure 7 some leaves are segmented into one part, which indicates relatively poor performance because some leaves are connected (marked by red circles) and these leaves have similar normals.To solve the problem, some erosion operations around the leaf edge are performed before segmentation.First, the boundary of the point cloud is extracted by detecting the boundary of the corresponding   → {A} (7) for  = 0 to size ({S c }) do (8) Find nearest neighbors of current seed point {B c } ← Ω(S c {i}) (9) for  = 0 to size ({B c }) do (10) Current neighbor point end-if (16) end-if (17) end-for (18) end-for  erosion, we carry out the region growing algorithm for the point cloud and the new result is shown in Figure 9.It can be seen that the result is much better than previous result without boundary erosion and also leaves that were originally connected are now separated.

Segmentation Refinement.
After initial segmentation, there still exist some small isolated regions (small red points region).Therefore we need to classify these regions and determine the actual regions they belong to.A common solution is to transform this problem as maximum a posteriori (MAP) inference in a conditional random field (CRF) defined over pixels or image patches [38][39][40][41].The CRF model contains pairwise potentials, which can simultaneously remove the noise and resolve ambiguous classification results.
In this subsection, a fully connected CRF is built, establishing pairwise potentials on all pairs of points from the canopy point cloud.Considering a random field  defined over a set of variables { 1 , . . .,   }, the domain of each variable is a set of labels  = { 1 ,  2 , . . .,   }.Also a random field  is defined over variables { 1 , . . .,   }.In our setting,  ranges over possible input point cloud of size  and  ranges Mathematical Problems in Engineering over possible point-level labelings.  is the position vector of point  and   is the label assigned to point .Hence, the corresponding Gibbs energy of the fully connected pairwise CRF model is formulated as where  and  both range from 1 to .
where  () is a kernel function and  () is a label compatibility function.A simple label compatibility function  is given by the Potts model (  ,   ) = [  ̸ =   ].We limit the choice of the kernel function to Gaussian kernel: where  () is a linear combination weight and Λ () is a symmetric, positive-definite precision matrix, defining the shape of the kernel.The vectors   and   are feature vectors for points  and  in an arbitrary feature space, respectively.For our points labeling problem, the smoothness kernel ] is used, defined in terms of point positions   and   to remove isolated regions.
Usually the high complexity of inference in fully connected models will limit its application.Hence, a highly efficient inference algorithm based on a mean field approximation to the CRF distribution [31] is adopted.This approximation yields an iterative message passing algorithm for approximate inference.As message passing in the presented model can be performed using Gaussian filtering in feature space, highly efficient approximations for high-dimensional filtering are utilized to reduce the complexity of message passing from quadratic to linear.

Results
The proposed method was applied to four sets of point clouds acquired from Pachira macrocarpa, Scindapsus, strawberry, and tomato plant.

Initial Segmentation.
Results of the initial segmentation are shown in Figure 10.Some erosion operations were performed around the leaf edges beforehand in case that two leaves are divided into one part.
When a large  th of angle threshold is set, the algorithm tends to segment the point cloud into connected subsets.The smaller  th , the higher probability that a leaf is segmented into several clusters (see Figure 11).

Final Segmentation Results and Comparison with the Original Region Growing Algorithm.
The results after inference of fully connected CRF are shown in the second row of Figure 12.It can be seen that all the small isolated regions are classified into its nearest leaves and all leaves are individually segmented.
For the original region growing algorithm, no matter how we adjust the values of  th and  th , there always exists the case that two or more leaves are segmented into one subset, which is shown in the first row (highlighted by red circles) of Figure 12.Meanwhile segmentation results contain many noise sections that are not classified.The results of our method are shown in the second row of Figure 12, and the results are superior than the first row.

Discussion and Conclusion
We proposed an automatic method to acquire and segment the point cloud of plant leaves.A multiview stereo reconstruction method based on surface expansion is first applied to obtain the 3D structure of plants.We use the Adaptive Normalized Cross-Correlation to match points of interest in two images, and it is robust to radiometric factors and can reduce the fattening effect.In the stage of leaf segmentation on point clouds, an improved region growing method based on fully connected conditional random field (CRF) is proposed to separate connected leaves with similar color.Our method requires no prior botanical knowledge; therefore it can be applied in a wide variety of cases.Qualitative  results on several kinds of plants show the effectiveness of our method.To prevent two leaves from being segmented into only one subset, some erosion operations around the leaf edge are performed before carrying out the region growing segmentation based on normal and curvature.A fully connected CRF model is built in which the pairwise edge potentials are defined by Gaussian kernel to remove small isolated regions after segmentation.
In the future, we will focus on developing advanced multiview stereo reconstruction algorithms to obtain more accurate point clouds for plant phenotype analysis.In addition, we will also study advanced point cloud segmentation algorithms to obtain other organs such as stems and fruits from plants, for realizing the final goal of accurate measurement on plant organs.

Figure 1 :
Figure 1: A simplified overview of the SFM process.

Figure 2 :
Figure 2: Merging point clouds in one coordinate system.

Figure 4 :
Figure 4: Results of test stereo algorithms on Aloe image pair in different lighting conditions.(a) The left image in lighting condition 1.(b) The right image in lighting condition 2. (c) The ground truth disparity map.(d) The disparity map of SGBM by OpenCV.(e) The disparity map of GC by OpenCV.(f) The disparity map of ANCC+GC.

Figure 7 :
Figure 7: Region growing segmentation without boundary erosion for the point cloud from Pachira macrocarpa. th = 3.25,  th = 1.0.The red points are small isolated regions because of too little points, which we will remove in the future refinement stage.

( 19 )
Add current region to global segment list {R c } insert   → {R} (20) end-while (21) Sort {R} according to the size of the region.(22) Return {R} Algorithm 2: The region growing algorithm for point cloud segmentation.

Figure 8 :
Figure 8: The boundary of the point cloud from Pachira macrocarpa leaves.
A conditional random field (, ) is characterized by a Gibbs distribution ( | ) = exp[−()]/().The maximum a posteriori (MAP) labeling of the random field is  * = arg max ∈  ( | ).The unary potential is computed independently for each point with a classifier that produces a distribution over the label assignment   given point cloud features.The following is the function of the pairwise potentials in our model:   (  ,   ) = ∑   () (  ,   )  () (  ,   ) ,