3D Autonomous Navigation Line Extraction for Field Roads Based on Binocular Vision

This paper proposes a 3D autonomous navigation line extraction method for field roads in hilly regions based on a low-cost binocular vision system. Accurate guide path detection of field roads is a prerequisite for the automatic driving of agricultural machines. First, considering the lack of lane lines, blurred boundaries, and complex surroundings of field roads in hilly regions, a modified image processing method was established to strengthen shadow identification and information fusion to better distinguish the road area from its surroundings. Second, because field roads have no obvious shape characteristics and only small differences in gray values inside the image, the centroid points of the road area were extracted and smoothed as its statistical feature and then used as the geometric primitives of stereo matching. Finally, an epipolar constraint and a homography matrix were applied for accurate matching and 3D reconstruction to obtain the autonomous navigation line of the field roads. Experiments on the automatic driving of a carrier on field roads showed that on straight roads, multicurvature complex roads, and undulating roads, the mean deviations between the actual midline of the road and the automatically traveled trajectory were 0.031 m, 0.069 m, and 0.105 m, respectively, with maximum deviations of 0.133 m, 0.195 m, and 0.216 m, respectively. These test results demonstrate that the proposed method is feasible for road identification and 3D navigation line acquisition.


Introduction
Cultivated land in hilly regions accounts for 63.2% of the total cultivated area in China and is an important agricultural production base for various crops, such as grain, oil plants, and tobacco [1]. The transportation of agricultural materials and products on field roads, which accounts for 20% of the total workforce of agricultural production, is one of the most important tasks in agricultural production in hilly regions. Autonomous transportation machines are urgently needed in hilly regions due to the severe shortage of human labour and the pressing requirement to improve productivity. With the development of rural construction, a large number of field roads with cement pavement and widths of 1.2 m to 2.5 m have been built in the hilly areas of China, thus providing basic conditions for agricultural mechanization. In fact, these field roads in hilly regions are often twisting, winding, and rolling; these characteristics, coupled with occlusion by different types of crops along both sides, make obtaining an accurate guide path extremely difficult. As a result, the development of automated transport machines for field roads in hilly regions has been limited to date.
Obtaining a navigation line for a field road is a prerequisite for transport machines to drive automatically on the road. To solve this task, an autonomous transport machine must be equipped with a set of sensors that allow it to accurately determine its position relative to the surrounding limits. Currently, the most commonly used navigation systems for agricultural machines are the Global Navigation Satellite System (GNSS), machine vision navigation systems, Light Detection and Ranging (LIDAR), and combined navigation systems composed of two or more subsystems [2][3][4][5][6][7]. GNSS, the most affordable sensor for direct measurement of position, does not reach the required level of accuracy [8]. Furthermore, GNSS suffers from occasional position outages due to communication link failures and loss of satellite lock caused by occlusion from obstacles such as trees [9,10]. LASER scanners (LIDAR), or laser rangefinders, are commonly employed to obtain three-dimensional point clouds of the area for off-road navigation [11], for urban search and rescue, or for agricultural applications [12,13]. LASER-based sensors are able to directly measure distances and require less computer processing than vision-based techniques [14]. However, a drawback of this kind of sensor is its expense. The high cost of real-time kinematic navigation sensors has therefore limited the commercialization of autonomously guided agricultural machines [15].
Machine vision systems are very suitable tools for wide perception of the environment, increasingly being used as a lower-cost alternative to LIDAR. Cameras are very inexpensive equipment. For example, the camera used in our prototype field road carrier costs 252 RMB in commercial shops. Another interesting point is that images convey a huge amount of information. In particular, binocular vision has good environmental perception ability [8,16]. The guide path for autonomous transport machines can be extracted through recognizing the driving range, road conditions, and surroundings by binocular vision. Therefore, binocular vision can be used as one of the main methods of navigation line detection for autonomous field transportation machines in hilly regions.
The field roads in hilly regions are typically unstructured roads. For applications in vision navigation, the navigation line of unstructured roads is usually acquired by analyzing the differences in textures, colors, edges, and other characteristics between the road and its surroundings based on the assumption that the road surfaces are planar or an idealized treatment [17][18][19]. Based on this idealized approach, the feasible guide path for vehicles can be identified. Liu et al. [20] proposed an online classifier based on the Support Vector Machine (SVM) to classify road scenes under different weather conditions in different seasons and presented an accurate road border model for the autonomous path detection of Unmanned Ground Vehicles (UGVs) by using the AdaBoost algorithm and random sample consensus (RANSAC) spline fitting algorithm. Wang et al. [21] proposed an unstructured road detection method based on an improved region growth with the Principal Component Analysis-SVM (PCA-SVM) method. A priori knowledge, such as the location of the road, the initial cell, and the characteristics of the road boundary cells, was used to improve the region growth method, and the classifier was used to select the cell growth method to eliminate the miscalculated area. Liu et al. [22] proposed an unstructured road detection approach based on the color of the Gaussian mixture model and the parabolic model. First, based on a combination of averaging filtering and subsampling, the color image was changed from high resolution to low resolution and given illumination compensation. Then, a Gaussian mixture model was formulated based on the K-means algorithm to obtain the optimized clustering center of the road area and other areas, and the parameters of the right and left road parabolic models were solved by using the Least-Squares Method (LSM). Finally, the road information was extracted after fitting the boundary of the road.
Methods for acquiring unstructured road navigation lines based on the plane assumption or idealized treatment have limited the adaptability and effectiveness of automatic driving machines in actual environments to some extent, and the requirements of road models under complex conditions have not yet been met. Multidimensional road perception models have been studied, but these models remain mainly in the theoretical analysis stage. Jiang [23] proposed horizontal and vertical methods of modelling the road surface. In the horizontal direction, a 3D-parameterized free-shape lane model was established according to the relationships between the 3D geometric points of the double boundaries of the lane. In the vertical direction, 3D information of the road surface was obtained using scale-invariant features. Wang [24] used two vertical omnidirectional cameras to capture 3D information from road images, establish a road space model, and calculate the road width. Byun et al. [25] proposed a novel method for road recognition using 3D point clouds based on a Markov Random Field (MRF) framework in unstructured and complex road environments. This method transformed the road recognition problem into a classification problem based on MRF modelling and presented guidelines for the optimal selection of the gradient value, the average height, the normal vectors, and the intensity value. Jia et al. [26] addressed the road reconstruction problem for on-road vehicles under shadows. To deal with the effects of shadows, images were transformed to the proposed illuminant-invariant color space and fused with raw images, and the road region was reconstructed from a geometric point of view. Deng et al. [27] proposed a binocular vision-based, real-time solution for detecting the traversable region outdoors. An appearance model based on a multivariate Gaussian was constructed from a sample region in the left image, and a fast, self-supervised segmentation scheme was proposed to classify the traversable and nontraversable regions.
In view of the characteristics of field roads in hilly regions, such as the lack of lane lines, blurred boundaries, and complex backgrounds, this paper proposes a new method of 3D navigation line extraction for field roads to obtain key information (i.e., the autonomous guide line and slope gradient) based on a low-cost binocular vision system. The modified methods of image processing, statistical feature extraction, and 3D reconstruction were studied in detail. The novel features and contributions of this paper include the following: (i) the problem of image recognition with shadows on field roads was studied; (ii) because field roads are characterized by nonobvious features, the centroid points of the road area were used as matching primitives; and (iii) the fitting curve of continuous centroid points was used as the navigation line for unmanned agricultural machinery on field roads.
The objective is to obtain the navigation line with 3D coordinate information. First, after obtaining the road area by threshold segmentation and shadow recognition, the centroids of the road area were extracted as its statistical feature and then smoothed as the geometric primitives of stereo matching. Then, the homography matrix was solved through Speeded-Up Robust Features (SURF) detection based on the RANSAC algorithm, and the epipolar constraint was applied to achieve accurate feature matching. Furthermore, the 3D information of the navigation line was extracted from the matched centroid points. Finally, an automatic driving test of an autonomous carrier was conducted to verify the proposed method.

Image Processing
2.1. Image Processing Method Architecture. The objective of image processing is to distinguish the road area from its surroundings. The proposed image processing procedure consists of three main linked phases: (i) image segmentation, (ii) identification of the shadow areas, and (iii) the integration operation. Figure 1 shows the full structure of the proposed procedure as a flowchart. Field roads in hilly regions are irregular and have blurred boundaries. These characteristics, coupled with complex surface conditions, surroundings such as trees and crops covering the two sides of the road, and various water stains and shadows smearing the surface, make acquiring information on field roads from original images extremely difficult. Therefore, multistage processing of the original images is required to distinguish field roads from their surroundings. First, the V component in the HSV color space is separated for Otsu threshold segmentation and postprocessing, and the obvious road area and nonroad area are obtained. Then, by selecting appropriate parameters, the S and V components are each subjected to a point calculation and then weighted and merged according to different weights to extract the shadow features. Finally, the shadow area and the nonshadow road area are combined and postprocessed again to obtain the complete road area as a binary image.

Segmentation.
Hundreds of field road images were captured in Chongqing, China, a typical hilly area. The Otsu threshold segmentation results for these images in the RGB, Lab, HSV, and HSI color spaces were compared. The results showed that the V component in the HSV color space adapts better to the influence of water stains and weeds on roads, while the S component is insensitive to shadows on the road. Therefore, Otsu threshold segmentation based on the V component in the HSV color space was adopted to detect the road area. Because the target of image segmentation is the road region, which is relatively large, and there is no detailed requirement for small parts, morphological operations and connected-region area filtering are introduced to segment the road from its surroundings.
In the morphological operations, opening and majority operations (size 3 × 3) are applied to remove insignificant small patches and spurious pixels from the binary image. Then, connected domains are labeled with the 4-adjacent seed-filling method, and their areas are calculated. The contours of connected domains with small areas are discarded. The contour curves of connected domains with larger areas are redrawn with the polygon fitting method. Then, the obvious road area and nonroad area are obtained.
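The labeling and area-filtering steps above can be sketched as follows. This is a minimal illustration rather than the authors' code: it labels 4-adjacent connected regions by seed filling (breadth-first search) and discards regions below an area threshold; the 3 × 3 opening/majority operations and the polygon refitting are omitted.

```python
import numpy as np
from collections import deque

def label_regions_4adj(binary):
    """Label 4-adjacent connected regions via seed filling (BFS).
    Returns (label image, number of regions)."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for i in range(h):
        for j in range(w):
            if binary[i, j] and labels[i, j] == 0:
                current += 1            # new seed -> new region label
                labels[i, j] = current
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            q.append((ny, nx))
    return labels, current

def drop_small_regions(binary, min_area):
    """Keep only connected regions whose pixel count reaches min_area."""
    labels, n = label_regions_4adj(binary)
    out = np.zeros_like(binary)
    for k in range(1, n + 1):
        mask = labels == k
        if mask.sum() >= min_area:
            out |= mask
    return out
```

A usage example: a 3 × 3 road blob survives `drop_small_regions(img, 4)`, while an isolated noise pixel is removed.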

Shadow Processing.
Usually, crops or trees along both sides of the road will cast shadows on the surface with various shapes during different periods, which will hinder the road from being distinguished. The V component in the HSV color space has good adaptability to recognize the road areas inside the image but is not effective in identifying the shadows that are often classified as a part of the background. This inability directly affects the integrity of the road information; thus, recovery of the road area with shadows is particularly important. In this paper, the characteristics of the S component are utilized because the S component in the HSV color space is not sensitive to shadows. By selecting appropriate parameters, the S and V components are each subjected to point calculation and then weighted and merged according to different weights to extract the shadow features.
Image display effects can be changed by a point calculation. Define A(x, y) as the input image and B(x, y) as the output image; the point calculation is then

B(x, y) = k · A(x, y) + b,  (1)

where k is the coefficient, b is the intercept, and (x, y) are the pixel coordinates. This paper chooses the straightforward Weighted Averaging (WA) method to fuse the S and V components. Although this method weakens the details of the image to a certain extent, it is easy to implement, fast, and can improve the signal-to-noise ratio of the fused image. Let the image of the V component after the point operation be src1, the image of the S component after the point operation be src2, and the weighted and fused image be dst; then, the mathematical relationship between the images is

dst(I) = α · src1(I) + β · src2(I),  (2)

where I is the index value of the multidimensional array element, α is the weight of the src1 matrix elements, and β is the weight of the src2 matrix elements. To better choose k for the S and V component point operations and α and β for the weighted fusion, Table 1 was designed to perform point operations and weighted fusion under different k, α, and β. The threshold segmentation results were then evaluated and scored on a scale of 1 to 10, where 1 indicates the worst effect and 10 indicates the best effect. The appropriate k, α, and β values were selected by analyzing and comparing the processing results.
After a large number of experiments, the V component point operation coefficient k₁ = 0.5, the S component point operation coefficient k₂ = 8, the weight of the src1 matrix elements α = 0.5, and the weight of the src2 matrix elements β = 0.5 were finally selected for shadow processing of the road. The road shadow detection results are shown in Figure 2.
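The point operation and weighted fusion can be written compactly with NumPy. The function names below are illustrative (the paper's implementation presumably uses OpenCV primitives such as `addWeighted`), and the selected parameters k₁ = 0.5, k₂ = 8, α = β = 0.5 are taken from the text:

```python
import numpy as np

def point_op(img, k, b=0.0):
    """Point calculation B(x, y) = k * A(x, y) + b, clipped to the 8-bit range."""
    return np.clip(k * img.astype(np.float64) + b, 0, 255)

def weighted_fuse(src1, src2, alpha, beta):
    """Weighted averaging fusion: dst(I) = alpha * src1(I) + beta * src2(I)."""
    return np.clip(alpha * src1 + beta * src2, 0, 255)

def shadow_feature(v_comp, s_comp):
    """Shadow-feature image from the V and S components with the
    parameters selected in the paper."""
    src1 = point_op(v_comp, k=0.5)   # V component, k1 = 0.5
    src2 = point_op(s_comp, k=8.0)   # S component, k2 = 8
    return weighted_fuse(src1, src2, alpha=0.5, beta=0.5)
```

The fused image is then thresholded to extract the shadow area, which is later merged with the V-segmented road area.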

It can be seen from Figure 2 that for the road shadows of different depths and areas, the weighted fusion shadow processing algorithm can extract the road shadows effectively and accurately and obtain a complete shadow area. In addition, the algorithm is simple to use without particular limitations on the scene composition of the original image, and thus has a broad application scope.

Image Merging. At this stage, the road area segmented by the V component and the area recovered from shadow recognition are merged through logical integration operations and morphological operations. Then, the complete road area is distinguished from the nonroad area and presented as a binary image.
The results of the shadow recognition and image merging are shown in Figure 3.

Extraction of Statistical Features.
Rural field roads have no obvious features and exhibit little variation in gray value. Under such conditions, the centroids of the road area are extracted as the road's statistical feature points and used as the stereo matching primitives in this paper. Moreover, these centroids are smoothed by the LSM to eliminate disturbance factors in road area recognition.
The binocular camera used in this study was mounted on the front of a field transportation machine. Two digital cameras captured the left and right road images, and the road area A was obtained from the binary image. The formulas for calculating the coordinates (x_i, y_i) of the centroid of area A are as follows:

x_i = (1/n) Σ_{(x,y)∈A} x,  y_i = (1/n) Σ_{(x,y)∈A} y,  (3)

where n is the total number of pixels in area A, and x and y are the pixel coordinates.
The extracted centroids are shown on the original RGB image in Figure 4(a).
As shown in Figure 4(a), the centroids of the extracted road area can accurately express the direction and, to some extent, the midline of the road if being connected continuously. However, due to the influence of irregular factors, such as weeds and water stains, the distinguished field road area may be inaccurate, thus causing the road area centroids extracted through the above method to deviate from the actual centerline. Since the path of the actual field road is continuous, the line connected continuously by all centroids should be smooth. Therefore, the extracted centroids are smoothed through the following phases: (i) least-squares curve fitting for the centroid points, (ii) obtaining the fitting function, and (iii) recalculating the new abscissa value corresponding to the original ordinate of each centroid using the fitting function. This method can ensure the continuity of the path and eliminate the impact of incorrect path information. Figure 4(b) shows the reacquired centroids of the original images in Figure 4(a). Then, the reacquired centroids are taken as the statistical feature points of the road areas.
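The centroid extraction and least-squares smoothing described above can be sketched as follows. This assumes centroids are taken row by row over the binary road mask and smoothed with a quadratic least-squares fit; the paper does not fully specify the partitioning or the fit order, so both are illustrative choices:

```python
import numpy as np

def row_centroids(road_mask):
    """Centroid abscissa of the road pixels in each image row:
    x_i = mean of x over the road pixels of that row. Returns (N, 2)
    points as (x, y)."""
    pts = []
    for y in range(road_mask.shape[0]):
        xs = np.flatnonzero(road_mask[y])
        if xs.size:
            pts.append((xs.mean(), float(y)))
    return np.asarray(pts)

def smooth_centroids(pts, deg=2):
    """Least-squares curve fit x = f(y), then recompute the new abscissa
    for the original ordinate of each centroid using the fitted function."""
    coeff = np.polyfit(pts[:, 1], pts[:, 0], deg)
    return np.column_stack([np.polyval(coeff, pts[:, 1]), pts[:, 1]])
```

For a perfectly straight road mask, the smoothed centroids coincide with the raw ones; when a row is corrupted by weeds or water stains, the fit pulls its centroid back toward the continuous path.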

Characteristics Matching.
Stereo matching is a critical step in the three-dimensional navigation information extraction of field roads. Based on the image preprocessing, five sequential processes are carried out for characteristic matching: (i) the left image is processed first to extract and smooth the centroid points of the road area; (ii) SURF is used to automatically match multiple sets of corresponding points in the left and right images to find the homography matrix; (iii) the homography matrix and the centroid points extracted from the left image are used to find the corresponding points in the right image; (iv) the epipolar constraint test is performed; and (v) the obtained pairs of corresponding centroid points in the right and left images are used to perform 3D reconstruction.
For a binocular camera, each actual centroid of the road corresponds to two related pixels, namely, one inside the left image and the other inside the right image. The relationship between the two pixels is described by the homography matrix. The matching relationship of the road area centroids in the left image and those in the right image can be obtained by solving the homography matrix.
Suppose that p = (u, v, 1)^T is the homogeneous coordinate of a 3D point P in an image, and p′ = (x, y, 1)^T is the homogeneous coordinate of the corresponding point of P in the matching image. The transformation from point p to its corresponding point p′ can be obtained through the homography matrix H [28,29]:

p′ = Hp.  (4)

The homography matrix H describes the transformation relationships of an actual point in two images, namely, translation, rotation, and scaling. To obtain the relationships between the statistical features of two images of a field road more precisely, Speeded-Up Robust Features (SURF) detection based on the RANSAC algorithm [30] is used to match the corresponding feature points. The homography matrix H is then calculated by finding the relationships between multiple pairs of matching points. The procedure is as follows: (a) SURF feature points are detected in the left and right images; (b) a descriptor is computed for each feature point; (c) the feature points are matched by the Euclidean distance and the Hessian matrix trace between two feature points, and the RANSAC algorithm is used to remove pseudomatched points to ensure the effectiveness of the match. The matching results of an image in Figure 3(b) with its corresponding image captured by the other camera of the binocular system are shown in Figure 5.
(d) The homography matrix H is calculated using the findHomography function in the OpenCV visual library.
(e) The unique matching point in the right image corresponding to each statistical feature point of the road area (reacquisition centroid) in the left image is calculated according to Equation (4). The matching results of the road centroids in Figure 4(b) are shown in Figure 4(c).
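Step (e) reduces to mapping each left-image centroid through the homography. In practice `H` would come from OpenCV's `findHomography`, as the paper states, but the mapping itself is plain linear algebra and can be sketched as:

```python
import numpy as np

def map_points_homography(H, pts):
    """Map (N, 2) pixel points through p' ~ H p: lift to homogeneous
    coordinates, multiply by H, then dehomogenize by the third component."""
    hom = np.column_stack([pts, np.ones(len(pts))])
    mapped = hom @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Each smoothed left-image centroid thus yields a unique candidate matching point in the right image, which is then validated by the epipolar constraint.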
Usually, a homography is estimated between points that belong to the same plane. This paper uses the SURF algorithm for feature matching and the RANSAC method to remove mismatched points over the whole image plane based on the following considerations. (1) At present, China's hilly field roads have basically been hardened with cement, and the differences in gray scale and texture of the road surface are small, with no obvious structural features. If the homography were limited to the road plane, the matching accuracy might be reduced. (2) It is difficult to recognize and mark the boundaries of field roads because the boundaries are nebulous. If the acquisition of the homography matrix were limited to the road plane, additional image processing, such as dividing the road boundaries, would be required, and the image processing time would increase.

Validation of Matching Pairs.
In an unknown environment, the disturbances are complex and changeable, and a single constraint may not accurately match feature points. Therefore, the epipolar constraint is introduced to further validate the matching pairs. The epipolar constraint restricts a point in one image to a line in the other image, thus reducing the search for the corresponding matching point from the entire image to a line [31,32]. Figure 6 shows two pinhole cameras, their projection centers C_l and C_r, and image planes I_l and I_r. The vectors p_l and p_r refer to the projections of P onto the left and right image planes, respectively, and are expressed in the corresponding reference frames. The line C_lC_r connecting the projection centers of the two cameras is the baseline. The plane defined by P, C_l, and C_r is called the epipolar plane π. The intersection points e_l and e_r of the baseline with the two image planes are the epipoles. The intersection lines of the epipolar plane with the two image planes, through e_l, p_l and e_r, p_r, are the epipolar lines, defined as l_l and l_r, respectively.
Consider the triplet P, p_l, and p_r: if p_l is given, P can lie anywhere on the ray from C_l through p_l. However, since the image of this ray in the right image is the epipolar line through the corresponding point p_r, the correct match must lie on the epipolar line [32]. Lines l_l and l_r are called a pair of epipolar lines and constitute the epipolar constraint on the matching points. The epipolar constraint between two images can be described by the fundamental matrix F.
Let p_l and p_r be expressed in pixel coordinates. According to epipolar geometry, for point p_l on the left image, the corresponding epipolar line on the right image can be expressed as follows:

l_r = F p_l.  (5)

Correspondingly, for point p_r on the right image, the corresponding epipolar line on the left image can be expressed as follows:

l_l = F^T p_r.  (6)

If the point corresponding to p_l in the left image is p_r in the right image, point p_r must lie on line l_r and satisfy the following condition:

p_r^T F p_l = 0.  (7)

The key to obtaining the epipolar lines is the calculation of the fundamental matrix F. The fundamental matrix is a 3 × 3 matrix that represents the correspondence between the matching points and encodes the camera's internal and external parameters. The matrix forms the foundation for the camera's matching, tracking, and three-dimensional reconstruction.

Suppose that (u_l, v_l) and (u_r, v_r) are the coordinates of p_l and p_r, respectively, which can be written as (u_l, v_l, 1) and (u_r, v_r, 1) in homogeneous form. Then, according to equation (7), we have

(u_r, v_r, 1) F (u_l, v_l, 1)^T = 0.  (8)

Rewriting the elements of the fundamental matrix F as a column vector f = (f_11, f_12, f_13, f_21, …, f_33)^T, equation (8) becomes

(u_r u_l, u_r v_l, u_r, v_r u_l, v_r v_l, v_r, u_l, v_l, 1) f = 0.  (9)

Let A be the coefficient matrix of equation (9) stacked over all matching pairs; then,

A f = 0.  (10)
Thus, the fundamental matrix F can be obtained through the eight-point algorithm [32] based on equation (10). By utilizing the multiple correspondence points obtained from the SURF detection based on RANSAC, the fundamental matrix F is obtained through the findFundamentalMat function in the OpenCV visual library. Then, according to equation (5), the corresponding epipolar line in the right image of any point in the left image is obtained, and the search range of its matching point is reduced to a line.
After obtaining the epipolar line, an additional step extends the unique matching point obtained by the homography matrix processing to a rectangle and then estimates the positional relationship between the rectangle and the epipolar line. As shown in Figure 7, if the epipolar line intersects the rectangle, the matching point is retained; if it does not intersect, the point is eliminated due to its larger matching error. In this way, the matching pairs obtained by the homography matrix processing are validated through the epipolar constraint.
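The rectangle-versus-epipolar-line test above can be sketched as follows. The rectangle half-size is an assumption not quantified in the paper; the intersection test uses the standard observation that a line crosses an axis-aligned rectangle exactly when the four corners do not all lie strictly on the same side of it:

```python
import numpy as np

def epipolar_line(F, p_left):
    """l_r = F p_l: returns line coefficients (a, b, c) with a*u + b*v + c = 0."""
    return F @ np.array([p_left[0], p_left[1], 1.0])

def line_intersects_rect(line, rect):
    """True if the line a*u + b*v + c = 0 crosses the axis-aligned rectangle
    rect = (u_min, v_min, u_max, v_max). The line misses the rectangle only
    when all four corners give the same strict sign of a*u + b*v + c."""
    a, b, c = line
    u0, v0, u1, v1 = rect
    signs = [a * u + b * v + c for u, v in
             ((u0, v0), (u0, v1), (u1, v0), (u1, v1))]
    return min(signs) <= 0.0 <= max(signs)

def validate_match(F, p_left, p_right, half=3.0):
    """Keep a homography-predicted match only if the epipolar line of the
    left point intersects a (2*half)-pixel square around the right point."""
    u, v = p_right
    rect = (u - half, v - half, u + half, v + half)
    return line_intersects_rect(epipolar_line(F, p_left), rect)
```

Matches whose rectangle misses the epipolar line are discarded before 3D reconstruction.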
After homography matrix processing and epipolar line validation, the matching results of the field road's statistical feature points of images in Figure 4(c) are as shown in Figure 8.
The matching results for the images in Figure 8 are evaluated by the matching error and the running time of the program. The matching error includes two parts: (i) the horizontal matching error, the ratio of the pixel difference to the total number of pixels in the horizontal direction, and (ii) the vertical matching error, the ratio of the pixel difference to the total number of pixels in the vertical direction. The pixel difference is defined as the difference between the matching point and its precise position. The evaluation results are shown in Table 2. Figure 8 and Table 2 show that the proposed matching method based on the homography matrix and epipolar constraint has good matching accuracy, a good matching effect, and a fast matching speed owing to the small number of matching primitives. Furthermore, processing results for additional images demonstrate that the method performs well in suppressing noise, resists interference, and is robust to image transformations.

3D Reconstruction.
Most roads in hilly regions fluctuate due to the rugged terrain. The 3D information of the navigation line not only provides the changes in the direction of a field road but also offers gradient variation, which has a considerable influence on the control of an autonomous transportation machine. According to the principle of binocular vision [33], the three-dimensional coordinate information of the road's statistical features can be extracted through the LSM processing of the intrinsic parameters and the extrinsic parameters obtained by calibration and the coordinates of the statistical feature points of the road obtained from stereo matching. This process is also called 3D reconstruction of the binocular vision [34]. As previously described, the pixel positions in the right and left cameras of point P are p l and p r , which can be obtained through characteristic matching. The projection matrixes for the right and left cameras, namely M 1 and M 2 , can be achieved by camera calibration.
The relation between the pixel coordinates and world coordinates for the left camera image can be expressed as follows:

Z_c1 (u_1, v_1, 1)^T = M_1 (X_w, Y_w, Z_w, 1)^T.  (11)

Similarly, the corresponding relationship for the right camera image can be expressed as follows:

Z_c2 (u_2, v_2, 1)^T = M_2 (X_w, Y_w, Z_w, 1)^T,  (12)

where Z_c1 and Z_c2 are the coordinate values of P on the two respective optical axes; (u_1, v_1, 1) and (u_2, v_2, 1) are the homogeneous coordinates of p_l and p_r in the image reference frame, respectively; (X_w, Y_w, Z_w, 1) is the homogeneous coordinate of P in the world reference frame; and m^k_ij (k = 1, 2; i = 1, 2, 3; j = 1, …, 4) is the ith-row, jth-column element of M_k.
From equations (11) and (12), four linear equations in X_w, Y_w, and Z_w are obtained:

(u_1 m^1_31 − m^1_11) X_w + (u_1 m^1_32 − m^1_12) Y_w + (u_1 m^1_33 − m^1_13) Z_w = m^1_14 − u_1 m^1_34,
(v_1 m^1_31 − m^1_21) X_w + (v_1 m^1_32 − m^1_22) Y_w + (v_1 m^1_33 − m^1_23) Z_w = m^1_24 − v_1 m^1_34,
(u_2 m^2_31 − m^2_11) X_w + (u_2 m^2_32 − m^2_12) Y_w + (u_2 m^2_33 − m^2_13) Z_w = m^2_14 − u_2 m^2_34,
(v_2 m^2_31 − m^2_21) X_w + (v_2 m^2_32 − m^2_22) Y_w + (v_2 m^2_33 − m^2_23) Z_w = m^2_24 − v_2 m^2_34.  (13)

The coordinate values of P can be obtained from equation (13) because the 3D point P is the intersection of C_r p_r and C_l p_l (see Figure 6). To reduce the influence of data noise, the LSM is applied. Equation (13) can be written in matrix form as K W = U, where W = (X_w, Y_w, Z_w)^T, K is the 4 × 3 coefficient matrix, and U is the right-hand-side vector. According to the LSM, the following equation can be obtained:

W = (K^T K)^{-1} K^T U.  (17)

According to equation (17), the 3D coordinates of the extracted road area centroids can be solved. Thus, the line that continuously connects all the extracted centroids can be used as the navigation line of the field road.
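The least-squares triangulation of equations (13) and (17) can be sketched directly in NumPy; `np.linalg.lstsq` computes the same minimizer as (K^T K)^{-1} K^T U but more stably:

```python
import numpy as np

def triangulate_lsm(M1, M2, p1, p2):
    """Solve the four linear equations built from the two 3x4 projection
    matrices M1, M2 and the matched pixels p1 = (u1, v1), p2 = (u2, v2)
    for W = (Xw, Yw, Zw) in the least-squares sense."""
    (u1, v1), (u2, v2) = p1, p2
    K = np.array([
        u1 * M1[2, :3] - M1[0, :3],
        v1 * M1[2, :3] - M1[1, :3],
        u2 * M2[2, :3] - M2[0, :3],
        v2 * M2[2, :3] - M2[1, :3],
    ])
    U = np.array([
        M1[0, 3] - u1 * M1[2, 3],
        M1[1, 3] - v1 * M1[2, 3],
        M2[0, 3] - u2 * M2[2, 3],
        M2[1, 3] - v2 * M2[2, 3],
    ])
    W, *_ = np.linalg.lstsq(K, U, rcond=None)
    return W
```

With the calibrated projection matrices and each validated centroid pair, this yields the 3D centroid coordinates that form the navigation line.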
Furthermore, the slope gradient of the road can be calculated using the 3D coordinate information of the extracted road area centroids. The three-dimensional information provides the vehicle with the slope change of the road, which has a great influence on the vehicle control of the carrier. Figure 9 shows a slope model in a vehicle reference frame. If calculated along the column indicated by the dotted line p_1p_2 on the image, the obtained relative variation of the coordinates is the slope along the intersection of the surface O_vP_1P_2 and the ground, which represents the fluctuation of the road ahead of the vehicle.
From the geometric relationship, the slope component along the Y_v direction is

ΔZ/ΔY,  (18)

and the slope component along the X_v direction is

ΔZ/ΔX.  (19)

The horizontal distance between the two spatial points P_1 and P_2 is √(ΔX² + ΔY²), so the slope can be calculated as

slope = ΔZ / √(ΔX² + ΔY²),  (20)

where ΔX, ΔY, and ΔZ are the coordinate differences between two road centroids in the vehicle reference frame, in which Y_v is the direction of motion.
Using the three-dimensional coordinates of the road's centroid point, the fluctuations of the road can be clearly obtained, providing data support for the subsequent vehicle control of the carrier. Figure 10 shows the 3D coordinates of the road area centroids and their connecting line extracted from the images in Figure 4. Table 3 shows the calculated slope gradient of the roads in Figure 4 and its error with the actual slope gradient of the road.
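The slope computation of equation (20) reduces to a few lines; the percentage form shown here is an assumption consistent with Table 3 reporting a slope-gradient error below 10%:

```python
import numpy as np

def slope_between(P1, P2):
    """Slope (tangent of the slope angle) between two road centroids in
    the vehicle frame, where Y_v is the direction of motion and Z is the
    vertical axis: slope = dZ / sqrt(dX^2 + dY^2)."""
    dX, dY, dZ = np.asarray(P2, dtype=float) - np.asarray(P1, dtype=float)
    return dZ / np.hypot(dX, dY)

def slope_percent(P1, P2):
    """Slope gradient expressed as a percentage."""
    return 100.0 * slope_between(P1, P2)
```

Evaluating this along consecutive centroids of the reconstructed navigation line gives the gradient profile of the road ahead.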
As shown in Figure 10 and Table 3, the three-dimensional coordinates of the extracted road area centroids can clearly describe the fluctuations of the road, with the reconstruction error of the slope gradient remaining below 10%. In fact, many factors affect the accuracy of 3D reconstruction, including the accuracy of the intrinsic parameters of the camera, changes in the calibration environment, the collocation and position of the cameras, the three-dimensional model of the camera, the matching accuracy, and the target size [35].

Methods.
To verify the feasibility and accuracy of the proposed navigation line extraction method, an autonomous field road carrier with a binocular vision navigation system was built, as shown in Figure 11. The binocular device was mounted on the carrier, inclined to 25° with respect to the ground and without lateral displacement. This device was equipped with two 640 × 480 pixel cameras with a center spacing of 62 mm. The images were processed with OpenCV in the Microsoft Visual Studio (2010) integrated development environment (IDE). A high-accuracy real-time kinematic global positioning system (RTK-GPS), which included a fixed base station and a rover on the carrier to reduce the carrier's position error, was used to collect real-time location coordinates; its positioning accuracy is 2 cm. The images were obtained with the two cameras of the binocular vision system. A personal computer (PC) was used for image processing, road feature extraction, stereo matching, and 3D reconstruction. On this basis, the extracted 3D navigation line of the road was applied as the reference for path tracking while the carrier automatically drove on the field road. Using a USB2UIS adapter board, the PC sent the navigation information to the carrier controller by RS-232 serial communication. The carrier controller directly controlled the steering servo motor and the drive motor of the carrier to realize automatic driving. A fuzzy neural network control algorithm was adopted to realize path tracking [36]. During visual navigation driving, an image frame was captured every 0.2 s and the navigation line was extracted. Based on the navigation line, the lateral deviation, heading deviation, and path curvature were calculated and taken as the input parameters of the neural network controller. The output parameter of the controller was the turning angle of the carrier.
Figure 12 shows a flowchart of the entire process.
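The per-frame computation of the three controller inputs can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vehicle-frame coordinate convention, the use of the line endpoints for the heading estimate, and the three-point (Menger) curvature estimate are all simplifying assumptions introduced here.

```python
import math

def navigation_inputs(nav_line, vehicle_heading=0.0):
    """Derive the three controller inputs from an extracted navigation line.

    nav_line: (x, y) midline points in the vehicle frame (x forward, y left).
    Returns (lateral deviation, heading deviation, path curvature)."""
    # Lateral deviation: signed y-offset of the line point nearest the vehicle
    # (positive = road midline lies to the vehicle's left).
    nearest = min(nav_line, key=lambda p: math.hypot(p[0], p[1]))
    lateral_dev = nearest[1]

    # Heading deviation: angle between the vehicle axis and the overall
    # direction of the navigation line.
    (x0, y0), (xn, yn) = nav_line[0], nav_line[-1]
    heading_dev = math.atan2(yn - y0, xn - x0) - vehicle_heading

    # Path curvature: Menger curvature of the first, middle, and last points
    # (4 * triangle area divided by the product of the side lengths).
    a, b, c = nav_line[0], nav_line[len(nav_line) // 2], nav_line[-1]
    area2 = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
    sides = (math.hypot(b[0] - a[0], b[1] - a[1]) *
             math.hypot(c[0] - b[0], c[1] - b[1]) *
             math.hypot(c[0] - a[0], c[1] - a[1]))
    curvature = 2.0 * area2 / sides if sides > 0 else 0.0

    return lateral_dev, heading_dev, curvature
```

In the loop described above, these three values would be computed from each frame captured every 0.2 s, fed to the fuzzy neural network controller, and the resulting turning angle sent to the carrier controller over the RS-232 link.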
To study the deviation between the carrier's automatic travel trajectory and the actual midline of the road under the various conditions found in hilly regions, three types of field roads, namely, straight, complex multicurvature, and fluctuating roads, were selected as test roads, as shown in Figure 13. The carrier was driven twice on each test road.
In the first instance, the carrier was driven accurately along the actual midline of the road by manual operation, and the trajectory and coordinate values of the carrier were measured by the RTK-GPS; these values were taken as the midline of the road. To do so, we first marked the actual midpoint every 10 cm on the road, using a ruler to measure the road width; we then drove the carrier along these midpoints at a low speed of 2 m/s. A rod was installed in front of the carrier head to mark the central position of the carrier. As the carrier drove, the steering was controlled precisely so that the marking rod passed through the midpoints of the road. In this way, the carrier was driven along the middle of the road with only minor deviations, so the collected coordinates could be taken as the ground truth and used to validate the visual navigation approach.
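The midpoints recorded every 10 cm by the RTK-GPS form an irregular polyline; a uniformly spaced reference midline can be recovered from it by linear interpolation along the cumulative arc length. A minimal sketch, assuming planar (x, y) coordinates and distinct consecutive points (the function name is a hypothetical stand-in, not part of the paper's software):

```python
import math

def resample_midline(points, spacing=0.10):
    """Resample a recorded midline polyline to evenly spaced points
    (default 0.10 m, matching the 10 cm marking interval)."""
    # Cumulative arc length at each recorded point.
    s = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        s.append(s[-1] + math.hypot(x1 - x0, y1 - y0))

    out, target, i = [points[0]], spacing, 0
    while target <= s[-1]:
        # Advance to the segment containing the target arc length.
        while s[i + 1] < target:
            i += 1
        t = (target - s[i]) / (s[i + 1] - s[i])  # interpolation factor in [0, 1]
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
        target += spacing
    return out
```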
In the second instance, the carrier drove automatically along the road under the guidance of the extracted 3D navigation line, and the deviations between the automatic travel trajectory and the actual road midline on the three test roads are shown in Figures 14-16, respectively. The deviation includes a left deviation and a right deviation: the left deviation value is positive, and the right deviation value is negative. Figure 14 shows that under the straight road condition, owing to the regular road conditions and the absence of other unfavorable factors, the automatic travel trajectory and the midline largely overlap, with a maximum deviation of 0.133 m and an average deviation of 0.031 m. This indicates that on a straight path, the carrier can drive automatically under the guidance of the extracted navigation line with only a small deviation from the midline of the road. Figure 15 shows that under the complex multicurvature road conditions, the maximum deviation between the automatic travel trajectory and the road midline is 0.195 m, and the average deviation is 0.069 m. Compared to those on the straight road, the deviations on the complicated multicurvature road increased due to the influence of unfavorable factors such as curves, shadows, and water stains, which disturb the extraction of the navigation line. Nevertheless, in the real test, the carrier kept running along the midline of the road, meeting the requirement for the carrier to drive automatically on the field road without leaving it.
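Deviation statistics like those above can be computed from the two logged RTK-GPS tracks. The sketch below is an illustration, not the paper's evaluation code; it assumes the reported average is the mean absolute deviation and uses the sign convention stated above (left positive, right negative):

```python
import math

def signed_deviation(p, midline):
    """Signed lateral deviation of trajectory point p from the road midline:
    positive for a left deviation, negative for a right deviation."""
    best = None
    for a, b in zip(midline, midline[1:]):
        (ax, ay), (bx, by), (px, py) = a, b, p
        dx, dy = bx - ax, by - ay
        seg2 = dx * dx + dy * dy
        if seg2 == 0:  # skip degenerate (duplicate) segments
            continue
        # Perpendicular foot of p on the segment, clamped to its endpoints.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
        cx, cy = ax + t * dx, ay + t * dy
        dist = math.hypot(px - cx, py - cy)
        # Cross-product sign: positive if p lies to the left of the segment.
        side = 1.0 if dx * (py - ay) - dy * (px - ax) >= 0 else -1.0
        if best is None or dist < abs(best):
            best = side * dist
    return best

def deviation_stats(trajectory, midline):
    """Mean absolute and maximum absolute deviation of a trajectory."""
    devs = [signed_deviation(p, midline) for p in trajectory]
    return (sum(abs(d) for d in devs) / len(devs),
            max(abs(d) for d in devs))
```

For example, a trajectory hovering 0.1 m left, 0.2 m right, and 0.0 m off a straight midline yields a mean deviation of 0.1 m and a maximum deviation of 0.2 m.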
The fluctuating road that was tested is composed of multiple complex segments, including straight and multicurvature sections with shadows, water stains covering the surface, and weeds or crops covering the two edges. Figure 16 shows that on this road, the maximum deviation between the automatic travel trajectory and the road midline is 0.216 m, and the average deviation is 0.105 m. Across the test results on the various roads, the main contributors to the deviation between the automatic travel trajectory and the midline include the following: (i) weeds or crops on the edges of the road that are classified as nonroad areas after image processing, which results in an extracted navigation line that differs from the actual midline of the road; (ii) real-time changes in the carrier posture, which lead to frequent changes in the extracted navigation line, resulting in a tracking deviation; (iii) the intrinsic error of the RTK-GPS, which is at least 2 cm; (iv) the measurement method of the actual midline of the road, which is imprecise under manual operation; (v) the measurement accuracy of the midline coordinates of the road and the real-time position coordinates of the carrier, which can be disturbed by jittering of the cameras and the carrier; and (vi) the error of the automatic steering control based on the extracted navigation line.
In fact, the intrinsic error of the RTK-GPS and the real-time position measurement accuracy cannot be eliminated, but they have no effect on the autonomous driving of the carrier itself. In addition, the test results show that the adopted fuzzy neural network control algorithm gives satisfactory automatic steering control [36], and the improvement obtainable by tuning the control parameters is small. The influences of carrier posture changes and camera jitter, which are related to the robustness of image capture, can be reduced by using image mosaics [37] and adopting an optimal cohesion algorithm for two adjacent images. Therefore, the extraction of the midline of field roads under various situations is the most critical factor responsible for deviations from the road midline during autonomous navigation driving.

Conclusions
This paper proposed a 3D autonomous navigation line extraction method for field roads in hilly regions based on a low-cost binocular vision system. A modified image processing method was presented to strengthen shadow identification. The centroid points of the road area were extracted as its statistical feature, smoothed, and then used as the geometric primitives for stereo matching. The epipolar constraint and homography matrix were applied for accurate matching and 3D reconstruction to obtain the autonomous navigation line for the field roads. Finally, an automatic driving test of a carrier in hilly regions was carried out to verify the proposed method. The experimental results indicate that the proposed 3D autonomous navigation line extraction method for field roads can realize road recognition and 3D coordinate information acquisition and can meet the requirements for a carrier to drive automatically on a field road. To some extent, this method can also be applied to the automatic driving of other agricultural machines on field roads.

Data Availability
The experimental result data used to support the findings of this study are included within the article, and the source code data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.