Monocular Vision Ranging and Camera Focal Length Calibration

The camera calibration in monocular vision represents the relationship between the pixel units obtained from a camera and the object in the real world. As an essential procedure, camera calibration recovers three-dimensional geometric information from captured two-dimensional images. Therefore, a modified camera calibration method based on polynomial regression is proposed to simplify the calibration process. In this method, a parameter vector is obtained by polynomial regression from the pixel coordinates of obstacles and the corresponding distance values. This parameter vector can then be used to measure the distance between the camera and a ground object in the field of vision under the given camera posture and position. The experimental results show that the lowest accuracy of this focal length calibration method for measurement is 97.09%, and the average accuracy is 99.02%.


Introduction
Measuring the distance between oneself and an obstacle is a crucial part of many fields. The class of methods that measures distance with cameras is called vision-based ranging, which may promote the development of automatic measurement and has great research value [1,2]. Vision-based ranging methods include monocular vision-based and stereo vision-based ranging methods. Stereo vision-based ranging methods use the parallax between cameras to measure distance, which requires matching multiple images taken by multiple cameras. Monocular vision-based ranging methods have higher performance than stereo vision-based methods because they do not need to match images in the data preprocessing stage [3].
Monocular vision-based ranging methods can be divided into three categories: proportion-based methods, machine learning-based methods, and coordinate transformation-based methods. Proportion-based methods rely on the principle that the distance to a target is inversely proportional to the target's size in the image plane [4,5]. Take the ranging model proposed by Bao and Wang [5]: this model first assumes the width of all vehicles to be a fixed value and then uses this value to learn the model parameters from images of vehicles at different distances. The ranging accuracy of this model is unsteady because, if the measured vehicle is not directly in front of the camera, the assumed width is no longer correct. Machine learning-based methods let the computer learn the parameters of the ranging model. For example, Meng et al. [6] proposed a ranging model based on the R-CNN [7] and nonlinear regression. This model first utilizes the R-CNN to detect the position of the preceding vehicle, and then the distance is obtained by a nonlinear regression model. The accuracy of the objective decision is more than 90%. Coordinate transformation-based methods have strong interpretability.
This type of method utilizes the relationship between world coordinates, camera coordinates, pixel coordinates, and the camera parameter matrix to calculate the distance between the camera and obstacles [8,9]. The method proposed by Lu et al. [9], which derives the ranging model from the principle of visual imaging, increased the ranging accuracy to 95.43%. Although the ranging accuracy of this model is higher than that of previous methods, camera calibration, a key technology in this model, still needs to be improved. High-precision camera calibration aims to determine a set of geometric parameters. It is a prerequisite for extracting 3D information from captured images using the object's projections in the image plane. Camera calibration methods can be divided into two categories: traditional calibration and self-calibration. Traditional calibration utilizes multiple reference points on an object to establish the relationship between the 2D pixel coordinates and the 3D world coordinates. Yakimovsky and Cunningham [10] in 1978 proposed a camera calibration method to calculate the transformation matrix of stereo cameras. They used a highly linear lens and ignored distortion, achieving an accuracy of 5 mm at a distance of 2 m. Because of the narrow field of view and the neglect of lens distortion, this method may cause larger errors in a wide field. Moreover, the unknown parameters computed by linear equations in traditional camera calibration may not be linearly independent, which increases the calibration error. Martins et al. [11] in 1981 proposed the biplane calibration method using points on a double calibration plane. This method avoids any restrictions on the extrinsic camera parameters. On the premise that there is no deflection in the coordinate system, the average error is about 4 mils at a distance of 25 inches. However, the nonlinear lens distortion is not corrected.
Tsai [12] proposed a novel camera calibration method that considers camera distortion. First, the camera parameters are solved using the direct linear transformation method or the perspective projection transformation matrix. Then, the obtained parameters are taken as initial values, and a nonlinear optimization method is used to improve the precision of the calibration.
This method requires high-precision calibration targets and a field of vision of similar size. Chen et al. [13] proposed a calibration method based on a bundle adjustment system, which reduces the requirements on calibration targets. The world coordinate system is kept relatively stationary with respect to the steam hammer to decrease the calibration error. Experiments show that the calibration system measures the ram speed of a steam hammer accurately.
Traditional camera calibration depends strongly on calibration targets and places high requirements on their precision. It is difficult to transfer to unknown scenes because it is usually limited to a specific field of vision and distance. Therefore, self-calibration was proposed for its higher operability: it obtains an image sequence by controlling the camera motion and then calculates the parameters by matching the image sequence. Luong and Faugeras [14] proposed a self-calibration method based on point correspondences and fundamental matrices. They used point correspondences between three images to estimate the perspective projection matrices and the parameters of the camera. In contrast to traditional camera calibration methods, this method dispenses with calibration targets of known 3D shape. It has the disadvantages of the high cost of polynomial computation and sensitivity to noise, since it relies on continuation work in the complex plane. On this basis, Zhang [15] proposed a calibration method positioned between the traditional calibration method and the self-calibration method. This method utilizes two-dimensional measurement information for the camera's calibration, which reduces the equipment requirements compared with traditional calibration. In this method, many images of a two-dimensional planar calibration target (checkerboard) are taken from different angles to detect the feature points and calculate the parameter matrix of the camera. Nonlinear refinement based on the maximum likelihood criterion then optimizes the calibration results. Although this method simplifies the calibration process compared with previous methods, it still places some requirements on the standardization of operation. For example, the images taken should avoid noise; otherwise, inaccurate corner extraction will increase the calibration error.
Dong and Isler [16] proposed a method for calibrating the external parameters of a camera that obtains point-to-plane constraints from two noncoplanar triangles.
This method reduces the dependence on the camera's initial state estimation. At the same time, it reduces the number of observations without reducing the accuracy of the calibration. Xu [17] proposed a mirror-based camera calibration method. This method reflects two groups of orthogonal phase-shifting sinusoidal patterns with a mirror to calculate the relationship between cameras from constraints between phases.
To sum up, monocular vision-based ranging methods based on coordinate transformation have higher accuracy than other methods. This type of method [18,19] offers not only model efficiency but also interpretability and stability. Such methods are generally based on the linear imaging model constructed from the pinhole imaging principle, which simplifies the derivation of the ranging model because the pinhole imaging principle directly constructs the geometric correspondence between world coordinates and pixel coordinates. At the same time, most vision-based ranging methods are used in the automatic measurement of robots or intelligent vehicles, where the camera is fixed in a certain attitude and many measurement targets are ground objects. Therefore, this study proposes a ranging model different from previous vision-based ranging methods, which measures the distance between the camera and a ground object when the camera has an inclination angle in three dimensions. Because the world coordinate system is two-dimensional, the computational complexity of the model is lower, and there are few requirements on the three-dimensional structure of the object.
As the focal length is the only internal camera parameter that needs to be calibrated in the model, the accuracy of focal length calibration directly affects the ranging accuracy. In the linear imaging model, the object is projected onto the image plane through a small hole according to the principle of straight-line propagation. In reality, however, the camera images through a convex lens, which introduces distortion and defocusing. Distortion leads to inconsistency between the theoretical imaging position and the actual imaging position, and defocusing means that the image at some positions is not presented exactly on the image plane. Previous references [20,21] proposed nonlinear camera calibration, which calibrates the focal length and distortion parameters and then restores the theoretical imaging position from the distortion parameters and the actual imaging position. Although radial distortion, centrifugal distortion, and thin prism distortion are considered in these methods, defocusing and other possible factors are not well accounted for. Therefore, the accuracy of the ranging model can still be improved.
In this study, all the imaging points are considered to be obtained by pinhole imaging, so there is no need to restore the pixel coordinates. The effects of distortion and defocusing are instead reflected in the value of the focal length. That is to say, each point at each position has its own corresponding focal length, which absorbs the various factors that may affect the imaging position. This study uses the actual distance and pixel coordinates to calculate the focal length corresponding to different pixel positions and then analyses the distribution of the focal length value with respect to the pixel coordinates. It is found that their distribution is not a simple linear one. Therefore, to improve the ranging accuracy, this study learns the distribution of the focal length over pixel positions by nonlinear (polynomial) regression. The results show that the method can effectively improve the accuracy of the ranging model.
In a word, this study focuses on a simple and high-accuracy focal length calibration method that improves the accuracy of the monocular vision-based ranging model. The core idea is to use a simple linear imaging model to reduce the complexity of the ranging model and then to fold the distortion and defocusing caused by the camera's nonlinear imaging into the focal length calibration process.
The main innovations of this study are as follows: (1) When the camera has an inclination angle in three dimensions, a ranging model for ground objects based on the linear imaging model and geometric coordinate transformation is proposed. (2) The distortion caused by convex lens imaging and the influence of defocusing are reflected in the focal length of the linear imaging model. (3) The calibration process requires no calibration target, and the camera does not need to move. (4) The focal length values containing the effects of distortion and defocusing are calculated by the ranging model, and their nonlinear distribution is learned by polynomial regression.

Monocular Vision-Based Model.
In the real scene, most obstacles, such as pedestrians or other vehicles, are on the ground. The model in this study focuses on ground feature points; that is, the 3D world coordinate system is reduced to a 2D coordinate system. The posture of the installed camera depends on the installation location. The camera has no inclination angle when it faces forward and its optical axis is parallel to the ground. The optical axis is defined as the line perpendicular to the image plane and passing through the optical center. Based on the pinhole camera model and the principle of straight-line propagation of light [22], the ranging model for ground objects without a camera inclination angle is shown in Figure 1.
As shown in Figure 1, O′ is the projection of the camera optical center O on the ground plane. The optical axis OO″ passes through the optical center O, is parallel to the ground plane, and is perpendicular to the image plane. The world coordinate system is established with O′ as the origin, the line through O′ parallel to the optical axis as the Y-axis, and the line through O′ perpendicular to the Y-axis as the X-axis. P is the measured point on the ground plane, P_Y is the projection of P on the Y-axis, P_X is the projection of P on the X-axis, and O″ is the intersection of the optical axis with the image plane. The height from the optical center O to the ground is H (the length of OO′). The image of the point P on the image plane is the point P′, and the physical coordinates of P′ on the image plane are P′(x, y). The length of OO″ is the focal length f, and the length of O′P, denoted d, is the distance to be calculated. In the real scene, the camera can have an inclination angle in three different dimensions; the details of the three dimensions are shown in Figure 2.
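The Figure 1 geometry can be sketched numerically as follows. This is an illustrative sketch, not the paper's code: the function name and input conventions are ours. With no inclination, similar triangles give O′P_Y = H·f/y and O′P_X = H·x/y, so d = H·sqrt(x^2 + f^2)/y.

```python
import math

def range_no_tilt(x, y, f, H):
    """Distance d = O'P for a camera whose optical axis is parallel to the
    ground (Figure 1 geometry).  x, y: physical image-plane coordinates of
    P' in mm (y > 0 for points below the image center); f: focal length in
    mm; H: camera height in mm.  Illustrative sketch only."""
    if y <= 0:
        raise ValueError("a ground point must image below the image center")
    Y = H * f / y            # longitudinal distance O'P_Y (similar triangles)
    X = H * x / y            # lateral distance O'P_X
    return math.hypot(X, Y)  # equals H * sqrt(x**2 + f**2) / y
```

Setting x = 0 recovers the familiar one-dimensional relation d = H·f/y.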
As shown in Figure 2(a), the field of vision of the camera changes after the camera rotates about the vertical axis, but the ranging model is the same as in the no-inclination case. As shown in Figure 2(b), the position of the optical center does not change, and the optical axis remains parallel to the ground. The difference from the no-inclination case is that the image plane rotates around its center within its own two-dimensional plane; it can be restored to the no-inclination state by a reverse rotation in that plane. The inclination angle obtained by this kind of rotation is called the left or right inclination angle. Since it does not require reconstructing the ranging model, the details of restoring a left-right inclination angle to the no-inclination state are described in the discussion. As shown in Figure 2(c), the optical axis of the camera is no longer parallel to the ground after rotation, and the ranging model must be reconstructed. The inclination angle obtained by this kind of rotation is called the up or down inclination angle. Taking the down inclination angle as an example, the ranging model is illustrated in Figure 3. The meanings of P, P′, O, O′, O″, P_X, P_Y, P_X′, and P_Y′ in Figure 3 are the same as in Figure 1. The difference between Figures 1 and 3 is that OO″ is now the optical axis with an inclination angle α, that is, the angle between the horizontal line and the optical axis is α. When the camera has a down inclination angle, α > 0, and when the camera has an up inclination angle, α < 0. M is the intersection point of P′P_X′ and the horizontal plane OMN, N is the intersection point of P_Y′O″ and the horizontal plane OMN, and the plane OMN is parallel to the ground plane. The optical axis OO″ is perpendicular to the image plane, the length of OO″ is the focal length f, and the length of O′P, denoted d, is the distance to be calculated.
MO is the projection of P′O on the plane OMN, O′P is the projection of OP on the ground plane, and the three points P, O, and P′ are collinear. Because the plane OMN is parallel to the ground plane, the angles between a line and its projections on two parallel planes are equal; therefore, the depression angle of OP with respect to the ground equals ∠P′OM. The optical axis OO″ is perpendicular to the image plane, and O″P_X′ lies in the image plane; by the perpendicularity of a line and a plane, a line perpendicular to a plane is perpendicular to every line in that plane, so OO″ ⊥ O″P_X′. Following Pythagoras' theorem,

OP_X′ = sqrt(f^2 + x^2). (8)

The plane OO″P_X′ passes through OO″ and is therefore also perpendicular to the image plane. Because the plane OMN is parallel to the ground and O″N and P_X′M both lie in the image plane, O″NMP_X′ is a parallelogram, and by the properties of a parallelogram, O″N = P_X′M and O″P_X′ = NM. OP_X′ lies in the plane OO″P_X′ and P′P_X′ lies in the image plane, so in ΔP′OP_X′, OP_X′ ⊥ P′P_X′ and

∠P′OP_X′ = arctan( y / sqrt(f^2 + x^2) ). (9)

Since OO′ is perpendicular to the ground plane with OO′ = H, and the optical axis makes the angle α with the horizontal plane OMN, combining these relations and applying the arctangent addition theorem yields the ranging equation

d = H · sqrt( x^2 + (f·cos α − y·sin α)^2 ) / ( f·sin α + y·cos α ). (10)

In equation (10), the coordinate (x, y) of the measured point P′ on the image plane is a physical coordinate, and its unit is millimeters. However, the coordinate (x0, y0) of the point P′ extracted directly from the image is a pixel coordinate.
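The derivation above can be checked numerically. The sketch below is our reading of the geometry, not the paper's code; it implements the ground-ranging relation for a camera pitched down by α, and setting α = 0 recovers the no-inclination model.

```python
import math

def range_with_tilt(x, y, f, H, alpha):
    """Distance d = O'P for a camera pitched down by alpha (radians).
    Implements the ranging relation (our reconstruction of equation (10)):
        d = H*sqrt(x^2 + (f*cos(a) - y*sin(a))^2) / (f*sin(a) + y*cos(a)).
    x, y: physical image-plane coordinates (mm); f: focal length (mm);
    H: camera height (mm)."""
    denom = f * math.sin(alpha) + y * math.cos(alpha)
    if denom <= 0:
        raise ValueError("the ray does not meet the ground ahead of the camera")
    return H * math.hypot(x, f * math.cos(alpha) - y * math.sin(alpha)) / denom
```

At α = 0 the expression reduces to H·sqrt(x^2 + f^2)/y, and at α = π/2 (looking straight down) it reduces to H·sqrt(x^2 + y^2)/f, both of which match the geometry.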
This pixel coordinate system is established with the upper left corner of the picture as its origin, and its unit is pixels. It is necessary to translate the pixel coordinate (x0, y0) into the physical coordinate (x, y) used in equation (10). The transformation formulas are shown in equation (11).
In equation (11), x and y denote the physical coordinates of point P′, x0 and y0 denote the pixel coordinates of point P′, and d_X and d_y denote the size of a pixel unit in the x and y directions. RR and CR denote the row resolution and the column resolution of the image plane:

x = (x0 − CR/2) · d_X, y = (y0 − RR/2) · d_y. (11)
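The pixel-to-physical translation can be sketched as follows. The centering convention (subtracting half the resolution to place the origin at the image center) is our assumption from the text, and the default pixel sizes are those reported later in the experiments.

```python
def pixel_to_physical(x0, y0, CR=1920, RR=1080, dX=0.0026, dY=0.0026):
    """Translate pixel coordinates (origin at the upper-left corner of the
    picture) into physical image-plane coordinates in mm (origin at the
    image center O'').  The exact centering convention is our assumption."""
    x = (x0 - CR / 2) * dX
    y = (y0 - RR / 2) * dY
    return x, y
```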

The Proposed Camera Calibration Model.
According to equation (10), the constraint equation of the focal length is shown in equation (12), where the distance d in equation (10) is replaced by the accurate distance D measured by the rangefinder:

D = H · sqrt( x^2 + (f·cos α − y·sin α)^2 ) / ( f·sin α + y·cos α ). (12)
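Because f appears nonlinearly in this constraint, it can be recovered numerically. A minimal sketch, assuming the ranging relation derived earlier (the function name, bracket values, and bisection approach are ours, not the paper's):

```python
import math

def solve_focal(D, x, y, H, alpha, f_lo=0.1, f_hi=100.0, tol=1e-9):
    """Solve the focal-length constraint (our reconstruction of equation
    (12)) for f by bisection.  D is the rangefinder distance; x, y are
    physical image-plane coordinates (mm); H is the camera height (mm);
    alpha is the down-inclination angle (radians)."""
    def err(f):
        denom = f * math.sin(alpha) + y * math.cos(alpha)
        return H * math.hypot(x, f * math.cos(alpha) - y * math.sin(alpha)) / denom - D
    lo, hi = f_lo, f_hi
    if err(lo) * err(hi) > 0:
        raise ValueError("no sign change in the bracket: widen [f_lo, f_hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if err(lo) * err(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

In this way, one focal length value is obtained per training sample, which is what the regression in the next subsection consumes.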

Scientific Programming
When the camera's height from the ground and the camera's inclination angle remain unchanged, the pixel coordinates of targets at different positions differ, and, according to the experimental data, so do the focal lengths calculated by equation (12). This phenomenon can be explained by the defocus phenomenon [23] of convex lens imaging. For a camera, the image plane is fixed according to the clearest image at the center; therefore, images away from the center of the image plane do not fall exactly at their theoretical positions, and points with different object distances correspond to different focal lengths. To obtain the focal length corresponding to different physical coordinates (x, y), this study utilizes polynomial regression to learn the relationship between the focal length and the physical coordinates. The physical coordinates are the independent variables, and the focal lengths are the dependent variables. The polynomial in the form of equation (13) and its coefficient vector V are obtained by polynomial regression.
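The polynomial surface fit can be sketched with ordinary least squares. We use the 12-term basis of degree 2 in x and degree 4 in y that the converged vector length reported later suggests (MATLAB's 'poly24' model); NumPy stands in for the MATLAB toolbox, and the function names are ours.

```python
import numpy as np

def fit_focal_surface(xs, ys, fs):
    """Least-squares fit of f(x, y) with a 12-term polynomial of degree 2
    in x and degree 4 in y.  Returns the coefficient vector V in the order
    (p00, p10, p01, p20, p11, p02, p21, p12, p03, p22, p13, p04)."""
    xs, ys, fs = map(np.asarray, (xs, ys, fs))
    A = np.column_stack([
        np.ones_like(xs), xs, ys, xs**2, xs * ys, ys**2,
        xs**2 * ys, xs * ys**2, ys**3, xs**2 * ys**2, xs * ys**3, ys**4,
    ])
    V, *_ = np.linalg.lstsq(A, fs, rcond=None)
    return V

def eval_focal_surface(V, x, y):
    """Evaluate the fitted focal-length surface at one point (x, y)."""
    terms = np.array([1, x, y, x**2, x * y, y**2,
                      x**2 * y, x * y**2, y**3, x**2 * y**2, x * y**3, y**4])
    return float(terms @ V)
```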

Error Compensation Algorithm.
To improve the ranging accuracy, we propose an algorithm that compensates for the ranging error produced by the feature-point coordinate extraction and the measurement of the camera inclination angle by adjusting the pixel coordinates (x0, y0). According to the experimental data, the generalization ability of the model is strongest and most stable when the ranging error threshold is set at 0.5%. The procedure is outlined in Algorithm 1, where E(d) is the difference between the calculated distance d and the accurate distance D.

Experiments
The equipment used in our experiment is shown in Figure 4. The camera resolution is 1920 × 1080 pixels. 100 targets are placed in the range from 2 to 20 meters, and the spatial location of the camera remains fixed. The accurate distances D between the projection of the optical center on the ground and the targets are measured by a laser rangefinder.
As shown in Figure 5, the pixel coordinates of the contact point between each target and the ground are extracted using MATLAB. From the 100 pieces of data, 14 pieces were randomly selected as the training dataset and 6 pieces as the test dataset. The focal length of the camera for the training dataset is calculated by equations (11) and (12). The regression vector V in equation (13) is obtained with the polynomial regression function in the curve fitting toolbox of MATLAB. The focal length for the test dataset is then calculated by equations (11) and (13), and equation (10) is used to calculate the distance d of the test dataset.

Results
The focal length calculation results for a training set are given in Table 1. H is the height of the camera, and α is the inclination angle of the camera. D is the distance between a target and the camera, measured by the laser rangefinder. (x, y) are the physical coordinates calculated according to equation (11), where CR = 1920, RR = 1080, and d_x = d_y = 0.0026 millimeters. Then, the focal length f is calculated according to equation (12). All variables in Table 1 are in millimeters (mm), except the angle α.
Using (x, y) and f in Table 1 for polynomial regression, the number of parameters in the regression vector is called the length of the vector. Before regression, the convergence value of the length of the regression vector V needs to be determined. The length of the vector V depends on the regression accuracy and the size of the dataset. When the regression accuracy is held constant and the dataset is large enough, the value and the length of the coefficient vector V should converge. With the regression accuracy fixed above 99.5%, Figure 6 shows the length of the vector V for different dataset sizes. As shown in Figure 6, when the size of the dataset is less than 70, the length of the vector V gradually increases with the size of the dataset. When the size of the dataset reaches 70, the length of the vector V stabilizes at 12; therefore, the length of the vector V converges to 12. The calibration equation of the focal length is shown in equation (14), which has 12 regression parameters. After the convergence value of the length of the regression vector V is determined, Figure 7 shows the parameters of the regression vector obtained by polynomial regression of the data in Table 1, and Figure 8 shows the regression model.
f and E(d) are recalculated based on equation (14) and the values in Table 1, and the results are given in Table 2, where f′ is the regression focal length calculated from Figure 7. E(d) before and E(d) after, respectively, represent the ranging error before and after using Algorithm 1. To verify the effectiveness of the proposed method, a number of experiments under different camera heights and inclination angles were conducted. The maximum and average ranging errors are given in Table 3. The maximum ranging error is 2.91%, and the maximum average ranging error is 0.98%.

ALGORITHM 1: Error compensation.
Repeat
    t ← the number of samples in the training set whose E(d) is greater than the training threshold
    Repeat
        y0′ ← y0 + 1 or y0′ ← y0 − 1; x0′ ← x0 + 1 or x0′ ← x0 − 1
    Until E(d) of all data in the training set is less than the threshold
    Use first-order linear regression to obtain expressions of the form y0′ = a1·y0 + b1, x0′ = a2·x0 + b2
Until t = 0
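Our reading of Algorithm 1 can be sketched as follows. The greedy ±1 search and the stopping policy are our interpretation of the pseudocode, and `ranging_error` is a hypothetical callback returning the relative error E(d) at given pixel coordinates; none of these names come from the paper.

```python
import numpy as np

def compensate(samples, ranging_error, threshold=0.005, max_shift=50):
    """Sketch of the error-compensation idea: nudge each training sample's
    pixel coordinates by +/-1 until its relative ranging error drops below
    the threshold, then summarize the adjustments with first-order linear
    regressions y0' = a1*y0 + b1 and x0' = a2*x0 + b2.
    samples: list of (x0, y0); ranging_error(x0, y0) -> E(d)."""
    adjusted = []
    for x0, y0 in samples:
        xa, ya = x0, y0
        for _ in range(max_shift):
            if ranging_error(xa, ya) < threshold:
                break
            # greedily try the four unit shifts and keep the best one
            cands = [(xa + 1, ya), (xa - 1, ya), (xa, ya + 1), (xa, ya - 1)]
            xa, ya = min(cands, key=lambda p: ranging_error(*p))
        adjusted.append((xa, ya))
    x0s = np.array([p[0] for p in samples], float)
    y0s = np.array([p[1] for p in samples], float)
    xas = np.array([p[0] for p in adjusted], float)
    yas = np.array([p[1] for p in adjusted], float)
    a2, b2 = np.polyfit(x0s, xas, 1)
    a1, b1 = np.polyfit(y0s, yas, 1)
    return (a1, b1), (a2, b2)
```

At test time, the fitted linear maps are applied to raw pixel coordinates before ranging.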

Ranging Model for the Camera with a Left or Right Inclination Angle.
Compared with the no-inclination case, when the camera has a left or right inclination angle as shown in Figure 2(b), it is equivalent to the pixel coordinate system rotating by the same angle in the same direction; equivalently, in the same pixel coordinate system, the imaging points rotate by the same angle in the opposite direction. The diagram of the coordinate transformation is shown in Figure 9. P1 is the image point in the fourth quadrant when the camera has a right inclination angle, and P is the corresponding image point restored to the no-inclination state, where |OP1| = |OP|. In ΔP1OP1_y, the direction angle of OP1 follows from the arctangent function; since |OP| = |OP1|, rotating OP1 back through the inclination angle in ΔPOW gives the restored coordinates, yielding equation (17). P(x0, y0) in equation (17) are the pixel coordinates of the measured point without a left or right inclination angle, as used in equation (11). When the pixel of the target lies in another quadrant, the formula of the coordinate transformation differs from that of the fourth quadrant, but the principle is the same.
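The restoration of a left/right (roll) inclination is a plane rotation of the image point about the principal point, which covers all quadrants at once. A minimal sketch: the sign convention for beta is our assumption, and the coordinates are taken relative to the image center rather than the top-left pixel origin.

```python
import math

def undo_roll(x1, y1, beta):
    """Rotate an image point (x1, y1), taken with a roll inclination of
    beta radians, back to the no-inclination coordinate system.  The
    rotation preserves |OP|, as required by the geometry.  Coordinates are
    relative to the image center; the sign of beta is our convention."""
    x0 = x1 * math.cos(beta) - y1 * math.sin(beta)
    y0 = x1 * math.sin(beta) + y1 * math.cos(beta)
    return x0, y0
```

Applying `undo_roll` with the opposite angle undoes the transformation, which is a quick self-consistency check.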

Robustness of the Vector V.
To study the robustness of the vector V, the height and the choice of camera were analyzed in this study. Table 4 provides the value of the vector V when the height of the camera is 229 millimeters and 239 millimeters. The calibration equation of the focal length is

f(x, y) = p00 + p10·x + p01·y + p20·x^2 + p11·x·y + p02·y^2 + p21·x^2·y + p12·x·y^2 + p03·y^3 + p22·x^2·y^2 + p13·x·y^3 + p04·y^4. (14)

Table 5 provides the value of the vector V for camera A and camera B at the same height. As given in Tables 4 and 5, both the height and the choice of camera affect the value of the regression vector V. Therefore, in application, the same regression vector is only suitable for the same height and the same camera.

Comparison with Other Methods.
To verify that the calibration method proposed in this study is suitable for the monocular vision ranging model and can effectively compensate for distortion and defocus, it is compared with the methods of Zhang [15] and Li [20] in calibrating the same camera; the calibration results are given in Table 6. Table 6 provides the intrinsic matrices calibrated by the Zhang and Li methods, which contain the focal length information of the camera. The radial distortion and the tangential distortion in Table 6 represent the distortion information of the camera, and the pixel coordinates can be corrected with these distortion parameters. The focal length (regression model) in Table 6 is the focal length regression model obtained in this study, where x and y are the pixel coordinates.
In order to verify the performance of the three calibration methods in the monocular vision ranging model, a large number of measured points were randomly distributed on the ground; their distribution on the image plane is shown in Figure 10. These measured points cover all areas of the picture, and four groups of test sets, with 30 feature points in each group, were randomly selected. The Zhang and Li methods use their distortion parameters to correct the pixel coordinates before ranging. The focal lengths obtained by the three calibration methods and the corrected pixel coordinates are substituted into the ranging model to calculate the distance. The ranging error results are shown in Figure 11.
As shown in Figure 11, the proposed calibration method is more stable and accurate than the Zhang and Li methods for the monocular vision-based distance measurement model. Table 7 compares the accuracy of the ranging model in this study with that of other ranging models.
Combining Figure 11 and Table 7, it can be seen that the proposed calibration method is better suited to the ranging model in this study than other calibration methods, and the monocular vision-based ranging model in this study has higher accuracy than other ranging methods.

Conclusions
In this study, a camera calibration method for a monocular vision-based ranging model, based on polynomial regression, was proposed to establish the relationship between world coordinates and pixel coordinates. For the possible distortion of the captured images, we utilize an error compensation algorithm to revise the pixel coordinates. The process of regression and error compensation can effectively compensate for the errors caused by distortion without requiring the distortion parameters of the convex lens. Compared with other methods, our method achieves higher ranging accuracy: the experimental results show that the ranging accuracy in this study is more than 97%. In the future, we will combine our method with image recognition to improve safety in fully autonomous driving.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest.