High Precision Calibration Algorithm for Binocular Stereo Vision Camera using Deep Reinforcement Learning

Camera calibration is a key step in computer vision research. To address the insufficient precision of existing methods, a high precision calibration algorithm for binocular stereo vision cameras using deep reinforcement learning is proposed. Firstly, a binocular stereo camera model is established. Camera calibration is divided into internal and external parameter calibration. Secondly, the internal parameter calibration is completed by solving the vanishing points associated with the camera optical center and the distortion values of the camera plane. Based on the internal parameters, deep learning is used to fit the value function. A target network is established to adjust the parameters of the value function, and the convergence of the value function is calculated to optimize reinforcement learning. The deep reinforcement learning fitting structure is built, the camera data is input, and the external parameter calibration is completed by continuous updating and convergence. Finally, the high precision calibration of the binocular stereo vision camera is completed. The results show that the calibration errors of the proposed algorithm under the two checkerboard calibration board sizes tested are only 0.36% and 0.35%, respectively. The value function converges quickly, the parameter calculation accuracy is high, the overall time consumption is short, and the calibration results are stable.


Introduction
At present, computer vision is an active research field. It is widely used in areas such as UAV visual positioning and robot navigation [1,2]. Binocular stereo vision mimics human vision: it employs two cameras and completes visual measurement using parallax calculations. It offers numerous advantages, including noncontact measurement, high precision, and good concealment, and it can meet growing measurement and detection requirements. Therefore, binocular stereo vision has promising application prospects [3]. High precision camera calibration is one of the keys to ensuring the effective functioning of a binocular stereo vision system. As a result, it is vital to investigate the high precision calibration of binocular stereo vision cameras. The primary goal of binocular stereo vision calibration is to calculate the internal parameters and spatial position parameters of the camera, as well as to determine the correspondence between two-dimensional coordinates and three-dimensional coordinates [4], thereby ensuring the accuracy of the vision system measurement. Existing camera calibration technologies fall into two main types: traditional camera calibration and self-calibration. The traditional camera calibration method computes the camera's internal parameters based on a predetermined model and known target data such as target size. This calibration method has the drawbacks of complex operation and high equipment dependence. Self-calibration does not need an external calibration target; it calculates the camera parameters only from the feature point data between the target images. Although this method is easier than traditional camera calibration, its calibration accuracy is low [5,6].
Therefore, this paper proposes a high precision calibration algorithm for binocular stereo vision cameras using deep reinforcement learning and attempts to combine deep learning and reinforcement learning to give a new concept for binocular stereo vision camera calibration. The main contributions of this paper are as follows: (1) the camera distortion is considered when calculating the internal parameters of the camera to improve the calculation accuracy of the internal parameters; (2) the external parameters of the camera are calculated using a deep reinforcement learning algorithm, which fully utilizes the advantages of deep learning and reinforcement learning; (3) the proposed algorithm can effectively complete high precision camera calibration and has practical application value.

Related Work
In the field of computer science, binocular stereo vision is an active research topic, and a number of scholars at home and abroad have proposed camera calibration methods. Literature [7] proposed an alternating adjustment-based camera calibration algorithm for binocular stereo vision systems, established a binocular vision calibration system with the left and right camera coordinates as reference coordinates, and optimized the internal parameters of the two cameras through alternating adjustment experiments to achieve the best value. The optimal distortion parameters and internal and external parameters are then obtained by optimizing all internal and external parameters, although the algorithm converges slowly. The deep learning is updated using the projection vector of feature points, and the best translation vector is found using the same projection vector. Literature [8] used the singular value decomposition approach to calculate the relative attitude matrix during the absolute azimuth interpretation stage. The pose estimation problem of a stereo vision measuring system based on feature points is solved, and stereo vision is extended. However, only one pose parameter from the two collected images is optimized, so the algorithm does not effectively increase camera calibration accuracy.
Literature [9] established and calibrated a heterogeneous binocular stereo vision system, which included a high-definition color camera and an infrared thermal camera, and designed an algorithm for accurate positioning and sorting of calibration points on the calibration plate. Each camera is then calibrated, as is the binocular stereo vision system. This method has a low error rate, but it takes a long time. Literature [10] demonstrated online calibration of dynamic binocular stereo vision's external parameters for rectangular images of undetermined size. The elliptical pose and heading reference system is used in real time to provide an approximate value of the rotation angle, and the rotation angle of each camera is solved iteratively using only a single rectangular centroid according to the homology map between images. To complete the camera calibration, the yaw angle is corrected according to the matching rectangle prime angle. However, the algorithm's accuracy is low. Literature [11] examined methods for calibrating the internal and external parameters of an ultra-wide field of view long wave infrared camera. To address the issues of camera imaging distortion and low resolution, an external parameter calibration method based on the least square method is proposed, and the calibration results of a long wave infrared camera are evaluated in conjunction with the relevant internal parameter data. Experiments validate the approach's objective correctness. However, its stability is low. Literature [12] investigated the parallel binocular stereo vision system and a zoom calibration method. The image information is gathered using the triangulation principle, the baseline accuracy is ensured by moving the camera, the calibration results are produced, and a BP neural network is used to process the calibration data further to increase the visual measurement accuracy. However, due to the characteristics and mutual restrictions of the left and right images in binocular stereo vision, this strategy is prone to local optima, and its overall stability is unsatisfactory.
To address the disadvantages of these methods, this work investigates high precision calibration for a binocular stereo vision camera using deep reinforcement learning, with an emphasis on solving the camera's internal and external parameters. Experiments validate the algorithm's performance and show that camera calibration can be accomplished quickly.

Binocular Stereo Vision Camera Model.
Through the imaging lens, the camera projects three-dimensional coordinates onto two-dimensional coordinates. This process is known as the imaging transformation, and its description is referred to as the camera model. The camera model can be used to determine the positional relationship between each point on the measured image and the space object [13]. Binocular stereo vision cameras use the parallax principle to obtain image information from the left and right cameras. Figure 1 shows the location and coordinates of the two cameras in binocular stereo vision measurement. O-XYZ represents the coordinate system of the left camera. Its origin is located at the origin of the global coordinate system. The coordinate system of the left camera image is $o$-$x_1 y_1 z_1$, the coordinate system of the right camera is $o$-$xyz$, and the coordinate system of the right camera image is $o$-$x_2 y_2 z_2$. The camera transformation model is then developed using the imaging lens principle [14].
where $A_l$ and $A_r$ represent the image scale coefficients of the left and right cameras, and $c_X$ and $c_x$ represent the scale coefficients of the left and right cameras. Using axis $d$ and axis $f$ as measurement scales, $d_l$, $f_l$, $d_r$ and $f_r$ are the optical centers of the left and right cameras, while $b_l$ and $b_r$ are the error coefficients in the vertical direction of the left and right cameras. Some translation and rotation occur during the pixel location conversion between the left and right cameras. The original position coordinate of the target object is designated as $K(x_k, y_k, z_k)$, and a corner of the target object is chosen for translation.
Comparing the corresponding coordinates of the corner point before and after the pose transformation of the target object, the translation matrix $T$ can be obtained, and the calculation equation is as follows:
where $K(x_k', y_k', z_k')$ is the corner position of the target object after translation. Assuming that the target object's rotation angles along the global coordinate system $O$-$X_0 Y_0 Z_0$ are $c$, $\lambda$ and $\mu$, respectively, the rotation matrix for different angles of rotation around the $X_0$, $Y_0$ and $Z_0$ axes can be expressed as
A rotation by an arbitrary angle about a fixed axis can be regarded as a superposition of rotations about the $X_0$, $Y_0$ and $Z_0$ axes.
Equation (3) can be used to calculate the relationship between the initial pose coordinate $k$ of the target object's corner and the transformed pose coordinate $k'$ [15]:
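The translation and rotation described above can be illustrated with a minimal sketch. It assumes the common rigid-body form $k' = Rk + T$ and composes the three axis rotations in the order $R = R_Z R_Y R_X$; both the composition order and the function names are assumptions of this sketch, not taken from the paper's equations.

```python
import numpy as np

def rotation_about_axes(angle_x, angle_y, angle_z):
    """Compose rotations about the fixed X0, Y0 and Z0 axes (radians).

    The order R = Rz @ Ry @ Rx is an assumption of this sketch.
    """
    cx, sx = np.cos(angle_x), np.sin(angle_x)
    cy, sy = np.cos(angle_y), np.sin(angle_y)
    cz, sz = np.cos(angle_z), np.sin(angle_z)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    rot_y = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rot_z = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rot_z @ rot_y @ rot_x

def translation_from_corner(corner_before, corner_after):
    """Translation obtained by comparing one corner before and after a pure shift."""
    return np.asarray(corner_after, dtype=float) - np.asarray(corner_before, dtype=float)

def transform_corner(corner, rotation, translation):
    """Apply the assumed rigid transform k' = R k + T to a corner point."""
    return rotation @ np.asarray(corner, dtype=float) + np.asarray(translation, dtype=float)

# Example: rotate a corner by 90 degrees about Z0, then shift it along Z0.
R = rotation_about_axes(0.0, 0.0, np.pi / 2)
k_new = transform_corner([1.0, 0.0, 0.0], R, [0.0, 0.0, 2.0])
```

Because each axis rotation is orthogonal, the composed matrix is orthogonal as well, which gives a quick sanity check on any estimated rotation.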

High Precision Calibration Algorithm for Binocular Stereo Vision Camera.
Camera calibration is the process of comparing the camera system to the measurement standard and determining the camera parameters through coordinate and related factor calculations [16,17]. From two-dimensional data, camera calibration can determine the true position of the measured object. It is not only a significant step in computer vision research but also a necessary link in binocular vision noncontact measurement. The accuracy of the stereo vision measurement is directly affected by the accuracy of the calibration computation [18,19].
Internal parameter calibration and external parameter calibration are the two primary types of camera calibration. Table 1 describes the parameters.
Table 1 lists the internal camera parameters, such as focal length, optical center, nonvertical factor, and the distortion parameters involved in perspective transformation. External parameters, namely the rotation matrix and the translation matrix, are used to determine the positional relationship of the camera coordinate system. The translation matrix and the rotation matrix each have three degrees of freedom, giving a total of six external camera parameters. These external parameters usually need to be obtained by experimental calculation [20], and this parameter calculation process can be regarded as camera calibration.
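To make the roles of the two parameter groups concrete, the following sketch projects a world point into pixel coordinates with the standard pinhole model. The matrix layout, the parameter names (`focal_x`, `skew`, and so on), and the numeric values are illustrative assumptions, not the paper's notation or data.

```python
import numpy as np

def intrinsic_matrix(focal_x, focal_y, center_u, center_v, skew=0.0):
    """Internal parameters: focal lengths, optical center and nonvertical (skew) factor."""
    return np.array([[focal_x, skew, center_u],
                     [0.0, focal_y, center_v],
                     [0.0, 0.0, 1.0]])

def project_point(point_world, intrinsics, rotation, translation):
    """Map a 3D world point to pixel coordinates using the external parameters (R, T)."""
    point_cam = rotation @ np.asarray(point_world, dtype=float) + np.asarray(translation, dtype=float)
    if point_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    homogeneous = intrinsics @ point_cam
    return homogeneous[:2] / homogeneous[2]

# Example with hypothetical values: identity rotation, no translation.
K = intrinsic_matrix(800.0, 800.0, 320.0, 240.0)
pixel = project_point([0.1, -0.05, 1.0], K, np.eye(3), np.zeros(3))
```

The rotation contributes three degrees of freedom and the translation another three, which matches the six external parameters counted above.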

Internal Parameter Calibration.
According to projective geometry theory, parallel lines intersect the plane at infinity at a point, which is known as the vanishing point. The existence of the vanishing point is determined by the direction of the lines. According to this theory, a vanishing point must exist between the camera's optical center and the camera plane, and the vanishing points can be used to calibrate the camera's internal parameters. It is assumed that there are two vanishing points on the camera plane, G and H, in the vertical and parallel directions, respectively, which are connected to the camera's optical center O to produce OG and OH. If the coordinate of the camera's principal point is (d, f) and the coordinates of the vanishing points G and H are (g, h), then
where $f_0$ is the focal length of the camera, and T is the transpose symbol. G and H form an orthogonal vanishing point pair, which gives the following equation: OG · OH = 0.
The equation for solving the vanishing points is shown in equation (8).
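The orthogonality constraint OG · OH = 0 can be turned into a focal length estimate. The sketch below assumes that each vanishing point has pixel coordinates $(x, y)$ and that the vectors from the optical center are $(x - d, y - f, f_0)$; this parameterization and the function name are assumptions of the sketch, since the paper's own equations are not reproduced here.

```python
import math

def focal_from_vanishing_points(point_g, point_h, principal_point):
    """Estimate the focal length f0 from two orthogonal vanishing points.

    With OG = (gx - d, gy - f, f0) and OH = (hx - d, hy - f, f0),
    OG . OH = 0 gives f0^2 = -[(gx - d)(hx - d) + (gy - f)(hy - f)].
    """
    d, f = principal_point
    in_plane = (point_g[0] - d) * (point_h[0] - d) + (point_g[1] - f) * (point_h[1] - f)
    if in_plane >= 0:
        raise ValueError("vanishing points are not consistent with orthogonal directions")
    return math.sqrt(-in_plane)

# Hypothetical example: principal point (320, 240) and a true focal length of 800 pixels.
f0 = focal_from_vanishing_points((1120.0, 240.0), (-480.0, 240.0), (320.0, 240.0))
```

The check on the sign guards against vanishing point pairs that cannot come from orthogonal directions under this model.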
The internal parameter calibration of the camera can be preliminarily accomplished using equation (8). The camera model is typically divided into linear and nonlinear models based on the imaging geometry. However, the linear model rests on an ideal assumption and can only express the relationship between image coordinates and spatial coordinates in a simplified way [21]. In actual imaging, distortion and deformation occur owing to the influence of numerous factors. If the imaging position in the linear model is $(U, V)$, the real imaging position is $(U_1, V_1)$.
where β and α are the distortion values in the transverse and longitudinal imaging directions. Radial and tangential distortion are the most common types of camera distortion. Tangential distortion is usually minor and can be neglected. As a result, the radial distortion polynomial is used to express the camera distortion value.
where χ represents the radial distortion parameter of the camera, p represents the tangential distortion parameter, and r represents the radial distance measured from the image center.
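As an illustration of a radial distortion polynomial, the sketch below applies the widely used even-order model, in which a point is scaled by $1 + k_1 r^2 + k_2 r^4$ about the image center. The coefficient names `k1` and `k2` and the use of pixel offsets are assumptions of this sketch; the paper's exact polynomial and its parameter χ may differ.

```python
def apply_radial_distortion(u, v, center, k1, k2=0.0):
    """Map an ideal (linear-model) image point to its radially distorted position.

    Tangential distortion is ignored, as in the text above.
    """
    center_u, center_v = center
    du, dv = u - center_u, v - center_v
    r_squared = du * du + dv * dv          # squared radial distance from the image center
    scale = 1.0 + k1 * r_squared + k2 * r_squared * r_squared
    return center_u + du * scale, center_v + dv * scale

# Hypothetical example: a point 100 pixels from the center with k1 = 1e-6.
u1, v1 = apply_radial_distortion(100.0, 0.0, (0.0, 0.0), 1e-6)
```

Points at the image center are unaffected, and the displacement grows with distance from the center, which is why distortion is most visible near the image borders.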

External Parameter Calibration using Deep Reinforcement Learning.
The camera's external parameters are calibrated on the basis of the internal parameters. In general, a precision calibration board is chosen to compute the correspondence between camera coordinates and spatial coordinates, as well as to define the structural parameters of the binocular vision system. For external parameter calibration, the deep reinforcement learning algorithm is applied in this study. Deep reinforcement learning combines deep learning and reinforcement learning. It has both the feature extraction ability of deep learning and the decision-making ability of reinforcement learning. The applicable space of the traditional reinforcement learning algorithm is narrow and discrete. By incorporating deep learning, reinforcement learning overcomes the limitation that it cannot be applied to high-dimensional data analysis, allowing it to be applied to practical scenes with large state spaces [22]. Figure 2 shows the deep reinforcement learning framework. As shown in Figure 2, the goal of reinforcement learning is to learn the best policy through environmental interaction and reward accumulation. It is a continual process in which agents interact with their environment in order to attain their objectives. Following this description, the camera external parameter calibration process can be seen as a reinforcement learning problem, in which the optimal parameters are determined as far as possible through analysis of the camera target and coordinates.
At present, classical reinforcement learning can be classified into three types: value-based reinforcement learning, policy-based reinforcement learning, and actor-critic learning, which combines value and policy. The actor-critic method is a hybrid of the two: it has the benefits of the policy method for generating actions and handling continuous actions, but it requires the calculation of the value function. In this study, the actor-critic method is chosen to calibrate the camera's external parameters. The value function must be calculated, and deep learning is a powerful function approximation tool. When applying deep learning to reinforcement learning, however, a neural network must be used to fit the mapping relationship. This forms a very complex mapping network whose parameters must be adjusted continuously, so the adjustment and convergence of the value function become a critical problem. To tackle this challenge, this work examines the structure fitting of deep reinforcement learning. In practice, the state-action value function is frequently estimated using function approximation, which is stated as
where $Q(q, a)$ is the state-action value function, $q$ denotes the state, $a$ denotes the action value, and ϖ denotes the value function's parameter, which is the reinforcement learning parameter. Equation (13) shows the update method for the value function parameter ϖ.
where $ϖ_0$ is the initial value of the function parameter, and φ is the update coefficient of the value function.
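A minimal sketch of such a parameter update is given below. It assumes a linear approximation of the value function and a standard gradient step toward a target value with step size φ; the linear form, the feature vector, and the function names are assumptions of this sketch and are not claimed to reproduce equation (13).

```python
import numpy as np

def q_value(features, params):
    """Linear approximation of the state-action value for one (state, action) feature vector."""
    return float(np.dot(features, params))

def update_params(params, features, target, step_size):
    """One gradient step that moves the approximated value toward the target."""
    error = target - q_value(features, params)
    # For a linear model, the gradient of the value with respect to params is the feature vector.
    return params + step_size * error * features

# Hypothetical example: start from zero parameters and repeat the update.
params = np.zeros(2)
features = np.array([1.0, 2.0])
for _ in range(50):
    params = update_params(params, features, target=5.0, step_size=0.1)
```

With a fixed target, each step shrinks the error by a constant factor, which is the simplest setting in which the convergence of the value function parameters can be observed.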
To finish the neural network training, the parameters must be updated constantly while the neural network is used to calculate the value function. These parameters are the value function's parameters. To adjust them to their optimal values [23,24], a target network is built, and the parameters are updated in hard and soft modes. Hard mode applies when the network unit size must be rigorously controlled. In hard mode, the number of operating steps is fixed, and after these steps are completed, the network parameters are updated by copying. Soft mode applies when the network unit size is affected by the overall division unit size, and the update value in soft mode is small. The target network parameters (neural network parameters) can then be updated as stated in equation (14).
where θ denotes the neural network's initial parameters, and θ′ denotes the updated neural network parameters. The value function is given by equation (15).
where ϖ′ is the updated value function parameter, and η is a small value in soft mode that keeps each update gradual. According to equation (15), after n iterations, the value function satisfies equation (16).
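The hard and soft update modes described above can be sketched as follows. The sketch uses the common conventions from deep reinforcement learning practice, in which a hard update copies the online parameters every fixed number of steps and a soft update blends them with a small coefficient η; the function names and the `period` argument are assumptions of this sketch rather than the paper's equations (14) to (16).

```python
import numpy as np

def soft_update(target_params, online_params, eta):
    """Soft mode: move the target parameters a small fraction eta toward the online parameters."""
    return (1.0 - eta) * target_params + eta * online_params

def hard_update(target_params, online_params, step, period):
    """Hard mode: copy the online parameters into the target network every `period` steps."""
    if step % period == 0:
        return online_params.copy()
    return target_params

# Hypothetical example: repeated soft updates toward fixed online parameters.
target = np.zeros(3)
online = np.ones(3)
for _ in range(200):
    target = soft_update(target, online, eta=0.1)
```

Under repeated soft updates toward fixed online parameters, the remaining gap shrinks by a factor of 1 − η per iteration, so the target parameters converge geometrically after n iterations.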
After equation (16), the parameter convergence of the value function can finally be accomplished, that is,
(2) Camera External Parameter Calibration. The binocular stereo vision camera data is input, and the external parameters of the camera are calibrated using deep reinforcement learning calculations based on the fitting structure.
Input: sample data collected by a binocular camera.
Output: camera external parameter calibration results.
The reinforcement learning parameters are expressed as value function parameters. The initial reinforcement learning parameter is ϖ, and the initial value of the neural network parameters is θ. The deep reinforcement learning structure and related parameters are initialized, and deep reinforcement learning is then used to calibrate the parameters of the binocular stereo vision camera as follows:
(1) Half of the binocular stereo vision camera data in the experimental data set is chosen as training samples
(2) The numbers of hidden layers and nodes of the neural network are determined based on the size of the training samples
(3) The fitting structure of deep reinforcement learning is constructed, as shown in Figure 3
(4) The neural network is used to fit the camera data in order to obtain the value function, and the target network's value function and parameters are established
(5) The value function and the neural network are iterated repeatedly, using equations (15) and (16)

Experimental Analysis and Results
To evaluate the performance of the binocular stereo vision camera's high-precision calibration algorithm based on deep reinforcement learning, an experimental binocular stereo vision system is constructed.

Experimental Environment.
The algorithm is developed on the VS2019 development platform with OpenCV 2.49, and the simulation is run on Windows 10. Table 2 shows the experimental apparatus, which consists of two cameras, two checkerboard calibration boards, and a computer.
AutoCAD software is used in the experiment to draw the checkerboard images, which are then printed and made into calibration boards, as illustrated in Figure 4.

Data Set.
The experimental data are drawn from two public data sets and one visual system measurement data set: the KITTI data set, the Cityscapes data set, and the visual system measurement data set. The KITTI data set is the world's largest visual measurement data set for autonomous driving scenarios, and it is used for visual ranging, target detection, and tracking. The data gathering platform is outfitted with four cameras, one sensor, and one GPS navigation system to collect image data in a variety of scenarios such as cities, towns, and roads, including 389 pairs of stereo images and optical flow diagrams. The Cityscapes data set is large in scale, containing street stereoscopic images of 50 distinct cities as well as numerous pixel level annotations, including 5,000 high-quality pixel level annotations and 20,000 coarse annotations. The data set is well suited to training deep neural networks. For the vision system measurement data set, the vision system collects stereoscopic images of six streets using binocular cameras, yielding a total of 20,000 images with a resolution of 1280 × 960 pixels. During the experimental test, 1,000 images are chosen from each of the three data sets, for a total of 3,000 images. The first half of the data is used to train the deep reinforcement learning algorithm, while the other half is used for experimental testing.
The experiments were performed under the same noise and lighting conditions to ensure consistent image acquisition. Two groups of experiments were conducted, using 10 mm and 20 mm checkerboard calibration boards, respectively. The binocular stereo vision system captured a total of 1,000 images. The collected images are filtered and preprocessed to strengthen the image edge information, in order to increase image quality and prevent interference from external factors such as noise and illumination. The measurement field of view is 7 m × 6 m, the checkerboard calibration boards are randomly placed in the camera system's measurement field, and the spacing between the two calibration boards is 4 m.

Evaluation Criteria
(1) Calibration precision: This study proposes a high precision calibration algorithm. To validate the algorithm, a dedicated comparison of calibration accuracy is required. The error expresses the precision of the calibration results. The calculation equation is shown in equation (18).
where $e$ is the calibration error, and $(x_w, y_w)$ and $(x_w', y_w')$ represent the real coordinates and measured coordinates of the target pixel, respectively.
(2) Convergence of value function: The convergence of the value function is one of the keys to realizing the fit between deep learning and reinforcement learning in deep reinforcement learning algorithms. Therefore, this experiment plots the loss function curves of the value function network for the compared algorithms to verify that the proposed method converges.
(3) Parameter calculation accuracy: Parameter adjustment is also one of the keys to realizing the fit between deep learning and reinforcement learning. Therefore, parameter calculation accuracy is also an effective index of the performance of the proposed algorithm. The accuracy calculation equation is as follows:
where $L_{tot}$ represents the actual number of parameter calculations, and $L_1$ is the number of correct parameters in the calculation result.
(4) Camera calibration time consumption: Camera calibration is an important prerequisite for applying a binocular stereo vision system, so it is important that the vision system completes camera calibration quickly.
(5) Stability of calibration results: The stability of the calibration results of the proposed algorithm is compared with those of Literature [7], Literature [8], Literature [9], Literature [11], and Literature [12]. Stability is measured from the change in the sequence of camera calibration result data. It is assumed that the calibration data series contains records with the same keywords. If the relative order of these records does not change after sorting, the algorithm is stable.
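The two numeric criteria can be illustrated with a short sketch. It assumes that the calibration error is the Euclidean distance between the real and measured coordinates relative to the magnitude of the real coordinates, and that the accuracy is the ratio $L_1 / L_{tot}$, both expressed as percentages. These forms are assumptions of this sketch, since equation (18) and the accuracy equation are not reproduced in the text.

```python
import math

def calibration_error_percent(real_xy, measured_xy):
    """Relative distance between real and measured coordinates, in percent (assumed form)."""
    distance = math.hypot(measured_xy[0] - real_xy[0], measured_xy[1] - real_xy[1])
    magnitude = math.hypot(real_xy[0], real_xy[1])
    if magnitude == 0:
        raise ValueError("real coordinates must not be the origin")
    return 100.0 * distance / magnitude

def parameter_accuracy_percent(correct_count, total_count):
    """Share of correctly calculated parameters, L1 / Ltot, in percent (assumed form)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * correct_count / total_count

# Hypothetical values chosen only to exercise the functions.
error = calibration_error_percent((300.0, 400.0), (303.0, 404.0))
accuracy = parameter_accuracy_percent(19, 20)
```

Expressing both quantities as percentages keeps them comparable across calibration boards of different sizes.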

Comparison of Calibration Precision.
The main goal of this paper is to achieve high precision calibration of a binocular stereo vision camera. Therefore, the proposed algorithm is compared with the algorithms in Literature [7], Literature [8], Literature [9], Literature [11], and Literature [12] in order to reflect the effectiveness of the algorithm established in this work, as shown in Table 3.
It can be seen from Table 3 that the differences between the algorithms are significant. The calibration errors of the proposed algorithm are 0.36% and 0.35% for the 10 mm and 20 mm checkerboard calibration boards, respectively. In comparison, the minimum calibration error over the two checkerboard calibration boards is 0.90% for Literature [7], 1.66% for Literature [8], 1.94% for Literature [9], 5.20% for Literature [11], and 1.74% for Literature [12]. Compared with these five algorithms from the literature, the proposed algorithm has a clear advantage, demonstrating that the deep reinforcement learning algorithm used in this paper for camera calibration achieves higher precision and a better calibration effect.

Comparison of Convergence of Value Function.
The loss function curve of the value function network is plotted with the number of iterations as the abscissa and the mean square loss as the ordinate, as shown in Figure 5.
According to Figure 5, each algorithm eventually converges, and the mean square error loss decreases as the number of iterations grows. Comparing the convergence speed of the proposed algorithm with that of the five algorithms from the literature, when the number of iterations is close to 30, the loss function curve of the proposed algorithm begins to stabilize, the mean square loss is close to 0, and the value function's convergence is completed. After 70 iterations, the algorithms in Literature [8], Literature [11], and Literature [12] converge rapidly. The convergence of the Literature [7] and Literature [9] algorithms is relatively poor, with a minimum root mean square error of more than 0.2 after convergence. The proposed algorithm therefore converges quickly and to a low loss, demonstrating the effectiveness of the designed target network for value function convergence.

Comparison of Parameter Calculation Accuracy.
This study adjusts the value function parameters in reinforcement learning and uses a neural network to continually update the parameters to complete the fitting between deep learning and reinforcement learning. The precision of parameter calculation is therefore critical for camera calibration: accurate calibration results cannot be obtained if the parameter calculation accuracy is low. Figure 6 depicts the comparison of parameter calculation accuracy. As shown in Figure 6, when the neural network is used to update the parameters of the value function, the value function in reinforcement learning can be adjusted with high accuracy through numerous iterations, and the maximum calculation accuracy is about 95%. The algorithm in Literature [9] also calculates the parameters relatively well, with a highest accuracy of around 80%, but this is still well below that of the proposed algorithm. The comparison demonstrates the benefits of the proposed algorithm, validates its performance for parameter computation, and supports the high accuracy calibration of binocular vision camera parameters in this study.

Comparison of Camera Calibration Time Consumption.
Table 4 shows the camera calibration time consumption results. When different data sets are used as data sources to assess the calibration time of the algorithms, the differences in the test results are significant. The calibration time of the proposed algorithm on the KITTI data set, Cityscapes data set, and vision system measurement data set is 5.2 s, 6.2 s, and 5.3 s, respectively, with an average of 5.6 s. The proposed algorithm is faster than the average time of the algorithms in Literature [7], Literature [8], Literature [9], Literature [11], and Literature [12]. The deep reinforcement learning technique runs efficiently, which effectively speeds up camera calibration in this work.

Comparison of Stability of Calibration Results.
The stability comparison of the camera calibration results is shown in Figure 7.
According to the data in Figure 7, the proposed algorithm's stability is substantially higher than that of the other five literature algorithms, and the overall stability is maintained at approximately 92%. Among the other algorithms, the highest stability of the Literature [7], Literature [8], and Literature [9] algorithms is close to 80%, while the stability of the Literature [11] and Literature [12] algorithms is almost 60%. This clearly demonstrates the benefits of the proposed binocular vision camera calibration algorithm, which can eliminate external interference and improve algorithm stability.

Conclusions and Future Works
This paper uses deep learning to improve reinforcement learning, builds a deep reinforcement learning fitting structure, and investigates the calibration process of a binocular stereo vision camera. The camera's internal and external parameter calibrations are explained in depth, and the proposed algorithm is validated through experimentation. The results show that the proposed algorithm is capable of completing the camera's high precision calibration and has some theoretical value. This study still has several limitations: the numerous properties of the camera target are not thoroughly explored. Future work should account for target distance, image color, and other factors, in order to improve the application efficiency and scope of the camera.

Data Availability
Readers can access the data supporting the conclusions of the study from the KITTI data set, the Cityscapes data set, and the measurement data set of a vision system.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Computational Intelligence and Neuroscience