Autonomous Piloting and Simulation on Underwater Manipulations Based on Vision Positioning

An ROV equipped with an underwater manipulator plays a very important role in underwater investigation, construction, and other manipulation tasks. Moving the ROV precisely and operating the manipulator to grasp and move objects are frequent operations in underwater manipulation. They also demand a great deal of physical effort from operators, which seriously degrades the efficiency of underwater manipulation. In this paper, a scheme for grasping a rod-shaped object autonomously is proposed. In the proposed scheme, two cameras are arranged on the ROV frame to form a stereo vision system, and the position parameters of the rod-shaped object in space are calculated from the stereo images. The ROV is then driven and the manipulator is controlled according to these parameters so that the end effector of the manipulator clamps the rod-shaped object exactly. As a result, the task of capturing an object is completed autonomously. Images of the underwater manipulation scene are simulated with the marine engineering simulation software Vortex Studio, the position parameters of the rod-shaped cable in the scene are obtained by the proposed algorithm, and the displacement needed to move the ROV and the joint angles needed to operate the manipulator are then derived. The feasibility of autonomously capturing an underwater object is thereby verified.


Introduction
For a long time, work class ROVs (remotely operated vehicles) have been widely used in marine scientific research, marine resource development, marine military security, and so on. They play an important role in seabed operations, including pipe inspection, salvage of sunken objects, mine disposal, surface cleaning, valve operation, drilling, rope cutting, geological sampling, and archaeological work [1].
Traditionally, a work class ROV is teleoperated by at least two highly skilled operators, who stay on the surface mother ship or in a shore-based control room. One operator, in charge of piloting, moves the ROV as desired and keeps it as stable as possible by compensating for external motion disturbances (sea current, waves, tides) and for ROV motion induced by the manipulator's reaction forces and moments, while the other operator performs the actual teleoperated manipulation task [1,2]. During this procedure, the scene of the underwater manipulation is captured by the camera installed on the ROV and transmitted to the control room; the operators observe the video, analyze the scene, decide the motion of the ROV or manipulator, and then issue the motion control commands. Obviously, operating a work class ROV demands considerable skill and labor from the operators. What is more, the cost of training a qualified operator to complete underwater manipulations is very high. Therefore, if some operations of the ROV or manipulator can be done autonomously, the requirements on skill and labor intensity will be reduced, and consequently, the efficiency of underwater manipulations will be greatly improved.
In recent years, Schjølberg and Utne have graded the autonomy in ROV operations [3]. They considered that current ROV operations are mainly in levels a, b, and c, ranging from direct control of equipment by personnel, to remote control within visual range, and then to remote control based on remote video. The ROV LATIS [4] is the first project to demonstrate levels d and e, which are to be covered by future autonomous ROV operations. In level d, logic-driven vehicle control is semiautonomous control in which some operations are performed through automatically generated waypoints.
In level e, logic-driven control with goal orientation means that operations are performed autonomously once high-level task instructions are uploaded. Trslic et al. studied the autonomous docking of work class ROVs [5]: vision-based pose estimation techniques are used to guide the ROV to dock autonomously on both static and dynamic docking stations (Tether Management System, TMS). Several lights are mounted at the back of the TMS, and the position and pose of the ROV relative to the TMS are estimated from the image of the lights in the ROV camera. Attention was paid to the situation in which some light markers are fully covered by the TMS frame or the tether, so that the pose of the ROV cannot be estimated from the image. In that case, the pose of the ROV at the new time point is estimated from the previous pose and the motion information. In this way, the problem of positioning continuity has been addressed.
Peñalver et al. studied ROV autonomous manipulations on an underwater panel mockup [6], where an underwater manipulator is automatically controlled to open/close a valve and to plug/unplug a hot-stab connector by using visually guided manipulation techniques. Markers were arranged on the manipulator and on the panel and were observed simultaneously by the same camera on the frame of the ROV. By locating the markers in the coordinate system of the camera on the ROV, the relative position between the object to operate and the end effector of the manipulator can be determined. The manipulator is then controlled automatically to reach the target and operate it, that is, to open/close a valve or plug/unplug a hot-stab connector. In order to estimate the joint angles of a hydraulically actuated manipulator for commercial use, Sivčev et al. [1] arranged a planar marker called AprilTag [7,8] on the manipulator and observed it with a camera on the frame of the ROV. They first determined the transformation matrix between the local coordinate system of the marker and that of the camera, and then derived the transformation matrix between the manipulator and the base of the manipulator. Finally, the joint angles of the manipulator were estimated. The position of the object to operate was determined in a similar way. To overcome the difficulty caused by the large delay of the visual servo system, they proposed a control solution between open loop and complete closed loop that approaches the target with variable steps.
Kawamura et al. [9] proposed a control method for manipulator motion based on a calibration-free visual servo system. Some marker points were set up on the manipulator, and a stereo camera was arranged to locate the manipulator. Servo control was performed with the difference between the image of the marker point and that of the target position, so that the accurate position of the target in the world coordinate system was not needed. However, a very important problem was ignored in [9], that is, how to obtain the imaging position of the target position in the camera, which directly decides the feasibility of this method in engineering applications. In order to improve the autonomy of underwater investigation missions, García et al. [10] divided the underwater investigation task into multiple subtasks and introduced techniques such as image segmentation and target recognition. Taking intelligent grasping as an example, the human-computer interaction and user interface were designed in detail.
With the evolution of the convolutional neural network (CNN), object detection in the underwater environment has gained a lot of attention. Naseer et al. [11,12] employed a CNN-based detector to detect the burrows of the Norway lobster Nephrops norvegicus in underwater videos and proposed a detection refinement algorithm based on spatial-temporal analysis to improve the performance of generic detectors by suppressing false positives and recovering missed detections. The CNN-based detector is good at classifying images but is less accurate in positioning.
For a commercial underwater manipulator, HLK-HD6W, a vision-based pose estimation algorithm was presented in [13]. Marker points were arranged on the arm links of the manipulator, and cameras were installed on the ROV body to image these markers; the joint angles of the manipulator were then estimated as the input data for automatic control of the manipulator. Based on this work, this paper develops a method to identify underwater rod-shaped objects and extract their position parameters, and then plans the motion path of the underwater manipulator so as to grasp the rod-shaped object autonomously. This paper is organized as follows. In Section 2, how to identify a rod-shaped object and extract its parameters from images is described in detail. In Section 3, the method for automatically controlling a 5-DOF manipulator to grasp a rod-shaped object is stated. In Section 4, the identification and positioning of the rod-shaped object and the motion control of the manipulator are simulated and analyzed. Finally, in Section 5, the feasibility of autonomously operating the manipulator to grasp an underwater object based on a visual image system is summarized.

Identification and Positioning of Rod-Shaped Objects
Images of an underwater rod-shaped object taken by a stereo camera with parallel optical axes are shown in Figure 1, from the left camera and from the right camera, respectively. In Figure 1, the selection boxes indicate regions of interest selected by manual intervention for processing, which will be described in detail later. In order to position the rod-shaped object in 3D space, first, the object needs to be segmented from the background in both the left and right images, and the center line of the rod-shaped object is then determined. Then, the position of the center line in 3D space is calculated according to the difference between the images of the center line in the left image and the right image, so as to provide input data for the automatic control of the manipulator.

Identification of Rod-Shaped Object.
In order to identify the rod-shaped object in an image, first, edge detection is used to find the boundary between the rod-shaped object and the background. Then, the Hough transform is used to detect the straight lines among the boundary lines. Finally, groups of parallel lines are sought among these lines, since they may be the boundary lines on both sides of the rod-shaped object, and the center line of the rod-shaped object is estimated from the boundary line parameters. Edge detection is a classical problem in digital image processing. Many methods have been proposed, and they all rely on gradient calculation. These methods have been integrated into the open-source computer vision library OpenCV [14], where the Canny [15] method is the most commonly used one.
Further, the Hough transform algorithm [16] is often used to identify line parameters from boundary images. In the Hough transform, the line equation in the plane rectangular coordinate system is described as

$$u\cos\theta + v\sin\theta = \rho, \tag{1}$$

where $u, v$ are the image coordinates in pixels, the origin is located in the upper left corner of the image, and the vector $n_l = (n_x, n_y)^T = (\cos\theta, \sin\theta)^T$ is the line's perpendicular direction. In other words, $\theta$ is the perpendicular angle, and $\rho$ is the distance from the origin to the line in pixels.
Any point in a plane may belong to multiple lines. When the perpendicular direction is fixed, the distance $\rho$ from the origin to the line can be calculated with (1). In line detection with the Hough transform, resolutions (bin widths) $(\Delta\rho, \Delta\theta)$ are set for the distance and direction parameters, and then every point in the image is visited while the candidate line parameter is calculated for it, i.e., for every sampled direction $\theta_j$,

$$\rho = u\cos\theta_j + v\sin\theta_j. \tag{2}$$

Taking $\Delta\rho$ as the distance resolution, the above formula can be rasterized, so that each computed $\rho$ falls into a bin of width $\Delta\rho$.
Therefore, the transform from the image pixel field to the Hough parameter field has been established. In the Hough parameter field, every point on the boundary edge votes for its potential parameters, which results in a histogram $N(\rho_j, \theta_j)$ giving the number of pixels located on the line with parameters $(\rho_j, \theta_j)$. Therefore, the parameter pair with the most votes corresponds to the longest line. Since line detection with the Hough transform has been implemented in the image library OpenCV, it is only necessary to set the resolutions $(\Delta\rho, \Delta\theta)$ when calling the library function, which returns the parameter pairs $(\rho, \theta)$ sorted by votes in descending order.
The line parameters obtained directly by Hough line detection are discrete with resolutions (Δρ, Δθ). This has two undesirable consequences: (1) the discretization of the direction of the line introduces a larger error in it; (2) this error makes it possible for a line to be divided into multiple segments, which are identified as multiple parallel lines. When the line parameters need to be determined more finely, this result may not meet the requirement. However, simply refining the resolution, that is, using smaller bin widths (Δρ, Δθ), reduces the number of votes in every cell of the parameter grid, which may degrade line detection. This is an inherent limitation of the Hough transform. Although OpenCV provides the multiscale Hough transform and the progressive probabilistic Hough transform, the Hough transform cannot solve this problem by tuning the parameter resolutions alone.
Therefore, in this study, edge points near the line originally detected by the Hough transform are collected, and new line parameters are then fitted from these points. This process of finding points and fitting new lines can be performed many times. On one hand, it corrects the parameters obtained by the Hough line detection. On the other hand, it can also combine several original line segments into one longer straight-line segment. Even if the "straight line" is slightly bent, the parameter pair of ONE line will be fitted in this way.
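The voting procedure described above can be illustrated with a minimal sketch. It is not the OpenCV implementation; the function names `hough_votes` and `strongest_line`, the rounding rule used to rasterize ρ, and the synthetic edge points are assumptions made only for illustration.

```python
import math
from collections import Counter


def hough_votes(edge_points, d_rho, d_theta):
    """Return a Counter mapping (rho index, theta index) to its vote count."""
    n_theta = int(round(math.pi / d_theta))  # directions sampled over [0, pi)
    votes = Counter()
    for u, v in edge_points:
        for j in range(n_theta):
            theta = j * d_theta
            rho = u * math.cos(theta) + v * math.sin(theta)  # line equation (1)
            # Rasterize rho into a bin of width d_rho (rounding is an assumption).
            votes[(int(round(rho / d_rho)), j)] += 1
    return votes


def strongest_line(votes, d_rho, d_theta):
    """Return (rho, theta, vote count) of the most voted parameter cell."""
    (i, j), count = votes.most_common(1)[0]
    return i * d_rho, j * d_theta, count


# Twenty synthetic edge pixels on the horizontal line v = 5,
# i.e. rho = 5 pixels and theta = 90 degrees.
points = [(u, 5) for u in range(0, 200, 10)]
d_rho, d_theta = 1.0, math.radians(1.0)
rho, theta, count = strongest_line(hough_votes(points, d_rho, d_theta), d_rho, d_theta)
```

In this example all twenty points fall into the single cell for ρ = 5, θ = 90°. With a coarser angular step or a shorter segment, neighboring cells collect the same number of votes, which is one way to see the discretization effect of (Δρ, Δθ).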
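For a coarse candidate line parameter pair $(\rho_c, \theta_c)$, edge points whose distance from the line is less than a threshold are collected. For every edge point $(u, v)$, the distance from the line is calculated as

$$\delta = |u\cos\theta_c + v\sin\theta_c - \rho_c|. \tag{4}$$

If this distance is less than a threshold $\varepsilon$, the point is marked as an edge point corresponding to the candidate line, and these points form the point set

$$S = \{(u_i, v_i) : |u_i\cos\theta_c + v_i\sin\theta_c - \rho_c| < \varepsilon,\ i = 1, \dots, N\}. \tag{5}$$

The refined parameters minimize the sum of squared distances from the points in $S$ to the line, $\Psi(\rho, \theta) = \sum_{i=1}^{N} (u_i\cos\theta + v_i\sin\theta - \rho)^2$. After differentiating the above object function $\Psi(\rho, \theta)$ with respect to its independent variables, and letting the derivatives be zero, we have

$$\sum_{i=1}^{N} (u_i\cos\theta + v_i\sin\theta - \rho) = 0, \tag{6}$$

$$\sum_{i=1}^{N} (u_i\cos\theta + v_i\sin\theta - \rho)(-u_i\sin\theta + v_i\cos\theta) = 0. \tag{7}$$

Rearranging (6), we immediately have

$$\rho = \bar{u}\cos\theta + \bar{v}\sin\theta, \tag{8}$$

where $\bar{u}$ and $\bar{v}$ are the mean coordinates of the points in $S$. Substituting (8) into (7), we have

$$\tan 2\theta = \frac{2S_{uv}}{S_{uu} - S_{vv}}, \tag{9}$$

where $S_{uu} = \sum_i (u_i - \bar{u})^2$, $S_{vv} = \sum_i (v_i - \bar{v})^2$, and $S_{uv} = \sum_i (u_i - \bar{u})(v_i - \bar{v})$, and the root with the smaller value of $\Psi$ is taken. Therefore, the parameters $(\rho, \theta)$ of the line can be refined by (8) and (9).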
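As a hedged illustration of this refinement step, the sketch below collects the edge points within a threshold of a coarse line and refits (ρ, θ) by minimizing the sum of squared point-to-line distances. The least-squares form of the objective, the function name `refine_line`, and the synthetic data are assumptions for illustration, not the authors' code.

```python
import math


def refine_line(edge_points, rho_c, theta_c, eps):
    """Refit (rho, theta) to the edge points within eps pixels of a coarse line."""
    n_u, n_v = math.cos(theta_c), math.sin(theta_c)
    near = [(u, v) for u, v in edge_points
            if abs(u * n_u + v * n_v - rho_c) < eps]
    if len(near) < 2:
        return rho_c, theta_c  # not enough support to refit
    n = len(near)
    u_mean = sum(u for u, _ in near) / n
    v_mean = sum(v for _, v in near) / n
    s_uu = sum((u - u_mean) ** 2 for u, _ in near)
    s_vv = sum((v - v_mean) ** 2 for _, v in near)
    s_uv = sum((u - u_mean) * (v - v_mean) for u, v in near)
    # Stationary condition: tan(2 theta) = 2 s_uv / (s_uu - s_vv). This atan2
    # branch selects the minimum of the squared-distance sum, not the maximum.
    theta = 0.5 * math.atan2(-2.0 * s_uv, s_vv - s_uu)
    rho = u_mean * math.cos(theta) + v_mean * math.sin(theta)
    if rho < 0:  # keep rho non-negative by flipping the normal direction
        rho, theta = -rho, theta + math.pi
    return rho, theta


# Synthetic edge: points on the line rho = 10, theta = 30 degrees, plus one outlier.
true_theta = math.radians(30.0)
edge = [(10 * math.cos(true_theta) - t * math.sin(true_theta),
         10 * math.sin(true_theta) + t * math.cos(true_theta))
        for t in range(-20, 21, 5)]
edge.append((100.0, 100.0))  # far from the line, so it is not collected
rho, theta = refine_line(edge, rho_c=11.0, theta_c=math.radians(32.0), eps=3.0)
```

Calling `refine_line` again with the returned pair as the new coarse estimate corresponds to repeating the find-and-fit process.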
After the line parameters of the boundaries on both sides of the rod-shaped object, which are "parallel lines," have been obtained, the center line of the rod-shaped object can be calculated by averaging the parameters of the parallel lines.
In a plane, it is easy to calculate the direction vector of the line, $d = (d_u, d_v)^T$, from its normal vector $n_l = (n_u, n_v)^T$, in the form of

$$d = (d_u, d_v)^T = (-n_v, n_u)^T, \tag{10}$$

where an arbitrary direction of the line is picked, and the opposite direction is equally appropriate. Therefore, the line may be expressed in the form of the parametric equation

$$(u, v)^T = (u_A, v_A)^T + t\,d, \tag{11}$$

where $(u_A, v_A)^T$ is the coordinate of any point on the line, $d$ is the direction vector of the line, and $t \in \mathbb{R}$ is the independent parametric variable.
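The averaging of the two boundary lines and the parametric form of the resulting center line can be sketched as follows. The function names and the numeric line parameters are made up for illustration, and the sketch assumes that both normals point to the same side, so that the two θ values are close to each other.

```python
import math


def center_line(line_a, line_b):
    """Average two nearly parallel boundary lines given as (rho, theta) pairs."""
    (rho_a, theta_a), (rho_b, theta_b) = line_a, line_b
    return 0.5 * (rho_a + rho_b), 0.5 * (theta_a + theta_b)


def point_on_line(rho, theta, t):
    """Evaluate the parametric form (u, v) = (u_A, v_A) + t * d."""
    n_u, n_v = math.cos(theta), math.sin(theta)
    u_a, v_a = rho * n_u, rho * n_v  # foot of the perpendicular from the origin
    d_u, d_v = -n_v, n_u             # direction vector derived from the normal
    return u_a + t * d_u, v_a + t * d_v


# Hypothetical boundary lines of one rod in one image (pixels, radians).
rho, theta = center_line((600.0, math.radians(-12.0)), (640.0, math.radians(-13.0)))
u, v = point_on_line(rho, theta, t=250.0)
```

Here the point A is chosen as the foot of the perpendicular from the origin, which is one convenient choice among the points on the line.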

Positioning of Rod-Shaped Object.
According to the method in the previous section, the center lines of the rod-shaped object in the left and right images can be obtained, and they appear as straight lines. They are actually the projections of the same line from different perspectives. In order to determine the position of the rod-shaped object in space, the position parameters of the spatial line need to be recovered from the two projection lines. With the pinhole camera model, a point on the imaging plane corresponds to the ray in 3D space that starts from the optical center of the camera and passes through the imaging point. Similarly, a straight line on the imaging plane corresponds to the plane in 3D space determined by the camera optical center and the imaging line.
If the virtual imaging plane of the camera is located on the plane $z = 1$ in the camera coordinate system, the projection relationship between a point $P(x, y, z)$ in 3D space of the camera coordinate system and the point $P_I(\bar{x}, \bar{y})$ in the imaging plane can be expressed as

$$\bar{x} = \frac{x}{z}, \quad \bar{y} = \frac{y}{z}, \tag{12}$$

where $x, y, z$ are the coordinates of point $P$ in the camera coordinate system, and $\bar{x}, \bar{y}$ are the coordinates of the projection $P_I$ on the imaging plane. Given the pixel resolution of the imaging sensor, $k$, the coordinates of the point in the pixel coordinate system can be obtained as

$$u = k\bar{x} + u_c, \quad v = k\bar{y} + v_c, \tag{13}$$

where $(u_c, v_c)$ is the coordinate of the center of the image in the pixel coordinate system. Substituting (13) into (1), the line equation in the imaging plane may be expressed as

$$\bar{x}\cos\theta + \bar{y}\sin\theta = \frac{\rho - u_c\cos\theta - v_c\sin\theta}{k}. \tag{14}$$

For convenience, we introduce a new symbol $\bar{\rho}$, which is defined as

$$\bar{\rho} = \frac{\rho - u_c\cos\theta - v_c\sin\theta}{k}. \tag{15}$$

There is then a line $l$ in the imaging plane, the plane $z = 1$ in the camera coordinate system, which can be expressed in the same form as (1), as shown in Figure 2.
When the perpendicular direction vector $n_l = (n_x, n_y)^T$ is a unit vector, i.e., $n_x^2 + n_y^2 = 1$, the intersection of the perpendicular and the line $l$, denoted as point $A$, can be expressed as

$$A = (\bar{\rho} n_x, \bar{\rho} n_y)^T, \tag{16}$$

where $\bar{\rho}$ is the distance from the origin to the line in the imaging plane, as defined in (15). In the camera coordinate system $Oxyz$, the coordinates of the point $A$ should be expanded to 3 components, that is, $(\bar{\rho} n_x, \bar{\rho} n_y, 1)^T$. Meanwhile, the direction vector of the line, which can be derived from the direction of its perpendicular as $(-n_y, n_x)^T$, lies in the plane $z = 1$ and is expanded into a 3D vector in the camera coordinate system, $\vec{d} = (-n_y, n_x, 0)^T$. The line $l$, which lies in the plane $z = 1$, and the origin $O$ of the camera coordinate system define a plane, in which any line projects perspectively onto the line $l$. Therefore, the normal vector of this plane can be derived from the vector product of the direction of the line $l$ and the vector from the origin to point $A$, as follows:

$$n_p = \vec{d} \times \overrightarrow{OA} = (-n_y, n_x, 0)^T \times (\bar{\rho} n_x, \bar{\rho} n_y, 1)^T = (n_x, n_y, -\bar{\rho})^T. \tag{17}$$

It should be noted that the normal vector expressed by the above formula is not a unit vector. Since the plane passes through the origin of the camera coordinate system, it can be expressed by the normal vector and a point on the plane as follows:

$$(p - \mathbf{0}) \cdot n_p = 0, \tag{18}$$

where $p(x, y, z)$ is any point on the plane, defined in the camera local coordinate system, and $\mathbf{0}$ is the null vector in the local coordinate system, which may become a nonzero vector after the above equation is transformed into a new coordinate system. Let the transformation matrices and the offset vectors from the left and right cameras to the global coordinate system be $R_L, R_R$ and $t_L, t_R$, respectively. Then, in the left camera, the plane defined by the image line $l_L$ and the optical center, denoted as plane $p_L$, can be expressed in the global coordinate system as

$$p_L(x, y, z) \cdot R_L n_{p,L} = t_L \cdot R_L n_{p,L}, \tag{19}$$

where $n_{p,L}$ is the normal vector of the plane defined in the local coordinate system of the left camera.
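To make the step from an image line to its viewing plane concrete, the following sketch computes the plane normal from pixel line parameters and checks it on a back-projected pixel. It assumes a distortion-free pinhole camera whose scale k is derived from a 63° horizontal field of view over 1920 pixels, and camera-to-global transforms of the form p_g = R p + t; the function names, the line parameters, and the test pose are hypothetical.

```python
import math


def plane_normal_from_image_line(rho, theta, k, u_c, v_c):
    """Normal (n_x, n_y, -rho_bar) of the plane through the optical center and
    the image line u*cos(theta) + v*sin(theta) = rho, in the camera frame."""
    n_x, n_y = math.cos(theta), math.sin(theta)
    rho_bar = (rho - u_c * n_x - v_c * n_y) / k  # line distance in the plane z = 1
    return (n_x, n_y, -rho_bar)


def rotate(R, vec):
    """Apply a 3x3 rotation matrix (nested tuples) to a 3-vector."""
    return tuple(sum(R[i][j] * vec[j] for j in range(3)) for i in range(3))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Assumed intrinsics: pinhole model, principal point at the image center.
width, height, fov_deg = 1920, 1080, 63.0
k = (width / 2) / math.tan(math.radians(fov_deg / 2))
u_c, v_c = width / 2, height / 2

rho, theta = 700.0, math.radians(20.0)  # hypothetical image line
n_p = plane_normal_from_image_line(rho, theta, k, u_c, v_c)

# A pixel on the image line, back-projected to depth 3 in the camera frame.
t = 150.0
u = rho * math.cos(theta) - t * math.sin(theta)
v = rho * math.sin(theta) + t * math.cos(theta)
P = (3.0 * (u - u_c) / k, 3.0 * (v - v_c) / k, 3.0)
residual_local = dot(n_p, P)  # should vanish: P lies on the plane

# The same check in a global frame, for a hypothetical camera pose (R, t_cam).
R = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
t_cam = (1.0, 2.0, 3.0)
P_g = tuple(a + b for a, b in zip(rotate(R, P), t_cam))
n_g = rotate(R, n_p)
residual_global = dot(P_g, n_g) - dot(t_cam, n_g)
```

Both residuals vanish up to rounding, which confirms that any 3D point imaged on the line lies in the plane, in the camera frame and after the transformation to the global frame.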
In a similar way, in the right camera, the plane defined by the image line $l_R$ and the optical center, denoted as plane $p_R$, can be expressed in the global coordinate system as

$$p_R(x, y, z) \cdot R_R n_{p,R} = t_R \cdot R_R n_{p,R}. \tag{20}$$

These two planes intersect in space, as shown in Figure 3, and therefore, the intersection line $l_c$ must be perpendicular to the normal vectors of both planes simultaneously. The subscript $g$ of the symbols in the figure stands for a vector described in the global coordinate system. Consequently, the direction vector of the intersection, $d_c$, can be derived by the vector product of the normal vectors of these two planes:

$$d_c = R_L n_{p,L} \times R_R n_{p,R}. \tag{21}$$

After obtaining the direction vector, it is necessary to determine a point on the line to define $l_c$ in space. To this end, the point located at the optical center of the left camera, denoted by $t_L$, is moved a distance $\lambda$ along the perpendicular of the line $l_c$ in the plane $p_L$, that is, along $d_c \times R_L n_{p,L}$, until it reaches a new point $C$ located in the plane $p_R$. That is to say, the point $C$, located in both plane $p_L$ and plane $p_R$, lies on the intersection line $l_c$:

$$(t_L + \lambda\, d_c \times R_L n_{p,L}) \cdot R_R n_{p,R} = t_R \cdot R_R n_{p,R}, \tag{22}$$

where $\lambda$ is an undetermined coefficient, which can be solved as

$$\lambda = \frac{(t_R - t_L) \cdot R_R n_{p,R}}{(d_c \times R_L n_{p,L}) \cdot R_R n_{p,R}}, \tag{23}$$

where the denominator is the scalar triple product of three vectors,

$$(d_c \times R_L n_{p,L}) \cdot R_R n_{p,R} = (R_L n_{p,L} \times R_R n_{p,R}) \cdot d_c. \tag{24}$$
It can be seen from (21) that if the normal vectors of the two planes are not parallel, $d_c$ is nonzero and (23) is always valid; that is, the intersection of the two planes exists and is unique. When the center line of the observed rod-shaped object is parallel to the line defined by the optical centers of the two cameras in the stereo vision system, the two planes coincide. In this case, the normal vectors of the two planes are parallel, and (23) is invalid. Consequently, positioning based on the stereo vision system fails.
After the coefficient $\lambda$ is determined, the coordinates of the point $C$ in the global coordinate system can be determined too:

$$p_C = t_L + \lambda\, d_c \times R_L n_{p,L}, \tag{25}$$

where $\lambda$ is given by (23), and all the other quantities are known. Therefore, the intersection, which is the center line of the rod-shaped object, is determined in parametric form in the global coordinate system:

$$p = p_C + \mu d_c, \tag{26}$$

where $\mu$ is the independent parametric variable, each value of which corresponds to a point on the line.
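The intersection of the two viewing planes can be sketched as below, including the degenerate case in which the rod is parallel to the stereo baseline. The plane normals are assumed to be already rotated into the global frame; the function name `intersect_view_planes`, the tolerance, and the camera positions are illustrative assumptions.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def intersect_view_planes(n_l, t_l, n_r, t_r, tol=1e-12):
    """Intersect the plane through t_l with normal n_l and the plane through
    t_r with normal n_r. Returns (p_c, d_c): a point on the line and its direction."""
    d_c = cross(n_l, n_r)   # direction of the intersection line
    w = cross(d_c, n_l)     # lies in the left plane, perpendicular to d_c
    denom = dot(w, n_r)     # scalar triple product, equal to |d_c|^2
    if denom <= tol:
        raise ValueError("viewing planes are parallel; the line cannot be located")
    lam = dot(sub(t_r, t_l), n_r) / denom
    p_c = tuple(t + lam * wi for t, wi in zip(t_l, w))
    return p_c, d_c


# Hypothetical setup: optical centers 1 m apart along x, and a known 3D line
# through Q with direction D, used only to build consistent viewing planes.
t_l, t_r = (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)
Q, D = (0.2, 0.1, 3.0), (0.1, 1.0, 0.2)
n_l = cross(sub(Q, t_l), D)
n_r = cross(sub(Q, t_r), D)
p_c, d_c = intersect_view_planes(n_l, t_l, n_r, t_r)
```

The recovered direction `d_c` is parallel to D and `p_c` lies on the original line. If D is chosen parallel to the baseline, for example (1, 0, 0), the two normals coincide and the function raises an error, which mirrors the failure case described above.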

Scheme on Semiautonomous Manipulating.
Based on the theoretical method described in the previous section, an ROV semiautonomous manipulating system can be designed to grasp rod-shaped objects. This study focuses on the combination of manual operation and machine intelligence, rather than on a fully autonomous manipulating method that relies on machine intelligence alone. Therefore, the traditional operation console and the corresponding control software are still set up to display the underwater manipulation scene observed by the camera and to receive user operation commands. This system integrates human operation and machine automation so as to construct an efficient semiautonomous process for manipulating rod-shaped objects. The whole process of clamping rod-shaped objects can be divided into the following steps:
(a) Manually move the ROV to a suitable area, where the rod-shaped object can be observed completely by the stereo vision system on the ROV. This procedure does not require high accuracy and is well suited to manual operation.
(b) Accurately control the ROV to move a small distance and operate the manipulator, such that the end effector of the manipulator clamps the rod-shaped object exactly. This procedure needs very accurate operation, which requires well-trained operators and is generally the most time-consuming procedure.
(c) Perform cutting or moving operations on the clamped rod-shaped object, including moving the ROV. This is another procedure with relatively low precision requirements that is suited to manual operation.
The whole process with manual and automatic operations is summarized in Figure 4.
Since accurately controlling the ROV and operating the manipulator to clamp the rod-shaped object is the most time-consuming and skill-demanding procedure, this study focuses on a scheme that automates this procedure. To start the automatic control procedure for grasping the rod-shaped object, an operator only needs to select the approximate area where the rod-shaped object is located and point out the approximate spot at which to grasp it, as shown in the following steps:
(a) Manually move the ROV to a suitable area, where the rod-shaped object can be observed completely by the stereo vision system.
(b) Pick regions of interest for the target and the grasping point on the images, and then start the automatic control procedure, whose details are as follows:
(i) Pick the regions of interest for the rod-shaped object in the images from the left and right cameras of the stereo vision system, respectively.
(ii) Pick the approximate grasping point on the rod-shaped object in the image from either the left or the right camera.
(iii) Push the "START" button on the user interface to start the automatic grasping procedure, during which a computer processes the images, identifies and locates the rod-shaped object, and then drives the manipulator to clamp the rod-shaped object.
(c) Continue with the postgrasping operations, such as cutting and moving, which require low accuracy and are suited to manual operation.
In the above operation process, the procedure requiring the highest operation accuracy and the largest workload is done by computer-aided automatic control based on visual positioning. This can greatly reduce the requirements on the professional skills of operators and lower the threshold of underwater manipulation. Consequently, it is of great value in engineering applications.
After the manipulator grasps the target object, the automatic operation terminates, and the underwater manipulation task turns to the manual operation mode to dispose of the target object, or to another intelligent workflow to finish further tasks, which is out of the scope of this study.

Automatic Piloting Method.
Based on the method described in Section 2.2, the position parameters of the centerline of a rod-shaped object, which is a straight line in 3D space, can be obtained. During automatic grasping of the rod-shaped object, after the operator points out, on the graphical user interface, the approximate area where the target object is located and the approximate spot near the target object to clamp, the automatic control algorithm searches for the grasping point on the rod-shaped object that is near the spot pointed out by the operator. It is assumed that the coordinate of the grasping spot $M$ specified by the operator on the left image is $(x_{M,L}, y_{M,L})^T$. Then, the point on the image corresponds to a line in 3D space, which can be expressed in the left camera coordinate system as

$$p = \xi d_M, \quad d_M = (x_{M,L}, y_{M,L}, 1)^T, \tag{27}$$

where $\xi$ is the parameter of the line equation in parametric form, and $d_M$ is the direction vector of the line. Transforming the equation of the line from the left camera coordinate system to the global one can be done as follows:

$$p = t_L + \xi R_L d_M. \tag{28}$$

This line specifies the grasping point expected by the operator, which is not accurate; that is to say, the real grasping point should be the point on the rod-shaped object nearest to this line.

Figure 3: Intersection of planes corresponding to lines in images from stereo cameras.
The centerline of the rod-shaped object is described by (26), on which the real grasping point, denoted as $M'$, should be located. Furthermore, the point $M'$ should be the nearest point to the line described by (28). The line $MM'$ is then perpendicular simultaneously to the lines described by (26) and (28), and its direction vector is denoted as

$$d_\perp = d_c \times R_L d_M. \tag{29}$$

Using the method of undetermined coefficients, the position of the grasping point can be expressed as

$$p_C + \mu d_c = t_L + \xi R_L d_M + \eta d_\perp, \tag{30}$$

where $\mu, \xi, \eta$ are the undetermined coefficients. It is a linear equation system with 3 unknown variables that can be solved as

$$(\mu, \xi, \eta)^T = [\,d_c,\ -R_L d_M,\ -d_\perp\,]^{-1} (t_L - p_C), \tag{31}$$

where the vectors $d_c$, $R_L d_M$, $d_\perp$ are linearly independent, so the linear equation system in (31) has a unique solution, and accordingly, the three unknown variables $\mu, \xi, \eta$ are obtained. Only when $d_\perp = 0$ are the lines determined by (26) and (28) parallel, in which case there is no unique point pair with the minimum distance.
So far, the destination that the end effector of the underwater manipulator should reach has been obtained, which is expressed as $p_C + \mu d_c$ in the global coordinate system. To grasp the rod-shaped object, the end effector should clamp along the direction $d_c$ at the specified position. Therefore, the end effector of the underwater manipulator is constrained in 5 degrees of freedom. As described in [13], the underwater manipulator model HLK has exactly 5 degrees of freedom at the end effector. Therefore, with this underwater manipulator and the destination of the end effector, the joint angles of the manipulator can be solved by inverse kinematics. Consequently, automatic control of the manipulator to grasp a rod-shaped object is achieved.
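The search for the grasping point amounts to finding the point on the centerline closest to the operator's viewing ray. The sketch below solves the three-unknown system with scalar triple products rather than a matrix inverse; the sign convention of the coefficients, the function name `grasp_point`, and the numeric values are assumptions for illustration.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def grasp_point(p_c, d_c, t_l, r):
    """Closest point on the centerline p_c + mu*d_c to the ray t_l + xi*r.

    Solves p_c + mu*d_c = t_l + xi*r + eta*d_perp with d_perp = d_c x r and
    returns the grasping point together with (mu, xi, eta)."""
    d_perp = cross(d_c, r)
    norm2 = dot(d_perp, d_perp)
    if norm2 == 0.0:
        raise ValueError("lines are parallel; no unique closest point")
    b = sub(t_l, p_c)
    # Each coefficient follows by projecting onto a vector orthogonal to the
    # other two columns; every triple product reduces to +/- |d_perp|^2.
    mu = dot(b, cross(r, d_perp)) / norm2
    xi = dot(b, cross(d_c, d_perp)) / norm2
    eta = -dot(b, d_perp) / norm2
    point = tuple(p + mu * d for p, d in zip(p_c, d_c))
    return point, (mu, xi, eta)


# Hypothetical centerline along y through (2, 0, -1), and an operator ray
# starting at the left optical center (0, 0.5, 0), already in the global frame.
p_c, d_c = (2.0, 0.0, -1.0), (0.0, 1.0, 0.0)
t_l, r = (0.0, 0.5, 0.0), (1.0, 0.0, -0.4)
point, (mu, xi, eta) = grasp_point(p_c, d_c, t_l, r)
```

For this made-up geometry the grasping point is (2, 0.5, −1), and η d⊥ is the shortest offset between the ray and the centerline; η = 0 would mean that the operator's ray hits the centerline exactly.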

System Configuration.
The tested ROV is equipped with a stereo vision system with parallel optical axes and the underwater manipulator model HLK, as shown in Figure 5. A satellite coordinate system is fixed on the ROV to describe the configuration of the ROV, the object to be operated on, and the manipulator. According to the general convention of camera optics, the origin of the camera local coordinate system is located at its optical center, and the z axis coincides with the optical axis and points toward the observed object. The x and y axes are parallel to the corresponding axes of the image coordinate system. The installation position and attitude parameters of the stereo vision system and the manipulator in the ROV's satellite coordinate system, shown in Figure 5, are listed in Table 1.
Here, the distance between the two optical axes of the stereo vision system is 1000 mm. The stereo vision system is located at the front and top of the ROV's frame and points ahead. The horizontal field of view of the camera is 63°, and the resolution of the image is 1920 × 1080.

Result of Simulation.
With the images shown in Figure 1(b) taken as input, the image data in the selected regions are converted from the RGB color space to the HSV color space and are then binarized by the hue component. Then, Canny edge detection is applied to the binarized image, resulting in Figure 6(a). What remains is to find, among these edge points, the pair of parallel lines that forms the side boundaries of the rod-shaped object.
First, the Hough transform is employed to detect lines among the edges with resolution parameters of 4.5° in direction and 2.5 pixels in distance. As a result, 4 straight lines are detected, as shown in Figure 6(b). Because the edge of the object is slightly bent, two line segments with different lengths are identified on each side of the object, and as a result, the parameters of 4 lines are obtained. Second, the line refinement procedure described in the previous section is applied to collect the edge points adjacent to lines with similar parameters and then fit each group of edge points to a line; the results are shown in Figure 6(c). In this procedure, points within 3 pixels of the line are collected for fitting the refined parameters. It can be seen that the edge lines on the same side have been identified as essentially one coincident straight line. Refining once more improves the results further, as shown in Figure 6(d). The parameters of each straight line during this process are shown in Table 2, where the parameters of each group are in the form (ρ, θ). For the sake of convenience, the direction of a line is expressed in degrees rather than radians. Because of the errors of imaging and image processing, there is an angle of about 0.72° between the "parallel lines." Similarly, 3 lines are detected in the image from the left camera, in which one edge is detected as two distinct line segments with similar parameters. After the refinement procedure, the parameters of "more accurate" lines are obtained, as shown in Table 3.
In the process of refinement, the 2 lines detected from one side edge are combined into one new line, whose parameters are (644 px, −12.23°). For the other side edge, only one line is detected originally, and its direction becomes more accurate after refinement.
The lines that represent the side edges of the rod-shaped object, as shown in Tables 2 and 3, are then averaged. Consequently, the parameters of the center lines of the rod-shaped object in the left and right cameras are obtained, as shown in Table 4.
Finally, with the installation parameters of the stereo vision system, the direction of the center line of the rod-shaped object in the ROV's satellite coordinate system can be obtained as (0.10, 0.02, 0.10)^T, while the grasping point preferred by the operator is located at (2.26, −0.40, −0.86)^T [m]. As described in [13], the maximum extension of the underwater manipulator is about 1.5 m, so it cannot reach the grasping point. To grasp the object, the ROV must additionally be moved nearer to the object. Consequently, automatic control of the underwater manipulator to grasp a rod-shaped object has been achieved.

Conclusion
Accurately moving the ROV and operating the underwater manipulator to grasp and place objects play an important role in underwater manipulations using an ROV equipped with underwater manipulators. When operators watch the scene of underwater manipulation on a video display, which lacks 3D spatial information, it is difficult for them to determine the relative position between the object and the end effector of the manipulator. Consequently, it is very difficult to operate the underwater manipulator to grasp the object underwater. To solve this problem, a scheme for autonomously grasping rod-shaped objects is proposed in this paper. First, a stereo vision system is arranged on the ROV frame to take images of the rod-shaped object to be grasped. Then, the edge lines of the rod-shaped object in the images of the respective cameras are detected, and the center line of the rod-shaped object is obtained. Furthermore, according to the installation parameters of the stereo vision system and the images in the respective cameras, the position of the center line of the rod-shaped object in the ROV's satellite coordinate system is obtained. Finally, the joint angles of the manipulator to grasp the object are solved according to the relative position between the rod-shaped object and the ROV. When the object is out of the workspace of the underwater manipulator, it is also necessary to drive the ROV nearer first. In this way, automatic control of the underwater manipulator to grasp a rod-shaped object has been achieved.
In this paper, the simulation software Vortex Studio [17], which is extensively used in marine engineering, is employed to simulate the scene of an ROV carrying an underwater manipulator to grasp a cable underwater and to generate the images that would be observed by the stereo vision system fixed at the front of the ROV. Taking these images as the input, the relative position between the cable and the ROV is obtained successfully, and then, the ROV motion and the joint angles of the underwater manipulator to grasp the object are calculated. As a result, the feasibility of autonomously operating an underwater manipulator to grasp a rod-shaped object is validated.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.