Underwater Object Tracking Using Sonar and USBL Measurements

In the scenario where an underwater vehicle tracks an underwater target, reliable estimation of the target position is required. While USBL measurements provide target position at a low but regular update rate, multibeam sonar imagery gives high-precision measurements but within a limited field of view. This paper describes the development of a tracking filter that fuses USBL and processed sonar image measurements in order to obtain reliable tracking estimates at a steady rate, even when either sonar or USBL measurements are unavailable or faulty. The proposed algorithms significantly increase safety in scenarios where an underwater vehicle has to maneuver in close vicinity to a human diver, whose air bubbles can deteriorate tracking performance. In addition to the tracking filter development, special attention is devoted to adapting the region of interest within the sonar image by using a transformation of the tracking filter covariance, for the purpose of improving detection and avoiding false sonar measurements. The developed algorithms are tested on real experimental data obtained in field conditions. Statistical analysis shows superior performance of the proposed filter compared to conventional tracking using pure USBL or sonar measurements.


Introduction
Tracking underwater targets presents a great challenge in marine robotics due to the absence of global positioning signals that are usually available in areas reachable by satellites. In order to tackle this problem, acoustic-based sensors such as LBL (long-baseline), SBL (short-baseline), and USBL (ultrashort-baseline) are used for underwater localization and navigation, by triangulating responses obtained from acoustic beacons. While LBLs require inconvenient deployment of underwater beacons around the operational area, USBLs, which enable relative underwater localization using acoustic propagation, are most often used for tracking underwater objects. The greatest advantages of USBL systems are their easy deployment (the system consists of only two nodes, a transceiver and a transponder) and relatively long range. On the other hand, the precision of USBLs deteriorates with distance, and multipath issues may arise. In addition, due to acoustic wave propagation, measurements are sparse (arriving at intervals measured in seconds) and delayed by a time that depends on the distance between the receiving and the transmitting node.
Besides USBL devices, multibeam sonar devices (also known as acoustic cameras) are commonly used underwater in order to get relative position measurements. While state-of-the-art multibeam sonars provide an almost real-time acoustic image at high frequency and with high precision, they are characterized by a limited field of view and usually a shorter range. Unlike USBLs, sonars require additional acoustic image processing in order to obtain the position of an object within the field of view, which can often result in false measurements due to noise.
The objective of the work presented in this paper is to exploit the advantages of both USBL and sonar devices by fusing their measurements for the purpose of achieving precise and reliable underwater object tracking. The main contributions of this paper are (i) development of a tracking filter that fuses USBL and processed sonar image measurements with diverse characteristics, for the purpose of obtaining reliable tracking estimates at a steady rate, even in cases when either sonar or USBL measurements are not available or are faulty; (ii) adaptation of the region of interest within the sonar image by using a transformation of the tracking filter covariance, for the purpose of improving detection and avoiding false sonar measurements; (iii) experimental validation (in field conditions) of the developed tracking algorithms, together with a comparative analysis that demonstrates the quality of the obtained results.
The main motivation for the presented work arises from the FP7 "CADDY - Cognitive Autonomous Diving Buddy" project, whose main objective is to develop a multicomponent marine robotic system comprising an autonomous underwater vehicle (AUV) and an autonomous surface marine platform that will enable cooperation between robots and human divers. The three main functionalities of the envisioned system are "buddy slave", which assists divers during underwater activities; "buddy guide", which guides the diver to the point of interest; and "buddy observer", which monitors the diver at all times by keeping a safe distance from the diver and anticipating any problems that the diver may experience.
In the context of the CADDY project, one of the main prerequisites for executing the envisioned control algorithms and ensuring diver safety during human-robot interaction is precise diver position estimation. In order to meet this requirement, multibeam sonar imaging is used. However, the main problem that arises when using multibeam sonars is the limited field of view. If the observed target (a diver or an underwater vehicle) leaves the sonar's field of view, it becomes impossible to track it or even to distinguish the tracked object from another target that might enter the field of view. To cope with this problem, fusion of USBL and sonar measurements is incorporated. The low-precision USBL measurements are used by the estimator to provide the target position, albeit with higher variance. This information is used by the sonar target detector to set the region of interest in which the target is located. Finally, if the sonar detector finds the target in this region of interest, the estimator is updated with the high-precision (low-variance) sonar measurement. The combination of the two sources of measurements ensures reliable target tracking.
The USBL is usually used in vehicle localization and navigation, with a very limited number of papers dealing with target tracking. Fusion of USBL measurements with inertial sensor data and/or vehicle dynamics, used for accurate vehicle localization, is shown in [1,2]. In [3] the authors used USBL to track white sharks with an autonomous underwater vehicle, and in [4] USBL was used to track a diver with an autonomous surface vehicle.
Several papers have been published on the use of imaging sonars for object detection and tracking. A method based on the particle filter, shown in [5], was proposed to resolve the problem of target tracking in forward-looking sonar image sequences. In [6] image processing algorithms as well as tracking algorithms used to take the imaging sonar data and track a nonstationary underwater object are presented. In [7] the real-time sonar data flow collected by a multibeam sonar is expressed as an image and preprocessed by the system. According to the characteristics of sonar images, an improved detection method combined with a contour detection algorithm is used to separate the foreground object from the background. The object is then tracked by a particle filter tracking method based on multifeature adaptive fusion. In [8] the authors explore the use of such a sonar to detect and track obstacles. In [9] the authors provide algorithms for detection of man-made objects on the sea floor, where they mostly focus on the target-seabed separation issue. The attempt most similar to our work was made in [10], where the sonar was used to detect a human diver. The authors used an image processing approach similar to ours, followed by a hidden Markov model-based algorithm for candidate classification.
The papers mentioned above are mostly focused on the use of image processing and contour-based algorithms to detect an object. However, they are not directly comparable to our approach, as they focus more on the detection part inside the sonar image. Our approach differs from all of the above as it is based on fusion of sonar and USBL. This allows target tracking even when the target is outside of the sonar's very narrow field of view. It also helps eliminate false positive detections, which would cause tracking of the wrong object if multiple objects are present.
The rest of the paper is organized as follows: Section 2 describes the deployed sonar image processing algorithms. In Section 3 the tracking filter kinematic model is defined. Section 4 gives insight into region of interest adaptation by using the transformed position covariance matrix. Experimental results are given in Section 5. The paper is concluded with Section 6.

Sonar Image Processing
In order to determine the target position within the sonar field of view, the sonar image has to be processed. This section is devoted to the description of the algorithms used to detect the object in the multibeam sonar image and determine its position within the sonar image.

Multibeam Sonars.
Multibeam sonars are also known as acoustic cameras because, like a video camera, they produce a two-dimensional image, although with a very different geometric principle. They emit a number of acoustic beams, each one formed to cover a certain horizontal and vertical angle.

2.2. Target Detection.
Some of the most widely used methods and algorithms for object detection and recognition in images are Haar cascades [11], histograms of oriented gradients [12], and, especially recently, artificial neural networks [13,14]. Even though these are commonly used in video imagery, they have limited application in sonar-based target detection, mostly because sonar imagery is usually of very low quality, with incomplete target visualization, which prevents even a human observer from reliably detecting or recognizing the target. In addition, our tests with OpenCV implementations of feature descriptors have shown that conventional image descriptors are highly susceptible to noise in the sonar image, thus giving poor results. For these reasons, the implemented target detection algorithm relies on clustering contours and finding the ones that are most likely to belong to the target. In order to increase the reliability of object detection in the sonar image, only the region of interest (ROI) obtained from USBL measurements is searched.
The implemented tracking algorithm can be split into three steps. The first step involves basic image processing: blurring and binarization of the image. The second step is finding the contours in the obtained binarized image and clustering them together. The final step is searching for the best candidate inside the region of interest.

2.2.1. Step 1: Image Processing.
In the first step, a Gaussian blur filter is applied to the image to remove noise. Often the image is very noisy and contains many very small white contours consisting of only a few pixels, which we want to ignore. Gaussian blurring is performed by convolving the image with a 2-dimensional Gaussian function:

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}, \tag{1}$$

where $\sigma$ is the standard deviation of the Gaussian. A similar result could be obtained by eroding and dilating the white areas after binarization, as performed in [10].
After blurring, binarization of the image is performed with adaptive thresholding. Each pixel is compared to the mean value of its neighbouring pixels and is set to white if it is above that value, or black otherwise. Equation (2) describes the binarization algorithm, where $V_{\text{before}}$ is the pixel value between 0 and 255 before applying binarization and $V_{\text{after}}$ takes the value of either 0 or 255 after binarization:

$$V_{\text{after}}(x, y) = \begin{cases} 255, & V_{\text{before}}(x, y) > T(x, y), \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where the threshold $T(x, y)$ is the mean value of the pixels neighbouring $(x, y)$. The results of image blurring and binarization are displayed in Figure 1.
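The blurring and binarization step can be illustrated with a minimal pure-Python sketch. It is not the implementation used in the paper: the function names, the 3 × 3 kernel and neighbourhood sizes, and the border handling (clamping to the nearest pixel) are assumptions made only for illustration.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a normalized size x size Gaussian kernel (size is assumed odd)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def convolve(image, kernel):
    """Apply a symmetric kernel to a 2-D list image, clamping indices at the borders."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    acc += image[rr][cc] * kernel[dr + half][dc + half]
            out[r][c] = acc
    return out

def adaptive_binarize(image, block=3):
    """Set a pixel to 255 if it exceeds the mean of its block x block neighbourhood."""
    box = [[1.0 / (block * block)] * block for _ in range(block)]
    means = convolve(image, box)
    return [[255 if image[r][c] > means[r][c] else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]

# Usage: a single bright blob on a dark background survives binarization.
img = [[0] * 5 for _ in range(5)]
img[2][2] = 200
binary = adaptive_binarize(convolve(img, gaussian_kernel(3, 1.0)))
```

In practice a library routine would replace these loops; the sketch only makes the two operations explicit.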

2.2.2. Step 2: Contour Detection and Clustering.
In the second step, all white contours in the image are clustered together if they are closer than some predefined distance. This distance is chosen depending on the tracked target. For example, if a human diver is tracked, we can expect the diver's head or limbs to appear disjoint from the torso. It is therefore reasonable to cluster together contours that are less than half a meter apart.
To achieve the clustering, a graph approach could be taken by using Kruskal's minimum spanning tree algorithm with early termination. However, a simple union-find algorithm with a disjoint-set data structure can achieve the same with even lower complexity: while Kruskal's algorithm runs in $O(E \log V)$, where $E$ is the number of edges in the graph and $V$ is the number of vertices, union-find runs in $O(\alpha(n))$ amortized time per operation, where $n$ is the number of items and $\alpha(n)$ is the extremely slow-growing inverse of the Ackermann function [15].
The results of the implemented clustering algorithm are displayed in Figure 2. The diver's body is disconnected, but with the clustering algorithm the pieces are merged into the same cluster and marked with the same color.
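The clustering step can be sketched with a disjoint-set structure as follows. This is a simplified illustration under stated assumptions, not the paper's code: each contour is represented only by its centroid in meters, the distance between contours is taken as the centroid distance, and the names `DisjointSet` and `cluster_contours` are hypothetical.

```python
import math

class DisjointSet:
    """Union-find with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

def cluster_contours(centroids, max_dist=0.5):
    """Group contour indices whose centroids are chained by gaps below max_dist."""
    ds = DisjointSet(len(centroids))
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            dx = centroids[i][0] - centroids[j][0]
            dy = centroids[i][1] - centroids[j][1]
            if math.hypot(dx, dy) < max_dist:
                ds.union(i, j)
    clusters = {}
    for i in range(len(centroids)):
        clusters.setdefault(ds.find(i), []).append(i)
    return sorted(clusters.values())

# Usage: three nearby pieces (e.g. head, torso, limb) and one distant contour.
groups = cluster_contours([(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (5.0, 5.0)])
```

Note that clustering is transitive: the first and third contours are 0.6 m apart, yet they end up in the same cluster because each is within 0.5 m of the second.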

2.2.3. Step 3: Finding Target inside the ROI.
The final step assumes that the approximate area where the target should be is already known; that is, it is estimated by an extended Kalman filter that uses USBL measurements and sonar measurements from the previous step, as explained in the following section.
This assumption is required because accurate tracking using only the sonar image is difficult, especially if there are other similar objects present in the image, for example, multiple divers or autonomous underwater vehicles.
All the clusters that are inside the ROI are given a quality score based on a criterion that consists of two parts:
(1) Distance from the ROI center: the closer the cluster is to the ROI center, the higher its score.
(2) Visual similarity of each cluster and the target: even though very little training data is available, the cluster is compared with the known properties of the target (by comparing the size and shape and applying a simple template-based object detector or a small neural network).
The object with the highest score above the (empirically set) threshold is then selected as the most likely target. This allows us to score multiple objects and reliably choose the one that best fits both the current estimated position of the target (obtained from the tracking filter) and the known characteristics of the target.
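The two-part scoring can be sketched as below. The weights, the threshold value, the linear distance score, and the use of an area ratio as a stand-in for visual similarity are all assumptions chosen for illustration; the paper does not specify them, and the function names are hypothetical.

```python
import math

def score_cluster(cluster, roi_center, roi_radius, target_area,
                  w_dist=0.5, w_sim=0.5):
    """Combine closeness to the ROI center with a crude size similarity.

    cluster is a dict with keys "x", "y" (position) and "area" (positive).
    """
    dist = math.hypot(cluster["x"] - roi_center[0], cluster["y"] - roi_center[1])
    dist_score = max(0.0, 1.0 - dist / roi_radius)       # 1 at center, 0 at edge
    sim_score = (min(cluster["area"], target_area) /
                 max(cluster["area"], target_area))      # 1 for identical areas
    return w_dist * dist_score + w_sim * sim_score

def select_target(clusters, roi_center, roi_radius, target_area, threshold=0.6):
    """Return the best-scoring cluster above the threshold, or None."""
    best, best_score = None, threshold
    for cluster in clusters:
        s = score_cluster(cluster, roi_center, roi_radius, target_area)
        if s > best_score:
            best, best_score = cluster, s
    return best

# Usage: the candidate at the ROI center wins over the distant one.
near = {"x": 1.0, "y": 0.0, "area": 0.5}
far = {"x": 2.5, "y": 1.0, "area": 0.5}
chosen = select_target([far, near], roi_center=(1.0, 0.0),
                       roi_radius=2.0, target_area=0.5)
```

Returning `None` when no candidate exceeds the threshold corresponds to the case where no sonar measurement is passed to the tracking filter.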

Tracking Filter
Once the target position within the sonar field of view is known, it can be used as a measurement for the tracking filter.
The rotation matrix is denoted by $\mathbf{R}(\psi)$. The vehicle tracking the underwater target and carrying the imaging sonar is modeled as an overactuated marine surface vehicle; that is, it can move in any direction by modifying the surge, sway, and heave speed, while attaining arbitrary orientation in the horizontal plane.
In the kinematic model of the target, $\mathbf{p}_T$ is the target position vector and $\mathbf{v}_T$ is the speed vector consisting of surge speed $u_T$ and heave speed $w_T$. State $\psi_T$ denotes the target course and $r_T$ the course rate. Process noise is defined for the respective states. In the state vector of the target absolute position tracking filter, subscript $T$ denotes target-related states.
In the measurement vector, $\mathbf{p}$ denotes the vehicle position measurement, $\psi$ the heading measurement, and $z_T$ the target depth measurement, while $R_{(\cdot)}$ and $\Theta_{(\cdot)}$ denote the USBL and sonar range and bearing measurements, whose measurement equations are referred to as (10). Parameter $\nu$ denotes measurement noise, which is modeled here as zero-mean Gaussian noise. Note that the bearing measurement is relative; therefore, the heading state $\psi$ is included in (10).
The target depth measurement $z_T$ can be acquired using elevation angle and range measurements between the two units, provided by the USBL device. Alternatively, acoustic communication can be used to transmit depth measurements taken directly on board the target, if they are available.
It was already noted that sonar measurements arrive with high frequency and small delay, while USBL measurements are low frequency and delayed; therefore, the Kalman filter measurement matrix H is changed at every time step, according to the available measurements. Also, to account for measurement delays, methods of backward recalculation can be applied.
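The idea of a relative bearing measurement and of a measurement set that changes with sensor availability can be sketched as follows. This is only an illustration under assumptions that are not taken from the paper: planar geometry, positions given as (north, east) pairs, a noise-free measurement prediction, and hypothetical function names.

```python
import math

def range_bearing(vehicle_ne, heading, target_ne):
    """Predict range and vehicle-relative bearing to the target (no noise)."""
    d_north = target_ne[0] - vehicle_ne[0]
    d_east = target_ne[1] - vehicle_ne[1]
    rng = math.hypot(d_north, d_east)
    # Absolute bearing minus vehicle heading gives the relative bearing.
    brg = math.atan2(d_east, d_north) - heading
    brg = math.atan2(math.sin(brg), math.cos(brg))  # wrap to (-pi, pi]
    return rng, brg

def predicted_measurements(vehicle_ne, heading, target_ne,
                           usbl_available, sonar_available):
    """Stack only the rows for sensors that delivered data in this time step."""
    rng, brg = range_bearing(vehicle_ne, heading, target_ne)
    rows = []
    if usbl_available:
        rows += [("usbl_range", rng), ("usbl_bearing", brg)]
    if sonar_available:
        rows += [("sonar_range", rng), ("sonar_bearing", brg)]
    return rows

# Usage: target 10 m due east, vehicle heading east, only the sonar has data.
rows = predicted_measurements((0.0, 0.0), math.pi / 2, (0.0, 10.0),
                              usbl_available=False, sonar_available=True)
```

When no sensor delivers data, the returned list is empty and the filter performs only the prediction step, which is why the estimate remains available at a steady rate.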

Region of Interest Adaptation
In order to improve detection and avoid false sonar measurements, the region of interest (ROI) is defined by using the covariance of the tracking filter estimates. Sonar image processing can be performed in relative Cartesian or polar coordinates; therefore, it is necessary to transform the absolute position covariance accordingly.

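Covariance Transformation. By definition, the covariance matrix of the vehicle and target relative position can be written as

$$\boldsymbol{\Sigma} = E\left[ \left( \tilde{\mathbf{p}} - E[\tilde{\mathbf{p}}] \right) \left( \tilde{\mathbf{p}} - E[\tilde{\mathbf{p}}] \right)^{T} \right], \tag{11}$$

where $\tilde{\mathbf{p}} = \mathbf{p}_T - \mathbf{p}$ is the position of the target relative to the vehicle. The assumption is that the position of the vehicle carrying the sonar is known without uncertainty and that all uncertainty stems from the unknown target position.
It is further assumed that the vehicle and the target are at the same depth when the target is visible in the sonar image, since the sonar vertical field of view is quite small. For this reason, the target depth is considered to be known and is omitted from $\mathbf{p}_T$.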

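Covariance Transformation between Two Cartesian Coordinate Systems. The covariance transformation between the relative position in the earth-fixed NED coordinate frame and the relative position in the body-fixed frame is given with

$$\boldsymbol{\Sigma}_{\text{rel}} = \mathbf{R}^{T}(\psi) \, \boldsymbol{\Sigma} \, \mathbf{R}(\psi), \tag{12}$$

where $\boldsymbol{\Sigma}$ is the NED coordinate covariance matrix, $\psi$ is the vehicle heading, and $\mathbf{R}(\psi)$ is the rotation matrix from the body-fixed frame to the NED frame, given with (13) [16]:

$$\mathbf{R}(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix}. \tag{13}$$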

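Covariance Transformation between Cartesian and Polar Coordinate Systems. The relationship between the relative Cartesian coordinates $(x, y)$ and the polar coordinates $(\rho, \theta)$ is given with the nonlinear expression

$$\begin{bmatrix} \rho \\ \theta \end{bmatrix} = \begin{bmatrix} \sqrt{x^{2} + y^{2}} \\ \operatorname{atan2}(y, x) \end{bmatrix}. \tag{14}$$

In order to transform the covariance matrix, the Jacobian of the Cartesian-to-polar transformation is written as

$$\mathbf{J} = \begin{bmatrix} \dfrac{\partial \rho}{\partial x} & \dfrac{\partial \rho}{\partial y} \\ \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{bmatrix} = \begin{bmatrix} \dfrac{x}{\sqrt{x^{2} + y^{2}}} & \dfrac{y}{\sqrt{x^{2} + y^{2}}} \\ -\dfrac{y}{x^{2} + y^{2}} & \dfrac{x}{x^{2} + y^{2}} \end{bmatrix}. \tag{15}$$

Finally, the covariance matrix in relative polar coordinates $\boldsymbol{\Sigma}_{\text{pol}}$ is calculated as

$$\boldsymbol{\Sigma}_{\text{pol}} = \mathbf{J} \, \boldsymbol{\Sigma}_{\text{rel}} \, \mathbf{J}^{T}. \tag{16}$$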
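The two covariance transformations described in this section can be sketched in a few lines. The sketch assumes 2 × 2 covariance matrices stored as nested lists, a rotation matrix that maps body-fixed to NED coordinates, and first-order (Jacobian-based) propagation to polar coordinates; the function names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def ned_to_body_cov(cov_ned, heading):
    """Rotate a NED position covariance into the vehicle body-fixed frame."""
    c, s = math.cos(heading), math.sin(heading)
    rot = [[c, -s], [s, c]]  # body-fixed -> NED rotation
    return matmul(matmul(transpose(rot), cov_ned), rot)

def cartesian_to_polar_cov(cov_rel, x, y):
    """Propagate a relative Cartesian covariance to (range, bearing) at (x, y)."""
    rho2 = x * x + y * y
    rho = math.sqrt(rho2)
    jac = [[x / rho, y / rho],
           [-y / rho2, x / rho2]]
    return matmul(matmul(jac, cov_rel), transpose(jac))

# Usage: north variance 4 m^2, east variance 1 m^2, vehicle heading east.
cov_body = ned_to_body_cov([[4.0, 0.0], [0.0, 1.0]], math.pi / 2)
cov_polar = cartesian_to_polar_cov(cov_body, x=2.0, y=0.0)
```

With the vehicle heading east, the body x-axis coincides with the east axis, so the along-track variance equals the east variance. The polar transformation is undefined at the origin, where the bearing itself is undefined.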

Using the Tracking Filter Covariance for Region of Interest.
After the filter covariance is transformed into a relative coordinate frame (Cartesian or polar), it is used to define the region of interest used in sonar tracking, as described in Section 2. More specifically, given the variances along the two axes of the relative coordinate frame, the estimated object size along these axes, and the position estimated by the tracking filter, the region of interest is defined as the Cartesian product of two line segments, one along each axis, centered at the estimated position and sized according to the variance and the object size along that axis. The variances are the members $\Sigma_{\text{rel}\,1,1}$ and $\Sigma_{\text{rel}\,2,2}$ of the relative covariance matrix (12). Similarly, in the case of polar coordinates, line segments are defined for radius $\rho$ and angle $\theta$, and the region of interest is the Cartesian product of the two.
Figure 3 illustrates the size of the region of interest and the estimated location of the target (center of the ROI). Figure 3(a) shows the case when only USBL measurements are available, while Figure 3(b) shows results with both USBL and sonar measurements. The ROI (covariance) is much smaller when sonar measurements are available. However, it is worth noting that tracking is possible even when the target is outside of the sonar field of view, because USBL measurements are fused within the tracking filter.
The minimum area of the ROI can be set by adjusting the measurement noise variance $\nu$, while the rate of ROI growth when no measurements are available can be defined by adjusting the process noise parameters.
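A rectangular ROI of this kind can be sketched as follows. The exact sizing rule is not recoverable from the text, so the sketch assumes that each half-width is a multiple `n_sigma` of the standard deviation plus half the object size; this rule, the default of three standard deviations, and the function names are assumptions.

```python
import math

def roi_interval(center, variance, size, n_sigma=3.0):
    """Line segment around the estimate: n_sigma std devs plus half the object size."""
    half = n_sigma * math.sqrt(variance) + size / 2.0
    return (center - half, center + half)

def cartesian_roi(estimate_xy, cov_rel, size_xy, n_sigma=3.0):
    """ROI as the Cartesian product of one segment per axis."""
    return (roi_interval(estimate_xy[0], cov_rel[0][0], size_xy[0], n_sigma),
            roi_interval(estimate_xy[1], cov_rel[1][1], size_xy[1], n_sigma))

def inside_roi(roi, point):
    """True if the point lies within both segments."""
    return all(lo <= p <= hi for (lo, hi), p in zip(roi, point))

# Usage: estimate 5 m ahead, larger uncertainty across track than along track.
roi = cartesian_roi((5.0, 0.0), [[0.25, 0.0], [0.0, 1.0]], (1.0, 0.5), n_sigma=2.0)
```

Under this rule the ROI never shrinks below the object size, and it grows as the filter covariance grows between measurements.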

Experimental Setup.
Experiments related to target tracking using sonar and USBL data were conducted in October 2015 in Biograd na Moru, Croatia, during CADDY project validation trials. The experimental setup consisted of the autonomous underwater vehicle BUDDY AUV and the autonomous overactuated marine surface platform PlaDyPos, both developed in the Laboratory for Underwater Systems and Technologies [4,18]. The multibeam sonar was installed horizontally and forward-looking on the BUDDY AUV, here referred to as the vehicle, while the PlaDyPos vehicle played the role of the target to be tracked.
BUDDY AUV, shown in Figure 6, has been developed in the scope of the CADDY project. It is equipped with six thrusters that allow omnidirectional motion in the horizontal plane, thus ensuring decoupled heading and position control. Among other sensors, it is equipped with a multibeam sonar and a USBL used for positioning and communication. The overall dimensions of the BUDDY AUV are 1220 × 700 × 750 mm and its weight is about 70 kg.
The PlaDyPos vehicle, used as the target, is a small-scale overactuated unmanned surface marine vehicle capable of omnidirectional motion. It is equipped with four thrusters in an "X" configuration, which enables motion in the horizontal plane under any orientation. The vehicle is 0.35 m high and 0.707 m wide and long and weighs approximately 25 kg.
The sonar used for the experiments reported in this paper is the Sound Metrics ARIS 3000 [19], with 128 beams, covering 30° in the horizontal and 14° in the vertical plane. It supports two operating modes: high frequency at 3 MHz for higher detail at ranges up to 5 meters and low frequency at 1.8 MHz for ranges up to 15 meters. During the experiments, a SeaTrac X150 and X110 USBL modem pair was also used [20]. The combined modem/USBL units are designed as a very compact assembly. They operate in the 24-32 kHz frequency band, and a communication rate of 100 bps can be achieved.
USBL modems were installed on both the vehicle and the target object. During the experiments, it was assumed that the vehicle and the target are in the same horizontal plane when the target is visible in the sonar image; that is, the vehicle and the target have the same depth. Filtered GPS measurements, from the measurement units installed aboard the vehicle and the target, are taken as ground truth. It should be noted that errors in the ground truth measurements are present due to inherent GPS measurement uncertainty and the fact that different GPS modules were installed on the vehicle and the target, which induced a small variable drift. Visual inspection of sonar images showed that, when the image processing algorithm detects the correct target, the acquired relative sonar measurements are more accurate and precise than the relative distance calculated from GPS measurements.

Results.
During validation trials, a large number of target tracking experiments were conducted. In this paper, the analysis of results is performed on two datasets, each describing one experimental scenario. In Scenario 1, the vehicle is moving while the target is static or slowly drifting (Figure 4). In Scenario 2, the vehicle is static while the target is moving (Figure 5). In both scenarios, three different filter configurations are investigated, defined by the available measurements: (i) "Sonar" configuration, where only sonar measurements are available; (ii) "USBL" configuration, where only USBL measurements are used; and (iii) "Sonar + USBL" configuration, where both sonar and USBL measurements are available.
The dataset corresponding to Scenario 1 is shown in Figure 4, while Figure 5 shows the dataset of Scenario 2. In both figures, the first two subplots show north and east coordinates, while the third subplot shows the errors (Euclidean distance) between the estimated positions and the ground truth obtained via GPS measurements from both the vehicle and the target. The red line shows the results obtained from the tracking filter that uses only sonar measurements (filter configuration "Sonar"), the green line is obtained from the tracking filter that uses only USBL measurements (filter configuration "USBL"), and the blue line shows the results obtained from the tracking filter that utilizes both sources of measurements as they become available (filter configuration "Sonar + USBL"). The black line shows the ground truth position.

Frequency of Measurements.
In the third subplot of both Figures 4 and 5, magenta and yellow circles mark the time instances at which sonar and USBL measurements were available. Table 1 gives a comprehensive analysis of the amount of time during which sonar and USBL measurements were available. Taking into account that the tracking filter provides estimates at a 10 Hz sampling frequency, it can be seen that, in both scenarios, sonar measurements were available at around 30% of sampling instances, either because of the lower running frequency of the sonar image processing algorithms or because the target was not present in the sonar image some of the time.
On the other hand, USBL measurements are available at only 5% of time instances. It can be seen from Figures 4 and 5 that USBL measurement availability is consistent during the whole duration of both scenarios; however, the update period of USBL measurements is around 2 s, which corresponds to approximately 5% availability given the 10 Hz tracking filter sampling frequency.

Comparison of Filter Configurations.
The fact that position estimates quickly drift when the target is not in the sonar field of view can have serious consequences, especially in situations where a human diver is the target to be tracked and the position estimate is used to control the vehicle position relative to the diver. On the other hand, using only USBL measurements (as in filter configuration "USBL") enables tracking even when the target is outside the field of view, as long as there is a clear path between the target and the vehicle, ensuring obstruction-free propagation of the acoustic wave. However, USBL measurements arrive at a low update frequency.
Fusion of sonar and USBL measurements combines the best features of both types of measurements: the high precision of sonar measurements and the availability of USBL measurements. This is also clear from Figure 7(b), which shows the tracking filter position variance for each filter configuration. When both USBL and sonar measurements are used, the variance estimated by the filter is more stable regardless of which measurements are available. When only USBL measurements are used, the variance grows between two measurements. When only sonar measurements are used, the variance grows unboundedly while measurements are not available.

Statistical Analysis of Results.
In order to quantify the results of the sonar and USBL fusion approach, a performance metric is defined based on the localization error, obtained as the Euclidean distance between the ground truth position (obtained using GPS on board both the vehicle and the target) and the position estimates from all three filter configurations. These errors are shown in the form of a boxplot, where Figure 8(a) gives the analysis for Scenario 1 (shown in Figure 4) and Figure 8(b) gives the analysis for Scenario 2 (shown in Figure 5). Both boxplots show results for filter configurations "Sonar," "USBL," and "Sonar + USBL." In addition, results are shown for the filter configuration "Sonar" taking into account only the position estimates obtained when sonar measurements were available, that is, when the target was within the sonar field of view; this is labeled "Sonar (available)." As expected, the "Sonar (available)" data gives the most precise results for both scenarios. However, this measure does not represent the real situation, since it was shown that the target was within the sonar field of view only around 30% of the time in both scenarios. This measure should be regarded as the best possible result that can be obtained using the measuring devices available in the setup.
The localization error boxplot for filter configuration "Sonar" over the whole dataset shows that these results are the least precise, as can be seen in Figures 8(a) and 8(b). This is because all the data is included, even the data from periods when the target is lost from the sonar FOV and there is no way to estimate the target position, since the filter presumes that the target continues in the direction it was going before leaving the sonar FOV.
In both scenarios, filter configuration "USBL" provides the least accurate mean position error, but the variance over the whole dataset is much lower than in the filter configuration "Sonar." As can be seen from Figures 8(a) and 8(b), in both scenarios, filter configuration "Sonar + USBL" gives a mean localization error lower than filter configurations "Sonar" and "USBL." The same can be said for the position error variance. It should be noted that this filter configuration provides results which are very close to our "ideal" situation where the target is always present in the sonar image, that is, the "Sonar (available)" case.
In Scenario 2 (the case of the static vehicle), all the obtained localization error statistics are smaller, but the same pattern can be observed as in Scenario 1 (the case of the moving vehicle).
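The localization error metric and the summary statistics behind a boxplot can be computed as in the following sketch. The data layout (lists of planar position pairs sampled at the same instants) and the function names are assumptions for illustration, and the numbers in the usage example are made up rather than taken from the experiments.

```python
import math
import statistics

def localization_errors(estimates, ground_truth):
    """Euclidean distance between each estimate and the matching ground truth point."""
    return [math.hypot(e[0] - g[0], e[1] - g[1])
            for e, g in zip(estimates, ground_truth)]

def summarize(errors):
    """Mean, median, and quartiles of the localization error."""
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return {
        "mean": statistics.fmean(errors),
        "median": statistics.median(errors),
        "q1": q1,
        "q3": q3,
    }

# Usage with made-up positions (meters).
errors = localization_errors([(3.0, 4.0), (0.0, 0.0), (1.0, 0.0)],
                             [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)])
summary = summarize(errors)
```

Restricting `estimates` to the instants at which sonar measurements were available yields the "Sonar (available)" variant of the same statistic.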

Video.
Video representing the results with target position estimate obtained by fusing sonar and USBL measurements can be found in [21].

Conclusions
The paper addresses the issue of underwater target tracking by using sonar and USBL measurements. The results that were used to analyze the tracking quality were obtained from data gathered using BUDDY AUV, an autonomous underwater vehicle developed for diver-robot interaction that served as the tracking vehicle in the experiments, and the PlaDyPos autonomous surface marine platform that played the role of the target to be tracked. The experiments have shown that sonar measurements, when available, are very accurate and precise, but there is always a possibility of detecting false targets, especially in cluttered environments. Also, when tracking divers, false measurements due to bubbles are common. Using USBL measurements even when the target is in the sonar FOV helps reduce the number of false detections. For example, in Figure 4 we can see false detections at time instants 280 s, 360 s, and 450 s. Fusing USBL and sonar measurements discards such measurements, since they fall outside the ROI, and there are no abrupt changes in the position estimate. As a consequence, the mean localization error is the lowest, as seen in Figure 4. Finally, the developed tracking filter that fuses USBL measurements with position measurements obtained from the processed sonar image shows superior performance.
Future work will focus on exploiting the knowledge gained through these experiments to design algorithms in which the underwater vehicle actively tracks the underwater target while trying to keep it in the sonar FOV as often as possible.

Figure 1: First step in sonar image processing demonstrated on an image with a diver in the field of view. (a) Original sonar image; (b) image after blurring; and (c) image after binarization.

Figure 6: BUDDY AUV in water seen from above. The front end has a waterproof casing with a tablet.
Datasets shown in Figures 4 and 5 instantly show the disadvantage of filter configuration "Sonar": whenever sonar measurements are not available, the position estimate drifts from the true value. One can appreciate this more clearly in Figure 7(a), which shows a 45-second segment of the full-time response.