Underwater Depth Estimation for Spherical Images

Abstract—Underwater depth estimation is an open problem in robotics and computer vision, and collecting corresponding ground-truth data in the underwater domain remains challenging. To this end, we propose to leverage publicly-available in-air RGB-D image pairs for underwater depth estimation in the spherical domain with an unsupervised approach. We are able to recover depth up to scale with no corresponding ground-truth data.

*This work was supported by ShanghaiTech MARS Lab.


I. INTRODUCTION
Underwater depth estimation is an open problem for many marine robotics applications. Capturing RGB-D image pairs in the underwater spherical domain remains very challenging, so no ground-truth depth is currently available for underwater spherical images. In this report, we propose to leverage publicly-available in-air spherical images for depth estimation in the underwater domain. Specifically, our approach follows a two-stage pipeline: i) given in-air RGB-D spherical pairs from the Stanford 2D-3D-S dataset [1], we train a style-transfer network [2] to convert in-air images to the underwater domain; ii) given the generated underwater images and their depth maps, we train a depth estimation network specially designed for spherical images. During testing, we generate depth directly from the input image. Our approach is unsupervised in that only underwater images (i.e., no ground-truth underwater depth) are required for the whole training process.

II. RELATED WORK
a) Panoramic Images: One common line of work on panoramic images tackles the lack of datasets. In [3], the authors propose to solve depth estimation and color correction in the spherical domain at the same time by enforcing left-right consistency under a multi-camera setting. Another line of work focuses on the distortion problem: [4] deforms the sampling grid according to the image distortion model, while [5] reprojects the image onto an icosahedral spherical polyhedron.
b) Underwater Color Correction: Color correction converts images shot in the underwater environment into normal in-air images, and the reverse process turns normal images into underwater images. There are many methods for correcting the color of underwater images, based either on deep learning or on mathematical models.

Fig. 1. A typical underwater omni-directional image.

In [6], the Jaffe-McGlamery model [7], [8], a mathematical method, is used to handle the problem. It considers absorption and scattering effects, denoted α(λ) and β(λ), so the main equation is

    L(λ) = R(λ) e^{−(α(λ)+β(λ)) d},

where R is the initial irradiance before propagation through the water column, d is the range from the camera to the scene, and L is the final irradiance subject to water column effects. The coefficients can be optimized by various traditional methods. The method of [6] borrows the idea of bundle adjustment, so that the color-correction errors are apportioned across each step.

c) Underwater Learning: WaterGAN [9] is one of the pioneering works in the underwater domain. The authors divide the problem of underwater style transfer into three parts: attenuation, back-scattering, and the camera model. The first stage accounts for the attenuation of light: the network predicts the attenuation factor and reconstructs a rough underwater image from an in-air image. In the second stage, to simulate the characteristic haze effect of underwater images, depth is combined with a random noise vector as input to generate the scattering effects. In the final stage, WaterGAN further models the shading pattern of the camera into the network. We want the generated underwater images to look as similar as possible to real underwater images, so a discriminator is appended at the end to tell real and fake underwater images apart. During training, the generator aims at producing photo-realistic images that the discriminator cannot tell apart, while the discriminator aims at distinguishing them [10].

Fig. 2. Full pipeline of our approach. We propose to leverage publicly-available RGB-D datasets for style transfer and depth estimation in an unsupervised approach.
Our work is different from WaterGAN [9] in that: i) WaterGAN requires depth as input to simulate the attenuation and scattering effects, while we only need underwater and in-air images as input; ii) we aim at depth estimation, while WaterGAN [9] targets image restoration.
Our work is in spirit most similar to [11]. The author of [11] also proposed a two-stage pipeline to solve underwater omni-directional depth estimation. In the first, perspective-image stage, the author used WaterGAN [9] to transfer RGB-D images to underwater RGB-D images, then trained an FCRN [12] depth estimation network with the underwater images as input. In the second, omni-directional stage, the author synthesized underwater equirectangular images from in-air equirectangular images by decreasing the values in the red channel (since red light attenuates fastest in water) and blurring the image based on its distance to the camera origin. Finally, following [4], a distortion-aware convolution module replaced the normal convolutions in FCRN, based on the spherical longitude-latitude mapping.

III. METHODOLOGY

Fig. 2 demonstrates our two-stage pipeline: i) given in-air RGB-D spherical pairs from the Stanford 2D-3D-S dataset [1], we train CycleGAN [2] to convert in-air images to the underwater domain; ii) given the generated underwater images and their depth maps, we train a depth estimation network to learn depth. We introduce the two parts separately below.

A. Style Transfer
Generative Adversarial Nets (GANs) were designed for data generation and are now widely used for style transfer tasks. A GAN is a two-player mini-max game between a generative model G and a discriminative model D [13]. The value function of this adversarial process is

    min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))],

where p_data denotes the data distribution and p_z is a prior over the input noise variables. This value function also serves as the loss function of the deep neural networks.
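In style transfer, the generative model G and the discriminative model D are implemented as neural networks. A minimal sketch of such a pair in PyTorch follows; the layer sizes and latent dimension here are illustrative, not the exact architectures used in this project.

    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, z_dim=100, img_dim=64 * 64 * 3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(z_dim, 256),
                nn.ReLU(inplace=True),
                nn.Linear(256, img_dim),
                nn.Tanh(),                       # outputs scaled to [-1, 1]
            )

        def forward(self, z):
            return self.net(z)

    class Discriminator(nn.Module):
        def __init__(self, img_dim=64 * 64 * 3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(img_dim, 256),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Linear(256, 1),
                nn.Sigmoid(),                    # probability that the input is real
            )

        def forward(self, x):
            return self.net(x)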
For underwater style transfer, CycleGAN [2] is implemented. For the mapping function G : X → Y with discriminator D_Y, the adversarial loss is

    L_GAN(G, D_Y, X, Y) = E_{y~p_data(y)}[log D_Y(y)] + E_{x~p_data(x)}[log(1 − D_Y(G(x)))].

Moreover, CycleGAN applies a cycle-consistency idea, x → G(x) → F(G(x)) ≈ x and y → F(y) → G(F(y)) ≈ y, with the loss

    L_cyc(G, F) = E_{x~p_data(x)}[||F(G(x)) − x||_1] + E_{y~p_data(y)}[||G(F(y)) − y||_1],

so finally, the full objective for CycleGAN is

    L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ L_cyc(G, F).

Because the method is pixel-to-pixel, the dataset is pre-processed by cropping and resizing the images to a reasonable size.
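A minimal sketch of the generator-side objective in PyTorch follows. It uses the least-squares adversarial variant from the official CycleGAN implementation rather than the log form above; the function and argument names are illustrative, not from this project's code.

    import torch
    from torch.nn.functional import l1_loss, mse_loss

    def cycle_gan_generator_loss(G, F, D_X, D_Y, real_x, real_y, lambda_cyc=10.0):
        fake_y, fake_x = G(real_x), F(real_y)        # G: X -> Y, F: Y -> X
        pred_y, pred_x = D_Y(fake_y), D_X(fake_x)
        # Adversarial terms: each generator tries to make its discriminator
        # output "real" (1) for generated images.
        adv = mse_loss(pred_y, torch.ones_like(pred_y)) + \
              mse_loss(pred_x, torch.ones_like(pred_x))
        # Cycle consistency: F(G(x)) should reconstruct x, and G(F(y)) should
        # reconstruct y.
        cyc = l1_loss(F(fake_y), real_x) + l1_loss(G(fake_x), real_y)
        return adv + lambda_cyc * cyc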

B. Depth Estimation
For depth estimation, we adopt FCRN [12], one of the state-of-the-art single models on NYUv2. The network consists of a feature-extraction backbone followed by several up-convolution layers that increase the resolution. Finally, we calculate the L1 difference between the output depth and the ground-truth depth map.
For depth estimation on planar images, a smoothness regularization term has been used frequently in previous research, to encourage estimated depths to be locally similar where no significant image gradient exists. The term is defined as follows:

    L_sm = Σ_{i,j} ( |∂²_x d_{ij}| e^{−|∂_x I_{ij}|} + |∂²_y d_{ij}| e^{−|∂_y I_{ij}|} ),

where L_sm is a spatial smoothness term that penalizes the L1 norm of the second-order depth gradients ∂² along both the x and y directions in 2D space (the superscript 2 denotes the 2nd order), d is the predicted depth, and I is the input image, whose gradients relax the penalty at strong image edges.
Further, we also add a surface normal regularizer to the network. Given the predicted depth and the ground-truth depth, we compute their surface normals in a local window and then measure the cosine distance between them,

    L_normal = (1/N) Σ_i |1 − cos(n_i, n_i*)|,

where n_i and n_i* are the surface normals computed from the predicted and ground-truth depth at pixel i.
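A minimal sketch of obtaining normals from a depth map via central finite differences is given below; this is one common approximation, and the exact local-window scheme used in this project may differ. The resulting normals are then compared with the cosine distance above.

    import torch
    import torch.nn.functional as F

    def depth_to_normals(depth):
        # depth: (B, 1, H, W); central differences approximate dz/dx and dz/dy.
        dzdx = (depth[:, :, :, 2:] - depth[:, :, :, :-2])[:, :, 1:-1, :]
        dzdy = (depth[:, :, 2:, :] - depth[:, :, :-2, :])[:, :, :, 1:-1]
        # The normal of the surface z = d(x, y) is proportional to (-dz/dx, -dz/dy, 1).
        normals = torch.cat([-dzdx, -dzdy, torch.ones_like(dzdx)], dim=1)
        return F.normalize(normals, dim=1)   # unit normals, shape (B, 3, H-2, W-2)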
Our final loss is a weighted combination of the above terms; a sketch of the combination is shown below.
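Concretely, with loss_l1, loss_smooth, and loss_normal denoting the three terms above, the combination can be written as follows; the weight values are illustrative placeholders, not tuned settings from this project.

    lambda_sm, lambda_n = 0.1, 0.1   # illustrative weights, not tuned values
    loss = loss_l1 + lambda_sm * loss_smooth + lambda_n * loss_normal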

C. Implementation Code
We demonstrate some implementation code in this section. In depth estimation, the L1 loss is implemented as follows; note that we need to get rid of the invalid (zero) depth values first.

    def forward(self, pred, target):
        valid_mask = target > 0                      # ignore invalid depth values
        diff = target[valid_mask] - pred[valid_mask]
        return diff.abs().mean()
For the smoothness term, following the edge-aware second-order definition above:

    def get_smooth_loss(depth, img):
        # Second-order depth gradients along x and y, down-weighted at image
        # edges (large image gradients) so true depth discontinuities are not
        # over-penalized.
        d2x = torch.abs(depth[:, :, :, :-2] - 2 * depth[:, :, :, 1:-1] + depth[:, :, :, 2:])
        d2y = torch.abs(depth[:, :, :-2, :] - 2 * depth[:, :, 1:-1, :] + depth[:, :, 2:, :])
        gx = torch.mean(torch.abs(img[:, :, :, :-2] - img[:, :, :, 2:]), 1, keepdim=True)
        gy = torch.mean(torch.abs(img[:, :, :-2, :] - img[:, :, 2:, :]), 1, keepdim=True)
        return (d2x * torch.exp(-gx)).mean() + (d2y * torch.exp(-gy)).mean()

And for the normal term:

    cos = torch.nn.CosineSimilarity(dim=1)
    loss_normal = torch.abs(1 - cos(output_normal, depth_normal)).mean()

D. Reproduce the Project

In the code/style_transfer folder, the dataset should be stored in a folder containing four sub-folders (named trainA, trainB, testA, and testB). To train CycleGAN with the underwater and in-air images, run

    python3 train.py --dataroot [your_data_folder] --name [your_result_folder] --model cycle_gan

The trained model files can be found in the code/style_transfer/checkpoints folder. Choose one and put it into code/style_transfer/scripts/checkpoints/[your_model_folder]. For testing and generating the processed data, run

    python3 test.py --dataroot [your_data_folder] --name [your_model_folder] --model test --no_dropout

In the code/depth folder, to train the depth network with the generated images and depth, run

    python3 main.py

This will train the network and report its performance on the converted underwater Stanford 2D-3D-S dataset.

IV. EVALUATION AND RESULTS

a) Dataset: Stanford 2D-3D-S [1] is one of the standard in-air benchmarks. The dataset provides omni-directional RGB images with corresponding depth information, which are exactly the data needed for our underwater depth estimation task. Furthermore, it also provides 2D and 3D semantics, 3D meshes, and surface normals.
b) Hyper-parameters: We implement our solution in the PyTorch framework and train our network with the following hyper-parameter settings during pretraining: mini-batch size 8, learning rate 1e-2, momentum 0.9, weight decay 0.0005, and 50 epochs. We reduce the learning rate by a factor of 10 every 10 epochs; a sketch of this schedule appears at the end of this section. Finally, we fine-tune the whole network with a learning rate of 1e-4 for another 20 epochs.

c) Metrics: Following [12], we use the root mean square error (RMS) for the comparisons on the datasets mentioned above:

    RMS = sqrt( (1/N) Σ_i (d_i − d_i*)² ),

where d_i and d_i* are the predicted and ground-truth depths at pixel i.

d) Results: Since no ground-truth depth is available in the underwater domain, we report results on the converted Stanford 2D-3D-S dataset. Numbers are reported in Table I. Some qualitative results from our generated underwater dataset are demonstrated in Fig. 3. In Fig. 4, we show some results of our depth estimation network. Although we are able to generate realistic images in the underwater domain and achieve good results on the underwater Stanford 2D-3D-S dataset, the depth estimated from underwater images still has room for improvement.
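As a reference for the pretraining schedule above, a minimal sketch follows; the choice of SGD and the train_one_epoch helper are assumptions for illustration, since only the hyper-parameter values are specified above.

    import torch

    # `model` is the depth network defined elsewhere.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                                momentum=0.9, weight_decay=0.0005)
    # Decay the learning rate by 10x every 10 epochs, for 50 epochs in total.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    for epoch in range(50):
        train_one_epoch(model, optimizer)   # hypothetical training-loop helper
        scheduler.step()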

V. CONCLUSION
In this project, we aim at unsupervised depth learning for underwater spherical images. However, our approach is still specially designed for a particular underwater setting. In the future, we plan to work on a unified approach that can handle all kinds of underwater conditions. Collecting an in-air dataset that looks closer to the underwater images might also further improve our performance.


Fig. 3. Generated images with our CycleGAN. On the left are examples from Domain I (in-air). On the right are our generated images. We are able to reproduce the lighting and color effects of the original underwater dataset.

Fig. 4. Generated depth from two datasets. On the left are the input images from the underwater Stanford 2D-3D-S dataset and their predicted depth maps. On the right are images and depth from our underwater dataset.
