CNN-Enabled Visibility Enhancement Framework for Vessel Detection under Haze Environment

Maritime images captured in hazy conditions often have poor visual quality, making it easy to overlook important information. To avoid vessel detection failures caused by fog, the collected hazy images must be preprocessed to recover vital information. In this paper, a novel CNN-enabled visibility dehazing framework is proposed, consisting of two subnetworks, that is, the Coarse Feature Extraction Module (C-FEM) and the Fine Feature Fusion Module (F-FFM). Specifically, C-FEM is a multiscale haze feature extraction network that learns information at three scales. Correspondingly, F-FFM is an improved encoder-decoder network that fuses the multiscale information obtained by C-FEM and enhances the visual quality of the final output. Meanwhile, a hybrid loss function is designed to supervise the multiscale output of C-FEM and the final result of F-FFM simultaneously. It is worth mentioning that a large number of maritime images are used as the training dataset so that the framework is better adapted to vessel detection under haze. Comprehensive experiments on synthetic and realistic images verify the effectiveness and robustness of our CNN-enabled visibility dehazing framework compared with several state-of-the-art methods. We also apply our method as a preprocessing step before vessel detection to demonstrate that the framework can benefit maritime video surveillance.


Background and Related Work.
It is well known that the maritime surveillance system is an indispensable part of vessel traffic services [1]. As an efficient, convenient, and intuitive monitoring method, Closed Circuit Television (CCTV) is widely applied in critical regions, for example, ports and waterways. As shown in Figure 1, however, significant information in the images is easily buried under haze. Therefore, it is difficult for maritime regulatory authorities to extract detailed information (e.g., monitoring targets and water traffic conditions) from degraded images, which seriously affects maritime supervision efficiency. Besides, the low-quality images collected under haze also pose severe challenges to intelligent surveillance methods based on vessel detection, recognition, and tracking [2][3][4][5]. To improve maritime safety surveillance under haze, it is necessary to restore the images captured by CCTV. In the current literature, dehazing methods can be categorized into image enhancement-based methods, physical model-based methods, and deep learning-based methods.

Image Enhancement-Based Methods.
Early research mainly enhanced the contrast of hazy images to highlight the scene characteristics of the region of interest. Histogram Equalization (HE) [6] is a classic enhancement method that increases contrast by stretching the dynamic range of image pixel values. In the current literature, HE-based methods can be divided into two categories, that is, global and local histogram equalization. Since global histogram equalization enhances the entire image with a single mapping, it is simple in principle and fast to compute. However, these methods often ignore local information, so the restored images are of poor quality. To solve this problem, Stark et al. [7] proposed an adaptive local histogram equalization method. Subsequently, Kim et al. [8] proposed a nonoverlapping subblock histogram equalization method to reduce the blocky effect and computational complexity. Retinex theory-based image dehazing separates the illumination and reflection components of the hazy image and enhances the image by reducing the impact of illumination. Jobson et al. [9] first used a Gaussian filter to obtain a smooth illumination estimate according to the Retinex theory and thus proposed the single-scale Retinex (SSR). To avoid color distortion, Rahman et al. [10] proposed a multiscale Retinex algorithm with color restoration (MSRCR) by introducing a color compensation factor. To sum up, the images generated by these methods have higher contrast and color fidelity, but halos often appear on the edges of objects of interest.

Physical Model-Based Methods.
These methods are based on a physical model that describes the process of image degradation under hazy weather. Because these methods describe the haze formation process mathematically on the basis of light scattering, the final restored target is clear and natural. Physical model-based methods fall into two categories, that is, depth-based methods and prior-based methods. The depth-based methods obtain depth information through a specific method and then derive stable model parameters. Finally, the potentially clear image can be obtained by the atmospheric scattering model. For instance, Oakley et al. [11] first used radar and other types of equipment to measure the depth of the shooting scene. Hautiere et al. [12] proposed an image dehazing algorithm based on a 3D geographic model for vehicle vision systems. Although these methods have an excellent dehazing effect, they rely heavily on distance measuring equipment. Therefore, Liu et al. [13] proposed a dehazing method that estimates the depth map through a second-order variational framework. In contrast, the prior-based methods analyze haze formation and rely on specific prior information to achieve image dehazing. Dark channel prior (DCP) [14] and its improvements [15][16][17] perform excellently in the image dehazing task. Through numerous statistics on outdoor haze-free images, He et al. proposed DCP based on the assumption that most local color blocks contain some pixels with very low intensity in at least one color channel. Zhu et al. proposed a novel linear color attenuation prior [18], based on the difference between the brightness and the saturation of pixels within the hazy image. Subsequently, a nonlocal prior dehazing method [19] was employed to obtain the nonlocal transmission map from the haze-line property. To reduce halo and unnatural artifacts, a low-complexity color ellipsoid prior [20] was designed to accurately and swiftly estimate the transmission map.
In the current literature, several variational model-based transmission estimation methods [15,21,22] have also been proposed. Although prior-based methods have shown excellent dehazing performance, they may cause a loss in color fidelity under certain circumstances and fail to obtain pleasing visual effects on maritime images. However, it produces more parameters and calculations. To simplify the calculation, Li et al. [28] designed an end-to-end light-weight convolutional neural network (AOD-Net) that effectively balances calculation speed and visual effects. Inspired by image denoising, Du et al. [29] proposed a Deep Residual Learning (DRL) network to reconstruct the potential image. Besides, Chen et al. [30] proposed an end-to-end gated context aggregation network to directly restore the final haze-free image. It is worth noting that if the training datasets do not contain the geometric features present in the haze-free target, it is usually difficult to produce satisfactory image quality. Therefore, it is necessary to design a CNN-enabled visibility enhancement framework for vessel detection under haze to further improve maritime video surveillance efficiency.

Contributions.
This paper presents a CNN-enabled framework for practically solving the vessel detection problem under haze. The main contribution of our method differs from others in the following aspects:

Construction.
The remainder of this paper is divided into the following sections. Section 2 describes the problem formulation related to the imaging model. In Section 3, a CNN-enabled visibility enhancement framework is proposed to improve the visual quality of hazy images. Implementation details and experiments are presented in Section 4. Finally, we summarize our main contributions in Section 5.

Problem Formulation
2.1. Atmospheric Scattering Model. Video images collected by a maritime video surveillance system under haze conditions often have poor visual quality. As shown in Figure 2, Narasimhan et al. [31] proposed the atmospheric scattering model, which divides the light irradiance into an incident light attenuation part $J(x)e^{-\beta d(x)}$ and an atmospheric light part $A_\infty\left(1 - e^{-\beta d(x)}\right)$. The incident light attenuation model considers that the light reflected by the vessel surface is scattered and attenuated by particulate impurities in the air, reducing the intensity of the light reaching the imaging system. Note that as the propagation distance increases, the reflected light intensity decays exponentially. In contrast, the atmospheric light imaging model considers that natural light scattered by the atmosphere also enters the imaging system and participates in imaging. As the propagation distance increases, the scattered light intensity gradually increases. Under the combined action of these two models, the images collected by the imaging system in haze exhibit degradation such as low contrast, blur, and color distortion. Mathematically, the atmospheric scattering model can be expressed as

$$I(x) = J(x)e^{-\beta d(x)} + A_\infty\left(1 - e^{-\beta d(x)}\right), \tag{1}$$

where $I$ and $J$ denote the hazy image and the haze-free image, respectively, $A_\infty$ and $\beta$ represent the atmospheric light value and the scattering coefficient, $x$ is the image pixel index, and $d$ is the distance between the scene point and the imaging system, that is, the field depth. When we set $t(x) = e^{-\beta d(x)}$, equation (1) can be rewritten as

$$I(x) = J(x)t(x) + A_\infty\left(1 - t(x)\right), \tag{2}$$

with $t$ being the transmission. According to equation (2), the restored haze-free image can be obtained by

$$J(x) = \frac{I(x) - A_\infty}{t(x)} + A_\infty. \tag{3}$$
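As a minimal sketch of equations (2) and (3), the following pure-Python functions apply the scattering model to single pixel intensities in [0, 1]. The function names and the `t_min` floor on the transmission are illustrative assumptions, not part of the original method.

```python
import math


def transmission(depth, beta):
    """t(x) = exp(-beta * d(x)) for a single pixel."""
    return math.exp(-beta * depth)


def add_haze(j, t, a_inf):
    """Equation (2): I = J * t + A_inf * (1 - t)."""
    return j * t + a_inf * (1.0 - t)


def remove_haze(i, t, a_inf, t_min=0.05):
    """Equation (3): J = (I - A_inf) / t + A_inf.

    t_min is an assumed lower bound that avoids dividing by a
    near-zero transmission; it is not taken from the paper.
    """
    t = max(t, t_min)
    return (i - a_inf) / t + a_inf


clear = [0.10, 0.45, 0.80]                  # haze-free intensities
t = transmission(depth=2.0, beta=0.5)       # exp(-1)
hazy = [add_haze(j, t, a_inf=0.9) for j in clear]
restored = [remove_haze(i, t, a_inf=0.9) for i in hazy]
```

When $t$ and $A_\infty$ are known exactly, the inversion recovers the clear intensities; in practice both must be estimated, which is the difficulty the following subsection addresses.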

Transformed Formula.
According to equation (3), we can obtain a satisfactory haze-free image $J$ by accurately estimating $A_\infty$ and $t$. However, it is intractable to estimate two parameters simultaneously. To improve the performance of the end-to-end network, Li et al. [28] proposed the transformed atmospheric scattering model, which can be given by

$$J(x) = K(x)I(x) - K(x) + 1,$$

Journal of Advanced Transportation
where $K$ is a particular parameter that integrates $A_\infty$ and $t$; that is, $K(x) = \dfrac{\frac{1}{t(x)}\left(I(x) - A_\infty\right) + \left(A_\infty - 1\right)}{I(x) - 1}$. It is worth noting that hazy maritime images usually contain background regions (i.e., sky and water). Many statistical feature-based methods, for example, DCP and maximum local contrast, often fail to obtain ideal transmission maps in such regions. Deep learning-based methods do not rely on these statistical features and can learn the mapping between hazy and haze-free images. Therefore, we propose a CNN-enabled visibility enhancement network to effectively improve the quality of hazy maritime images and improve vessel detection accuracy.
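The transformed model can be checked numerically against equation (3). The sketch below assumes the AOD-Net form of $K$ with the constant bias set to 1, which matches the expression $J(x) = K(x)I(x) - K(x) + 1$ used in this paper; the function names are hypothetical.

```python
def k_from_physical(i, t, a_inf):
    """K = ((I - A_inf) / t + (A_inf - 1)) / (I - 1).

    Undefined at I == 1, so the caller must avoid fully saturated pixels.
    """
    return ((i - a_inf) / t + (a_inf - 1.0)) / (i - 1.0)


def restore_with_k(i, k):
    """Transformed model: J = K * I - K + 1."""
    return k * i - k + 1.0


def restore_physical(i, t, a_inf):
    """Equation (3): J = (I - A_inf) / t + A_inf."""
    return (i - a_inf) / t + a_inf


i, t, a_inf = 0.7, 0.5, 0.9
k = k_from_physical(i, t, a_inf)
j_transformed = restore_with_k(i, k)
j_physical = restore_physical(i, t, a_inf)
```

Both routes give the same restored intensity, so a network that predicts the single map $K$ implicitly estimates $A_\infty$ and $t$ together.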

CNN-Enabled Visibility Enhancement Framework
In this section, a CNN-enabled visibility enhancement framework, shown in Figure 3, is proposed to process hazy maritime images. This framework consists of two subnetworks, that is, the Coarse Feature Extraction Module (C-FEM) and the Fine Feature Fusion Module (F-FFM). In this work, C-FEM is introduced to learn multiscale hazy features. Meanwhile, F-FFM, an improved encoder-decoder network, is proposed to fuse and enhance the hazy image and the multiscale output obtained by C-FEM. Once our method has produced the sharp image, the vessels contained in the image can be detected by any target detection method.
3.1. C-FEM. C-FEM is a module for initially extracting the features of the hazy image. In particular, C-FEM performs mapping learning at three scales (i.e., 1, 1/4, and 1/16) to obtain coarse feature information at different resolutions simultaneously. Figure 4 shows the network architecture of C-FEM at one resolution, which is composed of only six convolutions. In this work, dilated convolution is embedded to enlarge the receptive field of C-FEM. According to our research, dilated convolution can reduce the loss of spatial features without reducing the receptive field. However, the use of dilated convolution may increase the risk of losing spatially continuous information, destroying image feature information (especially edges). To alleviate the interference caused by dilated convolution, we combine standard convolution (Conv) [32] and dilated convolution (DConv) to improve the ability to extract detailed information. DConv can handle this problem with different receptive fields by adjusting the dilation rate. Formally, standard convolution and dilated convolution are, respectively, defined as follows:

$$(I * w)(p) = \sum_{s + q = p} I(s)\,w(q),$$

$$(I *_d w)(p) = \sum_{s + dq = p} I(s)\,w(q),$$

where $I$ is the discrete signal, $w$ is the convolution kernel, $(\cdot)$ denotes a position in the discrete signal or kernel, $d \in \mathbb{Z}^+$ is the dilation factor, and $*_d$ is the dilated convolution with factor $d$. The only difference between standard convolution and dilated convolution is that the dilation factor $d$ changes which positions of $I$ are multiplied by $w(q)$. With dilated convolution, $w(q)$ is no longer limited to a fixed receptive field, and the dilation factor $d$ can be adjusted to obtain a larger receptive field. In this work, fusing Conv and DConv reduces the loss of spatial information caused by an excessive dilation rate and fully considers long- and short-distance information to present a better visual effect.
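To make the role of the dilation factor concrete, the following is a minimal one-dimensional sketch of the dilated convolution sum $\sum_{s + dq = p} I(s)\,w(q)$. It assumes a kernel centered at $q = 0$ and zero padding at the borders; both choices, and the function names, are illustrative rather than taken from the paper.

```python
def dilated_conv1d(signal, kernel, d=1):
    """Compute out[p] = sum_q signal[p - d*q] * kernel[q].

    The kernel is indexed with q centered on its middle element and
    positions outside the signal are treated as zero (assumed padding).
    With d = 1 this reduces to standard convolution.
    """
    half = len(kernel) // 2
    out = []
    for p in range(len(signal)):
        acc = 0.0
        for idx, w in enumerate(kernel):
            q = idx - half
            s = p - d * q
            if 0 <= s < len(signal):
                acc += signal[s] * w
        out.append(acc)
    return out


def receptive_field(kernel_size, d):
    """Span of input positions covered by one dilated kernel."""
    return d * (kernel_size - 1) + 1


impulse = [0, 0, 0, 1, 0, 0, 0]
kernel = [1, 2, 3]
standard = dilated_conv1d(impulse, kernel, d=1)
dilated = dilated_conv1d(impulse, kernel, d=2)
```

With the same three weights, the dilated response spreads over five input positions instead of three, which is the enlarged receptive field described above. The gaps between the sampled positions are also why combining DConv with standard Conv helps preserve spatially continuous information.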
Furthermore, Instance Normalization (IN) [33] and Rectified Linear Unit (ReLU) [34] are deployed after each Conv layer. Meanwhile, the feature map channels of the first five convolution outputs are set to 32.
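The normalization and activation applied after each convolution can be sketched for a single feature channel as follows. This is a simplified illustration without learnable affine parameters; the `eps` value is an assumed numerical-stability constant, not a setting reported in the paper.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one channel of one image to zero mean and unit variance.

    Statistics are computed over the spatial positions of this channel
    only, which is what distinguishes instance from batch normalization.
    """
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in channel]


def relu(channel):
    """Rectified Linear Unit: keep positive responses, zero the rest."""
    return [[max(0.0, v) for v in row] for row in channel]


feature = [[1.0, 2.0], [3.0, 4.0]]
normalized = instance_norm(feature)
activated = relu(normalized)
```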
According to our research, most deep learning-based dehazing methods rely on more complex network models to obtain better visual effects. When the model is relatively simple, it is usually hard to learn the fog features, which damages the information in the potential image. In contrast, the imaging model $J(x) = K(x)I(x) - K(x) + 1$ reduces the algorithm complexity, making it easier for the network to extract information.
The introduction of this model makes it possible for a simple network model to extract potential multiscale features from the original image. It is worth noting that the output of C-FEM is introduced only to provide a prior and is not used as the final result. At the same time, C-FEM has a fast calculation speed and can satisfy the needs of real-time processing.

F-FFM.
Coarse feature maps of three resolutions (i.e., 1, 1/4, and 1/16) have been obtained by C-FEM. The feature map information obtained by a standard encoder-decoder CNN is usually irregular. When the prior information obtained by C-FEM is introduced to the encoder, we believe that F-FFM can obtain better parameters and converge faster. When the feature maps are fused at different scales, the deep network can further extract edge detail information. Table 1 shows that the architecture of the Fine Feature Fusion Module (F-FFM) is a special encoder-decoder structure. Specifically, F-FFM performs only two downsampling operations and merges with the corresponding output of C-FEM. Both the encoder and the decoder consist of the same module, that is, a 3 × 3 convolution filter (Conv) [32], Instance Normalization (IN) [33], and Rectified Linear Unit (ReLU) [34]. Maximum pooling and bilinear interpolation are exploited to perform down- and upsampling operations on the feature map, respectively. Different from traditional encoder-decoder structures, our F-FFM encoder integrates the output of C-FEM. This strategy can guide F-FFM to learn the mapping between hazy images and haze-free targets. To better preserve the boundary details of the input, we adopt a global skip connection strategy to further protect the details of the output image. In other words, the output of the last convolution and the input image are directly added to form the output of F-FFM, and comparative experiments show that this significantly improves the dehazing effect.

Loss Function.
To robustly learn the multiscale mapping between the hazy image and the haze-free image, a specific loss function $L_{\text{C-FEM}}$ is proposed. As shown in Figure 3, C-FEM has three scale outputs (i.e., $\hat{J}_1$, $\hat{J}_2$, and $\hat{J}_3$). These three images have, in order, 1, 1/4, and 1/16 of the original image size. Subsequently, the maximum pooling operation is used to obtain clear images at three scales, named $J_1$, $J_2$, and $J_3$, which, respectively, correspond to the scales of $\hat{J}_1$, $\hat{J}_2$, and $\hat{J}_3$. In this work, the Mean Square Error (MSE) loss function is employed to constrain each scale output of C-FEM; that is,

$$L_{\text{C-FEM}} = \lambda_1 L_{\text{MSE}}(\hat{J}_1, J_1) + \lambda_2 L_{\text{MSE}}(\hat{J}_2, J_2) + \lambda_3 L_{\text{MSE}}(\hat{J}_3, J_3),$$

where $L_{\text{MSE}}(\hat{J}_*, J_*) = (\hat{J}_* - J_*)^2$, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are trade-off parameters of the corresponding loss functions. To further preserve the high-frequency details of the potential haze-free image while eliminating boundary artifacts, a hybrid loss function $L_{\text{F-FFM}}$ is introduced to constrain the predicted restored image $\hat{J}$ against the ground truth $J$; that is,

$$L_{\text{F-FFM}} = \lambda_4 L_{\text{MS-SSIM}} + \lambda_5 L_{\text{MAE}} + \lambda_6 L_{\text{Edge}} + \lambda_7 L_{\text{TV}},$$

with $\lambda_4$, $\lambda_5$, $\lambda_6$, and $\lambda_7$ being the penalty weights. Multiscale structural similarity (MS-SSIM) [35] is first employed to constrain the structure, brightness, and contrast of the image. The MS-SSIM loss function can be defined as follows:

$$L_{\text{MS-SSIM}} = 1 - \text{MSSSIM}(\hat{J}, J),$$

with MSSSIM being the calculation of the multiscale structural similarity index between two images. The hazy image inevitably has low contrast in local regions, resulting in color distortion. To solve this problem, the Mean Absolute Error (MAE) loss function $L_{\text{MAE}}$ is introduced as a part of $L_{\text{F-FFM}}$, which can reduce the color distortion to a certain extent. In particular, $L_{\text{MAE}}$ is defined as

$$L_{\text{MAE}} = |\hat{J} - J|.$$

High-frequency detail information is easily destroyed in the process of image dehazing. To further improve the fidelity and authenticity of details, we introduce an additional edge loss function [36] to constrain the high-frequency components, for example, edges and textures. $L_{\text{Edge}}$ can be written as

$$L_{\text{Edge}} = \sqrt{\left(\text{Lap}(\hat{J}) - \text{Lap}(J)\right)^2 + \varepsilon^2},$$

where $\text{Lap}(\hat{J})$ and $\text{Lap}(J)$ represent edge maps extracted from $\hat{J}$ and $J$ via the Laplacian operator, respectively. The penalty coefficient $\varepsilon$ is empirically set to $10^{-3}$. In addition, the Total Variation (TV) loss function [37], computed from the horizontal and vertical gradients of $\hat{J}$ given by the operators $\nabla_h$ and $\nabla_v$, is exploited to suppress the pixel-jump problem. We refer interested readers to [35][36][37] for more details on the calculation of MS-SSIM, edge loss, and TV. To sum up, the total loss function can be written as follows:

$$L_{\text{Total}} = \gamma_1 L_{\text{C-FEM}} + \gamma_2 L_{\text{F-FFM}},$$

where $\gamma_1$ and $\gamma_2$ are the penalty coefficients of $L_{\text{C-FEM}}$ and $L_{\text{F-FFM}}$. By comparative experiment, we manually selected the optimal weights of all loss functions.
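The multiscale supervision of C-FEM can be sketched as follows. The sketch assumes that each 2 × 2 maximum pooling halves both sides of the image, so that two poolings give 1/4 and 1/16 of the original pixel count, and it uses placeholder weights because the selected values are not reproduced here. All function names are hypothetical.

```python
def max_pool_2x2(img):
    """Halve height and width by taking the max of each 2x2 block."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]) - 1, 2)]
        for r in range(0, len(img) - 1, 2)
    ]


def mse(pred, target):
    """Mean of squared pixel differences between two equal-size images."""
    n = len(pred) * len(pred[0])
    total = sum((p - t) ** 2
                for pred_row, target_row in zip(pred, target)
                for p, t in zip(pred_row, target_row))
    return total / n


def c_fem_loss(preds, clear, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-scale MSE terms.

    preds: predictions at full, 1/4, and 1/16 pixel count.
    clear: full-resolution ground truth; coarser targets are pooled from it.
    weights: placeholder trade-off parameters (assumed values).
    """
    targets = [clear]
    for _ in range(2):
        targets.append(max_pool_2x2(targets[-1]))
    return sum(w * mse(p, t) for w, p, t in zip(weights, preds, targets))


clear = [[float(4 * r + c) for c in range(4)] for r in range(4)]
half = max_pool_2x2(clear)        # 2x2 target
quarter = max_pool_2x2(half)      # 1x1 target
preds = [
    [[v + 1.0 for v in row] for row in clear],   # off by 1 at full scale
    [[v + 2.0 for v in row] for row in half],    # off by 2 at 1/4 scale
    quarter,                                     # exact at 1/16 scale
]
loss = c_fem_loss(preds, clear, weights=(1.0, 0.5, 0.25))
```

Because each scale has its own target, an error at a coarse scale is penalized even when the full-resolution prediction is accurate, which is what allows the coarse outputs to serve as a prior for F-FFM.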

Experimental Results and Analysis
This section describes the implementation details of network training, including dataset construction and network parameter settings. We compare our method with several state-of-the-art dehazing methods on both synthetic and realistic hazy maritime images. To show that our method can improve detection accuracy, the proposed framework is also employed in vessel detection tasks under haze.

Comparison Methods and Evaluation Indicators.
Our method will be compared with four handcrafted prior-based methods and three deep learning-based methods. For the sake of fair comparison, the parameters of other competing dehazing methods are provided by the authors' code.
Through numerous statistics on outdoor haze-free images, DCP is proposed based on the assumption that most local color blocks contain some pixels with very low intensity in at least one color channel. According to this statistical prior and the haze imaging model, a high-quality haze-free image can be directly obtained.
The haze-lines (HL) method [38] finds that the pixel values of a hazy image can be modeled as lines intersecting at the air light. Based on this prior, a novel haze-lines-based method is proposed to better restore the hazy image. It is worth noting that the complexity of HL is linear in the number of pixels, which gives it high computational efficiency.

(4) F-LDCP: Fusion of Luminance and Dark Channel Prior-Based Method [39]. To make the sky region more natural in long-shot images, a Fusion of Luminance and Dark Channel Prior (F-LDCP) method is proposed. The transmission maps estimated by the brightness model and the DCP model are fused through a soft segmentation.
(5) MSCNN: Multiscale Convolutional Neural Networks [25]. To learn the practical features of a hazy image, a multiscale deep network (MSCNN) is designed to address the image dehazing problem. MSCNN can be divided into a coarse-scale network and a fine-scale network. The coarse-scale network learns a holistic estimation of the scene transmission, and the fine-scale network is used to refine the obtained transmission. Finally, the haze-free image can be obtained by the atmospheric scattering model.
(6) AOD-Net: All-in-One Dehazing Network [28]. AOD-Net, a light-weight CNN, is designed according to the reformulated atmospheric scattering model. This network replaces the atmospheric light value and the transmission with one parameter. It is worth mentioning that AOD-Net has been embedded in other deeper models (e.g., Faster R-CNN) to improve high-level tasks on hazy images.
(7) GCA-Net: Gated Context Aggregation Network [30]. GCA-Net is an end-to-end Gated Context Aggregation Network. In particular, a smoothed dilation technique is adopted to eliminate the gridding artifacts caused by the widely used dilated convolution, with negligible additional parameters.
In synthetic and realistic experiments, we will compare these methods with our proposed method. In addition, three full-reference indicators, that is, Peak Signal-to-Noise Ratio (PSNR) [40], SSIM [41], and Feature Similarity (FSIM) [42], are introduced to evaluate the dehazing performance in the synthetic experiment. Meanwhile, one popular no-reference image quality assessment method, that is, Natural Image Quality Evaluator (NIQE) [43], is also exploited to perform dehazing quality evaluation in the real experiment. Theoretically, higher values of PSNR, SSIM, and FSIM and lower values of NIQE indicate better visual performance.
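Of the full-reference indicators, PSNR is simple enough to illustrate directly. The sketch below uses the standard definition $10\log_{10}(\text{MAX}^2/\text{MSE})$ on images with intensities in [0, 1]; the function name and the choice of `max_value` are assumptions for illustration, and the evaluation code used in the experiments may differ in details such as color handling.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-size images.

    Returns infinity when the images are identical.
    """
    n = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2
              for ref_row, res_row in zip(reference, restored)
              for a, b in zip(ref_row, res_row)) / n
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[0.0, 0.5], [0.8, 0.25]]
restored = [[0.1, 0.6], [0.9, 0.35]]   # every pixel is off by 0.1
score = psnr(reference, restored)      # MSE = 0.01, so about 20 dB
```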

Experimental Datasets and Settings.
To guarantee high-quality dehazing results, we selected 7000 haze-free maritime images as the dataset and randomly cropped these images into 34000 patches of size 256 × 256. In this work, our network is trained for 80 epochs. The learning rate is $10^{-3}$ for the first 40 epochs and $10^{-4}$ for the last 40 epochs to improve convergence. In each epoch, the synthetic hazy versions are obtained by equation (2), that is, the atmospheric scattering model. In particular, the transmission $t$ and the atmospheric light value $A_\infty$ are random constants drawn from (0.2, 0.6) and (0.8, 0.9), respectively. All numerical experiments and model training are conducted in a Python 3.7 and Matlab 2019a environment running on a PC with an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz and an Nvidia GeForce RTX 2080 Ti GPU. It takes about 10 hours to train our network with the PyTorch package [44]. The Python source code is available at https://github.com/LouisYuxuLu/JAT_Dehazing.
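The training settings above can be sketched as follows. This is an illustrative outline rather than the released training code: the function names are hypothetical, epochs are assumed to be counted from 0, and a uniform distribution is assumed for the random constants.

```python
import random


def learning_rate(epoch, total_epochs=80):
    """1e-3 for the first half of training, 1e-4 for the second half."""
    return 1e-3 if epoch < total_epochs // 2 else 1e-4


def sample_haze_parameters(rng):
    """Draw one constant transmission and one atmospheric light value."""
    t = rng.uniform(0.2, 0.6)
    a_inf = rng.uniform(0.8, 0.9)
    return t, a_inf


def synthesize_hazy_patch(clear_patch, rng):
    """Apply equation (2) with constants shared by the whole patch."""
    t, a_inf = sample_haze_parameters(rng)
    hazy = [[j * t + a_inf * (1.0 - t) for j in row] for row in clear_patch]
    return hazy, t, a_inf


rng = random.Random(0)
clear_patch = [[0.0, 0.25], [0.5, 1.0]]
hazy_patch, t, a_inf = synthesize_hazy_patch(clear_patch, rng)
```

Resampling $t$ and $A_\infty$ in every epoch means the network sees the same clear patch under different haze levels, which acts as a form of data augmentation.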
For the sake of better visual comparisons, the dehazed versions of hazy images with different haze levels obtained by various methods are shown in Figure 6. It can be clearly observed that DCP and HL often make the color unnatural. Meanwhile, due to incomplete dehazing, the results obtained by DCP easily suffer from boundary artifacts around the object. Although GRM can achieve satisfactory visual effects, it requires complex calculations and has the risk of excessive smoothness. F-LDCP handles the blocking artifacts and halo problems in the sky regions well, but the color fidelity in the water regions needs improvement. MSCNN and AOD-Net can handle low-concentration hazy images. However, the restored versions of high-concentration hazy images (i.e., hazy images with t = 0.2) usually have poor visual quality. GCA-Net fails in the synthetic experiment, leaving a nonuniform distribution of residual fog in the results. By comparison, our method not only makes the restored image visually more natural but also preserves the color reproduction of the sky and water regions.
To further confirm the superiority of our method, the quantitative results of PSNR, SSIM, and FSIM are shown in Figure 7 and Table 2. PSNR, SSIM, and FSIM values are illustrated using box-plot in Figure 7. It can be seen that our method has higher index values in most cases. Particularly for high-concentration hazy images, our method can stably obtain high-quality restored versions. Besides, Table 2 shows three metrics value comparisons of various image enhancement methods on 36 hazy images. In particular, we display the best result of each metric in bold. Due to the highest values of PSNR, SSIM, and FSIM, our method has the best dehazing performance. Meanwhile, the standard deviation calculated by the SSIM and FSIM is the smallest, which verifies that our method has excellent robustness.

Experiments on Realistic Maritime Datasets.
This subsection verifies the reliability of several methods on realistic hazy maritime images, because synthetic and realistic images differ. Meanwhile, NIQE is introduced to quantitatively describe the naturalness of the visual effects, and our proposed method is compared with seven dehazing methods, that is, DCP [14], GRM [16], HL [38], F-LDCP [39], MSCNN [25], AOD-Net [28], and GCA-Net [30]. Figure 8 shows several dehazing results to reflect the imaging performance more intuitively.
From the visual comparisons, DCP and HL have serious color distortion problems and blocking effects in the sky regions. Recovery results obtained by GRM have the risk of low contrast, especially in the recovery task of Image 9. F-LDCP and AOD-Net fail to correct the color of the image. GCA-Net not only has the problem of overexposure in the sky region but also has nonuniform fog remaining in the image. Although MSCNN has better visual effects than other methods, our method has pleasing color and can remove fog more fully. Our superior performance can be further confirmed by the quantitative results NIQE shown in Table 3.

Experiments on Vessel Detection under Haze Environment.
In the maritime imaging system, the harsh imaging environment severely restricts the regular operation of the visible light imaging sensor, reduces vessel detection accuracy, and leads to incorrect identification. To demonstrate this phenomenon, we used YOLOv4 [45] and Faster-RCNN [46], respectively, to detect vessels in hazy and haze-free images. As shown in Figure 9, the hazy image has low contrast and a large amount of useful information is obscured, which leads to problems such as identification errors or missed identification during target detection. After dehazing, the vessel target is effectively captured and recognized, and the recognition accuracy is significantly increased. Therefore, dehazing the degraded hazy image by our method can improve vessel detection performance, so that the computer and the related workers can make correct decisions in time.

Figure 5. From top to bottom: the synthetic hazy experiment with A∞ = 0.8/t = 0.6, A∞ = 0.9/t = 0.6, A∞ = 0.8/t = 0.4, A∞ = 0.9/t = 0.4, A∞ = 0.8/t = 0.2, and A∞ = 0.9/t = 0.2, respectively. Note that IQR represents interquartile range.

Conclusion
In this paper, a novel CNN-enabled visibility dehazing framework was proposed, which could significantly improve the visual quality of images captured by maritime cameras under haze. In particular, this framework is composed of two subnetworks named the Coarse Feature Extraction Module (C-FEM) and the Fine Feature Fusion Module (F-FFM). C-FEM is an initial multiscale feature extraction network containing three simple six-layer convolutional networks, that is, Single C-FEM. C-FEM can obtain coarse feature maps at 1, 1/4, and 1/16 of the original image pixel size. F-FFM is a special encoder-decoder structure used to fuse and enhance the multiscale information obtained by C-FEM and the original hazy image. To further improve the network performance, a corresponding loss function is proposed to simultaneously supervise the multiscale output of C-FEM and the final result of F-FFM. Furthermore, our dataset contains a large number of maritime images so that the framework is well suited to vessel detection under haze. Both qualitative and quantitative experiments have illustrated the effectiveness of our proposed framework.

Data Availability
The image data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Yuxu Lu and Yu Guo are co-first authors.