Intelligent Vision-Enabled Detection of Water-Surface Targets for Video Surveillance in Maritime Transportation

The timely, automatic, and accurate detection of water-surface targets has received significant attention in intelligent vision-enabled maritime transportation systems. Reliable detection results are also beneficial for water quality monitoring in practical applications. However, visual image quality is often degraded by poor weather conditions, potentially leading to unsatisfactory target detection results. The degraded images can be restored using state-of-the-art visibility enhancement methods, but it remains difficult to achieve high-quality detection performance because of the unavoidable loss of details in restored images. To alleviate these limitations, we first investigate the influences of visibility enhancement methods on detection results and then propose a neural network-empowered water-surface target detection framework. A data augmentation strategy, which synthetically simulates degraded images under different weather conditions, is further presented to promote the generalization and feature representation abilities of our network. The proposed framework is capable of accurately detecting water-surface targets under different adverse imaging conditions, e.g., haze, low-lightness, and rain. Experimental results on both synthetic and realistic scenarios illustrate the effectiveness of the proposed framework in terms of detection accuracy and efficacy.


Introduction
The visual information captured from video provides meaningful and important data for intelligent transportation systems. With the popularization of artificial intelligence (AI) and the Internet of Things (IoT), intelligent vision techniques have driven the rapid development of video surveillance in IoT-empowered maritime transportation systems [1][2][3]. To enhance traffic safety and maritime monitoring, the timely and accurate detection of water-surface targets (e.g., ships, garbage, and persons in water) has received tremendous attention in the current literature. Traditional detection methods, such as mean shift [4], deformable part-based models (DPMs) [5], support vector machines (SVMs) [6], and sparse representation [7], have been proposed to detect the targets of interest. However, the corresponding detection results easily suffer in complicated environments, including light reflected from the water surface and multiple moving targets. To further improve detection results, deep learning has gained increasing attention during the past several years. Learning-based detection methods can be mainly divided into two types, i.e., two-stage and one-stage methods. In the literature, R-CNN [8], Fast R-CNN [9], and Faster R-CNN [10] are representative two-stage methods. They can obtain accurate detection results, but at the expense of a high computational cost. To guarantee real-time detection, one-stage methods have recently gained much attention in practical applications. Typical methods include YOLOv1 [11], which is based on global image information, and its advanced extensions (e.g., YOLOv2 [12], YOLOv3 [13], and YOLOv4 [14]). These methods achieve a good balance between detection accuracy and efficiency and have been widely adopted in practical applications. There is thus great potential to exploit these methods to detect water-surface targets in intelligent vision-enabled maritime transportation systems [15,16].
However, it is often difficult to capture high-quality video images under severe weather conditions (e.g., haze, low light, and rain), which easily occur in practice. The corresponding degraded visibility is harmful to the reliable detection of water-surface targets, even for deep learning-enhanced detection methods. Existing visibility enhancement methods can theoretically improve visual image quality. It should be noted, however, that it is impossible to generate restored images without any loss of details [17]. Due to this essential limitation, learning-based detection methods easily fail to accurately recognize water-surface targets. To guarantee accurate and robust detection, data augmentation strategies can be incorporated into existing learning-based detection frameworks. Such strategies are capable of significantly improving detection results under different imaging conditions. Taking the detection of water-surface garbage as an example, they help guarantee reliable water quality monitoring for IoT-based maritime video surveillance. In the current literature [18][19][20], most studies on water quality monitoring focus on spectrum analysis and on physical and chemical analysis in several IoT-empowered practical applications. To our knowledge, few studies have addressed the automatic detection of water-surface garbage through IoT-based maritime video surveillance.
The main contributions of this work related to intelligent water-surface target detection and water quality monitoring are threefold:
(1) An intelligent vision-enabled water-surface target detection framework with deep neural networks is proposed for IoT-based maritime video surveillance.
(2) Optimal strategies for training deep neural networks are presented to handle the influences of different severe weather conditions on water-surface target detection.
(3) Extensive detection experiments on both simulated and real-world scenarios demonstrate that the proposed vision-enabled water-surface target detection framework provides robust and accurate results under different imaging conditions.
The main benefit of our water-surface target detection framework is that two aspects have been taken into consideration, i.e., the powerful learning capacity of deep neural networks and the data augmentation-based network learning strategy. Comprehensive experiments under different severe weather conditions demonstrate that the proposed framework can accurately and robustly detect water-surface targets for maritime video surveillance. The remainder of this work is organized as follows. Section 2 briefly reviews the recent work related to IoT-based maritime video surveillance and detection of water-surface targets. In Section 3, an intelligent vision-enabled target detection framework is proposed to promote maritime surveillance under different weather conditions. Comprehensive experiments are performed to demonstrate the effectiveness of our detection framework in Section 4. This paper is finally concluded by summarizing the main contributions in Section 5.

Related Work
In the current literature, many efforts have been devoted to maritime surveillance and water quality monitoring. We will briefly review the intelligent maritime surveillance and water-surface target detection in this section.

Intelligent Video Surveillance in Maritime Transportation.
Intelligent maritime surveillance has recently attracted tremendous attention [21]; it enables the understanding of various maritime activities, leading to enhanced maritime safety and security. With the recent burgeoning application of computer vision technology, vision-based maritime surveillance systems can support more reliable applications, such as traffic safety management and water pollution monitoring. In the current literature, Bloisi et al. [22] proposed to enhance existing maritime surveillance systems with the popular closed-circuit television (CCTV) cameras, which provide useful visual information. To promote maritime video surveillance under different weather conditions, many visibility enhancement methods have been developed to improve imaging quality. For example, illumination decomposition-based image dehazing [23] reconstructs natural-looking dehazed images in visual maritime surveillance. Maritime images captured in low-light conditions have also been improved through the Retinex theory [24] and deep learning [25]. High-quality images in maritime video surveillance are potentially conducive to promoting maritime monitoring in practice. To achieve more effective visual maritime surveillance, the maritime objects of interest should also be robustly and accurately detected, recognized, and tracked [26][27][28].
With the rapid development of low-end IoT devices and emerging AI techniques [29,30], IoT-enabled intelligent maritime video surveillance has received increasing attention from both academics and practitioners. Liu et al. [2] proposed to improve big data quality to promote intelligent vessel traffic services in maritime IoT systems. To further improve the efficacy of maritime surveillance, an AI-empowered maritime IoT was developed by proposing a parallel-network-driven approach [31]. Palma [32] also proposed to enable maritime IoT across the seas and oceans through the performance analysis of CoAP and 6LoWPAN over VHF links. By combining AI-based computer vision and IoT, it becomes tractable to robustly and accurately detect undesirable water-surface targets under different weather conditions. There is thus great potential for monitoring water quality with IoT-based maritime video surveillance. For more details on maritime IoT, please refer to [33] and references therein.

Detection of Water-Surface Targets in Maritime Surveillance.
The detection of water-surface targets (i.e., garbage) is beneficial for water quality monitoring in maritime surveillance. The reliable evaluation of water quality remains a challenge for management activities aiming at protecting limited water resources. Several different types of water quality monitoring systems have been developed to assist in assessing water quality and providing early warning [19]. Adu-Manu et al. [20] reviewed the main water quality monitoring methods, from traditional manual methods to newly developed ones. Paper-based sensors and smartphones were combined for on-site water quality monitoring [34]. However, it is difficult to implement real-time water quality monitoring over a large-scale region with this technology. To overcome this limitation, Chung and Yoo [35] proposed to implement remote water quality monitoring through a wireless sensor network (WSN). A network of smart sensors was proposed to implement in situ and continuous spatio-temporal monitoring of surface-water quality [36]. For more details on traditional water quality monitoring systems, please refer to [19,37,38] and references therein. Video surveillance has become an emerging way to indirectly assess water quality. For example, Serra-Toro et al. [39] monitored water quality by recognizing fish swimming behavior from video images. To promote learning-based waste detection in water bodies, a dataset (i.e., AquaTrash) [40] was developed based on the existing TACO dataset [41] to assist in protecting water sources. Benefiting from the strong learning capacity of deep models, an extension of YOLOv3 [42] performs well in vision-based detection of water-surface garbage.
The YOLOv3 network has been embedded into an intelligent water-surface cleaner robot, which is capable of accurately detecting and collecting floating garbage in real time [43]. However, if the observed images are degraded under severe weather conditions (e.g., haze, low-lightness, and rain), these intelligent vision-based detection methods easily fail to accurately and robustly recognize water-surface garbage, leading to unreliable water quality monitoring in maritime surveillance.

Intelligent Vision-Enabled Water-Surface Target Detection Framework
In this work, we mainly focus on the detection of water-surface targets in vision-empowered maritime surveillance. An intelligent vision-enabled water-surface target detection framework with deep neural networks is proposed. To enhance the accuracy and robustness of target detection, degraded images under different severe weather conditions are synthetically generated through existing physical imaging models. These synthetically degraded images are beneficial for improving the generalization abilities of our neural networks.

AI-Empowered Detection of Water-Surface Targets.
We propose to develop the intelligent vision-enabled water-surface target detection framework based on the existing CCTV system, which has been widely utilized in maritime video surveillance. With the great advancements of IoT technologies, where sensors and embedded equipment are connected to the Internet to efficiently gather and exchange maritime data [44], IoT-based maritime video surveillance has become increasingly attractive to both academia and industry. Two typical deep neural networks, i.e., YOLOv4 [14] and Faster R-CNN [10], are incorporated into our learning-enabled detection framework to accurately monitor water-surface garbage. To improve the generalization abilities of the neural networks, a standard dataset, which contains several types of water-surface garbage, is designed to train our networks. However, the efficacy of water-surface garbage detection highly depends upon the imaging quality of the CCTV cameras. Under different weather conditions, e.g., haze, low-lightness, and rain, the observed images inevitably suffer from visibility degradation, leading to unsatisfactory detection results in practical applications. To guarantee reliable detection performance, we propose two different strategies to enhance the proposed intelligent vision-enabled target detection framework:
(1) The first strategy selects state-of-the-art visibility enhancement methods to improve the visual quality of target images obtained under hazy, low-light, or rainy conditions. Both Faster R-CNN and YOLOv4, which are trained using the standard dataset containing only sharp images, are then adopted to automatically detect water-surface garbage in the visibility-enhanced images.
(2) To eliminate the negative effects of suboptimal enhanced images on garbage detection, the second strategy enlarges the existing standard dataset with different types of degraded images synthetically generated under different severe weather conditions.
The enlarged dataset, which contains both sharp and degraded images, is beneficial for promoting the generalization abilities of our neural networks. The effectiveness and robustness of garbage detection under degraded visual environments can be enhanced accordingly.
Extensive experiments are implemented in this work to compare these two strategies and select the better one. Once harmful water-surface garbage is accurately detected, an alarm device in our IoT-based maritime video surveillance system automatically emits an alarm signal. The operators in charge then perform the corresponding activities to reduce the negative effects of unwanted garbage on water quality. Therefore, the proposed intelligent vision-enabled water-surface target detection framework is capable of early detection of harmful pollution for real-time water quality monitoring under different weather conditions.

Synthetically Degraded Images in Poor Weather Conditions.
To enhance the learning capacities of our neural networks, degraded images under different severe weather conditions are synthetically generated. In this work, hazy, low-light, and rainy images are simulated according to the following physical imaging principles.
Journal of Advanced Transportation

Haze Imaging.
In the fields of image processing and computer vision, the atmospheric scattering model is widely adopted to model the formation of haze-degraded images. In particular, the observed degraded image I(x) ∈ [0, 1] is obtained as follows:
$$ I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), $$
where J(x) is the latent haze-free image, t(x) represents the transmission map, A ∈ (0, 1] is the global atmospheric light, and x ∈ Ω denotes the pixel index with Ω being the image domain. Given predefined J and A, hazy images under different haze levels can be synthetically generated by manually changing the transmission map t. In particular, t can be theoretically generated according to the depth map d, i.e., t = exp(−κd) with κ being a positive constant. The depth map d can be measured directly in practical applications.
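The haze synthesis described above can be sketched in a few lines of NumPy. This is only an illustrative sketch of the standard atmospheric scattering model, I = J·t + A·(1 − t) with t = exp(−κd). The function name `synthesize_haze`, the default values of `A` and `kappa`, and the toy depth ramp are assumptions made for the example, not settings reported in this work.

```python
import numpy as np

def synthesize_haze(J, depth, A=0.8, kappa=1.0):
    """Apply I = J * t + A * (1 - t) with t = exp(-kappa * depth).

    J     : clean image, float array in [0, 1], shape (H, W, 3)
    depth : non-negative depth map, shape (H, W)
    A     : global atmospheric light in (0, 1] (illustrative default)
    kappa : positive constant; larger values give denser haze
    """
    J = np.asarray(J, dtype=np.float64)
    t = np.exp(-kappa * np.asarray(depth, dtype=np.float64))  # transmission map
    t = t[..., np.newaxis]              # broadcast over the color channels
    return np.clip(J * t + A * (1.0 - t), 0.0, 1.0)

# Toy data: a random "clean" image and a depth ramp growing from left to right
rng = np.random.default_rng(0)
clean = rng.random((4, 6, 3))
depth = np.tile(np.linspace(0.0, 3.0, 6), (4, 1))
light_haze = synthesize_haze(clean, depth, kappa=0.5)
heavy_haze = synthesize_haze(clean, depth, kappa=2.0)
```

Varying `kappa` (or the depth map) is one simple way to obtain several haze levels from the same clean image: pixels with zero depth are left unchanged, while distant pixels are pulled toward the atmospheric light `A`.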

Low-Light Imaging.
According to the Retinex theory, the observed low-light image I(x) ∈ [0, 1] can be defined as follows:
$$ I(x) = R(x) \circ L(x), $$
where ∘ represents the element-wise multiplication operator and R(x) ∈ [0, 1] and L(x) ∈ [I(x), 1] denote the reflection and illumination maps, respectively. Theoretically, for a given high-quality image R, the visual quality of I is highly related to the magnitude of L. To simulate low-light images, the latent sharp images are first transformed from the RGB color space into the HSV color space. By multiplying the V channel of the sharp images by different attenuation coefficients ϖ ∈ (0, 1), we can generate synthetically degraded images from severe to slight levels.
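As a rough illustration of the low-light simulation described above, the following sketch converts each pixel to HSV with Python's standard `colorsys` module, scales the V component by an attenuation coefficient, and converts back to RGB. The function name `synthesize_low_light` and the value 0.3 are hypothetical choices for the example.

```python
import colorsys
import numpy as np

def synthesize_low_light(rgb, omega):
    """Darken an RGB image in [0, 1] by scaling the HSV value channel by omega."""
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must lie in (0, 1)")
    rgb = np.asarray(rgb, dtype=np.float64)
    out = np.empty_like(rgb)
    for idx in np.ndindex(rgb.shape[:2]):        # loop over pixel positions
        h, s, v = colorsys.rgb_to_hsv(*rgb[idx])
        out[idx] = colorsys.hsv_to_rgb(h, s, omega * v)  # attenuate V only
    return out

rng = np.random.default_rng(0)
sharp = rng.random((4, 5, 3))
dark = synthesize_low_light(sharp, omega=0.3)
```

Because hue and saturation are kept fixed, scaling V by ϖ is equivalent to multiplying all three RGB channels by ϖ, so a vectorized implementation could simply use `omega * rgb`. The explicit HSV round trip is kept here only to mirror the description in the text.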

Rain Imaging.
The rain-degraded image I(x) ∈ [0, 1] can be considered as the combination of a rain-free background B(x) ∈ [0, 1] and a rain streak layer S(x) ∈ [0, 1]. The rainy image I can thus be synthetically expressed as follows:
$$ I(x) = B(x) + S(x), $$
where the rain-free scene B is commonly reconstructed by estimating the unwanted rain streaks S from the rainy version I. Analogous to the generation of rain streaks in [45], we introduce both salt-and-pepper noise and motion blur to synthetically simulate the rain-degraded images. The degraded images under different rain conditions are highly related to the levels of random noise [46] and the types of motion blur.
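One possible way to realize the noise-plus-motion-blur idea for the additive model I = B + S is sketched below. It is a simplified assumption rather than the exact procedure of [45]: only the bright (salt) half of the noise is used, the blur is a vertical box kernel, and the parameters `density`, `length`, and `intensity` are hypothetical.

```python
import numpy as np

def synthesize_rain(B, density=0.02, length=5, intensity=0.8, seed=0):
    """Return (I, S) with I = clip(B + S), where S is blurred sparse noise.

    B         : rain-free image in [0, 1], shape (H, W, 3)
    density   : fraction of pixels seeded as rain drops (assumed value)
    length    : vertical motion-blur length in pixels
    intensity : peak brightness of the streak layer
    """
    B = np.asarray(B, dtype=np.float64)
    rng = np.random.default_rng(seed)
    h, w = B.shape[:2]
    length = max(1, min(length, h))
    seeds = (rng.random((h, w)) < density).astype(np.float64)  # sparse "salt" noise
    blur = np.zeros((h, w))
    for k in range(length):              # box motion blur along the vertical axis
        blur[k:, :] += seeds[: h - k, :]
    blur /= length
    peak = blur.max()
    S = intensity * (blur / peak) if peak > 0 else blur
    I = np.clip(B + S[..., np.newaxis], 0.0, 1.0)
    return I, S

rng = np.random.default_rng(1)
background = 0.9 * rng.random((32, 32, 3))
rainy, streak_layer = synthesize_rain(background, density=0.02, length=5)
```

Heavier rain can be imitated by raising `density` or `length`; slanted streaks would need a diagonal kernel, which is omitted here for brevity.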

One-Stage-Based Detection Framework.
The object detector is an important part of our water-surface target detection framework. To improve detection accuracy and robustness, it is necessary to employ a reliable object detector. In the literature, deep network-based object detectors can be classified into two categories: (i) one-stage detectors and (ii) two-stage detectors. As an emerging one-stage object detection method, YOLOv4 is now receiving increasing attention from both academia and industry. We provide a brief overview of YOLOv4, as used in our target detection framework, in this subsection.
To balance the trade-off between detection accuracy and efficiency, Redmon et al. [11] originally proposed a one-stage detector called YOLO in 2016. It divides the input image into a grid of cells and predicts the bounding boxes and class probabilities for each cell.
The object identification and bounding box extraction are jointly implemented by formulating object detection as a regression problem. To further enhance detection performance, YOLOv2 [12] and YOLOv3 [13] were then proposed with improved precision and speed. More recently, a more powerful detection framework, named YOLOv4 [14], was presented, which combines several advanced strategies in different aspects. As observed in Figure 1, we choose CSPDarknet53 as the backbone of YOLOv4, which promotes the ability to learn invariant features. In particular, YOLOv4 utilizes Weighted-Residual-Connections (WRCs) and Cross-Stage-Partial-Connections (CSPC), takes into account Cross mini-Batch Normalization (CmBN), DropBlock regularization, and Mish activation, and performs Self-Adversarial-Training (SAT) and Mosaic data augmentation to train the detection network. In the literature [47,48], YOLOv4 has achieved superior performance on different datasets. These state-of-the-art detection results benefit from the powerful "Bag-of-Freebies" and "Bag-of-Specials" detection strategies while retaining real-time operation. There is thus great potential to adopt YOLOv4 for real-time detection of water-surface garbage in existing maritime video surveillance systems. For more details on YOLOv4, please refer to [14] and references therein.

Two-Stage-Based Detection Framework.
Current studies have demonstrated that two-stage detectors can produce higher detection accuracy than traditional one-stage detectors [49]. However, the superior accuracy is achieved at the expense of high computation, leading to slow inference speed. Among the two-stage detection methods, Faster R-CNN [10] has received tremendous attention from both academia and industry. It mainly consists of two modules. The first module is a fully convolutional region proposal network (RPN), which proposes candidate regions. The second module is the Fast R-CNN [9] detector, which refines the proposed regions. Note that RPN and Fast R-CNN share the same convolutional layers, which allows them to be trained jointly in practice. It is worth mentioning that RPN simultaneously predicts the object bounds and objectness scores at each location. Furthermore, RPN accepts an image of any size as input and outputs a set of rectangular object proposals, each of which has an objectness score. Faster R-CNN adopts a fully convolutional network to learn this process and guarantees end-to-end detector training on the shared convolutional features [10]. Owing to the sharing of convolutional features, it is tractable to adopt very deep networks to yield high-quality detection results. Figure 2 visually displays the entire architecture of Faster R-CNN for water-surface garbage detection. For more implementation details on Faster R-CNN, please refer to [10] and references therein.

Experimental Results and Analysis
This section is mainly devoted to demonstrating the efficacy of our proposed framework for water-surface target detection under different weather conditions. Several objective criteria are simultaneously selected to quantitatively evaluate the detection accuracy and robustness.

Experimental Environment Setup.
The network training of Faster R-CNN was performed using TensorFlow 1.9.0 on a machine with an Intel® Core™ i7-8700K CPU @ 3.70 GHz × 12. The detection experiments were conducted using CUDA 9.0 and cuDNN 7.0.5 in an Ubuntu 18.04-based TensorFlow environment. In contrast, YOLOv4 was trained using PyTorch 1.5.1, which is a library specially developed for deep learning models.

Experimental Dataset.
The original dataset adopted in our experiments contains 2000 clean maritime images, which were either collected by ourselves or downloaded from the Internet. Several typical images in our garbage dataset are visually illustrated in Figure 3.
This developed dataset is mainly composed of glass and plastic materials. We manually annotated these experimental images with garbage-type labels and high-precision bounding boxes. We further augmented each training image by flipping horizontally, translating randomly, and cropping randomly, which makes the training dataset 3 times larger than the original version. These images are adopted to evaluate the performance of our intelligent vision-enabled garbage detection method for IoT-based maritime video surveillance.
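For detection data, geometric augmentation has to transform the bounding boxes together with the pixels. The sketch below illustrates this for the horizontal flip only. It assumes boxes stored as (x_min, y_min, x_max, y_max) in continuous pixel coordinates spanning [0, W], and the helper name `hflip_with_boxes` is hypothetical. Random translation and cropping follow the same pattern (shift or clip the box coordinates) and are omitted.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Flip an image horizontally and mirror its (x_min, y_min, x_max, y_max) boxes."""
    image = np.asarray(image)
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    out = boxes.copy()
    out[:, 0] = w - boxes[:, 2]   # new x_min comes from the old x_max
    out[:, 2] = w - boxes[:, 0]   # new x_max comes from the old x_min
    return flipped, out

image = np.arange(4 * 10 * 3).reshape(4, 10, 3)
boxes = [[1.0, 0.0, 4.0, 2.0]]
flipped_image, flipped_boxes = hflip_with_boxes(image, boxes)
restored_image, restored_boxes = hflip_with_boxes(flipped_image, flipped_boxes)
```

Flipping twice restores both the image and the boxes, which is a convenient sanity check when wiring such a transform into a training pipeline.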

Evaluation Criteria.
To evaluate the performance of water-surface target detection under different weather conditions, we adopt several evaluation criteria in this work. The quantitative evaluation indexes mainly include the intersection over union (IoU), Precision and Recall, mean average precision (mAP), and frames per second (FPS).

Intersection over Union (IoU).
To objectively evaluate object detectors (i.e., YOLOv4 and Faster R-CNN in this work), IoU has become the most popular metric to compare the similarity between two arbitrary shapes [50]. The IoU is mathematically defined as follows:
$$ \mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}, $$
where A and B, respectively, denote the prediction and ground-truth bounding boxes. In particular, IoU has the attractive property of scale invariance: because it jointly accounts for the width, height, and location of the bounding boxes and normalizes the overlap by the union area, its value is unchanged when both boxes are rescaled by the same factor. Because of this appealing property, IoU has also been successfully utilized in other tasks, e.g., segmentation [51] and object tracking [52].
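For axis-aligned bounding boxes, IoU (intersection area divided by union area) reduces to a few comparisons. The following minimal sketch assumes boxes given as (x_min, y_min, x_max, y_max) tuples. The function name `iou` is chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

prediction = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
score = iou(prediction, ground_truth)   # overlap 1, union 7
```

Multiplying every coordinate of both boxes by the same factor leaves the score unchanged, which is the scale invariance mentioned in the text.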

F1 Score.
Sometimes, the two indicators Precision and Recall cannot be directly adopted to accurately evaluate the detection performance. By combining both Precision and Recall, the F1 score is defined as follows:
$$ F1 = \frac{2PR}{P + R}, $$
where P and R denote the Precision and Recall scores, respectively. In particular, the F1 score is exploited as an evaluation index in this work.
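As a small worked example, the sketch below computes Precision, Recall, and their harmonic mean F1 from detection counts. It assumes the standard definitions P = TP / (TP + FP) and R = TP / (TP + FN), which are not spelled out in the surviving text, and the counts used are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1

# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed targets
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```

With these counts, P = 0.8 and R = 2/3, and F1 = 16/22 ≈ 0.727 lies between the two while staying closer to the lower value.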

Mean Average Precision (mAP).
The commonly used evaluation index in target detection is mean average precision (mAP), which is the average of the average precision (AP) scores over all target types. The AP can be calculated as the area under the Precision-Recall curve. In particular, mAP is mathematically defined as follows:
$$ \mathrm{AP} = \sum_{n} (R_n - R_{n-1})\,P_n, \qquad \mathrm{mAP} = \frac{1}{N_c} \sum_{c=1}^{N_c} \mathrm{AP}_c, $$
where R_n and P_n, respectively, represent the Recall ratio and Precision ratio for the n-th preselected threshold, and AP_c is the AP of the c-th target type. In addition, N_c indicates the number of target types, which is set to N_c = 1 in this work.
Figure 3: The water-surface garbage dataset, which contains 2000 color images in different scales, is developed for deep learning-based automatic detection of water-surface garbage. This current dataset is mainly composed of glass and plastic materials. More types of water-surface garbage will be incorporated into our dataset in future work.
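The area under the Precision-Recall curve can be approximated in several ways. The sketch below uses one common discrete form, AP = Σ_n (R_n − R_{n−1}) P_n with R_0 = 0, and averages the per-class AP values to obtain mAP. This particular summation rule and the function names are assumptions for illustration and may differ from the exact evaluation protocol used in the experiments.

```python
def average_precision(recalls, precisions):
    """AP = sum_n (R_n - R_{n-1}) * P_n, with R_0 = 0 and recalls sorted ascending."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

def mean_average_precision(per_class_curves):
    """Average the AP over classes; each entry is a (recalls, precisions) pair."""
    aps = [average_precision(r, p) for r, p in per_class_curves]
    return sum(aps) / len(aps)

# Toy curve with two thresholds for a single class (N_c = 1)
garbage_curve = ([0.5, 1.0], [1.0, 0.5])
ap = average_precision(*garbage_curve)
m_ap = mean_average_precision([garbage_curve])
```

With a single target type, as in this work, mAP coincides with the AP of that type.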

Frames per Second (FPS).
FPS represents the number of image frames handled using a specific detection method in one second. It has been widely adopted to evaluate the detection speed in practice.
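A simple way to estimate FPS is to time a detector over a batch of frames and divide the frame count by the elapsed time. The sketch below uses a dummy detector that merely sleeps. `measure_fps`, `dummy_detect`, and the warm-up parameter are hypothetical names introduced for the example, and a real measurement would call the trained network instead.

```python
import time

def measure_fps(detect, frames, warmup=1):
    """Return processed frames per second for a callable detector."""
    for frame in frames[:warmup]:      # warm-up runs are excluded from timing
        detect(frame)
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

def dummy_detect(frame):
    """Stand-in for a real detector: pretend each frame takes about 10 ms."""
    time.sleep(0.01)
    return []

fps = measure_fps(dummy_detect, frames=[None] * 5)
```

Warm-up iterations are commonly excluded because the first calls of a deep network often include one-off initialization costs that would otherwise bias the estimate.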

Experimental Results on Detection Accuracy.
This subsection is devoted to evaluating the water-surface target detection results in terms of accuracy and robustness. In particular, both YOLOv4 and Faster R-CNN were trained directly using the original dataset. Five different objective criteria (i.e., F1, Recall, Precision, mAP, and FPS) are jointly adopted to quantitatively evaluate the detection accuracy and robustness. To further investigate the influences of different weather conditions on detection performance, degraded images, synthetically generated under hazy, low-light, and rainy imaging conditions, are collected to measure the detection results. In particular, the degraded images are simulated according to the physical imaging models introduced in Section 3.2. The detailed evaluation results are summarized in Table 1. It can be observed that YOLOv4 performs target detection in real time and generates slightly more accurate detection results than Faster R-CNN for clean images, i.e., under normal light conditions, as shown in Figure 4. If the observed images are degraded due to poor imaging conditions, the detection accuracy decreases markedly for both YOLOv4 and Faster R-CNN.
To further evaluate the detection results, Figure 5 shows the original images and the degraded versions synthetically generated under hazy, low-light, and rainy imaging conditions, respectively. The visibility-degraded images theoretically lead to reduced detection accuracy. The experimental results are visually displayed in Figure 6. As the image degradation becomes more severe, the detection accuracy and robustness decrease markedly, leading to unreliable water quality monitoring. This phenomenon is confirmed by the quantitative evaluation results in Table 1.

Influences of Visibility Enhancement on Detection Results.
As discussed in Section 4.3, the detection accuracy and robustness depend heavily upon the visual image quality. (Note that both YOLOv4 and Faster R-CNN were trained directly using the sharp images from the original dataset. Therefore, to ensure satisfactory detection results in this case, it is necessary to guarantee high-quality observed images in our IoT-based maritime video surveillance.) If the camera images are captured under poor weather conditions, it is essential to improve image quality using existing visibility enhancement methods.

Haze Removal Methods.
To suppress the effects of haze degradation on detection results, three typical haze removal methods, i.e., dark channel prior (DCP) [53], multiscale convolutional neural networks (MSCNNs) [54], and all-in-one dehazing network (AOD-Net) [55], are introduced in this work to enhance image visibility. The popular DCP is based upon the assumption that most local patches in haze-free images contain some pixels with very low intensity in at least one color channel. MSCNN first adopts a coarse-scale network to estimate a rough transmission map and then utilizes a fine-scale network to refine it and generate the final dehazed images. In contrast, AOD-Net directly reconstructs haze-free images using an end-to-end network. Table 2 details the quantitative detection results for both YOLOv4 and Faster R-CNN based on restored images yielded by different haze removal methods. As can be observed, the detection performance is clearly improved by enhancing the visual image quality. This means that dehazed images enable more accurate detection of water-surface garbage, leading to more reliable monitoring of water quality. The detection results are visually illustrated in Figure 7. Due to the negative effects of haze degradation, it is difficult to accurately detect some small-scale objects, as shown in Figures 7(d) and 7(e). In contrast, Figures 7(f) and 7(g) illustrate that the popular DCP-based dehazing method [53] is able to suppress the effect of haze, leading to satisfactory detection results for both YOLOv4 and Faster R-CNN. High-quality water-surface target detection can therefore be achieved by combining visibility enhancement methods and deep networks under haze conditions.

Low-Light Image Enhancement Methods.
The adaptive histogram equalization (AHE) [56], Retinex-Net [57], and probabilistic method for image enhancement (PMIE) [58] are, respectively, adopted to reconstruct high-quality images from their low-light versions. In particular, AHE performs well in contrast enhancement. By introducing the Retinex theory, Retinex-Net develops the Decom-Net and Enhance-Net networks for image decomposition and illumination adjustment, respectively. PMIE employs a linear domain representation to simultaneously estimate both illumination and reflectance components to reconstruct latent sharp images. The water-surface target detection results under low-light conditions are visually illustrated in Figure 8. It is obvious that low-lightness also has negative effects on detection results. If the low-light images are enhanced via PMIE [58], the visual image quality is significantly improved in Figures 8(f) and 8(g), leading to more satisfactory detection performance. The importance of visibility enhancement is further confirmed by the quantitative evaluation results in Table 3. Applying visibility enhancement methods (e.g., AHE, Retinex-Net, and PMIE) as a preprocessing step contributes to high-quality water-surface target detection. Compared with the detection results from the original sharp images in Figures 8(b) and 8(c), the enhanced images achieve comparable results, potentially leading to reliable water quality monitoring under low-light conditions.

Rain Removal Methods.
To effectively remove the unwanted rain streaks, three state-of-the-art methods, i.e., lightweight pyramid networks (LP-Net) [59], directional gradient-guided constraints-based model (DiG-CoM) [60], and multiscale progressive fusion network (MSPFN) [61], are introduced to improve image quality. In particular, LP-Net adopts the mature Gaussian-Laplacian image pyramid strategy to simplify image deraining. By taking into consideration the directional gradient operator, DiG-CoM performs well in efficiently extracting rain streaks from rainy images, leading to visual quality improvement. MSPFN enhances image quality by fully exploiting the pyramid representation to collaboratively model the rain streaks at multiple scales. To evaluate the influences of visibility enhancement on target detection, the quantitative and qualitative detection results are given in Table 4 and Figure 9. The quantitative evaluations illustrate that rain removal methods are capable of improving detection accuracy and robustness under rainy conditions. However, derained images inevitably suffer from the loss of some fine details, leading to suboptimal detection performance, as shown in Figures 9(f) and 9(g). To guarantee reliable water-surface target detection, it is thus necessary to generate high-quality derained images. Almost no existing deraining method can adequately remove rain streaks while preserving all fine details. In this work, we therefore adopt the widely used data augmentation (DA) strategy to retrain both YOLOv4 and Faster R-CNN to effectively improve detection results under different severe weather conditions.

Influences of Data Augmentation on Detection Results.
Figure 6: The water-surface garbage detection results of YOLOv4 and Faster R-CNN under different weather conditions, i.e., hazy, low-light, and rainy conditions: (a) original, (b) hazy image 1, (c) hazy image 2, (d) low-light image 1, (e) low-light image 2, (f) rainy image 1, and (g) rainy image 2. It can be found that the detection accuracy will be dramatically reduced if the image degradation becomes more severe.

The water-surface target detection methods introduced in Section 4.4 are essentially two-phase detection strategies, i.e., first visibility enhancement and then learning-based target detection. If both YOLOv4 and Faster R-CNN are trained using sharp images from the original dataset, the final detection results depend on the quality of the enhanced images. However, it is almost impossible to perfectly reconstruct high-quality maritime images, resulting in unsatisfactory detection performance. In addition, this two-phase framework may suffer from a high computational cost because the total running time includes the costs of both visibility enhancement and target detection. If we can directly and accurately detect the water-surface targets under severe imaging conditions, online water quality monitoring can be implemented in real time in practice. To achieve this goal, we first synthetically simulate the degraded images according to the physical imaging models introduced in Section 3.2. The synthetically degraded images are collected to enlarge the existing standard dataset, which only contains original sharp images. This DA strategy increases the volume and diversity of our training dataset, which is beneficial for enhancing the generalization abilities of our deep neural networks. The accuracy and robustness of target detection under different severe imaging conditions [...] of the cases. However, YOLOv4 is exploited for more efficient detection. To balance the trade-off between efficiency and accuracy, we propose to combine the DA strategy and YOLOv4 to directly detect water-surface targets from degraded images without visibility enhancement.
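The DA strategy described above can be sketched as follows. This is a minimal illustration rather than the exact procedure: since the imaging models of Section 3.2 are not restated here, the sketch assumes three commonly used formulations, namely the atmospheric scattering model for haze, a gain-and-gamma darkening for low light, and an additive streak layer for rain. All function names, parameter values, and the row-based depth proxy are hypothetical.

```python
import math
import random
from typing import List

Image = List[List[float]]  # grayscale image with intensities in [0, 1]


def clip(v: float) -> float:
    return min(1.0, max(0.0, v))


def add_haze(img: Image, beta: float = 1.0, airlight: float = 0.9) -> Image:
    """Assumed haze model: I = J * t + A * (1 - t), with t = exp(-beta * d).

    Assumption: depth d grows linearly from the bottom row (near, d = 0)
    to the top row (far, d = 1), a crude stand-in for a real depth map.
    """
    h = len(img)
    out = []
    for r, row in enumerate(img):
        depth = 1.0 - r / max(h - 1, 1)
        t = math.exp(-beta * depth)
        out.append([clip(j * t + airlight * (1.0 - t)) for j in row])
    return out


def add_low_light(img: Image, gain: float = 0.4, gamma: float = 2.0) -> Image:
    """Assumed low-light model: I = gain * J ** gamma."""
    return [[clip(gain * j ** gamma) for j in row] for row in img]


def add_rain(img: Image, density: float = 0.02, length: int = 5,
             strength: float = 0.6, seed: int = 0) -> Image:
    """Assumed rain model: I = J + R, where R holds short vertical bright streaks."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for _ in range(int(density * h * w)):
        r0, c = rng.randrange(h), rng.randrange(w)
        for r in range(r0, min(r0 + length, h)):
            out[r][c] = clip(out[r][c] + strength)
    return out


def augment(dataset: List[Image]) -> List[Image]:
    """Return the sharp images plus one hazy, low-light, and rainy copy of each."""
    enlarged = list(dataset)
    for img in dataset:
        enlarged += [add_haze(img), add_low_light(img), add_rain(img)]
    return enlarged
```

All three degradations act pixel-wise and do not move any object, so in this sketch the bounding-box annotations of a sharp image can be reused unchanged for its degraded copies.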

Experiments on Realistic Imaging-Degraded Conditions.
To demonstrate the applicability of our method, we adopt the enlarged dataset, which contains original sharp images and synthetically degraded images, to train the YOLOv4-based garbage detection framework. Figure 10 displays the detection results under different lighting conditions. It can be found that our IoT-based maritime video surveillance system provides accurate and robust detection of water-surface garbage. Compared with traditional WSN or contact-type chemical sensors, our intelligent vision-enabled water quality monitoring framework is more flexible, convenient, robust, and low-cost. There is thus a huge potential to extend our intelligent framework to indirectly evaluate water quality in different water areas under different severe weather conditions.

Limitations and Future Studies.
The proposed intelligent vision-enabled framework has the capacity of effectively and robustly detecting water-surface targets. However, it still suffers from some potential limitations, which constrain the further improvement of water pollution detection in maritime surveillance.
(1) The designed garbage dataset only contains two main types of pollution materials, which could constrain the detection of water-surface garbage in practice. To further improve the detection effectiveness and robustness, other types, e.g., paper, cardboard, metal, and trash, should also be considered in future studies. The enlarged volume and diversity of the training dataset are beneficial for improving the generalization abilities of neural networks, resulting in more accurate and robust water quality monitoring in maritime transportation.
(2) Neither YOLOv4 nor Faster R-CNN is specifically developed for the detection of water-surface garbage. Theoretically, this task is significantly different from other detection tasks, e.g., pedestrian, vessel, car, and animal detection. To further enhance the detection results, it is necessary to redesign and optimize these two neural networks according to the unique characteristics of water-surface targets.
Although the proposed detection framework has several limitations, it is still worthy of further investigation since it is able to achieve satisfactory detection results under severe weather conditions. The main contributions of this work show that intelligent vision techniques have great potential to improve water quality monitoring in maritime surveillance.

Conclusions
To conclude, we have proposed an intelligent vision-enabled target detection framework to automatically recognize water-surface garbage and issue early warnings in maritime transportation. It accordingly contributes to flexible and robust detection of harmful pollution in AI- and IoT-based maritime video surveillance. The major contributions of this paper are threefold. First, an intelligent vision-enabled water-surface target detection framework was developed to perform water quality monitoring. Second, we have designed a water-surface garbage dataset, which contains 2000 images collected by ourselves and downloaded from the Internet. A large number of synthetically degraded images were generated to further enlarge this dataset and improve the generalization abilities of our neural networks. Last, the proposed detection framework is capable of yielding timely, robust, and accurate garbage detection results. Numerous experiments on both synthetic and realistic scenarios have demonstrated the effectiveness and robustness of our water-surface target detection framework under different degraded visibility conditions. In addition, the water quality can be accordingly monitored with our intelligent maritime video surveillance.

Data Availability
The image data used to support the findings of this study are available from the corresponding author upon request.

Disclosure
Yongqi Guo and Yuxu Lu are co-first authors.

Conflicts of Interest
The authors declare that they have no conflicts of interest.