Automatic Garbage Scattered Area Detection with Data Augmentation and Transfer Learning in SUAV Low-Altitude Remote Sensing Images



Introduction
Many nature reserves and scenic spots have been shut down because of damage to the ecological environment. Take 2018 as an example: the Everest administration of the Tibet Autonomous Region issued an announcement prohibiting any individuals and units from entering the core area above the Rongbu Temple, and Boracay in the Philippines was shut down for half a year. This year, Nianbaoyuze National Geopark, Zhaling-Eling Lake, and Xingxinghai reserves in China were also shut down. Among the many factors behind these shutdowns, secondary pollution caused by the failure to clean up garbage in time is considered the primary one, because it damages water sources, vegetation, soil, glaciers, etc. Therefore, it is necessary to propose an automatic garbage scattered area detection approach to improve cleaning efficiency and reduce secondary pollution.
Object detection is a long-standing research topic in the field of computer vision. Currently, object detection algorithms can be divided into two classes: those based on traditional computer vision approaches and those based on deep learning. Traditional computer vision approaches, such as Histogram of Oriented Gradients (HOG) [1], Scale-Invariant Feature Transform (SIFT) [2], Gabor wavelets [3], Gabor filters [4], and Fisher kernels [5], have difficulty discriminating colors, textures, edges, shapes, and sizes. Object detection approaches based on deep learning have become popular because of the massive amount of labeled data available for natural images and the availability of state-of-the-art deep learning models. These deep learning models can in turn be divided into two classes: those based on region proposals and those based on regression. Models based on region proposals, such as R-CNN [6], Faster R-CNN [7], and Mask R-CNN [8], have higher accuracy but are slower. Models based on regression, such as YOLO [9] and SSD [10], are faster but less accurate. In addition, these deep learning models have requirements for the input image size. Most importantly, there is no publicly available dataset for garbage detection. These issues pose great challenges to garbage scattered area detection, and few related studies can be found in the published literature. Among the existing studies, Mittal et al. [11] built a dataset containing garbage scattered areas and achieved a detection accuracy of 87.7% in urban images by using AlexNet [12]. Wei and Cheng [13] collected 372 urban images containing garbage scattered areas and expanded the number of images ninefold through traditional data augmentation operations; they achieved a detection accuracy of 89.7% by using Faster R-CNN. However, both studies address garbage scattered area detection in urban areas and rely heavily on cameras. These approaches cannot be implemented in nature reserves.
To solve the above issues, we propose an automatic garbage scattered area detection (GSAD) model based on the state-of-the-art deep learning method EfficientDet [14], which achieves the best trade-off between accuracy and speed. Figure 1 shows a garbage scattered area sample detected by our model. The main contributions of this paper are as follows: (1) we build a garbage sample dataset based on SUAV low-altitude remote sensing; (2) we propose a novel data augmentation approach for garbage scattered area detection; (3) we establish a model (GSAD) for garbage scattered area detection based on EfficientDet, data augmentation, transfer learning, and image blocking, and we outline future research directions.

Materials and Methods
In this section, we describe the dataset in detail and introduce a series of approaches used to optimize the GSAD model. The workflow of the GSAD model is shown in Figure 2. It is composed of seven modules: (1) data collection; (2) data preprocessing; (3) data augmentation; (4) transfer learning with EfficientDet; (5) image blocking; (6) test; and (7) model evaluation.

Dataset.
We collected a total of 630 remote sensing images taken by a DJI Mavic 2 Pro at a flying height of 30 meters. The training set contains 480 images taken at Yunnan Normal University, and the test set contains 150 images taken in the Dali Cangshan Erhai Nature Reserve. The image sizes range from 4,000 × 3,000 pixels to 5,472 × 3,648 pixels, and the background of the garbage scattered area is complex and diverse, including barren mountains, grassland, wetlands, and rivers. The size of the garbage scattered area in the image ranges from 200 × 200 pixels to 600 × 600 pixels. The garbage scattered area appears mainly in stacked or scattered form. In addition, we took the impact of brightness into account when collecting the images. The details of the dataset are shown in Table 1. Some training samples taken from the dataset are shown in Figure 3 and some test samples are shown in Figure 4.

Data Preprocessing.
We cannot directly use the training set to train deep learning models because current deep learning models have requirements for the input image size. For example, YOLOv3 has three structures, namely YOLOv3-320, YOLOv3-416, and YOLOv3-608, of which YOLOv3-416 is the most widely used. If the resolution of the image does not match the input size of the model, the image is automatically scaled by the model. In this paper, we use the SSD-512, YOLOv3-416, YOLOv4-512 [15], and EfficientDet-512 structures as the base models.
For high-resolution remote sensing images, this direct scaling operation seriously degrades the performance of the deep learning model on small object detection because a lot of image information is lost. Instead of converting low-resolution images into high-resolution images [16], we crop the remote sensing images into subimages that contain the garbage scattered area. Considering that the size of the garbage scattered area in the image ranges from 200 × 200 pixels to 600 × 600 pixels, the subimage sizes range from 1,400 × 1,400 pixels to 1,800 × 1,800 pixels. In total, we obtain 500 subimages cropped from the training set and use them as the new training set. These subimages will still be scaled by the model. However, because deep learning models are robust to moderate scaling, this does not prevent training a model with high prediction accuracy.
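The cropping step can be illustrated with a minimal sketch. It assumes the garbage scattered area is annotated as an axis-aligned box `(x1, y1, x2, y2)` in integer pixel coordinates and that the crop position is chosen at random among all windows that contain the box; the function names and the random placement are illustrative assumptions, not details given in the paper.

```python
import random


def crop_window(img_w, img_h, box, crop_size, rng=random):
    """Pick a crop_size x crop_size window inside the image that fully
    contains box = (x1, y1, x2, y2). Returns (left, top, right, bottom)."""
    x1, y1, x2, y2 = box
    if x2 - x1 > crop_size or y2 - y1 > crop_size:
        raise ValueError("box does not fit inside the crop window")
    if crop_size > img_w or crop_size > img_h:
        raise ValueError("crop window is larger than the image")
    # Feasible range for the top-left corner: the window must cover the box
    # and must not leave the image.
    left = rng.randint(max(0, x2 - crop_size), min(x1, img_w - crop_size))
    top = rng.randint(max(0, y2 - crop_size), min(y1, img_h - crop_size))
    return left, top, left + crop_size, top + crop_size


def box_in_crop(box, window):
    """Express the annotation box in the coordinate frame of the crop."""
    left, top = window[0], window[1]
    x1, y1, x2, y2 = box
    return x1 - left, y1 - top, x2 - left, y2 - top


if __name__ == "__main__":
    window = crop_window(4000, 3000, (3700, 100, 3990, 400), 1400)
    print(window, box_in_crop((3700, 100, 3990, 400), window))
```

Drawing `crop_size` from the 1,400 to 1,800 pixel range for each sample would reproduce the range of subimage sizes stated above.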

Data Augmentation.
Training an automatic garbage scattered area detection model with high prediction accuracy requires a large number of training samples. However, the number of original training samples is limited, and too few training samples lead to poor performance of the deep learning model. To overcome the problem of insufficient samples, we adopt two data augmentation approaches to increase the number of training samples, named DA1 and DA2.

DA1.
The first approach uses traditional data augmentation operations. The number of training samples is increased by brightness change, translation, flip, rotation, and zoom. The brightness change refers to brightening and darkening the image; brightness variation in remote sensing images has a significant impact on the detection performance of deep learning models. The rotation operation refers to rotating the image by an arbitrary angle. In addition, we apply translation, flip, zoom in, and zoom out to increase the number of training samples. These operations are shown in Figure 5.
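A few of these operations can be sketched in plain Python on a grayscale image stored as a list of pixel rows. This is an illustrative sketch only: rotation and zoom are omitted, the function names are hypothetical, and the paper does not state which library or parameter ranges were used. Note that for detection, geometric operations must also be applied to the bounding box annotations, as `flip_box_horizontal` shows.

```python
def adjust_brightness(img, factor):
    """Brighten (factor > 1) or darken (factor < 1), clamping to [0, 255]."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def translate(img, dx, dy, fill=0):
    """Shift the image by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                out[ny][nx] = img[y][x]
    return out


def flip_box_horizontal(box, img_w):
    """Transform a box (x1, y1, x2, y2) to match flip_horizontal."""
    x1, y1, x2, y2 = box
    return img_w - x2, y1, img_w - x1, y2


if __name__ == "__main__":
    image = [[10, 20], [30, 40]]
    print(adjust_brightness(image, 1.5))
    print(flip_horizontal(image))
    print(translate(image, 1, 0))
```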

DA2.
The second approach is our proposed data augmentation method. The pseudocode is given in Algorithm 1. We use a camera to photograph individual garbage items and then use simple image processing techniques to extract each item. In this way, we have extracted about 100 kinds of individual garbage items with different shapes and appearances. Some samples are shown in Figure 6. We randomly select a certain number of garbage items and resize them according to the proportion that individual garbage items occupy in the remote sensing images.
After we select a certain number of garbage items, we need to simulate the shape of the selected garbage because the texture feature is one of the main features extracted by the deep learning model. The shape of the garbage scattered area is irregular, so we use irregular polygons to simulate it. First, we randomly generate sets of key points. Then, we generate irregular polygons according to the rule that connected points are closest to each other and edges do not cross. Some simulated shapes of the garbage scattered area are shown in Figure 7. The distribution of the garbage scattered area is another major feature extracted by the deep learning model. Therefore, within the simulated shape, it is important to simulate the distribution of the garbage scattered area as closely as possible. We combine 8 distributions, such as the normal, uniform, and exponential distributions, and randomly generate a scale factor for each distribution. The final distribution is determined by combining the $N$ distribution functions according to their scale factors, where $Y_i$ is the scale factor of the $i$-th distribution, $B_i$ is the $i$-th distribution function, and $N$ is the number of distributions. In this paper, $N$ is 8. Some simulated distributions of the garbage scattered area are shown in Figure 8.
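The two simulation steps can be sketched as follows, under clearly stated assumptions. For the shape, the sketch orders random key points by angle around their centroid, which yields a non-self-intersecting polygon for points in general position; this is a substitute technique, not the nearest-point rule described above. For the distribution, it treats the random scale factors as mixture weights over component samplers; that interpretation, the three components shown (the paper uses eight), and all function names are assumptions.

```python
import math
import random


def order_as_polygon(points):
    """Order key points by angle around their centroid so that consecutive
    points form a polygon whose edges do not cross."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


def random_polygon(n_points, radius, rng):
    """Generate an irregular polygon from n_points random key points."""
    pts = [(rng.uniform(-radius, radius), rng.uniform(-radius, radius))
           for _ in range(n_points)]
    return order_as_polygon(pts)


# Component samplers standing in for the distribution functions B_i.
SAMPLERS = [
    lambda rng: rng.gauss(0.0, 1.0),     # normal
    lambda rng: rng.uniform(-1.0, 1.0),  # uniform
    lambda rng: rng.expovariate(1.0),    # exponential
]


def random_scale_factors(n, rng):
    """Draw one random scale factor Y_i per distribution."""
    return [rng.random() for _ in range(n)]


def sample_mixture(samplers, scale_factors, rng):
    """Draw one value: pick component i with probability proportional to
    its scale factor, then sample from that component."""
    chosen = rng.choices(samplers, weights=scale_factors, k=1)[0]
    return chosen(rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_polygon(6, 300.0, rng))
    factors = random_scale_factors(len(SAMPLERS), rng)
    print([sample_mixture(SAMPLERS, factors, rng) for _ in range(5)])
```

The sampled values would still need to be mapped to pixel offsets and restricted to the interior of the polygon before garbage items are placed.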
Finally, we choose the original dataset as the background images. The selected garbage is automatically placed at a random location in the background image according to the simulated shape and simulated distribution. We crop these generated images into subimages that contain the simulated garbage scattered area. The sizes of these subimages also range from 1,400 × 1,400 pixels to 1,800 × 1,800 pixels. These subimages become a new training set. Some generated training samples are shown in Figure 9.
Through the above two data augmentation approaches, 4,500 images and 1,000 images are generated, respectively. The details of the final training set are shown in Table 2.

Transfer Learning.
The two data augmentation approaches increase the number of training samples. However, the number of images in the training set is still not enough to train the deep learning model from scratch. We therefore use transfer learning to address garbage scattered area detection with a small training set. The transfer learning process can be divided into two main steps. First, we use the deep learning model trained on the COCO dataset [17] as the pretrained model. Then, we use the training set to fine-tune the pretrained model.

Image Blocking.
In the test phase, we use image blocking to address small object detection in low-altitude high-resolution images. We divide the input image into a × b subimages before it is input to the deep learning model. We set a to 3 and b to 2, and the width and height of each subimage are computed from OriginalWidth, OriginalHeight, and Overlap, where OriginalWidth and OriginalHeight are the width and height of the image in the test set, respectively, and Overlap is the overlap distance between two adjacent subimages, which is set to 200 pixels in this paper. BlockWidth and BlockHeight are the width and height of each subimage, respectively. The subimage sizes range approximately from 1,400 × 1,400 pixels to 1,800 × 1,800 pixels, which corresponds to the size of the images in the training set. Then, each subimage is processed by the deep learning model, and the detection results of the six subimages are mapped back to the original image. Finally, we apply the nonmaximum suppression (NMS) algorithm in the original image to remove redundant bounding boxes. The detailed prediction process is shown in Figure 10.
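The blocking, mapping, and NMS steps can be sketched as below. The block size expression is an assumption derived from geometry (a blocks of equal width with a fixed overlap between neighbors must span the original width, so a × BlockWidth − (a − 1) × Overlap = OriginalWidth); it is not quoted from the paper. The IoU threshold of 0.5 and all function names are likewise illustrative.

```python
import math


def block_size(original, n_blocks, overlap):
    """Solve n_blocks * size - (n_blocks - 1) * overlap = original."""
    return (original + (n_blocks - 1) * overlap) / n_blocks


def make_blocks(width, height, a=3, b=2, overlap=200):
    """Return a * b windows (left, top, right, bottom) covering the image."""
    bw = math.ceil(block_size(width, a, overlap))
    bh = math.ceil(block_size(height, b, overlap))
    blocks = []
    for j in range(b):
        for i in range(a):
            # Clamp so the last block ends exactly at the image border.
            left = min(i * (bw - overlap), width - bw)
            top = min(j * (bh - overlap), height - bh)
            blocks.append((left, top, left + bw, top + bh))
    return blocks


def to_original(det, block):
    """Map a detection (x1, y1, x2, y2, score) from block to image frame."""
    x1, y1, x2, y2, score = det
    left, top = block[0], block[1]
    return x1 + left, y1 + top, x2 + left, y2 + top, score


def iou(a, b):
    """Intersection over union of two boxes given by their first 4 values."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(dets, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    keep = []
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(det, k) <= iou_threshold for k in keep):
            keep.append(det)
    return keep


if __name__ == "__main__":
    for blk in make_blocks(4000, 3000):
        print(blk)
```

Under this assumption, a 4,000 × 3,000 image yields blocks of about 1,467 × 1,600 pixels. Running NMS after mapping matters because an object in an overlap strip can be detected in two neighboring blocks.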

Results and Discussion
In this section, we introduce the experimental environment and evaluation criteria. In addition, a series of optimization experiments and comparative experiments are organized and implemented.

Experimental Environment.
All experiments are conducted on a desktop with a single Intel Core i7 CPU and an NVIDIA GeForce RTX 2080 Ti GPU (11 GB video memory). Other details of the experimental environment are shown in Table 3. In the training phase, the deep learning model is trained for a total of 40,000 iterations. The initial learning rate is set to 0.001 and is divided by 10 at the 30,000th iteration and again at the 35,000th iteration. The batch size is set to 12. The weight decay is 0.0005 and the momentum is 0.9.
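The step schedule described above can be written as a small function. This is a sketch: the function name is hypothetical, and whether the drop takes effect exactly at or just after the milestone iteration is an assumption.

```python
def learning_rate(iteration, base_lr=0.001, milestones=(30000, 35000), gamma=0.1):
    """Step schedule: multiply the rate by gamma at each milestone reached."""
    lr = base_lr
    for milestone in milestones:
        if iteration >= milestone:
            lr *= gamma
    return lr


if __name__ == "__main__":
    for it in (0, 29999, 30000, 35000, 39999):
        print(it, learning_rate(it))
```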

Evaluation Criteria.
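In this paper, we use precision, recall, and F1-score as the evaluation criteria for garbage scattered area detection in SUAV low-altitude remote sensing images. They are defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

where $TP$ is the number of correctly detected garbage scattered areas, $FP$ is the number of falsely detected areas, and $FN$ is the number of missed garbage scattered areas. Precision represents the ratio of correctly detected garbage scattered areas to all detected garbage scattered areas. Recall represents the ratio of correctly detected garbage scattered areas to the total number of garbage scattered areas. F1-score is the harmonic mean of precision and recall.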
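As a worked example of these criteria, the sketch below computes them from counts of true positives, false positives, and false negatives. The counts used are hypothetical and do not come from the paper's experiments.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
    print(detection_metrics(90, 10, 30))
```

With these counts, precision is 0.9, recall is 0.75, and the F1-score is about 0.818, which lies between the two and closer to the lower value, as expected of a harmonic mean.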

Comparison among EfficientDet Models.
In the first series of experiments, we test the effectiveness of DA2, transfer learning, and image blocking. Table 4 shows the quantitative comparison results among six different EfficientDet models. In Table 4, EfficientDet denotes EfficientDet without data augmentation; EfficientDet-DA1 denotes EfficientDet with traditional data augmentation operations; EfficientDet-DA2 denotes EfficientDet with our proposed data augmentation approach; EfficientDet-DA1 & DA2 denotes EfficientDet with both data augmentation approaches; EfficientDet-DA1 & DA2 & TL denotes EfficientDet with both data augmentation approaches and transfer learning. The GSAD model is based on EfficientDet, both data augmentation approaches, transfer learning, and image blocking.
As shown in Table 4, the precision of EfficientDet-DA2 is higher than that of EfficientDet-DA1, which demonstrates that DA2 improves the precision of the model more effectively because it produces new image information. Using both data augmentation approaches together gives better detection performance than using DA1 or DA2 alone, which demonstrates that the two approaches complement each other. Adding transfer learning slightly improves the performance of the model, which reflects its positive influence. Image blocking improves the F1-score significantly because the deep learning model can extract more object information. However, this operation sacrifices the real-time detection performance of the GSAD model, although its average detection time of 1.096 s is still acceptable.

Results on Other Deep Learning Models.
In the second series of experiments, we compare the GSAD model with other deep learning models. Four deep learning models based on transfer learning, data augmentation, and image blocking are compared with the GSAD model. The results of the quantitative comparison are shown in Table 5. The Faster R-CNN model fails to recognize some garbage scattered areas, so its recall is low. Its precision is high, however, which means that Faster R-CNN rarely extracts non-garbage areas by mistake. In addition, Faster R-CNN is slow because of the region proposal step and image blocking. SSD, YOLOv3, and YOLOv4, by contrast, recognize people, trees, rocks, and other brightly colored objects with similar characteristics as garbage, so their precision is very low. Their recall is high, which means that these models are more capable of extracting all the garbage scattered areas. Some false and missed samples detected by other models are shown in Figure 11. Both the precision and the recall of the GSAD model are relatively high, which means that the GSAD model is able to extract all the garbage scattered areas while rarely extracting non-garbage areas by mistake. The average detection time of the GSAD model is still acceptable. However, when the garbage scattered area is partially obscured by vegetation, the model cannot effectively detect it in remote sensing images. In addition, a gathered crowd will also be detected as a garbage scattered area by the GSAD model with a certain probability, because the characteristics of a gathered crowd and a garbage scattered area are very similar. Some test samples detected by the GSAD model are shown in Figure 12.

Conclusions
To improve cleaning efficiency and reduce secondary pollution in nature reserves, this paper proposes an automatic garbage scattered area detection (GSAD) model based on the state-of-the-art deep learning method EfficientDet, data augmentation, transfer learning, and image blocking. First, a garbage sample dataset based on remote sensing images is built. Then, we use two data augmentation approaches to increase the number of training samples and use transfer learning to address the problem of insufficient training samples. Finally, we apply image blocking to improve the detection performance for small objects in remote sensing images. Experimental results demonstrate that the performance of our proposed GSAD model is far superior to that of other deep learning models. GSAD achieves an F1-score of 95.11% and an average detection time of 1.096 s. From the detection results, we obtain the location information of all garbage scattered areas in the remote sensing images, which can then be used to formulate a reasonable cleaning route in nature reserves. In the future, we will study SUAV aerial photography planning methods and propose a range positioning approach based on ground resolution estimation. Finally, we will develop a software platform that can automatically monitor garbage scattered areas and formulate a reasonable cleaning route in nature reserves.

Data Availability
The data used in the study are available from the corresponding author upon request.