Image Dehazing Method of Transmission Line for Unmanned Aerial Vehicle Inspection Based on Densely Connected Pyramid Network

The quality of camera images directly determines the accuracy of defect identification for transmission line equipment. However, complex external factors such as haze can seriously degrade the quality of aerial images, and traditional image dehazing methods struggle to meet the needs of image-based inspection in complex environments. This paper studies image enhancement in hazy environments and proposes an image dehazing method for transmission lines based on a densely connected pyramid network. The method uses an improved pyramid network to estimate the transmittance map and an improved U-net to estimate the atmospheric light value. The transmittance map, atmospheric light value, and dehazed image are then jointly optimized to obtain the image dehazing model. The proposed method improves image brightness and contrast, recovers image detail, and generates more realistic dehazed images than traditional methods.


Introduction
In recent years, with breakthroughs in key transmission technologies such as intelligent autonomous operation and maintenance systems, UAV inspection [1][2][3][4][5][6][7] has been rapidly promoted and applied. For the pictures, videos, and other data generated during UAV patrols of transmission lines, pattern recognition and computer vision techniques [8][9][10][11][12] can complete the discrimination work with the help of computers. This technology uses deep learning networks [13][14][15][16][17][18] to train on fault samples of transmission line equipment, obtain a mature power vision target detection model, and automatically realize target detection and fault location on machine patrol images. The recognition and positioning accuracy of the target detection algorithm are positively correlated with the quality of the machine patrol image: the higher the image quality, the more effective the detection algorithm. However, drone inspections are mostly carried out in the field, where complex weather conditions strongly affect the quality of the photos taken. Due to environmental pollution, large-scale haze weather has occurred in China in recent years, and visibility has declined sharply.
Particulate matter in the atmosphere seriously scatters the light entering the camera, reducing the brightness and contrast of transmission line pictures taken during drone patrols. Background information in the image is blurred, which ultimately leads to serious degradation of image quality and of the accuracy of image target detection. It is therefore urgent to study image enhancement techniques better suited to hazy environments.
Image dehazing has long been a hot issue in the field of image enhancement. Research on image dehazing at home and abroad falls mainly into two categories: nonphysical methods based on image enhancement and restoration methods based on imaging models [19][20][21][22][23]. Image enhancement methods directly filter the low-quality foggy image with traditional enhancement techniques to remove noise and restore clarity; typical examples include histogram equalization [24][25][26][27][28], wavelet transform methods, and the Retinex algorithm [29][30][31]. Restoration methods study the degradation model of the foggy image and inversely solve the optical imaging model to obtain the dehazed image. This approach retains the detailed information of the image and improves its authenticity; it is the mainstream direction of current dehazing research, with representative methods based on partial differential equations [32], depth of field [33,34], prior theory, and deep learning [35][36][37]. In recent years, CNNs have achieved great success in image segmentation, target detection, object classification, and other fields, and more and more scholars have introduced them into image dehazing algorithms. The works in [38][39][40][41] obtained the transmittance map of foggy images with a shallow neural network and then used the image degradation model to achieve dehazing. The works in [42,43] combine convolutional neural networks and guided filters to restore foggy images.
Image dehazing methods based on convolutional neural networks achieve good results in some specific scenarios, but insufficient network depth and defects in the network architecture lead to unsatisfactory results on scenes with high haze density.
A study of existing image dehazing methods shows that traditional methods rely heavily on prior knowledge [44]: the model is seriously simplified, and the detail of the dehazed image is insufficiently restored. Image enhancement methods do not consider the physical imaging model and only improve the visual effect by changing the contrast and gray values of the image. Image restoration methods are based on the physical imaging model and use dark channel prior theory [45] to repair the hazy image; compared with image enhancement, they dehaze better, but the image generation model is simplified during implementation, and a certain gap remains between the restored image and the real image. In view of these shortcomings, this paper proposes an image dehazing method for transmission line machine patrol images based on a densely connected pyramid network. It embeds the atmospheric degradation model directly into the deep learning framework and uses physical principles to remove haze from the image. The dehazed image obtained by this method is closer to the real image in visual effect.

Single Image Defog Model
2.1. Haze Image Optical Attenuation Model. In the field of image processing, the model [46] shown in Equation (1) is often used to describe image formation in fog and haze:

I(z) = J(z) t(z) + A (1 − t(z)). (1)

Here, I represents the real image taken by the camera in fog and haze; J represents the clear picture taken in clear weather; A is the ambient light intensity, usually assumed to be constant in a local area of the image; t is the transmittance, which describes the proportion of light that reaches the camera lens through the haze; and z is the position of a pixel in the image. Transmittance is related to distance: it represents the proportion of light from the target object that passes through the atmosphere and reaches the camera lens. When the atmospheric light value A in a local area of the image is constant, the transmittance can usually be expressed as Equation (2):

t(z) = e^{−β d(z)}, (2)

where β is the atmospheric scattering coefficient and d(z) is the scene depth at pixel z.
It can be seen from Equation (1) that the image taken in fog and haze is the superposition of the light passing through the fog and the atmospheric light scattered by the fog and haze in a clear background image. The process of image dehazing is to find the global atmospheric light value A and transmittance t from the foggy image I, inversely calculate Equation (1), and finally obtain the clear image J.
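This inversion can be sketched in a few lines of NumPy. The lower bound t_min and the [0, 1] normalization are implementation assumptions, not part of the model above; clamping t avoids amplifying noise where the haze is dense:

```python
import numpy as np

def dehaze(I, t, A, t_min=0.1):
    """Invert the degradation model I(z) = J(z)t(z) + A(1 - t(z)).

    I     : hazy image, float array in [0, 1], shape (H, W, 3)
    t     : transmittance map, shape (H, W)
    A     : global atmospheric light, scalar or length-3 vector
    t_min : lower bound on t to keep the division stable in dense haze
    """
    t = np.maximum(t, t_min)[..., None]          # broadcast over channels
    J = (I - A * (1.0 - t)) / t                  # solve Equation (1) for J
    return np.clip(J, 0.0, 1.0)

# Sanity check: hazing a clear image with Equation (1), then dehazing,
# recovers the original image when t and A are known exactly.
J = np.random.rand(4, 4, 3)
t = np.full((4, 4), 0.6)
A = 0.9
I = J * t[..., None] + A * (1.0 - t[..., None])  # forward model, Equation (1)
assert np.allclose(dehaze(I, t, A), J, atol=1e-6)
```

In practice t and A are unknown, which is exactly what the two estimation modules of the proposed network supply.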

Model Architecture of Densely Connected Pyramid
Dehazing Network. This paper proposes a new deep learning-based dehazing network for transmission line machine patrol images, called the densely connected pyramid dehazing network (DCPDN). The network uses end-to-end learning to achieve image dehazing. The essence of DCPDN is to use the physical method of image restoration to solve the image degradation problem: end-to-end dehazing is achieved by embedding the atmospheric degradation model, Equation (1), into the deep learning network. Within the dehazing network, deep fusion between the transmittance map estimation module, the atmospheric light value estimation module, and the dehazed image is realized, and information exchange and mutual constraints between the modules achieve joint optimization.
The architecture of the proposed DCPDN network is shown in Figure 1. The network is composed of four parts: (1) a transmittance map estimation module with densely connected pyramid pooling, (2) an atmospheric light value estimation module, (3) a dehazing module based on the image degradation equation, and (4) a joint discriminator module. The four modules are introduced in detail below.

Pyramid Densely Connected Transmittance Map
Estimation Network. Inspired by previous methods that use multilevel features to estimate the transmittance map [47][48][49][50][51], this paper also uses the multilevel features of the image to estimate the transmittance map, as shown in Figure 2. A densely connected encoding-decoding structure is proposed. The encoder-decoder uses dense blocks as the basic structural unit, with dense connections between the layers within each dense block. Dense blocks not only use the CNN to extract multilevel features of the image but also ensure better convergence of the entire encoding-decoding network. In addition, the encoder-decoder uses a multilayer pyramid pooling module, which exploits the global transmission information of the image to estimate the transmittance map, avoiding the problem of the network paying too much attention to local details and ignoring global information.
The encoder consists of one traditional convolution block followed by four dense blocks; its output is 1/32 the size of the original input image. The decoder is completely symmetrical with the encoder and contains four dense blocks followed by one convolution block; its output has the same size as the original image, and the corresponding modules of the two are directly connected. Although the proposed densely connected encoding-decoding structure combines different features within the network, the transmittance map output by the encoder-decoder alone still lacks the global structural information of features at different scales, because features extracted from the image at different scales are not directly used to generate the transmittance map. Therefore, after the encoder-decoder, the network adds a multilevel pyramid pooling module so that the feature information obtained at each level of the feature pyramid is used to estimate the final transmittance map. The pyramid pooling module designed in this paper contains four levels of pooling, whose output sizes are 1/4, 1/8, 1/16, and 1/32 of the original image size. All four pooled maps are then upsampled to the original image size and concatenated with the original features, and finally the refined transmittance map is obtained.
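The pool-upsample-concatenate pipeline of the pyramid module can be sketched as follows. This is a simplified illustration: the learned convolutions that a real implementation would apply after each pooling level are omitted, and average pooling with nearest-neighbour upsampling stands in for the learned operations:

```python
import numpy as np

def avg_pool(x, k):
    """Non-overlapping k x k average pooling on (H, W, C); H, W divisible by k."""
    H, W, C = x.shape
    return x.reshape(H // k, k, W // k, k, C).mean(axis=(1, 3))

def upsample(x, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def pyramid_pool(features):
    """Concatenate the input features with pooled-and-upsampled copies
    computed at 1/4, 1/8, 1/16, and 1/32 of the original resolution."""
    outs = [features]
    for k in (4, 8, 16, 32):
        outs.append(upsample(avg_pool(features, k), k))
    return np.concatenate(outs, axis=-1)

x = np.random.rand(32, 32, 8)
y = pyramid_pool(x)
assert y.shape == (32, 32, 40)   # 8 original + 4 pooled branches x 8 channels
```

Each pooled branch summarizes the image at a coarser scale, which is how the module injects the global structure that the plain encoder-decoder output lacks.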

Atmospheric Light Value Estimation
Network. In traditional image dehazing methods, the atmospheric light value is calculated from empirical formulas, and the resulting atmospheric light map is rough and imprecise, so it is difficult to obtain satisfactory dehazed images. This network instead uses an improved U-net to solve for the atmospheric light value, as shown in Figure 3. The network includes two parts, downsampling and upsampling: the downsampling (contracting) path captures the context of the image, and the upsampling (expanding) path recovers precise local information. In the downsampling path, every two 3 × 3 convolutional layers are followed by a 2 × 2 pooling layer, with ReLU as the activation function after each convolutional layer. The upsampling path is symmetrical: each 2 × 2 upsampling layer is followed by two 3 × 3 convolutional layers. The high-resolution atmospheric light feature maps extracted during downsampling are passed directly to the corresponding upsampling stages to guide the generation of new feature maps, retaining to the greatest extent the important feature information obtained earlier. The final output of the network is a refined atmospheric light map in which the value of each pixel is as close as possible to the atmospheric light value under real haze.
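The shape bookkeeping of this symmetric architecture with skip connections can be sketched without any learned layers. In this illustrative simplification, pooling and nearest-neighbour upsampling stand in for the convolutional blocks of the real network in Figure 3:

```python
import numpy as np

def down(x):
    """2 x 2 average pooling stands in for 'two 3 x 3 convs + 2 x 2 pool'."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def up(x):
    """Nearest-neighbour 2x upsampling stands in for the learned upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like(x, depth=3):
    """Contract, then expand, concatenating the stored skip features so
    high-resolution information reaches the corresponding upsampling stage."""
    skips = []
    for _ in range(depth):                       # contracting path
        skips.append(x)                          # keep high-resolution features
        x = down(x)
    for _ in range(depth):                       # expanding path
        x = up(x)
        x = np.concatenate([x, skips.pop()], axis=-1)   # skip connection
    return x

a = unet_like(np.random.rand(16, 16, 4))
assert a.shape[:2] == (16, 16)   # output resolution matches the input
```

The concatenation step is what lets the network recover a per-pixel atmospheric light map at full resolution rather than a single global constant.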

Dehazing Module Based on Image Degradation Equation.
In order to realize image dehazing using physical imaging principles, this network embeds the image degradation model directly into the dehazing network. As shown in the dehazing module in Figure 1, accurate transmittance and atmospheric light maps are obtained from the first two modules, and the dehazed image can then be obtained straightforwardly from the image degradation model.

Joint Optimization Discriminant Network.
In order to establish the relationship between the transmittance map, the atmospheric light map, and the dehazed image, this paper builds a joint optimization discriminant network based on a GAN [51]. The discriminator uses the high correlation between the three to optimize the generated transmittance map, atmospheric light map, and dehazed image, and finally obtains a clear and realistic dehazed image. Let G_t and G_d denote the generators of the transmittance map and the dehazed image, respectively. As shown in Equation (3), the first and second parts of the formula are the games between the generators and the discriminator; continuously optimizing the generators and the discriminator eventually produces transmittance maps and dehazed images that are as realistic as possible. The third part of the formula compares the actual clear image with the dehazed image to further optimize the result. For training deep learning networks, the simplest loss function is the L2 loss, but many experiments show that when only the L2 loss is used for training, the output image is often blurred. In-depth analysis of the images shows that pixel values are discontinuous at the edges of target objects, which can be characterized by computing the gradient of the pixel values. The edge and contour features of target objects are captured in the first few layers of a CNN, so these layers can be used as an edge detector for target feature extraction. Based on this analysis, this paper proposes an edge-preserving loss function, which adds a two-directional gradient loss and a feature edge loss to the L2 loss.
The function expression is shown in Equation (4):

L_E = λ_{E,l2} L_{E,l2} + λ_{E,g} L_{E,g} + λ_{E,f} L_{E,f}, (4)

where L_E is the overall edge-preserving loss for the target object, L_{E,l2} is the L2 loss, L_{E,g} is the horizontal and vertical gradient loss, and L_{E,f} is the feature loss. The gradient loss is computed as shown in Equation (5):

L_{E,g} = Σ_{w,h} ( ‖H_x(G_t(I))_{w,h} − H_x(t)_{w,h}‖_2 + ‖H_y(G_t(I))_{w,h} − H_y(t)_{w,h}‖_2 ), (5)

where H_x and H_y compute the image pixel gradients along the horizontal and vertical directions, respectively, and w × h is the width and height of the output feature map. The feature loss is defined in Equation (6).
Here V_i denotes the i-th low-level feature extractor of the CNN, and c_i, w_i, and h_i are the dimensions of the corresponding low-level feature map. λ_{E,l2}, λ_{E,g}, and λ_{E,f} are weights that balance the loss terms.
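A minimal NumPy sketch of such an edge-preserving loss is given below, assuming first differences for H_x and H_y and leaving the CNN feature extractor pluggable. The weights lam_l2, lam_g, and lam_f are placeholder values, not the ones used in the paper:

```python
import numpy as np

def l2_loss(pred, target):
    """Plain mean-squared-error (L2) loss term, L_{E,l2}."""
    return np.mean((pred - target) ** 2)

def gradient_loss(pred, target):
    """Two-directional gradient loss L_{E,g}: compare horizontal and
    vertical first differences (standing in for H_x, H_y) of the maps."""
    gx = np.diff(pred, axis=1) - np.diff(target, axis=1)   # horizontal
    gy = np.diff(pred, axis=0) - np.diff(target, axis=0)   # vertical
    return np.mean(gx ** 2) + np.mean(gy ** 2)

def edge_preserving_loss(pred, target, lam_l2=1.0, lam_g=0.5, lam_f=0.5,
                         feature_fn=None):
    """L_E = lam_l2 * L_l2 + lam_g * L_g + lam_f * L_f  (Equation (4)).
    feature_fn would be the first CNN layers used as an edge detector;
    it is left as a pluggable hook here."""
    loss = lam_l2 * l2_loss(pred, target) + lam_g * gradient_loss(pred, target)
    if feature_fn is not None:                   # feature edge term L_{E,f}
        loss += lam_f * l2_loss(feature_fn(pred), feature_fn(target))
    return loss

t_pred = np.random.rand(8, 8)
t_true = np.random.rand(8, 8)
assert edge_preserving_loss(t_true, t_true) == 0.0   # identical maps: no loss
assert edge_preserving_loss(t_pred, t_true) >= 0.0
```

The gradient term is what penalizes blurred edges that a pure L2 loss would tolerate, matching the motivation given above.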

Overall Loss Function.
For the entire network training, in addition to the edge protection loss function, the loss function of the atmospheric light map, the loss function of the dehazing module, and the loss function of the joint optimization discriminator are also required. The overall loss function can be expressed as Equation (7).
L = L_t + L_a + L_d + λ_j L_j. (7)

Here L_t is the transmittance map loss, composed of the edge-preserving loss L_E; L_a is the loss of the atmospheric light estimation module, composed of the traditional L2 loss; L_d is the dehazing loss, which also consists of the L2 loss only; and L_j is the joint discriminator loss, defined as Equation (8), where λ_j is a constant weight.

Model Training
3.1. Composition of the Dataset. In order to train the densely connected pyramid dehazing network (DCPDN), this paper constructs a training set of 8000 images through simulation. The set contains four data types: foggy images, clear images, transmittance maps, and atmospheric light maps. To generate the data, we randomly sample the atmospheric light value in the range 0.5-1 and construct the corresponding atmospheric light map, and we randomly sample the scattering coefficient in the range 0.4-1.6 and generate the corresponding transmittance map. We randomly selected 2000 transmission line images captured by drone patrols in clear weather and synthesized foggy images from them according to the foggy image model, Equation (1), obtaining a total of 8000 simulated images. The dataset is then divided into training, validation, and test sets in the ratio 7 : 2 : 1. To ensure that the trained dehazing network generalizes well, the training set does not contain any pictures from the validation or test sets.
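This synthesis procedure can be sketched as follows. The depth map input and the relation t = exp(−β d) are assumptions based on the standard haze model of Equation (2), since the text does not state how the per-pixel transmittance maps were generated:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_hazy(clear, depth):
    """Generate one training tuple (hazy, t, A) from a clear image and a
    depth map, following Equation (1).

    A is sampled uniformly in [0.5, 1]; the scattering coefficient beta
    in [0.4, 1.6]; and t(z) = exp(-beta * d(z)) as in Equation (2).
    """
    A = rng.uniform(0.5, 1.0)                    # atmospheric light value
    beta = rng.uniform(0.4, 1.6)                 # scattering coefficient
    t = np.exp(-beta * depth)                    # transmittance map
    hazy = clear * t[..., None] + A * (1.0 - t[..., None])
    return hazy, t, A

clear = rng.random((8, 8, 3))                    # stand-in for a patrol image
depth = rng.random((8, 8))                       # stand-in for a depth map
hazy, t, A = synthesize_hazy(clear, depth)
assert hazy.shape == clear.shape and 0.5 <= A <= 1.0
```

Sampling four (A, β) pairs per clear image would turn 2000 clear images into the 8000 simulated samples described above.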

The network uses Gaussian random variables to initialize the weight parameters, and the Adam optimization algorithm is used to optimize the network. The initial learning rate of the generator and the joint discriminator is set to 2 × 10^−3. The learning rate is a key parameter affecting model training.
The smaller the learning rate, the less likely the model is to miss a local minimum, but also the slower it converges. The number of samples in the training set of this network is not very large, so the batch size is set to the full training set: each iteration can then make full use of the feature information of the entire training set and accelerates the network's approach to the optimum. In addition, input images are uniformly resized to 512 × 512. In the end, 40,000 iterations were performed, and all network parameters were determined through cross-validation.
During initial training, we found that training the entire network directly from the start converges very slowly. A possible reason is that the gradient descent directions of the different modules are inconsistent early in training, reducing the convergence speed of the whole network. To solve this problem and accelerate training, this method introduces a staged learning strategy, which has previously been applied in multimodel recognition [52] and feature learning [53]. We feed the training data to the different modules of the network and train each module separately, updating its parameters independently. After each module completes this parameter "initialization", we connect the modules and jointly optimize the entire network.

Defog Image Rendering Comparison.
We randomly select a foggy image from the UAV inspection image sample library and use the DCPDN dehazing algorithm for dehazing. The results are shown in Figure 4. It can be seen from Figure 4 that the method in this paper can effectively remove the haze in the image and restore the image detail information.
We randomly select a foggy image from the UAV inspection image sample library and use the He dehazing algorithm [37], Li dehazing algorithm [38], and DCPDN dehazing algorithm for dehazing. The results are shown in Figure 5.
Figures 5(a)-5(d) show the original foggy image and the images after the He, Li, and DCPDN dehazing methods, respectively. It can be seen from Figure 5(b) that the image processed by the He method is seriously distorted in the sky area; the method does not handle images containing large white areas well. Figure 5(c) clearly shows that dehazing is not thorough enough; the main reason is that the CNN used in the Li method has few layers, so fog feature extraction is insufficient and fog remains in the processed image. Figure 5(d) shows the result of the DCPDN method proposed in this paper: its visual effect is clearly superior to the first two methods, and image detail is restored while brightness and contrast are maintained.

Comparison of Dehazing Image Indicators.
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are often used as the basis for image quality evaluation. PSNR is defined by the ratio of the maximum signal power to the noise power, usually expressed in decibels. SSIM evaluates image quality from the brightness, contrast, and structural properties of the target object. We calculate the PSNR and SSIM values for the images in Figure 5, and the results are shown in Table 1.
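PSNR, for instance, follows directly from its definition; this minimal sketch assumes images normalized to [0, 1]:

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")                      # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 gives MSE = 0.01, i.e. 10 * log10(1 / 0.01) = 20 dB.
ref = np.zeros((4, 4))
noisy = ref + 0.1
assert abs(psnr(ref, noisy) - 20.0) < 1e-9
```

Higher PSNR against the clear reference image therefore indicates a dehazed result closer to the ground truth, which is how Table 1 is read.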
The comparison of PSNR and SSIM values in Table 1 shows that the DCPDN image dehazing method proposed in this paper outperforms the He and Li methods, with higher PSNR and SSIM values. This indicates that the proposed method repairs images well and can generate dehazed images highly similar to the clear images.

Target Detection Accuracy Comparison.
In the field of target detection, two indicators, average precision (AP) and mean average precision (mAP), are usually used to evaluate detection algorithms. AP measures the recognition accuracy of a target detection algorithm for one object class; mAP measures the recognition accuracy of an algorithm over all targets. Generally speaking, mAP is the simple average of the per-class AP values in multitarget detection.
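The relationship between AP and mAP described above can be illustrated as follows. The trapezoidal approximation of the area under the precision-recall curve is one common convention, not necessarily the one used in the experiments:

```python
import numpy as np

def average_precision(recall, precision):
    """AP as the area under a precision-recall curve, approximated with
    the trapezoidal rule (recall values assumed sorted ascending)."""
    r = np.asarray(recall, dtype=float)
    p = np.asarray(precision, dtype=float)
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

def mean_average_precision(aps):
    """mAP is the simple mean of the per-class AP values."""
    return float(np.mean(aps))

# A perfect detector keeps precision 1.0 at every recall level, so AP = 1.0.
assert average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 1.0]) == 1.0
assert mean_average_precision([1.0, 0.5]) == 0.75
```

In Table 2, one AP value per fault class is computed this way, and the mAP column is their mean.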
From the test set, 100 foggy transmission line images were randomly selected, covering pole tower failure, small-size fitting failure, ground conductor failure, and insulator failure. The He, Li, and DCPDN image dehazing methods were applied, and the Faster R-CNN target detection algorithm [54] was used to detect equipment defects in the foggy images and in the He, Li, and DCPDN dehazed images. We calculate the AP values of the four fault types and the mAP of each group of images separately. The results are shown in Table 2.
The results in Table 2 show that the AP and mAP values of the target detection algorithm improve after image dehazing, with the proposed method giving the most obvious gains, indicating that dehazing preprocessing can improve target detection accuracy. The AP values for tower failure, ground conductor failure, and insulator failure improve greatly, while the AP value for small-size fittings increases only slightly. This shows that image dehazing improves overall image quality and is especially effective at recovering the edge information of large target objects.

Conclusion
This paper proposes an image dehazing method for transmission line machine patrol images based on a densely connected pyramid network. It embeds the atmospheric degradation model directly into the deep learning framework and uses physical principles to restore the image. For the calculation of the transmittance map, it proposes a new densely connected encoding-decoding structure with a multilevel pooling module and redesigns the edge-preserving loss function. The method introduces a GAN-based joint discriminant optimizer that jointly optimizes the highly correlated transmittance map and dehazed image. The network is then trained on the sample set designed in this paper to obtain a dehazing model suited to transmission line backgrounds. Experiments show that the dehazed images obtained by the proposed method are closer to real images in visual effect, and that using the proposed dehazing algorithm for image enhancement improves the accuracy of the target detection algorithm.
Although the proposed method indeed improves the quality of transmission line UAV inspection images, it is not yet a real-time solution. In addition, the number of existing samples is still insufficient for the deep learning model proposed in this paper; future work can further increase the training set through data augmentation. Moreover, the main work of this paper is to enhance images containing haze, but actual patrol images also suffer from problems such as raindrops and motion blur. There is therefore an important and meaningful need for future work to explore more comprehensive image enhancement methods for UAV inspection in complex environments.