Automatic Detection of Surface Defects on Underwater Pile-Pier of Bridges Based on Image Fusion and Deep Learning

As an important part of the bridge structural system, the underwater pile-pier structure commonly develops various surface defects due to its complex hydrological environment. The existing conventional defect detection approaches suffer from two problems: (1) insufficient definition and color distortion of the underwater images, and (2) low efficiency and error-proneness. To solve these problems, this paper proposes a target defect detection model that integrates an image-fusion enhancement algorithm with a deep learning algorithm. Firstly, by analyzing the causes of underwater image degradation, the ACE (automatic color equalization) and CLAHE (contrast limited adaptive histogram equalization) algorithms are selected to enhance the images, respectively. Secondly, the two enhanced images are fused based on the point sharpness weight, and the fusion results are further sharpened by the USM (unsharp mask) algorithm, thus obtaining the final fused images. Thirdly, 3,200 fused images are taken as the training set to train the detection model with the YOLOv3 algorithm, and the trained model is then validated and tested with 400 fused images each, thus building up the target automatic detection model for underwater pile-pier surface defects. Finally, a series of comparisons and discussions were conducted to validate the effectiveness of the image fusion and the robustness and effectiveness of the target detection model. The results show that the target detection model has excellent robustness against noise and is effective in surface defect detection. This indicates that the image-fusion approach proposed in this paper can effectively enhance the image features, and that the target detection model is feasible, robust, and effective for the automatic detection of surface defects on underwater pile-pier structures.


Introduction
The number of existing bridges in service in China has exceeded 900,000, and the proportion of bridges over 30 years old will soar from less than 20% in 2014 to 62.7% in 2044 [1]. The explosive growth of old bridges shows that China's bridges have generally entered a stage of rapid aging. During its service life, the bridge underwater pile-pier structure is constantly affected by factors such as current scouring, ship collision, and wave force, which often lead to various surface defects, such as cracks, exposed reinforcements, holes, and swellings [2]. As defects accumulate, bridge collapse accidents will occur frequently if the surface defects of the underwater pile-pier structure are not detected in time. Therefore, it is particularly urgent and important to carry out defect detection of underwater pile-pier structures, thus providing accurate and effective data for their damage analysis and evaluation [3, 4].
At present, the underwater structure detection of bridges is mainly completed by professional divers or by underwater robots carrying equipment to take photos or videos of pile-pier structures [5]. Due to the influence of light and water quality, the images obtained are blurry, distorted in color, and full of various kinds of noise. In addition, the number of defect images obtained is large. These facts lead to low efficiency, high subjectivity, and poor recognition precision when conventional approaches are adopted [6]. To obtain precise recognition results of surface defects efficiently, numerous scholars have investigated the problem from two aspects in recent years. One aspect is to enhance the image quality of surface defects, and the other is to develop more effective and intelligent target detection models.
A great number of approaches have been presented for image enhancement. Generally, wavelet transform, bilateral filtering, and Retinex-based approaches are commonly used for underwater image enhancement. For example, Guraksin et al. [7] proposed an underwater image enhancement approach based on the wavelet transform and the differential evolution algorithm. This approach effectively improves visibility and image quality; however, it makes the image darker overall. Hassan et al. [8] presented a Retinex-based approach to enhance underwater images. This algorithm improves the overall underexposure of the images while preserving edge detail; however, the problem of image color distortion is still inevitable. To improve the definition of images, Dan et al. [9] proposed a bilateral filter with a controllable kernel function to estimate the illumination intensity of images. The experimental results show that this approach can remarkably improve the definition of underwater images. From the above, we can find that these approaches can only enhance a certain characteristic of the image and cannot improve its overall effect. As a result, several scholars have tried to solve this problem by fusing differently enhanced images. For example, Zhou et al. [10] proposed a fusion enhancement approach for underwater images based on white balance, guided filtering, and multiexposure sequence techniques to improve dark details and solve the overenhancement problem of a single algorithm; and yet, it ignored the relationship between degradation and scene depth. Gao et al. [11] proposed an underwater image enhancement approach based on multiscale fusion, which fuses local contrast-corrected images with sharpened images to solve the problems of low contrast and color distortion in underwater images. However, this approach is restricted in its field of application, and only local details can be enhanced. In conclusion, none of these approaches can be directly applied to enhance images of surface defects on underwater pile-pier structures. It is necessary to develop corresponding image fusion enhancement approaches to enhance contrast and optimize the detail features of defect images.
The deep learning algorithm has been widely applied in target detection in recent years, due to its high efficiency, objectivity, and precise recognition accuracy. Zhang et al. [12] were the first to apply deep learning techniques to bridge surface defect detection and proposed the application of the convolutional neural network (CNN) algorithm to bridge crack image recognition. This preliminarily proved that deep-learning-based approaches can solve the problem of bridge defect detection. In addition to classifying bridge surface defects, it is even more important to locate them. Yang et al. [13] proposed a vision-based automated method for surface condition identification of concrete structures, consisting of pretrained convolutional neural networks (CNNs), transfer learning, and decision-level image fusion, to improve the accuracy of crack detection. Afterwards, Yang et al. [14] presented a data-driven model based on 2D convolutional neural networks and the improved bird swarm algorithm to evaluate the torsional capacity of reinforced concrete beams, and the results showed that the proposed model outperformed other machine learning models, building codes, and empirical formulas. Cha et al. [15] applied the two-stage target detection algorithm faster R-CNN to accomplish the classification and localization of defects. Due to the insufficient detection efficiency of two-stage target detection algorithms, that approach is unable to meet the needs of real-time detection in engineering. To solve this problem, Deng et al. [16] applied the one-stage target detection algorithm YOLOv2 to the detection of cracks in concrete. The experimental results show that the YOLOv2 algorithm can indeed significantly improve the detection efficiency, but it performs poorly on large-scale targets. After Joseph and Ali [17] proposed the YOLOv3 target detection algorithm, which balances detection speed and accuracy, Zhang et al. [18] applied this algorithm to concrete bridge surface defect detection, thus realizing efficient and accurate detection of common surface defects. Afterwards, Pan and Yang [19] combined the YOLOv3 and CNN algorithms to establish a real-time detection model. The developed model was used to monitor the bolt rotation angle, and the results showed that the detection accuracy could reach more than 90%. Liu et al. [20] proposed a modified YOLOv3 model to automatically detect pavement cracks and found that its detection performance exceeds that of other state-of-the-art methods. In the YOLOv3 algorithm, the Darknet53 network is used as the backbone due to its excellent feature extraction capability and inference speed, and the multiscale feature maps output by the neck module are conducive to detecting objects of different scales. Compared with the R-CNN series of algorithms, the YOLOv3 algorithm can maintain a high detection speed while ensuring accurate detection [21, 22]. It is clearly the ideal algorithm for underwater pile-pier defect detection of bridges. However, existing deep-learning-based detection models and algorithms cannot be directly transplanted to bridge underwater pile-pier structures. Given the harsh environment, complicated noise, and blurred defect image details, it is indispensable to train and build target detection models for specific environments and defect categories.
To solve the abovementioned problems, this paper presents an automatic detection model integrating image enhancement and deep learning, which is applicable to detecting and locating surface defects of bridge underwater pile-pier structures. First of all, this paper proposes an image enhancement approach based on pixel-level fusion, which simultaneously reduces the blurriness of underwater images and strengthens the clarity of defect contours by increasing contrast and correcting color deviation, thus improving the overall image quality and enhancing the defect detail features. Next, the target automatic detection model is built by training the YOLOv3 algorithm on the fused images, thus realizing the automatic detection and localization of surface defects.

Conventional Approaches of Underwater Image Enhancement.
When light propagates underwater, light absorption and scattering occur due to the propagation characteristics of light [23]. This further leads to several problems, such as insufficient contrast [24, 25], color distortion [26, 27], and uneven brightness distribution [28], in underwater images. These problems restrict the practical application of underwater images in the defect detection of bridge underwater pile-pier structures. To solve them, a number of image enhancement approaches have been developed. Herein, two commonly used conventional approaches are reviewed briefly as follows.

Contrast Limited Adaptive Histogram Equalization.
Contrast limited adaptive histogram equalization (CLAHE) [29] realizes contrast enhancement by expanding the gray range. Generally, the algorithm divides the image into blocks and realizes the histogram transformation by calculating the transformation function of each pixel neighborhood, which reduces the loss of image details. In addition, the CLAHE algorithm restricts the height of the gray histogram by clipping and redistribution, which effectively alleviates excessive detail enhancement and noise amplification. The process of the CLAHE algorithm is as follows:

(1) Divide the original image into several subregion images according to the image size.
(2) Establish the histogram H(x) of each subregion.
(3) Calculate the clipping amplitude T (a common form is T = c·H·W/M), where c is the acquisition coefficient; H and W are the numbers of pixels in the height and width directions of the subregion image, respectively; and M is the number of gray levels.
(4) Fill the part of the histogram above the threshold T back into the bottom of the histogram, obtaining a new histogram H′(x).
(5) Reconstruct the gray values by bilinear interpolation between the different subregion images.

All in all, the CLAHE algorithm can balance the brightness distribution and significantly improve the contrast, while being less effective in color correction.
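As an illustration, steps (2)-(4) above can be sketched for a single subregion in NumPy. This is a simplified sketch only: full CLAHE additionally blends neighboring tiles by bilinear interpolation (step 5), OpenCV's `cv2.createCLAHE` provides a complete implementation, and the clip formula T = c·H·W/M is the common form assumed here.

```python
import numpy as np

def clipped_hist_equalize(tile, c=3.0, M=256):
    """Clipped histogram equalization for one subregion (steps 2-4).
    c is the clip coefficient; full CLAHE would also blend tiles bilinearly."""
    H, W = tile.shape
    hist = np.bincount(tile.ravel(), minlength=M).astype(float)  # step 2: H(x)
    T = c * H * W / M                        # step 3: clipping amplitude
    excess = np.maximum(hist - T, 0).sum()   # mass above the threshold
    hist = np.minimum(hist, T) + excess / M  # step 4: redistribute to every bin
    cdf = hist.cumsum() / (H * W)            # normalized cumulative histogram
    lut = np.round((M - 1) * cdf).astype(np.uint8)
    return lut[tile]                         # gray-level remapping

rng = np.random.default_rng(0)
tile = rng.integers(90, 140, size=(64, 64), dtype=np.uint8)  # low-contrast tile
out = clipped_hist_equalize(tile)
print(int(tile.max()) - int(tile.min()), int(out.max()) - int(out.min()))
```

The redistribution in step 4 keeps the total pixel count unchanged, so the cumulative histogram still ends at 1 and the mapping covers the full gray range while the clipping bounds the local contrast gain.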

Automatic Color Equalization Algorithm.
To address the fact that the CLAHE algorithm is not satisfactory in brightness enhancement and color restoration, the automatic color equalization (ACE) algorithm [30] was developed. The algorithm considers the spatial location relationship between color and brightness in the image: the pixel values of the enhanced image are obtained by differentially calculating the relative light-dark relationship between each target pixel and its surrounding pixels, and finally the pixel values are corrected so that the enhanced image has excellent color restoration.
The ACE algorithm is mainly divided into two steps. The first step is the image domain adjustment: substitute the pixel brightness values of the original underwater image I_z into formula (2) and obtain the intermediate image R_z, a common form being

R_z(k) = Σ_{q≠k} r(I_z(k) − I_z(q)) / d(k, q),

where R_z(k) is the brightness value of pixel point k; I_z(k) − I_z(q) is the brightness difference between two different pixel points; d(k, q) is the distance function; and r(∗) represents the brightness performance function. The second step is dynamic expansion: adjust the dynamic range of the intermediate image R_z and obtain the final target image

O_z(k) = round[s_z · (R_z(k) − m_z)],

where O_z(k) is the brightness value of pixel point k; round(∗) is the rounding function; and s_z is the slope of the line segment through (m_z, 0) and (M_z, 255), with m_z and M_z being the extremes of the intermediate image:

m_z = min_k R_z(k),  M_z = max_k R_z(k).

Through the above two steps, the image color deviation is corrected and the overall brightness is improved.
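A brute-force sketch of these two steps on a small grayscale patch might look as follows. This is illustrative only: the saturated slope function used for r(∗) and the Euclidean distance weighting are common choices rather than necessarily those of [30], and practical ACE implementations replace the O(N²) pairwise computation with fast approximations.

```python
import numpy as np

def ace_gray(I, slope=5.0):
    """Minimal ACE sketch for a small grayscale image I (values in [0, 255])."""
    h, w = I.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    vals = I.ravel().astype(float)
    # d(k, q): pairwise Euclidean distances between pixel positions
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # exclude q = k
    diff = vals[:, None] - vals[None, :]          # I(k) - I(q)
    r = np.clip(slope * diff / 255.0, -1.0, 1.0)  # saturated slope function r(*)
    R = (r / d).sum(axis=1)                       # step 1: domain adjustment
    m, M = R.min(), R.max()                       # step 2: dynamic expansion
    O = np.round(255.0 * (R - m) / (M - m))       # line through (m,0) and (M,255)
    return O.reshape(h, w)

rng = np.random.default_rng(1)
img = rng.uniform(100, 150, size=(12, 12))        # dull, low-contrast patch
out = ace_gray(img)
```

Because the dynamic expansion stretches R_z linearly between its minimum and maximum, the output always spans the full 0-255 range regardless of how compressed the input brightness was.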

Drawbacks of Conventional Enhancement Approaches.
Due to the diversity of causes of underwater image degradation, images from different underwater environments need to be enhanced by different algorithms, so a single image enhancement approach can only solve a certain aspect of the problem. Through the comparison of the above enhancement algorithms, it is found that the ACE algorithm can effectively achieve color restoration, correct color deviation, and significantly enhance brightness, but its effect on contrast enhancement is not ideal. On the contrary, the CLAHE algorithm can significantly improve the contrast of underwater images and balance the brightness distribution, but it does not perform well in color restoration and overall brightness enhancement. Obviously, the ACE algorithm and the CLAHE algorithm are ideally complementary. Therefore, this paper presents an image pixel-level fusion enhancement approach that integrates the ACE algorithm and the CLAHE algorithm.

Image Fusion Enhancement Algorithm Based on Point Sharpness Weight.
To obtain images with better definition, the point sharpness values of the different enhanced images are calculated and selected as the fusion weights. The point sharpness value can be expressed in the form

E(G) = (1 / (m·n)) Σ |dG/dx|,

where m and n are the length and width of the image, respectively; dG/dx is the rate of change of the gray level; and E(G) is the calculated point sharpness value. The steps of the image fusion approach proposed in this paper are as follows:

(1) Enhance the original images of underwater pile-pier structures by the ACE and CLAHE algorithms, respectively, and obtain two enhanced images.
(2) Adopt the improved point sharpness formula to calculate the point sharpness values E_1 and E_2 of the two enhanced images and normalize them as their respective weight values, i.e., ω_1 = E_1/(E_1 + E_2) and ω_2 = E_2/(E_1 + E_2).
(3) Fuse the two enhanced images pixel by pixel with the normalized weights.

To further reduce noise interference, the USM algorithm is adopted to further sharpen the fused image. More specifically, after Gaussian blur processing is performed on the input image, the extracted high-frequency components are multiplied by the sharpening coefficient and then re-superimposed on the input image; finally, the re-superimposed image is filtered and denoised. In a common formulation,

g(a, b) = f(a, b) + ω · h(a, b),  h(a, b) = f(a, b) − g_σ ⊗ f(a, b),

where f(a, b) is the input image; h(a, b) is the high-frequency component; ω is the sharpening coefficient, usually taken as 0.6; and g_σ and ⊗ represent filter denoising and convolution operations, respectively.

It can be seen from Figure 2 that the image enhanced by the fusion algorithm proposed in this paper is the best, ACE is the second best, and CLAHE is the worst. As shown in Figures 2(d-1) and 2(d-2), the fused images not only highlight the crack and reinforcement details but also recover the concrete surface pores and hollows distinctly. This indicates that the fusion algorithm proposed in this paper can solve the problems of blurring, indistinguishable contours, and color distortion in the original image; furthermore, it is beneficial to feature extraction of the image content and defect discrimination. Correspondingly, the overall color is well recovered in the images enhanced by the ACE algorithm in Figures 2(b-1) and 2(b-2); however, there are still problems of local darkness and low contrast at the periphery of the images. Note also that the images enhanced by the CLAHE algorithm in Figures 2(c-1) and 2(c-2) show better defogging and enhanced contrast, while the color correction on the concrete surface has no significant effect.
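The whole pipeline of sharpness-weighted fusion followed by USM sharpening can be sketched in NumPy. This is a minimal sketch: the horizontal-gradient sharpness measure and the separable Gaussian blur below are simple stand-ins for the paper's point sharpness formula and the g_σ filter.

```python
import numpy as np

def point_sharpness(img):
    """E(G): mean absolute horizontal gray-level gradient (one simple variant)."""
    return np.abs(np.diff(img.astype(float), axis=1)).mean()

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with reflected borders (stand-in for g_sigma)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img.astype(float), radius, mode="reflect")
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)

def fuse_and_sharpen(img_ace, img_clahe, omega=0.6, sigma=1.0):
    e1, e2 = point_sharpness(img_ace), point_sharpness(img_clahe)
    w1, w2 = e1 / (e1 + e2), e2 / (e1 + e2)      # normalized sharpness weights
    fused = w1 * img_ace + w2 * img_clahe        # pixel-level weighted fusion
    high = fused - gaussian_blur(fused, sigma)   # high-frequency component h(a, b)
    return np.clip(fused + omega * high, 0, 255) # USM re-superimposition

rng = np.random.default_rng(2)
a = rng.uniform(0, 255, (32, 32))   # stands in for the ACE-enhanced image
b = rng.uniform(0, 255, (32, 32))   # stands in for the CLAHE-enhanced image
result = fuse_and_sharpen(a, b)
```

The sharper of the two inputs automatically receives the larger weight, which is the point of using E(G) as the fusion criterion.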

Verification of the Image-Fusion Enhancement Algorithm
In summary, the image-fusion enhancement algorithm proposed in this paper combines the advantages of the ACE algorithm and the CLAHE algorithm, and is suitable for image enhancement of surface defects on bridge underwater pile-pier structures.

Comparison and Discussion.
To quantitatively assess the efficiency of the enhancement approach proposed in this paper, the SIFT (scale invariant feature transform) approach [31] was employed to evaluate the images from the different enhancement approaches. The essence of the SIFT approach is first to find feature points in different scale spaces, then to calculate the gradient directions of the feature points, and finally to use the calculated gradient directions to build matching relationships between images across scale spaces. The image quality is evaluated according to the number of feature points and matching relationships: the more feature points and matching relationships found, the higher the image quality.

The SIFT approach generally includes three steps: (1) extract feature points; (2) locate the feature points and determine their gradient directions; (3) find pairs of feature points that match each other and establish the corresponding relationships.
On the basis of the above steps, the feature points and matching relationships of the images enhanced by the ACE algorithm, the CLAHE algorithm, and the fusion algorithm proposed in this paper are calculated and depicted in Figures 3 and 4, respectively. Here, the yellow numbers represent the feature points and the green lines represent the matching relationships.
In terms of the number of feature points and matching relationships, it is obvious that the images enhanced by the fusion algorithm are the best, those enhanced by the ACE algorithm are second, those enhanced by the CLAHE algorithm rank third, and the original images are the worst. Especially for the crack image, as shown in Figures 3(a-1)-3(d-1), the number of feature matching points increases from zero to thousands after enhancement by the fusion algorithm. From Figures 3(a-2)-3(d-2), the feature points of the exposed reinforcement images increase to hundreds after enhancement by the fusion algorithm, while the original image has only 10 feature points. The specific numbers of feature points and matching relationships are given in Figure 4.
In conclusion, the images enhanced by the fusion algorithm proposed in this paper have more feature points and better matching performance than those of the other enhancement algorithms. This proves that the fusion algorithm is effective and feasible and can significantly improve the detail feature information of bridge underwater pile-pier surface defect images. This is conducive to the target detection model extracting the defect features, thus improving the performance of the target automatic detection model.

Target Automatic Detection Model
According to the above analysis and summary, the target automatic detection model is presented in this paper. Firstly, the underwater pile-pier surface defect images are obtained by an underwater visible camera. The acquired images are then augmented by rotation, flipping, and scaling transformations. Afterward, the actual damage locations on the images are marked with labeled regions. Finally, the labeled images are used to train the detection model.

Data Augmentation.
The target detection model has a deep structure and a large number of parameters, which requires a large amount of data to participate in training so as to update the weights and improve the generalization ability of the model. It is very difficult to obtain enough data; however, data augmentation can effectively solve this problem. Among augmentation techniques, affine transformation refers to cutting, flipping, scaling, and rotating images, and it is one of the most commonly used data augmentation approaches. In this paper, the rotation, flip, and scaling operations of the affine transformation are used to increase the number of data samples. Images obtained by the data augmentation approach are shown in Figure 5.
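The flip, rotation, and scaling operations can be sketched in NumPy as follows (a minimal sketch: 90-degree rotation and nearest-neighbor resampling stand in for the general affine transforms an image library would provide).

```python
import numpy as np

def augment(img):
    """Affine-style augmentations used here: flip, rotation, and scaling.
    Arbitrary-angle rotation would need an image library such as Pillow."""
    flipped = img[:, ::-1]                      # horizontal flip
    rotated = np.rot90(img)                     # 90-degree rotation
    scaled = img[::2, ::2]                      # crude 0.5x scaling by subsampling
    enlarged = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)  # 2x enlargement
    return flipped, rotated, scaled, enlarged

img = np.arange(16, dtype=np.uint8).reshape(4, 4)  # tiny stand-in image
f, r, s, e = augment(img)
```

Applied to each of the 800 originals, four such operations yield the fourfold enlargement of the dataset described in the experiment section.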

Data Region Labeling.
To achieve automatic defect detection with the target detection model built in this paper, the defect regions and defect categories in the images need to be marked manually. The acquired images are labeled with the software labelImg, thus completing the data region labeling. The operation steps are as follows.
Firstly, open the image annotation tool labelImg and click "Open" to load the image; then select "Create RectBox" to box the objects in the image and enter the corresponding defect category; finally, click "Save" to save the data as a corresponding "xml" file. Herein, "crack" corresponds to cracks, "exre" corresponds to exposed reinforcements, "hole" corresponds to holes, and "swelling" corresponds to swellings.
The operation interface for labeling a sample image with labelImg is shown in Figure 6.
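The "xml" files written by labelImg follow the Pascal VOC layout, so they can be read back with the Python standard library. In the sketch below the filename and box coordinates are made-up illustration values, not data from this paper.

```python
import xml.etree.ElementTree as ET

# A minimal hand-written Pascal-VOC-style annotation, as labelImg would save it.
XML = """<annotation>
  <filename>pier_001.jpg</filename>
  <object>
    <name>crack</name>
    <bndbox><xmin>120</xmin><ymin>40</ymin><xmax>310</xmax><ymax>95</ymax></bndbox>
  </object>
  <object>
    <name>exre</name>
    <bndbox><xmin>20</xmin><ymin>150</ymin><xmax>90</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

def read_voc(xml_text):
    """Return (class_name, (xmin, ymin, xmax, ymax)) for each labeled region."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, coords))
    return boxes

labels = read_voc(XML)
print(labels)
```

Each `(name, box)` pair corresponds to one "Create RectBox" annotation, with the class names ("crack", "exre", "hole", "swelling") matching those used in this paper.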

The Model Network Structure.
The target automatic detection model proposed in this paper is built by integrating the YOLOv3 algorithm with the image-fusion enhancement approach. The steps are first to extract features from the input enhanced images through the feature extraction network Darknet53 to generate the corresponding feature maps, then to score the targetness of the content contained in the feature maps through the anchor boxes, and finally to predict the category, location, and confidence of the detected targets. Because the model fuses features at each scale to achieve prediction on feature maps of three different sizes, it can significantly enrich the information in the feature maps, enabling the network to learn more features and improving the detection performance of the model. Simultaneously, residual structures are added to the network to prevent the gradient vanishing and gradient explosion caused by an overly deep network structure and too many parameters. The network structure of the model in this paper is shown in Figure 7.
Among these components, the feature extraction module is the core part of the model, which determines the performance of the whole network. The model adopts the Darknet53 network, with deeper network layers and more convolutional layers, and adds the residual network to solve the problem of nonconvergence during network training.

Resblock. The Resblock consists of CBL (Conv2D_BN_Leaky) and Res_unit components, which are the basic building blocks of this network structure. Among them, the function of the CBL component is feature extraction as well as downsampling, and the Res_unit ensures that training will not fail to converge due to the deep network structure. The Darknet53 network of the feature extraction module contains five different Resblock units, thus enabling effective extraction of target defect features even with deep network layers.

CBL Component.
The CBL component contains the convolutional layer, the BN (batch normalization) layer, and the activation layer. Its main function is to extract features from the images and to recognize the category and location of the defect.
The convolutional, BN, and activation layers of this model are specified as follows:

(1) The convolutional layer is the most important structure in the target detection algorithm, containing several different convolutional kernels. Each element of a convolutional kernel corresponds to a weight coefficient and a deviation coefficient, and its main function is to perform dot product operations between the convolutional kernels and the image data, thus achieving feature extraction. The calculation takes the form

x_i^(l) = Σ_j x_j^(l−1) ⊗ k_ij^(l) + b_i^(l),

where x_i^(l) is the i-th output of the l-th layer, x_j^(l−1) is the j-th output of the upper layer, k_ij^(l) is the convolutional kernel of the l-th layer, and b_i^(l) is the i-th deviation coefficient of the l-th layer.

(2) The function of the BN layer is mainly to normalize the image data before inputting it to the next layer, which reduces the variability between data:

x̂^(k) = (x^(k) − μ) / σ,

where x̂^(k) is the result after normalization, μ is the data mean value, and σ is the data standard deviation.

(3) The activation layer provides the network with nonlinear modeling capability. Only when the network model contains activation functions does the deep network have the ability to learn nonlinear mappings layer by layer; otherwise, it is difficult to effectively model data with a nonlinear distribution. The activation layer in this paper adopts the Leaky ReLU function:

f(x) = max(αx, x),

where max() is the maximum-value function and α takes the value 0.01.
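The three layers can be sketched in NumPy to show how one CBL pass transforms a feature map. This is a single-channel toy version; real Darknet53 layers operate on many channels with learned kernels and per-channel batch statistics.

```python
import numpy as np

def conv2d_valid(x, k, b=0.0):
    """Single-channel 'valid' convolution: dot products of kernel and patches."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i+kh, j:j+kw] * k).sum() + b
    return out

def batch_norm(x, eps=1e-5):
    """BN layer: subtract the mean and divide by the standard deviation."""
    return (x - x.mean()) / (x.std() + eps)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU activation: max(alpha * x, x) elementwise."""
    return np.maximum(alpha * x, x)

rng = np.random.default_rng(3)
x = rng.normal(size=(8, 8))   # toy single-channel feature map
k = rng.normal(size=(3, 3))   # toy 3x3 kernel
y = leaky_relu(batch_norm(conv2d_valid(x, k)))  # one Conv2D -> BN -> Leaky pass
```

Note that a 3×3 'valid' convolution shrinks an 8×8 map to 6×6; Darknet53 instead pads inputs so that spatial size is reduced only by its stride-2 downsampling layers.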
The loss function of the model is the weighted sum of three components:

Loss = λ_1 · L_conf + λ_2 · L_class + λ_3 · L_loc,

where L_conf is the confidence loss, L_class is the classification loss, L_loc is the bounding box loss, and λ_1, λ_2, λ_3 are the balance coefficients.

Target Confidence Loss Function.
The target confidence refers to the probability that the target to be predicted lies in the rectangular recognition box. This paper adopts the binary cross entropy loss function:

L_conf = −Σ_i [y_i · ln(x̂_i) + (1 − y_i) · ln(1 − x̂_i)],

where x_i is the predicted value of the target to be detected, x̂_i is the sigmoid probability of the predicted value, and y_i indicates the presence or absence of the target to be predicted in the prediction box, taking the value 0 or 1, where 0 and 1 represent absence and presence, respectively.

Target Classification Loss Function.
Although the targets to be detected in this paper are four types of defects (cracks, exposed reinforcements, holes, and swellings), it is worth noting that the classification loss still adopts the binary cross entropy loss function. The reason is that only positive samples contribute to the target classification loss. That is to say, when one type of target defect is detected in the recognition box, the other three types of defects are treated as a single category that is absent from the recognition box. The calculation takes the form

L_class = −Σ_i Σ_j [y_ij · ln(x̂_ij) + (1 − y_ij) · ln(1 − x̂_ij)],

where x_ij is the predicted value of the target to be detected, x̂_ij is the sigmoid probability of the predicted value, and y_ij indicates the presence or absence of the j-th defect in the i-th target detection box, taking the value 0 or 1, where 0 and 1 represent absence and presence, respectively.

Target Localization Loss Function.
The target localization loss function of the algorithm in this paper adopts the sum of squared error loss, i.e., the sum of squares of the differences between the true values and the predicted values:

L_loc = Σ_i [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²],

where x_i, x̂_i are the actual and predicted horizontal coordinates of the center point of the target detection box; y_i, ŷ_i are the actual and predicted vertical coordinates of the center point; w_i, ŵ_i are the actual and predicted widths; and h_i, ĥ_i are the actual and predicted heights of the target detection box.
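Putting the three components together, a toy NumPy version of the composite loss might read as follows. The logits, labels, and unit balance coefficients below are made-up illustration values, and the sketch omits YOLOv3's per-cell/per-anchor bookkeeping.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(logits, targets):
    """Binary cross entropy, used for both the confidence and class losses."""
    p = sigmoid(logits)
    return float(-(targets * np.log(p) + (1 - targets) * np.log(1 - p)).sum())

def sse_box(true_xywh, pred_xywh):
    """Sum-of-squared-error localization loss over (x, y, w, h)."""
    return float(((true_xywh - pred_xywh) ** 2).sum())

def total_loss(conf_logits, conf_y, cls_logits, cls_y, box_t, box_p,
               lam=(1.0, 1.0, 1.0)):
    return (lam[0] * bce(conf_logits, conf_y)     # lambda_1 * L_conf
            + lam[1] * bce(cls_logits, cls_y)     # lambda_2 * L_class
            + lam[2] * sse_box(box_t, box_p))     # lambda_3 * L_loc

conf_logits = np.array([2.0, -1.5]); conf_y = np.array([1.0, 0.0])
cls_logits = np.array([[3.0, -2.0, -2.0, -2.0]])  # one box, 4 defect classes
cls_y = np.array([[1.0, 0.0, 0.0, 0.0]])          # labeled as "crack"
box_t = np.array([0.5, 0.5, 0.2, 0.1])
box_p = np.array([0.48, 0.52, 0.25, 0.1])
L = total_loss(conf_logits, conf_y, cls_logits, cls_y, box_t, box_p)
```

The one-hot class target illustrates the point made above: the "crack" entry is the only positive, and the other three classes simply contribute absence terms to the same binary cross entropy.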

Experiment Verification
Data Acquisition.
At present, there is no open-source database of surface defect images for bridge underwater pile-pier structures; therefore, it is necessary to collect images from experiments and practical engineering. Images in this paper were mainly obtained in two ways. The first was to cast pile-pier components with common surface defects in the laboratory, place them in a pool, and obtain the defect images with an underwater visible camera; the second was mainly through on-site detection of underwater pile-piers (on-site detection of Wulongjiang Bridge in Fuzhou, China) and online collection (detection reports on the underwater structures of bridges in Fujian Province). In total, this paper collected 800 original images, of which 669 were from experiments and 131 were from practical engineering or the Internet.

Structural Control and Health Monitoring
Through the investigation of numerous bridges in various hydrological environments in Fujian Province (Jinshan Bridge, Minqing Bridge, Jimei Bridge, etc.), it was found that there are four types of the most common and influential surface defects for bridge underwater pile-pier structures, namely, cracks, exposed reinforcements, holes, and swellings. As a consequence, these four types of defects were simulated on the cast pile-pier components. The pool and some components with defects are shown in Figure 8. Herein, the underwater visible camera was used to collect the surface defect images of the underwater pile-pier structures.

Software and Hardware Configuration.
Since the training phase of the target detection model consumes substantial computing resources and takes a long time, a cloud server was employed to train the target detection model. The operating environment configuration in this paper is as follows: the operating system is Linux Ubuntu-4ubuntu0.3, the programming language is Python, the framework is PyTorch, and the graphics card is a GeForce RTX 3090 with 23 GB of memory.

Training and Validation Phases of the Target Automatic Detection Model.
There are 800 images collected from the experiments and practical engineering, containing 243 hole images, 272 crack images, 138 exposed reinforcement images, and 147 swelling images. Data augmentation generated 3,200 additional images: 800 by horizontal flipping, 800 by image enlargement, 800 by image scaling, and 800 by rotation. Of the resulting 4,000 fused image samples, 3,200 (80 percent) were randomly selected as the training set, and 400 images each (10 percent) were selected as the validation set and test set, respectively. The input image size is set to 640 × 640 pixels, the initial learning rate is 0.01, the momentum is 0.937, the weight decay is 0.0005, the batch size is 128, the backbone network is Darknet53, and the number of training epochs is 500. The loss curve of each module is shown in Figure 9.
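The 80/10/10 split described above can be sketched as follows (the fused_*.jpg filenames are placeholders for the actual fused image files):

```python
import random

def split_dataset(paths, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle and split file paths into train/val/test with 80/10/10 ratios."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)   # fixed seed for a reproducible split
    n = len(paths)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

images = [f"fused_{i:04d}.jpg" for i in range(4000)]  # 4,000 fused samples
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))
```

Shuffling before splitting keeps the four defect categories roughly proportionally represented in each subset.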
From Figure 9, the following characteristics can be found. The binary cross entropy loss function is adopted to train the classification ability of the model, which means that the loss value decreases remarkably when the model correctly classifies the defect type. That is to say, the model rapidly acquires the ability to correctly classify defects, which is why the classification loss steadily converges to around 0. Compared with the classification loss, the bounding box loss converges steadily to around 0.02. The reason is that certain errors arise between the localization box predicted by the model and the rectangular box labeled manually; furthermore, the localization box tends to cover a larger range than the actual target defect. Nevertheless, the fact that the bounding box loss converges to 0.02 indicates that the target detection model has remarkable localization ability. Finally, although a certain deviation appears between the training set loss and the validation set loss in the confidence loss curve, the difference is not obvious, and both values converge below 0.02. This indicates slight overfitting of the model with respect to confidence, but this slight overfitting does not affect the overall recognition performance of the model, as proven by the following test set results.
To sum up, the target automatic detection model proposed in this paper has excellent convergence performance in the training and validation phases. It is capable of adequately extracting the effective feature information in the fused images and can intelligently and efficiently recognize the target defects on both the training and validation sets.

Testing Phase of the Target Automatic Detection Model.
To verify the generalization ability of the trained target detection model, the 400 images of the test set were input into the trained and validated model. After the detection results were obtained from the automatic detection model, evaluation indices were employed to assess the performance of the model.

Fused Image Detection Results.
The images from the test set were recognized, classified, and localized by the trained model, and bounding boxes with defect categories and confidence values were output. Partial detection results are shown in Figure 10.
A comparison of the panels in Figure 10 demonstrates that the target detection model presented in this paper is capable of automatically detecting surface defects in images of underwater pile-pier structures.

Model Performance Evaluation Index.
To quantitatively evaluate the performance of the target automatic detection model, four evaluation indices were used in this paper, namely, the recall (R), precision (P), average precision (AP), and mean average precision (mAP). They are

R = TP / (TP + FN),
P = TP / (TP + FP),
AP = the area under the precision-recall curve, i.e., the integral of P(R) over R from 0 to 1,
mAP = (1/N) Σ AP_i over all N defect categories,

where R is the ratio of the number of detected targets to the total number of targets; P is the ratio of the number of correctly detected targets to the number of all detected targets; TP is the number of correct detections of the target defect; FN is the number of target defects that are incorrectly detected as other defects; FP is the number of other defects detected as target defects; AP is the area under the curve with recall (R) and precision (P) as the horizontal and vertical coordinates; and mAP is the mean of the average precision (AP) over all defect categories. These four evaluation indices were employed to assess the performance of the model, and the evaluation results are given below.
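The index definitions above translate directly into code; the confusion counts in the usage lines are illustrative only, not the paper's actual results.

```python
# Detection metrics matching the definitions above; the example counts
# at the bottom are illustrative.
def precision(tp, fp):
    """P = TP / (TP + FP): share of detections that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """R = TP / (TP + FN): share of true targets that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def average_precision(recalls, precisions):
    """Area under the P-R curve (trapezoidal rule); the points must be
    sorted by increasing recall."""
    return sum((recalls[i] - recalls[i - 1])
               * (precisions[i] + precisions[i - 1]) / 2
               for i in range(1, len(recalls)))

def mean_average_precision(aps):
    """mAP: arithmetic mean of the per-class AP values."""
    return sum(aps) / len(aps)

p = precision(tp=90, fp=5)   # illustrative counts
r = recall(tp=90, fn=10)
```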
(1) Evaluation Indices of P and R. The P and R of the target detection model trained in this paper are as follows.
From Table 1, it can be seen that the mean value of P reaches 95.19%. Herein, the maximum P is 98.63% for the swelling defect, while the minimum P is 90.48% for the exposed reinforcement defect. This implies that the false detection rate of the four types of defects is extremely low. Meanwhile, the mean value of R reaches 88.04%, which indicates that the model rarely misses defects and that all types of defects presented in the images can generally be detected.
All in all, it is evident that the model built in this paper has not only a high correct identification rate but also a low probability of missed detection.
(2) Evaluation Indices of AP and mAP. The AP of the target detection model trained in this paper is as follows.
As is obvious in Figure 11, the AP of cracks, holes, exposed reinforcements, and swellings reaches 94.29%, 97.94%, 90.90%, and 99.91%, respectively, and the mAP reaches 95.76%. These values are all above 90%, and the AP of the swelling defect even reaches 99.91%. This reveals that the target detection model has excellent recognition and classification ability for all types of defects.
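As a quick sanity check, the reported mAP is indeed the arithmetic mean of the four per-class AP values quoted above:

```python
# The four per-class AP values (in percent) quoted from Figure 11.
ap = {"crack": 94.29, "hole": 97.94,
      "exposed_reinforcement": 90.90, "swelling": 99.91}

# mAP is their arithmetic mean.
map_value = round(sum(ap.values()) / len(ap), 2)
print(map_value)  # 95.76, matching the reported value
```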
In conclusion, the target automatic detection model proposed in this paper is feasible and effective and is able to meet actual engineering requirements.

Comparison and Discussion
A series of comparisons and discussions were conducted to validate the effectiveness of image fusion and the robustness and effectiveness of the target detection model. The discussion is divided into three parts. In the first part, the recognition effect is compared between images without and with fusion. The second part examines the robustness of the model under different noise levels. Finally, the third part discusses the effectiveness of the target detection model compared with other detection algorithms.

Images without and with Fusion.
A comparison was made between models trained on images without and with the fusion enhancement algorithm. Herein, the same number of images and the same algorithm were used to build the target detection model for the original images. Partial detection results for the original and fused images are shown in Figure 12. It is worth noting that in Figure 12, for the same defect images, the model trained on the original images failed to recognize the defects, whereas the model trained on the fused images successfully detected all the defects with accurate classification and localization, with all confidence values between 0.99 and 1. Therefore, it can be intuitively seen that the image-fusion enhancement algorithm proposed in this paper enhances the overall quality of the original images and strengthens the defect detail features, which is conducive to the feature extraction and detection performance of the target automatic detection model.
The overall detection performance of the models trained separately on the original and fused images is again evaluated using AP and mAP. The comparison of model detection performance is shown in Figure 13.
As can be seen from Figure 13, the AP indices of the target detection model trained on the fused images are all higher than those of the model trained on the original images. Among them, the largest increment in AP is 20.39% for exposed reinforcements, while the smallest is 3.93% for holes. This indicates that the image-fusion enhancement algorithm has the most significant enhancement effect on exposed reinforcements. From an overall perspective, the mAP of the model trained on fused images is 95.76%, which is 11.79% higher than that of the original images. This demonstrates that the images enhanced by the image fusion algorithm proposed in this paper can effectively improve the detection performance of the target automatic detection model.
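Assuming the quoted increments are absolute percentage-point differences (the text is not explicit on this), the original-image scores can be recovered by subtraction:

```python
# Fused-image scores and reported gains (in percent), quoted from the text.
fused = {"exposed_reinforcement": 90.90, "hole": 97.94, "mAP": 95.76}
gain = {"exposed_reinforcement": 20.39, "hole": 3.93, "mAP": 11.79}

# Assumption: the gains are absolute percentage-point increments.
original = {k: round(fused[k] - gain[k], 2) for k in fused}
print(original)  # {'exposed_reinforcement': 70.51, 'hole': 94.01, 'mAP': 83.97}
```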

Noise Effect.
To test the robustness of the target model against noise, Gaussian noise was added to the fused images. The variance of the Gaussian noise was set to 0.1, 0.2, 0.3, 0.4, and 0.5, respectively. Five groups of fused images with Gaussian noise of different variances were input into the model proposed in this paper. The final recognition accuracy results are shown in Figure 14.
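The noise-injection step can be sketched with numpy; normalizing the image to [0, 1] and fixing the random seed are assumptions made here for reproducibility, not details given in the paper.

```python
import numpy as np

def add_gaussian_noise(img, variance, seed=0):
    """Add zero-mean Gaussian noise of the given variance to an image
    normalized to [0, 1], clipping back to the valid range."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(variance), img.shape)
    return np.clip(img + noise, 0.0, 1.0)

# The five noise levels used in the robustness test.
img = np.full((640, 640, 3), 0.5)
noisy = {v: add_gaussian_noise(img, v) for v in (0.1, 0.2, 0.3, 0.4, 0.5)}
```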
It is obvious from Figure 14 that as the noise variance increases, the mAP indices gradually decrease. When the noise variance is less than 0.4, the model proposed in this paper has excellent noise tolerance and robustness, with an mAP above 80.48%. Moreover, the recognition accuracy of the target detection model remains as high as 75.94% even when the noise variance reaches 0.5. This proves that even when the defect images become more blurred under the influence of noise, the proposed model can still reliably identify, classify, and locate defects, indicating excellent robustness and recognition accuracy.

As can be seen from Figure 15, all four target detection algorithms have excellent recognition capacity, with mAP values above 88%. More specifically, the YOLOv3 algorithm proposed in this paper ranks first with 95.19%, the fast R-CNN algorithm second with 90.47%, the YOLOv2 algorithm third with 89.42%, and the SSD algorithm fourth with 88.70%. Based on these results, it can be concluded that the model built with the YOLOv3 algorithm in this paper demonstrates exceptional accuracy and effectiveness in detecting surface defects on underwater bridge structures.

Structural Control and Health Monitoring

Conclusions
This paper first proposed an image pixel-level fusion algorithm based on point sharpness weights by analyzing the problems of underwater imaging. The algorithm fuses and enhances the collected images of surface defects on underwater pile-pier structures. The fused images were then adopted to build the target automatic detection model, realizing automatic detection of surface defects on underwater pile-pier structures. The main conclusions are as follows: (1) This paper proposes a point sharpness weight-based image fusion algorithm that combines the advantages of the ACE and CLAHE algorithms. The results based on the SIFT feature matching approach show that the fusion algorithm can significantly improve the contrast and definition of underwater images and strengthen the image feature information, which is conducive to feature extraction for the target detection model.
(2) This paper proposes a target detection model integrating the image-fusion enhancement algorithm and the YOLOv3 algorithm. Experimental results show that the model achieves automatic detection of surface defects on underwater pile-pier structures. This indicates that the proposed model provides a new intelligent detection technology applicable to the identification of surface defects on bridge underwater pile-piers.
(3) The target detection model built in this paper can effectively recognize and locate underwater pile-pier surface defects. Four indices, namely, precision (P), recall (R), average precision (AP), and mean average precision (mAP), are employed to validate the effectiveness of image fusion and the robustness and effectiveness of the target detection model.
It is evident that the image quality of underwater pile-pier surface defects can be improved and automatic detection can be effectively performed by the model proposed in this paper. This provides an approach and technical support for the automatic detection of underwater pile-pier surface defects. However, the target detection model proposed in this paper can only recognize and locate four common surface defects of bridge underwater pile-pier structures. Furthermore, the model does not support quantitative evaluation of the defects, such as crack size and depth and the areas of swellings and holes, which are crucial for assessing the remaining load capacity and service life of bridge structures. These aspects will be investigated in our future work.
E(G)_A and E(G)_C are the point sharpness values of the images enhanced by the ACE and CLAHE algorithms, respectively; W_A and W_C are the corresponding image weight coefficients. (3) Decompose the two enhanced images into three single RGB channel images and fuse the corresponding channel values using the weight coefficients from Formulas (6) and (7). (4) Recombine the three fused single-channel images to obtain the final fused image. The image fusion process based on the point sharpness weights is shown in Figure 1.
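The fusion steps above can be sketched in numpy. The mean absolute gradient below is a stand-in for the paper's point sharpness measure E(G), and the weight form W_A = E(G)_A / (E(G)_A + E(G)_C) is an assumed reading of Formulas (6) and (7); neither is reproduced verbatim from the paper.

```python
import numpy as np

def sharpness(img):
    """Stand-in for the point sharpness E(G): mean absolute gradient of
    the grayscale image (the paper's exact definition is not reproduced)."""
    gy, gx = np.gradient(img.astype(float).mean(axis=2))
    return float(np.abs(gx).mean() + np.abs(gy).mean())

def fuse(img_ace, img_clahe):
    """Fuse the two enhanced images channel by channel with weights
    proportional to their point sharpness (steps (2)-(4) above)."""
    e_a, e_c = sharpness(img_ace), sharpness(img_clahe)
    w_a = e_a / (e_a + e_c)   # W_A, assumed form of Formula (6)
    w_c = 1.0 - w_a           # W_C, so that W_A + W_C = 1
    return w_a * img_ace.astype(float) + w_c * img_clahe.astype(float)
```

Because the same scalar weights are applied to every RGB channel, the fused pixel values are convex combinations of the two enhanced images.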

Figure 2: The image enhancement results: (a) the original image, (b) the image enhanced by the ACE algorithm, (c) the image enhanced by the CLAHE algorithm, and (d) the fused image: (1) the crack image and (2) the exposed reinforcement image.

3.1. Data Processing

3.1.1. Underwater Defect Image Data Acquisition. In this paper, four common surface defects of bridge underwater pile-pier structures are selected from the underwater pile-pier images collected in the laboratory as the database, namely, cracks, exposed reinforcements, holes, and swellings. Meanwhile, the database is randomly divided into a training set, a validation set, and a test set. The training set is used for feature learning and training the parameter weights of the model; the validation set is used for adjusting the hyperparameters of the model and for preliminary evaluation of the trained model; and the test set is used to evaluate the model on data not seen during training and validation and to assess its overall performance for target recognition and localization.

Figure 3: The matching results of Figure 2 based on the SIFT feature matching approach: (a) the original image feature matching, (b) the ACE-enhanced image feature matching, (c) the CLAHE-enhanced image feature matching, and (d) the fused image feature matching: (1) the crack image and (2) the exposed reinforcement image.

(1) The more the model is trained, the smaller the training and validation loss values become. Especially in the first 100 epochs, the loss curves decrease rapidly; afterwards, the loss values hardly change in either the training or validation phase. This indicates that the model learns a large amount of feature information and that the weight parameters change significantly during the training phase. (2) The convergence reaches the ideal situation at epoch 500, by which point the loss curves are already close to horizontal. This indicates that the difference between the predicted and actual values is extremely small, a phenomenon that also appears in the other three diagrams of Figure 9. (3) For both the bounding box loss curve and the classification loss curve, the iterations on both the training and validation sets converge well. This indicates that as the epochs increase, the model gains a better ability to locate and classify the target defects.

4.5.3. Different Detection Algorithms Effect. The same 3,200 fused images were employed to train three other models using the SSD (single shot MultiBox detector), fast R-CNN (fast region-based convolutional neural network), and YOLOv2 algorithms, respectively. A comparison was then made between the model proposed in this paper and the other three models. Figure 15 depicts the results on the test set of the other 400 fused images.

Figure 12: The comparison of detection results between the original image and the fused image: (a) the detection result of the original image and (b) the detection result of the fused image: (1) crack, (2) hole, (3) exposed reinforcement, and (4) swelling.

The automatic detection model was built up by integrating the YOLOv3 algorithm and the image enhancement approach, which can mine and learn the defect features in the images better than other methods. Finally, a series of comparisons and discussions were conducted to validate the effectiveness of image fusion and the robustness and effectiveness of the target detection model. The paper is organized as follows.
2.3. Enhancement Approaches

2.3.1. Image Results from Enhancement Approaches. To verify the effectiveness of the fusion algorithm proposed in this paper, two common surface defects of underwater structures are given as examples: a crack (a-1) and an exposed reinforcement (a-2). After being scanned and photographed by the underwater visible camera, the original images of the surface defects are obtained. The ACE algorithm, the CLAHE algorithm, and the fusion algorithm proposed in this paper are then applied to enhance the acquired defect images of the underwater pile-pier structures, respectively. The results are shown in Figure 2.
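As a rough illustration of the CLAHE idea, a clip-limited histogram equalization of one 8-bit channel can be sketched as follows. This is a simplified single-tile stand-in, not the full CLAHE algorithm, which additionally tiles the image and interpolates the mappings between tiles; the clip-limit value is an assumption.

```python
import numpy as np

def clipped_hist_equalize(channel, clip_limit=0.01):
    """Histogram equalization with a clip limit on a uint8 channel.
    clip_limit is the maximum share of pixels allowed in one histogram
    bin; the clipped excess is spread evenly over all 256 bins."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(float)
    limit = max(clip_limit * channel.size, 1.0)
    excess = np.clip(hist - limit, 0.0, None).sum()
    hist = np.minimum(hist, limit) + excess / 256.0  # redistribute excess
    cdf = np.cumsum(hist) / hist.sum()               # normalized CDF
    lut = np.round(cdf * 255.0).astype(np.uint8)     # grey-level mapping
    return lut[channel]
```

Clipping the histogram before building the lookup table limits how aggressively contrast is amplified, which is what distinguishes CLAHE from plain histogram equalization.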

Table 1: The results of precision and recall. The bold values are the mean precision and mean recall of the four defect types, respectively.