Computer Vision-Based Detection for Delayed Fracture of Bolts in Steel Bridges

The delayed fracture of high-strength bolts occurs frequently in the bolt connections of long-span steel bridges. This phenomenon can threaten the safety of structures and even lead to serious accidents in certain cases. However, the manual inspection commonly used in engineering to detect fractured bolts is time-consuming and inconvenient. Therefore, a computer vision-based inspection approach is proposed in this paper to rapidly and automatically detect fractured bolts. The proposed approach is realized by a convolutional neural network- (CNN-) based deep learning algorithm, the third version of You Only Look Once (YOLOv3). A challenge for detector training with YOLOv3 is that only limited numbers of images of fractured bolts are available in practice. To address this challenge, five data augmentation methods are introduced to produce more labeled images, including brightness transformation, Gaussian blur, flipping, perspective transformation, and scaling. Six YOLOv3 neural networks are trained using six different augmented training sets, and the performance of each detector is then tested on the same testing set to compare the effectiveness of the different augmentation methods. The highest average precision (AP) of the trained detectors is 89.14% when the intersection over union (IOU) threshold is set to 0.5. The practicality and robustness of the proposed method are further demonstrated on images that were never used in the training and testing of the detector. The results demonstrate that the proposed method can quickly and automatically detect the delayed fracture of high-strength bolts.


Introduction
High-strength bolt connections are widely used to assemble the load-bearing members of the steel structures in long-span steel bridges. The popularity of bolt connections is attributed to their low cost, high reliability, and rapid assembly [1]. However, these bridges often operate in adverse environments and are subjected to corrosion [2, 3], vibration and fatigue [4, 5], and thermal cycling, all of which can contribute to bolt damage. The most common types of bolt damage are looseness and delayed fracture. The delayed fracture of bolts refers to the sudden fracture of bolts under constant tension [6]. Because of the huge energy released by the brittle fracture, the fractured bolts go missing. Bolt damage threatens the safety of bridges and may even lead to severe accidents. Hence, it is necessary to monitor the bolt condition in the daily operation and maintenance phase.
Over the decades, structural health monitoring methods have attracted considerable attention [7-10], and they have been applied to detect bolt damage [11, 12]. They mainly rely on sensor technology to identify variations of the preload, including piezoelectric active sensing methods [13, 14], electromechanical impedance methods [15, 16], and vibroacoustic modulation methods [17, 18]. A "smart washer" with a piezoceramic patch sandwiched between two flat metal rings was developed to monitor the bolted joint through the active sensing method [19]. Further, the fluctuation of the impedance signatures in frequency was utilized to evaluate the bolted joint with the developed "smart washer" [20]. A novel vibroacoustic modulation method was proposed to monitor the early looseness of a bolt in real time [21]. Notably, although the contact sensor-based methods were proposed to detect the decrease of preload induced by initial bolt looseness, they can also be used to detect the delayed fracture of bolts, which is a kind of brittle damage and results in the disappearance of the preload [22, 23]. Nonetheless, the contact sensor-based methods face the challenge of costs scaling up when monitoring multiple bolts, because one sensor can only perform the measurement at one bolt. As a result, it is impractical to equip most bridges with enough sensors, and current monitoring practice relies heavily on manual inspection. However, the whole inspection process is dangerous and inefficient. Figure 1 shows maintenance workers inspecting the delayed fracture of high-strength bolts on a long-span steel bridge.
In recent years, computer vision technology has received substantial attention as an interdisciplinary subject, and it has been applied in civil infrastructure inspection and monitoring to improve the accuracy and efficiency of manual visual inspection [24, 25]. It has been applied to detect bolt damage because there are always a huge number of bolts in actual steel structures. Park et al. [26] proposed a vision-based method to evaluate the rotation angle of a bolt nut. Cha et al. [27] utilized image-processing techniques and the support vector machine to detect bolt looseness. However, traditional computer vision-based methods have poor robustness and low accuracy. On the other hand, the convolutional neural network (CNN) has achieved great success in computer vision [28] with the development of deep learning technology, and CNN-based algorithms have achieved state-of-the-art performance in various tasks, including image classification [29], object detection [30], and semantic segmentation [31]. This kind of technology has also been applied to bolt damage detection. Huynh et al. [32] proposed a quasiautonomous bolt looseness detection method, where plausible bolts were detected using a CNN-based object detector and the rotation angle of each bolt was measured by the Hough line transform. Zhao et al. [33] proposed a method for measuring the bolt-loosening rotation angle using a CNN-based object detector. Wang et al. [34] designed a computer vision-based method that integrates the perspective transformation to detect bolt looseness in flange connections. However, most studies have focused only on the detection of bolt looseness, and to the authors' best knowledge, there is no research on the inspection of delayed fractures in high-strength bolts. The visual characteristics of the delayed fracture of high-strength bolts are totally different from those of looseness, because the fractured bolts go missing due to the tremendous amount of energy released by the fracture [22, 23]. Notably, bolt delayed fracture can be more dangerous than bolt looseness in theory, because the former causes the preload to vanish, whereas the latter only reduces it. Hence, this paper proposes a computer vision-based inspection method for the delayed fracture of bolts, where the damage is detected and located in an automated fashion using an object detection algorithm.
The task of object detection is to classify and locate the targets in an image, and various algorithms with high recognition accuracy have been developed. CNN-based object detection methods can be divided into region-based and region-free categories according to their underlying design. The region-based approaches, such as the region-based convolutional neural network (R-CNN) [35], Fast R-CNN [36], and Faster R-CNN [37], combine region proposals and CNNs to detect objects. The region proposals are produced from the input image and treated as the set of candidate detections. The region-free methods, such as the Single Shot MultiBox Detector (SSD) [38], You Only Look Once (YOLO) [39], YOLOv2 [40], YOLOv3 [41], and YOLOv4 [42], frame the object detection task as a regression problem and directly detect objects from the input image using a CNN. Region-based methods are slower than region-free methods due to the necessity of region proposals. Hence, a region-free method was selected in our research for real-time detection. In addition, YOLOv3 boasts improved performance for detecting small objects in wide-scale images [43, 44], and the delayed fracture of a bolt is relatively small in an image of a bolt connection. Therefore, YOLOv3 is selected to detect the delayed fracture of bolts.
On the other hand, the performance of a CNN-based object detector heavily relies on extracting information from abundant labeled images, and the performance improves as the training data grow in amount and diversity. However, it is quite difficult to acquire enough labeled images in practice, so the performance of the trained detector is always limited to some extent. For the bolt damage detection task in long-span steel bridges, images are difficult to capture due to environmental complexity and access limitations (for example, the positions of fractured bolts on a long-span bridge are inaccessible in most cases), and the manual labeling of the images is laborious due to the density of bolts.
Data augmentation is one of the most commonly used methods to alleviate this problem, and it can automatically enlarge the dataset by utilizing the existing images [45, 46]. In recent years, many data augmentation methods have been developed for object detection, and images are augmented by many kinds of image-processing technologies. The widely used technologies include brightness transformation, flipping, noise addition, and perspective transformation [47]. For example, Fast R-CNN and Faster R-CNN use horizontal flipping to augment training data [36, 37]. The perspective transformation was introduced to enlarge the training dataset for transmission-line object detection [48]. Although many augmentation techniques are available, the selection of techniques is task-specific and depends primarily on experience. Thus, the effect of each augmentation method on this detection task remains unclear.

In this paper, a computer vision-based inspection method is developed to automatically detect and locate the bolt delayed fracture, and a series of data augmentation methods is utilized to improve the performance of the detector without additional labeling effort. In addition, the impact of different data augmentation methods on the performance of the detector is analyzed.

Workflow of the Detection Method for the Bolt Delayed Fracture

As shown in Figure 2, the whole process involves three steps: dataset preparation, detector training, and damage detection. During dataset preparation, many raw images of high-strength bolt connections are first collected with a camera device. Then, the labeled images are obtained by manual labeling and data augmentation, where the damage is labeled with enclosing rectangular bounding boxes. The pairwise images and labels are used to train the YOLOv3 neural network until it passes the performance check. Finally, the trained neural network can be used as a damage detector to perform damage detection in the real world.

Overview of YOLOv3 Detector

YOLOv3 evolved from its predecessors, YOLO and YOLOv2, mainly improving the detection accuracy, especially for small targets. Specifically, a new backbone, Darknet53, which integrates residual networks with Darknet19 (the network used in YOLOv2), was introduced to improve the feature extraction ability, and multiscale prediction is used to obtain semantic information and fine-grained information simultaneously from different feature maps. The architecture of YOLOv3 is shown in Figure 3.
At the beginning of the training process, the image-label pairs are fed into the neural network. Each input image is adjusted to a fixed size and divided into S × S grids; the upsampling and feature-fusion operations in the network produce such grids at multiple scales. Each grid is tasked with detecting objects whose center coordinates fall inside the grid. Each grid outputs b bounding boxes and c conditional category probabilities. Each bounding box is determined by the coordinate information (x, y, w, and h) and the confidence score (S_c). The coordinates (x, y) point to the center of the bounding box, and the parameters w and h are the width and height of the bounding box, respectively. S_c is obtained according to Equation (1):

S_c = P(Object) × IOU^truth_pred,  (1)

where P(Object) is equal to 0 when no object exists in the grid; otherwise, its value is 1. IOU^truth_pred is the intersection over union (IOU) between the predicted bounding box and the ground truth of the object. The loss function value is calculated from the prediction value and the label value, and the adjustable parameters in the neural network are updated using the backpropagation algorithm. The process is repeated until the loss function value converges to a small value. During the inference process, only the image is fed into the trained neural network, and the prediction of the neural network is regarded as the detection result.

The loss function in YOLOv3 consists of three parts: coordinate loss, IOU loss, and classification loss, each corresponding to an output of the neural network prediction. However, the classification loss is removed in this paper, because the number of classes is only one. The loss function used in this paper is shown in Equation (2):

L = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{b} I^obj_ij [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]
  + Σ_{i=0}^{S²} Σ_{j=0}^{b} I^obj_ij (S_ci − Ŝ_ci)²
  + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{b} I^noobj_ij (S_ci − Ŝ_ci)²,  (2)

where λ_noobj and λ_coord are the coefficients of the IOU loss and the coordinate loss, respectively; x̂_i, ŷ_i, ŵ_i, ĥ_i, and Ŝ_ci are the ground-truth values; and I^obj_ij is 1 when the target falls into the j-th bounding box of the i-th grid and 0 otherwise (I^noobj_ij is its complement).
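The confidence score above depends on the IOU between a predicted box and the ground truth. Both quantities can be sketched as follows (function names are ours, not from the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x_min, y_min, x_max, y_max) tuples."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence(p_object, iou_value):
    """Equation (1): S_c = P(Object) * IOU, with P(Object) in {0, 1}."""
    return p_object * iou_value
```

Two identical boxes give an IOU of 1.0, disjoint boxes give 0.0, and a grid cell containing no object forces the confidence to 0 regardless of the IOU.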
In addition, the average precision (AP) is used as an indicator to estimate the performance of the damage detector. The AP summarizes the precision-recall curve by computing the area under the curve [49]. The precision (P) and recall (R) are defined as follows:

P = TP / (TP + FP),  R = TP / (TP + FN),

where TP, FP, and FN are the numbers of true positives, false positives, and false negatives, respectively.

Five data augmentation methods are used in this research: brightness transformation (BT), Gaussian blur (GB), flipping (FL), perspective transformation (PT), and scaling (SC). Examples of the augmented images are shown in Figure 4, where the labels of the image are represented by green rectangular bounding boxes.
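The precision, recall, and AP defined above can be sketched as follows. `average_precision` integrates the precision envelope over recall, one common way to compute the area under the PR curve (the exact interpolation used in [49] may differ):

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision P = TP/(TP+FP) and recall R = TP/(TP+FN)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(recalls, precisions):
    """Area under the precision-recall curve using every-point
    interpolation (the monotone precision envelope)."""
    r = np.concatenate([[0.0], recalls, [1.0]])
    p = np.concatenate([[0.0], precisions, [0.0]])
    # Make precision non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

A perfect detector (precision 1 at recall 1) yields an AP of 1.0; a detector that keeps precision 1.0 up to recall 0.5 and then drops to 0.5 yields an AP of 0.75.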
The data augmentation methods take an image and its label as input and automatically generate a new augmented image and the corresponding label. As shown in Figure 4, the BT and GB do not change the coordinates of the bounding box, whereas the PT, FL, and SC result in coordinate changes of the bounding box. Hence, the bounding boxes after BT and GB are the same as the original bounding boxes, whereas the bounding boxes after FL, PT, and SC must be rectified.
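The bounding-box rectification for the geometry-changing methods is straightforward; for example, horizontal flipping mirrors the x-coordinates and scaling multiplies all coordinates by the scaling coefficient. A minimal sketch, with boxes given as (x_min, y_min, x_max, y_max) and helper names that are illustrative rather than the paper's code:

```python
def flip_horizontal_bbox(box, image_width):
    """Rectify a bounding box after horizontally flipping an image
    of the given width: x-coordinates are mirrored and swapped so
    that x_min stays smaller than x_max."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)

def scale_bbox(box, s):
    """Rectify a bounding box after scaling the image by factor s."""
    return tuple(v * s for v in box)
```

Vertical flipping works the same way on the y-coordinates with the image height.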
The details of the data augmentation methods are described in the following text. A linear BT was used to adjust the image brightness. The PT maps each point of the original image according to

u_i = T v_i,

where T is the projective transformation matrix, v_i = (x_i, y_i, 1) is a point in the original image, and u_i = (x*_i, y*_i, 1) is the corresponding point in the PT-augmented image. The matrix T is determined by the displacements of the four image corners, where W and H represent the width and height of the image; x_tl, x_bl, x_br, x_tr, y_tl, y_bl, y_br, and y_tr are the distances from the corner points to the corresponding image boundaries; and λ is the intensity parameter of the perspective transformation: the greater the λ, the more obvious the perspective phenomenon.
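The mapping u_i = T v_i and the resulting bounding-box rectification can be sketched with NumPy as follows. Taking the axis-aligned box that encloses the four transformed corners is one common way to rectify a label after PT; the paper does not specify its exact rectification rule, so this is an assumption:

```python
import numpy as np

def apply_homography(T, points):
    """Map N x 2 pixel points through a 3 x 3 projective matrix T,
    i.e. u_i = T v_i with homogeneous v_i = (x_i, y_i, 1)."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ T.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the w component

def rectify_bbox(T, box):
    """Axis-aligned box enclosing the four transformed corners of
    (x_min, y_min, x_max, y_max)."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], float)
    m = apply_homography(T, corners)
    return (m[:, 0].min(), m[:, 1].min(), m[:, 0].max(), m[:, 1].max())
```

With T equal to the identity the box is unchanged; a pure translation shifts it rigidly, while a genuine perspective warp generally enlarges the enclosing box.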

Experimental Verification
3.1. Dataset Preparation. Due to the practical challenges of obtaining large numbers of images depicting the bolt delayed fracture in real bridges, only two images of delayed fractured bolts from an actual suspension bridge were collected in this study. It is impossible to train the YOLOv3 neural network with such a limited number of images; however, these two images can be used to demonstrate the practicality of the proposed method. Thus, the training images were generated using two steel plates held together with high-strength bolts, and many images of fractured bolts were collected from the fabricated steel plates to train the neural network.
In this paper, a total of 500 raw images were collected at 3016 × 3016-pixel and 3016 × 4032-pixel resolutions with the camera of a Xiaomi Mi 6 smartphone. The distance between the object and the camera ranged from approximately 0.2 m to 1.5 m. To capture the different lighting intensities of a bolt image on an actual bridge, the images were collected outdoors at different times of the day (e.g., 9 a.m., 1 p.m., and 5 p.m.). The relationship between the camera's viewing direction and the direction of the sunlight also influences the brightness of the images. Hence, during the image collection, the conditions of front-lighting, back-lighting, and side-lighting were all considered: the camera viewing direction was set parallel, antiparallel, and perpendicular to the vector of the sunlight, which can, respectively, simulate front-lighting, back-lighting, and side-lighting. Since shadows from clouds or the bridge structure affect the detection accuracy, images were also gathered under scattered tree shade. The apparent shape of a bolt changes with the viewing angle, and thus images of the same fractured bolt were taken from multiple viewing angles during image acquisition. After the image acquisition, the fractured bolts in all 500 images were manually labeled with bounding boxes using custom code written in Python. Considering the convenience of using the dataset in the future, the dataset was converted to the PASCAL VOC format [49]. An ".xml" file including the information of the labeled bounding boxes was generated for each image after successful labeling, and each file was then converted into a ".txt" file suitable for training. The labeled images were then randomly divided into three sets: a training set, a validation set, and a testing set, with 320, 80, and 100 images, respectively. During the labeling process, a total of 439 objects were annotated in the 320 training images. The training set was utilized to train the neural network, and the validation set was used to aid the training and avoid overfitting. After training, the performance of the trained detector was estimated with the testing set. Notably, five extra training sets were generated from the original training set using the five data augmentation methods, so a total of six training sets (DA_OR, DA_BT, DA_GB, DA_FL, DA_PT, and DA_SC) were established in this research and used to train six neural networks. DA_OR is the original training set with 320 manually labeled images. DA_BT, DA_GB, DA_FL, DA_PT, and DA_SC are the training sets consisting of the augmented images generated by the corresponding augmentation method together with the manually labeled images. The augmented images were produced before training for convenience.
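Converting a PASCAL VOC ".xml" annotation into a training ".txt" line can be sketched as below. The output format shown ("path x_min,y_min,x_max,y_max,class") is the one used by many YOLOv3 training scripts; it is an assumption here, not the paper's exact code:

```python
import xml.etree.ElementTree as ET

def voc_xml_to_line(xml_string, class_id=0):
    """Flatten one PASCAL VOC annotation into a single text line:
    'filename xmin,ymin,xmax,ymax,class ...' with one coordinate
    group per labeled object."""
    root = ET.fromstring(xml_string)
    path = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        b = obj.find("bndbox")
        coords = [b.findtext(k) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(",".join(coords + [str(class_id)]))
    return path + " " + " ".join(boxes)
```

Because only one class (the fractured bolt) exists in this task, `class_id` is always 0.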
Two BT coefficients were randomly selected from 0.6 to 1.4 and used to adjust the brightness of the images in DA_OR; as a result, 640 new images were generated. The range of the brightness transformation coefficient was determined by checking whether the edge of the target could still be identified with the naked eye. The original images in DA_OR were also modified using GB to generate 640 additional images, with the standard deviation of the Gaussian kernel randomly selected between 0 and 3.0; the range of standard deviations was set in the same manner as for BT. The images in DA_OR were flipped horizontally and vertically, producing 640 new flipped images. The scaling coefficient was selected from 0.1 to 1.9, and 640 new augmented images were produced. The PT was applied twice, generating another 640 augmented images, with the perspective intensity parameter λ selected from 0.1 to 0.3. The number of images in the different datasets is shown in Table 1.
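The BT and GB operations can be sketched as follows: `brightness_transform` applies a linear coefficient with clipping to the 8-bit range, and `gaussian_kernel_1d` builds the separable kernel from which a Gaussian blur is obtained by convolving rows and then columns. This is a sketch, not the paper's implementation:

```python
import numpy as np

def brightness_transform(image, coeff):
    """Linear brightness transformation: scale pixel values by coeff
    and clip back into the valid 8-bit range [0, 255]."""
    out = np.round(image.astype(np.float32) * coeff)
    return np.clip(out, 0, 255).astype(np.uint8)

def gaussian_kernel_1d(sigma, radius=None):
    """Normalized 1-D Gaussian kernel; a 2-D blur is the row-wise
    then column-wise convolution with this kernel."""
    if radius is None:
        radius = max(1, int(3 * sigma))  # common 3-sigma truncation
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()
```

A coefficient above 1 brightens the image (saturating at 255), a coefficient below 1 darkens it, and the kernel always sums to 1 so the blur preserves overall brightness.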

Implementation Details during Training Process.
All experiments were performed on a personal computer, a Lenovo R720 (Core i7-7700HQ CPU @ 2.80 GHz, 8 GB DDR4 memory, and an NVIDIA GeForce GTX 1050 Ti GPU with 2 GB memory). All training and testing processes were conducted on the GPU. The YOLOv3 neural network was implemented in Python 3.6.5 under the TensorFlow 1.8.0 framework.
Before the experiments, the k-means clustering algorithm was applied to the sizes of the bounding boxes of the images in DA_OR to obtain the bounding box priors and facilitate network learning. The clustering results are shown in Figure 6. The number of clusters is set to 9, giving the following priors: (15 × 11), (13 × 14), (20 × 17), (22 × 21), (25 × 26), (35 × 31), (45 × 45), (74 × 74), and (131 × 160).
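The anchor clustering can be sketched as below, using the d = 1 − IOU distance from the YOLO papers, where the IOU of two (w, h) pairs is computed as if the boxes shared a corner. This is a sketch; the paper does not list its clustering code:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IOU between (w, h) pairs assuming the boxes share a corner,
    as in the YOLO anchor-clustering procedure."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """k-means on box sizes with distance d = 1 - IOU; cluster
    centers are updated with the per-cluster median."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = np.median(boxes[assign == j], axis=0)
    return anchors
```

On the 439 labeled boxes of the original training set this procedure would yield 9 priors of the kind listed above.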
In order to improve the detection accuracy of the detector and conform to the required input format of Darknet53, the size of the input image is set to 416 × 416 pixels.
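Resizing an arbitrary image to the fixed 416 × 416 network input is commonly done with a letterbox-style resize that preserves the aspect ratio and pads the remainder; the sketch below uses a nearest-neighbour resize to stay dependency-free (the paper may use a plain resize instead):

```python
import numpy as np

def letterbox(image, target=416, pad_value=128):
    """Resize an H x W x 3 image to target x target while keeping
    the aspect ratio, padding the remainder with a constant gray."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index lookup (pure NumPy).
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    canvas = np.full((target, target, 3), pad_value, dtype=image.dtype)
    top, left = (target - nh) // 2, (target - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```

Bounding-box labels must be shifted and scaled by the same (scale, top, left) values so that they stay aligned with the letterboxed image.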

Result and Discussion
After training, the images in the testing set were used to test the performance of the six detectors, with the IOU threshold set to 0.5, 0.6, and 0.7. The AP values of the six detectors under the different IOU thresholds were calculated, as shown in Table 2. The highest AP value is 89.14%, which indicates that the trained detector has strong generalization and excellent detection performance, and the AP value decreases as the IOU threshold increases. The AP of the detector trained using DA_OR is used as a benchmark, and the AP increment of the other detectors is used to estimate the usefulness of the different methods. The PT and FL both improve the AP value on the testing set, and the highest increment of AP is induced by PT, achieving 4.52%, 13.39%, and 18.45% for the three IOU thresholds, respectively. In theory, all five data augmentation methods can improve the richness of the training set, so the performance of the detectors trained on the augmented training sets should be better than that of the detector trained on the original training set. However, the BT, GB, and SC reduce the performance, as shown in Table 2. The reason may be that, although changes of lighting intensity, distance, and resolution were considered during image collection, the number of collected raw images in the testing set is too small to represent the entire image sample. Hence, the improved ability of the detector to handle blurred images, brightness changes, and resolution changes cannot be reflected on the existing testing set, whereas the improved ability to detect objects captured from different viewpoints is the most obvious, because all images were captured from different viewpoints.
The detection results of some images in the testing set are shown in Figure 7, where the fractured bolts in the images were correctly detected and located. In [41], the detection speed for a 416 × 416 resolution image is 0.029 seconds. Hence, this method can accomplish real-time autonomous damage detection when a camera is used in conjunction with a processor. The proposed method can facilitate the transition from manual inspection to automated inspection or monitoring carried out by fixed cameras, UAVs, or remote-controlled robots in the future.
The generalization ability of the trained detector was further demonstrated using new images of bolts with different colors (black, red, gray, and blue) and bolts covered with raindrops, as well as images of fractured bolts from an actual bridge (two 3024 × 4032-pixel images taken from a real long-span steel bridge in China). The detection results are shown in Figures 8 and 9: the trained detector correctly detects the damage in the new images. The results show that the trained detector does not overfit the two sample steel plates, which also demonstrates the practicality of the proposed method.
On the other hand, although the detector detects the damage correctly, the predicted bounding boxes do not perfectly fit the fractured bolts. These minor errors may be induced by limitations of the training set, such as the lack of images taken from actual bridges. In addition, a thorough analysis of the effectiveness of the different augmentation techniques for this detection task requires a more comprehensive image dataset, with images collected from actual engineering structures. Thus, more real-world images need to be collected, and a larger image dataset will be established in the future to further analyze the effectiveness of different augmentation methods and how to use them in combination.

Conclusion
This paper presents a new, automated method to inspect fracture failures of bolts. The method is developed based on the CNN-based object detection algorithm YOLOv3, and the performance of the detector is improved by data augmentation. An image dataset was developed through image acquisition, image labeling, and data augmentation, and six YOLOv3 neural networks were trained using different augmented training sets to analyze the impact of the different augmentation methods. The highest AP of the trained detectors is 89.14% when the IOU threshold equals 0.5. The effectiveness of the different data augmentation methods is evaluated by the increment of AP, and the highest increment of AP on the testing set is achieved by the perspective transformation augmentation. The detection speed of the trained detector reaches 0.06 seconds per input image at 416 × 416 resolution. The generalization of the trained network and the practicality of the proposed method were validated using new images that were never used in the training and testing.
The proposed method has the potential to enable safe, real-time, and autonomous detection of the delayed fracture of high-strength bolts with high accuracy.

Figure 1: Maintenance workers inspecting the bolt delayed fracture on a long-span steel bridge.

Figure 2: Flowchart of the proposed bolt delayed fracture detection method.

Figure 5: Flowchart of the perspective transformation data augmentation.

Figure 6: The result of k-means clustering on the original training set.

Due to the constraints of the GPU memory, the batch size was set to 8, and the number of training steps was set to 10000 so that the training process could be analyzed via the loss curve. To avoid training the neural network from scratch, the internal adjustable parameters were initialized with pretrained weights, which can be obtained from the website (https://pjreddie.com/yolo/). The initial learning rate was set to 0.003 through trial and error with the help of the validation set. The λ_coord and λ_noobj were set to 5 and 0.5, respectively. To analyze the impact of the different data augmentation methods, six neural networks were trained using the six training sets. The parameter settings during the training process are the same except for the use of different training sets.

Table 1: The number of images in the datasets.

Table 2: The average precision (AP) of the detectors using different training sets (%).