Bridge Displacement Measurement Using the GAN-Network-Based Spot Removal Algorithm and the SR-Based Coarse-to-Fine Target Location Method

Image-based bridge displacement measurement still suffers from several limitations in outdoor implementation, each of which is addressed in this study. (1) The laser spot is difficult to identify visually when the object distance (OD, mm) is measured with a laser rangefinder, which complicates calibration of the scale factor (SF, mm/pixel). To overcome this issue, a stereovision-based full-field OD measurement method using only one camera is proposed. (2) Sunlight reflected by the water surface causes light spot interference in the captured images, which hinders target tracking. A light spot removal network based on a generative adversarial network (GAN) is designed. To obtain a better image restoration effect, an edge prior is introduced as an additional input of a shadow mask-based semantic-aware network (S2Net). (3) A coarse-to-fine matching strategy combined with image sparse representation (SR) is developed to balance subpixel location precision and efficiency. The effectiveness of these innovations was verified through algorithm evaluation. Finally, the integrated method was applied to vibration response monitoring of a concrete bridge under traffic load. The image-based measurement results show good agreement with those of long-gauge fiber Bragg grating sensors and lower noise than the method before improvement.


Introduction
Calibration of the scale factor and the target location are two critical steps in image-based bridge displacement measurement technology [1][2][3]. However, the convenience and reliability of traditional techniques used to perform these two steps need to be improved.
The scale factor (SF) is used to convert image displacement (pixel) into real displacement (mm) [4]. As the object distance (OD)-based method does not need a reference object of known size near the measured target, it is widely used. For this method, two parameters, the OD and the camera angle, are measured. Techniques to correct the effects of camera angles have been well studied [4,5]. However, OD measurement relies on the laser rangefinder, and the related challenges have not been sufficiently researched. On the one hand, target sections are difficult to identify directly in the field of view (FOV), especially when the bridge bottom is curved. On the other hand, the indicator point of a laser rangefinder is difficult for human eyes to capture, especially during the daytime. In addition, laser ranging can only obtain the OD of sparse points. Consequently, displacement conversion cannot be carried out if the measured distance does not correspond to the real targets. To avoid these issues, visual ranging has been widely used. However, the working distance of an integrated depth camera is too small for the bridge structures considered in this study. Given the above analysis, a stereovision-based full-field OD measurement method is proposed. To improve practicability in field measurement, only one camera and two monopods are used, which differs from the traditional binocular stereovision method. Scale conversion uses the actual distance between the two shooting positions because it is easier to obtain than placing a ruler in the field of view.
Furthermore, the images of a bridge bottom used for displacement calculation are often degraded by light spots reflected from the water surface, which reduces the matching reliability of the region of interest (ROI). Therefore, it is necessary to first restore the image. The light spot problem encountered in this task is similar to the shadow phenomenon [6][7][8] in text image processing. In addition to traditional image optimization methods [9][10][11][12], deep learning (DL) has received immense attention in the past few years. As one of the most popular models in deep learning, generative adversarial network (GAN)-based models [13,14] have been widely used in image restoration. Recently, a shadow mask-based semantic-aware network (S2Net) [15] was proposed and showed a better restoration effect [16,17]. For instance, nonshadow regions were kept unchanged when filtering shadows. More importantly, much attention was paid to artifacts around the shadow edge. Accordingly, S2Net was also used for spot removal in this study. However, because GAN training has to reach an equilibrium, the restored image, although significantly improved in quality, has smoothed gradients and loses detail clarity. Inspired by the idea in DeepSemanticfaceNet [18], it is proposed to take the edge image of the original image as an additional input and to redesign the loss function.
Using the restored images, both the accuracy and the speed of the target location need to be addressed. Widely used template-matching (TM) techniques [19,20] cannot meet real-time requirements and fail when features on the structure surface are sparse. Feature-based methods [21,22] place fewer demands on the texture and take less time. By contrast, the sparse representation (SR)-based target tracking method [23] shows higher efficiency and robustness when image quality is affected. However, when the scene is complex or the real-time image is very small, there are usually many similar areas. In such cases, the mapped positions of the nonzero coefficients are widely scattered, which can easily lead to inaccurate positioning. To mitigate this issue, a distance-weighted sparse representation algorithm [24] was developed. In addition, the larger the step length used to build the dictionary (and hence the smaller the dictionary), the faster the sparse representation but the lower the positioning accuracy. Therefore, a coarse-to-fine matching strategy was designed to guarantee both speed and accuracy.
The remainder of this paper is structured as follows. The research framework and the algorithmic innovations, which cover three blocks of bridge displacement measurement, are introduced in Section 2. The effectiveness of the improved algorithm is evaluated in Section 3. The engineering application to a concrete bridge is then presented in Section 4. Finally, the work is summarized in Section 5.

Proposed Method
A convenient and robust camera-based method for bridge displacement measurement is proposed. The framework is shown in Figure 1. First, the distance from the target to the camera is measured; the SF is then determined [4] and used to convert pixel displacement into physical displacement. Next, image sequences are collected before and after deformation, and the target region is continuously located in these sequences to obtain the pixel displacement. Because the effect of light spots is considered, image preprocessing is required. In addition, to balance positioning speed and accuracy, a coarse-to-fine matching method based on sparse image representation is proposed. As the camera continuously captures images of the whole bridge, the displacements at multiple control sections can be extracted synchronously. Object distance measurement, image preprocessing, and target location are the three key steps that affect the accuracy of the measurement results. The corresponding innovations and their principles are introduced as follows.
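For orientation, the conversion performed by the SF can be sketched with the standard pinhole relation for a camera whose optical axis is perpendicular to the target plane. This is only a minimal illustration and not the calibration procedure of [4]: the function names and the 0.005 mm pixel size are hypothetical, and the camera-angle correction is omitted.

```python
def scale_factor(od_mm, focal_length_mm, pixel_size_mm):
    """Pinhole-model scale factor in mm/pixel (optical axis assumed
    perpendicular to the target plane; no camera-angle correction)."""
    return od_mm * pixel_size_mm / focal_length_mm


def to_physical(displacement_px, sf_mm_per_px):
    """Convert a pixel displacement into a physical displacement in mm."""
    return displacement_px * sf_mm_per_px


# Hypothetical numbers: target 50 m away, 100 mm lens, 0.005 mm pixels.
sf = scale_factor(od_mm=50_000.0, focal_length_mm=100.0, pixel_size_mm=0.005)
displacement_mm = to_physical(0.4, sf)  # a 0.4 pixel motion expressed in mm
```

Because the SF grows linearly with the OD, any error in the measured distance propagates proportionally into the converted displacement, which is why the OD measurement receives separate attention.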

Principle of Stereovision-Based Ranging Using Only One Camera
Conventional binocular stereovision requires two fixed cameras. However, to minimize hardware costs, only one camera was used in this study. As shown in Figure 2(a), the internal parameter matrix $K$ of the camera is kept constant and can be calibrated in advance [25]. After focusing, the lens was locked, and two images of the same scene were then shot at different stations. The coordinates of the matching point pairs are obtained with the feature point-matching algorithm SURF-BRISK [26]. Here, the subscript $l$/$r$ indicates that a parameter belongs to the left/right camera station. The projection equations corresponding to the two images can be written as

$$s_l p_l = M_l X_w = [M_{l1} \;\; m_l]\, X_w, \qquad s_r p_r = M_r X_w = [M_{r1} \;\; m_r]\, X_w, \tag{1}$$

where $X_w = (x_w \; y_w \; z_w \; 1)^T$ is the homogeneous world coordinate of the target point, $s_l$ and $s_r$ are the scale parameters of the left and right cameras, $p_l$ and $p_r$ are the homogeneous coordinates of the projection points in the left and right images, and $M_l$ and $M_r$ are the projection matrices of the two cameras. Let the left $3 \times 3$ parts of the projection matrices be $M_{l1}$ and $M_{r1}$ and the remaining $3 \times 1$ parts be $m_l$ and $m_r$. Setting $X = (x_w \; y_w \; z_w)^T$, equation (1) can be rewritten as

$$s_l p_l = M_{l1} X + m_l, \qquad s_r p_r = M_{r1} X + m_r. \tag{2}$$

Eliminating $X$ from the above equations gives

$$m_r - M_{r1} M_{l1}^{-1} m_l = s_r p_r - s_l M_{r1} M_{l1}^{-1} p_l. \tag{3}$$

As both sides of the above equation are three-dimensional vectors, $s_l$ and $s_r$ can be eliminated to obtain the relationship between $p_l$ and $p_r$, which is the epipolar constraint. Let the left side of equation (3) be $m$:

$$m = m_r - M_{r1} M_{l1}^{-1} m_l. \tag{4}$$

The antisymmetric matrix of $m$ is denoted as $[m]_\times$. As $[m]_\times m = 0$, equation (5) can be derived:

$$[m]_\times \left( s_r p_r - s_l M_{r1} M_{l1}^{-1} p_l \right) = 0. \tag{5}$$

Multiplying by $p_r^T$ on the left, noting that $p_r^T [m]_\times p_r = 0$, and dividing by $s_l$ gives

$$p_r^T [m]_\times M_{r1} M_{l1}^{-1} p_l = 0. \tag{6}$$

Then, the linear relationship between $p_l$ and $p_r$ can be described as

$$p_r^T F p_l = 0, \qquad F = [m]_\times M_{r1} M_{l1}^{-1}, \tag{7}$$

Structural Control and Health Monitoring
where $F$ is the fundamental matrix. The eight-point method was adopted to solve for $F$. The essential matrix $E$ reflects the relation between the space points expressed in the two camera coordinate systems (equation (8)) and can be obtained from $K$ and $F$ (equation (9)):

$$E = K^T F K, \tag{9}$$

where $E$ depends only on the camera motion and, up to a nonzero scale factor, can be expressed as

$$E = [t]_\times R. \tag{10}$$

The rotation matrix $R$ and the translation vector $t = (t_x \; t_y \; t_z)^T$ can be recovered by singular value decomposition (SVD) of $E$, and the two camera matrices $M_l$ and $M_r$ are then obtained. Finally, the spatial coordinates are reconstructed by triangulation using the projection matrices. However, because real-scale information is missing, the three-dimensional result at this stage is dimensionless, that is, defined only up to an unknown scale. The conventional remedy is to place a measuring scale of known length in the field of view, but this is not suitable for the test scenario considered in this study. Therefore, it is proposed to realize the scale conversion through the distance between the two shooting stations, which is easier to measure. The specific procedure is as follows.
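The chain from matched points to $F$ and then to $E$ can be checked numerically. The sketch below is a generic normalized eight-point implementation in NumPy run on synthetic, noise-free correspondences; it is not the authors' code. The camera intrinsics, the motion `R`, `t`, and the point cloud are made-up values, and the convention $p_r^T F p_l = 0$ with the same $K$ at both stations is assumed.

```python
import numpy as np


def skew(v):
    """Antisymmetric matrix [v]x, so that skew(v) @ u equals np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def normalize(p):
    """Hartley normalization: zero centroid, mean distance sqrt(2)."""
    c = p.mean(axis=0)
    s = np.sqrt(2.0) / np.mean(np.linalg.norm(p - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return np.column_stack([p, np.ones(len(p))]) @ T.T, T


def eight_point(pl, pr):
    """Estimate F with pr^T F pl = 0 from N >= 8 pixel correspondences."""
    ql, Tl = normalize(pl)
    qr, Tr = normalize(pr)
    # Row n holds the outer product qr_n ql_n^T, flattened row-major.
    A = np.einsum("ni,nj->nij", qr, ql).reshape(len(ql), 9)
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt   # enforce rank 2
    return Tr.T @ F @ Tl                      # undo the normalization


def project(K, R, t, X):
    x = (X @ R.T + t) @ K.T
    return x[:, :2] / x[:, 2:3]


# Synthetic scene (all values hypothetical).
rng = np.random.default_rng(0)
K = np.array([[2000.0, 0.0, 1024.0], [0.0, 2000.0, 1024.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(3.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.02, 0.05])
X = rng.uniform([-2.0, -2.0, 8.0], [2.0, 2.0, 15.0], size=(30, 3))

pl = project(K, np.eye(3), np.zeros(3), X)   # left station
pr = project(K, R, t, X)                     # right station

F = eight_point(pl, pr)
E = K.T @ F @ K                # essential matrix from K and F
E_true = skew(t) @ R           # ground truth, defined up to scale
```

With noisy real matches, the estimate is usually wrapped in an outlier-rejection loop such as RANSAC; that step is left out here to keep the algebra visible.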
According to the principle of relative orientation, the distance $T$ (mm) between the optical centers (Figure 2) satisfies

$$T = \sqrt{T_x^2 + T_y^2 + T_z^2}, \tag{11}$$

where $T_x$, $T_y$, and $T_z$ are the components of $T$ in the three directions. They are related to the elements of $t$ by $T_x : T_y : T_z = t_x : t_y : t_z$. As $T$ can be measured in advance, the relative orientation and the 3D coordinates with a real scale can be obtained. The reconstructed $z_w$ is the OD used for calculating the displacement conversion factor [4].
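The scale recovery can be illustrated with a single synthetic target: triangulating with a unit-length translation yields coordinates in units of the baseline, and multiplying by the measured $T$ restores millimetres. This is a sketch under stated assumptions, not the authors' implementation; the linear (DLT) triangulation, the intrinsics, the station geometry, and the target position are all hypothetical.

```python
import numpy as np


def triangulate(Ml, Mr, pl, pr):
    """Linear (DLT) triangulation of one point from two 3x4 camera matrices."""
    A = np.vstack([pl[0] * Ml[2] - Ml[0],
                   pl[1] * Ml[2] - Ml[1],
                   pr[0] * Mr[2] - Mr[0],
                   pr[1] * Mr[2] - Mr[1]])
    Xh = np.linalg.svd(A)[2][-1]
    return Xh[:3] / Xh[3]


def project(M, X):
    x = M @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical setup: 100 mm lens with 0.005 mm pixels -> 20000 px focal length.
K = np.array([[20000.0, 0.0, 1024.0], [0.0, 20000.0, 1024.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(1.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t_mm = np.array([-1500.0, 20.0, 40.0])     # true station offset in mm
T_mm = np.linalg.norm(t_mm)                # distance measured between stations
t_unit = t_mm / T_mm                       # direction only, as recovered from E

X_true = np.array([300.0, -200.0, 60000.0])            # target about 60 m away
Ml = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
pl = project(Ml, X_true)
pr = project(K @ np.hstack([R, t_mm[:, None]]), X_true)

# Reconstruction with the unit baseline is dimensionless ...
Mr_unit = K @ np.hstack([R, t_unit[:, None]])
X_unitless = triangulate(Ml, Mr_unit, pl, pr)
# ... and multiplying by the measured baseline restores millimetres.
X_mm = X_unitless * T_mm
od_mm = X_mm[2]                            # z_w, used as the object distance
```

The same multiplication applies to every triangulated point, which is what makes the calibration full-field: one baseline measurement scales the whole point cloud.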

Proposed GAN-Based Light Spot Removal Method Considering Edge Priors
The architecture of the GAN-based light spot removal method is shown in Figure 3. It consists of two parts: the first network detects the light spots, and the second removes them. Multitask learning is achieved by taking the output of the first network as the input of the second network. To suppress the interference of light spots with target positioning, an end-to-end spot mask-based semantic-aware network (S2Net) [15] for light spot removal was adopted in this study. Guided by the semantic prior from the spot masks, the spot mask-based semantic transformation (SST) transfers statistical information from nonspot features to spot features while keeping the nonspot features intact.
As the importance of edge information to image restoration has been demonstrated [27], the edge image of the degraded image is used as an additional network input for deep training. The Canny() function of OpenCV in Python was called for edge extraction to construct a new dataset, called GOPRO Gauss edge. The restored clear image is then generated from both $B$ and $E(B)$, where $B$ is the degraded image and $E(B)$ is the edge image of $B$. The edge constraint provides more prior information and directs more attention to the key features of the structure, so that the image gradient becomes clearer and more detailed. The loss function of the constructed network consists of four parts.
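How an edge image can be attached to the degraded image as an extra input channel is sketched below. To keep the example dependency-free, a thresholded Sobel gradient magnitude is used as a stand-in for the Canny detector named in the text; the threshold value and the four-channel layout are assumptions made for illustration.

```python
import numpy as np


def sobel_edges(gray, thresh=0.25):
    """Binary edge map from the normalized Sobel gradient magnitude.
    Stand-in for a Canny detector: no non-maximum suppression or hysteresis."""
    p = np.pad(gray.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    mag = np.hypot(gx, gy)
    if mag.max() > 0:
        mag = mag / mag.max()
    return (mag >= thresh).astype(np.float32)


def with_edge_prior(rgb, gray):
    """Stack the degraded image B and its edge image E(B) into (H, W, 4)."""
    return np.dstack([rgb.astype(np.float32), sobel_edges(gray)])


# Toy degraded image: dark left half, bright right half (vertical step edge).
gray = np.zeros((16, 16))
gray[:, 8:] = 1.0
rgb = np.repeat(gray[..., None], 3, axis=2)

edges = sobel_edges(gray)
net_input = with_edge_prior(rgb, gray)
```

Feeding the edge map as a separate channel leaves the original pixel values untouched, so the network can decide how strongly to rely on the prior.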


Edge Loss
In the training stage, to ensure that $E(B)$ is adequately considered and to avoid possible ringing artifacts, an edge loss was designed, where $\odot$ represents the dot product and $S_i$, $L_i$, and $B$ are the input clear image, the restored clear image, and the blurred image corresponding to the input clear image, respectively. The GAN-based generator is shown in Figure 3.

Scale Loss
The scale loss is evaluated at multiple scales, where $k$ denotes the number of scales ($k = 3$ in this method) and $c_i$, $w_i$, and $h_i$ are the channel number, width, and height of the input image at scale $i$, respectively.
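One common form of a multiscale loss that matches the symbols above is a sum over scales of the squared error normalized by $c_i w_i h_i$. The sketch below assumes that form; the exact expression used by the authors is not recoverable from the text. For simplicity it builds the scales by average-pooling a single restored image, whereas a multiscale generator would supply its own output $L_i$ at each scale.

```python
import numpy as np


def downsample2(img):
    """Halve height and width of an (H, W, C) image by 2x2 average pooling."""
    h, w, c = img.shape
    img = img[:h - h % 2, :w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def multiscale_loss(restored, sharp, k=3):
    """Sum over k scales of ||L_i - S_i||^2 / (c_i * w_i * h_i) (assumed form)."""
    loss = 0.0
    for _ in range(k):
        h, w, c = sharp.shape
        loss += np.sum((restored - sharp) ** 2) / (c * w * h)
        restored, sharp = downsample2(restored), downsample2(sharp)
    return loss


rng = np.random.default_rng(0)
sharp = rng.random((8, 8, 3))
restored = sharp + 0.1          # constant error of 0.1 at every pixel
loss = multiscale_loss(restored, sharp)
```

With a constant pixel error of 0.1, every scale contributes 0.01, so three scales give a loss of 0.03; the normalization keeps coarse and fine scales on an equal footing.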

Adversarial Loss $L_{adv}$
Because GAN training does not converge easily, the optimized formulation of the deep convolutional generative adversarial network (DCGAN) [28] is used. The adversarial loss is applied only to the output of the last scale, where $G$ stands for the generator and $D$ represents the discriminator. Furthermore, because of the skip connection, the expression of $L_{adv}$ is adjusted accordingly.

Perceptual Loss $L_{per}$
Perceptual loss [29] helps enhance image detail and is also applied only to the last output, where $\phi_l(x)$ is the feature map of layer $l$. The perceptual loss layers used in this study are the same as those in the study by Shen et al. [27]: the perceptual loss is calculated on the pooling layers Pool2 and Pool5.
To sum up, the loss function of DeepEdgeGAN is the weighted sum of the four loss terms above, where the weights of the adversarial, edge, and perceptual terms are $\lambda_{adv} = 1 \times 10^{-4}$, $\lambda_{edg} = 10$, and $\lambda_{per} = 5 \times 10^{-5}$, respectively [27]. GAN-based methods are prone to generating artifacts, and optimization is underconstrained due to unstable data acquisition. Since large-scale high-quality datasets [15] are publicly available, the strategy of training on paired data is adopted. Large and diverse training datasets give the trained model better generalizability. The input image is resized to 256 × 256, and the minibatch size is set to 8 for training. The initial learning rate is 0.0001 and is reduced by the "poly" policy [30] with a power of 0.9. Each network was trained for 600 epochs.
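The training settings quoted above can be collected in a small sketch. The "poly" policy is written in its usual form, base rate times $(1 - \text{iter}/\text{max\_iter})^{\text{power}}$; the unit weight on the scale loss and the function names are assumptions, while the three $\lambda$ values, the initial learning rate, and the power are taken from the text.

```python
def poly_lr(base_lr, iteration, max_iteration, power=0.9):
    """'Poly' learning-rate policy: decays from base_lr to 0 over training."""
    return base_lr * (1.0 - iteration / max_iteration) ** power


def total_loss(l_scale, l_adv, l_edg, l_per,
               lam_adv=1e-4, lam_edg=10.0, lam_per=5e-5):
    """Weighted sum of the four loss terms (unit weight on the scale loss
    is an assumption)."""
    return l_scale + lam_adv * l_adv + lam_edg * l_edg + lam_per * l_per


# Learning rate at the start, middle, and end of a 600-epoch run.
schedule = [poly_lr(1e-4, epoch, 600) for epoch in (0, 300, 600)]
example = total_loss(l_scale=1.0, l_adv=1.0, l_edg=1.0, l_per=1.0)
```

The small adversarial and perceptual weights mean that these terms act as regularizers, while the edge term, with its weight of 10, dominates whenever the restored edges deviate from the reference.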

Coarse-to-Fine Location Algorithm Based on Sparse Representation
At present, image restoration mainly focuses on target detection reliability, while positioning accuracy is not given enough consideration, especially for the displacement measurement targeted in this study. In addition, existing image-matching methods still suffer from low matching efficiency, high time complexity, and heavy computation. By combining SR with distance weighting, a coarse-to-fine image-matching method is proposed.
An image $x$ can be re-expressed as

$$x = D\alpha, \tag{19}$$

where $D = [d_1, d_2, \ldots, d_n] \in \mathbb{R}^{m \times n}$ is the dictionary and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^T$ is the sparse coefficient vector. Referring to the conclusion in the research by Donoho [31], the solution of equation (19) is

$$\hat{\alpha} = \arg\min_{\alpha} \lVert x - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1, \tag{20}$$

where $\lambda \ge 0$ and arg min stands for the argument of the minimum. For image matching, after the dictionary $D$ for a real-time image $x$ is obtained, the sparse vector $\alpha$ can be computed using equation (20). The atom of $D$ corresponding to $\max(\alpha)$, the largest element of $\alpha$, gives the position $pp(m, n)$ of the real-time image $x$ in the reference image $S$:

$$pp(m, n) = \mathrm{map}(\max(\alpha)), \tag{21}$$

where map() is the position identity mapping function. The positioning error is

$$PD = \lVert pp(m, n) - pt(m, n) \rVert_2, \tag{22}$$

where $pt(m, n)$ is the real pixel location of $x$ in $S$.
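The matching step can be sketched with a sliding-window dictionary and a basic iterative shrinkage-thresholding (ISTA) solver for the L1 problem. This is an illustrative stand-in, since the solver used in the paper is not stated; the half factor in the data term only rescales $\lambda$. The random reference image, the 8 × 8 patch size, and $\lambda = 0.05$ are hypothetical.

```python
import numpy as np


def build_dictionary(S, size, step):
    """Atoms are zero-mean, unit-norm patches cut from S with a sliding window;
    pos[j] stores the top-left corner of atom j."""
    atoms, pos = [], []
    for r in range(0, S.shape[0] - size + 1, step):
        for c in range(0, S.shape[1] - size + 1, step):
            v = S[r:r + size, c:c + size].astype(float).ravel()
            v = v - v.mean()
            atoms.append(v / (np.linalg.norm(v) + 1e-12))
            pos.append((r, c))
    return np.array(atoms).T, np.array(pos)


def ista(D, x, lam=0.05, n_iter=500):
    """Minimize 0.5 * ||x - D a||_2^2 + lam * ||a||_1 by ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L      # gradient step on the data term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a


rng = np.random.default_rng(0)
S = rng.random((32, 32))                   # reference image (synthetic texture)
D, pos = build_dictionary(S, size=8, step=1)

true_pos = (12, 20)
x = S[12:20, 20:28].ravel()
x = x - x.mean()
x = x / np.linalg.norm(x)                  # real-time image, same normalization

alpha = ista(D, x)
pp = tuple(int(v) for v in pos[np.argmax(alpha)])   # map(max(alpha))
```

The position is read from the index of the largest coefficient, so the resolution of the result is set by the step used to build the dictionary.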
It is preferable to use as many identical atoms as possible to express the real-time image. Considering the spatial location constraint between the real-time image and the reference image, a distance constraint operator $\omega$ was introduced so that atoms near the real location are given more emphasis during the sparse expression of real-time images. The nonzero terms of the sparse vector $\alpha$ are thereby constrained to lie as close to the location $pp$ as possible, which ensures that similar candidate positioning regions have similar coefficient values.
Based on the above analysis, a sparse representation algorithm based on distance weighting was developed. The solution of equation (19) then becomes

$$\hat{\alpha} = \arg\min_{\alpha} \lVert x - D\alpha \rVert_2^2 + \lambda \lVert \omega \circ \alpha \rVert_1, \tag{23}$$

where $\omega$ is the distance constraint operator, representing the Euclidean distance between $x$ and the atoms of $D$, "$\circ$" denotes the element-wise operation, and $\lambda \lVert \omega \circ \alpha \rVert_1$ is the distance constraint of SR. With the distance constraint $\omega$, the sparse expression becomes even sparser, while similar atoms are ensured to have approximately equal sparse coefficient values. A small step length $t$ (the step of the sliding window, unit: pixel) gives higher localization accuracy but lower matching efficiency. Therefore, a coarse-to-fine matching strategy was designed, as shown in Figure 4. First, an initial step length $t_1$ is set to construct a dictionary for coarse matching. Then, a smaller step length, $t_2 = 1$ pixel, is set to construct a new dictionary for fine localization.
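The coarse-to-fine search can be sketched as two passes over dictionaries built with different steps. To keep the example short and its outcome predictable, the weighted L1 problem is replaced here by a one-atom greedy approximation: the atom with the largest correlation divided by its distance weight is selected. This is a swapped-in simplification, not the solver of the paper. The weight $\omega = 1 + \beta d / t_1$, the value $\beta = 0.1$, and the Gaussian-blob reference image are hypothetical.

```python
import numpy as np


def atom(patch):
    """Zero-mean, unit-norm vector built from an image patch."""
    v = patch.astype(float).ravel()
    v = v - v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 1e-8 else np.zeros_like(v)


def build_dictionary(S, size, step, rows, cols):
    """Atoms from windows whose top-left corners lie in the inclusive
    row and column ranges, sampled with the given step."""
    pos = [(r, c) for r in range(rows[0], rows[1] + 1, step)
                  for c in range(cols[0], cols[1] + 1, step)]
    D = np.array([atom(S[r:r + size, c:c + size]) for r, c in pos]).T
    return D, np.array(pos)


def best_atom(D, pos, x, center=None, beta=0.1, scale=1.0):
    """One-atom greedy selection; atoms far from `center` are penalized."""
    w = np.ones(len(pos))
    if center is not None:
        w += beta * np.linalg.norm(pos - np.asarray(center), axis=1) / scale
    r, c = pos[int(np.argmax((D.T @ x) / w))]
    return int(r), int(c)


def coarse_to_fine(S, patch, t1=5, t2=1):
    size = patch.shape[0]
    x = atom(patch)
    H, W = S.shape
    # Coarse pass: large step over the whole reference image.
    D1, p1 = build_dictionary(S, size, t1, (0, H - size), (0, W - size))
    coarse = best_atom(D1, p1, x)
    # Fine pass: 1-pixel step inside a +/- t1 window around the coarse result.
    rows = (max(coarse[0] - t1, 0), min(coarse[0] + t1, H - size))
    cols = (max(coarse[1] - t1, 0), min(coarse[1] + t1, W - size))
    D2, p2 = build_dictionary(S, size, t2, rows, cols)
    fine = best_atom(D2, p2, x, center=coarse, scale=t1)
    return coarse, fine


# Synthetic reference image: one Gaussian blob centred at (30, 37).
rr, cc = np.mgrid[0:64, 0:64]
S = np.exp(-((rr - 30) ** 2 + (cc - 37) ** 2) / (2 * 2.0 ** 2))
patch = S[21:37, 31:47]                  # real-time image, true corner (21, 31)
coarse, fine = coarse_to_fine(S, patch)
```

In this toy case the coarse pass evaluates 100 atoms and the fine pass 121, compared with 2401 atoms for a full 1-pixel search over the same reference image, which is the efficiency argument behind the two-stage design.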

Algorithm Tests for Image Restoration and Localization
Images of real bridges were used to test the algorithms against two criteria: restoration effect and matching accuracy.

Restoration Effects
In this study, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [32] were used to evaluate restoration effects. Then, 500 images and their corresponding edge images were input into the proposed restoration model considering the edge prior, and the values of PSNR and SSIM were calculated. For comparison, the same operations were also performed with the model trained without the edge prior.
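The two metrics can be computed as sketched below. PSNR follows its standard definition; the SSIM shown here is a simplified single-window (global) version, whereas the usual SSIM index averages the same expression over local windows. The constants 0.01 and 0.03 are the conventional stabilizers, and the test images are synthetic.

```python
import numpy as np


def psnr(ref, img, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def global_ssim(ref, img, max_val=1.0):
    """SSIM evaluated once over the whole image (simplified, no local windows)."""
    x, y = ref.astype(float), img.astype(float)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))


rng = np.random.default_rng(0)
clean = rng.random((32, 32))
noisy = np.clip(clean + 0.05 * rng.standard_normal(clean.shape), 0.0, 1.0)

flat = np.full((8, 8), 0.5)
psnr_flat = psnr(flat, flat + 0.1)       # constant error of 0.1 gives 20 dB
ssim_same = global_ssim(clean, clean)
ssim_noisy = global_ssim(clean, noisy)
```

PSNR measures pixel-wise fidelity, while SSIM responds to changes in luminance, contrast, and structure, so the two together indicate whether a gain in one comes at the expense of the other.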
Figure 5 shows that the image restored with the edge prior has higher image quality and clearer details. As presented in Table 1, the restoration effect is further improved after the introduction of the edge prior, which indicates that the edge prior plays a major role in promoting the restoration effect. This is because edge information provides the network with a clearer optimization direction during learning and a better constraint on the image structure during restoration.

Matching Accuracy
The original images disturbed by light spots, with a size of 2048 × 2048, were used as reference images. The real-time images, with a size of 50 × 50, were taken from the restored images. A total of 100 small images were randomly selected from each restored large image as real-time images, and the average matching localization result of these 300 small images was taken as the final result. The matching methods include template matching (TM), the original method, the subsequently improved methods, and the proposed method. Their implementation details are shown in Table 2, where red text indicates that a step takes an improved approach. For the proposed method, the step sizes are $t_1 = 5$ and $t_2 = 1$. The positioning error was calculated through equation (22).
In this study, the matching accuracy is reported as a percentage (%): the proportion of real-time images, among all test images, whose positioning error satisfies the current PD threshold. For example, the positioning accuracy of TM is 43.27 when PD ≤ 1, indicating that 43.27% of the 300 real-time images have a pixel error less than or equal to 1 pixel.
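The accuracy figure described above can be computed as sketched below, assuming the positioning error is the Euclidean distance between the estimated and true positions. The four sample positions are invented for illustration.

```python
import numpy as np


def matching_accuracy(pp, pt, pd_threshold):
    """Percentage of real-time images whose positioning error is <= threshold."""
    err = np.linalg.norm(np.asarray(pp, float) - np.asarray(pt, float), axis=1)
    return 100.0 * np.mean(err <= pd_threshold)


# Hypothetical estimated (pp) and true (pt) positions for four real-time images.
pt = np.array([[10, 10], [50, 80], [200, 40], [300, 300]])
pp = np.array([[10, 10], [50, 81], [203, 40], [310, 300]])   # errors 0, 1, 3, 10

acc_pd1 = matching_accuracy(pp, pt, 1)
acc_pd5 = matching_accuracy(pp, pt, 5)
```

Reporting the percentage at several thresholds separates methods that are roughly right from methods that are accurate to the pixel, which is the distinction that matters for displacement measurement.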
Two conclusions can be drawn from Table 2. First, compared with TM, the matching accuracy of the original method is noticeably improved: under the condition PD ≤ 5, positioning accuracy is improved by 56.66%. Second, comparing the series of optimized methods with the original method shows that introducing the edge information and the distance weighting algorithm improves positioning accuracy to some extent, but this is still not enough for the subpixel measurement requirement. The coarse-to-fine search strategy makes the most significant contribution to subpixel positioning accuracy. With the improved method, matching accuracy was substantially improved.

[Figure 5 panels: image with uneven illumination; restoration without considering the edge prior; restoration considering the edge prior.]

Real Bridge Application
As shown in Figure 6, the tested structure is a concrete bridge with three continuous spans. The middle span, with a length of 85 m, is the monitoring object. Contact-type deformation sensors, namely long-gauge fiber Bragg grating (FBG) sensors, have been installed on the bridge and serve as a reference. An FBG senses minor changes in external physical quantities through the change in the wavelength of light. Deflection is obtained directly from the measured strain through the conjugate beam method, the principle of which is shown in Figure 7(b). The positive strain on the beam surface at section $x$ is $\varepsilon(x)$ when the deflection is $y(x)$. The angular displacement is $\theta(x)$, and the height of the neutral axis is $h_m$. $Q$ is the shearing force, and $M$ is the bending moment. When the boundary ($x = 0$) conditions of the virtual beam satisfy $\bar{Q}_0 = \theta_0$ and $\bar{M}_0 = y_0$, the angular displacement distribution $\theta(x)$ of the real beam is equal to the shear distribution $\bar{Q}(x)$ of the virtual beam, and the deflection distribution $y(x)$ of the real beam is equal to the moment distribution $\bar{M}(x)$ of the virtual beam. This can be expressed by the following equation:

$$\theta(x) = \bar{Q}(x), \qquad y(x) = \bar{M}(x).$$

Here, $\bar{q}(x) = -\varepsilon(x)/h_m$, and the overbar indicates that the parameter belongs to the virtual beam. With long-gauge sensors, the deflection distribution does not depend on the load and stiffness distribution of the beam; rather, it has an explicit linear relationship only with the measured long-gauge strain distribution. In addition, $q_m$ in the equation accounts for the change in neutral axis height, so the method can also be applied to beams with variable cross-sections.

For easier comparison, the targets focused on by the camera were located at the same positions as the FBG sensors. Two images (Figure 8(a)) of the target from different perspectives were used to obtain the 3D point cloud (Figure 8(b)). Then, the OD of each target was deduced for scale factor calculations, as shown in Figure 8(c). The lens used has a focal length of 100 mm. The acquisition frequency was 10 Hz.
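The conjugate beam relation used for the FBG reference can be sketched numerically: the virtual load $-\varepsilon(x)/h_m$ is integrated once to obtain the rotation and once more to obtain the deflection. The sketch assumes zero deflection at both ends of the span and takes deflection as positive downward. The uniform strain of 50 microstrain and $h_m$ = 1000 mm are hypothetical; only the 85 m span length comes from the text.

```python
import numpy as np


def deflection_from_strain(x, eps, h_m):
    """Deflection (positive downward) from surface strain by double
    trapezoidal integration of the virtual load q = -eps / h_m.
    Assumes zero deflection at x[0] and x[-1]."""
    q = -np.asarray(eps, float) / h_m
    dx = np.diff(x)
    theta = np.concatenate([[0.0], np.cumsum(0.5 * (q[1:] + q[:-1]) * dx)])
    y = np.concatenate([[0.0], np.cumsum(0.5 * (theta[1:] + theta[:-1]) * dx)])
    theta0 = -y[-1] / (x[-1] - x[0])      # end rotation that enforces y(L) = 0
    return y + theta0 * (x - x[0])


span_mm = 85_000.0                        # middle span length from the text
x = np.linspace(0.0, span_mm, 171)        # 500 mm spacing
eps = np.full_like(x, 50e-6)              # hypothetical uniform tensile strain
h_m = 1000.0                              # hypothetical neutral axis height (mm)

y = deflection_from_strain(x, eps, h_m)
mid_deflection = y[85]                    # value at midspan
```

For uniform curvature $\kappa = \varepsilon / h_m$, the closed-form midspan deflection is $\kappa L^2 / 8$, about 45.16 mm for these numbers, which gives a quick check on the integration.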
As shown in Figure 9(a), the captured images were affected by light spots and were restored with the light spot removal model. Comparison with original images taken at other times without light spots shows that the restored image is of acceptable quality. The original method, "[15] + sparse expression + coarse matching," was used to extract the displacement from the original images, and the results are drawn with the blue line in Figure 9(b). For contrast, the improved method was applied to the restored images, and the displacement measurement result is drawn with the red line. Taking the results from the FBG sensors as a reference, it can be concluded that the abnormal data were effectively reduced by image restoration and that the noise level was controlled through the improved matching strategy. The displacement curves for several sections are shown in Figure 10(a). By extracting the deflections of different measuring points at the same time, the deflection profile of the bridge can be obtained, as shown in Figure 10.

Conclusions
Two key problems that are often encountered in camera-based structural displacement measurement, object distance measurement and light spot interference, were studied. The main contributions of this study are as follows:
(1) To overcome the limitations of the laser rangefinder, a fast and accurate object distance measurement method based on stereovision was proposed, which has the advantage of full-field multipoint synchronous calibration.
(2) To protect image quality from random light spots, the edge information of the degraded image was used in deep learning to achieve a better restoration effect.
(3) To balance matching speed and accuracy when images are degraded, a distance-weighted coarse-to-fine matching strategy combined with sparse representation was developed.
(4) The algorithm tests showed that introducing edge priors gives the restored image a higher signal-to-noise ratio and a higher structural similarity with the original image. Compared with the template-matching method without image restoration, the sparse representation-based matching method achieves higher matching accuracy on the restored images, and the coarse-to-fine matching strategy improves the accuracy further.
(5) The integrated algorithms were applied to a concrete bridge, and the vertical displacement under normal traffic load was monitored. Compared with the algorithms before improvement, the results of the proposed method are closer to those of the FBG sensors, and the noise level is lower.
In conclusion, this study helps image-based displacement measurement technology adapt to complex environments. Other possible influencing factors will be investigated comprehensively in future work.

Data Availability
The image data used to support the findings of this study are currently under embargo while the research findings are commercialized.

Conflicts of Interest
The authors declare that they have no conflicts of interest.