KCF-Based Identification Approach for Vibration Displacement of Double-Column Bents under Various Earthquakes

Vibration displacement is one of the most significant indicators in the health monitoring and condition assessment of bridges over their life cycle. Traditional monitoring means, such as contact sensors, are relatively costly and provide only a limited number of displacement measurement points. This paper proposes a low-cost, non-contact monocular vision system based on the KCF algorithm to identify the vibration displacement of bridges accurately and in a timely manner. A conversion method associated with a scale ratio was established to cope with the loss of depth information in images when a monocular camera monitors multiple targets at different depths of field. A series of shaking table tests on a two-column pier with energy dissipation beams was conducted to verify the feasibility, accuracy, effectiveness, and robustness of the KCF-based identification approach. The results showed that the vibration displacements of the column identified by the monocular vision system are almost consistent with those measured by the laser displacement sensors. The peak displacement discrepancies between the two measurement methods are within 6% for all cases with different shaking amplitudes and earthquake waves. The RMSE of the displacement histories between the two measurement methods is within 6 mm. The corresponding frequency spectra identified by the monocular vision system match well with those recorded by the laser displacement sensors.


Introduction
The displacement of a bridge is one of the most significant indicators reflecting its mechanical performance and operational status. Thus, it is crucial to use monitoring means or sensing technologies capable of obtaining accurate bridge displacements in a timely manner. Displacement sensors mainly fall into two categories: contact displacement sensors (e.g., the linear variable displacement transducer (LVDT)) and noncontact displacement sensors (e.g., the global positioning system (GPS), the laser displacement sensor (LDS), microwave interferometric radar, and machine vision-based measurement methods). The LVDT is fixed on a platform to measure the structural displacement. However, the fixed platform is hard to build when the LVDT is used to measure the displacement of large-span bridges crossing rivers and valleys [1,2]. The LVDT installation may cause a certain degree of damage to bridges, and vibration of the fixed platform is prone to causing measurement errors [3]. GPS measurement has low accuracy, with an error of about 5 mm, so it is challenging to monitor vibration amplitudes within a few millimeters [4]. Continuous vibration displacement measurements are also difficult to achieve due to the low sampling frequency of GPS [5]. Even though the LDS can overcome the shortcomings of contact displacement sensors and improve measurement accuracy, its high price and installation platform limitations make it inconvenient to apply widely [6]. The microwave interferometric radar transmits and receives microwave signals and obtains the displacement of the bridge by analyzing the phase difference between reflected waves over a time interval. However, the temperature, humidity, and pressure of the air may delay and bend the microwave signal during measurement, which decreases measurement accuracy [7][8][9][10]. In addition, the installation and maintenance of conventional displacement sensors require considerable human and financial resources [11].
Machine vision-based vibration displacement measurement is a research hotspot due to numerous advantages, such as higher measurement precision and sampling frequency and the capability of multi-point, non-contact, and long-distance measurement [12]. Also, machine vision-based vibration displacement measurements do not rely on additional fixed platforms. Displacement is directly extracted from structural vibration videos recorded by cameras. Therefore, these methods are widely applied in displacement or deflection measurements [13][14][15][16] and modal recognition [17] of civil engineering structures. Machine vision-based displacement identification methods developed in the past include template matching methods [18][19][20], optical flow estimation methods [21][22][23][24], and correlation filtering methods [25][26][27]. Olaszek [28] established an imaging system based on the photogrammetric principle to determine the dynamic characteristics of bridges, indicating that the system reduced the influence of environmental factors on the acquisition of images. Hanssen et al. [29] developed a digital image correlation method based on the normalized cross-correlation coefficient (NCC) to measure the deflection of a three-point bending steel beam. Yu et al. [30] presented a fast and accurate machine vision-based measurement method to identify the deformation of a cantilever beam and the mid-span deflection of the Su-Tong Bridge. Yoon et al. [31] applied the Kanade-Lucas-Tomasi (KLT) algorithm to measure the displacement of a six-story building model and verified the algorithm by comparison with the results from conventional sensors. Chen et al. [32] developed an optical flow method based on motion magnification, which was validated by the deflection of structures, cantilever beams, and pipes. Zhao et al. [33] built an approach that combined support correlation filters (SCF) and KLT to measure the vibration displacement of a cable-stayed bridge model.
The results showed that the identified displacement is accurate when compared with the LDS displacement. Although displacement identification based on the template matching and optical flow methods has achieved fruitful results, several shortcomings remain. For example, the template matching method needs high-contrast or artificial targets to improve the measurement accuracy. However, high-contrast targets are often not present on actual bridges, and artificial targets may affect the structural appearance. The optical flow estimation method relies on the assumptions of constant brightness and of temporal continuity or small motion. However, these assumptions cannot be satisfied when the method is used to monitor the displacement of large-span bridges. Although the optical flow method based on motion magnification can improve the identification accuracy, its efficiency in tracking and identifying the displacement of fast-moving structures is reduced by the high time complexity of the spatial band-pass filter.
The correlation filter (CF) can compensate for the shortcomings of the template matching and optical flow estimation methods and improve identification speed, accuracy, and robustness, and it is widely applied in video tracking and identification. Bolme et al. [25] proposed the minimum output sum of squared errors (MOSSE) filter, which computes the correlation response of targets by element-by-element multiplication in the Fourier domain, indicating that MOSSE significantly reduced the time complexity and could track the target at speeds of 600-700 frames per second. Henriques et al. [26] developed the circulant structure of tracking-by-detection with kernels (CSK). The CSK employs a circulant matrix to obtain dense samples and the corresponding feature contents, and it uses kernel functions to improve computational efficiency. Henriques et al. [27] further proposed the kernelized correlation filter (KCF) based on the CSK. The KCF replaces the single-channel grayscale features in the CSK with a multi-channel histogram of oriented gradients (HOG), which enhances the expressive capability of the samples and the accuracy and robustness of the tracker. Du et al. [34] presented a KCF tracker integrating a three-frame-difference algorithm, which identified target displacements in ultra-high-resolution video (3840 × 2160 pixels and 3600 × 2700 pixels). Shao et al. [35] established a velocity correlation filter (VCF) using velocity features and an inertia mechanism (IM). Yang et al. [36] embedded the KCF tracker into an unmanned aerial vehicle (UAV) tracking platform. The KCF tracker maintained a high operating speed due to its low time complexity. However, its computational power was limited. Chen et al. [37] built a KCF tracking framework based on a curve fitting algorithm and evaluated the root mean square error and mean absolute deviation between the tracked displacement and the theoretical displacement.
Zheng and Gupta [38] proposed a multi-camera multi-target tracking system based on the KCF algorithm and an improved KCF algorithm. However, there are few publications on using the KCF algorithm to track and identify the vibration displacement of civil structures, and research on the KCF algorithm applied to the displacement measurement of bridge structures is lacking.
Therefore, this paper proposes a monocular vision system based on the KCF algorithm to identify the vibration displacement of bridges. First, the system establishes the relationship between physical space and pixel coordinates by camera calibration. A region of interest (ROI) containing the target is then determined, and the KCF algorithm is used to identify the vibration displacement of the target. The target can be an artificial target, such as a 2-D code, a geometric pattern, or an artificial light source, or a natural target, such as a pit, bolt, or rivet on the structural surface. Finally, the pixel coordinate displacements of the target in each frame are transformed into the physical space displacements of the target by a scale ratio. Consequently, the vibration displacement of the structure is identified. The main contents of this paper are as follows: (1) The KCF-based identification approach is introduced, and the identification processes are presented in Section 2. (2) In Section 3, the KCF-based identification approach is verified by shaking table tests on a two-column pier with energy dissipation beams. The displacement identified by the proposed KCF-based identification approach is compared with the displacement recorded by the traditional LDS. The root-mean-square errors (RMSE) and peak displacement errors between the two approaches are obtained to evaluate the accuracy, feasibility, and robustness of the proposed KCF-based identification approach. (3) The critical conclusions are extracted from the analyses and discussions in Section 4.

Vibration Displacement Measurement Method Based on KCF Algorithm
For the machine vision-based identification of the vibration displacement of bridge structures, a commercial high-speed camera is used to record videos of the target movement on the structure. Then, the proposed KCF-based identification approach is employed to identify the vibration displacement of the target from the recorded video. The specific process (see Figure 1) is as follows:
(1) Calibrating the camera to calculate the scale ratio: This step aims to accurately obtain the scale ratio ($\varphi$), which describes the relationship between the pixel coordinate displacement, $d_{img}$, and the physical space displacement, $d_{real}$, of the target.
(2) Selecting the region of interest (ROI) and extracting the sample set features: The ROI, which includes the target, is determined for the $t$th frame of the video recorded by the camera. The ROI is used to obtain the sample set by a cyclic shift operator. The sample set information is composed of the FHOG features of the samples.
(3) Training and updating the filter tracker: For the $t$th frame of the video, the sample set information, such as the FHOG features, is used to train and update the filter tracker.
(4) Acquiring the physical space displacement of targets: For the $(t+1)$th frame of the video, the filter tracker is used to detect the target position. Then the pixel coordinate displacements of the target are converted into the physical space displacements of the target.
Each step is discussed in depth in the following subsections.

Calibrating Camera to Calculate the Scale Ratio.
Camera calibration is an indispensable step in machine vision-based displacement measurement. The purpose of camera calibration is to obtain the scale ratio between the pixel coordinate displacement, $d_{img}$, and the physical space displacement, $d_{real}$, of the target. At the same time, camera calibration with the method proposed by Zhang [39] can eliminate the influence of the geometric aberrations caused by optical imaging. However, only the scale ratio calculation is highlighted in this section. The calculation principle of the scale ratio can be illustrated by the pinhole model (see Figure 2). Namely, when the optical axis of the camera is perpendicular to the plane of the target, the physical space displacement of the target is proportional to its pixel coordinate displacement, and the proportionality depends only on the distance between the camera and the target. The scale ratio, $\varphi$, can be determined from the mapping relationship between the pixel length and the physical space length.
$$\varphi = \frac{L_{AB}}{L_{ab}} \quad (1)$$
where $L_{AB}$ is the physical space length of the target, and $L_{ab}$ is the pixel length on the image plane onto which that physical space length is projected.
For bridge structures, multiple targets at different depths of field inevitably need to be monitored simultaneously by the monocular camera, which results in the loss of depth information in the image. Structural targets of different physical sizes located at various depths of field can be mapped to the same pixel length on the image plane (Figure 2(b)). Therefore, separate scale ratios, $\varphi_\iota$, need to be calculated for structural targets at different depths of field.
$$\varphi_\iota = \frac{L_{A_\iota B_\iota}}{L_{ab}} \quad (2)$$
where $L_{A_\iota B_\iota}$ denotes the physical space length on the structure surface at a given depth of field, and $\iota$ represents the $\iota$th depth-of-field plane.
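The scale-ratio conversion above can be sketched in a few lines of Python. This is a minimal illustration, assuming one known reference length per depth-of-field plane; the function names, plane labels, and numerical values are hypothetical and are not taken from the tests.

```python
def scale_ratio(physical_length_mm, pixel_length_px):
    """Scale ratio phi = L_AB / L_ab, in mm per pixel (equations (1) and (2))."""
    if pixel_length_px <= 0:
        raise ValueError("pixel length must be positive")
    return physical_length_mm / pixel_length_px


def to_physical(d_img_px, phi):
    """Convert a pixel displacement into a physical displacement."""
    return d_img_px * phi


# Hypothetical calibration data: (physical length in mm, pixel length in px)
# for one reference length on each depth-of-field plane.
planes = {"near": (200.0, 400.0), "far": (200.0, 250.0)}
phi = {name: scale_ratio(L, l) for name, (L, l) in planes.items()}

# The same 12-pixel motion maps to different physical displacements
# depending on the plane the target lies in.
d_near = to_physical(12.0, phi["near"])
d_far = to_physical(12.0, phi["far"])
```

Keeping one ratio per plane, rather than a single global ratio, is what lets a single camera serve targets at several depths.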

Selecting the ROI and Extracting the Sample Features
2.2.1. ROI Acquisition. An adaptive method was established to scale the display window of the high-resolution video so that it fits a computer monitor with a lower resolution. The KCF-based tracking algorithm identifies the structural targets within the ROI. Therefore, ROI acquisition is one of the critical steps. The ROI is a two-dimensional image containing the target. A well-chosen ROI reduces the processing time of target tracking and identification and improves accuracy and robustness. The key steps for generating the ROI in the KCF-based tracking algorithm are as follows:
(1) Selecting and extending the ROI (selected by mouse). The selected ROI is expanded to N times its size (e.g., N = 2.5 [27]). The ROI expansion prevents the target from being split and recombined across the image boundary during cyclic shift sampling. The expansion also increases the weight of the pixels at the target edge in the feature extraction, because the feature extraction ignores the boundary elements.
(2) Resampling the ROI to an appropriate size. The expanded ROI contains more pixels, which slows down the tracking algorithm. However, the running speed of the KCF-based tracking algorithm can be improved by adjusting the ROI resolution with bilinear interpolation sampling, while the primary feature contents of the target within the ROI are preserved.
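The two ROI steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the ROI is given as a hypothetical `(x, y, w, h)` box, the expanded box is simply clipped to the frame (practical trackers may pad instead), and the bilinear resampling works on a single-channel image.

```python
import numpy as np


def expand_roi(x, y, w, h, frame_w, frame_h, n=2.5):
    """Expand a box about its center by a factor n and clip it to the frame."""
    cx, cy = x + w / 2.0, y + h / 2.0
    x0 = max(int(round(cx - n * w / 2.0)), 0)
    y0 = max(int(round(cy - n * h / 2.0)), 0)
    x1 = min(int(round(cx + n * w / 2.0)), frame_w)
    y1 = min(int(round(cy + n * h / 2.0)), frame_h)
    return x0, y0, x1 - x0, y1 - y0


def resize_bilinear(img, out_h, out_w):
    """Resample a 2-D image to (out_h, out_w) by bilinear interpolation."""
    img = np.asarray(img, dtype=float)
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :]   # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


# Hypothetical 40 x 20 pixel selection in a 2448 x 2048 frame.
expanded = expand_roi(100, 100, 40, 20, 2448, 2048)
```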

Cyclic Shift.
The ROI is sampled by the cyclic shift operator of the KCF-based tracking algorithm to collect more sample data for training the filter tracker. The cyclic shift of the two-dimensional ROI is relatively complex. Therefore, the computational procedure and principle of the cyclic shift operator are introduced for the one-dimensional case. Namely, a one-dimensional vector (called the base sample) is given in equation (3), and the cyclic shift operator is represented by a specific matrix that shifts this vector. The base sample vector, $u$, and the cyclic shift operator matrix, $R$, are defined as follows.
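$$u = [u_1, u_2, \ldots, u_m]^{T} \quad (3)$$
$$R = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \quad (4)$$
The base sample vector, $u$, multiplied by the cyclic shift operator, $R$, yields a negative sample, $u^{1}$, according to equation (5).
$$u^{1} = R u = [u_m, u_1, \ldots, u_{m-1}]^{T} \quad (5)$$
Namely, the first element of the base sample vector is shifted to the second element of the negative sample, and the rightmost element is shifted to the leftmost position. If the base sample vector is multiplied separately by the powers of the cyclic shift operator, $R^{n}$, where $n = 0, 1, \ldots, m-1$, a data matrix, $U$, is obtained. Consequently, the data matrix, $U$, contains one base sample and $m-1$ negative samples.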
Similarly, the derivation for the one-dimensional vector can be extended to the ROI. Multiplying the ROI in each frame by the cyclic shift operator corresponds to a movement along the horizontal or vertical direction. The ROI is shifted using the cyclic shift operator to obtain the sample set. Figure 3 shows several typical samples. The positive and negative signs represent downward and upward shifts of the image sample, respectively. The number (such as 10 and 20) represents the number of shifts, and "0" represents the base sample of the image.
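The cyclic shift described in this subsection can be checked numerically. The sketch below is an illustration with NumPy, assuming a base sample of length 4; it builds the shift operator as a permutation matrix, stacks the shifted copies into a data matrix, and applies the same idea to a small 2-D array with `np.roll`. The names `shift_operator` and `data_matrix` are hypothetical.

```python
import numpy as np


def shift_operator(m):
    """Permutation matrix R such that R @ u = [u_m, u_1, ..., u_{m-1}]."""
    return np.roll(np.eye(m), 1, axis=0)


def data_matrix(u):
    """Stack R^n u for n = 0, ..., m-1; row 0 is the base sample."""
    u = np.asarray(u)
    return np.stack([np.roll(u, n) for n in range(len(u))])


u = np.array([1, 2, 3, 4])
R = shift_operator(4)
u1 = R @ u            # one cyclic shift of the base sample
U = data_matrix(u)    # one base sample and m - 1 shifted samples

# For a 2-D ROI, a positive shift along axis 0 moves the content downward,
# and the rows leaving the bottom wrap around to the top.
roi = np.arange(12).reshape(4, 3)
roi_down_1 = np.roll(roi, 1, axis=0)
```

In the tracker itself the shifted samples are never formed explicitly; the circulant structure is what allows the Fourier-domain shortcut used during training.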

FHOG Feature Extraction.
The KCF-based tracking algorithm extracts the features of the sample set to locate the target position in each frame. The features of the sample set should remain invariant even if the size and posture of the target or the environmental lighting change. However, describing the features of the sample set effectively is a challenging task. Herein, the HOG feature [40] is used to represent the features of the sample set, such as the gradient features of the target. In the HOG feature extraction process, the image of the sample set is converted to a grayscale image, and the image contrast is modified by gamma correction to lessen the impact of uneven lighting. The gradients of the image are obtained from the convolution of the image with the $[-1, 0, 1]$ and $[-1, 0, 1]^{T}$ operators. The gradient values $G_p(p, q)$ and $G_q(p, q)$ of an image pixel in the horizontal and vertical directions are obtained according to equations (8) and (9), respectively.
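$$G_p(p, q) = H(p + 1, q) - H(p - 1, q) \quad (8)$$
$$G_q(p, q) = H(p, q + 1) - H(p, q - 1) \quad (9)$$
where $H(p, q)$ denotes the gray value at the pixel coordinate $(p, q)$. The amplitude and direction of the gradient at a pixel are determined according to equations (10) and (11), respectively.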
$$G(p, q) = \sqrt{G_p(p, q)^2 + G_q(p, q)^2} \quad (10)$$
$$\theta(p, q) = \arctan\left( \frac{G_q(p, q)}{G_p(p, q)} \right) \quad (11)$$
where $G(p, q)$ represents the amplitude of the gradient, and $\theta(p, q)$ denotes the direction of the gradient. In the HOG feature, the image is divided into several cells. Each cell consists of 4 × 4 pixels, and four adjacent cells form a block. The gradient directions in each cell are accumulated into unsigned and signed histograms with a weighted voting method. The unsigned gradient direction histogram uniformly divides the gradient direction of each pixel into 9 bins over 0-180 degrees. The signed gradient direction histogram uniformly divides the gradient direction of each pixel into 18 bins over 0-360 degrees. The 27 histogram bins are normalized by four types of normalization methods, resulting in 108-dimensional data. Even higher-dimensional data are obtained for images composed of several blocks, which leads to high time complexity. Therefore, it is important to reduce the number of feature dimensions while retaining the primary feature contents.
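The gradient computation and orientation binning can be sketched as below. This is a simplified NumPy illustration under stated assumptions: `p` is taken as the column (horizontal) index, border pixels are left at zero, and each pixel is assigned to a single bin (hard binning) rather than the weighted voting used in a full HOG implementation.

```python
import numpy as np


def gradients(H):
    """Centered differences with the [-1, 0, 1] operator (borders stay zero)."""
    H = np.asarray(H, dtype=float)
    Gp = np.zeros_like(H)
    Gq = np.zeros_like(H)
    Gp[:, 1:-1] = H[:, 2:] - H[:, :-2]   # horizontal gradient
    Gq[1:-1, :] = H[2:, :] - H[:-2, :]   # vertical gradient
    return Gp, Gq


def magnitude_and_bins(Gp, Gq):
    """Gradient amplitude, direction in degrees, and signed/unsigned bin indices."""
    G = np.hypot(Gp, Gq)
    theta = np.degrees(np.arctan2(Gq, Gp)) % 360.0        # direction in [0, 360)
    signed_bin = (theta // 20.0).astype(int) % 18         # 18 bins over 0-360 degrees
    unsigned_bin = ((theta % 180.0) // 20.0).astype(int) % 9   # 9 bins over 0-180 degrees
    return G, theta, signed_bin, unsigned_bin


# Hypothetical 5 x 5 image whose gray value increases from top to bottom.
image = np.tile(np.arange(5.0), (5, 1)).T
Gp, Gq = gradients(image)
G, theta, signed_bin, unsigned_bin = magnitude_and_bins(Gp, Gq)
```

Using `arctan2` instead of a plain arctangent keeps the full 0-360 degree range needed for the signed histogram.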
The dimension of the abovementioned high-dimensional feature data can be reduced by the FHOG method [41]. For example, the FHOG method reduces the 108-dimensional HOG features to 31-dimensional FHOG features. The detailed dimension reduction process is shown in Figure 4.
As shown in Figure 4, the 27-dimensional features are obtained by accumulating, for each of the 27 bins (9 unsigned and 18 signed gradient histogram bins), the values over the 4 normalization operators (column accumulation). The 4-dimensional features are obtained by accumulating, for each of the 4 normalization operators, the values over the 27 bins (row accumulation).
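The row and column accumulation can be illustrated with a small NumPy sketch. It assumes the per-cell HOG data are arranged as a 4 × 27 array (normalization operators by bins) and omits the scaling and truncation constants that a full FHOG implementation applies; the function name `fhog_reduce` is hypothetical.

```python
import numpy as np


def fhog_reduce(cell_hist):
    """Reduce a (4, 27) array of normalized histograms to 27 + 4 = 31 values."""
    cell_hist = np.asarray(cell_hist, dtype=float)
    if cell_hist.shape != (4, 27):
        raise ValueError("expected 4 normalizations x 27 orientation bins")
    per_bin = cell_hist.sum(axis=0)             # 27 values: sum over normalizations
    per_normalization = cell_hist.sum(axis=1)   # 4 values: sum over the 27 bins
    return np.concatenate([per_bin, per_normalization])


# Hypothetical cell data: every entry equal to one.
features = fhog_reduce(np.ones((4, 27)))
```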

Training and Updating the Filter Tracker.
The core step of the KCF-based tracking algorithm is to train the filter tracker that is used to locate the target position in each frame. The computation, training, and updating of the filter tracker are described here for a single frame. First, a Gaussian regression label, $y_i$ ($y_i \in [0, 1]$), is established in the Fourier domain for each training sample $x_i$. The Gaussian regression label gradually decreases as the number of cyclic shifts increases. It is worth noting that a label equal to 1 corresponds to the target itself, i.e., the unshifted base sample. The training sample set (acquisition process shown in Figure 5(a)) is used to train the filter tracker in equation (12) in the $t$th frame by minimizing the squared error between the training samples $x_i$ and their Gaussian regression labels $y_i$, as presented in equation (13).
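$$f(x_i) = \omega^{T} x_i \quad (12)$$
$$\min_{\omega} \sum_{i} \left( f(x_i) - y_i \right)^2 + \lambda \lVert \omega \rVert^2 \quad (13)$$
[Figure 5: Acquisition of the sample sets from the ROI in each frame: (a) training sample set, target in frame t; (b) testing sample set, target in frame t+1.]
where $\lambda$ represents the regularization parameter that prevents overfitting, and $\omega$ denotes the regression coefficient. The unique closed-form solution of $\omega$ is presented as follows.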
$$\omega = \left( X^{H} X + \lambda I \right)^{-1} X^{H} y \quad (14)$$
where $X$ is the training sample set composed of the training samples $x_i$, $X^{H} = (X^{*})^{T}$ is the Hermitian transpose, $X^{*}$ is the complex conjugate of $X$, $I$ is the identity matrix, and $y$ is the vector composed of the Gaussian regression labels $y_i$. To improve the classification performance of the filter tracker, a mapping $\phi$ is used to map the training samples $x_i$ into a Hilbert space. At the same time, a kernel function, as presented in equation (15), is introduced for optimization. According to the Representer Theorem [42], the regression coefficient $\omega$ can be expressed as a linear combination of the mapped samples, as presented in equation (16). Therefore, equation (12) is transformed into equation (17).
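$$k(x_i, x_j) = \phi(x_i)^{T} \phi(x_j) \quad (15)$$
$$\omega = \sum_{i} \alpha_i \, \phi(x_i) \quad (16)$$
$$f(z) = \omega^{T} \phi(z) = \sum_{i} \alpha_i \, k(z, x_i) \quad (17)$$
where $\alpha_i$ is the combination coefficient, and $\alpha$ is the vector composed of $\alpha_i$. The solution for $\omega$ is thus transformed into the solution for $\alpha$ in the dual space. The kernelized solution for $\alpha$ in the dual space is presented as follows [43].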
$$\alpha = \left( K^{xx} + \lambda I \right)^{-1} y \quad (18)$$
where $K^{xx}$ is the kernel matrix of the training samples, namely, $K^{xx}_{ij} = k(x_i, x_j)$. The kernel matrix $K^{xx}$ has a circulant structure [26], so the solution can be further simplified in the Fourier domain by the following equation.
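$$\hat{\alpha} = \frac{\hat{y}}{\hat{k}^{xx} + \lambda} \quad (19)$$
where the symbol $\hat{\ }$ denotes the Discrete Fourier Transform (DFT) of the variable, $\hat{k}^{xx}$ represents the kernel matrix $K^{xx}$ in the Fourier domain (the DFT of its first row), the division is element-wise, and $\hat{\alpha}$ is the filter tracker in the Fourier domain.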
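The equivalence between the dual-space solution and its Fourier-domain form can be checked on a small one-dimensional example. The sketch below is an illustration only: it assumes a Gaussian kernel, single-channel 1-D samples, and hypothetical parameter values (`sigma`, `lam`, label width `s`), whereas the tracker operates on 2-D multi-channel FHOG features.

```python
import numpy as np


def gaussian_correlation(x, z, sigma=0.5):
    """k[i] = Gaussian kernel between z and x cyclically shifted by i samples."""
    cross = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)).real
    dist2 = np.dot(x, x) + np.dot(z, z) - 2.0 * cross
    return np.exp(-np.maximum(dist2, 0.0) / (sigma ** 2 * x.size))


def gaussian_labels(m, s=2.0):
    """Regression labels: 1 for the unshifted sample, decaying with the shift."""
    shifts = np.arange(m)
    dist = np.minimum(shifts, m - shifts)   # cyclic distance from zero shift
    return np.exp(-0.5 * (dist / s) ** 2)


def train(x, y, lam=1e-4, sigma=0.5):
    """Filter in the Fourier domain: alpha_hat = y_hat / (k_hat_xx + lambda)."""
    k_hat = np.fft.fft(gaussian_correlation(x, x, sigma))
    return np.fft.fft(y) / (k_hat + lam)


rng = np.random.default_rng(0)
x = rng.standard_normal(32)       # hypothetical 1-D base sample
y = gaussian_labels(x.size)
alpha_hat = train(x, y)
alpha = np.fft.ifft(alpha_hat).real   # same coefficients in the sample domain
```

The Fourier-domain form needs only a few FFTs of length m, whereas the dense solve works with an m × m kernel matrix, which is the source of the speed advantage.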
Due to the robustness requirements of the KCF-based tracking algorithm, the filter tracker $\hat{\alpha}$ needs to be updated in each frame. The filter tracker is updated according to equation (20), where $\hat{\alpha}_{t+1}$ is the filter tracker for the $(t+1)$th frame, $\hat{\alpha}_{t}$ is the filter tracker for the current $t$th frame, and $\hat{\alpha}_{t-1}$ is the filter tracker for the previous $(t-1)$th frame.

Acquiring the Target Physical Space Displacement.
In the $(t+1)$th frame, the response of the testing sample set (acquisition process shown in Figure 5(b)) is detected by the filter tracker $\hat{\alpha}_{t+1}$ from equation (20), as presented in equation (21).
$$f(z) = F^{-1}\left( \hat{k}^{xz} \odot \hat{\alpha}_{t+1} \right) \quad (21)$$
where the symbol $\hat{\ }$ denotes the DFT of the variable, $F^{-1}$ is the inverse Fourier transform, $\odot$ denotes element-wise multiplication, $\hat{k}^{xz}$ represents the kernel matrix $K^{xz}$ in the Fourier domain, and $f(z)$ is the response set of the testing sample set $Z$. The position of the target in the $(t+1)$th frame is located at the largest response in equation (21). The pixel coordinate displacement $d_{img}$ of the target is determined by the difference between the target coordinate in the $(t+1)$th frame and the target coordinate in the first frame. The physical space displacement $d_{real}$ of the target in the $(t+1)$th frame is calculated with the scale ratio $\varphi$, as presented in equation (22).
$$d_{real} = \varphi \, d_{img} \quad (22)$$
During long-term health monitoring and even shaking table tests on bridges, the identified vibration displacements may drift from the baseline due to environmental noise or other uncertain factors. Therefore, the KCF algorithm was improved to eliminate the baseline drift. Consequently, the robustness and accuracy of the KCF algorithm in identifying the vibration displacements are improved.
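The detection and conversion steps can be sketched on a one-dimensional signal. This is a minimal illustration, assuming a Gaussian kernel, integer-pixel localization by the largest response, and a hypothetical scale ratio of 0.8 mm per pixel; sub-pixel refinement, filter updating, and baseline drift removal are not shown.

```python
import numpy as np


def gaussian_correlation(x, z, sigma=0.5):
    """k[i] = Gaussian kernel between z and x cyclically shifted by i samples."""
    cross = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)).real
    dist2 = np.dot(x, x) + np.dot(z, z) - 2.0 * cross
    return np.exp(-np.maximum(dist2, 0.0) / (sigma ** 2 * x.size))


def locate(alpha_hat, x, z, sigma=0.5):
    """Signed pixel shift of z relative to x at the largest filter response."""
    k_hat = np.fft.fft(gaussian_correlation(x, z, sigma))
    response = np.fft.ifft(k_hat * alpha_hat).real
    idx = int(np.argmax(response))
    m = x.size
    return idx - m if idx > m // 2 else idx   # unwrap cyclic index to signed shift


# Train on a hypothetical first-frame sample x with Gaussian labels y.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
cyclic_dist = np.minimum(np.arange(64), 64 - np.arange(64))
y = np.exp(-0.5 * (cyclic_dist / 2.0) ** 2)
alpha_hat = np.fft.fft(y) / (np.fft.fft(gaussian_correlation(x, x)) + 1e-4)

# A later frame in which the target has moved by 5 pixels.
z = np.roll(x, 5)
d_img = locate(alpha_hat, x, z)   # pixel displacement relative to the first frame
phi = 0.8                         # hypothetical scale ratio, mm per pixel
d_real = phi * d_img              # physical displacement, equation (22)
```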

Experimental Schemes.
Shaking table tests on a two-column pier with energy dissipation beams were conducted to verify the feasibility, accuracy, effectiveness, and robustness of the KCF-based identification approach. The test model of the two-column pier with energy dissipation beams was designed and built according to the similarity ratios. The similarity ratios of the geometry and the elastic modulus are 1 : 15 and 0.3 : 1, respectively. The total height of the test model is 4500 mm. Each column has a box cross section with dimensions of 567 × 347 mm and a wall thickness of 100 mm (see Figure 6). Five I-type energy dissipation beams made of low-yield steel were installed at equal spacing between the two columns. The cross section of each beam is composed of flanges and a web. The flange and web thicknesses are 7 mm and 2 mm, respectively. The flange width and web height are 52 mm and 66 mm, respectively. The test model was built using HRB400 steel bars with a diameter of 8 mm and M15 cement mortar. The longitudinal reinforcement ratio is 1.526%. An additional counterweight of 9993 kg was installed along the height of the columns to satisfy the dynamic similarity requirement. Seismic waves should be reasonably selected as vibration inputs for the shaking table tests. Therefore, a typical Chi-Chi wave was chosen as the vibration input because a wave with pulse effects may markedly influence the seismic response of the test model. The accuracy and feasibility of the KCF-based identification approach were investigated by gradually increasing the peak ground acceleration (PGA) of the Chi-Chi wave. Furthermore, other seismic waves with different frequency contents, such as the Artificial wave, the El-Centro wave, and the Mexico City wave, were selected as vibration inputs to evaluate the effectiveness and robustness of the KCF-based identification approach. It is worth noting that all seismic waves must be compressed by the time similarity ratio of 0.2582 to consider  Table 1.
Artificial circular targets were attached to the column so that they could be easily tracked by a high-speed camera, as shown in Figure 9(a). In addition, natural targets of the test model, such as screws and structural corners, were selected as tracking objects because artificial targets are not easily placed on an actual structure. Signs A1-A5 and C1-C5 represent the artificial and natural targets, respectively (see Figure 9(b)). In particular, C2, C3, and C4 denote screws, and C1 and C5 denote structural corners. The high-speed camera is approximately 6 m from the test model. The optical axis of the high-speed camera was made perpendicular to the surface of the test model by adjusting the viewing angle. The sampling frequency of the high-speed camera is 120 Hz, and its resolution is 2448 × 2048. Traditional LDSs mounted on a fixed platform were used to measure the vibration displacement of the test model, indicated by D1-D5 (see Figure 9(b)). The vibration displacement of the column measured by the LDS was used to verify the feasibility, accuracy, effectiveness, and robustness of the KCF-based identification approach. It is worth noting that each LDS was mounted at almost the same column height as the corresponding targets. The sampling frequency of the LDS is 256 Hz, whereas that of the high-speed camera is 120 Hz. Moreover, the two methods did not start recording the vibration displacement at the same instant due to the limitations of the equipment. Therefore, peak value alignment was used to synchronize the time series of vibration displacements obtained from the high-speed camera and the LDS.
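One way to realize such a peak-based synchronization is sketched below. This is an assumption-laden illustration, not the procedure used in the tests: it estimates the time lag from the instants of the largest absolute displacement in each record and then resamples the LDS record onto the camera time stamps by linear interpolation. The function name `align_by_peak` and the synthetic pulse are hypothetical.

```python
import numpy as np


def align_by_peak(d_cam, fs_cam, d_lds, fs_lds):
    """Align two displacement records by their largest absolute peaks.

    Returns the common time axis (camera clock), both records on that axis,
    and the estimated lag of the LDS record in seconds.
    """
    d_cam = np.asarray(d_cam, dtype=float)
    d_lds = np.asarray(d_lds, dtype=float)
    t_cam = np.arange(d_cam.size) / fs_cam
    t_lds = np.arange(d_lds.size) / fs_lds
    lag = t_lds[np.argmax(np.abs(d_lds))] - t_cam[np.argmax(np.abs(d_cam))]
    # Sample the LDS record at the camera instants, shifted by the lag;
    # instants outside the LDS record are marked NaN and dropped.
    d_lds_on_cam = np.interp(t_cam + lag, t_lds, d_lds, left=np.nan, right=np.nan)
    valid = ~np.isnan(d_lds_on_cam)
    return t_cam[valid], d_cam[valid], d_lds_on_cam[valid], lag


# Synthetic check: the same pulse recorded at 120 Hz and, 0.5 s later, at 256 Hz.
t120 = np.arange(480) / 120.0
t256 = np.arange(1024) / 256.0
cam = np.exp(-((t120 - 1.0) / 0.1) ** 2)
lds = np.exp(-((t256 - 1.5) / 0.1) ** 2)
t, cam_aligned, lds_aligned, lag = align_by_peak(cam, 120.0, lds, 256.0)
```

A single-peak alignment is sensitive to noise at the peak; a cross-correlation over a window around the peak would be a more robust variant of the same idea.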
The geometric parameters of the targets were determined from the coordinates of the targets in the 2-D pixel space and the 3-D physical space, as shown in Figure 10. The red coordinates represent the target positions in the pixel coordinate system. The blue lines and numbers represent the distances between pairs of targets in the 3-D physical space. The scale ratios were calculated from these geometric parameters for the various targets, so that the pixel coordinate displacements of a target could be transformed into its physical space displacements. Similarly, the scale ratios of the natural targets were obtained from the adjacent structural dimensions or the spacings between adjacent screws. In particular, different scale ratios were calculated for targets at different depths of field.

Validation of the Identification Accuracy.
For simplicity, only representative experimental displacements from the shaking table tests were used to assess the feasibility, accuracy, effectiveness, and robustness of the KCF-based identification approach when the two-column pier with energy dissipation beams was subjected to various seismic waves with different amplitudes and frequency contents. For comparison, the vibration displacements identified by the KCF-based identification approach based on artificial and natural targets are referred to as KCF-AT and KCF-NT, respectively. The vibration displacements measured with the laser displacement sensors are referred to as LDS. Furthermore, the RMSE and peak displacement errors were computed to evaluate the accuracy and robustness of the KCF-based identification approach. The RMSE and peak displacement errors are calculated as follows.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( d_{LDS,i} - d_{KCF,i} \right)^2}$$
where $N$ represents the number of vibration displacement data, the vector $d_{LDS}$ is the vibration displacement recorded by the laser displacement sensor, the vector $d_{KCF}$ is the vibration displacement identified by the KCF-based identification approach, and $\max|d^{w}_{LDS}|$ and $\max|d^{w}_{KCF}|$ represent the
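The two error measures can be computed as in the following sketch. The RMSE follows its usual definition; the peak displacement error is written here as the relative difference of the largest absolute displacements, normalized by the LDS peak and expressed in percent, which is an assumed form. The function names and sample values are hypothetical.

```python
import numpy as np


def rmse(d_lds, d_kcf):
    """Root-mean-square error between two aligned displacement records."""
    d_lds = np.asarray(d_lds, dtype=float)
    d_kcf = np.asarray(d_kcf, dtype=float)
    if d_lds.shape != d_kcf.shape:
        raise ValueError("records must be aligned and of equal length")
    return float(np.sqrt(np.mean((d_lds - d_kcf) ** 2)))


def peak_error_percent(d_lds, d_kcf):
    """Relative difference of peak absolute displacements, in percent of the LDS peak."""
    peak_lds = np.max(np.abs(d_lds))
    peak_kcf = np.max(np.abs(d_kcf))
    return float(abs(peak_lds - peak_kcf) / peak_lds * 100.0)


# Hypothetical displacement samples in mm.
lds_record = [0.0, 10.0, -20.0, 5.0]
kcf_record = [0.0, 9.0, -19.0, 5.0]
err_rmse = rmse(lds_record, kcf_record)
err_peak = peak_error_percent(lds_record, kcf_record)
```

Both measures presuppose that the two records share a common time base, so they would be applied after synchronization and resampling.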

Displacement Response and the Corresponding PSD of the Column under Chi-Chi Seismic Wave.
The Chi-Chi wave has a pulse effect, which significantly affects the displacement response of the test model. Therefore, the influence of the shaking intensity on the accuracy of the KCF-based identification approach was studied by gradually increasing the PGA of the Chi-Chi wave (see Table 1). Only the cases with typical PGAs (0.1 g, 0.3 g, 0.5 g, 0.68 g, and 0.9 g) are compared here: the vibration displacements identified by the KCF-based identification approach and those recorded by the LDS are shown in Figures 11(a)-15(a). The corresponding power spectral density (PSD) of the vibration displacement, which presents the frequency spectra contents, is shown in Figures 11(b)-15(b). The PSD was calculated using the Welch averaged periodogram method [44]. The waveforms, trends, and peak values of the vibration displacements identified by the KCF-based identification approach are almost identical to those recorded by the LDS for the test model under the Chi-Chi wave with various PGAs (see Figures 11(a)-15(a)). The peak displacement errors and RMSE values between the vibration displacements measured by the different methods are low (see Figure 16), indicating that the KCF-based identification approach identifies the vibration displacements of the test model with high accuracy. For example, the peak displacement errors between KCF-AT and LDS are within 4%, and the peak displacement errors between KCF-NT and LDS are less than 5%. The peak displacement errors and the RMS between the KCF-AT and KCF-NT are within 4.8%. The RMSE values between KCF-AT, KCF-NT, and LDS are within 6 mm, and the maximum value is 5.9 mm. The RMSE values between the KCF-AT and KCF-NT are within 6.6 mm (Figure 16). However, the peak displacement errors between KCF-NT and LDS are slightly higher than those between KCF-AT and LDS.
This is because the contrast of the natural targets is lower than that of the artificial targets under natural illumination. Therefore, identification based on natural targets is more difficult, and the identification accuracy is slightly lower for the KCF-NT (see Figure 17). The frequency spectra characteristics and the wave crests of the vibration displacements identified by the KCF-AT and KCF-NT agree well with those of the LDS for the test model under the Chi-Chi waves with different PGAs (see KCF-NT, and LDS are 1.27% and 4.02%, respectively. Consequently, the identification accuracy was verified from the frequency spectra contents of the vibration displacement recorded by the KCF-based identification approach and the LDS.  Figure 21), indicating the high robustness and accuracy of the KCF-based identification approach in identifying the structural vibration displacements. For example, the RMSE values of the vibration displacements between the KCF-AT, KCF-NT, and LDS are within 6 mm. The RMSE values between the KCF-AT and KCF-NT are within 7.3 mm. The peak displacement errors between the KCF-AT and LDS are lower than 4%, and the peak displacement errors between the KCF-NT and LDS are less than 6%. The peak displacement errors between the KCF-AT and KCF-NT are less than 7.2% (Figure 21). However, the peak displacement errors between the KCF-NT and LDS are slightly higher than those between the KCF-AT and LDS. This is because the natural targets showed lower contrast than the artificial targets under the natural illumination in the shaking table lab (see Figure 17). The frequency spectra characteristics and the corresponding wave crests of the vibration displacements measured by the KCF-AT and KCF-NT agree with those of the vibration displacement recorded by the LDS for the test model under the four seismic waves with a PGA of 0.68 g (see Figures 14(b) and 18(b)-20(b)).
For instance, the frequency spectra peaks at the dominant frequency of the vibration displacements at the column top from the KCF-AT, KCF-NT, and LDS are 897.6, 924.6, and 943.9, respectively, when the test model is subjected to the Artificial wave with 0.68 g. The corresponding peak errors of the KCF-AT and KCF-NT relative to the LDS are 4.91% and 2.04%, respectively. Under the El-Centro wave with 0.68 g, the frequency spectra peaks at the dominant frequency of the vibration displacement at the column top identified by the KCF-AT and KCF-NT are 2.30% and 14.14% less than those recorded by the LDS, respectively. For the case of the Mexico City wave with 0.68 g, the frequency spectra peaks at the dominant frequency of the vibration displacement at the column middle measured by the KCF-AT and KCF-NT are 97.30% and 94.58% of the values recorded by the LDS, respectively. Consequently, the KCF-based identification approach has high robustness and accuracy in identifying the vibration displacement of the test model under seismic waves with different frequency contents.
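The comparison metrics quoted in this section can be illustrated with a short Python sketch. It is not the authors' processing code. It assumes that the two displacement histories are time-aligned and equally sampled, that the peak displacement error is the relative difference between peak absolute displacements with the LDS taken as the reference, and that spectral peak errors are relative to the LDS value. The function names and the sample displacement values are hypothetical; only the spectral peaks 897.6, 924.6, and 943.9 are taken from the text.

```python
import math


def rmse(measured, reference):
    """Root-mean-square error between two equally sampled displacement histories."""
    if len(measured) != len(reference) or not reference:
        raise ValueError("histories must be non-empty and of equal length")
    squared = sum((m - r) ** 2 for m, r in zip(measured, reference))
    return math.sqrt(squared / len(reference))


def relative_error_percent(value, reference):
    """Absolute difference between value and reference, in percent of reference."""
    return abs(value - reference) / abs(reference) * 100.0


def peak_error_percent(measured, reference):
    """Relative error between the peak absolute displacements of two histories."""
    peak_measured = max(abs(x) for x in measured)
    peak_reference = max(abs(x) for x in reference)
    return relative_error_percent(peak_measured, peak_reference)


# Hypothetical displacement samples in mm (assumed values, not test data).
lds = [0.0, 10.0, -20.0, 5.0]
kcf = [0.0, 9.0, -19.0, 6.0]
print(round(rmse(kcf, lds), 3))                # 0.866
print(round(peak_error_percent(kcf, lds), 2))  # 5.0

# Spectral peaks reported above for the Artificial wave with 0.68 g.
print(round(relative_error_percent(897.6, 943.9), 2))  # 4.91 (KCF-AT vs. LDS)
print(round(relative_error_percent(924.6, 943.9), 2))  # 2.04 (KCF-NT vs. LDS)
```

Under these assumptions, the last two lines reproduce the 4.91% and 2.04% peak errors reported for the Artificial wave, which suggests that the spectral peak errors are expressed relative to the LDS value.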

Conclusions
This paper proposes a monocular-vision, KCF-based identification approach that handles multiple targets at different depths of field and is used to identify the vibration displacements of bridges. A two-column pier with energy dissipation beams was designed and subjected to shaking table tests under different seismic waves. The vibration displacements of the two-column pier with energy dissipation beams were recorded by the proposed KCF-AT and KCF-NT approaches and by the conventional LDS. The vibration displacements measured by the three methods were compared to verify the feasibility, accuracy, effectiveness, and robustness of the proposed KCF-based identification approach. The main conclusions are summarized as follows:
(1) A conversion method associated with the scale ratio was established, adapted to various targets at different depths of field, and incorporated into the KCF-based tracking algorithm. The scale ratio can be used to directly obtain the physical displacement of artificial and natural targets at different depths of field. The vibration displacement identified by the proposed KCF-based identification approach was consistent with the results recorded by the LDS. The results show that the KCF-based identification approach has high accuracy and robustness in identifying vibration displacements.
(2) The vibration displacements and the corresponding frequency spectra contents identified by the KCF-based identification approach are almost consistent with the measurement results obtained by the laser displacement sensors for the test model under the Chi-Chi wave with different PGAs and other seismic waves with various frequency contents. The peak displacement errors and RMSE values between the vibration displacements recorded by the different methods are small. The peak displacement errors of the KCF-AT and KCF-NT relative to the LDS are less than 5% and 6%, respectively. The RMSE values between the vibration displacement recorded by the KCF-based identification approach and the LDS are within 6 mm. This indicates that the proposed KCF-based identification approach has good accuracy and robustness.
(3) The vibration displacements and the corresponding frequency spectra contents from the KCF-based identification approach using natural targets are almost identical to those from the KCF-based identification approach using artificial targets. The peak displacement errors and RMSE values between the vibration displacements recorded by KCF-NT and KCF-AT are within 7.2% and 7.3 mm, respectively. This indicates that the KCF-based identification approach based on natural targets has identification accuracy and robustness comparable to those of the approach based on artificial targets. Therefore, the KCF-based identification approach based on natural targets is more convenient to apply in practical bridge engineering.
(4) The influences of complex environmental factors, such as climatic environments and low-contrast natural targets, on the identification accuracy and robustness of the KCF-based identification approach will be investigated in future work, especially for practical bridge engineering in more complex environments. Moreover, the KCF-based identification approach will be applied in more scenarios.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.