Geometrical-Feature-Preserving Adjoint Tomography of Near-Surface Structure with Seismic Early Arrival

Early arrival waveform inversion (EWI) is an essential approach to obtaining velocity structures in near-surface. Due to suffering from a cycle-skipping issue, it is difficult to reach the global minima for conventional EWI with the misfit function of least-squares norm (L2-norm). Following the optimal transportation theory, we developed an EWI solution with a new objective function based on quadratic-Wasserstein-metric (W2-norm) to maintain the geometric characteristics of the distribution and improve the stability and convexity of the inverse problem. First, we gave the continuous form of the adjoint source and the Frechet gradient of the Wasserstein metric for seismic early arrival, which leads to an easy and efficient way to implement in the adjoint-state method. Then, we conducted two synthetic experiments on the target model containing some velocity anomalies and hidden layers to test its effectiveness in mapping accurate and high-resolution near-surface velocity structure. The results show that the W2-normed EWI can mitigate cycle-skipping issues compared with the L2-normed EWI. In addition, it can deal with hidden layers and is robust in terms of noise. The application to a real dataset indicates that this new solution can recover more details in the shallow structure, especially in the aspect of dealing with hidden layers.


Introduction
Accurate near-surface solution is essential for mapping subsurface structures of the Earth in seismic industry. Generally, the irregular geology and undulating topography of the near-surface weathered layers lead to complex velocity variations [1]. If the structure in the near-surface region is not accurately recovered, the deeper re ections can be strongly degraded after migration [2]. First-arrival travel time tomography, which inverts the rst-arrival travel times, has been widely applied in near-surface imaging [3,4]. However, travel time tomography may be limited when applicating in geologically complex areas because it is limited by the assumption of high frequency approximation. en, several methods have been developed to overcome the limitation, including the fat ray tomography method [5], the Fresnel-volume tomography method [6], and the wave-equation tomography methods [7]. Full waveform inversion (FWI) [8][9][10][11][12][13] is another state-of-the-art imaging method used to overcome high-frequency limitation, which solves full wave equations to simulate accurately the propagation of seismic waves in complex media and considering full waveform information in inversion. FWI is able to yield better-resolved Earth models compared with conventional ray-based approaches.
For near-surface imaging, the early arrivals (always de ned as the signals arriving within a few periods of the rst arrival) rather than the full-wave information are the most commonly used to invert the near-surface structure at shallow depths (0-500 m) [14]. Generally, early arrivals could include seismic refractions, direct waves or shallow seismic re ections. Recently, several studies of early arrival waveform inversion (EWI) have been investigated. Sheng et al. [14] proposed an EWI method for imaging the near-surface velocity, by minimizing the data misfit between the synthetic and observed waveform in the time window of early arrivals, and they later applied this method to marine data [15] and land data [16]. Shen [17] developed a weighted EWI method by matching the phase instead of amplitude during inversion, which was later applied to shallow land data [18]. en, a joint inversion of early arrival waveform and reflection waveform was proposed to better reconstruct the shallow structures [19], by combining the low-wavenumber and high-wavenumber in the near surface velocity model. Also, this method was applied to a real ocean bottom cable (OBC) case study with gas cloud [20].
However, the misfit function of both FWI and EWI is highly nonlinear, which makes it easy to trap in the local minima before reaching the global minima and results in poor convergence during inversion. at is, cycle-skipping problems usually exist in waveform inversion ( Figure 1). To partly mitigate this problem, several methods have been developed in recent years. Ma and Hale [21] proposed the hierarchical strategy of waveform inversion, which firstly resolve the low-wavenumber content by wave-equation travel time tomography and then use FWI to constrain highwavenumber details in the model. Wu et al. [22] developed an envelope inversion to mitigate cycle-skipping issues by extracting low-frequency component in seismic data. As a result, it can provide a good initial model with low-wavenumber detail for the FWI. Bozdag et al. [23] developed more sophisticated objective functions to mitigate the local minima, e.g., using the instantaneous phase (IP) and amplitude envelope. en, adaptive FWI methods were also proposed [24,25], which reformulates the misfit functions by some matching filters and update velocity model by fixing time shifts between the synthetic and observed waveform. Zhang et al. [26] proposed waveform-based joint inversion of surface waves with traveltime and Z/H amplitude ratios to constrain shallow surface and provide an initial model for FWI.
However, most of the above studies used the misfit functions of least-squares norm (L2-norm). It is also well known that some obstacles exist in FWI based on L2-norm [27]. e main obstacles include the following: (i) L2-norm misfit function is always nonconvex in the respect of model parameters, making it easy to fall into local minima. (ii) e L2-norm cannot deal with noise in seismic data, which will be inverted into the final model and yield an incorrect result. If we examine the data misfit between synthetic waveform and observed waveform from another viewpoint, there is a mapping which may exist between these two datasets [27][28][29]. us, Engquist and Froese [30] proposed a novel misfit measurement which is associated with optimal transport. e squared Wasserstein metric (W2) used in this method is convex with respect to time shift leading to mitigate the cycle-skipping problems. e theory of optimal transport has been developed in the interdisciplinary and cutting-edge research fields of geometry, probability theory, and partial differential equations and has been applied in the fields of engineering and medicine in recent years [31,32]. Yang et al. [27] applied the W2-norm to FWI in the field of seismic exploration and achieved good results. It should be noted that no studies have used the quadratic-Wassersteinmetric in EWI solution. In addition, the geometric preserving property of the W2 metric has not been revealed in previous work.
In this paper, we used the quadratic Wasserstein metric as the misfit function in EWI. First, we gave the definition of the Wasserstein distance and derived the strict continuous form of the gradient direction of the target function under the W2 metric for early arrival. en, we show the advantages of the quadratic Wasserstein distance by comparing the W2-norm-based misfit function with the L2norm. We applied the newly developed seismic tomography method based on the W2-norm to EWI, and two numerical experiments were tested to show its effectiveness in mapping accurate and high-resolution near-surface velocity structure. Finally, the new method was tested using real data.

Theory and Methodology
e full waveform inversion can be considered as a PDE constrained optimization problem from the view of mathematics (1) Correspondingly, the misfit function χ ij of the sourcereceiver pair in index (i,j) is defined as where N indicates the total number of seismic events and M indicates seismic signals for each event. D is the distance function that measures the difference between the synthetic waveform s ij (t; c(x)) and observed waveform d ij (t).

Definition of the Wasserstein Distance.
Assuming that there are two probability distributions μ and v, we identify a homeomorphic mapping onto itself, which needs to satisfy two conditions simultaneously: (i) measure-preserving mapping and (ii) minimum transmission cost. e mapping satisfying the above conditions is called the optimal transport mapping between these two probability distributions. For any Borel set ψ: Ω ⟶ Ω, mapping S maps the probability distribution μ to v and satisfies

2
Shock and Vibration If the secondary transportation cost of moving a point from x to S(x) is |x − S(x)| 2 , the corresponding secondary transmission cost of the transportation mapping is defined as In all of the sets of the self-mapping preserving measure, the mapping with the minimum transmission cost is called the optimal transmission mapping.
Wasserstein distance W 2 (μ, v) is defined as the transmission cost of the optimal transportation map [31], which is expressed as follows: Brenier [33] proved that a convex function u: Ω ⟶ ∼ + exists, and its gradient mapping, namely, S: x↦∇u(x), is the unique mapping of the optimal transportation.
f and g are two probability density functions on Ω, satisfying μ � fdx and v � gdx. If μ retains absolute continuity in the sense of the Lebasque measure, w, then (i) a unique optimal transmission map S exists and (ii) a convex function u: Ω ⟶ Ω, exists; so, the optimal transmission map satisfies S � ∇u.
If the above conclusions are substituted into equation (3), we obtain the following equation for any ψ ∈ C ∞ c (Ω): us, Under the condition S � ∇u, μ satisfies the Monge-Ampère equation.
e Monge-Ampère equation was proposed by Monge [34] and Ampère [35] and has been widely used in the optimal transport theory and astrophysics [33]. e information presented above indicates that we must solve the Monge-Ampère equation to obtain the optimal transmission map. However, the high-dimensional Monge-Ampère equation has no analytical solution, and the numerical solution is also very complex. For the one-dimensional case, the traditional histogram equalization is the optimal transmission mapping [36,37]. If f and g are two continuous 1-D probability distributions, their corresponding cumulative distribution function (CDF) is denoted as e optimal map is as follows: Assume that there is a seismogram with the length of T, and the 1-D quadratic Wasserstein distance W 2 (f, g) can be expressed as follows [27]:

Adjoint Source Based on the W2-Norm.
EWI is a partial differential equation constrained optimization problem, which updates the model by a gradient-based optimization method. Following the adjoint-state method, the Fréchet kernels are always obtained using the time integration of the interaction between the forward and adjoint wavefield. As we know, the Fréchet kernels determine the direction of model updating, while the adjoint source is the premise of Fréchet kernels calculation. erefore, the derivation of adjoint source and gradient is critical to make it easy to implement in the adjoint-state method. Based on the early arrival information, we gave the strict continuous form of the gradient direction of the target functional under the W2 metric. e continuous form of the adjoint source can be seen as follows: where g(t) and f(t) are the synthetic and observed data, respectively, h(t) is the perturbation of the theoretical waveform, and ε is a small constant.

Advantages of the Quadratic Wasserstein Distance.
ere are a few of advantages for W2-norm compared with the L2-norm from a mathematical point view. e main advantages include (1) the convexity with respect to time shift and amplitude change benefit from the feature of geometrical preserving; (2) the insensitivity to noise. We set up two simple numerical examples to illustrate it.
First, we assumed that the observed dataset f is a Ricker wavelet and the synthetic dataset g has a time shift compared with f ( Figure 2(a)). e dominant frequency of both signals is set as 10 Hz, and the time delay between f and g was set as greater than the half-period signal. It can be seen that there are 2 separate signal events existing in the adjoint source of L2-norm (Figure 2(b)).
is will yield an incorrect Fréchet kernels calculated by correlating it with forward wavefield. e L2-norm misfit as a function of the shift between f and g is plotted in Figure 2(d), which shows that two local minima distribute near the global minima.
is behavior may make it difficult for waveform inversion to converge towards global minima when the initial model   is inaccurate. In contrast, the adjoint source of W2-norm has only a single event (Figure 2(c)), and their W2 distance function is a strictly convex function (Figure 2(e)). erefore, when the gradient-based method is used to obtain the optimal solution, it is easy to trap in the local minima for L2-norm, but it can accurately converge to the global minimum for W2-norm.
For 1-D signals f and g with N data points, Yang et al. [27] proved that the influence of a uniform noise distribution U [−c, c] on the L2 distance is O(1). L2 distance is largely affected by noise regardless of the number of points, whereas its effect on the W2 distance is O(1/N). us, if there are enough number of data points, the influence of noise on misfit function is negligible even if the noise is very strong. Here, we provide another simple numerical example to compare the difference in dealing with noise for the L2norm and W2-norm. Assuming that there are two onedimensional waveform signals, representing synthetic waveform ft and observed waveform gt, we added uniform random noise to the observed waveform gt to obtain gt + noise (Figure 3(a)). We calculated the distance between ft and gt and between ft and gt + noise using the L2-norm and W2-norm. e results shown in Figure 3(b) indicate that the noise has a large effect on the L2 distance, but it has little effect on the W2 distance.

Quadratic-Wasserstein-Metric-Based Early Arrival
Waveform Inversion. Early arrivals are defined as signals arriving after the first arrival within a few periods, which can be selected using a time window [T start., T end ]. For a single full waveform f(t), the individual misfit function χ i based on the W2 distance for early arrivals can be expressed as where T start and T end are the start and end of the time window, respectively. Following the adjoint-state method, the perturbation of individual misfit δχ is linearly related to perturbations of P-wave velocity α, perturbations of S-wave velocity β, and perturbations of density ρ: where K α , K β , and K ρ are the corresponding kernels for these model parameters (α, β, and ρ). e 2-D spectral element method (specfem2d) was used for simulating seismic wave propagation and to calculate the Fréchet derivatives numerically. e FLEXWIN open-source package [38] was used to select the time window for the early arrivals, in which the W2-based adjoint sources and misfits were calculated. e overall misfit function χ(m) for all of the selected windows is where m is the current Earth model, N e ω is the number of time windows of source e, E is the total number of sources, and N ω is all of the selected time windows for the early arrivals. We use equation (13) to calculate the adjoint source in the corresponding time window, and Fréchet kernels are calculated by the interaction between the forward and adjoint wavefield. en, all of Fréchet kernels were summed up to get the gradient, indicating  (e) waveform fitting of early arrival between the synthetic data (red) and observed data (black) for the L2-norm EWI; (f ) waveforms fitting of early arrival between the synthetic data (red) and observed data (black) for the W2-norm EWI. 6 Shock and Vibration Shock and Vibration the direction of EWI. e L-BFGS method [39] was used for the model updating, and the inversion would be terminated once the misfit decreases not quantifiably obvious.

Synthetic Experiments
To confirm the application of the W2-norm EWI, we tested the algorithm using two synthetic tests in which the model has typical features of near-surface structure.

Model with Velocity Anomalies.
We set up a target model (Figure 4 e results are shown in Figure 4(b). e results show that a smoothed model was attained after 10 iterations, which was used as the initial model for the L2-norm EWI and W2-norm EWI and was used to recover the low-wavenumber detail in shallow structures. Figure 4(c) shows the L2-norm EWI solution. It can be seen that the L2-norm EWI clearly constrains the contours of the low and high velocity anomalies, and their positions are also consistent with those in the target model. However, the inverted anomalies are smaller than the target anomalies. We also found that there was a high-velocity artifact below the low-velocity anomaly. For the W2-norm EWI, the solution (Figure 4(d)) not only recovered the clear structures of the low and high velocity anomalies but also constrained their positions close to the target model. In addition, the W2-norm EWI results contain fewer artifacts at the bottom of the low-velocity anomalies than L2-norm EWI. is suggests that the W2-norm EWI helps to tightly recover the near-surface velocity structures. Figures 4(e) and 4(f ) display the corresponding waveform fitting between the observed waveform (black) and the synthetic waveform (red) associated with the L2-norm EWI and the W2-norm EWI at a shot gather. Figure 4(e) shows that it is reasonable for the overall waveform fitting, while the synthetic waveform does not actually match the observed waveform at the far offsets (4000-5750 m). Figure 4(f ) shows the corresponding waveform match after the W2-norm EWI for the same shot   Figure 9: (a) Waveform fitting of early arrival between the synthetic data (red) and observed data (black) for the L2-norm EWI; (b) waveform fitting of early arrival between the synthetic data (red) and observed data (black) for the W2-norm EWI.
gather. e waveform fitting is obviously improved, especially for the seismogram at far offsets. Figure 5 shows the normalized misfit over iterations of the L2-norm EWI and W2-norm EWI. It shows that the W2norm EWI (dashed line) converges faster than the L2-norm EWI (solid line). e result indicates that W2-norm EWI has fast convergence speed, which can greatly improve the computational efficiency of waveform tomography.

Model with Hidden Layers.
In seismic exploration, hidden layers cannot be solved by refraction travel time tomography [40]. is is why EWI has become important for dealing with complex near-surface problems. Here, we set up a synthetic test of a model with a hidden layer to confirm the capability of the W2-norm EWI in terms of solving hiddenlayer problems. Figure 6(a) shows the target model, which contains a low velocity layer (2000 m/s) and a high velocity layer (3200 m/s). We chose the same geometry set up used in Section 3.1, and Ricker wavelet was set as the source wavelet. e background velocity model after excluding the hiddenlayer was used as the initial model for the travel time tomography. As we expected, the first-arrival tomography failed to solve the low velocity layer (known as the hiddenlayer) in the target model, producing a smooth result (Figure 6(b)), which was used as the initial model for the L2norm EWI and the W2-norm EWI. As shown in Figure 6(c), the L2-norm EWI constrained the layers with clear contours, but a high-velocity artifact was generated above the layer. In addition, the low-velocity layer was bent instead of being a straight horizontal layer in the inverted model. Several highvelocity artifacts were also produced below the low-velocity layer. In comparison, the W2-norm EWI solution ( Figure 6(d)) can successfully recover the clear outline both the high-velocity and low-velocity layers and constrain the layers to the accurate position. Also, there is almost no highvelocity artifacts observed above and below the layer. Figures 6(e) and 6(f ) show the corresponding waveform fitting between the observed waveform (black) and the synthetic waveform (red) from a same common shot gather associated with the L2-norm EWI and W2-norm EWI, respectively. Because of suffering from cycle-skipping problem, a bad waveform match is seen in Figure 6(e), which indicates that the result of L2-norm EWI is still far from the true model. Figure 6(f ) shows that the cycle-skipped data were mitigated when using the W2-norm EWI. is suggests that W2-norm EWI can avoid the cycle-skipping problem as well as providing a more reliable near-surface solution when hidden layers exist.
As was discussed in Section 3.1, one of the advantages of the Wasserstein distance is its capability to handle noise. To confirm the insensitivity of the W2-norm EWI to noise, we repeated the previous inversion using the data by adding uniform random noise. For the target model in Figures 4(a) and 6(a), we generated the synthetic waveform using spec-fem2d. en, random noise with the signal-to-noise ratio (−4 dB) was added to the data. e other settings were the same as in the previous numerical experiment. Figures S1(a) and S1(c) show the results of the L2-norm EWI and W2-norm EWI with uniform random noise for the synthetic test on the target model with anomalies, respectively. e W2norm EWI has a better solution for the velocity anomalies than the L2-norm EWI after the addition of noise. Similarly, the inverted model for the W2-norm EWI ( Figure S1(d)) also constrains the shapes and locations of both the high-and low-velocity layers better than the L2-norm EWI ( Figure S1(b)). is suggests that the W2-norm EWI can still converge reasonably well even when the noise is strong.

Applications in Field Data
We applied the W2-norm EWI to a real dataset from China. e dataset consists of 210 shots with a 40 m spacing and 450 receivers with a 20 m spacing. Figure 7(a) shows a typical shot gather (from shot 50). We manually picked the firstarrival travel times for the first-arrival travel time tomography. e results in Figure 7(b) show a rather smoothed velocity model, which can be used as a proper initial model for the EWI. en, we muted the data with an early arrival time window using the FLEXWIN package [38] and conducted trace normalization [14]. To reduce the nonlinearity of the waveform misfit function, we adopted a hierarchical strategy, in which we successively inverted through 4 stages from low to high frequency: 10 Hz (stage 1), 20 Hz (stage 2), 30 Hz (stage 3), and 40 Hz (stage 4).
We performed two workflows: (1) the L2-norm EWI using the travel time tomography result as the input model; and (2) the W2-norm EWI using the travel time tomography results as the input model. We ran the same iterations for both methods. Overall, the L2-norm EWI (Figure 8(a)) and W2-norm EWI (Figure 8(b)) produced more velocity details than the travel time tomography alone. It can be seen that  the L2-norm EWI produced several high-velocity layers (e.g., at depths of ∼200 m and 280 m) that do not exist in the travel time tomography solution. However, the interface of these layers also seems smoothed, and the low-velocity layers between the high-velocity layers are not obvious. In contrast, the W2-norm EWI shows more velocity details than the L2norm EWI. In particular, several low-velocity layers/hidden layers were recovered among the high-velocity layers (highlighted by the arrows in Figure 8(b)). is is in agreement with the conclusion discussed in Section 3.2, i.e., that the W2-norm EWI has the capability to solve hiddenlayer problems.
Figures 9(a) and 9(b) display the corresponding waveform fitting between the observed data (black) and the synthetic data (red) associated with the L2-norm EWI and the W2-norm EWI of the common shot gather (CGS) at shot 50, respectively. e waveform data are significantly better matched by the W2-norm EWI than the L2-norm EWI. Figure 10 shows the normalized waveform misfit associated with L2-norm EWI and W2-norm EWI results at each stage during inversion, and the value of relative misfit function is shown in Table 1. It indicates that the W2-norm EWI significantly speeds up the inversion and converges faster than the L2-norm EWI at any stage.

Discussion
e method proposed in this work was designed to partly solve the problems relevant to L2-norm used as the objective function in the framework of EWI. e quadratic Wasserstein distance was used instead. e change in the misfit measurement between the predicted and observed data seismograms appears to produce some advantages, e.g., it is capable of mitigating cycle-skipping problems, allowing accurate convergence to the global minimum in the EWI. In addition, it was demonstrated to be insensitive to noise.
ese features indicate that the W2-norm EWI may be a more robust strategy for near-surface imaging than L2-norm EWI.
As is well known, nonuniqueness has been a fundamental and common issue in geophysical inversion, especially in EWI. Our work does not completely solve this problem, but it provides a good breakthrough point to study EWI solutions using the W2-norm for near-surface structures. Based on this framework, more waveform databases (e.g., first-arrival travel time and early arrival envelope) can be used for EWI and can be jointly inverted to produce improved near-surface solutions.
Wavelet extraction is critical in real data applications of FWI. Previous studies have reported that there are several methods to extract wavelets. Yilmaz [41] proposed that wavelet extraction can be conducted through deconvolution if the zero phase (or minimum phase) of the source wavelet is true. However, in reality, the source wavelet consists of mixed phase. Pratt [10] suggested that the source wavelet can be inverted during the first iteration if assuming that current model is accurate. Another way is to simultaneously invert the source wavelet and the velocity parameters in the FWI [42]. In this study, we stacked the first-arrival along the first breaks to extract the source wavelet from the real data [42].
Using the W2-norm in EWI benefits from the relatively simple and stable waveforms of early arrivals (arriving after the first arrival only within a few periods). In contrast, FWI using full waveform information has more challenges in real data applications because of the large number of practical problems, such as the waveform data-signal processing problems in signal-to-noise ratio improvement, the effect of rugged topography beneath real irregular receiver acquisition, a very complicated near-surface effect. Ignoring any issue in real data waveform inversion applications could cause the inversion results to be unexplained or unstable in FWI, which should be investigated further in future work.

Conclusions
In this study, we developed an early arrival waveform inversion based on the quadratic-Wasserstein-metric for nearsurface imaging. First, we described the excellent properties of the Wasserstein metric that preserves the geometric characteristics of the probability distributions in detail. is feature results in the objective function based on the Wasserstein metric having a better convexity and helps mitigating the cycle-skipping problem in conventional FWI based on the L2-norm misfit function. en, we gave the continuous form of the adjoint source and the Fréchet gradient under the W2-norm misfit function for seismic early arrival. e synthetic tests revealed that the method can mitigate cycle-skipping issues and map accurate near-surface velocity structure with high-resolution. In addition, the W2-norm EWI is capable of handling noise. e application on a marine dataset revealed that the solution of our new method can recover more details in the near-surface region, especially when dealing with hidden layers.
Nevertheless, the application of the W2-norm EWI is not sufficient to solve the nonuniqueness in geophysical inverse problems. More seismic data (e.g., travel time and envelopes) should be used in EWI to jointly invert an improved near-surface solution, which could provide a good initial model for FWI. In addition, topography variation in nearsurface may produce artifacts leading to incorrect result during inversion; thus, the influence of rugged topography on EWI should be addressed and studied further in future work.
Data Availability e data and the code in this paper can be obtained by contacting the corresponding author.

Conflicts of Interest
e authors declare that they have no conflicts of interest or personal relationships that could have influenced the work reported in this paper. Shock and Vibration 11