Preconditioned Conjugate Gradient Algorithm-Based 2D Waveform Inversion for Shallow-Surface Site Characterization

+e full-waveform inversion (FWI) of a Love wave has become a powerful tool for shallow-surface site characterization. In classic conjugate gradient algorithm(CG) based FWI, the energy distribution of the gradient calculated with the adjoint state method does not scale with increasing depth, resulting in diminished illumination capability and insufficient model updating. +e inverse Hessianmatrix (HM) can be used as a preprocessing operator to balance, filter, and regularize the gradient to strengthen themodel illumination capabilities at depth and improve the inversion accuracy. However, the explicit calculation of the HM is unacceptable due to its large dimension in FWI. In this paper, we present a new method for obtaining the inverse HM of the Love wave FWI by referring to HM determination in inverse scattering theory to achieve a preconditioned gradient, and the preconditioned CG (PCG) is developed. +is method uses the Love wave wavefield stress components to construct a pseudo-HM to avoid the huge calculation cost. It can effectively alleviate the influence of nonuniform coverage from source to receiver, including double scattering, transmission, and geometric diffusion, thus improving the inversion result. +e superiority of the proposed algorithm is verified with two synthetic tests.+e inversion results indicate that the PCG significantly improves the imaging accuracy of deep media, accelerates the convergence rate, and has strong antinoise ability, which can be attributed to the use of the pseudo-HM.


Introduction
e shear modulus is one of the vital engineering parameters directly related to the strength of a near-surface medium, with the shear wave (S-wave) velocity being a critical indicator. e multichannel analysis of surface waves based on dispersion curves has been one of the most accepted methods of shallow-surface S-wave velocity imaging [1,2]. However, this method applies to a pseudo-layered medium that has gentle horizontal variations, and the resultant image of a pseudo-two-dimensional (2D) profile is formed by interpolation. is type of procedure limits the vertical and horizontal resolution of the method. erefore, a method for extracting dispersion curve clusters by scanning multichannel seismic data with the application of the localized slant stack method to multichannel single-shot records has been presented to achieve 1.5D imaging [3]. is method, however, takes advantage of the multichannel dispersion information to the maximum extent and still cannot be used to implement 2D or 3D imaging. Furthermore, Li et al. [4,5] proposed a dispersion curve wave-equation theory that formulated wave-equation dispersion (WD) inversion. 2D S-wave velocity imaging could be directly implemented in this method, and the method was free from the cycleskipping phenomenon. However, this method required preprocessing data via mode decomposition and noise suppression, so the ground-state energy in the surface waves was dominant overall frequency bands and not perturbed by the excited-state energy. ese procedures are complicated, and the images acquired do not have high precision. For the above methods based on dispersion analysis, it is difficult to obtain quality results in shallow exploration, and the engineering objectives sometimes cannot be met.
By using all the amplitude and phase information for seismic datasets, full-waveform inversion (FWI) [6,7] matches the observed and predicted data to reveal a highresolution and high-precision underground velocity model, and it can meet the need for velocity parameters imposed by complex structural imaging. is technique has gradually become a research hotspot in the field of seismic data inversion [8][9][10]. Previous FWI works have primarily focused on the inversion of P and S waves [11][12][13][14], but the development of surface wave FWI has become feasible [8] due to the dominance of surface wave energy in the near-surface wavefield (the vertically induced Rayleigh wave accounts for 70% of the horizontally induced Love wave for 90% of the near-surface wavefield) [15]. Compared with a Rayleigh wave, a Love wave is formed by the interference of multireflected SH-waves near a free surface [16] and is independent of the P-wave velocity while being more sensitive to the S-wave velocity. ese properties have led to a relatively efficient and straightforward FWI framework for Love waves. Gradient-based local optimization algorithms such as the steepest descent algorithm [6] and conjugate gradient algorithm [17] have been widely used in solving FWI, a large-scale and strong nonlinear optimization problem, with low computational cost. However, the calculated gradient of the adjoint-state method [18] contains a singular value and uneven energy distribution, both of which are caused by the double scattering and limitations in the wavefield illumination and the wavelet frequency band. Hence, the model gradient amplitude at depth decreases rapidly, along with the diminished illumination capability, and the model parameters cannot be updated with apparent improvements.
To alleviate this problem, four different approaches have been proposed and developed: the distance-weighting, layerstripping, and full Hessian matrix (HM) schemes and the approximate-HM strategies. e distance-weighting [19,20] method adjusts the gradient amplitude across the model space by multiplying the gradient by a function of the distance from the source, and this method balances the energy distribution of the gradient by giving greater weight to the far offset and deep regions. Based on this idea, the layer-stripping method [21] divides a model into several parts according to different depths, and these parts are updated continuously according to the rules, from shallow to deep. erefore, the gradient in the deep region is not suppressed due to the large gradient amplitude in the shallow region, and the reconstruction effect in the deep region is improved. Both methods can improve the accuracy of model reconstruction to a certain extent. However, the function of distance or depth cannot completely or precisely express the scaling of a gradient.
ere is no objective standard to ensure the concrete implementation of these methods. For example, there are no standards for determining how to set the weight value of distance or depth in the distance-weighting method or how to determine where and when to move from one layer to the next in the layerstripping method. Different criteria will lead to different inversion results. e full-HM method relies on the advantages of second-order optimization algorithms such as the Newton [7] and Gauss-Newton [22] algorithms. A second-order partial derivative of the cost function, known as the HM, contains information about the curvature that can be used to clearly predict the illumination deficiency at depth in the model [14]. us, an inverse HM can adjust the ratio and balance the correction terms of the gradient to strengthen the illumination of the deep part of the model. Pratt et al. [22] implemented FWI in the frequency-spatial domain through a Full-Newton algorithm and provided a complete calculation method of the full-HM. is method greatly improved the inversion accuracy, but the calculation cost was too expensive to be acceptable under conventional conditions. In addition, Métivier et al. [14] calculated the full-HM based on the second-order adjoint state method and proposed the truncated-Newton method, which improved the calculation efficiency and ensured the robustness compared with the Full-Newton method. After that, some new Newtonclass algorithms emerged [23,24], but none of them could greatly reduce the computational cost of the full-HM. e approximate-HM employs the idea of estimation or construction to obtain the HM, solving the dilemma of determining the HM explicitly with current computing capabilities. e most common method is calculating the HM with the reciprocity theorem and the virtual source theory. Pratt et al. [22] pointed out that the approximate-HM was a diagonally dominant matrix with a high-frequency approximation. Only M diagonal elements in the HM are needed in the inversion involving M parameters. is fact greatly simplifies the calculation of the HM. Shin et al. [25] provided an estimating method for the diagonal terms in the approximate-HM in prestack depth migration based on inverse scattering theory. However, they failed to complete factors related to the geometric spreading. Later, Choi et al. [26] improved this estimation method by considering the amplitude of the seismic data in calculating the diagonal terms. However, this new approximate-HM only explained the source geometric spreading. Another approach is to construct the approximate-HM by combining information from a gradient, model, and cost function, including L-BFGS [27,28], the corrected pseudo-Newton algorithm [10], and the pseudo-Newton algorithm [29]. ese methods have further simplified the approximation method of the HM and achieved relatively effective results, but some accuracy has been lost.
In this paper, we present a new method to estimate the HM in Love wave FWI based on the calculation method of the HM in inverse scattering theory, and the preconditioned CG (PCG) is developed in combination with the CG. e proposed method uses the stress components of the Love wave wavefield to construct a pseudo-inverse HM to filter, balance, and regularize the gradient and alleviate the influence of nonuniform coverage from source to receiver, including double scattering, transmission, and geometric diffusion, thus improving the inversion accuracy. Because the required stress components have been obtained in the forward simulation, no extra calculation overhead is needed, thus ensuring the calculation efficiency and saving the calculation costs. Moreover, the construction of the gradient and the HM are derived in detail, and the implementation procedure of the PCG-based Love wave FWI is described in detail. e trial results from the checkerboard model and the complex structure model indicate that compared with the classical CG, the PCG can clearly rebuild the stratigraphic anomaly boundary and reconstruct the primary stratigraphic anomaly at depth, and it has strong antinoise ability. e advantages of PCG are due to the presence of the pseudo-HM.

Methodology
e core concepts of FWI are to fit the predicted wavefield from the forwarding modeling to that observed in the field and minimize the misfit between these two datasets. erefore, the optimal modeled data can be achieved. e optimization includes several key steps, such as the forward simulation of the seismic wavefield, the establishment of the optimization algorithm, the determination of the gradient, and the determination of the HM of a cost function.

Forward Problems.
Wave equations describe the physical process of seismic waves as they travel through a medium. e forward problem involves obtaining the wavefield at the receiving point by solving the wave equation when the underground medium parameters and the source function are known. In a 2D isotropic medium, assuming that a particle has only a nonzero displacement in the y-direction of the X-Z plane (X is the horizontal distance and Z is the depth), coupled with Hooke's Law, the pseudo-conservative Love wave elastodynamic equation [30] in the time domain is where ρ(x, z) is the density, V s (x, z) is the S-wave velocity, σ xy (x, z, t) and σ zy (x, z, t) are the shear stresses, v y (x, z, t) is the particle velocity, and F(x, z, t) is the seismic source excited by an external force. A general form of (1) is where D(m) is the partial-differential operator, u(m) is the seismic wavefield, φ is the external force loaded, m is the parameterized model in terms of the S-wave velocity and the density, m � (V s , ove it rem) � (m 1 , m 2 , . . . , m N ), and N is the number of model parameters. e high-order staggeredgrid finite-difference method [30] with second-order temporal accuracy and tenth-order spatial accuracy is used to solve (1) and (2) discretely. e free-surface boundary of the top part of the model is processed with the stress-image method, and the bottom, left, and right boundaries of the model are processed with the multiaxial perfectly matched layer technique. e number of absorption layers in three directions is 50. e source is a Ricker wavelet, and it is loaded into v y (x, z, t).

Inversion
Problems. FWI is a highly nonlinear problem, so nonlinear inversion theory should be adopted to solve it. In order to measure the fitting degree of the predicted data d pre and the observed data d obs , as well as the residual energy between these data, we define an L2-norm-based cost function: where ‖ · ‖ 2 denotes the L2-norm calculation, R is the mapping from the wavefield to the receiver arrays, and u(m) is the solution to equation (2). e inversion process involves finding a parameterized model that perfectly predicts the observed data by minimizing the cost function through a global or local optimization method. e global algorithm can basically avoid the local minimum problem, but a large amount of computation hinders its popularization and application. e local algorithm [31] is still widely used at present, and it is implemented by updating the model parameters iteratively, for which the iteration sequence is expressed as where m k , α k , and Δ m k are the model parameters of the kth iteration, the optimal step length estimated with the parabolic fitting method [32], and the descent direction of the model update, respectively. A Taylor series expansion with second-order precision is performed on the cost function (3) at the (k+1)th iteration: e derivative of the cost function E(m k+1 ) with respect to Δm k is assumed to be zero in order to find the minimum of the cost function near the model m k : e descent direction is obtained by rearranging (6): where ∇E(m k ) and H −1 k are the gradient and the inverse HM of the cost function, respectively. e inverse HM in the gradient-based local algorithms is substituted for by the identity matrix, and the CG is widely used in FWI with the descent direction: where β k+1 is the scalar parameter of the (k + 1)th iteration and the superscript T denotes the matrix transpose operator.
Shock and Vibration e core idea of the preconditioned gradient-based local algorithms is to solve or construct the inverse HM (7), and the descent direction is e value of β P is calculated with a formula from Métivier et al. [14]. To accelerate the convergence of the cost function, the scalar parameter is made nonnegative [33]:

Adjoint-State Gradient and Preconditioned
Operator. e acquisition of the cost function gradient is the core of the local optimization methods, which can be calculated with the explicit method [8] and the adjoint state method [18]. e explicit method obtains gradients by calculating the actual derivatives of the cost function concerning the various model parameters, which requires as much forward calculation as the model parameters. is type of computation is unacceptable for larger models. e adjoint state method overcomes this problem and only requires two forward calculations to obtain the gradient. In the adjoint-state method, it is pointed out that the gradient is the zero-lag cross-correlation of the forward-propagating wavefield and the backward-propagating wavefield obtained from the backward-propagating data residual at the geophone. is relationship is presented with detailed derivation in [34]. e backward-propagating wavefield can be determined with the adjoint function as follows: where (·) * H denotes the conjugate transpose, c is the backward-propagating wavefield to be calculated, and u(m) is the solution to (2) corresponding to the designated seismic source terms. Specifically, the source term on the right-hand side of (2) is a reverse time series consisting of the differences between the observed data and the predicted seismic data. Hence, the cost function gradient to the model parameters can be calculated according to the following equation: where real denotes real arithmetic and (·, ·) ω is the scalar product of the enclosed terms in the inner product space ω. Based on this equation, combined with the chain rule [17], the Love wave FWI gradient can be calculated with respect to the S-wave velocity and the density: where ∇E vel and ∇E den are the gradients of the S-wave velocity and the density, respectively, S represents the number of shots, T represents the sampling points in the time domain, σ f xy and σ f zy are the shear stresses of the forward-propagating wavefields, and σ b xy and σ b zy are the shear stresses of the backward-propagating wavefields.
Shin et al. [25] promoted the development of the preconditioned approach in FWI. Due to the narrow frequency band of the observed data, the double scattering, and other reasons, the gradient amplitude calculated by (13) and (14) does not scale with the depth increase. erefore, the illumination with the depth is lost, and the parameters related to the medium cannot be updated. e HM can clearly predict the above problems because it contains the curvature information. e inverse HM, also known as the preconditioned operator, can be used as a deconvolution operator to adjust the proportion and balance the correction terms of the gradient, eliminate the limited band effect, and enhance illumination for deep regions. However, the vast dimensions of the inverse HM make it almost impossible to calculate this matrix with the current computing power. Nevertheless, we can construct a pseudo-HM using the reciprocity theorem [14,25]. In this study, a new pseudo-HM in Love wave FWI is solved by referring to the HM calculation method of the inverse scattering theory [25], which can suppress the artifacts caused by multiple scattering in the gradient. Based on this, the form of the pseudo-HM can be expressed as where M is the model parameter space. is equation is transformed into a parameterized expression of the S-wave velocity and the density: 4 Shock and Vibration where H vel and H den are the pseudo-HMs of the S-wave velocity and the density, respectively. e inverse HM is diagonally dominant for the high-frequency approximation [22], and only the diagonal elements must be calculated. In addition, an adjustment factor is added to avoid the inversion instability caused by the near-zero value generated by the geometric diffusion of the wavefield on the diagonal. e preconditioned operator is adjusted at the kth iteration to where ε is an empirical parameter. e algorithm has high performance when ε � 10 − 5 -10 − 2 after various numerical tests. Eventually, the preconditioned operator is adjusted by L2-norm from the cost function gradient at the current iteration with Equation (19) is inserted into (9), resulting in the PCG.

e Workflow of the PCG-Based FWI.
Combining the principles of the core part of the Love wave FWI described in Sections 2.1-2.3, we give the detailed workflow of the PCGbased Love wave FWI as follows ( Figure 1): (1) An initial velocity model, the trial parameters (e.g., record time, maximum iteration number, and empirical parameter ε), and the iteration termination criteria are set.
(2) Information about the model velocity is input, equation (1) is solved to obtain a forward-propagating wavefield, and predicted seismic datasets are acquired. (3) e predicted seismic datasets are matched with the observed data, the cost function, and the wavefield residual calculated with equation (3). (4) A wavefield residual is set as the seismic source, a backward-propagating residual wavefield is initiated, and a backward-propagating wavefield is acquired. (5) Advantage is taken of the calculated forward and backward propagating wavefields to determine the gradient with equations (13) and (14), and then the pseudo-HM is obtained with equations (16) and (17). (6) Steps (2)-(5) are repeated until the gradient, and the pseudo-HM are calculated and stacked from all shots. (7) A preconditioned operator is calculated with the calculated stacked pseudo-HM using equations (18) and (19). (8) e calculated stacked gradient and the preconditioned operator are used to determine the scalar parameters with equations (9) and (10). (9) e optimal step size is then calculated with the parabolic fitting method. (10) e velocity model is updated by equation (9) with a calculated stacked gradient, preconditioned operator, and optimal step size. (11) It is judged whether the iteration convergence condition or the maximum iteration number is met. e velocity model is output if either criterion is met; otherwise, Steps (2)-(10) are repeated.

Model Tests
e PCG and the CG are simultaneously applied to the noise-free and noise-containing tests of the two models to verify the antinoise capability and the aforementioned advantages of the PCG over the CG. Because the Love wave is much more sensitive to the S-wave velocity than to the density [15], only the S-wave velocity is updated in the inversion, and the density is not updated. In all tests, the density of the model is 1800 kg/m 3 , and the density and the source wavelet are treated as known quantities. e coordinates of the first shot and the first receiver are (0 m, 0 m). In total, 2000 points are sampled in the time domain with a sampling interval of 0.0002 s, resulting in a total seismic record of 0.4 s. e maximum number of iterations is set to 40. e iteration is terminated automatically once the decline of the cost function value of the two adjacent iterations is less than 0.00001 times the initial value (k � 0). To quantitatively assess the inversion result precision and to compare the strengths and weaknesses among different methods, a root-mean-square error (RMSE) with the following form is introduced: where m inv and m true are the inversion results and the true values of the S-wave velocity model, respectively.

Noise-Free Data.
e first test example is a checkerboard model (Figure 2). e grid has a size of 41 × 101 in the vertical and horizontal directions with a step size of 0.5 in either direction. Hence, the actual model size is 30 × 40 m. e checkerboard is composed of 5 × 10 m blocks, and the S-wave velocity of each block is 300 m/s or 500 m/s, determined by the different positions (Figure 2(a)). e S-wave velocities of 300 m/s to 500 m/s are very common on a shallow subsurface. All 21 shots and 101 horizontal-component receivers (the white arrows and the solid yellow circles in Figure 2(a) represent the sources and receivers, respectively) are deployed on the surface in the x-direction with intervals of 2.5 m and 0.5 m. e initial model is set to a half-space with the S-wave velocity of 400 m/s (Figure 2(b)). e checkerboard model causes the transmitted seismic waves to scatter violently. When the seismic wave propagates to the boundary of each checkerboard block, the apparent scattered wave is generated due to the sudden change of the wave impedance, which can be clearly observed (Figure 3(a)). Furthermore, the first break Love wave with a shorter travel time than the scattered waves can also be observed (Figure 3(a)). ese two sets of waves occupy most of the wavefield's energy ("L" and "S" in Figure 3(a) indicate these two seismic events). Scattered waves are ubiquitous in shallow subsurface exploration, such as the existence of cavities or high-velocity anomalies in the ground.
Because of the insufficient illumination and uneven energy distribution of the gradient operators, the normalized descent direction in the first iteration of the CG presents artifacts near the free surface of the model, and its amplitude decreases sharply in the deeper medium (Figure 2(c)), which seriously affects the updating of the deep medium parameters and the stability of the inversion process. Because the PCG calculates the pseudo-HM of the cost function, the newly calculated descent direction contains the curvature information and balances the energy of the gradient operator so that it has the uniform amplitudes of the orders of magnitude in the shallow and deep parts of the model (Figure 2(d)). is improves the inversion accuracy of the deep medium, which suggests that the PCG can strengthen the wavefield's illumination capability and smooth the artifacts caused by double scattering and source coupling. Figures 2(e) and 2(f ) present the inversion results of the CG and the PCG. e CG can accurately reconstruct shallow checkerboard blocks and roughly display their boundaries, yet it cannot reconstruct deep checkerboard blocks (Figure 2(e)). Figure 3(g) also shows that the waveforms of inversion results fit poorly with the observed waveforms (the part with the worst fitting degree is enclosed by a green square and displayed in a magnified display). In contrast, the PCG reconstructs the model more precisely (Figure 2(f )), which is intuitively discernable in the S-wave velocity profiles in Figures 3(c) and 3(d). It is noted that due to the issue of the overestimation and underestimation of model parameters, the final inversion results have an upper bound of the velocity in the deep layer that is greater than 500 m/s. In comparison, the lower bound of the velocity at the surface is less than 300 m/s (Figures 3(c) and 3(d)).
is is the phenomefnon of illumination excess, which is caused by overadjusting the gradient operator by the preconditioned operator during the iteration. erefore, a constraint criterion for the model updates should be added in the iterative process to solve this issue. In the optimization process, the PCG has a faster convergence rate than the CG, and the cost function value and the RMSE are lower throughout (Figures 3(e) and 3(f )). e final value of the  normalized cost function value (misfit) of the two algorithms is close to 0 (Table 1), and the waveforms of the observed data and the inversion results are nearly identical (Figure 3(h)). However, there are still some differences between the reconstructed model and the true model, and the final RMSE values of the PCG and the CG are 0.1169 and 0.16, respectively (Table 1). It is suspected that the reason for this phenomenon is either that the initial model is too poor and it is not in the neighborhood of the optimal global solution, there is cycle-skipping, or there are numerical errors.
In order to further illustrate the advantages of the PCG in enhancing the illumination capability and improving the imaging accuracy of deep media, the second test example is a deeper model containing a complex high-velocity anomaly and a fault (Figure 4). e grid in this model has a size of 81 × 101 in the vertical and horizontal directions with a step length of 0.5 m in either direction, resulting in an actual model size of 40 × 50 m. e model has three layers: the first layer has a velocity of 300 m/s, the second layer has a velocity of 400 m/s and a "boot-like" high-velocity anomaly, and the third layer has a velocity of 500 m/s and a fault (Figure 4(a)).       respectively. e initial model without information about the anomalies is set as a gradient model with 40 layers, with a thickness of 1 m per layer and a velocity of 300 m/s for the first layer, and the velocity gradient is 5 m/s (Figure 4(b)). To eliminate the large amplitude of the gradient due to the seismic source coupling effect at a free surface in the CG, the gradient is preprocessed by setting the gradient to zero at the grids that are at the free surface and one line below [35], making our demonstration more convincing. Some of the reflected waves produced by changes in the wave impedance can be observed ("R" in Figure 5(a) indicates the seismic event). e CG inversion results precisely reconstruct the shallow morphology and display part of the high-velocity anomalies, yet they cannot be used to rebuild the fault structure (Figure 4(c)). e waveforms of the inversion results are basically fitted to the observed waveforms. However, the waveforms of the reflected wave are poorly matched ( Figure 5(g), the poorly matched parts have been circled and enlarged in green squares). In contrast, the PCG inversion results reconstruct the anomalies and structures more precisely and clearly delineate the boundaries (Figure 4(d)), and the final RMSE value reaches 0.0320 ( Figure 5(f ) and Table 2). Both parameter overestimation and underestimation still exist (Figures 5(c) and 5(d)). Because the updating of the shallow part of the model contributes greatly to the function value, the convergence rate of the CG is similar to that of the PCG in the first ten iterations (gradient zeroing near the free surface plays a role). Later, due to the insufficient updating of the deep medium parameters, the CG stops the optimization. Because the PCG makes it possible for the deep medium parameters to be updated due to better illumination capability at depth, the cost function value continues to decline. e final value is 0.0064, which is less than 1/9 of that in the CG (Figure 5(e) and Table 2). e waveforms of the inversion results match well with the observed waveforms ( Figure 5(h)). is comparison also confirms the importance of the HM for improving the inversion precision and proves that the PCG is exceptionally effective. In this research, we suggest using the preconditioned gradient-based algorithm such as the PCG in FWI.

Noise-Containing Data.
We apply the PCG to the second test example described above, but 20% and 50% Gaussian white noise is added to the observed data (Figures 6(a) and 6(b)) to verify the strong antinoise capability of the preconditioned gradient-based algorithm. e initial model is the same as that in the noise-free test (Figure 4(b)). e inversion results are shown in Figures 6(c) and 6(d). Even if the observed data contain noise, good inversion results can still be obtained, with which the highvelocity anomaly and the fault can be reconstructed (Figures 6(c) and 6(d)). Because the noise-containing waveforms cannot match the noise-free waveforms, in contrast to the value of the normalized cost function converging close to almost 0 in the noise-free test, the normalized cost function values of the noise test (Figure 7(a)) converge to 0.6725 and 0.8784 when the noise levels are 20% and 50%, respectively (Table 3). e inversion accuracy decreases with the increment of the noise level. However, when 50% noise is included in the observed data, the PCG final RMSE value is still lower and the inversion accuracy is even higher than that of the noise-free CG inversion (Figure 7(b) and Table 3). Additionally, the higher the noise level is, the more serious the parameter overestimation is (Figures 7(c) and 7(d)), which may be caused by the boundary effect. e noise-containing data test shows that the preconditioned gradient-based algorithm has high stability even if the observed data are contaminated by noise.

Discussion and Conclusion
With reference to the theory of inverse scattering, we develop a new pseudo-HM-based preconditioned approach for the 2D FWI of a Love wave to make full use of the wavefield information contained in the HM but not contained in the gradient. In this novel method, the stress components of the Love wave wavefield are used to construct a pseudo-inverse HM to filter, balance, and regularize the gradient. is reduces the influence of the wavefield propagation effects, including the double scattering, transmission, and geometric diffusion on the gradient, in order to improve the inversion results. In addition, because this method is independent of each component of the backward-propagating wavefield, it will not affect the focus of the data residual for the model perturbation, and the method will ensure that the gradient is not distorted. e stress components required by our new method have been obtained in the necessary forward simulation during each iteration, so the method does not have extra calculation costs and it ensures the calculation efficiency. e synthesis tests show that our new method can significantly accelerate the convergence rate, suppress the artifacts near the surface, and improve the model reconstruction accuracy, especially in the deep region of the model that is more affected by the defocus effect and geometric spreading. Furthermore, the PCG also has a strong antinoise ability that is very important for processing field datasets, and we believe that the pseudo-inverse HM plays a key role in this. However, the pseudo-inverse HM overcompensates for the gradient in some cases, resulting in overillumination of the wavefield and overestimation of the parameters, especially near the surface and in the deep region of the model. erefore, the confining criterion of the model update must be added in the iterative optimization, which requires further study. In addition, only the ability of the PCG to resist Gaussian white noise is tested in the synthetic tests, but some special noises such as traffic and electromagnetic noise may exist in the actual situation.
is still requires further testing and discussion. At the end, similar to the application of the PCG to the Love wave FWI described in this paper, a similar method can be applied to a Rayleigh wave and other body waves. It should be noted that the specific form of the proposed pseudo-HM is not invariable for different types of waves, and specific derivation and construction should be carried out according to different wave equations in combination with equation (15).

Data Availability
e data and the code in this paper can be obtained by contacting the corresponding author.