Inversion of Rayleigh Wave Dispersion Curves via Long Short-Term Memory Combined with Particle Swarm Optimization

An essential step in surface wave exploration is the inversion of dispersion curves. By inverting dispersion curves, we can establish the shear-wave velocity model and obtain reliable subsurface stratigraphic information. The inversion of dispersion curves is a multiparameter, multi-extremum inverse problem, and obtaining a high-precision solution is difficult. Among the inversion methods, local search methods are prone to falling into local extrema, and global search methods such as particle swarm optimization (PSO) and the genetic algorithm (GA) suffer from slow convergence and low precision. Deep learning models with strong nonlinear mapping capability can effectively solve nonlinear problems. Therefore, we propose a PSO-optimized long short-term memory (LSTM) network (PSO-LSTM) to invert dispersion curves. The method is based on the LSTM network, and PSO is used to optimize the LSTM network structure and the other parameters that otherwise must be set manually, in order to improve the prediction of the network. Two theoretical geological models, Model A and Model B, are used to test the PSO-LSTM. Model A was tested without noise, and Model B was tested with noise. In addition, PSO and LSTM were tested on Model A for comparison with PSO-LSTM. On Model A, the maximum relative errors of PSO and LSTM are 20.76% and 5.85%, respectively, and the maximum standard deviations of PSO and LSTM are 57.37 and 1.97, respectively. For PSO-LSTM, the maximum relative errors of the inverse results for Model A and Model B are 2.05% and 2.09%, and the maximum standard deviations are 1.23 and 3.87, respectively. The test results of Model A show that the inversion performance of PSO-LSTM is better than that of LSTM and PSO, and that the performance of the network improves after PSO is used to optimize the network parameters. The inverse results for Model B show that the PSO-LSTM is robust and can invert the dispersion curves well even after noise is added to the model. Finally, the PSO-LSTM is used to invert actual data from Wyoming, USA, which demonstrates that the PSO-LSTM can be used for the quantitative interpretation of Rayleigh wave dispersion curves.


Introduction
Rayleigh waves propagate along the free surface or along interfaces between media and are formed by the interference and superposition of P-waves and S-waves. Since the discovery of Rayleigh waves by the British scholar Rayleigh [1], scholars have been investigating theories related to their propagation in strata. In the early stages of research, Rayleigh waves were treated as noise, and researchers studied their characteristics in order to reduce their hazard in earthquakes or to eliminate their impact on valid information in oil seismic exploration [2]. As research deepened, scholars found that Rayleigh waves are dispersive when they propagate in layered media and began to use them to study stratigraphic structure. Compared with conventional exploration methods such as reflection seismic exploration, surface wave exploration has the advantages of nondestructive testing, convenient construction, high resolution at shallow depth, and fast detection speed [3], and it is broadly used in the fields of soil delineation, engineering nondestructive testing, and geological hazard survey.
Surface wave exploration can be divided into three steps: surface wave data acquisition, dispersion curve picking, and inversion of dispersion curves. As the key step of surface wave exploration, the inversion of dispersion curves directly affects the reliability of the resulting stratigraphic information. Currently, there are two main types of methods for inverting dispersion curves: local search methods and global search methods. Local search methods include the least squares method, the Levenberg-Marquardt (L-M) algorithm, and the Occam algorithm. Dorman and Ewing [4] were the first to use the damped least squares method to invert dispersion curves. The S-wave velocities obtained by the inversion were consistent with those obtained by the refracted wave method, confirming the validity of the method. Xia et al. [5] combined the L-M algorithm and the singular value decomposition technique to invert the Rayleigh wave phase velocity. The method improved the convergence speed, computational performance, and stability of the inversion. In addition to the least squares method, the Occam algorithm has also been applied to the inversion of dispersion curves. For example, Ai and Cheng [6] employed the Occam algorithm for the inversion of Rayleigh wave dispersion curves, and the results showed that the Occam algorithm balances the accuracy of the model and the computational speed of the inversion well. Local search methods are extensively used because of their rapid computational speed. However, such methods rely excessively on the initial model, and reliable inversion results can be obtained only if the initial model is close to the real model. In addition, local search methods are prone to falling into local extrema, and because they require partial derivatives, the inversion results are affected by the accuracy of the Jacobian matrix. The development of local search methods is limited by these factors.
Therefore, researchers have turned to global search methods, such as GA and the simulated annealing algorithm (SA), which avoid initial model selection and partial derivative calculation. Shi and Jin [7] used GA for the inversion of dispersion curves. In the inversion, they modified the search range by analyzing the initial search results and thus improved the search efficiency of the method. Yamanaka and Ishida [8] added an elite screening strategy to GA to promote the convergence of the solution. Dal Moro et al. [9] used GA and a posteriori probability density estimation for the inversion of dispersion curves, and the method obtained solutions with higher accuracy than those obtained by GA. In addition, some scholars have used SA for the inversion of dispersion curves. Lu et al. [10] proposed a heat-bath simulated annealing algorithm based on SA. The inverse results showed that the heat-bath simulated annealing algorithm is more suitable for the inversion of dispersion curves than the L-M algorithm. Compared with local search methods, these global search methods do not depend on the initial model or on partial derivatives, but their long run time and low solution accuracy still need to be improved for high-precision surface wave exploration.
Deep learning models build a mapping between signal and semantics through a hierarchical structure similar to that of the human brain, extracting features from the input data layer by layer from the bottom to the top [11]. Deep learning models are excellent at solving nonlinear problems and making fast predictions, and their applications in geophysics have gradually increased in recent years, for example in earthquake detection and localization [12], seismic lithology prediction [13], denoising [14], and fault detection [15]. Deep learning techniques have also yielded impressive results in surface wave exploration. Dai et al. [16] proposed a network specifically for dispersion curve extraction called the dispersion curves network (DCNet). The network can extract dispersion curves quickly and accurately. Tests on theoretical and practical data show that the accuracy of the dispersion curves extracted using DCNet reaches the level of manual picking and can meet the needs of practical work. Song et al. [17] proposed a neural network, Res-Unet++, which can extract dispersion curves accurately and efficiently. Tests on actual data verified that picking dispersion curves with this network gives better results than manual picking. Yablokov et al. [18] developed an artificial neural network for inverting the fundamental-mode dispersion curve of Rayleigh surface waves. In theoretical model tests, the accuracy of the inverse results of this method is better than that of the Monte Carlo algorithm and similar to that of gray wolf optimization. For noisy data, the artificial neural network still works well. Wu et al. [19] proposed an LSTM network that inverts surface waves based on the first height last velocity loss function. Tests on synthetic and real data show that the network can be used effectively not only for theoretical data inversion but also for real data.
The dispersion curve inversion methods mentioned above are summarized in Table 1.
On this basis, the PSO-LSTM is used in this paper to study the inversion of dispersion curves. PSO is used to select the parameters of the LSTM, namely the number of neurons in the hidden layer, the learning rate, and the number of training rounds, to avoid the low prediction accuracy of the network model caused by improper manual tuning of the parameters. To evaluate the ability of the PSO-LSTM to invert dispersion curves in detail, the feasibility of the model was first verified using a model without noise; then the stability of the PSO-LSTM was tested using a model with 10% Gaussian noise; and finally, the ability of the PSO-LSTM to invert actual data was tested using seismic data from the Wyoming area.

Computational Intelligence and Neuroscience

Table 1
Method | Advantages | Disadvantages | Comparison methods
Least-squares algorithm [4] | (1) High calculation speed; (2) High precision of solution | (1) The appropriate initial model needs to be given; (2) The partial derivative needs to be calculated; (3) Easy to fall into local minima | None
Levenberg-Marquardt algorithm combined with the singular value decomposition technique [5] | (1) High calculation speed; (2) Excellent stability | (1) The appropriate initial model needs to be given; (2) The partial derivative needs to be calculated; (3) Easy to fall into local minima | None
Occam algorithm [6] | (1) High calculation speed; (2) High precision of solution; (3) Excellent stability | (1) The appropriate initial model needs to be given; (2) The partial derivative needs to be calculated; (3) Easy to fall into local minima | None
Genetic algorithm [7] | (1) Excellent ability to escape from local minima; (2) Independent of selecting the initial model; (3) Calculation of partial derivatives is avoided | (1) Huge computational time cost; (2) Low accuracy of calculation | None
Genetic algorithm combining elite selection and dynamic mutation strategy [8] | (1) Excellent stability; (2) Excellent ability to escape from local minima; (3) Independent of selecting the initial model; (4) Calculation of partial derivatives is avoided | (1) Huge computational time cost; (2) Low accuracy of calculation | [...] Marquardt algorithm
Genetic algorithms combining marginal posterior probability density estimation [9] | (1) Excellent ability to escape from local minima; (2) Independent of selecting the initial model; [...] | [...] | [...]

Long and Short-Term Memory Network.
The LSTM introduces a gating mechanism, which makes it better at handling time-series problems than the traditional recurrent neural network. The cell structure of the LSTM is shown in Figure 1. In Figure 1, f_t, i_t, and o_t denote the forget gate, input gate, and output gate, respectively; c_t denotes the cell state; h_t denotes the hidden state; x_t denotes the input vector of the LSTM unit; and σ and tanh denote the sigmoid and tanh activation functions, respectively. The core components of the LSTM are the cell state and the gate structure. The cell state can be seen as a channel for information transfer that allows information to be passed on continuously, and the gate structure learns during training whether to keep or forget information. The input gate determines the important information in the current input, which in turn updates the cell state; the forget gate determines which information should be discarded or retained; and the output gate determines the new hidden state h_t. The new cell state c_t and the new hidden state h_t are then passed to the next LSTM cell. The information transfer within the LSTM unit follows equations (1)-(6), where w denotes the weight and b denotes the bias of each gating unit.
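For reference, the standard LSTM cell update can be written with the symbols defined above as follows. This is the common textbook form. The candidate state \tilde{c}_t, the concatenation [h_{t-1}, x_t], the element-wise product \odot, the per-gate subscripts on w and b, and the assignment of equation numbers are assumptions about notation, not a transcription of the paper's equations.

```latex
\begin{align}
f_t &= \sigma\left(w_f\,[h_{t-1}, x_t] + b_f\right) \tag{1}\\
i_t &= \sigma\left(w_i\,[h_{t-1}, x_t] + b_i\right) \tag{2}\\
\tilde{c}_t &= \tanh\left(w_c\,[h_{t-1}, x_t] + b_c\right) \tag{3}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{4}\\
o_t &= \sigma\left(w_o\,[h_{t-1}, x_t] + b_o\right) \tag{5}\\
h_t &= o_t \odot \tanh\left(c_t\right) \tag{6}
\end{align}
```

In this form the forget gate scales the previous cell state, the input gate scales the candidate state, and the output gate scales the squashed cell state to produce the hidden state, which matches the roles of the three gates described in the text.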

Table 1: Continued.
Method | Advantages | Disadvantages | Comparison methods
Heat-bath simulated annealing algorithm [10] | (1) Excellent ability to escape from local minima; (2) Independent of selecting the initial model; (3) Calculation of partial derivatives is avoided; (4) Suitable for parallel programming | (1) Low accuracy of calculation | Levenberg-Marquardt algorithm and fast simulated annealing algorithm
Artificial neural network [18] | (1) Excellent stability; (2) High inversion efficiency | (1) Requires large amounts of training data; (2) Training the network costs a lot of time | Monte Carlo approach and gray wolf optimizer
LSTM based on the first height last velocity [19] | (1) Excellent stability; (2) High inversion efficiency | (1) Requires large amounts of training data; (2) Training the network costs a lot of time | [...]

Figure 1: LSTM basic cell structure [20,21].

Particle Swarm Optimization.
PSO can find the optimal solution quickly through the information interaction between particles. The particles in the algorithm move simultaneously, and each particle accumulates memory and experience as it moves. Each particle compares its own experience with the experience provided by the other particles in the process of finding the optimal solution, so that it continually moves toward the optimal solution. The velocity and position update of PSO is given by equation (7), where w is the inertia weight; c_1 and c_2 are the learning factors; r_1 and r_2 are random numbers in [0, 1]; and v, x, pbest, and gbest are the velocity component, position component, individual optimum, and population global optimum of the ith particle in the jth dimension at the tth iteration, respectively.
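With the symbols defined above, the canonical inertia-weight form of the PSO update reads as follows. It is given here as an assumption about the content of equation (7); the placement of the indices i, j, and t is a notational choice.

```latex
\begin{align}
v_{ij}^{t+1} &= w\,v_{ij}^{t}
  + c_1 r_1 \left(\mathrm{pbest}_{ij}^{t} - x_{ij}^{t}\right)
  + c_2 r_2 \left(\mathrm{gbest}_{j}^{t} - x_{ij}^{t}\right), \notag\\
x_{ij}^{t+1} &= x_{ij}^{t} + v_{ij}^{t+1}. \tag{7}
\end{align}
```

The first term keeps part of the previous velocity, the second pulls the particle toward its own best position, and the third pulls it toward the best position found by the swarm.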

Flow of PSO Optimization of Network Parameters.
The network parameters of an LSTM are usually selected based on researchers' experience, and the low prediction accuracy caused by manual selection can be avoided if PSO is used to determine the parameters. The process by which PSO finds the optimal network parameters is as follows: the number of hidden layer neurons, the learning rate, and the number of training rounds, which are key model parameters of the LSTM, are used as the search variables of the particles in different dimensions, and the optimal model parameters are obtained by continuously updating the velocity and position of the particles and calculating and comparing the fitness values of the objective function until the global optimum is reached. The PSO-LSTM flow chart is shown in Figure 2. The PSO-LSTM flow is described as follows:
Step 1 The data are divided into training and test sets in a 4 : 1 ratio. The input data consist only of the Rayleigh wave phase velocity; therefore, no normalization of the data is performed.
Step 2 The number of hidden neurons, the learning rate, and the number of training rounds of the LSTM are set as the search parameters to initialize the population, and their search ranges are 50-300, 0.05-0.3, and 200-1000, respectively (the search ranges are given based on the researcher's experience).
Step 3 The prediction error of the LSTM built with a particle's parameters is the fitness value of that particle. The fitness value changes with the number of iterations; each particle updates its individual optimal position and the global optimal position according to the fitness value and then updates its own velocity and position according to equation (7).
Step 4 The iterative update stops when the fitness value of the particles stabilizes, and the number of hidden layer units, the learning rate, and the number of training rounds are determined.
Step 5 The optimal parameters are input into the LSTM for training and prediction.
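Steps 2-4 can be sketched in plain Python as follows. The search ranges, the inertia weight 0.8, the learning factors 2.0, and the 10 particles and 20 iterations are the values quoted in the text. Everything else is an assumption made so that the sketch runs on its own: `surrogate_fitness` is a hypothetical stand-in for "train an LSTM with these parameters and return its test-set error", its optimum is invented, and clipping positions to the search ranges is an added safeguard that the text does not describe.

```python
import random

# Search ranges from Step 2: hidden neurons, learning rate, training rounds.
BOUNDS = [(50, 300), (0.05, 0.3), (200, 1000)]
W, C1, C2 = 0.8, 2.0, 2.0  # inertia weight and learning factors from the text


def surrogate_fitness(params):
    """Hypothetical stand-in for training an LSTM and returning its error.

    A smooth bowl with an invented minimum, so no network is needed here.
    """
    target = (200.0, 0.1, 700.0)  # invented optimum, for illustration only
    return sum(((p - t) / (hi - lo)) ** 2
               for p, t, (lo, hi) in zip(params, target, BOUNDS))


def clip(value, lo, hi):
    """Keep a coordinate inside its search range (assumed safeguard)."""
    return max(lo, min(hi, value))


def pso_search(fitness, n_particles=10, n_iters=20, seed=0):
    rng = random.Random(seed)
    dims = len(BOUNDS)
    # Step 2: initialise positions inside the search ranges, velocities at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]

    for _ in range(n_iters):
        for i in range(n_particles):
            # Step 3: velocity and position update for every dimension.
            for j, (lo, hi) in enumerate(BOUNDS):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (W * vel[i][j]
                             + C1 * r1 * (pbest[i][j] - pos[i][j])
                             + C2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] = clip(pos[i][j] + vel[i][j], lo, hi)
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:          # update the individual optimum
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:         # update the global optimum
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)           # Step 4: watch for stabilisation
    return gbest, gbest_fit, history


best, best_fit, history = pso_search(surrogate_fitness)
# Neuron and round counts must be integers before they reach a network.
hidden_units, learning_rate, epochs = round(best[0]), best[1], round(best[2])
print(hidden_units, round(learning_rate, 4), epochs, best_fit)
```

In a real run the fitness call is by far the most expensive part, since each evaluation trains a network, which is presumably why the swarm and the iteration count are kept small.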

Evaluation Metrics for PSO-Optimized LSTM Network Parameters.
In the process of parameter optimization, PSO continuously updates the number of hidden layer units, the learning rate, and the number of training rounds to build LSTM models with the candidate parameters for training and prediction. The particle fitness value is the mean absolute percentage error (MAPE) of the predicted shear-wave profiles on the test set. The lower the MAPE, the better the parameters found in that iteration, and the parameters with the lowest MAPE are taken as optimal. The MAPE is calculated as
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right|,$$
where N is the sample size, y_i denotes the true value, and \hat{y}_i denotes the predicted value. After the optimization search is completed, the model is built with the optimal parameters and trained. The accuracy of the LSTM solution is measured by the mean squared error (MSE) between the predicted and true values. A lower MSE indicates a higher accuracy of the model solution, and the MSE is calculated as
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$
where N is the sample size, y_i denotes the true value, and \hat{y}_i denotes the predicted value.
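A minimal sketch of the two metrics in plain Python, using their standard definitions. The function names and the three sample values are illustrative assumptions, not data from the paper.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical shear-wave velocities in m/s, for illustration only.
y_true = [200.0, 400.0, 500.0]
y_pred = [210.0, 380.0, 500.0]
print(mape(y_true, y_pred), mse(y_true, y_pred))
```

Because MAPE divides by the true value, it weights thin layers and slow velocities on the same relative scale as thick layers and fast velocities, which suits a label vector that mixes thicknesses in metres with velocities in metres per second.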

Synthetic Data Tests
Shear-wave velocity and layer thickness have the most significant influence on the characteristics of Rayleigh wave dispersion curves [5], and the remaining parameters have only a minor effect. To reduce the calculation cost, only the shear-wave velocity and thickness are inverted. The network training data are generated with the scalar transfer algorithm. The frequency band of the dispersion curves is 5-100 Hz with a frequency interval of 3 Hz. The sample data were randomly generated within ±50% of the theoretical model parameters [22]. Because the frequencies are the same for all samples, only the Rayleigh wave phase velocity in the dispersion data is used as the input of the network model, which reduces the time cost, and the layer thicknesses and shear-wave velocities in the model parameters are used as the label data. In this article, the inertia weight and the learning factors c1 and c2 of PSO are 0.8, 2.0, and 2.0, respectively. When PSO is used directly to invert the dispersion curves, the number of particles and the number of iterations are 20 and 50, respectively. In PSO-LSTM, the number of particles and the number of iterations of PSO are 10 and 20, respectively. All tests in the article were performed in the same environment. The software platform is PyCharm, and the programs are written in Python with the PyTorch framework. The PyTorch and Python versions are 1.10.2 and 3.6.13, respectively. The CPU and GPU of the computer used in this article are an Intel Core i5-10400F and an NVIDIA GeForce RTX 3060, respectively.
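The label side of the sample generation can be sketched as follows. The reference model values, the uniform sampling distribution, the sample count of 1000, and the layout of three thicknesses plus four velocities (no thickness for the half-space) are all assumptions for illustration. The forward modelling that turns each label vector into a dispersion curve (the scalar transfer algorithm in the text) is not reproduced here.

```python
import random

# Hypothetical four-layer reference model; the values are invented for
# illustration and are not the Model A or Model B parameters of Table 2.
REF_THICKNESS = [2.0, 4.0, 6.0]          # m, layers above the half-space
REF_VS = [200.0, 300.0, 400.0, 500.0]    # m/s, one value per layer

# Sampling 5-100 Hz every 3 Hz gives 5, 8, ..., 98 Hz.
FREQS = list(range(5, 101, 3))


def sample_labels(rng, spread=0.5):
    """Draw thicknesses and velocities uniformly within +/- spread of the reference."""
    ref = REF_THICKNESS + REF_VS
    return [rng.uniform(v * (1.0 - spread), v * (1.0 + spread)) for v in ref]


def make_label_set(n_samples, seed=0):
    """Build n_samples label vectors with a reproducible random stream."""
    rng = random.Random(seed)
    return [sample_labels(rng) for _ in range(n_samples)]


labels = make_label_set(1000)
n_train = len(labels) * 4 // 5            # 4 : 1 training/test split
train, test = labels[:n_train], labels[n_train:]
print(len(FREQS), len(train), len(test))
```

Each label vector would then be passed to a forward solver to obtain the phase velocities at `FREQS`, which form the network input.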

The objective function used by PSO to invert the dispersion curve is expressed in terms of the following quantities: V_R^obs is the measured phase velocity of the Rayleigh wave; V_R^cal is the theoretical phase velocity of the Rayleigh wave; and M is the number of frequency points.
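One common way to combine these three quantities is a root-mean-square misfit over the M frequency points. The RMS form is an assumption here, since the text names only the symbols, and the phase velocities below are invented for illustration.

```python
import math


def rms_misfit(v_obs, v_cal):
    """Root-mean-square difference between measured and theoretical phase velocities.

    Assumed form: sqrt( (1/M) * sum_i (v_obs[i] - v_cal[i])**2 ).
    """
    if len(v_obs) != len(v_cal) or not v_obs:
        raise ValueError("inputs must be non-empty and of equal length")
    m = len(v_obs)  # number of frequency points
    return math.sqrt(sum((o - c) ** 2 for o, c in zip(v_obs, v_cal)) / m)


# Hypothetical phase velocities in m/s at three frequencies.
v_obs = [300.0, 280.0, 260.0]
v_cal = [303.0, 276.0, 260.0]
print(rms_misfit(v_obs, v_cal))
```

In a PSO inversion, each particle position is a candidate layered model, `v_cal` comes from forward modelling that candidate, and the misfit serves as the particle's fitness.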
To verify the performance of PSO-LSTM, two typical geological models were designed. Model A is a four-layer geological model in which velocity increases with depth, and Model B is a four-layer model with a low-velocity layer in the middle. The model parameters are shown in Table 2. Model A is tested without noise, and the data contain the dispersion data of the fundamental mode; Model B is tested with noise, and the data contain the dispersion data of the fundamental mode and the second mode. The sample data of Models A and B are shown in Figure 3. The sample data of Model A contain no noise, and 10% Gaussian noise is added to the sample data of Model B.
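Adding the noise can be sketched as below. The text does not define "10% Gaussian noise" precisely; this sketch assumes zero-mean Gaussian noise whose standard deviation is 10% of each phase-velocity value. The function name and the sample curve are hypothetical.

```python
import random


def add_gaussian_noise(values, level=0.10, seed=None):
    """Perturb each value with zero-mean Gaussian noise.

    Assumption: the standard deviation is `level` times the value itself.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, level * abs(v)) for v in values]


# Hypothetical noise-free phase velocities in m/s.
clean = [420.0, 390.0, 350.0, 310.0, 280.0]
noisy = add_gaussian_noise(clean, level=0.10, seed=1)
print(noisy)
```

Applying the same perturbation to both the training samples and the curve to be inverted, as the noisy test does, keeps the training and test distributions consistent.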

Noiseless Synthetic Data Test.
To compare the inversion performance of PSO, LSTM, and PSO-LSTM, we tested these three inversion methods using Model A. For the LSTM network without PSO optimization, the parameters were set by experience: the numbers of neurons in hidden layers 1 and 2 are 160 and 118, respectively, and the learning rate and the number of training rounds are 0.17 and 719, respectively. The optimal LSTM network parameters found by the PSO search are as follows: the numbers of neurons in hidden layers 1 and 2 are 254 and 276, respectively; the learning rate is 0.0890; and the number of training rounds is 719. From Figures 4(a) and 4(b), we can see that training the network with the model parameters found within 20 iterations already gives reasonable training results. The matching trends of the training error and validation error in Figure 4(b) indicate that the model is well trained. From Figure 4(c), it can be seen that the dispersion curves inverted by PSO, LSTM, and PSO-LSTM fit the observed dispersion curves well, indicating that they have found the optimal solution. In Figure 4( [...]

[...] inversion more difficult and reducing the accuracy of inverse results [10]. Therefore, the capability of PSO-LSTM to invert noisy data needs to be examined. To simulate real data, 10% Gaussian noise is added to both Model B and the sample data of Model B. Then, the trained network is used to invert Model B. The optimal LSTM network parameters found by the PSO search are as follows: the numbers of neurons in hidden layers 1 and 2, the learning rate, and the number of training rounds are 61, 239, 0.1140, and 747, respectively. From Figure 5(a), we can see that after 20 iterations, the function values have converged. Figure 5(b) indicates that the model is well trained. In Figure 5(c), the dispersion curves fit well, and there is no significant deviation between the inverted dispersion curves and the observed dispersion curves.

Stability Analysis.
To further evaluate the performance of the PSO-LSTM in inverting dispersion curves, the test on each theoretical model was repeated 10 times, and the mean and standard deviation of the inverse results were calculated. The mean and standard deviation are generally used to reflect the stability of the inverse results: the smaller the mean and standard deviation, the more stable the inverse results [23,24]. Keeping the network parameters constant, the network is trained 10 times, and a prediction is made after each training. The inverted results are shown in Figure 6 and Table 3. In addition, 10 inversion tests were also performed for PSO and for LSTM on Model A, and the results are included in Table 3. In Model A, the maximum relative errors in [...] In Table 3, the maximum relative error and maximum standard deviation of Model B are 2.09% and 3.87, respectively. In [...] PSO-LSTM is stable, and the inverted models from PSO-LSTM can accurately predict the real ones.
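The statistics over repeated runs can be computed as in the sketch below for a single model parameter. The ten values and the true value are invented for illustration. Using the sample standard deviation, and taking the relative error of the mean against the true value, are assumptions, since the text does not state which variants were used.

```python
import math
import statistics

# Hypothetical shear-wave velocity (m/s) of one layer from 10 repeated runs.
runs = [301.0, 305.0, 304.0, 302.0, 306.0, 300.0, 303.0, 307.0, 299.0, 303.0]
true_value = 300.0  # hypothetical true model value

mean = statistics.mean(runs)
std = statistics.stdev(runs)  # sample standard deviation (assumed variant)
relative_error = 100.0 * abs(mean - true_value) / true_value  # in percent

print(mean, std, relative_error)
```

Repeating this for every thickness and velocity and taking the largest relative error and the largest standard deviation gives summary figures of the kind reported for each model.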

Real Data Test
The next step is to test the ability of PSO-LSTM to invert actual data acquired in the Wyoming area of the United States [25]. The original seismic record is shown in Figure 7(a). Forty-eight 8 Hz vertical-component geophones were used to collect the data; the geophone interval was 0.9 m, the minimum offset was 0.9 m, and the source was a hammer. Figure 7(b) shows the dispersion image extracted from the seismic record. The inversion test was performed using the dispersion curve picked on the fundamental mode (solid dots in Figure 7(b)). The exploration depth was divided into five layers based on the logging data, and the stratigraphic physical parameters are shown in Table 4. A total of 250 samples were created according to Table 4 using the fast scalar method, and the data are shown in Figure 8. The parameters of the PSO-LSTM obtained by the PSO search are as follows: the numbers of neurons in hidden layers 1 and 2 are 217 and 192, respectively; the learning rate is 0.1760; and the number of training rounds is 719. Figures 9(a) and 9(c) show the inverse results, and Figure 9(b) shows the network training error. From Figure 9(a), it can be seen that the dispersion curve obtained from the inversion matches the measured curve well. In Figure 9(c), the shear-wave velocity model obtained from the inversion matches the logging data well. The inverted shear-wave velocity model is not only close to the logging data in terms of velocity but also roughly consistent with the depths of the real formation. This shows that the results obtained when PSO-LSTM inverts the actual data are reliable.

Conclusion
In this paper, we propose a dispersion curve inversion method based on a deep learning model. The method avoids manual parameter selection and improves the prediction accuracy of the network. In the tests, the optimal network parameters are first selected using PSO, and the model is then trained with the selected parameters and used to make predictions from the dispersion data.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.