We explore the use of stochastic optimization methods for seismic waveform inversion. The basic principle of such methods, which goes back to the 1950s, is to randomly draw a batch of realizations of a given misfit function at each iteration. The ultimate goal of such an approach is to dramatically reduce the computational cost of evaluating the misfit. Following earlier work, we introduce the stochasticity into the waveform inversion problem in a rigorous way via a technique called randomized trace estimation.

The use of simultaneous source data in seismic imaging has a long history. So far, simultaneous sources have been used to increase the efficiency of data acquisition [

The basic idea of replacing single-shot data by randomly combined “super shots” is intuitively appealing and has led to several algorithms [

Another approach, called the

Most theoretical results for SA and SAA assume that the objective function is convex, which is not the case for seismic waveform inversion. In practice, however, one starts from a “reasonable” initial model and hopes to converge to the nearest local minimum. In that setting, one would expect SA and SAA to remain applicable. Understanding the theory behind SA and SAA is therefore very useful for algorithm design, even though the theoretical guarantees derived under the convexity assumption need not apply.

As mentioned before, the gain in computational efficiency comes at the cost of introducing random crosstalk between the shots into the problem. Also, the influence of noise in the data may be amplified by randomly combining shots. We can reduce the influence of these two types of noise by increasing the batch size, recombining the shots at every iteration, and averaging over past iterations. We present a detailed numerical study to investigate how these different techniques affect the recovery.

The paper is organized as follows. First, we introduce randomized trace estimation in order to cast the canonical waveform inversion problem as a stochastic optimization problem. We describe briefly how SA and SAA can be applied to solve the waveform inversion problem. In Section

The canonical waveform inversion problem is to find the medium parameters for which the modeled data matches the recorded data in a least-squares sense [
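To make the least-squares formulation concrete, the sketch below evaluates a misfit of the form f(m) = ½ Σᵢ ‖F(m, qᵢ) − dᵢ‖² with a trivial linear stand-in for the forward operator; all names (`misfit`, `forward`) are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def misfit(m, forward, sources, data):
    """Least-squares data misfit: 1/2 * sum_i ||F(m, q_i) - d_i||^2."""
    return 0.5 * sum(np.linalg.norm(forward(m, q) - d)**2
                     for q, d in zip(sources, data))

rng = np.random.default_rng(0)
m_true = rng.standard_normal(50)
forward = lambda m, q: q * m          # trivial linear "modeling" operator
sources = list(rng.standard_normal(10))
data = [forward(m_true, q) for q in sources]

assert misfit(m_true, forward, sources, data) == 0.0   # zero at the true model
assert misfit(m_true + 1.0, forward, sources, data) > 0.0
```

In the actual inversion, evaluating `forward` requires one wave-equation (Helmholtz) solve per source, which is what makes the full misfit expensive.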

We denote the corresponding optimization problem as

In practice,

We follow Haber et al. [

This technique is based on the identity

Using the definition of

A natural approach to take is to replace the expectation over
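Replacing the expectation by a sample average over a small batch of probe vectors gives the randomized trace estimator. A minimal numpy sketch (matrix size and batch size are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n, K = 200, 1000          # matrix size and number of probe vectors

B = rng.standard_normal((n, n))
A = B @ B.T               # symmetric positive semidefinite test matrix

# trace(A) = E[w^T A w] for any w with E[w w^T] = I; replace the
# expectation by an average over K Gaussian probe vectors.
W = rng.standard_normal((n, K))
estimate = np.mean(np.einsum('ik,ij,jk->k', W, A, W))

rel_err = abs(estimate - np.trace(A)) / np.trace(A)
assert rel_err < 0.05     # accuracy improves like 1/sqrt(K)
```

In the inversion setting, each probe vector corresponds to one simultaneous source, so a small batch size K directly translates into few wave-equation solves.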

For a fixed

We investigate the misfit along the direction of the negative gradient

True (a) and initial (b) squared-slowness models (s^{2}/km^{2}) and the true reflectivity.

The full gradient is depicted in (a). The approximate gradients for various

Error in the gradient as a function of the batch size

Behavior of misfit for various

A second alternative is to apply specialized stochastic optimization methods to problem (

We discuss theoretical performance results and describe SAA and SA in more detail in the next section.

Efficient calculation of the trace of a positive semidefinite matrix lies at the heart of our approach. Factors that determine the performance of this estimation include the random process for the

Summary of bounds, adapted from Avron and Toledo [

Estimator | Distribution of the entries of **w** | Variance of one sample | Bound on the number of samples
---|---|---|---
Hutchinson | ±1 with probability 1/2 each (Rademacher) | 2(||A||_{F}^{2} − Σᵢ aᵢᵢ²) | 6ε⁻² ln(2 rank(A)/δ)
Gaussian | i.i.d. standard normal | 2||A||_{F}^{2} | 20ε⁻² ln(2/δ)
Phase encoded | e^{iθ}, θ uniform in [0, 2π) | ||A||_{F}^{2} − Σᵢ aᵢᵢ² | n/a

Of course, these bounds depend on the choice of the probability distribution of the random vectors. We consider three choices:

(1) the Rademacher distribution, that is, entries drawn independently as ±1 with equal probability;

(2) the standard normal distribution, that is, entries drawn independently from a Gaussian with zero mean and unit variance;

(3) the fast phase-encoded method, where the entries are random phase rotations of the form e^{iθ}.
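As a quick empirical illustration (all sizes illustrative, and using a plain random-phase probe as a stand-in for the fast phase-encoded scheme), the following sketch compares the spread of the three estimators on a random PSD matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, trials = 100, 20, 500       # matrix size, batch size, repetitions

B = rng.standard_normal((n, n))
A = B @ B.T                       # symmetric PSD test matrix
true_trace = np.trace(A)

def estimate(draw):
    """One K-sample trace estimate using probe vectors from `draw`."""
    W = draw((n, K))
    return np.mean(np.real(np.einsum('ik,ij,jk->k', W.conj(), A, W)))

draws = {
    'rademacher': lambda s: rng.choice([-1.0, 1.0], size=s),
    'gaussian':   lambda s: rng.standard_normal(s),
    'phase':      lambda s: np.exp(2j * np.pi * rng.random(s)),
}

stds = {}
for name, draw in draws.items():
    errors = [estimate(draw) - true_trace for _ in range(trials)]
    stds[name] = np.std(errors)
    print(f'{name:10s} std of error = {stds[name]:.1f}')
```

On such a matrix the Gaussian probes show the largest spread and the phase probes the smallest, consistent with the variance expressions in the table above.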

The lower bounds summarized in Table

We conduct the following stylized experiment to illustrate the quality of the different trace estimators. We solve the discretized Helmholtz equation at 5 Hz for a realistic acoustic model with 301 colocated sources and receivers located at

Residual matrix

We evaluated the different trace estimators 1000 times for batch sizes ranging from

This table shows the theoretical lower bounds (see Table

Gauss | | |
Hutchinson | | |
Phase | | |

Reconstruction as a function of

The sample average approximation (SAA) approach solves such stochastic optimization problems by replacing the expectation over the random vectors with an average over a fixed set of realizations, which is then minimized with a deterministic method.

Stochastic approximation (SA) methods go back to Robbins and Monro [

Note that the error rate in the objective values is

To test the performance of the SAA approach, we chose to use a steepest descent method with an Armijo line search (cf., [

find a step length that satisfies the Armijo sufficient-decrease condition and update the model along the negative gradient of the sample-average misfit.

The SA methods are closely related to the steepest descent method. The main difference is that for each iteration a new random realization is drawn from a prescribed distribution and that the result is averaged over past iterations. We chose to implement a few modifications to the standard SA algorithms. First, we use an Armijo line search to determine the step size instead of using a prescribed sequence such as that discussed in the previous section. This assures some descent at each iteration with respect to the current realization of

draw a new realization of the random vectors from the prescribed distribution;

find a step length that satisfies the Armijo condition for the current realization, update the model along the negative gradient, and average over the past iterates.
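The modified SA loop described above (a fresh realization at every iteration, an Armijo backtracking step on the current realization, and averaging over recent iterates) might be sketched as follows on a toy linear stand-in for the inversion; all sizes and names are illustrative, and the paper's actual misfit involves Helmholtz solves per super shot.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: per-shot data d_s = A_s m_true, mixed into simultaneous
# "super shots" with Gaussian weights W.
n_shots, n = 20, 10
As = rng.standard_normal((n_shots, n, n))
m_true = rng.standard_normal(n)
ds = np.einsum('sij,j->si', As, m_true)

def batch(W):
    """Mixed operators and data for a weight matrix W (n_shots x K)."""
    Aw = np.einsum('sk,sij->kij', W, As)
    dw = np.einsum('sk,si->ki', W, ds)
    return Aw, dw

def misfit(m, W):
    Aw, dw = batch(W)
    return 0.5 * np.mean([np.linalg.norm(Aw[k] @ m - dw[k])**2
                          for k in range(W.shape[1])])

def grad(m, W):
    Aw, dw = batch(W)
    return np.mean([Aw[k].T @ (Aw[k] @ m - dw[k])
                    for k in range(W.shape[1])], axis=0)

m, history = np.zeros(n), []
K, n_avg = 2, 10                            # batch size and averaging window
for it in range(300):
    W = rng.standard_normal((n_shots, K))   # fresh realization each iteration
    g = grad(m, W)
    t, f0 = 1.0, misfit(m, W)
    # Armijo backtracking on the *current* realization of the misfit
    while misfit(m - t * g, W) > f0 - 1e-4 * t * (g @ g) and t > 1e-12:
        t *= 0.5
    m = m - t * g
    history.append(m)

m_avg = np.mean(history[-n_avg:], axis=0)   # average over recent iterates
rel_err = np.linalg.norm(m_avg - m_true) / np.linalg.norm(m_true)
print(f'relative model error: {rel_err:.3e}')
```

Because the noiseless super-shot data are consistent with the true model for every realization of W, the redrawn realizations steer the iterates toward the common minimizer despite the crosstalk in any single batch.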

For the numerical experiments, we use the true and initial squared-slowness models depicted in Figure

The Helmholtz operator is discretized on a grid with 10 m spacing, using a 9-point finite difference stencil and absorbing boundary conditions. The point sources are represented as narrow Gaussians. As a source signature, we use a Ricker wavelet with a peak frequency of 10 Hz. The noise is Gaussian with a prescribed SNR.
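The Ricker wavelet used as the source signature has the standard closed form r(t) = (1 − 2π²f²t²) exp(−π²f²t²); the sketch below evaluates it for the 10 Hz peak frequency mentioned above (the time axis is illustrative).

```python
import numpy as np

def ricker(t, f_peak=10.0):
    """Standard Ricker wavelet with peak frequency f_peak in Hz."""
    a = (np.pi * f_peak * t)**2
    return (1.0 - 2.0 * a) * np.exp(-a)

dt = 0.001
t = np.arange(-0.2, 0.2, dt)                 # 400 samples around t = 0
w = ricker(t)

# The amplitude spectrum of a Ricker wavelet peaks at f_peak.
freqs = np.fft.rfftfreq(len(t), dt)
f_max = freqs[np.argmax(np.abs(np.fft.rfft(w)))]
assert abs(f_max - 10.0) < 2.0
```

The peak-frequency check uses the discrete spectrum, so the agreement is only up to the frequency resolution of the chosen time window.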

We run each of the optimization methods for 500 iterations and compare the performance for various batch sizes and noise levels to the result of steepest descent on the full problem. Remember that with small batch sizes the iterations are very cheap, so we can afford to perform more of them. The random vectors are drawn from a Gaussian distribution with zero mean and unit variance. We chose the Gaussian because the theoretical bounds on

In a realistic application, one might want to add a regularization term. In particular, this would prevent the overfitting that we observe in the noisy case. Note that limiting the number of iterations also serves as a form of regularization [

We choose a set of

Inversion result for the SAA approach with various batch sizes and noise levels. The rows represent different batch sizes

Error between the inverted and true model for the SAA approach with various batch sizes and the full problem, (a) without noise, (b) with noise (SNR = 20 dB), and (c) with noise (SNR = 10 dB). On noiseless data, we achieve a qualitatively comparable result with

We run the stochastic descent algorithm for varying batch sizes (

The results obtained without averaging are shown in Figure

Inversion result for the SA approach without averaging for various batch sizes and noise levels. The rows represent different batch sizes

Error between the inverted and true model for the SA approach without averaging for various batch sizes and the full problem, (a) without noise, (b) with noise (SNR = 20 dB), and (c) with noise (SNR = 10 dB). We get qualitatively similar results, compared to the full inversion, with

Results obtained with averaging over the past 10 iterations are shown in Figure

Inversion result for the SA approach with limited averaging (

Error between the inverted and true model for the SA approach with limited averaging for various batch sizes and the full problem, (a) without noise, (b) with noise (SNR = 20 dB), and (c) with noise (SNR = 10 dB). The convergence is smoother than that of SA without averaging, especially when the data is very noisy (10 dB). The averaging seems to slow down the convergence slightly, however, and we need a batch size

Finally, we show the result obtained by averaging over the full history in Figure

Inversion result for the SA approach with full averaging (

Error between the inverted and true model for the SA approach with full averaging for various batch sizes and the full problem, (a) without noise, (b) with noise (SNR = 20 dB), and (c) with noise (SNR = 10 dB). Averaging over the full past slows down the convergence dramatically.

Following Haber et al. [

Theory from the field of stochastic optimization suggests several approaches to tackle the optimization problem and reduce the influence of the crosstalk introduced by the randomization. The first approach, the

We note that, as opposed to

In our experiments, we were able to obtain results that are comparable to the full optimization with a small fraction of the number of sources. In the noiseless case, we needed only

Averaging over a limited number of past iterations improved the results for a fixed batch size and allowed the use of fewer simultaneous sources. However, too much averaging slows down the convergence.

The results of the SA approach, where a new realization of the random vectors is drawn at every iteration, are superior to the SAA results, where the random vectors are fixed. However, one could use a more sophisticated (possibly black box) optimization method for the SAA approach to get a similar result with fewer iterations. The tradeoff between using a smaller batch size and first-order methods (i.e., more iterations) versus using a larger batch size and second-order methods (i.e., fewer iterations) needs to be investigated further. Random superposition of shots only makes sense if those shots are sampled by the same receivers. In particular, this hampers straightforward application to marine seismic data. One way to get around this is to partition the data into blocks that are fully sampled. However, this would not give the same amount of reduction in the number of shots because only shots that are relatively close to each other can be combined without losing too much data.

The type of encoding used will most likely affect the behavior of both SA and SAA methods. It remains to be investigated which encoding is most suitable for waveform inversion.

The authors thank Eldad Haber and Mark Schmidt for insightful discussions on trace estimation and stochastic optimization. This work was in part financially supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (no. 22R81254) and the Collaborative Research and Development Grant DNOISE II (no. 375142-08). This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BP, Chevron, ConocoPhillips, Petrobras, Total SA, and WesternGeco.