Effective Parameter Dimension via Bayesian Model Selection in the Inverse Acoustic Scattering Problem

We address a prototype inverse scattering problem at the interface of applied mathematics, statistics, and scientific computing. We pose the acoustic inverse scattering problem from a Bayesian inference perspective and simulate from the posterior distribution using MCMC. The PDE forward map is implemented using high performance computing methods. We implement a standard Bayesian model selection method to estimate the effective number of Fourier coefficients that may be retrieved from noisy data within a standard formulation.


Introduction
Inverse problems are typically ill-posed and analytical solutions are seldom available. Approaches to inverse problems at the interface of applied mathematics, statistics, and scientific computing offer a rich set of tools for robust solutions, including a sound theoretical setting for uncertainty quantification [1,2], prediction, and decision making. See [3] for a recent review. Although these interdisciplinary approaches are becoming prevalent [4][5][6][7][8], topics such as knowledge representation in complex systems [9] and effective dimension are not yet as mature. Sparsity-promoting regularization and Bayesian methods have received attention recently [8,10].
In this paper we consider a nonlinear acoustic scattering inverse problem as a case study. The rationale is that the well-posedness of the direct problem and robust numerical methods for it have been comprehensively studied [11,12]. This allows us to address the inverse problem with a reliable direct problem solver.
The propagation of acoustic waves in a homogeneous isotropic medium with constant speed of sound is governed by the linear wave equation

∂²U/∂t² = c² ΔU,

for a velocity potential U(x, t) and c the speed of sound in the medium. For monochromatic time-harmonic waves with frequency ω we have U(x, t) = Re(u(x) e^{−iωt}), where the space-dependent term u(x) satisfies the Helmholtz equation

Δu + k²u = 0,  (3)

where k = ω/c is called the wave number. Given an obstacle of compact support D ⊂ Rᵈ (d = 2, 3), its forward scattering problem is governed by the Helmholtz equation in Rᵈ \ D.
Given an incident wave u_inc(x) = exp(ik x·d), the obstacle boundary Γ uniquely determines the scattered wave u_s(x), and every scattered wave u_s which is a radiating solution of the Helmholtz equation has the asymptotic behavior of an outgoing spherical wave,

u_s(x) = (e^{ik|x|}/|x|^{(d−1)/2}) (u_∞(x̂) + O(1/|x|)),  |x| → ∞,

where x̂ = x/|x| and u_∞(x̂) is the scattering amplitude or far field pattern. We denote by F_Γ the boundary-to-far-field mapping

F_Γ : Γ ↦ u_∞.  (6)

The mapping from the obstacle boundary Γ to the far field scattering amplitude u_∞ is injective [11,12].
In this work we address the problem of estimating the boundary Γ given noisy measurements of the far field data u_∞. This paper is organized as follows. To make the paper self-contained, in Section 2 we briefly review results used throughout. The acoustic layer potentials and the integral equation method are presented in Section 2.1. We discuss the numerical solution of the forward mapping and its implementation in Section 2.2. In Section 2.3 we present in detail the Bayesian formulation of the inverse problem along with the probability distributions involved. The inverse problem results and the effective dimension analysis are presented in Section 3. We consider the effective dimension exploration the main contribution of this work.

Materials and Methods
In this section we describe the forward mapping evaluation using the integral equation method and the acoustic layer potential approach as in [11,12].

Single Layer and Double Layer Potentials. The fundamental solution of the Helmholtz equation (3) in two dimensions is

Φ(x, y) = (i/4) H₀⁽¹⁾(k|x − y|)

for x ≠ y, where | · | denotes the ℓ² norm and H₀⁽¹⁾ is the Hankel function of the first kind and order zero. Given an integrable density φ, the integral operators

(Sφ)(x) = ∫_Γ Φ(x, y) φ(y) ds(y),  (8)

(Dφ)(x) = ∫_Γ [∂Φ(x, y)/∂ν(y)] φ(y) ds(y),  (9)

with ν being the unit normal vector to Γ directed to the exterior of D, are called the acoustic single layer and acoustic double layer potentials for density φ, respectively. Both the single and double layer potentials are radiating solutions of the Helmholtz equation in Rᵈ \ D. The single layer potential is continuous in Rᵈ \ D (including the boundary Γ) and consequently it is defined as in (8) also for x ∈ Γ. The double layer potential is continuous only in Rᵈ \ D (not including Γ); nonetheless (Dφ)(x) is defined on the boundary as the corresponding limit as x approaches Γ; see [12] for details.
The direct problem is formulated using a combined potential with the Dirichlet boundary condition u = 0 on Γ: the density φ solves the second kind integral equation

φ(x) + 2∫_Γ { ∂Φ(x, y)/∂ν(y) − iηΦ(x, y) } φ(y) ds(y) = −2u_inc(x),  x ∈ Γ,  (11)

where u_inc = exp(ik x·d) is the incident wave, η is a coupling factor (in our case we set η = k), and S and D are the acoustic potentials defined above. The three potentials (single layer, double layer, and combined potential) reproduce the same far field pattern given a boundary Γ. However, the combined potential operator in (11) is the sum of the identity and a compact operator, and therefore its singular values are bounded away from zero. Consequently, the combined potential is more stable with regard to numerical implementation, and we use it in the remainder.
The far field pattern for the combined potential is given by

u_∞(x̂) = (e^{−iπ/4}/√(8πk)) ∫_Γ { k x̂·ν(y) + η } e^{−ik x̂·y} φ(y) ds(y).  (12)

The corresponding far field pattern for the single layer potential is given by

u_∞(x̂) = (e^{iπ/4}/√(8πk)) ∫_Γ e^{−ik x̂·y} φ(y) ds(y).  (13)

The integral (12) can be evaluated numerically using the trapezoidal rule after solving the integral equation (11) for φ. It is known [13] that the trapezoidal rule has high accuracy when integrating smooth periodic functions and is therefore a sensible method to use in this case.
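The spectral accuracy of the trapezoidal rule on periodic integrands can be illustrated with a small self-contained sketch (our own illustration, not the paper's implementation): on a smooth 2π-periodic integrand the error decays exponentially with the number of nodes.

```python
import numpy as np

def trapezoid_periodic(f, n):
    # Composite trapezoidal rule for a 2*pi-periodic integrand on [0, 2*pi):
    # the endpoint values coincide, so the rule reduces to a uniform sum.
    t = 2.0 * np.pi * np.arange(n) / n
    return (2.0 * np.pi / n) * np.sum(f(t))

# Integral of exp(cos t) over one period equals 2*pi*I_0(1) (modified Bessel)
exact = 7.954926521012845
for n in (4, 8, 16):
    err = abs(trapezoid_periodic(lambda t: np.exp(np.cos(t)), n) - exact)
    print(n, err)  # the error drops to round-off already at modest n
```

The same node distribution used to discretize Γ can thus be reused for the far field quadrature.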

Numerical Solution of the Forward Map.
As a test case we consider the two-dimensional kite-shaped domain shown in Figure 1. The parametric representation of the boundary is Γ(t) = (cos t + 0.65 cos 2t − 0.65, 1.5 sin t), 0 ≤ t ≤ 2π.
Upon discretization, the combined potential equation (11) gives rise to a linear system

Aφ = b,  (14)

where φ is the density, b = −2u_inc, and the matrix A has the form

A = I + K − iηS,  (15)

where I is the n × n identity matrix and n is the number of knots used to discretize the boundary Γ. S and K are the discrete kernels corresponding to the single layer and double layer potentials, respectively (for details see [12]). The matrix A in (15) is square, nonsymmetric, dense, complex, and well conditioned. Mathematical software such as Matlab or Python can be used to evaluate the direct problem. However, a Bayesian statistics approach requires a large number of evaluations of the forward mapping. Consequently, to have efficient numerical machinery for the direct problem, we have implemented a parallel version of the generalized conjugate residual method (GCR) to solve the linear system (14). The GCR method was programmed in C on a graphics processing unit (GPU). Of note, a serial version of the GCR method programmed in C is slow even compared with the Matlab and Python linear system solvers; see Figure 2. The implementation details of the parallel computing scheme are presented in Appendix B. Although the evaluation of the far field pattern (12) is not computationally expensive, it can also be done in parallel; details are likewise given in Appendix B.
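A serial NumPy sketch of the GCR iteration for a dense, nonsymmetric, complex system Ax = b may clarify the method (this is our own illustrative version, not the paper's C/GPU code; the parallel implementation distributes the matrix-vector products and the orthogonalization across GPU threads):

```python
import numpy as np

def gcr(A, b, tol=1e-10, maxiter=None):
    # Generalized conjugate residual method: minimizes the residual norm
    # over the Krylov subspace, keeping the A p_i directions orthogonal.
    n = b.size
    if maxiter is None:
        maxiter = n
    x = np.zeros(n, dtype=complex)
    r = b.astype(complex).copy()
    bnorm = np.linalg.norm(b)
    ps, aps = [], []          # search directions p_i and their images A p_i
    for _ in range(maxiter):
        if np.linalg.norm(r) <= tol * bnorm:
            break
        p = r.copy()
        ap = A @ p
        # Orthogonalize A p against all previous A p_i
        for pi, api in zip(ps, aps):
            beta = np.vdot(api, ap) / np.vdot(api, api)
            p = p - beta * pi
            ap = ap - beta * api
        alpha = np.vdot(ap, r) / np.vdot(ap, ap)
        x = x + alpha * p
        r = r - alpha * ap
        ps.append(p)
        aps.append(ap)
    return x
```

Because A in (15) is the identity plus a (discretized) compact operator and hence well conditioned, the iteration converges quickly without preconditioning.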
The far field pattern value u_∞(x̂) obtained for the kite in the direction x̂ = (1, 0) closely resembles results previously reported in [12]. The convergence of the Nyström method, used to numerically evaluate the integral operators, is illustrated in Figure 3. As a reference solution we use the far field value u_∞(1, 0) with wave number k = 1, obtained with the combined potential for n = 1024 points, and we plot on a logarithmic y-axis the relative errors of the far field patterns obtained with the combined potential (solid line in Figure 3) and the single layer potential (dashed line in Figure 3) for n = 8, 16, 32, 64, 128, 256 points.

Bayesian Formulation. First let us recall Bayes' rule:

π(θ | u_∞) = π(u_∞ | θ) π(θ) / π(u_∞).  (16)

The posterior density π(θ | u_∞) quantifies the uncertainty in the parameter θ to be recovered from the data u_∞. The prior density π(θ) expresses the information regarding the unknown parameter θ. π(u_∞ | θ) is the likelihood function, that is, the assumed measurement error distribution conditional on θ. The density π(u_∞) is a normalizing constant. Below we describe these density functions for our inverse problem. We consider the inverse problem defined by the nonlinear forward mapping (6) from the point of view of Bayesian statistics: in Bayesian statistical inversion theory, the solution of the inverse problem is the conditional probability distribution of the parameters given the data (the posterior distribution). We assume that the data are noisy measurements of the far field pattern u_∞ and that the boundary Γ may be described by a vector of parameters θ as follows.
We construct a prior model based on a parameterization of Γ, considering that it is a simple 2π-periodic curve and Γ ∈ C²(R²). We express Γ as

Γ(t) = r(t)(cos t, sin t),  (17)

where t ∈ [0, 2π) and

r(t) = a₀ + Σ_{j=1}^∞ (a_j cos jt + b_j sin jt).  (18)

Thus the radius of Γ is expressed in terms of a Fourier series, and the vector of parameters contains the Fourier series coefficients in (18). For a boundary Γ that is defined by a vector of coefficients θ we denote the forward mapping by F_θ. We refer to

θ = (a₀, a₁, b₁, ..., a_n, b_n)  (20)

as the finite vector of m = 2n + 1 Fourier coefficients used to represent a boundary Γ_θ.
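The map from a coefficient vector θ to a discretized boundary can be sketched as follows (the function name and array layout are our own illustrative choices, matching the ordering in (20)):

```python
import numpy as np

def boundary_from_coeffs(theta, t):
    # theta = (a0, a1, b1, ..., an, bn): radius as a truncated Fourier series
    # r(t) = a0 + sum_j (a_j cos(j t) + b_j sin(j t)); curve (r cos t, r sin t).
    a0, rest = theta[0], np.asarray(theta[1:], dtype=float)
    a, b = rest[0::2], rest[1::2]
    j = np.arange(1, a.size + 1)
    r = a0 + np.cos(np.outer(t, j)) @ a + np.sin(np.outer(t, j)) @ b
    return np.column_stack((r * np.cos(t), r * np.sin(t)))
```

For instance, θ = (a₀,) alone yields a circle of radius a₀, while higher order pairs (a_j, b_j) add oscillations of frequency j to the radius.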

Posterior Distribution.
To our knowledge the posterior distribution does not have a closed form. Consequently, we resort to Markov chain Monte Carlo (MCMC) simulation methods to analyze the posterior distribution. We use the t-walk [14], a general purpose sampling algorithm for continuous distributions, to sample from the posterior distribution. We briefly describe the t-walk algorithm and implementation details in Appendix A. On each iteration the t-walk evaluates the energy function (minus the log of the nonnormalized posterior)

U(θ) = −log π(u_∞ | θ) − log π(θ) + C  (21)

for an arbitrary constant C. In practice the constant C is not used and the energy is evaluated as minus the log of the likelihood times the prior.
The output simulations of the t-walk are vectors of Fourier coefficients θ_i for fixed m, each of which defines a boundary Γ_{θ_i} through (18). We subsequently refer to the simulations θ_i obtained with the t-walk as MCMC simulations or posterior samples, and we refer to the curve Γ_{θ_i} defined by a simulated vector θ_i as a simulated boundary.

Prior Distribution.
We establish the prior model as follows. Suppose that Γ ∈ C²(R²) corresponds to a star-shaped domain. Then Γ admits the representation (17)-(18). Furthermore, the mean square error of approximating Γ by the truncated series Γ_n is given by

‖Γ − Γ_n‖² = Σ_{j>n} (a_j² + b_j²).  (22)

Further details of this consequence of Parseval's identity are described in [15]. Due to the smoothness of Γ, the amplitude of the Fourier coefficients decays quickly along the series; in fact

a_j, b_j = O(j⁻²).  (23)

Note that j is the position of the coefficient in the Fourier series. Therefore the norm |(a_j, b_j)| of the coefficient pair a_j, b_j is, by definition of O(j⁻²), bounded:

|(a_j, b_j)| ≤ c/j²  (24)

for some positive constant c ∈ R.
We pose a normal distribution for each pair a_j, b_j of coefficients as follows:

a_j, b_j ∼ N(0, c/(4j²)),  (25)

with all a_j, b_j mutually independent. We note that the variance is set to c/(4j²). The rationale is that with this choice the pair a_j, b_j satisfies inequality (24) with probability higher than 0.99. Moreover, the variance decreases with increasing position j, that is, for high order coefficients along the Fourier series. Consequently, this modeling incorporates the smoothness of Γ into the prior distribution.
The coefficient a₀ in (18) is related to the size of the scattering object. We assume a₀ ∼ N(2.5, 0.14). It is not clear how to set the constant c in distribution (25). This constant can be seen as a scaling factor, while the position j controls the spread of the distribution through the factor 1/(4j²). A single constant c can be chosen for all prior distributions (25) independently of the value of the index j = 1, ..., n. To this end we sample coefficient vectors θ from the distributions (25) for varying values of c. We add a condition on the radius (18) to guarantee that the sampled curves do not intersect themselves. The sampled curves are presented in Figure 4: prior samples of the coefficients in our boundary representation, drawn from a Gaussian distribution with mean μ = 0 and variance σ² = c/(4j²), where j is the position of the coefficient in the Fourier series and c is constant. We vary the variance and set the value of c depending on the expected smoothness of the boundary to be recovered.
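Drawing one prior realization of the coefficient vector can be sketched as below (our own reading of the prior: variance c/(4j²) for the pairs, and 0.14 treated as the variance of a₀; the function name is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(n, c, rng=rng):
    # One draw of theta = (a0, a1, b1, ..., an, bn):
    # a0 ~ N(2.5, 0.14) and a_j, b_j ~ N(0, c / (4 j^2)), all independent.
    a0 = rng.normal(2.5, np.sqrt(0.14))
    j = np.arange(1, n + 1)
    sd = np.sqrt(c) / (2.0 * j)      # standard deviation decays with j
    a = rng.normal(0.0, sd)
    b = rng.normal(0.0, sd)
    theta = np.empty(2 * n + 1)
    theta[0] = a0
    theta[1::2] = a
    theta[2::2] = b
    return theta
```

Plotting the curves defined by such draws (via (17)-(18)) for several values of c reproduces the qualitative behavior of Figure 4: small c yields smooth, nearly circular boundaries, larger c more oscillatory ones.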

Likelihood. Assume that the measurements of the far field pattern have additive Gaussian noise,

y(x̂) = u_∞(x̂) + ε_x̂,  (26)

with ε_x̂ ∼ N(0, σ²), for a boundary Γ. The likelihood is given by the noise distribution, assuming independent measurements. We assume that the measurements of u_∞ are taken at evenly spaced directions x̂.
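Putting the likelihood and the prior together, the energy (21) evaluated by the t-walk can be sketched as follows (a minimal sketch under our reading of the prior variances; the `forward` callable is a hypothetical interface to the far field solver, not the paper's code):

```python
import numpy as np

def energy(theta, data, forward, sigma, c):
    # Minus log of the nonnormalized posterior (eq. (21) with C = 0).
    # forward(theta) returns the model far field at the measurement
    # directions; data holds the noisy observations.
    resid = data - forward(theta)
    # Gaussian likelihood for independent measurements, noise std sigma
    neg_loglik = 0.5 * np.sum(np.abs(resid) ** 2) / sigma ** 2
    a0, rest = theta[0], np.asarray(theta[1:], dtype=float)
    # Positions j = 1..n, each repeated once for the (a_j, b_j) pair
    j = np.repeat(np.arange(1, rest.size // 2 + 1), 2)
    # Priors: a0 ~ N(2.5, 0.14);  a_j, b_j ~ N(0, c / (4 j^2))
    neg_logprior = (a0 - 2.5) ** 2 / (2.0 * 0.14)
    neg_logprior += np.sum(rest ** 2 * (2.0 * j ** 2) / c)
    return neg_loglik + neg_logprior
```

The additive constants from the Gaussian normalizations are dropped, which is exactly the arbitrary constant C in (21).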

Results and Discussion
To avoid committing an "inverse crime" [5] in our numerical experiments we proceed as follows. First, we generate synthetic noisy far field measurements from the kite-shaped object (see Figure 1). Second, this kite is not obtained from any finite Fourier series expansion and is therefore not included in any of our models; this forces our methodology to cope with diverse shapes, with the finite Fourier series being only an approximation. Third, synthetic data generation is done using the single layer potential approach, whereas for the MCMC steps we evaluate the forward mapping using the combined potential. Of note, both the single layer and the combined potential approaches approximate the same far field pattern when n is large enough; however, they are numerically different. We present results of the MCMC simulations, varying the number of Fourier coefficients, in Section 3.1. The results representing the main contribution of this paper are presented in Section 3.2.

Inverse Problem Results.
For our experiments we discretize the boundary of the kite-shaped object with 256 points. The far field pattern is computed at 16 evenly spaced points. We use 16 incident waves and we set the coupling parameter η equal to the wave number k = 1. The synthetic data are generated with additive Gaussian noise N(0, σ²) with σ = 0.1 (equivalently, SNR = 0.97). We generate 100,000 MCMC simulations of the posterior distribution for each number of coefficients (m = 3, 5, ..., 31, 33). To account for the transient (burn-in) stage of the MCMC we remove the first 5,000 iterations in each case. We note that it takes about 7 hours to produce 100,000 MCMC simulations with a Python implementation; with our parallel implementation the execution time is reduced to about 1 hour.
Due to space constraints, we present only the marginal posterior distributions of the cosine coefficients for m = 11 in Figure 5. As we observe, the distribution of each coefficient is unimodal with low variation. The same feature is present for the coefficients in the cases m ∈ {3, 5, ..., 31, 33}, m ≠ 11 (not shown).
We present in Figure 6 the probability region (in gray) for each number of Fourier coefficients m ∈ {3, 5, ..., 31, 33}. For this purpose we draw 1000 simulated boundaries, subsampled with the corresponding integrated autocorrelation time (IAT) factor in each case. The true kite-shaped boundary is shown as a red dashed line. From Figure 6 it is apparent that as we increase the number of coefficients the samples show more oscillations and the probability region becomes imprecise. This exhibits the numerical instability of the Fourier-based representation of the solutions of the inverse problem of interest.
Also in Figure 6 we show what we refer to as the MAP boundary (dashed line) and the mean boundary (continuous line) for each value of m. The MAP boundary is the curve defined by the simulated vector θ_i with the lowest energy value (maximum a posteriori). The mean boundary is the curve defined by the coefficientwise mean of the simulated vectors θ_i. MAP boundaries and mean boundaries seem to approximate the true curve with approximately the same quality. In fact, we cannot qualitatively distinguish the best approximation among m = 9 to m = 33 coefficients. This issue gives rise to the analysis presented in the next section.

Effective Dimension.
We refer to effective dimension as the number of parameters that can be properly retrieved from a noisy data set. In the context of this paper, effective dimension is related to the uncertainty quantification of the parameters, that is, identifying the number of Fourier coefficients in the representation (17)-(20) required to properly approximate the boundary of the kite (see Figure 1) from synthetic far field measurements. As an aside, effective dimension can be seen as an indicator for a parsimonious approximation regarding computational cost; however, this feature is not used here, since the effective dimension analysis is performed as a posterior procedure. In this section we change the notation for the distributions to be consistent with [14].
We define a collection of k models given by the posteriors

π(θ_i | u_∞),  i ∈ {1, 2, ..., k},

where Θ_i is a vector of random variables, a realization of Θ_i is a vector θ_i as defined in (20), and π(θ_i | u_∞) is the corresponding posterior distribution for model i. For simplicity we use the index i to indicate the model corresponding to m = 2i + 1 Fourier coefficients.
We use the super-model approach; that is, a new model is defined where M = i is the indicator of the model and Θ = (Θ₁, Θ₂, ..., Θ_k).
The posterior probability that model M = i explains the observed far field pattern is given by

π(M = i | u_∞) = Z_i π_M(i) / Σ_{l=1}^k Z_l π_M(l),  (32)

where π_M(i) is the prior probability of model M = i and Z_i is the corresponding normalizing constant (also known as the marginal likelihood) for model i. A straightforward way to perform model selection is to compute (32) for all models and choose the model with the highest probability. The difficulty regarding the computation of (32) is evaluating the normalizing constant Z_i. To perform our MCMC only the nonnormalized posterior is needed, and therefore Z_i was not previously calculated (this is the usual case when running MCMC for a fixed number of parameters in a Bayesian application). Nevertheless, the MCMC simulations may be used to estimate Z_i and thereby provide the information needed to explore the effective dimension. This is explained below.
The ratio of two normalizing constants (the Bayes factor) is defined as

B_{ij} = Z_i / Z_j,  (34)

where i, j ∈ {1, 2, ..., k}, i ≠ j. Considering a uniform prior distribution for the models, that is, π_M(i) = 1/k, model i is better than model j when B_{ij} > 1, the two are indistinguishable if B_{ij} = 1, and model j is better than model i when B_{ij} < 1. The best model therefore has the highest normalizing constant value.
We observe from (34) that, for a model i, the product

π(u_∞ | θ_i) π(θ_i),  θ_i ∈ {θ_{i,1}, θ_{i,2}, ..., θ_{i,T}},  (35)

is the corresponding likelihood times the prior evaluated at each posterior sample. This product is in fact the energy (21) with a suitable constant C. The resulting sample from the MCMC posterior simulations is thus used to compute (34) as a byproduct of the simulation, with marginal added computational cost. Details on approximating normalizing constants from MCMC simulations are discussed in [16].
The importance sampling density g_i must be comparable to the product (35). We propose to numerically estimate the density g_i for every model using a kernel density estimate (KDE) [17] of the posterior distribution based on the MCMC sample. A rough estimate suffices for the sole purpose of estimating B_{ij}.
The kernel density estimator has the form

ĝ_i(θ) = (1/T) Σ_{l=1}^T K_h(θ − θ_{i,l}),  (36)

where the kernel K is a bounded density on R^m and h is the bandwidth. For this estimator we use a Gaussian kernel with per-coordinate bandwidth

h_j = 1.06 σ_j T^{−1/5},  (37)

where σ_j is the standard deviation of coefficient j in model i (Silverman's rule of thumb). We compute the estimate of Z_i for each model i ∈ {1, 2, ..., k}, so that all models can be compared together. The effective dimension then corresponds to the model with the highest normalizing constant, equivalently the lowest value of minus the logarithm of the normalizing constant (39).
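A minimal sketch of this computation (our own illustrative reconstruction, under the assumption that the KDE is used in a reciprocal importance sampling estimator of log Z over the posterior sample; function names are hypothetical):

```python
import numpy as np

def kde_logpdf(x, samples):
    # Product Gaussian KDE with per-coordinate Silverman bandwidth
    # h_j = 1.06 * sigma_j * T**(-1/5), evaluated at a single point x.
    T, d = samples.shape
    h = 1.06 * samples.std(axis=0, ddof=1) * T ** (-0.2)
    z = (x - samples) / h                     # (T, d) standardized offsets
    logk = -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi) - np.log(h)
    logs = logk.sum(axis=1)                   # log kernel mass per sample
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum()) - np.log(T)

def log_marginal(samples, log_post_unnorm):
    # Reciprocal importance sampling: for posterior samples theta_l,
    # 1/Z ~= mean_l g(theta_l) / p*(theta_l), with g the KDE above and
    # p* the nonnormalized likelihood-times-prior.  Returns log Z.
    logw = np.array([kde_logpdf(s, samples) - log_post_unnorm(s)
                     for s in samples])
    m = logw.max()
    return -(m + np.log(np.mean(np.exp(logw - m))))
```

Because the estimator only reuses quantities already computed during the MCMC run (the sample and the energies), the added cost is marginal, as noted above.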
We present in Figure 7 minus the logarithm of the estimated normalizing constants (39) for all models (red line). As we see in this figure, the lowest value is obtained with m = 11 coefficients. In the same figure we also present the distance (ℓ² norm of the pointwise difference) between the radius of the kite and the radius of the simulated boundaries. The continuous line shows the difference with respect to the mean boundaries; the dashed line corresponds to the distances with respect to the MAP boundaries. The boxplots in the same figure correspond to the distances of the last 1000 posterior simulations, subsampled with the corresponding IAT factor. These boundaries are in fact those of the probability regions shown in Figure 6.
Based on the Bayes factors, the best model corresponds to m = 11 (we recall that m = 2i + 1). Of note, there are MAP boundaries (e.g., m ∈ {13, 19, 25, 29}) with lower mean squared error than the MAP for m = 11, and the mean boundaries from m = 15 to m = 33 also have lower distances than the MAP. However, as we observe in the boxplots, the mean error decreases from 3 to 11 Fourier coefficients, after which oscillations appear, with increasing spread and increasing error from 13 to 33 coefficients. Despite the low error of the MAP and mean boundaries for models with 13 to 33 coefficients, given the spread shown in the boxplots, the best model is the one indicated by the Bayes factors (i.e., m = 11). A qualitative analysis that agrees with these results is also discussed in Section 3.
Fourier series are a classical representation for smooth periodic closed curves. Furthermore, we have used a well understood, high order numerical method for the forward mapping evaluation. However, these two elements by themselves are not enough to provide confidence in the solution of the inverse problem: the quality of the solution also depends on the number of coefficients of the Fourier series used in the representation.
Our effective dimension discussion relies on MCMC methods to provide a quantitative strategy to determine the number of parameters that can be retrieved given a data set.

Conclusions
In this work we address the acoustic inverse scattering problem with a classical Fourier-based representation of the solution. We pose the inverse problem as a Bayesian inference problem and use the output of an MCMC method (namely, the t-walk) for our effective dimension results. For the corresponding direct problem we have used the classical layer potential approach, solved in a fast and reliable manner with a robust numerical method and parallel computing. Using Fourier series to represent solutions allows for a straightforward formulation that incorporates the smoothness of the solutions into the prior distribution. On the other hand, the finite Fourier representation is numerically unstable. Although other approaches to represent the scattering obstacle are applicable (e.g., wavelet bases, which correspond to Besov priors), a fundamental question remains: how much information can be retrieved, within the representation, from a noisy data set?
The main contribution of this paper is the effective dimension method, a quantitative method to estimate and quantify the uncertainty of the estimable parameters given a noisy data set. Given a parametric representation of the solution of the inverse problem, the effective dimension method is implemented via Bayesian model selection, where the normalizing constant for each model is approximated using the MCMC output. Of note, the effective dimension method is applicable regardless of the parametric representation of the solution.

Figure 1: Synthetic example: a kite-shaped object. Numerical results for forward mapping evaluations are obtained by generating synthetic far field pattern measurements for this smooth, nonconvex boundary.
Figure 2: Execution time comparison of the linear system solvers (see Section 2.2).

Figure 3: Relative errors of the forward mapping evaluation in direction x̂ = (1, 0), for the smooth, periodic kite-shaped boundary, varying the number of points n with wave number k = 1 fixed. This plot, with logarithmic scale on the y-axis, illustrates the exponential convergence of the Nyström method for the combined and single layer potentials.

Figure 4. As we observe, using the values c = 0.5 and c = 1 (see Figures 4(a) and 4(b)) we obtain very smooth curves. On the other hand, Figures 4(c) and 4(d) show curves with more oscillations. For our purposes, the value c = 2 is chosen.

Figure 5: Marginal posterior distributions of the coefficients θ_i for m = 11. These unimodal distributions are used to estimate the coefficients in the inverse problem.

Figure 6: Probability regions from the MCMC simulations (gray). The boundaries defined by the means of the marginal posterior distributions are shown in blue; the corresponding MAP boundaries are shown in green. The results are compared to the kite object (dashed red line).

Figure 7: Minus logarithm of the estimated normalizing constants for the Bayesian model comparison; the best model is the one with the highest normalizing constant (m = 11 in this case). The continuous line shows the distance between the kite and the mean boundaries; the dashed line corresponds to the distance between the kite and the MAP boundaries. The boxplots correspond to the distances of the last 1000 posterior simulations to the kite. The red scale corresponds to minus the logarithm of the normalizing constants and the blue scale to the distances.