Bayesian Estimation Applied to Stochastic Localization with Constraints due to Interfaces and Boundaries

Purpose. We present a systematic Bayesian formulation of the stochastic localization/triangulation problem close to constraining interfaces. Methods. For this purpose, the terminology of Bayesian estimation is summarized in a form suitable for applied researchers, including the presentation of Maximum Likelihood (ML), Maximum A Posteriori (MAP), and Minimum Mean Square Error (MMSE) estimation. Explicit estimators for triangulation are presented for the linear 2D parallel beam and the nonlinear 3D cone beam model. The priors in MAP and MMSE optionally incorporate (A) the hard constraints imposed by the interface and (B) knowledge about the probability of the object position with respect to the interface. All presented estimators are compared in several simulation studies for live acquisition scenarios with 10,000 samples each. Results. First, the presented application shows that MAP and MMSE perform considerably better than the ML approach, leading to lower Root Mean Square Errors (RMSEs) in the simulation studies, typically by introducing a bias. Second, utilizing priors that include both (A) and (B) is very beneficial compared to including (A) alone. Third, MMSE typically leads to better results than MAP, at the cost of significantly higher computational effort. Conclusion. Depending on the specific application and prior knowledge, MAP and MMSE estimators strongly increase the estimation accuracy for localization close to interfaces.


Introduction
Due to their inherent ability to use prior knowledge, Bayesian approaches have gained interest in many fields in recent years, such as computer vision [1,2] and medical applications [3][4][5][6], as well as robotics (Kalman filter, particle filter) [7,8] and metrology [9]. In this work, we present a Bayesian framework for the localization of objects with constraints derived from interfaces (surfaces) or boundaries specific to a given localization problem.
Methods to find the locations of objects based on several projections have long been known from photogrammetry. These methods are also called triangulation in a multiple view geometry (e.g., see Hartley and Zisserman [1] for an overview of established methods). Introducing probabilities in order to deal with Gaussian measurement noise is a well-known approach for estimating a 3-dimensional (3D) point based on two 2-dimensional (2D) projections; for example, see Hartley and Sturm [10]. In this context, Bedekar and Haralick [11] provided a Maximum A Posteriori (MAP) estimator by introducing prior knowledge independent of the observation for the estimation based on a set of 2D projections, which represents a starting point for this paper but does not provide the more general perspective on estimation developed here. We follow this probabilistic interpretation of the inverse problem of triangulation and present explicit solutions in order to characterize the triangulation problem as well as embed it into the general framework of Bayesian estimation. In particular, we demonstrate the benefit of including soft and hard constraints about the solution space in the prior of the Bayesian triangulation problem. We compare three stochastic estimation methods, Maximum Likelihood (ML), Maximum A Posteriori (MAP), and Minimum Mean Square Error (MMSE), and show that, depending on the prior knowledge, MAP or MMSE is a suitable approach for this kind of problem. We show how the estimation methods can be utilized for a 3D nonlinear/non-Gaussian geometrical model in physical space without modifications or approximations of the observation geometry.
In the localization problem the input data consist of 2D projections (or images) of the 3D objects, and the goal is the optimal localization of features contained in these objects by employing prior knowledge. In general, projections of 3D objects onto 2D planes contain information about their coordinates, but due to the experimental conditions these coordinates may not be accurate or may not be easily extractable from these projections. Thus the localization problem is typically an ill-posed inverse problem. Fortunately, in many applications there exists prior knowledge about the probability densities of these uncertainties. Furthermore, if there is general probabilistic knowledge about the position of the observed object or additional constraints on the domain of its coordinates, the total prior knowledge becomes difficult to incorporate into the localization model. Such localizations close to a constraining interface or boundary can be modeled and solved in the Bayesian approach. For instance, in the field of medicine, during image-guided procedures (interventional radiology, radiotherapy, and surgical procedures with C-arm CBCT scanners) there is often a need to obtain 3D coordinates of anatomical features or implanted markers instantly from multiple X-ray projections, which are constrained by external or internal anatomical interfaces (e.g., the patient surface) forming a hard or semiflexible barrier. Both the general Bayesian framework and its application to a specific localization problem presented herein can also be utilized for other applications in image or data processing, such as computer vision or metrology.
The general aim of this paper is to provide a pragmatic foundation for Bayesian estimation in a compact way suitable for the applied sciences, utilizing a concise and uniform mathematical notation (general parts about the Bayesian methodology of this paper are partly taken from the doctoral thesis of the first author, which is submitted (and not yet published) at the Regensburg University, Germany, at the time of this paper's submission), which is difficult to find in more theoretical formulations and whose full understanding is required for the localization problem at stake. This brief introduction to Bayesian estimation may benefit several research fields in which there is a gap between the general mathematical concepts and direct applications of Bayesian methods. Thus the major motivation for presenting the Bayesian estimation theory is to link these two areas by providing the Bayesian formalism for researchers in the applied sciences. The second aim of this work is to demonstrate that a Bayesian framework can be effectively utilized for more accurate localization of features close to interfaces, which may have many applications.
The structure of the paper is as follows: first, we present the basic terminology of Bayesian estimation required to derive the solutions presented in the subsequent sections, based on a concise summary of the books of Lehmann and Casella [12] and Robert [13]. Second, we specifically present the application of stochastic localization applied to 1D and 2D projections of 2D and 3D objects. Besides such a structured presentation of Bayesian triangulation, the hope is that this course of explanation will help readers to develop their own Bayesian estimators for their specific problems at hand.

Introduction to the Terminology of Estimation.
The goal of estimation theory is to determine a concrete set of unknown real-valued parameters x ∈ Ω_X ⊆ R^n (e.g., the 2D/3D position coordinates of the observed object in triangulation) from real-valued observations u ∈ Ω_U ⊆ R^m (e.g., the detected or measured 1D/2D projections of the object), which are imperfectly or ambiguously related to each other (bold symbols are utilized to emphasize multidimensional vectors). Their ambiguous relation is due to the experimental uncertainties of the given problem, such as an imperfect Radon transform or similar transforms. The estimation of parameters x from observables u demands a set of definitions which will be briefly introduced in this section. The concrete parameters x and observations u are associated with continuous random variables X and U, respectively (denoted by corresponding capital letters), which are further associated with the underlying probability distributions describing the sources of uncertainties. With this concept a connection between the deterministic and probabilistic quantities is established. For instance, the probability that X takes a value lower than s is defined by

P(X < s) = ∫_{−∞}^{s} f_[X](x) dx,

with the corresponding Probability Density Function (PDF) f_[X](x). In general, the occurrence of one event can influence the probability of another. It is possible to define such a residual probability by conditional probabilities, and for the corresponding PDF describing the random variable U under the condition of X = x this is formulated by

f_[U](u | x) = f_[U,X](u, x) / f_[X](x).

In this formula, the expression f_[U,X](u, x) describes the joint PDF of U and X. Based on this concept, the most important quantities for estimation are defined in the next subsections.
In summary, the following quantities appear throughout:
(i) L_u(x) := f_[U](u | x): the likelihood function (depending on the condition x);
(ii) π(x) := f_[X](x): the prior, that is, the PDF of the prior knowledge about x;
(iii) π(x | u) := f_[X](x | u): the posterior, that is, the PDF of the posterior distribution of the parameter x conditioned on u;
(iv) f_[U](u): the PDF of the probability of the occurrence of u (=: marginal distribution).
The random variables X and U are distributed according to their PDFs. The conditional PDF f_[U](u | x) describes how probable the occurrence of the observation u is if the true values x are known. In a typical observation scenario this corresponds to the accuracy of the observation device with respect to a specific true parameter x. It is common to define the likelihood function L_u(x) := f_[U](u | x), describing the same probabilities of the observation but this time with the focus on the conditioning parameter x. Therefore the likelihood is regarded as a similarity measure between the current observation u and possibly "true" parameters x.
The PDF of the prior π(x) := f_[X](x) defines how probable the occurrence of x is; that is, X ∼ π. The use of the prior is an intensely discussed topic, since one could derive arbitrary conclusions by utilizing a specific and strong prior knowledge which supports a certain point of view which may be just one of many possible options. Although this is generally valid, the utilization of appropriate prior knowledge (noninformative priors, priors based on actual observations of the same or similar situations, or priors derived from knowledge about the range of x) can reduce the uncertainties inherent to x in addition to making observations u.
In Bayesian statistics the most important PDF is the posterior π(x | u) := f_[X](x | u), which defines the probabilities of x under the condition of the observation u. Similar to the likelihood, this quantity is also often regarded as a similarity measure between the observation and x.
The aforementioned PDFs are related to each other by Bayes' theorem, which is expressed as

π(x | u) = L_u(x) π(x) / f_[U](u).

The benefit of this theorem is that it allows one to reverse the condition by introducing prior knowledge about the parameter of interest. In practical use, with this theorem the posterior π(x | u) can be determined from knowledge of the likelihood L_u(x) and the prior π(x) alone, since the denominator on the right side is typically determined indirectly by normalizing the posterior to the integral value of 1.
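Since the posterior is obtained in practice by normalizing the product of likelihood and prior, the following minimal Python sketch (not from the paper; all values are illustrative) performs exactly this normalization numerically on a 1D grid:

```python
import numpy as np

def posterior_on_grid(xs, likelihood, prior):
    """Normalize likelihood(x) * prior(x) into a posterior PDF on a 1D grid.

    The marginal f_[U](u), the denominator of Bayes' theorem, is obtained
    indirectly as the normalization constant, as described in the text.
    """
    dx = xs[1] - xs[0]
    unnorm = likelihood * prior
    marginal = np.sum(unnorm) * dx      # Riemann-sum approximation of f_[U](u)
    return unnorm / marginal

# Illustrative example: Gaussian likelihood around an observation u = 1.2
# combined with a standard Gaussian prior; the posterior peaks between them.
xs = np.linspace(-5.0, 5.0, 2001)
lik = np.exp(-0.5 * (1.2 - xs) ** 2)
pri = np.exp(-0.5 * xs ** 2)
post = posterior_on_grid(xs, lik, pri)
```

For two unit-variance Gaussians the posterior mode lands halfway between the prior mean and the observation, which is a quick sanity check of the normalization.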

Loss and Risk.

An estimator is any function δ : Ω_U → Ω_X which utilizes the current observations u in order to find an approximation to the true parameter x. In order to prefer one estimator over another, one needs a quality criterion, which depends on the true value of the parameters x and an estimate δ(u) based on the observation u. In estimation theory this is realized with the concept of the loss function ℓ(x, δ(u)). An important subset is strictly convex loss functions, such as the popular quadratic loss ℓ(x, δ(u)) = ||x − δ(u)||², since they lead to unique Bayesian estimators. An example of a not strictly but still convex loss is the constant loss for a constant ε > 0, which takes the value 0 if x lies in the cube with side length 2ε and midpoint δ(u) and the value 1 otherwise; it is of technical interest for the derivation of the MAP estimator. In this context, a very attractive thought is to find an optimal estimator δ(⋅) by minimizing the loss function ℓ(x, δ(u)). Unfortunately, the naive minimization of the loss would lead to δ(u) = x, which is a practically useless estimator since it demands knowledge about the true x. Essentially, the loss itself does not contain any information about the relation between the parameter x and the measurements u and therefore represents an incomplete description of the estimation problem. Due to this observation, in classical estimation theory the concept of the risk function is introduced, which is defined here by the conditional expectation value.
Definition 1. The risk function (or alternatively average loss) is defined as

R(x, δ) := E[ℓ(X, δ(U)) | X = x] = ∫_{Ω_U} ℓ(x, δ(u)) f_[U](u | x) du.    (6)

(It is inherent to the classical frequentist approach that the parameters are not interpreted as random variables X. Since this work focuses on a general Bayesian perspective, in the definition of the frequentist risk the parameters are written as random variables but simultaneously conditioned on X = x, which essentially leads to the same risk function.) The risk function R(x, δ) incorporates all possible outcomes u of the observation and weights them with their likelihood f_[U](u | x). For this reason, the long-term average loss of a given estimator δ(⋅) is obtained by the risk function of (6). In consequence, based on the risk function a reasonable comparison of estimators can be performed using the concept of admissibility (see Lehmann and Casella [12] or Robert [13, pp. 74-77, 391-415] for details). On the other hand, the risk function still depends on the unknown parameter x, and therefore direct minimization of the risk function can only be performed for a fixed x. In order to incorporate all possible values of x into a single quality criterion, an extension of the concept of the risk function is introduced by incorporating (possibly noninformative) prior knowledge about x via π(x).
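The risk of (6) can be approximated by averaging the loss over simulated observations; the sketch below (illustrative model and values, not from the paper) does this by Monte Carlo for the quadratic loss:

```python
import numpy as np

def monte_carlo_risk(x_true, estimator, loss, sample_obs, n=100_000, seed=0):
    """Approximate R(x, delta) = E[loss(x, delta(U)) | X = x] by averaging
    the loss over n observations drawn from the observation model f_[U](u | x)."""
    rng = np.random.default_rng(seed)
    u = sample_obs(x_true, n, rng)
    return np.mean([loss(x_true, estimator(ui)) for ui in u])

# Illustrative model: U = x + unit Gaussian noise, identity estimator,
# quadratic loss; the risk then equals the noise variance.
sample = lambda x, n, rng: x + rng.normal(0.0, 1.0, size=n)
risk = monte_carlo_risk(2.0, lambda u: u, lambda x, d: (x - d) ** 2, sample)
```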

Mathematical Problems in Engineering
Definition 2. The integrated risk function is defined as the expectation value taken over X and U

r(π, δ) := E[ℓ(X, δ(U))] = ∫_{Ω_X} ∫_{Ω_U} ℓ(x, δ(u)) f_[U](u | x) π(x) du dx.    (7)
With this criterion one gets a single real value for a given estimator δ(⋅) which incorporates all parameters x and possible observations u and allows a direct comparison between different estimators (and especially a meaningful optimization). Unfortunately, finding an optimal estimator directly from the minimization of the integrated risk of (7) is a nontrivial problem, but it can be solved by Bayesian estimation as presented in Section 2.2. Further, a delicate point is the introduction of a prior distribution π(x) before any measurement is performed, since this could bias the estimation result in a disadvantageous way. On the other hand, if Ω_X is bounded and π(x) = 1/vol(Ω_X) = const. for all x ∈ Ω_X, all possible values x are treated as equally likely in order to erase the subjective notion introduced with the prior. In consequence, the application of the integrated risk function in order to derive estimators can also be of interest in cases in which one wants to avoid such a subjective notion and still obtain an estimator working best on average for all possible values x and u.

Maximum Likelihood, Maximum A Posteriori, and Bayesian Estimation.

In this section, three central estimation methods are introduced in a general way (for a brief characterization of these approaches, see Section 2.3).

Maximum Likelihood (ML).
Maximum Likelihood estimation is typically applied in situations in which no prior knowledge about x ∈ Ω_X is available.

Definition 3. The Maximum Likelihood estimator (ML) for a given observation u ∈ Ω_U is

δ_ML(u) := arg max_{x ∈ Ω_X} L_u(x),    (8)

where "arg max" means that we want to find the position x which maximizes the likelihood L_u(x). This is also valid for the logarithm of the likelihood, since the logarithm is a strictly increasing function and the likelihood is positive. Since the ML estimate depends on the shape of the likelihood function, which is problem-specific, under fairly general assumptions the existence but not the uniqueness of the estimate can be guaranteed. The theorem by Lehmann and Casella [12, p. 463] assures the existence of an asymptotically unbiased, consistent, and asymptotically efficient estimator which is a solution of ∇_x L_u(x) = 0, the so-called likelihood equations. If there is a unique solution to these equations, then the corresponding ML estimator fulfills these properties. From a practical perspective, the determination of the estimate results in an optimization problem, and numerically this may lead to problems with iterations converging to local instead of global maxima.
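The optimization view of ML can be made concrete with a small sketch (illustrative observation model, not from the paper): for low-dimensional problems, a grid evaluation of the negative log-likelihood sidesteps the local-maxima issue of iterative optimizers:

```python
import numpy as np

def ml_estimate(neg_log_lik, x_grid):
    """Grid-based ML: evaluate the negative log-likelihood on a grid and
    return the minimizer, i.e., the maximizer of the likelihood."""
    vals = np.array([neg_log_lik(x) for x in x_grid])
    return x_grid[np.argmin(vals)]

# Illustrative scenario: iid Gaussian observations with unknown mean x;
# up to constants the negative log-likelihood is a sum of squares.
u = np.array([1.9, 2.2, 2.05, 1.95])
nll = lambda x: np.sum((u - x) ** 2)
x_hat = ml_estimate(nll, np.linspace(0.0, 4.0, 4001))
```

Here the ML estimate coincides with the sample mean, as expected for Gaussian noise.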

Bayesian Estimator (BE).
Bayesian estimation utilizes prior knowledge about x ∈ Ω_X and focuses on the given observation. For this reason, a modified risk function needs to be introduced first.

Definition 4. The Bayesian risk function is defined for a given loss ℓ, observation u ∈ Ω_U, and the posterior π(x | u) as

ρ_u(π, δ) := E[ℓ(X, δ(u)) | U = u] = ∫_{Ω_X} ℓ(x, δ(u)) π(x | u) dx.    (9)

The main difference between the Bayesian risk (9) and the integrated risk (7) is that the integrated risk averages over all possible observations u ∈ Ω_U, whereas the Bayesian risk utilizes only the concretely acquired observation u.

Definition 5. A Bayesian estimator (BE) for a given observation u ∈ Ω_U is the estimator δ(u) which minimizes the Bayesian risk function ρ_u(π, δ).

At first glance, the Bayesian estimator therefore looks unrelated to the optimization of the previously introduced risk functions in Section 2.1. A powerful statement is achieved by the following theorem.

Theorem 6. The estimation function which minimizes the integrated risk r(π, δ) can be obtained by selecting, for every specific observation u ∈ Ω_U, the estimate δ(u) which minimizes the Bayesian risk ρ_u(π, δ).
For the proof see Robert [13, p. 63]. A major property of Bayesian estimators is that if the loss is strictly convex, then the corresponding BE is unique. This will be shown explicitly in the next paragraph for the quadratic loss. For this loss the properties are similar to those of the ML estimator, leading to consistent and asymptotically efficient estimators under some weak assumptions on the prior; see Lehmann and Casella [12, p. 490]. From a practical point of view, the numerical computation of the integral might limit the applications of BE if analytical solutions cannot be derived. Compared to the optimization problem in ML, the general calculation of these multiple integrals is typically computationally more challenging.

The Minimum Mean Square Error (MMSE). The Bayesian estimator utilizing the quadratic loss is called the Minimum Mean Square Error (MMSE) estimator. The derivation starts from the Bayesian risk with the quadratic loss

ρ_u(π, δ) = ∫_{Ω_X} ||x − δ(u)||² π(x | u) dx.    (11)

Since the loss is strictly convex and (11) represents a parameter integral with respect to δ(u), the unique solution can be derived from

∂/∂δ_i(u) ρ_u(π, δ) = 0 for every coordinate i ∈ {1, . . ., n}.

This means for every coordinate i ∈ {1, . . ., n}

∫_{Ω_X} 2 (δ_i(u) − x_i) π(x | u) dx = 0.

This leads to the corresponding MMSE for each coordinate i ∈ {1, . . ., n}

δ_{MMSE,i}(u) = E[X_i | U = u] = (1/c) ∫_{Ω_X} x_i L_u(x) π(x) dx,

which is the expectation value of the posterior for every coordinate i ∈ {1, . . ., n}, with the normalization constant c = ∫_{Ω_X} L_u(x) π(x) dx. This example already points out the importance of the shape of the posterior π(x | u) for the Bayesian approach.
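Numerically, the MMSE estimate is simply the mean of the (possibly unnormalized) posterior; a minimal 1D sketch with illustrative values (not from the paper):

```python
import numpy as np

def mmse_estimate(xs, posterior):
    """MMSE estimate on a 1D grid: the expectation value of the posterior.
    The posterior need not be normalized; the constant c cancels."""
    dx = xs[1] - xs[0]
    c = np.sum(posterior) * dx            # normalization constant
    return np.sum(xs * posterior) * dx / c

# Illustrative symmetric posterior centered at 0.6 (arbitrary values).
xs = np.linspace(-5.0, 5.0, 2001)
post = np.exp(-0.5 * ((xs - 0.6) / 0.7) ** 2)
x_mmse = mmse_estimate(xs, post)
```

For a symmetric posterior the MMSE estimate is its center, here 0.6.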

Maximum A Posteriori (MAP).
An alternative approach, conceptually between ML and BE, is the Maximum A Posteriori (MAP) estimation, which already utilizes prior knowledge about x ∈ Ω_X but has many characteristics of the ML approach.
Definition 7. The Maximum A Posteriori estimator (MAP) for a given observation u ∈ Ω_U is

δ_MAP(u) := arg max_{x ∈ Ω_X} π(x | u) = arg max_{x ∈ Ω_X} L_u(x) π(x),

since the marginal distribution f_[U](u) is positive and independent of x. In other words, MAP is the mode of the posterior.
When applying the logarithm to L_u(x) π(x), this approach can be identified with a specific Tikhonov regularization of the ML approach (also known as penalized Maximum Likelihood), as discussed in Eldar [14]. Since the MAP estimate depends on the shapes of the likelihood and the prior, which are defined problem-specifically, under fairly general assumptions the existence but not the uniqueness of the estimate can be guaranteed. The asymptotic properties are the same as in the ML case, under some further regularity conditions on L_u(x) and π(x); for example, see Robert [13, p. 166]. Although MAP utilizes prior knowledge about x ∈ Ω_X, the numerical problem is the same as for ML, that is, a numerical optimization problem with the difficulty of distinguishing local and global maxima.
Figure 1: The MAP estimator as a limit of Bayesian estimators with x ∈ R². For every fixed ε the Bayesian estimator is the midpoint δ(u) which maximizes the gray shaded volume.
In the context of Bayesian estimation, the MAP estimator can be derived as a limit of Bayesian estimators with respect to the constant loss function (5). It can be shown that

δ_MAP(u) = lim_{ε→0} arg max_{δ(u)} ∫_{W_ε(δ(u))} π(x | u) dx,

with W_ε(δ(u)) as the n-dimensional cube with side length 2ε and midpoint δ(u). In other words, the aim is to find the midpoint of the n-dimensional cube W_ε(δ(u)) such that the volume under the graph of the posterior is maximized. By assuming an at least continuous posterior and letting ε → 0, the best midpoint δ(u) is the mode of the posterior, that is, the MAP estimator. This is illustrated in Figure 1. Since the constant loss function is not strictly convex, the corresponding BEs are not necessarily unique, and this implies the nonuniqueness of the MAP as a limit of these BEs.
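This limit construction can be checked numerically. In the 1D sketch below (illustrative skewed posterior, not from the paper), the mass-maximizing window midpoint approaches the mode as ε shrinks:

```python
import numpy as np

def window_midpoint(xs, post, eps):
    """Midpoint of a width-2*eps window maximizing the posterior mass inside
    the window: the Bayesian estimator under the constant loss."""
    dx = xs[1] - xs[0]
    k = max(int(round(eps / dx)), 1)
    csum = np.concatenate(([0.0], np.cumsum(post) * dx))
    mass = csum[2 * k:] - csum[:-2 * k]   # integral of post over each window
    return xs[k + np.argmax(mass)]

# Skewed (Gamma-like) posterior with mode at x = 1 (illustrative).
xs = np.linspace(0.0, 10.0, 10001)
post = xs * np.exp(-xs)
mids = [window_midpoint(xs, post, e) for e in (1.0, 0.3, 0.05)]
# As eps shrinks, the best midpoint moves toward the mode, i.e., the MAP estimate.
```

For an asymmetric posterior the finite-ε midpoints are biased toward the heavy tail, which is why the limit ε → 0 matters.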

Characterization of the Estimators.
The presented estimators are closely related, which is graphically illustrated in Figure 2. A major distinction is drawn between the use of prior knowledge about x ∈ Ω_X or not, which is shown as a thick boundary in Figure 2. In detail, the main connections are summarized in the following list.
(i) BE → MMSE: the general BE reduces to the MMSE when applying the quadratic loss of (4).
(ii) MMSE → MAP: for a posterior which is symmetric around its global maximum, the expectation and the maximum of the posterior coincide.
(iii) BE → MAP: the BE converges to the MAP estimator when applying the constant loss of (5) with ε → 0.
(iv) MAP → ML: if the prior is noninformative, that is, constant for a finite Ω_X or asymptotically noninformative for an infinite Ω_X as the deviation parameters of the prior go to infinity, then MAP coincides with the ML estimate.
Of course, combinations of the stated properties are possible; for example, for a symmetric posterior distribution around a global maximum and a noninformative prior, the ML and MMSE estimates coincide. There are several other concepts utilized in the literature for deriving non-Bayesian estimators, that is, without (possibly noninformative) prior knowledge about x ∈ Ω_X, such as the following.
(i) Unbiasedness, that is, the search for an unbiased estimator which minimizes the frequentist risk locally or uniformly. This property has been regarded as very important from the beginning of estimation theory (e.g., see Cramér [15]), although it could be shown that an unbiased estimator does not always exist (see, e.g., Lehmann and Casella [12]).

Application to Stochastic Localization in Euclidean Space
The general Bayesian framework described in the previous section will be applied to a localization problem in which 2D (3D) objects are projected onto 1D (2D) planes, and the projection signals (images) are detected for multiple views around these objects. This localization problem will be referred to as multiple view triangulation close to interfaces with a 2D or 3D observation model. In principle, multiple projections can be measured for views adjacent to each other and forming a contiguous arc, or they could be separated by larger angles (e.g., 90 deg). The formalism works for all sets of projections and all input data containing information about the projected coordinates of the objects. Two cases are presented in the following: linear parallel projections of 2D objects and nonlinear cone beam projections of 3D objects in a convex solution space Ω_X. All transformations are purposefully presented in Euclidean space (instead of the projective space with homogeneous coordinates [1]) in order to allow intuitive interpretations of the model and the results, demonstrate direct solutions for the nonlinear 3D case, and utilize straightforwardly interpretable priors. Further, we assume the geometry of image acquisition to be known exactly (i.e., we use calibrated cameras) in order to focus specifically on the noisy measurements and the incorporation of prior knowledge, instead of the challenging problem of camera calibration [1, pp. 434-457]. The 2D geometry is presented in Figure 3 and shows a region which contains a feature within a larger object with interfaces, whose coordinates x ∈ Ω_X we want to estimate. It is known that x might be at a different position, but it cannot be outside the region (e.g., it is constrained by the surface of the object or regional interfaces). Furthermore, the PDFs are assumed to be Gaussians or truncated Gaussians (or uniform distributions in the Results section), since these are valid assumptions in many applications [1], and the resulting estimators can partly be presented in terms of explicit relationships. In general, the estimation we present works for arbitrary PDFs and can be directly applied to nonlinear/non-Gaussian problems.
Constraints due to interfaces or surfaces occur, for example, in radiotherapy (or interventional radiology and image-guided surgical procedures), where the location of an implanted fiducial marker or another feature has to be determined in real time based on X-ray projections (radiographs or tomosynthesis projections) [3,4]. Let us focus on a specific application: breast cancer patient setup in radiotherapy. In this case, surgical clips implanted during tumor resection are used for localization of the tumor bed. Due to deformations of the breast between the original CT images and the treatment, the markers might be at arbitrary locations, but they will never be outside of the patient's body, and they will not be inside the chest defined by the ribs. It is also likely that they will not be displaced by more than a given range known from clinical practice (e.g., 2 cm from the original location in the planning CT scan). In radiotherapy, the goal is to localize the tumor (or a tumor surrogate such as the marker) with 1 mm accuracy. This is achievable by performing a CT scan, but such a scan is not possible in many clinical situations, either due to undesired dose from the imaging or due to time constraints or other practical concerns. Additionally, in the case of a given marker within the flexible breast tissue, the surface can be measured with optical or infrared cameras in real time or based on a tomosynthesis reconstruction, which is an intensely discussed topic [17][18][19]. We can thus assume that knowledge about the breast surface is available for the given posture of the patient. In a similar context, in interventional radiology or image-based surgery there is a need to work with a few projections or even single images in order to find the location of certain objects in 3D. Many other applications are conceivable, such as robotic arm position estimation in a closed room or computer vision approaches with spatial restrictions [1].

Parallel Beam in 2D.
In this case x ∈ Ω_X ⊂ R². The observation u_i ∈ R is performed by projecting the feature onto a detector plane at different angles Θ_i. The aim of this example is to estimate the coordinate x based on the observed coordinates u_i at n different angles between 0° and 90°, which are merged into a single vector u = (u_1, . . ., u_n)^T ∈ Ω_U ⊂ R^n.
Please note that although the presented situation might have several applications, the numerical example in the Results section uses arbitrary dimensionless parameters in order to demonstrate the principal differences between the estimation methods. Furthermore, the linear relation between the observation and the coordinate in the parallel beam model allows deriving explicit estimators, which is not possible for the nonlinear cone beam geometry in Section 3.2.

Observation without Uncertainties and Prior Knowledge.
If it is assumed that the observation is without any errors, then no prior knowledge is necessary because there is no ambiguity in the detection, and the linear relation between x and a single observation u_i can be derived from

u_i = x_1 cos Θ_i + x_2 sin Θ_i.    (17)

This is a linear equation in the two coordinates of x; therefore, two equations are needed for a unique solution, that is, two observations u_1 and u_2 at angles Θ_1 and Θ_2 with Θ_1 ≠ Θ_2 modulo 180°. The explicit solution for this straightforward estimation is given by the estimator

δ(u) = 1/sin(Θ_2 − Θ_1) (u_1 sin Θ_2 − u_2 sin Θ_1, −u_1 cos Θ_2 + u_2 cos Θ_1)^T.    (18)

In the numerical examples we will assume the two angles to be Θ_1 = 0° and Θ_2 = 90°, which represents the best separation between two projections. In the further text the estimator of (18) will be denoted as the 2 × 2 approach.
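A minimal sketch of this 2 × 2 estimator (assuming the parallel-beam model u_i = x_1 cos Θ_i + x_2 sin Θ_i described above; the test values are illustrative):

```python
import numpy as np

def triangulate_2x2(u1, u2, th1, th2):
    """Direct '2x2' estimator: solve the two linear projection equations
    u_i = x1*cos(th_i) + x2*sin(th_i) (angles in radians, th1 != th2 mod pi)."""
    A = np.array([[np.cos(th1), np.sin(th1)],
                  [np.cos(th2), np.sin(th2)]])
    return np.linalg.solve(A, np.array([u1, u2]))

# With th1 = 0 and th2 = 90 deg the projections are the coordinates themselves.
x_hat = triangulate_2x2(3.0, 4.0, 0.0, np.pi / 2)
```

Solving the linear system is equivalent to the closed form of (18); `np.linalg.solve` raises an error for (near-)singular angle pairs, i.e., when the angles coincide modulo 180°.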

Observation with Uncertainties and without Prior Knowledge.

If we know that there are observational uncertainties, that is, errors in determining u, then one can assume an error distribution, which is represented in this example by the normally distributed random variable at angle Θ_i

ΔU_i ∼ N(0, σ_Δ²).

In the geometrical model, this means we are adding a random error to the observed projected coordinate at every angle, which now makes the observation itself a random variable with the PDF

f_[U_i](u_i | x) = (1/√(2π σ_Δ²)) exp(−(u_i − x_1 cos Θ_i − x_2 sin Θ_i)² / (2σ_Δ²)).

By acquiring n ≥ 2 stochastically independent observations u_i at angles Θ_1, . . ., Θ_n, one can summarize these observations in a single PDF, which is the 2D likelihood function in x

L_u(x) = ∏_{i=1}^{n} f_[U_i](u_i | x).

In consequence, by applying the logarithm, the Maximum Likelihood estimator is determined by

δ_ML(u) = arg min_{x ∈ Ω_X} Σ_{i=1}^{n} (u_i − x_1 cos Θ_i − x_2 sin Θ_i)²,    (23)

which is exactly the least squares solution. This is a well-known result in estimation theory and indicates the success of least squares, since the sum of many independent errors can often be approximated well by a normal distribution [1, pp. 102, 121, 134, 434, and 568]. An important observation is that in this approach the solution is totally independent of the knowledge about the measurement uncertainty σ_Δ. The explicit solution is derived by solving (23) analytically, and it follows (cp. (18))

δ_ML(u) = (A^T A)^{−1} A^T u, with the n × 2 matrix A whose i-th row is (cos Θ_i, sin Θ_i).    (24)
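The least squares solution of (23)/(24) amounts to solving the normal equations for the n × 2 projection matrix; a sketch with simulated noisy projections (all numbers illustrative, not from the paper's simulation studies):

```python
import numpy as np

def ml_parallel_beam(us, thetas):
    """ML (= least squares) estimate of x in 2D from n >= 2 noisy parallel
    projections u_i = x1*cos(th_i) + x2*sin(th_i) + noise; note the result
    is independent of the (common) noise level sigma_Delta."""
    A = np.column_stack([np.cos(thetas), np.sin(thetas)])
    x_hat, *_ = np.linalg.lstsq(A, np.asarray(us), rcond=None)
    return x_hat

# Simulate noisy projections of a known point and recover it.
rng = np.random.default_rng(1)
x_true = np.array([1.0, 2.0])
thetas = np.deg2rad(np.linspace(0.0, 90.0, 10))
us = np.cos(thetas) * x_true[0] + np.sin(thetas) * x_true[1] \
     + rng.normal(0.0, 0.05, size=10)
x_hat = ml_parallel_beam(us, thetas)
```

`np.linalg.lstsq` solves the normal equations in a numerically stable way; for n = 2 it reproduces the 2 × 2 estimator of (18).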

Observation with Uncertainties and Prior Knowledge.
If it is further known that x is stochastically distributed, that is, a random variable X, one can additionally introduce prior knowledge about this distribution into the estimation model. In the 2D example the prior is defined as

π(x) = (1/c) χ_{B(m,r)}(x) exp(−Σ_{j=1,2} (x_j − μ_{X,j})² / (2σ_{X_j}²)),

with a normalization constant c, the characteristic function χ_{B(m,r)} which has the value 1 in the circle B(m, r) with midpoint m and radius r and 0 elsewhere (representing an example of a convex region), the mean μ_X ∈ R² of the normal distribution, and the standard deviations σ_{X_1}, σ_{X_2}. This is essentially a Gaussian density function which is truncated by the circle B(m, r), as illustrated in Figure 4.
In the geometrical model now also the parameters become random variables; that is,

U_i = X_1 cos Θ_i + X_2 sin Θ_i + ΔU_i.

Utilizing Bayes' theorem, the posterior PDF can be calculated by (c̃ is the normalization constant of the posterior)

π(x | u) = (1/c̃) L_u(x) π(x).    (29)

Based on the posterior, the Maximum A Posteriori estimator is therefore defined as

δ_MAP(u) = arg min_{x ∈ B(m,r)} [Σ_{i=1}^{n} (u_i − x_1 cos Θ_i − x_2 sin Θ_i)² / (2σ_Δ²) + Σ_{j=1,2} (x_j − μ_{X,j})² / (2σ_{X_j}²)].    (31)

This formula shows explicitly how the compromise between measurement and prior knowledge, as two different sources of information, is obtained: the measurement is weighted with 1/σ_Δ² against the prior knowledge with 1/σ_{X_j}² (j = 1, 2), which is the case of Tikhonov regularization for the constrained least squares problem. Since this is a convex function on the convex domain B(m, r), there is a global minimum. If this minimum is in the interior of B(m, r), the explicit solution is determined by solving (31), and it follows (cp. (18) and (24))

δ_MAP(u) = (A^T A / σ_Δ² + D)^{−1} (A^T u / σ_Δ² + D μ_X), with D = diag(1/σ_{X_1}², 1/σ_{X_2}²).    (32)

From a practical perspective, if the δ(u) calculated by (32) is outside of B(m, r), then the minimum of (31) must be on the border (the circle in Figure 4), which can then be found by Lagrangian multipliers applied to (31) or directly by solving (31) with a general optimization algorithm. The explicit solution for the interior of B(m, r) depends on the ratios σ_Δ/σ_{X_j} (j ∈ {1, 2}), which again shows a weighting of the measurement uncertainties against the prior knowledge. In particular, if σ_{X_1}, σ_{X_2} → ∞, the prior becomes noninformative, and the interior estimator converges to the ML estimator.
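A sketch of this MAP estimator: a closed-form Tikhonov solution in the interior of B(m, r) and, if that solution falls outside, a dense search over the boundary circle in place of Lagrangian multipliers (all parameter values are illustrative, not the paper's simulation settings):

```python
import numpy as np

def map_parallel_beam(us, thetas, sigma_d, mu, sigma_x, m, r):
    """MAP estimate under the truncated-Gaussian prior: minimize
    sum_i (u_i - a_i.x)^2 / sigma_d^2 + sum_j (x_j - mu_j)^2 / sigma_x_j^2
    over the disc B(m, r).  Closed form in the interior; boundary search
    over the circle otherwise (a simple sketch, not an optimized solver)."""
    us = np.asarray(us, dtype=float)
    A = np.column_stack([np.cos(thetas), np.sin(thetas)])
    D = np.diag(1.0 / np.asarray(sigma_x) ** 2)
    W = A.T @ A / sigma_d**2 + D
    b = A.T @ us / sigma_d**2 + D @ np.asarray(mu)
    x = np.linalg.solve(W, b)                  # unconstrained Tikhonov solution
    if np.linalg.norm(x - m) <= r:
        return x
    # Otherwise the minimum lies on the circle: dense parameter search.
    phi = np.linspace(0.0, 2 * np.pi, 20000, endpoint=False)
    cand = m + r * np.column_stack([np.cos(phi), np.sin(phi)])
    res = np.sum((cand @ A.T - us) ** 2, axis=1) / sigma_d**2 \
        + np.sum(((cand - mu) / sigma_x) ** 2, axis=1)
    return cand[np.argmin(res)]

# Interior case: weak prior, the estimate stays near the ML solution (3, 4).
thetas = np.array([0.0, np.pi / 2])
x_map = map_parallel_beam([3.0, 4.0], thetas, sigma_d=0.1,
                          mu=np.array([0.0, 0.0]), sigma_x=np.array([10.0, 10.0]),
                          m=np.array([0.0, 0.0]), r=10.0)
```

Shrinking r so that the Tikhonov solution falls outside the disc forces the estimate onto the circle, which is exactly the constrained behavior described above.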
Based on the posterior, also the Minimum Mean Square Error estimator can be determined by

δ_{MMSE,j}(u) = ∫_{Ω_X} x_j π(x | u) dx for j ∈ {1, 2},

with the posterior in (29), which cannot be calculated elementarily anymore. For the specific case of r → ∞, the MMSE estimator coincides with the MAP estimator of (32) due to the symmetry of the posterior around its global maximum when using Gaussians. The difference between MAP and MMSE is therefore enhanced by estimation close to the boundary. Please note that both MAP and MMSE guarantee an estimate which is not outside of the (convex) circle B(m, r), effectively bounding the feasibility region with the prior. In contrast, for the 2 × 2 and ML approaches no bounded feasibility region is possible.
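Since the MMSE integral cannot be computed elementarily, a brute-force grid integration over the disc is a simple (if inefficient) way to evaluate it; the sketch below uses illustrative values, not the paper's simulation settings:

```python
import numpy as np

def mmse_parallel_beam(us, thetas, sigma_d, mu, sigma_x, m, r, n_grid=400):
    """MMSE = posterior mean, computed by grid integration over a square
    covering the disc B(m, r); the truncated prior makes the integrand
    vanish outside the disc, so the grid can stop at the boundary."""
    us = np.asarray(us, dtype=float)
    g = np.linspace(-r, r, n_grid)
    X1, X2 = np.meshgrid(m[0] + g, m[1] + g, indexing="ij")
    inside = (X1 - m[0]) ** 2 + (X2 - m[1]) ** 2 <= r**2
    A = np.column_stack([np.cos(thetas), np.sin(thetas)])
    proj = A[:, 0, None, None] * X1 + A[:, 1, None, None] * X2
    lp = -np.sum((us[:, None, None] - proj) ** 2, axis=0) / (2 * sigma_d**2)
    lp -= ((X1 - mu[0]) ** 2 / sigma_x[0] ** 2
           + (X2 - mu[1]) ** 2 / sigma_x[1] ** 2) / 2
    lp = np.where(inside, lp, -np.inf)         # truncation by the disc
    w = np.exp(lp - lp.max())                  # stable unnormalized posterior
    c = w.sum()                                # normalization (dx^2 cancels)
    return np.array([(X1 * w).sum() / c, (X2 * w).sum() / c])

# Interior case: weak prior, the posterior mean stays near the ML point (3, 4).
thetas = np.array([0.0, np.pi / 2])
x_mmse = mmse_parallel_beam([3.0, 4.0], thetas, 0.1,
                            mu=np.array([0.0, 0.0]), sigma_x=np.array([10.0, 10.0]),
                            m=np.array([0.0, 0.0]), r=10.0)
```

With a small radius the posterior mass piles up against the boundary and the MMSE estimate lands strictly inside the disc, slightly away from the MAP point on the circle, illustrating the enhanced MAP/MMSE difference near the boundary.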

Cone Beam in 3D.
In this case x ∈ Ω_X ⊂ R³. The observation u_j ∈ R² is again performed by projecting the feature onto a detector plane at different angles Θ. There are N observed coordinates u_j at different angles between 0° and 90°, which are merged into a single matrix u = (u_1, ..., u_N) ∈ Ω_U ⊂ R^{2×N} (which can be identified with a vector in R^{2N}).
If it is assumed that the observation is without error and no prior knowledge is available to the estimation, the nonlinear relation between x and a single observation u_j at angle Θ_j is

u_j = f_{Θ_j}(x) = ((d + h)/(d + x₁ cos Θ_j + x₂ sin Θ_j)) (−x₁ sin Θ_j + x₂ cos Θ_j, x₃)ᵀ,

with the imaging source to isocenter of rotation distance d and the isocenter to detection plane distance h; see Figure 5 for an illustration of this geometry. This projection excludes points x which lie on the plane x₁ cos Θ_j + x₂ sin Θ_j + d = 0 orthogonal to the beam through the imaging source. It is directly recognizable that f_{Θ_j} is nonlinear (in Euclidean space), since f_{Θ_j}(x + y) ≠ f_{Θ_j}(x) + f_{Θ_j}(y). This projection can be represented as a linear transformation in homogeneous coordinates (e.g., see Hartley and Zisserman [1]), but we purposefully ignore this in order to show how the proposed estimation approaches can be applied directly to a nonlinear situation and how intuitive priors can be defined in Euclidean space.
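The projection and its nonlinearity can be sketched in a few lines. Since the explicit formula did not survive extraction in this text, the sketch below implements the standard perspective cone-beam mapping consistent with the stated excluded plane; the sign convention for the transverse detector coordinate is a choice made here.

```python
import numpy as np

def cone_projection(x, theta, d, h):
    """Perspective cone-beam projection of x in R^3 onto the 2D detector at
    gantry angle theta; d = source-to-isocenter, h = isocenter-to-detector."""
    c, s = np.cos(theta), np.sin(theta)
    depth = d + x[0] * c + x[1] * s        # distance along the beam axis
    if np.isclose(depth, 0.0):
        raise ValueError("point lies in the excluded plane through the source")
    scale = (d + h) / depth                # perspective magnification
    return scale * np.array([-x[0] * s + x[1] * c, x[2]])
```

For a point on the rotation axis (x₁ = x₂ = 0), the depth reduces to d and the projection is a pure magnification by (d + h)/d; for generic points the depth-dependent scale makes the map nonlinear, so f(x + y) ≠ f(x) + f(y).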
A direct solution, analogous to the 2 × 2 case in 2D, is not uniquely possible here: the number of equations is a multiple of 2 (two detector coordinates per projection), while there are 3 unknowns, so no square system arises. We avoid this ambiguity by not defining a corresponding 3 × 3 case.
Figure 5: Illustration of the cone beam geometry with imaging source, isocenter of rotation, and detector plane.

Observation with Uncertainties and without Prior Knowledge.
The observation uncertainty at angle Θ_j is in this case described by Δ_j ∼ N(0, σ_Δ² I_{2×2}), with the 2 × 2 identity matrix I_{2×2}. The geometrical model is therefore modified to U_j = f_{Θ_j}(X) + Δ_j. In consequence, the observation PDF is the bivariate Gaussian f(u_j | x) = (1/(2π σ_Δ²)) exp(−‖u_j − f_{Θ_j}(x)‖²/(2σ_Δ²)) (37). By acquiring N ≥ 2 stochastically independent observations u_j at angles Θ_1, ..., Θ_N, one can summarize these observations in a single PDF, which is the 3D likelihood function in x. This leads to the Maximum Likelihood estimator x̂_ML = argmin_x Σ_{j=1}^{N} ‖u_j − f_{Θ_j}(x)‖², which is the nonlinear least squares solution. Due to the nonlinearity, no elementary solution can be derived, and this expression is minimized numerically.
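A minimal sketch of this nonlinear least squares ML estimation is given below. It assumes the standard perspective cone-beam projection (source-isocenter distance d, isocenter-detector distance h), and a simple pattern search stands in for the simplex optimizer used in the paper; all names are illustrative.

```python
import numpy as np

def cone_projection(x, theta, d, h):
    """Perspective cone-beam projection of x in R^3 onto the 2D detector."""
    c, s = np.cos(theta), np.sin(theta)
    scale = (d + h) / (d + x[0] * c + x[1] * s)
    return scale * np.array([-x[0] * s + x[1] * c, x[2]])

def ml_estimate_3d(us, thetas, d, h, x0, step=2.0, tol=1e-6):
    """Nonlinear least squares ML estimate by a simple coordinate pattern
    search (a stand-in for the Nelder-Mead simplex used in the paper)."""
    def cost(x):
        return sum(np.sum((u - cone_projection(x, t, d, h)) ** 2)
                   for u, t in zip(us, thetas))
    x = np.asarray(x0, dtype=float)
    while step > tol:
        moved = False
        for i in range(3):                 # try a step along each coordinate axis
            for delta in (step, -step):
                cand = x.copy()
                cand[i] += delta
                if cost(cand) < cost(x):
                    x, moved = cand, True
        if not moved:
            step /= 2.0                    # refine until the tolerance is reached
    return x
```

With noiseless observations the residual vanishes at the true position, so the search recovers it; with noisy observations the same routine returns the nonlinear least squares compromise.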

Observation with Uncertainties and Prior Knowledge.
The prior knowledge is in the 3D case modeled with a normalization constant c, the characteristic function χ_{B(m,r)}, which has the value 1 in the sphere B(m, r) with midpoint m and radius r and 0 elsewhere (again as an example of a convex region), and μ_X ∈ R³, the mean of the normal distribution: f_X(x) = (1/c) χ_{B(m,r)}(x) exp(−Σ_{i=1}^{3} (x_i − μ_{X_i})²/(2σ_{X_i}²)). This is essentially a Gaussian density function which is truncated by the sphere B(m, r). A sample from this prior is illustrated in Figure 6. The geometrical model is then as in the ML case, combined with this prior, and the posterior distribution becomes proportional to the product of likelihood and prior (with c being the normalization constant of the posterior). In consequence, the Maximum A Posteriori estimator represents a constrained and regularized nonlinear least squares optimization problem. The corresponding Minimum Mean Square Error estimator is defined analogously for i ∈ {1, 2, 3} with the posterior of (43), which again cannot be calculated elementarily anymore.

Results
In order to show the practical differences, we present 5 different cases for the comparison of the estimators, with 10,000 sample points x each (recall that X is a random variable), in 2D parallel beam and 3D cone beam geometry. The standard parameter set (A) uses N = 5 projections equidistantly distributed between 0° and 90°. Besides the standard parameter set, four other parameter sets (cases) are investigated, changing only one parameter at a time while keeping all other parameters at the standard values: (B) applying only N = 2 projections, (C) applying N = 10 projections, (D) narrowing the prior to σ_{X_i} = 1.5, and (E) reducing the measurement uncertainties to σ_Δ = 1.5. In addition, for the 2D parallel beam case, we present the results of the MAP and MMSE estimations with a noninformative prior (np), that is, a uniform distribution inside the circle B(m, r) and zero outside, denoted MAP_np := lim_{σ_{X_i} → ∞} MAP and MMSE_np accordingly. This means only the hard constraints about the feasibility region are utilized in these estimations, without prior knowledge about the concentration of x at the border (circle in Figure 4). In a broader context, such a noninformative MAP_np estimator can be identified with an ML estimator with ad hoc constraints on the solution space in the optimization.
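Drawing the sample points x from a truncated Gaussian prior, as done for the simulation studies, can be sketched with simple rejection sampling (the function name and parameters are illustrative; the paper does not specify its sampling routine):

```python
import numpy as np

def sample_truncated_gaussian(n, mu, sigma, m, r, rng=None):
    """Draw n samples from a Gaussian N(mu, diag(sigma^2)) truncated to
    the ball B(m, r), by rejection sampling."""
    rng = np.random.default_rng(rng)
    out = []
    while len(out) < n:
        # Propose a batch from the untruncated Gaussian, keep those in the ball
        x = rng.normal(mu, sigma, size=(n, len(mu)))
        keep = np.linalg.norm(x - m, axis=1) <= r
        out.extend(x[keep])
    return np.array(out[:n])
```

Note that rejection sampling becomes slow when the ball captures little Gaussian mass (small r relative to sigma); for the moderate truncations considered here it is perfectly adequate.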

Parallel Beam in 2D.
For the standard parameters of case (A), Figure 7 presents an example of the posterior PDF in 2D for a single sample point x. The right side of this plot shows a profile of the likelihood and the posterior function, which demonstrates the general behavior: the width of the posterior is reduced compared to the likelihood function at the cost of introducing a shift (bias) of the maximum position by the prior knowledge. Further, the circular boundary appears in the posterior profile as a truncation of the profile. By incorporating the truncation into the prior distribution, the MMSE (which for simplicity one can think of as the expectation value of the posterior profile) compensates for the bias of the maximum position in MAP.
Figure 8 presents the estimation performance for case (A) of the 2 × 2 approach of (18), the ML of (24), the MAP of (31) and (32), and the MMSE of (33) (with noninformative prior and truncated Gaussian prior). The complete estimation results for all five cases (A)-(E) are presented in Table 2. In addition to Table 2, the RMSEs for a broader variation of the essential parameters are illustrated in Figure 9 by starting with case (A) and varying only the specified parameter of each subfigure. Examples of the posterior distributions for cases (A)-(E) (for the same specific sample x) are presented in Figure 10, in order to show the qualitative differences due to the corresponding changes in the parameter set.
Based on Table 2 and Figure 9, several characteristic observations can be made. First, in all cases the estimation accuracy, summarized by the RMSE, improves mainly along the chain 2 × 2 → ML → MAP_np → MMSE_np → MAP → MMSE. Second, in order to compare the empirical bias of each estimator, one has to focus on the mean values of the coordinate directions x₁ and x₂, which are presented in brackets in the third column of Table 2. In theory, the 2 × 2 and the ML approaches are (asymptotically) unbiased, which was also observed in this empirical evaluation (i.e., a bias of 0.0 in each coordinate direction). MAP and MMSE estimators are typically biased, which is certainly true in this application for the MAP estimation and partly true for the MMSE estimator. In detail, for the estimators with uniform priors, MAP_np and MMSE_np, a bias towards the center of the constraining circle can be recognized (e.g., negative bias values in each coordinate direction). This is plausible, since estimates outside the circle are prohibited with this prior while, on the other hand, no information about the concentration of x at the border of the circle is introduced. When focusing on the full prior (constraining circle and the concentration at the border) with MAP, a bias in the opposite direction, towards the border, is introduced (see Figure 7 for an illustration of this effect). Please note that the introduction of a significant bias in these three estimations, MAP_np, MMSE_np, and MAP, still leads to better results compared to the unbiased 2 × 2 and ML estimations, which is a characteristic behavior in Bayesian estimation. Remarkably, the use of the Bayesian MMSE with the full prior leads to an empirically unbiased estimator in this application. This can be understood by realizing that, when averaging over the whole posterior distribution, the bias is in balance between the concentration of x at the border (leading to a bias towards the border) and the constraints of the border (leading to a bias towards the center of the circle); see Figure 7 again for an illustration. This allows the conclusion that for spatial estimation at constraining interfaces a highly accurate and practically unbiased estimator can be constructed with MMSE estimation.
Third, compared to the standard parameter set, utilizing N = 2 projections reduces the observation information and increases the RMSE. Specifically, ML converges exactly to the 2 × 2 approach, increasing its RMSE from 3.05 to 4.28. Compared to the uniform prior in MAP_np and MMSE_np, the full prior in MAP and MMSE leads to more robust estimates whose RMSEs increase much less.
Fourth, by utilizing N = 10 projections, the performance of all estimators is improved. Essentially, the relative contribution of the prior knowledge decreases, since it is weighted less by the MAP and MMSE estimators, as can be seen, for example, in (32).
Fifth, when the sample distribution becomes narrower and the correspondingly narrower prior PDF is utilized, the RMSEs of MAP and MMSE decrease significantly. The relative benefit of MMSE over MAP decreases in this case, since for the narrower prior the influence of the truncation of the posterior by the boundary is reduced. On the other hand, MAP_np and MMSE_np lead to RMSEs much closer to ML, since they introduce a strong bias (not knowing the concentration of x at the border). In this context, the increase in the bias is much stronger for MMSE_np than for MAP_np, leading overall to even worse RMSE values. Sixth, if the measurement accuracy is increased, all estimators benefit significantly. In particular, the benefit of using (partial) prior knowledge (in MAP_np, MMSE_np, MAP, and MMSE) compared to not using it (in 2 × 2 and ML) is decreased but still not negligible.

Cone Beam in 3D.
Figure 11 presents the estimation performance for the standard parameter set of the ML of (39), the MAP of (45), and the MMSE of (46). Further, Table 3 presents the results of the three estimators for all five cases (A)-(E), analogous to the results for the 2D parallel beam geometry.
Several observations can be made. First, in general the same behavior as for the 2D parallel beam geometry can be observed for the presented five cases, and the statements made there remain valid; in particular, the estimation improves in the order ML → MAP → MMSE.
Second, when focusing on the RMSE column, for all three methods the estimation of the third coordinate x₃ is in general more accurate than the estimation of x₁ and x₂. This can be understood by the fact that x₃ is almost directly observed in the cone beam geometry of (34) and is therefore estimated with higher confidence.
Third, the estimation of x₁ and x₂ is slightly better in the 3D cone beam geometry than in the 2D parallel beam geometry. This is due to the fact that the detector coordinate observing x₃ also carries slight information about the first two coordinates through the perspective scaling of the cone beam geometry. Therefore, the information content about the first two coordinates is higher in 3D cone beam geometry, with two measurements per projection u_j for the three unknowns x.
Fourth, the calculation times on the test computer (Intel(R) Core2 Quad CPU, 4 cores, 3.00 GHz, 4.00 GB RAM) for a single data point were the same for ML and MAP, on average 1.9 · 10⁻³ seconds utilizing the MATLAB implementation of the Nelder-Mead simplex method, and on average 5.5 seconds for the MMSE implementation utilizing Simpson's 1/3 rule for the integration. The optimization in ML and MAP was terminated when the difference of the estimated position between iterations was less than 10⁻⁶, and for the MMSE integration a 100 × 100 × 100 grid with a constant spatial step size of 0.2 was utilized. It could be observed that a further refinement of these criteria did not lead to practically better results. Several ways to further speed up the MMSE estimator can be implemented, for example, taking symmetries into account when utilizing Gaussians, more adaptive grid generation in the numerical integration, more effective use of parallel programming, and a full implementation in C instead of MATLAB.

Table 3: Statistical values about the estimation results of the 3D cone beam geometry for cases (A)-(E), with 10,000 sample points x each, are presented for the three 3D estimators. The radial estimation error is presented first, and the coordinate estimation errors of (x₁, x₂, x₃) are shown in brackets. The same formulas as described in the caption of Table 2 are utilized.
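The Simpson-rule evaluation of the 3D posterior mean can be sketched as follows. The grid size, the box bounds, and the test density below are illustrative choices, not the paper's exact setup:

```python
import numpy as np

def simpson_weights(a, b, n):
    """Nodes and weights of the composite Simpson 1/3 rule (n must be odd)."""
    assert n % 2 == 1, "Simpson's 1/3 rule needs an odd number of nodes"
    h = (b - a) / (n - 1)
    w = np.ones(n)
    w[1:-1:2], w[2:-1:2] = 4.0, 2.0
    return np.linspace(a, b, n), w * h / 3.0

def posterior_mean_simpson(log_post, lo, hi, n=51):
    """MMSE estimate: mean of an unnormalized 3D posterior via Simpson's rule."""
    axes, weights = zip(*(simpson_weights(lo[i], hi[i], n) for i in range(3)))
    X = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)   # grid, shape (n, n, n, 3)
    W = (weights[0][:, None, None]
         * weights[1][None, :, None]
         * weights[2][None, None, :])                          # tensor-product quadrature weights
    lq = log_post(X)
    q = np.exp(lq - lq.max())                                  # unnormalized, numerically stabilized
    Z = (W * q).sum()                                          # normalization constant
    return np.array([(W * q * X[..., i]).sum() / Z for i in range(3)])
```

The cost grows cubically with the grid resolution, which explains the large gap between the MMSE and the ML/MAP runtimes reported above.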

Discussion
We briefly presented the general Bayesian framework for point estimation and its application to the stochastic localization/triangulation problem. This compact presentation of the estimation methods is suitable for a broad range of applications. In the general introduction, we presented the foundations of Maximum Likelihood, Maximum A Posteriori, and Minimum Mean Square Error estimation and showed how they are related to each other and to other popular estimation methods.
Several points can be made considering the specific application to stochastic localization/triangulation, which hold true for Bayesian approaches in general as well. As seen in the proposed application, there is a tradeoff between benefit and effort. The more information is contained in the observations (i.e., the higher the quality of the observation, e.g., more projections or higher quality measurement/detection), the smaller the benefit of the prior knowledge. The reverse is true as well: the less information in the observations, the more important the prior knowledge. With Bayesian estimation (MAP_np, MMSE_np, MAP, or MMSE), these two probabilistic sources of information can be naturally included in the model.
The introduction of a prior can benefit the estimation in two ways: it can place more weight on probable values of the location x, and it can effectively restrict Ω_X by setting the prior probabilities to zero outside of the feasibility region (boundaries). If the constrained region of interest for x is convex (as in the example), then both the MMSE and the MAP derive estimates in the constrained region, and, therefore, the prior imposes an effective constraint. However, if the region is nonconvex, only the MAP is guaranteed to produce an estimate within the restricted region, while the MMSE leads to an estimate in the convex hull of the constrained region (possibly outside the constrained region itself). In a broader context, the prior can also be interpreted as a regularizing influence on the presented inverse problem, appearing as Tikhonov regularization in MAP or in a more interweaved form in MMSE.
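The nonconvexity caveat can be illustrated with a tiny 1D example (the numbers are arbitrary and chosen only for this demonstration): for a posterior supported on two disjoint intervals, the mode (MAP) stays inside the support, while the mean (MMSE) falls into the gap between them.

```python
import numpy as np

# Discretized posterior over a nonconvex 1D support: two disjoint intervals
# [-2, -1] and [1, 2], with a symmetric bimodal density on top.
x = np.linspace(-3.0, 3.0, 6001)
support = (np.abs(x) >= 1.0) & (np.abs(x) <= 2.0)
density = np.exp(-0.5 * (x - 1.2) ** 2) + np.exp(-0.5 * (x + 1.2) ** 2)
post = np.where(support, density, 0.0)
post /= post.sum()

x_map = x[np.argmax(post)]   # mode: lies inside one of the intervals
x_mmse = post @ x            # mean: lands in the "hole" of the convex hull
```

Here x_mmse is (by symmetry) approximately 0, which lies in the convex hull [-2, 2] but outside the support, whereas x_map lies inside one of the two intervals.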
It is pointed out that in this simulation study the standard deviations of the prior knowledge (σ_{X_i} for i ∈ {1, 2, 3}) and the observation uncertainties (σ_Δ) are known before estimation. In some practical applications these parameters are not known and have to be estimated or assumed, which would introduce additional uncertainties into the presented estimation methods. The investigation of these additional uncertainties is not part of this study; it is recommended to extend the presented methods for this purpose utilizing hierarchical Bayesian models [13].
Focusing on the presented 2D study, the following statements can be formulated. If no prior knowledge is available, utilizing ML with 5 or 10 projections, compared to a single 2 × 2 solution with 2 projections, effectively decreases the measurement uncertainties and increases the estimation accuracy by about 40%. If only information about the constraints on x is available, but no information about its possible distribution (e.g., truncated Gaussian), MAP_np should be preferred to MMSE_np, since it achieves a similar accuracy (case (A): 11% improvement for MAP_np relative to ML compared to 13% for MMSE_np) with less bias and much faster calculation times. In contrast, if both pieces of information are available, that is, about the constraints and the concentration of x, then MMSE should be preferred to MAP, since it finds a better compromise close to the surface between constraints and concentration. This is due to the fact that MAP neglects the constraints entirely as long as the estimates are inside the circle, leading to worse results on average (case (A): 33% improvement for MMSE relative to ML compared to 24% for MAP). In general, it is shown in this application that utilizing the MMSE for spatial coordinate estimation at constraining interfaces makes it possible to construct an empirically unbiased estimator with high accuracy.
Further testing of these methods in medical imaging or other practical applications of stochastic localization at boundaries is highly encouraged. This paper summarizes the basic theory for this purpose and provides clearly interpretable results in the simulation studies. However, in each practical application the true benefit of the estimation methods has to be evaluated individually. For instance, in radiotherapy, due to differences in patient anatomy, the problem of breast tumor localization is dramatically different from prostate localization: for the breast, the boundary is the breast surface, while for the prostate, hard constraints can arise from the bony anatomy or from an endorectal balloon which fixes the anatomy differently.
The presented estimation with constraints incorporated into the prior is very general and not limited to linear acquisition geometries, as demonstrated for the nonlinear 3D case. Further, the results of the 3D study show that the scaling of the 3D cone beam geometry increases the estimation accuracy for the first two coordinates of x compared to the parallel beam geometry in 2D. Finally, ML and MAP estimation lead to fast optimization problems, whereas the MMSE estimator requires a more challenging high-dimensional integration. In practice, it depends on the specific application, such as real-time imaging, to find a suitable compromise between computational effort and accuracy.

Conclusion
The 2D parallel beam and 3D cone beam geometry examples illustrate that the general framework of Bayesian estimation can be applied to a variety of estimation problems even when no explicit formulas can be derived, such as for nonlinear models or non-Gaussian distributions of the uncertainties or the prior knowledge. In this context, it is demonstrated that constraints on the solution space can be efficiently incorporated into the prior for stochastic localization by utilizing Maximum A Posteriori and Minimum Mean Square Error estimation.

Figure 2: Illustration of the relation between estimation based on ML, MAP, MMSE, and Bayesian estimation. The intersections state under which conditions the estimators are equivalent. Every condition appears only once; since the conditions are not mutually exclusive, multiple conditions lead to further equalities.

Figure 3: Geometrical projection u ∈ R of the coordinate x ∈ R² in the detector plane.

Figure 4: (a) Example of the prior density function with the parameters specified in the Results section. (b) 10,000 samples drawn from the prior distribution which are utilized in the Results section.

Figure 6: 10,000 samples drawn from the prior distribution which are utilized in the Results section.

Figure 9: The RMSEs of ML, MAP_np, MMSE_np, MAP, and MMSE for broadly varying parameters: the number of projections N (a), the standard deviation of the prior knowledge σ_{X_1} = σ_{X_2} (b), and the standard deviation of the observation uncertainty σ_Δ (c). The constant parameters of each subfigure are set to the standard values of case (A). In the central subfigure, MAP_np and MMSE_np are not presented since they are independent of σ_{X_{1/2}}.

Table 1: Probability Density Functions (PDFs) which are important for estimation theory.