Log-PF: Particle Filtering in Logarithm Domain



Introduction
Many scientific problems involve dynamic systems, for example, in navigation applications. Dynamic systems can be described by state-space models where the state is only observable through noisy measurements. Recursive Bayesian filters are algorithms that estimate an unknown probability density function (PDF) of the state recursively from measurements over time. Such a filter consists of two steps: prediction and update. In the prediction step, the PDF of the state is calculated based on the system model. During the update step, the current measurement is used to correct the prediction based on the measurement model. In this way, the posterior PDF of the state is estimated recursively over time. Particle filters (PFs) are implementations of recursive Bayesian filters which approximate the posterior PDF by a set of random samples, called particles, with associated weights. Several types of PFs have been developed over the last few years [1-8]. They differ in their choice of the importance sampling density and in the resampling step.
A common choice is to set the importance sampling density equal to the prior, as in, for example, the bootstrap filtering algorithm [1]. However, if the width of the likelihood distribution is too small in comparison to the width of the prior distribution, or if measurements are located in the tail of the prior distribution, this choice may fail; see [4]. These situations may arise when sensors are very accurate or measurements change rapidly over time, such that the particle states after the prediction step might be located in the tail of the likelihood. Additionally, the numerical representation of numbers may limit the computational accuracy through floating point errors. In these situations, a common remedy is the likelihood particle filter (LPF) [3, 5]. The LPF uses the likelihood distribution as the importance sampling density and the prior for the weight update. The LPF is recommended when the width of the likelihood distribution is much smaller than that of the prior and, accordingly, the posterior density function is more similar to the likelihood than to the prior. However, in many situations it is impossible to draw samples from the likelihood distribution. Furthermore, the LPF is not suitable for an underdetermined system where the number of measurements per time instant is lower than the number of states. Additionally, using the likelihood as proposal distribution might increase the variance of the simulated samples according to [5].
In this paper, we derive a PF that operates in the logarithmic domain (log-domain), called Log-PF. The Log-PF represents the weights in log-domain, which enables a more accurate representation of low weights with a limited number of bits. Particularly, when the involved distributions contain exponentials or products of functions, the log-domain representation is computationally more efficient [9]. The derived Log-PF uses the Jacobian logarithm [10-12] to describe all steps of the PF, including weight update, weight normalization, resampling, and point estimation, in log-domain. In this paper, we derive the minimum mean square error (MMSE) and the maximum a posteriori (MAP) point estimators.
The paper is structured as follows: First, we describe standard PFs in Section 2; thereafter, we derive the proposed Log-PF in Section 3. Afterwards, we derive two point estimators in log-domain in Section 4: Section 4.1 describes the MMSE estimator and Section 4.2 the MAP estimator. We evaluate the Log-PF by simulations and compare the results to standard PF implementations and Kalman filters (KFs) in Section 5. Particularly, we show by an example based on Rao-Blackwellization the benefits of computing all steps in log-domain. For distributed particle filters like [13], similar results are expected. Finally, Section 6 concludes the paper. Throughout the paper, we will use the following notations: (i) All vectors are interpreted as column vectors.
(ii) I denotes an identity matrix.
(iii) Matrices are denoted by bold capital letters and vectors by bold small letters.
(iv) $[\mathbf{x}]_l$ denotes the $l$-th element of vector $\mathbf{x}$.
(v) $\hat{x}$ denotes the estimate of $x$.

Particle Filtering
PFs represent the probability density of the state vector $\mathbf{x}_k$ at time step $k$ by $N_p$ particles. According to [1-3, 5] and assuming a first-order hidden Markov model, the posterior filtered density $p(\mathbf{x}_k \mid \mathbf{z}_{0:k})$ is approximated as

$$p(\mathbf{x}_k \mid \mathbf{z}_{0:k}) \approx \sum_{i=1}^{N_p} w_k^{(i)} \, \delta\!\left(\mathbf{x}_k - \mathbf{x}_k^{(i)}\right), \qquad (1)$$

where $\mathbf{z}_{0:k}$ defines the measurement vector for the time steps $0, \dots, k$, $\delta(\cdot)$ stands for the Dirac distribution, and $\mathbf{x}_k^{(i)}$ denotes the state of particle $i$ with normalized weight

$$w_k^{(i)} = \frac{w_k^{*(i)}}{\sum_{j=1}^{N_p} w_k^{*(j)}}. \qquad (2)$$

The unnormalized weight is denoted by $w_k^{*(i)}$, while the weight update is calculated as

$$w_k^{*(i)} \propto w_{k-1}^{(i)} \, \frac{p(\mathbf{z}_k \mid \mathbf{x}_k^{(i)}) \, p(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)})}{q(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)}, \mathbf{z}_k)}, \qquad (3)$$

with the importance density $q(\mathbf{x}_k \mid \mathbf{x}_{k-1}, \mathbf{z}_k)$, the likelihood $p(\mathbf{z}_k \mid \mathbf{x}_k)$, and the transition prior distribution $p(\mathbf{x}_k \mid \mathbf{x}_{k-1})$; see [1-3, 5]. For $N_p \to \infty$, the approximation in (1) approaches $p(\mathbf{x}_k \mid \mathbf{z}_{0:k})$. By (1), (2), and (3), the sequential importance sampling (SIS) PF can be described, which is the basis of most PFs [2].
For numerical stability reasons, weights are often computed and stored in the log-domain, which is also computationally efficient when the distributions involved contain exponentials or products. Thus, we obtain from (3) the update equation in log-domain,

$$\hat{w}_k^{*(i)} = \hat{w}_{k-1}^{(i)} + \Delta\hat{w}_k^{(i)}, \qquad (4)$$

where we define

$$\Delta\hat{w}_k^{(i)} = \ln\!\left(\frac{p(\mathbf{z}_k \mid \mathbf{x}_k^{(i)}) \, p(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)})}{q(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)}, \mathbf{z}_k)}\right), \qquad (5)$$

with the log-domain weight (log-weight) $\hat{w}_k^{(i)} = \ln w_k^{(i)}$ for particle $i$. After calculating the weights $\hat{w}_k^{*(i)}$ in log-domain, the weights are transferred for further processing to the linear domain (lin-domain) with $w_k^{*(i)} = e^{\hat{w}_k^{*(i)}}$ for $i = 1, \dots, N_p$, where numerical accuracy is lost due to the floating point representation. In order to obtain a more stable PF implementation, the weights $\hat{w}_k^{*(i)}$ can be transferred to the lin-domain by

$$w_k^{(i)} = \frac{e^{\hat{w}_k^{*(i)} - \max_j \hat{w}_k^{*(j)}}}{\sum_{m=1}^{N_p} e^{\hat{w}_k^{*(m)} - \max_j \hat{w}_k^{*(j)}}}, \qquad (6)$$

such that $\sum_{i=1}^{N_p} w_k^{(i)} = 1$; see, for example, [9]. In the following we investigate a different approach, where the transformation from the log-domain to the lin-domain is not necessary. Hence, we show that all steps of the PF can be computed in log-domain.
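As an illustration, the stabilized transfer in (6) can be sketched in Python (the function name `logweights_to_linear` is ours, introduced for this example):

```python
import numpy as np

def logweights_to_linear(log_w):
    """Transfer unnormalized log-weights to normalized lin-domain weights.
    Subtracting the maximum log-weight first (cf. (6)) avoids underflow:
    the largest weight becomes exp(0) = 1."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - np.max(log_w))
    return w / np.sum(w)

# Naive exponentiation of very negative log-weights underflows to zero,
# while the shifted transfer keeps the relative weight information:
log_w = np.array([-1000.0, -1001.0, -1005.0])
print(np.exp(log_w))                # [0. 0. 0.] -- underflow
print(logweights_to_linear(log_w))  # valid normalized weights
```

Even with this stabilization, the normalization itself still happens in the lin-domain; the Log-PF derived next avoids the transfer altogether.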

Algorithm Derivation
To compute all steps of the PF in log-domain, we obtain for the approximation of the posterior filtered density, from (1) using (5),

$$p(\mathbf{x}_k \mid \mathbf{z}_{0:k}) \approx \sum_{i=1}^{N_p} e^{\hat{w}_k^{(i)}} \, \delta\!\left(\mathbf{x}_k - \mathbf{x}_k^{(i)}\right). \qquad (7)$$

The normalization of the log-weight can be calculated directly in log-domain as a simple subtraction,

$$\hat{w}_k^{(i)} = \hat{w}_k^{*(i)} - \hat{W}_k, \qquad (8)$$

where $\hat{W}_k$ denotes the normalization factor with

$$\hat{W}_k = \ln\!\left(\sum_{i=1}^{N_p} e^{\hat{w}_k^{*(i)}}\right). \qquad (9)$$

To compute the normalization factor $\hat{W}_k$ of (9) without transferring the log-weights to the lin-domain, the Jacobian logarithm [10, 11] can be used. The Jacobian logarithm computes the logarithm of a sum of two exponentials $\ln(e^{\delta_1} + e^{\delta_2})$ using the $\max(\cdot)$ operator and adding a correction term; that is,

$$\ln\!\left(e^{\delta_1} + e^{\delta_2}\right) = \max(\delta_1, \delta_2) + \ln\!\left(1 + e^{-|\delta_1 - \delta_2|}\right). \qquad (10)$$

With (10) and as derived in [12], the expression $\ln(\sum_{i=1}^{n} e^{\delta_i})$ can be calculated iteratively as

$$\ln\!\left(\sum_{i=1}^{n} e^{\delta_i}\right) = \max(\lambda, \delta_n) + \ln\!\left(1 + e^{-|\lambda - \delta_n|}\right), \qquad (11)$$

where $\lambda = \ln(e^{\delta_1} + \cdots + e^{\delta_{n-1}})$ is the running value of the previous iteration. Hence, using the Jacobian logarithm allows computing operations such as the summation in (9) efficiently in the log-domain. For later convenience, we express (11) by the iterative algorithm shown in pseudocode in Algorithm 1; that is,

$$\ln\!\left(\sum_{i=1}^{n} e^{\delta_i}\right) = \mathrm{jacob}\!\left(\{\delta_i\}_{i=1}^{n}\right), \qquad (12)$$

where $\{\delta_i\}_{i=1}^{n}$ defines the set of $\delta_i$ with $i = 1, \dots, n$. Thus, the normalization factor $\hat{W}_k$ of (9) can be calculated iteratively by

$$\hat{W}_k = \mathrm{jacob}\!\left(\{\hat{w}_k^{*(i)}\}_{i=1}^{N_p}\right). \qquad (13)$$

Hence, we obtain for the log-weight normalization of (8),

$$\hat{w}_k^{(i)} = \hat{w}_k^{*(i)} - \mathrm{jacob}\!\left(\{\hat{w}_k^{*(j)}\}_{j=1}^{N_p}\right). \qquad (14)$$

Please note that a complexity reduction can be obtained if the correction term $\ln(1 + e^{-|\cdot|})$ is approximated, for example, by a lookup table. By using (14), the SIS PF can be described in log-domain as shown in pseudocode in Algorithm 2 (SIS Log-PF). Algorithm 2 is evaluated at each time step $k$, where $\{\mathbf{x}_k^{(i)}, \hat{w}_k^{(i)}\}_{i=1}^{N_p}$ denotes the set of particle states $\mathbf{x}_k^{(i)}$ and log-weights $\hat{w}_k^{(i)}$ with $i = 1, \dots, N_p$ for time step $k$.
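A minimal Python sketch of the iterative Jacobian logarithm (the function name `jacobian_log` is ours, mirroring the role of Algorithm 1):

```python
import math

def jacobian_log(deltas):
    """Compute ln(sum_i exp(delta_i)) entirely in the log-domain by
    iterating the two-term Jacobian logarithm: take the maximum of the
    running value and the new term, then add the correction term
    ln(1 + exp(-|difference|))."""
    it = iter(deltas)
    acc = next(it)                    # running value lambda
    for d in it:
        hi, lo = (acc, d) if acc >= d else (d, acc)
        acc = hi + math.log1p(math.exp(lo - hi))
    return acc
```

Because only differences of log-values are exponentiated, the function also works for arguments far below the underflow limit of double precision, e.g. `jacobian_log([-1000.0, -1000.0])` returns $-1000 + \ln 2$.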
One of the crucial problems of the SIS PF is degeneracy (another problem, which is not discussed in this paper, is the selection of the importance density $q(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)}, \mathbf{z}_k)$; see, e.g., [2]).
After a few time steps all particles except one have low weights and no longer contribute to the computation of the posterior PDF; that is, the distribution estimate degenerates. A suitable measure of degeneracy is the effective sample size $N_{\mathrm{eff}}$ [1-3, 5]. A widely used approximation for the effective sample size is

$$N_{\mathrm{eff}} \approx N_{\mathrm{eff}}^{(2)}(\mathbf{w}_k) = \frac{1}{\sum_{i=1}^{N_p} \left(w_k^{(i)}\right)^2}, \qquad (15)$$

with $\mathbf{w}_k = (w_k^{(1)}, \dots, w_k^{(N_p)})^{\mathrm{T}}$ and $1 \le N_{\mathrm{eff}}^{(2)}(\mathbf{w}_k) \le N_p$. A small value of $N_{\mathrm{eff}}^{(2)}(\mathbf{w}_k)$ indicates a severe degeneracy. By using the Jacobian logarithm of (12), we obtain from (15) the effective sample size in log-domain, with

$$\ln(N_{\mathrm{eff}}) \approx \ln\!\left(N_{\mathrm{eff}}^{(2)}(\mathbf{w}_k)\right) = -\mathrm{jacob}\!\left(\{2\hat{w}_k^{(i)}\}_{i=1}^{N_p}\right). \qquad (16)$$

Alternative effective sample size approximations as introduced in [14] can also be represented in log-domain. Table 1 summarizes four generalized alternative effective sample size approximations in lin-domain and log-domain which depend on a parameter $r$; please note that $N_{\mathrm{eff}}^{(2)}(\mathbf{w}_k)$ as in (15) is obtained from $N_{\mathrm{eff}}^{(r)}(\mathbf{w}_k)$ with $r = 2$. The Generic PF extends the SIS PF by a resampling step to prevent degeneration, as shown in pseudocode in Algorithm 3. The basic idea of resampling is to eliminate particles with low weights and reproduce particles with high weights. Whenever a significant degeneracy is observed in the Generic PF, that is, $\ln(N_{\mathrm{eff}})$ is less than a threshold $\ln(N_{\mathrm{thr}})$, the particles are resampled. Algorithm 4 shows a pseudocode of the systematic resampling algorithm [15] transferred into log-domain. In Algorithm 4, $\mathrm{U}[0, N_p^{-1}]$ denotes the uniform distribution on the interval $[0, N_p^{-1}]$ (cf. Algorithm 4, Line 5). Similarly to the descriptions before, the Jacobian logarithm is used to construct the estimated sampled cumulative distribution function in log-domain (log-CDF); see Algorithm 4, Line 3.
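Building on the iterative Jacobian logarithm, the log-domain effective sample size can be sketched as follows (a self-contained illustration; the function name is ours):

```python
import math

def log_n_eff(log_w):
    """ln(N_eff) from normalized log-weights: the log-domain version of
    N_eff ~= 1 / sum_i (w_i)^2, i.e. -ln(sum_i exp(2 * log_w_i)),
    accumulated with the iterative Jacobian logarithm."""
    acc = 2.0 * log_w[0]
    for lw in log_w[1:]:
        d = 2.0 * lw
        hi, lo = (acc, d) if acc >= d else (d, acc)
        acc = hi + math.log1p(math.exp(lo - hi))
    return -acc
```

For uniform weights $\hat{w}_k^{(i)} = \ln(1/N_p)$ this yields $\ln(N_p)$, and for a fully degenerate weight set it approaches $\ln(1) = 0$, matching the bounds $1 \le N_{\mathrm{eff}}^{(2)} \le N_p$.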
The estimated sampled log-CDF is represented by a vector $\hat{\mathbf{c}}$ of length $N_p$ with elements $[\hat{\mathbf{c}}]_i$ for $i = 1, \dots, N_p$, constructed element by element using the Jacobian logarithm, $[\hat{\mathbf{c}}]_i = \max([\hat{\mathbf{c}}]_{i-1}, \hat{w}_k^{(i)}) + \ln(1 + e^{-|[\hat{\mathbf{c}}]_{i-1} - \hat{w}_k^{(i)}|})$; see Algorithm 4. By $[\mathbf{x}]_i$, we denote the $i$-th element of the vector $\mathbf{x}$. According to the estimated sampled log-CDF, particles with high weights are reproduced and particles with low weights are eliminated.
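A sketch of systematic resampling driven directly by the log-CDF, with thresholds compared in the log-domain as in Algorithm 4 (the function name is ours):

```python
import math
import random

def systematic_resample_log(log_w):
    """Systematic resampling from normalized log-weights: build the
    log-CDF with the iterative Jacobian logarithm, then advance through
    it with N_p equally spaced thresholds, compared in log-domain."""
    n = len(log_w)
    log_cdf = [log_w[0]]
    for lw in log_w[1:]:
        prev = log_cdf[-1]
        hi, lo = (prev, lw) if prev >= lw else (lw, prev)
        log_cdf.append(hi + math.log1p(math.exp(lo - hi)))
    u0 = random.uniform(0.0, 1.0 / n)     # start point drawn from U[0, 1/N_p]
    idx, j = [], 0
    for i in range(n):
        log_u = math.log(u0 + i / n)      # threshold, compared in log-domain
        while j < n - 1 and log_cdf[j] < log_u:
            j += 1
        idx.append(j)                     # reproduce particle j
    return idx
```

Particles with high weights occupy a wide span of the log-CDF and are hit by several thresholds, so they are reproduced; particles with low weights are skipped and thereby eliminated.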
In Section 5, we use the sequential importance resampling (SIR) PF [1] as an example for comparing the performance of the linear-domain PF (Lin-PF) and the Log-PF. Therefore, Algorithm 5 shows a pseudocode of the SIR PF in log-domain, which is derived from the Generic PF by setting the importance density equal to the transition prior distribution, $q(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)}, \mathbf{z}_k) = p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})$, and by choosing the resampling threshold such that resampling is performed at each time step [2].
Additionally, we compare in Section 5 the proposed Log-PF to the PF implementation which computes the weights in log-domain and uses (6) to obtain the weights in lin-domain, called Lin-Log-PF in the following. A pseudocode of the Generic Lin-Log-PF is shown in Algorithm 6: the weights are calculated in log-domain according to (4) and normalized and transferred to the lin-domain according to (6), as mentioned in Section 3 and, for example, [9]. Please note that further improvements can be obtained if the weights $\hat{w}_k^{*(i)}$ are directly propagated in log-domain when resampling is not necessary.

Log-PF Point Estimators
In many applications, we are interested in a point estimate of the state instead of its a posteriori PDF. In this section we derive the MMSE and MAP point estimators based on the a posteriori density estimated by the Log-PF.

Minimum Mean Square Error Estimate.
The MMSE point estimate using the approximated a posteriori density (see, for example, [16]) is defined by

$$\hat{\mathbf{x}}_k^{\mathrm{MMSE}} = \sum_{i=1}^{N_p} w_k^{(i)} \, \mathbf{x}_k^{(i)}, \qquad (17)$$

where the $l$-th element of the vector $\hat{\mathbf{x}}_k^{\mathrm{MMSE}}$ can be expressed as

$$\left[\hat{\mathbf{x}}_k^{\mathrm{MMSE}}\right]_l = \sum_{i=1}^{N_p} w_k^{(i)} \, \left[\mathbf{x}_k^{(i)}\right]_l. \qquad (18)$$

In order to use the Jacobian logarithm to compute (18) in log-domain, we separate the positive and negative values of $[\mathbf{x}_k^{(i)}]_l$ into the index sets $\mathcal{I}_l^{+} = \{i : [\mathbf{x}_k^{(i)}]_l > 0\}$ and $\mathcal{I}_l^{-} = \{i : [\mathbf{x}_k^{(i)}]_l < 0\}$ and split the corresponding log-weights $\hat{w}_k^{(i)}$ accordingly. Please note that particles with $[\mathbf{x}_k^{(i)}]_l = 0$ do not contribute to (18). Thus, we obtain from (18) for the MMSE estimate

$$\left[\hat{\mathbf{x}}_k^{\mathrm{MMSE}}\right]_l = e^{\hat{a}_l^{+}} - e^{\hat{a}_l^{-}}, \qquad (19)$$

where we introduced

$$\hat{a}_l^{\pm} = \mathrm{jacob}\!\left(\left\{\hat{w}_k^{(i)} + \ln\left|\left[\mathbf{x}_k^{(i)}\right]_l\right|\right\}_{i \in \mathcal{I}_l^{\pm}}\right). \qquad (20)$$

Maximum A Posteriori Estimate.

The MAP point estimate is the particle state that maximizes the approximated a posteriori density,

$$\hat{\mathbf{x}}_k^{\mathrm{MAP}} = \arg\max_{\mathbf{x}_k^{(i)}} \; p\!\left(\mathbf{x}_k^{(i)} \mid \mathbf{z}_{0:k}\right),$$

where the a posteriori density at the particle states can be approximated, see [17], by

$$p\!\left(\mathbf{x}_k^{(i)} \mid \mathbf{z}_{0:k}\right) \propto p\!\left(\mathbf{z}_k \mid \mathbf{x}_k^{(i)}\right) \sum_{j=1}^{N_p} w_{k-1}^{(j)} \, p\!\left(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(j)}\right) \qquad (23)$$

for $i = 1, \dots, N_p$. The corresponding MAP state estimator using weights in log-domain can be calculated using the Jacobian logarithm of (12), with

$$\hat{\mathbf{x}}_k^{\mathrm{MAP}} = \mathbf{x}_k^{(i^{\star})}, \quad i^{\star} = \arg\max_{i} \left[\ln p\!\left(\mathbf{z}_k \mid \mathbf{x}_k^{(i)}\right) + \mathrm{jacob}\!\left(\left\{\hat{w}_{k-1}^{(j)} + \ln p\!\left(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(j)}\right)\right\}_{j=1}^{N_p}\right)\right]. \qquad (24)$$
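A scalar sketch of the log-domain MMSE estimate, splitting the particle values by sign as described above (the helper and function names are ours):

```python
import math

def _jacob(deltas):
    """ln(sum exp) via the iterative Jacobian logarithm."""
    it = iter(deltas)
    acc = next(it)
    for d in it:
        hi, lo = (acc, d) if acc >= d else (d, acc)
        acc = hi + math.log1p(math.exp(lo - hi))
    return acc

def mmse_log(log_w, xs):
    """MMSE estimate sum_i w_i * x_i of a scalar state from normalized
    log-weights: accumulate log_w_i + ln|x_i| separately for positive
    and negative x_i, exponentiate each sum once, and subtract."""
    pos = [lw + math.log(x) for lw, x in zip(log_w, xs) if x > 0]
    neg = [lw + math.log(-x) for lw, x in zip(log_w, xs) if x < 0]
    a_pos = math.exp(_jacob(pos)) if pos else 0.0
    a_neg = math.exp(_jacob(neg)) if neg else 0.0
    return a_pos - a_neg
```

Particle values equal to zero contribute nothing to the sum and are simply skipped, so the logarithm of the magnitude is always well defined.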

Simulations
In this section, we demonstrate the performance of the Log-PF in three simulations, using 64-bit floating point numbers according to IEEE Standard 754 for double precision.
5.1. Linear Processes. First, we simulate a linear Gaussian model. The KF introduced in [18] is an optimal recursive Bayesian filter which can be used if the considered system is linear and the probabilistic model is Gaussian. Hence, we compare the Log-PF and the Lin-PF to the KF as a benchmark.
The simulation considers the linear transition model

$$\mathbf{x}_k = \mathbf{F} \, \mathbf{x}_{k-1} + \mathbf{v}_k, \qquad (25)$$

with the transition matrix $\mathbf{F}$, the state vector $\mathbf{x}_k$, and the zero-mean multivariate Gaussian distributed process noise $\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \sigma_v^2 \mathbf{I})$ with standard deviation $\sigma_v$ and the identity matrix $\mathbf{I}$. The measurement model is defined by

$$z_k = \mathbf{H} \, \mathbf{x}_k + n_k, \qquad (26)$$

with the measurement matrix $\mathbf{H}$ and the zero-mean Gaussian distributed measurement noise $n_k \sim \mathcal{N}(0, \sigma_n^2)$ with standard deviation $\sigma_n$. Based on the measurements $z_k$, the state sequence $\mathbf{x}_k$ for $k = 1, \dots, 60$ is estimated using a KF, the Lin-PF, and the Log-PF with $N_p = 200$ particles and known initial state $\mathbf{x}_1 = (2, 2)^{\mathrm{T}}$. For the Lin-PF, we use the standard PF implementation as well as the Lin-Log-PF. First, we compare the KF to the SIR Lin-PF, SIR Lin-Log-PF, and the SIR Log-PF. In order to assess the robustness of the SIR Log-PF, we vary the measurement noise standard deviation $\sigma_n$ from $10^{8}$ down to $10^{-150}$. We simulate 1000 different realizations with known initial state for each run. Figure 1 shows the root mean square error (RMSE) averaged over all time steps and simulations versus the decreasing measurement noise standard deviation $\sigma_n$. The abbreviation SIR Log-PF MAP stands for the MAP point estimate and SIR Log-PF MMSE for the MMSE point estimate of the SIR Log-PF. Respectively, the abbreviations SIR Lin-PF MAP, SIR Lin-Log-PF MAP, SIR Lin-PF MMSE, and SIR Lin-Log-PF MMSE stand for the SIR Lin-PF and SIR Lin-Log-PF point estimates. We see that the KF obtains the best estimation results, followed by the SIR Log-PF and SIR Lin-Log-PF.
Figure 1 additionally shows an enlarged subfigure of the region $10^{-8} < \sigma_n < 10^{8}$. For $\sigma_n > 10^{-2}$, all SIR PFs obtain equivalent performance. As $\sigma_n$ decreases, the RMSE of the SIR PFs decreases until $\sigma_n = 10^{-2}$. For lower measurement noise standard deviations, the RMSE of the SIR Lin-PF increases up to a limit of 0.7, whereas the accuracy of the SIR Log-PF and SIR Lin-Log-PF is limited by the number of particles. This effect is caused by the number representation of the particle weights of the SIR Lin-PF. For $\sigma_n < 10^{0}$, the particle weights of the SIR Lin-PF are small and of a similar order as numerical errors due to number representation. Therefore, numerical errors dominate the resampling step such that the resampling algorithm draws particles based on numerical inaccuracies. Thus, the update step of the SIR Lin-PF loses its effect and we obtain 0.7 as the expected error of the process model in (25). For lower standard deviations, the accuracy of the SIR Log-PF and Lin-Log-PF is limited by the number of particles: their RMSE stays constant around 0.05 for standard deviations $10^{-1} > \sigma_n > 10^{-150}$. The KF is the optimal filter for this simulation; hence, its RMSE is limited only by the process noise and is constant around 0.01.
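The weight-underflow effect described here is easy to reproduce. The following sketch (the measurement value and noise level are illustrative, not the paper's exact setup) shows linear-domain likelihood weights collapsing to zero while log-weights retain the ordering:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_n = 1e-50                      # a very accurate sensor (illustrative)
z = 2.0                              # one scalar measurement
particles = z + rng.normal(0.0, 1.0, 200)   # predicted particle states

# Lin-domain Gaussian likelihood weights underflow to exactly zero,
# so resampling would be driven by numerical noise only:
w_lin = np.exp(-(particles - z) ** 2 / (2.0 * sigma_n ** 2))
print(np.all(w_lin == 0.0))          # True

# The corresponding log-weights keep the full ordering information:
w_log = -(particles - z) ** 2 / (2.0 * sigma_n ** 2)
print(np.argmax(w_log) == np.argmin(np.abs(particles - z)))  # True
```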
To summarize, for all considered standard deviations we obtain equal or better estimation results using the SIR Log-PF compared to the SIR Lin-PF. However, the Lin-Log-PF, which computes the weights in log-domain using (6), obtains similar simulation results compared to the SIR Log-PF. Hence, in the following we show the benefits of the Log-PF by using SIS PFs, where no resampling is performed. The resampling step is a key point for the success of a PF, applied to avoid degeneracy; however, resampling leads to a loss of diversity in the propagation of particles and entails an additional computational cost [14]. Similar to the SIR PF, we set the importance density equal to the transition prior distribution, with $q(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)}, \mathbf{z}_k) = p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})$. Figure 2 shows the RMSE averaged over all time steps and simulations versus the decreasing measurement noise standard deviation $\sigma_n$. The curves look similar to Figure 1. For $\sigma_n > 10^{-2}$, the SIS PFs obtain equivalent performance. As $\sigma_n$ decreases, the RMSE of the SIS PFs decreases until $\sigma_n = 10^{-2}$. For lower measurement noise standard deviations, the RMSE of the SIS Lin-PF and Lin-Log-PF increases up to the limit of 0.7. The RMSE of the SIS Log-PF stays constant around 0.03 for standard deviations $10^{-2} > \sigma_n > 10^{-150}$. However, since the SIS PF uses no resampling step, the transformation of the weights from the log-domain to the lin-domain is not essential. Hence, if the weights of the SIS Lin-Log-PF are not transferred to the lin-domain but instead normalized according to (14) and propagated in log-domain to the next time instant, the Lin-Log-PF obtains estimation results equivalent to the SIS Log-PF. Therefore, the transformation to the lin-domain in Line 6 of Algorithm 6 may introduce numerical inaccuracies. As long as no normalization in the lin-domain is needed, the Lin-Log-PF can be computed completely in the log-domain.

Nonlinear Processes: Generic Particle Filter.
In this section, we use the Generic PF, which decides adaptively when the resampling step is performed. Similar to the SIR PF, we set the importance density equal to the transition prior distribution, with $q(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)}, \mathbf{z}_k) = p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})$. We compare the Generic Log-PF to the Generic Lin-PF and Generic Lin-Log-PF. As mentioned before, a pseudocode of the Generic Lin-Log-PF is shown in Algorithm 6. In all algorithms, resampling is performed whenever the effective sample size $N_{\mathrm{eff}}$ falls below a threshold $N_{\mathrm{thr}}$. We use the approximation $N_{\mathrm{eff}}^{(2)}(\mathbf{w}_k)$ for the effective sample size in the lin-domain and, respectively, $\ln(N_{\mathrm{eff}}^{(2)}(\mathbf{w}_k))$ in the log-domain. Similar to [14], we consider a stochastic volatility model with $\alpha = 0.99$, the zero-mean Gaussian distributed process noise $v_k \sim \mathcal{N}(0, \sigma_v^2)$, and the measurement noise $\epsilon_k \sim \mathcal{N}(1, \sigma_n^2)$, where $\epsilon_k$ is a multiplicative noise. Based on the measurements $z_k$, the state sequence $x_k$ for $k = 1, \dots, 3000$ is estimated using the Generic PFs with $N_p = 100$ particles. We set the measurement noise standard deviation to $\sigma_n = 0.01$ and the process noise standard deviation to $\sigma_v = 0.01$.
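The adaptive resampling decision of the Generic Log-PF can be taken entirely in log-domain; a minimal sketch (the function name is ours):

```python
import math

def should_resample_log(log_w, n_thr):
    """Return True when ln(N_eff) < ln(N_thr), with ln(N_eff) computed
    from the normalized log-weights as -ln(sum_i exp(2 * log_w_i)) via
    the iterative Jacobian logarithm."""
    acc = 2.0 * log_w[0]
    for lw in log_w[1:]:
        d = 2.0 * lw
        hi, lo = (acc, d) if acc >= d else (d, acc)
        acc = hi + math.log1p(math.exp(lo - hi))
    return -acc < math.log(n_thr)
```

With uniform weights the effective sample size equals $N_p$ and no resampling is triggered; with a single dominant particle it approaches 1 and resampling is triggered for any threshold above 1.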
For performance evaluation, we try to recreate situations with rapidly changing measurements, that is, a certain model mismatch between the true likelihood of the process and the likelihood representation inside the PF. Inside the Generic PFs, we use the measurement noise standard deviation $\sigma_{n,\mathrm{PF}} = 10^{-4}$. Hence, taking the model mismatch into account, it is more likely that particle states are located in the tail of the PF's likelihood after the prediction step. For the simulations, we vary the normalized resampling bound $\beta = N_{\mathrm{thr}}/N_p$ from 0 to 1 using a grid of 0.01 resolution (equivalently for the log-domain). We simulate 1000 different realizations with known initial state and count the number of performed resampling steps for each run. The resampling rate is afterwards calculated as $\rho(\beta) = \mathrm{E}[\text{number of resampling steps using } \beta]/3000$, where $\mathrm{E}[\cdot]$ stands for the sample mean.
Figure 3 shows the resampling rate $\rho(\beta)$ averaged over all time steps and simulations versus the normalized resampling bound $\beta$. For $0.2 < \beta < 0.8$, the Generic Log-PF has a lower resampling rate than the Generic Lin-PF and the Generic Lin-Log-PF. Figure 4 shows the RMSE versus the resampling rate $\rho(\beta)$, where the RMSE decreases with increasing resampling rate for all Generic PFs. However, we can observe that, for a resampling rate of $0.15 < \rho(\beta) < 0.5$, we obtain a slightly lower RMSE with the Generic Log-PF than with the Generic Lin-PF or Generic Lin-Log-PF.

Rao-Blackwellization.
In this section we consider a simultaneous localization and mapping (SLAM) example with radio signals, indicated in Figure 5, according to the system model in [19] (similar results are expected in, e.g., belief propagation [20] or distributed PFs [13]). A receiver moving along an arbitrary trajectory measures the distances $d_j(k)$ to $j = 1, \dots, N_t$ transmitters. We consider a static environment with a fixed number of transmitters. However, for notational convenience, a time dependence on $k$ is introduced here for the transmitter positions $\mathbf{r}_{t,j,k}$. Additionally, we assume, based on [19], that the distance offset $\beta_{j,k}$ is constant. The state space is estimated by an algorithm according to [19, 21] based on Rao-Blackwellization [22], where the state space of $\mathbf{x}_k$ is partitioned into subspaces. Hence, we use PFs to estimate the subspaces representing the transmitters inside a PF. The reason to use a PF instead of a low-complexity extended Kalman filter (EKF) is the nonlinearity of the measurements in (29). The algorithm of [19, 21] is based on a superordinate particle filter (superPF) and subordinate particle filters (subPFs). Each particle $i = 1, \dots, N_p$ of the superPF, with state vector $\mathbf{x}_{u,k}^{(i)}$, carries one subPF per transmitter with $N_{t,j,k}$ particles, where $N_{t,j,k}$ stands for the number of particles in the $j$-th subPF with $j = 1, \dots, N_t$, estimating $\mathbf{x}_{t,j,k}^{(i)}$. Similar to [19, 21], the posterior distribution $p(\mathbf{x}_{u,k}, \mathbf{x}_{t,k} \mid \mathbf{d}_{1:k}, \mathbf{u}_{1:k}, \mathbf{x}_{u,0})$ can be approximated by importance samples. The simulations are performed using $N_p = 3000$ and $N_{t,j,k} = 1000$ particles for all transmitters $j = 1, \dots, N_t$. For the initialization, we use prior information $\mathbf{x}_{u,0}$, which is the knowledge of the starting position and velocity. Please note that an unknown starting position and direction or larger initial uncertainties may result in a biased and rotated coordinate system for the estimation. For simplicity, we also use prior information on the transmitter states $\mathbf{x}_{t,j,0}$ (please see [19, 21], e.g., for unknown transmitter states). The prior information includes a Gaussian distribution with a standard deviation of 0.5 m centered around the true transmitter states, that is, position and distance offset. For computing the position estimate $\hat{\mathbf{r}}_{u,k}$, we use the MMSE estimate as introduced in [19, 21].
Figure 7 shows the RMSE versus the receiver traveled time for the different implementations, with $\mathrm{RMSE}_{u,k} = \sqrt{\mathrm{E}\{\|\mathbf{r}_{u,k} - \hat{\mathbf{r}}_{u,k}\|^2\}}$ and 200 independent evaluations. At the starting time, the RMSEs of all algorithms are similar because of the identical initialization. Afterwards, the RMSE increases, caused by the dilution of precision (DOP). However, we can clearly see that we obtain a higher accuracy using the Log-PF compared to the Lin-PF and the Lin-Log-PF.

Conclusion
In this paper we derived a particle filter representation in logarithmic domain, called Log-PF. The derivations show that the weight calculation, weight normalization, resampling, and point estimations can be expressed in logarithmic domain using the Jacobian logarithm. Representing the weight of each particle in logarithmic domain allows reducing the effect of numerical issues. Furthermore, the algorithm derived in this paper can be generalized to multidimensional nonparametric marginalization.

Figure 1: Performance evaluation of the SIR Log-PF, SIR Lin-PF, SIR Lin-Log-PF, and KF by the RMSE versus a decreasing measurement standard deviation $\sigma_n$.

Figure 2: Performance evaluation of the SIS Log-PF, SIS Lin-PF, SIS Lin-Log-PF, and KF by the RMSE versus a decreasing measurement standard deviation $\sigma_n$.

Figure 4: Performance evaluation of the Generic Log-PF, Generic Lin-PF, and Generic Lin-Log-PF by the RMSE versus the resampling rate $\rho(\beta)$.

Figure 5: Overview of the SLAM example: the moving receiver simultaneously estimates its position and the locations of the transmitters.

Figure 7: RMSEs of the estimated receiver positions versus receiver traveled time for different algorithms.

Table 1: Generalized effective sample size functions $N_{\mathrm{eff}}^{(r)}(\mathbf{w}_k)$ to approximate $N_{\mathrm{eff}}$ in lin-domain, as defined in [14], and in log-domain.

The distance measurements of the SLAM example are collected in the vector

$$\mathbf{d}_k = \left(d_1(k), \dots, d_{N_t}(k)\right)^{\mathrm{T}}, \qquad (28)$$

with

$$d_j(k) = \left\|\mathbf{r}_{u,k} - \mathbf{r}_{t,j,k}\right\| + \beta_{j,k} + n_{j,k} \qquad (29)$$

between the receiver at location $\mathbf{r}_{u,k}$ and the $j = 1, \dots, N_t$ transmitters at locations $\mathbf{r}_{t,j,k}$, with the distance offset $\beta_{j,k}$ and zero-mean Gaussian distributed measurement noise $n_{j,k} \sim \mathcal{N}(0, \sigma_n^2)$ with standard deviation $\sigma_n$. The receiver uses the control input $\mathbf{u}_k$ to move from state $\mathbf{x}_{u,k-1}$ to state $\mathbf{x}_{u,k}$. In order to use the distance measurements $\mathbf{d}_k$ for positioning, the positioning algorithm estimates the receiver and transmitter states simultaneously. The state vector $\mathbf{x}_k$ describing the complete system at time instant $k$ comprises the receiver state $\mathbf{x}_{u,k}$ and the transmitter states $\mathbf{x}_{t,k}$, which are also unknown. The receiver state $\mathbf{x}_{u,k} = (\mathbf{r}_{u,k}^{\mathrm{T}}, \mathbf{v}_{u,k}^{\mathrm{T}})^{\mathrm{T}}$ includes the receiver position $\mathbf{r}_{u,k}$ and the receiver velocity $\mathbf{v}_{u,k}$, while the transmitter states are defined by

$$\mathbf{x}_{t,k} = \left(\mathbf{x}_{t,1,k}^{\mathrm{T}}, \dots, \mathbf{x}_{t,N_t,k}^{\mathrm{T}}\right)^{\mathrm{T}}, \qquad (31)$$

with

$$\mathbf{x}_{t,j,k} = \left(\mathbf{r}_{t,j,k}^{\mathrm{T}}, \beta_{j,k}\right)^{\mathrm{T}}. \qquad (32)$$