A Simple Asymptotically Optimal Filter Over An Infinite Horizon

A filtering problem over an infinite horizon for a continuous time signal and discrete time observations in the presence of non-Gaussian white noise is considered. Conditions are presented under which a nonlinear Kalman type filter with limiter is asymptotically optimal in the mean square sense over long time intervals, provided the sampling frequency is sufficiently high.


Introduction
Consider the infinite horizon filtering problem for a continuous time signal (X_t)_{t>=0} observed only at the time points t_k, k = 0, 1, ..., where t_k - t_{k-1} = Δ. Suppose the signal is distorted by a linear transformation A X_{t_{k-1}} and corrupted by white noise proportional to 1/sqrt(Δ), that is,

    Y_k = A X_{t_{k-1}} + Δ^{-1/2} ξ_k,  k >= 1.

For convenience, we introduce a continuous time process (Y_t^Δ)_{t>=0}:

    Y_0^Δ = 0,  Y_t^Δ = Y_{t_{k-1}}^Δ,  t_{k-1} <= t < t_k,  k >= 1,

where Y_{t_k}^Δ = Y_{t_{k-1}}^Δ + Y_k Δ, so that (Y_t^Δ) carries the same signal information as (Y_k)_{k>=1}. Both (X_t) and (Y_t^Δ) are assumed to be vector processes of sizes n and l, respectively. The signal is a homogeneous diffusion process with respect to the vector Wiener process (W_t)_{t>=0} with independent components,

    dX_t = a X_t dt + b dW_t.

The initial condition X_0 is a random vector with E||X_0||^2 < ∞. The filtering model, in which the matrices A, a and b are known, may then be analyzed as follows. The noise (ξ_k)_{k>=1} forms an i.i.d. sequence of random vectors independent of (W_t)_{t>=0} and X_0. The components of the vector ξ_1 are independent random variables whose distributions have densities such that the Fisher information of each component of ξ_1 is well defined and positive. These assumptions on the distribution of ξ_1 imply that the Fisher information matrix ℐ of ξ_1 is diagonal and nonsingular.
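For concreteness, the signal and observation model can be sketched in a few lines. The scalar constants and the Student-t noise law below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Minimal simulation sketch of the model above (scalar case, n = l = 1).
# The constants a, b, A, delta and the Student-t noise law are illustrative
# assumptions, not values taken from the paper.
rng = np.random.default_rng(0)

a, b, A = -0.5, 1.0, 1.0  # signal drift, diffusion and observation coefficients
delta = 0.01              # sampling interval Delta = t_k - t_{k-1}
n_steps = 1000

x = np.empty(n_steps + 1)
x[0] = rng.normal()       # initial condition X_0 with E||X_0||^2 < infinity
for k in range(n_steps):
    # Euler step of the signal dX_t = a X_t dt + b dW_t over one interval
    x[k + 1] = x[k] + a * x[k] * delta + b * np.sqrt(delta) * rng.normal()

xi = rng.standard_t(df=3, size=n_steps)  # i.i.d. non-Gaussian white noise
y = A * x[:-1] + xi / np.sqrt(delta)     # Y_k = A X_{t_{k-1}} + Delta^{-1/2} xi_k

# Piecewise constant accumulated observation: Y_{t_k} = Y_{t_{k-1}} + Y_k * Delta
Y_delta = np.concatenate(([0.0], np.cumsum(y) * delta))
```

Note that the accumulated process (Y_delta) and the raw samples (y) carry the same information, as stated above.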
If E||ξ_1||^2 < ∞, the Kalman filter with continuous time signal and discrete time observations, adapted to this setting, might be used as the optimal linear filter in the mean square sense. If E||ξ_1||^2 is infinite, or even if E||ξ_1||^2 is finite but too large, the Kalman filter becomes useless, although the optimal filtering estimate π_t(Y^Δ) = E(X_t | Y_{t_k}^Δ, t_k <= t) would probably remain reasonable from an applied point of view of filtering quality. Such situations, where E||ξ_1||^2 is large because of "heavy tails" of the distributions of the components of ξ_1, are typical in engineering practice.
An essential role in the verification of the quality of a nonlinear filter is played by a lower bound for the mean square filtering error matrix

    V_t^Δ = E(X_t - π_t(Y^Δ))(X_t - π_t(Y^Δ))^T,

where ^T denotes the transpose.
The lower bound for V_t^Δ is found by a method borrowed from Bobrovsky and Zakai [2,3], Bobrovsky, Zakai, and Zeitouni [4], and Bobrovsky, Mayer-Wolf, and Zeitouni [5]. Under the assumption that the Fisher information matrix ℐ_0 of the random vector X_0 exists, we show that

    V_t^Δ >= P_t^Δ,  t >= 0,   (1.3)

in the sense that V_t^Δ - P_t^Δ is a nonnegative definite matrix. Here, P_t^Δ is the filtering mean square error matrix for the Gaussian filtering model

    dX̄_t = a X̄_t dt + b dW̄_t,
    Ȳ_k = A X̄_{t_{k-1}} + Δ^{-1/2} ℐ^{-1/2} ξ̄_k,

where X̄_0 is a Gaussian vector with E X̄_0 = E X_0 and covariance matrix equal to ℐ_0^+ (ℐ_0^+ is the Moore-Penrose pseudoinverse matrix [1]), and (ξ̄_k)_{k>=1} is an i.i.d. sequence of zero mean Gaussian vectors with unit covariance matrices.
In general, this lower bound might be unattainable. However, in this paper we show that it can be closely approached if the parameter Δ is small enough. Moreover, we use this fact to construct a nearly optimal filtering estimate by applying a Kalman type filter to a nonlinearly preprocessed observation. The use of a preliminary nonlinear transformation (limiter) to improve filtering accuracy is well known from Kushner [8], Kushner and Runggaldier [9], Liptser and Lototsky [10], Liptser and Runggaldier [12], Liptser and Zeitouni [14], and Liptser and Muzhikanov [11]. In [11] and [14], the choice of an appropriate limiter depends on the diffusion approximation of the observation process, so that the drift and diffusion parameters of the associated diffusion limit determine the signal to noise ratio and thus the filtering quality. Moreover, the filter with the limiter which gives the largest signal to noise ratio (for the limit model) turns out to be optimal in the mean square sense, as Δ → 0, over any finite interval. This is readily verified by applying Goggin's approximation [7] for the conditional expectation.
In this paper, we consider the filtering problem on the infinite time interval (infinite horizon) in situations when Goggin's approach cannot be effectively extended to the analysis of the lim_{Δ→0} lim_{t→∞} asymptotics. Our approach exploits the lower bound from (1.3) and conditions under which

    lim_{Δ→0} lim_{t→∞} P_t^Δ = P,   (1.4)

where P is a positive definite matrix. We show that the estimate π̂_t(Y^Δ), generated by the Kalman filter with limiter, satisfies the asymptotic optimality property

    lim_{Δ→0} lim_{t→∞} E(X_t - π̂_t(Y^Δ))(X_t - π̂_t(Y^Δ))^T = P.   (1.5)

This paper is organized as follows. In Section 2, we revisit the finite horizon case and show that the filter, with the relevant limiter, guarantees the asymptotic optimality property in the sense that lim_{Δ→0} V_t^Δ = lim_{Δ→0} P_t^Δ for every t > 0. In Section 3, the main infinite horizon result is presented. Several practical aspects are discussed and demonstrated with computer simulations in Section 4.

Preliminaries
2.1 A Diffusion Approximation and the Filter

As in [11], to preserve the asymptotic optimality property for a wider range of sampling intervals Δ, the limiter is applied to the innovation difference. The limiter is chosen to be a column vector-function G with components -ṗ_i(x)/p_i(x), i = 1, ..., l, where p_i(x) is the distribution density of the ith component of ξ_1. To incorporate the limiter into the filtering algorithm, Y_t^Δ is transformed into

    Z_t^Δ = Δ^{1/2} Σ_{k=1}^{[t/Δ]} G(Δ^{-1/2}(Y_{t_k}^Δ - Y_{t_{k-1}}^Δ - A π̂_{t_{k-1}}(Y^Δ) Δ)).   (2.1)

Here, [x] stands for the integer part of x, and π̂_t(Y^Δ) is a random process defined by the linear equation

    dπ̂_t(Y^Δ) = a π̂_t(Y^Δ) dt + P_t A^T dZ_t^Δ,  π̂_0 = E X_0,   (2.2)

where P_t is a solution of the Riccati equation

    Ṗ_t = a P_t + P_t a^T + b b^T - P_t A^T ℐ A P_t   (2.3)

subject to P_0 = cov(X_0, X_0), in which the Fisher information matrix ℐ of ξ_1 is diagonal with diagonal elements ℐ_{ii} = ∫ (ṗ_i(x))^2 / p_i(x) dx. The process π̂_t(Y^Δ) is suggested as the nonlinear filtering estimate for the signal X_t given {Y_s^Δ, 0 <= s <= t}. It should be noted that Y_{t_k}^Δ - Y_{t_{k-1}}^Δ - A π̂_{t_{k-1}}(Y^Δ) Δ = (Y_k - A π̂_{t_{k-1}}(Y^Δ)) Δ is regarded as the "innovation difference", and the proposed filter can be seen as a Kalman filter with a limiter applied to the innovation process (see the block diagram in Figure 1). The choice of such a nonlinear filter is warranted by the diffusion approximation arguments given below. We fix the assumptions (2.4) and extend the proof of Theorem 2.1 from [11] to the vector case. As in [11], we have that

    (X_t, Z_t^Δ, π̂_t(Y^Δ)) converges in law to (X_t, Ȳ_t, π̄_t(Ȳ)),   (2.5)

where Ȳ_0 = 0, π̄_0 = E X_0 and

    dX_t = a X_t dt + b dW_t,
    dȲ_t = ℐ A (X_t - π̄_t(Ȳ)) dt + ℐ^{1/2} dV_t,   (2.6)

with a Wiener process (V_t)_{t>=0} independent of (W_t)_{t>=0}, and

    dπ̄_t(Ȳ) = a π̄_t(Ȳ) dt + P_t A^T dȲ_t.   (2.7)

Although the trajectories of the prelimit triple (X_t, Z_t^Δ, π̂_t(Y^Δ)) belong to C_{[0,∞)} x D_{[0,∞)} x D_{[0,∞)}, the trajectories of the limit (X_t, Ȳ_t, π̄_t(Ȳ)) belong to C_{[0,∞)} x C_{[0,∞)} x C_{[0,∞)}, so that the convergence in law holds in the local supremum topology. Consider now the filtering problem for the limit pair (X_t, Ȳ_t) (see (2.6)), in which X_t is treated as the signal and Ȳ_t as the observation. Clearly, the optimal, in the mean square sense, linear filtering estimate is generated by the Kalman filter given by equations (2.7) and (2.3). Note also that the nonlinear filter π̂_t(Y^Δ) proposed in (2.2) is nothing but this Kalman filter with Ȳ_t replaced by Z_t^Δ.
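As an illustration, here is a minimal scalar sketch of a filter of this type. The standard Cauchy noise, the constants, the use of the steady-state gain in place of P_t, and the small-Δ scaling of the limited innovation are illustrative assumptions, not the paper's simulation.

```python
import numpy as np

# Minimal scalar sketch of a Kalman type filter with limiter. Assumed
# setting (illustrative): standard Cauchy observation noise, whose score
# G(x) = -p'(x)/p(x) = 2x/(1 + x^2) is a bounded limiter with Fisher
# information I_f = 1/2; the steady-state gain P (positive root of the
# scalar algebraic Riccati equation) is used in place of P_t.
rng = np.random.default_rng(1)

a, b, A, delta = -0.5, 1.0, 1.0, 0.01
I_f = 0.5

def G(x):
    """Bounded limiter: score function of the standard Cauchy density."""
    return 2.0 * x / (1.0 + x * x)

# Positive root of 2*a*P + b**2 - A**2 * I_f * P**2 = 0
P = (a + np.sqrt(a * a + A * A * I_f * b * b)) / (A * A * I_f)

n_steps = 20_000
x, pi = rng.normal(), 0.0
sq_err = 0.0
for _ in range(n_steps):
    y = A * x + rng.standard_cauchy() / np.sqrt(delta)  # Y_k = A X + noise/sqrt(delta)
    # limiter applied to the scaled innovation difference, then a Kalman type update
    pi += a * pi * delta + P * A * np.sqrt(delta) * G(np.sqrt(delta) * (y - A * pi))
    x += a * x * delta + b * np.sqrt(delta) * rng.normal()  # Euler step of the signal
    sq_err += (x - pi) ** 2

empirical_mse = sq_err / n_steps  # should be of the order of P for small delta
```

Because the limiter is bounded, the occasional huge Cauchy samples produce only bounded filter updates, which is the point of the preprocessing.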
Set π_k^Δ = E(X_k | Y_j, 1 <= j <= k) and assume that the block matrix in (FR) has full rank. Then, for every k >= 0, V_k^Δ >= P_k^Δ.

Proof: Under (FR), for every t > 0 the distribution of X_t has a density (e.g., see the proof of Lemma 16.3 in [13]). Hence, the joint distribution of ζ_k = (x_k, ..., x_1) has a density as well. For convenience, introduce the vector θ_k = (y_k, ..., y_1). The independence of the signal and the noise guarantees the existence of a smooth and strictly positive distribution density p(u, v) for (ζ_k, θ_k). Set

    f(u) = ∫ p(u, v) dv  (the density of ζ_k),
    g(v) = ∫ p(u, v) du  (the density of θ_k),
    q(u | v) = p(u, v)/g(v)  (the conditional density of ζ_k given θ_k),

and denote by ∇_u q(u|v) the row gradient vector with respect to u. Let us define a nonnegative definite matrix

    I_k = ∫∫ (∇_u q(u|v))^T ∇_u q(u|v) / q(u|v) g(v) du dv.   (2.10)

We show that the mean square error for the estimate ζ̂_k = E(ζ_k | θ_k) of ζ_k given θ_k is bounded below (a version of the Cramér-Rao inequality):

    E(ζ_k - ζ̂_k)(ζ_k - ζ̂_k)^T >= I_k^+,   (2.11)

where I_k^+ is the Moore-Penrose pseudoinverse matrix of I_k. The inequality (2.11) becomes an equality if (ζ_k, θ_k) is a Gaussian pair.
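The equality case can be checked numerically in the simplest scalar Gaussian setting, where both the conditional variance and the Fisher information are available in closed form. The variances below are illustrative.

```python
import numpy as np

# Equality case of the bound (2.11) in a scalar Gaussian setting
# (illustrative variances): zeta ~ N(0, sigma2), theta = zeta + eps with
# eps ~ N(0, v). The conditional law of zeta given theta is Gaussian, so
# its Fisher information is the reciprocal of the conditional variance,
# and the mean square error of E(zeta | theta) attains the bound exactly.
sigma2, v = 2.0, 0.5
cond_var = 1.0 / (1.0 / sigma2 + 1.0 / v)  # var(zeta | theta)
fisher = 1.0 / cond_var                    # Fisher information of the conditional density

# Monte Carlo estimate of E(zeta - E(zeta | theta))^2
rng = np.random.default_rng(2)
zeta = rng.normal(0.0, np.sqrt(sigma2), size=200_000)
theta = zeta + rng.normal(0.0, np.sqrt(v), size=zeta.size)
zeta_hat = (sigma2 / (sigma2 + v)) * theta  # E(zeta | theta) for zero-mean variables
mse = np.mean((zeta - zeta_hat) ** 2)
```

Here mse agrees with cond_var = 1/fisher up to Monte Carlo error, which is the Gaussian equality case of (2.11).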
Integrating by parts, we find ∫ u ∇_u q(u|v) du = -I, where I is the unit matrix of the relevant size, so that

    ∫∫ (u - E(ζ_k | θ_k = v)) [∇_u q(u|v)/q(u|v)] p(u, v) du dv = -I,

and the bound (2.11) then follows from the matrix Cauchy-Schwarz inequality.

Since r(v|u) is the conditional distribution density of θ_k given ζ_k = u, the matrix

    ∫∫ (∇_u r(v|u))^T ∇_u r(v|u) / r(v|u) dv f(u) du

can be explicitly expressed in terms of the matrices A and ℐ, or more exactly, as a block diagonal matrix with blocks A^T ℐ A. The detailed computation is omitted here, and the rest of the proof does not rely on the specific structure of this matrix.
Consider the Gaussian model in which the signal satisfies dX̄_t = a X̄_t dt + b dW̄_t and the observations are

    Ȳ_j = A X̄_{t_{j-1}} + Δ^{-1/2} ℐ^{-1/2} ξ̄_j,   (2.14)

where (ξ̄_j)_{j>=1} is a sequence of i.i.d. zero mean Gaussian vectors with i.i.d. elements of unit variance and X̄_0 is a Gaussian random vector with E X̄_0 = E X_0 and covariance matrix ℐ_0^+. Denote by r̄(v|u) the conditional density of θ̄_k given ζ̄_k = u and by f̄(u) the distribution density of ζ̄_k. Using the fact that, under a Gaussian distribution, the covariance matrix coincides with the pseudoinverse of the Fisher information, we get Ī_k = I_k (where Ī_k is the matrix (2.10) computed for the model (2.14)), and hence

    E(ζ̄_k - E(ζ̄_k | θ̄_k))(ζ̄_k - E(ζ̄_k | θ̄_k))^T = I_k^+.

Thus, the above equality and (2.11) imply

    E(ζ_k - ζ̂_k)(ζ_k - ζ̂_k)^T >= E(ζ̄_k - E(ζ̄_k | θ̄_k))(ζ̄_k - E(ζ̄_k | θ̄_k))^T.   (2.15)

It is clear that the matrix V_k^Δ is the sub-block of E(ζ_k - ζ̂_k)(ζ_k - ζ̂_k)^T located in the upper left corner. Denote the same sub-block of E(ζ̄_k - E(ζ̄_k | θ̄_k))(ζ̄_k - E(ζ̄_k | θ̄_k))^T by P_k^Δ. Recall that the Gaussian pair (ζ̄_k, θ̄_k) is generated by the recursion (2.14), so that the matrix P_k^Δ gives the mean square filtering error and is defined by the recursion given in (2.8) as a part of the Kalman filter corresponding to the model (2.14).
The property given in (2.15) is nothing but the statement of the lemma for k >= 1 (for k = 0 the proof uses the same type of arguments and is omitted).

2.3 An Asymptotic Optimality Property Over a Finite Horizon

Assume X_0 is a Gaussian vector, so that P_0 = ℐ_0^+. By virtue of (2.5), we have

    (X_t - π̂_t(Y^Δ))(X_t - π̂_t(Y^Δ))^T → (X_t - π̄_t(Ȳ))(X_t - π̄_t(Ȳ))^T  in law.
This recursion, together with the convergence of the corresponding norms to zero, implies that (2.24) holds.
3. The Infinite Horizon Case

The key relation here is

    lim_{t→∞} P_t^Δ = P^Δ,  lim_{Δ→0} P^Δ = P,   (3.1)

with a positive definite matrix P which is the unique solution (in the class of nonnegative definite matrices) of the algebraic Riccati equation

    a P + P a^T + b b^T - P A^T ℐ A P = 0.   (3.2)

To clarify the role of the matrix P, we mention here that the optimal filtering estimate π_t(Y^Δ) obeys the lower bound (Lemma 3.2)

    lim_{Δ→0} lim_{t→∞} E(X_t - π_t(Y^Δ))(X_t - π_t(Y^Δ))^T >= P,   (3.3)

while the filtering estimate π̂_t(Y^Δ) given in (2.2) obeys the asymptotic optimality property

    lim_{Δ→0} lim_{t→∞} E(X_t - π̂_t(Y^Δ))(X_t - π̂_t(Y^Δ))^T = P.   (3.4)

It is known (see [13], Theorem 16.2) that at least the second part of (3.1) is provided by (FR). For the verification of the first part of (3.1), and especially of (3.4), another full rank condition, (FR'), is required. Even when (FR') holds, we assume that the eigenvalues of the matrix a have negative real parts (see (INF.3) in Theorem 3.1). It can be proved that under (FR) and (INF.3) the second equality in (3.1) holds with the positive definite matrix P.
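For reference, the limit matrix P can be approximated numerically by integrating the Riccati equation forward in time until its right-hand side vanishes. The 2x2 matrices below are illustrative, with a stable (negative real-part eigenvalues) as assumed here.

```python
import numpy as np

# Numerical sketch: approximate the solution P of the algebraic Riccati
# equation (3.2) by integrating the Riccati differential equation forward
# until its right-hand side vanishes. The 2x2 matrices are illustrative;
# 'a' is stable, matching the standing stability assumption.
a = np.array([[-1.0, 0.3],
              [0.0, -0.5]])
b = np.array([[1.0, 0.0],
              [0.2, 0.8]])
A = np.eye(2)
I_f = np.diag([0.5, 0.5])  # Fisher information matrix of the noise (assumed)

def riccati_rhs(P):
    # a P + P a^T + b b^T - P A^T I_f A P
    return a @ P + P @ a.T + b @ b.T - P @ A.T @ I_f @ A @ P

P = np.zeros((2, 2))
dt = 1e-2
for _ in range(20_000):  # forward Euler up to time 200: ample for convergence
    P = P + dt * riccati_rhs(P)

residual = np.linalg.norm(riccati_rhs(P))  # ~0 once P solves the ARE
K = a - P @ A.T @ I_f @ A                  # closed-loop filter matrix
K_eigs = np.linalg.eigvals(K)              # expected: negative real parts
```

The eigenvalues of the closed-loop matrix K having negative real parts reflects the stability of the filter dynamics discussed below.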
It is natural, when filtering over a long time interval, to replace P_t in the equation for π̂_t(Y^Δ) (see (2.2)) by its limit P, that is, to use the modified version of (2.2):

    dπ̂_t(Y^Δ) = a π̂_t(Y^Δ) dt + P A^T dZ_t^Δ.   (3.5)
Moreover, the exact knowledge of the initial condition π̂_0(Y^Δ) is not essential, since it is forgotten by the filter dynamics. It is well known from Makowski [15], Makowski and Sowers [16], and Ocone and Pardoux [17] (see also Budhiraja and Kushner [6]) that, under (FR) and (FR'), the Kalman filter is asymptotically optimal (among nonlinear filters) over the infinite time interval even if the first and second moments of an inappropriate distribution for X_0 are used as the initial conditions for the filter.
We establish the same property for π̂_t(Y^Δ) defined in (3.5).

Theorem 3.1: Assume (FR), (FR'), and

(INF.3) the eigenvalues of the matrix a have negative real parts.

Then (3.3) is valid and (3.4) holds for π̂_t(Y^Δ) defined in (3.5) under any fixed initial condition.

3.2 Auxiliary Lemmas
It can be shown that, not only under (FR) and (FR') but also under (FR) and (INF.3), the limit lim_{t→∞} P_t^Δ = P^Δ exists. Let us denote P_k^Δ = P_{t_k}^Δ. Then (2.8) may be rewritten as (3.8). Under the assumptions made previously, the norm of the matrix P^Δ is bounded from above by a positive constant independent of Δ, and the remainder terms in (3.8) are of order o(Δ). Therefore,

    lim_{Δ→0} (a P^Δ + P^Δ a^T + b b^T - P^Δ A^T ℐ A P^Δ) = 0.

Since the matrices P^Δ, Δ > 0, are bounded, any infinite sequence (Δ_i)_{i>=1} decreasing to zero contains a subsequence along which the limit lim_{i→∞} P^{Δ_i} = P' exists.
The algebraic Riccati equation may be rewritten as

    (a - P A^T ℐ A) P + P (a - P A^T ℐ A)^T + b b^T + P A^T ℐ A P = 0.   (3.9)

Set K = a - P A^T ℐ A, and let φ be the right eigenvector of K^T (the left eigenvector of K) with eigenvalue λ (Re(λ) is the real part of λ). Multiplying (3.9) from the right by φ and from the left by φ^T, we obtain

    2 Re(λ) φ^T P φ + φ^T (b b^T + P A^T ℐ A P) φ = 0.

Because P is positive definite and b b^T + P A^T ℐ A P is nonnegative definite, Re(λ) <= 0. Assume Re(λ) = 0. Then φ^T P A^T ℐ A P φ = 0, so φ^T P A^T ℐ^{1/2} = 0 and, in turn, φ^T P A^T = 0 and φ^T P A^T ℐ A = 0 (that is, A^T ℐ A P φ = 0). Then K^T φ = (a^T - A^T ℐ A P) φ = a^T φ = λ φ, i.e., φ is a right eigenvector of the matrix a^T, so an eigenvalue of a^T has zero real part. The latter contradicts the assumption that the eigenvalues of a have negative real parts. Consequently, Re(λ) < 0.

3.3 Proof of Theorem 3.1

The proof is divided into two parts.

3.3.1: X_0 is a Gaussian vector with known expectation and covariance.
The solution of the differential equation

    Ṫ_t = (a - P A^T ℐ A) T_t + T_t (a - P A^T ℐ A)^T + b b^T + P A^T ℐ A P

subject to T_0 = P is T_t ≡ P. In particular, T_{t_k} = P for every k. On the other hand, for every k,

    T_{t_k} = e^{(a - P A^T ℐ A)Δ} T_{t_{k-1}} e^{(a - P A^T ℐ A)^T Δ} + ∫_{t_{k-1}}^{t_k} e^{(a - P A^T ℐ A)(t_k - s)} (b b^T + P A^T ℐ A P) e^{(a - P A^T ℐ A)^T (t_k - s)} ds.

Concluding Remarks
The proof of Theorem 3.1 relies heavily on the boundedness of the limiter G and on the stability assumption that the eigenvalues of the matrix a have negative real parts.
Neither assumption can be weakened: with an unbounded limiter and an unstable model, computer simulation with a reasonably small Δ exhibits catastrophic failure of the signal tracking (see Figure 2). In this simulation the scalar observation noise is ξ_n = ξ_n^{(1)} θ_n + ξ_n^{(2)} (1 - θ_n), where ξ_n^{(1)} and ξ_n^{(2)} are independent i.i.d. zero mean Gaussian sequences with var(ξ_n^{(1)}) = 0.1 and var(ξ_n^{(2)}) = 10, and θ_n is a binary i.i.d. sequence with P{θ_n = 1} = 0.95. The signal is an unstable scalar diffusion with drift coefficient a = 0.05. The estimate was generated for the signal sampled at Δ = 10 msec; the signal loss occurs at t ≈ 15 sec. Similar phenomena occur on long time intervals even for stable signals when the filter uses an unbounded limiter. Tracking failures occasionally occur even if our assumptions are satisfied, i.e., the limiter is bounded and the signal is stable. However, these failures are localized (see Figure 3) and the averaged performance is close to the optimal. Figure 3 shows a stable signal model (with drift coefficient a = -0.01) and observation noise with a Cauchy distribution; in this case the limiter is bounded. With the sampling step Δ as in Figure 2, local tracking failures still occur (as can be seen at t ≈ 350 sec), but they do not drastically affect the overall performance: the empirical mean square error is still close to the theoretical lower bound.
The requirement of a bounded limiter may turn out to be too restrictive for certain noise models (e.g., if p_i is the density of a Gaussian mixture, then G is unbounded). In such cases, it makes sense to approximate G by some bounded function G̃, which gives an asymptotically δ-optimal filtering estimate.
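A quick sketch comparing the two limiter shapes mentioned here: the Cauchy score is bounded by 1, while the score of a Gaussian mixture grows linearly in the tails and would be replaced by a clipped surrogate. The mixture weights and variances echo the simulation settings above; the clipping level is an illustrative assumption.

```python
import numpy as np

# Limiter shapes for two noise models. Standard Cauchy: the score
# G(x) = 2x/(1+x^2) is bounded by 1. Gaussian mixture (weights/variances
# echo the simulation settings above): the score grows like x/v2 in the
# tails, hence is unbounded, and a clipped surrogate is used instead.
x = np.linspace(-50.0, 50.0, 10001)

G_cauchy = 2.0 * x / (1.0 + x**2)

w1, v1, w2, v2 = 0.95, 0.1, 0.05, 10.0
p1 = w1 * np.exp(-x**2 / (2.0 * v1)) / np.sqrt(2.0 * np.pi * v1)
p2 = w2 * np.exp(-x**2 / (2.0 * v2)) / np.sqrt(2.0 * np.pi * v2)
G_mix = (p1 * x / v1 + p2 * x / v2) / (p1 + p2)  # -p'(x)/p(x) for the mixture

c = 5.0                                # illustrative clipping level
G_tilde = np.clip(G_mix, -c, c)        # bounded surrogate limiter
```

The clipped surrogate trades a small loss of filtering accuracy for the boundedness required by the theorem, in the spirit of the δ-optimality remark above.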

Figure 1.Block diagram of the proposed filter.

2.2 A Lower Bound Over a Finite Horizon

Let us introduce the essential notation. Set x_k = X_{t_k} and y_k = Y_{t_k}^Δ - Y_{t_{k-1}}^Δ.

Lemma 2.1: Assume

    (FR)  rank(b, ab, ..., a^{n-1}b) = n,

where (b, ab, ..., a^{n-1}b) is a block matrix of size n x (n·m).

(Here C^r_{[0,∞)} and D^r_{[0,∞)} are the spaces of continuous and right continuous (having limits to the left) vector-functions of size r.)

3.1 Formulation of the Main Result

An essential role in establishing the asymptotic optimality property for the filtering estimate π̂_t(Y^Δ) over the infinite time interval is played by the relation (3.1) and by the full rank condition (FR').

Figure 2. An unstable signal model and observation noise, with the simulation settings given in the text.

Figure 3. A stable signal model (with drift coefficient a = -0.01) and observation noise with Cauchy distribution.