
We describe a Bayesian filtering scheme for nonlinear state-space models in continuous time. This scheme is called Generalised Filtering and furnishes posterior (conditional) densities on hidden states and unknown parameters generating observed data. Crucially, the scheme operates online, assimilating data to optimise the conditional density on time-varying states and time-invariant parameters. In contrast to Kalman and Particle smoothing, Generalised Filtering does not require a backwards pass. In contrast to variational schemes, it does not assume conditional independence between the states and parameters. Generalised Filtering optimises the conditional density with respect to a free-energy bound on the model's log-evidence. This optimisation uses the generalised motion of hidden states and parameters, under the prior assumption that the motion of the parameters is small. We describe the scheme, present comparative evaluations with a fixed-form variational version, and conclude with an illustrative application to a nonlinear state-space model of brain imaging time-series.

This paper is about the inversion of dynamic causal models based on nonlinear state-space models in continuous time. These models are formulated in terms of ordinary or stochastic differential equations and are ubiquitous in the biological and physical sciences. The problem we address is how to make inferences about the hidden states and unknown parameters generating data, given only observed responses and prior beliefs about the form of the underlying generative model, and its parameters. The parameters here include quantities that parameterise the model’s equations of motion and control the amplitude (variance or inverse precision) of random fluctuations. If we consider the parameters and precisions as separable quantities, model inversion represents a triple estimation problem. There are relatively few schemes in the literature that can deal with problems of this sort. Classical filtering and smoothing schemes such as those based on Kalman and Particle filtering (e.g., [

In this paper, we dispense with the mean-field approximation and treat all unknown quantities as conditionally dependent variables, under the prior constraint that the changes in parameters and precisions are very small. This constraint is implemented by representing all unknown variables in generalised coordinates of motion, which allows one to optimise the moments of the joint posterior as data arrive. The resulting scheme enables an efficient assimilation of data and the possibility of online and real-time deconvolution. We refer to this Bayesian filtering in generalised coordinates as Generalised Filtering (GF). Furthermore, by assuming a fixed form for the conditional density (the Laplace assumption) one can reduce the triple estimation problem to integrating or solving a set of relatively simple ordinary differential equations. In this paper, we focus on GF under the Laplace assumption.

We have previously described Variational filtering in [

Variational filtering of this sort is fundamentally different in its mathematical construction from conventional schemes like Kalman filtering because of its dynamical formulation. It can be implemented without any assumptions on the form of the conditional density by using an ensemble of “particles” that are subject to unit (standard) Wiener perturbations. The ensuing ensemble density tracks the conditional mean of the hidden states and its dispersion encodes conditional uncertainty. Variational filtering can be further simplified by assuming the ensemble density (conditional density) is Gaussian, using the Laplace assumption. Crucially, under this approximation, the conditional covariance (second moment of the conditional density) becomes an analytic function of the conditional mean. In other words, only the mean
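The ensemble idea above can be illustrated with a minimal sketch: each particle drifts down the gradient of an energy (a negative log-density) while being perturbed by unit Wiener noise, so that the ensemble comes to encode the implied conditional density. This is a deliberately static toy (a fixed quadratic energy, not the full dynamic scheme of the paper), and all numerical values are illustrative assumptions:

```python
import numpy as np

def variational_filter_step(particles, grad_energy, dt, rng):
    """One Langevin update: drift down the energy gradient plus
    unit-variance Wiener perturbations (cf. variational filtering)."""
    drift = -grad_energy(particles)
    noise = rng.standard_normal(particles.shape) * np.sqrt(2.0 * dt)
    return particles + drift * dt + noise

# Toy example: quadratic energy U(x) = (x - 3)^2 / 2, whose implied
# density exp(-U) is Gaussian with mean 3 and unit variance.
rng = np.random.default_rng(0)
particles = rng.standard_normal(500)          # initial ensemble
for _ in range(2000):
    particles = variational_filter_step(particles, lambda x: x - 3.0, 0.01, rng)
# The ensemble mean tracks the conditional mean (~3) and the ensemble
# dispersion encodes conditional uncertainty (std ~1).
```

The key design point, as in the text, is that no functional form is assumed for the density: the particles themselves are the representation.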

In this work, we retain the Laplace approximation to the conditional density but dispense with the mean-field approximation; in other words, we do not assume conditional independence between the states, parameters, and precisions. We implement this by absorbing parameters and precisions into the hidden states. This means that we can formulate a set of ordinary differential equations that describe the motion of time-dependent conditional means and implicitly the conditional precisions (inverse covariances) of all unknown variables. This furnishes (marginal) conditional densities on the parameters and precisions that are functionals of time. The associated conditional density of the average parameters and precisions over time can then be accessed using Bayesian parameter averaging. Treating time-invariant parameters (and precisions) as states rests on modelling their motion. Crucially, we impose prior knowledge that this motion is zero, leading to a gradient descent on free-energy, which is very smooth (cf. the use of filtering as a “second-order” technique for learning parameters [
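The Bayesian parameter averaging mentioned above can be sketched as follows: given time-dependent Gaussian marginals on the parameters, a summary posterior is obtained by precision-weighted averaging of the per-time-point means. This is a minimal sketch of the simple precision-weighted form (it ignores any correction for shared priors across time points):

```python
import numpy as np

def bayesian_parameter_average(means, precisions):
    """Combine per-time-point Gaussian marginals N(mu_t, P_t^-1) into a
    single Gaussian by precision-weighted (Bayesian) averaging:
       P = sum_t P_t,   mu = P^-1 sum_t P_t mu_t."""
    P = np.sum(precisions, axis=0)
    mu = np.linalg.solve(P, np.einsum('tij,tj->i', precisions, means))
    return mu, P

# Two time points with a one-dimensional parameter (illustrative numbers):
means = np.array([[1.0], [3.0]])
precisions = np.array([[[1.0]], [[3.0]]])
mu, P = bayesian_parameter_average(means, precisions)
# mu = (1*1 + 3*3)/(1 + 3) = 2.5, with combined precision P = 4
```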

This paper comprises four sections. In the first, we describe the technical details of Generalised Filtering from first principles. This section starts with the objective (to maximise the path-integral of a free-energy bound on a model’s log-evidence). It ends with a set of ordinary differential equations, whose solution provides the conditional moments of a Gaussian approximation to the conditional density we seek. The second section reviews a generic model that embodies both dynamic and structural (hierarchical) constraints. We then look at Generalised Filtering from the first section, under this model. The third section presents comparative evaluations of GF using a simple linear convolution model, which is a special case of the model in Section

In this section, we present the conceptual background and technical details behind Generalised Filtering, which (in principle) can be applied to any nonlinear state-space or dynamic causal model formulated with stochastic differential equations. Given the simplicity of the ensuing scheme, we also take the opportunity to consider state-dependent changes in the precision of random fluctuations. This represents a generalisation of our previous work on dynamic causal models and will be exploited in a neurobiological context, as a metaphor for attention (Feldman et al.; in preparation). However, we retain a focus on cascades of state-space models, which we have referred to previously as hierarchical dynamic models [

Given a model

Crucially, the free-energy can be evaluated easily because it is a function of

This graph shows the kernels implied by the recognition dynamics in (

The solutions of (

In Generalised Filtering, changes in the conditional uncertainty about the parameters are modelled explicitly as part of a time-varying conditional density on states and parameters. In contrast, variational schemes optimise a conditional density on static parameters, that is,

In summary, we have derived recognition or filtering dynamics for expected states and parameters (in generalised coordinates of motion), which cause data. The solutions to these equations minimise free-action (at least locally) and therefore minimise a bound on the accumulated evidence for a model of how we think the data were caused. This minimisation furnishes the conditional density on the unknown variables in terms of conditional means and precisions. The precise form of the filtering depends on the energy
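Schematically, recognition dynamics of this kind take a gradient-descent form in generalised coordinates. The following is a sketch in the notation common to the free-energy literature ($\mathcal{D}$ denotes the generalised derivative operator, $F$ the free-energy and $\kappa$ a decay term on the parameter motion; the exact expressions in the text may differ):

```latex
\begin{aligned}
\dot{\tilde{\mu}}_{x} &= \mathcal{D}\tilde{\mu}_{x} - \partial_{\tilde{\mu}_{x}} F \\
\dot{\mu}_{\theta} &= \mu_{\theta}' \\
\dot{\mu}_{\theta}' &= -\partial_{\mu_{\theta}} F - \kappa\,\mu_{\theta}'
\end{aligned}
```

The first equation says the conditional mean of the states moves with its own generalised motion minus the free-energy gradient; the second pair implements a second-order (Newtonian) descent for the slowly moving parameters, consistent with the prior that their motion is small.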

In this section, we review the form of models that will be used in subsequent sections to evaluate Generalised Filtering. Consider the state-space model

Under local linearity assumptions, the generalised motion of the data and hidden states can be expressed compactly as
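In generalised coordinates, successive orders of motion are coupled by a block derivative (shift) operator, often written D in this literature. A minimal numpy construction of such an operator (illustrative; the name D and the block layout follow the generalised-coordinates convention, not a specific equation in the text):

```python
import numpy as np

def derivative_operator(n_states, n_orders):
    """Block derivative operator D for generalised coordinates:
    maps (x, x', x'', ...) to (x', x'', ..., 0), i.e. shifts the
    vector up by one order of motion."""
    shift = np.eye(n_orders, k=1)            # upper shift across orders
    return np.kron(shift, np.eye(n_states))  # applied block-wise to states

D = derivative_operator(n_states=2, n_orders=3)
x_gen = np.array([1.0, 2.0,   # x
                  3.0, 4.0,   # x'
                  5.0, 6.0])  # x''
# D @ x_gen -> (x', x'', 0) = [3, 4, 5, 6, 0, 0]
```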

Given this generative model, we can now write down the energy as a function of the conditional expectations, in terms of a log-likelihood

We next consider hierarchical forms of this model. These are just special cases of Equation (

This is exactly the same as (

In summary, hierarchical dynamic models are nearly as complicated as one could imagine; they comprise causal and hidden states, whose dynamics can be coupled with arbitrary (analytic) nonlinear functions. Furthermore, the states can be subject to random fluctuations with state-dependent changes in amplitude and arbitrary (analytic) autocorrelation functions. A key aspect is their hierarchical form that induces empirical priors on the causes that link successive levels and complement the dynamic priors afforded by the model’s equations of motion (see [

In this section, we generate synthetic data using a simple linear convolution model used previously to cross-validate Kalman filtering, Particle filtering, Variational filtering and DEM [

Here, the parameters

When generating data, we used a deterministic Gaussian bump function
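A simulation in this spirit can be sketched as follows. The matrices and the bump's placement below are illustrative assumptions in the style of linear convolution models, not necessarily the exact values used in the paper:

```python
import numpy as np

# Illustrative linear convolution model:
#   x' = A x + B v   (hidden states),   y = C x (+ observation noise)
A = np.array([[-0.25, 1.0],
              [-0.5, -0.25]])
B = np.array([1.0, 0.0])
C = np.array([[0.125, 0.1625]])

dt = 1.0 / 16
t = np.arange(0.0, 32.0, dt)
v = np.exp(-0.25 * (t - 15.0) ** 2)   # deterministic Gaussian bump cause
x = np.zeros(2)
y = np.empty_like(t)
for i, vi in enumerate(v):            # simple Euler integration
    y[i] = (C @ x)[0]
    x = x + dt * (A @ x + B * vi)
# y is the noiseless output; adding Gaussian noise would give the kind
# of data shown in the figure.
```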

The linear state-space model and an example of the data it generates: the upper left panel shows simulated data in terms of the output due to hidden states (coloured lines) and observation noise (red lines). The (noisy) dynamics of the hidden states are shown in the upper right panels (blue lines), which are the response to the cause or input on the lower left. The generative model is shown as a Bayesian dependency graph on the lower right.

These data were then subject to GF and DEM to recover the conditional densities on the hidden states, unknown cause, parameters and log-precisions. For both schemes, we used uninformative priors on four parameters;

Figure

Conditional estimates during the first iteration of Generalised Filtering. This format will be used in subsequent figures and summarises the predictions and conditional densities on the states of a hierarchical dynamic model. The first (upper left) panel shows the predicted response (coloured lines) and the error (red lines) on this response (their sum corresponds to observed data). For the hidden states (upper right) and causes (lower left) the conditional mode is depicted by a blue line and the 90% conditional confidence intervals (regions) by the grey area. The lower right panels show the optimisation of the conditional means of the free parameters (above) and log-precisions (below) as a function of time.

The dynamics of the conditional states are prescribed by (

Conditional estimates after convergence of the GF scheme. This figure uses the same format as the previous figure but includes the true values of states used to generate the response (broken grey lines). Furthermore, time-dependent estimates of the parameters and precisions have been replaced with the conditional moments of the Bayesian average of the parameters over time. The black bars are the true values used to generate the data and the white bars are the conditional means. 90% confidence intervals are shown in red.

The conditional estimates of the parameters show good agreement with the true values, with the exception of the first parameter of the equation of motion

Figure

Conditional estimates after convergence of the DEM scheme. This is exactly the same as the previous figure but shows the conditional estimates following Dynamic Expectation Maximisation of the data in Figure

The free-action bound on accumulated log-evidence is shown in Figure

Comparison of Generalised Filtering and DEM. Upper left: negative free-action for GF (solid line) and DEM (dotted line) as a function of iteration number. Upper right: conditional moments of the log-precisions, shown for GF (grey bars) and DEM (white bars), in relation to the true values used to generate the data (black bars). 90% confidence intervals are shown in red. The lower panels show the conditional moments of the parameters (left) and log-precisions (right) as a function of time, after convergence of the GF scheme. The conditional means (minus the prior expectation of the parameters) are shown as blue lines within their 90% confidence tubes (grey regions).

It can be seen that DEM overestimates the precision of both observation and state noise while GF overestimates observation noise but underestimates state noise. Both schemes are overconfident about their estimates, in that the true values lie outside the 90% confidence intervals (red bars). These confidence intervals are based on accumulating the conditional precisions at each time step. For DEM, this accumulation is an integral part of optimisation whereas for GF it rests on the Bayesian parameter averaging of time-dependent precisions. These are shown on the lower right in terms of the corresponding confidence regions (grey areas). This panel shows a mild contraction of the confidence tube for the precision of state noise, when the hidden states are changing the most (shortly after the cause arrives). This is sensible because state noise is on the motion of hidden states. A similar but more pronounced effect is seen in the equivalent confidence tubes for the parameters (lower left). Here, all the parameter estimates enjoy a transient decrease in conditional uncertainty during the perturbation because there is more information in the data about their putative role at these times.

In this section, we have tried to illustrate some of the basic features of Generalised Filtering and provide some comparative evaluations using an established and formally similar variational scheme (DEM). In this example, the estimates of the conditional means were very similar. The main difference emerged in the estimation of posterior confidence intervals and the behaviour of the free-action bound on accumulated log-evidence. These differences are largely attributable to the mean-field approximation inherent in DEM and related variational schemes. In the next section, we turn to a more complicated (and nonlinear) model to show that GF can recover causal structure from data, which DEM fails to disclose.

In this section, we turn to a model that is more representative of real-world applications and involves a larger number of states, whose motion is coupled in a nonlinear fashion. This model and the data used for its inversion have been presented previously in a comparative evaluation of variational filtering and DEM. Here, we use it to illustrate that the GF scheme operates with nonlinear models and to provide a face validation in this context. This validation rests upon analysing data from a part of the brain known to be functionally selective for visual motion processing [

We used a hemodynamic model of brain responses to explain evoked neuronal activity that has been described extensively in previous communications (e.g., [

In this model, changes in vasodilatory signal

(a) Biophysical parameters (state-equation).

Description | Value (and prior mean)
---|---
rate of signal decay |
rate of flow-dependent elimination |
transit time |
Grubb's exponent |
resting oxygen extraction fraction |

(b) Biophysical parameters (observer).

Description | Value
---|---
Blood volume fraction |
Intra/extra-vascular ratio |

This allows us to formulate the model in terms of hidden states

This model represents a multiple-input, single-output model with four hidden states. The parameters
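The flow dynamics of this class of hemodynamic (balloon) model can be sketched with a simple Euler integration. The equations below follow the standard form in the hemodynamic-modelling literature; the parameter values are typical prior means assumed for illustration, not the conditional estimates reported here:

```python
import numpy as np

def hemodynamic_flow(u, dt=0.01, kappa=0.65, gamma=0.41,
                     tau=0.98, alpha=0.32, E0=0.34):
    """Euler integration of the standard hemodynamic (balloon) model:
       s' = u - kappa*s - gamma*(f - 1)          (vasodilatory signal)
       f' = s                                    (blood flow)
       tau*v' = f - v**(1/alpha)                 (blood volume)
       tau*q' = f*E(f)/E0 - v**(1/alpha) * q/v   (deoxyhemoglobin)
    with oxygen extraction E(f) = 1 - (1 - E0)**(1/f).
    Parameter values are illustrative prior means."""
    s, f, v, q = 0.0, 1.0, 1.0, 1.0
    traj = []
    for ui in u:
        s += dt * (ui - kappa * s - gamma * (f - 1.0))
        f += dt * s
        v += dt * (f - v ** (1.0 / alpha)) / tau
        q += dt * (f * (1.0 - (1.0 - E0) ** (1.0 / f)) / E0
                   - v ** (1.0 / alpha) * q / v) / tau
        traj.append((s, f, v, q))
    return np.array(traj)

# Box-car neuronal input: on for 5 s, then off (dt = 0.01 -> 20 s total).
u = np.r_[np.ones(500), np.zeros(1500)] * 0.4
traj = hemodynamic_flow(u)
# Blood flow (second column) rises above baseline during stimulation and
# relaxes back afterwards, with the characteristic damped dynamics.
```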

Data were acquired from a normal subject at 2-Tesla using a Magnetom VISION (Siemens, Erlangen) whole body MRI system, during a visual attention study. Contiguous multislice images were obtained with a gradient echo-planar sequence (TE = 40 ms; TR = 3.22 seconds; matrix size =

The three potential causes of neuronal activity were encoded as box-car functions corresponding to the presence of a visual stimulus, motion in the visual field, and attention. These stimulus functions constitute the priors

The ensuing conditional means and 90% confidence regions for the causal and hidden states are shown in Figure

Conditional estimates following Generalised Filtering of the empirical brain imaging time-series. This figure adopts the same format as Figure

Parameter and precision estimates from the analysis of the empirical data presented in the previous figure. Upper left: conditional means (grey bars) and 90% confidence intervals (red bars) of all (eight) free parameters in this nonlinear model. Upper right: the corresponding conditional covariances are shown in image format (with arbitrary scaling). The lower panels show the time-dependent changes in conditional moments as a function of scan number for the parameters (minus their prior expectation; left) and the log-precisions (right). We have focused on the precision of the first (vision) coupling parameter (grey area) in this figure.

Figure

A detailed summary of the hemodynamics is shown in Figure

These are the same results shown in Figure

Finally, we analysed the same data using DEM. The results are shown in Figure

The equivalent results for the hemodynamic deconvolution using

Comparison of Generalised Filtering and DEM for hemodynamic deconvolution. Upper left: negative free-action for GF (solid line) and DEM (dotted line) as a function of iteration number.

As noted in [

In this paper, we have introduced Generalised Filtering, an online Bayesian scheme for inverting generative models cast as stochastic differential equations in generalised coordinates of motion. This scheme is based upon a path-integral optimisation of free-energy, where free-energy bounds the log-evidence for a model. Under a Laplace approximation to the true posterior density on the model’s unknown variables, one can formulate deconvolution or model inversion as a set of ordinary differential equations, whose solution provides their conditional mean (which implicitly prescribes their conditional precision). Crucially, this density covers not only time-varying hidden states but also parameters and precisions that change slowly. We have seen that its performance is consistent with equivalent fixed-form variational schemes (Dynamic Expectation Maximisation) that entail the extra assumption that the states, parameters and precisions are conditionally independent.

Although not emphasised in this paper, the basic approach on which Generalised Filtering is based was developed with neurobiological implementation in mind. In other words, we have tried to construct a scheme that could be implemented by the brain in a neurobiologically plausible fashion. This was one of the primary motivations for a dynamical optimisation of the parameter and precision estimates. In future communications, we will focus on the neurobiological interpretation of Generalised Filtering and how it might relate to the optimisation of synaptic activity, efficacy, and gain during perceptual inference in the brain. Our particular focus here will be on state-dependent changes in precision as a model of visual attention (Feldman et al.; in preparation). In this context, the recognition dynamics entailed by optimisation can be regarded as simulations of neuronal responses to sensory inputs.

In a more practical setting, this sort of filtering may find a useful role, not only in data analysis but also in online applications, such as speech recognition or active noise cancellation. Indeed, we have already used DEM to infer the hidden states of chaotic systems (hierarchically coupled Lorenz attractors) that were used to simulate bird songs [

There is a close connection between the updates implied by (

An intuition about the need for a high prior precision on the fluctuations of the model parameters can be motivated by a linear stability analysis of the associated recognition dynamics (see (

In this appendix, we compare the free-energy

Filtering involves integrating the ordinary differential equations (

This system can be solved (integrated) using a local linearisation [
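The local linearisation update can be sketched as follows: over a step of length Δt, the flow f is linearised about the current state using its Jacobian J, giving the update Δz = (e^{JΔt} − I) J⁻¹ f(z). This sketch computes the matrix exponential with a truncated Taylor series (production code would use `scipy.linalg.expm`, and a pseudoinverse when J is singular); for a linear flow the update is exact, which is what the example checks:

```python
import numpy as np

def expm_taylor(M, terms=24):
    """Matrix exponential via truncated Taylor series (adequate for the
    small ||M|| used here; scipy.linalg.expm is preferable in general)."""
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    return E

def local_linear_step(f, J, z, dt):
    """Local linearisation update: z + (exp(J*dt) - I) J^{-1} f(z).
    Assumes J is invertible; use a pseudoinverse otherwise."""
    n = len(z)
    return z + (expm_taylor(J * dt) - np.eye(n)) @ np.linalg.solve(J, f(z))

# For a linear flow f(z) = J z the update is exact: here a pure rotation,
# so a step of dt = pi/2 rotates the state by a quarter turn.
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
z = np.array([1.0, 0.0])
z_next = local_linear_step(lambda z: J @ z, J, z, dt=np.pi / 2)
# z_next ~ [0, -1]
```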

The corresponding curvatures are (neglecting second-order terms involving states and parameters and second-order derivatives of the conditional entropy)

Finally, the conditional precision and its derivatives are given by the curvature of the Gibbs energy

Note that we have simplified the numerics here by neglecting conditional dependencies between the precisions and the states or parameters. This is easy to motivate because one is not interested in the conditional precision of the precisions but in the (conditional expectation of the) precisions

These equations may look complicated but can be evaluated automatically using numerical derivatives. All the simulations in this paper used just one routine—

When including a hyperparameterisation of the smoothness of the random fluctuations, encoded by the precision matrix on generalised motion

The schemes described in this paper are implemented in Matlab code and are available freely

This work was funded by the Wellcome Trust and supported by a grant from the China Scholarship Council (CSC). The author would like to thank Marcia Bennett for help preparing this paper.