Forward Sensitivity Approach to Dynamic Data Assimilation

The least squares fit of observations with known error covariance to a strong-constraint dynamical model has been developed through use of the time evolution of sensitivity functions—the derivatives of model output with respect to the elements of control (initial conditions, boundary conditions, and physical/empirical parameters). Model error is assumed to stem from incorrect specification of the control elements. The optimal corrections to control are found through solution to an inverse problem. Duality between this method and the standard 4D-Var assimilation using adjoint equations has been proved. The paper ends with an illustrative example based on a simplified version of turbulent heat transfer at the sea/air interface.


Introduction
Sensitivity function analysis has proved valuable as a means both to build models and to interpret their output in chemical kinetics (Rabitz et al. [1], Seefeld and Stockwell [2]) and air quality modeling (Russell et al. [3]). Yet, the ubiquitous systematic errors that haunt dynamical prediction cannot be fully understood with sensitivity functions alone. We now include an optimization component that leads to an improved fit of model to observations. The methodology is termed the forward sensitivity method (FSM): a method based on least squares fit of model to data, but where the algorithmic structure and correction procedure are linked to the sensitivity functions. In essence, corrections to control (the initial conditions, the boundary conditions, and the physical and empirical parameters) are found through solution to an inverse problem.
In this paper we derive the governing equations for corrections to control and show their equivalence to the equations governing the so-called 4D-Var assimilation method (four-dimensional variational method): least squares fit of model to observations under constraint (LeDimet and Talagrand [4]). Beyond this equivalence, we demonstrate the value of the FSM as a diagnostic tool that can be used to understand the relationship between sensitivity and correction to control. We begin our investigation by laying down the dynamical framework for the FSM: the general form of the governing dynamical model, the type and representation of model error that can be identified through the FSM, and the evolution of the sensitivity functions that are central to execution of the FSM. The dual relationship between the FSM and the 4D-Var/adjoint equations is proved. The step-by-step process of assimilating data by the FSM is outlined, and we demonstrate its usefulness by application to a simplified air-sea interaction model.

Foundation Dynamics for the FSM
We have included a list of mathematical symbols used in this paper. These symbols and associated nomenclature are found in Table 1.

Prediction Equations.
Let x(t) ∈ R^n denote the state and let α ∈ R^p denote the parameters of a deterministic dynamical system, where x(t) = (x_1(t), x_2(t), ..., x_n(t))^T and α = (α_1, α_2, ..., α_p)^T are column vectors, n and p are positive integers, t ≥ 0 denotes the time, and superscript T denotes the transpose of the vector or matrix. Let f : R^n × R^p × R → R^n be a mapping, where f(x, α, t) = (f_1, f_2, ..., f_n)^T with f_i = f_i(x, α, t) for 1 ≤ i ≤ n. The vector spaces R^n and R^p are called the model space and parameter space, respectively.
Consider a dynamical system described by a system of nonlinear ordinary differential equations of the form

dx/dt = f(x, α, t), (1a)

or in component form

dx_i/dt = f_i(x, α, t), 1 ≤ i ≤ n, (1b)

where dx/dt denotes the time derivative of the state x(t), with x(0) ∈ R^n the given initial condition. The control vector for the model is given by c = (x(0), α) ∈ R^n × R^p, the combination of initial condition and parameters referred to as the control space. It is tacitly assumed that the map f in (1a) and (1b) is such that the solution x(t) = x(t, x(0), α) = x(t, c) exists and is unique. It is further assumed that x(t) depends smoothly on the control vector c, so that the first k (≥ 1) partial derivatives of x(t) with respect to the components of c also exist. The solution x(t) of (1a) and (1b) is known as the deterministic forecast of the state of the system at time t > 0. If the map f(·) in (1a) and (1b) depends explicitly on t, the system is called time varying or nonautonomous; if f(·) does not depend on t, the system is known as time invariant or autonomous. Let z(t) ∈ R^m be the observation vector obtained from field measurements at time t ≥ 0, and let h : R^n → R^m be the mapping from the model space R^n to the observation space R^m.
If x̄(t) denotes the (unknown) true state, then we assume that z(t) is given by

z(t) = h(x̄(t)) + v(t), (2)

where v(t) ∈ R^m is the additive (unobservable and unavoidable) noise. The mapping h(·) is known as the forward operator or the observation operator. It is further assumed that v(t) is white Gaussian noise with zero mean and known covariance matrix R(t) ∈ R^(m×m); that is, v(t) ∼ N(0, R(t)).

A Classification of Forecast Errors.
The forecast error e_F(t) ∈ R^m is defined as

e_F(t) = z(t) − h(x(t)), (3)

the sum of a deterministic part and the random part v(t) induced by the observation noise.
Our immediate goal is to analyze and isolate sources and types of forecast errors.
First, if the model map f(·) and the forward operator h(·) are without error, that is, exact, and if the control vector c is also error free, then the deterministic forecast x(t) must be correct in the sense that x(t) = x̄(t), the true state. Then from (3), the forecast error is purely random white Gaussian noise; that is, e_F(t) = v(t). Second, if f(·) and h(·) are known perfectly but c has an error, then the forecast x(t) will have a deterministic error induced by the incorrect control vector. In such a case, we can, in principle, decompose the forecast error as a sum

e_F(t) = b(c, t) + v(t), (6)

where the deterministic part, b(c, t) = h(x̄(t)) − h(x(t)), is purely a function of the control vector error. Third, if f(·), h(·), and c are all in error, then the forecast error can be represented as

e_F(t) = b(c, f, h, t) + v(t), (7)

where the deterministic part b(c, f, h, t) may have a complex (additive and/or multiplicative) dependence on errors in c, f(·), and h(·).
The following assumption is key to the analysis that follows: the model of choice is faithful to the phenomenon under study. The system is predicted with fidelity; the forecasted state is creditable and useful in understanding the dynamical processes that underpin the phenomenon. Certainly, the forecast will generally exhibit some error, but the primary physical processes are included; that is, the vector field f includes the pertinent physical processes. In this situation the forecast error stems from erroneously specified elements of control. Thus, in our study the forecast error assumes the form shown in (6). Dee's work [5] contains a very good discussion of the estimation of the bias b in (7) arising from errors in the model and/or observations.

Dynamics of First-Order Sensitivity Function Evolution.
Since our approach is centered on sensitivity functions, we develop the dynamics of evolution of the forward sensitivities in this section.
Differentiating both sides of (1b) with respect to α_j and interchanging the order of differentiation on the left-hand side, we obtain

d/dt (∂x_i/∂α_j) = Σ_(s=1)^n (∂f_i/∂x_s)(∂x_s/∂α_j) + ∂f_i/∂α_j,

for 1 ≤ i ≤ n and 1 ≤ j ≤ p, with ∂x_i(0)/∂α_j = 0 as the initial condition.
These np equations can be succinctly written in matrix form as

d/dt D_α(x) = D_x(f) D_α(x) + D_α(f), (9)

with D_α(x(0)) = 0 as initial condition. This system of linear, time-varying, nonhomogeneous ordinary differential equations describes the evolution of the elements of D_α(x) = [∂x_i/∂α_j] ∈ R^(n×p), where the Jacobian matrices are D_x(f) = [∂f_i/∂x_j] ∈ R^(n×n) and D_α(f) = [∂f_i/∂α_j] ∈ R^(n×p). Similarly, by differentiating both sides of (1b) with respect to x_j(0), we obtain

d/dt (∂x_i/∂x_j(0)) = Σ_(s=1)^n (∂f_i/∂x_s)(∂x_s/∂x_j(0)),

for 1 ≤ i, j ≤ n, with ∂x_i(0)/∂x_j(0) = δ_ij, where δ_ij is the standard Kronecker delta. These n² equations can be succinctly represented as

d/dt D_x(0)(x) = D_x(f) D_x(0)(x), (13)

with D_x(0)(x(0)) = I, the identity matrix. This system of linear, time-varying, homogeneous equations governs the evolution of the elements of the matrix D_x(0)(x) = [∂x_i/∂x_j(0)] ∈ R^(n×n). Notice that (9) and (13) are independent of the observations and have the same system matrix D_x(f) on the right-hand sides; thus, the homogeneous solutions to (9) and (13) have the same structure.
The evolution of the sensitivities (the solutions to (9) and (13)) depends on the solution to the governing dynamical equations (1a) and (1b). Generally, these equations are solved numerically using the standard fourth-order Runge-Kutta method. The work of Rabitz et al. [1] contains more details relating to solutions of (9) and (13). In special cases, such as in air quality modeling, the sensitivity equations (9) and (13) exhibit extreme stiffness, and special methods are needed to handle it; Seefeld and Stockwell [2] discuss these issues. Gear's work [6] is a good reference for a general discussion of stiff equations.
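As a concrete sketch of this procedure, the model equation can be integrated together with the sensitivity equations (9) and (13) as one augmented system under a single Runge-Kutta loop. The scalar decay model dx/dt = −αx below is our own illustration (not one from the paper), chosen because its analytic sensitivities make the integration easy to check.

```python
import numpy as np

def rk4_step(g, y, t, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = g(y, t)."""
    k1 = g(y, t)
    k2 = g(y + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = g(y + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = g(y + dt * k3, t + dt)
    return y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def augmented_rhs(f, dfdx, dfda, alpha):
    """RHS of the model augmented with the sensitivity equations (9) and (13)
    in the scalar case: y = [x, dx/dalpha, dx/dx(0)]."""
    def g(y, t):
        x, u, v = y
        return np.array([
            f(x, alpha, t),
            dfdx(x, alpha, t) * u + dfda(x, alpha, t),  # equation (9)
            dfdx(x, alpha, t) * v,                      # equation (13)
        ])
    return g

# Illustrative scalar model dx/dt = -alpha * x (our example, not the paper's):
f = lambda x, a, t: -a * x
dfdx = lambda x, a, t: -a
dfda = lambda x, a, t: -x

alpha, x0, dt, T = 0.5, 2.0, 0.01, 1.0
y = np.array([x0, 0.0, 1.0])   # sensitivity initial conditions: 0 and identity
g = augmented_rhs(f, dfdx, dfda, alpha)
for i in range(int(round(T / dt))):
    y = rk4_step(g, y, i * dt, dt)

# Analytic check: x = x0 e^{-aT}, dx/dalpha = -T x0 e^{-aT}, dx/dx(0) = e^{-aT}
exact = np.array([x0 * np.exp(-alpha * T),
                  -T * x0 * np.exp(-alpha * T),
                  np.exp(-alpha * T)])
print(np.allclose(y, exact, atol=1e-6))
```

Because (9) and (13) share the system matrix D_x(f) with the linearized model, augmenting the state in this way costs one extra RHS evaluation per sensitivity component but requires no extra integrator machinery.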

Duality between the FSM and 4D-Var Based on Adjoint Method
Let z(t_1), z(t_2), ..., z(t_N) be a sequence of N observation vectors at times 0 < t_1 < t_2 < ... < t_N. The goal is to use these observations to improve the estimate of the control vector c. This estimation problem is recast as a constrained minimization of an objective function J : R^n × R^p → R given by

J(c) = (1/2) Σ_(k=1)^N (z(t_k) − h(x(t_k)))^T R^(−1)(t_k) (z(t_k) − h(x(t_k))), (14)

where the model state x(t) evolves according to (1a) and (1b). Fundamental to minimizing (14) is the computation of the gradient of J(c), denoted by ∇_c J(c). In the following we describe two ways of characterizing this gradient.

The Adjoint Method.
This method is based on the basic principle that if δJ is the first variation of J(c) induced by the perturbation δc in c, then

δJ = ⟨∇_c J, δc⟩,

where ⟨a, b⟩ denotes the standard inner product of two vectors a and b of the same dimension. Once the first variation δJ is determined, the gradient can be found. In the following we exploit two basic properties of the inner product: (i) linearity, ⟨a, b + c⟩ = ⟨a, b⟩ + ⟨a, c⟩, and (ii) the adjoint property,

⟨a, Gb⟩ = ⟨G^T a, b⟩, (17)

where G^T is the transpose or the adjoint of the matrix G.
From first principles (Chapter 24 in Lewis et al. [7]), it can be verified that the first variation δJ of J in (14) is given by

δJ = −Σ_(k=1)^N ⟨R^(−1)(t_k) e_F(t_k), D_x(h) δx(t_k)⟩, (19)

where e_F(t_k) = z(t_k) − h(x(t_k)) is the forecast error and D_x(h) is the Jacobian of the forward operator h(x) with respect to x. By invoking the adjoint property in (17), (19) becomes

δJ = −Σ_(k=1)^N ⟨η(t_k), δx(t_k)⟩,

where

η(t_k) = D_x^T(h) R^(−1)(t_k) e_F(t_k). (22)

The first variation δx(t_k) in x(t) at t = t_k resulting from the perturbation δc in c is given by (A.4) in Appendix A. Using (A.4) and the linearity of the inner product, it follows that

δJ = −Σ_(k=1)^N ⟨η(t_k), M(t_k, 0) δx(0)⟩ − Σ_(k=1)^N ⟨η(t_k), L(t_k, 0) δα⟩, (23)

where we will refer to the first term on the right-hand side of (23) as "Term I" and the second as "Term II." Using the adjoint and linearity properties in (17), Term I becomes −⟨Σ_(k=1)^N M^T(t_k, 0) η(t_k), δx(0)⟩, from which we obtain

∇_x(0) J = −Σ_(k=1)^N M^T(t_k, 0) η(t_k). (25)

Similarly, rewriting Term II, we get

∇_α J = −Σ_(k=1)^N L^T(t_k, 0) η(t_k). (27)

Hence, we obtain the components of ∇_c J, which are used in conjunction with the minimization algorithm to find the optimal c* that minimizes J(c) in (14). We conclude this discussion with an efficient recursive method for evaluating the expressions in (25) and (27). Define λ_N = −η(t_N) and, for k = N − 1, N − 2, ..., 1,

λ_k = −η(t_k) + M^T(t_(k+1), t_k) λ_(k+1).

It can be verified that ∇_x(0) J = M^T(t_1, 0) λ_1 in (25). Details on the recursive computation of (27) are given in Appendix C.

Sensitivity-Based Approach.
Let us first consider the special case when N = 1. Then (14) becomes

J_1(c) = (1/2) e_F^T(t) R^(−1)(t) e_F(t), (30)

where we recall that c ∈ R^n × R^p and x = x(t, c) is the solution of the model equations (1a) and (1b) at time t. Setting q = n + p and A = R^(−1)(t), the expression J_1(c) in (30) becomes identical to Q(c) in (B.16) (Appendix B). Hence, by using (B.20) it follows that

∇_c J_1(c) = −D_c^T(x) η(t), (31)

where η(t) is given by (22) and

D_c(x) = [D_x(0)(x), D_α(x)]. (32)

Now comparing (32) with (25)-(27), we obtain the duality relation:

D_x(0)(x(t)) = M(t, 0), D_α(x(t)) = L(t, 0). (33)

The sensitivity matrices on the left-hand side of (33) are obtained by solving the forward sensitivity equations, whereas the matrix products M(t, 0) and L(t, 0) are obtained by integrating along the path as given by (A.3) and (A.5) (Appendix A). The primary advantage of the sensitivity-based approach is that it provides a natural interpretation of the expression for the gradient in (31). Recall from (22) that η(t) ∈ R^n is the weighted version of the forecast error, R^(−1)(t) e_F(t) ∈ R^m, mapped onto the model space by the linear operator D_x^T(h). Also, the ith column of D_c^T(x) is the sensitivity of the ith component x_i(t) of the state vector x(t) ∈ R^n with respect to the control vector c ∈ R^q, given by (∂x_i/∂c_1, ∂x_i/∂c_2, ..., ∂x_i/∂c_q). Thus, it follows from (31) that ∇_c J_1 is a linear combination of the columns of D_c^T(x), the sensitivity vectors, with coefficients given by the components of −η(t). Thus, columns of D_c^T(x) with large norms that are associated with large forecast errors will be dominant in determining the components of ∇_c J_1. In other words, we gain insight into the interplay between corrections to control and forecast errors, something that can be seen through careful examination of the sensitivity vectors at various times from initial state to forecast horizon. (The illustrative example in Section 5 further explores this diagnostic function.) Expression (31) also enables us to isolate the effect of different components x_i of x(t) on the performance index J_1(c).
For the general case of observations at multiple times, (31) assumes the form

∇_c J(c) = −Σ_(i=1)^N D_c^T(x(t_i)) η(t_i). (34)

This gradient is the sum of linear combinations of the columns of D_c^T(x(t_i)) at the various observation times. With so many directions (those associated with the columns of D_c^T(x(t_i))) contributing to the components of ∇_c J, the connection between sensitivity and forecast error is obscured.
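The duality can be checked numerically on a toy problem: the gradient assembled from forward sensitivities as in (31) must agree with a direct finite-difference gradient of the cost. The scalar model dx/dt = −αx, the identity observation operator, and all numerical values below are our own illustrative choices, not the paper's.

```python
import numpy as np

# Scalar illustration: model dx/dt = -alpha * x with closed-form solution,
# direct observation h(x) = x, a single observation z at time T.
def forecast(x0, alpha, t):
    return x0 * np.exp(-alpha * t)

def cost(c, z, T, R):
    x0, alpha = c
    r = z - forecast(x0, alpha, T)          # forecast error e_F
    return 0.5 * r * r / R

def grad_fsm(c, z, T, R):
    """Gradient of J_1 via (31): the forward sensitivity vector weighted by
    eta = D_x(h)^T R^{-1} e_F (here D_x(h) = 1)."""
    x0, alpha = c
    eta = (z - forecast(x0, alpha, T)) / R
    sens = np.array([np.exp(-alpha * T),                 # dx/dx(0)
                     -T * x0 * np.exp(-alpha * T)])      # dx/dalpha
    return -eta * sens

c, z, T, R = np.array([2.0, 0.5]), 1.0, 1.0, 0.04
g = grad_fsm(c, z, T, R)

# Check against a central finite difference of J_1:
eps = 1e-6
g_fd = np.array([(cost(c + eps * np.eye(2)[i], z, T, R) -
                  cost(c - eps * np.eye(2)[i], z, T, R)) / (2 * eps)
                 for i in range(2)])
print(np.allclose(g, g_fd, atol=1e-5))
```

For a single observation the sensitivity route and the adjoint route produce the same vector; the forward route simply exposes the sensitivity columns that the adjoint recursion keeps implicit.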

Data Assimilation Using Sensitivity
We seek the solution to the following problem using the FSM. Given f(·) and h(·), the control vector c, the observation z(t), and the observation error covariance R(t), find a correction δc to the control vector such that the new model forecast starting from (c + δc) renders the forecast error e_F(t) purely random; that is, the systematic forecast error is removed and accordingly E(e_F(t)) = 0.
We start by quantifying the change Δx in the solution x(t) = x(t, c) resulting from a change δc in c. Invoking the standard Taylor series expansion, we obtain

Δx = δx + δ²x + δ³x + ···, (35)

where δ^k x is the kth variation of x(t), the fraction of the total change that can be accounted for by the kth partial derivatives of x(t) with respect to c and the perturbation δc. Since practical considerations dictate that the total number of correction terms on the right-hand side of (35) be finite, we often settle for an approximation with only k terms (generally k ≤ 2). This is a tradeoff between the improved accuracy resulting from a larger value of k and the complexity of the resulting inverse problem. Although we have developed the methodology for second-order analysis (k = 2, where Δx is approximated by the sum δx + δ²x) (Lakshmivarahan and Lewis [8]), our development will follow the first-order analysis (k = 1, where Δx is approximated by the first variation δx). Second-order analysis is justified when δ²x is a significant fraction of δx, which occurs when f(x) and/or h(x) exhibit strong nonlinearity. It is further shown in Section 5 that iterative application of the first-order method often leads to improved results.

First-Order Analysis.
From first principles and (35) we obtain

δx(t) = D_x(0)(x) δx(0) + D_α(x) δα, (36)

where D_x(0)(x) ∈ R^(n×n) is the Jacobian of x(t) with respect to x(0) and D_α(x) ∈ R^(n×p) is the Jacobian of x(t) with respect to α. The matrices D_x(0)(x) and D_α(x), found as solutions of (13) and (9), respectively, are known as the first-order sensitivities of the solution x(t) with respect to x(0) and α, and the elements of these matrices are called sensitivity functions.

Observations at One Time Only.
We first consider the case where an observation z(t) ∈ R^m is available at a single time t. The first variation δx in x(t) induces a variation Δh in the forward operator h(x(t)). Again approximating Δh by its first variation, we get

δh = D_x(h) δx, (37)

where D_x(h) ∈ R^(m×n) is the Jacobian of h(·) with respect to x. Substituting (36) into (37), we obtain

δh = D_x(h) [D_x(0)(x) δx(0) + D_α(x) δα]. (39)

Defining H(t) = [D_x(h) D_x(0)(x), D_x(h) D_α(x)] ∈ R^(m×(n+p)) and σ = (σ_1, σ_2) ∈ R^(n+p), where σ_1 = δx(0) and σ_2 = δα, (39) becomes δh = H(t) σ. Given the operating point c, our goal is to find the perturbation δc such that the observation equals its model counterpart, that is,

h(x(t)) + δh = z(t). (43)

From (43), it follows that the required perturbation σ ∈ R^(n+p) is obtained by solving the linear inverse problem

H(t) σ = e_F(t), (44)

where H(t) ∈ R^(m×(n+p)) and e_F(t) ∈ R^m.

Observations at Multiple Times.
The above analysis can be readily extended to the case where observations are available at N times, 0 < t_1 < t_2 < ... < t_N. We denote these observation vectors by z(t_1), z(t_2), ..., z(t_N). Define, at time t_i, the matrix H(t_i) = [D_x(h) D_x(0)(x(t_i)), D_x(h) D_α(x(t_i))] and the forecast error e_F(t_i) = z(t_i) − h(x(t_i)). Then at time t_i we have

H(t_i) σ = e_F(t_i). (46)

Now define a matrix H ∈ R^(Nm×(n+p)) and a vector e_F ∈ R^(Nm) by stacking:

H = [H^T(t_1), H^T(t_2), ..., H^T(t_N)]^T, e_F = (e_F^T(t_1), e_F^T(t_2), ..., e_F^T(t_N))^T. (48)
Then the N relations in (46) can be succinctly written as

H σ = e_F. (49)

A number of special cases arise depending on (a) the value of Nm relative to (n + p), namely, the overdetermined case Nm > (n + p) and the underdetermined case Nm < (n + p), and (b) the rank of the matrix H, namely, whether H is of full rank or rank deficient. In all these cases, the linear inverse problem (49) is recast as a minimization problem using the standard least squares framework (Lawson and Hanson [9]). The resulting minimization problem can then be solved using one of many standard methods, for example, the conjugate gradient method (Lewis et al. [7]; Nash and Sofer [10]).
As an illustration, consider the case where Nm > (n + p) and the rank of H is (n + p), that is, full rank. The solution σ is then obtained by minimizing the weighted least squares criterion

J_N(σ) = (1/2) (e_F − Hσ)^T R^(−1) (e_F − Hσ), (50)

where

R = Diag(R(t_1), R(t_2), ..., R(t_N)) (51)

is an Nm × Nm block-diagonal matrix with R(t_i) as its ith diagonal block.
Although it is computationally efficient to minimize (50) using a method like conjugate gradient, there is an advantage to analyzing the properties of the optimal solution via the classical approach, that is, by setting the gradient of J_N(σ) to zero. It can be verified that the minimizer of J_N is found by solving the symmetric linear system

(H^T R^(−1) H) σ = H^T R^(−1) e_F,

or succinctly

σ_LS = (H^T R^(−1) H)^(−1) H^T R^(−1) e_F, (53)

where H, e_F, and R are defined in (48) and (51), and the subscript "LS" refers to the least squares solution.
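A minimal sketch of this least squares solve, using hypothetical dimensions and synthetic H, e_F, and R (none of these numbers come from the paper), along with a numerically better-behaved whitened solve for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: Nm = 8 stacked observations, n + p = 3 controls.
H = rng.standard_normal((8, 3))            # stacked sensitivity matrix (48)
e_F = rng.standard_normal(8)               # stacked forecast errors
R = np.diag(rng.uniform(0.5, 2.0, 8))      # diagonal observation covariance (51)

# Normal equations: (H^T R^{-1} H) sigma = H^T R^{-1} e_F, as in (53)
Rinv = np.linalg.inv(R)
A = H.T @ Rinv @ H
b = H.T @ Rinv @ e_F
sigma_ls = np.linalg.solve(A, b)

# Equivalent route: whiten by R^{-1/2} and use an orthogonalization-based
# lstsq solve, which avoids squaring the condition number of H.
W = np.diag(1.0 / np.sqrt(np.diag(R)))
sigma_qr, *_ = np.linalg.lstsq(W @ H, W @ e_F, rcond=None)
print(np.allclose(sigma_ls, sigma_qr))
```

Forming H^T R^(−1) H explicitly, as in the classical approach above, squares the condition number of H; the whitened lstsq route gives the same σ_LS but is preferable when H is poorly conditioned, a situation that arises in the experiments of Section 5.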
From the discussion relating to the classification of forecast errors, recall that the forecast error inherits its randomness from the (unobservable) observation noise. The vector e_F on the right-hand side of (53) is random, and hence the solution σ of (53) is also random.
Since we are interested in forecast errors in response to incorrect control, we have e_F(t_i) = b(c, t_i) + v(t_i). Hence, the vector e_F in (48) can be expressed as

e_F = b + v, (56)

where b and v are the stacked deterministic and random parts. Substituting (56) into (53) and taking the expectation gives

E(σ_LS) = (H^T R^(−1) H)^(−1) H^T R^(−1) b.

Thus, the expected value of the correction to control is indeed a linear function of the systematic forecast error b itself. It can be verified (Lewis et al. [7]) that the covariance of the least squares estimate (53) is given by

Cov(σ_LS) = (H^T R^(−1) H)^(−1).

A Practical Example: Air/Sea Interaction
We choose a simple but nontrivial differential equation to demonstrate the applicability of the forward sensitivity method to identification of error in a dynamical model. We break this discussion into three parts: (1) the model, (2) discussion of the diagnostic value of the FSM, and (3) numerical experiments with data assimilation using the FSM.
The Model.
Consider the case where cold continental air moves out over an ocean of constant surface temperature. We follow a column of air in a Lagrangian frame; that is, the column of air moves with the prevailing low-level wind speed. Turbulent transfer of heat from the ocean to the air warms the column. The governing equation is taken to be

dθ/dτ = (C_T V / H)(θ_s − θ), (59)

where θ is the temperature of the air column (°C), θ_s the temperature of the sea surface (°C), C_T the turbulent heat exchange coefficient (nondimensional), V the speed of the air column (m s⁻¹), H the height of the column (mixed layer) (m), and τ the time (h). Equation (59) is nondimensionalized by scaling the temperature and the time; the governing equation then takes the form dx/dt = k(x_s − x) with a nondimensional transfer coefficient k. Assuming representative values of C_T, V, and H, we take our governing equation to be

dx/dt = k (x_s − x), (63)

where k = 0.25. The solution to (63) is

x(t) = x_s + (x(0) − x_s) e^(−kt). (64)

The solution depends linearly on x(0) and x_s but nonlinearly on k.
There are three elements of control: the initial condition x(0), the boundary condition x_s, and the parameter k.

Diagnostic Aspects of FSM.
The Jacobians of f with respect to x and the parameters are ∂f/∂x = −k, ∂f/∂x_s = k, and ∂f/∂k = x_s − x, and the Jacobians of the solution x(t) with respect to the control are the sensitivities ∂x(t)/∂x(0), ∂x(t)/∂x_s, and ∂x(t)/∂k. From (9) and (13) the evolution of the forward sensitivities follows. Either by solving these sensitivity equations or by differentiating (64) directly, it can be verified that the required sensitivities evolve according to

∂x(t)/∂x(0) = e^(−kt), ∂x(t)/∂x_s = 1 − e^(−kt), ∂x(t)/∂k = t (x_s − x(0)) e^(−kt). (71)

The plots of the solution and the three sensitivities are given in Figure 1.
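The closed-form sensitivities above are easy to verify numerically; a sketch, with a central finite-difference check at the control values used later in this section:

```python
import numpy as np

# Closed-form solution (64) of dx/dt = k (x_s - x):
def x_t(t, x0, xs, k):
    return xs + (x0 - xs) * np.exp(-k * t)

# Forward sensitivities, obtained either from the sensitivity equations
# (9)/(13) or by differentiating (64) directly:
def sensitivities(t, x0, xs, k):
    e = np.exp(-k * t)
    return np.array([e,                    # dx/dx(0)
                     1.0 - e,              # dx/dx_s
                     t * (xs - x0) * e])   # dx/dk

# Sanity check against central finite differences at t = 4,
# c = (x(0), x_s, k) = (1.0, 11.0, 0.25):
t, c = 4.0, np.array([1.0, 11.0, 0.25])
eps = 1e-6
fd = np.array([(x_t(t, *(c + eps * np.eye(3)[i])) -
                x_t(t, *(c - eps * np.eye(3)[i]))) / (2 * eps)
               for i in range(3)])
print(np.allclose(sensitivities(t, *c), fd, atol=1e-5))
```

Note that ∂x/∂k = t(x_s − x(0))e^(−kt) is maximized at t = 1/k = 4, which is where the check above is performed.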
Let z(t) be the direct observation of the state x(t); namely, h(x) = x, so that

e_F(t) = z(t) − x(t) (73)

is the forecast error. Following the developments in Section 4 for the case of a single observation described by (39)-(46), we obtain the analog of (46) as

H σ = e_F(t), (74)

where H = H(t) = [∂x(t)/∂x(0), ∂x(t)/∂x_s, ∂x(t)/∂k] is the forward sensitivity vector and σ = (δx(0), δx_s, δk)^T. Clearly, (74) corresponds to an underdetermined linear least squares problem, whose optimal minimum-norm solution is given by [7, Chapter 5]

σ = (H^T / ‖H‖)(e_F(t) / ‖H‖), (75)

where H^T/‖H‖ is the unit forward sensitivity vector at time t and e_F(t)/‖H‖ is a scalar, the forecast error normalized by the norm of the forward sensitivity vector H. In other words, the direction of the optimal correction to the control that annihilates the forecast error is a constant multiple of the unit forward sensitivity vector.
If we assume the control vector c = (1.0, 11.0, 0.25), that is, x(0) = 1 °C, x_s = 11 °C, and k = 0.25 (nondimensional), the time variation of the elements of H is given in Table 2 (also refer to Figure 1). From this table, it is clear that the direction of corrections to control varies as t increases. At t = 0, the corrections lie in the direction (1, 0, 0)^T, where x(t) is only sensitive to the initial condition x(0). For large t, the corrections lie in the direction (0, 1, 0)^T, where x(t) is only sensitive to the boundary condition x_s (the sea-surface temperature). For intermediate times, all the components of control have nonzero sensitivities; ∂x(t)/∂k reaches its maximum at t = 4.0.

Experiments.
We assume the incorrect control vector c′ = (x′(0), x′_s, k′) = (2.0, 10.0, 0.30); in dimensional form, x′(0) = 2 °C, x′_s = 10 °C, and k′ = 0.30 (nondimensional). Thus, the ideal correction to control is δc = (−1.0, +1.0, −0.05). To mimic reality, the correction process uses sensitivity functions that stem from the erroneous solution, that is, where the incorrect control is used.
We have explored both the success and the failure of recovery of control under two different scenarios, where either 3 or 6 observations are used to recover the control vector. Since there are 3 unknowns, the case of 6 observations gives an overdetermined system.
We execute numerical experiments where the observations are spread over different segments of the temporal range, generally divided into an "early stage" and a "saturated stage." By saturated stage we refer to the segment where the solution is close to the asymptotic state, that is, x → x_s. The dividing time between these segments is arbitrary, but generally, based on the experimental results to follow, we divide the segments at t = 24, where x(24) = 10.975, which is 0.025 less than x_s = 11.0 (see Figure 1).
The following general statement can be made: if more than one of the observations falls in the saturated zone, the matrix H^T H becomes ill-conditioned. As can be seen from (39) and the plots of the sensitivity functions in Figure 1, ∂x/∂x_s → 1 while ∂x/∂x(0) → 0 and ∂x/∂k → 0 as t → ∞. Accordingly, if two of the observations are made in the saturated zone, this induces near linear dependence between the associated rows of the H matrix and in turn leads to ill-conditioning. This ill-conditioning is exhibited by a large value of the condition number, the ratio of the largest to the smallest eigenvalue of the matrix H^T H. The inversion of this matrix is central to the optimal adjustment of control (see (53)).
Ill-conditioning can also occur as a function of the observation spacing in certain temporal segments. This is linked to lack of variability, that is, lack of change of the sensitivities from one observation time to another. As can be seen in Figure 1, the absolute value of the slope of the sensitivity function curves is generally large at the early stages of evolution and small at later stages. As an example, we find satisfactory recovery, δc = (−0.882, −0.067, +0.922), when the observations are located at t = 5.0, 5.1, and 5.2 (a uniform spacing of Δt = 0.1). Yet, near the saturated state, at t = 20.0, 20.1, and 20.2, again with a spacing of 0.1, the recovery is poor, with the result δc = (+5.317, −0.142, +0.998). The associated condition numbers for these two experiments are 1.0 × 10³ and 1.0 × 10⁶, respectively. Similar results follow for the case where 6 observations are taken. In all of these cases, the key factor is the condition number of H^T H. For our dynamical constraint, a condition number less than about 10⁴ portends a good result.
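The dependence of conditioning on observation placement can be sketched directly from the closed-form sensitivities. The absolute condition numbers returned below depend on how the control variables are scaled, so this sketch only checks the ordering between early-stage and saturated-stage placements, not the specific values reported above.

```python
import numpy as np

# Row of the forward sensitivity matrix H at observation time t, for the
# control c = (x(0), x_s, k) = (1.0, 11.0, 0.25) of the air/sea model:
def H_row(t, x0=1.0, xs=11.0, k=0.25):
    e = np.exp(-k * t)
    return np.array([e, 1.0 - e, t * (xs - x0) * e])

def cond_HtH(times):
    """Condition number of H^T H for observations at the given times."""
    H = np.array([H_row(t) for t in times])
    return np.linalg.cond(H.T @ H)

early = cond_HtH([5.0, 5.1, 5.2])     # early stage, closely spaced
late = cond_HtH([20.0, 20.1, 20.2])   # saturated stage, closely spaced
print(early < late)   # saturated-zone placement is far worse conditioned
```

In the saturated zone each row approaches (0, 1, 0), so the rows become nearly parallel and the smallest eigenvalue of H^T H collapses, which is exactly the mechanism described above.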
For the case where we have 6 observations, at t = 2, 7, 12, 17, 22, and 27, with a random observation error of 0.01 (standard deviation), we have executed an ensemble experiment with 100 members to recover the control. In this case, the condition number is 2.4 × 10³. Results are plotted three-dimensionally and in two-dimensional planes of the control space, that is, plots of the corrections in the x_s/x(0), x_s/k, and x(0)/k planes. Results are shown in Figure 2.
Finally, we explore the iterative process of finding corrections to control. Here, the results from the first iteration are used to find the new control vector. This vector is then used to make another forecast and to find a new set of sensitivity functions. The error of the forecast is obtained and, along with the new sensitivity functions, a second correction to control is found, and so forth. We apply this iterative methodology to the experiment with 6 observations discussed in the previous paragraph. As can be seen in Figure 3, the correct value of the control is found in 3 iterations.
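The iterative correction can be sketched end to end for the air/sea model: at each pass the sensitivities are recomputed at the current (erroneous) control, the linearized problem H σ = e_F is solved by least squares, and the control is updated. Noise-free observations are used here for simplicity, so this is an idealization of the ensemble experiment, not a reproduction of it.

```python
import numpy as np

def x_t(t, c):
    """Closed-form solution (64) of the air/sea model."""
    x0, xs, k = c
    return xs + (x0 - xs) * np.exp(-k * t)

def H_row(t, c):
    """Forward sensitivity vector [dx/dx(0), dx/dx_s, dx/dk] at time t."""
    x0, xs, k = c
    e = np.exp(-k * t)
    return np.array([e, 1.0 - e, t * (xs - x0) * e])

c_true = np.array([1.0, 11.0, 0.25])   # truth used to generate observations
c = np.array([2.0, 10.0, 0.30])        # erroneous control of Section 5.3
times = np.array([2.0, 7.0, 12.0, 17.0, 22.0, 27.0])
z = x_t(times, c_true)                 # noise-free observations

for it in range(3):
    H = np.array([H_row(t, c) for t in times])    # sensitivities at current c
    eF = z - x_t(times, c)                        # forecast errors
    dc, *_ = np.linalg.lstsq(H, eF, rcond=None)   # first-order correction
    c = c + dc

print(np.allclose(c, c_true, atol=1e-3))
```

Because the solution is linear in x(0) and x_s and only mildly nonlinear in k, the first-order correction behaves like a Gauss-Newton step here, and the residual control error shrinks rapidly from one iteration to the next.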

Concluding Remarks
The basic contributions of this paper may be stated as follows.
(1) While 4D-Var has been the primary methodology for operational data assimilation in meteorology and oceanography (Lewis and Lakshmivarahan [11]), and while forward sensitivity has been a primary tool in reaction kinetics and chemistry (Rabitz et al. [1]) and air quality modeling (Russell et al. [3]), to our knowledge these two methodologies have not previously been linked. We have now shown that the methods of computing the gradient of J(c) by these two approaches exhibit a hitherto unknown duality.
(2) By treating the forward sensitivity problem as an inverse problem in data assimilation, we are able to understand the fine structure of the forecast error.This is not possible with the standard 4D-Var formulation using adjoint equations.
(3) While it is true that computation of the evolution of the forward sensitivities involves computational demands beyond those required for solving the adjoint equations in the standard 4D-Var methodology, there is a richness, an augmentation of information, that comes with this added computational demand. In essence, it allows us to make judicious decisions on the placement of observations through understanding of the time dependence of the corrections to control.

From definitions (A.3)-(A.5), it can be verified that, for u < s < t,

M(t, u) = M(t, s) M(s, u), L(t, u) = L(t, s) + L(s, u). (A.6)

Figure 2:
3D cluster and its projections, where corrections are computed using observations at the following times: t = 2, 7, 12, 17, 22, and 27. An ensemble of 100 members is used, where the standard deviation of the observation noise is 0.01.

Table 2:
Sensitivities of model state to initial conditions and parameters for the sea/air turbulent transfer model.