Applied Koopman Theory for Partial Differential Equations and Data-Driven Modeling of Spatio-Temporal Systems

. We consider the application of Koopman theory to nonlinear partial di ﬀ erential equations and data-driven spatio-temporal systems. We demonstrate that the observables chosen for constructing the Koopman operator are critical for enabling an accurate approximation to the nonlinear dynamics. If such observables can be found, then the dynamic mode decomposition (DMD) algorithm can be enacted to compute a ﬁ nite-dimensional approximation of the Koopman operator, including its eigenfunctions, eigenvalues, and Koopman modes. We demonstrate simple rules of thumb for selecting a parsimonious set of observables that can greatly improve the approximation of the Koopman operator. Further, we show that the clear goal in selecting observables is to place the DMD eigenvalues on the imaginary axis, thus giving an objective function for observable selection. Judiciously chosen observables lead to physically interpretable spatio-temporal features of the complex system under consideration and provide a connection to manifold learning methods. Our method provides a valuable intermediate, yet interpretable, approximation to the Koopman operator that lies between the DMD method and the computationally intensive extended DMD (EDMD). We demonstrate the impact of observable selection, including kernel methods, and construction of the Koopman operator on several canonical nonlinear PDEs: Burgers ’ equation, the nonlinear Schrödinger equation, the cubic-quintic Ginzburg-Landau equation, and a reaction-di ﬀ usion system. These examples serve to highlight the most pressing and critical challenge of Koopman theory: a principled way to select appropriate observables.


Introduction
Data-driven mathematical methods are increasingly important for characterizing complex systems across the physical, engineering, social, and biological sciences.These methods are aimed at discovering and exploiting a relatively small subset of the full space where low-dimensional models can be used to describe the evolution of the system.Thus, solutions can often be approximated through dimensionality reduction methods where if n is the dimension of the original high-dimensional system and r is the dimension of the subspace (or slow-manifold) where the dynamics is embedded, then r ≪ n.The reduced order modeling (ROM) community has used this to great effect in applications such as largescale patterns of atmospheric variability [1], turbulent flow control architectures [2], and/or spatio-temporal encodings in neurosensory systems [3].Traditionally, the large-scale dynamics may be embedded in the low-dimensional space using, for instance, the proper orthogonal decomposition (POD) in conjunction with Galerkin projection.More recently, the Dynamic Mode Decomposition (DMD) and its Koopman generalization have garnered attention due to the fact that they can (i) discover low-rank spatiotemporal patterns of activity and (ii) they can embed the dynamics in the subspace in an equation-free manner, unlike the Galerkin-POD method of ROMs.In this manuscript, we demonstrate that the judicious, and parsimonious, selection of observables for the Koopman architecture can yield accurate low-dimensional embedding for nonlinear partial differential equations (PDEs) while keeping computational costs down and avoiding costly cross-validation.Critical to its success is an appropriate choice of observables, which is demonstrated to act as a nonlinear manifold learning method.We demonstrate the success of the method, and compare it to traditional DMD, on several canonical PDE models: Burgers' equation, the nonlinear Schrödinger equation, the cubic-quintic Ginzburg-Landau equation, and a λ − ω reaction-diffusion system.
Historically, the DMD method originated in the fluid dynamics community as a principled technique to decompose complex flows into a simple representation based on low-rank spatio-temporal coherent structures.Schmid [4] first defined the DMD algorithm and demonstrated its ability to provide physically interpretable insights from highdimensional fluid data.The growing success of DMD stems from the fact that it is an equation-free data-driven method capable of providing an accurate decomposition of a complex system into spatio-temporal coherent structures that may be used for diagnostic analysis, short-time future state prediction, and control.Importantly, Rowley et al. [5] showed that DMD is connected to the underlying nonlinear dynamics through Koopman operator theory [6] and is readily interpretable using standard dynamical system techniques [7][8][9][10].Specifically, the DMD algorithm is a manifestation of Koopman theory when the observable functions are the identity or a linear transformation of the underlying state space.Thus, DMD is a principled algorithmic architecture allowing for an explicit approximation of the Koopman operator.For more details, there are numerous detailed references [5,11,12].
The approximation of the Koopman operator via DMD is critically important for enabling evaluation of the operator from data.Indeed, it transforms Koopman theory from an abstract mathematical conception to a readily tractable computation.It also highlights the important role played by observables and their associated evolution manifolds.In particular, nonlinear PDEs can be thought to evolve on manifolds which are often difficult to characterize and are rarely known analytically.A correct choice of observables can, in some cases, linearize the nonlinear manifold.For instance, the nonlinear evolution governed by Burgers' PDE equation can be linearized by the Cole-Hopf transformation, thus providing a linear manifold which can trivially describe the evolution dynamics.Such exact solutions to nonlinear PDEs are extremely rare and do not often exist in practice, with the inverse scattering transform (IST) for Korteweg-deVries, nonlinear Schrödinger, and other integrable PDEs being the notable exceptions [13].Regardless, judiciously chosen observables can help transform a PDE evolving on a strongly nonlinear manifold to a weakly nonlinear manifold, enabling a more accurate and broader range of applicability of the Koopman approximation.
The selection of appropriate observables remains one of the most important and open challenges for Koopman theory.The widely used DMD algorithm simply takes the state space variable as the observable.This provides the simplest approximation to the Koopman operator.An alternative approach has been advocated by Williams et al. and Kevrekidis et al. [14,15] whereby machine learning concepts (e.g., support vector machines (SVM)) are used to project the data to a large number of variables using the so-called extended DMD and kernel DMD (EDMD) methods.Thus, the DMD approximation is computed in a large set of nonlinear observables of the original data.Recently, it has been shown that the EDMD method is equivalent to the variational approach of conformation dynamics (VAC) [16,17], which was first derived by Noé and Nüske in 2013 for simulating slow processes in stochastic dynamical systems applied to molecular kinetics [18,19].The authors further show that time-lagged independent component analysis (TICA) [20,21], which was originally developed in 1994, is closely related to the DMD algorithm [17].Regardless of the EDMD/VAC strategy, it is well known in the machine learning literature, such projections into higher dimensional space, through SVM or deep neural nets, can lead to improved predictions at the cost of loss of interpretability.It also projects to variables that may have no natural association with the underlying physics or nonlinear manifold of the dynamics being considered.Importantly, Klus et al. [17] show that the EDMD/VAC method requires a principled cross-validation strategy in order to make the technique useful.
Our approach is aimed at improving the straightforward DMD approximation by adding a parsimonious set of judiciously chosen variables which are motivated by the governing equations, i.e., it is a version of EDMD with only a few extra variables.Thus, simple choices of nonlinear observables can greatly improve the Koopman approximation.Moreover, we show that a clear goal in selecting observables is to move DMD eigenvalues onto the imaginary axis.We show that selecting a parsimonious set of observables allows us to capitalize on the EDMD architecture while only incurring a marginal increase in computational costs and avoiding costly cross-validation for computing an improved approximation to the Koopman operator.Further, the judicious choice of observables can also be used to help understand the nonlinear manifold on which the dynamics evolve.Ultimately, this provides a valuable third option for variable selection that sits between the standard application of the DMD and EDMD methods.

Koopman Theory, Observables, and Dynamic Mode Decomposition
The original work of Koopman in 1931 [6] considered Hamiltonian systems and formulated the Koopman operator as a discrete-time mapping.In the following year, Koopman and von Neumann extended these results to dynamical systems with continuous spectra [22].Critical to implementing this definition numerically is understanding how to choose a finite set of observables g x .This remains an open challenge today and will be addressed in our PDE examples.By construction, the Koopman operator is a linear infinite-dimensional operator that acts on the Hilbert space H of all scalar measurement functions g.The Koopman operator acts on functions of the state space of the dynamical system, trading nonlinear finite-dimensional dynamics for 2 Complexity linear infinite-dimensional dynamics.It can be further generalized to map infinite-dimensional nonlinear dynamics to infinite-dimensional linear dynamics by appropriate choice of observables.In practice, the computation of the Koopman operator will require a finite-dimensional representation.
The advantage of the Koopman representation is compelling: linear problems can be solved using standard linear operator theory and spectral decompositions.With such methods, the infinite dimensional representation is handled by considering a sufficiently large, but finite, sum of modes to approximate the Koopman spectral solution.The Koopman operator is defined for discrete-time dynamical systems.A continuous dynamical system will induce a discrete-time dynamical system given by the flow map F t M ⟶ M, which maps the state x t 0 to a future time x t 0 + t : This induces the discrete-time dynamical system where x k = x kt .The analogous discrete-time Koopman operator is given by K t such that K t g = g ∘ F t .Thus, the Koopman operator sets up a discrete-time dynamical system on the observable function g: If an appropriate Koopman operator can be constructed, then linear operator theory provides the spectral decomposition required to represent the dynamical solutions of interest.Specifically, the eigenfunctions and eigenvalues of the Koopman operator K give a complete characterization of the dynamics.We consider the eigenvalue problem The functions φ k x are Koopman eigenfunctions, and they define a set of intrinsic measurement coordinates, on which it is possible to advance these measurements with a linear dynamical system.The low-dimensional embedding of the dynamics is ultimately extracted from the Koopman eigenfunctions.More precisely, a reduced-order linear model can be constructed by a rank-r truncation of the dominant eigenfunctions φ k .
A vector of observables g, which is in our new measurement space, may be expressed in terms of Koopman eigenfunctions φ as where v k is the kth Koopman mode associated with the kth Koopman eigenfunction φ k , i.e., it is the weighting of each observable on the eigenfunction.In the original theory [6], Koopman considered Hamiltonian flows that are measure preserving, so that the Koopman operator is unitary.In this case, the eigenfunctions are all orthonormal, and (5) may be written explicitly as The dynamic mode decomposition algorithm is used to compute an approximation to the Koopman eigenvalues λ k and modes v k .
The nonlinear dynamical system defined by f and the infinite-dimensional linear dynamics defined by K are equivalent representations of a dynamical system.One can either evolve the system in the original state space, requiring computational effort since it is nonlinear, or one can instead evolve using (5) so that the time dynamics are trivially computed Thus, future solutions can be computed by simple multiplication with the Koopman eigenvalue.Such a mathematical strategy for evolving nonlinear dynamical systems would always seem to be advantageous.However, it remains an open challenge how to systematically link the observables g and the associated Koopman mode expansion to the original evolution defined by f .For a limited class of nonlinear dynamics, this can be done explicitly [23].
In theory, the modification of the Koopman operator to PDEs would generate eigenfunctionals of the Koopman operator.In practice, the discretization of space and time, either in experiment or simulation, yields a high-dimensional system of ODEs.The PDE itself imposes clear relations between the high-dimensional data which correspond to spatial locations.Specifically, the data generated from the PDE most certainly inherit the underlying dynamics enforced by spatial relations and their spatial derivatives, leading to dimensionality-reduction and low-rank truncation possibilities.In our examples, the spatio-temporal patterns can be represented in POD/DMD modes with a truncation using the dominant r modes.This truncation to a lowdimensional space is a direct consequence of the PDE nature of the solutions.For a thorough discussion of the difference simply between high-dimensional ODEs and PDEs, please see [24].
3 Complexity Figure 1 illustrates the underlying concept in the Koopman approach.A dynamical system consisting of snapshots x k evolves according to the nonlinear dynamical system defined by F t in (2).In the state space, the nonlinearity generates a nonlinear manifold in which the data are embedded.The DMD approximation produces a least-square fit linear dynamical system approximating the flow map and the low-dimensional embedding (left panel of Figure 1).Koopman theory ideally defines an operator that attempts to linearize the space in which the data are embedded.The Koopman operator then produces a linear flow map and low-dimensional embedding that approximates the full nonlinear dynamics (right panel of Figure 1).
The nonlinear manifold on which the dynamics evolve can change due to parameter changes in the PDE.For instance, the dynamics can undergo bifurcation and generate a new nonlinear manifold.This requires building a new Koopman operator to characterize the dynamics.The Koopman embedding can be used to build libraries of low-rank representations for the dynamics.The concept of library building of low-rank "features" from data is well established in the computer science community.In the reduced-order modeling community, it has recently become an issue of intense investigation.Indeed, a variety of recent works have produced libraries of ROM models that can be selected and/or interpolated through measurement and classification [25][26][27][28][29][30][31].Alternatively, cluster-based reduced order models use a k-means clustering to build a Markov transition model between dynamical states [32].More recently, such techniques have been applied using the DMD approximation for the Koopman operator [33].The modeling of parametric systems remains an open challenge for model reduction frameworks.

The DMD and Koopman Algorithms
The DMD algorithm underlies the computation of the Koopman eigenvalues and modes directly from data.Its effectiveness depends sensitively on the choice of observables.Rowley et al. [5] showed that DMD approximates the Koopman operator for the set of observables g x = x.We will use this fact in constructing a DMD algorithm for observables of x instead of the state variable itself.To start, we use the following definition of the DMD decomposition [11]: Definition 1. Dynamic mode decomposition [11]: suppose we have a dynamical system (2) and two sets of data with x k an initial condition to (2) and x ′ k its corresponding output after some prescribed evolution time Δt with there being m initial conditions considered.The DMD modes are eigenvectors of where † denotes the Moore-Penrose pseudoinverse.
The above definition provides a computational method for evaluating the Koopman operator for a linear observable.In practice, three practical constraints must be considered: (i) We have data X and X′, but we do not necessarily know F t • , (ii) We will have to make a finite-dimensional approximation to the infinite-dimensional Koopman operator K, and (iii) We will have to judiciously select the observables g x in order to have confidence that that Koopman operator will approximate the nonlinear dynamics of F t • .Points (i) and (ii) go naturally together.Specifically, the number of measurements in each column of X and X ′ is n, while the number of total columns (time measurements) is m.Thus, finite-dimensionality is imposed simply from the data collection limitations.The dimension can be increased with a large set of observables, or it can be decreased via a lowrank truncation during the DMD process.The observables are more difficult to deal with in a principled way.Indeed, a good choice of observables can make the method extremely effective, but it would also require expert knowledge of the system at hand [23].This will be discussed further in the examples.
The left panel illustrates the nonlinear manifold on which the dynamical system defined by F t in (2) generates a solution.DMD approximates the evolution on this manifold by a least-square fit linear dynamical system.In contrast, the selection of appropriate observables g x define a Koopman operator that helps linearize the manifold so that a least-square fit linear dynamical system provides a much better approximation to the system (right panel).4 Complexity The following gives a practical demonstration of how to use the data, the DMD algorithm, and the observables to produce a Koopman operator and a future state prediction of the nonlinear evolution.The Koopman algorithm simply applies DMD on the space of observables.
(1) From the data matrices X and X′, create the data matrices of observables Y and Y ′ : where each column is given by where W comes from the eigenvalue problem A Y W = WΛ and Y = UΣV * .Note that an r-rank truncation of the SVD is performed at this stage (4) The future state in the space of observables is given by the linear evolution where b = Φ † Y y 1 is determined by projecting back to the initial data observable.The continuous-time eigenvalues ω are obtained from the discrete-time eigenvalues λ k (i.e., diagonal elements of the matrix Λ) where ω k = ln λ k /Δt (5) Transform from observables to state space This last step is trivial if one of the observables selected to comprise g x k is the state variable x k itself.If only nonlinear observables of x k are chosen, then the inversion process can be difficult.
This process shows that the DMD algorithm is closely related to the Koopman operator.Indeed, it is the foundational piece for practical evaluation of the finitedimensional Koopman operator.It is stressed once again here: selection of appropriate observables is critical for the algorithm to generate good reconstructions and approximations to the future state.We can also now introduce the following theorem [5,11,14,34].
Theorem 1. Koopman and dynamic mode decomposition: let φ k be an eigenfunction of K with eigenvalue λ k , and suppose φ k ∈ span g j , so that Note here that the observables g j x as introduced in the theorem [5,11,14,34] are not denoted as vectors since the theorem applies to functions.However, in practice, when a system is discretized, then g j x ⟶ g j x as is explicitly constructed in the algorithm above.Thus, the Koopman eigenvalues are the DMD eigenvalues provided (i) the set of observables is sufficiently large so that φ k x ∈ span g j and (ii) the data is sufficiently rich so that w ∈ R X .This directly shows that the choice of observables is critical in allowing one to connect DMD theory to Koopman spectral analysis.If this can be done, then one can simply take data snapshots of a finite-dimensional nonlinear dynamical system in time and reparameterize it as a linear system in the observable coordinates, which is amenable to a simple eigenfunction (spectral) decomposition.This representation diagonalizes the dynamics and shows that the time evolution of each eigenfunction corresponds to multiplication by its corresponding eigenvalue.

Koopman Observables and Kernel Methods
The effectiveness of Koopman theory hinges on one thing: selecting appropriate observables.Once observables are selected, the previous section defines a DMD-based algorithm for computing the Koopman operator whose spectral decomposition completely characterizes the approximation.In the machine learning literature, observables are often the basis of generating features, and we will build upon this concept to generate appropriate observables.An important practical consideration becomes the computational cost in generating the DMD approximation as the number of rows in the matrices Y and Y ′ gets progressively larger with each additional observable.

Complexity
To be more precise about the distinction between the EDMD/VAC and the present work, we emphasize that the state-of-the-art machine learning approach to EDMD is given by Williams et al. [14] and Klus et al. [17].These works provide a framework for projection to observables through kernel SVM-like methods, as well as showing how to crossvalidate the results.As is common in such schemes, interpretability and generalizability is typically lost.In the current work, our goal is to select a parsimonious set of observables based upon knowledge of the physics or its constraints.It provides a viable strategy whereby expert knowledge can be leveraged to guide the selection of observables.In the previous works [14,17], parsimony was not used in the regression framework.Here, parsimony to produce interpretable models, and potentially generalizable models, is the critical innovation advocated.It should be noted that we illustrate some of the kernel methods here in order to simply show the lack of robustness of EDMD in the absence of parameter tuning.
In the absence of expert-in-the-loop knowledge of the dynamical system, one might consider, for instance, the support vector machine (SVM) literature and associated kernel methods [35][36][37][38] for feature selection (observables).The SVM architecture suggests a number of techniques for constructing the feature space g x , with a common choice being the set of polynomials such that Using a large number of polynomials can generate an extremely large vector of observables for each snapshot in time.This is closely related to the Carleman linearization technique in dynamical systems [39][40][41].Alternatively, kernel methods have found a high degree of success using (i) radial basis functions, typically for problems defined on irregular domains, (ii) Hermite polynomials for problems defined on ℝ n , and (iii) discontinuous spectral elements for large problems with block diagonal structures.Regardless of the specific choice of feature space, the goal is to choose a sufficiently rich and diverse set of observables that allow an accurate approximation of the Koopman operator K. Instead of choosing the correct observables, one then simply chooses a large set of candidate observables with the expectation that a sufficiently diverse set will include enough features for an accurate reconstruction of the Koopman modes, eigenfunctions, and eigenvalues, which intrinsically characterize the nonlinear dynamical system.
Williams et al. and Kevrekidis et al. [14,15] have recently capitalized on the ideas of machine learning by implementing the so-called extended DMD and kernel DMD method on extended observables (16) within the DMD architecture.Moreover, they have developed an efficient way to compute A Y even for a large observable space.The kernel DMD method is the most relevant in practice as the number of observables (features) can rapidly grow so as to make n extremely high-dimensional.In the context of the Koopman operator, the kernel trick [35][36][37][38] will define a function h x, x ′ that can be related to the observables g j x used for constructing Y and Y ′ .Consider the simple example of a quadratic polynomial kernel where x and x ′ are data points in ℝ 2 .When expanded, the kernel function takes the form where Y x = 2x 1 2x 2 2x 1 x 2 x 1 2 x 2 2 T .Note that for this case, both the Koopman observables and the kernel function (17) are equivalent representations that are paired together through the expansion (18).The socalled kernel trick posits that ( 17) is a significantly more efficient representation of the polynomial variables that emerge from the expansion (18).Instead of defining the Koopman observables g i x , we instead define the kernel function (17) as it provides a compact representation of the infinitedimensional feature space and an implicit computation of the inner products required for the Koopman operator.
The computational advantages of the kernel trick are considerable.For example, a polynomial kernel of degree p acting on data vectors x and x′ in ℝ n is given by which requires a single computation of the inner product α = x T x ′ .This requires O n and produces f x, x ′ = 1 + α p where α is a constant.The resulting computational cost for this pth degree polynomial kernel remains O n .In contrast, the alternative observable space using Y requires construction of a vector of length taking the form Computing the inner product Y T x′ Y x is a significantly larger computation than the kernel form (19). Thus, the kernel trick enables an advantageous representation of the various polynomial terms and circumvents the formation and computation associated with (21).It should be noted that these two representations are not equivalent.Rather, they give two different representations of nonlinear observables from which a feature space can be extracted.In the former 6 Complexity (19), the computations are tractable, while in the latter (21), the computation quickly becomes intractable.
The choice of kernel is important and in practice is not robust for Koopman methods.Some standard choices are often used, including the three most common kernels of SVM-based data methods: The advantage of the kernel trick is quite clear, providing a compact representation of a very large feature space.For the polynomial kernel, for instance, a 20th-degree polynomial p = 20 using (22a) is trivial and does not compute all the inner products directly.In contrast, using our standard Koopman observables g x j would require one to explicitly write out all the terms generated from a 20thdegree polynomial on an n-dimensional data set, which is computationally intractable for even moderately large n.The tuning parameter a must be carefully chosen in practice for reasonable results.
In practice, the observables for Y are implicitly embedded in the kernel h x, x ′ .Specifically, we consider the observable matrix elements defined by where j, k denotes the jth row and kth column of the correlation matrix, and x j and x ′ k are the jth and kth columns of data.The kernel DMD formulation still requires the computation of the matrices V and Σ which can be produced from Y * YV = Σ 2 V.As before, the matrix elements of Y * Y are computed from Y * Y j, k = h x j , x k .Thus, all the required inner products are computed by projecting directly to the new feature space defined by the specific kernel used.Note that if the linear kernel function h x, y = x T y is chosen, the kernel DMD reduces to the standard DMD algorithm.

Application to PDEs
To demonstrate the Koopman operator concepts, we apply the methodology to various illustrative and canonical PDEs: Burgers' equation, nonlinear Schrödinger equation, the cubic-quintic Ginzburg-Landau equation, and a reaction-diffusion model.With these examples, we can (i) illustrate a scenario where the Koopman operator can exactly (analytically) linearize a dynamical system, (ii) demonstrate how to judiciously select observables, and (iii) show that kernel methods are highly sensitive as an observable selection technique.The simulations are based upon a pseudospectral technique whereby the spatial domain and its derivatives are computed in the Fourier domain using the Fast Fourier Transform (FFT).The time-stepping algorithm is based upon an adaptive 4th-order Runge-Kutta scheme, i.e., ode45 in MATLAB.By default, the FFT-based strategy imposes periodic boundary conditions.

Burgers' Equation.
To demonstrate the construction of a specific and exact Koopman operator, we consider the canonical nonlinear PDE: Burgers' equation with diffusive regularization.The evolution, as illustrated in Figure 2(a), is governed by diffusion with a nonlinear advection [42]: When ϵ = 0, the evolution can lead to shock formation in finite time.The presence of the diffusion term regularizes the PDE, ensuring continuous solutions for all time.
Burgers' equation is one of the few nonlinear PDEs whose analytic solution form can be derived.In independent seminal contributions, Hopf [43] and Cole [44] derived a transformation that linearizes the PDE.The Cole-Hopf transformation is defined as follows

Complexity
The transformation to the new variable v x, t replaces the nonlinear PDE (24) with the linear diffusion equation where it is noted that ϵ > 0 in (24) in order to produce a well-posed PDE.
The diffusion equation can be easily solved using Fourier transforms.Fourier transforming in x gives the ODE system where v = v k, t denotes the Fourier transform of v x, t and k is the wavenumber.The solution in the Fourier domain is easily found to be v = v0 exp −ϵk 2 t , 28 where v0 = v k, 0 is the Fourier transform of the initial condition v x, 0 .
To construct the Koopman operator, we can then combine the transform to the variable v x, t from ( 25) with the Fourier transform to define the observables The Koopman operator is then constructed from (28) so that This is one of the rare instances where an explicit expression for the Koopman operator and the observables can be constructed analytically.The inverse scattering transform [13] for other canonical PDEs, Korteweg-deVries (KdV) and nonlinear Schrödinger (NLS) equations, also can lead to an explicit expression for the Koopman operator, but the scattering transform and its inversion are much more difficult to construct in practice.
To make comparison between Koopman theory and DMD, we consider the DMD method applied to governing (24).Applying the algorithm of Section 3 to the observables g x = x gives the DMD approximation to the Burgers' dynamics as shown in Figure 2(b).For this simulation, data snapshots were collected at intervals of Δt = 1 for the time range t ∈ 0, 30 .The singular value decay for the dynamics is shown in Figure 3(a), suggesting that a rank r = 15 truncation is appropriate.The DMD spectra and DMD modes are illustrated in Figures 3(b) and 3(c), respectively.Thus, using 8 Complexity u x, t directly as an observable produces a low-rank model with fifteen modes.In contrast, by working with the observable (30), the Koopman operator can be trivially computed (31) and the dynamics analytically produced without need of approximation.In this case, the Koopman operator exactly linearizes the dynamics.This is the ideal which is hoped for but rarely achieved with nonlinear PDEs (or nonlinear dynamical systems in general).

Nonlinear Schrödinger Equation.
The example of Burgers' equation was easy to quantify and understand since the Cole-Hopf transformation was discovered nearly seven decades ago.Thus, the observables chosen were easily motivated from knowledge of the analytic solution.Unfortunately, it is rarely the case that such linearizing transformations are known.In our second example, we consider the Koopman operator applied to a second canonical nonlinear PDE: the nonlinear Schrödinger equation where u x, t is a function of space and time modeling slowly varying optical fields or deep water waves, for instance.Discretizing in the spatial variable x, we can Fourier transform the solution in space and use a standard time-stepping algorithm, such as a fourth-order Runge-Kutta, to integrate the solution forward in time.
As with Burgers' equation, we can compute the DMD by collecting snapshots of the dynamics over a specified time window.Specifically, we consider simulations of the equation with initial data u x, 0 = 2 sech x 33 over the time interval t ∈ 0, π .Twenty-one snapshots of the dynamics are collected during the evolution, allowing us to create the snapshot matrix X and X′.The DMD reconstruction of the dynamics is demonstrated in Figure 4(a).The low-rank DMD reconstruction provides a good approximation to the dynamics of the PDE.
To be more precise, it is explicitly assumed in the DMD reduction that the observables are simply the state variables x where x = u x, t at discrete space and time points.The DMD observables are then given by Thus, as previously noted, the DMD approximation is a special case of Koopman.The DMD spectrum for a rank r = 10 approximation is shown in Figure 4(d).An ideal (g) Figure 4: Reconstruction of the NLS dynamics using (a) a standard DMD approximation g DMD x , (b) the NLS motivated g 1 x , and (c) a quadratic observable g 2 x .The Koopman spectra for each observable are demonstrated in (d), (e), and (f) which accompany the observables of (a), (b), and (c), respectively.Note that the observable g 1 x produces spectra which are approximately purely imaginary which is expected of the 2-soliton evolution.The error between the three observables and the full simulation is shown (g).Note that the observable g 1 x gives an error reduction of four-orders of magnitude over DMD, while g 2 x is an order of magnitude worse.This highlights the importance of selecting good observables.The simulation was performed over t ∈ 0, π with 41 equally spaced snapshots taken.The domain was discretized with n = 512 points on a domain x ∈ −15, 15 .
9 Complexity approximation would have the eigenvalues aligned along the imaginary axis since the evolution with the initial condition given by ( 33) is known as the 2-soliton solution which is purely oscillatory.
Koopman theory allows us a much broader set of observables.In what follows, we consider two additional observables The first observable g 1 x is motivated by the form of the nonlinearity in the NLS equation.The second, g 2 x , is chosen to have a simple quadratic nonlinearity.It has no special relationship to the governing equations.Note that the choice of the observable x 2 in g 2 x is relatively arbitrary.For instance, one could consider instead x 5 x, x 2 , x 3 , or x 5 .These all produce similar results to g 2 x selected in (35b).Specifically, the observable g 2 x is inferior to either the DMD or judiciously selected g 1 x for the Koopman reconstruction.
As has been repeatedly stated, the success of the Koopman decomposition relies almost exclusively on the choice of observables.To demonstrate this in practice, we compute the Koopman decomposition of the NLS (32) using the two observables (35a) and (35b).The required data matrices have 2n rows of data, and only the state variables need to be recovered at the end of the procedure.Note that the algorithm produces both a state approximation since the first n components are actually the state vector x and approximations to the nonlinearity.The Koopman eigenfunctions and eigenvalues provide information about the evolution on the observable space.
Figures 4(b) and 4(d) show the Koopman reconstruction of the simulated data for the observables (35a) and (35b).The observable g 1 x provides an exceptional approximation to the evolution while g 2 x is quite poor.Indeed, the error of the DMD approximation and two nonlinear observables (35a) and (35b) are shown in Figure 4(g) where the following error metric is used: where x is the full simulation and x is the DMD or Koopman approximation.With the choice of observable g 1 x , which was judiciously chosen to match the nonlinearity of the NLS, the Koopman approximation of the dynamics is fourorders of magnitude better than a DMD approximation.A poor choice of observables, given by g 2 x , gives the worse performance of all, an order of magnitude worse than DMD.Note also the difference in the Koopman spectra as shown in Figures 4(d)-4(f).In particular, note that the judicious observable g 1 x aligns the eigenvalues along the imaginary axis as is expected from the dynamics.It further suggests that much better long-time predictions can be achieved with the Koopman decomposition using g 1 x .
Observable selection in this case was facilitated by knowledge of the governing equations.However, in many cases, no such expert knowledge is available, and we must rely on data.The kernel DMD method allows one to use the kernel trick to consider a vast range of potential observables.As already highlighted, the kernel method allows for an efficient method to consider a large class of potential observables without making the observation vector g x computationally intractable.For instance, one can consider a radial basis function kernel The absolute value is conjectured to be important for the case of the NLS equation considered due to the nonlinear evolution of the phase.This radial basis-type function is one of the more commonly considered kernels.Other kernels that we might consider include the three following observables The first function is the standard polynomial kernel of 20th degree.The second instead takes the absolute value of the variable in a polynomial in order to remove the phase, and the third is a Gaussian-type kernel that uses the same inner product as the polynomial kernel.Note that it uses the same inner product in state space but a completely different inner product in feature space.The selection of these kernels is motivated by well-known and often used kernels for real valued data.Other kernels can easily be considered.Our objective is not so much to evaluate a specific kernel but to demonstrate that kernel selection produces highly variable results so that kernel tuning via cross validation is of critical importance.It is well known that SVM and deep neural nets require significant cross-validation in order to work well.Moreover, note that complex data are rarely considered for kernel selection, so there is ambiguity about what impact this may have.Indeed, this remains an open research question in its own right.However, the NLS has a specific form of nonlinearity which is phase-independent, thus motivating some of our choices of potential kernels.
These three new kernels are compared to each other and the radial basis function.Figure 5 shows the spectra generated by these four kernels along with a comparison to the Koopman spectra generated by g 1 x .Note the tremendous variability of the results based upon the choice of kernel.More precisely, it simply highlights the critical importance of calibrating the kernel through cross-validation.10 Complexity Specifically, the choice of kernel must be carefully selected for either the extended or kernel DMD to give anything reasonable.Cross-validation techniques could potentially be used to select a suitable kernel for applications of interest.It could also ensure that overfitting of the data does not occur.In either case, this simple example should serve as a strong cautionary tale for using kernel techniques in Koopman theory unless results are carefully cross validated.In contrast, our judiciously selected variables produce reasonable and parsimonious results which can be easily cross validated, saving a great deal of computational effort in producing improved performance over DMD.

Cubic Quintic Ginzburg-Landau Equation.
A second and more difficult example for the DMD and Koopman theory to characterize is the cubic-quintic Ginzburg-Landau equation (CQGLE).The CQGLE is a canonical model from mathematical physics that exhibits a wide range of nonlinear spatio-temporal dynamics, including spatio-temporal periodicity and chaos.The evolution equation for CQGLE is given by [45] where the state variable u x, t is a function of space and time.Unlike the NLS equation, the CQGLE is not Hamiltonian and integrable, rather there are significant effects from dissipation and gain effects, both linear and nonlinear.An example solution generated from the CQGLE is illustrated in Figure 6.This breather-type solution, although simple looking, does not have a simple low-rank representation like the NLS two-soliton breather.Indeed, the singular value decay suggests that a large number of modes are required for an accurate reconstruction.Importantly, the temporal evolution of the POD modes, which can be extracted from the columns of the V matrix of the SVD, shows that the temporal evolution of the dynamics is quite complicated.This makes it difficult for the DMD approximation since it relies on approximating the temporal evolution by simple Fourier modes in time.
We again consider two additional observables The first observable g 3 x is motivated by the form of the nonlinearity in the CQGLE equation.The second, g 4 x , is chosen to have a quartic nonlinearity.The latter of the observables has no special relationship to the governing equations.And as before, a wide range of other The DMD and Koopman reconstructions of the dynamics of the CQGLE are illustrated in Figure 7.As with the NLS example, the CQGLE motivated g 3 x gives the best reconstruction.Importantly, the spectra generated are closest to the imaginary axis, which is expected for the periodic spatio-temporal dynamics observed.Indeed, the DMD algorithm, or its Koopman variant applied to observables, ideally generates a purely imaginary spectrum. where , and periodic boundaries are applied.This model generates spiral wave solutions which are sustained in the reaction-diffusion process.For processing the data, a spatial filter of the form f x, y = exp −0 01 x 2 + y 2 is applied to the snapshots.This removes the boundary effects from computing on a square domain.
Figure 8 shows key characteristics of the evolution dynamics and the decomposition architecture.In Figure 8(a), the first four POD modes are shown.These are the dominant modes of the dynamics associated with the spiral wave, containing approximately 99% of the total variance.
In the low-rank approximation applied, only the first two modes r = 2 are used.Importantly, the temporal evolution of the POD modes, which can be extracted from the columns of the V matrix of the SVD, shows that the temporal evolution of the dynamics is almost purely sinusoidal.This makes it exceptionally easy for the DMD approximation since it relies on approximating the temporal evolution by simple Fourier modes in time.Indeed, given the simple sinusoidal evolution in time, the direct application of DMD is not improved upon by using additional observables.Specifically, we can consider the reaction-diffusion motivated observable The first four POD modes from sampling snapshots of the spiral wave dynamics.For this data, the snapshots were collected once the spiral wave was formed.Additionally, a spatial filter of the form f x, y = exp −0 01 x 2 + y2 was applied in order to remove the effects of the periodic boundary conditions on the dynamics.(b) There is a two-mode dominance in the dynamics.(c) The temporal dynamics of the first two POD modes, which are extracted from the columns of the V matrix of the SVD.For the spiral waves, these temporal modes are nearly perfect sinusoids.(d) A comparison of the error between the DMD reconstruction (solid line) and reaction-diffusion-inspired observable g 5 u, v (dash-dot line).Note that there is very little difference in the reconstruction error in contrast to the NLS and CQGLE examples.The simulation was performed over t ∈ 0,100 with 1001 equally spaced snapshots taken.The domain was discretized with n = 128 points on the domains x ∈ −20, 20 and y ∈ −20, 20 .
13 Complexity where u and v are the discretized and reshaped vectors formed from numerically solving the reaction diffusion system.Figure 8(d) shows the reconstruction error for the DMD algorithm compared with the Koopman reconstruction using the observable (42).The error for both is comparable, which is in contrast to the previous examples of NLS and CQGLE where a significant increase in accuracy was achieved using well-selected variables.The fact is that given the almost perfectly sinusoidal low-rank nature of the temporal dynamics, the DMD algorithm simply does not require additional observables to produce an exceptional approximation.

Outlook on Koopman Theory for PDEs
Koopman analysis is a remarkable theoretical architecture with applicability to a wide range of nonlinear dynamical systems and PDEs.It combines a number of innovations across disciplines, including dimensionality-reduction techniques, manifold learning, linear operator theory, and dynamical systems.Although the abstract architecture provides a tremendously compelling viewpoint on how to transform nonlinear dynamical systems to infinite-dimensional linear dynamics, significant challenges remain in positing an appropriate set of observables for construction of the Koopman operator.If good candidate observables can be found, then the DMD algorithm can be enacted to compute a finite-dimensional approximation of the Koopman operator, including its eigenfunctions, eigenvalues, and Koopman modes.With a judicious choice of observables, these computed quantities can often lead to physically interpretable spatio-temporal features of the complex system under consideration.
We have demonstrated the application of Koopman theory on several canonical nonlinear PDEs: Burgers' equation, the nonlinear Schrödinger equation, the cubic-quintic Ginzburg-Landau equation, and a reaction-diffusion system.For Burgers' equation, the well-known Cole-Hopf transformation provides a critical link to an explicit calculation of the Koopman operator for a nonlinear PDE.Indeed, we show that the Koopman operator and associated observables can be trivially constructed from knowledge of the Cole-Hopf transformation.In contrast, choosing linear state observables for Burgers' yields a DMD approximation which is accurate but lacks the clear physical interpretation of the exact Koopman reduction.Although the NLS equation can similarly be linearized via the inverse scattering transform, the transform and its inverse are technically difficult to compute for arbitrary initial conditions.Instead, we demonstrate that the selection of an observable that is motivated by the nonlinearity of the governing PDE gives a remarkably accurate Koopman reduction.Indeed, the Koopman eigenfunctions and eigenvalues provide an approximation that is nearly equivalent to the accuracy of the numerical simulation itself.Importantly, for the NLS example, we also demonstrate that poor choices of observables are significantly worse than the DMD approximation.And for the case of observables chosen with a kernel method, the resulting spectra and eigenfunctions are highly inaccurate and nonrobust, suggesting that such generic techniques as kernel methods may face challenges for use in observable selection.Importantly, the cross validation of the EDMD methods is critically important as the large number of variables used to describe the dynamics, most of which do not have any physical interpretability, must be carefully tuned.Like NLS, the CQGLE model can be similarly improved by variables motivated by the nonlinearity of the PDE.In contrast, the reaction-diffusion system shows that the standard DMD approximation is difficult to improve upon given that the temporal dynamics are almost purely sinusoidal.Such sinusoidal temporal evolution is ideal for the DMD approximation.Even if the governing PDE is not known, symmetries and/or conservation laws can help inform appropriate choices of a parsimonious set of observables.If nothing is known of the physics, then the standard EDMD remains a viable strategy for producing a model, even if interpretability and generalizability is typically lost.This remains an open direction of research which is beyond the current manuscript.
The results presented here provide a prescriptive algorithm for variable selection.Specifically, we recommend the following heuristic measures for a variable selection algorithm: (i) Upon performing the SVD of the data matrix X = UΣV * of snapshots of the state space x j , evaluate the temporal nature of the dominant modes via the columns of the matrix V.If the dominant columns of V are approximately sinusoidal, then the standard DMD algorithm should be used (see the reaction-diffusion example).(ii) If the dominant temporal behavior is not sinusoidal, then select observables motivated by the nonlinearity of the governing PDE (see NLS and CQGLE examples).(iii) If the governing PDE is unknown or the method in (ii) is performing poorly, then enact the cross-validated EDMD architecture.All three methods, DMD, judiciously chosen observables with DMD, and EDMD, are all important components of a robust strategy for evaluating nonlinear PDE dynamics.
Ultimately, the selected observables do not need to exactly linearize the system, but they should provide a method for transforming a strongly nonlinear dynamical system to a weakly nonlinear dynamical system.In practice, this is all that is necessary to make the method viable and informative.The results presented here are simultaneously compelling and concerning, highlighting the broader outlook of the Koopman method in general.Specifically, the success of the method will hinge on one issue: selection of observables.If principled techniques, from expert-in-the-loop knowledge, the form of the governing equation, or information about the manifold on which the data exists, can be leveraged to construct suitable observables, then Koopman theory should provide a transformative method for nonlinear dynamical systems and PDEs.We posit that sparse statistical regression techniques from machine learning may provide a path forward towards achieving this goal of selecting quality observables [23,47].Failing this, the Koopman architecture may have a limited impact in the mathematical sciences.Because of the importance of identifying meaningful observables, this is an exciting and growing area of research, especially given new developments in machine learning that may provide a robust and principled approach to observable 14 Complexity selection.For those interested in pursuing the EDMD architecture further, we recommend the recent text [12] which highlights many aspects of the current work and some of the structure of the EDMD algorithm that makes in computationally tractable.Included in the book is a link to all codes used in this manuscript.
Perform the DMD algorithm on the pair Y and Y ′ to compute A Y = Y ′ Y † 11 along with the low-rank counterpart A Y obtained by projection onto a truncated POD subspace.The eigenvalues and eigenvectors of A Y may approximate Koopman eigenvalues and modes if the observables are well chosen (3) DMD can be used to compute the augmented modes Φ Y , which may approximate the Koompan modes, by

Figure 2 :
Figure 2: (a) Evolution dynamics of Burgers' equation with initial condition u x, 0 = exp − x + 2 2 .(b) Fifteen mode DMD approximation of the Burgers' evolution.The simulation of (24) was performed over t ∈ 0, 30 where the sampling was taken at every Δt = 1.The domain was discretized with n = 256 points on a domain x ∈ −15, 15 .

Figure 3 :
Figure 3: DMD of the Burgers' equation.(a) The singular value spectrum demonstrates that a rank r = 15 truncation should be adequate to capture the dynamics of the front propagation in Figure 2. (b) The eigendecomposition in the DMD algorithm produces a DMD spectra whose eigenvalues are decaying.(c) The DMD modes used for reconstructing the solution in Figure 2 ordered according the smallest to largest (in magnitude) eigenvalues.The first mode is like a background mode since the eigenvalue is almost zero.

Figure 5 :
Figure 5: (a)-(d) Koopman spectra of the four kernels considered in (37) and (38a)-(38c), respectively.The red spectra are the Koopman spectra generated from the rank r = 10 observable g 1 x which provides an exceptionally accurate reconstruction of the NLS dynamics.

Figure 6 :
Figure 6: Spatio-temporal breather solution of the CQGLE equation for the parameter regime τ = 0 08, κ = 0, β = 0 66, ν = −0 1, σ = −0 1, and γ = −0 1.Although the dynamics illustrated (a) look relatively simple, the singular value decay (b) shows that a large number of modes are required to reconstruct the fine spatio-temporal features of the nonlinear evolution.Moreover, the temporal dynamics of the first four POD modes, which are extracted from the columns of the V matrix of the SVD (c), characterize a complicated temporal behavior for the individual modes.Unlike the NLS example, a low-rank approximation does not work well for reconstructing the dynamics.The simulation was performed over t ∈ 0, 40 with 301 equally spaced snapshots taken.The domain was discretized with n = 512 points on a domain x ∈ −10, 10 .

Figure 7 :
Figure 7: Reconstruction of the CQGLE dynamics of Figure 6(a) using a rank r = 60 expansion.Shown are (a) a standard DMD approximation g DMD x , (b) the CQGLE motivated g 3 x , and (c) a quartic observable g 4 x .The Koopman spectra for each observable are demonstrated in (d), (e), and (f) which accompany the observables of (a), (b), and (c), respectively.Note that the observable g 3 x produces spectra which are most approximately purely imaginary which is expected of the periodic evolution.A visual inspection shows that the observable g 3 x produces the best reconstruction.

Figure 8 :
Figure 8: Diagnostic features of the reaction diffusion system (41a) and (41b) with β = 1 and D = 0 1. (a)The first four POD modes from sampling snapshots of the spiral wave dynamics.For this data, the snapshots were collected once the spiral wave was formed.Additionally, a spatial filter of the form f x, y = exp −0 01 x 2 + y2 was applied in order to remove the effects of the periodic boundary conditions on the dynamics.(b) There is a two-mode dominance in the dynamics.(c) The temporal dynamics of the first two POD modes, which are extracted from the columns of the V matrix of the SVD.For the spiral waves, these temporal modes are nearly perfect sinusoids.(d) A comparison of the error between the DMD reconstruction (solid line) and reaction-diffusion-inspired observable g 5 u, v (dash-dot line).Note that there is very little difference in the reconstruction error in contrast to the NLS and CQGLE examples.The simulation was performed over t ∈ 0,100 with 1001 equally spaced snapshots taken.The domain was discretized with n = 128 points on the domains x ∈ −20, 20 and y ∈ −20, 20 .