A Convex Relaxation Approach for Learning the Robust Koopman Operator

,


Introduction
The linear dynamical system, which can be described as a function that captures the implicit evolution rule of states in a geometrical space, has achieved huge success in a wide spectrum of applications from machine learning to automatic control. However, effectively modeling states present in highly nonlinear dynamical systems while also accurately quantifying uncertainty is still a challenging task [1]. Research on nonlinear dynamic systems has a long history, while the parameters of the complex system may not be known precisely; thus, we can only obtain approximate results and require stability analysis (such as the Lyapunov stability or structural stability) [2]. In recent years, fully data-driven machine learning methods, in particular deep neural networks, appear to be a viable alterna-tive, which have a remarkable ability to learn complex patterns from training samples. Specifically, recurrent neural networks (RNNs) [3] are a form of neural network architecture which is mainly used to detect patterns in sequential data. RNNs pass information through the network that transmits information back into themselves, which enable them to take into account both previous states and the current state. However, as in most neural networks, vanishing or exploding gradients is a fundamental problem of RNNs. Although long short-term memory units (LSTMs) [4] can alleviate the vanishing gradient problem, it is still limited in maintaining long-term memory.
Another limitation of the deep learning method, which is also the main concern of this article, is the lack of intuitive causal interpretability. In some cases involving specific physical phenomena that need the interpretability of the model, good performance alone is not enough to satisfy the need for practical application.
The complexity of deep neural networks makes it hard to understand and reason the predictions, which hinders its further application and improvement [5]. Instead, most engineers prefer classical off-the-shelf models as it ensures physical plausibility while the learned models only achieve good performance in the vicinity of the training data. Fortunately, motivated by this issue, researchers are becoming more and more interested in how to bridge this gap. The combination of deep learning and physics seems natural, as the compositional structure of deep networks enables the efficient computation of the derivatives, thus, can encode a differential equation describing physical processes [6][7][8].
The physics-based deep model not only helps to improve the generalization and interpretability of the deep model but also eliminates counterfactual components in representation learning.
Nonlinear dynamical systems are very common in engineering, but most of them are task-specific, and there exists no general framework for solving them. Therefore, representing nonlinear dynamics in a linear framework is particularly attractive, because techniques from linear dynamical systems, linear algebra, geometric, and spectral theory can be directly applied. The Koopman operator theory [9] developed in 1931 has recently become the main research focus of linear representations of nonlinear systems. Based on the Koopman operator, the analysis of nonlinear dynamical systems can be lifted to an infinite-dimensional linear space. Most algorithms handle the Koopman operator by converting an infinite-dimensional space into a finite-dimensional space. In this way, it naturally combines the neural network with flexible expression ability for subspace and the physical regularization based on the Koopman operator to achieve the global linear expression of the nonlinear system. Typically, by embedding the sequence onto a low-dimensional latent space in an unsupervised manner, the Koopman operator can be approximated by a linear layer of a neural network. This deep Koopman model retains the physical interpretability and can predict the future hidden state and the hidden state of the current state [10][11][12][13][14][15].
However, most of the current methods impose strict orthogonal constraints to the Koopman eigenvectors spanned by the linear layer, which makes them sensitive to noise [16]. Drawing inspiration from the Koopman theory and convex optimization, in this paper, we propose a convex relaxation approach for learning the robust Koopman operator. Specifically, we find that solving the approximation of the Koopman operator can be regarded as an orthogonal Procrustes problem on the Stiefel manifold. In order to improve model robustness against input noise, we prove that the geometric constraint of the Koopman operator can be replaced with its convex counterpart. Without requiring extra computation cost, the proposed model, which can be trained in an end-to-end manner, shows considerable accuracy and robustness gains by incorporating this convex counterpart regular term into the deep unsupervised model. The proposed method overcomes the overconstraint problem of the Koopman operator in the previous methods [10,11].
The numerical experiment results on two complex nonlinear dynamic systems also prove the effectiveness of the proposed algorithm.

Review of Related Work
Most real-world phenomena can be described by dynamic systems; hence, we can implement state prediction, estimation, and control by constructing and analyzing dynamic systems. Linear dynamical systems can be solved exactly, which can be traced back to Kalman [17]. In reality, more systems are usually complex nonlinear systems [18]. Occasionally, the solutions of some nonlinear systems can be well approximated by an equivalent linear system near its fixed points, such as EKF [19]. While nonlinearity remains a grand challenge in analyzing dynamical systems. Different from linear systems which can be completely characterized in terms of spectral decomposition, there is no similar general mathematical framework to analyze nonlinear systems.
Over the years, researchers considered analyzing the geometry of subspaces of local linearization around fixed points and global heteroclinic being effective ways to solve nonlinear dynamic systems [20]. The geometric theory provides us with a new perspective for modeling complex systems, such as the Hartman-Grobman theorem [21], which determines the conditional constraints of linear dynamics approaching nonlinear systems. Although the geometric perspective provides a quantitative local linear model, the global analysis still requires qualitative calculation, which limits the nonlinear prediction, estimation, and control theory far away from the fixed point [22]. In many fields, the lack of known physical laws makes it impossible to construct control and motion equations. For these complex nonlinear systems, which is also the main concern of modern researchers, the linearization method is more challenging, and the linear approximations often become suspect.
With increasingly powerful data-driven techniques, learning the approximate nonlinear dynamic that emerged in the data is the main focus of renewed interest. Operator theory [23] and deep learning models [24] seem to be two effective ways to automate the discovery of the underlying physical mechanisms of nonlinear dynamic systems. Deep learning methods, especially RNN and its derivative models (LSTM [4] and GRU [25]), can use nonlinear activation functions and recursive units to express nonlinear systems implicitly. However, the black-box nature of DNNs has become one of the primary obstacles to their wide acceptance in mission-critical fields, and ignoring necessary physical bias may lead to counterfactual results.
Discovering underlying dynamics from data that enable the linear representation of nonlinear systems is the main goal of operator-theoretic-based approaches. The operator theory has proved that infinite-dimensional linear operators can be used to represent nonlinear dynamic systems, such as Koopman operators [9,26]. Although the Koopman operator is appealing, it requires infinite degrees of freedom to describe the space of measurement functions, which poses issues for representation and computation. The Koopman 2 Wireless Communications and Mobile Computing operator can be approximated via the dynamic mode decomposition (DMD) algorithm [27][28][29][30].
Recently, many physics-based deep models have been proposed [10-13, 31, 32], which could encode the physical equations into their network to obtain physical plausibility and improve the generalization of deep models. Bethany et al. [12] first proposed to identify nonlinear coordinates on which the dynamics are globally linear using a modified autoencoder. Yeung et al. [11] introduced a method that automatically selects efficient deep dictionaries to approximate the Koopman operator. Samuel [31] leveraged neural networks to learn sets of adaptive, low-dimensional dictionary observables to avoid the over-fitting problem while retaining enough capacity to represent Koopman eigenfunctions. Takeishi et al. [13] combined the Koopman spectral analysis and autoencoder to learn Koopman invariant subspaces from observed data. Azencot et al. [10] considered the forward and backward dynamics of Koopman operators by adding consistency constraints to the loss function. Most recently, Li et al. [32] proposed to use graph neural networks to learn compositional Koopman operators and use a blockwise linear transition matrix to regularize the shared structure across objects. Although the motivations of these methods are different, most of them are based on the deep generative model and approximate the Koopman eigenvectors by the linear layer of the encoder. However, analysis of the Koopman operator in the above methods is based on the DMD method, and the linear operator constructed in this way may be very sensitive to noise.

Method
is a function that describes the time-dependence of a point in a geometrical space. Given any discrete time, a nonlinear dynamical system has a state given by a vector that can be represented by a point x ∈ M in an appropriate geometrical manifold. The evolution rule of the nonlinear discrete-time dynamical system describes what future states follow from the current state t ∈ T : A function g : x ⟶ ℝ can be seen as an observable of the system, and the set of all observables forms an infinitedimensional Hilbert space H . The Koopman operator K is defined as an infinite-dimensional linear operator that acts on observables g: Intuitively, the Koopman operator can be regarded as the state transition matrix of the evolution trajectory in an infinite-dimensional space. The key property of the Koopman operator K is its linearity. For any two observables g and h and scalar values α and β, the linearity is easy to be proved: It would seem natural to characterize the state space dynamics by the spectral properties of the Koopman operator.
The spectrum γðKÞ of the Koopman operator K on an infinite-dimensional vector space is defined as where K − γI is not invertible. In the infinite-dimensional Hilbert space, the Koopman operator K can be denoted as where σ is called the Koopman mode associated with the Koopman eigenvalue-eigenfunction pair, and it is given by the projection of the observable g onto φ.
3.2. Convex Relaxation Constraint. The purpose of most DMD and deep learning approaches is to find eigenfunctions of the Koopman operator directly, satisfying The eigenfunctions of Koopman operator can span an invariant linear subspace. In practice, the Koopman operator can be solved as an orthogonal Procrustes problem (OPP) in the following form: Although the deep learning method can learn these eigenfunctions from data, the eigenfunctions are very sensitive to the noise in the training data. Strict orthogonal constraint would result in the noisy input to be merged into the representation of the hidden features φðxÞ. Thus, the continuity and smoothness of the Koopman spectrum evolution may be destroyed.
Obviously, in Equation (7) the linear operator K lies on the Stiefel manifold.
K T denotes the transpose of K, and I d denotes the d × d identity matrix. From the above description, we can find that if K is a linear operator, then the following propositions are equivalent: The above propositions can be proved by the following formula: Each of the Stiefel manifolds V d ðℝ n Þ can be viewed as a homogeneous space for the action of a classical group in a 3 Wireless Communications and Mobile Computing natural manner. The orthogonal group OðnÞ acts transitively on V d ðℝ n Þ. When d = n, the corresponding action is free so that the Stiefel manifold V d ðℝ d Þ is a principal homogeneous space for the corresponding classical group.
In order to enhance the robustness of the model, in this work, we try to enforce the manifold constraint on K by its convex counterpart. Specifically, we generalize it by seeking the closest Koopman eigenfunctions in which the columns may be orthogonal, but not necessarily orthonormal. Lemma 1. The convex hull of the Stiefel manifold V d ðℝ n Þ ≔ fK ∈ ℝ n×d : K T K = I d g equals the unit spectral-norm ball convðV d Þ = fK ∈ ℝ n×d jkKk 2 ≤ 1g. kKk 2 denotes the spectral norm of a matrix K, which is defined as the largest singular value of K.
Proof. Let UΣV T be the singular value decomposition of K.
The above lemma demonstrates that the manifold constraint fK ∈ ℝ n×d : K T K = I d g can be conveniently imposed by minimizing a convex regularizer.
We can use the above lemma directly to get the convex relaxation constraint term. In particular, the above conditions take the form By combining with an autoencoder, we propose a model capable of processing time-series data. Our model is trained by minimizing the following loss function: where λ re , λ pre , λ Stief el are user-defined hyperparameters that balance between reconstruction, prediction, and convex relaxation constraint. L re , L pre , and L Stief el are defined in Equations (11), (13), and (14)respectively.

Experiments
To evaluate our proposed approach (named Stiefel Koopman), we perform a comprehensive study using two datasets and compare to state-of-the-art Koopman-based approaches. The proposed Stiefel Koopman network minimizes Equation (12) with a decaying learning rate initially set to 0.01. We fix the loss weights to λ re = λ pre = 1 and λ Stief el = 5 × e −4 , for the AE, prediction, and convex relaxation regularizing penalty, respectively. We use ls = 8 prediction steps forward in time.

Baselines.
We compare our approach with two different Koopman-based baselines detailed next. For a fair comparison, unless said otherwise, we always use the same parameters in all methods.
4.1.1. Koopman AE [11]. The Koopman AE simulates the evolution process of the Koopman operator by using spectral decomposition in the AE training process. Many methods such as those in [13,32] are its derivatives; without loss of generality, we will take it as a representative method for comparison. The Koopman AE also can be viewed as a special case of our method without manifold regularizing penalty. [10]. Consistent Koopman AE leverages the forward and backward dynamics, and the consistency penalty is constructed by the relationship between the forward and reverse Koopman operators. The idea of this method is similar to the smooth filtering in Bayesian filtering and is also related to the consistent GAN. We will show that our method can degenerate to consistent Koopman AE and has lower computational complexity and higher accuracy.

Nonlinear Pendulum.
A simple gravity pendulum without friction is an idealized mathematical model and a dynamic system which swings back and forth only under the influence of gravity. When a pendulum is displaced sideways from its equilibrium position, it is subject to a restoring force due to gravity that will accelerate it back toward the equilibrium position. The differential equation of an idealized pendulum can be represented as a second-order ODE as follows: where g = 9:81 is acceleration due to gravity, l = 1 is the length of the pendulum, and θ is the angular displacement. Following the same protocol as that in [10], the initial oscillation angles conditions θ 0 = 0:8 and θ 0 = 2:4 are considered in our experiment, and the time interval and sampling interval are set to t = ½0, 51 and Δt = 0:03, respectively. After obtaining the 2D sampling sequence, we use a random orthogonal matrix to map the samples to a high-dimensional space. This is the finite-dimensional approximation of the Koopman 4 Wireless Communications and Mobile Computing operator in the infinite-dimensional Hilbert space. The training set contains 600 snapshots and 1600 testing snapshots. As there are many network parameters, the source code will be available at https://github.com/usodonkey/Hcigroupgithub. For detailed parameter settings of the model, please refer to the above link. For the comparative evaluation, the performance of the proposed method is tested against other two SOTA approaches listed above for clean pendulum sequences. Figure 1 shows the pendulum results for initial conditions θ 0 = 0:8 and bottleneck = 6, which illustrates that our model can achieve the best results in all the cases we explored. The relative prediction error is computed at each time step via ðkφðx k Þ − Kφðx k−1 Þk 2 Þ/ðkφðx k Þk 2 Þ. Kφðx k−1 Þ is the high-dimensional estimated prediction. We make predictions over a time frame of 1000 steps and average the error across 30 different initial observations φðx k Þ, where the shaded areas represent the standard deviations.
It is worth noting that, as shown in Table 1, we do not need to pay more computational costs to obtain the above results in Figure 1. The running time of our model is equivalent to that of Koopman AE, less than half of that of consistent Koopman AE. As shown in Figure 2, our algorithm can still get the best results when the initial conditions are set to θ 0 = 2:4.
In the theoretical part, we have mentioned that the convex relaxation term could improve the robustness of the model to noise. Therefore, we added two levels (0.05 and 0.1) of the Gaussian white noise to the input. The results are shown in Figures 3 and 4. Obviously, noise has varying degrees of influence on the prediction accuracy of all models. However, compared with other methods, it is not difficult to find that our method exhibits stronger robustness to noise and has a smaller standard deviation.
In addition, we can also add the backward prediction and consistency regular items in consistent Koopman AE to our model. The entire training process is consistent with the paper [10]. As shown in Figure 5, we find that additional constraints can further improve the prediction accuracy of

Wireless Communications and Mobile Computing
the proposed method. But unfortunately, in this configuration, we need to pay nearly twice the cost of computing time. However, we believe that the proposed method obtains rel-atively optimal results under the condition of limited computational complexity. Moreover, we have an interesting discovery in the experiment, and the proposed method  Wireless Communications and Mobile Computing can improve the robustness of Koopman AE by adding a convex relaxation term. However, the robustness of consistent Koopman AE cannot be enhanced by an additional convex relaxation term. This result also may suggest that the poor robustness of the algorithm is caused by the consistency constraint.

Ablation Study.
As we all know, if we use a lot of nodes in the bottleneck layer when designing a neural network, high-dimensional coding will be generated. At this time, the network may overfit the input data by simply remembering the inputs. In this case, we will not be able to get the correct relationship in our coding. Similarly, if we use a bottleneck layer with a very small number of nodes, it will be difficult to capture all relationships. Therefore, we set the number of nodes in the bottleneck layer to 6, 8, and 10. As shown in Figure 6, our method can get the best result when the bottleneck is set to 6. The weight hyperparameter of the convex relaxation term will also affect the performance of the model. If the penalty hyperparameter is too large, our model will degenerate to the Koopman AE. If the penalty hyperparameter is too small, the convex relaxation term will be invalid. Therefore, we conduct hyperparameter selection experiments under different initial conditions. The experimental results are shown in Figure 7. When the initial condition is set to θ 0 = 0:8, setting the penalty hyperparameter λ Stief el = 5 × e −4 can obtain the best result, and when the initial condition is set to θ 0 = 2:4, setting the penalty hyperparameter λ Stief el = 1:2 × e −3 can obtain the best solution.

Cavity Flow.
The lid-driven cavity flow is most probably one of the most studied fluid problems in the computational fluid dynamics field. Due to the simplicity of the cavity geometry, applying a numerical method to this flow problem in terms of coding is quite easy and straightforward. Despite its simple geometry, the driven cavity flow retains a rich fluid flow physics. The flow inside a driven cavity is governed by the 2D steady incompressible Navier-Stokes equations. These equations 7 Wireless Communications and Mobile Computing in stream function and vorticity formulation are given as follows [33]: where v ! represents the velocity field.

Wireless Communications and Mobile Computing
where the terms x,t, uðx, tÞ, and ν denote spatial coordinate, temporal coordinate, speed of fluid at the indicated spatial and temporal coordinates, and viscosity of fluid, respectively. The viscosity is a constant physical property of the fluid, and the other terms represent the dynamics contingent on that viscosity. By setting initial condition u, v, p = 0 everywhere and following the boundary conditions, we can get 2000 samples which are uniformly sampled from 10,000 data by setting the model parameters (nx = 100, ny = 100, nt = 500, and nit = 50); half of the samples are used for training and half for testing.
Compared with the pendulum data, the cavity flow data has a higher dimensionality. Therefore, we set the bottleneck to 24 (the optimal empirical value). The qualitative prediction experiment results are shown in Figure 8. The blue arrow represents the ground truth, and the red is the predictions of different algorithms. We can find that our method is significantly better than Koopman AE. Compared with consistent Koopman AE, our results are also improved (we have marked the significant differences in red circles).
The corresponding prediction error curve on the liddriven cavity flow is shown in Figure 9. It can be found that the prediction error of Koopman AE will increase greatly with the increase of time step.
As shown in Figures 10 and 11, when the input data contains noise, we analyze the prediction performance of our model and the competitive methods from the quantitative and qualitative perspectives, respectively. It is not difficult to find that for complex nonlinear dynamic model models, noise has a very serious impact on the prediction results.

Conclusion
In this paper, we present a data-driven global linearization method for nonlinear dynamic systems. We found that the Koopman operator lies on the Stiefel manifold. The eigenvector of the Koopman operator directly calculated will cause the noise to be merged into the feature representation of the data. Therefore, we propose using the convex relaxation operator of the Stiefel manifold to constrain the eigenvectors of the Koopman operator. Compared to the recently proposed consistent Koopman AE, our algorithm performs better both on clean and noisy data, and the model complexity is much lower than consistent Koopman AE. In future work, we will study more complex nonlinear systems in the real world, such as human action sequences.

Data Availability
The source code will be available at https://github.com/ usodonkey/Hci-group.

Conflicts of Interest
The authors declare that they have no conflicts of interest.