- Research article
- Open Access
- Published:

# Non-intrusive nonlinear model reduction via machine learning approximations to low-dimensional operators

*Advanced Modeling and Simulation in Engineering Sciences*
**volume 8**, Article number: 28 (2021)

## Abstract

Although projection-based reduced-order models (ROMs) for parameterized nonlinear dynamical systems have demonstrated exciting results across a range of applications, their broad adoption has been limited by their intrusivity: implementing such a reduced-order model typically requires significant modifications to the underlying simulation code. To address this, we propose a method that enables traditionally intrusive reduced-order models to be accurately approximated in a non-intrusive manner. Specifically, the approach approximates the low-dimensional operators associated with projection-based reduced-order models (ROMs) using modern machine-learning regression techniques. The only requirement of the simulation code is the ability to export the velocity given the state and parameters; this functionality is used to train the approximated low-dimensional operators. In addition to enabling nonintrusivity, we demonstrate that the approach also leads to very low computational complexity, achieving up to \(10^3{\times }\) in run time. We demonstrate the effectiveness of the proposed technique on two types of PDEs. The domain of applications include both parabolic and hyperbolic PDEs, regardless of the dimension of full-order models (FOMs).

## Introduction

Modern computational architectures have enabled the detailed numerical simulation of incredibly complex physical and engineering systems at a vast range of scales in both space and time [31]. Even with improved high-performance computing, the iterative numerical solutions required for design and optimization may quickly become intractable; the computational demands for real-time feedback control are even more challenging [11]. Fortunately, many high-dimensional dynamical systems, such as a discretized simulation of a fluid flow, are characterized by energetic coherent structures that evolve on an attractor or manifold with a lower dimensional intrinsic dimension [3, 6, 23]. This observed low-dimensionality has formed the basis of reduced-order modeling, which now plays a central role in modern design, optimization, and control of complex systems. Despite the growing importance of model reduction, the challenges of nonlinearity, identifying an effective low-dimensional basis, and multiscale dynamics have limited its widespread use across the engineering and natural sciences [10]. A leading method for reduced-order modeling involves the Galerkin projection of known governing equations onto a mathematical or empirical basis, although this approach is intrusive and challenging to implement. The leading alternative involves black-box system identification, which is purely data-driven, but generally yields uninterpretable models. In this work, we investigate and compare several emerging techniques from machine learning, i.e. applied data-driven optimization, for non-intrusive reduced-order modeling.

Intrusive model reduction methods, based on a working and decomposable numerical simulation of the governing equations, provide the most general and widely used set of techniques. Foremost in this arsenal is the Galerkin projection of the governing equations onto a low-dimensional linear subspace, usually spanned by orthogonal modes, such as Fourier, or data-driven modes from proper orthogonal decomposition (POD) [3, 6, 23, 39]. These approaches benefit from a strong connection to the underlying physics, the ability to include constraints to enforce preservation of the underlying dynamic structure, and adaptivity [2, 15,16,17]. In addition, there are several extensions around hyper-reduction to improve efficiency for systems with parametric dependence or higher order nonlinearities, based on the empirical interpolation method (EIM) [5] and the discrete empirical interpolation method (DEIM) [19]. However, these models are limited to the dynamic terms of the governing equations and the linear projection basis may not be optimal for capturing the dynamics. The main drawback of intrusive methods is that they are challenging to implement, requiring considerable human effort and knowledge to manipulate the governing equations and simulation code.

In contrast to intrusive model reduction, data-driven techniques are becoming increasingly prevalent, catalyzed by the increasing availability of measurement data, the lack of governing equations for many modern systems, and emerging methods in machine learning and data-driven optimization. Collectively, these data-driven techniques form the field of system identification [24, 33, 38]. Many techniques in system identification use regression to identify linear models, such as the eigensystem realization algorithm (ERA) [25] and dynamic mode decomposition (DMD) [4, 30, 49, 56]; recently, both techniques have been connected to nonlinear systems via the Koopman operator [13, 36, 46]. Another classic technique in nonlinear system identification is the NARMAX algorithm [7, 28, 51]. Recently, powerful techniques in machine learning are re-defining what is possible in system identification, leveraging increasingly large datasets and more powerful optimization and regression techniques. Deep learning, based on multi-layer neural networks, is increasingly used to model fluid dynamics and obtain closure models [29, 32, 37, 40, 42, 52, 61, 69]. More generally, machine learning is having a considerable impact in the modeling of dynamical systems and physics [8, 12, 44, 50], for example relevant work in cluster-based reduced order models [26], long-short time memory networks (LSTMs) [58, 60], and Gaussian process regression [43, 59]. Of particular interest are techniques based on the principle of *parsimony* that seek the simplest models, with the fewest terms necessary to describe the observed dynamics [8, 12, 34, 35, 47, 48, 50]. For example, the sparse identification of nonlinear dynamics (SINDy) algorithm [12] uses sparse regression to identify the few active terms in a differential equation. The resulting models balance model complexity with descriptive power, avoiding overfitting and remaining both interpretable and generalizable.

Data-driven techniques in system identification and machine learning are currently being employed for advanced, non-intrusive, and reduced-order modeling, removing the need for full-order model equations or a modifiable numerical code. Loiseau and Brunton [34] extended the SINDy modeling framework to enforce known constraints, such as energy conserving quadratic nonlinearities in incompressible flows. This so-called *Galerkin regression* also enables higher order nonlinearities than are possible with standard Galerkin projection, providing effective closure for the dynamics of truncated modes. Swischuk et al. [54] develop parametric models between input parameters and POD coefficients for physics and engineering problems, comparing many leading methods in machine learning. Peherstorfer and Willcox [41] develop time-dependent reduced order models for partial differential equations, by non-intrusively inferring the reduced-order model operator output as a function of initial conditions, inputs, trajectories of states, and full-order model outputs, without requiring access to the full-order model. They prove that for dynamics that are linear in the state or are characterized by low-order polynomial nonlinearities, the ROM converges to the projection of the FOM onto a low-order subspace. Carlberg et al. [18] propose dimensionality reduction on autoencoders to learn dynamics for recovering missing CFD data. Other non-intrusive ROMs have been developed based on neural networks [22, 27, 62, 63, 68], sparse autoencoders [21], radial basis functions [66], and kernel methods [57, 65]. Hybrid methods are also promising, for example by modeling the error of a Galerkin model to produce closures [67].

Despite the considerable recent progress in leveraging machine learning for non-intrusive reduced-order modeling, current approaches still have a number of shortcomings. First, many methods are limited to low-order polynomial nonlinearities. All of the above approaches learn the low-dimensional operators from full-system trajectories, which is expensive and limits the predictive accuracy, as the ROM trajectory is likely to quickly deviate from the full-system trajectory. Current methods also have limited consideration for stability and the interplay between regression methods and time integration has not been explored. Finally, current approaches do not provide a framework for model selection, as each method will likely be best suited to a different problem class.

In this work, we continue to develop and explore machine learning techniques for non-intrusive model reduction, addressing many of the shortcomings described above. In particular, we develop and compare numerous leading techniques in machine learning to produce accurate and efficient reduced order models. We focus primarily on the class of Galerkin projection models, although these approaches may be extended to more general models. To study the dynamical system in the latent space, we map from the low-dimensional state at the current time instance to the next, in explicit or implicit time integration schemes. We investigate a variety of regression models for the mapping, including k-nearest neighbors [1], support vector regression [53], random forest [9], boosted trees [20], the vectorial kernel orthogonal greedy algorithm (VKOGA) [64, 65] and SINDy [12]. In the following, we explicitly consider stability and explore the connection between various regression methods and their suitability with different time integration schemes; for example, non-differentiable machine learning models are only amenable to explicit time integration. We also explore the modeling procedures of varying complexities on two examples, the 1D Burgers’ equation and the 2D convection–diffusion equation using both explicit and implicit time integrators.

## Problem formulation

### General nonlinear dynamical systems

This work considers parameterized nonlinear dynamical systems characterized by the system of nonlinear ODEs

where \(t\in [0,T]\) denotes time with \(T\in {\mathbb {R}}_+{}\) representing the final time, \({{x}}\in {\mathbb {R}}^{ N }\) denotes the state, \(\varvec{\mu }\in {\mathscr {D}}\subseteq {\mathbb {R}}^{ p }\) denotes the system parameters, \({{x}}^0:{\mathscr {D}}\rightarrow {\mathbb {R}}^{ N }\) is the parameterized initial condition, and \({{f}}:{\mathbb {R}}^{ N }\times {\mathbb {R}}_+{}\times {\mathscr {D}}\rightarrow {\mathbb {R}}^{ N }\) denotes the velocity.

### Galerkin projection

We assume that we we have low-dimensional trial-basis matrix \({\varvec{V}}\in {\mathbb {R}}_\star ^{ N\times n }\) (with \(n\ll N\)) computed, e.g., via proper orthogonal decomposition (POD), such that the solution can be approximated as \({{x}}(t,\varvec{\mu })\approx {\tilde{{{x}}}}(t,\varvec{\mu }) = {\bar{{{x}}}}(\varvec{\mu }) + {\varvec{V}}\hat{{{x}}}(t,\varvec{\mu }) \approx {{x}}\) with \(\hat{{{x}}}\in {\mathbb {R}}^{ n }\) denoting the reduced state. Then, the Galerkin ROM can be expressed as

Critically, note that the ROM is defined by the low-dimensional mapping

The reduced velocity \({{f}}_r\) is thus simply a function that maps the reduced state and inputs to a low-dimensional vector. Thus, we can rewrite Eq. (3) as

However, this approach is intrusive to implement in computational mechanics codes, as it requires querying the full-order model code to compute \({{f}}({\varvec{V}}\hat{{{x}}},t;\varvec{\mu })\) for every instance of \(\hat{{{x}}}\) and \(\varvec{\mu }\) encountered during the ROM simulation; without hyper-reduction (e.g., DEIM or gappy POD), it is also computationally expensive, as computing \({\varvec{V}}^T{{f}}\) given \({{f}}\) incurs \({\mathscr {O}}(Nn)\) flops. Even if hyper-reduction is present, however, the approach remains intrusive, as sampled entries of the velocity must be computed for every instance of \(\hat{{{x}}}\) and \(\varvec{\mu }\) encountered during the ROM simulation. Depending on the complexity of the system trajectories, one can choose *n* to balance the accuracy and computational speedup.

## Regression-based reduced operator approximation

This section outlines learning the low dimensional operator through regression methods. We commence describing the general formulation of the framework in “Mathematical formulation” section, and then “Surrogate ROM” section discusses the proposed regression models and their computational complexity. In “Error analysis” section, we derive the boundedness and stability of of the approximated discrete-time dynamical system.

### Mathematical formulation

The objective is to develop a data-driven, non-intrusive approximation to the reduced velocity with reduced-order models of different dynamical systems. \({{f}}_r\). In particular, we aim to devise an approximate mapping

such that

As the domain and codomain of \({\hat{{{f}}_r}}\) exhibit no specialized structure, we can consider general high-dimensional regression techniques (e.g., kernel regression, random forest regression).

Similar to Eq. (7), the reduced order model is given by

with the initial condition

We assume that the sequences of reduced velocity \({{f}}_r\) can be studied from the Markovian discrete-time dynamical system

for discrete-time velocity \({{g}}: {\mathbb {R}}^{ n }\times {\mathbb {R}}^{ p } \rightarrow {\mathbb {R}}^{ n }\). If such a reduced velocity exists and we can compute the operator, the whole time sequence of the reduced states can then be estimated from Eq. (12) by specifying the initial (reduced) state. By considering the mapping between the current reduced state and the next, we cast the problem into a set of regression models.

### Surrogate ROM

In this work, we collect the reduced states from the Galerkin projection as in “Galerkin projection” section and approximate the operator \({{g}}\) by regressing each component of the reduced velocity. To be specific, we construct individual regression \(\hat{g}_i \approx g_i\) for \(i = 1, \ldots , n\), the *i*th equation of \({{g}}\) is:

where \(\hat{{{x}}}= [x_1 \ldots x_n]\) and \({{g}}= [g_1 \ldots g_n]\). Equation (13) shows that we can map the reduced state at time step \(\hat{{{x}}}^{n}\) to each component of the state at the next time step \(\hat{x}_i^{j+1}\) for \(j = 0,\ldots , N_t- 1\).

In the offline stage, we obtain training data from the existing Galerkin projection model. By querying both features (\(\hat{{{x}}}^{j}\)) and responses (\(\hat{x}_i^{j+1}\)) at every time instance, we generate a data set \({\mathscr {T}}_{train,i} = {(\hat{{{x}}}^{j}, \hat{x}_i^{j+1})}\) for model training and cross validation.

We consider a variety of conventional statistical models, including support vector regression (SVR) with kernel functions (Gaussian, polynomials) [53], tree-based methods (boosted decision trees [20], random forest [9]) and *k*-nearest neighbors [1], for regressing the feature-response pairs. Three types of kernel functions are explored in SVR. Specifically, we refer to the SVR model using the 2nd, 3rd order polynomial kernel as SVR2, SVR3 respectively. When the Gaussian radial basis function (RBF) kernel is used, we refer to the model as SVRrbf. In addition, we investigate the vectorial kernel orthogonal greedy algorithm (VKOGA) and Sparse identification of nonlinear dynamics (SINDy) as advanced regression models. More details of VKOGA and SINDy can be found in “Special regression models” section. We compare the computational complexity of all the proposed regression models in “Special regression models’ section’. For all the candidate models, we employ cross validation to optimize the hyperparameters for model selection, aiming for a balanced bias and variance. We also explore the training and validation error as varying the number of samples in “Cross-validation and hyperparameter tuning” section, and we select a fixed sample size for performance comparison between models.

For time integration along the trajectory of the dynamical system, we investigate the appropriate time step for each regression model. We implement the 4th-order Runge–Kutta as the explicit method, as well as Newton–Raphson and fixed-point iteration both in backward Euler as the implicit methods. We report the numerical results, with respect to the Galerkin reduced-order model (ROM) and the full-order model (FOM) as a function of time in “Numerical experiments” section.

### Error analysis

Assuming the considered regression model generates bounded in reduced space, we examine the boundedness of the surrogate FOM on the time evolution of the states along the trajectory. Let \(J_t=[0, T]\) denote the time domain, \(J_\mu \subset {\mathbb {R}}^{ p }\) be a compact space of \(\varvec{\mu }\), \({{x}}: J_t \rightarrow {\mathbb {R}}^n\) denote the state variable, and \({{f}}({{x}},t;\varvec{\mu })\) be the vector field. Let \( {{f}}_r({\hat{{{y}}}},t;\varvec{\mu }) = {\varvec{V}}^T{{f}}({\varvec{V}}{\hat{{{y}}}},t;\varvec{\mu })\) represent the Galerkin reduced vector field. Let \(\hat{{{x}}}(t)\) denote an approximate reduced vector field constructed by a machine learning method and \({\hat{{{f}}_r}}(\hat{{{x}}},t;\varvec{\mu })\) be the corresponding vector field.

The error of the reduced model can be defined as \({{e}}(t) :={{\tilde{x}}}(t) -{{x}}(t)\). Let \(P = {\varvec{V}}{\varvec{V}}^*\). Let \({{e}}_o(t):=(I_n -P) {{e}}(t)\), which denotes the error component orthogonal to range (\({\varvec{V}}\)), and \({{e}}_i(t):=P {{e}}(t)\), which denotes the component of error parallel to range (\({\varvec{V}}\)). Let \({{\tilde{x}}}:= P {{x}}\) denote the direct projection of \({{x}}\). Thus, we have

However, since the system is evolutionary with time, further approximations of the projection-based reduced model result in an additional error \({{e}}_i(t)\), and we have

Although \({{e}}_i(t)\) and \({{e}}_o(t)\) are orthogonal to each other, they are not independent.

### Lemma 1

Consider the dynamical system as Eq. (1) over the interval \(t\in J\) and \(\varvec{\mu }\in J_\mu \). Suppose \({{f}}({{x}},t;\varvec{\mu })\) is a uniformly Lipschitz function of \({{x}}\) with constant *K* and a continuous function of \(t\) for all \(({{x}},t, \varvec{\mu }) \in B_b({{x}}_0) \times J_t \times J_\mu \). Suppose \({\hat{{{f}}_r}}(\hat{{{x}}},t;\varvec{\mu })\) is a uniformly Lipschitz function of \(\hat{{{x}}}\) and a continuous function of \(t\) for all \(({{x}},t;\varvec{\mu }) \in B_b(x_0) \times J_t \times J_\mu \). Suppose the data-driven approximation satisfies \(\Vert {\hat{{{f}}_r}}(\hat{{{x}}},t;\varvec{\mu }) - {\varvec{V}}^* {{f}}({\varvec{V}}\hat{{{x}}},t;\varvec{\mu }) \Vert \le C\) for all \((\hat{{{x}}}, t, \varvec{\mu }) \in B_b(\hat{{{x}}}(\times _0)) \times J_t \times J_\mu \). Then the error \({{e}}(t)=\hat{{{x}}}(t)-{{x}}(t)\) in the infinity norm for the interval \(J_t\) is bounded by

### Proof

For notation simplification, we fix \(\varvec{\mu }\) and do not explicitly denote it in vector fields. Substituting Eqs. (1), (10) into the differentiation of \({{e}}_o(t)+ {{e}}_i(t) = {\varvec{V}}\hat{{{x}}}(t) - {{x}}(t)\) yields

Left multiplying Eq. (17) by *P* and recognizing that \(P {\varvec{V}}= {\varvec{V}}\) gives

Using this equation by expanding \( \Vert {{e}}_i({{x}}+ h) \Vert \) and applying triangular inequality yields

Using \(\Vert {\varvec{V}}\Vert =\Vert P\Vert =1\) and \(\Vert {\hat{{{f}}_r}}(t, \hat{{{x}}}) - {\varvec{V}}^* {{f}}(t, {\varvec{V}}\hat{{{x}}}) \Vert \le C\), the last inequality gives

Rearranging this inequality and applying the Lipschitz conditions gives

Since \({\mathscr {O}}(h)\) can be uniformly bounded independent of \({{e}}_i(t)\), using the mean value theorem and letting \(h \rightarrow 0\) give

Rewriting the above inequality into integral form, \(\left\| {{e}}_i(t) \right\| \le \alpha (t) + K \int _0^ t\Vert {{e}}_i(\tau ) \Vert d\tau \), where \(\alpha (t): = \left\| {{e}}_i(0) \right\| + K\int _0^ t{\Vert {{e}}_o(\tau ) \Vert d \tau }+C t\), and using Gronwall’s lemma, we obtain

By definition, \({\left\| {{e}}_o\right\| _ \infty } \ge \left\| {{e}}_o(t) \right\| \) for any \(t\in J_t\). It follows that \(\alpha (t) \le \left\| {{{{e}}_i}(0)} \right\| + K t\left\| {{e}}_o\right\| _\infty +C t\). Simplifying the integral of the right-hand side of the above inequality gives

for any \(t\in J_t\). If follows that

Combining the above inequality with \(\left\| {{e}}\right\| _\infty \le \left\| {{{e}}_i} \right\| _\infty + \left\| {{{e}}_o} \right\| _\infty \), one can obtain Eq. (16). \(\square \)

### Remark

The above lemma provides a bound for \(\Vert {{e}}_i(t)\Vert \) in terms of \(\Vert {{e}}_o\Vert _\infty \) and \(\Vert {{e}}_i(0)\Vert \). We have \(\left\| {{e}}_i(0)\right\| =0\) when the initial condition of the reduced model is given by \(\hat{{{x}}}(0)={\varvec{V}}^* {{x}}_0\) for Eq. (10). In this situation, Eq. (16) becomes \({\left\| {{e}}\right\| _\infty } \le e^{KT} {\left\| {{{{e}}_o}} \right\| _\infty }+ \frac{C}{K}(e^{KT}-1) \).

## Numerical experiments

To assess the proposed non-intrusive ROM, we consider two parameterized PDEs: (i) 1D inviscid Burgers’ equation, and (ii) 2D convection–diffusion equation. We implement explicit integrator, including 4th-order Runge–Kutta solvers, and Newton–Raphson, fixed-point iteration in backward Euler as the implicit methods.

### 1D Burgers’ equation

The experiment first employs a 1D parameterized inviscid Burgers’ equation. The input parameters \(\varvec{\mu }=(a,b)\) in the space \([1.5, 2]\times [0.02, 0.025]\). In the current setup, the parameters for online test are fixed to be \(\varvec{\mu }=(1.8, 0.0232)\). In the FOM, the problem is often solved using a conservative finite-volume formulation and Backward Euler in time. The 1D domain is discretized using a grid with 501 nodes, corresponding to \(x=i\times (100/500), i=0,\ldots ,500\). The solution \(u(x,t)\) is computed in the time interval \(t\in [0,25]\) using different time step sizes considering the convergence in each time integrator.

The solution \(u({{x}}, t)\) is computed in the time interval \(t\in [0,25]\) using a uniform computational time-step size \(\varDelta t\).

#### Data collection

We investigate time step verification on choosing an appropriate \(\varDelta t\) of the time integrator for the problem. We collect the solutions under an increasing number of time steps \(N_t= [25, 50, 100, 200, 400, 800, 1600, 3200, 6400]\) using both Runge–Kutta as well as backward Euler integrator. Throughout the paper, we select the time step at \(99\%\) of the asymptotic rate of convergence. The verification results in Fig. 1 show that \(N_t= 200\) is a reasonable number of time steps to use for the 4th-order Runge–Kutta and \(N_t=800\) for backward Euler method. During the offline stage, we run four full simulations corresponding to the corner parameters of the space \([1.5, 2]\times [0.02, 0.025]\). Then, we sample the training data from Latin-hypercube. In the sampling, \(N_\text {training}\) and \(N_\text {validation}\) instances of the state, time and parameters are generated following the criterion that the minimum distances between the data points are maximized. For this study, the default size of the training set is \(N_\text {training}= 1000\) and the default size of the validation set is \(N_\text {validation}= 500\). The reduced vector field \({{f}}_r\) is computed for each input pairs \((\hat{{{x}}},t;\varvec{\mu })\). Note that both the training and validation stage only involves pure machine learning. Then in the test stage, we evaluate the proposed ROM. The parameters are fixed to be \(\mu =(1.8,0.0232)\) for testing purpose.

#### Model validation

We use SVR with kernel functions (2nd, 3rd order polynomials and radial basis function), kNN, Random Forest, Boosting, VKOGA, and SINDy as regression models to approximate reduced velocity. In particular, for each regression method, we change the model hyperparameters and plot the relative training error and validation error. The relative error (continuous-time model) is defined by

We then plot the learning curve of each regression method and compare the performance of each model on training and validation data over a varying number of training instances in “Cross-validation and hyperparameter tuning” section. By properly choosing the hyperparameters and the number of training instances, our regression models can effectively balance bias and variance.

#### Simulation of the surrogate ROM

We can now solve the problem using the surrogate model along the trajectory of the dynamical system. After applying time integration to the regression-based ROM, we compute the relative error of the proposed models as a function of time. We investigate both Newton–Raphson, fixed-point iteration in backward Euler and 4th-order Runge–Kutta in explicit methods. Let \({{x}}(t)\), \(\hat{{{x}}_{b}}(t)\) and \(\hat{{{x}}}(t)\) denote the solution of the FOM, the Galerkin ROM, and non-intrusive ROMs respectively. We define the relative error with respect to FOM \({{e}}_\textit{FOM}(t)\) and Galerkin ROM \({{e}}_\textit{ROM}(t)\) as

The corresponding averaged relative error over the entire time domain \({{e}}_\textit{FOM}\) and \({{e}}_\textit{ROM}\) can be computed as

Let \(t_{FOM}\), \(t_{ROM}\), and \(\tau \) denote the running time of FOM, Galerkin ROM, and non-intrusive ROM respectively. Define the relative running time with respect to the FOM and the Galerkin ROM:

and

The following are the simulation results from backward Euler method with \(N_t=800\). Figure 2 plots the state-space error with respect to the FOM and ROM using the backward Euler integrator. As validation results predict, SVR3 and SINDy behave better than the other models, achieving a relative ROM error below 1*e*-4 over the entire time domain, and the relative error in terms of FOM is well bounded. Figure 3 plots the Pareto frontier error as a function of the relative running time using the backward Euler integrator. For differential models, the relative time is calculated using the less expensive approach, i.e. Newton’s method. By comparison, SINDy requires much less relative time than SVR3, at a comparable level of relative error in both FOM and ROM. Table 1 summarizes (i) online running time of all methods, (ii) mean time-integrated error with respect to the FOM, and (iii) mean time-integrated error with respect to ROMs using the backward Euler integrator. For differentiable models, we compare the computational time of both Newton’s method and fixed point iteration in backward Euler. For those models that are non-differentiable, i.e. random forest, boosting and kNN, only the running time of the fixed-point iteration method is reported. All the ROM and FOM solutions are computed at the verified backward Euler time step \(\varDelta {t} = 3.12e\)-2. Note that the non-intrusive ROM, e.g. SINDy with Newton’s method can accelerate the solver by \(10.4{\times }\) relative to the FOM and \(2.5{\times }\) compared to the Galerkin ROM at a relative error of 0.0346 and 4.36*e*-5 respectively.

We examine simulation results from 4th-order Runge–Kutta method with \(N_t=200\). Figure 4 shows the state-space error with respect to the FOM and ROM using the Runge–Kutta integrator. SVR2, SVR3 and SINDy have a comparable performance, and result in a bigger ROM error relative to the backward Euler solver. We notice that the random forest model begins to diverge after \(t>10\) in the explicit scheme. This can be explained by the corresponding performance in model evaluation in “Special regression models” section. Figure 5 plots the Pareto frontier error with respect to the relative running time using the backward Euler integrator. VKOGA has the smallest relative time in both the FOM and ROM comparison. SINDy requires slightly more running time while the accuracy outperforms VKOGA. Table 2 shows the online running time of all methods, the mean time-integrated error with respect to the FOM, and the mean time-integrated error with respect to the Galerkin ROM using the Runge–Kutta solver. For a fair comparison, all the ROM and FOM solutions are computed at the verified Runge–Kutta time step as selected in Fig. 1b. The results show that SVR based models, e.g. SVR2 and SVR3, yield the smallest relative errors, however the computational cost is more expensive than the FOM. Note that the non-intrusive ROM (VKOGA) can speed up the solver by \(6.9{\times }\) relative to the FOM and \(3.3{\times }\) over the Galerkin ROM at a relative error of 0.0353 and 0.0164 respectively.

### 2D convection–diffusion

We consider a 2D parametric nonlinear heat equation. Given a state variable \(u = u(x, y, t)\), the governing equation is described as

The parameters are given by \({\varvec{\mu }}_0 = 0.01\), and \(({\varvec{\mu }}_1, {\varvec{\mu }}_2) \in [9, 10]^2\). The spatial domain is \([0,1]^2\) and Dirichlet boundary conditions are applied. The FOM uses a finite difference discretization with \(51 \times 51\) grid points. The full time domain is [0, 2] and we evaluate both the backward Euler and Runge–Kutta methods for time integration with uniform time steps. Figure 6 shows the solution profile at \(t = 2\) with input parameter \((\mu _1, \mu _2) = (9.5, 9.5)\).

#### Data collection

Similar to the 1D case, first we investigate the appropriate time step \(\varDelta t\) for solving the ODE. We collect the solutions of a sequence number of time steps \(N_t= [25, 50, 100, 200, 400, 800, 1600, 3200, 6400]\) for (i) explicit Runge–Kutta and (ii) implicit backward-Euler integrator. The verification results in Fig. 7 shows that \(N_t= 200\) is a reasonable number of time steps to use for Runge–Kutta and \(N_t=800\) for backward Euler method. During the offline stage, we run four full simulations corresponding to the corner parameters of the space \([9, 10]^2\). Then, the training data are sampled from a Latin-hypercube for better covering the parameter space. In the sampling, \(N_\text {training}\) and \(N_\text {validation}\) instances of the state, time and parameters are generated following the criterion that the minimum distances between the data points are maximized. We use the default size of the training set \(N_\text {training}= 1000\) and of the validation set \(N_\text {validation}= 500\). Then the reduced vector field \({{f}}_r\) is computed for each input pairs \((\hat{{{x}}},t;\varvec{\mu })\). In the training and validation stage, we regress the reduced vector field \({{f}}_r\) by the input \((\hat{{{x}}},t;\varvec{\mu })\); in the test stage, we evaluate the ROM. The parameters are fixed to be \((\mu _1, \mu _2) = (9.5, 9.5)\).

#### Model validation

We report the performance of SVR (2nd, 3rd poly and rbf), kNN, Random Forest, Boosting, VKOGA, and SINDy as regression models to approximate reduced velocity. In particular, as in “Model validation” section, for each regression method, we change the model hyperparameters and plot the relative training and validation error. The relative error is defined by Eq. (19). Similarly, we plot the learning curve of each regression method and compare the performance of each model on training and validation data over a varying number of training instances in “Cross-validation and hyperparameter tuning” section. We aim to balance bias and variance in each regression model, by properly choosing hyperparameters and the number of training instances.

#### Simulation of the surrogate ROM

We can now solve this 2D problem using the surrogate model along the trajectory of the dynamical system. After applying time integration to the regression-based ROM, we compute the relative error of the proposed models as a function of time. As in “Simulation of the surrogate ROM” section, we investigate both the backward Euler and Runge–Kutta integrators. The following are the simulation results from backward Euler method with \(N_t=800\). Figure 8 plots the state-space error with respect to FOM and ROM using the backward Euler integrator. VKOGA outperforms the other models, achieving a relative ROM error below 6*e*-2 over the entire time domain, and the accuracy is closest to Galerkin ROM.

Figure 9 plots the Pareto frontier error as a function of the relative running time using the backward Euler integrator. VKOGA performs best in terms of both accuracy and time efficiency, when comparing with the Galerkin ROM.

Table 3 presents the online running time of all methods, the mean time-integrated error with respect to FOM, and the mean time-integrated error with respect to the Galerkin ROM using the backward Euler integrator for the 2D convection–diffusion equation. To compare, all the ROM and FOM solutions are computed at the verified backward Euler time step. Note that the non-intrusive ROM, e.g. VKOGA with Newton’s method, can improve the solve time by three orders of magnitude over the FOM and \(111.2{\times }\) compared to the Galerkin ROM at a relative error of 0.0059 and 0.0041 respectively.

We examine the simulation results from the 4th-order Runge–Kutta method with \(N_t=200\). Figure 10 shows the state-space error with respect to FOM and ROM using the Runge–Kutta integrator. SVR2, SVR3, VKOGA, and SINDy have a comparable performance, and result in a smaller ROM error relative to in the backward Euler solver. We notice that the random forest, boosting models and kNN begin to diverge quickly in the second half of the time domain. Figure 11 plots the Pareto frontier error as a function of the relative running time using the backward Euler integrator. VKOGA and SINDy outperform the other models in terms of both computation accuracy and time cost. The relative error compared to Galerkin ROM and FOM is below 1*e*-2. Table 4 includes online running time of all methods, the mean time-integrated error with respect to the FOM, and the mean time-integrated error with respect to the Galerkin ROM using the Runge–Kutta integrator for the 2D convection–diffusion equation. In the comparison, all the ROM and FOM solutions are computed at the verified backward Euler time step. We observe that the computational efficiency of the non-intrusive ROM performs significantly better than Galerkin ROM. VKOGA using Runge–Kutta can speed up the solver \(3406.9{\times }\) that of the FOM and \(468.3{\times }\) that of the Galerkin ROM at a relative error of 0.0083 and 0.0069. SINDy using Runge–Kutta can accelerate the solver \(2524.1{\times }\) over the FOM and \(347.0{\times }\) compared to the Galerkin ROM at a relative error of 0.0077 and 0.0066 respectively.

## Conclusions

In this study, we demonstrate the effectiveness of a data-driven model reduction method for solving two types of parametric PDEs non-intrusively, in both explicit and implicit time integration schemes. The approach successfully avoids the cost and intrusiveness of querying the full-order model and reduced-order model in the simulation, by approximating low-dimensional operators using regression methods. In the offline stage, we train the regression models using state-of-the art techniques from machine learning and specific dynamical learning methods in a reduced order architecture; in the online stage, the surrogate ROM then solves problems more efficiently (orders of magnitude in speedup) using the learned reduced operators. The proposed approach speeds up the full-order model simulation by one order of magnitude in the 1D Burgers’ equation and three order of magnitude in the 2D convection–diffusion equation. Among the machine learning models we evaluate, VKOGA and SINDy distinguish themselves from the other models, delivering superior cost vs. error trade-off.

Further work involves a number of important extensions and directions that arise out of this work. First, it will be interesting to investigate the effectiveness on solving a different types of PDE or a more complex nonlinear dynamical systems, e.g. Euler’s equation. Morerover, for nonlinear Lagrangian dynamical systems, we need to develop structure-preserving approximations for all reduced Lagrangian ingredients in the model reduction. Rather than apply a Galerkin projection to obtain the ROM, one can alternatively employ a least-square Petrov–Galerkin (LSPG) [14, 15, 17], which requires a regression method predicting non-negative values. Analyzing models on a nonlinear manifold is an important area of future work, where we can go beyond the linear spaces, and mappings between the original full-order space \({\mathbb {R}}^N\) and the reduced-order subspace \({\mathbb {R}}^n\) in Galerkin projection. The growing intersection of dynamical systems and data science are driving innovations in estimating the prediction of complex systems [10]. With a deeper understanding of neural networks, it may be possible to generalize the non-intrusive approach to another level of accuracy. Furthermore, physics-informed models [45] may accelerate learning from data and add interpretability to existing approaches. This framework can also be combined with distributed systems to enable parallelism for extreme-scale high performance computing.

## Data availibility statement

Data can be generated by the examples of the PDEs and will be available on reasonable request.

## References

- 1.
Altman NS. An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat. 1992;46(3):175–85.

- 2.
Amsallem D, Cortial J, Carlberg K, Farhat C. A method for interpolating on manifolds structural dynamics reduced-order models. Int J Numer Methods Eng. 2009;80(9):1241–58.

- 3.
Aubry N, Holmes P, Lumley JL, Stone E. The dynamics of coherent structures in the wall region of a turbulent boundary layer. J Fluid Mech. 1988;192:115–73.

- 4.
Bai Z, Kaiser E, Proctor JL, Kutz JN, Brunton SL. Dynamic mode decomposition for compressive system identification. AIAA J. 2020;58(2):561–74.

- 5.
Barrault M, Maday Y, Nguyen NC, Patera AT. An ‘empirical interpolation’ method: application to efficient reduced-basis discretization of partial differential equations. C R Math. 2004;339(9):667–72.

- 6.
Berkooz G, Holmes P, Lumley JL. The proper orthogonal decomposition in the analysis of turbulent flows. Ann Rev Fluid Mech. 1993;25(1):539–75.

- 7.
Billings SA. Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. Hoboken: Wiley; 2013.

- 8.
Bongard J, Lipson H. Automated reverse engineering of nonlinear dynamical systems. Proc Nat Acad Sci. 2007;104(24):9943–8.

- 9.
Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

- 10.
Brunton SL, Kutz JN. Data-driven science and engineering: machine learning, dynamical systems, and control. Cambridge: Cambridge University Press; 2019.

- 11.
Brunton SL, Noack BR. Closed-loop turbulence control: progress and challenges. Appl Mech Rev. 2015;67:050801-1-050801–48.

- 12.
Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc Nat Acad Sci. 2016;113(15):3932–7.

- 13.
Brunton SL, Brunton BW, Proctor JL, Kaiser E, Kutz JN. Chaos as an intermittently forced linear system. Nat Commun. 2017;8(19):1–9.

- 14.
Carlberg K, Farhat C, Bou-Mosleh C. Efficient nonlinear model reduction via a least-squares Petrov-Galerkin projection and compressive tensor approximations. Int J Numer Methods Eng. 2011;86(2):155–81.

- 15.
Carlberg K, Farhat C, Cortial J, Amsallem D. The gnat method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. J Comput Phys. 2013;242:623–47.

- 16.
Carlberg K, Tuminaro R, Boggs P. Preserving lagrangian structure in nonlinear model reduction with application to structural dynamics. SIAM J Sci Comput. 2015;37(2):B153–84.

- 17.
Carlberg K, Barone M, Antil H. Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction. J Comput Phys. 2017;330:693–734.

- 18.
Carlberg KT, Jameson A, Kochenderfer MJ, Morton J, Peng L, Witherden FD. Recovering missing cfd data for high-order discretizations using deep neural networks and dynamics learning. J Comput Phys. 2019;395:105–24.

- 19.
Chaturantabut S, Sorensen DC. Nonlinear model reduction via discrete empirical interpolation. SIAM J Sci Comput. 2010;32(5):2737–64.

- 20.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. Berlin: Springer; 2009.

- 21.
Hernandez Q, Badias A, Gonzalez D, Chinesta F, Cueto E. Deep learning of thermodynamics-aware reduced-order models from data. Comput Methods Appl Mech Eng. 2021;379:113763.

- 22.
Hesthaven J, Ubbiali S. Non-intrusive reduced order modeling of nonlinear problems using neural networks. J Comput Phys. 2018;363:55–78.

- 23.
Holmes PJ, Lumley JL, Berkooz G. Turbulence, coherent structures, dynamical systems and symmetry. Cambridge monographs in mechanics. Cambridge: Cambridge University Press; 1996.

- 24.
Juang JN. Applied system identification. Upper Saddle River: Prentice Hall PTR; 1994.

- 25.
Juang JN, Pappa RS. An eigensystem realization algorithm for modal parameter identification and model reduction. J Guid Control Dyn. 1985;8(5):620–7.

- 26.
Kaiser E, Noack BR, Cordier L, Spohn A, Segond M, Abel M, Daviller G, Osth J, Krajnovic S, Niven RK. Cluster-based reduced-order modelling of a mixing layer. J Fluid Mech. 2014;754:365–414.

- 27.
Kani JN, Elsheikh AH. Dr-rnn: a deep residual recurrent neural network for model reduction. 2017. arXiv:1709.00939.

- 28.
Kukreja SL, Brenner MJ. Nonlinear system identification of aeroelastic systems: a structure-detection approach. In: Identification and control. Springer;2007:117–45.

- 29.
Kutz JN. Deep learning in fluid dynamics. J Fluid Mech. 2017;814:1–4.

- 30.
Kutz JN, Brunton SL, Brunton BW, Proctor JL. Dynamic mode decomposition: data-driven modeling of complex systems. Philadelphia: SIAM; 2016.

- 31.
Lee M, Malaya N, Moser RD. Petascale direct numerical simulation of turbulent channel flow on up to 786k cores. In: Proceedings of SC13: International conference for high performance computing, networking, storage and analysis, ACM, 2013. p. 61.

- 32.
Ling J, Kurzawski A, Templeton J. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J Fluid Mech. 2016;807:155–66.

- 33.
Ljung L. System identification: theory for the user. Hoboken: Prentice Hall; 1999.

- 34.
Loiseau JC, Brunton SL. Constrained sparse Galerkin regression. J Fluid Mech. 2018;838:42–67.

- 35.
Loiseau JC, Noack BR, Brunton SL. Sparse reduced-order modeling: sensor-based dynamics to full-state estimation. J Fluid Mech. 2018;844:459–90.

- 36.
Mezić I. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn. 2005;41(1–3):309–25.

- 37.
Milano M, Koumoutsakos P. Neural network modeling for near wall turbulent flow. J Comput Phys. 2002;182(1):1–26.

- 38.
Nelles O. Nonlinear system identification: from classical approaches to neural networks and fuzzy models. Berlin: Springer Science & Business Media; 2013.

- 39.
Noack BR, Afanasiev K, Morzynski M, Tadmor G, Thiele F. A hierarchy of low-dimensional models for the transient and post-transient cylinder wake. J Fluid Mech. 2003;497:335–63.

- 40.
Pan S, Duraisamy K. Long-time predictive modeling of nonlinear dynamical systems using neural networks. Complexity. 2018. https://doi.org/10.1155/2018/4801012.

- 41.
Peherstorfer B, Willcox K. Data-driven operator inference for nonintrusive projection-based model reduction. Comput Methods Appl Mech Eng. 2016;306:196–215.

- 42.
Qin T, Wu K, Xiu D. Data driven governing equations approximation using deep neural networks. J Comput Phys. 2019;395:620–35.

- 43.
Raissi M, Karniadakis GE. Machine learning of linear differential equations using gaussian processes. 2017. arXiv preprint arXiv:1701.02440.

- 44.
Raissi M, Karniadakis GE. Hidden physics models: machine learning of nonlinear partial differential equations. J Comput Phys. 2018;357:125–41.

- 45.
Raissi M, Perdikaris P, Karniadakis GE. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys. 2019;378:686–707.

- 46.
Rowley CW, Mezić I, Bagheri S, Schlatter P, Henningson D. Spectral analysis of nonlinear flows. J Fluid Mech. 2009;645:115–27.

- 47.
Rudy SH, Brunton SL, Proctor JL, Kutz JN. Data-driven discovery of partial differential equations. Sci Adv. 2017;3:e1602614.

- 48.
Schaeffer H. Learning partial differential equations via data discovery and sparse optimization. Proc R Soc A Math Phys Eng Sci. 2017;473(2197):20160446.

- 49.
Schmid PJ. Dynamic mode decomposition of numerical and experimental data. J Fluid Mech. 2010;656:5–28.

- 50.
Schmidt M, Lipson H. Distilling free-form natural laws from experimental data. Science. 2009;324(5923):81–5.

- 51.
Semeraro O, Lusseyran F, Pastur L, Jordan P. Qualitative dynamics of wavepackets in turbulent jets. 2016. arXiv preprint arXiv:1608.06750.

- 52.
Singh AP, Medida S, Duraisamy K. Machine-learning-augmented predictive modeling of turbulent separated flows over airfoils. AIAA J. 2017;55:2215–27.

- 53.
Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput. 2004;14(3):199–222.

- 54.
Swischuk R, Mainini L, Peherstorfer B, Willcox K. Projection-based model reduction: formulations for physics-based machine learning. Comput Fluids. 2018;179:704–17.

- 55.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc: Ser B (Methodol). 1996;58(1):267–88.

- 56.
Tu JH, Rowley CW, Luchtenburg DM, Brunton SL, Kutz JN. On dynamic mode decomposition: theory and applications. J Comput Dyn. 2014;1(2):391–421.

- 57.
Vaerenbergh SV, et al. Kernel methods for nonlinear identification, equalization and separation of signals. Universidad de Cantabria; 2010.

- 58.
Vlachas PR, Byeon W, Wan ZY, Sapsis TP, Koumoutsakos P. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc R Soc A. 2018;474(2213):20170844.

- 59.
Wan ZY, Sapsis TP. Reduced-space Gaussian process regression for data-driven probabilistic forecast of chaotic dynamical systems. Physica D: Nonlinear Phenomena. 2017;345:40–55.

- 60.
Wan ZY, Vlachas P, Koumoutsakos P, Sapsis T. Data-assisted reduced-order modeling of extreme events in complex dynamical systems. PLoS ONE. 2018;13(5):e0197704.

- 61.
Wang JX, Wu JL, Xiao H. Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on dns data. Phys Rev Fluids. 2017;2(3):034603.

- 62.
Wang M, Li H, Chen X, Chen Y. Deep learning-based model reduction for distributed parameter systems. IEEE Trans Syst Man Cybern: Syst. 2016;46(12):1664–74.

- 63.
Wang Q, Hesthaven JS, Ray D. Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem. J Comput Phys. 2018. http://infoscience.epfl.ch/record/255708.

- 64.
Wirtz D, Haasdonk B. A vectorial kernel orthogonal greedy algorithm. Dolomites Res Notes Approx. 2013;6(Special-Issue):83–100.

- 65.
Wirtz D, Karajan N, Haasdonk B. Surrogate modeling of multiscale models using kernel methods. Int J Numer Methods Eng. 2015;101(1):1–28.

- 66.
Xiao D, Fang F, Pain C, Hu G. Non-intrusive reduced-order modelling of the Navier-stokes equations based on rbf interpolation. Int J Numer Methods Fluids. 2015;79(11):580–95.

- 67.
Xie X, Mohebujjaman M, Rebholz L, Iliescu T. Data-driven filtered reduced order modeling of fluid flows. SIAM J Sci Comput. 2018;40(3):B834–57.

- 68.
Xie X, Zhang G, Webster CG. Non-intrusive inference reduced order model for fluids using deep multistep neural network. Mathematics. 2019;7(8):757.

- 69.
Zhang ZJ, Duraisamy K. Machine learning methods for data-driven turbulence modeling. In: 22nd AIAA computational fluid dynamics conference, AIAA, 2015. p. 2460.

## Acknowledgements

We would like to thank Kevin Carlberg, Steve Brunton and Youngsoo Choi for valuable discussions. ZB acknowledges support by the U.S. Department of Energy, Office of Science, under Award Number DE-AC02-05CH11231. ZB and LP acknowledge support for preliminary work from Sandia National Laboratories. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

## Author information

### Affiliations

### Corresponding author

## Ethics declarations

### Competing interests

The authors declare that they have no conflict of interest.

## Additional information

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Appendix A: Regression models and tuning

### Appendix A: Regression models and tuning

### Special regression models

In this section, we detail two non-transitional machine learning models, VKOGA and SINDy, in addition to the SVR, kNN, random forest and boosting.

#### Vectorial kernel orthogonal greedy algorithm (VKOGA)

Developed from kernel-based methods, VKOGA makes the assumption that a common subspace can be used for every component of vector-valued response. With such, it significantly reduces the training and evaluation time when the number of kernels in the resulting regression model is large. VKOGA constructs the regression model

where \(K:{\mathbb {R}}^{ N_x }\times {\mathbb {R}}^{ N_x } \rightarrow {\mathbb {R}}^{ }\) denotes a kernel function, for each individual regression model \(h_i\), \(i=1,\ldots ,N_x\), and \(\alpha _i \in {\mathbb {R}}^{ {{x}}_x }, i=1,\ldots ,N_x\) are vector-valued basis functions. VKOGA uses a greedy algorithm to compute kernel functions \(K(\bar{{{x}}}, \cdot )\). The greedy algorithm determines kernel centers from \(\varOmega = {\bar{{{x}}}^1, \ldots , \bar{{{x}}}^{n_{train}}}\). Let the initial state \(\varOmega _0=\emptyset \), and then at stage *m*, choose

where \(\phi _x^{m-1}\) denotes the orthogonal remainder of \(K({{x}}, \cdot )\) with respect to the reproducing kernel Hilbert space spanned by \({K({{x}}_1, \cdot )}, \ldots , K({{x}}_{m-1}, \cdot )\). Then the kernel centers are updated \(\varOmega _m = \varOmega _{m-1} \cup {{{x}}_m}\). At the next step, the basis function \(\varvec{\alpha }_i\), \(i=1, \cdots , n_{\text {VKOGA}}\) are determined by a least-squares approximation to fit the training data. In the numerical experiments, we apply the Gaussian RBF kernel \(K(\bar{{{x}}}^i, \bar{{{x}}}^j) = exp(-\gamma \Vert \bar{{{x}}}^i, \bar{{{x}}}^j\Vert )\), where the hyperparameter is the number of kernel functions used.

#### Sparse identification of nonlinear dynamics (SINDy)

The sparse identification of nonlinear dynamics (SINDy) method [12] identifies nonlinear dynamical system from measurement data, seeking to approximate the vector field \({{f}}\) by a generalized linear model. This architecture avoids overfitting by identifying a parsimonious model, which is more explainable or predictable than black-box machine learning. Although SINDy was originally devised for continuous-time dynamics, it can be extended to discrete-time systems. In particular, SINDy yields a model

where \(p_i:{\mathbb {R}}^{ N_x }\rightarrow {\mathbb {R}}^{ }\), \(i = 1, \ldots , n_{\text {SINDy}}\) denotes a “library” of candidate basis functions. The relevant terms that are active in the dynamics are solved using sparse regression that penalizes the number of functions in the library \({\varvec{P}}({{x}})\):

where \(\Vert \cdot \Vert _1\) promotes sparsity in the coefficient vector \({\varvec{\theta }}_i\) and the parameter \(\alpha \) balances low model complexity with accuracy. The least absolute shrinkage and selection operator (LASSO) [55] or the sequential threshold least-squares (STLS) [12] can be used to determine a sparse set of coefficients (\({\varvec{\theta }}_1, \ldots , \theta _{n_{\text {SINDy}}}\)). In the numerical experiments, we consider linear and quadratic functions in the library, i.e., \(p_i \in \{x_1, \ldots , x_{N_x}, x_1x_1, x_1x_2,\ldots , x_{N_x}x_{N_x}\}\). Hyperparameters in this approach consist of the prescribed basis functions.

Table 5 summarizes the computational complexities of all the considered regression models. Note that the hyperparameter varies in different models, i.e. \(N_\text {trees}\) denotes the number of decision tress in random forest, \(N_\text {learners}\) denotes the number of weak learners in the boosting method, \(N_\text {functions}\) is the numner of kernel functions in VKOGA, and \(N_\text {bases}\) represents the number of bases in SINDy with 2nd order polynomials.

### Cross-validation and hyperparameter tuning

For each regression model proposed in “Surrogate ROM” section, we use cross-validation to prevent overfitting and select hyperparameters.

We report the hyperparameter selection of each model for the 2D convection–diffusion equation in Fig. 12. In Fig. 12a, it shows that with the sensitivity margin \(\epsilon \le 0.1\), both the training error and validation errors are very small, and thus we select the sensitivity margin \(\epsilon =0.1\) for SVR2. Similarly, \(\epsilon =10^{-3}\) is selected for SVRrbf. Figure 12d shows that within 40 trees, the validation error of the random forest model is still relatively large. Moreover, there is a big gap between the training error and test error that implies the random forest model may suffer from overfitting for the given data. We select \(N_\text {trees}=15\) in this study. As shown in Fig. 12e, the validation error in Boosting approaches the minimum at \(N_\text {learners} = 40\), and thus we select 40 weak learners. Figure 12f shows that by choosing the number of neighbors \(K \approx 4\), the validation error is minimized. Thus, \(K = 4\) is a good hyperparameter to balance bias and variance. In Fig. 12g we observe that the more kernel functions we use, the smaller value of training and validation error we obtain. In this study, we select 500 kernel functions in VKOGA.

We examine the performance of the models as an increasing size of the training sample in Fig. 13. Figure 13a shows that with 200 training instances, the SVR2 model has very small validation error, while SVR3 model requires a bigger training set. For SVRrbf, the more training instances we have, the smaller validation error we obtain. This is because the resolution of the radial basis function is determined by the density of input parameter space. Figure 13d shows that the more data are input, the less the relative error. Intuitively, more data give higher density in the input parameters space; and thus each leaf of a decision tree provides finer coverage for test data. Similarly, for the kNN model as in Fig. 13f, close neighbors give better representation of test data. As a result, more data can effectively reduce the variance and solve the overfitting issue. Figure 13e shows that with approximately 1000 training samples, the training error and validation error begin to approach 2*e*-3. This implies that a relative large number of data are required in order to constrain the variance of the boosting method. Figure 13g shows that with 1000 training instances, the training and validation errors begin to converge for VKOGA, while SINDy (2nd order polynomials) only requires a small number of training instances to get a very accurate prediction, as seen in Fig. 13h. Similarly, we select the following hyperparameters: \(\epsilon =10^{-4}\) for SVR2, \(\epsilon =10^{-5}\) for SVR3, \(\epsilon =10^{-3}\) for SVRrbf, \(N_\text {trees} = 15\) for random forest, \(N_\text {learners} = 40\) for boosting, \(K=6\) for *k*-nearest neighbours, and 500 kernel functions for VKOGA in the 1D Burgers’ implementation.

Figure 14 summarizes the overall performance of the above models for the 1D Burgers’ and 2D convection–diffusion equation. All the proposed models are validated using the selected hyperparameters. We notice that SVR with the kernel of 3rd order polynomials outperforms the other models for the 1D Burgers’ equation. This can be related to the discretization structure of quadratic nonlinearties in the PDE solver. VKOGA has the a better reperformance than RF, Boosting, kNN, with a relative error less than 1*e*-4. SINDy reaches a relative error of 3*e*-8. For the 2D convection–diffusion problem, SVR models again perform better than RF, Boosting and kNN, and SVR3 outperforms the other two kernel functions. VKOGA has the best accuracy with relative error \(\approx 1e\)-5, while SINDy reaches a relative error of 1*e*-3. Note that to be consistent, we select fixed training and validation sample sizes for all the considered regression models. However, as illustrated in Fig. 13h, SINDy can approach the same level of accuracy with a small number of training and validation samples.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Bai, Z., Peng, L. Non-intrusive nonlinear model reduction via machine learning approximations to low-dimensional operators.
*Adv. Model. and Simul. in Eng. Sci.* **8, **28 (2021). https://doi.org/10.1186/s40323-021-00213-5

Received:

Accepted:

Published:

DOI: https://doi.org/10.1186/s40323-021-00213-5

### Keywords

- Computational mechanics
- Model reduction
- Machine learning
- Low-dimensional operators
- Dynamical systems