POD-Galerkin reduced order models and physics-informed neural networks for solving inverse problems for the Navier–Stokes equations

We present a Reduced Order Model (ROM) which exploits recent developments in Physics Informed Neural Networks (PINNs) for solving inverse problems for the Navier–Stokes equations (NSE). In the proposed approach, the presence of simulated data for the fluid dynamics fields is assumed. A POD-Galerkin ROM is then constructed by applying POD on the snapshots matrices of the fluid fields and performing a Galerkin projection of the NSE (or the modified equations in case of turbulence modeling) onto the POD reduced basis. A POD-Galerkin PINN ROM is then derived by introducing deep neural networks which approximate the reduced outputs with the input being time and/or parameters of the model. The neural networks incorporate the physical equations (the POD-Galerkin reduced equations) into their structure as part of the loss function. Using this approach, the reduced model is able to approximate unknown parameters such as physical constants or the boundary conditions. A demonstration of the applicability of the proposed ROM is illustrated by three cases which are the steady flow around a backward step, the flow around a circular cylinder and the unsteady turbulent flow around a surface mounted cubic obstacle.


Introduction and literature overview
In recent decades, research into numerical methods for solving systems of Partial Differential Equations (PDEs) has been growing rapidly. Popular methods include the finite difference (FDM), the finite element (FEM), the finite volume (FVM), and the spectral element method (SEM). However, running computational simulations using those numerical methods can be very expensive, especially in high dimensions. The situation becomes worse when simulations have to be run several times with several different input configurations (as in repetitive computational environment). These common settings can be observed in various fields such as Uncertainty Quantification (UQ), sensitivity analysis, real-time control problems, optimization, prediction and parameter estimation/inference. In such circumstances, running simulations using the classical numerical methods for each different input value could be deemed prohibitive. Therefore, numerical techniques which could bring a reduction in the computational cost are needed. Reduced Order Methods (ROMs) represent a suitable tool for achieving the goal of having computational speed-up and providing accurate solutions to the problems of interest. These methods have been applied to a variety of mathematical problems, for greater details on ROMs we refer the reader to [1][2][3][4][5]. In this article we focus on ROMs for fluid dynamics problems in the context of the reduction of the Navier-Stokes Equations (NSE).
In reduced order modeling, projection-based ROMs [6,7] represent a popular technique for the construction of surrogate reduced models. These ROMs have been applied in several fields such as civil engineering, aerospace engineering and nuclear engineering. The reduction in projection-based ROMs is achieved by seeking the reduced solution in a subspace of much smaller dimension $N_r$, with $N_r \ll N_h$, where $N_h$ is the dimension of the original space constructed by the Full Order Model (FOM). The dimension of the FOM (i.e. $N_h$) represents the number of unknowns or degrees of freedom in the discretized problem. In a projection-based ROM, there are two main ingredients: (i) low dimensional spaces called the reduced spaces, which are often generated using a set of snapshots (FOM solutions obtained for different values of time and/or parameter), and (ii) a Galerkin or a Petrov-Galerkin projection for the construction of a low dimensional $N_r \times N_r$ problem whose solution is the ROM solution. In the context of Parameterized PDEs (PPDEs), projection-based ROMs have been exploited for achieving a solution-space reduction by relying on greedy algorithms [8,9] or Proper Orthogonal Decomposition (POD) [10][11][12][13][14] to generate the reduced space. The application of the POD method together with a Galerkin projection technique results in the so-called POD-Galerkin ROM. This type of ROM has been used extensively for the reduction of PPDEs; for more details on POD-Galerkin ROMs we refer the reader to [11][12][13][15][16][17].
There are several challenges for the construction of efficient POD-Galerkin ROMs for the Navier-Stokes equations. The treatment of the turbulence phenomenon at both the FOM and the ROM levels is one of them. In this work, turbulence is tackled at the full order level through modeling strategies. In other words, turbulent flows are not solved using the Direct Numerical Simulations (DNS) approach because of the enormous computational resources needed to simulate these flows for the problems of interest. In particular, turbulence modeling is done with the help of the Reynolds Averaging Navier-Stokes (RANS) [18] equations and the Large Eddy Simulations (LES) [19,20] approaches. The RANS approach is based on solving the NSE for the time-averaged part of the fluid fields, where it basically assumes that time fluctuations are of no significant interest. On the other hand, the LES approach is based on filtering the Navier-Stokes equations to some scale, then the large scales are simulated while the small scales are modeled.
Beside the issue of turbulence, the treatment at the reduced order level of the nonlinear convective term in the NSE is important and might affect the efficiency of the ROM. In several contributions, the ROM formulation approximates the nonlinear term using a third order tensor [21][22][23][24], this tensor has a dimension of N u × N u × N u , where N u is the number of reduced velocity unknowns. However, this approach of dealing with the nonlinear term could raise the computational burden when the number of reduced unknowns increases. Hyper-reduction techniques could be used for the approximation of the nonlinear term such as the EIM or DEIM methods [25,26]. In order to implement EIM or DEIM, an expensive pre-processing is required to obtain a version of parameterized operators [27]. The Gappy-POD approach also can be employed [28].
Reduced order models have also been constructed using data-driven techniques as in [29][30][31][32][33][34][35]. In these ROMs, the identification of the reduced solutions is done using data-driven approaches such as regression-based methods, interpolation techniques or Neural Networks (NNs). In addition, hybrid reduced order models which merge projection-based ROMs and data-driven ROMs have been proposed [24][36][37][38][39][40][41]. The latter ROMs include the use of calibration methods, the introduction of correction terms which can be approximated by the snapshots data, and the employment of data-driven techniques for the approximation of only the turbulent/eddy viscosity in the case of turbulent flows. Deep learning approaches have also been used in ROMs in order to perform nonlinear dimensionality reduction [42][43][44].
In recent years, several contributions have aimed at using machine learning techniques for solving PDEs arising from physical problems. This field of scientific computing is often termed as physics-based machine learning. We give a brief overview of related works in this field. Early work in [45] presents neural minimization algorithms for solving differential equations. The work in [46] presents an approach for solving ODEs and PDEs using feedforward neural networks which is based on approximating the solution function by a trial function that has two parts. The first part which has no tunable parameters satisfies the initial/boundary conditions, while the second part contains all the adjustable parameters which are determined by training the feedforward neural network. The construction of the second term is made in a way that guarantees no contribution to the initial/boundary conditions.
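The trial-function construction of [46] described above can be made concrete with a short worked form. The symbols $\psi_t$, $A$, $F$, $N$ and $g$ below are illustrative notation assumed here, not taken from the present text; the example is a sketch for a first-order initial value problem $y'(x) = g(x, y)$, $y(0) = y_0$.

```latex
% General trial solution: a fixed part A satisfying the initial/boundary
% conditions plus a part F built from the network output N(x; p).
\psi_t(x; p) = A(x) + F\bigl(x, N(x; p)\bigr)

% First-order IVP: the factor x removes any contribution of the network at x = 0.
\psi_t(x; p) = y_0 + x\, N(x; p),
\qquad \psi_t(0; p) = y_0 \quad \text{for every choice of } p

% The adjustable parameters p are found by minimizing the equation residual
% at collocation points x_i.
\min_{p} \; \sum_i \Bigl( \psi_t'(x_i; p) - g\bigl(x_i, \psi_t(x_i; p)\bigr) \Bigr)^2
```

Because the initial condition holds by construction, only the differential equation residual enters the minimization, in contrast to the PINN loss, where data and equation residuals are penalized jointly.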
Physics-Informed Neural Networks (PINNs) have been proposed in [47] for solving general nonlinear PDEs as well as inverse problems which involve PDEs. The approach in [47] consists of approximating the solution function of the general PDEs by deep neural networks, the trainable parameters of these NNs are then learned by minimizing a loss function that takes into consideration initial/boundary data and at the same time penalizes the departure from the equations which model the physical problem. The PINNs are based on two different approaches, namely continuous time and discrete time models. The continuous model PINNs allow to infer the solution of the PDE across all time and space. On the other hand, PINNs with the discrete time model approach employ implicit Runge-Kutta time stepping schemes with unlimited number of stages for the prediction of the solution at large time steps without compromising the accuracy of the approximation.
The PINNs presented in [47] were extended to coupled multi-physics problems in [48], where the latter work presents another application of PINNs for solving inverse problems for a Fluid Structure Interaction (FSI) problem. In [49], the authors present a Deep Galerkin Method (DGM) for the approximation of high-dimensional PDEs with a deep neural network, where they solve high-dimensional free boundary PDEs in 200 dimensions. PINNs for solving the Reynolds-averaged Navier-Stokes equations for incompressible turbulent flows are proposed in [50].
A recent work [51] proposes a reduced basis method based on the use of PINNs for solving PPDEs. This work shows that training the PINNs by only minimizing the loss function that corresponds to the reduced equations does not give as accurate results as the ones obtained by the projection of FOM solution onto the reduced space. The authors indicate that for complex nonlinear problems, the PINNs trained only on the reduced equations are not accurate in approximating the original high fidelity solution. This is justified by the fact that the reduced equations do not take into account the impact of the truncated modes on the resolved ones. On the other hand, the authors demonstrate that the PINNs trained on both the output data labels and the reduced equations are more accurate.
The work in this paper aims at employing reduced order modeling coupled with PINNs for solving inverse problems. By solving an inverse problem, we attempt to infer unknown inputs or parameters from a given set of observations of the output (output data). For a review on inverse problems, we refer the reader to [52,53]. Research into inverse problems in a Bayesian setting has been conducted extensively, for example see [54][55][56][57][58]. Reduced order methods and dimension reduction techniques have been used previously for the estimation of unknown parameters in inverse problems. The work [59] presents a Bayesian approach for solving nonlinear inverse problems with the help of a Galerkin projection. In [60], stochastic reduced order models were proposed for solving inverse problems. Active subspaces were utilized for the reduction of the parameter space in a UQ problem for turbulent combustion simulations [61]. In [62], a hybrid data-driven/projection-based reduced order model is proposed for the Bayesian solution of inverse problems. The work in [63] proposes a nonlinear reduced order model for large-scale inverse problems in a Bayesian inference setting. Another approach [64] combines the nonlinear Landweber method with adaptive online reduced basis updates for solving the inverse problem related to the construction of the conductivity in the steady heat equation. The authors in [65] present a reduction approach of a parameterized forward model, which is utilized for obtaining a surrogate model in a Bayesian inverse problem setting, the inversion is done during the online stage by using the surrogate model constructed via the projection of the forward model onto the reduced spaces.
Here, we present a model for solving inverse problems in a reduced order setting. The approach is based on integrating the structure of the POD-Galerkin ROMs into physics-informed neural networks (PINNs). In particular, we propose to incorporate the POD-Galerkin reduced order equations into the loss function of the PINNs. Consequently, the task of inferring any parameter or physical unknown which is present in the FOM equations becomes feasible, thanks to the possibility of introducing additional parameters in the neural networks and making them trainable at low computational cost. The unknown parameters could be physical constants such as the physical viscosity, boundary/initial conditions, or the velocity at the inlet in inlet/outlet fluid problems. The approach developed in this work is termed POD-Galerkin PINN ROM and is based on assimilating reduced simulated data into the physical model represented by the POD-Galerkin differential algebraic system. The reduced simulated data is obtained from the Galerkin projection of the available FOM data onto the POD reduced spaces.
This new methodology introduces a significant reduction of the computational cost associated with solving inverse problems for the NSE. In fact, the use of PINNs directly at the full order level for inferring unknown parameters in the mathematical fluid problem could be of significant computational cost. This is due to the number of degrees of freedom in the available full order data; in turbulent 3D problems this number is of the order of $10^6$ or higher. On the other hand, the proposed approach deals with the inference problem by introducing two levels of approximation. The first level is represented by the dimensionality reduction performed by the POD and the Galerkin projection, which results in a POD-Galerkin ROM, while in the second approximation level neural networks are utilized for the approximation of the reduced fluid variables. By doing so, one may leverage the power of PINNs in inferring unknown parameters by solving an optimization problem (which has a reduced number of unknowns and hence a low computational cost) in which the goal is to minimize the error committed in approximating the (reduced) data and the error caused by the violation of the physics (the reduced POD-Galerkin equations). It is worth mentioning that after the training of the PINNs in the offline stage, the POD-Galerkin PINN ROM will still be able to do fast online forward computations without the need to re-train the neural networks.
This article is organized as follows: "The problem setup: parameterized Navier-Stokes equations" section introduces the full order model and addresses the incompressible Navier-Stokes equations. In "The reduced order model (ROM)" section, the reduced order model structure and methodology is presented. Firstly, the proper orthogonal decomposition method is recalled, then we present the non-intrusive reduced order model developed in this work to treat inverse problems in a reduced setting in the context of the NSE. "Numerical results" section gives three numerical examples that illustrate the results of the parameter identification using the reduction approach proposed in this work. The first example is the steady case of the flow past a backward step in a laminar setting, while the second one is the turbulent case of the flow around a circular cylinder and the last example deals with a more complex 3D turbulent case which is the flow around a surface mounted cubic box.

The problem setup: parameterized Navier-Stokes equations
The NSE are ubiquitous in science and engineering, where they describe the physics of many phenomena such as the air flow around an airfoil, the flow in boat wakes and the motion of bluff bodies inserted in fluid flows. In this work, the focus is on the parameterized unsteady NSE. The mathematical formulation of the problem reads as follows: given the fluid spatial domain $\Omega \subset \mathbb{R}^d$, with $d = 2$ or $3$, and the time window $[0, T]$ under consideration, find the vectorial velocity field $u : \Omega \times [0, T] \to \mathbb{R}^d$ and the scalar pressure field $p : \Omega \times [0, T] \to \mathbb{R}$ such that
$$
\begin{cases}
\dfrac{\partial u}{\partial t} + \nabla \cdot (u \otimes u) - \nabla \cdot \left[ \nu \left( \nabla u + (\nabla u)^T \right) \right] = -\nabla p & \text{in } \Omega \times [0, T], \\
\nabla \cdot u = 0 & \text{in } \Omega \times [0, T], \\
u(t, x) = f(x, \mu) & \text{on } \Gamma_{inlet} \times [0, T], \\
u(t, x) = 0 & \text{on } \Gamma_0 \times [0, T], \\
\left( \nu \nabla u - p I \right) n = 0 & \text{on } \Gamma_{outlet} \times [0, T], \\
u(0, x) = R(x) & \text{in } (\Omega, 0),
\end{cases}
\tag{1}
$$
where $t$ is the time, $x$ is the spatial variable vector and $\Gamma = \Gamma_{inlet} \cup \Gamma_0 \cup \Gamma_{outlet}$ is the boundary of the fluid domain $\Omega$. The three parts that form the boundary are called $\Gamma_{inlet}$, $\Gamma_{outlet}$ and $\Gamma_0$; they correspond to the inlet boundary, the outlet boundary and the physical walls, respectively. The fluid kinematic viscosity is denoted by $\nu$ and is constant across the spatial domain. The function $f$ includes the boundary conditions for the non-homogeneous boundary. The initial velocity field is given by the function $R(x)$. The normal unit vector is denoted by $n$. We remark that the velocity and the pressure fields depend on time, space and the parameter $\mu \in P \subset \mathbb{R}^q$, where $P$ is a $q$-dimensional parameter space; these dependencies are dropped to keep the notation concise. The governing equations of (1) are discretized using the FVM [66]. In this work, the numerical solver used for solving the NSE is the finite volume C++ library OpenFOAM® (OF) [67]. For more details on the finite volume discretization and the techniques used by OpenFOAM, we refer the reader to [68].
The fluid dynamics problems which this work aims at tackling include turbulent problems or problems with moderate to high Reynolds number $Re = U_\infty L / \nu$, where $L$ and $U_\infty$ are the characteristic length and velocity of the particular fluid problem, respectively. Flows with low values of the Reynolds number are called laminar flows, in which the fluid moves smoothly or in regular paths; laminar flows are also characterized by high momentum diffusion and low convection. In contrast to laminar flows, turbulent flows are chaotic and sudden changes in the velocity and the pressure fields are more common. Turbulent flows can be frequently observed in real life applications; examples include external flows over airplanes or ships and oceanic and atmospheric currents. Therefore, it is important to mention how turbulence is treated at both the FOM and ROM levels.
At the FOM level, turbulence is not resolved directly using the so-called DNS approach; instead it is modeled using modeling strategies, namely the Reynolds Averaged Navier-Stokes equations (RANS) and the Large Eddy Simulation (LES). In both cases, one resorts to closure models which introduce an additional viscosity term known as the eddy/turbulent viscosity (denoted by $\nu_t$), which has the same unit as the physical kinematic viscosity $\nu$ [69]. The estimation of the eddy viscosity requires the use of the so-called closure turbulence models. These models approximate $\nu_t$ as a function of other turbulence variables such as the turbulent kinetic energy $k$, and they solve one or more transport-diffusion PDEs for the additional turbulence variables. Examples of such closure models under the RANS approach include the one equation Spalart-Allmaras (S-A) turbulence model [70] and the two equations $k-\epsilon$ [71] and SST $k-\omega$ turbulence models [72], where $\epsilon$ and $\omega$ stand for the turbulent dissipation and the specific turbulent dissipation rate, respectively. As for the LES turbulence models, the Smagorinsky model [73] is a well known LES model; other models are the dynamic eddy viscosity model proposed in [74] and the one equation model named "dynamicKEqn" [75], which has been utilized in this work. Closure turbulence models are also often termed Eddy Viscosity Models (EVMs). For a comprehensive review on the issue of turbulence modeling, we refer the reader to [18,19].
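We report here as an example the modified equations after employing the RANS approach complemented by the $k-\omega$ turbulence model, written for the time-averaged fields, which are still denoted by $u$ and $p$:
$$
\begin{cases}
\dfrac{\partial u}{\partial t} + \nabla \cdot (u \otimes u) - \nabla \cdot \left[ (\nu + \nu_t) \left( \nabla u + (\nabla u)^T \right) \right] = -\nabla p & \text{in } \Omega \times [0, T], \\
\nabla \cdot u = 0 & \text{in } \Omega \times [0, T], \\
\nu_t = F(k, \omega), \\
\text{Transport-Diffusion equation for } k, \\
\text{Transport-Diffusion equation for } \omega,
\end{cases}
\tag{2}
$$
where $F$ is an algebraic function that relates $\nu_t$ with the turbulence variables $k$ and $\omega$. LES closure models result in a similar modified momentum equation for the NSE. We remark that in any of the turbulence modeling strategies mentioned above, there exists an additional vector field term in the momentum equation, namely $\nabla \cdot \left[ \nu_t \left( \nabla u + (\nabla u)^T \right) \right]$. The latter term will be referred to as the turbulent term in this work.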

The reduced order model (ROM)
This section presents the reduced order model (ROM) constructed for the reduction of the NSE addressed in the previous section. An effective ROM is sought for the approximation of the solutions of the parameterized NSE (1) and (2) for both laminar and turbulent flows. Therefore, the ROM will take into consideration the features of the full order model (FOM), including turbulence treatment when applicable. The ROM will then be used in parameter estimation/inference tasks. The main assumption in reduced order modeling is that the dynamics of the FOM are governed by a few dominant modes, and therefore an accurate reproduction of the full order solution is possible when those dominant modes are combined appropriately. This assumption represents a cornerstone in the construction of ROMs, and mathematically it implies that the FOM solution fields of the velocity and pressure can be approximated as a sum of spatial modes multiplied by temporal coefficients, i.e.:
$$
u(x, t; \mu) \approx \sum_{i=1}^{N_u} a_i(t; \mu)\, \phi_i(x), \qquad
p(x, t; \mu) \approx \sum_{i=1}^{N_p} b_i(t; \mu)\, \chi_i(x),
\tag{3}
$$
where the reduced velocity and reduced pressure modes are denoted by $\phi_i(x)$ and $\chi_i(x)$, respectively. The reduced modes of both variables depend only on the spatial variables. The coefficients $a_i(t; \mu)$ and $b_i(t; \mu)$ represent the $i$-th reduced coefficients for velocity and pressure, respectively; they depend on both time $t$ and the parameter $\mu$. Several methods and approaches can be applied for the generation of the reduced order spaces of the velocity and pressure, defined by $\mathrm{span}\{\phi_i\}_{i=1}^{N_u}$ and $\mathrm{span}\{\chi_i\}_{i=1}^{N_p}$, respectively. The method chosen in this work for the generation of the reduced spaces is POD [21,22], applied directly on the set of all realizations of the solution fields, which might correspond to different values of the parameters and/or time.
Efficient reduced order models rely on the notion of having two decoupled phases termed as the offline and the online phases. In the offline phase, the training procedure of the ROM is carried out. This includes sampling the parameter space and then simulating the FOM in order to generate the snapshots which are used later for the generation of the reduced order space (here the POD space). Hence, the offline stage consists of computing the POD modes and all reduced quantities, which form the reduced system and are dependent on the POD modes. The offline phase is known to have a significant computational cost due to the fact that the offline computations depend on the FOM dimension. However, the offline phase must be carried out just once for a given choice of the ROM dimension. The final result of the offline stage is the reduced order system of equations.
The online stage utilizes the ROM and hence results in fast computations which are dependent only on the dimension of the ROM. Ideally, the online stage should not depend on any aspect of the full order computational model such as accessing the original finite volume mesh. During the online stage, the solution of the reduced order problem is found by solving the low dimensional system produced during the offline stage.
In this work, POD is used for the generation of the reduced order spaces of both the velocity and pressure. After sampling the parameter space, the FOM described in the previous section is solved for each value of the parameter $\mu \in P_M = \{\mu_1, \dots, \mu_M\}$ and solutions are acquired at the desired time instants $\{t_1, \dots, t_{N_T}\} \subset [0, T]$. This yields a total of $N_s = M \cdot N_T$ snapshots which form the following snapshots matrices for velocity and pressure:
$$
S_u = \left[ u(x, t_1; \mu_1), \dots, u(x, t_{N_T}; \mu_M) \right], \qquad
S_p = \left[ p(x, t_1; \mu_1), \dots, p(x, t_{N_T}; \mu_M) \right].
$$
The POD velocity and pressure modes are then computed using the method of snapshots [76]. As for the identification of the reduced coefficients of the velocity and pressure in (3), we utilize feedforward neural networks to achieve this task in the online stage.
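The method of snapshots mentioned above can be sketched in a few lines. The following is a minimal, dependency-free illustration and not the implementation used in this work: it assumes a Euclidean inner product in place of the discrete $L^2$ inner product, computes the eigenpairs of the $N_s \times N_s$ correlation matrix by power iteration with deflation (a simplification chosen here for brevity), and all function names are hypothetical.

```python
import math


def dot(x, y):
    """Euclidean inner product, standing in for the discrete L2 inner product."""
    return sum(xi * yi for xi, yi in zip(x, y))


def pod_method_of_snapshots(snapshots, n_modes, iters=500):
    """Return up to n_modes orthonormal POD modes and their eigenvalues.

    snapshots: list of N_s snapshot vectors, each a list of N_h floats.
    """
    n_s = len(snapshots)
    n_h = len(snapshots[0])
    # Correlation matrix K_ij = (u_i, u_j) / N_s, of size N_s x N_s.
    K = [[dot(si, sj) / n_s for sj in snapshots] for si in snapshots]
    modes, eigvals = [], []
    for _mode in range(n_modes):
        # Power iteration for the dominant eigenpair of the (deflated) matrix.
        v = [1.0 + 0.1 * i for i in range(n_s)]
        for _it in range(iters):
            w = [dot(row, v) for row in K]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-14:
                break
            v = [wi / norm for wi in w]
        lam = dot(v, [dot(row, v) for row in K])
        if lam < 1e-12:
            break  # the remaining directions carry no energy
        # POD mode: phi = sum_j v_j u_j / sqrt(N_s * lambda), which has unit norm.
        scale = math.sqrt(n_s * lam)
        phi = [sum(v[j] * snapshots[j][k] for j in range(n_s)) / scale
               for k in range(n_h)]
        modes.append(phi)
        eigvals.append(lam)
        # Deflation: remove the converged eigenpair from K.
        K = [[K[i][j] - lam * v[i] * v[j] for j in range(n_s)]
             for i in range(n_s)]
    return modes, eigvals


# Three snapshots in R^3 that span a two-dimensional subspace.
snapshots = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
modes, eigvals = pod_method_of_snapshots(snapshots, n_modes=2)
```

The eigenvalue problem has the size of the number of snapshots rather than the number of degrees of freedom, which is the reason the method of snapshots is attractive when $N_s \ll N_h$.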
In more detail, we use Physics Informed Neural Networks (PINNs) to solve the reduced problem. Firstly, the reduced equations are obtained by performing a Galerkin projection of the FOM equations onto the POD spaces of the velocity and pressure. Then, one may encode these reduced equations as a part of the loss function which has to be minimized by the neural network optimizer. The resulting non-intrusive reduced order model merges aspects of POD-Galerkin ROMs with Physics-Informed Neural Networks (PINNs); therefore, it is termed here POD-Galerkin PINN ROM. This reduced order model is designed to solve inverse problems for the Navier-Stokes equations. The goal is to identify/infer unknown parameters or inputs in a mathematical model by comparing the predictions of this model with real or simulated measurements or outputs. The inference task is carried out by leveraging the features of neural networks which allow for the introduction of additional trainable weights. These new weights are present in the loss function through the reduced equations, which makes it possible to compute gradients of the loss with respect to these weights and consequently to optimize their value. In the rest of this section we describe the construction of the proposed ROM.
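The construction of a POD-Galerkin ROM for the Navier-Stokes equations starts by projecting the momentum equation onto the reduced space spanned by the velocity POD modes:
$$
\left( \phi_i, \; \frac{\partial u}{\partial t} + \nabla \cdot (u \otimes u) - \nabla \cdot \left[ \nu \left( \nabla u + (\nabla u)^T \right) \right] + \nabla p \right)_{L^2(\Omega)} = 0, \qquad i = 1, \dots, N_u.
$$
After inserting the reduced approximation of the velocity and pressure, we obtain the following ODEs which represent the reduced momentum equation:
$$
\dot{a} = \nu \left( B + B_T \right) a - a^T C a - H b,
$$
where each of $B$, $B_T$, $C$ and $H$ is either a reduced order matrix or tensor. These terms are computed as follows:
$$
B_{ij} = \left( \phi_i, \nabla \cdot \nabla \phi_j \right)_{L^2(\Omega)}, \quad
(B_T)_{ij} = \left( \phi_i, \nabla \cdot (\nabla \phi_j)^T \right)_{L^2(\Omega)}, \quad
C_{ijk} = \left( \phi_i, \nabla \cdot (\phi_j \otimes \phi_k) \right)_{L^2(\Omega)}, \quad
H_{ij} = \left( \phi_i, \nabla \chi_j \right)_{L^2(\Omega)}.
$$
We note that the treatment of the nonlinear convective term in the reduced momentum equation above is done by the use of the third-order tensor $C$. This approach might lead to a substantial increase in the computational cost of solving the reduced problem when the number of the reduced velocity unknowns $N_u$ grows. This approach approximates the projection of the non-linear term $\nabla \cdot (u \otimes u)$ onto the velocity POD mode $\phi_i$ as follows:
$$
\left( \phi_i, \nabla \cdot (u \otimes u) \right)_{L^2(\Omega)} \approx \sum_{j=1}^{N_u} \sum_{k=1}^{N_u} a_j \, C_{ijk} \, a_k.
$$
However, we propose to utilize a different approach in which we add a variable named $c$ for the approximation of the convective term in the reduced momentum equation, where:
$$
c_i = \left( \phi_i, \nabla \cdot (u \otimes u) \right)_{L^2(\Omega)}, \qquad i = 1, \dots, N_u.
$$
The additional variable $c$ represents the projection of the non-linear vector field $\nabla \cdot (u \otimes u)$ (which can be retrieved/isolated from any velocity snapshot) onto the velocity POD modes. The final form of the reduced momentum equation is then given by
$$
\dot{a} = \nu \left( B + B_T \right) a - c - H b.
$$
An additional set of reduced equations can be obtained either by employing the supremizer enrichment approach [77,78] or by considering a reduced version of the Poisson Equation for Pressure (PPE) [22,79,80]. The supremizer approach computes artificial velocity-like modes, which are termed the supremizers, and then enriches the original velocity POD modes with the newly computed supremizers in a way that ensures the fulfillment of a reduced version of the inf-sup condition.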
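The difference between the tensor treatment of the convective term and the additional variable $c$ can be illustrated with a small sketch. It assumes the reduced momentum equation has the form $\dot{a} = \nu (B + B_T) a - c - H b$, with $c_i = \sum_{j,k} a_j C_{ijk} a_k$ in the tensor approach; the function names and the numerical values are hypothetical and chosen only to make the example self-checking.

```python
def matvec(A, x):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def convective_from_tensor(a, C):
    """Tensor approach: c_i = sum_j sum_k a_j C[i][j][k] a_k, cost O(N_u^3)."""
    n = len(a)
    return [sum(a[j] * C[i][j][k] * a[k] for j in range(n) for k in range(n))
            for i in range(n)]


def reduced_momentum_rhs(a, b, c, nu, B, BT, H):
    """Right-hand side nu (B + B_T) a - c - H b, with c supplied as a vector.

    Once c is available (e.g. as a network output), no cubic contraction
    is needed: the cost is that of a few matrix-vector products.
    """
    Ba, BTa, Hb = matvec(B, a), matvec(BT, a), matvec(H, b)
    return [nu * (Ba[i] + BTa[i]) - c[i] - Hb[i] for i in range(len(a))]


# Illustrative reduced operators for N_u = 2 velocity modes, N_p = 1 pressure mode.
nu = 0.1
a, b = [1.0, 2.0], [0.5]
B = [[-1.0, 0.0], [0.0, -2.0]]
BT = [[0.0, 0.0], [0.0, 0.0]]
H = [[1.0], [2.0]]
C = [[[1.0, 0.0], [0.0, 0.0]],
     [[0.0, 1.0], [0.0, 0.0]]]

c = convective_from_tensor(a, C)          # what the tensor approach would compute
rhs = reduced_momentum_rhs(a, b, c, nu, B, BT, H)
```

In the proposed approach the vector `c` would not be obtained from `C` at all; it is treated as an additional reduced unknown whose training data come from projecting the convective field of each snapshot onto the velocity modes.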
The additional velocity-like fields, or supremizers, are computed by solving the following problems:
$$
\begin{cases}
\Delta s_i = -\nabla \chi_i & \text{in } \Omega, \\
s_i = 0 & \text{on } \partial\Omega,
\end{cases}
\qquad i = 1, \dots, N_S.
$$
After that, the velocity POD space is enriched with the supremizer modes:
$$
\mathrm{span}\left\{ \phi_1, \dots, \phi_{N_u}, s_1, \dots, s_{N_S} \right\}.
$$
The original velocity POD modes are divergence free by construction since they are just a linear combination of the velocity snapshots. This implies that the projection of the continuity equation onto the pressure modes before enriching the velocity POD space would have added no reduced equations. The newly added supremizer modes are not divergence free; therefore, the continuity equation can be utilized for obtaining an additional set of scalar reduced algebraic equations, as follows:
$$
\left( \chi_i, \nabla \cdot u \right)_{L^2(\Omega)} = 0, \qquad i = 1, \dots, N_p.
$$
The final POD-Galerkin ROM with the supremizer enrichment approach is given by
$$
\begin{cases}
M \dot{a} = \nu \left( B + B_T \right) a - c - H b, \\
P a = 0,
\end{cases}
$$
with two additional reduced matrices $M$ and $P$. The first matrix $M$ is the mass matrix, which is no longer the identity as a result of the additional supremizer modes. The matrix $P$ is called the divergence reduced matrix. The entries of the additional matrices are given by:
$$
M_{ij} = \left( \phi_i, \phi_j \right)_{L^2(\Omega)}, \qquad
P_{ij} = \left( \chi_i, \nabla \cdot \phi_j \right)_{L^2(\Omega)}.
$$
In the turbulent case, an additional term appears in the reduced momentum equation. This term corresponds to the projection of the added turbulence modeling term in the momentum equation (in the RANS or the LES formulation at the FOM level) onto the velocity POD modes. The turbulent POD-Galerkin ROM with the employment of the supremizer enrichment approach is given by
$$
\begin{cases}
M \dot{a} = \nu \left( B + B_T \right) a - c - H b + h, \\
P a = 0,
\end{cases}
$$
where $h$ is the turbulent reduced variable. At this point, we describe the structure of the PINNs which are used to approximate the solution of the POD-Galerkin ROMs. The PINNs have as input the time and the parameter. The outputs are the reduced velocity, pressure, convective and turbulent terms denoted by $a$, $b$, $c$ and $h$, respectively. The dimension of each of these output terms is $N_u$, except for the reduced pressure which is of dimension $N_p$.
The starting point of the PINN training is the computation of the output label data. This data consists of the $L^2$ projection coefficients of each of the FOM variable fields onto the velocity or pressure POD basis. For the $i$-th velocity snapshot $u_i$, the velocity $L^2$ projection coefficients $a^i$ are computed as follows:
$$
M a^i = g^i, \qquad g^i_j = \left( \phi_j, u_i \right)_{L^2(\Omega)},
$$
and similarly, for the $i$-th pressure snapshot $p_i$, the pressure coefficients are given by:
$$
b^i_j = \left( \chi_j, p_i \right)_{L^2(\Omega)}.
$$
Then, the FOM vectorial fields of the convective and turbulent terms are retrieved from the original snapshots of the velocity, pressure and the turbulent eddy viscosity, and one may compute the projection of these vectorial fields onto the velocity POD modes. This yields the output data for the vectors $c$ and $h$, which are also needed for the training of the PINN. The additional coefficients are given by:
$$
c^i_j = \left( \phi_j, \nabla \cdot (u_i \otimes u_i) \right)_{L^2(\Omega)}, \qquad
h^i_j = \left( \phi_j, S^i_t \right)_{L^2(\Omega)},
$$
where $S^i_t$ is the $i$-th snapshot of the turbulent additional term in the FOM formulation of the RANS or the LES turbulence modeling approach.
The input and output data matrices $\tilde{A} \in \mathbb{R}^{N_s \times (q+1)}$ and $\tilde{G} \in \mathbb{R}^{N_s \times (3N_u + N_p)}$ collect one sample per row: the $n$-th row of $\tilde{A}$ contains the time and parameter values of the $n$-th snapshot, and the $n$-th row of $\tilde{G}$ contains the corresponding reduced coefficients,
$$
\tilde{A}(n, :) = \left[ t^{(n)}, \; \mu^{(n)} \right], \qquad
\tilde{G}(n, :) = \left[ (a^{n})^T, \; (b^{n})^T, \; (c^{n})^T, \; (h^{n})^T \right].
$$
The number of PINN outputs is larger than the number of outputs in the POD-NN case [33] because of the introduction of the additional terms which appear in the reduced momentum equation as part of the PINN output. The current setting ensures that the dimension of the reduced problem scales linearly with the number of reduced variables of both velocity and pressure. The POD-Galerkin PINN ROM is then constructed by training deep neural networks whose input is time and parameter and whose output is the reduced velocity, pressure, convective and turbulent terms. The loss function which has to be minimized is a weighted loss that takes into consideration the available data and the POD-Galerkin formulation imposed by the algebraic differential system in (21). The training procedure utilizes the training set $\{l_n, r_n = F(l_n)\}_{n=1}^{N_s}$, where $\{l_n\}_{n=1}^{N_s}$ is the set of input vectors, $\{r_n\}_{n=1}^{N_s}$ is the set of output or target vectors and $F$ is the function that relates the input to the output in the neural network. Each row of the matrices $\tilde{A}$ and $\tilde{G}$ defined above represents a sample (recall that the number of samples is $N_s = M \cdot N_T$). The input and output vectors are defined as $l_n = \tilde{A}(n, :)$ and $r_n = \tilde{G}(n, :)$, respectively. The overall loss function can be written as follows:
$$
E(w) = \frac{1}{N_s} \sum_{n=1}^{N_s} \left\| r_n - r^{PINN}(l_n; w) \right\|^2 + \alpha_1 E_1(w) + \alpha_2 E_2(w),
$$
where
$$
E_1(w) = \frac{1}{N_s} \sum_{n=1}^{N_s} \left\| R_a(l_n; w) \right\|^2, \qquad
E_2(w) = \frac{1}{N_s} \sum_{n=1}^{N_s} \left\| R_b(l_n; w) \right\|^2,
$$
and
$$
R_a = M \dot{a}^{PINN} - \nu \left( B + B_T \right) a^{PINN} + c^{PINN} + H b^{PINN} - h^{PINN},
\tag{32}
$$
$$
R_b = P a^{PINN}.
\tag{33}
$$
The two loss functions $E_1$ and $E_2$ enforce the reduced equations given by the POD-Galerkin model. The two weighting coefficients $\alpha_1$ and $\alpha_2$ are tuned heuristically depending on the problem, but can in general also be made trainable. The above formulation gives the PINN the ability to estimate unknown parameters which are present in the POD-Galerkin formulation; such parameters might include, for example, the physical viscosity $\nu$. The approximation of the time derivative of the reduced velocity $\dot{a}^{PINN}$ which appears in $R_a$ is done with the help of automatic differentiation [81].
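The structure of the weighted loss and its dependence on an unknown physical parameter can be sketched as follows. This is a minimal illustration under stated assumptions, not the training code of this work: the momentum residual is assumed to be $R_a = M \dot{a} - \nu (B + B_T) a + c + H b - h$ and the continuity residual $R_b = P a$, the time derivative is passed in as data (in a PINN it would come from automatic differentiation of the network), mean squared errors are assumed, and all names and numbers are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def sq_norm(x):
    return sum(xi * xi for xi in x)


def momentum_residual(pred, a_dot, nu, ops):
    """Assumed form: R_a = M a_dot - nu (B + B_T) a + c + H b - h."""
    a, b, c, h = pred["a"], pred["b"], pred["c"], pred["h"]
    Ma = matvec(ops["M"], a_dot)
    Ba, BTa = matvec(ops["B"], a), matvec(ops["BT"], a)
    Hb = matvec(ops["H"], b)
    return [Ma[i] - nu * (Ba[i] + BTa[i]) + c[i] + Hb[i] - h[i]
            for i in range(len(a))]


def continuity_residual(pred, ops):
    """Assumed form: R_b = P a."""
    return matvec(ops["P"], pred["a"])


def total_loss(samples, nu, ops, alpha1=1.0, alpha2=1.0):
    """Data misfit + alpha1 * E_1 + alpha2 * E_2, averaged over the samples."""
    e_data = e1 = e2 = 0.0
    for s in samples:
        pred, label = s["pred"], s["label"]
        for key in ("a", "b", "c", "h"):
            e_data += sq_norm([p - l for p, l in zip(pred[key], label[key])])
        e1 += sq_norm(momentum_residual(pred, s["a_dot"], nu, ops))
        e2 += sq_norm(continuity_residual(pred, ops))
    return (e_data + alpha1 * e1 + alpha2 * e2) / len(samples)


# Illustrative reduced operators (N_u = 2, N_p = 1) and one consistent sample.
ops = {"M": [[1.0, 0.0], [0.0, 1.0]], "B": [[-1.0, 0.0], [0.0, -4.0]],
       "BT": [[0.0, 0.0], [0.0, 0.0]], "H": [[0.5], [0.0]], "P": [[1.0, -1.0]]}
state = {"a": [1.0, 1.0], "b": [2.0], "c": [0.3, -0.1], "h": [0.0, 0.0]}
nu_true = 0.01
Ba, Hb = matvec(ops["B"], state["a"]), matvec(ops["H"], state["b"])
# With M = I, a_dot follows directly from the reduced momentum equation.
a_dot = [nu_true * Ba[i] - state["c"][i] - Hb[i] + state["h"][i] for i in range(2)]
samples = [{"pred": state, "label": state, "a_dot": a_dot}]

loss_at_true_nu = total_loss(samples, nu_true, ops)
loss_at_wrong_nu = total_loss(samples, 0.02, ops)
```

Since the loss vanishes only when the viscosity matches the value that generated the data, treating `nu` as an additional trainable variable and descending along the gradient of the loss is what allows the parameter to be inferred.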
Automatic differentiation represents a crucial tool in PINNs, where it is capable of differentiating the neural networks with respect to their input coordinates and model parameters, the latter model parameters do not include only the weights and biases stacked in the vector w but also any other unknown physical quantity in the model. It is worth mentioning that the POD-Galerkin PINN ROM could incorporate physical constraints related to the velocity at the boundary. In fact, it is common to have inhomogeneous Dirichlet boundary conditions for the velocity field at specific parts of the boundary. This is typical in inlet-outlet problems such as the flow around a circular cylinder or the flow past a backward step. In these circumstances, an additional effort has to be made for the treatment of the inhomogeneous velocity boundary conditions at the ROM level. The common strategies for tackling this issue are the lifting function method [82][83][84] and the penalty method [85][86][87][88][89]. A brief description of the two methods will be given and then the strategy of incorporating them inside the PINN formulation will be addressed.
The lifting function method treats the non-homogeneous Dirichlet boundary condition through the introduction of a lifting function (or several lifting functions). In this method the inhomogeneity is transferred to the lifting function and a new set of velocity snapshots is created. The POD procedure is then performed on the newly created set of velocity snapshots (which have homogeneous Dirichlet boundary conditions). This results in velocity POD modes which also have homogeneous boundary conditions at the Dirichlet boundary. We remark that the lifting function must satisfy certain conditions such as being divergence free. This function is also added to the velocity POD basis.
Unlike the lifting method, the penalty method does not involve any modification of the velocity snapshots. The penalty method enforces the inhomogeneous boundary condition by introducing an additional term in the reduced momentum equation; this term corresponds to the projection of a function (which is zero everywhere except on the Dirichlet boundary) onto the velocity POD modes. The penalty method modifies the POD-Galerkin ROM into the system (35), where $U_{BC}$ is the velocity value at the Dirichlet boundary $\Gamma_D$, and $\tau$ is the penalization factor whose value is tuned heuristically. Higher values of $\tau$ generally enforce the boundary conditions more strongly. The system involves the additional reduced operators $D$ and $E$. In (35), we assumed only one inhomogeneous boundary condition at the Dirichlet boundary; however, a generalization to more than one condition is possible [90].
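The effect of the penalization factor can be seen on a scalar analogue, which is only a toy model and not the reduced system (35). Assume a single coefficient $a$ that would equal `a_free` without any constraint and that is pulled towards a boundary value `u_bc` by a quadratic penalty weighted by `tau`; all names here are hypothetical.

```python
def penalized_minimizer(a_free, u_bc, tau):
    """Minimizer over a of (a - a_free)**2 + tau * (a - u_bc)**2.

    Setting the derivative to zero gives a = (a_free + tau * u_bc) / (1 + tau).
    """
    return (a_free + tau * u_bc) / (1.0 + tau)

mismatches = []
for tau in (1.0, 10.0, 1000.0):
    a = penalized_minimizer(a_free=0.0, u_bc=1.0, tau=tau)
    mismatches.append(abs(a - 1.0))
    print(f"tau = {tau:7.1f}  a = {a:.4f}  boundary mismatch = {mismatches[-1]:.4f}")
```

The mismatch decreases as `tau` grows but never vanishes for a finite value, which is why the factor has to be tuned rather than derived.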
The POD-Galerkin PINN model can be adapted to the penalty method by including the additional constraint as a separate loss function denoted by $E_3(w)$ and defined in (38). As for the lifting function method, the homogenization procedure transfers the inhomogeneous Dirichlet boundary conditions from the velocity snapshots to the lifting functions. Therefore, the lifting velocity mode carries a normalized version of the velocity at the Dirichlet boundary at the inlet. The reduced approximation of the velocity field is modified as in (40), where $a = [a_l, a^u_1, \dots, a^u_{N_u}, a^S_1, \dots, a^S_{N_S}] \in \mathbb{R}^{1+N_u+N_S}$; here the superscripts in $a^u_i$ and $a^S_i$ refer to the reduced coefficients corresponding to the original velocity POD modes and the added supremizer modes, respectively. The coefficient $a_l$ corresponds to the lifting mode and represents, as mentioned, a normalized version of the velocity at $\Gamma_D$. In this case, the reduced operators $R_a$ and $R_b$ from (32) and (33) in the PINN formulation are modified accordingly. Hence, the PINNs are able to incorporate the velocity at the boundary in their formulation using both the lifting function and the penalty method, which makes it possible to learn these physical parameters through the training procedure and the optimization of the total loss function.

Numerical results
This section presents the application of the POD-Galerkin PINN reduced order models to three problems with unknown inputs or parameters. The first problem is the benchmark case of the flow past a backward step, where the POD-Galerkin PINN model is used for solving inverse and forward problems in the parameterized setting. The second one is the flow around a circular cylinder in a turbulent setting modeled by the RANS approach. In this problem we consider a situation of incomplete data, where the ROM is used to infer an unknown input value and its corresponding missing output data. The last problem is the 3D flow around a surface mounted cube. This problem is a turbulent one with a large number of degrees of freedom, where turbulence is modeled using the LES approach. In the latter problem, the POD-Galerkin PINN ROM is used for the identification of the physical viscosity, which is assumed to be unknown. This is done by assuming the presence of simulated data for the velocity, pressure and eddy viscosity fields. The full order solver utilized is OpenFOAM® (OF) [67], which is an open-source C++-based library for solving fluid problems with the finite volume method. At the reduced order level, the POD modes and the $L^2$ projection coefficients needed for the training of the PINNs are computed using the library ITHACA-FV [91], which is also based on C++, while the training of the neural network is done using the Python library TensorFlow 2 [92].

Steady case: the flow past a backward step
We consider the application of the POD-Galerkin PINN based reduced order model to the benchmark case of the flow past a backward step. The problem is studied here in a laminar steady setting with physical parameterization (see Fig. 1). The lifting function is subtracted from the original snapshots, which leads to the creation of a new set of homogenized velocity snapshots (velocity snapshots which have a homogeneous Dirichlet boundary condition at the inlet). At this stage one can apply the POD procedure to the snapshot matrices of both the homogenized velocity and the pressure. The cumulative eigenvalue decay can be seen in Fig. 2, where the decay for the pressure is observed to be slower. The second step involves the computation of the supremizer modes, which is done by solving the supremizer problem corresponding to each pressure POD mode. The velocity POD space is then enriched by the supremizer modes. The convective part of each snapshot is then retrieved for later use during the training stage of the neural network.
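The homogenization and POD steps described above can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions: the snapshots are columns of a plain matrix, the Euclidean inner product replaces the cell-volume-weighted $L^2$ product, the lifting field equals one at the single "inlet" entry, and the helper names `homogenize` and `pod` are hypothetical. The supremizer enrichment is not included.

```python
import numpy as np

def homogenize(snapshots, lifting, inlet_values):
    """Subtract the lifting field, scaled by each snapshot's inlet value."""
    return snapshots - np.outer(lifting, inlet_values)

def pod(snapshots, n_modes):
    """POD via SVD: leading modes and cumulative eigenvalue ratio."""
    u, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    eigenvalues = s ** 2                       # eigenvalues of the correlation matrix
    cumulative = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    return u[:, :n_modes], cumulative

rng = np.random.default_rng(0)
n_cells, n_snapshots = 50, 10
lifting = np.linspace(1.0, 0.0, n_cells)       # equals 1 at the "inlet" entry 0
inlet_values = np.full(n_snapshots, 2.0)
fluctuation = rng.standard_normal((n_cells, n_snapshots))
fluctuation[0, :] = 0.0                        # snapshots match the inlet value exactly
snapshots = np.outer(lifting, inlet_values) + fluctuation

homogenized = homogenize(snapshots, lifting, inlet_values)
modes, cumulative = pod(homogenized, n_modes=5)
print(cumulative[:5])
```

After homogenization the inlet entry of every snapshot is zero, so the resulting modes inherit the homogeneous boundary value.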
At this point, we describe the numerical tests which will be done in this subsection. The first test has two objectives:
• To estimate a physical unknown, namely the velocity at the boundary, which is assumed to be unknown for this test. The identification of this physical unknown is carried out by the POD-Galerkin PINN model.
• To approximate the velocity and pressure fields for test values of the physical viscosity ν which were not seen during the offline stage.
The two objectives are achieved simultaneously by training only once the PINN informed by the POD-Galerkin system using the training data and then performing a prediction task for the test data.
In the second test, the input test data of the physical viscosity mentioned above will be assumed to be unknown and then the POD-Galerkin PINN model will be used to solve the inverse problems associated with the latter test data while assuming that the velocity at the inlet is known.
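The idea of treating an unknown input as a trainable quantity can be illustrated with a deliberately small example. The sketch below is not the PINN of this work: the network is replaced by a fixed, hypothetical map `surrogate` from the viscosity to two outputs, and only the unknown input is updated by plain gradient descent with a hand-written gradient.

```python
def surrogate(nu):
    """Hypothetical stand-in for a trained map from viscosity to two reduced outputs."""
    return (nu, nu ** 2)

def identify_input(target, nu0=1.0, lr=0.1, steps=200):
    """Gradient descent on the squared output misfit with respect to the unknown input."""
    nu = nu0
    for _ in range(steps):
        y1, y2 = surrogate(nu)
        # Derivative of (y1 - t1)^2 + (y2 - t2)^2, with dy1/dnu = 1 and dy2/dnu = 2 * nu.
        grad = 2.0 * (y1 - target[0]) + 2.0 * (y2 - target[1]) * 2.0 * nu
        nu -= lr * grad
    return nu

nu_true = 0.5
nu_identified = identify_input(surrogate(nu_true))
print(nu_identified)
```

In the actual formulation the unknown inputs are optimized together with the network weights, and the equation losses provide additional information beyond the data misfit used here.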
It is important to mention that the velocity at the boundary is embedded in the POD-Galerkin PINN formulation. In fact, the first coefficient of the reduced velocity vector $a$, namely $a_l$, corresponds to a normalized version of the velocity at the inlet. The reduced approximation of the velocity in this example reads as in (40).
The next step is to compute the reduced operators which appear in the POD-Galerkin system (43) that defines the reduced equations. The input of the PINN is the physical viscosity $\nu$ and its output is formed by the reduced velocity (except the lifting coefficient $a_l$), the reduced pressure and the reduced convective term. The number of PINN output variables is $N_o = 2N_u + 2N_S + N_p + 1$, which is consistent with the value $N_o = 26$ obtained for $N_u = N_p = N_S = 5$.
To solve the inverse problem and to approximate the relation between the physical viscosity and the reduced velocity and pressure, we put forward a physics informed neural network which has 10 layers, each containing 100 neurons with hyperbolic tangent activation. This PINN takes as output the reduced variables mentioned above and uses the data available from the offline snapshots together with the knowledge given by the system in (43) to approximate the unknown coefficient $a_l$ (or the velocity at the inlet). This is done by training the PINN with a loss function that takes into consideration the data given by the coefficients of the $L^2$ projections and also constrains the output to follow the POD-Galerkin reduced equations. The output data is obtained by performing the projections in equations (22), (23) and (24). We remark that both input and output values have been standardized to the range [0, 1] in order to make the learning task of the neural networks easier. The PINN loss function is a weighted combination of the data loss and the two equation losses. In this formulation, the weights and biases vector $w$ contains all trainable parameters, including the scalar coefficient $a_l$, which is learned during the training procedure of the PINN. As for the coefficients of the equation losses $\alpha_1$ and $\alpha_2$, they are determined in a heuristic fashion, or they can also be trained like the other weights of the network. The total number of trainable parameters in the PINN for the first test is 83627. The PINN is run for 30000 epochs with a learning rate of $1 \times 10^{-3}$ and with one batch per epoch (the batch size is equal to the number of data points $N_s$). The two weighting coefficients $\alpha_1$ and $\alpha_2$ are set to 0.01. The physical parameter $a_l$ is initialized with zero value.
Table 1 The values of the errors are reported for different initial values of the added weight $a_l$; the PINN identified value of $a_l$ is reported in the last column. The PINNs are run for 30000 epochs with a learning rate of $1 \times 10^{-3}$
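The reported figures can be cross-checked with a short count. The sketch assumes that the output consists of the reduced velocity without $a_l$ ($N_u + N_S$ entries), the reduced pressure ($N_p$ entries) and the reduced convective term including its lifting-mode component ($N_u + N_S + 1$ entries), and that "10 layers" means 9 hidden layers of 100 neurons plus the output layer. Both readings are assumptions; they are adopted here because they reproduce the stated totals for $N_u = N_p = N_S = 5$.

```python
def count_outputs(n_u, n_s, n_p):
    """Velocity (n_u + n_s) + pressure (n_p) + convective term (n_u + n_s + 1)."""
    return 2 * (n_u + n_s) + n_p + 1

def count_dense_parameters(n_in, hidden_widths, n_out):
    """Weights and biases of a fully connected network."""
    widths = [n_in, *hidden_widths, n_out]
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))

n_out = count_outputs(n_u=5, n_s=5, n_p=5)
network_parameters = count_dense_parameters(n_in=1, hidden_widths=[100] * 9, n_out=n_out)
total_parameters = network_parameters + 1   # plus the trainable coefficient a_l
print(n_out, total_parameters)
```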
The first results are shown for the number of modes $N_u = N_p = N_S = 5$, which gives a total number of PINN outputs of $N_o = 26$. The true value of $a_l$ is 2.7302, while the PINN has identified the value of $a_l$ to be 2.7291. The relative error in approximating $a_l$ is about 0.0409 %. As mentioned above, $a_l$ has been initialized with zero value; however, we also show the results for different initial values of $a_l$ in Table 1, where one can see that the PINN is not sensitive to the initial value of the unknown parameter $a_l$. As for the forward task in the first test, we have generated 300 equidistant samples for $\nu$ in the range [0.05, 7]. Another simulation campaign is then launched for these viscosity samples in order to validate the PINN model. After the training of the PINN for the identification of $a_l$, the PINN is used for approximating the output for the newly created set of input $\nu$. Figure 3 shows the results of the forward task, where one can see the validation results for the first, second and third components of the reduced variables of the velocity, pressure and convective terms versus the value of the viscosity. We recall that the values of the physical viscosity for which the latter figure is depicted were not used in the training procedure of the POD-Galerkin ROM or the PINN. The mean squared error committed in approximating the reduced variables is about $9.8921 \times 10^{-6}$; we remark that this error is computed on the standardized variables. As for the operator errors on the test data, we have $\tilde{E}_1(w) = 0.001027$ and $E_2(w) = 2.6525 \times 10^{-6}$. These error values show that the PINN was able to generalize to unseen values of the parameter while constraining the results to satisfy the algebraic system. As for the training time of the PINNs, it ranges from 6 to 7.402 min using "Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz".
As a final result for the first case, we report an assessment of the approximation accuracy of the POD-Galerkin PINN ROM. Namely, we compute the relative $L^2$ errors for velocity and pressure defined in (48), where $u_r$ and $p_r$ are the reduced order velocity and pressure fields, respectively. The values of the relative errors for the velocity and the pressure are computed for the 300 test parameter values. The mean value of the relative velocity error is 0.3081 %, while the one of the pressure is 0.3459 %.
Fig. 3 The results of the POD-Galerkin PINN predictions for the 1st, 2nd and 3rd components of the velocity, pressure and convective reduced coefficients for the first numerical test. The plots compare the reduced coefficients with the $L^2$ projection coefficients of the test velocity and pressure fields onto the corresponding POD modes. The red dashed lines refer to the $L^2$ projection coefficients, while the blue dots correspond to the reduced coefficients obtained by the PINNs. The coefficients are plotted versus the physical viscosity values at which the test data was generated. (a) The first reduced coefficients for velocity, pressure and convective terms compared to the ones obtained by the $L^2$ projection. (b) The second reduced coefficients for velocity, pressure and convective terms compared to the ones obtained by the $L^2$ projection. (c) The third reduced coefficients for velocity, pressure and convective terms compared to the ones obtained by the $L^2$ projection
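A discrete version of such a relative $L^2$ error can be sketched as below. The usual definition, the norm of the difference divided by the norm of the full order field and expressed in percent, is assumed here rather than taken from (48); the function name and the optional cell-volume weights are hypothetical.

```python
import numpy as np

def relative_l2_error(full, reduced, volumes=None):
    """Relative L2 error in percent for fields stored as one row per cell."""
    full = np.asarray(full, dtype=float)
    reduced = np.asarray(reduced, dtype=float)
    n_cells = full.shape[0]
    weights = np.ones(n_cells) if volumes is None else np.asarray(volumes, dtype=float)
    # Sum the squared components in each cell, then weight by the cell volume.
    diff_sq = np.sum((full - reduced).reshape(n_cells, -1) ** 2, axis=1)
    full_sq = np.sum(full.reshape(n_cells, -1) ** 2, axis=1)
    return 100.0 * np.sqrt(np.sum(weights * diff_sq) / np.sum(weights * full_sq))

u_full = np.array([[3.0, 0.0], [0.0, 4.0]])   # two cells, two velocity components
u_reduced = 0.99 * u_full                     # reduced field that is 1 % too small
error_percent = relative_l2_error(u_full, u_reduced)
print(error_percent)
```

Averaging this quantity over all test parameter values gives a mean error of the kind reported above.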
In the second test, we assume that the input values for the test data of the first test are unknown. We aim at solving the inverse problems associated with the test data. To this end, we put forward a PINN based on the POD-Galerkin ROM whose loss function depends on the unknown inputs, where $\tilde{l}_n$ is the $n$-th value of the unknown input physical viscosity. The PINN used in this second test has the same structure as the one utilized in the first test. The training parameters are also the same: the PINN is run for 30000 epochs with a learning rate of $1 \times 10^{-3}$. The vector of unknown viscosity values $\tilde{l}$ is considered as an additional trainable weight of the PINN and is embedded into $w$. The identified values of the physical viscosity match the true values to a high degree of accuracy. The mean squared error of the difference between the true input vector and the PINN-identified one is 0.00055.
In a last numerical example, we examine the advantage of having the reduced residual terms in the PINN formulation compared to the POD-NN approach, which relies only on the reduced data. We consider the current test case with parameterized physical viscosity. Snapshots are obtained for parameter samples in the range [0.01, 2.1]; we assume the presence of 100 snapshots for the velocity and the pressure for equally distributed values of $\nu$ in this range. We assume the availability of test data for the fluid fields in the range [0.025, 2], which is contained in the training snapshots window. We construct the POD-NN and POD-Galerkin PINN models for the same test case. After the computation of the reduced velocity and pressure using both models, we reconstruct the reduced approximation of the full fields using the stored POD modes. Finally, we compute the $L^2$ relative error committed by the reduced approximation of both the POD-NN and the POD-Galerkin PINN. The errors are computed as a function of the parameter $\nu$.
The errors in (48) are computed for 320 test samples equally distributed in the range [0.025, 2]. We note that the effects of the parametric variation on the velocity and pressure fields are more apparent for lower values of $\nu$ in the considered sampling range. Therefore, given the uniform sampling, we expect the generalization of the ROMs to be less accurate for lower values of the parameter than for higher values. The PINN is trained by minimizing the data loss given by the reduced data obtained from the 100 snapshots and the reduced equations loss, which is computed at random points in the parameter range. We consider the reduced setting of $N_u = N_S = 8$; Fig. 4 shows the results of this test for both the velocity and the pressure fields. As expected, the results show that both ROMs (the POD-NN ROM and the POD-Galerkin PINN ROM) have given less accurate results for small values of $\nu$. However, it can be clearly seen that the POD-Galerkin PINN ROM has contained the error values, especially for the pressure field in the range $\nu < 0.3$. Here, we mention that the maximum value of the relative pressure error for the POD-Galerkin PINN ROM is about 6.1709 %, while the corresponding value obtained by the POD-NN ROM is 32.4325 %. The POD-NN ROM has in total 8 test samples for which the reduced pressure error exceeded 10 %.
The more accurate results of the PINN-based ROM could be attributed to the regularization effect that is brought to the ROM by taking into consideration the physics at the reduced level. The POD-NN has given better results only in the region where the density of the samples was high enough and in which the parametric variation is least observed. We conclude that in general the POD-Galerkin PINN ROM is more useful than the POD-NN in the case of limited data, since the physical equations provide additional information.

The flow around a circular cylinder
In this subsection we address the case of having incomplete data for different input configurations/settings. The computational problem considered is the flow around a circular cylinder. The problem is a 2D turbulent one, where turbulence is modeled using the RANS approach. The domain of the problem is $\Omega := [-4D, 30D] \times [-6D, 6D] \setminus B_D(0, 0)$, where $D = 1$ m is the diameter of the cylinder. Fig. 5 shows the grid used for simulating the problem using OpenFOAM; it also reports the boundary conditions for the velocity and pressure fields. The grid has around 18000 cells, and the physical viscosity is $2.5 \times 10^{-4}$ m$^2$/s.
We assume in this test that we are presented with an incomplete set of data for the fluid dynamics fields. This set of data contains full information about the fluid dynamics fields for some parameter values and contains a set of partial output data for an unknown input/parameter. In particular, we consider the parameter in this test to be the velocity at the inlet $U_{in}$. The fluid dynamics fields for velocity, pressure and the eddy viscosity are available for three different known values of the parameter $U_{in}$, which are {1, 1.5, 2} m/s. Another set of fluid data for an unknown parameter value is presented; this set contains partial information as it lacks the pressure fields. The inference tasks in this example are (i) to approximate the unknown velocity at the inlet $U_{in}$, (ii) to recover the missing pressure field history and finally (iii) to compute the lift and drag forces acting on the surface of the cylinder, which depend on both $U_{in}$ and the missing pressure data. The snapshots coming from the fluid simulations cover at least 2 periodic cycles of the developed regime. For each parameter value, the regime frequency, known as the vortex shedding frequency, is different, resulting in different snapshot time windows for the three parameter data samples. Table 2 shows the snapshot time windows for each parameter sample and the corresponding time period. The number of snapshots per parameter sample is 201, giving a total of 603 snapshots. The POD is performed on the velocity and pressure snapshots, which yields the velocity and pressure POD bases; an enrichment procedure is then carried out by solving the supremizer problems. The ROM is then obtained via a Galerkin projection. In this example the penalty method is used for the enforcement of the inlet velocity at the reduced level.
At this stage, one may compute the input and output data matrices which will be used to train the PINN. The input of the PINN is the combination of time and the parameter. However, a non-dimensionalization of time is needed in order to give meaningful information to the PINN. To this end, the non-dimensionalized time $t^*$ is defined as $t^* = \frac{t U_{in}}{D} = t U_{in}$, since $D = 1$ m. As for the output data, it consists of the following reduced coefficients:
• The $L^2$ projection coefficients of the velocity snapshots onto the velocity POD modes, see (22).
• The $L^2$ projection coefficients of the pressure snapshots onto the pressure POD modes, see (23).
• The $L^2$ projection coefficients of the convective terms snapshots onto the velocity POD modes, see (24).
• The $L^2$ projection coefficients of the turbulent terms snapshots onto the velocity POD modes, see (25).
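The construction of the network inputs from dimensional times can be sketched as follows. The ordering of each input row as (non-dimensionalized time, inlet velocity) and the helper names are assumptions made for illustration.

```python
def nondimensional_time(t, u_in, diameter=1.0):
    """Convective time units: t * U_in / D."""
    return t * u_in / diameter

def input_rows(times, u_in, diameter=1.0):
    """One input row per snapshot: (non-dimensionalized time, inlet velocity)."""
    return [(nondimensional_time(t, u_in, diameter), u_in) for t in times]

# Snapshots at t = 0, 0.5 and 1 s for the known sample U_in = 2 m/s.
rows = input_rows([0.0, 0.5, 1.0], u_in=2.0)
print(rows)
```

With this scaling, samples with different inlet velocities are expressed in the same convective time units, which makes their time windows comparable.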
The output of the PINN is the stacked vector $[a, b, h, c]$. The POD-Galerkin DAE that models this problem is the one reported in (35). The matrices and vectors which appear in this DAE are computed during the offline stage. The partially known output fields for the unknown parameter are also projected onto the velocity POD space, and the resulting $L^2$ coefficients are also used for training the PINN. The set of original weights and biases of the PINN, denoted by $w$, is then enlarged. This is done by introducing an additional set of weights, denoted by $w'$, which contains trainable weights that correspond to the unknown quantities to be approximated. In more detail, $w'$ contains the scalar weight $w_{U_{in}}$, which is introduced for the approximation of the unknown velocity at the inlet. In addition, $w'$ contains a matrix of weights denoted by $w_b$, whose $i$-th column $w_b^i$ is the reduced pressure vector for the unknown pressure output data corresponding to $U_{in}$ at a fixed time instant. Hence, the number of additional weights is $201 \cdot N_p + 1$ (we assume that the number of data points for the unknown input is $N_T = 201$).
The PINN loss function $E(w, w')$ incorporates four different types of loss. The first one, $E_{data}(w, w')$, is the data-fitting loss for both the fully known input-output data and the unknown input and partially known output data. The second loss, $\sum_{i=1}^{3} E_i(w)$, corresponds to the reduced equation losses evaluated only at the known input samples. The third loss is the same as the second one but computed at the unknown input data points. The final loss is defined through a norm in $\mathbb{R}^{N_p}$ and penalizes the difference between the reduced pressure output of the PINN computed at the unknown input data points and the weights $w_b$. This last loss component ensures that the missing reduced pressure is recovered through the additional weights. It is worth mentioning that the initial value of $w_{U_{in}}$ is set to the mean of $U_{in}$ over the known data, while a zero matrix is used as a starting point for $w_b$.
The PINN used in this numerical test has 5 layers and 64 neurons per layer with mixed activations (tanh and SIREN). The Adam optimizer is used for solving the optimization problem. The PINN is run for $3 \times 10^5$ epochs; we show in Fig. 6 the evolution history of the inlet velocity weight during training. One can see that the PINN-approximated inlet velocity has converged to a value which is close to the true value of 1.25 m/s. In fact, the weight $w_{U_{in}}$ at the end of the training was 1.25029, which implies that the approximated Reynolds number is 5001.178 while the real one is 5000. The reduced order settings for the last result are $N_u = N_p = N_S = 15$. As for the reconstruction of the pressure fields, the additional matrix of weights $w_b$, which corresponds to the reduced pressure of the unknown input data, has been optimized in the PINN training process. Figure 7 depicts the time history of the first four reduced coefficients in the ROM approximation of the missing pressure fields, and it also shows the corresponding four coefficients obtained by the $L^2$ projection of the pressure data onto the pressure POD modes.
Fig. 7 The first four pressure reduced coefficients for the unknown input data
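The Reynolds numbers quoted above follow from $Re = U_{in} D / \nu$ with $D = 1$ m and $\nu = 2.5 \times 10^{-4}$ m$^2$/s. The short check below uses the weight as printed (1.25029); it gives a value close to the reported 5001.178, which presumably was computed from the unrounded weight.

```python
def reynolds_number(u_in, length, nu):
    """Re = U_in * L / nu."""
    return u_in * length / nu

NU = 2.5e-4        # physical viscosity in m^2/s
DIAMETER = 1.0     # cylinder diameter in m

re_true = reynolds_number(1.25, DIAMETER, NU)
re_pinn = reynolds_number(1.25029, DIAMETER, NU)
relative_error_percent = 100.0 * abs(re_pinn - re_true) / re_true
print(re_true, re_pinn, relative_error_percent)
```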
To assess the accuracy of the approximation of the missing pressure fields, we compute the relative pressure error defined in (48) between the inferred PINN pressure fields and the FOM ones for the 201 different snapshots. For the reduced model with $N_u = N_p = N_S = 15$, the values of the maximum and mean relative pressure error are 3.1396 % and 1.4457 %, respectively.
Besides solving the inverse problem, we are interested in having an accurate reconstruction of the time history of the cylinder drag and lift coefficients. These coefficients come from the fluid dynamics forces $F$ acting on the surface of the cylinder, which depend locally on the pressure and velocity fields as given in (53). If $F_l$ and $F_d$ are the force components acting on the surface of the cylinder in the lift and drag direction, respectively (the lift direction is perpendicular to the flow, while the drag direction is parallel to the flow), then the non-dimensionalized drag and lift force coefficients, denoted by $C_d$ and $C_l$, respectively, are given by (54), where $\rho$ is the fluid density and $A_{ref}$ is the reference area. The evaluation of the forces at the reduced level is done in a way that respects the full decoupling of the offline and the online stages; for more details we refer the reader to section 2.8 in [90]. The PINN is used to recover the lift and drag coefficients by performing a forward test for all time values at which the forces were recorded during the FOM simulation. Then the PINN forces approximation is used together with the PINN-inferred inlet velocity to obtain the reduced lift and drag coefficient signals. The results of this test are shown in Fig. 8. In order to have a quantitative evaluation of the accuracy of the lift and drag coefficients approximation, we compute the $L^2$ relative errors in (55), where $C_l(t)$ and $C_d(t)$ are the signal functions corresponding to the FOM lift and drag coefficients, respectively, $C_l^r(t)$ and $C_d^r(t)$ are the corresponding ROM signals, and $[T_1, T_2]$ is the time interval in which the error is sought. The error values for the reduced model with $N_u = N_p = N_S = 15$ are 0.8551 % and 0.5699 % for lift and drag, respectively.
This shows that the POD-Galerkin PINN ROM has been able to reconstruct important CFD performance indicators such as the lift and drag coefficients despite the uncertainty present in the problem.
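The post-processing of force signals into coefficients and relative errors can be sketched as below. The customary normalization by $\frac{1}{2} \rho U_{in}^2 A_{ref}$ and a trapezoidal approximation of the time integrals are assumed; they are meant to mirror (54) and (55) but are not copied from them, and all function names are hypothetical.

```python
import math

def force_coefficient(force, rho, u_in, a_ref):
    """Non-dimensionalized force: F / (0.5 * rho * U_in^2 * A_ref)."""
    return force / (0.5 * rho * u_in ** 2 * a_ref)

def l2_norm(signal, times):
    """Trapezoidal approximation of the L2 norm over [times[0], times[-1]]."""
    total = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        total += 0.5 * dt * (signal[i] ** 2 + signal[i + 1] ** 2)
    return math.sqrt(total)

def relative_signal_error(fom, rom, times):
    """Relative L2 error in percent between a FOM signal and a ROM signal."""
    difference = [f - r for f, r in zip(fom, rom)]
    return 100.0 * l2_norm(difference, times) / l2_norm(fom, times)

times = [0.0, 0.5, 1.0]
cd_fom = [2.0, 2.0, 2.0]
cd_rom = [1.98, 1.98, 1.98]          # ROM signal that is 1 % too small
error_percent = relative_signal_error(cd_fom, cd_rom, times)
print(force_coefficient(1.0, rho=1.0, u_in=2.0, a_ref=1.0), error_percent)
```

Because the coefficient scales with $1 / U_{in}^2$, an error in the inferred inlet velocity propagates directly into the reduced coefficients, which is why the PINN-inferred value is used in the normalization.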
In the last results, we have considered the presence of incomplete output data; nevertheless, the set of output data was known on the whole internal domain. Now we assume that we have only local data points given by the vector of forces acting on the cylinder in (53). This makes the inference problem more difficult to solve because of the locality of the data presented. In spite of that, the POD-Galerkin PINN ROM can be used to solve the inference problem using the data points of the forces. This is possible thanks to the physical (reduced) equations which correspond to (53) and which are then incorporated in the PINN loss function. We show the evolution of the PINN approximation of $U_{in}$ in Fig. 9 for this last test.

The flow around a surface mounted cube
The case considered in this subsection is the flow past a cubic obstacle. The problem is considered in a turbulent setting in three dimensions. Turbulence is modeled using the LES strategy; in particular, the SGS turbulence model used is the one-equation eddy viscosity model named "dynamicKEqn" in OpenFOAM. This model is proposed in [75] as a continuation of the SGS model presented in [74]. The computational grid used in the simulations is depicted in Fig. 10. In this numerical test, the physical viscosity $\nu$ is assumed to be unknown. The goal is to identify its value with a high degree of accuracy and to approximate the non-dimensionalized force coefficients coming from the lift and drag forces acting on the surface of the cubic obstacle.
The fluid problem is simulated for a timespan which is long enough to observe stable values of the time average of certain output quantities. These quantities include the mean and the Root Mean Squared (RMS) values of the non-dimensionalized force coefficients coming from the lift and drag forces acting on the surface of the cubic obstacle [see (53) and (54)].
This problem has been studied for the same value of the Reynolds number mentioned above in [93][94][95]. In the latter studies, RANS and LES simulations were carried out for the approximation of the values of the lift and drag coefficients of the cubic box. This problem is characterized by a chaotic turbulent response of the velocity and pressure field profiles. Thus, in order to have an accurate approximation of the mean drag and lift coefficients, at least 100 non-dimensionalized time units $t^*$ were simulated, where $t^* = \frac{t U_{in}}{L}$. Hence, the NSE are simulated for 100 s. The resulting graph of the drag coefficient time history in the build-up phase is shown in Fig. 11a. The mean value of the drag coefficient across the build-up phase is 1.4829. The RMS value of the mean-subtracted drag coefficient signal is 0.0648.
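The two statistics quoted for the drag signal can be computed as in the following sketch, which assumes uniformly sampled values and uses a short made-up signal rather than the simulation data.

```python
import math

def mean_and_fluctuation_rms(signal):
    """Mean of a uniformly sampled signal and RMS of its mean-subtracted part."""
    mean = sum(signal) / len(signal)
    rms = math.sqrt(sum((value - mean) ** 2 for value in signal) / len(signal))
    return mean, rms

# Hypothetical drag coefficient samples.
cd_mean, cd_rms = mean_and_fluctuation_rms([1.0, 2.0, 3.0])
print(cd_mean, cd_rms)
```

For a chaotic signal both statistics only stabilize once the averaging window is long enough, which motivates the long build-up phase.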
After completing the build-up phase, the simulation is resumed for another 50 s, during which snapshots are taken for the construction of the reduced order model. Snapshots are acquired every 0.25 s, which results in a total of 201 snapshots. The time history of the drag coefficient during the offline snapshots time window is depicted in Fig. 11b. The POD method is then applied to the snapshot matrices of velocity and pressure. Figure 12 shows the first two POD modes of the velocity and the pressure. After the computation of the POD modes of the velocity and pressure, one may solve the supremizer problems in order to obtain the supremizer modes, which are then added to the original velocity POD basis.
At this point, the computation of the output data of the PINNs is carried out. The penalty method is used for the enforcement of the inhomogeneous Dirichlet boundary condition at the inlet, therefore, the POD-Galerkin DAE that models this problem is the one reported in (35). The matrices and vectors which appear in the latter DAE are computed during the offline stage.
In order to approximate the value of the unknown physical viscosity, we put forth deep neural networks which have time as the input and the stacked vector $[a, b, h, c]$ as their output. The neural networks used in this test have 7 layers, each containing 200 neurons. The activation function used at each neuron is the hyperbolic tangent function. The Adam optimizer is used for training the neural networks with a decaying learning rate of initial value $1 \times 10^{-4}$, and the optimization is run for $5 \times 10^5$ training epochs. The loss function which has to be minimized during the training procedure of the PINNs combines the terms $E_{data}(w)$, $E_1(w)$, $E_2(w)$ and $E_3(w)$, which are defined as in (29), (30), (31) and (38), respectively. An additional trainable variable, which corresponds to the physical viscosity and is denoted by $\nu_{PINN}$, is added to the set of tunable weights $w$ of the PINNs. This additional trainable parameter of the neural network is present in the loss function through $E_1(w)$. Therefore, the Adam optimizer is able to compute the gradient of the total loss function $E(w)$ with respect to $\nu_{PINN}$ and, as a result, optimize its value. The initial value of $\nu_{PINN}$ is assumed to be 10
The PINN proposed in this work relies on approximating the nonlinear convective term as an additional auxiliary variable in the neural network structure. Another approach is based on transforming the nonlinear term into a quadratic form as done in [51]. The latter approach implies that one needs to compute a third order tensor $C$ [see (10)] whose dimension is $N_u + N_S$. The number of terms which have to be computed for each reduced setting is $(N_u + N_S)^3$. Consequently, the cost of computing this tensor for a large number of reduced modes is significant, even if it is an offline cost. For example, for the case of $N_u = N_S = 60$, the computation time is around 7.97 hours when using 24 CPUs, and it increases to 37.27 hours in the case of $N_u = N_S = 100$.
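The cubic growth of the tensor size can be checked against the two reported computation times. The sketch below only counts entries; that the cost is proportional to the number of entries is an assumption, and the function name is hypothetical.

```python
def convective_tensor_entries(n_u, n_s):
    """Number of entries of the third order tensor: (N_u + N_S)^3."""
    return (n_u + n_s) ** 3

entries_60 = convective_tensor_entries(60, 60)
entries_100 = convective_tensor_entries(100, 100)
entry_ratio = entries_100 / entries_60
time_ratio = 37.27 / 7.97            # ratio of the reported computation times in hours
print(entries_60, entries_100, round(entry_ratio, 2), round(time_ratio, 2))
```

The ratio of the entry counts is close to the ratio of the reported times, which is consistent with a cost that scales with the cube of the reduced velocity dimension.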
We would like to compare the accuracy of the PINN based on the latter approach in approximating the physical viscosity in the current test case with the one proposed here (the PINN based on incorporating the reduced convective term as part of the neural network outputs).
The results of the approximation conducted by the PINN proposed in this work for different numbers of reduced modes are shown in Fig. 13. Figure 14 presents the comparison of the results obtained by the two PINNs, which differ in the way the nonlinear term is approximated. The latter figure shows that the PINN based on the approximation of the nonlinear term as an additional reduced output has obtained an accurate approximation when 60 modes were used in the construction of the ROM for each reduced variable, where the relative error in approximating $\nu$ is as low as 1 %. On the other hand, the PINN based on the quadratic approximation of the nonlinear term has not yielded an accurate approximation, as it converged to a value of $7.0213 \times 10^{-5}$. The inaccuracy of the last result could be attributed to the effect of the linear reduction given by the POD on the approximation of nonlinear quantities at the reduced level. In the presented approach such an effect is excluded, since the reduced nonlinear term is considered an additional output of the neural network with labeled data available for the training.
The POD-Galerkin PINN ROM is also used for the approximation of the lift and drag coefficients of the cubic obstacle. The reduced force coefficients are then compared to the FOM ones, which are recorded during the full order simulation. In order to obtain the reduced forces, we perform a set of forward computations using the trained PINNs for the time values at which the FOM has recorded the forces. The number of time steps performed by the FOM solver during the acquisition of the offline snapshots is 28370. The results of the PINNs forward computations are the reduced velocity and reduced pressure vectors at these 28370 time instants. We remark that these computations are carried out in a very short computational time, yielding high speed-up factors. The results of the forward application of the PINNs are shown in Fig. 15, where the first and second coefficients of each reduced variable are plotted; the figure depicts both the $L^2$ projection coefficients and the ROM coefficients. Figure 15 shows that the PINNs forward predictions match the original $L^2$ projection coefficient curves to a high degree, despite the presence of an uncertain parameter in the ROM formulation. The reduced velocity and pressure outputs are used together with the reduced forces matrices, which were computed during the offline stage, to yield the reduced three dimensional forces acting on the surface of the box. Then, the reduced lift and drag coefficients are computed. In order to have a quantitative evaluation of the accuracy of the lift and drag coefficients approximation, we compute the $L^2$ relative errors [see (55)]. The errors are computed for different numbers of reduced modes; Fig. 18 plots the error values versus the number of modes used for the ROM construction.
The last plot shows that the error in approximating the drag coefficients reaches 0.3531 % when N u = N p = N S = 100 modes, while the lift coefficients error is 0.1523 % for the same setting.
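The error measure used for the force coefficients can be illustrated with a short sketch. It assumes that the relative error referred to in (55) has the usual discrete form $\|c_{FOM} - c_{ROM}\|_2 / \|c_{FOM}\|_2$ expressed in percent; this form is an assumption, since (55) is not reproduced here. The signals `cd_fom` and `cd_rom` are synthetic placeholders and are not data from the experiments.

```python
import math


def relative_l2_error_percent(reference, approximation):
    """Discrete relative L2 error, in percent, between two equally long time series."""
    if len(reference) != len(approximation):
        raise ValueError("both series must have the same length")
    diff_norm = math.sqrt(sum((r - a) ** 2 for r, a in zip(reference, approximation)))
    ref_norm = math.sqrt(sum(r ** 2 for r in reference))
    return 100.0 * diff_norm / ref_norm


# Synthetic drag-coefficient history (hypothetical values, for illustration only).
times = [0.01 * k for k in range(1000)]
cd_fom = [1.2 + 0.05 * math.sin(2.0 * math.pi * t) for t in times]

# Hypothetical reduced prediction: the reference scaled by a uniform 0.3 % overshoot.
cd_rom = [1.003 * c for c in cd_fom]

error = relative_l2_error_percent(cd_fom, cd_rom)
print(f"relative L2 error: {error:.4f} %")
```

Because the synthetic prediction is a uniform rescaling of the reference, the computed error is close to 0.3 % by construction, which makes the function easy to check before applying it to recorded FOM and ROM coefficient histories.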
Fig. 15 The results of the PINNs predictions for all reduced variables; the reduced order spaces are constructed with $N_u = N_p = N_S = 60$. The plots compare the reduced coefficients with the $L^2$ projection coefficients. The red dashed lines refer to the $L^2$ projection coefficients, while the blue dots correspond to the reduced coefficients obtained by the PINNs. (a) The first reduced coefficients for velocity, pressure, turbulent and convective terms compared to the ones obtained by the $L^2$ projection. (b) The second reduced coefficients for velocity, pressure, turbulent and convective terms compared to the ones obtained by the $L^2$ projection
We report a study of the computational time needed for each task involved in the POD-Galerkin PINN ROM. This study is summarized in Table 3 for different numbers of reduced modes. First, the computational time taken by the FOM solver for running the problem on 24 CPUs (CPU model "AMD EPYC 7302 16-Core Processor @ 1498MHz") is $T_{Off} = 11{,}460$ s, or approximately 3.2 hours. In the second column of Table 3, we report $T_{proj,DAE}$, which is the time required for projecting the equations and computing the DAE reduced vectors and matrices. The third column lists the time taken for the computation of the PINNs output data; this time is denoted by $T_{proj,data}$. We remark that the computational cost corresponding to $T_{proj,DAE}$ is also present in the case of the intrusive POD-Galerkin ROM, while the cost that corresponds to $T_{proj,data}$ is only present in non-intrusive or hybrid ROMs such as the POD-Galerkin PINN ROM. The most significant cost is the one reported in the fourth column, namely the time $T_{PINNs,T}$ taken for training the PINNs for $5 \times 10^{5}$ epochs on the Graphics Processing Unit (GPU). The fifth column of Table 3 details the cost of the forward computations carried out by the PINNs for the approximation of the forces, denoted by $T_{PINN,F}$ (GPU cost). Finally, the speed-up SU is recorded in the last column, where this value is calculated as
$$SU = \frac{T_{Off}}{T_{PINN,F} + T_{Onl,Forces}},$$
with $T_{Onl,Forces}$ being the time required for assembling the reduced forces in the online stage, whose value is about $(7\text{--}10) \times 10^{-5}$ s. As can be seen from this study, the use of deep neural networks in reduced order modeling results in a substantial increase in the offline cost, given in this setting by the time $T_{PINNs,T}$ needed for the PINNs training. However, the nature of the inverse problem at hand requires this training procedure for the approximation of the unknown parameter. In addition, the POD-Galerkin PINN ROM compensates for the high offline cost by returning a high speed-up (SU), which reaches values as high as $10^{6}$. These speed-up factors are not easily attainable in fully intrusive POD-Galerkin ROMs.
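The speed-up definition used in this study, $SU = T_{Off} / (T_{PINN,F} + T_{Onl,Forces})$, can be evaluated with a few lines of code. In the sketch below, $T_{Off}$ and the upper end of the reported range for $T_{Onl,Forces}$ are taken from the text, whereas the value of `t_pinn_forward` is a hypothetical placeholder chosen only to show the order of magnitude; it is not a measured entry of Table 3.

```python
T_OFF = 11460.0  # FOM time on 24 CPUs, in seconds (value reported in the text)


def speed_up(t_off: float, t_pinn_forward: float, t_online_forces: float) -> float:
    """Speed-up SU = T_Off / (T_PINN,F + T_Onl,Forces); all times in seconds."""
    return t_off / (t_pinn_forward + t_online_forces)


# Upper end of the reported range for assembling the reduced forces online.
t_online_forces = 1.0e-4

# HYPOTHETICAL forward-evaluation time of the PINNs (assumption, not from Table 3).
t_pinn_forward = 1.0e-2

su = speed_up(T_OFF, t_pinn_forward, t_online_forces)
print(f"SU = {su:.3e}")
```

With a forward-evaluation time of this assumed order, the denominator is dominated by $T_{PINN,F}$ and the speed-up is of order $10^{6}$, in line with the magnitude quoted in the text.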
To conclude, the POD-Galerkin PINN ROM has provided accurate approximations for parameter estimation problems. At the same time, the presence of an unknown parameter has not affected the quality of the approximation of important outputs such as the lift and drag forces acting on the surface of the box. However, the study of the quality of the PINNs approximation in the forward and inverse cases illustrates that the inverse approximation degrades only slightly when as few as 10-20 modes are used for each reduced variable. Unlike the inverse case, the forward approximation of the lift and drag coefficients needs a substantially larger number of reduced modes.
Table 3 The computational time taken by the different tasks needed for the implementation of the POD-Galerkin PINN ROM: (i) the second column reports the time taken for performing the projection of the equations (computing the reduced quantities in the reduced DAE system) in a parallel setting using 24 processors, (ii) the third column details the CPU time (also using 24 processors in parallel) consumed for the computation of the PINNs output data, represented by the $L^2$ projection coefficients of the different variables. The speed-up is $SU = T_{Off} / (T_{PINN,F} + T_{Onl,Forces})$, where $T_{Off} = 11{,}460$ s is the CPU time needed for simulating the FOM for the snapshots time window using 24 processors in parallel, and $T_{Onl,Forces}$ is the time consumed for assembling the reduced forces in the online stage, which has taken around $(7\text{--}10) \times 10^{-5}$ s in the current experiments. The table shows the computational costs for different numbers of reduced modes
Column headers: $N_u = N_p = N_S$ | $T_{proj,DAE}$ in s | $T_{proj,data}$ in s | $T_{PINNs,T}$ in min | $T_{PINN,F}$ | SU

Conclusions
We have presented a reduced order model which is designed to learn unknown input parameters or physical quantities for fluid problems governed by the Navier-Stokes equations.
The proposed model employs the POD for the generation of the reduced order space and then utilizes Galerkin projection for the construction of the reduced order system. The solution of the reduced order system is obtained by the use of physics-informed neural networks (PINNs). The PINNs have time and/or parameters as input, while their output is the combined vector of the reduced velocity, pressure, turbulent and convective terms. The training of the PINNs is carried out by minimizing a loss function which combines the data loss and the reduced equations losses. Unknown physical parameters which appear in the POD-Galerkin DAE can be approximated by the PINNs by introducing them as additional trainable weights. The PINNs optimizer is then used to compute the gradient of the loss function with respect to each additional trainable weight and consequently optimize its value.
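The mechanism of treating an unknown physical coefficient as an additional trainable weight can be illustrated on a deliberately small toy problem. The sketch below is not the POD-Galerkin DAE of this work: it assumes a single hypothetical reduced equation $\dot{a} + \nu K a = 0$, uses exact synthetic data in place of neural network outputs, and trains only the scalar `nu` with a hand-written textbook Adam update. The names `NU_TRUE`, `K`, `loss_and_grad` and `train_nu` are introduced here for illustration only.

```python
import math

NU_TRUE = 0.5  # hypothetical coefficient, used only to generate the synthetic data
K = 2.0        # hypothetical reduced diffusion coefficient of the toy equation

# Synthetic "snapshot" data for the toy reduced equation da/dt + nu * K * a = 0.
times = [i / 49 for i in range(50)]
a_data = [math.exp(-NU_TRUE * K * t) for t in times]
dadt_data = [-NU_TRUE * K * a for a in a_data]


def loss_and_grad(nu):
    """Mean squared equation residual and its derivative with respect to nu."""
    n = len(a_data)
    residuals = [d + nu * K * a for d, a in zip(dadt_data, a_data)]
    loss = sum(r * r for r in residuals) / n
    grad = sum(2.0 * r * K * a for r, a in zip(residuals, a_data)) / n
    return loss, grad


def train_nu(nu0, steps=2000, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """Optimize the single trainable parameter nu with a minimal Adam update."""
    nu, m, v = nu0, 0.0, 0.0
    for t in range(1, steps + 1):
        _, g = loss_and_grad(nu)
        m = beta1 * m + (1.0 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)          # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        nu -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return nu


nu_pinn = train_nu(nu0=0.1)
print(f"estimated nu = {nu_pinn:.6f}, value used to generate the data = {NU_TRUE}")
```

In an actual PINN the reduced coefficients and their time derivatives would come from the network and automatic differentiation, and the network weights would be updated together with the extra parameter; the toy example isolates only the part of the procedure in which the optimizer drives the equation residual to zero by adjusting the unknown coefficient.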
The proposed POD-Galerkin PINN ROM has proven accurate in solving inverse problems involving unknown physical quantities such as the physical viscosity. At the same time, this ROM is able to reconstruct fluid dynamics outputs with a high degree of accuracy despite the input uncertainty. Three test cases have been used for the validation of the proposed model. The first case is the steady flow past a backward step, the second is a circular cylinder immersed in a horizontal flow, and the last is the unsteady flow past a surface mounted cube. The steady case was considered in a laminar setting, while the unsteady cases are turbulent, with Reynolds number up to Re = 40,000. The POD-Galerkin PINN ROM has given very promising results both in the approximation of the unknown parameters and in the prediction of the fluid dynamics outputs, for the non-turbulent and the turbulent cases alike.
The approach presented in this work is useful in several circumstances, such as when parameterized data is available but incomplete or only partially known. The approach can also be used for the inference of unknown constant quantities, as in PDE identification tasks where the physical constants that define the PDE operator are not known. The approach may become limited when the data is available for only one setting/configuration (non-parameterized case) and/or when the unknown variable is a spatial field for which no data is available. We also highlight that the approach requires the computation of the turbulent and nonlinear terms from each collection of fluid field snapshots. We understand that this last requirement could be difficult to meet in certain settings.