A DeepONet multi-fidelity approach for residual learning in reduced order modeling

In the present work, we introduce a novel approach to enhance the precision of reduced order models by exploiting a multi-fidelity perspective and DeepONets. Reduced models provide a real-time numerical approximation by simplifying the original model. The error introduced by such a simplification is usually neglected, sacrificed in order to reach a fast computation. We propose to couple model reduction with machine learning residual learning, such that the above-mentioned error can be learned by a neural network and inferred for new predictions. We emphasize that the framework maximizes the exploitation of the high-fidelity information, using it both for building the reduced order model and for learning the residual. In this work, we explore the integration of proper orthogonal decomposition (POD), and gappy POD for sensor data, with the recent DeepONet architecture. Numerical investigations for a parametric benchmark function and a nonlinear parametric Navier-Stokes problem are presented.


Introduction
Multi-fidelity (MF) methods emerged as a solution to deal with complex models, which usually need a high computational budget to be solved [58]. Such a framework aims to exploit not only the so-called high-fidelity information, but also the response of low-fidelity models, in order to increase the accuracy of the prediction. This feature plays a fundamental role especially for outer-loop applications, such as uncertainty propagation and optimization, since it allows one to achieve good accuracy without requiring the evaluation of the (typically expensive) high-fidelity model at every iteration. Thus, its employment is widespread for optimization purposes, and among all the contributions in the literature we highlight the successful applications to naval engineering problems [13,12,69], to modeling with multiple fidelities [28], and in the presence of uncertainty [54]. All these cases, as well as many others, build the correlation between the different fidelities by means of Gaussian process regression (GPR). Another approach, based on nonlinear autoregressive schemes, is described in [59,62], whereas in [64] a possible extension to high-dimensional parameter spaces is investigated. Recently, an alternative to such a probabilistic framework has been offered by neural networks, where the mapping between the low-fidelity model and the high-fidelity one is learned by the network during the training procedure [79,51,33]. Among the different types of architecture, DeepONet [44,41] has been proposed to approximate operators, and it has been successfully applied to MF problems in [45,37]. It has also been successfully used to create a fast PDE-constrained optimization method in [73]. Another type of architecture that has been successfully applied to multi-fidelity data is the Bayesian neural network [50], resulting in a framework robust to noisy measurements. We also highlight the employment of multi-fidelity techniques for uncertainty quantification: we cite [35,34] for a Bayesian framework capable of dealing with model discrepancy using different fidelities, whereas we refer to [27] for an analysis of the trade-off between high- and low-fidelity data in a Monte Carlo estimation.
Reduced order modeling (ROM) [10,19,65] is a family of methods that aims at reducing the computational burden of evaluating complex models. Instead of combining data from heterogeneous models, ROM builds a simplified model, typically from some high-fidelity information. Also in this case, the capabilities of ROM have led to its diffusion in several industrial contexts [52,69,67], especially for optimization tasks [11,3,78,70,25,22] or inverse problems [31,38]. In the ROM community, proper orthogonal decomposition (POD) is one of the most employed methods to build the reduced model [60,61,7,8,9]. Given a limited set of high-fidelity data, POD is able to compute the reduced space of an arbitrary rank which optimally (in a least-squares sense) represents the data. In recent years, its diffusion has led to several variants, including shifted POD [56,63], weighted POD [18,71], and gappy POD [26,75,17,46]. The latter exploits a compressive sensing approach [14,16,40] in order to use only a few pieces of information at certain locations of the domain (sensors) to compute the approximation. A generalization of gappy POD can be found in [1], where linear stochastic estimation allows the reconstruction of the linear map between the available data and the system state via an l2 minimization. A novel approach, where such a relation between sensor data and the reduced state is approximated in a nonlinear way by employing neural networks, can be found in [53].
In the present contribution, we explore the possibility of coupling these two methodologies, MF and ROM, to enhance the accuracy of the model. ROM indeed creates a simplified model from a few high-fidelity data. Such an approximation can be considered the low-fidelity model, because of the projection error introduced by the ROM. In this context, MF could be adopted in order to find the correlation between the original model and the ROM one, resulting in a more precise prediction. We can therefore exploit the collected high-fidelity data twice: initially, they are used to build the reduced model, then again during the computation of the MF relation. From this point of view, the proposed improvement does not need any additional high-fidelity evaluations. Here we take into consideration the POD with interpolation or the gappy POD as low-fidelity modeling techniques, and the DeepONet to learn the residual. POD with interpolation [74,68,29,24] is applied here for a completely data-driven approach, while gappy POD is used in order to make the pipeline applicable even to sensor data. The framework then aims to exploit the capability of POD models for linear prediction, adding the nonlinear term through the DeepONet, which can be viewed as a data-driven closure model. See [76] for another data-driven modeling approach to close ROMs, while for other recent works that propose nonlinear model order reduction we cite [4,2,39,66,30,49,42].
The manuscript is organized as follows. In Section 2 we present the end-to-end numerical pipeline, with a focus on POD, gappy POD, and DeepONets. We continue in Section 3 by showing the numerical experiments, and finally we conclude with Section 4 by summarizing the results and drawing some future extensions.

Methods
This section is devoted to presenting the numerical methods used within the proposed approximation scheme, together with the methods used for comparison. We first describe their integration in order to provide a global overview, then we discuss the algorithmic details in the following sections.
Proper orthogonal decomposition (POD) is a widespread technique providing a linear model order reduction, particularly suited to deal with parametric problems [65,48,20]. Such a representation is computationally very cheap to obtain; however, it suffers from the linear limitations of POD, which may decrease its accuracy, especially when dealing with nonlinear problems.
We are interested in efficiently computing a parametric field u(µ) with u : P → V, where P is the parametric space and V a generic norm-equipped vector space with dim(V) = n. POD-based ROMs compute the approximation u_POD(µ) such that

u(µ) = u_POD(µ) + r(µ),

where r : P → V is the projection error introduced by the model order reduction, which we assume here to be dependent on the parameter. In a classical POD framework, this residual r is usually neglected, due to its marginal contribution. In the present contribution, we aim instead to learn it by means of machine learning techniques, in order to improve the accuracy of the final prediction. Artificial neural networks (ANNs) can be used to model it, thanks to their general approximation capabilities, learning it from the snapshots already pre-computed to build the ROM. In particular, dealing with parametric problems, we exploit the DeepONet architecture to learn the residual. The low computational cost of DeepONet inference enables a nonlinear but still real-time improvement of the POD model, at the cost of an additional training during the offline phase.
The only input needed by the proposed methodology is the database of numerical solutions {µ_i, u(µ_i)}, i = 1, ..., N, computed by sampling the parameter space and exploiting any consolidated discretization method (e.g. the finite element or finite volume method). These snapshots are combined in order to find the POD space, which can be used for intrusive or non-intrusive ROM. We explore in this contribution only the non-intrusive (data-driven) approach, while future works will study the application to POD-Galerkin contexts. We investigate two options for the non-intrusive ROM:
• POD with radial basis function (POD-RBF) interpolation, which enables the prediction of new solutions (for new parameters) by means of the above-mentioned interpolation technique. In this case, the ROM takes as input the actual parameter, providing as output the approximated solution.
• Gappy POD, which allows us to compute the approximated solution by providing only some sensor data, thanks to a compressive sensing strategy.
Once the ROM is built, we can exploit it to compute the low-fidelity representation of the original snapshots by passing the corresponding parameters (or sensor data). The high-fidelity and low-fidelity databases are then used to learn the difference between them through the DeepONet network, with the final aim of generalizing such a residual even to unseen parameters and improving the final prediction. It is important to note that typically the space V is obtained by discretizing a generic R^d space. Depending on the complexity of the equation to solve and on the target accuracy, this kind of space can exhibit a high number of degrees of freedom.
Approximating the error over such a high-dimensional space with a neural network leads to two major issues: i) the number of neurons in the last layer equals the number of degrees of freedom of the space V, resulting in a model too large to treat; ii) the parameter-to-error relation becomes too complex to be efficiently learned. Thus we extract the spatial coordinates of the degrees of freedom of V. Since we know the error (the difference between the original snapshots and the POD predictions) at any of these coordinates, we can arrange the data in the format {(x_i, µ_j, r(µ_j)_i) | x_i ∈ V ⊂ R^d, µ_j ∈ P, r(µ_j)_i ∈ R}, where i = 1, ..., n and j = 1, ..., N, to isolate the spatial and parametric dependency of the error. We can use such a dataset to learn the scalar error r_net : R^d × P → R given the parametric and spatial coordinates. In this way, the network maintains a limited number of output dimensions, improving the identification of recurrent spatial patterns. The loss function minimized during the training procedure is then

L = (1/N) Σ_{j=1}^N ||r(µ_j) − r_NN(µ_j)||²,

where r_NN(µ) is not the high-dimensional output of a single network evaluation, but the array containing the result of the network inference at all the spatial coordinates belonging to the discretized space, such that r_NN(µ) ≡ [r_net(x_1, µ), r_net(x_2, µ), ..., r_net(x_n, µ)], with x_i ∈ R^d for i = 1, ..., n. In the case of gappy POD, it is important to note that the DeepONet takes as input the sensor data and not the actual parameters. We emphasize that the DeepONet training does not need any additional high-fidelity solutions besides those already collected for the POD space construction. For the prediction of solutions at new parameters, the non-intrusive POD model and the DeepONet are finally queried, as sketched in Figure 1. POD returns the low-fidelity (linear) approximation given the test parameter or sensor data, while the neural network returns the nonlinear residual. In some sense, this pipeline aims to exploit the advantages of the consolidated POD model, while improving it by adding a nonlinear term, so it can also be seen as a closure model.
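The data arrangement above can be sketched as follows. This is a minimal, self-contained illustration with toy snapshots and hypothetical sizes (none of the variable names come from the actual implementation): a rank-2 POD reconstruction plays the role of the low-fidelity model, and the residual snapshots are flattened into (x_i, µ_j, r(µ_j)_i) triples ready for a coordinate-wise residual network.

```python
import numpy as np

# toy setup: n spatial dofs with coordinates x, N parametric samples mu
n, N = 50, 6
x = np.linspace(0, 1, n).reshape(-1, 1)     # spatial coordinates, d = 1
mu = np.linspace(0, 2, N).reshape(-1, 1)    # parameters, p = 1

U = np.sin(np.pi * x * (1 + mu.T))          # high-fidelity snapshots (n x N)
Psi = np.linalg.svd(U, full_matrices=False)[0][:, :2]
U_pod = Psi @ (Psi.T @ U)                   # rank-2 POD reconstruction (low fidelity)
R = U - U_pod                               # residual snapshots to be learned

# flatten into {(x_i, mu_j, r(mu_j)_i)} triples for the residual network
X = np.repeat(x, N, axis=0)                 # (n*N) x d spatial inputs
M = np.tile(mu, (n, 1))                     # (n*N) x p parametric inputs
r = R.reshape(-1, 1)                        # (n*N) x 1 scalar targets
```

Each row (X[k], M[k], r[k]) is one training sample for the scalar map r_net, which keeps the network output one-dimensional regardless of n.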

Proper Orthogonal Decomposition for low-fidelity modeling
POD is a consolidated technique widely used for model order reduction. In this section, we briefly introduce how to compute the POD modes, and we devote Section 2.1.1 to presenting the gappy POD variant in detail.
The method consists of computing the optimal reduced basis to represent the parametric solution manifold through a linear projection. Let u_i ∈ R^n be the discrete solution corresponding to the i-th parameter, and U = [u_1, ..., u_N] ∈ R^{n×N} be the snapshots matrix, whose columns are the solution vectors. We want to find a linear approximation such that

u_i ≈ Σ_{k=1}^r a_i^k ψ_k,

where ψ_k ∈ R^n are the vectors comprising the reduced basis, the so-called modes, and a_i := (a_i^1, a_i^2, ..., a_i^r) ∈ R^r are the coordinates of the corresponding solution at the reduced level, called modal coefficients or latent variables. These reduced variables are obtained by projecting the solution snapshots onto the modes. The POD modes can be obtained from the matrix U in different ways: by computing its singular value decomposition (SVD), or by decomposing its correlation matrix [72]. Moreover, every mode has a corresponding singular value, which represents its energetic contribution. By arranging the modes in decreasing order (with respect to the singular values), we can express the original system with a hierarchical basis, from which we can discard the less meaningful modes. The energy criterion based on the singular values decay reads

Σ_{j=1}^r σ_j² / Σ_{j=1}^N σ_j² ≥ ϵ,

where σ_j is the j-th singular value, and ϵ is a tolerance, usually set ≥ 9.9 × 10^−1. In other words, by providing some samples of the solution manifold, POD is able to detect correlations between the data and reduce the dimensionality of these discrete solutions. This approach thus becomes a fundamental tool for solving parametric partial differential equations (PDEs) in a many-query context, mainly due to the high-dimensional discrete spaces involved.
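The construction above can be sketched in a few lines: the modes come from the SVD of the snapshots matrix and the rank is selected with the energy criterion. This is a minimal illustration on toy data; the function name and the snapshot set are ours, not the paper's.

```python
import numpy as np

def pod_modes(U, eps=0.99):
    """Return the POD modes of the snapshots matrix U (n x N), keeping the
    smallest rank whose cumulative squared-singular-value energy reaches eps."""
    Psi_full, sigma, _ = np.linalg.svd(U, full_matrices=False)
    energy = np.cumsum(sigma**2) / np.sum(sigma**2)
    rank = int(np.searchsorted(energy, eps)) + 1   # smallest r with energy >= eps
    return Psi_full[:, :rank], sigma[:rank]

# toy snapshots: 100 spatial dofs, 10 parametric samples (illustrative data)
x = np.linspace(0, 1, 100)
U = np.stack([np.sin(np.pi * x * (1 + 0.1 * k)) for k in range(10)], axis=1)

Psi, sigma = pod_modes(U, eps=0.999)
a = Psi.T @ U        # modal coefficients, one column per snapshot
U_pod = Psi @ a      # rank-r linear reconstruction
```

By construction the discarded energy bounds the reconstruction error: the relative Frobenius error of U_pod is at most sqrt(1 − ϵ).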
The POD space can be exploited in a Galerkin framework, by projecting the differential operators, or in a data-driven fashion, by coupling it with an interpolation (or regression) technique. In this case, the database of reduced snapshots {µ_i, a_i}, i = 1, ..., N, is used as input to build the mapping I : P → R^r such that I(µ_i) = a_i for i = 1, ..., N, which is used for interpolating the modal coefficients at any new parameter. Depending on the chosen regression technique, the equality may not hold exactly, and we have I(µ_i) ≈ a_i. Finally, exploiting such a mapping, we have the possibility to query for the modal coefficients at any test parameter belonging to the space P, and finally exploit the POD modes to map back the approximated solution to the original high-dimensional space.
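The interpolation step can be sketched as follows, here with a hand-rolled Gaussian RBF standing in for the RBF implementation actually used in the experiments (the toy database, function names, and kernel width are all illustrative assumptions):

```python
import numpy as np

def rbf_fit(params, coeffs, gamma=50.0):
    """Fit a Gaussian-RBF interpolant from parameters (N x p)
    to modal coefficients (N x r)."""
    d2 = np.sum((params[:, None, :] - params[None, :, :]) ** 2, axis=-1)
    weights = np.linalg.solve(np.exp(-gamma * d2), coeffs)
    return params, weights, gamma

def rbf_eval(model, mu):
    """Evaluate the interpolant at new parameters mu (M x p)."""
    centers, weights, gamma = model
    d2 = np.sum((mu[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2) @ weights

# toy database of reduced snapshots: scalar parameter, r = 2 modal coefficients
mu_train = np.linspace(0, 1, 8).reshape(-1, 1)
a_train = np.hstack([np.sin(2 * mu_train), np.cos(2 * mu_train)])

model = rbf_fit(mu_train, a_train)
a_new = rbf_eval(model, np.array([[0.35]]))  # coefficients at an unseen parameter
```

Since the kernel matrix is solved exactly, the interpolant reproduces the training coefficients, i.e. I(µ_i) = a_i; a regression variant would relax this equality.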

Gappy POD for sensors data
The main assumption for using gappy POD is that we have access only to some sensor data. These sensors are placed at specific locations, given by the projection matrix, or point measurement matrix, C ∈ R^{c×n}, with c ≪ n, which contains 1 at the measurement locations and 0 elsewhere. Using the canonical basis vectors e_i of R^n, it takes the form C = [e_{γ_1}, e_{γ_2}, ..., e_{γ_c}]^T for some indices γ_1, ..., γ_c ∈ {1, ..., n}. The measurements ũ* ∈ R^c of a generic full state vector u* ∈ R^n are thus given by ũ* = Cu*.
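For example, with hypothetical sizes and sensor indices, the measurement matrix can be assembled by stacking canonical basis vectors:

```python
import numpy as np

n = 10                              # hypothetical number of degrees of freedom
sensor_idx = [2, 5, 7]              # hypothetical sensor locations (c = 3)
C = np.eye(n)[sensor_idx]           # rows of C are canonical basis vectors e_i

u = np.arange(n, dtype=float)       # a generic full state u* in R^n
u_tilde = C @ u                     # its measurements: u_tilde = C u*
# u_tilde -> array([2., 5., 7.]), i.e. u evaluated at the sensor indices
```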
If we now consider a parametric framework, we can collect the parameter-solution snapshot pairs {µ_i, u_i}, i = 1, ..., N, where µ_i ∈ P ⊂ R^p and u_i ∈ R^n is the corresponding full state. We arrange the snapshots by column in the matrix U = [u_1, ..., u_N] ∈ R^{n×N}. We take the rank-r SVD of the snapshots matrix U and compute the POD modes Ψ_r, so we can project the full states to their low-rank representation a ∈ R^{r×N} as a = Ψ_r^T U. In the classical POD setting, where we deal with the full snapshots, we would just use the modal coefficients matrix a to describe the solution manifold. For the gappy POD, instead, we have to consider the point measurement matrix. So, putting everything together, we have Ũ ≈ CΨ_r a, where Ũ is the matrix containing the sensor measurements {ũ_i}, i = 1, ..., N, arranged by columns, as done for the snapshots matrix. For a generic snapshot ũ_i we have ũ_i ≈ Σ_{k=1}^r a_i^k Cψ_k, where ψ_k are the columns of Ψ_r, and a_i^k are the modal coefficients, i.e. the entries of the i-th column of a. A possible way to find the modal coefficients is to minimize the residual in a least-squares sense, using the L2 norm restricted to the sensor locations [15]:

a_i = argmin_{a ∈ R^r} ||ũ_i − CΨ_r a||_2.

There are many ways in the literature to select the locations of the sensors: optimal sensor locations that improve the condition number of CΨ [75,47], which are robust to sensor noise, the sample maximal variance positions [77], or using the information contained in secant vectors between data points [55], for example. In this work, we use the sparse sensor placement optimization for reconstruction described in detail in [47] and implemented in PySensors [21]. The main idea is to find the matrix C that minimizes the reconstruction error obtained with the modes Ψ_r, namely

C = argmin_C ||U − Ψ_r (CΨ_r)† CU||,

where the symbol † stands for the Moore-Penrose pseudoinverse.
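The whole gappy reconstruction can be sketched compactly under simplifying assumptions: synthetic snapshots lying exactly on a low-dimensional linear manifold, and hand-picked sensor indices in place of the optimized placement of [47]. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, r = 200, 15, 4
x = np.linspace(0, 1, n)

# synthetic snapshots lying exactly on a 4-dimensional linear manifold
modes = np.stack([np.sin((k + 1) * np.pi * x) for k in range(r)], axis=1)  # n x r
U = modes @ rng.normal(size=(r, N))                                        # n x N

Psi_r = np.linalg.svd(U, full_matrices=False)[0][:, :r]   # POD modes

idx = [10, 40, 80, 120, 150, 170, 185, 195]   # hand-picked sensors (in practice
C = np.eye(n)[idx]                            # chosen by optimization, cf. [47])

u_true = U[:, 3]
u_meas = C @ u_true                           # only the sensor values are known
# least-squares modal coefficients from the gappy measurements
a_hat, *_ = np.linalg.lstsq(C @ Psi_r, u_meas, rcond=None)
u_rec = Psi_r @ a_hat                         # full-state reconstruction
```

Since the state lies in the span of Ψ_r and CΨ_r has full column rank here, the least-squares solve recovers the full state from only c = 8 of the n = 200 entries.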

DeepONet for residual learning
DeepONet [44] is a neural network architecture able to learn nonlinear operators. Referring to the original work for all the details, we emphasize that its architecture is composed of two separate networks, whose final outputs are multiplied to obtain the final DeepONet outcome. The two networks, called trunk and branch, can be any available architecture (e.g. convolutional or graph networks); in this work we consider feedforward networks (FFNs). The networks are trained simultaneously during the learning loop: the input is indeed divided into two independent components, x ∈ R^{N_x} and y ∈ R^{N_y}, which feed the two networks NN_x and NN_y, respectively. The outputs NN_x(x), NN_y(y) ∈ R^{N_p} are finally multiplied to approximate the operator G:

G(x)(y) ≈ Σ_{k=1}^{N_p} NN_x(x)_k NN_y(y)_k = NN_x(x) · NN_y(y).

We underline that the choice of the two networks must satisfy a dimensional constraint: they have to produce outputs with the same number of components, so that it is possible to compute their inner product. The scheme in Figure 2 graphically summarizes the structure of the DeepONet. In this work we adopt it to approximate the residual R(x)(µ) = u(x, µ) − u_POD(x, µ) in a multi-fidelity approach. We can think of the mapping between the low-fidelity model (the POD/gappy POD) and the high-fidelity one as a parametric operator R(x)(µ). This operator is numerically approximated by means of the DeepONet, using as dataset the low- and high-fidelity databases already computed. This architecture has demonstrated a great capability in mitigating overfitting issues [44], allowing it to generalize the residual even with a limited set of information.
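The inner-product structure of the DeepONet outcome can be sketched with untrained, randomly initialized feedforward networks. This is purely illustrative (layer sizes loosely follow the ones used later in the experiments, and the helper `ffn` is ours); a real implementation would of course train the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn(sizes):
    """A random feedforward network (tanh), used only to illustrate shapes."""
    Ws = [rng.normal(size=(a, b)) / np.sqrt(a) for a, b in zip(sizes[:-1], sizes[1:])]
    def forward(z):
        for W in Ws[:-1]:
            z = np.tanh(z @ W)
        return z @ Ws[-1]       # linear last layer
    return forward

Np = 30                          # shared width of the two final layers
NN_x = ffn([2, 30, 30, Np])      # network fed with x (here, spatial coordinates)
NN_y = ffn([1, 30, 30, Np])      # network fed with y (here, the parameter)

def deeponet(x, y):
    # the DeepONet outcome is the inner product of the two network outputs
    return np.sum(NN_x(x) * NN_y(y), axis=-1)

x = rng.normal(size=(5, 2))      # 5 spatial inputs
y = rng.normal(size=(5, 1))      # 5 matching parametric inputs
out = deeponet(x, y)             # one scalar per input pair, shape (5,)
```

The dimensional constraint discussed above shows up as the shared width Np: if the two final layers differed, the elementwise product would fail.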

Numerical results
In this section we present the numerical results obtained by applying the proposed numerical framework to a simple algebraic problem and to a Navier-Stokes problem in a 2D domain. We compare the proposed method with the POD model, the gappy POD model, and with the pure deep learning approach using DeepONet, aiming for a fair comparison with two state-of-the-art techniques for (linear and nonlinear) data-driven modeling. All the computations are performed using PyTorch [57] for the artificial neural networks and EZyRB [23] for the POD with interpolation and the gappy POD calculations. To solve the Navier-Stokes equations with the finite element method we use FEniCS [43].

Algebraic parametric problem
The first test case is a simple benchmark problem inspired by [6], based on a high-fidelity parametric function f_H : Ω × P → R. The first step is to compute the function value at some points in order to build the high-fidelity database. We use different sampling strategies for the spatial and parametric domains:
• we collect n = 500 equispaced samples {x_i^s}, i = 1, ..., n, in Ω;
• we collect 36 samples using latin hypercube sampling, plus 4 additional samples at the corners of the domain, for a total of N = 40 points {µ_i^s}, i = 1, ..., N, in P.
We thus compose the snapshots matrix, varying the parametric coordinates along the columns. Regarding the residual learning, we use a DeepONet model structured as follows: the spatial network (branch) is composed of 2 hidden layers of 30 neurons each, with the softplus activation function, which is the smooth version of the Rectified Linear Unit (ReLU) [32]; the parametric network (trunk) has 2 hidden layers with 30 neurons and the softplus function. The output layer has 30 neurons for both networks, without applying any additional function at this layer. The learning rate is 5 × 10^−3 and the L2-regularization factor is 1 × 10^−4.
We propose a comparison between the MF approach, POD, and DeepONet in terms of accuracy on test parameters, with a fixed input database of solutions. We use different POD spaces in the comparison by selecting an increasing energetic threshold for the modes selection, aiming to analyze the difference in the error by varying the accuracy of the original POD model before it gets improved by the proposed technique, which in the following we refer to as MFDeepONet. We emphasize that no preprocessing or data centering is performed on the snapshots matrix, resulting in the first mode representing a large amount of energy; this corresponds to the minimal tolerance (0.99) in the experiments below. Regarding the DeepONet architecture, we employ the one described above also to learn the target function without the MF setting, such that the network learns the actual unknown field instead of the residual. In this way, we want to investigate the benefit of using the two methodologies (POD and DeepONet) in a multi-fidelity fashion instead of only separately. We measure the relative error on an equispaced grid of 20 × 20 parametric points.
POD with energy threshold 0.99. For the POD model, we select an energy threshold ϵ = 9.9 × 10^−1, corresponding to r = 1 mode, and radial basis function (RBF) interpolation to approximate the map between the parameters and the latent variables. The training for DeepONet and MFDeepONet lasts 10^4 epochs. Figure 3 shows a quantitative comparison of the three investigated techniques, presenting the relative error over the whole parametric domain, the high-fidelity samples, and the error distribution. The last plot (bottom right corner) graphically shows which technique performs best at each of the tested parameters.
In this experiment, the proposed methodology outperforms both POD and DeepONet. The relative error distribution suggests that mixing the techniques helps in terms of accuracy. Indeed, even if the error shows a greater variance, the MFDeepONet is able on average to achieve the best precision among the tested methods, resulting in the best approach over almost all the parametric domain. We can also note that a direct correlation between the sample locations and the error distribution is not visible, confirming the DeepONet capabilities in terms of generalization and making the proposed framework effective also during the testing phase.

POD with energy threshold 0.999. In this experiment, we replicate the previous settings, with the exception of a new energy threshold for the POD modes and a higher number of epochs for the machine learning models (DeepONet and MFDeepONet). Here we increase the threshold to ϵ = 9.99 × 10^−1 (r = 6 modes), addressing a more accurate original model and balancing it with a longer training.
Figure 4 illustrates the error obtained after a training of 2 × 10^4 epochs. The results of the previous experiment are confirmed, even if with a lower overall benefit. The error distribution in the parametric space illustrates again how the MF enhancement combines the original methods: the regions of the parametric space where each method works better are merged by MFDeepONet, resulting in a globally more accurate model. However, using a more precise POD model (as low-fidelity) reduces the benefits of the MF approach, even with the higher number of epochs.
Gappy POD. Here we propose the same experiments as before, this time in a sensor-data scenario. We use 5 sensor locations and a rank truncation equal to 10. The involved neural networks are trained in this case for 5 × 10^4 epochs.
Figure 5 summarizes the accuracy of the three tested methods: gappy POD, DeepONet, and MFDeepONet. The error distribution demonstrates that the multi-fidelity approach performs statistically better than the other methods. Looking at the competition between the techniques, we can also note that the multi-fidelity approach reaches the best accuracy over almost the whole parametric domain, even if the precision decreases at the boundaries. Such an issue could be mitigated by exploiting a better sampling strategy for the high-fidelity data.
The plots in Figure 6 provide the comparison in the spatial domain at four test parameters. The statistical results are confirmed in these examples, with the multi-fidelity approach able to predict most of the oscillations that the target function exhibits, unlike the single-fidelity approaches.

Navier Stokes problem
In the second numerical experiment, we test the accuracy of the proposed method for solving a parametric nonlinear PDE: the incompressible Navier-Stokes equations on a 2D domain. The numerical setting is inspired by [5].
We define the parametric vector field u : Ω × P → R² and the parametric scalar field p : Ω × P → R as the solution of the incompressible Navier-Stokes equations, where x = (x_0, x_1) ∈ Ω ⊂ R² and µ ∈ P = [1, 80]. The L-shaped spatial domain Ω, together with its boundaries, is sketched in Figure 7. For this test case, the parametric solution is computed numerically by means of a finite element discretization. The spatial domain has been tessellated into 1639 non-overlapping elements and, for stability, we apply the Taylor-Hood P2-P1 scheme. The high-fidelity dataset is composed of 20 equispaced parametric samples in P, arranged in the snapshots matrix U ∈ R^{n×N} with N = 20 and n = 1639. The DeepONet structure for this problem is the following:
• the spatial network (branch) is composed of 3 hidden layers of 50 neurons each;
• the parameter network (trunk) is composed of 3 hidden layers of 20 neurons each.
Also in this case, the last layer of both networks has the same number of neurons, 20. The activation function used in all the hidden layers is the Parametric ReLU (PReLU) [36], with a learning rate of 3 × 10^−3 and an L2-regularization factor of 1 × 10^−3. The learning phase lasted 2.5 × 10^4 epochs. The accuracy of the MF approach is compared to the gappy POD and to the standard DeepONet with the same architecture (single-fidelity). The relative error is evaluated over 500 testing points, randomly sampled in the parametric space.
POD with energy threshold 0.99. As before, we start with a relatively poor POD model, using r = 1 mode selected by the energetic criterion. RBF interpolation is employed also here to approximate the solution manifold at the reduced level. The number of epochs is fixed at 10^4 for the deep learning training.
Figure 8 shows the plot of the mean relative error over the spatial domain for all the test parameters, reporting also the location of the samples in the parameter space. As for the previous experiment, the proposed technique is able to keep a higher precision over the entire domain, without showing a visible correlation between the location of the high-fidelity data and the error trend, demonstrating its robustness against possible overfitting. Employing the DeepONet architecture to learn the residual (between the POD and high-fidelity models) rather than the target function results in a more efficient learning procedure, capable of outperforming the single-fidelity approaches over the entire parametric space here considered.
POD with energy threshold 0.999. As for the previous test case, we repeat the same experiment with a more accurate POD model. Here we use r = 3 modes, raising the training time to 2 × 10^4 epochs.
The trend shown in the previous investigations is confirmed, as depicted in Figure 9. The MFDeepONet method is able to produce a more accurate prediction at all the testing points, with no visible correlation with the training data. For a fair comparison, we also investigated the predicted field at the only point of the parametric domain where the MFDeepONet shows a slightly higher error with respect to the POD model (whereas the standard DeepONet performs poorly there). Figure 10 shows the x-component of the velocity field for the parameter µ = 69.12 obtained by the three methods, with a statistical summary of the relative error. The MF approach shows here a smaller spatial variance, even if on average it performs comparably to the POD model. Looking instead at a different parametric coordinate (Figure 11), the benefits of the proposed approach become clear. The considerations regarding the variance of the error are still valid, but the solution for µ = 39.95 shows a remarkable improvement in accuracy over the testing points.
Gappy POD. The last numerical experiment focuses on the Navier-Stokes model, for which sensor data are used by the gappy POD for the low-fidelity approximation. Here we use 7 sensor locations and a rank truncation equal to 8. We trained the DeepONet and the MFDeepONet for 5 × 10^4 epochs.
Figure 12 reports the relative error measured at all the test points. In this case, the MF approach, like the standard DeepONet, is able to outperform the POD model in a large region of the parametric domain, with a relative error that remains close to 1 × 10^−2. Gappy POD is able to reach the best precision in a few test points, but also here the MF approach is the best compromise in terms of global accuracy, even if it is actually less precise than the POD model for high parameter values (µ > 70).

Summary discussion
This section is devoted to a summary discussion of the results obtained in the numerical investigations. For a fair comparison, we computed the mean relative test error (2) for each method, reporting the accuracy for different neural network training times. In addition to the previous tests, we show in Table 1 the results obtained by employing a POD space whose modes are selected with an energy threshold of ϵ = 9.999 × 10⁻¹. The error charts for the missing cases, as well as some graphical representations of the parametric solutions, are reported in Appendix 4. The latter experiment aims to analyze the final accuracy when the low-fidelity POD is even more precise: the MF approach is able to reach the best mean relative error, but its improvement is marginal, confirming the trend already observed in the previous tests.

The combination of the POD model and DeepONet in a cascade fashion reaches the best accuracy in almost all cases, but its improvement becomes marginal when the POD alone is already accurate. Learning the residual, however, does not seem to degrade the final outcome, provided that the DeepONet is trained for a proper number of epochs. This is certainly a critical issue inherited from deep learning in general: we can indeed see that longer training does not always ensure better accuracy, producing instead over-fitting. On the practical side, the optimal settings of the network (e.g., number of training epochs, number of layers, type of activation function) need to be calibrated with a trial-and-error procedure or using more sophisticated approaches such as grid search. This calibration is out of the scope of the present investigation, where we want to formalize the novel framework, but a sensitivity analysis with respect to the hyper-parameters will surely be explored in future works. The generalization capability of the DeepONet, assisted also by the L2-regularization imposed during the optimization, is able to improve accuracy over the entire parametric space, without showing a visible correlation between the location of the high-fidelity snapshots and the spatial distribution of the relative error.
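Since the experiments select the number of POD modes by an energy threshold (0.99, 0.999, 0.9999 above), the selection criterion can be sketched as follows. This is a minimal illustration via the SVD of the snapshot matrix; the toy snapshot set and the function name `pod_modes` are ours, not taken from the paper's code:

```python
import numpy as np

def pod_modes(snapshots, energy_threshold=0.999):
    """Select the POD modes retaining the given fraction of snapshot energy.

    snapshots: (n_dofs, n_snapshots) matrix of high-fidelity solutions.
    Returns the retained modes and their number.
    """
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    # cumulative energy carried by the leading singular values
    energy = np.cumsum(s**2) / np.sum(s**2)
    # smallest number of modes whose cumulative energy reaches the threshold
    rank = int(np.searchsorted(energy, energy_threshold)) + 1
    return U[:, :rank], rank

# toy snapshot matrix: parametric samples of a simple 1D function
x = np.linspace(0, 1, 50)
params = np.linspace(1, 5, 20)
S = np.stack([np.sin(p * np.pi * x) for p in params], axis=1)
modes, rank = pod_modes(S, 0.9999)
```

Raising the threshold enlarges the basis and reduces the projection error of the low-fidelity model, which is exactly the trade-off explored in Table 1.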
To conclude, we highlight that the numerical experiments demonstrate a great improvement when the original POD model lacks accuracy, making the framework a valuable tool for problems where POD is not able to capture all the fluid characteristics, due to the complexity of the mathematical model or to the limited number of high-fidelity snapshots.

In this work, we introduced a novel approach to enhance POD-based reduced order models through a residual learning procedure based on DeepONet. It operates by building, from a limited set of data, an initial low-fidelity approximation exploiting established reduced order modeling techniques. It then learns the difference between this low-fidelity representation and the original model through an artificial neural network, which is used at inference time to predict the solution at unseen parameters. We emphasize that such an enhancement requires neither additional evaluations of the original model nor knowledge of the high-fidelity model, resulting in a generic data-driven improvement at a fixed computational budget. The framework has demonstrated its effectiveness in two different test cases, a univariate parametric function and a Navier-Stokes problem on a 2-dimensional domain, showing in both experiments a higher precision than the single-fidelity counterparts. We highlight that in these experiments the number of considered POD modes is deliberately kept small, simulating a POD model with poor accuracy. The present work illustrates the pipeline for POD and gappy POD for the construction of the low-fidelity model and the DeepONet architecture for residual learning. Thanks to its modularity, the framework is general, admitting in principle the replacement of the low-fidelity model with a different one. Possible future extensions include adaptive sampling and sensor placement exploiting the proposed numerical framework.
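The overall pipeline described above, a low-fidelity POD model plus a learned residual correction, can be sketched in a few lines. Here a simple polynomial regression over the parameter acts as a hypothetical stand-in for the DeepONet, and the "high-fidelity" model is a toy rank-3 parametric field; everything except the POD/residual structure is an assumption for illustration, not the authors' implementation:

```python
import numpy as np

x = np.linspace(0, 1, 100)
train_mu = np.linspace(1.0, 3.0, 15)

def hf_solution(mu):
    # toy "high-fidelity" parametric field, of exact rank 3 in space
    return (mu * np.sin(np.pi * x)
            + mu**2 * np.cos(2 * np.pi * x)
            + 0.3 * mu**3 * np.sin(3 * np.pi * x))

S = np.stack([hf_solution(m) for m in train_mu], axis=1)

# low-fidelity model: POD truncated to r modes (deliberately too few)
U, _, _ = np.linalg.svd(S, full_matrices=False)
r = 2
Ur = U[:, :r]

# interpolate the modal coefficients over the parameter
V = np.vander(train_mu, 6)                          # polynomial basis in mu
coef, *_ = np.linalg.lstsq(V, (Ur.T @ S).T, rcond=None)

def pod_predict(mu):
    # low-fidelity prediction at an unseen parameter
    return Ur @ (np.vander([mu], 6)[0] @ coef)

# residual between high-fidelity snapshots and the POD model, regressed
# over the parameter (stand-in for the DeepONet residual learning)
R = S - np.stack([pod_predict(m) for m in train_mu], axis=1)
res_coef, *_ = np.linalg.lstsq(V, R.T, rcond=None)

def mf_predict(mu):
    # multi-fidelity prediction: POD model plus learned residual
    return pod_predict(mu) + np.vander([mu], 6)[0] @ res_coef
```

At an unseen parameter, `mf_predict` corrects the truncation error of the two-mode POD model without any further high-fidelity evaluation, mirroring the behavior observed in the experiments when the POD basis is kept intentionally small.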

Figure 1: Scheme for the multi-fidelity POD framework. The arrows indicate the relationships between the different methods. In the blue box, the dashed arrows indicate the online phase, when the input parameter is provided. In the purple box, the dotted arrows indicate the information flow when only sensor data are provided. The central red frame emphasizes the computationally expensive offline phase.

Figure 3: Comparison between POD (0.99 energy threshold), DeepONet, and multi-fidelity DeepONet. From top to bottom: the relative error in the parametric domain, the location of the high-fidelity samples in the parametric domain, the relative error distribution, and the best performers.

Figure 4: Comparison between POD (0.999 energy threshold), DeepONet, and multi-fidelity DeepONet. From top to bottom: the relative error in the parametric domain, the location of the high-fidelity samples in the parametric domain, the relative error distribution, and the best performers.

Figure 5: Comparison between gappy POD, DeepONet, and multi-fidelity DeepONet. From top to bottom: the relative error in the parametric domain, the location of the high-fidelity samples in the parametric domain, the relative error distribution, and the best performers.

Figure 8: Comparison between POD (0.99 energy threshold), DeepONet, and multi-fidelity DeepONet in terms of relative error in the parametric domain. The vertical dotted lines indicate the location of the high-fidelity samples.

Figure 9: Comparison between POD (0.999 energy threshold), DeepONet, and multi-fidelity DeepONet in terms of relative error in the parametric domain. The vertical dotted lines indicate the location of the high-fidelity samples. The two dashed vertical lines indicate the test parameters represented in Figures 10 and 11.

Figure 10: Representation over the spatial domain of the velocity (along x) in the Navier-Stokes test case for µ = 69.12. The approximations computed by POD (0.999 energy threshold), DeepONet, and multi-fidelity DeepONet are shown at the bottom together with the relative error. The distribution of the error is summarized in the box plot.

Figure 11: Representation over the spatial domain of the velocity (along x) in the Navier-Stokes test case for µ = 39.95. The approximations computed by POD (0.999 energy threshold), DeepONet, and multi-fidelity DeepONet are shown at the bottom together with the relative error. The distribution of the error is summarized in the box plot.

Figure 12: Comparison between gappy POD, DeepONet, and multi-fidelity DeepONet in terms of relative error in the parametric domain. The vertical dotted lines indicate the location of the high-fidelity samples.

Figure 13: Predictions at two different test parameters using gappy POD, DeepONet, and multi-fidelity DeepONet with different configurations, varying the number of epochs and the POD energy threshold. The left column shows the results for µ = [4, 8], while the right one shows µ = [3, 16]. Top row: POD energy threshold equal to 0.99, 10000 epochs. Center row: POD energy threshold equal to 0.999, 50000 epochs. Bottom row: POD energy threshold equal to 0.9999, 50000 epochs.

Figure 14: Comparison between POD (0.9999 energy threshold), DeepONet, and multi-fidelity DeepONet. From top to bottom: the relative error in the parametric domain, the location of the high-fidelity samples in the parametric domain, the relative error distribution, and the best performers.

Figure 15: Comparison between POD (0.9999 energy threshold), DeepONet, and multi-fidelity DeepONet in terms of relative error in the parametric domain. The vertical dotted lines indicate the location of the high-fidelity samples.

Table 1: The mean relative error computed in all the experiments. Bold indicates the best result in each row.