Model order reduction assisted by deep neural networks (ROM-net)

In this paper, we propose a general framework for projection-based model order reduction assisted by deep neural networks. The proposed methodology, called ROM-net, consists in using deep learning techniques to adapt the reduced-order model to a stochastic input tensor whose nonparametrized variabilities strongly influence the quantities of interest for a given physics problem. In particular, we introduce the concept of dictionary-based ROM-nets, where deep neural networks recommend a suitable local reduced-order model from a dictionary. The dictionary of local reduced-order models is constructed from a clustering of simplified simulations enabling the identification of the subspaces in which the solutions evolve for different input tensors. The training examples are represented by points on a Grassmann manifold, on which distances are computed for clustering. This methodology is applied to an anisothermal elastoplastic problem in structural mechanics, where the damage field depends on a random temperature field. When using deep neural networks, the selection of the best reduced-order model for a given thermal loading is 60 times faster than when following the clustering procedure used in the training phase.


Introduction
Numerical simulations in physics have become an essential tool in many engineering domains. The development of high-performance computing has enabled engineers and scientists to use complex models for real-world applications, with ultra-realistic simulations involving millions of degrees of freedom. However, such simulations are too time-consuming to be integrated in design iterations in the industry. They are usually limited to the final validation and certification steps, while the design process still relies on simplified models. Accelerating these complex simulations is a key challenge, as it would provide useful numerical tools to improve design processes. The development of numerical methods for fast simulations would also enable using new models that have not been applied to industrial problems yet, because of their complexities. Uncertainty quantification is another important example of analysis that would become practicable if the cost of simulations was sufficiently reduced. Indeed, quantities of interest monitored in numerical simulations depend on the environment of the physical system, which is usually not exactly known. In some cases, these uncertainties strongly influence simulation results, and the probability distributions of the quantities of interest must be estimated in order to ensure the reliability of the industrial product.
The cost of numerical simulations can be reduced by projection-based model order reduction, which consists in restricting the search of the solution to a low-dimensional space spanned by a reduced-order basis. This reduced-order basis is inferred from a set of pre-computed high-fidelity solutions, using linear dimensionality reduction techniques such as the proper orthogonal decomposition (POD, [1][2][3]). In addition to this reduction in terms of degrees of freedom, a second reduction stage may be necessary for nonlinear problems, this time in terms of integration points. This is called hyperreduction, or operator compression according to the terminology introduced in [4]. Operator compression methods include the empirical interpolation method (EIM, [5]), the missing point estimation (MPE, [6]), the a priori hyperreduction (APHR, [7]), the best point interpolation method (BPIM, [8]), the discrete empirical interpolation method (DEIM, [9]), the Gauss-Newton with approximated tensors (GNAT, [10]), the enery-conserving sampling and weighting (ECSW, [11]), the empirical cubature method (ECM, [12]), and the linear program empirical quadrature procedure (LPEQP, [13]).
In this article, we consider uncertainties on a tensorial input variable which affects the solution of the partial differential equations (PDEs) governing the physical system. This input tensor can represent a three-dimensional field, physical constants used in the constitutive equations, images of defects, a X-ray computed tomography scan characterizing a microstructure, boundary conditions, or geometrical details. In the example presented at the end of this paper, the input tensor represents a three-dimensional temperature field influencing the physical properties of the system. The objective is to accelerate numerical simulations where a quantity of interest highly depends on this stochastic input tensor subjected to nonparametrized variabilities. This objective can be achieved with a single reduced-order model, as long as uncertainties on the input tensor are small enough or have a minor impact on the quantity of interest. In other situations, the solution of the governing PDEs lies in a manifold which cannot be covered by a single reduced-order model without increasing its dimension and thus degrading its efficiency. For example, traditional model order reduction techniques fail to solve advection-dominated problems. These problems require more sophisticated techniques, such as those proposed in [14] or [15]. In structural mechanics, the fatigue lifetime assessment of high-pressure turbine blades of an aircraft engine is very sensitive to variations of the temperature field, see [16]. Linear dimensionality reduction is not always suitable for this kind of applications. Nonetheless, linear methods have the critical advantage of being compatible with the Galerkin method, providing a reduced problem in the form of equations assembled on a reduced-order basis.
Many strategies have been proposed to address such problems. The concept of local reduced-order models was first introduced in [17], and applied to computational fluid dynamics in [18]. In these works, the set of pre-computed high-fidelity solutions was partitioned into several clusters, each of them being used to build a small cluster-specific reduced-order model. The resulting dictionary of local reduced-order models was then used to adapt the reduced-order basis to the current state of the solution by finding the closest cluster center. This technique works very well when the solution evolves on a low-dimensional manifold. However, for some specific applications where there is no guarantee that the solution lies in a low-dimensional manifold, this technique might be subjected to the curse of dimensionality [19]. Indeed, in high dimension, the nearest neighbour is almost as far as the furthest point in the dataset because of the loss of contrast in distances, explaining the difficulties of high-dimensional clustering. Other approaches rely on the interpolation of reduced-order models. This trend was initiated by the subspace angle interpolation method [20][21][22][23][24]. A generalization of this method has been proposed in [25] with the ROM adaptation method based on the Grassmann manifold (also called Grassmannian). It has been successfully applied to the CFD-based aeroelastic analysis of a complete aircraft configuration for varying Mach number and angle of attack. In [26], this method has been improved to achieve real-time performance for linear problems. More recent works [27,28] propose other interpolation methods on Grassmannians. All these interpolation methods give excellent results for problems depending on a small number of parameters, but none of them have yet been applied to nonparametrized variabilities of a large input tensor.
The generic methodology developed in this article, called ROM-net, is another attempt to deal with the limits of traditional tools available in the model order reduction literature. It relies on projection-based model order reduction assisted by deep learning techniques. Our objective is to define a general framework for reduced-order model adaptation using deep neural networks, in order to see to what extent model order reduction can benefit from the recent advances in deep learning. Indeed, the growing interest for this discipline has led to the development of innovative methods in many fields. These advances have facilitated the development of surrogate models and data-driven approaches in physics, providing approximate solutions in real time. Other modeling strategies are based on a hybrid approach, mixing physics-based modeling and machine learning. For example, deep neural networks were used in [29] to model the Reynolds stress anisotropy tensor in Reynolds Averaged Navier Stokes (RANS) models, in computational fluid dynamics. In [30], a nonlinear dimensionality reduction is performed using deep convolutional autoencoders. The modeling strategy presented in [31] was the first hybrid approach involving both a dictionary of local hyperreduced-order models and computer vision techniques. In the context of image-based modeling, it showed that convolutional neural networks could be used to recognize the loading case of a mechanical experiment on a digital image, and select a suitable hyperreduced-order model to simulate the experiment.
The concept of ROM-net introduced in this article is an extension of the methodology presented in the aforementioned paper, designed to accelerate numerical simulations where a quantity of interest depends on a stochastic tensorial input. Dictionary-based ROM-nets can be made of one or several deep neural networks selecting the best reducedorder model from a dictionary. In our case, unlike the strategy presented in [31], this dictionary derives from the clustering of outputs of simplified simulations using distances on a Grassmannian. In the first section of this article, we introduce the formal definitions of ROM-nets and dictionary-based ROM-nets. Then, we describe the training procedure for dictionary-based ROM-nets. An application to a temperature-dependent problem in structural mechanics is presented in the last section of this paper, where we focus on the clustering procedure and the construction of a classifier for model selection.

Classical projection-based model order reduction
Let us consider a physics problem whose primal variable of the governing equations is denoted by u and defined on a domain ⊂ R β , β ∈ [[1; 3]] and on a (normalized) time interval [0; 1]. The governing PDEs are generally posed on an infinite-dimensional Hilbert space, but in practice, these equations are solved numerically on a finite-dimensional subspace, denoted by H in this article. In solid mechanics for example, H is the space spanned by finite-element shape functions φ i 1≤i≤dim(H) , and the primal variable u corresponds to the displacement field. The primal variable u computed at the n-th time step can be represented by a vector U n ∈ R dim(H) containing its coordinates in the finite-element basis φ i 1≤i≤dim(H) . In solid mechanics, this numerical solution can be obtained with the Newton-Raphson algorithm, an iterative procedure based on the linearization of the virtual work principle. The resulting linear system to be solved for the m-th iteration at the n-th time increment reads: where J with U (0) n = 0. When the convergence criterion ||R (m) n || ≤ NR ||F ext n || is satisfied for a given m = m * with NR being the tolerance of the Newton-Raphson algorithm and F ext n being the vector of external forces, the solution at the n-th time increment is defined as: Equation (1) is the high-dimensional linear system deriving from the high-fidelity model composed of equilibrium, compatibility and constitutive equations.
Projection-based model order reduction consists in searching an approximation of the high-fidelity solution in a low-dimensional subspace V ROM ⊂ H adapted to the current physics problem. This subspace is spanned by an appropriate reduced-order basis ψ i 1≤i≤N , with N being very small compared to dim(H). The reduced-order basis approximation of the primal variable reads: where {γ i } 1≤i≤N are the reduced coordinates which can be stored in a vector γ ∈ R N . The coordinates of the modes ψ i 1≤i≤N in the finite-element basis φ i 1≤i≤dim(H) are stored in columns in a matrix V ∈ R dim(H)×N called reduction matrix. Hence: These modes can be obtained by applying the POD [1] or the snapshot POD [2,3] to a set of pre-computed high-fidelity solutions evaluated at different time steps or for different configurations of the physical system. After the Galerkin projection of the governing equations on V ROM , the reduced linear system to solve at each iteration of the Newton-Raphson algorithm in the reduced-order model (ROM) is then: Any ordered set of k ≤ dim(H) linearly independent vectors in H is called a k-frame. The Stiefel manifold V (k, H) is the set of all orthonormal k-frames in H. In projection-based model order reduction, reduced-order bases obtained by POD or snapshot POD belong to a Stiefel manifold. In this article, the notation V (H) stands for the set of reduced-order bases: The linear system (6) results from a first reduction stage in terms of degrees of freedom. For some nonlinear problems, a second reduction stage is required to efficiently decrease the computation time. This second reduction stage is referred to as hyperreduction or operator compression, as mentioned in the introduction. In this case, the definition of the set V (H) can be extended to include sets of hyperreduction parameters in addition to the set of reduced-order bases.

ROM-nets
Our objective is to predict a quantity of interest Y via the computation of a primal variable u that belongs to a reduced approximation space and satisfies nonlinear physics equations depending on a stochastic input tensor X. In this article, X denotes the set of input variabilities and Y represents the set containing the quantity of interest. In structural mechanics, Y can represent a damage field, the von Mises stress in a zone of interest, or the displacement of a specific point in the structure, while X can stand for material constants, boundary conditions, geometrical parameters, a X-ray computed tomography scan characterizing the microstructure, images of defects, or even a three-dimensional field defined on the domain such as a temperature field, residual stresses, or heterogeneous material parameters. The only restriction on the input is that it must have a tensorial representation, that is, a representation as a multidimensional array. For instance, images are second-order tensors or two-dimensional arrays, X-ray computed tomography scans are third-order tensors or three-dimensional arrays, and fields discretized on a finite-element mesh can be represented by first-order tensors or one-dimensional arrays. These tensorial inputs are stochastic because they contain the uncertainties on the physical system under study: when considering polycrystalline materials, X-ray computed tomography scans could be used to study macroscopic properties under microstructural variabilities such as grains' sizes, shapes and orientations. We refer the reader to [32,33] for more details on finite-element modeling based on X-ray computed tomography scans. In the application presented at the end of this paper, X is the finite-element discretization of a temperature field with variabilities evolving in L 2 ( ). These stochastic variabilities may be related to turbulence in a fluid-structure interaction with a high-Reynolds-number fluid flow. In aircraft engines, the temperature field in high-pressure turbine blades results from a complex turbulent flow coming from the combustion chamber. While the tensor X can be generated by a parametric stochastic model, it is assumed that we have no prior knowledge of the underlying model. Therefore, the proposed methodology is suitable for nonparametrized (or generic) input variabilities which can represent uncertainties on the environment of the physics problem. This feature is required when the method is trained on data simulated by a parametric model, but applied to real data with unknown distributions obtained from experimental measures or from a more complex model. When the input X ∈ X is modified, the primal variable u evolves on a manifold M. In some situations, it is complicated to build a relevant reduced-order model giving accurate predictions for the primal variable on the whole manifold. In such cases, predictions on the quantity of interest Y are inaccurate since they derive from the behavior of the primal variable. The reduced-order model must be adapted to the input to capture nonlinearities. In this paper, we propose a general framework for reduced-order model adaptation via deep learning algorithms.
Given two sets A and B, the notation B A represents the set of functions f : A → B. Let us now give the definitions of a reduced-order solver and a ROM-net: Definition 1 (Reduced-order solver) Let us consider a physics problem, where a quantity of interest Y ∈ Y depends on a tensorial input variable X ∈ X . A reduced-order solver is an operator S : V (H) → Y X taking a reduced-order model m ∈ V (H) as an input and returning a predictor S[m] : X → Y for the quantity of interest. Given X ∈ X , the quantity of interest Y can be approximated by: In this definition, the reduced-order model m consists in a reduced-order basis and, optionally, parameters related to a hyperreduction algorithm. The function S[m] can be seen as an operator solving the reduced linear system (6) and computing the quantity of interest associated to the reduced-order solution u. Definition 2 (ROM-net) Let us consider a physics problem, where a quantity of interest Y ∈ Y depends on a tensorial input variable X ∈ X and can be predicted by a reducedorder solver S : is a deep learning algorithm returning a reduced-order model R(X) ∈ V (H) adapted to the input X ∈ X . Given X ∈ X , the quantity of interest Y can be approximated by: Contrary to surrogate modeling, using a reduced-order model R(X) enables satisfying homogeneous Dirichlet boundary conditions and solving the constitutive equations at least at some specific points if operator compression (hyperreduction) is used. Hence, a ROM-net provides a hybrid approach mixing physics-based modeling and deep learning. It is used as a reduced-order basis generator for complex problems where the reduced-order basis must be adapted to a tensorial input. In addition, it is noteworthy that the definition of the quantity of interest remains quite flexible after the training of a ROM-net. In solid mechanics for instance, the definition of the damage indicator of an uncoupled damage model can be changed without restarting the training phase. Figure 1 summarizes the concept of ROM-nets.
Remark In [30], another deep learning strategy for model order reduction is proposed for parametrized ordinary differential equations. The governing equations are mapped onto a nonlinear manifold thanks to a deep convolutional autoencoder. Contrary to projectionbased model order reduction methods using linear dimensionality reduction techniques such as the POD or the snapshot POD, this methodology performs a nonlinear dimensionality reduction. The reduced (or generalized) coordinates in the latent space defined by the autoencoder's bottleneck layer are combined by the decoder in a nonlinear fashion to get the high-dimensional state approximation. When the decoder is linear, this methodology is equivalent to classical projection-based model order reduction. In the present paper, instead of looking for an approximate solution on a nonlinear trial manifold, ROM-nets adapt the linear subspace to the input variability. As explained in the next section, dictionary-based ROM-nets use several subspaces to get a piecewise linear approximation space, while the aforementioned methodology would have approximated the solution manifold by a single nonlinear manifold. The choice of keeping a linear method for dimensionality reduction is motivated by its compatibility with the Galerkin method, enabling an easy construction of a hyperreduced problem.

Dictionary-based ROM-nets
When the solution manifold M is embedded in a low-dimensional vector space, one can construct a single global reduced-order model in order to compute approximate solutions of the physics problem for different input variabilities. When the solution manifold M is not embedded in a low-dimensional vector space, using one single global reducedorder model would result in either time-consuming or inaccurate reduced simulations, depending on the number of modes selected in the reduced-order basis. By partitioning the set X of input variabilities, one can define a dictionary of local reduced-order models which enables approximating M by several affine subspaces. Clustering algorithms can be used to split the set X into distinct subsets called clusters. Inputs belonging to the same cluster lead to solutions which can be predicted with the same local reduced-order model because of their proximity on the manifold M. More precisely, for a given integer K ∈ N * , the clustering algorithm gives a partition of the set X : The dictionary of local reduced-order models contains K cluster-specific reduced-order models. Hence, for a given input X ∈ X , one must identify the corresponding cluster X k to select the most appropriate reduced-order model.

Definition 3 (Dictionary of reduced-order models) Given an integer
Definition 4 (Dictionary-based ROM-net) Let us consider a physics problem, where a quantity of interest Y ∈ Y depends on a tensorial input variable X ∈ X and can be predicted by a reduced-order solver S : Figure 2 illustrates the concept of dictionary-based ROM-nets. The strategy presented in [31] for image-based modeling using convolutional neural networks and a dictionary of local reduced-order models fits the definition of a dictionary-based ROM-net. In this definition, the expression deep classifier denotes deep neural networks returning a single class label in [[1; K ]] for a given tensorial input. In multiclass classification, deep classifiers usually have a softmax activation function in the output layer, giving an output vector y pred ∈ R K such that y pred k is the probability for the input tensor to belong to the k-th class. The probabilities y pred k are also called membership probabilities. The deep classifier returns the integer corresponding to the class with the highest membership probability, that is: Such classifiers are called classical deep classifiers in this article. The concept of deep classifier used in the definition of a dictionary-based ROM-net can be extended to include not only the classical ones, but also deep clustering algorithms [34][35][36]. These algorithms use encoders to cluster the data in a low-dimensional latent space, avoiding the difficulties of high-dimensional clustering [19] As mentioned earlier, the partition of the set X used to define cluster-specific reducedorder models is given by a clustering algorithm. Clustering algorithms generally rely on a dissimilarity measure quantifying the difference between two points in the dataset. In this paper, the expression dissimilarity measure refers to a pseudo-semimetric:

Dictionary-based ROM-nets involving classical deep classifiers
The rest of the article focuses on dictionary-based ROM-nets involving a classical deep classifier. In this case, the deep classifier F K solves a classical multiclass classification problem to recommend a suitable reduced-order model from the dictionary. Nevertheless, as the classes are given by a clustering algorithm, one could wonder why a deep neural network is used for cluster assignment. When using a center-based clustering algorithm, each cluster X k is represented by a centerX k . In theory, one could compute the dissimilarities between the new input tensor X and all the clusters' representativesX k , and then select the cluster with the smallest dissimilarity δ(X,X k ). However, this procedure is not reasonable when repeated many times, because of the computation time required to evaluate the dissimilarities. Indeed, as further explained in the next section, dissimilarity measures that are suitable for model order reduction applications may involve numerical simulations. Hence, the time saving obtained by model order reduction would be counterbalanced by the time-consuming operations required for cluster assignment. The true classifier defined by: is too expensive because it is based on numerical simulations. Note that K K is not an artificial neural network. When using the ROM-net, the true classifier K K is replaced by the approximate classifier F K to bypass the computations required for cluster assignment.
In the application presented at the end of this paper, replacing the true classifier by the approximate one enables fast ROM selection with a computation time reduced by a factor of 60. The next section gives some general guidelines for the training of a dictionary-based ROM-net.

Training procedure for dictionary-based ROM-nets
Let K be a positive integer. In this section, we describe the training phase of a dictionarybased ROM-net R K made of a classical deep classifier F K and a K -ROM-dictionary D K . First, an automatic data labelling procedure is presented. It aims at preparing the data for the supervised learning of the deep classifier F K , using simplified numerical simulations and a clustering algorithm. Then, we train the deep classifier F K and build the K -ROMdictionary D K on the grounds of the clusters identified by the labelling procedure. Figure 3 summarizes the main steps for the training of a dictionary-based ROM-net.

ROM-oriented dissimilarity measure
Clustering the input space X enables labelling the training data for the deep classifier, and guides the construction of the ROM-dictionary by defining regions of the input space where high-fidelity simulations must be run to build local reduced-order models. The key point is the choice of the dissimilarity measure for clustering. Indeed, clustering aims at grouping points of a dataset that are similar. As shown in the last section of this article, for some specific applications, defining a dissimilarity measure based on a distance between input tensors leads to inaccurate reduced-order solutions. Furthermore, the difficulties of high-dimensional clustering appear when dealing with large input tensors.
In the application to structural mechanics presented at the end of this paper, this issue is due to the complex interaction between the thermal and the mechanical loadings. In this application, the input tensor X is a temperature field influencing the mechanical response of the material. To illustrate the aforementioned difficulty, let us imagine two temperature fields T 1 and T 2 taking different values only in a very small part ω of the structure . Let us introduce a third temperature field T 3 being equal to T 1 in ω and taking arbitrary values in the rest of the solid domain. If ω is a critical zone from a mechanical point of view, T 1 and T 2 might lead to dissimilar displacement fields, while T 3 might give approximately the same displacement field as T 1 (if thermal expansion is negligible with respect to the strains induced by the mechanical boundary conditions). In this case, T 1 and T 3 should be assigned to the same cluster, while T 2 should be assigned to another one. However, taking the L 2 distance between temperature fields as the dissimilarity measure would assign T 1 and T 2 to the same cluster, as these fields are identical in most of the solid domain.
Consequently, for such cases, one must define a ROM-oriented dissimilarity measure accounting for the variability induced by the stochastic input tensor. In this paper, the dissimilarity is defined using the Grassmann distance [37] between subspaces spanned by outputs of a simplified physics problem. The simplified physics problem consists in computing a few time steps of the original problem with a less restrictive convergence criterion. The boundary conditions can even be simplified to facilitate convergence. The idea is to discover the subspace in which the solution evolves at the beginning of the simulation for a given input X. Two input variabilities leading to solutions lying in nearby subspaces (in terms of principal angles) are then considered as similar.
Before giving a formal definition of the ROM-oriented dissimilarity measure, let us define some useful concepts. The Grassmannian Gr(k, n) is a Riemannian manifold whose points are all the k-dimensional linear subspaces of R n . The infinite Grassmannian Gr(k, ∞) parametrizes the k-dimensional subspaces in R n for all n ≥ k. As shown in [37], one can define a distance between two subspaces of different dimensions. This distance is independent of the dimension of the ambient space. The Grassmann metric d Gr(∞,∞) is defined on the doubly-infinite Grassmannian Gr(∞, ∞), which parametrizes subspaces of all dimensions regardless of the ambient space [37]. In practice, this metric can be obtained with the following formula: where A and B are two linear subspaces of dimension a and b respectively, and where the α i 's are the principal angles obtained by singular value decomposition: with A and B being semi-orthogonal matrices whose columns form orthonormal bases of A and B respectively, and ∈ R a×b is a matrix whose only nonzero coefficients are min(a, b). These coefficients are non-negative, thus α i ∈ [0; π/2] for all i.
Let us introduce the application V : X → Gr(∞, ∞) assigning the input tensor X ∈ X to the subspace V(X) spanned by the primal variable U of the governing equations during the numerical simulation of the simplified physics problem: with n t being the number of time increments in the simplified simulation and U n (X) denoting the vector representation of the primal variable computed for the input X and evaluated at the n-th time increment. The ROM-oriented dissimilarity measure used for the clustering of X is written δ and defined by: The dissimilarity measure is computed for all pairs of inputs belonging to the training set, resulting in a dissimilarity matrix δ ∈ R n T ×n T , where n T is the cardinality of the training set.

The clustering algorithm
Once the dissimilarity matrix is calculated, a clustering algorithm must be applied to partition the dataset into K clusters. The choice of the algorithm depends on the context. In this methodology, clustering is used for the definition of local approximations of a nonlinear solution manifold. Hence, the algorithm must focus on compactness rather than connectivity when looking for clusters in the dataset. This property is satisfied by k-means algorithm [38], the most well-known clustering approach. However, this algorithm needs to calculate clusters' centroids or means, which is impossible when the input data correspond to vector spaces. For this reason, we choose the k-medoids algorithm presented in [39], relying on a Voronoi iteration approach like k-means. The algorithm proposed in [39] can be summarized as follows: • Initialization step: select K initial medoids from the dataset. • Repeat the two following steps until convergence: -Data assignment step: assign each point of the dataset to the cluster corresponding to its closest medoid. -Medoid update step: for each cluster, update the medoid by finding the point which minimizes the sum of distances to all the points in the cluster.
The choice of the hyperparameter K depends on the problem. More details concerning this aspect of the methodology can be found in the final section of this article.

Training of a deep classifier for fast model selection
The ROM-net's classifier F K is trained in a supervised fashion from pairs of examples (X i , K K (X i )) given by the clustering algorithm, where the label K K (X i ) ∈ [[1; K ]] is the index of the cluster containing X i . As explained earlier in the article, the true classifier defined by: is too expensive because it is based on numerical simulations, which motivates the use of an approximate classifier. In the previous equation,X k is the medoid of the k-th cluster. The dataset is split into a training, a validation and a test set. For a given deep neural network architecture and for a given set of hyperparameters, the parameters of the classifier F K are calibrated on the training set via backpropagation with Adam optimizer [40]. The accuracy of the calibrated classifier is evaluated on the validation set. The classifier is calibrated with different architectures and hyperparameters settings until the accuracy on the validation set reaches a satisfying value. Once the best architecture and set of hyperparameters have been selected, the calibrated classifier is evaluated on the test set to get the accuracy of the model for new unseen data. When the input X is an image, one could use well-known convolutional neural networks' architectures and fine-tune their pre-trained parameters to adapt the model to the current data, which is a common transfer learning technique [41].

Construction of a ROM-dictionary
Contrary to the classifier, the ROM-dictionary D K is trained in an unsupervised fashion. Clustering results help for the selection of data-points in X for which high-fidelity simulations must be run. The solutions computed at every time step of these high-fidelity simulations are called snapshots. Selecting clusters' medoids as simulation points for snapshots is recommended, since the clusters are represented by their medoids. Additional snapshots can be computed if necessary. For each cluster, the snapshot POD is applied to the set of snapshots to obtain a local reduced-order basis.

Application to an anisothermal elastoplastic problem in structural mechanics
In this section, a temperature-dependent problem in structural mechanics is considered. Our objective is to study the influence of thermal loading uncertainties on a mechanical quantity of interest such as a damage field. The input tensor corresponds to the final temperature field in the structure, defined by a truncated Gaussian field. The quantity of interest is a damage indicator based on the accumulated plastic strain field and defined on the whole structure.

The high-fidelity model
Let us consider the solid body shown on Fig. 4. The heat produced by mechanical phenomena is neglected, which enables solving the heat equation and then use the resulting temperature field history as a thermal loading for the mechanical problem. The structure is subjected to a displacement-controlled monotonic loading. Assuming a quasi-static evolution, equilibrium equations at the local level and boundary conditions read: The structure is made of an elastoplastic generalized standard material described by the von Mises yield criterion and a nonlinear isotropic hardening law. In the framework of the infinitesimal strain theory, the constitutive equations are: • Hooke's law: • von Mises yield criterion with isotropic hardening: • Nonlinear isotropic hardening law (with p denoting the accumulated plastic strain): • Flow rule for the plastic strain rate tensor: s σ eq (σ) (25) • Karush-Kuhn-Tucker conditions: • Consistency condition for the determination of the plastic multiplier: Under the assumption of isotropic elasticity, the fourth-order elastic stiffness tensor C can be decomposed as follows: where E is the Young's modulus, ν is the Poisson's ratio, and K (resp. J) is the projector onto the space of deviatoric (resp. spherical) second-order tensors. Material constants E, ν, α, σ y , R ∞ and b generally depend on the temperature. For this application, temperature-dependent coefficients E, α, σ y are taken from experimental data on high strength structural steels for fire-resistant structures [42]. The other material parameters are taken as constants. The thermal loading applied to the structure is defined by: with T 0 = 22 o C. The field T max (x) will be replaced by a random temperature field to account for uncertainties on the thermal loading, while the mechanical loading is deterministic. Hence, for this study, the stochastic input X is a tensorial representation of the random temperature field obtained at t = 1. Let us consider a simple damage indicator D : where p f is the material's plastic strain to failure. A crack initiates at x 0 ∈ if D(x 0 , t) reaches the value 1 for some t ∈ [0; 1]. The quantity of interest for this application is the field Y = D( . , 1) : The high-fidelity mechanical problem is solved using the finite-element method. Numerical simulations are performed with Z-set software [43].

Stochastic model for the thermal loading
A training set of random thermal loadings is generated using the stochastic model described in Appendix A. Briefly speaking, the stochastic model draws random combinations of fluctuation modes which are superposed with a reference temperature field T ref : → R. In Eq. (29), the field T max is replaced by the resulting random temperature fields, which gives the random thermal loadings. The reference temperature field defines the mean thermal loading. For this application, the reference temperature field is uniform at 650 o C. For real-world applications, the reference temperature field can be given by aerothermal simulations. Random temperature fields are denoted by T rand : × → R, where is the sample space of the probability space associated to the stochastic model described in Appendix A.
Remark When training the ROM-net, it is assumed that we have no prior knowledge of the data-generating stochastic model. Training data may come from complex aerothermal simulations with random boundary conditions defined by experts. In the present case, in the absence of such data, we use a stochastic model to generate the training data. This stochastic model is parametric, since every random temperature field comes from a random linear combination of 150 fluctuation modes. However, when training the deep classifier for local ROM recommendation, the training data corresponds to nodal values of the temperature fields rather than the 150 random coordinates. This way, in the test phase (or online phase in the model order reduction community), the ROM-net can be applied to thermal loadings coming from unknown stochastic models, complex aerothermal simulations or even experiments, as long as these thermal loadings correspond to perturbations of the reference thermal loading. Hence, the ROM-net can deal with nonparametrized variabilities of the input data. Figure 6 illustrates the influence of the thermal loading on the damage field computed with the high-fidelity model. The fields on the left represent two different temperature fields reached at t = 1 in the simulation. The critical zone is located around the first column of holes on the left-hand side of the structure. The second temperature field (case B) on the figure takes high values in this zone, leading to high values of the damage indicator. On the contrary, values taken by the first temperature field (case A) in this zone are relatively small. The resulting damage field takes smaller values in the first column of holes. One can observe that the shapes of damaged zones in A and B are not the same. In addition, the second column of holes is a bit damaged in A, while this region remains undamaged in B.

Construction of a dictionary-based ROM-net
To compute the damage field as a function of the thermal loading in a reasonable time, a dictionary-based ROM-net R K based on a classical deep classifier F K is used. In this paper, we focus on the training of the deep classifier F K .

Computation of the ROM-oriented dissimilarity matrix
For every thermal loading, we solve a simplified mechanical problem which is similar to the original one, with a less restrictive tolerance for Newton-Raphson's algorithm to reduce the number of required time steps. For the sake of simplicity, only two displacement fields of the simplified simulation are kept: one in the elastic regime, and one in the plastic regime. The space V(T ) is the 2-dimensional space spanned by these fields. The solutions at the other time steps are discarded, but computing them is necessary to ensure the convergence of all the simplified simulations. The 10 4 × 10 4 matrix δ is defined as the dissimilarity matrix whose coefficients are: The 10 4 simplified mechanical simulations are distributed between 84 computer threads. The total computation time is 9h05min, which represents 5min per simulation. Once the simplified simulations are done, computing the dissimilarity matrix takes 11h16min, if its coefficients are distributed between 48 computer threads. Note that less than half of the coefficients are calculated, since the dissimilarity matrix is symmetric with zeros on its diagonal.

k-medoids clustering
The ROM-oriented dissimilarity is used to cluster the data with the k-medoids algorithm. The number K of clusters is a hyperparameter provided by the user. Numerous empirical methods have been proposed to estimate the best number of clusters for different criteria. In our case, the dataset is not organized in distinct clusters. Therefore, the clustering algorithm is applied for different values of K , and the quality of clustering results is evaluated using silhouette analysis [44]. When selecting the most appropriate number of clusters based on silhouette analysis, one must also consider the trade-off between large numbers giving a better approximation of the nonlinear manifold, and small numbers facilitating the classification problem for the deep neural network. In addition, using a large number of clusters increases the cost of the construction of the ROM-dictionary. For the current problem, K = 6 has been identified as a good compromise. A single run of the clustering algorithm takes about 10 s. The true classifier associated to this clustering procedure is denoted by K K .

Visualization of clusters on the nonlinear manifold
The clustering results can be visualized thanks to Multidimensional Scaling (MDS) [45]. MDS is an information visualization method which consists in finding a low-dimensional dataset Z 0 whose matrix of Euclidean distances d(Z 0 ) is an approximation of the input dissimilarity matrix δ. To that end, a cost function called stress function is minimized with respect to Z: This minimization problem is solved with the algorithm Scaling by MAjorizing a COmplicated Function (SMACOF, [46]) implemented in Scikit-learn [47]. Figure 7 shows clustering results with low-dimensional representations obtained by metric MDS. The relative error ς (Z 0 ; δ)/ς (0; δ) is 11% for the 2D representation and 10% for the 3D representation. These visualizations illustrate the nonlinear manifold on which solutions of the mechan-ical problem evolve when changing the thermal loading. Note that the positions of the clusters' labels coincide with the medoids.

Multiclass classification
Classifier based on an ensemble of DNNs. Clustering results obtained with the true classifier K K enable training the approximate classifier F K of the ROM-net R K . Among the 10 4 thermal loadings in the dataset, 6400 are used as training data, 1600 as validation data, and 2000 as test data for final evaluation. As the data labelling procedure involves numerical simulations, the dataset contains a limited amount of training examples in comparison with standard image classification problems. Working on a small dataset makes our deep classifier prone to overfitting. To address this issue, we use the ensemble averaging method [48], which consists in taking an ensemble of classifiers and averaging their predicted membership probabilities. Ensemble averaging is a common technique in ensemble learning. Generally speaking, ensemble methods aim at creating a meta-estimator from several base estimators (or models). Combining different estimators leads to more robust predictions and reduces overfitting. In addition, using an ensemble method replaces the task of finding a single very accurate model by the task of building an effective meta-estimator from several models with lower accuracies. The winners of numerous deep learning challenges used ensemble averaging to improve their predictions [49][50][51]. Our ensemble contains N models = 12 different DNNs trained for the same classification problem with the same training data using Keras [52] and Tensorflow [53] libraries on Python, but with different architectures and loss functions. All of them use the softmax activation function to get membership probabilities. These predictions are combined in a soft voting scheme to give the final prediction: (y pred (T, l)), with y pred (T, l) = 1 N models N models k=1 y k l (T ) ( 3 4 ) with y k l (T ) denoting the membership probability of class l predicted by the k-th model. The 12 models in the ensemble include fully-connected networks (FC), convolutional neural networks (CNN) [54] and global average pooling convolutional neural networks (GAP-CNN) [55]. These DNNs are trained with different loss functions, namely crossentropy, balanced cross-entropy to handle class imbalance, and the focal loss [56] which enables focusing more on misclassified data. Using an ensemble enables recycling the best DNNs obtained during training, and overcoming some weaknesses of every single model in the ensemble. Training one of the DNNs used in this ensemble on a Nvidia Quadro K5200 GPU takes about 2h on average in our case.
An important aspect of our classification problem is the preprocessing step, in which thermal loadings are prepared to be fed into neural networks. Thermal loadings in T are represented by their corresponding random temperature fields reached at t = 1. These temperature fields are projected onto a 38 × 17 × 4 regular grid defined on a bounding box surrounding the solid body, see Fig. 8. For the grid's vertices being inside , the value of the temperature is evaluated using the finite-element shape functions, while vertices being outside are assigned a zero value. This procedure gives a 3D bitmap image of the temperature field, represented by a third-order tensor. Thanks to this tensorial representation, 3D convolutional filters can now be applied to extract features of the input data, like 2D convolutional filters do for image analysis. For fully-connected networks, although   the third-order tensor is flattened, the projection on the grid acts as a subsampling procedure. This preprocessing operation takes about 3 min when the 10 4 fields are distributed between 280 computer threads, which means that the projection of a single field takes 5 s.

Analysis of classification results
In the present study, when evaluated on the test set, the ensemble of DNNs reaches an accuracy of 80%, whereas accuracies of its base classifiers range from 63.05 to 73.75%, see Table 1. As expected, ensemble averaging reduces overfitting and thus improves the ability of the classifier F K to generalize to new unseen data. Table 2 summarizes the values of precision, recall and F1-score. Figure 9 gives the confu- On these figures, one can clearly see that misclassified examples of class i are mostly located close to the border between the i-th cluster and its neighbours. This is actually a nice property, because it means that when the ensemble fails to select the appropriate reduced-order model in the dictionary, it returns a reduced-order model that covers a part of the manifold that is close to the target one.

Evaluation of the methodology and discussion
Let us quickly summarize what has been done up to that point. A dataset of 10 4 thermal loadings has been generated with a stochastic model. Its ROM-oriented dissimilarity matrix has been computed. Based on this matrix, the dataset has been split into K = 6 clusters with k-medoids clustering algorithm. An ensemble of DNNs has been trained to assign new unseen thermal loadings to the best clusters using the ensemble averaging method.
When considering a new thermal loading, true cluster assignment requires one simplified simulation (5 min) and the computation of six Grassmann distances (less than 1 s). On the other hand, when using the deep classifier F K , preprocessing operations take 5 s and the evaluation of the deep classifier is quasi-instantaneous. Hence, the computation time for the selection of the best reduced-order model is decreased by a factor of 60 when using the ROM-net's deep classifier F K instead of the true classifier K K .
The clustering of the thermal loading database T with the ROM-oriented dissimilarity has defined 6 clusters which can be used to construct a dictionary of 6 local ROMs. Let us compare our methodology with another approach consisting in the construction of a temperature-based ROM-dictionary D T . Such a dictionary comes from a direct clustering of the input space, that is, a k-medoids clustering of T using distances between the temperature fields evaluated at t = 1. This clustering only considers the input data and does not use simplified simulations to account for mechanical phenomena. Hence, cluster assignment is directly obtained by taking the minimum distance to clusters' medoids. In this case, there is no need for DNNs since the cost of the classification task is negligible. However, clustering high-dimensional data leads to meaningless results due to the loss of contrast in pairwise distances. This problem is known as the curse of dimensionality [19] and appears naturally when dealing with fields defined on a finite-element mesh. To overcome this difficulty, dimensionality reduction techniques must be applied prior to clustering. Distances are calculated in the low-dimensional latent space, whose dimension is a hyperparameter. In this paper, principal component analysis (PCA) is applied for linear dimensionality reduction with 30 principal components, the dimension 30 being a compromise between large dimensions and low dimensions discarding too much information.
The values δ(T i , T j ) for two inputs T i , T j belonging to the same cluster are called intracluster ROM-oriented dissimilarities. The distributions of intra-cluster ROM-oriented dissimilarities for the ROM-net R K and temperature-based dictionaries are shown on Fig. 13. The distribution obtained with a temperature-based dictionary does not depend on the number of clusters, and coincides with the distribution of dissimilarities δ(T i , T j ) obtained without imposing the inputs T i , T j to belong to the same cluster.
This result shows that a distance on temperature fields does not lead to local ROMs for the Grassmann distance in the present application. However, in the context of ROM interpolation, the Grassmann distance was shown to be the adequate concept when manipulating ROMs [25,27,28]. Because of the complex interactions between the thermal and the mechanical loadings, direct clustering in the space of temperature fields is not appropriate for the mechanical problem presented in this paper. When using some dissimilarity measure δ for clustering in order to build a ROMdictionary, the comparison of the distribution of intra-cluster ROM-oriented dissimilarities with the global distribution of ROM-oriented dissimilarities can be used as a validation criterion. If these distributions are similar, it means that the dissimilarity measure δ does not provide local ROMs for the Grassmann distance. In this case, a dictionary-based ROM-net should be used with the ROM-oriented dissimilarity measure based on the Grassmann distance. Figure 14 gives another visualization of the results shown on Fig. 13. It illustrates clustering results obtained for a temperature-based dictionary with 13 clusters. Points belonging to the same cluster have the same color. The high dispersion of the points assigned to a given cluster proves that direct clustering of the input space does not lead to local ROMs for the Grassmann distance.
These promising results highlight the potential of dictionary-based ROM-nets. Once clusters have been defined, one can construct one local ROM for each cluster, like in [17,18] where small local ROMs outperform a single global ROM in terms of accuracy and speed. This study in the context of dictionary-based ROM-nets is underway.

Conclusion
The concept of ROM-net gives a general framework for reduced-order model adaptation using deep neural networks. In this article, the potential of dictionary-based ROM-nets has been illustrated on a mechanical problem with nonparametrized variabilities of the thermal loading. It has been shown that direct clustering of the input space may give clusters which cannot be exploited to define local reduced-order models. This issue can be avoided by defining a ROM-oriented dissimilarity involving the Grassmann metric on results of simplified numerical simulations. Online cluster assignment can be performed with a classifier based on deep neural networks to bypass numerical simulations, which reduces the computation time by a factor of 60.