 Research article
 Open Access
Model order reduction assisted by deep neural networks (ROMnet)
Advanced Modeling and Simulation in Engineering Sciences volume 7, Article number: 16 (2020)
Abstract
In this paper, we propose a general framework for projection-based model order reduction assisted by deep neural networks. The proposed methodology, called ROMnet, uses deep learning techniques to adapt the reduced-order model to a stochastic input tensor whose non-parametrized variabilities strongly influence the quantities of interest for a given physics problem. In particular, we introduce the concept of dictionary-based ROMnets, where deep neural networks recommend a suitable local reduced-order model from a dictionary. The dictionary of local reduced-order models is constructed from a clustering of simplified simulations, which enables the identification of the subspaces in which the solutions evolve for different input tensors. The training examples are represented by points on a Grassmann manifold, on which distances are computed for clustering. This methodology is applied to an anisothermal elastoplastic problem in structural mechanics, where the damage field depends on a random temperature field. When using deep neural networks, the selection of the best reduced-order model for a given thermal loading is 60 times faster than when following the clustering procedure used in the training phase.
Introduction
Numerical simulations in physics have become an essential tool in many engineering domains. The development of high-performance computing has enabled engineers and scientists to use complex models for real-world applications, with ultra-realistic simulations involving millions of degrees of freedom. However, such simulations are too time-consuming to be integrated into industrial design iterations. They are usually limited to the final validation and certification steps, while the design process still relies on simplified models. Accelerating these complex simulations is a key challenge, as it would provide useful numerical tools to improve design processes. Fast numerical methods would also make it possible to use new models that have not yet been applied to industrial problems because of their complexity. Uncertainty quantification is another important example of an analysis that would become practicable if the cost of simulations were sufficiently reduced. Indeed, the quantities of interest monitored in numerical simulations depend on the environment of the physical system, which is usually not exactly known. In some cases, these uncertainties strongly influence simulation results, and the probability distributions of the quantities of interest must be estimated in order to ensure the reliability of the industrial product.
The cost of numerical simulations can be reduced by projection-based model order reduction, which restricts the search for the solution to a low-dimensional space spanned by a reduced-order basis. This reduced-order basis is inferred from a set of precomputed high-fidelity solutions, using linear dimensionality reduction techniques such as the proper orthogonal decomposition (POD, [1,2,3]). In addition to this reduction in terms of degrees of freedom, a second reduction stage may be necessary for nonlinear problems, this time in terms of integration points. This is called hyper-reduction, or operator compression according to the terminology introduced in [4]. Operator compression methods include the empirical interpolation method (EIM, [5]), the missing point estimation (MPE, [6]), the a priori hyper-reduction (APHR, [7]), the best point interpolation method (BPIM, [8]), the discrete empirical interpolation method (DEIM, [9]), the Gauss–Newton with approximated tensors (GNAT, [10]), the energy-conserving sampling and weighting (ECSW, [11]), the empirical cubature method (ECM, [12]), and the linear program empirical quadrature procedure (LPEQP, [13]).
In this article, we consider uncertainties on a tensorial input variable which affects the solution of the partial differential equations (PDEs) governing the physical system. This input tensor can represent a three-dimensional field, physical constants used in the constitutive equations, images of defects, an X-ray computed tomography scan characterizing a microstructure, boundary conditions, or geometrical details. In the example presented at the end of this paper, the input tensor represents a three-dimensional temperature field influencing the physical properties of the system. The objective is to accelerate numerical simulations where a quantity of interest depends strongly on this stochastic input tensor subjected to non-parametrized variabilities. This objective can be achieved with a single reduced-order model, as long as the uncertainties on the input tensor are small enough or have a minor impact on the quantity of interest. In other situations, the solution of the governing PDEs lies in a manifold which cannot be covered by a single reduced-order model without increasing its dimension and thus degrading its efficiency. For example, traditional model order reduction techniques fail to solve advection-dominated problems. These problems require more sophisticated techniques, such as those proposed in [14] or [15]. In structural mechanics, the fatigue lifetime assessment of high-pressure turbine blades of an aircraft engine is very sensitive to variations of the temperature field, see [16]. Linear dimensionality reduction is not always suitable for this kind of application. Nonetheless, linear methods have the critical advantage of being compatible with the Galerkin method, providing a reduced problem in the form of equations assembled on a reduced-order basis.
Many strategies have been proposed to address such problems. The concept of local reduced-order models was first introduced in [17], and applied to computational fluid dynamics in [18]. In these works, the set of precomputed high-fidelity solutions was partitioned into several clusters, each of them being used to build a small cluster-specific reduced-order model. The resulting dictionary of local reduced-order models was then used to adapt the reduced-order basis to the current state of the solution by finding the closest cluster center. This technique works very well when the solution evolves on a low-dimensional manifold. However, for specific applications where there is no guarantee that the solution lies in a low-dimensional manifold, this technique may suffer from the curse of dimensionality [19]. Indeed, in high dimension, the loss of contrast in distances makes the nearest neighbour almost as far as the furthest point in the dataset, which explains the difficulties of high-dimensional clustering. Other approaches rely on the interpolation of reduced-order models. This trend was initiated by the subspace angle interpolation method [20,21,22,23,24]. A generalization of this method has been proposed in [25] with the ROM adaptation method based on the Grassmann manifold (also called Grassmannian). It has been successfully applied to the CFD-based aeroelastic analysis of a complete aircraft configuration for varying Mach number and angle of attack. In [26], this method has been improved to achieve real-time performance for linear problems. More recent works [27, 28] propose other interpolation methods on Grassmannians. All these interpolation methods give excellent results for problems depending on a small number of parameters, but none of them has yet been applied to non-parametrized variabilities of a large input tensor.
The generic methodology developed in this article, called ROMnet, is another attempt to overcome the limits of the traditional tools available in the model order reduction literature. It relies on projection-based model order reduction assisted by deep learning techniques. Our objective is to define a general framework for reduced-order model adaptation using deep neural networks, in order to see to what extent model order reduction can benefit from the recent advances in deep learning. Indeed, the growing interest in this discipline has led to the development of innovative methods in many fields. These advances have facilitated the development of surrogate models and data-driven approaches in physics, providing approximate solutions in real time. Other modeling strategies are based on a hybrid approach, mixing physics-based modeling and machine learning. For example, deep neural networks were used in [29] to model the Reynolds stress anisotropy tensor in Reynolds-averaged Navier–Stokes (RANS) models in computational fluid dynamics. In [30], a nonlinear dimensionality reduction is performed using deep convolutional autoencoders. The modeling strategy presented in [31] was the first hybrid approach involving both a dictionary of local hyper-reduced-order models and computer vision techniques. In the context of image-based modeling, it showed that convolutional neural networks could be used to recognize the loading case of a mechanical experiment on a digital image, and to select a suitable hyper-reduced-order model to simulate the experiment.
The concept of ROMnet introduced in this article extends the methodology presented in [31] and is designed to accelerate numerical simulations where a quantity of interest depends on a stochastic tensorial input. Dictionary-based ROMnets can be made of one or several deep neural networks selecting the best reduced-order model from a dictionary. In our case, unlike the strategy presented in [31], this dictionary derives from the clustering of outputs of simplified simulations using distances on a Grassmannian. In the first section of this article, we introduce the formal definitions of ROMnets and dictionary-based ROMnets. Then, we describe the training procedure for dictionary-based ROMnets. An application to a temperature-dependent problem in structural mechanics is presented in the last section of this paper, where we focus on the clustering procedure and the construction of a classifier for model selection.
ROMnets and dictionary-based ROMnets
Classical projection-based model order reduction
Let us consider a physics problem whose primal variable of the governing equations is denoted by \(\mathbf {u}\) and defined on a domain \(\Omega \subset {\mathbb {R}}^{\beta }\), \(\beta \in \llbracket 1;3\rrbracket \), and on a (normalized) time interval [0; 1]. The governing PDEs are generally posed on an infinite-dimensional Hilbert space, but in practice, these equations are solved numerically on a finite-dimensional subspace, denoted by \({\mathcal {H}}\) in this article. In solid mechanics for example, \({\mathcal {H}}\) is the space spanned by finite-element shape functions \(\left\{ \varvec{\phi }_{i} \right\} _{1 \le i \le \dim ({\mathcal {H}})}\), and the primal variable \(\mathbf {u}\) corresponds to the displacement field. The primal variable \(\mathbf {u}\) computed at the \(n\)th time step can be represented by a vector \(\mathbf {U}_{n}\in {\mathbb {R}}^{\dim ({\mathcal {H}})}\) containing its coordinates in the finite-element basis \(\left\{ \varvec{\phi }_{i} \right\} _{1 \le i \le \dim ({\mathcal {H}})}\). In solid mechanics, this numerical solution can be obtained with the Newton–Raphson algorithm, an iterative procedure based on the linearization of the virtual work principle. The resulting linear system to be solved for the \(m\)th iteration at the \(n\)th time increment reads:
\[ \mathbf {J}_{n}^{(m)}\, \delta \mathbf {U}_{n}^{(m)} = -\mathbf {R}_{n}^{(m)}, \tag{1} \]
where \(\mathbf {J}_{n}^{(m)}\in {\mathbb {R}}^{\dim ({\mathcal {H}})\times \dim ({\mathcal {H}})}\) is the Jacobian matrix, also called (global) tangent stiffness matrix, \(\mathbf {R}_{n}^{(m)}\in {\mathbb {R}}^{\dim ({\mathcal {H}})}\) is the vector of residuals, and \(\delta \mathbf {U}_{n}^{(m)}\in {\mathbb {R}}^{\dim ({\mathcal {H}})}\) is the correction applied to the vector of increments of the primal variable defined by:
\[ \Delta \mathbf {U}_{n}^{(m+1)} = \Delta \mathbf {U}_{n}^{(m)} + \delta \mathbf {U}_{n}^{(m)}, \tag{2} \]
with \(\Delta \mathbf {U}_{n}^{(0)} = \mathbf {0}\). When the convergence criterion \(\Vert \mathbf {R}_{n}^{(m)} \Vert \le \epsilon _{\text {NR}} \Vert \mathbf {F}_{n}^{\text {ext}} \Vert \) is satisfied for a given \(m=m^{*}\), with \(\epsilon _{\text {NR}}\) being the tolerance of the Newton–Raphson algorithm and \(\mathbf {F}_{n}^{\text {ext}}\) the vector of external forces, the solution at the \(n\)th time increment is defined as:
\[ \mathbf {U}_{n} = \mathbf {U}_{n-1} + \Delta \mathbf {U}_{n}^{(m^{*})}. \tag{3} \]
Equation (1) is the high-dimensional linear system derived from the high-fidelity model composed of equilibrium, compatibility and constitutive equations.
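To make the incremental procedure of Eqs. (1)–(3) concrete, here is a minimal Python sketch under illustrative assumptions: the residual is taken as \(\mathbf {R} = \mathbf {F}^{\text {int}}(\mathbf {U}) - \mathbf {F}^{\text {ext}}\), and the internal-force law \(\mathbf {K}\mathbf {U} + c\,\mathbf {U}^{3}\) (a chain of springs with cubic hardening) is a hypothetical toy problem, not the elastoplastic model of this paper. The function name `newton_raphson_increment` is illustrative.

```python
import numpy as np

def newton_raphson_increment(U_prev, f_int, jac, F_ext, eps_nr=1e-8, max_iter=50):
    """One time increment: find dU such that R = f_int(U_prev + dU) - F_ext = 0."""
    dU = np.zeros_like(U_prev)                       # Delta U_n^(0) = 0
    for m in range(max_iter):
        R = f_int(U_prev + dU) - F_ext               # residual R_n^(m)
        if np.linalg.norm(R) <= eps_nr * np.linalg.norm(F_ext):
            return U_prev + dU, m                    # U_n = U_{n-1} + Delta U_n^(m*), Eq. (3)
        J = jac(U_prev + dU)                         # tangent stiffness matrix J_n^(m)
        dU = dU + np.linalg.solve(J, -R)             # Eq. (1), then update of Eq. (2)
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical toy problem: spring chain with a cubic hardening term
n_dof, c, n_t = 5, 0.5, 10
K = 3.0 * np.eye(n_dof) - np.eye(n_dof, k=1) - np.eye(n_dof, k=-1)
f_int = lambda U: K @ U + c * U**3                   # internal forces
jac = lambda U: K + np.diag(3.0 * c * U**2)          # derivative of f_int
F_max = np.ones(n_dof)

U = np.zeros(n_dof)
snapshots, iterations = [], []
for n in range(1, n_t + 1):                          # normalized time t_n = n / n_t
    U, m_star = newton_raphson_increment(U, f_int, jac, (n / n_t) * F_max)
    snapshots.append(U.copy())
    iterations.append(m_star)
```

Each converged \(\mathbf {U}_{n}\) is stored: such time snapshots are the raw material from which the POD builds a reduced-order basis.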
Projection-based model order reduction seeks an approximation of the high-fidelity solution in a low-dimensional subspace \({\mathcal {V}}_{ROM} \subset {\mathcal {H}}\) adapted to the current physics problem. This subspace is spanned by an appropriate reduced-order basis \(\left\{ \varvec{\psi }_{i} \right\} _{1 \le i \le N}\), with N being very small compared to \(\dim ({\mathcal {H}})\). The reduced-order basis approximation of the primal variable reads:
\[ \mathbf {u}(\mathbf {x},t) \approx \sum _{i=1}^{N} \gamma _{i}(t)\, \varvec{\psi }_{i}(\mathbf {x}), \tag{4} \]
where \(\left\{ \gamma _{i} \right\} _{1 \le i \le N}\) are the reduced coordinates, which can be stored in a vector \(\varvec{\gamma }\in {\mathbb {R}}^{N}\). The coordinates of the modes \(\left\{ \varvec{\psi }_{i} \right\} _{1\le i \le N}\) in the finite-element basis \(\left\{ \varvec{\phi }_{i} \right\} _{1 \le i \le \dim ({\mathcal {H}})}\) are stored in columns in a matrix \(\mathbf {V}\in {\mathbb {R}}^{\dim ({\mathcal {H}}) \times N}\) called reduction matrix. Hence:
\[ \mathbf {U} \approx \mathbf {V}\, \varvec{\gamma }. \tag{5} \]
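These modes can be obtained by applying the POD [1] or the snapshot POD [2, 3] to a set of precomputed high-fidelity solutions evaluated at different time steps or for different configurations of the physical system. After the Galerkin projection of the governing equations on \({\mathcal {V}}_{ROM}\), the reduced linear system to solve at each iteration of the Newton–Raphson algorithm in the reduced-order model (ROM) is then:
\[ \mathbf {V}^{T}\mathbf {J}_{n}^{(m)}\mathbf {V}\, \delta \varvec{\gamma }_{n}^{(m)} = -\mathbf {V}^{T}\mathbf {R}_{n}^{(m)}, \tag{6} \]
where \(\delta \varvec{\gamma }_{n}^{(m)}\in {\mathbb {R}}^{N}\) is the correction applied to the reduced coordinates.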
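The following sketch, on a hypothetical nonlinear spring chain, illustrates the offline/online split. A snapshot POD (computed here with a thin SVD and an energy-based truncation, an assumed but common choice) yields a reduction matrix \(\mathbf {V}\) with orthonormal columns, and the online Newton–Raphson iterations solve the Galerkin system (6). Passing the identity matrix as \(\mathbf {V}\) recovers the high-fidelity solver, used here as a reference. The convergence test is applied to the reduced residual \(\mathbf {V}^{T}\mathbf {R}\), because the full residual of a Galerkin ROM does not vanish in general. All names and values are illustrative.

```python
import numpy as np

# Hypothetical toy problem: nonlinear spring chain, R(U) = K U + c U^3 - F_ext
n_dof, c = 20, 0.5
K = 3.0 * np.eye(n_dof) - np.eye(n_dof, k=1) - np.eye(n_dof, k=-1)
f_int = lambda U: K @ U + c * U**3
jac = lambda U: K + np.diag(3.0 * c * U**2)

def galerkin_newton(V, F_ext, eps_nr=1e-10, max_iter=50):
    """Newton-Raphson on V^T J V dgamma = -V^T R (Eq. 6); V = identity gives Eq. (1)."""
    gamma = np.zeros(V.shape[1])
    for _ in range(max_iter):
        U = V @ gamma                                  # U ~ V gamma (Eq. 5)
        r = V.T @ (f_int(U) - F_ext)                   # reduced residual V^T R
        if np.linalg.norm(r) <= eps_nr * np.linalg.norm(V.T @ F_ext):
            return U
        gamma = gamma + np.linalg.solve(V.T @ jac(U) @ V, -r)
    raise RuntimeError("Newton-Raphson did not converge")

def snapshot_pod(S, tol=1e-8):
    """Orthonormal reduced-order basis from a snapshot matrix S (one solution per column)."""
    Phi, s, _ = np.linalg.svd(S, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)           # captured energy fraction
    N = int(np.argmax(energy >= 1.0 - tol)) + 1       # smallest N reaching 1 - tol
    return Phi[:, :N]

# Offline: high-fidelity solutions for loads a*f1 + b*f2 on a 3 x 3 grid
x = np.linspace(0.0, 1.0, n_dof)
f1, f2 = np.ones(n_dof), np.sin(np.pi * x)
snapshots = [galerkin_newton(np.eye(n_dof), a * f1 + b * f2)
             for a in (0.5, 1.0, 1.5) for b in (0.5, 1.0, 1.5)]
V = snapshot_pod(np.column_stack(snapshots))

# Online: reduced solve for an unseen load, compared with the high-fidelity one
F_new = 0.8 * f1 + 1.2 * f2
U_rom = galerkin_newton(V, F_new)
U_hf = galerkin_newton(np.eye(n_dof), F_new)
rel_err = np.linalg.norm(U_rom - U_hf) / np.linalg.norm(U_hf)
```

Since the columns of `V` are orthonormal, the basis is an element of a Stiefel manifold, as stated just below.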
Any ordered set of \(k \le \dim ({\mathcal {H}})\) linearly independent vectors in \({\mathcal {H}}\) is called a \(k\)-frame. The Stiefel manifold \(V(k,{\mathcal {H}})\) is the set of all orthonormal \(k\)-frames in \({\mathcal {H}}\). In projection-based model order reduction, reduced-order bases obtained by POD or snapshot POD belong to a Stiefel manifold. In this article, the notation \(V({\mathcal {H}})\) stands for the set of reduced-order bases:
\[ V({\mathcal {H}}) = \bigcup _{k=1}^{\dim ({\mathcal {H}})} V(k,{\mathcal {H}}). \tag{7} \]
The linear system (6) results from a first reduction stage in terms of degrees of freedom. For some nonlinear problems, a second reduction stage is required to efficiently decrease the computation time. This second reduction stage is referred to as hyper-reduction or operator compression, as mentioned in the introduction. In this case, the definition of the set \(V({\mathcal {H}})\) can be extended to include sets of hyper-reduction parameters in addition to the set of reduced-order bases.
ROMnets
Our objective is to predict a quantity of interest Y via the computation of a primal variable \(\mathbf {u}\) that belongs to a reduced approximation space and satisfies nonlinear physics equations depending on a stochastic input tensor X. In this article, \({\mathcal {X}}\) denotes the set of input variabilities and \({\mathcal {Y}}\) represents the set containing the quantity of interest. In structural mechanics, Y can represent a damage field, the von Mises stress in a zone of interest, or the displacement of a specific point in the structure, while X can stand for material constants, boundary conditions, geometrical parameters, an X-ray computed tomography scan characterizing the microstructure, images of defects, or even a three-dimensional field defined on the domain \(\Omega \) such as a temperature field, residual stresses, or heterogeneous material parameters. The only restriction on the input is that it must have a tensorial representation, that is, a representation as a multidimensional array. For instance, images are second-order tensors (two-dimensional arrays), X-ray computed tomography scans are third-order tensors (three-dimensional arrays), and fields discretized on a finite-element mesh can be represented by first-order tensors (one-dimensional arrays). These tensorial inputs are stochastic because they contain the uncertainties on the physical system under study: when considering polycrystalline materials, X-ray computed tomography scans could be used to study macroscopic properties under microstructural variabilities such as grain sizes, shapes and orientations. We refer the reader to [32, 33] for more details on finite-element modeling based on X-ray computed tomography scans. In the application presented at the end of this paper, X is the finite-element discretization of a temperature field with variabilities evolving in \(L^{2}(\Omega )\).
These stochastic variabilities may be related to turbulence in a fluid–structure interaction with a high-Reynolds-number fluid flow. In aircraft engines, the temperature field in high-pressure turbine blades results from a complex turbulent flow coming from the combustion chamber. While the tensor X can be generated by a parametric stochastic model, it is assumed that we have no prior knowledge of the underlying model. Therefore, the proposed methodology is suitable for non-parametrized (or generic) input variabilities which can represent uncertainties on the environment of the physics problem. This feature is required when the method is trained on data simulated by a parametric model, but applied to real data with unknown distributions obtained from experimental measurements or from a more complex model.
When the input \(X\in {\mathcal {X}}\) is modified, the primal variable \(\mathbf {u}\) evolves on a manifold \({\mathcal {M}}\). In some situations, it is difficult to build a relevant reduced-order model giving accurate predictions for the primal variable on the whole manifold. In such cases, predictions of the quantity of interest Y are inaccurate since they derive from the behavior of the primal variable. The reduced-order model must then be adapted to the input to capture nonlinearities. In this paper, we propose a general framework for reduced-order model adaptation via deep learning algorithms.
Given two sets \({\mathcal {A}}\) and \({\mathcal {B}}\), the notation \({\mathcal {B}}^{\mathcal {A}}\) represents the set of functions \(f:{\mathcal {A}}\rightarrow {\mathcal {B}}\). Let us now give the definitions of a reduced-order solver and a ROMnet:
Definition 1
(Reduced-order solver) Let us consider a physics problem, where a quantity of interest \(Y\in {\mathcal {Y}}\) depends on a tensorial input variable \(X\in {\mathcal {X}}\). A reduced-order solver is an operator \({\mathcal {S}}:V({\mathcal {H}})\rightarrow {\mathcal {Y}}^{{\mathcal {X}}}\) taking a reduced-order model \(m\in V({\mathcal {H}})\) as an input and returning a predictor \({\mathcal {S}} [m] : {\mathcal {X}} \rightarrow {\mathcal {Y}} \) for the quantity of interest. Given \(X\in {\mathcal {X}}\), the quantity of interest Y can be approximated by:
\[ Y \approx {\mathcal {S}}[m](X). \]
\(\square \)
In this definition, the reduced-order model m consists of a reduced-order basis and, optionally, parameters related to a hyper-reduction algorithm. The function \({\mathcal {S}}[m]\) can be seen as an operator solving the reduced linear system (6) and computing the quantity of interest associated with the reduced-order solution \(\mathbf {u}\).
Definition 2
(ROMnet) Let us consider a physics problem, where a quantity of interest \(Y\in {\mathcal {Y}}\) depends on a tensorial input variable \(X\in {\mathcal {X}}\) and can be predicted by a reduced-order solver \({\mathcal {S}}:V({\mathcal {H}})\rightarrow {\mathcal {Y}}^{{\mathcal {X}}}\). A ROMnet \({\mathcal {R}}:{\mathcal {X}}\rightarrow V({\mathcal {H}})\) is a deep learning algorithm returning a reduced-order model \({\mathcal {R}}(X)\in V({\mathcal {H}})\) adapted to the input \(X\in {\mathcal {X}}\). Given \(X\in {\mathcal {X}}\), the quantity of interest Y can be approximated by:
\[ Y \approx {\mathcal {S}}[{\mathcal {R}}(X)](X). \]
\(\square \)
Contrary to surrogate modeling, using a reduced-order model \({\mathcal {R}}(X)\) makes it possible to satisfy homogeneous Dirichlet boundary conditions and to solve the constitutive equations, at least at some specific points if operator compression (hyper-reduction) is used. Hence, a ROMnet provides a hybrid approach mixing physics-based modeling and deep learning. It is used as a reduced-order basis generator for complex problems where the reduced-order basis must be adapted to a tensorial input. In addition, the definition of the quantity of interest remains quite flexible after the training of a ROMnet. In solid mechanics for instance, the definition of the damage indicator of an uncoupled damage model can be changed without restarting the training phase. Figure 1 summarizes the concept of ROMnets.
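Definitions 1 and 2 can be read as function signatures. The Python sketch below encodes them with type aliases and shows that a ROMnet prediction is the composition \(Y \approx {\mathcal {S}}[{\mathcal {R}}(X)](X)\). The linear solver `S` and the function `R_oracle` are hypothetical, purely illustrative stand-ins (no hyper-reduction, no trained network): the oracle returns a one-dimensional basis aligned with the exact solution, which a fixed global basis of the same size generally misses.

```python
from typing import Callable
import numpy as np

# Type aliases mirroring Definitions 1 and 2 (no hyper-reduction: a ROM is a basis V)
ROM = np.ndarray                                                 # m in V(H), shape (dim_H, N)
ReducedOrderSolver = Callable[[ROM], Callable[[np.ndarray], np.ndarray]]   # S: m -> S[m]
ROMnet = Callable[[np.ndarray], ROM]                             # R: X -> R(X)

def predict(S: ReducedOrderSolver, R: ROMnet, X: np.ndarray) -> np.ndarray:
    """Y ~ S[R(X)](X): the ROMnet chooses the ROM, the reduced-order solver runs it."""
    return S(R(X))(X)

# Hypothetical linear problem K U = X, with Y = U as quantity of interest
K = np.diag([1.0, 2.0, 4.0, 8.0])

def S(V: ROM) -> Callable[[np.ndarray], np.ndarray]:
    def solve(X: np.ndarray) -> np.ndarray:
        gamma = np.linalg.solve(V.T @ K @ V, V.T @ X)   # Galerkin reduced system
        return V @ gamma                                # reduced-order approximation
    return solve

def R_oracle(X: np.ndarray) -> ROM:
    """Illustrative stand-in for a trained ROMnet: a 1-D basis along the exact solution."""
    v = np.linalg.solve(K, X)
    return (v / np.linalg.norm(v))[:, None]

X = np.ones(4)
Y_adapted = predict(S, R_oracle, X)          # ROM adapted to X
Y_fixed = S(np.eye(4)[:, :1])(X)             # fixed global 1-D basis, not adapted
```

With the adapted basis, `Y_adapted` equals \(\mathbf {K}^{-1}X = (1, 0.5, 0.25, 0.125)\), whereas the fixed basis only recovers the first component.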
Remark
In [30], another deep learning strategy for model order reduction is proposed for parametrized ordinary differential equations. The governing equations are mapped onto a nonlinear manifold by a deep convolutional autoencoder. Contrary to projection-based model order reduction methods using linear dimensionality reduction techniques such as the POD or the snapshot POD, this methodology performs a nonlinear dimensionality reduction. The reduced (or generalized) coordinates in the latent space defined by the autoencoder’s bottleneck layer are combined by the decoder in a nonlinear fashion to get the high-dimensional state approximation. When the decoder is linear, this methodology is equivalent to classical projection-based model order reduction. In the present paper, instead of looking for an approximate solution on a nonlinear trial manifold, ROMnets adapt the linear subspace to the input variability. As explained in the next section, dictionary-based ROMnets use several subspaces to get a piecewise linear approximation space, whereas the aforementioned methodology approximates the solution manifold by a single nonlinear manifold. The choice of keeping a linear method for dimensionality reduction is motivated by its compatibility with the Galerkin method, which facilitates the construction of a hyper-reduced problem. \(\square \)
Dictionary-based ROMnets
When the solution manifold \({\mathcal {M}}\) is embedded in a low-dimensional vector space, one can construct a single global reduced-order model to compute approximate solutions of the physics problem for different input variabilities. When \({\mathcal {M}}\) is not embedded in a low-dimensional vector space, using one single global reduced-order model would result in either time-consuming or inaccurate reduced simulations, depending on the number of modes selected in the reduced-order basis. By partitioning the set \({\mathcal {X}}\) of input variabilities, one can define a dictionary of local reduced-order models which approximates \({\mathcal {M}}\) by several affine subspaces. Clustering algorithms can be used to split the set \({\mathcal {X}}\) into distinct subsets called clusters. Inputs belonging to the same cluster lead to solutions which can be predicted with the same local reduced-order model because of their proximity on the manifold \({\mathcal {M}}\). More precisely, for a given integer \(K \in {\mathbb {N}}^{*}\), the clustering algorithm gives a partition of the set \({\mathcal {X}}\):
\[ {\mathcal {X}} = \bigcup _{k=1}^{K} {\mathcal {X}}_{k}, \qquad {\mathcal {X}}_{k} \cap {\mathcal {X}}_{l} = \emptyset \quad \text {for } k \ne l. \]
The dictionary of local reduced-order models contains K cluster-specific reduced-order models. Hence, for a given input \(X\in {\mathcal {X}}\), one must identify the corresponding cluster \({\mathcal {X}}_k\) to select the most appropriate reduced-order model.
Definition 3
(Dictionary of reduced-order models) Given an integer \(K \in {\mathbb {N}}^{*}\), an injective function \({\mathcal {D}}_{K}: \llbracket 1;K\rrbracket \rightarrow V({\mathcal {H}})\) is called a dictionary of reduced-order models of dimension K, or \(K\)-ROM-dictionary. \(\square \)
Definition 4
(Dictionary-based ROMnet) Let us consider a physics problem, where a quantity of interest \(Y\in {\mathcal {Y}}\) depends on a tensorial input variable \(X\in {\mathcal {X}}\) and can be predicted by a reduced-order solver \({\mathcal {S}}:V({\mathcal {H}})\rightarrow {\mathcal {Y}}^{{\mathcal {X}}}\). Given an integer \(K \in {\mathbb {N}}^{*}\), a ROMnet \({\mathcal {R}}_{K}\) is a dictionary-based ROMnet if there exist a deep classifier \({\mathcal {F}}_{K} : {\mathcal {X}} \rightarrow \llbracket 1;K \rrbracket \) and a \(K\)-ROM-dictionary \({\mathcal {D}}_{K}: \llbracket 1;K\rrbracket \rightarrow V({\mathcal {H}})\) satisfying:
\[ {\mathcal {R}}_{K} = {\mathcal {D}}_{K} \circ {\mathcal {F}}_{K}. \]
\(\square \)
Figure 2 illustrates the concept of dictionary-based ROMnets. The strategy presented in [31] for image-based modeling using convolutional neural networks and a dictionary of local reduced-order models fits the definition of a dictionary-based ROMnet. In this definition, the expression deep classifier denotes a deep neural network returning a single class label in \(\llbracket 1;K \rrbracket \) for a given tensorial input. In multiclass classification, deep classifiers usually have a softmax activation function in the output layer, giving an output vector \(\mathbf {y}^{\text {pred}}\in {\mathbb {R}}^{K}\) such that \(y_{k}^{\text {pred}}\) is the probability for the input tensor to belong to the \(k\)th class. The probabilities \(y_{k}^{\text {pred}}\) are also called membership probabilities. The deep classifier returns the integer corresponding to the class with the highest membership probability, that is:
\[ {\mathcal {F}}_{K}(X) = \underset{1 \le k \le K}{\operatorname {arg\,max}} \; y_{k}^{\text {pred}}. \]
Such classifiers are called classical deep classifiers in this article. The concept of deep classifier used in the definition of a dictionary-based ROMnet can be extended to include not only the classical ones, but also deep clustering algorithms [34,35,36]. These algorithms use encoders to cluster the data in a low-dimensional latent space, avoiding the difficulties of high-dimensional clustering [19].
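A dictionary-based ROMnet is therefore a lookup composed with a classifier, \({\mathcal {R}}_{K} = {\mathcal {D}}_{K} \circ {\mathcal {F}}_{K}\). In the minimal sketch below, a single linear layer followed by a softmax stands in for the trained deep classifier (the weights `W` and `b` are hypothetical), and the dictionary is a Python `dict` mapping the labels \(1,\dots ,K\) to distinct reduced-order bases, which keeps \({\mathcal {D}}_{K}\) injective.

```python
import numpy as np

def softmax(z):
    """Membership probabilities from logits (shifted for numerical stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def make_classifier(W, b):
    """Hypothetical stand-in for a trained deep classifier F_K: one linear layer + softmax."""
    def F_K(X):
        y_pred = softmax(W @ X.ravel() + b)   # y_k^pred, k = 1..K
        return int(np.argmax(y_pred)) + 1      # class label in {1, ..., K}
    return F_K

def dictionary_romnet(F_K, D_K):
    """R_K = D_K o F_K: classify the input tensor, then look up the local ROM."""
    return lambda X: D_K[F_K(X)]

# Toy setting (hypothetical): K = 3 local one-mode bases in a 4-dimensional space H
K = 3
D_K = {k: np.eye(4)[:, k - 1:k] for k in range(1, K + 1)}   # distinct bases: injective
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
F_K = make_classifier(W, b=np.zeros(K))
R_K = dictionary_romnet(F_K, D_K)

X = np.array([[0.1, 2.0], [0.3, 0.2]])   # a small second-order input tensor
V = R_K(X)                               # reduced-order basis recommended for X
```

Here the logits are (0.1, 2.0, 0.5), so the input is assigned to class 2 and `V` is the second basis of the dictionary. Because the softmax is monotonic, the arg max of the probabilities coincides with the arg max of the logits.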
As mentioned earlier, the partition of the set \({\mathcal {X}}\) used to define cluster-specific reduced-order models is given by a clustering algorithm. Clustering algorithms generally rely on a dissimilarity measure quantifying the difference between two points in the dataset. In this paper, the expression dissimilarity measure refers to a pseudo-semimetric:
Definition 5
(Dissimilarity measure) A dissimilarity measure on \({\mathcal {X}}\) is a function \(\delta : {\mathcal {X}} \times {\mathcal {X}} \rightarrow {\mathbb {R}}_{+}\) such that \(\delta (X,X')=\delta (X',X)\) for all \((X,X')\in {\mathcal {X}}^2\) and \(\delta (X,X) = 0\) for all \(X\in {\mathcal {X}}\). \(\square \)
Dictionary-based ROMnets involving classical deep classifiers
The rest of the article focuses on dictionary-based ROMnets involving a classical deep classifier. In this case, the deep classifier \({\mathcal {F}}_K\) solves a classical multiclass classification problem to recommend a suitable reduced-order model from the dictionary. Nevertheless, as the classes are given by a clustering algorithm, one could wonder why a deep neural network is used for cluster assignment. When using a center-based clustering algorithm, each cluster \({\mathcal {X}}_k\) is represented by a center \({\tilde{X}}_{k}\). In theory, one could compute the dissimilarities between the new input tensor X and all the cluster representatives \({\tilde{X}}_{k}\), and then select the cluster with the smallest dissimilarity \(\delta (X,{\tilde{X}}_{k})\). However, this procedure is too costly when repeated many times, because of the computation time required to evaluate the dissimilarities. Indeed, as further explained in the next section, dissimilarity measures that are suitable for model order reduction applications may involve numerical simulations. Hence, the time saved by model order reduction would be counterbalanced by the time-consuming operations required for cluster assignment. The true classifier defined by:
\[ {\mathcal {K}}_{K}(X) = \underset{1 \le k \le K}{\operatorname {arg\,min}} \; \delta (X,{\tilde{X}}_{k}) \]
is too expensive because it is based on numerical simulations. Note that \({\mathcal {K}}_{K}\) is not an artificial neural network. When using the ROMnet, the true classifier \({\mathcal {K}}_{K}\) is replaced by the approximate classifier \({\mathcal {F}}_K\) to bypass the computations required for cluster assignment. In the application presented at the end of this paper, replacing the true classifier by the approximate one enables fast ROM selection, with a computation time reduced by a factor of 60. The next section gives some general guidelines for the training of a dictionary-based ROMnet.
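The sketch below contrasts the two classifiers. It assumes a k-medoids algorithm (one possible center-based method, not prescribed by the text above) run on a precomputed dissimilarity matrix, so that the medoids play the role of the representatives \({\tilde{X}}_{k}\). The true classifier \({\mathcal {K}}_{K}\) then needs K evaluations of \(\delta \) for every new input; a call counter stands in for the simulations that a ROM-oriented \(\delta \) would require. The scalar inputs and the absolute-difference dissimilarity are toy assumptions, and the function names are hypothetical.

```python
import numpy as np

def k_medoids(D, K, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed n x n dissimilarity matrix D.

    Returns 1-based cluster labels and the indices of the K medoids."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=K, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)      # assign each point to its closest medoid
        new_medoids = medoids.copy()
        for k in range(K):
            members = np.flatnonzero(labels == k)
            if members.size > 0:
                # new medoid: the member with the smallest total dissimilarity to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[k] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels + 1, medoids

def true_classifier(X, representatives, delta):
    """K_K(X) = argmin_k delta(X, X_k): one (possibly simulation-based) delta per cluster."""
    return int(np.argmin([delta(X, Xk) for Xk in representatives])) + 1

# Toy data (hypothetical): scalar inputs with delta(x, y) = |x - y|
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.3])
D = np.abs(points[:, None] - points[None, :])
labels, medoids = k_medoids(D, K=2)

calls = []
def delta(x, y):
    calls.append((x, y))        # stands in for the cost of an expensive evaluation
    return abs(x - y)

label_new = true_classifier(4.8, points[medoids], delta)
```

The labels produced by the clustering are the training targets of \({\mathcal {F}}_K\); once trained, \({\mathcal {F}}_K\) answers without calling `delta` at all.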
Training procedure for dictionary-based ROMnets
Let K be a positive integer. In this section, we describe the training phase of a dictionary-based ROMnet \({\mathcal {R}}_{K}\) made of a classical deep classifier \({\mathcal {F}}_{K}\) and a \(K\)-ROM-dictionary \({\mathcal {D}}_{K}\). First, an automatic data labelling procedure is presented. It aims at preparing the data for the supervised learning of the deep classifier \({\mathcal {F}}_{K}\), using simplified numerical simulations and a clustering algorithm. Then, we train the deep classifier \({\mathcal {F}}_{K}\) and build the \(K\)-ROM-dictionary \({\mathcal {D}}_{K}\) on the basis of the clusters identified by the labelling procedure. Figure 3 summarizes the main steps for the training of a dictionary-based ROMnet.
Automatic data labelling via clustering
ROM-oriented dissimilarity measure
Clustering the input space \({\mathcal {X}}\) enables labelling the training data for the deep classifier, and guides the construction of the ROM-dictionary by defining regions of the input space where high-fidelity simulations must be run to build local reduced-order models. The key point is the choice of the dissimilarity measure for clustering. Indeed, clustering aims at grouping points of a dataset that are similar. As shown in the last section of this article, for some specific applications, defining a dissimilarity measure based on a distance between input tensors leads to inaccurate reduced-order solutions. Furthermore, the difficulties of high-dimensional clustering appear when dealing with large input tensors.
In the application to structural mechanics presented at the end of this paper, this issue is due to the complex interaction between the thermal and the mechanical loadings. In this application, the input tensor X is a temperature field influencing the mechanical response of the material. To illustrate the aforementioned difficulty, let us imagine two temperature fields \(T_1\) and \(T_2\) taking different values only in a very small part \(\omega \) of the structure \(\Omega \). Let us introduce a third temperature field \(T_3\) being equal to \(T_1\) in \(\omega \) and taking arbitrary values in the rest of the solid domain. If \(\omega \) is a critical zone from a mechanical point of view, \(T_1\) and \(T_2\) might lead to dissimilar displacement fields, while \(T_3\) might give approximately the same displacement field as \(T_1\) (if thermal expansion is negligible with respect to the strains induced by the mechanical boundary conditions). In this case, \(T_1\) and \(T_3\) should be assigned to the same cluster, while \(T_2\) should be assigned to another one. However, taking the \(L^2\) distance between temperature fields as the dissimilarity measure would assign \(T_1\) and \(T_2\) to the same cluster, as these fields are identical in most of the solid domain.
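A toy computation makes this argument explicit. Assuming a one-dimensional grid of 100 points standing in for \(\Omega \), a two-point zone \(\omega \), and hypothetical temperature values, the global \(L^2\) distance ranks \(T_2\) closer to \(T_1\) than \(T_3\), whereas the distance restricted to \(\omega \) shows that \(T_3\) coincides with \(T_1\) in the critical zone.

```python
import numpy as np

n = 100                                   # grid points of a 1-D stand-in for Omega
h = 1.0 / n                               # uniform quadrature weight
omega = np.zeros(n, dtype=bool)
omega[40:42] = True                       # small critical zone omega (2 grid points)

T1 = np.full(n, 500.0)                    # hypothetical temperatures
T2 = T1.copy()
T2[omega] = 900.0                         # differs from T1 only inside omega
T3 = T1 + 150.0
T3[omega] = T1[omega]                     # equals T1 inside omega, differs elsewhere

def l2_distance(Ta, Tb, mask=None):
    """Discrete L2 distance, optionally restricted to the grid points in mask."""
    sel = slice(None) if mask is None else mask
    return np.sqrt(h * np.sum((Ta[sel] - Tb[sel]) ** 2))

d12, d13 = l2_distance(T1, T2), l2_distance(T1, T3)                 # about 56.6 and 148.5
d12_omega = l2_distance(T1, T2, omega)                              # about 56.6
d13_omega = l2_distance(T1, T3, omega)                              # exactly 0.0
```

A clustering driven by `d12` and `d13` would group \(T_1\) with \(T_2\), which is the opposite of what the mechanical response requires when \(\omega \) is critical.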
Consequently, for such cases, one must define a ROM-oriented dissimilarity measure accounting for the variability induced by the stochastic input tensor. In this paper, the dissimilarity is defined using the Grassmann distance [37] between subspaces spanned by outputs of a simplified physics problem. The simplified physics problem consists in computing a few time steps of the original problem with a less restrictive convergence criterion. The boundary conditions can even be simplified to facilitate convergence. The idea is to discover the subspace in which the solution evolves at the beginning of the simulation for a given input X. Two input variabilities leading to solutions lying in nearby subspaces (in terms of principal angles) are then considered similar.
Before giving a formal definition of the ROM-oriented dissimilarity measure, let us define some useful concepts. The Grassmannian \(\text {Gr}(k,n)\) is a Riemannian manifold whose points are all the k-dimensional linear subspaces of \({\mathbb {R}}^{n}\). The infinite Grassmannian \(\text {Gr}(k,\infty )\) parametrizes the k-dimensional subspaces of \({\mathbb {R}}^{n}\) for all \(n\ge k\). As shown in [37], one can define a distance between two subspaces of different dimensions, and this distance is independent of the dimension of the ambient space. The Grassmann metric \(d_{\text {Gr}(\infty ,\infty )}\) is defined on the doubly-infinite Grassmannian \(\text {Gr}(\infty ,\infty )\), which parametrizes subspaces of all dimensions regardless of the ambient space [37]. In practice, this metric can be obtained with the following formula:
$$\begin{aligned} d_{\text {Gr}(\infty ,\infty )}({\mathcal {A}},{\mathcal {B}}) = \left( |a-b|\,\frac{\pi ^{2}}{4} + \sum _{i=1}^{\min (a,b)} \alpha _{i}^{2} \right) ^{1/2} \end{aligned}$$
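where \({\mathcal {A}}\) and \({\mathcal {B}}\) are two linear subspaces of dimension a and b respectively, and where the \(\alpha _i\)’s are the principal angles obtained by singular value decomposition:
$$\begin{aligned} \mathbf {A}^{T}\mathbf {B} = \mathbf {U}\varvec{\Sigma }\mathbf {V}^{T} \end{aligned}$$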
with \(\mathbf {A}\) and \(\mathbf {B}\) being semi-orthogonal matrices whose columns form orthonormal bases of \({\mathcal {A}}\) and \({\mathcal {B}}\) respectively, and with \(\varvec{\Sigma }\in {\mathbb {R}}^{a\times b}\) being a matrix whose only possibly nonzero coefficients are \(\Sigma _{ii}=\cos (\alpha _{i})\) for \(i\le \min (a,b)\). These coefficients are singular values of \(\mathbf {A}^{T}\mathbf {B}\), a product of matrices with orthonormal columns, so they lie in \([0;1]\); hence \(\alpha _{i}\in [0;\pi /2]\) for all i.
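As an illustration, the short NumPy sketch below computes the principal angles from the singular values of \(\mathbf {A}^{T}\mathbf {B}\) and then the Grassmann distance, assuming the Ye–Lim form \(d = (|a-b|\,\pi ^2/4 + \sum _i \alpha _i^2)^{1/2}\) from [37]. The helper `orthonormal_basis` (a name introduced here) builds \(\mathbf {A}\) or \(\mathbf {B}\) from raw vectors via a thin SVD; this is a sketch, not the implementation used by the authors.

```python
import numpy as np


def orthonormal_basis(vectors: np.ndarray, rtol: float = 1e-10) -> np.ndarray:
    """Orthonormal basis (as columns) of the span of the columns of `vectors`."""
    u, s, _ = np.linalg.svd(vectors, full_matrices=False)
    if s.size == 0 or s[0] == 0.0:
        return u[:, :0]
    # Keep left singular vectors whose singular values are not negligible (numerical rank).
    return u[:, s > rtol * s[0]]


def principal_angles(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Principal angles between span(A) and span(B); A and B have orthonormal columns."""
    # The singular values of A^T B are the cosines of the min(a, b) principal angles.
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    # Round-off can push a cosine slightly above 1, so clip before taking arccos.
    return np.arccos(np.clip(cosines, 0.0, 1.0))


def grassmann_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Distance on Gr(inf, inf), assuming the Ye-Lim formula of [37]."""
    a, b = A.shape[1], B.shape[1]
    alpha = principal_angles(A, B)
    return float(np.sqrt(abs(a - b) * np.pi**2 / 4.0 + np.sum(alpha**2)))


# Example: the x-axis lies inside the xy-plane, so only the dimension gap contributes.
plane = orthonormal_basis(np.array([[1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]))
axis = orthonormal_basis(np.array([[1.0], [0.0], [0.0]]))
print(grassmann_distance(plane, axis))  # approximately pi / 2
```

Note that a dimension mismatch alone contributes \(\pi /2\) per missing dimension, which is why the formula remains meaningful for subspaces of different sizes.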
Let us introduce the map \({\mathcal {V}}:{\mathcal {X}}\rightarrow \text {Gr}(\infty ,\infty )\) assigning to the input tensor \(X\in {\mathcal {X}}\) the subspace \({\mathcal {V}}(X)\) spanned by the primal variable \(\mathbf {U}\) of the governing equations during the numerical simulation of the simplified physics problem:
$$\begin{aligned} {\mathcal {V}}(X) = \text {span}\left( \mathbf {U}_{1}(X),\ldots ,\mathbf {U}_{n_{t}}(X)\right) \end{aligned}$$
with \(n_{t}\) being the number of time increments in the simplified simulation and \(\mathbf {U}_{n}(X)\) denoting the vector representation of the primal variable computed for the input X and evaluated at the nth time increment. The ROM-oriented dissimilarity measure used for the clustering of \({\mathcal {X}}\) is written \(\delta \) and defined by:
$$\begin{aligned} \delta (X,X') = d_{\text {Gr}(\infty ,\infty )}\left( {\mathcal {V}}(X),{\mathcal {V}}(X')\right) \end{aligned}$$
The dissimilarity measure is computed for all pairs of inputs belonging to the training set, resulting in a dissimilarity matrix \(\varvec{\delta }\in {\mathbb {R}}^{n_{T}\times n_{T}}\), where \(n_{T}\) is the cardinality of the training set.
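Since \(\varvec{\delta }\) is symmetric with a zero diagonal, only the \(n_T(n_T-1)/2\) pairs with \(i<j\) need to be evaluated. A minimal sketch, where `dist` is any callable dissimilarity (for instance a Grassmann distance between orthonormal bases); the function name is hypothetical:

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

import numpy as np

Item = TypeVar("Item")


def dissimilarity_matrix(items: Sequence[Item], dist: Callable[[Item, Item], float]) -> np.ndarray:
    """Symmetric matrix delta[i, j] = dist(items[i], items[j]) with a zero diagonal."""
    n = len(items)
    delta = np.zeros((n, n))
    # Only the n(n-1)/2 pairs i < j are evaluated; symmetry fills the lower triangle.
    for i, j in combinations(range(n), 2):
        delta[i, j] = delta[j, i] = dist(items[i], items[j])
    return delta
```

Each pair is independent of the others, so in practice this loop can be distributed across computer threads.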
The clustering algorithm
Once the dissimilarity matrix is calculated, a clustering algorithm must be applied to partition the dataset into K clusters. The choice of the algorithm depends on the context. In this methodology, clustering is used to define local approximations of a nonlinear solution manifold. Hence, the algorithm must focus on compactness rather than connectivity when looking for clusters in the dataset. This property is satisfied by the k-means algorithm [38], the best-known clustering approach. However, this algorithm needs to calculate the clusters’ centroids (means), which is not possible when the data points are vector spaces. For this reason, we choose the k-medoids algorithm presented in [39], which relies on a Voronoi iteration approach like k-means. The algorithm proposed in [39] can be summarized as follows:
Initialization step: select K initial medoids from the dataset.
Repeat the two following steps until convergence:
Data assignment step: assign each point of the dataset to the cluster corresponding to its closest medoid.
Medoid update step: for each cluster, update the medoid by finding the point which minimizes the sum of distances to all the points in the cluster.
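The Voronoi-iteration steps listed above can be sketched in Python on a precomputed dissimilarity matrix. This is an illustrative implementation under assumptions of our own (random initialization, an empty cluster keeps its previous medoid), not the exact code of [39]:

```python
import numpy as np


def k_medoids(delta: np.ndarray, K: int, max_iter: int = 100, seed: int = 0):
    """Voronoi-iteration k-medoids on a precomputed (n, n) dissimilarity matrix.

    Returns (medoids, labels): indices of the K medoids and 0-based cluster labels.
    """
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    # Initialization step: K distinct data points chosen at random as medoids.
    medoids = rng.choice(n, size=K, replace=False)
    for _ in range(max_iter):
        # Data assignment step: each point joins the cluster of its closest medoid.
        labels = np.argmin(delta[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(K):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # empty cluster: keep the previous medoid
            # Medoid update step: member with the smallest sum of dissimilarities to its cluster.
            costs = delta[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids (hence assignments) no longer change
        medoids = new_medoids
    labels = np.argmin(delta[:, medoids], axis=1)
    return medoids, labels
```

Only entries of \(\varvec{\delta }\) are accessed and no centroid is ever computed, which is what makes the method usable when the data points are subspaces.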
The choice of the hyperparameter K depends on the problem. More details concerning this aspect of the methodology can be found in the final section of this article.
Construction of the ROMnet
Training of a deep classifier for fast model selection
The ROMnet’s classifier \({\mathcal {F}}_{K}\) is trained in a supervised fashion from pairs of examples \((X_i, {\mathcal {K}}_{K}(X_i))\) given by the clustering algorithm, where the label \({\mathcal {K}}_{K}(X_i)\in \llbracket 1;K \rrbracket \) is the index of the cluster containing \(X_i\). As explained earlier in the article, the true classifier defined by:
$$\begin{aligned} {\mathcal {K}}_{K}(X) = \mathop {\text {arg min}}\limits _{k\in \llbracket 1;K \rrbracket } \delta (X,{\tilde{X}}_{k}) \end{aligned}$$
is too expensive because it is based on numerical simulations, which motivates the use of an approximate classifier. In the previous equation, \({\tilde{X}}_{k}\) is the medoid of the kth cluster.
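In code, the true classifier is simply a nearest-medoid rule. The sketch below is generic: `dissimilarity` stands for \(\delta \), whose evaluation for a new input requires a simplified simulation, and labels are returned in \(\llbracket 1;K \rrbracket \); the names are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Item = TypeVar("Item")


def true_cluster_label(x: Item, medoids: Sequence[Item],
                       dissimilarity: Callable[[Item, Item], float]) -> int:
    """Index in 1..K of the medoid closest to x (ties go to the smallest index)."""
    # Evaluating dissimilarity(x, .) requires the subspace of x, i.e. a simplified
    # simulation for x: this is what makes the true classifier expensive online.
    distances = [dissimilarity(x, m) for m in medoids]
    return 1 + min(range(len(medoids)), key=distances.__getitem__)
```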
The dataset is split into training, validation and test sets. For a given deep neural network architecture and a given set of hyperparameters, the parameters of the classifier \({\mathcal {F}}_{K}\) are calibrated on the training set via backpropagation with the Adam optimizer [40]. The accuracy of the calibrated classifier is evaluated on the validation set. The classifier is calibrated with different architectures and hyperparameter settings until the accuracy on the validation set reaches a satisfactory value. Once the best architecture and set of hyperparameters have been selected, the calibrated classifier is evaluated on the test set to estimate its accuracy on new unseen data. When the input X is an image, one could use well-known convolutional neural network architectures and fine-tune their pretrained parameters to adapt the model to the current data, which is a common transfer learning technique [41].
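A minimal sketch of the random split, assuming the 64/16/20 proportions that correspond to the 6400/1600/2000 split used later for the \(10^4\) thermal loadings. `split_indices` is a hypothetical helper; a stratified split per cluster would be a reasonable alternative for imbalanced classes.

```python
import numpy as np


def split_indices(n: int, fractions=(0.64, 0.16, 0.20), seed: int = 0):
    """Random disjoint train/validation/test index sets with the given proportions."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    # The test set takes the remaining indices, so the three sets cover all n examples.
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]
```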
Construction of a ROM-dictionary
Contrary to the classifier, the ROM-dictionary \({\mathcal {D}}_{K}\) is trained in an unsupervised fashion. Clustering results help select the data points in \({\mathcal {X}}\) for which high-fidelity simulations must be run. The solutions computed at every time step of these high-fidelity simulations are called snapshots. Selecting the clusters’ medoids as simulation points for snapshots is recommended, since the clusters are represented by their medoids. Additional snapshots can be computed if necessary. For each cluster, the snapshot POD is applied to the set of snapshots to obtain a local reduced-order basis.
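The snapshot POD of one cluster can be sketched with the method of snapshots. The truncation criterion (retaining a fraction `energy` of the sum of eigenvalues) is an assumption made here for illustration; the text does not state which criterion is used.

```python
import numpy as np


def snapshot_pod(snapshots: np.ndarray, energy: float = 0.9999) -> np.ndarray:
    """Local reduced-order basis (n_dofs, r) from a (n_dofs, n_snapshots) snapshot matrix."""
    # Method of snapshots: eigenpairs of the small n_snapshots x n_snapshots correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(snapshots.T @ snapshots)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    # Smallest number of modes r capturing the requested fraction of the eigenvalue sum.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    r = min(int(np.searchsorted(cumulative, energy)) + 1, eigvals.size)
    # Lift to the full space: phi_i = S v_i / sqrt(lambda_i) gives orthonormal modes.
    return snapshots @ eigvecs[:, :r] / np.sqrt(eigvals[:r])
```

Working with the small correlation matrix is cheap when the number of snapshots is much smaller than the number of degrees of freedom, which is the usual situation for finite-element fields.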
Application to an anisothermal elastoplastic problem in structural mechanics
In this section, a temperature-dependent problem in structural mechanics is considered. Our objective is to study the influence of thermal loading uncertainties on a mechanical quantity of interest such as a damage field. The input tensor corresponds to the final temperature field in the structure, defined by a truncated Gaussian field. The quantity of interest is a damage indicator based on the accumulated plastic strain field and defined on the whole structure.
The high-fidelity model
Let us consider the solid body \(\Omega \) shown in Fig. 4. The heat produced by mechanical phenomena is neglected, which makes it possible to solve the heat equation first and then use the resulting temperature field history as a thermal loading for the mechanical problem. The structure is subjected to a displacement-controlled monotonic loading. Assuming a quasi-static evolution, the local equilibrium equations and the boundary conditions read:
The structure is made of an elastoplastic generalized standard material described by the von Mises yield criterion and a nonlinear isotropic hardening law. In the framework of the infinitesimal strain theory, the constitutive equations are:
Hooke’s law:
$$\begin{aligned} \varvec{\sigma } = {\mathbb {C}} : (\varvec{\varepsilon } - \varvec{\varepsilon }^{p} - \alpha (T-T_{0})\mathbf {1}) \end{aligned}$$ (22)
von Mises yield criterion with isotropic hardening:
$$\begin{aligned} f(\varvec{\sigma },R) = \sigma _{\text {eq}}(\varvec{\sigma }) - R - \sigma _{y} \qquad \sigma _{\text {eq}}(\varvec{\sigma })=\sqrt{\frac{3}{2} \mathbf {s}:\mathbf {s}} \qquad \mathbf {s} = \varvec{\sigma } - \frac{1}{3} \text {tr}(\varvec{\sigma })\mathbf {1} \end{aligned}$$ (23)
Nonlinear isotropic hardening law (with p denoting the accumulated plastic strain):
$$\begin{aligned} R(p) = R_{\infty }(1-\exp (-bp)) \end{aligned}$$ (24)
Flow rule for the plastic strain rate tensor:
$$\begin{aligned} \dot{\varvec{\varepsilon }}^{p} = \frac{3}{2} {\dot{p}} \frac{\mathbf {s}}{\sigma _{\text {eq}}(\varvec{\sigma })} \end{aligned}$$ (25)
Karush–Kuhn–Tucker conditions:
$$\begin{aligned} {\dot{p}} \ge 0, \quad f \le 0, \quad {\dot{p}} f = 0 \end{aligned}$$ (26)
Consistency condition for the determination of the plastic multiplier:
$$\begin{aligned} {\dot{p}} {\dot{f}} = 0 \end{aligned}$$ (27)
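For readers who want to experiment with the constitutive equations above, here is a hedged sketch of a standard backward-Euler radial-return update at a single integration point. It is not the Z-set implementation: it assumes small strains, material parameters held constant within the increment (temperature dependence can be handled by passing the values at the current temperature), and a thermal strain \(\alpha (T-T_0)\mathbf {1}\) as in Hooke’s law. The consistency condition reduces to a scalar equation for the increment of accumulated plastic strain, solved here by Newton’s method.

```python
import numpy as np


def return_mapping(eps, eps_p_n, p_n, E, nu, sigma_y, R_inf, b, alpha=0.0, dT=0.0,
                   tol=1e-10, max_iter=50):
    """Implicit update of the von Mises model with R(p) = R_inf (1 - exp(-b p)).

    eps: total strain (3x3) at the end of the increment; eps_p_n, p_n: plastic strain and
    accumulated plastic strain at the start; dT = T - T0. Returns (sigma, eps_p, p).
    """
    mu = E / (2.0 * (1.0 + nu))           # shear modulus
    kappa = E / (3.0 * (1.0 - 2.0 * nu))  # bulk modulus
    I = np.eye(3)
    # Trial elastic strain: total minus plastic minus thermal strain alpha (T - T0) 1.
    eps_e = eps - eps_p_n - alpha * dT * I
    tr = np.trace(eps_e)
    s_trial = 2.0 * mu * (eps_e - tr / 3.0 * I)
    q_trial = np.sqrt(1.5 * np.tensordot(s_trial, s_trial))

    def R(p):
        return R_inf * (1.0 - np.exp(-b * p))

    if q_trial - R(p_n) - sigma_y <= 0.0:
        # Elastic increment: dp = 0 satisfies the Karush-Kuhn-Tucker conditions.
        return kappa * tr * I + s_trial, eps_p_n.copy(), p_n
    # Plastic increment: Newton iterations on g(dp) = q_trial - 3 mu dp - R(p_n + dp) - sigma_y.
    dp = 0.0
    for _ in range(max_iter):
        g = q_trial - 3.0 * mu * dp - R(p_n + dp) - sigma_y
        if abs(g) < tol * sigma_y:
            break
        dg = -3.0 * mu - R_inf * b * np.exp(-b * (p_n + dp))
        dp -= g / dg
    # Radial return: the deviatoric stress shrinks along the trial direction (flow rule).
    n_flow = 1.5 * s_trial / q_trial
    s = s_trial * (1.0 - 3.0 * mu * dp / q_trial)
    return kappa * tr * I + s, eps_p_n + dp * n_flow, p_n + dp
```

Because the flow direction \(\mathbf {s}/\sigma _{\text {eq}}\) is unchanged during the return, the tensorial problem collapses to one scalar equation, which is why the update is cheap per integration point.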
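Under the assumption of isotropic elasticity, the fourth-order elastic stiffness tensor \({\mathbb {C}}\) can be decomposed as follows:
$$\begin{aligned} {\mathbb {C}} = \frac{E}{1+\nu }\,{\mathbb {K}} + \frac{E}{1-2\nu }\,{\mathbb {J}} \end{aligned}$$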
where E is the Young’s modulus, \(\nu \) is the Poisson’s ratio, and \({\mathbb {K}}\) (resp. \({\mathbb {J}}\)) is the projector onto the space of deviatoric (resp. spherical) second-order tensors. Material constants \(E,\nu , \alpha ,\sigma _{y}, R_{\infty }\) and b generally depend on the temperature. For this application, the temperature-dependent coefficients \(E,\alpha ,\sigma _{y}\) are taken from experimental data on high-strength structural steels for fire-resistant structures [42]. The other material parameters are taken as constants. The thermal loading applied to the structure is defined by:
with \(T_0 = 22\,^{\circ }\text {C}\). The field \(T_{\text {max}}(\mathbf {x})\) will be replaced by a random temperature field to account for uncertainties on the thermal loading, while the mechanical loading is deterministic. Hence, for this study, the stochastic input X is a tensorial representation of the random temperature field obtained at \(t=1\). Let us consider a simple damage indicator \(D:\Omega \times [0;1] \rightarrow [0;1]\) defined by:
where \(p_{f}\) is the material’s plastic strain to failure. A crack initiates at \(\mathbf {x}_{0}\in \Omega \) if \(D(\mathbf {x}_{0},t)\) reaches the value 1 for some \(t\in [0;1]\). The quantity of interest for this application is the field \(Y=D(\ . \ ,1):\Omega \rightarrow [0;1]\).
The high-fidelity mechanical problem is solved using the finite-element method. Numerical simulations are performed with the Z-set software [43].
Stochastic model for the thermal loading
A training set of random thermal loadings is generated using the stochastic model described in Appendix A. Briefly, the stochastic model draws random combinations of fluctuation modes, which are superposed on a reference temperature field \(T_{\text {ref}}:\Omega \rightarrow {\mathbb {R}}\). In Eq. (29), the field \(T_{\text {max}}\) is replaced by the resulting random temperature fields, which gives the random thermal loadings. The reference temperature field defines the mean thermal loading. For this application, the reference temperature field is uniform at \(650\,^{\circ }\text {C}\). For real-world applications, the reference temperature field can be given by aerothermal simulations. Random temperature fields are denoted by \(T_{\text {rand}}:\Omega \times \Theta \rightarrow {\mathbb {R}}\), where \(\Theta \) is the sample space of the probability space associated with the stochastic model described in Appendix A.
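The sketch below only illustrates the general idea of superposing random combinations of fluctuation modes on a reference field; it does not reproduce the model of Appendix A. The modes, the standard normal coordinates and the rescaling to a prescribed standard deviation are assumptions made for illustration, and `sample_temperature_fields` is a hypothetical name.

```python
import numpy as np


def sample_temperature_fields(T_ref: np.ndarray, modes: np.ndarray, n_samples: int,
                              sigma: float = 50.0, seed: int = 0) -> np.ndarray:
    """Random fields T_ref + sum_i xi_i phi_i with independent standard normal xi_i.

    T_ref: (n_nodes,) reference field; modes: (n_nodes, n_modes) fluctuation modes phi_i.
    The fluctuations are rescaled so that their pointwise standard deviation, averaged
    (in the root-mean-square sense) over the nodes, equals `sigma` (assumed normalization).
    """
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_samples, modes.shape[1]))  # random coordinates
    fluctuations = xi @ modes.T                             # (n_samples, n_nodes)
    scale = np.sqrt(np.mean(np.sum(modes**2, axis=1)))     # RMS pointwise std of xi @ modes.T
    return T_ref[None, :] + sigma * fluctuations / scale
```

The classifier never sees the coordinates `xi`, only the nodal values returned here, which matches the non-parametric use of the inputs discussed in the remark below.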
Generating a database of \(n_{T} = 10^{4}\) random temperature fields \({T_{\text {rand}}(.,\theta _{i}):\Omega \rightarrow {\mathbb {R}}}\) for \( i \in \llbracket 1;n_{T}\rrbracket \) from 150 fluctuation modes takes 17 min on a single computer thread. The standard deviation of the random fluctuations is \(50\,^{\circ }\text {C}\). The database \({\mathcal {T}}\) of thermal loading variabilities can then be defined as (see Fig. 5):
Remark
When training the ROMnet, it is assumed that we have no prior knowledge of the data-generating stochastic model. Training data may come from complex aerothermal simulations with random boundary conditions defined by experts. In the present case, in the absence of such data, we use a stochastic model to generate the training data. This stochastic model is parametric, since every random temperature field comes from a random linear combination of 150 fluctuation modes. However, when training the deep classifier for local ROM recommendation, the training data correspond to nodal values of the temperature fields rather than to the 150 random coordinates. This way, in the test phase (or online phase in the model order reduction community), the ROMnet can be applied to thermal loadings coming from unknown stochastic models, complex aerothermal simulations or even experiments, as long as these thermal loadings correspond to perturbations of the reference thermal loading. Hence, the ROMnet can deal with nonparametrized variabilities of the input data. \(\square \)
Figure 6 illustrates the influence of the thermal loading on the damage field computed with the high-fidelity model. The fields on the left represent two different temperature fields reached at \(t=1\) in the simulation. The critical zone is located around the first column of holes on the left-hand side of the structure. The second temperature field (case B) takes high values in this zone, leading to high values of the damage indicator. In contrast, the values taken by the first temperature field (case A) in this zone are relatively small, and the resulting damage field takes smaller values in the first column of holes. One can observe that the shapes of the damaged zones in A and B are not the same. In addition, the second column of holes is slightly damaged in A, while this region remains undamaged in B.
Construction of a dictionary-based ROMnet
To compute the damage field as a function of the thermal loading in a reasonable time, a dictionary-based ROMnet \({\mathcal {R}}_K\) based on a classical deep classifier \({\mathcal {F}}_{K}\) is used. In this paper, we focus on the training of the deep classifier \({\mathcal {F}}_{K}\).
Computation of the ROM-oriented dissimilarity matrix
For every thermal loading, we solve a simplified mechanical problem which is similar to the original one, with a less restrictive tolerance for the Newton–Raphson algorithm to reduce the number of required time steps. For the sake of simplicity, only two displacement fields of the simplified simulation are kept: one in the elastic regime, and one in the plastic regime. The space \({\mathcal {V}}(T)\) is the 2-dimensional space spanned by these fields. The solutions at the other time steps are discarded, but computing them is necessary to ensure the convergence of all the simplified simulations. The \(10^{4} \times 10^{4}\) matrix \(\varvec{\delta }\) is defined as the dissimilarity matrix whose coefficients are:
$$\begin{aligned} \delta _{ij} = \delta (T_{i},T_{j}) = d_{\text {Gr}(\infty ,\infty )}\left( {\mathcal {V}}(T_{i}),{\mathcal {V}}(T_{j})\right) , \qquad i,j\in \llbracket 1;10^{4}\rrbracket \end{aligned}$$
The \(10^{4}\) simplified mechanical simulations are distributed among 84 computer threads. The total computation time is \(9\text {h}05\text {min}\), which represents about \(5\text {min}\) per simulation. Once the simplified simulations are done, computing the dissimilarity matrix takes \(11\text {h}16\text {min}\) when its coefficients are distributed among 48 computer threads. Note that fewer than half of the coefficients are calculated, since the dissimilarity matrix is symmetric with zeros on its diagonal.
k-medoids clustering
The ROM-oriented dissimilarity is used to cluster the data with the k-medoids algorithm. The number K of clusters is a hyperparameter provided by the user. Numerous empirical methods have been proposed to estimate the best number of clusters according to different criteria. In our case, the dataset is not organized in distinct clusters. Therefore, the clustering algorithm is applied for different values of K, and the quality of the clustering results is evaluated using silhouette analysis [44]. When selecting the most appropriate number of clusters based on silhouette analysis, one must also consider the trade-off between large numbers, which give a better approximation of the nonlinear manifold, and small numbers, which make the classification problem easier for the deep neural network. In addition, using a large number of clusters increases the cost of constructing the ROM-dictionary. For the current problem, \(K=6\) has been identified as a good compromise. A single run of the clustering algorithm takes about 10 s. The true classifier associated with this clustering procedure is denoted by \({\mathcal {K}}_K\).
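Silhouette analysis only needs the dissimilarity matrix and the labels, so it applies directly to \(\varvec{\delta }\). A NumPy sketch is given below (scikit-learn offers the mean value through `sklearn.metrics.silhouette_score` with `metric="precomputed"`); the function name is ours and at least two clusters are assumed.

```python
import numpy as np


def silhouette_values(delta: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Silhouette s(i) = (b_i - a_i) / max(a_i, b_i) from a precomputed dissimilarity matrix."""
    clusters = np.unique(labels)  # at least two clusters are required
    s = np.zeros(delta.shape[0])
    for i in range(delta.shape[0]):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = delta[i, same].mean()  # mean dissimilarity to the other members of its cluster
        # Mean dissimilarity to the nearest other cluster.
        b = min(delta[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s
```

In practice one would run k-medoids for each candidate K and compare the mean silhouette values, keeping in mind the trade-off described above.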
Visualization of clusters on the nonlinear manifold
The clustering results can be visualized using Multidimensional Scaling (MDS) [45]. MDS is an information visualization method which consists in finding a low-dimensional dataset \(\mathbf {Z}_{0}\) whose matrix of Euclidean distances \(\mathbf {d}(\mathbf {Z}_{0})\) approximates the input dissimilarity matrix \(\varvec{\delta }\). To that end, a cost function called the stress function is minimized with respect to \(\mathbf {Z}\):
This minimization problem is solved with the SMACOF algorithm (Scaling by MAjorizing a COmplicated Function) [46] implemented in Scikit-learn [47]. Figure 7 shows clustering results with low-dimensional representations obtained by metric MDS. The relative error \(\varsigma (\mathbf {Z}_{0};\varvec{\delta })/\varsigma (\mathbf {0};\varvec{\delta })\) is \(11\%\) for the 2D representation and \(10\%\) for the 3D representation. These visualizations illustrate the nonlinear manifold on which solutions of the mechanical problem evolve when the thermal loading changes. Note that the positions of the clusters’ labels coincide with the medoids.
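The sketch below shows the SMACOF iteration behind metric MDS with unit weights, assuming the raw stress \(\varsigma (\mathbf {Z};\varvec{\delta }) = \sum _{i<j}(d_{ij}(\mathbf {Z})-\delta _{ij})^2\). The paper relies on the scikit-learn implementation (`sklearn.manifold.MDS`, `sklearn.manifold.smacof`); this minimal version only makes the algorithm and the relative error explicit.

```python
import numpy as np


def pairwise_distances(Z: np.ndarray) -> np.ndarray:
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def raw_stress(Z: np.ndarray, delta: np.ndarray) -> float:
    """Sum over pairs i < j of (d_ij(Z) - delta_ij)^2 (assumed form of the stress)."""
    return 0.5 * float(((pairwise_distances(Z) - delta) ** 2).sum())


def smacof(delta: np.ndarray, n_components: int = 2, n_iter: int = 300, seed: int = 0) -> np.ndarray:
    """Metric MDS by SMACOF with unit weights: repeated Guttman transforms Z <- B(Z) Z / n."""
    n = delta.shape[0]
    Z = np.random.default_rng(seed).standard_normal((n, n_components))
    for _ in range(n_iter):
        d = pairwise_distances(Z)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = np.where(d > 0, -delta / d, 0.0)  # off-diagonal entries -delta_ij / d_ij
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))  # rows of B sum to zero
        Z = B @ Z / n
    return Z
```

Under this assumed stress definition, the relative error quoted above corresponds to `raw_stress(Z0, delta) / raw_stress(np.zeros_like(Z0), delta)`, the denominator being the stress of the trivial configuration where all points coincide.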
Multiclass classification
Classifier based on an ensemble of DNNs. Clustering results obtained with the true classifier \({\mathcal {K}}_K\) are used to train the approximate classifier \({\mathcal {F}}_{K}\) of the ROMnet \({\mathcal {R}}_K\). Among the \(10^4\) thermal loadings in the dataset, 6400 are used as training data, 1600 as validation data, and 2000 as test data for final evaluation. As the data labelling procedure involves numerical simulations, the dataset contains a limited number of training examples compared with standard image classification problems. Working on a small dataset makes our deep classifier prone to overfitting. To address this issue, we use the ensemble averaging method [48], which consists in taking an ensemble of classifiers and averaging their predicted membership probabilities. Ensemble averaging is a common technique in ensemble learning. Generally speaking, ensemble methods aim at creating a meta-estimator from several base estimators (or models). Combining different estimators leads to more robust predictions and reduces overfitting. In addition, using an ensemble method replaces the task of finding a single very accurate model with the task of building an effective meta-estimator from several models with lower accuracies. The winners of numerous deep learning challenges used ensemble averaging to improve their predictions [49,50,51]. Our ensemble contains \(N_{\text {models}}=12\) different DNNs trained for the same classification problem with the same training data using the Keras [52] and TensorFlow [53] libraries in Python, but with different architectures and loss functions. All of them use the softmax activation function to get membership probabilities. These predictions are combined in a soft voting scheme to give the final prediction:
$$\begin{aligned} {\mathcal {F}}_{K}(T) = \mathop {\text {arg max}}\limits _{l\in \llbracket 1;K \rrbracket } \frac{1}{N_{\text {models}}} \sum _{k=1}^{N_{\text {models}}} y_{l}^{k}(T) \end{aligned}$$
with \(y_{l}^{k}(T)\) denoting the membership probability of class l predicted by the kth model. The 12 models in the ensemble include fully-connected networks (FC), convolutional neural networks (CNN) [54] and global average pooling convolutional neural networks (GAP-CNN) [55]. These DNNs are trained with different loss functions, namely cross-entropy, balanced cross-entropy to handle class imbalance, and the focal loss [56], which puts more weight on hard, misclassified examples. Using an ensemble makes it possible to recycle the best DNNs obtained during training and to overcome some weaknesses of each individual model in the ensemble. Training one of the DNNs used in this ensemble on an Nvidia Quadro K5200 GPU takes about \(2\text {h}\) on average in our case.
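Soft voting and the focal loss are both short to write down. The sketch below averages the softmax outputs of the ensemble and returns the arg max (0-based; add 1 for labels in \(\llbracket 1;K \rrbracket \)). The focal loss is given in its common multi-class form \(-\sum _l y_l (1-p_l)^{\gamma } \log p_l\), which may differ in details (for example class weights) from the variant used by the authors.

```python
import numpy as np


def soft_vote(probas: np.ndarray):
    """probas[k, i, l]: softmax probability y_l^k of class l for input i given by model k.

    Returns the 0-based predicted classes and the averaged membership probabilities.
    """
    mean_proba = probas.mean(axis=0)  # average over the N_models ensemble members
    return mean_proba.argmax(axis=1), mean_proba


def focal_loss(y_onehot: np.ndarray, y_pred: np.ndarray, gamma: float = 2.0,
               eps: float = 1e-7) -> float:
    """Mean multi-class focal loss; gamma = 0 recovers the cross-entropy."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    # (1 - p)^gamma down-weights examples the model already predicts with high confidence.
    return float(np.mean(-np.sum(y_onehot * (1.0 - p) ** gamma * np.log(p), axis=1)))
```

Soft voting can overturn a tie between individual votes: with two models predicting different classes, the class with the larger averaged probability wins.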
An important aspect of our classification problem is the preprocessing step, in which thermal loadings are prepared to be fed into the neural networks. Thermal loadings in \({\mathcal {T}}\) are represented by their corresponding random temperature fields reached at \(t=1\). These temperature fields are projected onto a \(38 \times 17 \times 4\) regular grid defined on a bounding box surrounding the solid body, see Fig. 8. For the grid vertices inside \(\Omega \), the temperature is evaluated using the finite-element shape functions, while vertices outside \(\Omega \) are assigned a zero value. This procedure gives a 3D bitmap image of the temperature field, represented by a third-order tensor. Thanks to this tensorial representation, 3D convolutional filters can be applied to extract features of the input data, as 2D convolutional filters do in image analysis. For fully-connected networks, the third-order tensor is flattened, but the projection on the grid still acts as a subsampling procedure. This preprocessing operation takes about 3 min when the \(10^{4}\) fields are distributed among 280 computer threads, which means that the projection of a single field takes about 5 s.
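The grid projection can be sketched as follows. Finite-element interpolation and point location are mesh-specific, so they are represented here by two hypothetical callables, `field` and `inside`; only the grid construction and the zero padding outside \(\Omega \) follow the text.

```python
from typing import Callable, Sequence

import numpy as np


def project_on_grid(field: Callable[[np.ndarray], np.ndarray],
                    inside: Callable[[np.ndarray], np.ndarray],
                    bbox_min: Sequence[float], bbox_max: Sequence[float],
                    shape=(38, 17, 4)) -> np.ndarray:
    """Sample a temperature field on a regular grid over a bounding box of the body.

    field(points) -> values and inside(points) -> boolean mask take (m, 3) arrays; they
    stand in for finite-element interpolation and point location on the actual mesh.
    """
    axes = [np.linspace(lo, hi, n) for lo, hi, n in zip(bbox_min, bbox_max, shape)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    points = np.stack([X.ravel(), Y.ravel(), Z.ravel()], axis=1)
    values = np.zeros(points.shape[0])   # vertices outside Omega keep the value 0
    mask = inside(points)
    values[mask] = field(points[mask])   # interpolate only at vertices inside Omega
    return values.reshape(shape)         # third-order tensor fed to 3D CNNs
```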
Analysis of classification results. In the present study, when evaluated on the test set, the ensemble of DNNs reaches an accuracy of \(80\%\), whereas the accuracies of its base classifiers range from 63.05 to \(73.75\%\), see Table 1. As expected, ensemble averaging reduces overfitting and thus improves the ability of the classifier \({\mathcal {F}}_{K}\) to generalize to new unseen data. Table 2 summarizes the values of precision, recall and F1-score. Figure 9 gives the confusion matrix, whose coefficient (i, j) is the percentage of examples of class i assigned to class j. This matrix is diagonally dominant here, indicating that the predicted class usually corresponds to the true class. Since the dataset is elongated and the clusters are numbered along it, the nearly tridiagonal structure of the confusion matrix indicates that misclassified examples are assigned to neighbouring clusters. This result can also be observed when visualizing misclassified examples on MDS plots, see Figs. 10, 11 and 12. In these figures, one can clearly see that misclassified examples of class i are mostly located close to the border between the ith cluster and its neighbours. This is a desirable property: when the ensemble fails to select the appropriate reduced-order model in the dictionary, it returns a reduced-order model that covers a part of the manifold close to the target one.
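The row-normalized confusion matrix and the per-class metrics can be computed as follows. The helper `neighbour_error_fraction` is an illustrative addition of ours that quantifies the tridiagonal pattern, assuming cluster labels are numbered along the elongated dataset.

```python
import numpy as np


def confusion_counts(y_true: np.ndarray, y_pred: np.ndarray, K: int) -> np.ndarray:
    C = np.zeros((K, K))
    np.add.at(C, (y_true, y_pred), 1)  # C[i, j]: examples of class i predicted as j
    return C


def confusion_percent(y_true, y_pred, K):
    C = confusion_counts(y_true, y_pred, K)
    return 100.0 * C / np.maximum(C.sum(axis=1, keepdims=True), 1)


def precision_recall_f1(y_true, y_pred, K):
    C = confusion_counts(y_true, y_pred, K)
    tp = np.diag(C)
    precision = tp / np.maximum(C.sum(axis=0), 1)  # column sums: predicted counts
    recall = tp / np.maximum(C.sum(axis=1), 1)     # row sums: true counts
    denom = precision + recall
    f1 = np.where(denom > 0, 2 * precision * recall / np.maximum(denom, 1e-12), 0.0)
    return precision, recall, f1


def neighbour_error_fraction(y_true, y_pred) -> float:
    """Share of misclassified examples whose predicted label differs by exactly 1."""
    wrong = y_true != y_pred
    if not wrong.any():
        return 1.0
    return float(np.mean(np.abs(y_true[wrong] - y_pred[wrong]) == 1))
```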
Evaluation of the methodology and discussion
Let us quickly summarize what has been done up to this point. A dataset of \(10^4\) thermal loadings has been generated with a stochastic model. Its ROM-oriented dissimilarity matrix has been computed. Based on this matrix, the dataset has been split into \(K=6\) clusters with the k-medoids clustering algorithm. An ensemble of DNNs has been trained to assign new unseen thermal loadings to the best clusters using the ensemble averaging method.
When considering a new thermal loading, true cluster assignment requires one simplified simulation (5 min) and the computation of six Grassmann distances (less than 1 s). On the other hand, when using the deep classifier \({\mathcal {F}}_{K}\), preprocessing operations take 5 s and the evaluation of the deep classifier is quasi-instantaneous. Hence, the computation time for the selection of the best reduced-order model is decreased by a factor of 60 when using the ROMnet’s deep classifier \({\mathcal {F}}_{K}\) instead of the true classifier \({\mathcal {K}}_K\).
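The factor of 60 follows from the per-input costs quoted above, counting the six Grassmann distances as at most 1 s and neglecting the evaluation of the networks:

$$\begin{aligned} \frac{t_{{\mathcal {K}}_K}}{t_{{\mathcal {F}}_K}} \approx \frac{5\,\text {min} + 1\,\text {s}}{5\,\text {s}} = \frac{301\,\text {s}}{5\,\text {s}} \approx 60 \end{aligned}$$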
The clustering of the thermal loading database \({\mathcal {T}}\) with the ROM-oriented dissimilarity has defined 6 clusters, which can be used to construct a dictionary of 6 local ROMs. Let us compare our methodology with another approach consisting in the construction of a temperature-based ROM-dictionary \({\mathcal {D}}_{T}\). Such a dictionary comes from a direct clustering of the input space, that is, a k-medoids clustering of \({\mathcal {T}}\) using distances between the temperature fields evaluated at \(t=1\). This clustering only considers the input data and does not use simplified simulations to account for mechanical phenomena. Hence, cluster assignment is directly obtained by taking the minimum distance to the clusters’ medoids. In this case, there is no need for DNNs, since the cost of the classification task is negligible. However, clustering high-dimensional data can lead to meaningless results, because pairwise distances lose contrast as the dimension grows. This problem is known as the curse of dimensionality [19] and appears naturally when dealing with fields defined on a finite-element mesh. To overcome this difficulty, dimensionality reduction techniques must be applied prior to clustering. Distances are then calculated in the low-dimensional latent space, whose dimension is a hyperparameter. In this paper, principal component analysis (PCA) is applied for linear dimensionality reduction with 30 principal components, the dimension 30 being a compromise between large dimensions, which suffer from the curse of dimensionality, and low dimensions, which discard too much information.
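The temperature-based baseline can be sketched as a PCA projection followed by Euclidean distances in the latent space, which can then be fed to any k-medoids implementation. The function names are hypothetical; the 30 components follow the text.

```python
import numpy as np


def pca_latent(fields: np.ndarray, n_components: int = 30) -> np.ndarray:
    """Coordinates of (n_samples, n_nodes) fields on their first principal components."""
    centered = fields - fields.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T


def euclidean_dissimilarity(latent: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances via the Gram matrix (avoids an n x n x dim array)."""
    sq = np.sum(latent**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * latent @ latent.T
    D = np.sqrt(np.clip(d2, 0.0, None))  # clip round-off negatives before the square root
    np.fill_diagonal(D, 0.0)
    return D
```

For \(10^4\) fields the distance matrix holds \(10^8\) entries, about 0.8 GB in double precision, which is still manageable on a single workstation.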
The values \(\delta (T_{i},T_{j})\) for two inputs \(T_{i},T_{j}\) belonging to the same cluster are called intra-cluster ROM-oriented dissimilarities. The distributions of intra-cluster ROM-oriented dissimilarities for the ROMnet \({\mathcal {R}}_K\) and for temperature-based dictionaries are shown in Fig. 13. The distribution obtained with a temperature-based dictionary does not depend on the number of clusters, and coincides with the distribution of dissimilarities \(\delta (T_{i},T_{j})\) obtained without requiring the inputs \(T_{i},T_{j}\) to belong to the same cluster.
This result shows that, in the present application, clustering with a distance on temperature fields does not lead to local ROMs for the Grassmann distance. Yet, in the context of ROM interpolation, the Grassmann distance has been shown to be the appropriate tool for manipulating ROMs [25, 27, 28]. Because of the complex interactions between the thermal and the mechanical loadings, direct clustering in the space of temperature fields is not appropriate for the mechanical problem presented in this paper.
When some dissimilarity measure \(\delta '\) is used for clustering in order to build a ROM-dictionary, comparing the distribution of intra-cluster ROM-oriented dissimilarities with the global distribution of ROM-oriented dissimilarities provides a validation criterion. If these distributions are similar, the dissimilarity measure \(\delta '\) does not provide local ROMs for the Grassmann distance. In this case, a dictionary-based ROMnet should be used with the ROM-oriented dissimilarity measure based on the Grassmann distance.
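The validation criterion can be quantified, for example, with a two-sample Kolmogorov–Smirnov statistic between intra-cluster and global dissimilarities. This choice of statistic is ours (the paper compares the distributions graphically, and `scipy.stats.ks_2samp` provides an equivalent test); values close to zero indicate that \(\delta '\) does not produce clusters that are local for the Grassmann distance.

```python
import numpy as np


def intra_and_all_dissimilarities(delta: np.ndarray, labels: np.ndarray):
    """Upper-triangular dissimilarities: those within a cluster, and all of them."""
    i, j = np.triu_indices(delta.shape[0], k=1)
    values = delta[i, j]
    return values[labels[i] == labels[j]], values


def ks_statistic(x: np.ndarray, y: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_x(t) - F_y(t)|."""
    # Empirical CDFs only jump at data points, so the supremum is reached on the pooled sample.
    grid = np.concatenate([x, y])
    Fx = np.searchsorted(np.sort(x), grid, side="right") / x.size
    Fy = np.searchsorted(np.sort(y), grid, side="right") / y.size
    return float(np.max(np.abs(Fx - Fy)))
```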
Figure 14 gives another visualization of the results shown in Fig. 13. It illustrates clustering results obtained for a temperature-based dictionary with 13 clusters. Points belonging to the same cluster have the same color. The high dispersion of the points assigned to a given cluster shows that direct clustering of the input space does not lead to local ROMs for the Grassmann distance.
These promising results highlight the potential of dictionary-based ROMnets. Once clusters have been defined, one can construct one local ROM for each cluster, as in [17, 18], where small local ROMs outperform a single global ROM in terms of accuracy and speed. This study in the context of dictionary-based ROMnets is underway.
Conclusion
The concept of ROMnet gives a general framework for reduced-order model adaptation using deep neural networks. In this article, the potential of dictionary-based ROMnets has been illustrated on a mechanical problem with nonparametrized variabilities of the thermal loading. It has been shown that direct clustering of the input space may give clusters which cannot be exploited to define local reduced-order models. This issue can be avoided by defining a ROM-oriented dissimilarity involving the Grassmann metric on results of simplified numerical simulations. Online cluster assignment can be performed with a classifier based on deep neural networks to bypass numerical simulations, which reduces the computation time by a factor of 60.
Availability of data and materials
Not applicable.
Abbreviations
CFD: Computational fluid dynamics
CNN: Convolutional neural network
DNN: Deep neural network
FC: Fully-connected
GAP: Global average pooling
GPU: Graphics processing unit
HROM: Hyper-reduced-order model
MDS: Multidimensional scaling
PCA: Principal component analysis
PDE: Partial differential equation
POD: Proper orthogonal decomposition
ROM: Reduced-order model
References
Lumley J. The structure of inhomogeneous turbulent flows. Atm Turb Radio Wave Prop. 1967;1967:166–78.
Sirovich L. Turbulence and the dynamics of coherent structures, Parts I, II and III. Q Appl Math. 1987;45:561–90.
Chatterjee A. An introduction to the proper orthogonal decomposition. Curr Sci. 2000;78:808–17.
Casenave F, Akkari N, Bordeu F, Rey C, Ryckelynck D. A non-intrusive distributed reduced-order modeling framework for nonlinear structural mechanics–application to elastoviscoplastic computations. Int J Numer Methods Eng. 2020;121(1):32–53.
Barrault M, Maday Y, Nguyen NC, Patera AT. An ’empirical interpolation’ method: application to efficient reduced-basis discretization of partial differential equations. Compt Rendus Mathemat. 2004;339(9):666–72.
Astrid P, Weiland S, Willcox K, Backx T. Missing point estimation in models described by proper orthogonal decomposition. Proc IEEE Conf Decis Control. 2005;53(10):1767–72.
Ryckelynck D. A priori hyperreduction method: an adaptive approach. J Comput Phys. 2005;202(1):346–66.
Nguyen NC, Patera AT, Peraire J. A best points interpolation method for efficient approximation of parametrized functions. Int J Numer Methods Eng. 2008;73:521–43.
Chaturantabut S, Sorensen D. Discrete empirical interpolation for nonlinear model reduction. In: Proceedings of the 48th IEEE Conference on Decision and Control held jointly with the 2009 28th Chinese Control Conference (CDC/CCC 2009). 2010; pp 4316–21.
Carlberg K, Farhat C, Cortial J, Amsallem D. The GNAT method for nonlinear model reduction: Effective implementation and application to computational fluid dynamics and turbulent flows. J Comput Phys. 2013;242:623–47.
Farhat C, Avery P, Chapman T, Cortial J. Dimensional reduction of nonlinear finite element dynamic models with finite rotations and energy-based mesh sampling and weighting for computational efficiency. Int J Numer Methods Eng. 2014;98(9):625–62.
Hernandez JA, Caicedo MA, Ferrer A, Cortial J. Dimensional hyper-reduction of nonlinear finite element models via empirical cubature. Comput Methods Appl Mech Eng. 2017;313:687–722.
Yano M, Patera AT. An LP empirical quadrature procedure for reduced basis treatment of parametrized nonlinear PDEs. Comput Methods Appl Mech Eng. 2018;344:1104–23.
Iollo A, Lombardi D. Advection modes by optimal mass transfer. Phys Rev E. 2014;89:022923. https://doi.org/10.1103/PhysRevE.89.022923.
Cagniart N, Maday Y, Stamm B. Model order reduction for problems with large convection effects. In: Chetverushkin B, Fitzgibbon W, Kuznetsov Y, Neittaanmäki P, Periaux J, Pironneau O, editors. Contributions to partial differential equations and applications. Computational methods in applied sciences, vol. 47. Berlin: Springer; 2019.
Casenave F, Akkari N. An error indicator-based adaptive reduced order model for nonlinear structural mechanics—application to high-pressure turbine blades. Math Comput Appl. 2019;24:2.
Amsallem D, Zahr M, Farhat C. Nonlinear model order reduction based on local reduced-order bases. Int J Numer Methods Eng. 2012;92:1–31.
Washabaugh K, Amsallem D, Zahr M, Farhat C. Nonlinear model reduction for CFD problems using local reduced order bases. In: 42nd AIAA fluid dynamics conference. 2012. https://doi.org/10.2514/6.2012-2686
Bellman RE. Adaptive control processes. Princeton: Princeton University Press; 1961.
Lieu T, Lesoinne M. Parameter adaptation of reduced order models for three-dimensional flutter analysis. AIAA Paper. 2004;2004:888.
Lieu T, Farhat C, Lesoinne M. POD-based aeroelastic analysis of a complete F-16 configuration: ROM adaptation and demonstration. AIAA Paper. 2005;2005:2295.
Lieu T, Farhat C. Adaptation of POD-based aeroelastic ROMs for varying Mach number and angle of attack: application to a complete F-16 configuration. AIAA Paper. 2005;2005:7666.
Lieu T, Farhat C, Lesoinne M. Reduced-order fluid/structure modeling of a complete aircraft configuration. Comput Methods Appl Mech Eng. 2006;195:5730–42.
Lieu T, Farhat C. Adaptation of aeroelastic reduced-order models and application to an F-16 configuration. AIAA J. 2007;45:1244–57.
Amsallem D, Farhat C. Interpolation method for adapting reduced-order models and application to aeroelasticity. AIAA J. 2008;46(7):1803–13.
Amsallem D, Farhat C. An online method for interpolating linear parametric reduced-order models. SIAM J Sci Comput. 2011;33(5):2169–98. https://doi.org/10.1137/100813051.
Mosquera R, Hamdouni A, El Hamidi A, Allery C. POD basis interpolation via Inverse distance weighting on Grassmann manifolds. Discr Contin Dyn Syst Series S. 2018;12(6):1743–59.
Mosquera R, El Hamidi A, Hamdouni A, Falaize A. Generalization of the NevilleAitken interpolation algorithm on Grassmann manifolds: applications to reduced order model. 2019. https://arxiv.org/pdf/1907.02831.pdf.
Ling J, Templeton J, Kurzawski A. Reynolds averaged turbulence modeling using deep neural networks with embedded invariance. J Fluid Mech. 2016;807:155–66.
Lee K, Carlberg K. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. 2019. arxiv:1812.08373.
Nguyen F, Barhli SM, Munoz DP, Ryckelynck D. Computer vision with error estimation for reduced order modeling of macroscopic mechanical tests. Complexity. 2018. https://doi.org/10.1155/2018/3791543.
Proudhon H, Moffat A, Sinclair I, et al. Three-dimensional characterisation and modelling of small fatigue corner cracks in high strength Al-alloys. Comput Rendus Phys. 2012;13:316–27. https://doi.org/10.1016/j.crhy.2011.12.005.
Buljac A, Shakoor M, Neggers J, Bernacki M, Bouchard PO, Helfen L, Morgeneyer TF, Hild F. Numerical validation framework for micromechanical simulations based on synchrotron 3D imaging. Comput Mech. 2017;59:419–41.
Xie J, Girshick R, Farhadi A. Unsupervised deep embedding for clustering analysis. In: Proceedings of ICML’16. 2016. p. 478–87.
Guo X, Gao L, Liu X, Yin J. Improved deep embedded clustering with local structure preservation. In: Proceedings of IJCAI’17. 2017. p. 1753–59.
Moradi Fard M, Thonet T. Deep k-means: jointly clustering with k-means and learning representations. 2018. arxiv:1806.10069.
Ye K, Lim LH. Schubert varieties and distances between subspaces of different dimensions. SIAM J Matrix Anal Appl. 2016;37(3):1176–97.
MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics and probability. 1967;1:281–97.
Park HS, Jun CH. A simple and fast algorithm for k-medoids clustering. Expert Syst Appl. 2009;36:3336–41.
Kingma DP, Ba J. Adam: a method for stochastic optimization. 2014. arxiv:1412.6980.
Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2010;22:1345–59. https://doi.org/10.1109/TKDE.2009.191.
Chen J, Young B, Uy B. Behavior of high strength structural steel at elevated temperatures. J Struct Eng. 2006;132(12):1948–54.
Mines ParisTech and ONERA the French aerospace lab. Z-set: nonlinear material & structure analysis suite. http://www.zset-software.com (1981–present).
Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65.
Borg I, Groenen P. Modern multidimensional scaling—theory and applications. 2nd ed. Berlin: Springer; 2005.
de Leeuw J. Applications of convex analysis to multidimensional scaling. In: Barra JR, Brodeau F, Romier G, van Cutsem B, editors. Recent developments in statistics. Berlin: Springer; 1977. p. 133–45.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
Haykin S. Neural networks—a comprehensive foundation. Second edition. 1999;351–91.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, editors. Advances in neural information processing systems 25. Curran Associates Inc; 2012. p. 1097–105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arxiv:1409.1556.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: The IEEE conference on computer vision and pattern recognition (CVPR). 2016.
Chollet F, et al. Keras. 2015. https://keras.io
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. https://www.tensorflow.org/
Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J, Chen T. Recent advances in convolutional neural networks. Pattern Recogn. 2018;77:354–77.
Lin M, Chen Q, Yan S. Network in network. CoRR abs/1312.4400. 2013.
Lin T, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. In: IEEE transactions on pattern analysis and machine intelligence. 2018.
Scarth C, et al. Random field simulation over curved surfaces: applications to computational structural mechanics. Comput Methods Appl Mech Eng. 2018. https://doi.org/10.1016/j.cma.2018.10.026.
Mitchell JSB, Mount DM, Papadimitriou CH. The discrete geodesic problem. SIAM J Comput. 1987;16(4):647–68.
Surazhsky V, Surazhsky T, Kirsanov D, Gortler SJ, Hoppe H. Fast exact and approximate geodesics on meshes. ACM Trans Graph. 2005;24(3):553–60.
Kirsanov D, Malhotra G, Knock S. gdist 1.0.3. https://pypi.org/project/gdist/. 2013.
Acknowledgements
The authors wish to thank Felipe Bordeu (SafranTech) and Julien Cortial (SafranTech), who implemented the Python library BasicTools (https://gitlab.com/drti/basictools) together with FC.
Funding
This study was funded by Safran and ANRT (Association Nationale de la Recherche et de la Technologie).
Author information
Contributions
The methodology was conceived by TD, FC, NA and DR. TD developed the algorithm with valuable suggestions from DR, FC and NA. The algorithm was implemented by TD with the help of FC. The manuscript was written by TD and reviewed by DR, FC and NA. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The authors approve the Journal’s ethics policy and consent to participate.
Consent for publication
The authors give their consent for publication.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Stochastic model for the thermal loading
In the training procedure, uncertainties on the thermal loading are represented by a stochastic model generating zero-mean fluctuations around a reference temperature field \(T_{\text {ref}}:\Omega \rightarrow {\mathbb {R}}\). Random temperature fields are generated by the following three-step procedure, which relies on the assumption of linear thermal behavior:
Step 1: compute temperature fluctuation modes on the solid’s boundary \(\partial \Omega \);
Step 2: for each mode, solve the heat equation with Dirichlet boundary conditions given by the mode itself. The solutions of these problems are bulk fluctuation modes \(A_{i}^{v}\), defined in \(\Omega \) and satisfying the heat equation;
Step 3: draw a random linear combination of the bulk modes, denoted by \(\tau : \Omega \times \Theta \rightarrow {\mathbb {R}}\), and superpose it with the reference temperature field to obtain a realization of the random temperature field.
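Step 2 relies on the linearity of the thermal problem: each bulk mode is the solution of a Dirichlet problem whose boundary data is a boundary mode, so a linear combination of boundary modes is mapped to the same combination of bulk modes. The toy sketch below illustrates this under assumptions that are not taken from the paper: steady-state conduction with unit conductivity and no heat source, on a uniform grid of the unit square discretized with a 5-point finite-difference stencil instead of the finite-element mesh. The function name `solve_dirichlet_laplace` is hypothetical.

```python
import numpy as np


def solve_dirichlet_laplace(field: np.ndarray) -> np.ndarray:
    """Harmonic extension of the boundary ring of a square grid field.

    Only the outer ring of `field` (the Dirichlet data, e.g. a boundary
    fluctuation mode) is read. Interior values are obtained by solving the
    5-point finite-difference Laplace equation, i.e. steady linear heat
    conduction with unit conductivity and no source (assumptions).
    """
    m = field.shape[0]
    n = m - 2  # interior nodes per direction

    def unknown(i: int, j: int) -> int:
        # Map interior grid index (i, j), 1 <= i, j <= m - 2, to a row of K.
        return (i - 1) * n + (j - 1)

    K = np.zeros((n * n, n * n))
    rhs = np.zeros(n * n)
    for i in range(1, m - 1):
        for j in range(1, m - 1):
            r = unknown(i, j)
            K[r, r] = 4.0
            for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 1 <= a <= m - 2 and 1 <= b <= m - 2:
                    K[r, unknown(a, b)] = -1.0
                else:
                    rhs[r] += field[a, b]  # known Dirichlet value moves to the RHS
    bulk = np.array(field, dtype=float)  # copy keeps the boundary values
    bulk[1:-1, 1:-1] = np.linalg.solve(K, rhs).reshape(n, n)
    return bulk


# Usage: two random "boundary modes" on a 12 x 12 grid (interior values ignored).
rng = np.random.default_rng(0)
g1, g2 = rng.standard_normal((2, 12, 12))
# Linearity: the bulk mode of a combination is the combination of bulk modes.
print(np.allclose(solve_dirichlet_laplace(g1 + 3.0 * g2),
                  solve_dirichlet_laplace(g1) + 3.0 * solve_dirichlet_laplace(g2)))  # prints True
```

Because of this linearity, the bulk modes only need to be computed once; any random combination of boundary modes is then handled by combining the stored bulk modes, without solving further heat equations.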
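More precisely, the random field \(\tau \) is defined as a linear combination of the bulk modes whose coefficients are independent and identically distributed random variables \(y_i\) following the standard normal distribution \({\mathcal {N}}(0,1)\):
$$\begin{aligned} \tau (\mathbf {x},\theta ) = \sum _{i} y_i(\theta )\, A_{i}^{v}(\mathbf {x}), \quad \forall \, (\mathbf {x},\theta ) \in \Omega \times \Theta . \end{aligned}$$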
Consequently, \(\tau \) is a Gaussian random field. To avoid unrealistic temperatures when the random fluctuations are superposed with the reference temperature field, values below zero kelvin or above the melting point are truncated. The resulting random temperature field \(T:\Omega \times \Theta \rightarrow {\mathbb {R}}\) satisfies the condition:
$$\begin{aligned} {\mathbb {E}}\left[ T(\mathbf {x},\cdot )\right] \approx T_{\text {ref}}(\mathbf {x}), \quad \forall \, \mathbf {x} \in \Omega , \end{aligned}$$
where \({\mathbb {E}}\) denotes the mathematical expectation. The approximation error is negligible, since the truncated values lie in the tails of the distribution.
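Step 3 and the truncation can be sketched as follows, assuming the bulk modes have already been evaluated at the mesh nodes and stored as the columns of a matrix. The function `sample_temperature_fields`, its arguments, and all numerical values in the usage (reference temperature, mode amplitudes, melting point) are hypothetical; truncation is implemented with `numpy.clip`.

```python
import numpy as np


def sample_temperature_fields(T_ref, bulk_modes, n_samples, T_melt, rng=None):
    """Draw realizations of the truncated random temperature field.

    T_ref:      (n_nodes,) reference temperature field, in kelvin
    bulk_modes: (n_nodes, n_modes) array whose columns are the bulk modes A_i^v
    T_melt:     melting point in kelvin (hypothetical parameter)
    Returns an (n_nodes, n_samples) array, one realization per column.
    """
    rng = np.random.default_rng() if rng is None else rng
    # i.i.d. standard normal coefficients y_i, one set per realization
    y = rng.standard_normal((bulk_modes.shape[1], n_samples))
    tau = bulk_modes @ y              # Gaussian fluctuation field tau
    T = T_ref[:, None] + tau          # superpose with the reference field
    return np.clip(T, 0.0, T_melt)    # truncate below 0 K and above T_melt


# Usage with made-up numbers: 50 nodes, 5 modes, T_ref = 800 K everywhere.
rng = np.random.default_rng(0)
T_ref = np.full(50, 800.0)
modes = 20.0 * rng.standard_normal((50, 5))
T = sample_temperature_fields(T_ref, modes, n_samples=20000, T_melt=1700.0, rng=rng)
print(np.abs(T.mean(axis=1) - T_ref).max())  # small compared with T_ref
```

When the fluctuations are small compared with the distance to the truncation bounds, clipping almost never occurs and the empirical mean stays close to \(T_{\text {ref}}\); if the reference temperature is close to the melting point, the truncation biases the mean downwards.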
The procedure for the construction of temperature fluctuation modes on \(\partial \Omega \) in step 1 follows the ideas of [57]. An isotropic correlation function is defined:
where \(d(\mathbf {x},\mathbf {y})\) is the geodesic distance on the surface \(\partial \Omega \). Geodesic distances are computed with the algorithm of Mitchell, Mount and Papadimitriou [58], as implemented in [59] (see [60] for the code). In practice, they are only evaluated between nodes of the finite-element mesh. Once the correlation matrix \(C_{ij} = \rho (\mathbf {x}_i ,\mathbf {x}_j )\) has been computed and a variance vector with entries \(\sigma _i^2\) has been defined, the covariance matrix is \(\Gamma _{ij} = \sigma _i \sigma _j C_{ij}\), and a matrix \(\mathbf {A}\) such that \(\mathbf {A}\mathbf {A}^T = \varvec{\Gamma }\) can be found, for instance by a Cholesky or eigenvalue decomposition. The columns of \(\mathbf {A}\) define the fluctuation modes on the solid’s boundary.
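The construction of the boundary modes in step 1 can be sketched with NumPy under two stated substitutions. First, geodesic distances are approximated by shortest paths along mesh edges (Dijkstra's algorithm), which overestimate the true surface geodesic; this is not the exact Mitchell–Mount–Papadimitriou algorithm used in the paper. Second, the correlation function is assumed to be exponential, \(\rho = \exp (-d/\ell )\) with a hypothetical correlation length \(\ell \), since its form is not restated here. The covariance is factored by a symmetric eigendecomposition, which is one valid way of obtaining \(\mathbf {A}\). Function names are hypothetical.

```python
import heapq

import numpy as np


def edge_graph_distances(vertices, triangles):
    """All-pairs shortest-path distances along the edges of a triangle mesh.

    Substitute for exact surface geodesics: paths are restricted to mesh
    edges, so the result is an upper bound on the geodesic distance.
    """
    n = len(vertices)
    neighbors = [{} for _ in range(n)]
    for tri in triangles:
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            a, b = int(a), int(b)
            w = float(np.linalg.norm(vertices[a] - vertices[b]))
            neighbors[a][b] = w
            neighbors[b][a] = w
    D = np.full((n, n), np.inf)
    for source in range(n):  # one Dijkstra run per source node
        D[source, source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > D[source, u]:
                continue  # stale heap entry
            for v, w in neighbors[u].items():
                if d + w < D[source, v]:
                    D[source, v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return D


def boundary_fluctuation_modes(D, sigma, corr_length):
    """Return A with A @ A.T = Gamma, where Gamma_ij = sigma_i sigma_j C_ij.

    C_ij = exp(-D_ij / corr_length) is an ASSUMED exponential correlation
    function, not necessarily the one used in the paper.
    """
    C = np.exp(-D / corr_length)
    Gamma = sigma[:, None] * C * sigma[None, :]
    eigvals, eigvecs = np.linalg.eigh(Gamma)
    # Clip negative eigenvalues (round-off, or a kernel that is not positive
    # definite for the chosen distance); A @ A.T then only approximates Gamma.
    eigvals = np.clip(eigvals, 0.0, None)
    order = np.argsort(eigvals)[::-1]  # most energetic modes first
    return eigvecs[:, order] * np.sqrt(eigvals[order])


# Usage on a unit square split into two triangles along the diagonal 0-2.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
triangles = np.array([[0, 1, 2], [0, 2, 3]])
D = edge_graph_distances(vertices, triangles)
A = boundary_fluctuation_modes(D, sigma=np.full(4, 15.0), corr_length=1.0)
print(D[1, 3])  # 2.0: there is no edge 1-3, so the path goes through node 0 or 2
```

The eigendecomposition is used here rather than a Cholesky factorization because it tolerates a covariance matrix that is only positive semi-definite, and because ordering the columns by eigenvalue makes it easy to keep only the most energetic modes.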
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Daniel, T., Casenave, F., Akkari, N. et al. Model order reduction assisted by deep neural networks (ROMnet). Adv. Model. and Simul. in Eng. Sci. 7, 16 (2020). https://doi.org/10.1186/s40323-020-00153-6
DOI: https://doi.org/10.1186/s40323-020-00153-6
Keywords
 Model order reduction
 Machine learning
 Deep neural networks
 Nonlinear structural mechanics