Reduced order modeling using PGD
Due to the increasing number of high-dimensional approximation problems, which naturally arise in many situations such as optimization or uncertainty quantification, model reduction techniques have been the object of growing interest and are now a mature technology [19, 24]. Tensor methods are among the most prominent tools for the construction of such reduction techniques: in many practical applications, the approximation of high-dimensional solutions of Partial Differential Equations (PDEs) is made computationally tractable by the use of low-rank tensor formats. In particular, an appealing technique based on a canonical format and referred to as Proper Generalized Decomposition (PGD) was introduced and successfully used in many applications of computational mechanics dealing with multiparametric problems [5, 7, 9, 10, 15, 18, 20]. Contrary to POD, the PGD approximation does not require any prior knowledge of the solution; it follows an iterative strategy in which basis functions (or modes) are computed from scratch by solving eigenvalue problems.
In the classical PGD framework, the reduced model is built directly from the weak formulation (here (3)) of the considered PDE, integrated over the parametric space. The approximate reduced solution \(T^m\) at order m is then sought in a separated form with respect to space, time, and model parameters \({\mathbf {p}}=\{p_1,p_2,\dots ,p_d\}\), seen as extra-coordinates [10]:
$$\begin{aligned} T^m({\mathbf {x}},t,{\mathbf {p}})=\sum _{k=1}^{m} \Lambda _k({\mathbf {x}}) \lambda _k(t) \prod _{i=1}^{d} \alpha ^i_k(p_i) \end{aligned}$$
(5)
The computation of the PGD modal representation is performed in an offline phase by means of an iterative method [10]; the representation is then evaluated in an online phase, at any space-time location and for any parameter value, from sums of products of one-parameter functions.
For the multi-parametric problem of interest, the construction of the PGD solution is detailed in [26]. It reads:
$$\begin{aligned} T^m({\mathbf {x}},t,\sigma ,Pe)=\sum _{k=1}^{m} \Lambda _k({\mathbf {x}}) \lambda _k(t) \alpha ^1_k(\sigma ) \alpha ^2_k(Pe) \end{aligned}$$
(6)
Considering a heat source term with \(u=1\), the first four PGD modes are represented in Fig. 4 (spatial modes), Fig. 5 (parameter modes), and Fig. 6 (time modes).
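For illustration, a minimal sketch of the online evaluation of the separated representation (6) is given below. It assumes (hypothetically) that the PGD modes have been stored as arrays sampled on the spatial mesh, the time grid, and increasing parameter grids; function and variable names are purely illustrative and do not correspond to a specific implementation.

```python
import numpy as np

def evaluate_pgd(Lambda, lam, alpha_sigma, alpha_Pe,
                 sigma_grid, Pe_grid, x_idx, t_idx, sigma, Pe):
    """Online evaluation of the separated PGD representation (6).

    Lambda      : (m, n_x) spatial modes sampled at the mesh nodes
    lam         : (m, n_t) time modes sampled at the time steps
    alpha_sigma : (m, n_sigma) parameter modes sampled on sigma_grid
    alpha_Pe    : (m, n_Pe) parameter modes sampled on Pe_grid

    The parameter modes are interpolated at the requested (sigma, Pe) values,
    and the sum of rank-one products is accumulated mode by mode.
    """
    T = 0.0
    for k in range(Lambda.shape[0]):                      # loop over the m PGD modes
        a1 = np.interp(sigma, sigma_grid, alpha_sigma[k]) # alpha_k^1(sigma)
        a2 = np.interp(Pe, Pe_grid, alpha_Pe[k])          # alpha_k^2(Pe)
        T += Lambda[k, x_idx] * lam[k, t_idx] * a1 * a2
    return T
```

The online cost thus scales with the number of modes m only, which is what enables real-time evaluation of the surrogate.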
Real-time data assimilation with Bayesian inference and Transport Map sampling
Basics on Bayesian inference
The purpose of Bayesian inference is to characterize the posterior probability density function (pdf) \(\pi ({\mathbf {p}}|{\mathbf {d}}^\text {obs})\) of some model parameters \({\mathbf {p}}\) given some indirect and noisy observations \({\mathbf {d}}^\text {obs}\). In this context, the Bayesian formulation of the inverse problem reads [17]:
$$\begin{aligned} \pi ({\mathbf {p}}|{\mathbf {d}}^\text {obs}) = \frac{1}{C}\, \pi ({\mathbf {d}}^\text {obs}|{\mathbf {p}}) \cdot \pi _0({\mathbf {p}}) \end{aligned}$$
(7)
where \(\pi _0({\mathbf {p}})\) is the prior pdf, related to the a priori knowledge on the parameters before the consideration of data \({\mathbf {d}}^\text {obs}\), \(\pi ({\mathbf {d}}^\text {obs}|{\mathbf {p}})\) is the likelihood function, which corresponds to the probability that the model \({\mathcal {M}}\) predicts the observations \({\mathbf {d}}^\text {obs}\) given values of the parameters \({\mathbf {p}}\), and \(C= \int \pi ({\mathbf {d}}^\text {obs}|{\mathbf {p}})\cdot \pi _0({\mathbf {p}}) \,\text {d} {\mathbf {p}}\) is a normalization constant. No assumption is made on the probability densities (prior, measurement noise) or on the linearity of the model.
We consider here the classical case of an additive measurement noise with density \(\pi _\text {meas}\). We also consider that there is no modeling error, even though such an error source could be easily taken into account in the Bayesian inference framework (provided quantitative information on this error source is available). The likelihood function thus reads:
$$\begin{aligned} \pi ({\mathbf {d}}^\text {obs}|{\mathbf {p}})=\pi _\text {meas}({\mathbf {d}}^\text {obs}-{\mathcal {M}}({\mathbf {p}})) \end{aligned}$$
(8)
Furthermore, when considering sequential assimilation of measurements \({\mathbf {d}}_i^{\text {obs}}\) at time steps \(t_i\), \(i \in \{1,\ldots ,N_t\}\), the Bayesian formulation is such that the prior at time \(t_i\) corresponds to the posterior at time \(t_{i-1}\):
$$\begin{aligned} \pi ({\mathbf {p}}|{\mathbf {d}}_1^{\text {obs}},\ldots , {\mathbf {d}}_i^{\text {obs}}) \propto \left( \prod _{j=1}^{i} \pi _{t_j}({\mathbf {d}}_j^{\text {obs}}|{\mathbf {p}})\right) \cdot {\pi _0({\mathbf {p}}}) ; \quad \pi _{t_j}({\mathbf {d}}_j^{\text {obs}}|{\mathbf {p}})=\pi _\text {meas} \left( {\mathbf {d}}_j^\text {obs}-{\mathcal {M}}\left( {\mathbf {p}},t_j\right) \right) \end{aligned}$$
(9)
Once the PGD approximation \(T^m({\mathbf {x}},t,{\mathbf {p}})\) is built (see “Reduced order modeling using PGD” section), an explicit formulation of the non-normalized posterior density can be derived. Indeed, owing to the observation operator \({\mathcal {O}}\), the output \({\mathbf {d}}^m({\mathbf {p}},t)={\mathcal {O}}\left( T^m({\mathbf {x}},t,{\mathbf {p}})\right) \) can be easily computed for any value of the parameter set \({\mathbf {p}}\). The non-normalized posterior density \({\overline{\pi }}\) thus reads:
$$\begin{aligned} {\overline{\pi }}\left( {\mathbf {p}}|{\mathbf {d}}_1^{\text {obs}},\ldots , {\mathbf {d}}_i^{\text {obs}}\right) = \prod _{j=1}^{i} \pi _\text {meas} \left( {\mathbf {d}}_j^\text {obs}-{\mathbf {d}}^m\left( {\mathbf {p}},t_j\right) \right) \cdot \pi _0({\mathbf {p}}) \end{aligned}$$
(10)
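As an illustration, a minimal sketch of the corresponding non-normalized log-posterior is given below, assuming an additive zero-mean Gaussian measurement noise with known standard deviation. The callables `d_model` (PGD output at the sensors, as in (10)) and `log_prior` are placeholders introduced for the example only.

```python
import numpy as np

def log_posterior(p, t_obs, d_obs, d_model, log_prior, noise_std):
    """Non-normalized log-posterior (10) for sequential data up to time t_i.

    p         : parameter set, e.g. np.array([sigma, Pe])
    t_obs     : assimilation times t_1, ..., t_i
    d_obs     : observations d_1^obs, ..., d_i^obs (one row per time)
    d_model   : callable (p, t) -> model observation from the PGD output
    log_prior : callable p -> log pi_0(p)
    noise_std : standard deviation of the additive Gaussian measurement noise
    """
    logp = log_prior(p)
    for t_j, d_j in zip(t_obs, d_obs):
        residual = np.atleast_1d(d_j - d_model(p, t_j))
        # Gaussian log-likelihood, up to an additive constant
        logp += -0.5 * np.sum((residual / noise_std) ** 2)
    return logp
```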
From the expression of \(\pi ({\mathbf {p}}|{\mathbf {d}}^\text {obs})\) (or \(\pi ({\mathbf {p}}|{\mathbf {d}}_1^{\text {obs}},\ldots , {\mathbf {d}}_i^{\text {obs}})\)), stochastic features such as means, variances, or first-order marginals of the parameters may be computed. These quantities involve high-dimensional integrals, and classical Monte-Carlo integration techniques such as Markov Chain Monte-Carlo (MCMC) require in practice sampling the posterior density a very large number of times. Such a multi-query procedure is highly time consuming and incompatible with fast computations; we therefore consider an alternative approach in the following section.
Transport Map sampling
The principle of the Transport Map strategy is to build a deterministic mapping M between a reference probability measure \(\nu _\rho \) and a target measure \(\nu _\pi \), i.e. to find the change of variables such that:
$$\begin{aligned} \int g \text {d} \nu _\pi = \int g \circ M \text {d} \nu _\rho \end{aligned}$$
(11)
In this framework, samples drawn according to the reference density are transported to become samples drawn according to the target density (Fig. 7). For the considered inference methodology, the target density corresponds to the posterior density \(\pi ({\mathbf {p}}|{\mathbf {d}}^\text {obs})\) derived from the Bayesian formulation, while a standard normal density may be chosen as the reference density; for more details, we refer to [29] and the associated computational tools (see http://transportmaps.mit.edu).
From the reference density \(\rho \), the purpose is thus to build the map \(M : {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) such that:
$$\begin{aligned} \nu _\pi \approx M_\sharp \nu _\rho = \rho \circ M^{-1} |\text {det} \nabla M^{-1}| \end{aligned}$$
(12)
where \(_\sharp \) denotes the push-forward operator. Once the map M is found, it can be used for sampling purposes by transporting samples drawn from \(\rho \) into samples drawn from \(\pi \). Similarly, a Gaussian quadrature \((\omega _i,{\mathbf {p}}_i)_{i=1}^N\) for \(\rho \) can be transported into the quadrature \((\omega _i,M({\mathbf {p}}_i))_{i=1}^N\) for \(\pi \). Such a deterministic numerical integration, with a quadrature rule transported from the reference Gaussian density, is therefore the technique of choice used in the present work for the calculation of statistics, marginals, or any other information from the posterior pdf.
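As an illustration of this transported-quadrature idea, the sketch below estimates posterior mean and covariance by pushing a tensorized Gauss–Hermite rule for the standard normal reference through a map `M`. The function name, the tensorization strategy, and the fixed quadrature order are assumptions of the example, not the implementation of [29].

```python
import numpy as np
from itertools import product
from numpy.polynomial.hermite_e import hermegauss

def posterior_mean_and_cov(M, d=2, n_1d=10):
    """Estimate posterior mean/covariance with a transported Gauss-Hermite rule.

    M : callable mapping a reference point p in R^d to a posterior sample M(p).
    """
    # 1D Gauss-Hermite(E) rule, rescaled so it integrates against the standard normal
    nodes, weights = hermegauss(n_1d)
    weights = weights / np.sqrt(2.0 * np.pi)
    # Tensorized d-dimensional rule on the reference space
    pts = np.array(list(product(nodes, repeat=d)))                     # (n_1d**d, d)
    w = np.prod(np.array(list(product(weights, repeat=d))), axis=1)    # (n_1d**d,)
    # Transport the reference nodes; the weights are left unchanged
    samples = np.array([M(p) for p in pts])
    mean = w @ samples
    centered = samples - mean
    cov = (w[:, None] * centered).T @ centered
    return mean, cov
```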
Maps M are sought among Knothe–Rosenblatt rearrangements (i.e. lower triangular and monotonic maps). This particular choice of structure is motivated by the following properties (see [4, 21, 29] for all details):
- Uniqueness and existence under mild conditions on \(\nu _\pi \) and \(\nu _\rho \);
- Easily invertible map and Jacobian \(\nabla M\) simple to evaluate;
- Optimality regarding the weighted quadratic cost;
- Monotonicity essentially one-dimensional (\(\partial _{p_k}M^k >0\)).
The maps M are therefore parametrized as:
$$\begin{aligned} M({\mathbf {p}})= \left[ \begin{array}{l} M^1({\mathbf {a}}_c^1,{\mathbf {a}}_e^1,p_1) \\ M^2({\mathbf {a}}_c^2,{\mathbf {a}}_e^2,p_1,p_2)\\ \vdots \\ M^d({\mathbf {a}}_c^d,{\mathbf {a}}_e^d,p_1,p_2,\ldots ,p_d) \end{array} \right] \end{aligned}$$
(13)
with \(M^k({\mathbf {a}}_c^k,{\mathbf {a}}_e^k,{\mathbf {p}})= \Phi _c({\mathbf {p}}) {\mathbf {a}}_c^k+\int _{0}^{p_k} (\Phi _e(p_1,...,p_{k-1},\theta ){\mathbf {a}}_e^k)^2 \text {d} \theta \). Functions \(\Phi _c\) and \(\Phi _e\) are chosen as Hermite polynomials with coefficients \({\mathbf {a}}_c\) and \({\mathbf {a}}_e\). This integrated-squared parametrization is a classical choice that automatically ensures the monotonicity of the map, and the use of Hermite polynomials allows the integration to be performed analytically.
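A minimal sketch of one such integrated-squared map component is given below for the one-dimensional case. Contrary to the analytical integration mentioned above, the inner integral is evaluated here with a Gauss–Legendre rule for simplicity, and the coefficient layout is purely illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from numpy.polynomial.legendre import leggauss

def map_component_1d(a_c, a_e, p, n_quad=20):
    """One-dimensional monotone map component
    M(p) = Phi_c(p) a_c + int_0^p (Phi_e(theta) a_e)^2 dtheta,
    with (probabilists') Hermite polynomial bases.

    a_c, a_e : coefficient vectors of the two polynomial expansions.
    The squared integrand guarantees a non-negative derivative, hence monotonicity.
    """
    nodes, weights = leggauss(n_quad)
    theta = 0.5 * p * (nodes + 1.0)           # map [-1, 1] onto [0, p]
    integrand = hermeval(theta, a_e) ** 2     # (Phi_e(theta) a_e)^2 >= 0
    integral = 0.5 * p * np.sum(weights * integrand)
    return hermeval(p, a_c) + integral
```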
With this parametrization, the optimal map M is found by minimizing the following Kullback–Leibler (K–L) divergence:
$$\begin{aligned} \begin{aligned} {\mathcal {D}}_{KL}( M_\sharp \nu _\rho || \nu _\pi )&= {\mathbb {E}}_\rho \left[ \log \frac{\nu _\rho }{M_\sharp ^{-1} \nu _\pi }\right] \\&=\int _P \left[ \log (\rho ({\mathbf {p}}))- \log ([\pi \circ M]({\mathbf {p}})) - \log (|\det \nabla M({\mathbf {p}})|) \right] \rho ({\mathbf {p}}) \text {d} {\mathbf {p}} \end{aligned} \end{aligned}$$
(14)
that quantifies the difference between the two distributions \(\nu _\pi \) and \(M_\sharp \nu _\rho \). Still using a Gaussian quadrature rule \((\omega _i,{\mathbf {p}}_i)_{i=1}^N\) over the reference probability space associated with \(\rho \), the minimization problem reads:
$$\begin{aligned} \underset{{\mathbf {a}}_c^{1,\ldots ,d},{\mathbf {a}}_e^{1,\ldots ,d}}{\min } \sum _{i=1}^{N} \omega _i \left[ - \log \left({\overline{\pi }} \circ M({\mathbf {a}}_c^{1,\ldots ,d},{\mathbf {a}}_e^{1,\ldots ,d},{\mathbf {p}}_i)\right) - \log \left(\left| \det \nabla M({\mathbf {a}}_c^{1,\ldots ,d},{\mathbf {a}}_e^{1,\ldots ,d},{\mathbf {p}}_i)\right| \right) \right] \end{aligned}$$
(15)
where \({\overline{\pi }}\) is the non-normalized version of the target density. This minimization problem is fully deterministic and may be solved with classical algorithms (such as BFGS) that exploit gradient or Hessian information on the density \({\overline{\pi }}({\mathbf {p}})\).
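For illustration, the sketch below minimizes the sampled objective (15) for a one-dimensional map using SciPy's BFGS. The non-normalized log-target `log_pi_bar`, the simplified map parametrization, and the identity-map starting point are assumptions made for this example only; they do not reproduce the reference tools available at http://transportmaps.mit.edu.

```python
import numpy as np
from scipy.optimize import minimize
from numpy.polynomial.hermite_e import hermegauss, hermeval
from numpy.polynomial.legendre import leggauss

def fit_map_1d(log_pi_bar, order=3, n_quad=30):
    """Minimize the sampled K-L objective (15) for a one-dimensional map
    M(p) = a_c + int_0^p (Phi_e(theta) a_e)^2 dtheta (integrated-squared form)."""
    nodes, weights = hermegauss(n_quad)
    weights = weights / np.sqrt(2.0 * np.pi)   # quadrature for the standard normal
    gl_nodes, gl_weights = leggauss(20)        # inner rule for the monotone integral

    def M_and_dM(a_c, a_e, p):
        theta = 0.5 * p * (gl_nodes + 1.0)     # Gauss-Legendre nodes on [0, p]
        integral = 0.5 * p * np.sum(gl_weights * hermeval(theta, a_e) ** 2)
        return a_c + integral, hermeval(p, a_e) ** 2

    def objective(coeffs):
        a_c, a_e = coeffs[0], coeffs[1:]
        val = 0.0
        for w, p in zip(weights, nodes):
            Mp, dMp = M_and_dM(a_c, a_e, p)
            val += w * (-log_pi_bar(Mp) - np.log(dMp + 1e-12))
        return val

    x0 = np.concatenate(([0.0, 1.0], np.zeros(order)))  # identity map as starting point
    return minimize(objective, x0, method="BFGS").x
```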
It is important to notice that the reduced PGD representation (6) of the solution is highly beneficial for solving (15). Partial derivatives of the model with respect to the parameters \({\mathbf {p}}\) can indeed be easily computed as:
$$\begin{aligned} \frac{\partial ^n T^m}{\partial p_j^n}({\mathbf {x}},t,{\mathbf {p}})=\sum _{k=1}^{m} \Lambda _k({\mathbf {x}}) \lambda _k(t) \frac{\partial ^n \alpha ^j_k}{\partial p_j^n}(p_j) \prod _{\begin{array}{c} i=1 \\ i \ne j \end{array}}^{d} \alpha ^i_k(p_i) \end{aligned}$$
(16)
and stored during the offline phase. Thanks to the separated representation of the PGD, cross-derivatives are computed by combining the derivatives of the univariate modes. As a result, the use of PGD also speeds up the computation of transport maps.
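For instance, if the parameter modes are stored on one-dimensional grids, the derivative (16) with respect to \(\sigma \) can be assembled by differentiating only the corresponding univariate modes, as in the hypothetical sketch below; the grid storage and the finite-difference differentiation of the stored modes are assumptions of the example.

```python
import numpy as np

def pgd_dT_dsigma(Lambda, lam, alpha_sigma, alpha_Pe,
                  sigma_grid, Pe_grid, x_idx, t_idx, sigma, Pe):
    """First derivative (16) of the PGD surrogate with respect to sigma,
    obtained by differentiating only the sigma-modes of the separated form."""
    dT = 0.0
    for k in range(Lambda.shape[0]):
        # derivative of the sigma-mode, approximated here on the stored grid
        dalpha = np.gradient(alpha_sigma[k], sigma_grid)
        da1 = np.interp(sigma, sigma_grid, dalpha)
        a2 = np.interp(Pe, Pe_grid, alpha_Pe[k])
        dT += Lambda[k, x_idx] * lam[k, t_idx] * da1 * a2
    return dT
```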
The quality of the approximation \(M_\sharp \nu _\rho \) of the measure \(\nu _\pi \) can be estimated by the convergence criterion \(\epsilon _\sigma \) (variance diagnostic) defined in [29] as:
$$\begin{aligned} \epsilon _\sigma = \frac{1}{2} {\mathbb {V}}\text {ar}_\rho \left[ \log \frac{\nu _\rho }{M_\sharp ^{-1} \nu _\pi }\right] \end{aligned}$$
(17)
The numerical cost for computing this criterion is very low as the integration is performed using the reference density and with the same quadrature rule as the one used in the computation of the K–L divergence. Therefore, an adaptive strategy regarding the order of the map can be used to derive an automatic algorithm that guarantees the quality of the approximation \(M_\sharp \nu _\rho \).
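A possible way to evaluate the diagnostic (17) with the same reference quadrature rule is sketched below; note that using the non-normalized density \({\overline{\pi }}\) only shifts the logarithm by a constant, which leaves the variance unchanged. Function and argument names are illustrative.

```python
import numpy as np

def variance_diagnostic(log_rho, log_pi_bar, M, log_det_grad_M,
                        quad_points, quad_weights):
    """Convergence criterion (17): half the reference-weighted variance of
    log(rho) - log(pi_bar o M) - log|det grad M|, evaluated with the same
    Gaussian quadrature rule as the K-L objective (weights sum to one)."""
    vals = np.array([log_rho(p) - log_pi_bar(M(p)) - log_det_grad_M(p)
                     for p in quad_points])
    mean = np.sum(quad_weights * vals)
    return 0.5 * np.sum(quad_weights * (vals - mean) ** 2)
```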
In the case of sequential inference, the Transport Map method exploits the Markov structure of the posterior density (9). Indeed, instead of being fully computed, the map between the reference density \(\rho \) and the posterior density at time \(t_i\) is obtained by composition of low-order maps (see Fig. 8):
$$\begin{aligned} \left( M_1 \circ \ldots \circ M_i \right) _\sharp \rho ({\mathbf {p}}) = \left( {\mathbb {M}}_i\right) _\sharp \rho ({\mathbf {p}}) \approx \pi ({\mathbf {p}}|{\mathbf {d}}_1^{\text {obs}},\ldots , {\mathbf {d}}_{i}^{\text {obs}}) \end{aligned}$$
(18)
Therefore, at each assimilation step \(t_i\), only the last map component \(M_i\) is computed between \(\rho \) and the density \(\pi _i^*\) defined as:
$$\begin{aligned} {\pi }^*_i({\mathbf {p}})=\pi _{t_i}({\mathbf {d}}_i^\text {obs}|{\mathbb {M}}_{i-1}({\mathbf {p}}))\cdot {\rho ({\mathbf {p}}}) \end{aligned}$$
(19)
which leads to a process with almost constant CPU effort.
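The sequential composition (18)-(19) can for instance be organized as in the sketch below, where only the last map component is fitted at each assimilation step; `fit_map` stands for any routine fitting a map to a given non-normalized log-density (e.g. by minimizing (15)) and is a placeholder, as are the other names.

```python
class SequentialTransport:
    """Sequential assimilation (18)-(19): at each step only the last low-order
    map M_i is fitted; the full map is the composition M_1 o ... o M_i."""

    def __init__(self, log_rho):
        self.maps = []          # fitted maps M_1, ..., M_i
        self.log_rho = log_rho  # log of the reference density rho

    def compose(self, p, maps=None):
        # (M_1 o ... o M_i)(p): apply M_i first, then M_{i-1}, ..., then M_1
        for M in reversed(self.maps if maps is None else maps):
            p = M(p)
        return p

    def assimilate(self, log_likelihood_i, fit_map):
        """log_likelihood_i(p): log pi_{t_i}(d_i^obs | p);
        fit_map(log_target): fits a map between the reference density and the
        non-normalized log-density log_target."""
        previous = list(self.maps)  # M_1, ..., M_{i-1}, frozen for this step
        log_pi_star = lambda p: (log_likelihood_i(self.compose(p, previous))
                                 + self.log_rho(p))   # log of (19)
        self.maps.append(fit_map(log_pi_star))
```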
Real-time control
In addition to the mean, maximum a posteriori (MAP), or other estimates of the model parameters, another major post-processing task in the DDDAS feedback loop is the prediction of some quantities of interest from the model, such as the temperature \(T_3\) at the remote point \({\mathbf {x}}_3\) in the present context (see Fig. 2). Once parameters \({\mathbf {p}}\) (\(\sigma \) and Pe here) are inferred in a probabilistic way at each assimilation time point \(t_i\) (\(1\le i \le N_t\)), it is indeed valuable to propagate uncertainties a posteriori in order to know their impact on the output of interest \(T_3\) during the process, and consequently to assess the welding quality.
As the PGD model gives an explicit prediction of the temperature field over the whole space-time-parametric domain, the output \(T_3\) can be easily computed for all values of the parameter samples and at each physical time point \(\tau _j\), \(j \in \{1,\ldots ,N_\tau \}\). For a given physical time point \(\tau _j\), the pdf \(\pi (T_{3|\tau _j}|{\mathbf {p}},t_i)\) of the value of the temperature \(T_3\) knowing uncertainties on the parameter set \({\mathbf {p}}\) from data assimilation up to time point \(t_i\) can thus be computed in real-time and used to determine if the plates are correctly welded and with which confidence. In practice, this computation may be performed for all physical time points \(\tau _j \ge t_i\), and the density \(\pi (T_{3|\tau _j}|{\mathbf {p}},t_i)\) is characterized by a (Gaussian) quadrature rule using the Transport Map method. With this knowledge, a stochastic computation of the predicted temperature evolution can be obtained, and the control of the welding process from the numerical model can be performed.
We detail below the procedure to dynamically determine the value of the control variable u (magnitude of the heat source) in the case where the welding objective is to reach a sufficient welding depth. The quantity of interest is then the maximal value of the temperature \(T_3\), obtained at the final time \(\tau ^*\), which is an indicator of the welding quality. When \(T_{3|\tau ^*} \ge 1\), the welding depth is assumed to be sufficient. Other welding objectives will be considered in “Results and discussion” section, associated with similar strategies for command synthesis.
Due to the stochastic framework which is employed, the quantity of interest is actually a random variable with pdf \(\pi (T_{3|\tau ^*}|{\mathbf {p}},t_i)\) evolving at each data assimilation time \(t_i\).
The proposed quantity q to monitor is:
$$\begin{aligned} q=\text {mean}(T_{3|\tau ^*}) - 3\cdot {\text {std}}(T_{3|\tau ^*}) = {\mathcal {Q}}(T_{3|\tau ^*}) \end{aligned}$$
(20)
where \({\mathcal {Q}}\) is an operator defined in the stochastic space. This way, setting the objective \(q_\text {obj} = 1\) ensures that the temperature \(T_{3|\tau ^*}\) is larger than the melting temperature with a confidence of 99%, while using minimal energy (no overheating).
Using the PGD solution computed in “Reduced order modeling using PGD” section for a unit magnitude of the heat source (\(u=1\)) and zero initial conditions, the predicted (stochastic) maximal value \(T_3\) for a given constant magnitude u and for fixed pdfs of \({\mathbf {p}}\) reads:
$$\begin{aligned} T_{3|\tau ^*} \approx u\cdot {T^m}({\mathbf {x}}_3,\tau ^*,{\mathbf {p}})=u\cdot {\sum _{k=1}^{m}} \Lambda _k({\mathbf {x}}_3) \lambda _k(\tau ^*) \prod _{i=1}^{d} \alpha ^i_k(p_i) \end{aligned}$$
(21)
so that \(q=u\cdot {\mathcal {Q}}\left( T^m({\mathbf {x}}_3,\tau ^*,{\mathbf {p}})\right) \) can be obtained in a straightforward manner. This way, setting the source magnitude u to \(u_0=q_\text {obj}/{\mathcal {Q}}\left( T^m({\mathbf {x}}_3,\tau ^*,{\mathbf {p}})\right) \) would make it possible to reach the welding objective.
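For instance, with a quadrature (or transported-sample) approximation of the current pdfs of \({\mathbf {p}}\), the operator \({\mathcal {Q}}\) and the initial command \(u_0\) can be evaluated as in the following sketch; the variable names and the assumption that the weights sum to one are illustrative.

```python
import numpy as np

def initial_command(T3_samples, weights, q_obj=1.0):
    """Initial source magnitude u_0 = q_obj / Q(T^m(x_3, tau*, p)), with
    Q = mean - 3*std estimated from a (transported) quadrature rule.

    T3_samples : values of T^m(x_3, tau*, p_i) at the quadrature/sample points
    weights    : associated quadrature weights (assumed to sum to one)
    """
    mean = np.sum(weights * T3_samples)
    std = np.sqrt(np.sum(weights * (T3_samples - mean) ** 2))
    Q = mean - 3.0 * std
    return q_obj / Q
```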
Nevertheless, in practice the pdfs of the parameters \({\mathbf {p}}\) are updated at each assimilation time point \(t_i\), based on additional experimental information, so that the value of u needs to be tuned over time accordingly. To do so, the control variable u(t) is taken piecewise constant in time, in the form:
$$\begin{aligned} u(t) = u_0\cdot {H(t)} + \sum _{i=1}^{N_t}\delta u_i\cdot {H(t-t_i)} \end{aligned}$$
(22)
where H is the Heaviside function, \(u_0\) is the initial command on the source magnitude (defined from the prior pdfs of \({\mathbf {p}}\)), and \(\delta u_i\) is the correction applied to the current command at each assimilation time \(t_i\). Owing to the linearity of the problem with respect to the loading, the PGD solution associated with this command is a series of time-shifted PGD solutions; it reads:
$$\begin{aligned} u_0\cdot {T^m}({\mathbf {x}},t,{\mathbf {p}}) + \sum _{n=1}^{N_t}\delta u_n\cdot {T^m}({\mathbf {x}},t-t_n,{\mathbf {p}}) \end{aligned}$$
(23)
Therefore, after each assimilation time point \(t_i\), the new prediction of the quantity of interest \(T_{3|\tau ^*}\) can be easily obtained from PGD:
$$\begin{aligned} \begin{aligned} T_{3|\tau ^*}&\approx u_0\cdot {T^m}({\mathbf {x}}_3,\tau ^*,{\mathbf {p}}) + \sum _{n=1}^i \delta u_n\cdot {T^m}({\mathbf {x}}_3,\tau ^*-t_n,{\mathbf {p}}) \\&= T^{pred,[0,i-1]}_{3|\tau ^*}({\mathbf {p}}) + \delta u_i\cdot {T^m}({\mathbf {x}}_3,\tau ^*-t_i,{\mathbf {p}}) \end{aligned} \end{aligned}$$
(24)
where \(T^{pred,[0,i-1]}_{3|\tau ^*}({\mathbf {p}})=u_0\cdot {T^m}({\mathbf {x}}_3,\tau ^*,{\mathbf {p}}) + \sum _{n=1}^{i-1} \delta u_n\cdot {T^m}({\mathbf {x}}_3,\tau ^*-t_n,{\mathbf {p}})\) is the prediction on \(T_{3|\tau ^*}\) considering the history of the control variable u(t) until time \(t_i\). Consequently, the correction \(\delta u_i\) is defined such that \({\mathcal {Q}}(T_{3|\tau ^*})=q_\text {obj}\), using (24) and considering the current pdfs of the parameter set \({\mathbf {p}}\) (i.e. those obtained after the last Bayesian data assimilation at time \(t_i\)).
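Since \({\mathcal {Q}}\) is in general not linear in \(\delta u_i\) (the standard deviation of a sum is not the sum of standard deviations), the correction can for instance be obtained by solving the scalar equation \({\mathcal {Q}}(T_{3|\tau ^*})=q_\text {obj}\) numerically, as in the hedged sketch below; the bracketing interval and the choice of root-finding algorithm are assumptions of the example, not prescribed by the method.

```python
import numpy as np
from scipy.optimize import brentq

def command_correction(T_pred_samples, T_shift_samples, weights, q_obj=1.0):
    """Correction delta_u_i such that Q(T_pred + delta_u * T_shift) = q_obj,
    with Q = mean - 3*std evaluated on the current posterior quadrature (24).

    T_pred_samples  : samples of T^{pred,[0,i-1]}_{3|tau*}(p) at the quadrature points
    T_shift_samples : samples of T^m(x_3, tau* - t_i, p) at the same points
    weights         : quadrature weights (assumed to sum to one)
    """
    def Q(samples):
        mean = np.sum(weights * samples)
        std = np.sqrt(np.sum(weights * (samples - mean) ** 2))
        return mean - 3.0 * std

    residual = lambda du: Q(T_pred_samples + du * T_shift_samples) - q_obj
    # Q is continuous in delta_u; bracket the root (interval chosen for the example)
    # and solve the scalar equation with Brent's method
    return brentq(residual, -10.0, 10.0)
```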