Bayesian stochastic multi-scale analysis via energy considerations

In this paper physical multi-scale processes governed by their own principles for evolution or equilibrium on each scale are coupled by matching the stored and dissipated energy, in line with the Hill-Mandel principle. In our view the correct representations of stored and dissipated energy is essential to the representation irreversible material behaviour, and this matching is also used for upscaling. The small scales, here the meso-scale, is assumed to be described probabilistically, and so on the macroscale also a probabilistic model is identified in a Bayesian setting, reflecting the randomness of the meso-scale, the loss of resolution due to upscaling, and the uncertainty involved in the Bayesian process. In this way multi-scale processes become hierarchical systems in which the information is transferred across the scales by Bayesian identification on coarser levels. The quantities to be matched on the coarse-scale model are the stored and dissipated energies. In this way probability distributions of macro-scale material parameters are determined, and not only in the elastic region, but also for the irreversible and highly nonlinear elasto-damage regimes, refelcting the aleatory uncetainty at the meso-scale level. For this purpose high dimensional meso-scale stochastic simulations in a non-intrusive functional approximation forms are mapped to the macro-scale models in an approximative manner by employing a generalised version of the Kalman filter. To reduce the overall computational cost, a model reduction of the meso-scale simulation is achieved by combining the unsupervised learning techniques based on the Bayesian copula variartional inference with the classical functional approximation forms from the field of uncertainty quantification.


Introduction
The predictive modelling of highly nonlinear and damaging material behaviour such as occurs in cementitious-like materials which are heterogeneous over a large range of scales requires realistic mathematical models. As a detailed description on the desired macroscopic level is not computationally feasible for large-scale structures, some kind of multiscale approach is called for. In this paper simple prototypical models of macro-and mesoscale descriptions of cementitious materials will be considered. However, smaller scales can be introduced as well in order to explicitly describe heterogeneities characterising the material structure of aggregates, the mortar matrix, or the interfacial zone.
Conceptually, the prevalent computational methods to tackle multi-scale problems can be classified into concurrent and non-concurrent approaches. Concurrent schemes consider both the macro-and meso-scales during the course of the simulation e.g. the FE 2 method [12,13], whereas the non-concurrent ones are based on a scale separation idea, by which the desired quantity of interest (QoI), e.g. average stresses or strains, are estimated given numerical experiments on a representative volume element (RVE), see [17] for a recent overview on related techniques. Although the non-concurrent method has proven to work very well for elastic properties, i.e. homogenisation especially under the assumption of scale separation, problems appear when this is not satisfied, or the material is loaded into a range where irreversible material changes occur under complex loading paths, possibly with induced localisation effects, which are of crucial concern when dealing with material non-linearities such as plasticity and damage. In [22,41,44] a combination of concurrent and non-concurrent approaches is proposed. In case heterogeneity is present over a larger range of scales, the so-called size-effect problem appears and has to be resolved [9]. This is difficult to handle with standard local material models in the FE 2 method [12,13] as integration points loose any size information. In [22,36,37,44] this problem is approached by considering the mesh in element approach (MIEL) [24,25,41], in which the meso-scale structure is embedded in a macro-scale finite element. This allows the precise transfer of size-information between the scales.
In this paper we take the MIEL approach, and focus on the stochastic multiscale problem in which the meso-scale information is described by variations reflecting aleatoric uncertainties describing the geometry, the spatial distribution and the material properties of the individual meso-scale material constituents and their mutual interaction. The idea is to design an appropriate stochastic macro-scale model and to estimate its corresponding stochastic parameters such that the aleatoric meso-scale information is preserved.
In the literature stochastic homogenisation is usually performed on an ensemble of RVEs in order to extract the relevant statistical QoI [32][33][34][35]52,53,[58][59][60]. For example, in [19,29], the moving-window approach is used to characterise the probabilistic uni-axial compressive strength of random micro-structures. Another active direction of research is to develop stochastic surrogate models for strain energy of random micro-structures as in [6][7][8]57]. The main goal is to mitigate the effect of the curse of dimensionality due to a large number of stochastic dimensions. With the rapid expansion of machine learning and data driven techniques, the current trend is to train neural network based approximate models [28,30,48,65,66] as a cheaper computational alternative in multiscale methods. Preserving mechanical invariance properties in such settings is a difficult task, which is why energy and dissipation based scale transfer methods were proposed [41]. Furthermore, to obtain a probabilistic description of macro-scale characteristic by incorporating micro-scale measurements, Bayesian methods have been applied to such problems with promising success, please see [5,[14][15][16]26] for recent applications in the formulation of high dimensional probabilistic inverse problems generally, and estimating distribution of material parameters specifically. Moreover [11,27,50,54] demonstrate the application of the Bayesian framework to multi-scale problems.
In this paper we assume that the stochastic macro-scale continuum model can be described as a stochastic generalised standard material model [18,20,24], characterised by the specification of the stored energy and the dissipation. By using physics based principles and Bayesian inference, we employ these two thermodynamics functions as a coupling constituent between the stochastic meso-and macro-scales as previously suggested in [25,41,[49][50][51] for one specific realisation of a representative volume element (RVE). Hence, the stochastic stored energy and dissipation on the meso-scale are captured, and further used as measurements in the Bayesian estimation of the stochastic macro-scale properties of the generalised model. In this regard, the focus of the paper is two-fold: on one hand we suggest the Bayesian approach for upscaling the uncertain meso-scale information to the macro-scale, and on the other we suggest a novel approach for reducing the dimension of the stochastic meso-scale measurement. By repeating experiments on several RVEs we collect the meso-scale measurement samples, i.e. instances of measurement data that represent aleatoric uncertainty. Furthermore, we assume that the distribution of measurement uncertainty is not known, and search for the minimally parametrized model that represents the stochastic meso-scale data via an unsupervised learning algorithm. The goal is to represent the meso-scale measurement as a nonlinear function of simpler independent random variables (e.g. Gaussian random variables), and hence indirectly determine the distribution of the meso-scale data. Here this is achieved by employing a copula based Bayesian variational inference on a generalised mixture model. Once the measurement data are quantified we use them in a Bayesian upscaling procedure to estimate the stochastic macro-scale properties.
The paper is organised as follows: a generalised model problem is presented in "Abstract model problem" section, and the research questions are discussed. "Bayesian upscaling of random meso-structures" section describes a Bayesian framework for upscaling random meso-scale information with a particular focus on the approximate posterior estimation. From a computational point of view this is done through unsupervised learning. The energy-like observations used to map the information between the scale is described in concrete terms using an example material model in "Bayesian upscaling of random mesostructures" section and its algorithm performance is analysed on two different numerical examples in "Numerical results" section. "Conclusion" section concludes the discussion on the proposed approach.
plus appropriate essential boundary conditions u 0 for the displacement u of the macroscale body on the Dirichlet boundary Γ d ⊂ ∂G, where G ⊂ R d is the domain occupied by said body, under volume load f and boundary tractions g on the Neumann boundary Γ n ⊂ ∂G, equilibrated by the Cauchy stress σ(u) in the body, which depends on the displacement through constitutive relations, which we turn to next. As this is to be a model for possibly more complex behaviour of the meso-scale, we assume that the macro-scale continuum model can be described as a generalised standard material model [18,20,24]. These materials obey the maximum dissipation hypothesis, and are thus in a sense optimal in fulfilling the requirements of the second law of thermodynamics. They have the additional advantage that these materials are completely characterised by the specification of two scalar functions, the stored energy resp. Helmholtz free energy density W , and the dissipation pseudo-potential density F . In our view this description is also a key for the connection with the meso-scale behaviour. No matter how the physical and mathematical/computational description on the meso-scale has been chosen, in all cases where the description is based on physical principles it will be possible to define the stored (Helmholtz free) energy and the dissipation (entropy production). These two thermodynamic functions will thus be employed as "measurements" resp. coupling quantitites in the Bayesian inference used to identify the macro-scale model parameters given the meso-scale response stored energy and dissipation. This approach is also a good start for computational procedures (e.g. [10,24,40,55]), and has for some specific subset of materials been given a fully variational formulation in a Hilbert space context [21]. This description has been subsequently extended to much more general cases for the here interesting case of rate-independent behaviour [43].
To summarise the generalised standard material model in a nutshell [18,20,24,40,42], for an isothermal small-strain situation with strain ε, from the Clausius-Duhem inequality it follows that the stress σ at some material point x ∈ G is where ∇ s is the symmetric part of the gradient, w are the internal phenomenological variables [18,20,24,40,42] describing possibly irreversible changes in the material, D ε is the partial derivative w.r.t. ε, the collection Q of tensors of even order describes the specific material (and has to be identified later), and ζ = − D w W (ε, w, Q) are "thermodynamic forces" conjugate to the "thermodynamic fluxes" v :=ẇ -the inner product ζ,ẇ is a rate of (dissipated) energy. The evolution of w-i.e. v =ẇ-is then [18,20,24,40,42] defined by the dissipation pseudo-potential F through a variational inclusion: where ∂ v F is the subdifferential of F w.r.t v, which is by definition equivalent with the variational inequalities here F * is the Legendre-Fenchel dual of the dissipation potential F (ε, w, v, Q) w.r.t. v =ẇ, where F is assumed as a convex and lower semi-continuous function w.r.t. that variable. Convex analysis and variational inequalities enter, as for rate-independent material behaviour-which we are interested in here-the dissipation pseudo-potential F cannot be smooth [20,24,40,42,43], in fact it is positively homogeneous of the first degree in v =ẇ; thus it is appropriate to use the subdifferential ∂ v w.r.t. v =ẇ resp. ∂ ζ w.r.t. ζ in (3)-this is a concise form of writing a variational inequality. Specific forms of W and F will be shown and used later. In [21,43] it is shown how, by collecting the state variables u and w as fields  [20,21,40,42,43]-something inherited from the local function F in Eq. (3). Then the evolution of the state z ∈ Z M of the macro-mechanical system can be described mathematically in an abstract variational manner by the subdifferential inclusion in which δ z stands for the Gâteaux derivative w.r.t. the state variable z ∈ Z M , and the derivative of F M is given in terms of the set-valued subdifferential ∂ s F M w.r.t. the second variable s in the sense of the convex analysis, e.g. see [40,43]-see also Eqs. (4) and (5) for reference. Furthermore, we assume that the rate-independent system in Eq. (6) is parameterised by a field Q representing the global form of the variables Q in Eqs. (2) and (3), i.e. the specific material characteristics. This means that the full list of arguments looks like Given the mathematical description of the macro-scale model in Eq. (6), the goal is to find the unknown variables Q such that the structural response of the macro-scale model matches the response of the meso-scale model as well as possible. As remarked before, the meso-scale could be any kind of model with the ability to produce values for stored and dissipated energy. Here, for testing purposes and to show how the procedure works, we take as meso-scale structure a more detailed and spatially resolved description of the macro-scale counterpart-one that accounts for material and geometrical heterogenety at a lower-scale level, and hence represents the system that we can evaluate at possibly high computational cost. The macro-scale model is hence one that has to represent equivalent physical behaviour at a much coarser spatial resolution. Thus the meso-scale mathematical model is formally equal to Eq. (6), and reads and is subjected to equivalent outside actions-incorporated into W m -as the one in Eq. (6). As before we assume that there are meso-scale parameters Q m which describe possibly different meso structures. Hence the full list of arguments for the meso-scale energy and dissipation is As Z M = Z m , the states z M and z m cannot be directly compared, and the two models are to be compared by some observables or measurements y ∈ Y, where Y is typically some vector space like R m . In other words, let be the meso-scale observable (e.g. energy, stress or strain etc.) in which Y m describes the measurement operator, f m is the external excitation andˆ is a random variable which represents the measurement noise. On the other side, let be the prediction of the same observation on the macro-scale level, this time described by the measurement operator Y M and the external excitation f M of the same type as f m . To incorporate the possible model error and other discrepancies between the mesoand macro-scale, we model y M as a noisy variant of Y M . For this purpose we introduce a probability space (Ω , B , P )-Ω is the space of elementary events or realisations, B is the σ -algebra of measurable events, and P is the probability measure-and add to Y M (Q, z M (Q, f M )) the random variable (ω ) ∈ L 2 (Ω , B , P ) that best describes our knowledge about those discrepancies. Hence, Eq. (9) becomes stochastic and reads Typically, (ω ) is modelled as a zero-mean Gaussian random variable ∼ N (0, C ) with covariance C . However, other models for (ω ) can also be introduced without modifying the general setting presented in this paper. This is now the point to discuss different possibilities and choices for identification. Recall that Bayesian identification (e.g. [38,39]) proceeds such that the unknown object is considered as uncertain in an epistemic sense-an uncertainty of our knowledgeand modelled as a random variable. New information about that object is then obtained through some connected observation or measurement via conditioning, which hopefully will reduce the epistemic uncertainty. Now, if different observations or measurements come from a fixed specimen, or fixed computational model-fixed values of Q m in Eq. (7) in our case-for different external actions, then it is not an unreasonable interpretation to say what one "really" wants to know is a fixed value of Q, and the uncertainty introduced in the Bayesian modelling is purely epistemic.
Trying to achieve a theoretical description in a Bayesian framework, as now Q is to be modelled as a random variable, one needs a stochastic version of the theory sketched so far. This means one needs a stochastic version of continuum thermodynamics [45], with a new interpretation of equilibrium Eq. (1), and a stochastic generalised standard material with re-interpreted Eqs. (2) and (3), see [51]. An early version of this for plasticity is [46]. This is achieved by modelling Q as a random quantity due to epistemic uncertainty, and assume that it is a random variable on the probability space (Ω u , B u , P u ) for epistemic uncertainty, for the sake of simplicity we assume that it has finite variance, Q ∈ L 2 (Ω u , B u , P u ). As Q is now a random quantity, the macro-scale Eq. (6) has to be interpreted stochastically, the state becomes a stochastic quantity z : Ω u → Z M , and Eq. (6) is translated into [46] where the expectation being defined as E (W) Ω u := Ω u W(ω u ) P(dω u ). The random solution z(ω u ) from Eq. (11) is the input to Eq. (10) to provide the prediction y M (ω u , ω ). This leads to: The upscaling process that is related to Problem 1 was already considered in [47,[49][50][51], and hence will not be repeated here. However, as we show later, this problem is a special case of the upcoming Problem 2, which is the main topic of this paper. This Problem 2 deals with the situation that different observations or measurements come from a population of different RVE realisations, further called aleatoric uncertainty. In our case observations come from different realisations of Q m in Eq. (7). In such a case the natural approach would be to capture that randomness also in the macro-scale model, i.e. we want to identify in the macro-scale model a random variable Q.
Thus we model Q m at each point as a random variable in L 2 (Ω Q , B Q , P Q ; Q m ), i.e. a random field defined by the mapping where Q m is the parameter space, which depends on the application. As a consequence, the deterministic evolution problem in Eq. (7) also becomes uncertain, and Eq. (7) rewrites to in which respectively.
The formal changes are now that the observation in Eq. (8) not only contains the noise described byˆ , but is a random quantity of an aleatoric nature, because so is Q m (ω Q ) and hence z m (ω Q ). Hence, one has: As we want to see the randomness from Ω Q reflected in Q on the macro-scale, these quantities now have to become a random variable on the probability space Ω u × Ω Q , and hence so does the Eq. (11), with the change where i.e. the expected values are now on the space Ω u × Ω Q . The prediction of the observation Eq. (10) has now to be interpreted in this way too, it contains also the aleatoric randomness from Ω Q . With this re-interpretation of Eq. (10), this in turn modifies Problem 1 into

Problem 2 Find an epistemically uncertain random (aleatoric) variable
The designation of epistemic and aleatoric uncertainty is a kind of interpretation, mathematically Ω u and Ω Q are just probability spaces. Once such an object like Q is identified in a Bayesian framework, it does usually not really matter what caused the uncertainty described in a probabilistic sense. Thus, for the sake of simplicity, in the following we shall take on the macro scale just one probability space (Ω, B, P) for the description of the uncertainty, where one might think of ω ∈ Ω as a realisation ω = (ω u , ω Q ) ∈ Ω u × Ω Q = Ω, i.e. the product of those two probability spaces.

Bayesian upscaling of random meso-structures
The goal is to identify the quantity Q defined on Ω conditioned on the observation y m using Bayes's rule. As some or all of the components of Q may be required to be positive definite-as is often the case for material quantities-, this constraint has to be taken into consideration. In our case all components of Q have to fulfil that requirement. In most updating methods it is advantageous if the quantities to be identified have no constraints. We shall explain how to achieve this by considering a scalar component Q of Q. The first is to scale Q by a reference Q 0 to obtain a dimensionless quantity, and consider now q := log(Q/Q 0 ). As numerically log(Q/Q 0 ) = log Q − log Q 0 , it is convenient to choose Q 0 such that Q 0 = 1 in the units used, and hence numerically log Q 0 = 0 and q := log Q. Henceforth we assume that this has been done. The variable q now has no constraints, it is a free variable on all of R. This procedure may be extended to all even order positive tensors, but will only be needed for scalars here. So instead of identifying the collection Q directly, we identify the logarithms of its components giving in this way a collection q where we write symbolically q = log Q, as in this way whatever approximations or linear operations are performed computationally on the numerical representation of q(x, ω), in the end exp(q(x, ω)) is always going to be positive. This also gives the right kind of mean-the geometric mean-for positive quantities. The underlying reason is that the multiplicative group of positive real numbers-a (commutative) one-dimensional Lie group-is thereby put into correspondence with the additive group of reals, which also represents the (one-dimensional) tangent vector space at the group unit, the number one. This is the corresponding Lie algebra. A positive quadratic form on the Lie algebra-in one dimension necessarily proportional to Euclidean distance squared-can thereby be carried to a Riemannian metric on the Lie group. A similar argument holds for positive tensors of any even order.
Therefore, instead of Problem 2, we consider its modified version: Bayesian updating is in essence a probabilistic conditioning, the foundation of which is the conditional expectation operator [4]. Here we are interested in the case where the conditioning occurs w.r.t. another random variable, namely y(q(ω)), which depends on the quantity q to be updated. For any function ϕ : Q → F of finite variance of q, the conditional expectation of it is defined [4] by the projection onto a closed subspace C ϕ ⊂ L 2 , which is in simple terms the L 2 -closure of all multivariate polynomials in the components of y with coefficients from the vector space F, i.e.
where f α ∈ F and the V α are real-valued multivariate polynomials in y = (y 1 , . . . , y j , . . . ), which means that for a multi-index α = (α 1 , α 2 , . . . ) the polynomial V α is of degree α j in the variable y j . It turns out [4] that C ϕ contains all measurable functions g : Y → F so that g(y(q(ω))) is of finite variance.
Here we will be only interested in the function ϕ(q) = q, i.e. the conditional mean of q. To compute it, one may use the variational characterisation and compute the minimal distance from the subspace C := C q to the point q: In this section, the expectation operators are to be understood as acting only on the variables which describe the uncertainty in the estimation, i.e. in the notation of "Abstract model problem" section only on the variables from Ω u . One may observe from Eq. (16) that φ is the best "inverse" of y(q) in a least square sense, the orthogonal projection P C q of q onto C. Following Eq. (16), one may decompose the random variable q into the projected component q p ∈ C and the orthogonal residual q r ∈ C ⊥ , such that holds. Here, q p = P C q = E (q | y) = φ(y(q)) is the orthogonal projection onto the subspace C of all random variables consistent with the data, whereas q r := (I − P σ (y) )q is its orthogonal residual. This can be used to build a filter-filtering the observation y m together with the prior forecast q f -which is optimal in this least square sense [38,39] and returns an assimilated random variable q a which has the correct conditional expectation. The first term in the sum in Eq. (17) is taken to be the conditional expectation given the observation y m , i.e. φ(y m ) = E q f | y m , whereas (q f − φ(y(q f ))) is the residual component. Following this, Eq. (17) can be recast to obtain the update q a for the prior random variable q f as (18) in which y M (q f ) (see Eq. (10)) is the random variable representing our prior prediction/forecast of the measurement data, and q a is the assimilated random variable. By recalling φ(y(q f )) = P C q f , one sees immediately that E (q a | y m ) = E q f | y m , and the assimilated random variable q a has the correct conditional expectation. As in engineering practice one is often not interested in estimating the full posterior measure, and the conditional expectation is the most important characterisation, we will use this computationally simpler procedure. Therefore, to estimate q a one requires only information on the map φ in Eq. (18). To make the determination of this map computationally feasible, and for the sake of simplicity, the map φ in Eq. (18) can be approximated by a n-th order polynomial-i.e. the minimisation in Eq. (16) is not over all measurable maps, but only over n-th order polynomials-such that the map φ in Eq. (18) becomes with ∀j : 0 ≤ α j ≤ n, and multivariate polynomials V α as before in Eq. (15). In the affine case, when n = 1 and φ 1 (y; β) = Ky + b in the previous formula Eq. (19), the Eq. (18) reduces to the Gauss-Markov-Kalman filter [38,39]: a generalisation of the well known Kalman filter. In order to estimate the macro-scale properties using Eq. (18), one requires both y m and y M , preferably in the functional approximation form. Note that y M is the prediction of the measurement data on the macro-scale level, and is obtained by propagating the prior knowledge q f (here a spatially homogeneous quantity) through the macro-scale model. In this paper we use Bayesian regression -not related to the Bayesian updating-to estimate the functional approximation of y M (q f ), as will be presented in "Approximating the macro-scale response by Bayesian regression" section. On the other hand, y m represents the response of the high-dimensional meso-scale model in which meso-scale properties q m are heterogeneous and uncertain. By modelling q m one may estimate y m in a similar manner as y M , see the first case scenario in Fig. 1. However, due to the high-dimensionality of the input q m , and the nonlinearity of the meso-scale model, a straightforward uncertainty quantification of y m is often not computationally affordable. The estimate would require too many data points as thousands of input parameters can easily be involved in the description of the meso-scale properties. Therefore, we use an unsupervised learning algorithm to reduce the stochastic dimension of the meso-scale measurement, as further described in "Approximation of the meso-scale observation by unsupervised learning" section.

Approximating the macro-scale response by Bayesian regression
The measurement prediction y M is approximated by a surrogate model (21) in which φ M is usually taken to be nonlinear map of q f ∈ L 2 (Ω u , B u , P u ) with the coefficients β. Furthermore, we assume that y M is in general only known as a set of samples, and our goal is to matchŷ M with y M . Let be the full set of N data samples describing the forward propagation of q f to y M via macro-scale model. In order to specifyŷ M the only thing we need to find are the coefficients of the map φ f . Therefore, we infer β given data x using Bayes's rule In general case the marginalisation in Eq. (22) can be expensive, and therefore in this paper we use the variational Bayesian inference instead [31]. The idea is to introduce a family D := {g(β) := g(β|λ, w)} over β indexed by a set of free parameters (w, λ) such thatŷ M ∼ y M . Thus, the idea is to optimise the parameter values by minimising the Kullback-Leibler divergence After few derivation steps as depicted in [31], the previous minimisation problem reduces to in which L(g) is the evidence lower bound (ELBO), or variational free energy. To obtain a closed-form solution for β * , the usual practice is to assume that both the posterior p(β|x) as well as its approximation g(β) can be factorised in a mean sense, i.e.
in which each factor p i (β i |x), g i (β i ) is independent and belongs to an exponential family. Similarly, their complete conditionals given all other variables and observations are also assumed to belong to exponential families, and are assumed to be independent. Obviously these assumptions lead to conjugacy relationships, and closed form solution of Eq. (24) as further discussed in more detail in [31].
To approximate y M (ω), we take Eq. (21) to be described in a form of a polynomial chaos expansion (PCE) or generalised PCE (gPCE) [67]. In other words, y M (ω) and q f (ω) are taken to be functions of known RVs {θ 1 (ω), . . . , θ n (ω), . . . }. Often, when for example stochastic processes or random fields are involved, one has to deal here with infinitely many RVs, which for an actual computation have to be truncated to a finite vector θ(ω) = [θ 1 (ω), . . . , θ L (ω)] ∈ Θ ∼ = R L of significant RVs. We shall assume that these have been chosen such as to be independent, and often even normalised Gaussian and independent. The reason to not use q f directly is that in the process of identification of q they may turn out to be correlated, whereas θ can stay independent as they are. Thus a RV y M (θ) is replaced by a functional approximation and analogously q f by in which the multi-index α = (. . . , α k , . . . ), and the set J Z of multi-indices is a finite set with cardinality (size) Z.
The coefficients q M } α∈J Z , are estimated by minimising the ELBO analogous to the one in Eq. (24) by using the variational relevance vector machine method [3]. Namely, the measurement forecast y s := {y M (θ j )} N j=1 can be rewritten in a vector form as y s = vΨ (28) in which Ψ is the matrix of collection of basis functions Ψ α (θ) evaluated at the set of sample points {θ j } N i=1 . However, the expression in the previous equation is not complete, as the PCE in Eq. (26) is truncated. This implies the presence of the modelling errors. Under a Gaussian assumption, the data then can be modelled as p(y s ) ∼ N (vΨ , ς −1 I) (29) in which ς ∼ Γ (a ς , b ς ) denotes the imprecision parameter, here assumed to follow Gamma distribution. The coefficients v are given a normal distribution under the independency assumption: in which Z denotes the cardinality of the PCE, and ζ := {ζ i } is a vector of hyper-parameters.
To promote for sparsity, the vector of hyper-parameters is further assumed to follow Gamma distribution under the independency assumption. In this manner the posterior for β : = {v, ζ, ς), i.e. p(β|y s ), can be approximated by a variational mean field form the factors of which are chosen to take same distribution type as the corresponding prior due to the conjugacy reasons. Once this assumption is made, one may maximise the corresponding ELBO in order to estimate the parameter set.
Finally, we have everything to describe the macro-scale response y M , and therefore we may fill this term in the filtering equation as presented in Eq. (18) to obtain: Note that Eq. (33) is not yet fully computationally operational, the RV y m has to be put into a computationally accessible form. This is considered in the following "Approximation of the meso-scale observation by unsupervised learning" section.

Approximation of the meso-scale observation by unsupervised learning
Let the measurement y m be approximated by The previous equation is more general than Eq. (25), and hence includes the problem described in "Approximating the macro-scale response by Bayesian regression" section as a special case. The main reason is that next to the coefficients w we also need to estimate the argument η such that the functional approximation in Eq. (34) is minimally parametrised. Following theory in "Approximating the macro-scale response by Bayesian regression" section, Eq. (35) is reformulated to the computationally simpler variational inference problem. In other words, we introduce a family of density functions D := {g(β) := g(β|λ, )} over β indexed by a set of free parameters ( , λ) that approximate the posterior density p(β|y d ), and further optimise the variational parameter values by minimising the Kullback-Leibler divergence between the approximation g(β) and the exact posterior p(β|y d ). Hence, following Eq. (24), we maximise the ELBO by using the mean-field factorisation assumption, and conjugacy relationships. The optimisation problem attains a closed form solution in which the lower bound is iteratively optimised with respect to the global parameters keeping the local parameters fixed, and in the second step the local parameters are updated and the global parameters are held fixed. The algorithm can be improved by considering the stochastic optimisation in which a noisy estimate of the gradient is used instead of the natural one. The mean field factorisation as presented previously is computationally simple, but not accurate. For example, one cannot assume independence between the stored energy and dissipation coming from the same experiment. In other words, the correlation among the latent variables is not explored. As a result, the covariance of the measurement will be underestimated. To allow dependence in the factorisation, one may extend the mean-field approach via copula factorisations [63,64]: in which c(F 1 (β 1 ), ..., F m (β m ), χ) is the representative of a copula family, F i (β i ) is the marginal cumulative distribution function of the random variable β i , and χ is the set of parameters describing the copula family. Similarly, g i (β i ) represent the independent marginal densities. In this manner any distribution type can be represented by a formulation as given in Eq. (37) according to Sklar's theorem [56]. Following Eq. (37), the goal is to find g(β) such that the Kullback-Leibler divergence to the exact posterior distribution is minimised. Note that if the true posterior is described by then the Kullback-Leibler divergence reads: and contains one additional term compared to the mean field approximation. When the copula is uniform, the previous equation reduces to the mean field one, and hence only the second term is minimised. On the other hand, if the mean field factorisation is not a good assumption and the dependence relations are neglected, then the total approximation error will be dominated by the first term. To avoid this, the ELBO derived in Eq. (36) modifies to and is a function of parameters of the latent variables β, as well as of the copula parameters χ. Therefore, the algorithm applied here consists of iteratively finding the parameters of the mean field approximation, as well as those of the copula. The algorithm is adopted from [63], and is a black-box algorithm as it only depends on the likelihood p(y m , β) and copula description in a vine form. Note that when the copula is equal to identity, i.e. uniform, the previous factorisation collapses to the mean field one. Once the copula dependence structure is found, the measurement data y m are represented in a functional form-here taken as generalised mixture model-as in Eq. (34), which is different than the polynomial chaos representation. In other words, the measurement is given in terms of dependent random variables, and not independent ones. Therefore, the dependence structure has to be mapped to an independent one. In a Gaussian copula case, the Nataf transformation can be used, and otherwise the Rosenblatt transformation is applied. For high-dimensional copulas, such as a regular vine copula, [1] provides algorithms to compute the Rosenblatt transform and its inverse. The result of the transformation are mutually independent and marginally uniformly distributed random variables, which further can be mapped to Gaussian ones or other types of standard random variables via marginals [62]. Let the functional approximation of the measurement be given as in which J m is a multi-index set, and G α (ξ) is a set of functions (e.g. orthogonal polynomials) with random variables ξ as arguments. With this, we have obtained the measurement y m in a minimised functional approximation form, which further can be plugged into Eq. (33) to obtain the final filter discretisation. By combining the random variables θ and ξ, one may re-write Eq. (33) in the following form in which H α is a generalised polynomial chaos expansion with random variables (θ, ξ) as arguments. Note that the coefficients q m , are sparse as they only depend on θ or ξ, respectively. As θ describes the a priori (epistemic) uncertainty, one may take the mathematical expectation of the previous equation w.r.t. θ to obtain the natural (aleatoric) variability of the macro-scale parameters: In general, the approximation of the meso-scale information as previously described can be cubersome due to high nonlinearity and time-dependence of y m . Therefore, instead of approximating y m in a form as in Eq. (34), one may discretise y m in a Monte Carlo sampling manner such that Eq. (33) rewrites to ∀ω i : i = 1, . . . , M In other words we repeat the update formula M times for each instance of the measurement y m , and thus obtain M posteriors q (i) a , i = 1, . . . , M that depend only on the epistemic uncertainty embodied in θ. By averaging over θ one obtains a set of samples: i.e. the data which are to be used for the estimation of the functional approximation form of the macro-scale parameter q M similar to Eq. (34). To achieve this, we search for an approximation given the incomplete data set q d := (q i ) n i=1 . Here, w q and η q have same meaning as in Eq. (34), and therefore can be estimated by using same unsupervised algorithm as previously described. This approach is computationally more convenient, as the correlation structure between the material parameters is easier to learn than the one between measurement data on the meso-scale.
For better clarity, we re-capitulate the upscaling procedure in Fig. 1 for comparison reasons. In Fig. 1a) is shown the direct computational approach in which Eq. (42) is used with y m being approximated in same manner as y M by supervised Bayesian regression described in "Approximating the macro-scale response by Bayesian regression" section. Due to a high computational footprint, this approach is not considered in this paper, for more information please see [47]. The upscaling approach presented in Eq. (42), in which y m is approximated by Eq. (34) via an unsupervised learning algorithm, is further depicted in Fig. 1b. Here one first uses the Bayesian unsupervised learning algorithm to learn the distribution of the meso-scale measurement, and later a Bayesian upscaling procedure to estimate the macro-scale parameters. Finally, the upscaling approach given by Eqs. (44) and (46) is shown in Fig. 1c. In this approach one first uses a Bayesian upscaling procedure and estimates the macro-scale parameter sample-wise, after which the Bayesian unsupervised learning algorithm is used to approximate the distribution of the macro-scale parameters. The choice of algorithm depends on the application and dimensionality, as well as on the nonlinearity of the meso-scale model.

Bayesian upscaling via energy considerations
As an example of the abstract model in "Abstract model problem" section in Eqs. (2) and (3), we choose a prototypical version of the compressive behaviour of a cementitious-like material, one which displays in its irreversible behaviour both a "softening" component, as well as an "hardening" component, and therefore interaction with the Helmholtz free energy, to test the merits of the identification algorithm on such behaviour. It is a simple version of a coupled elasto-damage model introduced in [24] (Sect. 3.5.2), in which the state variable z = (u, w) from Eq. (6) is locally at some point x ∈ G in the body z(x) = (u(x), w(x)) ∈ Z M = U M × W M , giving the local resulting strain ε(x) = ∇ s u(x), as well as the internal variables w = (D, ς ). Here D ∈ [1, ∞[ is a damage variable which will modify the stiffness through the elasticity tensor, and ς is a scalar hardening variable. Observe that oftenD = (1 − 1/D) is regarded as the "real damage", asD = 0 corresponds to virgin or undamaged material, andD = 1 corresponds to totally damaged material. For the elastic part, we choose to identify an isotropic material, as the meso-model description is stochastically isotropic. Splitting the strain in its volumetric vol ε = (tr ε/3)I and deviatoric part dev ε = ε − vol ε, the elastic relations may be written in the isotropic case (e.g. [24]) as

Fig. 1 Stochastic multi-scale analysis
where κ is the bulk modulus and μ the shear modulus, in total describing a damage depended elasticity tensor C(D), the damage only acting on the bulk response. For undamaged material the damage variable is D = 1, and as it grows D → ∞, the bulk response weakens.
The local stored energy is given by with the components of the characterising parameter vector Q = (κ, μ, σ f , K d ), where K d is a hardening modulus for the scalar internal variable ς for hardening, and σ f is a failure stress, used in the failure criterion, where we choose one of crushing damage-as it may occur e.g. for cementitious material [24] (Sect. 3.5.2)-with failure function where x = 1/2(x + |x|) is the Macauley bracket, and χ d = −K d ς , the thermodynamic force corresponding to hardening. The elastic domain is f d (tr σ, χ d ) ≤ 0, i.e. damage occurs when the pressure satisfies p = − tr σ ≥ σ f + K d ς. The internal variables are w = (D, ς ). The Legendre-Fenchel dual F * of the dissipation pseudo-potential F (ε, w,ẇ, Q) -the indicator function of the elastic domain-is then The thermodynamic forces are so that the instantaneous dissipation density becomes η =Ḋ(tr σ) 2 /(18κ) +ςK d ς.
The measurement or observation prediction on the macro-scale y M will be only registered at certain intervals t of the pseudo-time variable t, t 0 = 0, t 1 = t, . . . , t = t, . . . . The observation prediction is specified by the spatial average of the energy-type prediction of the macro model y M = (E e , E h , E d ) at observation time t , given by i.e. the integrated or averaged stored elastic and hardening energy, and the energy dissipated between the last observation at t = t −1 and now at t = t . In the numerical experiments the previous model is used on both the meso-and macroscale. Finally, the upscaling is considered for the energy-type of measurement from the meso-scale y m = (E me , E mh , E md ), and in our case defined exactly in an analogous way as in Eq. (49), using the meso-scale model respectively. On the macro-scale, the integrand quantities C(D)(x) and K d (x) in Eq. (49)-in fact all components of Q-are assumed spatially homogeneous or constant, whereas on the heterogeneous meso-scale model they do vary spatially and are modelled as random fields. In this manner, the stored as well as the dissipated portion of the total energy is faithfully mapped from the meso-to the macro-scale model.
To represent the measurement data y m for all meso-structure realisations, one may use generalised mixture models. In our particular application the measurement is positive. Therefore, we use samples x := (log y m (ω i )) N i=1 to approximate log y m as a Gaussian mixture model [2] p(x) = K k=1 π k N (x|ν k ), K k=1 π k = 1, 0 ≤ π k ≤ 1 ( 5 0 ) described by parameter set ν = (μ k , Σ k ) K k=1 with ν k := (μ k , Σ k ) being the statistics parameters of Gaussian components, and π k are the mixing coefficients. These constitute the parameter vector w. The hidden variable η is the indicator vector z k of dimension N that describes the membership of each data point to the Gaussian cluster. Following this, the joint distribution is given as p(x, Z, μ, Σ, π) = p(x|z, μ, Σ, π) p(z|π) p(π) p(μ|Σ) p(Σ) (51) in which and The priors are chosen such that p(π) is a Dirichlet prior, whereas p(μ, Σ) follows an independent Gaussian-Wishart prior governing the mean and precision of each Gaussian component. Hence, our parameter set β is described by a set of global parameters w := (μ, Σ, π) and the hidden variable z. To incorporate correlations, the copula dependence structure of Gaussian mixture as in Eq. (37) is found, and the measurement data are represented in a functional form, as discussed in "Bayesian upscaling of random mesostructures" section.

Bayesian upscaling of a heterogeneous linear elastic material model
In this section the proposed upscaling scheme is first applied on a linear elastic random heterogeneous material. Here it is known from homogenisation theory that for a large enough representative volume element (RVE) one may indeed find a spatially homogeneous constant value for the elastic parameters. This is also quickly seen from a general consideration: for a homogeneous strain, on the macro-scale the stored energy is a quadratic function of the strain. The meso-scale local strain on the other hand is a linear function of the homogeneous boundary strain on the meso-scale model, and locally the stored energy is a quadratic function of the local strain, hence it is locally a quadratic function of the homogeneous boundary strain. The observable, the spatial average of the local meso-scale stored energy, is as a spatial integral of those quadratic functions again a quadratic function of the homogeneous boundary strain. Hence there exists a unique set of coefficients on the macro-scale for a perfect match of the energies. In computational practice, the RVE will often not be large enough, and the above consideration only applies if the macro-scale allows a large enough elastic symmetry class to capture possible anisotropies. Here we want an isotropic macro-scale model, as the meso-scale description is stochastically also isotropic and homogeneous. But as the RVE may not be large enough, residual model errors can be expected in numerical practice.
All our experiments will be performed only for 2D situations, as this is sufficient to demonstrate the approach. The example meso-scale specimen consists of a 2D block described by 64 circular inclusions of equal size randomly distributed in the domain. In the first case scenario only one meso-scale realisation is observed for verification purposes. The computational FE-model uses regular 50 × 50 mesh of standard bi-linear Q4 quad elements for the meso-scale, and one such element for the macro-scale. The material properties are taken as follows: the bulk modulus is K m = 4 MPa, and the shear modulus is G m = 1 MPa for the matrix phase, whereas the inclusions characteristics are prescribed to be ten times higher. The volume fraction of the inclusions is taken as 40%.
The meso-scale characteristics are upscaled in a Bayesian manner to the coarse scale homogeneous isotropic finite element described by material properties taking the form of a posteriori random variable as schematically shown in Fig. 2. To gather as much as information as possible in observation data, we consider different types of loading conditions including pure shear or compression, or their combination as shown in Fig. 2.

Verification
To verify our method, we compare Bayesian upscaling procedure to the deterministic homogenisation approach (as presented in [61], Exercise 4). Therefore, we initially observe only one realisation of the random meso-structure and apply periodic boundary conditions. For the sake of clarity, we describe here the deterministic approach briefly: the FE simulation is performed on the meso-structure, the resulting response is used to compute the macro-scale deformation or stress tensor (depending on the deformation, traction or mixed boundary conditions on the RVE), finally the computed average macro-scale quantities are used to compute the homogeneous/effective material parameters i.e. bulk and shear moduli in the current setting. For further details, the interested reader can consult [61]. An example of the fine-scale response in terms of the element level energy density is shown in Fig. 3 for Experiment 1 and 4. Here we recall that one of the distinguishing features of Bayesian upscaling approach is that it uses energy to estimate macro-scale material parameters.
The comparison of the deterministic homogenization and Bayesian upscaling results (abbreviated as DHB and BUB respectively) are shown in Fig. 4, along with different analytical bounds (computed using the given material properties and volume fraction). These bounds are defined as follows (with increasing degree of refinement): the material properties bound (Mat) with inclusion and matrix properties defined as higher and lower limits respectively, the Reuss-Viogt bounds (RBH) and Hashin-Strickman bounds (HSB). As expected and depicted in Fig. 4, the DHB fails to predict the shear moduli in a pure compression state (Experiment 4). An analogous result is expected and also obtained for the bulk modulus in case when only shear loading conditions are applied (Experiment 1). On the other hand, the BUB in the form of Eq. (33) regularises the problem by introducing prior information. When the data are not informative about the parameter set, this is recognised in that the posterior estimate is unchanged from the prior. Otherwise, the posterior mode and the deterministic homogenised value are identical after all experiments. Note that the blue crosses representing the DHB results, taken from [61], are fluctuating before the final estimate in contrast to the Bayesian estimate. We also observe from Fig. 4 that both the DHB and BUB results remain within the confines of analytical bounds. In particular for BUB results, the bounds for shear modulus reside inside all of the considered analytical limits for the last two experiments: 1 and 2, whereas for bulk modulus, a similar behaviour is observed for experiments: 3 and 4. To conclude, the Bayesian upscaling procedure is more robust than the classical one, and it additionally reflects possible model errors due an insufficient size of the RVE in the residual uncertainty. In addition, in the Bayesian upscaling procedure one may sequentially introduce the measurement data into the upscaling process. For example, one may first use the measurement information coming from the fourth experiment to obtain the upscaled material properties. These further can be used as a new prior for the third experiment, and so on, see Fig. 4.

Upscaling of random elastic material
To quantify randomness on the meso-scale level, the previously described experiment is repeated several times, and the averaged stored elastic energies per experiment are collected. In particular, we observe realisations of the meso-scale elastic material described by randomly placed inclusions with a volume fraction of 40%. Initially, the stored energy is identified given the observed data by using the variational Bayesian inference method as described in "Bayesian upscaling of random meso-structures" section, resp. "Approximation of the meso-scale observation by unsupervised learning" section. The logarithm of the energy is modelled by a copula Gaussian mixture model, and the individual compo- nents are identified. The optimal number of mixture components is further decoupled by an inverse transform. The resulting uncorrelated random variables are then further used to obtain the polynomial chaos surrogate of the measurement data.
The simulation is performed on the 2D meso-structure with an increasing number of particles and linear displacement based boundary condition.The material properties for the matrix and inclusion phases are kept the same as considered previously in the verification procedure. For a given number of inclusions embedded in the matrix phase, an ensemble of 100 realisations of stored energy is considered to gather corresponding measurement set. In Fig. 5 the PDFs for the identified elastic energies are shown for the pure shear and the bi-axial compression test, respectively. As expected, the variation of stored energy reduces with the increase of the number of particles in the matrix phase. One would expect that these would converge to the same PDF after taking large number of inclusions. However, here the largest taken number was not large enough. In order to be able to compare results to [61], we therefore did not increase this number any more. It is interesting to note that in the compression case the mean responses of stored energies vary more than in a pure shear test. This is closely related to the way how boundary conditions are imposed. Namely, we take into consideration directly the element on which the loadings are imposed, and thus one may recognise the strong influence of boundary conditions on the obtained results. In would be better if one would take into consideration only internal elements, which are away from the boundary. This means that the averaging would be performed only over an RVE embedded in a large domain on which the boundary conditions are applied.
The previously discussed results are estimates of the stored energy given its samples following Eq. (41), obtained by the unsupervised learning algorithm described in "Approximation of the meso-scale observation by unsupervised learning" section. However, the residual uncertainty consists of two kinds of uncertainties: aleatoric (meso-scale randomness) and epistemic (prior information in the unsupervised learning algorithm, see Eq. (35)). Estimating the confidence intervals w.r.t. to the epistemic uncertainties one obtains the corresponding PDFs of energy: the mean PDF which represents purely aleatory uncertainty and p 95 upper and lower PDF's that describe 95% epistemic quantiles on the mean PDF, see Fig. 6a. Naturally the epistemic quantile intervals strongly depend on the size of the measurement set. From Fig. 6b one may conclude that with the smaller measurement set by using only 10 samples our confidence about the estimated PDF is lower than in case of higher number of measurements, as expected.
Besides the previous analysis, the impact of boundary conditions on the upscaled quantities is another important factor to study. In Fig. 7 are depicted the 95% quantiles of energy for linear displacement (LD), periodic (PR) and uniform tension (UT) boundary conditions. According to these results, linear displacement defines the upper bound on the estimated energy, whereas uniform tension gives its lower limit. On the other hand, variations of the energies are similar for all three types of boundary conditions, and are inverse proportion to the number of inclusions.
Once the measurement energy is identified, in the second step we use the proxy of y m to identify the elastic macro-scale material characteristics by using the filter of polynomial order 2 as given in Eq. (42). When using this type of upscaling ,one is biased to the prior knowledge of the material characteristics on the macro-scale. In a multi-scale analysis, however, it is not an easy task to define the prior knowledge, or better to say the limits of the   Fig. 8a. It is interesting to note that even though the posterior (including both aleatoric and epistemic uncertainties) of the upscaled shear moduli changes w.r.t. the prior assumption, its posterior averaged over the epistemic uncertainty as in Eq. (43) remains the same, see Fig. 8b, and does not depend on the prior knowledge.
To verify our result further, in Fig. 9 we compare the aleatoric part of the posterior distribution with the posterior distribution obtained by repeating the deterministic homogenisation, see [61], on each of the meso-scale samples. As one may notice, the distribution coming from the deterministic homogenisation (denoted by det) and the aleatoric one obtained by our approach (denoted by partial) are matching. They are further compared with the full posterior distribution (denoted by total), i.e. the total uncertainty that includes both aleatory and epistemic knowledge.

Upscaling of damage phenomena
In this subsection, the proposed approach is applied to another interesting problem. For this purpose a phenomenological elasto-damage model is considered as described in the beginning of this section. The goal is to compute a homogenised description of random material parameters on the macro-scale given meso-scale measurements. The meso-scale is assumed to follow same constitutive model as the macro-scale. For verification purposes we assume that the meso-scale has homogeneous material properties, which are modelled by a random variable. Hence, the meso-and macro-scale are identical. Furthermore, in a second experiment we model the meso-scale material properties as spatially varying, and apply the upscaling procedure in order to estimate the homogeneous macro-scale material properties. In both experiments we only simulate the displacement controlled uniform bi-axial compression of a 2D block with unit length ( similar to the experiment in the previous example). The volumetric strain ε v = tr ε/2 in 2D is calculated for a given time step through piece-wise linear interpolation from the set of values given as

Verification
For verification purposes the material parameters Q m on the meso-scale are modelled as lognormal random variables with the mean values shown in Table 1 and the coefficient of variation 5%. After propagating the variables through the elasto-damage model, the corresponding measurements as in Eq. (49) are estimated. We assume that the polynomial chaos approximation of the measurement is not given, but only a set of 100 samples. Therefore, the log of measurements are modelled as copula Gaussian mixtures with the unknown number of components. The simulation is run in 8 equidistant time steps, the first two being elastic. In the third to sixth steps the behaviour is a combination of elasticity and damage, whereas in the last step is dominated by the damage component. In Fig. 10 are shown scatter plots of energies in the first and the third step, both depicting two states in the response: elastic and damage. The red circles denote samples that are in the elastic state in the third step, whereas blue crosses denote samples that experience damage behaviour in the third step. The third step is the first time step in which damage behaviour initiates. Hence, the correlation between elastic energies in the third and the first step for samples that are undergoing elastic behaviour is linear, see red circles in left plot in Fig. 10, as expected. However, the elastic part of energy in the third step has nonlinear correlation to the elastic part of energy in the first step for the samples that are switching from elastic to damage state, see blue crosses in the left plot in Fig. 10. Similar holds for the right plot in Fig. 10. Here one may see that the samples that remain in elastic state from the first to the third step do not have E d in the third step (therefore the straight line made of red circles), whereas samples that change their state from elastic to damage have non-zero E d nonlinearly correlated to the elastic part of energy in the first step.
To estimate the macro-scale properties, we observe measurements at the last time step as depicted in Fig. 11. Clearly, the E h and E d are almost linearly related in the log-space,  represents both aleatoric plus epistemic uncertainty, whereas "estimate" is only the aleatoric one whereas this does not hold for the E d and E e . Furthermore, we employ a copula approach, see "Approximation of the meso-scale observation by unsupervised learning" section, to uncouple these measurement data, and estimate their functional approximations. Once they are mapped to Gaussian random variables, we may easily generate the approximation of measurements at other time steps. For this we utilise the approach described in "Approximating the macro-scale response by Bayesian regression" section.
Given the approximation of the measurement y m , we may estimate homogeneous macro-scale properties by the approach described in "Bayesian upscaling of random mesostructures" section. The a priori description of the macro-scale properties is taken to be also modelled as lognormal random variables with the mean 20% larger than in the mesoscale case, and a coefficient of variation of 20%.
The resulting updated macro-scale properties are shown in Fig. 13. The bulk modulus κ and the limit stress σ f that initiates the damage are both updated and match the true distribution, whereas their correlation and the mapping to the normal space is shown in Fig. 12. The left plot in Fig. 12 depicts the correlation between upscaled bulk modulus κ  43) and (44) in terms of PDF of the log of macro-scale parameters, Right: Correlation between the log of macro-scale parameters and the limit stress σ f . Here, one may distinguish the correlation calculated by taking into account the full posterior (both aleatory uncertainty due to uncertain RVEs, as well as epistemic uncertainty or only its aleatoric (estimate) part (obtained by averaging the posterior over the prior uncertainty). Note that the remaining epistemic uncertainty is bigger for the limit stress σ f than the bulk modulus κ, as the relationship between the parameter and the measurement is more nonlinear than in case of the bulk modulus. The right plot in Fig. 12 describes the correlation between variables (η 1 , η 2 ) obtained after mapping the measurement data (E e , E d ) to the Gaussian space by the use of algorithms in "Approximation of the meso-scale observation by unsupervised learning" section. These are referred as transformed variables, and are further compared to the sample set of standard uncorrelated Gaussians in order to verify the mapping algorithm. As can be seen, the mapped Gaussians are indeed uncorrelated, and hence can be used for further approximations. On the other hand, due to the chosen experiment, both shear and hardening moduli stay unidentified as they are not observable. Hence, their analysis is not considered.
In the previously described experiment the relationships between the measurement data and their approximations are too complex in order to be properly modelled. Therefore, the experiment is repeated in same setting, only this time the measurement is not functionally approximated. Instead, the inverse problem is solved for each individual sample of measurement (each RVE), and then the updated parameters are collected into the set of parameter samples as described in Eqs. (44) and (46). This calculation is expected to be simpler than the previous one as the relationships between the material parameters are easier to model. In Fig. 14 is depicted the difference between this approach (est1) and the previous one (est2), as well as the joint distribution between the bulk modulus κ and σ f . Hence, by upscaling we obtain a simpler representation of our meso-scale data, however, at the expense of a correlation of the material parameters.

Upscaling of a heterogeneous medium
As before, the block is deformed by displacement controlled uniform bi-axial compression.
As far as the material description is concerned, the material properties on the heterogeneous meso-scale are a priori assumed to be realisations of log-normal random fields with the statistics depicted in Table 2, and Gaussian covariance functions. These are simulated using different values of the correlation length c ∈ {5l e , 10l e , 25l e } (l e is the element length on the meso-scale) and coefficients of variation c var ∈ {5%, 10%}. In Fig. 15 is shown an example of the meso-scale random field realisations given different correlation lengths. The realisation is becoming smoother when the correlation length increases. This means that the material becomes more homogeneous in the limit c = ∞. On the other hand, the macro-scale material properties are taken a priori as a log- normal random variables, with the same mean and standard deviation as their meso-scale counterpart.
The measurement data are made of three type of averaged data: E e , E d , and E h . Their logarithms are simulated using mixture models and vine copulas, and further identified using a variational Bayesian rule. Similarly to the experiment in the verification section, in the first two simulation steps one may observe only the elastic energy, as the dissipation effects do not appear yet. Therefore, we start the simulation with the last step, and Fig. 19 The presence of damage (elements marked in black) on the meso-scale for one random field realisation with c var = 0.1 and c = 10 e w.r.t. time step approximate the corresponding measurements by mixture models and vine copulas, see Fig. 16. The complete simulation steps from the previous section are repeated. However, it is interesting to note the change of the scatter plots with the correlation length. The scatter plots of E e in two consecutive linear steps are wider with the reduction of the correlation length. The opposite holds true when it comes to relationship between E e and E d in the last simulation step.
The distribution of damage is graphically illustrated in Figs. 18 and 19. The elements without any color are undamaged, whereas the ones marked in black are damaged. As shown in Fig. 17 the variation of measurements increases with the correlation length size for the case when the c var of the meso-scale random field is taken to be 10%. The reason for this is that measurement realisations are less fluctuating with increasing correlation length, but their average value is more pronounced as prospective fluctuations do not cancel out, as similar can be concluded when observing Fig. 15. The previous conclusion holds for all measurements, and can be explained by Fig. 18 in which the presence of damage for one random field realisation and different correlation lengths is shown. With increase of the correlation length the damage is more pronounced, and hence one expects higher variations. In addition, one may also conclude that the corresponding PDFs are becoming more skewed when the material model approaches the homogeneous case, see Fig. 17. The skewness in terms of long tails is not completely caused by variations of the random meso-scale, but also by the inaccuracy of the variational method used for the PCE estimation of the measurement due to possible overestimation.
On the right side of Fig. 17, one may observe the energy evolution w.r.t. to time. Here, c var of the corresponding meso-scale random field is chosen to be 10% and the correlation length is c = 10 e . The top figure depicts the elastic energy. As expected, the energy variation grows in time. On the contrary, E d seen in the middle does not alter much the PDF form. The damage initialises in the third step, and mostly shifts towards higher average value due to increased presence of damage as shown in Fig. 19. Finally, E h increases, but also changes the PDF form significantly in time.
The upscaled parameter estimates behave similarly to the measurement estimates as shown in Fig. 20. The hardening parameter does not get updated, and stays constant over time.

Conclusion
The stochastic multi-scale analysis as previously presented is one particular kind of inverse problem in which the macro-scale parameters are to be estimated given the meso-scale information. In this paper we employed an extended Hill-Mandel principle in order to estimate the macro-scale parameters given the meso-scale energy observations. Such an approach allows the fitting of appropriate constitutive laws on the macro-scale counterpart, the ones that are optimally matching the energy information. Furthermore, we show that in a case when the meso-scale energy information is of deterministic kind, i.e. describes one particular RVE, the process of estimation can be easily done by employing a nonlinear conditional expectation filter. The filter represents the map between the observation and the quantity of interest, i.e. the macro-scale model parameters, or the model itself. In addition, we have shown that this kind of mapping can be also used in a more general situation, in which one wants to upscale an ensemble of meso-structures, and the meso-scale information is described by aleatoric uncertainty. The only requirement to achieve this is to fully specify the random variables representing the data, i.e. to describe its probability distribution. For this purpose we employ a Bayesian variational inference in combination with copula theory. Computationally, the measurement probability distribution is then represented by a functional approximation in terms of the polynomial chaos expansion obtained by mapping the measurement data to the Gaussian space, applying an inverse transform, and using an additional sparse Bayes variational regression for the purpose of estimation of the expansion coefficients. As the inverse map from the energy space to the Gaussian one is not easy to approximate, we recommend to first discretise the energy space (i.e. to sample), and then to map each sample to the macro-scale model parameters. As shown on both the linear elastic and the elasto-damage examples, the latter ones can be more accurately approximated. Note that in this paper we have only observed the elasto-damage models on two scales under one specific loading condition. This was possible due to the simple nature of the damage model. However, in practice this will not suffice to achieve a good macro-scale representation. Therefore, the next step to be considered is to add different loading conditions into estimation.