Parametric models analysed with linear maps

Parametric entities appear in many contexts, be it in optimisation, control, modelling of random quantities, or uncertainty quantification. These are all fields where reduced order models (ROMs) have a place to alleviate the computational burden. Assuming that the parametric entity takes values in a linear space, we show how it is associated to a linear map or operator. This provides a general point of view on how to consider and analyse different representations of such entities. Analysis of the associated linear map in turn connects such representations with reproducing kernel Hilbert spaces and affine-/linear-representations in terms of tensor products. A generalised correlation operator is defined through the associated linear map, and its spectral analysis helps to shed light on the approximation properties of ROMs. This point of view thus unifies many such representations under a functional analytic roof, leading to a deeper understanding and making them available for appropriate analysis.


Introduction
Many mathematical and computational models depend on parameters. These may be quantities which have to be optimised during a design, or controlled in a real-time setting, or these parameters may be uncertain and represent uncertainty present in the model. Such parameter dependent models are usually specified in such a way that an input to the model, e.g. a process or a field, depends on these parameters. In an analogous fashion, the output or the "state" of the model will depend on those parameters. Any of these entities may be called a parametric model. To make things a bit more specific, consider the parametric entities in the following equation:
$$A(\zeta(\mu); u) = f(\mu). \qquad (1)$$
Here $A(\zeta(\mu); \cdot) : \mathcal{V} \to \mathcal{V}$ is a possibly nonlinear operator from the Hilbert space $\mathcal{V}$ into itself, dependent on $\zeta(\mu) \in \mathcal{Z}$, a vector in another Hilbert space $\mathcal{Z}$ used to specify the system; $u \in \mathcal{V}$ is the state of the system described by $A$, whereas $f(\mu) \in \mathcal{V}$ is the excitation resp. action on the system. The parameters $\mu \in \mathcal{M}$ are elements of some admissible parameter set $\mathcal{M}$. Here $f(\mu)$ and $\zeta(\mu)$ are two examples of such parametric entities; and as the whole equation depends on $\mu$, we assume that for each $\mu \in \mathcal{M}$ the system Eq. (1) is well-posed and allows for the state $u(\mu)$ to also be a unique function of the parameters, another example of a parametric entity.
When one has to do computations with a system such as Eq. (1), one needs computational representations of the parametric entities such as the "inputs" $f(\mu)$ and $\zeta(\mu)$, and also of the state $u(\mu)$ to be determined, the "output". Let us denote any such generic entity by $r(\mu)$; then one seeks a computational expression to compute $r(\mu)$ for any given parameter $\mu \in \mathcal{M}$. The first question which has to be addressed is how to choose "good co-ordinates" on the parameter set $\mathcal{M}$. By this we mean scalar functions $\xi_m : \mathcal{M} \to \mathbb{R}$, such that the collection of all $\{\xi_m(\mu)\}_{m=1,\dots,M}$ on the one hand specifies the particular $\mu \in \mathcal{M}$ as regards the system Eq. (1), and on the other hand is a computational handle for the parametric entities $r(\mu)$, which now can be expressed as $r(\xi_1, \dots, \xi_M)$. Often the parameter set is already given as $\mathcal{M} \subseteq \mathbb{R}^d$, so that $\mu = [\eta_1, \dots, \eta_d] \in \mathbb{R}^d$, and the co-ordinate functions $\eta_k$ may directly serve as co-ordinates. But often, and especially when $d \in \mathbb{N}$ is large, it may be advisable to choose other co-ordinates $\xi_m$, which should be free of possible constraints and as "independent" as possible. This is usually part of finding a good computational representation for $r(\mu)$, and will be addressed as part of our analysis. One may term this a re-parametrisation of the problem.
The second question to be addressed is the actual number of degrees-of-freedom needed to describe the behaviour of the system Eq. (1) through some finite-dimensional approximation or discretisation. Often the initial process of discretisation produces a first approximation with a large number of degrees-of-freedom; this initial computational model is often referred to as a full-scale or high-fidelity model. For many computational purposes it is necessary to reduce the number of degrees-of-freedom in the computational model in order to carry out the computations involved in an optimisation or uncertainty quantification in an acceptable amount of time; such computational models are then termed reduced order models (ROMs). If the high-fidelity model is a parametric model, the same is required from the ROM $r_a(\mu) \approx r(\mu)$.
The question of how to produce ROMs for specific kinds of systems like Eq. (1) is an important one, and is the subject of extensive current research. For the general subject of model order reduction there is an excellent collection of recent work in [40] and a survey in [11], as well as an introductory text in [39]; see also [20,34] for important contributions.
Besides these general considerations, parametrised ROMs are of particular interest in the present case. The general survey [5] covers the literature up to 2015 very well, as does the later one [10], which is concerned mainly with uncertainty quantification. Excellent collections on the topic of parametrised ROMs are contained in [6] and [4]. A recent systematic monograph is [27], and important recent contributions are e.g. [9,45,46]. Machine learning and so-called data-driven procedures have also been used in this context, see the recent contributions in [28,42,41,43,44], but this line of work is still at a very early stage.
Here a particular point of view will be taken for the analysis, one not to be found in the recent literature just surveyed: namely the identification of a parametric entity with a linear mapping defined on the dual space, which is introduced in Section 2. This idea has been around for a long time, and has surfaced mostly when the "strong" notion of a concept has to be replaced by a "weaker" one. In this sense one may see the present point of view as a generalisation of the view of distributions or generalised functions as linear mappings [21,23]. They were used to define weak notions of random quantities [24], and some of the present ideas are also contained in [33]. In some sense these ideas are already contained in [30] (see also the English translation [31]), and may most probably be found even earlier. The reason for approaching the subject in this way is that for linear operators there is a host of methods which can be used for their analysis, and it puts all such parametric entities under one "roof".
Here we want to explain the basic abstract framework and how it applies to ROMs. The present work is a continuation of [36] and [38,37,35]. The general theory was shown in [38], and here the purpose is primarily to give an introduction to this kind of analysis, which draws strongly on the spectral analysis of self-adjoint operators (e.g. [22,24,16]), and an overview of how to use it in the analysis of ROMs. This is the topic of Section 3. Coupled systems and their ROMs are the focus of [37], and [35] is a short note on how this is used for random fields and processes. In Section 4 some examples of such refinements of the basic concept are given.
As will be seen, it is very natural to deal with tensor products in this topic of parametrised entities.In the form of the proper generalised decomposition (PGD) this idea has been explained and used in [13,2,17,12,11].The topic of tensor approximations [26] turns out to be particularly relevant here, and recently new connections between such approximations and machine learning with deep networks have been exposed [14,32].In Section 5 we conclude with a recapitulation of the main ideas.

Parametric models and linear maps
This is a gentle introduction to and short recap of the developments in [38,37,35], where the interested reader may find more detail. To start with a simple motivating example, one could think of a scalar function $r(x, \mu)$, defined on some set $\mathcal{X}$, which depends on some parameters in a set $\mathcal{M}$; in other words, a parametric function. In what follows, this function will be viewed as a mapping
$$r : \mathcal{M} \ni \mu \mapsto r(\cdot, \mu) \in \mathcal{U}, \qquad (2)$$
so that for each value of $\mu \in \mathcal{M}$ the function $w := r(\cdot, \mu) \in \mathcal{U}$ is a scalar function $w(x)$ defined on the set $\mathcal{X}$, i.e. an element of some linear space $\mathcal{U}$ of functions on $\mathcal{X}$.
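To simplify further and make everything finite-dimensional, assume that we are only interested in four positions in $\mathcal{X}$, namely $x_1, x_2, x_3, x_4 \in \mathcal{X}$, or, alternatively and even simpler, that $\mathcal{X} = \{x_1, x_2, x_3, x_4\}$ has only four elements, and finally, for the sake of simplicity, that the parameter set has only three elements $\mathcal{M} = \{\mu_1, \mu_2, \mu_3\}$. Then one can arrange all the possible values of $r(x, \mu)$, with the abbreviation $r_{i,j} = r(x_j, \mu_i)$ ($i = 1, \dots, 3$, $j = 1, \dots, 4$), in the following matrix:
$$R = \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & r_{1,4} \\ r_{2,1} & r_{2,2} & r_{2,3} & r_{2,4} \\ r_{3,1} & r_{3,2} & r_{3,3} & r_{3,4} \end{bmatrix}.$$
It is obvious that knowing the function $r(x, \mu)$ is equivalent to knowing the matrix $R$. As a matrix, $R$ corresponds to a linear mapping from $\mathcal{U} = \mathbb{R}^4$ to $\mathbb{R}^3$, and one has for any $u = [u_1, \dots, u_4]^T \in \mathcal{U}$
$$Ru = \Big[\sum_{j=1}^4 r_{1,j} u_j,\ \sum_{j=1}^4 r_{2,j} u_j,\ \sum_{j=1}^4 r_{3,j} u_j\Big]^T = [\phi(\mu_1), \phi(\mu_2), \phi(\mu_3)]^T,$$
where $\phi(\mu_i) = \sum_{j=1}^4 r_j(\mu_i) u_j$ with $r_j(\mu_i) := r_{i,j}$, a weighted sum (e.g. a weighted average) of the values of $r(\cdot, \mu_i)$, is a scalar function $\phi \in \mathcal{F} = \mathbb{R}^{\mathcal{M}}$ in the linear space $\mathcal{F}$ of scalar functions on the parameter set $\mathcal{M}$. If we denote the function of Eq. (2) in this case by $r(\cdot)$, which for every $\mu \in \mathcal{M}$ is an element $r(\mu) = [r_1(\mu), \dots, r_4(\mu)]^T \in \mathcal{U}$, then $\phi(\mu) = \langle r(\mu), u\rangle_{\mathcal{U}}$, i.e. $Ru = \langle r(\cdot), u\rangle_{\mathcal{U}}$. Obviously, knowing $R$ is the same as knowing $Ru$ for every $u \in \mathcal{U}$ (actually a basis in $\mathcal{U}$ would suffice), which in turn is the same as knowing $\langle r(\cdot), u\rangle_{\mathcal{U}}$ for every $u \in \mathcal{U}$.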
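The following sketch reproduces this toy example numerically. It assumes NumPy and uses made-up values for the $r_{i,j}$; the rows of `R` are the vectors $r(\mu_i)$, the helper `phi_of` (a name chosen here for illustration) evaluates $\phi(\mu_i) = \langle r(\mu_i), u\rangle_{\mathcal U}$, and applying it to the canonical basis of $\mathcal U$ recovers `R`, illustrating that a basis suffices.

```python
import numpy as np

# Hypothetical values r_{i,j} = r(x_j, mu_i): rows = parameters mu_1..mu_3, columns = points x_1..x_4
R = np.array([[1.0, 2.0, 0.5, 0.0],
              [0.0, 1.0, 1.0, 3.0],
              [2.0, 0.0, 1.0, 1.0]])

def r(i):
    """The vector r(mu_i) in U = R^4, i.e. the i-th row of R."""
    return R[i]

def phi_of(u):
    """The scalar function phi(mu_i) = <r(mu_i), u>_U on M, stored as a vector of length 3."""
    return np.array([r(i) @ u for i in range(R.shape[0])])

u = np.full(4, 0.25)   # equal weights: phi(mu_i) is the mean of r(., mu_i) over x_1..x_4
phi = phi_of(u)        # same as R @ u; here the row means 0.875, 1.25, 1.0

# Knowing R u for a basis of U is enough to recover R (column j is R applied to e_j)
R_recovered = np.column_stack([phi_of(e) for e in np.eye(4)])
```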
The point to take away from this simple example is that the parametric function r(x, µ), where for each parameter value µ ∈ M one has r(•, µ) ∈ U in some linear space U -of functions on X in this case -is equivalent to a linear map R : U → F into a space F ⊆ R M of scalar functions on the parameter set M.
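It is now easy to see how to generalise this further to cases where the set $\mathcal{X}$ or $\mathcal{M}$ or both have infinitely many elements, and even further to a case where the vector space of functions $\mathcal{U}$ just has an inner product, say given by some integral, so that for $u, w \in \mathcal{U}$
$$\langle u, w\rangle_{\mathcal{U}} = \int_{\mathcal{X}} u(x)\, w(x)\, m(\mathrm{d}x)$$
with some measure $m$ on $\mathcal{X}$. Then for each parameter $\mu \in \mathcal{M}$ one has $r(\cdot, \mu) \in \mathcal{U}$, a function on $\mathcal{X}$, or in other words an element of the linear space $\mathcal{U}$. In this case one defines the linear map
$$R : \mathcal{U} \ni u \mapsto \phi \in \mathcal{F}, \qquad \phi(\mu) := \langle r(\cdot, \mu), u\rangle_{\mathcal{U}} = \int_{\mathcal{X}} r(x, \mu)\, u(x)\, m(\mathrm{d}x),$$
which is a linear map from $\mathcal{U}$ onto a linear space of scalar functions $\mathcal{F} \subseteq \mathbb{R}^{\mathcal{M}}$ on the parameter set $\mathcal{M}$.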
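As a hedged illustration of the integral version, the sketch below assumes $\mathcal X = [0,1]$ with Lebesgue measure $m$, the hypothetical parametric function $r(x,\mu) = e^{\mu x}$, and replaces the integral by a 20-point Gauss-Legendre rule (`numpy.polynomial.legendre.leggauss`). For $u \equiv 1$ the function $\phi = Ru$ on $\mathcal M$ is known in closed form, $\phi(\mu) = (e^{\mu} - 1)/\mu$, which serves as a check; linearity of $R$ is checked as well.

```python
import numpy as np

# 20-point Gauss-Legendre rule mapped to X = [0, 1] (assumed: m is the Lebesgue measure)
t, wq = np.polynomial.legendre.leggauss(20)
xq, wq = 0.5 * (t + 1.0), 0.5 * wq

def r(x, mu):
    """Hypothetical parametric function r(x, mu) = exp(mu * x)."""
    return np.exp(mu * x)

def R(u, mus):
    """Associated map: (R u)(mu) = int_X r(x, mu) u(x) m(dx), evaluated at the points mus."""
    return np.array([np.sum(wq * r(xq, mu) * u(xq)) for mu in mus])

def one(x):
    """The constant function u = 1 on X."""
    return np.ones_like(x)

mus = np.array([0.5, 1.0, 2.0])
phi = R(one, mus)                        # phi = R u for u = 1
exact = (np.exp(mus) - 1.0) / mus        # closed form of int_0^1 exp(mu x) dx

# Linearity of R: R(u + 2 v) = R u + 2 R v, here with u = 1 and v(x) = x
lin_lhs = R(lambda x: 1.0 + 2.0 * x, mus)
lin_rhs = phi + 2.0 * R(lambda x: x, mus)
```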
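This then is almost the general situation, where one views $r : \mathcal{M} \to \mathcal{V}$ as a map from the parameters $\mu \in \mathcal{M}$, where $\mathcal{M}$ may be some arbitrary set, into a topological vector space $\mathcal{V}$. One then defines a linear map $R : \mathcal{V}^* \to \mathcal{F} \subseteq \mathbb{R}^{\mathcal{M}}$ from the dual space $\mathcal{V}^*$ onto a space of scalar functions $\mathcal{F}$ on $\mathcal{M}$ by
$$R : \mathcal{V}^* \ni v \mapsto \langle r(\cdot), v\rangle_{\mathcal{V}} \in \mathcal{F}, \qquad \text{i.e.}\quad (Rv)(\mu) = \langle r(\mu), v\rangle_{\mathcal{V}},$$
where $\langle \cdot, \cdot\rangle_{\mathcal{V}}$ is the duality pairing between $\mathcal{V}$ and its dual space $\mathcal{V}^*$. For the following exposition of the main ideas we shall take a slightly less general situation by assuming, for the sake of simplicity, that $\mathcal{V}$ is in fact a separable Hilbert space with inner product $\langle \cdot, \cdot\rangle_{\mathcal{V}}$, and use this in the usual manner to identify it with its dual.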
Associated linear map: So with a vector-valued map $r : \mathcal{M} \to \mathcal{V}$, one defines the corresponding associated linear map
$$R : \mathcal{V} \ni v \mapsto \langle r(\cdot), v\rangle_{\mathcal{V}} \in \mathbb{R}^{\mathcal{M}}. \qquad (5)$$
Obviously only the Hilbert subspace $\mathcal{U} = \operatorname{cl}(\operatorname{span} r(\mathcal{M})) \subseteq \mathcal{V}$ actually reached by the map $r$ is interesting, whereas its orthogonal complement $\mathcal{U}^\perp = \ker R \subseteq \mathcal{V}$ is not. Hence from now on we shall only look at $r : \mathcal{M} \to \mathcal{U}$, and additionally assume that $\mathcal{U} = \operatorname{cl}(\operatorname{span} r(\mathcal{M}))$, or in other words, that the vectors $\{r(\mu) \mid \mu \in \mathcal{M}\} = r(\mathcal{M})$ form a total set in $\mathcal{U}$. The map $R$ is thus formally redefined as
$$R : \mathcal{U} \ni u \mapsto \langle r(\cdot), u\rangle_{\mathcal{U}} \in \mathbb{R}^{\mathcal{M}}. \qquad (6)$$
Again, in the linear space $\mathbb{R}^{\mathcal{M}}$ of all scalar functions on $\mathcal{M}$, only the part $\mathcal{F} = \operatorname{im} R = R(\mathcal{U})$ is interesting.
Allow us a brief digression here, to point out similarities and analogies to other connected concepts. First, on the parameter set $\mathcal{M}$, where up to now no additional mathematical structure was used, we now have the linear space $\mathcal{F}$. This can be viewed as a first step to introduce some kind of "co-ordinates" on the set $\mathcal{M}$, and is in line with many other constructs where potentially complicated sets are characterised by algebraic constructs, such as groups or vector spaces, e.g. in homology or cohomology. Even if from the outset the parameter set $\mathcal{M} \subseteq \mathbb{R}^m$ is given as some subset of some $\mathbb{R}^m$ and therefore already has co-ordinates, these may not be good ones, and, as we shall see, it may be worthwhile to contemplate re-parametrisations, i.e. choosing some $\phi_k \in \mathcal{F}$ as "co-ordinates". These real valued functions are in general of course not "real co-ordinates", as they only distinguish those parameter values which the parametric object $r$ can tell apart.

Reproducing kernel Hilbert space:
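The second concept to touch on comes from the idea to use the function space $\mathcal{F}$ in place of $\operatorname{span} r(\mathcal{M})$. As is easy to see, the map in Eq. (6) is injective (if $\langle r(\mu), u\rangle_{\mathcal{U}} = 0$ for all $\mu \in \mathcal{M}$, then $u$ is orthogonal to the total set $r(\mathcal{M})$, hence $u = 0$), and thus invertible on its image $\mathcal{F} = \operatorname{im} R = R(\mathcal{U})$. This may be used to define an inner product on $\mathcal{F}$ as
$$\langle \phi, \psi\rangle_{\mathcal{R}} := \langle R^{-1}\phi, R^{-1}\psi\rangle_{\mathcal{U}}, \qquad \phi, \psi \in \mathcal{F}, \qquad (7)$$
and to denote the completion of $\mathcal{F}$ with this inner product by $\mathcal{R} = \operatorname{cl} \mathcal{F} \subseteq \mathbb{R}^{\mathcal{M}}$. One immediately obtains that $R^{-1}$ is a bijective isometry between $R(\operatorname{span} r(\mathcal{M}))$ and $\operatorname{span} r(\mathcal{M})$, hence extends to a unitary map $\bar{R}^{-1}$ between $\mathcal{R}$ and $\mathcal{U}$; the same holds for $R$, its unitary extension being denoted by $\bar{R}$.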
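Given the maps $r : \mathcal{M} \to \mathcal{U}$ and $\bar{R} : \mathcal{U} \to \mathcal{R}$, one may define the reproducing kernel [7,29] given by $\kappa(\mu_1, \mu_2) := \langle r(\mu_1), r(\mu_2)\rangle_{\mathcal{U}}$. It is straightforward to verify that $\kappa(\mu, \cdot) \in \mathcal{F} \subseteq \mathcal{R}$ and $\operatorname{span}\{\kappa(\mu, \cdot) \mid \mu \in \mathcal{M}\} = \mathcal{F}$, as well as the reproducing property $\phi(\mu) = \langle \kappa(\mu, \cdot), \phi\rangle_{\mathcal{R}}$ for all $\phi \in \mathcal{F}$. Another way of stating this reproducing property is to say that the evaluation functional $\delta_\mu : \mathcal{R} \ni \phi \mapsto \phi(\mu) \in \mathbb{R}$ is continuous and is represented by $\kappa(\mu, \cdot)$. An abstract way of putting this, using the adjoint $\bar{R}^* = \bar{R}^{-1}$ of the unitary map $\bar{R}$, is to note that $\kappa(\mu, \cdot) = \bar{R}\, r(\mu)$, so that $\phi(\mu) = \langle \bar{R}\, r(\mu), \phi\rangle_{\mathcal{R}} = \langle r(\mu), \bar{R}^*\phi\rangle_{\mathcal{U}}$. With the reproducing kernel Hilbert space (RKHS) $\mathcal{R}$ one can build a first representation and thus obtain a relevant "co-ordinate system" for $\mathcal{M}$. As $\mathcal{U}$ is separable, it has a Hilbert basis or complete orthonormal system (CONS) $\{y_k\}_{k\in\mathbb{N}}$. As $\bar{R}$ is unitary, the set $\{\varphi_k := \bar{R} y_k\}_{k\in\mathbb{N}}$ is a CONS in $\mathcal{R}$. With this, and writing $(a \otimes b)c := \langle b, c\rangle a$ for elementary tensors, the unitary operator $\bar{R}$, its adjoint or inverse $\bar{R}^* = \bar{R}^{-1}$, and the parametric element $r(\mu)$ become [38]
$$\bar{R} = \sum_k \varphi_k \otimes y_k, \qquad \bar{R}^* = \sum_k y_k \otimes \varphi_k, \qquad (8)$$
$$r(\mu) = \sum_k \varphi_k(\mu)\, y_k. \qquad (9)$$
Observe that the relations Eq. (8) and Eq. (9) exhibit the tensorial nature of the representation mapping. One sees that model reductions may be achieved by choosing only subspaces of $\mathcal{R}$, i.e. spanned by a (typically finite) subset of the CONS $\{\varphi_k\}_k$. Furthermore, the representation of $r(\mu)$ in Eq. (9) is linear in the new "parameters" $\varphi_k$.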
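A finite-dimensional sketch of the RKHS construction, assuming NumPy and a random, hypothetical set of five vectors $r(\mu_i) \in \mathbb R^3$ (generic, so they span $\mathcal U$ and $R$ is injective). On the image of $R$ the Moore-Penrose pseudo-inverse acts as $R^{-1}$, which gives the RKHS inner product; the code checks the reproducing property, the orthonormality of $\varphi_k = R y_k$ for a random CONS $\{y_k\}$, and the representation Eq. (9).

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 5, 3                        # |M| = 5 parameter values, U = R^3 (illustrative sizes)
R = rng.standard_normal((N, n))    # row i: hypothetical r(mu_i); generic rows span U, R injective
R_inv = np.linalg.pinv(R)          # acts as R^{-1} on the image F = im R

def inner_R(f, g):
    """RKHS inner product <f, g>_R = <R^{-1} f, R^{-1} g>_U for f, g in F."""
    return (R_inv @ f) @ (R_inv @ g)

kappa = R @ R.T                    # kernel kappa(mu_i, mu_l) = <r(mu_i), r(mu_l)>_U

u = rng.standard_normal(n)
phi = R @ u                        # a function in F
# Reproducing property: phi(mu_i) = <kappa(mu_i, .), phi>_R
reproduced = np.array([inner_R(kappa[i], phi) for i in range(N)])

Y, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns: a CONS {y_k} of U
Varphi = R @ Y                                      # columns: varphi_k = R y_k in F
gram = np.array([[inner_R(Varphi[:, k], Varphi[:, l]) for l in range(n)]
                 for k in range(n)])                # RKHS Gram matrix of the varphi_k

R_rebuilt = Varphi @ Y.T           # row i: sum_k varphi_k(mu_i) y_k, cf. Eq. (9)
```

Keeping only some of the columns of `Varphi` and `Y` in the last line gives the truncated sum mentioned above, i.e. a simple ROM.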

Coherent states:
The third concept one should mention in this context is that of coherent states, see e.g. [1,3]. In this development from quantum theory, these quantum states were initially introduced as eigenstates of certain operators, and the name refers originally to their high coherence, minimum uncertainty, and quasi-classical behaviour.
What is important here is that the idea has been abstracted, and represents overcomplete sets of vectors or frames $\{r(\mu) \mid \mu \in \mathcal{M}\}$ in a Hilbert space $\mathcal{U}$, which depend on a parameter $\mu \in \mathcal{M}$ from a locally compact measure space. This space often has more structure, e.g. that of a Lie group, and the coherent states are connected with group representations in the unitary group of $\mathcal{U}$, i.e. if $\mu \mapsto U(\mu) \in \mathscr{L}(\mathcal{U})$ is a unitary representation, the coherent states may be defined by $r(\mu) = U(\mu) w$ for some $w \in \mathcal{U}$. There are usually further requirements, like weak continuity of the map $\mathcal{M} \ni \mu \mapsto r(\mu) \in \mathcal{U}$, and that these coherent states form a resolution of the identity, in that one has (weakly)
$$\int_{\mathcal{M}} r(\mu) \otimes r(\mu)\, \varpi(\mathrm{d}\mu) = I_{\mathcal{U}},$$
where $\varpi$ is a measure on $\mathcal{M}$, naturally defined on some σ-algebra of subsets of $\mathcal{M}$, a detail which needs no further mention here. We shall leave this topic here and come back to similar representations later, but note in passing the tensor product structure under the integral. The above requirement of the resolution of the identity may sometimes be too strong, and one often falls back to the case of an RKHS discussed above.
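A small example of coherent states in a finite-dimensional setting (our own choice, not taken from the cited literature): take $\mathcal U = \mathbb R^2$, the rotation representation of the cyclic group of order three, and the fiducial vector $w = [1,0]^T$. The three states $r(\mu) = U(\mu)w$ form an overcomplete set (sometimes called the Mercedes-Benz frame), and with mass $2/3$ at each group element the sum replacing the integral is exactly the identity. The sketch assumes NumPy.

```python
import numpy as np

def rotation(theta):
    """Unitary (orthogonal) representation U(mu) of planar rotations on U = R^2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

w = np.array([1.0, 0.0])                          # fiducial vector
M = [0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0]   # cyclic group of order three
states = [rotation(mu) @ w for mu in M]           # coherent states r(mu) = U(mu) w
mass = 2.0 / 3.0                                  # measure varpi: mass 2/3 at each group element

# Discrete version of the resolution of the identity: sum_mu varpi({mu}) r(mu) (x) r(mu)
S = sum(mass * np.outer(r, r) for r in states)
```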
Reduced models: Assume now that $\mathcal{M} \ni \mu \mapsto r_a(\mu) \in \mathcal{U}$ is an approximate or reduced order model (ROM) of $r(\mu)$. One possibility of producing such a ROM was already mentioned above: letting the sum in Eq. (9) run over fewer terms. The ROM $r_a(\mu)$ thus has an associated linear map $\bar{R}_a$. As the associated linear maps carry all the relevant information, both the analysis of the original parametric object $r(\mu)$ and the comparison and analysis of the accuracy of the approximation $r_a(\mu)$ can be carried out in terms of the associated linear maps $\bar{R}$ and $\bar{R}_a$. In the present setting $\bar{R}$ is unitary, so $\bar{R}_a$ can be judged by how well it approximates that unitary mapping. In Section 3, where a second inner product will be introduced on the space $\mathbb{R}^{\mathcal{M}}$ of scalar functions on $\mathcal{M}$, this will be even more pronounced, as it will offer the possibility of deciding which CONS or other complete sets in subspaces of $\mathbb{R}^{\mathcal{M}}$ are advantageous for ROMs.

Correlation and Representation
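What was shown in Section 2 with regard to the RKHS is that the structure of the Hilbert space $\mathcal{U}$ was reproduced on the subspace $\mathcal{R} \subseteq \mathbb{R}^{\mathcal{M}}$ of the full function space. In the remarks about coherent states one could already see an additional structure, namely a measure $\varpi$ on $\mathcal{M}$. This measure structure can be used to define the subspace $\mathcal{A} := L_0(\mathcal{M}, \varpi)$ of measurable functions, as well as its Hilbert subspace of square-integrable functions
$$\mathcal{Q} := L_2(\mathcal{M}, \varpi), \qquad \langle \phi, \psi\rangle_{\mathcal{Q}} := \int_{\mathcal{M}} \phi(\mu)\, \psi(\mu)\, \varpi(\mathrm{d}\mu).$$
We shall simply assume here that there is a Hilbert space $\mathcal{Q} \subseteq \mathbb{R}^{\mathcal{M}}$ of functions with inner product $\langle\cdot,\cdot\rangle_{\mathcal{Q}}$, which may or may not come from an underlying measure space. The associated linear map $R : \mathcal{U} \to \mathcal{R}$, essentially defined in Eq. (5) with range the RKHS $\mathcal{R}$, will now be seen as a map $R : \mathcal{U} \to \mathcal{Q}$ into the Hilbert space $\mathcal{Q}$, i.e. with a different range carrying a different inner product $\langle\cdot,\cdot\rangle_{\mathcal{Q}}$ from the RKHS inner product $\langle\cdot,\cdot\rangle_{\mathcal{R}}$ on $\mathcal{R}$. One may view this inner product as a way to tell what is important in the parameter set $\mathcal{M}$: functions $\phi$ with large $\mathcal{Q}$-norm are considered more important than those where this norm is small. The map $R : \mathcal{U} \to \mathcal{Q}$ is thus generally no longer unitary, but for the sake of simplicity we shall assume that it is a densely defined closed operator, see e.g. [16]. As it may be only densely defined, it is sometimes a good idea to define $R$ through a densely defined bilinear form in $\mathcal{U} \otimes \mathcal{Q}$:
$$\langle Ru, \phi\rangle_{\mathcal{Q}} := \big\langle \langle r(\cdot), u\rangle_{\mathcal{U}}, \phi\big\rangle_{\mathcal{Q}}, \qquad u \in \mathcal{U},\ \phi \in \mathcal{Q}. \qquad (10)$$
Following [33,38,37,35], one now obtains a densely defined map $C$ in $\mathcal{U}$ through the densely defined bilinear form, in line with Eq. (10):
$$\langle Cu, v\rangle_{\mathcal{U}} := \langle Ru, Rv\rangle_{\mathcal{Q}}, \qquad u, v \in \mathcal{U}. \qquad (11)$$
The map $C = R^* R$ (observe that now the adjoint is w.r.t. the $\mathcal{Q}$-inner product) may be called the "correlation" operator; it is by construction self-adjoint and positive, and if $R$ is bounded resp. continuous, so is $C$.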
In the above case that the $\mathcal{Q}$-inner product comes from a measure, one has from Eq. (11)
$$C = \int_{\mathcal{M}} r(\mu) \otimes r(\mu)\, \varpi(\mathrm{d}\mu).$$
This is reminiscent of what was required for coherent states. But it also shows that if $\varpi$ were a probability measure, i.e. $\varpi(\mathcal{M}) = 1$, with the usual expectation operator
$$\mathbb{E}(\phi) := \int_{\mathcal{M}} \phi(\mu)\, \varpi(\mathrm{d}\mu),$$
then the above would really be the familiar correlation operator [33,35] $C = \mathbb{E}(r \otimes r)$ of the $\mathcal{U}$-valued random variable (RV) $r$. Therefore, from now on we shall simply refer to $C$ as the correlation operator, even in the general case not based on a probability measure.
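To see Eq. (11) and the integral representation of $C$ side by side, the sketch below assumes NumPy, a finite parameter set with hypothetical snapshots $r(\mu_j)$ (rows of `R`), and a discrete probability measure with random weights `w`. The adjoint of $R$ with respect to this $\mathcal Q$-inner product is $R^T W$, so $C = R^T W R$, which coincides with the weighted sum of $r(\mu_j) \otimes r(\mu_j)$, i.e. $\mathbb E(r \otimes r)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
R = rng.standard_normal((m, n))   # row j: hypothetical snapshot r(mu_j) in U = R^3
w = rng.random(m)
w /= w.sum()                      # weights varpi({mu_j}), normalised to a probability measure
W = np.diag(w)

def inner_Q(phi, psi):
    """Q-inner product induced by the measure: sum_j phi(mu_j) psi(mu_j) varpi({mu_j})."""
    return phi @ (w * psi)

C = R.T @ W @ R                   # C = R^* R, with R^* = R^T W the Q-adjoint of R
C_sum = sum(w[j] * np.outer(R[j], R[j]) for j in range(m))   # E(r (x) r)

u, v = rng.standard_normal(n), rng.standard_normal(n)
lhs = (C @ u) @ v                 # <C u, v>_U
rhs = inner_Q(R @ u, R @ v)       # <R u, R v>_Q, cf. Eq. (11)
eigs = np.linalg.eigvalsh(C)      # real and non-negative: C is self-adjoint and positive
```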
The fact that the correlation operator is self-adjoint and positive implies that its spectrum σ(C) ⊆ R + is real and non-negative.This will be used when analysing it with any of the versions of the spectral theorem for self-adjoint operators (e.g.[16]).The easiest and best known version of this is for finite dimensional maps.
Finite dimensional beginnings: So let us return to the simple example at the beginning of Section 2, where the associated linear map can be represented by a matrix $R$. If we recall that each row $r^T(\mu_j)$ is the value of the vector $r(\mu)$ for one particular $\mu_j \in \mathcal{M}$, we see that the matrix can be written as
$$R = [r(\mu_1), \dots, r(\mu_m)]^T,$$
and that the rows are just "snapshots" for different values $\mu_j$. What is commonly done now to produce a ROM is the so-called method of proper orthogonal decomposition (POD).
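The matrix $R$ (to generalise a bit, assume it to be of size $m \times n$) can be decomposed according to its singular value decomposition (SVD)
$$R = \Phi \Sigma V^T = \sum_k \varsigma_k\, \phi_k v_k^T = \sum_k \varsigma_k\, \phi_k \otimes v_k, \qquad (12)$$
where the matrices $\Phi = [\phi_k]$ and $V = [v_k]$ are orthogonal, with unit length orthogonal columns, the left and right singular vectors $\phi_k$ resp. $v_k$, and $\Sigma = \operatorname{diag}(\varsigma_k)$ is diagonal with non-negative diagonal elements $\varsigma_k$, the singular values. For clarity, we arrange the singular values in a decreasing sequence, $\varsigma_1 \geq \varsigma_2 \geq \dots \geq \varsigma_{\min(m,n)} \geq 0$. It is well known that this decomposition is connected with the eigenvalue or spectral decomposition of the correlation
$$C = R^T R = V \Sigma^2 V^T = \sum_k \varsigma_k^2\, v_k \otimes v_k, \qquad (13)$$
with eigenvalues $\varsigma_k^2$ and eigenvectors $v_k$, and with its companion
$$C_{\mathcal{Q}} = R R^T = \Phi \Sigma^2 \Phi^T = \sum_k \varsigma_k^2\, \phi_k \otimes \phi_k, \qquad (14)$$
with the same non-zero eigenvalues, but eigenvectors $\phi_k$. The representation is based on $R^T$ and its accompanying POD or Karhunen-Loève decomposition:
$$R^T = V \Sigma \Phi^T = \sum_k \varsigma_k\, v_k \otimes \phi_k, \qquad r(\mu_j) = R^T e_j = \sum_k \varsigma_k\, \phi_k(\mu_j)\, v_k, \qquad (15)$$
where $\phi_k(\mu_j) = \phi_k^j$ is the $j$-th component of $\phi_k$, and $e_j = [0, \dots, 0, 1, 0, \dots]^T$ is the $j$-th canonical unit vector. The second expression in Eq. (15) is a representation for $r(\mu)$, and that is the purpose of the whole exercise. Similar expressions may be used as approximations. It clearly exhibits the tensorial nature of the representation, which is also evident in the expressions Eq. (12), Eq. (13), and Eq. (14). One sees here that $r(\mu_j)$ is just the $j$-th column of $R^T$, obtained with the canonical basis $\{e_j\}$ in $\mathcal{Q} = \mathbb{R}^m$; this can be generalised to
$$r_\psi := R^T \psi = \sum_k \varsigma_k\, \langle \phi_k, \psi\rangle\, v_k \qquad (16)$$
by taking other vectors $\psi$ in $\mathcal{Q} = \mathbb{R}^m$ to give weighted averages or interpolations. The general picture which emerges is that the matrix $R$ is a kind of "square root", or more precisely a factorisation, of the correlation $C = R^T R$, and that the left factor $R^T$ of this factorisation is used for reconstruction resp. representation. In any other factorisation like
$$C = B^T B, \qquad (17)$$
where $B$ maps into some other space $\mathcal{H}$, the map $B$ will necessarily have essentially the same singular values $\varsigma_k$ and right singular vectors $v_k$ as $R$, and can now be used to obtain a representation or reconstruction of $r$ on $\mathcal{H}$ via
$$r \approx B^T h \quad \text{for some } h \in \mathcal{H}. \qquad (18)$$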
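The POD steps above can be checked with NumPy's SVD (`numpy.linalg.svd`, which returns the singular values in decreasing order). The snapshot matrix is random and purely illustrative; the sketch verifies that the $v_k$ and $\phi_k$ are eigenvectors of $C$ and $C_{\mathcal Q}$ with eigenvalues $\varsigma_k^2$, that Eq. (15) reproduces a snapshot $r(\mu_j)$, and that a weight vector $\psi$ as in Eq. (16) averages snapshots.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 5
R = rng.standard_normal((m, n))                      # row j: hypothetical snapshot r(mu_j)^T

Phi, s, Vt = np.linalg.svd(R, full_matrices=False)   # R = Phi diag(s) V^T, Eq. (12)
V = Vt.T

C = R.T @ R                                          # correlation, Eq. (13)
C_Q = R @ R.T                                        # companion, Eq. (14)

j = 3
e_j = np.zeros(m)
e_j[j] = 1.0
r_j = R.T @ e_j                                      # snapshot r(mu_j): j-th column of R^T
# Eq. (15): r(mu_j) = sum_k s_k phi_k(mu_j) v_k
r_j_pod = sum(s[k] * Phi[j, k] * V[:, k] for k in range(len(s)))

psi = np.zeros(m)
psi[[2, 3]] = 0.5                                    # weights for the average of two snapshots
r_psi = R.T @ psi                                    # Eq. (16)
```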
A popular choice is to use the Cholesky factorisation $C = LL^T$ of the correlation (if $C$ is positive definite) into two triangular matrices, and then take $B^T = L$ for the reconstruction.
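A sketch of the Cholesky variant, assuming NumPy and a random snapshot matrix with full column rank so that $C$ is positive definite (for a singular $C$, `numpy.linalg.cholesky` raises an error and one would use another factorisation). It checks that $B = L^T$ has the same singular values as $R$ and computes coordinates $h_j$ with $B^T h_j = r(\mu_j)$ by solving $L h_j = r(\mu_j)$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 4
R = rng.standard_normal((m, n))    # generic data: full column rank, so C is positive definite
C = R.T @ R

L = np.linalg.cholesky(C)          # lower triangular with C = L L^T
B = L.T                            # factorisation C = B^T B, B : U -> H = R^n

s_R = np.linalg.svd(R, compute_uv=False)
s_B = np.linalg.svd(B, compute_uv=False)   # same singular values as R

# Coordinates h_j with B^T h_j = L h_j = r(mu_j), one column per snapshot
H = np.linalg.solve(L, R.T)
```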
As we have introduced the correlation's spectral factorisation in Eq. (13), some other factorisations come to mind, although they may be mostly of theoretical value (with $\Sigma$ here the square diagonal matrix of the $\varsigma_k$):
$$C = (V\Sigma)(V\Sigma)^T = (V\Sigma V^T)(V\Sigma V^T)^T, \qquad (19)$$
where then the reconstruction map is $B^T = V\Sigma$ or $B^T = V\Sigma V^T$. Obviously, in the second case the reconstruction map is symmetric, and is actually the true square root $C^{1/2}$ of the correlation $C$.
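Both factorisations in Eq. (19) can be formed from the eigendecomposition of $C$; the following NumPy sketch, on random illustrative data, builds $B^T = V\Sigma$ and the symmetric $B^T = V\Sigma V^T = C^{1/2}$ and checks that each reproduces $C$.

```python
import numpy as np

rng = np.random.default_rng(4)
R = rng.standard_normal((7, 4))              # illustrative snapshot matrix
C = R.T @ R

lam, V = np.linalg.eigh(C)                   # C = V diag(lam) V^T (eigh returns ascending order)
sigma = np.sqrt(np.clip(lam, 0.0, None))     # singular values; clip guards against round-off

B1_T = V * sigma                             # B^T = V Sigma
B2_T = (V * sigma) @ V.T                     # B^T = V Sigma V^T, the symmetric square root C^{1/2}
```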
Other factorisations can come from looking at the companion $C_{\mathcal{Q}}$ in Eq. (14). Any factorisation $F : \mathcal{Z} \to \mathcal{Q}$, or approximate factorisation $F_a$, of
$$C_{\mathcal{Q}} = F F^T, \quad\text{resp.}\quad C_{\mathcal{Q}} \approx F_a F_a^T, \qquad (20)$$
is naturally a factorisation or approximate factorisation of the correlation
$$C = (V\Phi^T F)(V\Phi^T F)^T, \quad\text{resp.}\quad C \approx (V\Phi^T F_a)(V\Phi^T F_a)^T, \qquad (21)$$
where $V$ and $\Phi$ are the right and left singular vectors (see Eq. (12)) of the associated map $R$, resp. the eigenvectors of the correlation $C$ in Eq. (13) and of its companion $C_{\mathcal{Q}}$ in Eq. (14). A new ROM representation can now be found for $z \in \mathcal{Z}$ via
$$r \approx r_a = V\Phi^T F_a z. \qquad (22)$$
One last observation here is important: the expressions for $r$ resp. one of its ROMs $r_a$ are linear in the newly introduced parameters or "co-ordinates" $\phi_k$ in Eq. (15), resp. $\psi$ in Eq. (16), resp. $h$ in Eq. (18) and Eq. (25), as well as $z$ in Eq. (22); this is an important requirement in many numerical methods.
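The route via the companion can be sketched as follows, assuming NumPy, random illustrative data, and, as one simple choice among many, $F = \Phi\Sigma O$ with an arbitrary orthogonal $O$ (so that $FF^T = C_{\mathcal Q}$). The representing map $V\Phi^T F$ of Eq. (21) then factorises $C$, and coordinates $z_j$ reproducing the snapshots are obtained by a least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 9, 4
R = rng.standard_normal((m, n))                  # illustrative snapshots r(mu_j) as rows
Phi, s, Vt = np.linalg.svd(R, full_matrices=False)
V = Vt.T
C, C_Q = R.T @ R, R @ R.T

# One possible factorisation C_Q = F F^T with F : Z = R^n -> Q = R^m (assumed choice)
O, _ = np.linalg.qr(rng.standard_normal((n, n)))  # arbitrary orthogonal matrix
F = (Phi * s) @ O

BT = V @ Phi.T @ F                               # representing map V Phi^T F : Z -> U
Z, *_ = np.linalg.lstsq(BT, R.T, rcond=None)     # column j: coordinates z_j of r(mu_j)
```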
Reduced order models (ROMs): As has become clear by now, and was mentioned before, approximations or ROMs $r_a(\mu) \approx r(\mu)$ of the full model produce associated maps $R_a$, which are approximate factorisations of the correlation:
$$C \approx R_a^T R_a.$$
This introduces different ways of judging how good an approximation is. If one looks at the difference between the full model $r(\mu)$ and its approximation $r_a(\mu)$ as a residual, and computes weighted versions of it,
$$\langle r(\cdot) - r_a(\cdot), u\rangle_{\mathcal{U}} = (R - R_a)u, \qquad (23)$$
then this is just the difference linear map $R - R_a$ applied to the weighting vector $u$. In Eq. (15) it was shown that $r(\mu_j) = \sum_k \varsigma_k\, \phi_k(\mu_j)\, v_k$. As usual, one may now approximate such an expression by leaving out terms with small or vanishing singular values, say using only $\varsigma_1, \dots, \varsigma_\ell$, getting an approximation of rank $\ell$; this also means that the associated linear map $R_a = \sum_{k=1}^{\ell} \varsigma_k\, \phi_k \otimes v_k$ has rank $\ell$. As is well known [26], this is the best $\ell$-term (i.e. rank-$\ell$) approximation in the norms of $\mathcal{U}$ and $\mathcal{Q}$. But from Eq. (23) one may gather that the error can also be described through the difference $R - R_a$. As error measure one may take the norm of that difference, and, depending on which norm one chooses, the error for this approximation is $\varsigma_{\ell+1}$ in the operator norm, $\sum_{k=\ell+1}^{\min(m,n)} \varsigma_k$ in the trace- resp. nuclear norm, or $\big(\sum_{k=\ell+1}^{\min(m,n)} \varsigma_k^2\big)^{1/2}$ in the Frobenius- resp. Hilbert-Schmidt norm.
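The three error formulas can be confirmed numerically: the sketch below (NumPy, random illustrative data) truncates the SVD after $\ell$ terms and compares the spectral, nuclear, and Frobenius norms of $R - R_a$, available through `numpy.linalg.norm` with `ord=2`, `'nuc'`, and `'fro'`, against the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, ell = 10, 6, 2
R = rng.standard_normal((m, n))                  # illustrative associated map (snapshots as rows)
Phi, s, Vt = np.linalg.svd(R, full_matrices=False)

R_a = (Phi[:, :ell] * s[:ell]) @ Vt[:ell]        # keep only s_1, ..., s_ell: rank-ell ROM
E = R - R_a                                      # error map R - R_a

err_op = np.linalg.norm(E, 2)                    # operator (spectral) norm
err_nuc = np.linalg.norm(E, 'nuc')               # trace / nuclear norm
err_fro = np.linalg.norm(E, 'fro')               # Frobenius / Hilbert-Schmidt norm
```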
On the other hand, different approximations or ROMs can now be obtained by starting with an approximate factorisation
$$C \approx B_a^T B_a \qquad (24)$$
and introducing a ROM via
$$r \approx r_a = B_a^T h. \qquad (25)$$
Such a representing linear map $B$ may, e.g. via its SVD, be written as a sum of tensor products, and approximations $B_a$ are often lower rank expressions, directly reflected in a reduced sum of tensor products. As will become clearer at the end of this section, the bilinear forms Eq. (10) resp. Eq. (11) can sometimes split into multi-linear forms, thus enabling the further approximation of $B_a$ through hierarchical tensor products [26].

Infinite dimensional continuation -discrete spectrum:
For the cases where both $\mathcal{U}$ and $\mathcal{Q}$ are infinite dimensional, the operators $R$ and $C$ live on infinite dimensional spaces, and the spectral theory gets a bit more complicated. We shall distinguish some simple cases. After the finite dimensional resp. finite rank operators just treated in matrix form, the next simplest case is certainly the one where the associated linear map $R$ and the correlation operator $C = R^*R$ have a discrete spectrum, e.g. if $C$ is compact, or a function of a compact operator, like for example its inverse. In this case the spectrum is discrete (e.g. [16]), and in the case of a compact operator the non-negative eigenvalues $\lambda_k$ of $C$ may be arranged as a decreasing sequence $\infty > \lambda_1 \geq \lambda_2 \geq \dots \geq 0$ with the origin as the only possible accumulation point. It is not uncommon when dealing with random fields that $C$ is a nuclear or trace-class operator, i.e. an operator which satisfies the stronger requirement $\sum_k \lambda_k < \infty$. The spectral theorem for an operator with purely discrete spectrum takes the form
$$C = \sum_k \lambda_k\, v_k \otimes v_k, \qquad (26)$$
where the eigenvectors $\{v_k\}_k$ form a CONS in $\mathcal{U}$; with $\phi_k := \lambda_k^{-1/2} R v_k$ for $\lambda_k > 0$, the corresponding SVD of $R$ is
$$R = \sum_k \sqrt{\lambda_k}\, \phi_k \otimes v_k, \qquad R^* = \sum_k \sqrt{\lambda_k}\, v_k \otimes \phi_k. \qquad (27)$$
It is not necessary to repeat in this setting of compact maps all the different factorisations considered in the preceding paragraphs, and especially their approximations. These will usually be finite dimensional, as they are made to be used for actual computations; e.g. the approximations will usually involve only finite portions of the infinite series in Eq. (26) and Eq. (27), which means that the induced linear maps have finite rank and essentially become finite dimensional, so that the preceding paragraphs apply practically verbatim.
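As an illustration of a nuclear correlation operator with discrete spectrum (a standard example, not taken from the text above), consider the integral operator with kernel $\min(s,t)$ on $L_2([0,1])$, the covariance of Brownian motion. Its eigenvalues are $\lambda_k = ((k - 1/2)\pi)^{-2}$, summing to the trace $1/2$. The NumPy sketch below discretises it with the midpoint rule (a Nyström approximation); the grid size is an arbitrary choice.

```python
import numpy as np

N = 500
h = 1.0 / N
s = (np.arange(N) + 0.5) * h           # midpoints of a uniform grid on [0, 1]
K = h * np.minimum.outer(s, s)         # Nystrom (midpoint rule) matrix of the kernel min(s, t)

lam = np.linalg.eigvalsh(K)[::-1]      # approximate eigenvalues of C, decreasing order
k = np.arange(1, 6)
lam_exact = 1.0 / ((k - 0.5) * np.pi) ** 2

trace = np.trace(K)                    # equals h * sum(s) = 1/2 for this grid
```

Truncating after the first few eigenpairs, i.e. a finite portion of Eq. (26), already captures most of the trace, which is what makes such operators amenable to ROMs.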
But one consideration is worth following up further. In infinite dimensional Hilbert spaces, self-adjoint operators may have a continuous spectrum, e.g. [16]; this is usually the case when homogeneous random fields or stationary stochastic processes have to be represented. This means that the expressions developed for purely discrete spectra in Eq. (26) and Eq. (27) are not general enough. These expressions are really generalisations of the last equalities in Eq. (13) and Eq. (12); but it is possible to give meaning to the matrix equalities in those equations, which simultaneously cover the case of a continuous spectrum.

In infinite dimensions -non-discrete spectrum:
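To this end we introduce the so-called multiplication operator: let $L_2(\mathcal{T})$ be the usual Hilbert space on some locally compact measure space $\mathcal{T}$, and let $\gamma \in L_\infty(\mathcal{T})$ be an essentially bounded function. Then the map
$$M_\gamma : L_2(\mathcal{T}) \ni \xi \mapsto \gamma\, \xi \in L_2(\mathcal{T}), \qquad (M_\gamma \xi)(t) := \gamma(t)\, \xi(t),$$
is a bounded operator $M_\gamma \in \mathscr{L}(L_2(\mathcal{T}))$ on $L_2(\mathcal{T})$. Such a multiplication operator is the direct analogue of a diagonal matrix in finite dimensions.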
Using such a multiplication operator, one may introduce a formulation of the spectral decomposition different from Eq. (26) which does not require $C$ to be compact [16]; $C$ resp. $R$ do not even have to be continuous resp. bounded:
$$C = V M_\gamma V^*, \qquad (28)$$
where $V : L_2(\mathcal{T}) \to \mathcal{U}$ is unitary between some $L_2(\mathcal{T})$ on a measure space $\mathcal{T}$ and $\mathcal{U}$. In case $C$ is continuous resp. bounded, one has $\gamma \in L_\infty(\mathcal{T})$. As $C$ is positive, the function $\gamma$ is non-negative ($\gamma(t) \geq 0$ a.e. for $t \in \mathcal{T}$). This covers the previous case of operators with purely discrete spectrum if the function $\gamma$ is a step function and takes only a discrete (countable) set of values, the eigenvalues. This theorem is actually quite well known in the special case that $C$ is the correlation operator of a stationary stochastic process, an integral operator whose kernel is the correlation function; in this case $V$ is the Fourier transform, and $\gamma$ is known as the power spectrum.
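A discrete, periodic analogue of this statement can be sketched in NumPy (an illustration under assumptions, not the continuous theorem itself): for a stationary sequence on a ring of $N$ points the correlation matrix is circulant, the unitary inverse discrete Fourier transform plays the role of $V$, and the diagonal of the multiplication operator is the DFT of the correlation function, the discrete power spectrum. We assume a periodic AR(1)-type correlation with parameter `rho`, whose spectrum $(1-\rho^2)/|1-\rho e^{-i\theta}|^2$ is known in closed form; on a finite grid the spectrum is of course discrete.

```python
import numpy as np

N = 16
rho = np.exp(-1.0 / 3.0)
tau = np.arange(N)
# Assumed correlation function of a stationary (periodic, AR(1)-type) sequence on a ring of N points
c = (rho**tau + rho**(N - tau)) / (1.0 - rho**N)
C = np.array([[c[(i - j) % N] for j in range(N)] for i in range(N)])   # circulant correlation matrix

# V: unitary inverse DFT, column k is the Fourier mode exp(2*pi*i*j*k/N) / sqrt(N)
V = np.exp(2j * np.pi * np.outer(tau, tau) / N) / np.sqrt(N)
gamma = np.fft.fft(c).real              # discrete power spectrum: DFT of the correlation function

C_rebuilt = V @ np.diag(gamma) @ V.conj().T          # C = V M_gamma V^*, cf. Eq. (28)
theta = 2.0 * np.pi * tau / N
gamma_exact = (1.0 - rho**2) / np.abs(1.0 - rho * np.exp(-1j * theta)) ** 2
```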

General factorisations:
To investigate the analogues of further factorisations of $R$, $C = R^*R$, and its companion $C_{\mathcal{Q}} = RR^*$, we need the SVD of $R$ and $R^*$. They derive generally in the same manner as for the finite dimensional case from the spectral factorisation of $C$ in Eq. (28) and a corresponding one for its companion
$$C_{\mathcal{Q}} = R R^* = \Phi M_\gamma \Phi^*, \qquad (29)$$
with a unitary $\Phi : L_2(\mathcal{T}_*) \to \mathcal{Q}$ between some $L_2(\mathcal{T}_*)$ on a measure space $\mathcal{T}_*$ and $\mathcal{Q}$. Here in Eq. (29), and in Eq. (28), the multiplication operator $M_\gamma$ plays the role of the diagonal matrix $\Sigma^2$ in Eq. (13) and Eq. (14). For the SVD of $R$ one needs its square root, and as $\gamma$ is non-negative, this is simply given by $M_\gamma^{1/2} = M_{\sqrt{\gamma}}$, i.e. multiplication by $\sqrt{\gamma}$. Hence the SVD of $R$ and $R^*$ is given by
$$R = \Phi M_{\sqrt{\gamma}} V^*, \qquad R^* = V M_{\sqrt{\gamma}} \Phi^*.$$
These are all examples of a general factorisation $C = B^*B$, where $B : \mathcal{U} \to \mathcal{H}$ is a map into a Hilbert space $\mathcal{H}$ with all the properties demanded from $R$ (see the beginning of this section). It can be shown [38] that any two such factorisations $B_1 : \mathcal{U} \to \mathcal{H}_1$ and $B_2 : \mathcal{U} \to \mathcal{H}_2$ are unitarily equivalent, i.e. there is a unitary $X : \mathcal{H}_1 \to \mathcal{H}_2$ such that $B_2 = X B_1$. Equivalently, each such factorisation is unitarily equivalent to $R$, i.e. there is a unitary $X : \mathcal{Q} \to \mathcal{H}$ such that $B = X R$. Analogues of the factorisations considered in Eq. (19) are
$$C = (V M_{\sqrt{\gamma}})(V M_{\sqrt{\gamma}})^* = (V M_{\sqrt{\gamma}} V^*)(V M_{\sqrt{\gamma}} V^*)^*,$$
where again $V M_{\sqrt{\gamma}} V^* = C^{1/2}$ is the true square root of $C$. And just as in the case of the factorisations of $C_{\mathcal{Q}}$ considered in Eq. (20) and the resulting factorisation of $C$ in Eq. (21), it is also possible here to consider factorisations of $C_{\mathcal{Q}}$ in Eq. (29), such as
$$C_{\mathcal{Q}} = F F^* \quad\text{resp.}\quad C_{\mathcal{Q}} \approx F_a F_a^*, \qquad F, F_a : \mathcal{E} \to \mathcal{Q},$$
with some Hilbert space $\mathcal{E}$, which lead again to factorisations
$$C = (V\Phi^* F)(V\Phi^* F)^* \quad\text{resp.}\quad C \approx (V\Phi^* F_a)(V\Phi^* F_a)^*,$$
and to representations on the space $\mathcal{E}$, with the representing linear maps given by $W^* = V\Phi^* F$ resp. $W_a^* = V\Phi^* F_a$. Coming back to the situation where $C$ has a purely discrete spectrum and a CONS of eigenvectors $\{v_m\}_m$ in $\mathcal{U}$, the map $B$ from the decomposition $C = B^*B$ can be used to define a CONS $\{h_m\}_m$ in $\mathcal{H}$, $h_m := B C^{-1/2} v_m$, which is an eigenvector CONS of the operator $C_{\mathcal{H}} := BB^* : \mathcal{H} \to \mathcal{H}$, with $C_{\mathcal{H}} h_m = \lambda_m h_m$, see [38]. From this follows an SVD of $B$ and $B^*$ in a manner analogous to Eq. (27).
The main result is [38] that in the case of a nuclear $C$, with necessarily purely discrete spectrum, every factorisation leads to a separated representation in terms of a series, and vice versa. In case $C$ is not nuclear, the representation of a "parametric object" via a linear map is actually more general [38,35] and allows a rigorous and uniform treatment also of "idealised" objects, like for example Gaussian white noise on a Hilbert space.
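In this instance of a discrete spectrum and a nuclear $C$, and hence nuclear $C_{\mathcal{Q}}$, the abstract equation $C_{\mathcal{Q}} = RR^* = \sum_k \lambda_k\, \phi_k \otimes \phi_k$ can be written in a more familiar form when the inner product on $\mathcal{Q}$ is given by a measure $\varpi$ on $\mathcal{M}$. It becomes, for all $\varphi, \psi \in \mathcal{Q}$:
$$\langle C_{\mathcal{Q}} \varphi, \psi\rangle_{\mathcal{Q}} = \int_{\mathcal{M}}\int_{\mathcal{M}} \psi(\mu_1)\, \kappa(\mu_1, \mu_2)\, \varphi(\mu_2)\, \varpi(\mathrm{d}\mu_1)\, \varpi(\mathrm{d}\mu_2), \qquad \kappa(\mu_1, \mu_2) = \langle r(\mu_1), r(\mu_2)\rangle_{\mathcal{U}}.$$
This shows that $C_{\mathcal{Q}}$ is really a Fredholm integral operator with kernel $\kappa$, and its spectral decomposition is nothing but the familiar theorem of Mercer [15] for the kernel
$$\kappa(\mu_1, \mu_2) = \sum_k \lambda_k\, \phi_k(\mu_1)\, \phi_k(\mu_2).$$
Factorisations of $C_{\mathcal{Q}}$ are then usually expressed as factorisations of the kernel $\kappa(\mu_1, \mu_2)$, which may involve integral transforms already envisioned in [30] (see also the English translation [31]):
$$\kappa(\mu_1, \mu_2) = \int_{\mathcal{Y}} \rho(\mu_1, y)\, \rho(\mu_2, y)\, n(\mathrm{d}y),$$
where the "factors" $\rho(\mu, y)$ are measurable functions on the measure space $(\mathcal{Y}, n)$. This is the classical analogue of the general "kernel theorem" [24].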
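A Nyström-type sketch of Mercer's theorem, assuming NumPy, $\mathcal M = [0,1]$ with the Lebesgue measure $\varpi$ replaced by a 6-point Gauss-Legendre rule, and the hypothetical parametric vector $r(\mu) = [1, \mu, \mu^2]^T$. The symmetrised matrix $W^{1/2} K W^{1/2}$ yields the eigenpairs of $C_{\mathcal Q}$ at the nodes; its non-zero eigenvalues coincide with those of $C = \int r \otimes r\, \varpi(\mathrm d\mu)$, here the $3 \times 3$ Hilbert matrix, and the Mercer sum rebuilds the kernel.

```python
import numpy as np

def r(mu):
    """Hypothetical parametric vector r(mu) = [1, mu, mu^2] in U = R^3."""
    return np.array([1.0, mu, mu**2])

# 6-point Gauss-Legendre rule on M = [0, 1] standing in for the measure varpi
t, w = np.polynomial.legendre.leggauss(6)
mu, w = 0.5 * (t + 1.0), 0.5 * w

Rm = np.array([r(x) for x in mu])          # rows r(mu_i)
K = Rm @ Rm.T                              # kernel values kappa(mu_i, mu_j)

# Nystrom discretisation of C_Q, symmetrised as W^{1/2} K W^{1/2}
sw = np.sqrt(w)
lam, Uvec = np.linalg.eigh(sw[:, None] * K * sw[None, :])
lam, Uvec = lam[::-1], Uvec[:, ::-1]       # decreasing order
phi = Uvec / sw[:, None]                   # eigenfunctions at the nodes, orthonormal in Q

K_mercer = (phi[:, :3] * lam[:3]) @ phi[:, :3].T   # Mercer sum (only 3 non-zero terms here)
C = Rm.T @ (w[:, None] * Rm)               # C = int r (x) r dvarpi, integrated exactly by the rule
hilbert = np.array([[1.0 / (a + b + 1) for b in range(3)] for a in range(3)])
```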

Connections to tensor products:
Although not as obvious as for the case of a discrete spectrum in Eq. (12), Eq. (13) and Eq. (14), and in Eq. (26) and Eq. (27), such a connection is also possible in the general case of a non-discrete spectrum. But as the spectral values in the continuous part have no corresponding eigenvectors, one has to use the concept of generalised eigenvectors [22,24,16,38]. Then it is possible to formulate the spectral theorem in the following way:

$$\langle u \mid C v \rangle_U = \int_{\mathbb{R}_+} \lambda\, \langle u \mid v_\lambda \rangle\, \langle v_\lambda \mid v \rangle\, \nu(\mathrm{d}\lambda), \tag{35}$$

$$C = \int_{\mathbb{R}_+} \lambda\, (v_\lambda \otimes v_\lambda)\, \nu(\mathrm{d}\lambda), \tag{36}$$

with the spectral measure $\nu$ on $\mathbb{R}_+$. Observe the analogy, especially of Eq. (36), with Eq. (26), where the sum has now been generalised to an integral to account for the continuous spectrum. Eq. (35) is for the case of a simple spectrum; in the more general case of spectral multiplicity larger than one, the Hilbert space $U = \bigoplus_m U_m$ can be written as an orthogonal sum [22,24,16] of Hilbert subspaces $U_m$, each invariant under the operator $C$, on each of which an expression like Eq. (35) holds and the spectrum is simple. For the sake of brevity we shall only consider the case of a simple spectrum from now on, and avoid writing the sums over $m$. The difficulty in going from Eq. (26) to Eq. (36) is that the values $\lambda$ in the truly continuous spectrum have no corresponding eigenvector, i.e. $v_\lambda \notin U$; it has to be sought in a generally larger space. The possibility of writing an expression like Eq. (35) rests on the concept of a "rigged" or "equipped" Hilbert space, or Gel'fand triple. This means that one can find [24] a nuclear space $K \hookrightarrow U$ densely embedded in the Hilbert space $U$, such that Eq. (35) holds for all $u, v \in K$. It also means that the generalised eigenvectors should be seen as linear functionals on $K$. As the subspace $K$ is densely embedded in $U$, it also holds that $U \hookrightarrow K^*$ is densely embedded in the topological dual $K^*$ of $K$, i.e. one has the Gel'fand triple

$$K \hookrightarrow U \hookrightarrow K^* .$$

The generalised eigenvectors can now be seen as elements of the dual, $v_\lambda \in K^*$, where the generalised eigenvalue equation $C v_\lambda = \lambda v_\lambda$ holds after an appropriate extension of $C$.
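A standard example may help to see why generalised eigenvectors are needed. It is our illustration, assuming $M = \mathbb{R}$ with $\varpi$ the Lebesgue measure, complex scalars, and a translation-invariant kernel $\kappa(\mu_1, \mu_2) = c(\mu_1 - \mu_2)$ with $c$ integrable, as for a homogeneous random field. Then $C_Q$ acts as a convolution, and the plane waves $e_\omega(\mu) = e^{\mathrm{i}\omega\mu}$ satisfy

$$(C_Q e_\omega)(\mu) = \int_{\mathbb{R}} c(\mu - \mu')\, e^{\mathrm{i}\omega\mu'}\, \mathrm{d}\mu' = \int_{\mathbb{R}} c(t)\, e^{\mathrm{i}\omega(\mu - t)}\, \mathrm{d}t = \hat{c}(\omega)\, e_\omega(\mu), \qquad \hat{c}(\omega) = \int_{\mathbb{R}} c(t)\, e^{-\mathrm{i}\omega t}\, \mathrm{d}t .$$

Here $e_\omega \notin L^2(\mathbb{R})$, but it is a continuous linear functional on the Schwartz space, i.e. an element of the dual in the Gel'fand triple $\mathcal{S}(\mathbb{R}) \hookrightarrow L^2(\mathbb{R}) \hookrightarrow \mathcal{S}'(\mathbb{R})$. The generalised eigenvalues $\hat{c}(\omega)$ make up the spectrum of $C_Q$, which is the closure of the range of $\hat{c}$; the companion $C = R^* R$ has the same non-zero spectrum.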
If expressions such as Eq. (35) or Eq. (36) have to be approximated numerically, it becomes necessary to evaluate the integral approximately. The integral is really only over the spectrum $\sigma(C)$ of $C$, as the spectral measure $\nu$ vanishes outside of $\sigma(C)$. Obviously, one would first split the spectrum $\sigma(C)$ into a discrete part $\sigma_d(C)$ and a continuous part $\sigma_c(C)$. On the discrete part, the integral is just a sum, as shown before. On the continuous part, the integral has to be evaluated by a quadrature formula. Choosing quadrature points $\lambda_z \in \sigma_c(C)$ and appropriate integration weights $w_z \in \mathbb{R}$, the integral can be approximated by an expression very similar to the ones used in the case of discrete spectra.
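The following Python sketch illustrates such a quadrature for the hypothetical Gaussian kernel $\kappa(x_1, x_2) = \exp(-(x_1 - x_2)^2/2)$, a translation-invariant kernel whose spectrum on $L^2(\mathbb{R})$ is purely continuous. The continuous spectrum is parametrised by the frequency $\omega$ through the spectral density $S(\omega) = e^{-\omega^2/2}/\sqrt{2\pi}$, and Gauss-Hermite quadrature in $\omega$ (NumPy's `hermegauss`) turns the integral into a finite sum of separable terms, i.e. a finite-rank factorisation of the kernel, here evaluated on $[-1, 1]$. Integrating over $\omega$ rather than directly over $\lambda$ is a choice made for this sketch.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def spectral_factors(x, n_quad):
    """Finite-rank factors Phi with Phi @ Phi.T ~ exp(-(x1 - x2)**2 / 2).

    Uses kappa(x1, x2) = int S(w) cos(w (x1 - x2)) dw with the spectral density
    S(w) = exp(-w**2 / 2) / sqrt(2 pi), and Gauss-Hermite quadrature in w.
    """
    nodes, weights = hermegauss(n_quad)        # rule for the weight exp(-w**2 / 2)
    w_z = weights / np.sqrt(2.0 * np.pi)       # weights for the measure S(w) dw
    phase = np.outer(x, nodes)                 # omega_z * x
    scale = np.sqrt(w_z)
    # cos(w (x1 - x2)) = cos(w x1) cos(w x2) + sin(w x1) sin(w x2)
    return np.hstack([scale * np.cos(phase), scale * np.sin(phase)])


x = np.linspace(-1.0, 1.0, 201)
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
Phi = spectral_factors(x, n_quad=30)
K_quad = Phi @ Phi.T                           # rank at most 2 * n_quad
err = np.max(np.abs(K_quad - K_exact))
```

With 30 nodes the kernel is reproduced to high accuracy by a factorisation of rank at most 60; each quadrature point contributes the pair of generalised eigenfunctions $\cos(\omega_z x)$ and $\sin(\omega_z x)$.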
Further tensor products: Essentially, the constructions we have been investigating can be seen as elements of the tensor product $U \otimes Q$, or of extensions thereof as in the preceding paragraph. Often one or both of the spaces $U$ and $Q$ can be further factored into tensor products; say, without loss of generality, that $Q = Q_I \otimes Q_{II}$. This is for example the case for the white-noise modelling of random fields [33,29,36,35], where one has $Q = \bigotimes_{m=1}^{\infty} S_m$. We just want to indicate how this structure can be used for further approximation.
It essentially means that the whole foregoing development is applied not to $U \otimes Q$, but to the tensor product $Q_I \otimes Q_{II}$. Combined with the upper-level decomposition on $U \otimes Q$, this becomes a decomposition on $U \otimes (Q_I \otimes Q_{II})$. The bilinear forms Eq. (10) and Eq. (11) can thus be written as tri-linear forms, making a direct connection to tensor products and multi-linear forms [26]. As in the example of random fields just cited [36,35], this can often be extended to higher-order tensor products in a tree-like manner, by splitting $U$, or $Q_I$ or $Q_{II}$. This leads to a hierarchical structure encoded in a binary tree, with the top product $U \otimes Q$ as the root of the tree and the individual factors as its leaves. The higher the order of the tensor product, the better it is possible to exploit dependencies in low-rank formats [26,25]. This has also recently been pointed out in connection with the close relation between deep neural networks [14,32] and such tensor decompositions, which come in different formats or representations [26]. The indicated binary tree leads to what is known as the hierarchical Tucker- or HT-format; but obviously the multi-factor tensor product can also be split in a non-binary fashion, leading to more general tree-based tensor formats [19]. A completely flat tree structure with only root and leaves corresponds to the well-known canonical polyadic- or CP-decomposition or format; the original proper generalised decomposition (PGD) falls into this category [13,2,17,12,11].
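To indicate how such a product of three factors can be exploited, the following Python sketch computes a truncated higher-order SVD (HOSVD), the simplest Tucker-format approximation, of a hypothetical parametric field $r(x; \mu_1, \mu_2)$ sampled on a grid, so that the three array axes stand in for $U$, $Q_I$ and $Q_{II}$. It is a flat Tucker format rather than the HT-format discussed above, chosen for brevity; the field and all function names are illustrative assumptions.

```python
import numpy as np


def unfold(A, mode):
    """Mode-n matricisation: rows are indexed by the chosen mode."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)


def mode_product(A, M, mode):
    """Multiply tensor A along `mode` by the matrix M (shape new_dim x old_dim)."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(A, mode, 0), axes=1), 0, mode)


def truncated_hosvd(A, ranks):
    """Tucker approximation from the leading left singular vectors of each unfolding."""
    factors, discarded = [], 0.0
    for mode, r in enumerate(ranks):
        U, s, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        factors.append(U[:, :r])
        discarded += np.sum(s[r:] ** 2)       # energy dropped in this mode
    core = A
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project onto the retained bases
    approx = core
    for mode, U in enumerate(factors):
        approx = mode_product(approx, U, mode)
    return approx, core, factors, np.sqrt(discarded)


# Hypothetical field r(x; mu1, mu2) on a grid: axes ~ U, Q_I, Q_II
x = np.linspace(0.0, 1.0, 60)
mu1 = np.linspace(0.5, 2.0, 40)
mu2 = np.linspace(0.0, 1.0, 30)
X, M1, M2 = np.meshgrid(x, mu1, mu2, indexing="ij")
A = np.exp(-M1 * X) * np.cos(2.0 * np.pi * M2 * X)

approx, core, factors, bound = truncated_hosvd(A, ranks=(6, 6, 6))
err = np.linalg.norm(A - approx)              # Frobenius norm of the error
```

The returned bound is the classical HOSVD error estimate: the Frobenius error is at most the square root of the sum of all discarded squared singular values of the three unfoldings.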

Structure preservation
The foregoing development for a parametric model $r : M \to U$ did not assume anything more than that $U$ is a Hilbert space. In Section 2 it was already indicated how to proceed if $U$ is not a Hilbert space but a more general topological vector space. The treatment so far preserves the linear structure of the space $U$, and the approximations use this linear structure as well. The tensor-based representations using tensors of a certain rank already have a more complicated geometric structure [18], indeed a manifold structure [8].
But here the concern is with the structure of the image set of the parametric object $r(\mu)$ and its preservation under the approximations or ROMs $r_a(\mu)$. In case the image set (here $U$) is not a vector space but, say, a differentiable manifold, things are bound to get more complicated; one possible route of attack is to use linear methods like the ones described here to map into the tangent spaces. One instance of this, which seems to be more accessible, is the case when the image set is a Lie group $G$. Then everything can be done in the tangent space at the group identity, the Lie algebra $\mathfrak{g}$ of the Lie group $G$. The Lie algebra is a linear space, and one may take $U := \mathfrak{g}$. One then has to map further from $\mathfrak{g}$ to $G$, but this can be achieved by the canonical exponential map $\exp : \mathfrak{g} \to G$. A representation or ROM would then have the form

$$M \xrightarrow{\; r,\, r_a \;} \mathfrak{g} \xrightarrow{\; \exp \;} G .$$

This has the added advantage that interpolation along straight lines through the origin of $\mathfrak{g}$, which as in any Euclidean or unitary space are geodesics, is mapped into interpolation along one-parameter subgroups of $G$, which are geodesics for a bi-invariant Riemannian metric on $G$, as exists e.g. on compact Lie groups such as $SO(E)$. We shall come back to a somewhat similar situation later in this section.
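A minimal Python sketch of this idea for the hypothetical case $G = SO(3)$: Lie-algebra coordinates $w \in \mathbb{R}^3$ are interpolated linearly and mapped to the group with Rodrigues' formula for $\exp$, which keeps the result a rotation, whereas linear interpolation of the rotation matrices themselves leaves $SO(3)$. The two parameter values and their coordinates are made up for illustration.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix in so(3) with hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def exp_so3(w):
    """Rodrigues' formula for exp(hat(w)) in SO(3)."""
    theta = np.linalg.norm(w)
    S = hat(w)
    if theta < 1e-8:
        return np.eye(3) + S + 0.5 * S @ S   # Taylor expansion near the identity
    return (np.eye(3) + np.sin(theta) / theta * S
            + (1.0 - np.cos(theta)) / theta**2 * S @ S)


def orthogonality_defect(R):
    """Frobenius norm of R^T R - I; zero exactly for orthogonal matrices."""
    return np.linalg.norm(R.T @ R - np.eye(3))


# Hypothetical Lie-algebra coordinates at two parameter values mu_0 and mu_1
w0 = np.array([0.0, 0.0, 0.0])
w1 = np.array([0.0, 0.0, np.pi / 2])                # 90 degrees about the z-axis
t = 0.5
R_lie = exp_so3((1 - t) * w0 + t * w1)              # interpolate in g, then exp
R_lin = (1 - t) * exp_so3(w0) + t * exp_so3(w1)     # interpolate matrices directly
```

Here $w_0 = 0$, so the interpolation runs along a straight line through the origin of $\mathfrak{g}$ and yields a rotation by 45 degrees about the $z$-axis, while `R_lin` is not orthogonal.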
Vector fields: Probably one of the simplest situations is when the image set has the structure $V = U \otimes E$, where $E$ is a finite-dimensional inner-product (Hilbert) space [33]:

$$r(\mu) = \sum_k r_k(\mu) \otimes \boldsymbol{r}_k \in V = U \otimes E,$$

where the $r_k : M \to U$ are maps as before in Section 2 and Section 3, whereas the $\boldsymbol{r}_k$ are typically linearly independent vectors in $E$. Often one wants to preserve the structure $V = U \otimes E$; one can think of this in the following way: $U$ is a space of scalar functions on some domain in Euclidean space, and $E$ is the associated vector space.
Hence one could call this a vector field in some sense. The associated linear map is then defined a bit differently, namely as

$$R : U \to Q \otimes E, \quad u \mapsto \sum_k (R_k u) \otimes \boldsymbol{r}_k,$$

where the maps $R_k : U \to Q$ are defined as before in Eq. (10).
The "correlation" can now be given by a bilinear form, namely through a densely defined map extended by linearity, where each $R_k : U \to Q$ is the map associated to $r_k(\mu)$ as before for just a single map $r(\mu)$. It may be called the "vector correlation". By construction it is self-adjoint and positive. The corresponding kernel is not scalar, but has values in $E \otimes E$. The eigenvalue problem for an integral operator with such a kernel, representing the companion map, is posed on $W = Q \otimes E$.
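A finite-dimensional Python sketch of the vector-field case, under assumptions that are ours: $U = \mathbb{R}^{n_x}$ with the Euclidean inner product, $E = \mathbb{R}^3$ with the vectors $\boldsymbol{r}_k$ taken as its orthonormal standard basis, the associated linear map taken as $u \mapsto \sum_k (R_k u) \otimes \boldsymbol{r}_k \in Q \otimes E$, and random snapshots standing in for $r_k(\mu_i)$. It assembles the "vector correlation" on $U$, the companion map on $W = Q \otimes E$, and the $E \otimes E$-valued kernel; the non-zero spectra of the two operators coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_mu, dim_E = 50, 20, 3
# Hypothetical snapshots: A[:, i, k] = r_k(mu_i) in U = R^{n_x}; the r_k in E
# are assumed to be the orthonormal standard basis e_k of E = R^3.
A = rng.standard_normal((n_x, n_mu, dim_E))

# Assumed associated linear map R: U -> Q (x) E, (R u)(mu_i) = sum_k <r_k(mu_i)|u> e_k,
# stored as a (n_mu * dim_E) x n_x matrix whose row index is (i, k).
R = A.transpose(1, 2, 0).reshape(n_mu * dim_E, n_x)

C_v = R.T @ R        # "vector correlation" on U, here equal to sum_k R_k^* R_k
C_W = R @ R.T        # companion map on W = Q (x) E

# E (x) E-valued kernel: kappa[i, j] is the dim_E x dim_E matrix <r_k(mu_i)|r_l(mu_j)>_U
kappa = np.einsum("xik,xjl->ijkl", A, A)

ev_v = np.sort(np.linalg.eigvalsh(C_v))[::-1]
ev_W = np.sort(np.linalg.eigvalsh(C_W))[::-1]
```

The companion matrix `C_W` is just the kernel `kappa` rearranged into blocks indexed by $(\mu_i, k)$ and $(\mu_j, l)$, which is the discrete form of an integral operator on $Q \otimes E$.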
Coupled systems: A somewhat similar situation arises when the state space $U = U_1 \times U_2$ comes from a combined or coupled system [37], and one wants to conserve this information or structure. The state is represented as $u = (u_1, u_2)$, and the natural inner product on such a product space is

$$\langle u \mid v \rangle_U = \langle u_1 \mid v_1 \rangle_{U_1} + \langle u_2 \mid v_2 \rangle_{U_2} .$$

This is for two coupled systems, labelled as '1' and '2'. The parametric map is

$$r : M \to U_1 \times U_2, \quad \mu \mapsto r(\mu) = (r_1(\mu), r_2(\mu)).$$

The associated linear map then assigns to each state a pair of functions on $M$, one for each sub-system. As before, these $\mathbb{R}^2$-valued functions on $M$ are like two problem-adapted co-ordinate systems on the joint parameter set, one for each sub-system. From this one obtains the "coupling correlation" $C_c$, again defined through a bilinear form. The kernel is then a $2 \times 2$ matrix-valued function in an integral operator on $W = Q \times Q$. Other variations regarding coupled systems are possible, see [37], like when the parameter set $M = M_1 \times M_2$ is a product. The parametric map can then be defined as

$$r : M_1 \times M_2 \to U_1 \times U_2, \quad (\mu_1, \mu_2) \mapsto (r_1(\mu_1), r_2(\mu_2)),$$

with the associated linear map defined analogously. The correlation may be defined as before in Eq. (43), and the kernel on $Q = Q_1 \times Q_2$ is also as in Eq. (44), but now the first diagonal entry is a function on $M_1 \times M_1$ only, and analogously for the second diagonal entry.
Tensor fields: This is similar to the case of vector fields in that the state space is $W = U \otimes A$, where $U$ is a space of scalar-valued functions on some set, and $A \subset B = E \otimes E$, where $E$ is a finite-dimensional vector space [38], and $A$ is a manifold of tensors in the full tensor product $B$ of tensors of even degree. If $A$ were the full tensor product $B$, which is a linear finite-dimensional space, there would be no difference to the case of vector fields. But tensors of even degree are often used in more special situations. Obviously, such tensors may be identified with linear maps [38], $L(E) \cong B$, which will be done here. Therefore one may speak e.g. of the manifold of special orthogonal tensors, $SO(E)$, and of the manifold of symmetric positive definite tensors, $\mathrm{Sym}^+(E)$.
We shall consider only these two examples. The special orthogonal tensors form a Lie group $A := SO(E)$ with Lie algebra $\mathfrak{a} := \mathfrak{so}(E)$, the skew-symmetric tensors, a free linear space. For $S \in \mathfrak{so}(E)$, the exponential map carries it onto $\exp(S) \in SO(E)$. Therefore a parametric element in $W = U \otimes A$ can first be represented as a parametric element in the linear space $Z = U \otimes \mathfrak{a}$, where all the preceding statements on vector fields apply. It is on this intermediary representation that one can define ROMs. Such a representation is then further mapped to $W$ through exponentiation, applied pointwise in the argument of the scalar functions in $U$. The associated linear map goes from $Z = U \otimes \mathfrak{a}$ to the linear space $Y = Q \otimes \mathfrak{a}$; and again from here one would use an analogue of the above exponential to map onto $Q \otimes A$.
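The following Python sketch, with a made-up orientation field, shows the procedure for $E = \mathbb{R}^3$: the field is stored in Lie-algebra coordinates in $Z = U \otimes \mathfrak{so}(3)$, a linear ROM is built there by a truncated SVD (an assumed choice of ROM), and the result is exponentiated pointwise with Rodrigues' formula. However crude the truncation, every reconstructed value is a proper rotation.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix in so(3) with hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def exp_so3(w):
    """Rodrigues' formula for exp(hat(w)) in SO(3)."""
    theta = np.linalg.norm(w)
    S = hat(w)
    if theta < 1e-8:
        return np.eye(3) + S + 0.5 * S @ S   # Taylor expansion near the identity
    return (np.eye(3) + np.sin(theta) / theta * S
            + (1.0 - np.cos(theta)) / theta**2 * S @ S)


n_x, n_mu = 40, 25
x = np.linspace(0.0, 1.0, n_x)
mu = np.linspace(0.0, 1.0, n_mu)
# Hypothetical orientation field in Lie-algebra coordinates w(x, mu) in R^3
W = np.stack([np.sin(np.pi * x[:, None] * (1.0 + mu[None, :])),
              0.5 * np.outer(x, mu),
              np.cos(2.0 * x[:, None] + mu[None, :])], axis=1)   # (n_x, 3, n_mu)

# Linear ROM on Z = U (x) so(3): truncated SVD, row index (x, component)
Z = W.reshape(n_x * 3, n_mu)
U_, s, Vt = np.linalg.svd(Z, full_matrices=False)
rank = 3
W_a = ((U_[:, :rank] * s[:rank]) @ Vt[:rank]).reshape(n_x, 3, n_mu)

# Map back to the group pointwise in x and mu
R_a = np.array([[exp_so3(W_a[i, :, j]) for j in range(n_mu)] for i in range(n_x)])
```

A truncation applied directly to the matrix entries of the rotations would, in general, produce matrices that are no longer orthogonal; working in $\mathfrak{so}(3)$ avoids this by construction.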
The positive definite tensors $A := \mathrm{Sym}^+(E)$ do not form a Lie group under multiplication, i.e. composition of linear maps, as the product of two symmetric tensors is in general not symmetric; they form only a Riemannian manifold, geometrically an open convex cone. But there still is an exponential map, carrying the free linear space of symmetric tensors $\mathfrak{a} := \mathrm{sym}(E)$ onto $A := \mathrm{Sym}^+(E)$. In fact, for any $H \in \mathrm{sym}(E)$, the exponential maps it onto $\exp(H) \in \mathrm{Sym}^+(E)$, and this map is a bijection with the matrix logarithm as its inverse. Thus we have formally recovered the same situation as for the orthogonal tensors just described, and the same procedures may be followed.
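Analogously, a Python sketch for $\mathrm{Sym}^+(E)$ with two hypothetical diagonal material tensors: the exponential and the logarithm are computed from the spectral decomposition via `numpy.linalg.eigh`. A linear combination that extrapolates slightly beyond the data loses positive definiteness when applied to the tensors directly, but not when applied in $\mathrm{sym}(E)$ and then exponentiated.

```python
import numpy as np


def sym_exp(H):
    """exp of a symmetric matrix via its spectral decomposition; result is SPD."""
    lam, Q = np.linalg.eigh(H)
    return (Q * np.exp(lam)) @ Q.T


def sym_log(A):
    """Inverse map Sym+(E) -> sym(E): matrix logarithm of an SPD matrix."""
    lam, Q = np.linalg.eigh(A)
    return (Q * np.log(lam)) @ Q.T


# Two hypothetical SPD material tensors at parameter values mu_0 and mu_1
A0 = np.diag([1.0, 1e-2, 1.0])
A1 = np.diag([1e-2, 1.0, 1.0])
H0, H1 = sym_log(A0), sym_log(A1)

# A linear ROM that extrapolates slightly beyond the data, t = 1.2
t = 1.2
A_lin = (1 - t) * A0 + t * A1             # combination of the tensors themselves
A_log = sym_exp((1 - t) * H0 + t * H1)    # same combination in sym(E), then exp
```

Carrying out the linear operations on matrix logarithms is sometimes called the log-Euclidean approach; it is one possible choice, not the only way to treat $\mathrm{Sym}^+(E)$.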

Conclusion
Parametric mappings $r : M \to U$ have been analysed in a variety of settings via the associated linear map $R : U \to Q \subseteq \mathbb{R}^M$. It was shown that the associated linear map contains the full information present in the parametric entity. It is actually a mathematically more general concept, which allows one to define extreme or idealised parametric entities; this is particularly relevant in the field of uncertainty quantification, where one has to deal with stochastic processes and random fields [35].
So instead of analysing a parametric entity $r(\mu)$ and its approximations or ROMs $r_a(\mu)$ directly, one may take cues on how to do this from the associated linear maps $R$ and $R_a$. Admittedly, in practical situations the associated linear maps are typically not explicitly available, but they provide a conceptual framework for dealing with the situation. And even though they are not directly available, the quantities needed in such analyses are all computable in principle.
Very closely related to such an associated linear map $R$ is the so-called "correlation operator" $C = R^* R$ and its companion $C_Q = R R^*$, both self-adjoint and positive. Their spectral analysis turns out to be very helpful in understanding the nature of such parametric entities, as well as of possible ROMs. The very general nature and mathematical embedding of parametric entities, which also incorporates random fields, is reflected in the different spectral properties of the correlation operator. Such generalised parametric entities may yield correlation operators with continuous spectra, as typically occurs for homogeneous random fields, and understanding these in terms of generalised tensor products then requires the full generality of spectral analysis in rigged Hilbert spaces. Other factorisations of the correlation, such as $C = B^* B$, induce other representations of the parametric entities, and any other representation or re-parametrisation may be understood in these terms.
Preservation of certain structural properties is often very desirable. Examples were given to show how the general idea can be refined to reflect some linear structures in the representation. This even extends to non-linear manifolds, if they can easily be parametrised by linear spaces. Lie groups with their associated Lie algebras are one such example, which was discussed in a bit more detail. This last point is especially relevant to the representation of spatially varying or even random material properties, which are typically fields of symmetric positive definite tensors. A similar comment applies to "orientation fields", which are spatially varying and possibly random fields of orthogonal tensors.
Additionally, it was explained how representations in tensor product spaces arise naturally in such situations, and how this process can be cascaded to produce a tree-like structure for the analysis. Low-rank tensor approximations can thus be used as ROMs, and this offers new impulses. The same applies to machine learning and data-driven approaches, which can also be analysed within the proposed framework. These deep learning methods have recently been shown to be closely connected with low-rank tensor approximations, offering some insights and avenues for their analysis. With the proposed framework of analysing such parametric entities via linear maps, we hope to introduce a fresh point of view which may lead to new ideas on how to construct and analyse ROMs.