What was detailed up to now, in the previous “Parametric models and linear maps” section with regard to the RKHS, was that the structure of the Hilbert space was reproduced on the subspace \(\mathcal {R}\subseteq \mathbb {R}^{\mathcal {M}}\) of the full function space. In the remarks about coherent states one could already see an additional structure, namely a measure \(\varpi \) on \(\mathcal {M}\). This measure structure can be used to define the subspace \(\mathcal {A}:=\mathrm {L}_0(\mathcal {M},\varpi )\) of measurable functions, as well as its Hilbert subspace of square-integrable functions \(\mathcal {Q}:=\mathrm {L}_2(\mathcal {M},\varpi )\) with associated inner product
$$\begin{aligned} \langle \phi , \psi \rangle _{\mathcal {Q}} := \int _{\mathcal {M}} \phi (\mu ) \psi (\mu ) \; \varpi ({\mathrm {d}}\mu ). \end{aligned}$$
We shall simply assume here that there is a Hilbert space \(\mathcal {Q}\subseteq \mathbb {R}^{\mathcal {M}}\) of functions with inner product \(\langle \cdot , \cdot \rangle _{\mathcal {Q}}\), which may or may not come from an underlying measure space. The associated linear map \(\tilde{R}:\mathcal {U}\rightarrow \mathcal {R}\), essentially defined in Eq. (5) with range the RKHS \(\mathcal {R}\), will now be seen as a map \(R:\mathcal {U}\rightarrow \mathcal {Q}\) into the Hilbert space \(\mathcal {Q}\), i.e. with a different range, whose inner product \(\langle \cdot , \cdot \rangle _{\mathcal {Q}}\) differs from the RKHS inner product \(\langle \cdot , \cdot \rangle _{\mathcal {R}}\) on \(\mathcal {R}\). One may view this inner product as a way to tell what is important in the parameter set \(\mathcal {M}\): functions \(\phi \) with large \(\mathcal {Q}\)-norm are considered more important than those where this norm is small. The map \(R:\mathcal {U}\rightarrow \mathcal {Q}\) is thus generally not unitary any more, but for the sake of simplicity, we shall assume that it is a densely defined closed operator, see e.g. [16]. As it may be only densely defined, it is sometimes a good idea to define R through a densely defined bilinear form in \(\mathcal {U}\otimes \mathcal {Q}\):
$$\begin{aligned} \forall u\in {{\mathrm {dom}}}R, \phi \in \mathcal {Q}: \langle Ru, \phi \rangle _{\mathcal {Q}} := \langle \langle r(\cdot ), u \rangle _{\mathcal {U}}, \phi \rangle _{\mathcal {Q}}. \end{aligned}$$
(10)
Following [33, 35, 37, 38], one now obtains a densely defined map C in \(\mathcal {U}\) through the densely defined bilinear form, in line with Eq. (10):
$$\begin{aligned} \forall u, v:\quad \langle Cu, v \rangle _{\mathcal {U}} := \langle Ru, Rv \rangle _{\mathcal {Q}} . \end{aligned}$$
(11)
The map \(C=R^* R\)—observe that now the adjoint is w.r.t. the \(\mathcal {Q}\)-inner product—may be called the “correlation” operator; it is by construction self-adjoint and positive, and if R is bounded resp. continuous, so is C.
In the above case, where the \(\mathcal {Q}\)-inner product comes from a measure, one has from Eq. (11)
$$\begin{aligned} \langle Cu, v \rangle _{\mathcal {U}} = \int _{\mathcal {M}} \langle r(\mu ), u \rangle _{\mathcal {U}}\langle r(\mu ), v \rangle _{\mathcal {U}} \; \varpi ({\mathrm {d}}\mu ),\text { i.e. } C = R^* R = \int _{\mathcal {M}} r(\mu ) \otimes r(\mu ) \; \varpi ({\mathrm {d}}\mu ). \end{aligned}$$
This is reminiscent of what was required for coherent states. But it also shows that if \(\varpi \) were a probability measure—i.e. \(\varpi (\mathcal {M})=1\)—with the usual expectation operator
$$\begin{aligned} \mathbb {E}\left( \phi \right) := \int _{\mathcal {M}} \phi (\mu ) \; \varpi ({\mathrm {d}}\mu ), \end{aligned}$$
then the above would really be the familiar correlation operator [33, 35] \(\mathbb {E}\left( r\otimes r\right) \) of the \(\mathcal {U}\)-valued random variable (RV) r. Therefore, from now on we shall simply refer to C as the correlation operator, even in the general case not based on a probability measure.
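To illustrate this, a minimal Python/NumPy sketch, assuming an invented model r and a uniform probability measure \(\varpi \), approximates \(C=\mathbb {E}\left( r\otimes r\right) \) by Monte Carlo sampling:

```python
import numpy as np

# Monte Carlo sketch of C = E(r (x) r) for a probability measure varpi on M;
# the parametric model r and the distribution of mu are invented.
rng = np.random.default_rng(0)
n, N = 30, 10000
x = np.linspace(0.0, 1.0, n)

def r(mu):                                  # assumed model r: M -> U = R^n
    return np.sin(np.pi * mu * x)

mus = rng.uniform(0.0, 2.0, N)              # samples mu ~ varpi (assumed uniform)
snaps = np.array([r(mu) for mu in mus])     # rows are r(mu_i)^T
C = snaps.T @ snaps / N                     # approximates E(r (x) r), cf. Eq. (11)
print(C.shape, np.allclose(C, C.T))         # (30, 30) True: C is symmetric
```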
The fact that the correlation operator is self-adjoint and positive implies that its spectrum \(\sigma (C)\subseteq \mathbb {R}^+\) is real and non-negative. This will be used when analysing it with any of the versions of the spectral theorem for self-adjoint operators (e.g. [16]). The easiest and best known version of this is for finite dimensional maps.
Finite dimensional beginnings
So let us return to the simple example at the beginning of the “Parametric models and linear maps” section, where the associated linear map can be represented by a matrix \({\varvec{R}}\). If we remember that each row \({\varvec{r}}^{\mathsf {T}}(\mu _j)\) is the value of the vector \({\varvec{r}}(\mu )\) for one particular \(\mu _j\in \mathcal {M}\), we see that the matrix can be written as
$$\begin{aligned} {\varvec{R}} = [{\varvec{r}}(\mu _1),\dots ,{\varvec{r}}(\mu _j),\dots ]^{\mathsf {T}}, \end{aligned}$$
and that the rows are just “snapshots” for different values \(\mu _j\). What is commonly done now is to use the method of proper orthogonal decomposition (POD) to produce a ROM.
The matrix \({\varvec{R}}\)—to generalise a bit, assume it to be of size \(m\times n\)—can be decomposed according to its singular value decomposition (SVD)
$$\begin{aligned} {\varvec{R}} = {\varvec{\varPhi }}{{\varvec{\varSigma }}}{\varvec{V}}^{\mathsf {T}}= \sum _{k=1}^{\min (m,n)} \varsigma _k\, {\phi }_k \otimes {\varvec{v}}_k, \end{aligned}$$
(12)
where the matrices \({{\varvec{\varPhi }}}=[{\phi }_k]\) and \({\varvec{V}}=[{\varvec{v}}_k]\) are orthogonal with unit length columns—\({\phi }_k\) the left and \({\varvec{v}}_k\) the right singular vectors—and \({{\varvec{\varSigma }}} = {{\mathrm {diag}}}(\varsigma _k)\) is diagonal with non-negative diagonal elements \(\varsigma _k\), the singular values. For clarity, we arrange the singular values in a decreasing sequence, \(\varsigma _1 \ge \varsigma _2 \ge \dots \ge 0\). It is well known that this decomposition is connected with the eigenvalue or spectral decomposition of the correlation
$$\begin{aligned} {\varvec{C}} = {\varvec{R}}^{\mathsf {T}}{\varvec{R}} = {\varvec{V}}{{\varvec{\varSigma }}}{{\varvec{\varPhi }}}^{\mathsf {T}}{{\varvec{\varPhi }}}{{\varvec{\varSigma }}}{\varvec{V}}^{\mathsf {T}}= {\varvec{V}}{{\varvec{\varSigma }}}^2{\varvec{V}}^{\mathsf {T}}= \sum _{k=1}^{\min (m,n)} \varsigma _k^2 \, {\varvec{v}}_k\otimes {\varvec{v}}_k, \end{aligned}$$
(13)
with eigenvalues \(\varsigma _k^2\), eigenvectors \({\varvec{v}}_k\), and its companion
$$\begin{aligned} {\varvec{C}}_{\mathcal {Q}} := {\varvec{R}} {\varvec{R}}^{\mathsf {T}}= {{\varvec{\varPhi }}}{{\varvec{\varSigma }}}^2{{\varvec{\varPhi }}}^{\mathsf {T}}= \sum _{k=1}^{\min (m,n)} \varsigma _k^2 \, {\phi }_k\otimes {\phi }_k, \end{aligned}$$
(14)
with the same eigenvalues, but eigenvectors \({\phi }_k\). The representation is based on \({\varvec{R}}^{\mathsf {T}}\) and its accompanying POD or Karhunen–Loève decomposition:
$$\begin{aligned} {\varvec{R}}^{\mathsf {T}}= \sum _{k=1}^{\min (m,n)} \varsigma _k\, {\varvec{v}}_k \otimes {\phi }_k ,\qquad {\varvec{r}}(\mu _j) = \sum _{k=1}^{\min (m,n)} \varsigma _k\, {\varvec{v}}_k \otimes {\phi }_k(\mu _j), \end{aligned}$$
(15)
where \({\phi }_k(\mu _j)=\phi _k^j\), and \({\phi }_k = [\phi _k^1,\dots ,\phi _k^j,\dots ]^{\mathsf {T}}\).
The second expression in Eq. (15) is a representation for \({\varvec{r}}(\mu )\), and that is the purpose of the whole exercise. Similar expressions may be used as approximations. It clearly exhibits the tensorial nature of the representation, which is also evident in the expressions Eqs. (12), (13), and (14). One sees here that this is just the j-th column of \({\varvec{R}}^{\mathsf {T}}\), so that with the canonical basis in \(\mathcal {Q}=\mathbb {R}^m\), \({\varvec{e}}_j^{(m)} = [\updelta _{ij}]^{\mathsf {T}}\) with the Kronecker-\(\updelta \), that expression becomes just
$$\begin{aligned} {\varvec{r}}(\mu _j) = {\varvec{R}}^{\mathsf {T}}{\varvec{e}}_j^{(m)}; \quad \text { and } \quad {\varvec{r}}\approx {\varvec{r}}_a = {\varvec{R}}^{\mathsf {T}}{\psi } \end{aligned}$$
(16)
by taking other vectors \({\psi }\) in \(\mathcal {Q}=\mathbb {R}^m\) to give weighted averages or interpolations.
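To make Eqs. (12)–(16) concrete, here is a minimal Python/NumPy sketch with an invented snapshot model; it checks the spectral decompositions Eqs. (13) and (14) and the reconstruction Eq. (16):

```python
import numpy as np

# Minimal numerical sketch of Eqs. (12)-(16); the parametric model r(mu) and
# the sample points mu_j are invented for illustration.
n, m = 50, 200
mus = np.linspace(0.0, 2.0, m)                 # assumed samples mu_j in M
x = np.linspace(0.0, 1.0, n)
R = np.array([np.sin(np.pi * (1 + mu) * x) for mu in mus])   # rows: r(mu_j)^T

Phi, sigma, Vt = np.linalg.svd(R, full_matrices=False)       # Eq. (12)
V = Vt.T
print(np.allclose(R.T @ R, V @ np.diag(sigma**2) @ V.T))     # Eq. (13)
print(np.allclose(R @ R.T, Phi @ np.diag(sigma**2) @ Phi.T)) # Eq. (14)

# Eq. (16): r(mu_j) is the j-th column of R^T, picked out by e_j
j = 7
e_j = np.zeros(m); e_j[j] = 1.0
print(np.allclose(R.T @ e_j, R[j]))
# other psi give weighted averages, e.g. the mean of two snapshots
psi = np.zeros(m); psi[[3, 4]] = 0.5
print(np.allclose(R.T @ psi, 0.5 * (R[3] + R[4])))
```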
The general picture which emerges is that the matrix \({\varvec{R}}\) is a kind of “square root”—or more precisely factorisation—of the correlation \({\varvec{C}}={\varvec{R}}^{\mathsf {T}}{\varvec{R}}\), and that the left part of this factorisation is used for reconstruction resp. representation. In any other factorisation like
$$\begin{aligned} {\varvec{C}} = {\varvec{B}}^{\mathsf {T}}{\varvec{B}}, \quad \text { with } \quad {\varvec{B}}:\mathcal {U}\rightarrow \mathcal {H}, \end{aligned}$$
(17)
where \({\varvec{B}}\) maps into some other space \(\mathcal {H}\), the map \({\varvec{B}}\) will necessarily have essentially the same singular values \(\varsigma _k\) and right singular vectors \({\varvec{v}}_k\) as \({\varvec{R}}\), and can now be used to have a representation or reconstruction of \({\varvec{r}}\) on \(\mathcal {H}\) via
$$\begin{aligned} {\varvec{r}} \approx {\varvec{B}}^{\mathsf {T}}{\varvec{h}} \quad \text { for some } \quad {\varvec{h}}\in \mathcal {H}. \end{aligned}$$
(18)
A popular choice is to use the Cholesky factorisation \({\varvec{C}}={\varvec{L}}{\varvec{L}}^{\mathsf {T}}\) of the correlation into a lower triangular matrix \({\varvec{L}}\) and its transpose, and then take \({\varvec{B}}^{\mathsf {T}}={\varvec{L}}\) for the reconstruction.
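A minimal sketch of this reconstruction route, assuming an invented snapshot matrix whose correlation is positive definite:

```python
import numpy as np

# Sketch of reconstruction from a Cholesky factor (Eqs. (17)/(18)); the snapshot
# matrix R is invented for illustration.
rng = np.random.default_rng(0)
m, n = 200, 30
R = rng.standard_normal((m, n))           # stand-in for the snapshots r(mu_j)^T
C = R.T @ R                               # correlation, here positive definite
L = np.linalg.cholesky(C)                 # C = L L^T with L lower triangular
B_T = L                                   # take B^T = L as reconstruction map

# reconstruct one snapshot r(mu_j): solve B^T h = r(mu_j) for h, cf. Eq. (18)
r_j = R[7]
h = np.linalg.solve(B_T, r_j)
print(np.allclose(B_T @ h, r_j))          # True
```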
As we have introduced the correlation’s spectral factorisation in Eq. (13), some other factorisations come to mind, although they may be mostly of theoretical value:
$$\begin{aligned} {\varvec{C}} = {\varvec{B}}^{\mathsf {T}}{\varvec{B}} = ({\varvec{V}}{{\varvec{\varSigma }}})({\varvec{V}}{{\varvec{\varSigma }}})^{\mathsf {T}}= ({\varvec{V}}{{\varvec{\varSigma }}}{\varvec{V}}^{\mathsf {T}})({\varvec{V}}{{\varvec{\varSigma }}}{\varvec{V}}^{\mathsf {T}})^{\mathsf {T}}, \end{aligned}$$
(19)
where then the reconstruction map is \({\varvec{B}}^{\mathsf {T}}= ({\varvec{V}}{{\varvec{\varSigma }}})\) or \({\varvec{B}}^{\mathsf {T}}= ({\varvec{V}}{{\varvec{\varSigma }}}{\varvec{V}}^{\mathsf {T}})\). Obviously, in the second case the reconstruction map is symmetric \({\varvec{B}}^{\mathsf {T}}={\varvec{B}}={\varvec{C}}^{1/2}\), and is actually the true square root of the correlation \({\varvec{C}}\).
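The symmetric square-root factor can be computed from the eigendecomposition; a short sketch with an invented positive definite \({\varvec{C}}\):

```python
import numpy as np

# The symmetric factorisation B = C^{1/2} = V Sigma V^T from Eq. (19), computed
# via the eigendecomposition of an invented positive (semi-)definite C.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 40))
C = A.T @ A                                # invented correlation matrix
lam, V = np.linalg.eigh(C)                 # C = V diag(lam) V^T, lam >= 0
C_half = (V * np.sqrt(np.clip(lam, 0, None))) @ V.T   # clip round-off negatives
print(np.allclose(C_half @ C_half, C))     # True: the true square root of C
```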
Other factorisations can come from looking at the companion \({\varvec{C}}_{\mathcal {Q}}\) in Eq. (14). Any factorisation \({\varvec{F}}:\mathcal {Z}\rightarrow \mathcal {Q}\) or approximate factorisation \({\varvec{F}}_a\) of
$$\begin{aligned} {\varvec{C}}_{\mathcal {Q}} = {\varvec{F}}{\varvec{F}}^{\mathsf {T}}\approx {\varvec{F}}_a{\varvec{F}}_a^{\mathsf {T}}\end{aligned}$$
(20)
is naturally a factorisation or approximate factorisation of the correlation
$$\begin{aligned} {\varvec{C}} = {\varvec{W}}^{\mathsf {T}}{\varvec{W}} \approx {\varvec{W}}_a^{\mathsf {T}}{\varvec{W}}_a, \quad \text { with } \quad {\varvec{W}} = {\varvec{F}}^{\mathsf {T}}{{\varvec{\varPhi }}}{\varvec{V}}^{\mathsf {T}}\text { and } {\varvec{W}}_a = {\varvec{F}}_a^{\mathsf {T}}{{\varvec{\varPhi }}}{\varvec{V}}^{\mathsf {T}}, \end{aligned}$$
(21)
where \({\varvec{V}}\) and \({{\varvec{\varPhi }}}\) contain the right resp. left singular vectors—see Eq. (12)—of the associated map \({\varvec{R}}\), i.e. the eigenvectors of the correlation \({\varvec{C}}\) in Eq. (13) resp. of its companion \({\varvec{C}}_{\mathcal {Q}}\) in Eq. (14). A new ROM representation can now be found for \({\varvec{z}}\in \mathcal {Z}\) via
$$\begin{aligned} {\varvec{r}} \approx {\varvec{r}}_a = {\varvec{W}}_a^{\mathsf {T}}{\varvec{z}} = {\varvec{V}}{{\varvec{\varPhi }}}^{\mathsf {T}}{\varvec{F}}_a {\varvec{z}}. \end{aligned}$$
(22)
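A short numerical check of Eqs. (20)–(22), taking for illustration the exact factor \({\varvec{F}}={{\varvec{\varPhi }}}{{\varvec{\varSigma }}}\) of an invented snapshot matrix:

```python
import numpy as np

# Sketch of Eqs. (20)-(22): a factorisation of the companion C_Q = R R^T yields
# a factorisation of C and a new representation.  R is invented; as factor F we
# simply take F = Phi Sigma from the SVD, so that C_Q = F F^T holds exactly.
rng = np.random.default_rng(2)
m, n = 100, 20
R = rng.standard_normal((m, n))
Phi, sigma, Vt = np.linalg.svd(R, full_matrices=False)
V = Vt.T
F = Phi * sigma                            # C_Q = F F^T, Eq. (20)
print(np.allclose(R @ R.T, F @ F.T))       # True

W_T = V @ Phi.T @ F                        # W^T = V Phi^T F, Eqs. (21)/(22)
e_7 = np.zeros(m); e_7[7] = 1.0
z = Phi.T @ e_7                            # coordinates z in Z for snapshot 7
print(np.allclose(W_T @ z, R[7]))          # True: r(mu_7) = W^T z
```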
One last observation here is important: the expressions for \({\varvec{r}}\) resp. one of its ROMs \({\varvec{r}}_a\) are linear in the newly introduced parameters or “co-ordinates”—\({\phi }_k\) in Eq. (15), \({\psi }\) in Eq. (16), \({\varvec{h}}\) in Eqs. (18) and (25), and \({\varvec{z}}\) in Eq. (22)—which is an important requirement in many numerical methods.
Reduced order models—ROMs
As has become clear by now, and was mentioned before, approximations or ROMs \({\varvec{r}}_a(\mu )\) to the full model \({\varvec{r}}(\mu ) \approx {\varvec{r}}_a(\mu )\) produce associated maps \({\varvec{R}}_a\), which are approximate factorisations of the correlation:
$$\begin{aligned} {\varvec{C}} \approx {\varvec{R}}_a^{\mathsf {T}}{\varvec{R}}_a . \end{aligned}$$
This introduces different ways of judging how good an approximation is. If one looks at the difference between the full model \({\varvec{r}}(\mu )\) and its approximation \({\varvec{r}}_a(\mu )\) as a residual, and computes weighted versions of it
$$\begin{aligned} \langle {\varvec{r}}(\cdot ) - {\varvec{r}}_a(\cdot ), u \rangle _{\mathcal {U}} = ({\varvec{R}}-{\varvec{R}}_a) {\varvec{u}} = {\varvec{R}} {\varvec{u}} - {\varvec{R}}_a {\varvec{u}} , \end{aligned}$$
(23)
then this is just the difference of the linear maps, \({\varvec{R}}-{\varvec{R}}_a\), applied to the weighting vector \({\varvec{u}}\). In Eq. (15) it was shown that \({\varvec{r}}(\cdot ) = \sum _{k=1}^{\min (m,n)} \varsigma _k\, {\varvec{v}}_k \otimes {\phi }_k(\cdot )\) is a representation. As usual, one may now approximate such an expression by leaving out terms with small or vanishing singular values, say using only \(\varsigma _1, \dots ,\varsigma _\ell \), getting an approximation of rank \(\ell \)—this also means that the associated linear map \({\varvec{R}}_a\) in Eq. (15) has rank \(\ell \). As is well known [26], this is the best \(\ell \)-term approximation in the norms of \(\mathcal {U}\) and \(\mathcal {Q}\). But from Eq. (23) one may gather that the error can also be described through the difference \({\varvec{R}}-{\varvec{R}}_a\). As error measure one may take the norm of that difference; depending on which norm one chooses, the error of this example approximation is then \(\varsigma _{\ell +1}\) in the operator norm, \(\sum _{k=\ell +1}^{\min (m,n)}\varsigma _k\) in the trace- resp. nuclear norm, or \(\sqrt{\sum _{k=\ell +1}^{\min (m,n)}\varsigma _k^2}\) in the Frobenius- resp. Hilbert–Schmidt norm.
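These three norm identities are easy to verify numerically; a sketch with an invented snapshot matrix:

```python
import numpy as np

# Truncation error of the rank-ell POD approximation in the three norms named
# above, checked on an invented noisy low-rank snapshot matrix.
rng = np.random.default_rng(3)
m, n, ell = 100, 30, 5
R = rng.standard_normal((m, 8)) @ rng.standard_normal((8, n))  # low rank part
R += 1e-3 * rng.standard_normal((m, n))                        # plus noise
Phi, sigma, Vt = np.linalg.svd(R, full_matrices=False)
R_a = (Phi[:, :ell] * sigma[:ell]) @ Vt[:ell, :]               # rank-ell POD
E, tail = R - R_a, sigma[ell:]
print(np.isclose(np.linalg.norm(E, 2),     tail[0]))                  # operator
print(np.isclose(np.linalg.norm(E, 'nuc'), tail.sum()))               # nuclear
print(np.isclose(np.linalg.norm(E, 'fro'), np.sqrt((tail**2).sum()))) # Frobenius
```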
On the other hand, different approximations or ROMs can now be obtained by starting with an approximate factorisation
$$\begin{aligned} {\varvec{C}} \approx {\varvec{B}}_a^{\mathsf {T}}{\varvec{B}}_a, \end{aligned}$$
(24)
and introducing a ROM via
$$\begin{aligned} {\varvec{r}} \approx {\varvec{r}}_a = {\varvec{B}}_a^{\mathsf {T}}{\varvec{h}} . \end{aligned}$$
(25)
Such a representing linear map \({\varvec{B}}\) may, e.g. via its SVD, be written as a sum of tensor products, and approximations \({\varvec{B}}_a\) are often lower-rank expressions, directly reflected in a reduced sum for the tensor products. As will become clearer at the end of this section, the bilinear forms Eq. (10) resp. Eq. (11) can sometimes split into multi-linear forms, thus enabling the further approximation of \({\varvec{B}}_a\) through hierarchical tensor products [26].
Infinite dimensional continuation—discrete spectrum
For the cases where both \(\mathcal {U}\) and \(\mathcal {Q}\) are infinite dimensional, the operators R and C live on infinite dimensional spaces, and the spectral theory gets a bit more complicated. We shall distinguish some simple cases. After the finite dimensional resp. finite rank operators just treated in matrix form, the next simplest case is certainly the one where the associated linear map R and the correlation operator \(C=R^* R\) have a discrete spectrum, e.g. if C is compact, or a function of a compact operator, like for example its inverse. In this case the spectrum is discrete (e.g. [16]), and in the case of a compact operator the non-negative eigenvalues \(\lambda _k\) of C may be arranged as a decreasing sequence \(\infty >\lambda _1\ge \lambda _2\ge \dots \ge 0\), whose only possible accumulation point is the origin. It is not uncommon when dealing with random fields that C is a nuclear or trace-class operator, i.e. an operator which satisfies the stronger requirement \(\sum _k \lambda _k < \infty \). The spectral theorem for an operator with purely discrete spectrum takes the form
$$\begin{aligned} C = R^* R = \sum _{k=1}^\infty \lambda _k \, (v_k \otimes v_k) , \end{aligned}$$
(26)
where the eigenvectors \(\{v_k\}_k \subset \mathcal {U}\) form a CONS in \(\mathcal {U}\). Defining a new corresponding CONS \(\{s_k\}_k\) in \(\mathcal {Q}\) via \(\lambda _k^{1/2} s_k := R v_k\), one obtains the singular value decomposition of R and \(R^*\) with singular values \(\varsigma _k=\lambda _k^{1/2}\):
$$\begin{aligned} R= & {} \sum _{k=1}^\infty \varsigma _k (s_k \otimes v_k)\,; \quad \text {i.e. } \quad R(u)(\cdot ) = \sum _{k=1}^\infty \varsigma _k \langle v_k, u \rangle _{\mathcal {U}} s_k(\cdot ), \quad R^* = \sum _{k=1}^\infty \varsigma _k (v_k \otimes s_k)\,; \nonumber \\ r(\mu )= & {} \sum _{k=1}^\infty \varsigma _k \, s_k(\mu ) v_k = \sum _{k=1}^\infty s_k(\mu )\, R^* s_k, \text { as } \; R^*s_k = \varsigma _k \, v_k. \end{aligned}$$
(27)
It is not necessary to repeat in this setting of compact maps all the different factorisations considered in the preceding paragraphs, and especially their approximations. These will usually be finite dimensional, as they are made to be used for actual computations: the approximations will typically involve only finite portions of the infinite series in Eqs. (26) and (27), which means that the induced linear maps have finite rank and essentially become finite dimensional, so that the preceding paragraphs apply practically verbatim.
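As a bridge between Eq. (27) and the matrix setting, the following sketch discretises the \(\mathcal {Q}\)-inner product by quadrature (with assumed uniform weights and an invented model) and recovers \(\mathcal {Q}\)-orthonormal singular functions \(s_k\) from a weighted SVD:

```python
import numpy as np

# Discrete sketch of Eq. (27) when the Q-inner product comes from a measure:
# with quadrature weights w_j for varpi, the SVD of diag(sqrt(w)) R yields the
# singular values sigma_k and Q-orthonormal functions s_k.
n, m = 40, 300
mus = np.linspace(0.0, 1.0, m)
x = np.linspace(0.0, 1.0, n)
R = np.array([np.exp(-5 * (x - mu)**2) for mu in mus])   # invented r(mu_j)^T
w = np.full(m, 1.0 / m)                                  # assumed weights

Phi_w, sigma, Vt = np.linalg.svd(np.sqrt(w)[:, None] * R, full_matrices=False)
S = Phi_w / np.sqrt(w)[:, None]        # values s_k(mu_j) of the CONS in Q
# Q-orthonormality: sum_j w_j s_k(mu_j) s_l(mu_j) = delta_kl
print(np.allclose((S * w[:, None]).T @ S, np.eye(S.shape[1])))
# Eq. (27): r(mu_j) = sum_k sigma_k s_k(mu_j) v_k
print(np.allclose((S[7] * sigma) @ Vt, R[7]))
```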
But one consideration is worth following up further. In infinite dimensional Hilbert spaces, self-adjoint operators may have a continuous spectrum, e.g. [16]; this is usually the case when homogeneous random fields or stationary stochastic processes have to be represented. This means that the expressions developed for purely discrete spectra in Eqs. (26) and (27) are not general enough. These expressions are really generalisations of the last equalities in Eqs. (13) and (12); but it is possible to give meaning to the matrix equalities in those equations in a way which simultaneously covers the case of a continuous spectrum.
In infinite dimensions—non-discrete spectrum
To this end we introduce the so-called multiplication operator: Let \(\mathrm {L}_2(\mathcal {T})\) be the usual Hilbert space on some locally compact measure space \(\mathcal {T}\), and let \(\gamma \in \mathrm {L}_\infty (\mathcal {T})\) be an essentially bounded function. Then the map
$$\begin{aligned} M_{\gamma }:\mathrm {L}_2(\mathcal {T})\rightarrow \mathrm {L}_2(\mathcal {T}); \qquad M_{\gamma }:\xi (t) \rightarrow \gamma (t)\xi (t) \end{aligned}$$
for \(\xi \in \mathrm {L}_2(\mathcal {T})\) is a bounded operator \(M_{\gamma }\in \mathscr {L}(\mathrm {L}_2(\mathcal {T}))\). Such a multiplication operator is the direct analogue of a diagonal matrix in finite dimensions.
Using such a multiplication operator, one may introduce a formulation of the spectral decomposition different from Eq. (26), which does not require C to be compact [16]; C resp. R do not even have to be continuous resp. bounded:
$$\begin{aligned} C = R^* R = V M_{\gamma } V^*, \end{aligned}$$
(28)
where \(V:\mathrm {L}_2(\mathcal {T})\rightarrow \mathcal {U}\) is unitary between some \(\mathrm {L}_2(\mathcal {T})\) on a measure space \(\mathcal {T}\) and \(\mathcal {U}\). In case C is continuous resp. bounded, one has \(\gamma \in \mathrm {L}_\infty (\mathcal {T})\). As C is positive, the function \(\gamma \) is non-negative (\(\gamma (t) \ge 0\) a.e. for \(t\in \mathcal {T}\)). This covers the previous case of operators with purely discrete spectrum if the function \(\gamma \) is a step function and takes only a discrete (countable) set of values—the eigenvalues. This theorem is actually quite well known in the special case that C is the correlation operator of a stationary stochastic process—an integral operator where the kernel is the correlation function; in this case V is the Fourier transform, and \(\gamma \) is known as the power spectrum.
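A discrete analogue of this last situation is a circulant correlation matrix on a periodic grid, which the discrete Fourier transform diagonalises; the correlation function below is invented for illustration:

```python
import numpy as np

# Discrete analogue of Eq. (28) for a stationary process on a periodic grid: a
# circulant correlation matrix is diagonalised by the (unitary) DFT, and the
# multiplier gamma is the power spectrum.
n = 128
d = np.minimum(np.arange(n), n - np.arange(n))   # periodic distance
c = np.exp(-d / 10.0)                            # assumed correlation function
C = np.array([np.roll(c, k) for k in range(n)])  # circulant (and symmetric) C
gamma = np.fft.fft(c).real                       # power spectrum = multiplier

# check C u = V M_gamma V^* u, with V realised by the FFT
u = np.random.default_rng(0).standard_normal(n)
print(np.allclose(C @ u, np.fft.ifft(gamma * np.fft.fft(u)).real))
print(gamma.min() >= 0)                          # gamma >= 0: C is positive
```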
General factorisations
To investigate the analogues of further factorisations of R, \(C=R^* R\), and its companion \(C_{\mathcal {Q}} = R R^*\), we need the SVD of R and \(R^*\). These are derived generally in the same manner as in the finite dimensional case, from the spectral factorisation of C in Eq. (28) and a corresponding one for its companion
$$\begin{aligned} C_{\mathcal {Q}} = R R^* = \varPhi M_{\gamma } \varPhi ^* \end{aligned}$$
(29)
with a unitary \(\varPhi :\mathrm {L}_2(\mathcal {T}_*)\rightarrow \mathcal {Q}\) between some \(\mathrm {L}_2(\mathcal {T}_*)\) on a measure space \(\mathcal {T}_*\) and \(\mathcal {Q}\). Here in Eq. (29), and in Eq. (28), the multiplication operator \(M_{\gamma }\) plays the role of the diagonal matrix \({{\varvec{\varSigma }}}^2\) in Eqs. (13) and (14). For the SVD of R one needs the square root of \(M_{\gamma }\), and as \(\gamma \) is non-negative, this is simply given by \(M_{\gamma }^{1/2} = M_{\sqrt{\gamma }}\), i.e. multiplication by \(\sqrt{\gamma }\). Hence the SVD of R and \(R^*\) is given by
$$\begin{aligned} R = \varPhi M_{\sqrt{\gamma }} V^*,\quad R^* = V M_{\sqrt{\gamma }} \varPhi ^*. \end{aligned}$$
(30)
These are all examples of a general factorisation \(C = B^* B\), where \(B:\mathcal {U}\rightarrow \mathcal {H}\) is a map to a Hilbert space \(\mathcal {H}\) with all the properties demanded from R—see the beginning of this section. It can be shown [38] that any two such factorisations \(B_1:\mathcal {U}\rightarrow \mathcal {H}_1\) and \(B_2:\mathcal {U}\rightarrow \mathcal {H}_2\) with \(C=B_1^*B_1=B_2^*B_2\) are unitarily equivalent in that there is a unitary map \(X_{21}:\mathcal {H}_1\rightarrow \mathcal {H}_2\) such that \(B_2 = X_{21} B_1\). Equivalently, each such factorisation is unitarily equivalent to R, i.e. there is a unitary \(X:\mathcal {H}\rightarrow \mathcal {Q}\) such that \(R= X B\).
Analogues of the factorisations considered in Eq. (19) are
$$\begin{aligned} C = B^* B = (V M_{\sqrt{\gamma }})(V M_{\sqrt{\gamma }})^* = (V M_{\sqrt{\gamma }} V^*)(V M_{\sqrt{\gamma }} V^*)^*, \end{aligned}$$
(31)
where again \(C^{1/2} = V M_{\sqrt{\gamma }}V^*\) is the square root of C.
And just as in the case of the factorisations of \({\varvec{C}}_{\mathcal {Q}}\) considered in Eq. (20) and the resulting factorisation of \({\varvec{C}}\) in Eq. (21), it is also here possible to consider factorisations of \(C_{\mathcal {Q}}\) in Eq. (29), such as
$$\begin{aligned} C_{\mathcal {Q}} = F F^* \approx F_a F_a^*, \quad \text { with} \quad F, F_a :\mathcal {E} \rightarrow \mathcal {Q} \end{aligned}$$
(32)
with some Hilbert space \(\mathcal {E}\); these lead again to factorisations of
$$\begin{aligned} C = W^* W \approx W_a^* W_a, \quad \text { with } \quad W = F^*\varPhi V^* \text { and } W_a = F_a^*\varPhi V^* , \end{aligned}$$
(33)
and to a representation on the space \(\mathcal {E}\), with the representing linear maps given by \(W^* = V \varPhi ^* F\) resp. \(W_a^* = V \varPhi ^* F_a\).
Coming back to the situation where C has a purely discrete spectrum and a CONS of eigenvectors \(\{v_m\}_m\) in \(\mathcal {U}\), the map B from the decomposition \(C=B^* B\) can be used to define a CONS \(\{h_m\}_m\) in \(\mathcal {H}\): \(h_m := B C^{-1/2} v_m\), which is an eigenvector CONS of the operator \(C_{\mathcal {H}} := B B^*:\mathcal {H}\rightarrow \mathcal {H}\), with \(C_{\mathcal {H}} h_m = \lambda _m h_m\), see [38]. From this follows an SVD of B and \(B^*\) in a manner analogous to Eq. (27). The main result is [38] that in the case of a nuclear C with necessarily purely discrete spectrum every factorisation leads to a separated representation in terms of a series, and vice versa. In case C is not nuclear, the representation of a “parametric object” via a linear map is actually more general [35, 38] and allows the rigorous and uniform treatment also of “idealised” objects, like for example Gaussian white noise on a Hilbert space.
In this instance of a discrete spectrum and a nuclear C and hence nuclear \(C_{\mathcal {Q}}\), the abstract equation \(C_{\mathcal {Q}} = \sum _k \lambda _k s_k \otimes s_k\) can be written in a more familiar form in the case when the inner product on \(\mathcal {Q}\) is given by a measure \(\varpi \) on \(\mathcal {M}\). It becomes for all \(\varphi , \psi \in \mathcal {Q}\):
$$\begin{aligned} \langle C_{\mathcal {Q}}\varphi , \psi \rangle _{\mathcal {Q}}&= \sum _k \lambda _k \langle \varphi , s_k \rangle _{\mathcal {Q}} \langle s_k, \psi \rangle _{\mathcal {Q}} = \langle R^* \varphi , R^* \psi \rangle _{\mathcal {U}} \\&= \iint \limits _{\mathcal {M}\times \mathcal {M}} \varphi (\mu _1) \langle r(\mu _1), r(\mu _2) \rangle _{\mathcal {U}} \psi (\mu _2)\; \varpi ({\mathrm {d}}\mu _1) \varpi ({\mathrm {d}}\mu _2) \\&= \iint \limits _{\mathcal {M}\times \mathcal {M}} \varphi (\mu _1) \varkappa (\mu _1, \mu _2) \psi (\mu _2)\; \varpi ({\mathrm {d}}\mu _1) \varpi ({\mathrm {d}}\mu _2) \\&= \iint \limits _{\mathcal {M}\times \mathcal {M}} \varphi (\mu _1) \left( \sum _k \lambda _k s_k(\mu _1) s_k(\mu _2) \right) \psi (\mu _2)\; \varpi ({\mathrm {d}}\mu _1) \varpi ({\mathrm {d}}\mu _2). \end{aligned}$$
This shows that \(C_{\mathcal {Q}}\) is really a Fredholm integral operator, and its spectral decomposition is nothing but the familiar theorem of Mercer [15] for the kernel
$$\begin{aligned} \varkappa (\mu _1, \mu _2) = \sum _k \lambda _k s_k(\mu _1) s_k(\mu _2) . \end{aligned}$$
(34)
Factorisations of \(C_{\mathcal {Q}}\) are then usually expressed as factorisations of the kernel \(\varkappa (\mu _1, \mu _2)\), which may involve integral transforms already envisioned in [30]—see also the English translation [31]:
$$\begin{aligned} \varkappa (\mu _1,\mu _2) = \int _{\mathcal {Y}} \rho (\mu _1,y) \rho (\mu _2,y)\, \mathsf {n}({\mathrm {d}}y), \end{aligned}$$
where the “factors” \(\rho (\mu , y)\) are measurable functions on the measure space \((\mathcal {Y},\mathsf {n})\). This is the classical analogue of the general “kernel theorem” [24].
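A Nyström-type discretisation makes Mercer's theorem Eq. (34) easy to check numerically; kernel, grid, and quadrature weights below are invented:

```python
import numpy as np

# Nystroem-type sketch of Mercer's theorem, Eq. (34): discretise the kernel
# kappa on a quadrature grid for (M, varpi), solve the symmetrised
# eigenproblem, and check that the truncated Mercer series reproduces kappa.
m, ell = 300, 30
mu = np.linspace(0.0, 1.0, m)
w = np.full(m, 1.0 / m)                          # assumed quadrature weights
K = np.exp(-np.abs(mu[:, None] - mu[None, :]))   # assumed kernel kappa
sw = np.sqrt(w)
lam, U = np.linalg.eigh(sw[:, None] * K * sw[None, :])
lam, U = lam[::-1], U[:, ::-1]                   # decreasing eigenvalues
S = U / sw[:, None]                              # s_k(mu_j), Q-orthonormal
K_ell = (S[:, :ell] * lam[:ell]) @ S[:, :ell].T  # truncated Mercer series
print(np.linalg.norm(K - K_ell) / np.linalg.norm(K))  # small relative error
```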
Connections to tensor products
Although not as obvious as for the case of a discrete spectrum in Eqs. (12), (13), and (14), resp. Eqs. (26) and (27), such a connection is also possible in the general case of a non-discrete spectrum. But as the spectral values in the continuous part have no corresponding eigenvectors, one has to use the concept of generalised eigenvectors [16, 22, 24, 38]. Then it is possible to formulate the spectral theorem in the following way:
$$\begin{aligned} \langle C u, w \rangle _{\mathcal {U}}&= \int _{\mathbb {R}^+} \lambda \, \langle u, v_\lambda \rangle \langle w, v_\lambda \rangle \, \nu ({\mathrm {d}}\lambda ) , \quad \text { or in a weak sense} \end{aligned}$$
(35)
$$\begin{aligned} C&= \int _{\mathbb {R}^+} \lambda \, (v_\lambda \otimes v_\lambda ) \, \nu ({\mathrm {d}}\lambda ), \end{aligned}$$
(36)
with the spectral measure \(\nu \) on \(\mathbb {R}^+\). Observe the analogy, especially of Eq. (36), with Eq. (26), where the sum has now been generalised to an integral to account for the continuous spectrum. Equation (35) is for the case of a simple spectrum; in the more general case of spectral multiplicity larger than one, the Hilbert space \(\mathcal {U}=\bigoplus _m \mathcal {U}_m\) can be written as an orthogonal sum [16, 22, 24] of Hilbert subspaces \(\mathcal {U}_m\), each invariant under the operator C, on which an expression like Eq. (35) holds, and on which the spectrum is simple. For the sake of brevity we shall only consider the case of a simple spectrum now, and avoid writing the sums over m. The difficulty in going from Eq. (26) to Eq. (36) is that the values \(\lambda \) in the truly continuous spectrum have no corresponding eigenvector, i.e. \(v_\lambda \notin \mathcal {U}\); it has to be found in a generally larger space. The possibility of writing an expression like Eq. (35) rests on the concept of a “rigged” resp. “equipped” Hilbert space or Gel’fand triple. This means that one can find [24] a nuclear space \(\mathcal {K}\hookrightarrow \mathcal {U}\), densely embedded in the Hilbert space \(\mathcal {U}\), such that Eq. (35) holds for all \(u, w \in \mathcal {K}\). This also means that the generalised eigenvectors should be seen as linear functionals on \(\mathcal {K}\). As the subspace \(\mathcal {K}\) is densely embedded in \(\mathcal {U}\), it also holds that \(\mathcal {U}\hookrightarrow \mathcal {K}^*\) is densely embedded in the topological dual \(\mathcal {K}^*\) of \(\mathcal {K}\), i.e. one has the Gel’fand triple
$$\begin{aligned} \mathcal {K}\hookrightarrow \mathcal {U} \hookrightarrow \mathcal {K}^* . \end{aligned}$$
(37)
The generalised eigenvectors can now be seen as elements of the dual, \(v_\lambda \in \mathcal {K}^*\), where the generalised eigenvalue equation \(C v_\lambda = \lambda v_\lambda \) holds after an appropriate extension of C.
If expressions such as Eqs. (35) or (36) have to be approximated numerically, it becomes necessary to evaluate the integral in an approximate way. The integral really extends only over the spectrum \(\sigma (C)\) of C, as outside of \(\sigma (C)\) the spectral measure \(\nu \) vanishes. Obviously, one would first split the spectrum \(\sigma (C) = \sigma _d(C) \cup \sigma _c(C)\) into a discrete part \(\sigma _d(C)\) and a continuous part \(\sigma _c(C)\). On the discrete part, the integral is just a sum as shown before. On the continuous part, the integral has to be evaluated by a quadrature formula. Choosing quadrature points \(\lambda _z\in \sigma _c(C)\) and appropriate integration weights \(w_z\in \mathbb {R}\), the integral can be approximated by
$$\begin{aligned} \int _{\sigma _c(C)} \lambda \, \langle u, v_\lambda \rangle \langle w, v_\lambda \rangle \, \nu ({\mathrm {d}}\lambda ) \approx \sum _z w_z \lambda _z \langle u, v_{\lambda _z} \rangle \langle w, v_{\lambda _z} \rangle , \end{aligned}$$
an expression very similar to the ones used in case of discrete spectra.
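The following sketch applies this quadrature idea to the stationary case mentioned above, where the generalised eigenvectors are Fourier modes \(v_\omega (x) = \mathrm {e}^{\mathrm {i}\omega x}\) (not elements of \(\mathrm {L}_2\)) and an invented spectral density plays the role of \(\lambda \, \nu ({\mathrm {d}}\lambda )\):

```python
import numpy as np

# Quadrature sketch of the bilinear form in Eq. (35) for a stationary
# covariance operator on the line; test functions and spectral density are
# invented, the omega-grid provides the quadrature points and weights.
x = np.linspace(-10.0, 10.0, 400); dx = x[1] - x[0]
u = np.exp(-x**2)                        # test functions from the dense
v = np.exp(-(x - 1.0)**2)                # subspace K (here: Gaussians)
om = np.linspace(-20.0, 20.0, 1000); dom = om[1] - om[0]
S = 1.0 / (1.0 + om**2)                  # assumed spectral density

E = np.exp(-1j * om[:, None] * x[None, :])       # <., v_omega> by quadrature
Fu, Fv = (E * u).sum(1) * dx, (E * v).sum(1) * dx
Cuv = (S * Fu * np.conj(Fv)).sum().real * dom / (2 * np.pi)
print(Cuv)   # approximates <C u, v>_U; refining the omega-grid converges
```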
Further tensor products
Essentially, the constructions we have been investigating can be seen as elements of the tensor product \(\mathcal {U}\otimes \mathcal {Q}\), or extensions thereof as in the preceding paragraph. Often one or both of the spaces \(\mathcal {U}\) and \(\mathcal {Q}\) can be further factored into tensor products, say without loss of generality \(\mathcal {Q} = \mathcal {Q}_I \otimes \mathcal {Q}_{II}\). This is for example the case for the white-noise modelling of random fields [29, 33, 35, 36], where one has \(\mathcal {Q}= \bigotimes _{m=1}^\infty \mathcal {S}_m\). We just want to indicate how this structure can be used for further approximation.
It essentially means that the whole foregoing is applied, instead of on \(\mathcal {U}\otimes \mathcal {Q}\), on the tensor product \(\mathcal {Q}_I\otimes \mathcal {Q}_{II}\). Combined with the upper level decomposition of \(\mathcal {U}\otimes \mathcal {Q}\), one sees that this becomes a decomposition of \(\mathcal {U}\otimes (\mathcal {Q}_I\otimes \mathcal {Q}_{II})\). The bilinear forms Eqs. (10) and (11) can thus be written as tri-linear forms, making a direct connection to tensor products and multi-linear forms [26]. As in the just cited example of random fields [35, 36], this can often be extended to higher order tensor products in a tree-like manner—by splitting \(\mathcal {U}\), or \(\mathcal {Q}_I\) resp. \(\mathcal {Q}_{II}\). This leads to a hierarchical structure encoded in a binary tree, with the top product \(\mathcal {U}\otimes \mathcal {Q}\) as the root of the tree, and the individual factors as its leaves. The higher the order of the tensor product, the better it is possible to exploit dependencies in low-rank formats [25, 26]. This has recently also been pointed out in the tight connections between deep neural networks [14, 32] and such tensor decompositions, which come in different formats or representations [26]. The indicated binary tree leads to what is known as the hierarchical Tucker- or HT-format; but obviously the multi-factor tensor product can also be split in a non-binary fashion, leading to more general tree-based tensor formats [19]. A completely flat tree structure with only root and leaves corresponds to the well-known canonical polyadic- or CP-decomposition or format; the original proper generalised decomposition (PGD) falls into this category [2, 11,12,13, 17].
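As an indication of how such a splitting is exploited computationally, the following sketch compresses an invented third-order tensor—a snapshot matrix reshaped along an assumed product structure \(\mathcal {Q} = \mathcal {Q}_I \otimes \mathcal {Q}_{II}\)—by two successive truncated SVDs, i.e. a tensor-train-like tree format:

```python
import numpy as np

# Minimal sketch of a further splitting Q = Q_I (x) Q_II: reshape an invented
# (m1*m2) x n snapshot matrix into a third-order tensor and compress it by two
# successive truncated SVDs.
m1, m2, n = 20, 10, 50
rng = np.random.default_rng(1)
T = (rng.standard_normal((m1, 4)) @ rng.standard_normal((4, m2 * n))
     ).reshape(m1, m2, n)                         # invented low-rank data

U1, s1, V1t = np.linalg.svd(T.reshape(m1, -1), full_matrices=False)
r1 = int((s1 > 1e-12 * s1[0]).sum())              # numerical rank, 1st split
G1 = U1[:, :r1]                                   # leaf factor for Q_I
rest = (s1[:r1, None] * V1t[:r1]).reshape(r1 * m2, n)
U2, s2, V2t = np.linalg.svd(rest, full_matrices=False)
r2 = int((s2 > 1e-12 * s2[0]).sum())              # numerical rank, 2nd split
G2 = U2[:, :r2].reshape(r1, m2, r2)               # core factor for Q_II
G3 = s2[:r2, None] * V2t[:r2]                     # leaf factor for U

T_rec = np.einsum('ia,ajb,bk->ijk', G1, G2, G3)   # contract the tree
print(np.allclose(T, T_rec))                      # True: exact here
```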