 Research Article
 Open Access
Estimation of error in observables of coarse-grained models of atomic systems
Advanced Modeling and Simulation in Engineering Sciences volume 2, Article number: 5 (2015)
Abstract
Background
The use of coarse-grained approximations of atomic systems is the most common method of constructing reduced-order models in computational science. However, the issue of central importance in developing these models is the accuracy with which they approximate key features of the atomistic system. Many methods have been proposed to calibrate coarse-grained models so that they qualitatively mimic the atomic systems, but these are often based on heuristic arguments.
Methods
A general framework for deriving a posteriori estimates of modeling error in coarse-grained models of key observables in atomistic systems is presented. Such estimates provide a new tool for model validation analysis. The connection of error estimates with the relative information entropy of observables and model predictions is explained for so-called misspecified models. The relationship between model plausibilities and the Kullback-Leibler divergence between the true parameters and model predictions is summarized in several theorems.
Results
Numerical examples are presented involving a family of coarse-grained models of a polyethylene chain of united-atom monomers. The numerical results suggest that the proposed methods of error estimation can be very good indicators of the error inherent in coarse-grained models of observables in atomistic systems. In addition, new theorems relating the Kullback-Leibler divergence between model predictions and observations to measures of model plausibility are presented.
Conclusions
A formal structure for estimating errors produced by coarse-graining atomistic models is presented. Numerical examples confirm that the estimates are in agreement with exact errors for a simple class of materials. Errors measured in the D _{ KL } divergence can be related to computable model plausibilities. The results should provide a powerful framework for assessing the validity and accuracy of coarse-grained models.
Background
Coarse-grained reduced-order models
The most common method of constructing reduced-order models in all of computational science involves the use of coarse-grained models of atomic systems, whereby systems of atoms are aggregated into “beads”, “super atoms”, or molecules to reduce the number of degrees of freedom and to lengthen the time scales over which the evolution of events is simulated.
The use of coarse-grained (CG) approximations has been prevalent in molecular dynamics (MD) simulations for many decades. Comprehensive reviews of a large segment of the literature on CG models were recently given by Noid [1] and Li et al. [2], and an application to semiconductor nanomanufacturing is discussed in Farrell et al. [3]. The issue of central importance in developing CG models is the accuracy with which they approximate key features of the atomistic system. Many methods have been proposed to calibrate CG models so that they qualitatively mimic the all-atom (AA) systems, but these are often based on heuristic arguments.
In this paper, we develop a posteriori estimates of error in CG approximations of observables in the AA system. We focus on standard molecular dynamics models of microcanonical-ensemble (NVE) thermodynamics, and we call upon the theory of model adaptivity and error estimation laid down in [4] and [5]. In this particular setting, new estimates are also obtained when the information entropy of Shannon [6] is used as a quantity of interest. This leads to methods for estimating CG-model parameters that involve the Kullback-Leibler divergence between probability densities of observables in the AA and CG systems.
In the final Results and discussion section of this presentation, we review several statistical properties of parametric models, including asymptotic properties of misspecified models and generalizations of the Bernstein-von Mises theorem advanced by Kleijn and van der Vaart [7]. There, the fundamental role of the Kullback-Leibler distance (the D _{ KL }) between the true probability distribution and the observations accessible by the model is reviewed. We present results in the form of theorems that relate the D _{ KL } to measures of model plausibility that arise from Bayesian approaches to model selection. The relationships of the a posteriori estimates to the statistical interpretations are summarized in concluding remarks.
Preliminaries, conventions and notations
We generally approach the problem of developing computer models of large atomic systems through the use of any of several hardened molecular dynamics (MD) codes or through equivalent Monte Carlo approximations invoking the ergodic hypothesis. For a system of n atoms, the Hamiltonian is
where r ^{n}={r _{1},r _{2},…,r _{ n }} is the set of atomic coordinate vectors, p ^{n}={p _{1},p _{2},…,p _{ n }} is the set of particle momentum vectors, m _{ α } is the atomic mass of the α-th atom, and u(r ^{n}) is the potential energy or interaction potential. Then (r ^{n},p ^{n}) defines a point, or microstate, in the phase space Γ _{ AA } of the all-atom (AA) model. In typical MD simulations, the potential is, for example, of the form
where
Here covalent bonds are represented by the harmonic potential (1a), changes in bond angles by (1b), torsional potentials by changes in dihedral angles (1c), Lennard-Jones nonbonded potentials by (1d), with r _{ ij }=r _{ i }−r _{ j } and r _{ c } the cutoff radius, (α,β) typically = (12,6), and Coulomb potentials between charges q _{ i } at r _{ i } and q _{ j } at r _{ j } by (1e). These forms are typical of those implemented in popular MD codes, although several other common potentials could be added. The parameters of the potential model are given by the vector of physical coefficients: {k _{ i },k _{ θ i },V _{ ji },ϕ _{ i },ε _{ ij },σ _{ ij },r _{ c },… }. In general, atomic properties and values of parameters for the full all-atom system are supplied by systems calibrated using experimental data or quantum-mechanics predictions (see, e.g. the OPLS data in [8,9]).
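As a concrete illustration of two of these terms, the following minimal NumPy sketch (our own illustrative code, not taken from any MD package; the function names and the factor-of-1/2 bond convention are assumptions) evaluates the harmonic-bond energy of form (1a) and the truncated 12-6 Lennard-Jones energy of form (1d) for a single pair of atoms:

```python
import numpy as np

def harmonic_bond(r_i, r_j, k_l, l0):
    # Harmonic bond energy, form (1a): (k_l / 2) * (|r_i - r_j| - l0)^2.
    d = np.linalg.norm(r_i - r_j)
    return 0.5 * k_l * (d - l0) ** 2

def lennard_jones(r_i, r_j, eps, sigma, r_c):
    # Truncated 12-6 Lennard-Jones pair energy, form (1d); zero beyond r_c.
    d = np.linalg.norm(r_i - r_j)
    if d >= r_c:
        return 0.0
    sr6 = (sigma / d) ** 6
    return 4.0 * eps * (sr6 ** 2 - sr6)
```

A quick sanity check: at the pair separation d = 2^{1/6}σ the Lennard-Jones term attains its minimum value −ε, and a bond at its rest length l0 contributes zero energy.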
Given the Hamiltonian (1), Hamilton’s equations of motion are:
In MD, it is assumed that the atomic system evolves according to the laws of Newtonian mechanics, so we set $\mathbf {p}_{\alpha } = m_{(\alpha)}\dot {\mathbf {r}}_{\alpha }$ , and the second Hamiltonian equation in (4) reduces to the system of equations
where repeated indices are summed throughout their range, m _{ α β }=m _{(α)} δ _{ α β } is the mass of atom α, superimposed dots indicate time derivatives, r _{ β i } is the component of r _{ β } in direction i, ∂ _{ α i }=∂/∂ r _{ α i }, u(r ^{n}(t)) is the total interatomic potential of the system given, e.g., by (2), and f _{ α i }(t) is the ith component of the applied force on atom α at time t. We add initial conditions, $\dot {r}_{\beta i}(0) = v_{\beta i}$ and $r_{\beta i}(0)=r^{0}_{\beta i}$ , where v _{ β i } and $r^{0}_{\beta i}$ , for now, are assumed to be given.
Molecular dynamical equations of the form (5) are typical of those in standard MD codes; they are numerically integrated with randomly sampled initial conditions over time intervals to approximate systems with constant energy, fixed volume, and a fixed number of particles, corresponding to so-called microcanonical ensembles. Without loss of generality, we confine this development to such thermodynamic scenarios, noting that straightforward extensions to, say, constant-temperature settings are covered by replacing (5) with appropriate “thermostat” models, such as the Langevin or Nose–Hoover formulations (see e.g. [10]). The general approach is then applicable to canonical ensembles and more general statistical thermodynamics settings.
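Although production MD codes supply their own integrators, the structure of such a time-stepping loop for equations of the form (5) can be sketched with the standard velocity-Verlet scheme. The code below is a toy implementation under our own assumptions (the `force` callable stands in for the array of forces −∂u/∂r; names and interface are hypothetical):

```python
import numpy as np

def velocity_verlet(r0, v0, masses, force, dt, n_steps):
    # Integrate m_a d2r_a/dt2 = f_a(r) with the velocity-Verlet scheme.
    # r0, v0: (n, 3) initial positions and velocities; masses: (n,).
    # force: callable returning the (n, 3) array of forces -du/dr.
    r, v = r0.copy(), v0.copy()
    a = force(r) / masses[:, None]
    traj = [r.copy()]
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt ** 2
        a_new = force(r) / masses[:, None]
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        traj.append(r.copy())
    return np.array(traj)
```

For a single unit-mass particle in a unit harmonic well, the trajectory returns to its initial position after one period 2π, which provides a simple consistency check on the scheme.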
A fundamental concept in the molecular theory of matter is that macroscopic properties actually observed in experiments, the so-called observables, must be the result of averages over a time interval τ of some phase function q(r ^{n},p ^{n}) that depends on the phase-point positions (r ^{n},p ^{n}) in phase space Γ _{ AA }, as all measurements require a finite duration. Moreover, for thermodynamic systems in equilibrium, this average, denoted 〈q〉, must be independent of the starting time t _{0} and must attain its value in the limit of an essentially infinite time duration. Thus, our goal in constructing the all-atom model is to compute observables of the form (cf. [10,11]),
Here we shall confine our attention to phase functions that depend only on the configurations of systems in thermodynamic equilibrium. Our quantities of interest are then written,
In all but the simplest applications, it is impossible to solve the dynamical system (5) owing to its enormous size. Therefore, reduced-order models must be developed. The process involves aggregating groups of atoms into beads, molecules, or “super atoms” so as to create a coarse-grained (CG) molecular model. The CG model has N coordinate vectors R ^{N}(t)={R _{1}(t),R _{2}(t),…,R _{ N }(t)}, N<n; the corresponding equations of motion are
where M _{ AB } is the CG mass matrix, ∂ _{ Ai }=∂/∂ R _{ Ai }, U(·,·) is the interaction potential energy of the CG system, θ is a vector of parameters defining the CG model, and F _{ Ai }(t) is the ith component of the applied force on bead A at time t. Initial conditions are $\dot {R}_{A i}(0) = V_{A i}$ and $R_{A i}(0)=R^{0}_{A i}$ . The unknown parameters with a potential of the form (2) are denoted, for example,
the notation, in analogy with (1), being chosen to indicate parameters of the CG model.
It is important to establish a kinematic and algebraic relation between coordinates of particles in the AA system and those in the CG system. A very large literature exists on various coarse-graining mapping schemes, and choices of the appropriate map from the AA to the CG system, or vice versa, are often based on heuristic methods (see e.g. [2]). Our general approach can be adapted to any such well-defined AA-to-CG or CG-to-AA map, but for definiteness, we describe one such family of mappings.
Let $\mathcal {J}_{A}$ be the index set of AA-coordinate labels of atoms aggregated into a single bead A with CG-coordinate vector R _{ A } emanating from the origin to a reference point labeled A within the bead (e.g. R _{ A } could be chosen to define the center of mass, $\mathbf {R}_{A} = \sum _{\alpha \in \mathcal {J}_{A}} m_{\alpha }\mathbf {r}_{\alpha }/M_{A}, M_{A} = \sum _{\alpha \in \mathcal {J}_{A}} m_{\alpha }$ ). Let a _{ A α }(t) be a vector from the reference center of bead A to the end point of AA-coordinate vector $\mathbf {r}_{\alpha }(t), \alpha \in \mathcal {J}_{A}$ . Let $G^{\cdot A}_{\alpha }$ be a component of the n×N array,
Then we have,
Here we assume that each AA coordinate vector r _{ α } belongs to only one bead identified with CG vector R _{ A }, but this is not a necessary assumption. In the case of bonded systems in which r _{ α } is associated with, say, two index sets $\mathcal {J}_{A}$ and $\mathcal {J}_{B}$ , we simply choose either $\mathcal {J}_{A}$ or $\mathcal {J}_{B}$ as the representative of r _{ α } and associate r _{ α } with only one bead to avoid double counting.
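For the center-of-mass choice of R _{ A } described above, the AA-to-CG map can be sketched as a small helper (our own illustrative code; we assume the index sets J_A partition the atom labels, consistent with the single-bead assignment just described):

```python
import numpy as np

def cg_map(r, masses, index_sets):
    # Map AA coordinates r (n, 3) to CG bead centers of mass R (N, 3).
    # index_sets: one array of atom labels J_A per bead (a partition,
    # consistent with assigning each atom to exactly one bead).
    R, M = [], []
    for J in index_sets:
        M_A = masses[J].sum()                        # bead mass M_A
        R.append((masses[J, None] * r[J]).sum(axis=0) / M_A)
        M.append(M_A)
    return np.array(R), np.array(M)
```

In this convention the Boolean array G of the text is implicit in `index_sets`: membership of atom α in J_A plays the role of the entry G of α and A.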
These kinematical conventions are illustrated in Figure 1. The AA coordinates r _{ α } of atoms assigned to molecular bead A remain with that bead throughout all possible motions of the CG system. The Boolean array $G^{\cdot A}_{\alpha }$ merely adjusts labels of the AA system to agree with labels assigned to beads in the CG system.
Returning to (7), it is clear that the CG approximation of the quantities of interest (QoI) are of the form,
where G is the AA-to-CG map defined in (11), where we denote r ^{n}=G(R ^{N}(t),θ), and where we explicitly indicate the dependence of the QoI on the CG potential parameters θ.
It is now obvious that the evolution of the CG system defined by the coordinate vectors R _{ A }(t) satisfying (8) does not satisfy the “true” equations of motion (5). Indeed, if R ^{N}(t)={R _{1}(t),R _{2}(t),…,R _{ N }(t)} is the set of N vectors satisfying (8), they constrain the motion of the AA system via (11), so upon introducing the constrained motions into (5) we obtain the residual,
where $\sum _{\beta } \omega ^{\beta }_{\alpha }=1, \sum _{\alpha \in \mathcal {J}_{A}} \mathbf {f}_{\alpha } = \mathbf {F}_{A}$ , and
On the left-hand side of (13), we have noted that the residual depends on the CG-model parameters θ, which are not explicitly presented on the right for simplicity. The AA force f _{ α } at coordinate α is a fraction $\omega ^{\beta }_{\alpha }$ of the CG force F _{ A i }. In general, as a first-order approximation, one can take the vectors a _{ A α } as time-independent constant vectors equal to their values in a given reference configuration $(\mathbf {a}_{A \alpha }(t) = \mathbf {a}_{A \alpha }(t_{0}) ; \dot {\mathbf {a}}_{A \alpha } = \ddot {\mathbf {a}}_{A \alpha }=\mathbf {0})$ . Then the approximate residual takes the form
with $f_{\alpha i} = \omega ^{\beta }_{\alpha } G^{\cdot A}_{\beta } F_{Ai}$ , and with repeated indices summed, 1≤α,β≤n,1≤A≤N.
It is possible to define a reverse or “push back” relationship that assigns to every AA coordinate vector r _{ α } the CG coordinate vector R _{ A } of the bead to which r _{ α } belongs. Given r _{ α }, set
where $G^{\cdot \alpha }_{A}$ is the transpose of $G^{\cdot A}_{\alpha }$ (no sum on α).
Thus, for each time t∈[t _{0},τ], one can select a sample ω(t) of AA coordinates $\{ \mathbf {r}^{*}_{1}(t), \mathbf {r}^{*}_{2}(t), \dots, \mathbf {r}^{*}_{n}(t) \}$ and employ (16) to generate the corresponding CG coordinates $\{ \mathbf {R}^{*}_{1}(t), \mathbf {R}^{*}_{2}(t), \dots, \mathbf {R}^{*}_{N}(t) \}$ . We use the simplified notation,
to define the image of this sample in the CG system.
Methods
Weak forms of the dynamical problem
It is convenient to recast the molecular dynamics problem into a weak or variational form. Toward this end, we introduce the Banach spaces
and the semilinear and linear forms $\mathcal {B}: \mathcal {V} \times \mathcal {V} \rightarrow \mathbb {R}, \mathcal {F}: \mathcal {V} \rightarrow \mathbb {R}$ , given by
where
The notation $\mathcal {B(\cdot ; \cdot)}$ is intended to mean that $\mathcal {B(\cdot ; \cdot)}$ is possibly nonlinear in entries to the left of the semicolon and linear in the entries to the right of it.
The problem of finding $\mathbf {r}^{n} \in \mathcal {V}$ such that
is equivalent to (5) in the sense that every solution of (5), with appropriate initial conditions, satisfies (22), and any sufficiently smooth solution of (22) satisfies (5).
The adjoint problem
Let
and
where Q is a functional on $\mathcal {V}$ , and both $\mathcal {B}'(\cdot ; \cdot)$ and Q ^{′}(·;·) are assumed to exist and be finite (i.e. $\mathcal {B}(\cdot ; \cdot)$ and Q(·) are Gateaux differentiable). Then the adjoint or dual problem associated with (22) is
Find $\mathbf {z}^{n} = \{ \mathbf {z}_{1}, \mathbf {z}_{2}, \dots, \mathbf {z}_{n} \} \in \mathcal {V}$ such that
Introducing (19) into (23) gives, after some manipulations,
where H _{ α i β j } is the Hessian,
Likewise, if $Q(\mathbf {r}^{n}) = \int _{0}^{\tau } q(\mathbf {r}^{n}(t)) \text {d}t$ , then
Note that (25) is solved “backward in time”: the forward problem (22) is first solved for r ^{n}(t), which determines the coefficients in $\mathcal {B}'(\mathbf {r}^{n}; \mathbf {z}^{n}, \mathbf {v}^{n})$ ; the adjoint solution is then marched from t=τ to t=0. The dynamical system corresponding to (25) is:
Theory of a posteriori estimation of modeling error
Let us now review the theory of a posteriori estimation of modeling error advanced in [4] and expanded in [5]. We consider an abstract variational problem of finding an element u in a topological vector space $\mathcal {V}$ such that,
where $\mathcal {B(\cdot ; \cdot)}$ is a semilinear form from $\mathcal {V} \times \mathcal {V}$ into $\mathbb {R}$ and $\mathcal {F}$ is a linear functional on $\mathcal {V}$ . Problem (30) is equivalent to the problem of finding a solution u of the problem A(u)=F in the dual space $\mathcal {V}'$ , where A is the map induced by $\mathcal {B(\cdot ; \cdot)} : \langle A(u),v \rangle = \mathcal {B} (u ; v) = \mathcal {F}(v) = \langle \mathcal {F}, v \rangle $ , 〈·,·〉 denoting duality pairing on $\mathcal {V}' \times \mathcal {V}$ . Assuming (30) is solvable for u, we wish to compute the value Q(u) of a functional $Q: \mathcal {V} \rightarrow \mathbb {R}$ representing a quantity of interest, or an observable of interest.
We assume that the semilinear form $\mathcal {B} (\cdot ; \cdot)$ and the functional Q(·) are three times Gateaux differentiable on $\mathcal {V}$ with respect to u. In particular, the following limits exist (recall (23) and (24)),
with similar definitions of higher-order derivatives, e.g. $\mathcal {B}^{\prime \prime }(u; w_{1}, w_{2},v)$ , $\mathcal {B}^{\prime \prime \prime }(u; w_{1}, w_{2}, w_{3}, v)$ , Q ^{′′}(u;w,v), Q ^{′′′}(u;v _{1},v _{2},v _{3}), etc. See [5] for details.
The adjoint problem associated with (30) and the quantity of interest Q consists of finding $z \in \mathcal {V}$ such that
Now let u _{0} be an arbitrary element selected in $\mathcal {V}$ . The residual functional (or “residuum”) associated with u _{0} is defined as the semilinear functional $\mathcal {R}: \mathcal {V} \times \mathcal {V} \rightarrow \mathbb {R}$ ,
which, for each $u_{0} \in \mathcal {V}$ , is a linear functional on $\mathcal {V}$ .
Obviously, if u _{0}=u, the solution of (30), $\mathcal {R}(u;v)=0 \;\; \forall v \in \mathcal {V}$ . Thus, $\mathcal {R}(u_{0};v)$ describes the degree to which the vector u _{0} fails to satisfy the central problem (30).
We now recall the basic theorem in [5]:
Theorem 1.
Let the semilinear form $\mathcal {B}(\cdot ; \cdot)$ in (30) and the quantity of interest Q be three times continuously Gateaux differentiable on $\mathcal {V}$ . Let u _{0} be an arbitrary element of $\mathcal {V}$ . Then the error in Q(u) produced by replacing u by u _{0} is given by:
where Δ is a remainder involving higher-order terms in e _{0}=u−u _{0} and ε _{0}=z−z _{0}, z _{0} being an approximation of z.
An explicit form of Δ is given in the appendix.
If u _{0} is not an arbitrary vector taken from $\mathcal {V}$ but is a solution of a surrogate problem approximating (30) (such as a coarse-grained model approximating an AA model), then it often happens that Δ is negligible compared to the residual. Then (34) reduces to the approximation,
This relation is the basis for many successful methods of a posteriori error estimation of both modeling error and numerical error. Whenever $\mathcal {B}(\cdot ;\cdot)$ is a bilinear form and Q(·) is linear, Δ≡0.
A posteriori estimation of error in CG approximations
The CG approximations of the “ground truth” AA system are characterized by a parametric class $\mathcal {P}(\boldsymbol {\theta })$ of molecular dynamics models, one model corresponding to each choice of the vector θ in a space Θ of parameters defining the CG intermolecular potential U(R ^{N}(t),θ). For a given value of θ, observables of interest in states of thermodynamic equilibrium of the CG system are typically generated as averages of samples of the observables taken over subintervals [t _{ k },t _{ k+1}]⊂[0,τ], for a distribution of initial conditions (see, e.g. [10]).
If we employ the approximation (35) to the AA and CG models, then an estimate of the error in CG approximations of the observable is immediate. Let q(r ^{n}) be a phase function whose ensemble average 〈q〉 is an observable of interest, denoted Q _{ r } as in (7). The CG approximation is Q _{ R }(θ) and the error, given by (35), is
where $ \mathcal {R}(\mathbf {R}^{N}(\boldsymbol {\theta }); \mathbf {z}^{n})= \int _{0}^{\tau } \rho _{\alpha i} (\boldsymbol {\theta }, t) z_{\alpha i} (t) \text {d}t $ , ρ _{ α i }(θ,t) is the residual in (13) (or (15)), and z ^{n} is the solution to the corresponding adjoint problem, which is generally unknown. If Z ^{n} is an approximation of z ^{n}, then
with
$\|\cdot\|_{\mathcal {V}'}$ being the norm on the dual space $\mathcal {V}'$ . The problem of error estimation thus reduces to one of developing efficient procedures to compute the residual ρ and to compute reasonable approximations of z ^{n}.
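With discrete residual and adjoint trajectories in hand, evaluating the estimate ε(θ) ≈ R(R ^{N}(θ); z ^{n}) = ∫ ρ _{ α i } z _{ α i } dt amounts to a single quadrature. A minimal sketch (hypothetical names; the uniform time grid and trapezoidal rule are our own assumptions):

```python
import numpy as np

def goal_error_estimate(rho, z, dt):
    # eps(theta) ~ R(R^N(theta); z^n) = int_0^tau rho_{ai}(t) z_{ai}(t) dt,
    # approximated on a uniform time grid by the trapezoidal rule.
    # rho, z: (n_steps + 1, n_dof) residual and adjoint trajectories.
    integrand = np.einsum('ti,ti->t', rho, z)  # dot product at each time step
    return dt * (0.5 * integrand[0] + integrand[1:-1].sum() + 0.5 * integrand[-1])
```

In practice z ^{n} would be replaced by an approximation Z ^{n}, and the same routine then produces the approximate estimate of (37).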
It is clear that a quantitative estimate such as (36) (or an approximation with z ^{n} replaced by Z ^{n}) could be a powerful tool for determining the validity of the CG model or for designing validation experiments for CG models. In theory, it also provides a basis for selecting optimal parameters for a given model so as to manage ε(θ). We elaborate on this notion in the final part of the Results and discussion section.
Results and discussion
Numerical example: estimation of error in the CG approximation of a polyethylene chain
We describe in this section an application of the a posteriori error estimation described in the previous section, involving CG approximations of a well-known model of polyethylene. For the base “AA” model, we consider a united-atom model of a polyethylene chain containing 200 CH2 (methylene) monomers, meaning that we have aggregated the hydrogen and carbon atoms into an “AA” bead for simplicity. The united-atom coordinates r _{ i } define the locations of each particle on an r-axis, and the displacement is denoted u _{ i }(t). As an additional simplification, we assume that the interatomic potential is characterized by harmonic bonds of the form (1a), with parameter k _{ l }=k= 350 kcal/mol, bond length l= 1.5 Å, and atomic mass m= 14.026 g/mol. Initially, each united atom is separated by the bond length l, the initial velocities are zero, and the system is assigned an initial displacement field u _{ i }(0)=f(r _{ i }), where $f(r_{i}) = 1.2 \text {e}^{0.1r_{i}(0)}$ . Under these conditions, the AA system (5) reduces to
where m and k are the mass matrix and the stiffness matrix of the united atom system and are of the form
A family of CG approximations of this model is obtained by aggregating the atoms into beads, with the CG models distinguished by the number P of atoms per CG bead. The resulting CG system (8) is of the form
with the mass of each bead set to M=P m and the bond stiffness K=α k/P; $\alpha \in \mathbb {R}^{+}$ .
Upon solving (41) for the CG displacement trajectory U(t), we compute the residual trajectory
where U _{CG}(t) is the projection Π U(t) onto AA atom locations.
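The objects entering (39), (41), and (42) can be assembled directly. The sketch below is illustrative only: it assumes free-end boundary conditions for the tridiagonal stiffness matrix and a piecewise-constant projection Π in which every atom follows its bead; neither assumption is prescribed by the text.

```python
import numpy as np

def chain_matrices(n, m, k):
    # Mass and stiffness matrices of a 1-D harmonic chain of n particles
    # (free-end boundary conditions assumed here).
    M = m * np.eye(n)
    K = np.zeros((n, n))
    for i in range(n - 1):
        K[i, i] += k
        K[i + 1, i + 1] += k
        K[i, i + 1] -= k
        K[i + 1, i] -= k
    return M, K

def prolongation(n, P):
    # Piecewise-constant projection Pi: each of the N = n/P beads carries
    # its P atoms rigidly, so the atom-site field is Pi @ U.
    N = n // P
    Pi = np.zeros((n, N))
    for A in range(N):
        Pi[A * P:(A + 1) * P, A] = 1.0
    return Pi
```

With bead mass M = P m and bond stiffness K = α k/P as in the text, the CG matrices would be assembled the same way for the N = n/P beads, and the residual trajectory (42) compares Π U(t) with the united-atom solution.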
The bilinear and linear forms described in (19)–(21) reduce, in this case, to
with (25) yielding
As an example of a QoI, we take Q _{ r } to be the locallyaveraged displacement,
for which the strong form of the dual problem is
where N _{0} is the number of united atoms considered in the set, and q is the vector defined such that $\mathcal {Q}(\boldsymbol {u}) = \boldsymbol {q}^{T}\boldsymbol {u}$ . Given the QoI (46), q is the n×1 vector,
The residual functional $\mathcal {R} (\cdot \,; \cdot)$ of (36) in this example is of the form
The estimated error in the QoI is then
where $Q_{\mathbf {R}} = \int _{0}^{\tau } \eta _{t}(t)\text {d}t$ , and the exact error is
Δ being the remainder in (34). Since the forms in (43)–(46) are linear in their respective arguments, the exact remainder Δ should be zero, but the error introduced by the numerical integration schemes employed generally leads to an additional numerical error Δ _{ Δ t }≠0. We employ a conventional Runge-Kutta algorithm here to integrate (39), (41), and (47).
The results of the coarse-grained model for the case P=4 are presented in Figure 2. Figure 2a shows the coarse-scale displacement U=U(t) at different times, obtained from the solution of (41) over the time domain t∈[0,τ]. The local residual of Figure 2b is then computed from (42). Figure 2c shows the solution z(t) at different times. It is observed that the adjoint solution propagates in time in the direction opposite to the primal solution, (47) being integrated backward in time.
It is known that, in general, the solution of the base model is not available. However, in order to show the effectiveness of the method presented here, the equations of motion for the united-atom system are also solved in this example. Having the solution of the united-atom model, u(t), the evolution of the exact ζ and the estimated η _{ t } over time is shown in Figure 2d.
Numerical approximations to the exact error are compared with the estimated error for various CG approximations of the united-atom model in Figure 3a. The estimated error $\mathcal {R} = \mathcal {R}(\boldsymbol {U}_{\textit {CG}}(\boldsymbol {\theta }); \boldsymbol {z}^{n})$ for various values of P, with θ=α k/P, is indicated in Figure 3b for α=1. The computed estimated error ( $\mathcal {E}_{\text {est}}=\mathcal {R}$ ) versus the parameter α is indicated in Figure 3c.
Maximum entropy principle for atomic systems
Among the features of the AA system that could qualify as quantities of interest, we consider a special measure of uncertainty content embodied in the so-called information entropy. In 1948, Shannon [6] introduced the concept of information entropy as a real-valued function H(p) of probability distributions (densities) p, a logical measure of the uncertainty content in p that satisfies four rather straightforward “common-sense” desiderata (see also [12] for full details). For a discrete pdf p={p _{1},p _{2},…,p _{ n }}, the entropy is defined by
and for a continuous density, $p \in L^{2} (\mathbb {R})$ , we write
Given two probability densities p and q with nonempty, overlapping supports, the relative entropy between p and q is given by the Kullback-Leibler divergence,
where H(p,q) ( $=\int _{\mathbb {R}} p \log q \text {d}y$ ) is the cross entropy and it is understood that $0 \log \frac {0}{0}=0$ and $0 \log \frac {0}{q}=0$ .
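For discrete densities, the entropy and the D _{ KL } divergence (54) are short computations; in the sketch below (hypothetical helper names, our own code) the conventions 0 log 0 = 0 and 0 log (0/q) = 0 are handled by masking out the zero entries of p:

```python
import numpy as np

def shannon_entropy(p):
    # H(p) = -sum_i p_i log p_i, with the convention 0 log 0 = 0.
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i log(p_i / q_i), with 0 log(0/q) = 0.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))
```

As quick checks, the uniform density over n states attains the maximum entropy log n, and D _{ KL }(p ∥ p) = 0 for any p.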
Shannon’s principle of maximum entropy asserts that in the set of all possible probability distributions relevant to a random field, the correct probability p corresponds to the maximum entropy:
Errors in information entropy
The connection with the statistical mechanics characterization of the AA and CG models can be established by choosing as a quantity of interest the infinite-time average of the phase function q(r ^{n}(t)) over [0,∞). For this we invoke the ergodic hypothesis,
ρ(r ^{n}) being the distribution function for the ensemble under study and Γ the corresponding phase space subdomain. The corresponding CG approximation is
where the notation G(R ^{N};θ) represents the relation (11). Setting
gives immediately
where D _{ KL }(·∥·) is the Kullback-Leibler divergence defined in (54). Thus, if z ^{n} is the equilibrium solution of (25) with $Q'(\mathbf {r}^{n} ; \mathbf {v}^{n}) = \int _{0}^{\tau } \partial _{\alpha i} q(\mathbf {r}^{n}(t))\, v_{\alpha i} \, \text {d}t$ , then
The specification (60) of the CG approximation (with ρ(r ^{n}) as opposed to ρ(R ^{N}(θ))) requires some explanation. In interpreting (60), one assumes the role of an observer who resides in the AA system and, instead of the true phase function q(r ^{n}), observes a corrupted version that, for each choice of θ, is constrained to reside only in microstates accessible to the CG model. This is also the interpretation of the residual described in (13) and (15). It is also noted that the estimate (63) is reminiscent of the minimum relative entropy method suggested by Shell [13].
A fundamental question arises at this point: given estimates (36) or (63), is it possible to find a special parameter vector θ ^{∗} that makes the error ε(θ ^{∗})=0? This question is related to the so-called well-specification or misspecification of the CG model. We believe the answer to this question is generally “no.”
Model misspecification and statistical analysis
A fundamental concept in the mathematical statistics literature on parametric models is the notion of a well-specified model: one with the property that a special parameter vector θ ^{∗} exists such that the model maps it into the truth, i.e. the true observational data. If no such parameter exists, the model is said to be misspecified.
More generally, we consider a space $\mathcal {Y}$ of physical observables (in our case, the values of appropriate observables sampled from the AA model) and a set $\mathbb {M}(\mathcal {Y})$ of probability measures μ on $\mathcal {Y}$ . As always, a target quantity of interest $Q:\mathbb {M} \rightarrow \mathbb {R}$ is identified (e.g. Q(μ)=μ[X≥a], X being a random variable and a a threshold value). We seek a particular measure μ ^{∗} which yields the “true” value of the quantity of interest Q(μ ^{∗}). We wish to predict Q(μ ^{∗}) using a parametric model $\mathcal {P}: \Theta \rightarrow \mathbb {M}(\mathcal {Y})$ , Θ being the space of parameters. Again, if a θ ^{∗}∈Θ exists such that $\mathcal {P}(\boldsymbol {\theta }^{*})=\mu ^{*}$ , the model is said to be well-specified; otherwise, if $\mu ^{*} \notin \mathcal {P}(\Theta)$ , the model is misspecified. See, e.g., Geyer [14], Kleijn and van der Vaart [7], Freedman [15], and Nickl [16]. In the model discussed in Section ‘Preliminaries, conventions and notations’, we seek a parameter θ ^{∗} of the CG model such that ε(θ ^{∗}) of (36) is zero, an unlikely possibility for most choices of Q.
To recast the issue of error estimation in a statistical setting, we presume that our goal is to determine (predict) a probability distribution of a random variable, an observable q in the AA system, using a CG model, given a set y _{1},y _{2},⋯,y _{ n } of iid (independent, identically distributed) random variables representing samples y _{ i }=q(ω _{ i }) ( $\omega _{i} = \mathbf {r}_{i}^{n}$ is meant to denote a particular point in phase space). We denote by π(y _{ i } | θ) the conditional probability density of the distance between the random data y _{ i } and the parameter-to-observation map d _{ i }(θ),
where π(y _{ i } | θ) is the ith component of the likelihood function. The joint density of the data vector y ^{n}=(y _{1},y _{2},⋯,y _{ n }) is then,
The log-likelihood function is
Let π(θ) be any prior probability density on the parameters θ (computed, for instance, using the maximum entropy method of Jaynes [12], as described for CG models in [3]); then the posterior density satisfies,
where $Z = \int _{\Theta } \pi (\mathbf {y} \mid \boldsymbol {\theta }) \pi (\boldsymbol {\theta }) \, \text {d}\boldsymbol {\theta }$ is the model evidence.
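The estimates (68) and (69) below can be illustrated in a scalar toy problem. The sketch here assumes an iid Gaussian likelihood for the misfit and a grid search in place of a real optimizer; both choices, and all names, are our own illustrative assumptions:

```python
import numpy as np

def gaussian_loglik(theta, y, sigma=1.0):
    # log pi(y | theta) for iid Gaussian misfit between data y and a
    # scalar model prediction theta (an illustrative likelihood choice).
    n = len(y)
    return (-0.5 * np.sum((y - theta) ** 2) / sigma ** 2
            - n * np.log(sigma * np.sqrt(2.0 * np.pi)))

def map_estimate(theta_grid, y, log_prior, sigma=1.0):
    # MAP estimate: argmax over the grid of log-likelihood + log-prior.
    lp = np.array([gaussian_loglik(t, y, sigma) for t in theta_grid])
    lp = lp + log_prior(theta_grid)
    return theta_grid[int(np.argmax(lp))]
```

Under a flat prior, the MAP and MLE coincide; for this Gaussian toy model both reduce to the sample mean of the data.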
The following definitions and theorems follow from these relations:

The Maximum Likelihood Estimate (MLE) is the parameter $\hat {\boldsymbol {\theta }}^{n}$ that maximizes L _{ n }(θ):
$$ \hat{\boldsymbol{\theta}}^{n} = \underset{\boldsymbol{\theta} \in \Theta}{\text{argmax}} \; L_{n}(\boldsymbol{\theta}). $$((68)) 
The Maximum A Posteriori (MAP) Estimate is the parameter $\tilde {\boldsymbol {\theta }}^{n}$ that maximizes the posterior pdf:
$$ \tilde{\boldsymbol{\theta}}^{n} = \underset{\boldsymbol{\theta} \in \Theta}{\text{argmax}} \; \pi_{n}(\boldsymbol{\theta} \mid \mathbf{y}). $$((69)) 
The Bayesian Central Limit Theorem for well-specified models, under commonly satisfied smoothness assumptions (also called the Bernstein-von Mises Theorem [7,16,17]), asserts that
$$ \pi_{n}(\boldsymbol{\theta} \mid \mathbf{y}) \overset{\mathcal{P}}{\rightarrow} \mathcal{N}(\boldsymbol{\theta}^{*}, \mathbf{I}^{-1}(\boldsymbol{\theta}^{*})), $$((70))where the convergence is convergence in probability, $\mathcal {N}(\boldsymbol {\mu }, \boldsymbol {\Sigma })$ denotes a normal distribution with mean μ and covariance matrix Σ, $\hat {\boldsymbol {\theta }}$ is the generalized MLE, and I(θ) is the Fisher information matrix,
$$ I_{ij}(\boldsymbol{\theta}^{*}) = -\sum^{n}_{k=1} \left[ \frac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}} \log \pi({y}_{k} \mid \boldsymbol{\theta}) \right]_{\boldsymbol{\theta} = \boldsymbol{\theta}^{*}} $$((71)) 
Given a set of parametric models, $\mathcal {M} = \{ \mathcal {P}_{1} (\boldsymbol {\theta }_{1}), \mathcal {P}_{2}(\boldsymbol {\theta }_{2}), \cdots, \mathcal {P}_{m} (\boldsymbol {\theta }_{m}) \}$ , the posterior plausibility of model j is defined through the application of Bayesian arguments by (see [3,18])
$$ \rho_{j} = \pi(\mathcal{P}_{j}  \mathbf{y}, \mathcal{M}) = \frac{\int_{\Theta_{i}} \pi (\mathbf{y}  \boldsymbol{\theta}_{j}, \mathcal{P}_{j}, \mathcal{M}) \pi (\boldsymbol{\theta}_{j}  \mathcal{P}_{j}, \mathcal{M}) \text{d} \boldsymbol{\theta}_{j} \pi (\mathcal{P}_{j}  \mathcal{M})} {\pi(\mathbf{y}  \mathcal{M})} $$((72))with $\sum _{j=1}^{m} \rho _{j} = 1$ , and the largest ρ _{ j }∈[0,1] corresponds to the most plausible model for data $\mathbf {y} \in \mathcal {Y}$ .
Finally, we come to the case of misspecified parametric models, in which $\mu ^{*} \notin \mathcal {P}(\Theta)$; i.e. no parameter θ ^{∗} exists such that the truth $\mu ^{*} = \mathcal {P}(\boldsymbol {\theta }^{*})$. This situation, we believe, is by far the most common one encountered in the use of CG models.
We remark that in the (rare?) case of a well-specified CG model, the following consistency result holds for any continuous functional $Q: \Theta \rightarrow \mathbb {R}$ with Θ compact: if θ ^{∗} is the unique minimizer of Q and if
$$ \sup_{\boldsymbol{\theta} \in \Theta} \left| Q_{n}(\boldsymbol{\theta}) - Q(\boldsymbol{\theta}) \right| \overset{\mathcal{P}}{\rightarrow} 0 $$
as n→∞, then the sequence of minimizers
$$ \hat{\boldsymbol{\theta}}^{n} = \underset{\boldsymbol{\theta} \in \Theta}{\text{argmin}} \; Q_{n}(\boldsymbol{\theta}) $$
converges to θ ^{∗} in probability as n→∞. This is proved in Nickl [16]. In particular, under mild assumptions on the smoothness of the log-likelihood L _{ n }(θ), the choice $Q_{n}(\boldsymbol {\theta }) = -\frac {1}{n} L_{n}(\boldsymbol {\theta })$ converges, up to an additive constant independent of θ, to
$$ Q(\boldsymbol{\theta}) = D_{KL}\left(\pi(\cdot \mid \boldsymbol{\theta}^{*}) \,\|\, \pi(\cdot \mid \boldsymbol{\theta})\right), $$
D _{ KL }(·∥·) being the Kullback–Leibler divergence defined in (54). By Jensen’s inequality (see, e.g. [16]), Q(θ ^{∗})≤Q(θ) ∀θ∈Θ; i.e. θ ^{∗} is the minimizer of Q.
The asymptotic results for the finite-dimensional misspecified case are summed up in the powerful result of Kleijn and van der Vaart [7,19]: let g(y) denote the probability density associated with the true distribution μ ^{∗}. Then the posterior density π _{ n }(θ ∣ y) converges in probability to the normal distribution
$$ \mathcal{N}\left(\boldsymbol{\theta}^{\dagger}, \mathbf{V}^{-1}(\boldsymbol{\theta}^{\dagger})\right), $$
where V(θ ^{†}) is the negative expected Hessian of the log-model density under g, and
$$ \boldsymbol{\theta}^{\dagger} = \underset{\boldsymbol{\theta} \in \Theta}{\text{argmin}} \; D_{KL}\left(g \,\|\, \pi(\cdot \mid \boldsymbol{\theta})\right). $$((78))
Thus, the best approximation to g in $\mathcal {P}(\Theta)$ is the model with the parameter θ ^{†}, $\mathcal {P}(\Theta)$ being the class of parametric models to which the CG model belongs.
It is easily shown that θ ^{†} is a maximum likelihood estimate, i.e. it maximizes the expected value of the log-likelihood relative to the true density g:
$$ \boldsymbol{\theta}^{\dagger} = \underset{\boldsymbol{\theta} \in \Theta}{\text{argmax}} \int_{\mathcal{Y}^{n}} g(\textbf{y}) \log \pi(\textbf{y} \mid \boldsymbol{\theta}) \; \text{d}\textbf{y}, $$
where the negative self-entropy $\int g \log g \; d\textbf {y}$ was eliminated since it does not depend on θ and therefore does not affect the optimization.
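A toy computation of our own construction illustrates θ †: the “truth” g is a two-component Gaussian mixture, while the (misspecified) model class is the single-Gaussian family N(θ, 1). For a fixed-variance Gaussian model, the maximizer of the g-expected log-likelihood is the mean of g, here 0.5, which quadrature plus grid search recovers:

```python
import math

def phi(y, mu, s):
    # Gaussian pdf N(mu, s^2)
    return math.exp(-(y - mu)**2 / (2 * s**2)) / math.sqrt(2 * math.pi * s**2)

def g(y):
    # "truth": mixture 0.5*N(-1, 0.5^2) + 0.5*N(2, 0.5^2); its mean is 0.5
    return 0.5 * phi(y, -1.0, 0.5) + 0.5 * phi(y, 2.0, 0.5)

# Tabulate g once on a quadrature grid over an interval capturing its mass
lo, hi, m = -8.0, 8.0, 2000
dy = (hi - lo) / m
ys = [lo + i * dy for i in range(m + 1)]
gs = [g(yv) for yv in ys]
weights = [0.5 if i in (0, m) else 1.0 for i in range(m + 1)]

def expected_loglik(theta):
    # trapezoidal approximation of  int g(y) log pi(y | theta) dy,  pi = N(theta, 1)
    total = 0.0
    for yv, gv, w in zip(ys, gs, weights):
        logpi = -0.5 * math.log(2 * math.pi) - (yv - theta)**2 / 2
        total += w * gv * logpi
    return total * dy

# theta-dagger: the KL-optimal parameter of the misspecified family
theta_dagger = max((i / 100 for i in range(-300, 301)), key=expected_loglik)
print(theta_dagger)  # close to 0.5, the mean of g
```

No single Gaussian matches the bimodal truth, yet θ † is still well defined as the best approximation within the class.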
Plausibility D _{ KL } theory
Let us now suppose that we have two misspecified models, $\mathcal {P}_{1}$ and $\mathcal {P}_{2}$. We may compare these models in the Bayesian setting through the concept of model plausibility: if $\mathcal {P}_{1}$ is more plausible than $\mathcal {P}_{2}$, then ρ _{1}>ρ _{2}. In the maximum likelihood setting, the model that yields a probability measure closer to μ ^{∗} is considered the “better” model. That is, if
$$ D_{KL}\left(g \,\|\, \pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})\right) < D_{KL}\left(g \,\|\, \pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})\right), $$((80))
it can be said that model $\mathcal {P}_{1}$ is better than model $\mathcal {P}_{2}$. The theorems presented here define the relationship between these two notions of model comparison.
However, Bayesian and frequentist methods fundamentally differ in the way they view the model parameters. Bayesian methods consider parameters to be stochastic, characterized by probability density functions, while frequentist approaches seek a single, deterministic parameter value. To bridge this gap in methodology, we note that considering parameters as deterministic vectors, for example θ _{0}, is akin to assigning them delta functions as their posterior probability distributions, which result from delta prior distributions. In this case, the model evidence is given by
$$ \pi(\textbf{y} \mid \mathcal{P}_{i}, \mathcal{M}) = \int_{\Theta_{i}} \pi(\textbf{y} \mid \boldsymbol{\theta}_{i}, \mathcal{P}_{i}, \mathcal{M}) \, \delta(\boldsymbol{\theta}_{i} - \boldsymbol{\theta}_{0}) \, \text{d}\boldsymbol{\theta}_{i} = \pi(\textbf{y} \mid \boldsymbol{\theta}_{0}, \mathcal{P}_{i}, \mathcal{M}). $$((81))
In particular, if we consider the optimal parameter $\boldsymbol {\theta }^{\dagger }_{i}$ for model $\mathcal {P}_{i}$, then $\pi (\textbf {y} \mid \mathcal {P}_{i}, \mathcal {M}) = \pi (\textbf {y} \mid \boldsymbol {\theta }_{i}^{\dagger }, \mathcal {P}_{i}, \mathcal {M})$. We can take the ratio of posterior model plausibilities,
$$ \frac{\rho_{1}}{\rho_{2}} = \frac{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})}{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})} \, O_{12}, $$((82))
where $O_{12} = \pi (\mathcal {P}_{1} \mid \mathcal {M})/\pi (\mathcal {P}_{2} \mid \mathcal {M})$ is the prior odds and is often assumed to be one. With these assumptions in force, we present the following theorems.
Theorem 2.
Let (82) hold. If $\mathcal {P}_{1}$ is more plausible than $\mathcal {P}_{2}$ and O _{12}≤1, then (80) holds.
Proof.
If $\mathcal {P}_{1}$ is more plausible than $\mathcal {P}_{2}$, then ρ _{1}/ρ _{2}>1, and by (82) with O _{12}≤1,
$$ \frac{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})}{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})} = \frac{\rho_{1}}{\rho_{2} \, O_{12}} > 1. $$((83))
Equivalently, the reciprocal of the far right-hand side is less than one, so
$$ \log \frac{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})}{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})} < 0. $$((84))
Since g(y) is a probability density, it is always nonnegative. Thus
$$ \int_{\mathcal{Y}^{n}} g(\textbf{y}) \log \frac{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})}{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})} \; \text{d}\textbf{y} < 0. $$((85))
This can be expanded into
$$ \int_{\mathcal{Y}^{n}} g \log \pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M}) \; \text{d}\textbf{y} - \int_{\mathcal{Y}^{n}} g \log \pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M}) \; \text{d}\textbf{y} < 0, $$((86))
which means
$$ -\int_{\mathcal{Y}^{n}} g \log \pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M}) \; \text{d}\textbf{y} < -\int_{\mathcal{Y}^{n}} g \log \pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M}) \; \text{d}\textbf{y}. $$((87))
By adding the quantity $\int _{\mathcal {Y}^{n}} g \log g \; d\textbf {y}$ to both sides, the desired result (80) immediately follows.
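The chain of inequalities in the proof can be checked numerically in a small example of our own (one-dimensional Gaussians standing in for the two models at their optimal parameters, with a truth g belonging to neither class): the D _{ KL } ordering and the sign of $\int g \log (\pi _{1}/\pi _{2})$ agree, and the identity between their difference and that integral holds to quadrature accuracy.

```python
import math

# Hypothetical fixed densities (our construction, not the paper's CG models)
def normal_pdf(y, mu, s):
    return math.exp(-(y - mu)**2 / (2 * s**2)) / math.sqrt(2 * math.pi * s**2)

g   = lambda y: normal_pdf(y, 0.5, 0.7)    # "truth", in neither model class
pi1 = lambda y: normal_pdf(y, 0.4, 1.0)    # model P1 at its optimal parameter
pi2 = lambda y: normal_pdf(y, -0.5, 1.0)   # model P2 at its optimal parameter

def integrate(f, lo=-8.0, hi=8.0, m=8000):
    # trapezoidal rule on [lo, hi]
    dy = (hi - lo) / m
    return sum(f(lo + i * dy) * (0.5 if i in (0, m) else 1.0)
               for i in range(m + 1)) * dy

dkl1 = integrate(lambda y: g(y) * math.log(g(y) / pi1(y)))  # D_KL(g || pi1)
dkl2 = integrate(lambda y: g(y) * math.log(g(y) / pi2(y)))  # D_KL(g || pi2)
lhs  = integrate(lambda y: g(y) * math.log(pi1(y) / pi2(y)))

# D_KL(g||pi1) < D_KL(g||pi2)  holds exactly when  int g log(pi1/pi2) dy > 0
print(dkl1 < dkl2, lhs > 0)
```

For these Gaussians the first divergence can also be computed in closed form (≈ 0.107), which the quadrature reproduces.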
This theorem demonstrates that if model $\mathcal {P}_{1}$ is “better” than model $\mathcal {P}_{2}$ in the Bayesian sense, it is also a “better” deterministic model in the sense of (80). However, the reverse implication requires much stronger conditions. The assertion (80) can be equivalently written as
$$ \int_{\mathcal{Y}^{n}} g(\textbf{y}) \log \frac{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})}{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})} \; \text{d}\textbf{y} > 0. $$((88))
For this inequality to hold, the relationship
$$ g(\textbf{y}) \log \frac{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})}{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})} > 0 $$((89))
does not necessarily need to be true for every point $\textbf {y} \in \mathcal {Y}^{n}$.
One perhaps naive way to proceed is to invoke the Mean Value Theorem: if $\left| \mathcal {Y}^{n} \right| < \infty $ and under suitable smoothness conditions, there exists some $\bar {\textbf {y}} \in \mathcal {Y}^{n}$ such that
$$ \int_{\mathcal{Y}^{n}} g(\textbf{y}) \log \frac{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})}{\pi(\textbf{y} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})} \; \text{d}\textbf{y} = \left| \mathcal{Y}^{n} \right| \, g(\bar{\textbf{y}}) \log \frac{\pi(\bar{\textbf{y}} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})}{\pi(\bar{\textbf{y}} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})}. $$((90))
Then, combining (88) and (90) yields
$$ \left| \mathcal{Y}^{n} \right| \, g(\bar{\textbf{y}}) \log \frac{\pi(\bar{\textbf{y}} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})}{\pi(\bar{\textbf{y}} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})} > 0. $$((91))
Since $\left| \mathcal {Y}^{n} \right| > 0$ and $g(\bar {\textbf {y}})>0$,
$$ \pi(\bar{\textbf{y}} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M}) > \pi(\bar{\textbf{y}} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M}). $$((92))
If O _{12}≥1,
$$ \frac{\rho_{1}}{\rho_{2}} = \frac{\pi(\bar{\textbf{y}} \mid \boldsymbol{\theta}^{\dagger}_{1}, \mathcal{P}_{1}, \mathcal{M})}{\pi(\bar{\textbf{y}} \mid \boldsymbol{\theta}^{\dagger}_{2}, \mathcal{P}_{2}, \mathcal{M})} \, O_{12} > 1. $$((93))
Thus $\mathcal {P}_{1}$ is more plausible than $\mathcal {P}_{2}$ for the data $\bar {\textbf {y}}$.
In summary, we have:
Theorem 3.
If $D_{\textit {KL}}(g \,\|\, \pi (\textbf {y} \mid \boldsymbol {\theta }^{\dagger }_{1}, \mathcal {P}_{1}, \mathcal {M})) < D_{\textit {KL}}(g \,\|\, \pi (\textbf {y} \mid \boldsymbol {\theta }^{\dagger }_{2}, \mathcal {P}_{2},\mathcal {M}))$, if $\left| \mathcal {Y}^{n} \right| < \infty $, and if (90) holds, then there exists a $\bar {\textbf {y}} \in \mathcal {Y}^{n}$ such that $\mathcal {P}_{1}$ is more plausible than $\mathcal {P}_{2}$, given that O _{12}≥1.
Conclusions
The formal structure of a posteriori estimates for errors in quantities of interest in CG approximations of atomistic systems is given by (36) if the CG model is sufficiently close to the AA model in some sense, and this error depends upon the CG model parameter θ. Numerical experiments presented in Section ‘Numerical example: estimation of error in CG approximation of a polyethylene chain’, involving a family of CG models of a polyethylene chain of united atom monomers, suggest that these estimates can be very good indications of the error inherent in CG models of observables in the AA system.
In Section ‘Errors in information entropy’ an example of special interest arises in the comparison of the information entropy of AA and CG models. This leads to estimates (62) and (63) involving the Kullback–Leibler divergence, D _{ KL }.
When the CG model is misspecified in a statistical sense, which is generally the case, the “D _{ KL }-distance” between the AA truth and the best possible approximation of any CG model is defined by choosing θ=θ ^{†}, the minimizer of the D _{ KL } as indicated in (78). Under special assumptions, one can relate the D _{ KL } distance to Bayesian posterior model plausibility, as stated in our Theorem 2, which provides sufficient conditions for the most plausible model among a class of models to be in fact closest to the AA model in the D _{ KL } sense. The possible role of estimates such as (36), (62), and (63) in model validation should be noted.
For each map G:AA→CG of the type defined by (10), a set of parametric model classes $\mathcal {M} = \{ \mathcal {P}_{1} (\boldsymbol {\theta }_{1}), \mathcal {P}_{2} (\boldsymbol {\theta }_{2}), \cdots, \mathcal {P}_{m} (\boldsymbol {\theta }_{m}) \} $ is defined, each with undetermined and possibly random parameter vectors θ _{ i }. For a calibration scenario, AA calibration data y _{ c }={y _{1},y _{2},⋯,y _{ n }} are sampled, and a series of Bayesian updates is performed using an expanded form of Bayes’s rule that recognizes prior choices of the set $\mathcal {M}$ and the class $\mathcal {P}_{j}$ within $\mathcal {M}$:
$$ \pi(\boldsymbol{\theta}_{j} \mid \textbf{y}_{c}, \mathcal{P}_{j}, \mathcal{M}) = \frac{\pi(\textbf{y}_{c} \mid \boldsymbol{\theta}_{j}, \mathcal{P}_{j}, \mathcal{M}) \, \pi(\boldsymbol{\theta}_{j} \mid \mathcal{P}_{j}, \mathcal{M})}{\pi(\textbf{y}_{c} \mid \mathcal{P}_{j}, \mathcal{M})}. $$
The marginalization of the right-hand side of this relation is the model evidence, which serves as a likelihood function for a higher level of Bayes’s rule. The corresponding posteriors are the model plausibilities of (72). We remark that the notion of model plausibilities is an extension of the idea of Bayes factors prevalent in the statistics literature (see e.g. [12] for a discussion of the ideas) and was introduced, to the best of our knowledge, in [18]. The development of algorithms involving Bayesian plausibilities to study model selection in CG models of complex atomic systems is discussed in [3,20].
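The two-level update described above can be sketched end to end on a hypothetical toy problem (our construction, not the paper’s CG setting): calibration data drawn from a Gaussian “AA truth” are used to compute each model’s evidence by quadrature over its parameter, and the evidences are normalized into plausibilities ρ _{ j } under equal prior model probabilities:

```python
import math
import random

# Hypothetical calibration data from an "AA truth" N(1.0, 0.6^2).
# Two candidate model classes are compared:
#   P1: y ~ N(theta, 1)     with flat prior theta in [-3, 3]
#   P2: y ~ N(0, theta^2)   with flat prior theta in [0.1, 3]
random.seed(2)
y = [random.gauss(1.0, 0.6) for _ in range(30)]

def log_lik(data, mean, sd):
    return sum(-0.5 * math.log(2 * math.pi * sd**2)
               - (v - mean)**2 / (2 * sd**2) for v in data)

def evidence(loglik_of_theta, lo, hi, m=2000):
    # pi(y | P_j) = int pi(y | theta, P_j) pi(theta | P_j) dtheta, midpoint rule
    dth = (hi - lo) / m
    prior = 1.0 / (hi - lo)   # flat prior density on [lo, hi]
    return sum(math.exp(loglik_of_theta(lo + (i + 0.5) * dth)) * prior * dth
               for i in range(m))

ev1 = evidence(lambda th: log_lik(y, th, 1.0), -3.0, 3.0)   # P1: unknown mean
ev2 = evidence(lambda th: log_lik(y, 0.0, th), 0.1, 3.0)    # P2: zero mean, unknown sd

# Plausibilities with equal prior model probabilities (O_12 = 1)
rho1 = ev1 / (ev1 + ev2)
rho2 = ev2 / (ev1 + ev2)
print(rho1, rho2)
```

Here the class with an adjustable mean wins decisively because the truth’s mean is far from zero; in the CG setting the same machinery ranks candidate coarse-grained models.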
It has been demonstrated that the most plausible model in a set will, under the stated assumptions, involve parameters that minimize the D _{ KL }-distance between the model and the so-called truth. Whether that “best” model is valid for the intended purpose depends on the tolerances set on errors in key observables, the QoIs of the validation scenario (see [3]).
Appendix
A surrogate pair of equations approximating (30) and (32) may be embodied in the problem of finding the pair $(u_{0}, z_{0}) \in \mathcal {V} \times \mathcal {V}$ such that
The remainder Δ in (34) can, in this case, be shown (see [4]) to be given by:
where
The theory and estimates reduce to finite element a posteriori error estimates in the special case in which u _{0}=u _{ h } and z _{0}=z _{ h } are finite element approximations of the solutions (u,z) to partial differential equations (see e.g. [21]).
References
 1
Noid WG (2013) Perspective: coarsegrained models for biomolecular systems. J Chem Phys 139: 090901.
 2
Li Y, Abberton BC, Kroger M, Liu WK (2013) Challenges in multiscale modeling of polymer dynamics. Polymers 5(2): 751–832. doi:10.3390/polym5020751.
 3
Farrell K, Oden JT (2014) Calibration and validation of coarse-grained models of atomic systems: application to semiconductor manufacturing. Comput Mech 54(1): 3–19. doi:10.1007/s00466-014-1028-y.
 4
Oden JT, Prudhomme S (2002) Estimation of modeling error in computational mechanics. J Comput Phys 182(2): 496–515.
 5
Oden JT, Prudhomme S, Romkes A, Bauman PT (2006) Multiscale modeling of physical phenomena: adaptive control of models. SIAM J Sci Comput 28(6): 2359–2389.
 6
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27: 379–423, 623–656.
 7
Kleijn BJK, van der Vaart A (2002) The asymptotics of misspecified Bayesian statistics. In: Mikosch T, Janzura M (eds) Proceedings of the 24th European Meeting of Statisticians, Prague, Czech Republic.
 8
Jorgensen WL, TiradoRives J (1988) The OPLS potential functions for proteins. Energy minimizations for crystals of cyclic peptides and crambin. J Am Chem Soc 110(6): 1657–1666.
 9
Jorgensen WL, Maxwell DS, TiradoRives J (1996) Development and testing of the OPLS allatom force field on conformational energetics and properties of organic liquids. J Am Chem Soc 118(45): 11225–11236.
 10
Frenkel D, Smit B (2001) Understanding molecular simulation: from Algorithms to applications, Computational science. 2nd edn, Vol. 1. Academic Press, San Diego.
 11
Haile JM (1997) Molecular dynamics simulation. John Wiley and Sons, NY.
 12
Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge.
 13
Shell MS (2008) The relative entropy is fundamental to multiscale and inverse thermodynamic problems. J Chem Phys 129(14): 144108.
 14
Geyer CJ (2003). 5601 Notes: the sandwich estimator. School of Statistics, University of Minnesota.
 15
Freedman DA (2006) On the so-called “Huber sandwich estimator” and “robust standard errors”. Am Stat 60(4): 299–302.
 16
Nickl R (2012) Statistical theory. Lecture notes, Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge.
 17
Kleijn BJK (2004). Bayesian asymptotics under misspecification. PhD thesis, Free University Amsterdam.
 18
Beck JL, Yuen KV (2004) Model selection using response measurements: Bayesian probabilistic approach. J Eng Mech 130(2): 192–203.
 19
Kleijn BJK, van der Vaart AW (2012) The Bernstein–von Mises theorem under misspecification. Electron J Stat 6: 354–381. doi:10.1214/12-EJS675.
 20
Farrell K, Oden JT, Faghihi D (2015) A Bayesian framework for adaptive selection, calibration, and validation of coarse-grained models of atomistic systems. J Comput Phys 295: 189–208. ISSN 0021-9991.
 21
Becker R, Rannacher R (2001) An optimal control approach to a posteriori error estimation in finite element methods. Acta Numerica 10: 1–102. doi:10.1017/S0962492901000010.
Acknowledgements
This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC0009286. The authors benefited from suggestions of Eric Wright, who read an early draft of this work.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The theory and numerical simulation represent joint work by all authors. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Received
Accepted
Published
DOI
Keywords
 Molecular dynamics
 Coarse-grained models
 Adjoint systems
 Information entropy