# Efficient solvers for time-dependent problems: a review of IMEX, LATIN, PARAEXP and PARAREAL algorithms for heat-type problems with potential use of approximate exponential integrators and reduced-order models


## Abstract

In this paper, we introduce and discuss some recent efficient solvers for time-dependent partial differential or ordinary differential problems, considering both linear and nonlinear cases. Here “efficient” may have different meanings, for instance a computational complexity better than that of standard time advance schemes, or strong parallel efficiency, especially parallel-in-time capability. More than a review, we will try to show the close links between the different approaches and set up a general framework that will allow us to combine them for more efficiency. As a complementary aspect, we will also discuss ways to include reduced-order models and fast approximate exponential integrators as fast global solvers. For developments and discussion, we will mainly focus on the heat equation, in both linear and nonlinear form.

## Background

This paper deals with efficient numerical approaches to solve time-dependent problems, possibly including parallel-in-time sub-domain decomposition and making use of coarse reduced-order model solvers. As a typical problem of discussion, we will consider the classical heat equation: let $$\Omega$$ be a bounded domain in $$\mathbb {R}^m$$, $$m\in \{2,3\}$$, with a Lipschitz-continuous boundary. Let $$\kappa$$ be a positive constant. Consider $$T>0$$, $$u^0\in H^1_0(\Omega )$$ and $$f\in L^2((0,T), L^2(\Omega ))$$. The linear heat problem with initial value $$u^0$$ and homogeneous boundary conditions, for time t in the interval [0, T], reads

\begin{aligned} \left\{ \begin{array}{lll} \partial _t u - \nabla \cdot (\kappa \nabla u) = f \quad &{} \text { in } \Omega \times [0,T], \\ u(.,t) = 0 \quad &{}\text { on } \partial \Omega \times [0,T], \\ u(.,0) = u^0 \quad &{}\text { in } \Omega . \end{array} \right. \end{aligned}
(1)

The problem (1) has a unique solution u in $$L^2((0,T),H^1_0(\Omega ))$$. Semi-discretizing problem (1) in space (method of lines) classically leads to a high-dimensional ordinary differential problem set in $$\mathbb {R}^d$$, with a generally large discrete dimension d. For simplicity, we will assume that the semi-discrete problem is written

\begin{aligned} \left\{ \begin{array}{l} {\dot{\varvec{u}}} + A \varvec{u}= \varvec{f}\quad \text { in } [0,T], \\ \varvec{u}(0) = \varvec{u}^0, \end{array}\right. \end{aligned}
(2)

with $$\varvec{u}^0\in \mathbb {R}^d$$, $$\varvec{f}\in L^2((0,T),\mathbb {R}^d)$$ and $$A\in \mathscr {M}_d(\mathbb {R})$$ typically symmetric positive definite, with a sparse structure. In this paper we will also consider nonlinear versions of the heat problem with a thermal conductivity coefficient $$\kappa (u)$$ depending on u itself. We will assume that there exists a constant $$\underline{\kappa }>0$$ and a constant $$\overline{\kappa }> 0$$ such that

\begin{aligned} \underline{\kappa } \le \kappa (u) \le \overline{\kappa }\quad \forall u. \end{aligned}

The nonlinear heat problem then reads

\begin{aligned} \left\{ \begin{array}{lll} \partial _t u - \nabla \cdot (\kappa (u)\nabla u) = f \quad &{}\text { in } \Omega \times [0,T], \\ u(.,t) = 0 \qquad &{}\text { on } \partial \Omega \times [0,T], \\ u(.,0) = u^0 \qquad &{}\text { in } \Omega \end{array} \right. \end{aligned}
(3)

and we will assume that its semi-discretized form reads

\begin{aligned} \left\{ \begin{array}{l} {\dot{\varvec{u}}} + A(\varvec{u})\, \varvec{u}= \varvec{f}\quad \text { in } [0,T], \\ \varvec{u}(0) = \varvec{u}^0 \end{array}\right. \end{aligned}

with $$A(\varvec{u})$$ sparse, symmetric positive definite for any $$\varvec{u}$$, and uniformly bounded. Let us now consider time discretization. Usually, time advance schemes for such problems are chosen implicit or semi-implicit for stability purposes. As an example, the purely explicit Euler time advance scheme

\begin{aligned} \frac{\varvec{u}^{n+1}-\varvec{u}^n}{\Delta t} - \nabla \cdot (\kappa (\varvec{u}^n)\nabla \varvec{u}^n)=f \end{aligned}

where $$\varvec{u}^{n}\simeq \varvec{u}(.,t^n)$$, $$t^{n+1}=t^n+\Delta t$$, has a too restrictive numerical stability domain, with typically $$\Delta t =O(h^2)$$, h being representative of the space step size. Semi-implicit linear schemes of the form

\begin{aligned} \frac{\varvec{u}^{n+1}-\varvec{u}^{n}}{\Delta t} - \nabla \cdot (\kappa (\varvec{u}^n)\nabla \varvec{u}^{n+1})=f \end{aligned}

show a far better stability domain but require the update of the stiffness matrix and the solution of a large sparse linear system at each time step. Finally, fully implicit schemes

\begin{aligned} \frac{\varvec{u}^{n+1}-\varvec{u}^n}{\Delta t} - \nabla \cdot (\kappa (\varvec{u}^{n+1})\nabla \varvec{u}^{n+1})=f \end{aligned}

provide strong numerical stability but require fixed-point (Newton or quasi-Newton) algorithms for their numerical solution, which becomes computationally expensive.
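
To make the trade-offs concrete, here is a minimal numpy sketch of the semi-implicit scheme above on the 1D problem, with a hypothetical conductivity $$\kappa (u)=1+u^2$$, homogeneous Dirichlet conditions and $$f=0$$; the stiffness matrix is rebuilt and a (tridiagonal) linear system is solved at every time step:

```python
import numpy as np

def stiffness(kv, h):
    """Tridiagonal matrix of -d/dx(kappa du/dx), Dirichlet BCs;
    face conductivities by arithmetic mean of the nodal values kv."""
    kf = 0.5 * (kv[:-1] + kv[1:])
    lo = np.concatenate(([kv[0]], kf))   # conductivity on the left face of node i
    hi = np.concatenate((kf, [kv[-1]]))  # conductivity on the right face of node i
    return (np.diag(lo + hi) - np.diag(kf, 1) - np.diag(kf, -1)) / h**2

kappa = lambda u: 1.0 + u**2             # hypothetical nonlinear conductivity
n, dt = 50, 1e-3
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
u = np.sin(np.pi * x)                    # initial condition (f = 0 here)

for _ in range(100):
    A = stiffness(kappa(u), h)           # stiffness frozen at u^n ...
    u = np.linalg.solve(np.eye(n) / dt + A, u / dt)   # ... u^{n+1} taken implicitly
```

With $$f=0$$ the discrete solution decays, and no CFL-type restriction $$\Delta t =O(h^2)$$ is needed, in contrast to the explicit scheme.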

This paper gives an overview of recent alternative time advance schemes with interesting algorithmic features, including the possibility of parallel computations. First, for linear problems, we will introduce the PARAEXP algorithm, based on a superposition principle, for achieving parallel-in-time computation. For nonlinear problems, the iterative LATIN method is a kind of splitting approach that alternates global linear solutions and local nonlinear projections. We will then discuss more general fixed-point algorithms, with a special focus on Newton and quasi-Newton methods and on the separation of linear terms and nonlinear residuals in an implicit-explicit discretization strategy, and finally time sub-domain decomposition and parallel-in-time computing involving coarse global and fine local propagators in the PARAREAL method.

## The PARAEXP algorithm

Numerical methods allowing parallelization in the time direction have been investigated for a long time (see Nievergelt  in 1964) and have seen great developments, particularly in the last decade, because of today’s growing HPC platforms. Among time-parallel solvers, the PARAEXP algorithm introduced by Gander and Güttel  in 2013 is dedicated to linear ordinary differential problems, that is, problems of the form

\begin{aligned} \left\{ \begin{array}{l} {\dot{\varvec{u}}} + A \varvec{u}= \varvec{f}(t), \quad t\in [0,T], \\ \varvec{u}(0) = \varvec{u}^0 \end{array}\right. \end{aligned}
(4)

especially when $$\varvec{f}(t)$$ varies rapidly in time. Problem (4) has a solution written in integral form thanks to the variation-of-constants formula:

\begin{aligned} \varvec{u}(t) = \exp (-tA)\varvec{u}^0 + \int _0^t \exp (-(t-\tau )A)\varvec{f}(\tau )\, d\tau . \end{aligned}
(5)

If we want to take advantage of (5) to derive a numerical method, we need in particular a high-order quadrature formula for the integral term. If $$\varvec{f}(t)$$ is a fast-varying source term, quadrature may become irrelevant from the accuracy point of view. Gander and Güttel rather propose to split the problem over p sub-domains in time and use a superposition principle based on independent problems set on different time domains:

1. First, define a partitioning of the time domain [0, T] into p time sub-intervals $$[T_{j-1},T_j]$$, $$j=1,...,p$$, $$0=T_0<T_1<...<T_p=T$$;

2. For each $$j=1,...,p$$, solve the zero-initial-value problem

\begin{aligned} {\dot{\varvec{v}}_j}(t) = -A \varvec{v}_j(t) + \varvec{f}(t), \quad \varvec{v}_j(T_{j-1})=0,\quad t\in [T_{j-1}, T_j]; \end{aligned}
(6)
3. For each $$j=1,...,p$$, solve the homogeneous problem

\begin{aligned} {\dot{\varvec{w}}}_j(t) = -A \varvec{w}_j(t), \quad \varvec{w}_j(T_{j-1})=\varvec{v}_{j-1}(T_{j-1}), \quad t\in [T_{j-1},T] \end{aligned}
(7)

(with the notation $$\varvec{v}_0(T_0):=\varvec{u}^0$$).

It is clear that, by a superposition principle, one can synthesize a solution $$\varvec{u}$$ of (4) by the summation formula

\begin{aligned} \varvec{u}(t) = \varvec{v}_k(t) + \sum _{j=1}^k \varvec{w}_j(t)\quad \text { for } k \text { such that } t\in [T_{k-1},T_{k}], \ k\in \{1,...,p\}. \end{aligned}
(8)

The PARAEXP algorithm is dedicated to parallel computing architectures; executed sequentially on a single processor, it of course brings no benefit. Remarkably, good implementations of PARAEXP do not require any communication until the solution synthesis step, so the theoretical parallel efficiency is optimal (equal to 1) before synthesis. There is, however, an issue of load balancing between processors: for a uniform time domain partitioning, some processors (especially the first one) do more work than others. The algorithm is graphically summarized in Fig. 1.
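
The steps above can be sketched as follows on a small dense system; to keep the sketch exact, the source term is taken constant in time so that both sub-problems have closed-form solutions via the matrix exponential (the matrix A, source and partition are illustrative assumptions):

```python
import numpy as np

def expmA(A, t):
    """exp(-t A) for symmetric A, via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(-t * w)) @ V.T

def exact_step(A, u0, f, t):
    """Exact value at time t of v' = -A v + f (f constant in time), v(0) = u0."""
    E = expmA(A, t)
    return E @ u0 + np.linalg.solve(A, (np.eye(len(u0)) - E) @ f)

rng = np.random.default_rng(0)
d, p, T = 5, 4, 1.0
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)        # SPD matrix, stands for the diffusion operator
u0 = rng.standard_normal(d)
f = rng.standard_normal(d)         # constant-in-time source (illustrative choice)
Tj = np.linspace(0.0, T, p + 1)

# step 2: zero-initial-value inhomogeneous problems, one per slice (independent)
v_end = [exact_step(A, np.zeros(d), f, Tj[j + 1] - Tj[j]) for j in range(p)]

# step 3: homogeneous problems propagated up to the final time T (independent)
w_T = [expmA(A, T - Tj[0]) @ u0]                       # v_0(T_0) := u^0
w_T += [expmA(A, T - Tj[j]) @ v_end[j - 1] for j in range(1, p)]

# synthesis (8) at t = T
u_T = v_end[-1] + sum(w_T)
u_ref = exact_step(A, u0, f, T)    # direct solution on [0, T], for comparison
```

The per-slice solves are fully independent, so in an actual implementation each would run on its own processor, with communication only at the synthesis step.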

Another key to performance is the fast computation of matrix exponentials. The solution of the homogeneous problem (7) in $$[T_{j-1},T]$$ is

\begin{aligned} \varvec{w}_j(t) = \exp (-(t-T_{j-1})A)\, \varvec{v}_{j-1}(T_{j-1}) \end{aligned}

and thus has to be evaluated many times (at any time t, in fact). There are many approaches to compute accurate approximate matrix exponentials, as commented in . One way is to search for approximations in the Krylov subspace $$K^M$$ of dimension M:

\begin{aligned} K^M = \text {span}(\varvec{u}^0, A\varvec{u}^0,\ldots ,A^{M-1}\varvec{u}^0), \end{aligned}

looking for a best approximation in the truncated series expansion. We will come back to this issue when reduced-order models (ROMs) are introduced below.
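
A minimal sketch of this Krylov idea, assuming a symmetric positive definite A: run M steps of the Arnoldi process (Lanczos in the symmetric case) from the vector to be propagated, and exponentiate the small projected matrix $$H_M$$ instead of the large one. Here M is taken equal to d so that the approximation is exact up to roundoff; in practice one takes $$M\ll d$$:

```python
import numpy as np

def krylov_expm(A, v, t, M):
    """Approximate exp(-t A) v from the Krylov space span(v, A v, ..., A^{M-1} v)."""
    d = len(v)
    V = np.zeros((d, M))
    H = np.zeros((M, M))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(M):
        w = A @ V[:, j]
        for _ in range(2):                 # Gram-Schmidt, with one re-pass
            for i in range(j + 1):
                c = V[:, i] @ w
                H[i, j] += c
                w = w - c * V[:, i]
        if j + 1 < M:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    # exp(-t H_M) e_1 via the eigendecomposition of the small (symmetric) matrix
    lam, Q = np.linalg.eigh(0.5 * (H + H.T))
    return beta * V @ (Q @ (np.exp(-t * lam) * Q[0, :]))

rng = np.random.default_rng(0)
d = 15
B = rng.standard_normal((d, d))
A = B @ B.T / d + np.eye(d)                # SPD, stands for the discrete diffusion
v = rng.standard_normal(d)

w_full, P = np.linalg.eigh(A)              # exact exponential, for comparison
exact = P @ (np.exp(-1.0 * w_full) * (P.T @ v))
approx = krylov_expm(A, v, 1.0, M=d)       # M = d: the Krylov space is full here
```

The whole large-scale work reduces to M matrix-vector products with A; the exponential itself only involves the $$M\times M$$ matrix $$H_M$$.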

## Nonlinear problems: an implicit-explicit IMEX time advance scheme

Hereafter, we switch to the nonlinear case, considering the nonlinear heat equation as reference example. Before going into iterative and parallel algorithms, let us first consider a variant implicit-explicit (IMEX) time advance scheme introduced by Filbet, Negulescu and Yang  in 2012. The idea is to treat implicitly a linear diffusion term with an upper bound of the thermal conductivity, and explicitly the remaining terms on the right-hand side:

\begin{aligned} \frac{u^{n+1}-u^n}{\Delta t} - \underbrace{\nabla \cdot (\overline{\kappa } \nabla u^{n+1})}_{\text {linear, constant coefficients}} = \underbrace{-\nabla \cdot \big ( [\overline{\kappa }-\kappa (u^n)]\nabla u^n \big )}_{\text {varying coefficients, depend on } u} + \ f, \end{aligned}
(9)

with

\begin{aligned} \overline{\kappa }\ \ge \ \sup _{x,t}\ \kappa (u(x,t)). \end{aligned}

As demonstrated in , let us show that the semi-discrete-in-time scheme is stable in the one-dimensional case for a certain norm. Consider the homogeneous case $$f=0$$ with homogeneous Neumann boundary conditions to simplify. We multiply (9) by $$u^{n+1}$$ and integrate over the domain $$\Omega =(0,1)$$; hence we have

\begin{aligned} \frac{1}{2}\int _0^1 |u^{n+1}|^2\, dx - \frac{1}{2}\int _0^1 |u^n|^2\, dx\le & {} \int _0^1 \left( (u^{n+1})^2 - u^{n+1} u^n \right) \, dx \\\le & {} \Delta t\! \int _0^1 \left( (\overline{\kappa }-\kappa (u^n)) \partial _x u^n \partial _x u^{n+1} - \overline{\kappa }(\partial _x u^{n+1})^2\right) \, dx \end{aligned}

Let us recall the Peter-Paul inequality (extended Young’s inequality): for any nonnegative real numbers a and b, we have $$ab \le \varepsilon a^2/2 + b^2/(2\varepsilon )$$ for every $$\varepsilon >0$$. Using the assumption that $$\kappa (u^n)\le \overline{\kappa }$$, and applying the Peter-Paul inequality with $$a=|\partial _x u^n|$$ and $$b=(\overline{\kappa }-\kappa (u^n))|\partial _x u^{n+1}|$$, we obtain

\begin{aligned} (\overline{\kappa }-\kappa (u^n))\partial _x u^n \partial _x u^{n+1}\le & {} \frac{\varepsilon }{2} (\partial _x u^n)^2 + \frac{(\overline{\kappa }-\kappa (u^n))^2}{2\varepsilon }(\partial _x u^{n+1})^2 \\\le & {} \frac{\varepsilon }{2} (\partial _x u^n)^2 +\frac{\overline{\kappa }^2}{2\varepsilon }(\partial _x u^{n+1})^2. \end{aligned}

Therefore with the choice $$\varepsilon =\overline{\kappa }$$, we have the weighted Sobolev norm decrease

\begin{aligned} \frac{1}{2}\int _0^1 (u^{n+1})^2 dx + \frac{\overline{\kappa }}{2}\Delta t\int _0^1 |\partial _x u^{n+1}|^2\,dx \le \frac{1}{2}\int _0^1 (u^{n})^2 dx + \frac{\overline{\kappa }}{2}\Delta t\int _0^1 |\partial _x u^{n}|^2\,dx. \end{aligned}

This semi-discretization leads to a full discrete scheme in the form

\begin{aligned} \frac{\varvec{u}^{n+1} - \varvec{u}^n}{\Delta t} + \overline{A}\, \varvec{u}^{n+1} = \varvec{g}(\varvec{u}^n) \end{aligned}
(10)

with $$\varvec{g}(\varvec{u})$$ in the form $$\varvec{g}(\varvec{u})= (\overline{A}-A(\varvec{u}))\varvec{u}+ \varvec{f}$$. What is interesting, of course, is that the matrix of the implicit part is constant, and thus only has to be assembled and factorized once. Moreover, the system is linear in the variable $$\varvec{u}^{n+1}$$. Unfortunately, the PARAEXP algorithm cannot be applied directly here because the right-hand side $$\varvec{g}(\varvec{u}^n)$$ depends on the solution itself.
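
A sketch of the resulting time loop (10), reusing the 1D setting with the hypothetical conductivity $$\kappa (u)=1+u^2$$ and the bound $$\overline{\kappa }=2$$; the point is that the implicit matrix is built and inverted once, outside the loop (a dense inverse keeps the sketch numpy-only; a real code would store a sparse Cholesky or LU factor):

```python
import numpy as np

def stiffness(kv, h):
    """Tridiagonal matrix of -d/dx(kappa du/dx), Dirichlet BCs,
    arithmetic-mean face conductivities."""
    kf = 0.5 * (kv[:-1] + kv[1:])
    lo = np.concatenate(([kv[0]], kf))
    hi = np.concatenate((kf, [kv[-1]]))
    return (np.diag(lo + hi) - np.diag(kf, 1) - np.diag(kf, -1)) / h**2

kappa = lambda u: 1.0 + u**2           # hypothetical conductivity
n, dt = 50, 1e-3
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
u = np.sin(np.pi * x)                  # |u| <= 1 along this evolution
kbar = 2.0                             # upper bound of kappa(u) on the orbit
Abar = stiffness(np.full(n, kbar), h)

# the implicit matrix is constant: assembled and "factorized" once
Minv = np.linalg.inv(np.eye(n) / dt + Abar)

for _ in range(100):
    g = (Abar - stiffness(kappa(u), h)) @ u   # explicit residual g(u^n), f = 0
    u = Minv @ (u / dt + g)
```

Only matrix-vector products and one application of the stored factorization remain inside the loop.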

## Iterative methods: the LATIN approach

A usual way to deal with nonlinear equations numerically is to use an iterative process within a fixed-point algorithm. The LATIN (LArge Time INcremental) method pioneered by Ladevèze , and since then broadly used in computational structural mechanics and material science (see  for a recent reference), solves time-dependent problems (linear or nonlinear) according to a two-step iterative process. To separate the difficulties, the equations are partitioned into two groups: (i) a group of equations that are local in space and time, possibly nonlinear (representing equilibrium equations for example); (ii) a group of linear equations, possibly global in the spatial variable. Then ad hoc space-time approximation methods are used for the treatment of the global problem. Of course, the space-time local equations can be solved in parallel, which makes the LATIN method efficient and suitable for today’s HPC facilities. Let us emphasize that with LATIN it is possible to solve hard nonlinear mechanics problems, including thermodynamically irreversible ones (plasticity and friction as examples).

As an illustration, let us describe the LATIN method on the (rather simple) nonlinear heat problem:

1. Initialization ($$k=0$$): let $$u_{(0)}\in L^2((0,T), H^1(\Omega ))$$ be an approximate solution (in space and time) of the nonlinear problem (for example, one obtained with a coarse solver); compute $$\tilde{\kappa }_{(0)}=\kappa (u_{(0)})$$;

2. Iterate k, step 1 (global linear solution). Solve the linear problem

\begin{aligned} \partial _t u_{(k+1)} - \nabla \cdot (\tilde{\kappa }_{(k)} \nabla u_{(k+1)}) = f \end{aligned}

with given initial and boundary conditions.

3. Iterate k, step 2 (local projection onto the admissible manifold). Compute

\begin{aligned} \tilde{\kappa }_{(k+1)} = \kappa (u_{(k+1)}). \end{aligned}
4. Check convergence; if not converged, set $$k\leftarrow k+1$$ and go to step 2.

Step 1 performs a global (linear) evolution of the solution, whereas a pointwise nonlinear projection onto the equilibrium conductivity coefficients is done in step 2. We have a natural convergence indicator in terms of the distance between the frozen conductivity $$\tilde{\kappa }$$ and $$\kappa (u_{(k)})$$:

\begin{aligned} e_{(k)}=\left\| \kappa (u_{(k+1)})-\tilde{\kappa }_{(k)} \right\| . \end{aligned}

In particular, if $$\kappa$$ is Lipschitz continuous with Lipschitz constant L, then

\begin{aligned} e_{(k)} \le L \Vert u_{(k+1)}-u_{(k)}\Vert . \end{aligned}

Remark that step 2 can be performed in parallel (in time).
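
The two-step iteration above can be sketched as follows on the 1D problem, with a hypothetical bounded conductivity $$\kappa (u)=1+0.5\tanh (u)$$; step 1 is a global linear solve over the whole time window with the conductivity field frozen, step 2 is the pointwise update, and $$e_{(k)}$$ is the convergence indicator:

```python
import numpy as np

def stiffness(kv, h):
    """Tridiagonal matrix of -d/dx(kappa du/dx), Dirichlet BCs."""
    kf = 0.5 * (kv[:-1] + kv[1:])
    lo = np.concatenate(([kv[0]], kf))
    hi = np.concatenate((kf, [kv[-1]]))
    return (np.diag(lo + hi) - np.diag(kf, 1) - np.diag(kf, -1)) / h**2

kappa = lambda u: 1.0 + 0.5 * np.tanh(u)   # hypothetical bounded conductivity
n, nt, dt = 30, 40, 1e-3
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
u0 = np.sin(np.pi * x)

U = np.tile(u0, (nt + 1, 1))   # initialization: coarse space-time guess u_(0)
K = kappa(U)                   # frozen conductivity field kappa_(0)

errs = []
for k in range(6):
    # step 1: global linear solve over the whole time window, conductivity frozen
    Unew = np.empty_like(U)
    Unew[0] = u0
    for m in range(nt):
        A = stiffness(K[m + 1], h)
        Unew[m + 1] = np.linalg.solve(np.eye(n) / dt + A, Unew[m] / dt)
    # step 2: local nonlinear projection, pointwise in space and time
    errs.append(np.abs(kappa(Unew) - K).max())   # convergence indicator e_(k)
    U, K = Unew, kappa(Unew)
```

In a parallel setting, step 2 and the indicator are evaluated independently at each discrete time.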

For better and faster convergence, one can imagine variant approaches using relaxation: remark first that $$\kappa (u)$$ is (formally) a solution of the partial differential equation

\begin{aligned} \partial _t \kappa (u) - \nabla \cdot (\kappa (u)\nabla \kappa (u))= \kappa '(u)f\ { - \ \kappa ''(u)\kappa (u) |\nabla u|^2}. \end{aligned}

If $$\kappa$$ is a strictly convex function, for example, then the second term on the right-hand side is negative. One may consider the approximate (augmented) problem

\begin{aligned} \left\{ \begin{array}{l} \partial _t u -\nabla \cdot (\tilde{\kappa }\nabla u) = f, \\ \partial _t \tilde{\kappa }- \nabla \cdot (\overline{\kappa }\, \nabla \tilde{\kappa }) = \kappa '(u) f + \dfrac{\kappa (u)-\tilde{\kappa }}{\varepsilon }. \end{array}\right. \end{aligned}
(11)

where $$\varepsilon >0$$ is a given relaxation time (assumed to be rather small). In this way, $$\tilde{\kappa }$$ is expected to evolve much closer to the value $$\kappa (u)$$. One can then derive an iterative process with again two steps ($$\hbox {linear solution} + \hbox {projection}$$) as in the LATIN method.

## Newton and quasi-Newton approaches

For the sake of simplicity, let us consider here the initial value problem for a general autonomous system of ordinary differential equations

\begin{aligned} {\dot{\varvec{u}}} = \varvec{f}(\varvec{u}), \quad t\in (0,T], \end{aligned}
(12)

with $$\varvec{f}$$ assumed to be differentiable and Lipschitz continuous, and initial condition $$\varvec{u}(0)=\varvec{u}^0$$. The solution $$\varvec{u}\in L^2((0,T), \mathbb {R}^d)$$ can be seen as the zero of a nonlinear operator $$\varvec{G}$$,

\begin{aligned} \varvec{G}(\varvec{u}):={\dot{\varvec{u}}}-\varvec{f}(\varvec{u})=0. \end{aligned}

The directional derivative of $$\varvec{G}$$ at point $$\varvec{u}$$ in the direction $$\varvec{v}$$ is

\begin{aligned} D\varvec{G}(\varvec{u})\varvec{v}= {\dot{\varvec{v}}} - D\varvec{f}(\varvec{u})\,\varvec{v}. \end{aligned}

Then the standard Newton-Raphson method applied to $$\varvec{G}$$ reads for the k-th iterate

\begin{aligned} D\varvec{G}(\varvec{u}_{(k)})(\varvec{u}_{(k+1)}-\varvec{u}_{(k)}) = -\varvec{G}(\varvec{u}_{(k)}) \end{aligned}

that simplifies into

\begin{aligned} {\dot{\varvec{u}}}_{(k+1)}= & {} \varvec{f}(\varvec{u}_{(k)}) + D \varvec{f}(\varvec{u}_{(k)})(\varvec{u}_{(k+1)}-\varvec{u}_{(k)}) \nonumber \\= & {} D \varvec{f}(\varvec{u}_{(k)})\varvec{u}_{(k+1)} + \left( \varvec{f}(\varvec{u}_{(k)})-D \varvec{f}(\varvec{u}_{(k)}) \varvec{u}_{(k)}\right) \end{aligned}
(13)

Hence, the Newton-Raphson method provides a sequence of linear problems (of unknown $$\varvec{u}_{(k+1)}$$) with variable coefficients and sources (depending on $$\varvec{f}$$ and $$\varvec{u}_{(k)}$$).

### Spectral structure of the linearized problem

Let us emphasize that, at a given k, the linear system has the expected spectral structure for approximate solutions near an equilibrium $${\overline{\varvec{u}}}$$, that is $$\varvec{f}({\overline{\varvec{u}}})=0$$. For $$\varvec{u}_{(k+1)}$$ close to $${\overline{\varvec{u}}}$$, we have

\begin{aligned} \frac{d}{dt}(\varvec{u}_{(k+1)}-{\overline{\varvec{u}}})= & {} D\varvec{f}(\varvec{u}_{(k)})(\varvec{u}_{(k+1)}-{\overline{\varvec{u}}}) + \Big [ \varvec{f}(\varvec{u}_{(k)})-\varvec{f}({\overline{\varvec{u}}})-D\varvec{f}(\varvec{u}_{(k)})(\varvec{u}_{(k)}-{\overline{\varvec{u}}})\Big ] \end{aligned}

For $$\varvec{u}_{(k)}$$ close to $${\overline{\varvec{u}}}$$ and $$\varvec{f}\in \mathscr {C}^2$$ we have

\begin{aligned} \varvec{f}(\varvec{u}_{(k)})-\varvec{f}({\overline{\varvec{u}}})-D\varvec{f}(\varvec{u}_{(k)})(\varvec{u}_{(k)}-{\overline{\varvec{u}}}) = O(|\varvec{u}_{(k)}-{\overline{\varvec{u}}}|^2), \end{aligned}

then

\begin{aligned} \frac{d}{dt}(\varvec{u}_{(k+1)}-{\overline{\varvec{u}}}) \simeq D\varvec{f}({\overline{\varvec{u}}})(\varvec{u}_{(k+1)}-{\overline{\varvec{u}}}) \end{aligned}

which is the expected linearized system.

### Quasi-Newton approach

As an additional approximation, a quasi-Newton method replaces the Jacobian matrix $$D\varvec{f}(\varvec{u}_{(k)})$$ by an approximate one $$A_{(k)}\simeq D\varvec{f}(\varvec{u}_{(k)})$$, simpler to compute, thus giving the iterative process

\begin{aligned} {\dot{\varvec{u}}}_{(k+1)}= & {} \varvec{f}(\varvec{u}_{(k)}) + A_{(k)}\,(\varvec{u}_{(k+1)}-\varvec{u}_{(k)}) \nonumber \\= & {} A_{(k)}\,\varvec{u}_{(k+1)} + \left( \varvec{f}(\varvec{u}_{(k)})-A_{(k)}\varvec{u}_{(k)}\right) \end{aligned}

If we are able to build some coarse approximation $$\varvec{g}$$ of $$\varvec{f}$$ such that the quasi-Newton secant condition

\begin{aligned} A_{(k)}\,(\varvec{u}_{(k+1)}-\varvec{u}_{(k)}) = \varvec{g}(\varvec{u}_{(k+1)}) - \varvec{g}(\varvec{u}_{(k)}) \end{aligned}
(14)

is satisfied, we get the Jacobian-free quasi-Newton iteration

\begin{aligned} {\dot{\varvec{u}}}_{(k+1)} = \varvec{f}(\varvec{u}_{(k)}) + \left( \varvec{g}(\varvec{u}_{(k+1)}) - \varvec{g}(\varvec{u}_{(k)}) \right) \end{aligned}
(15)

or equivalently

\begin{aligned} {\dot{\varvec{u}}}_{(k+1)} = \varvec{g}(\varvec{u}_{(k+1)}) + \left( \varvec{f}(\varvec{u}_{(k)}) - \varvec{g}(\varvec{u}_{(k)}) \right) . \end{aligned}
(16)

In (16), $$\varvec{g}(\varvec{u}_{(k+1)})$$ can be seen as a predictor term, whereas $$(\varvec{f}(\varvec{u}_{(k)}) - \varvec{g}(\varvec{u}_{(k)}))$$ is a corrector term toward $$\varvec{f}$$ depending on the iterate (k) only. By construction, we retrieve the accuracy of $$\varvec{f}$$ at convergence. The quasi-Newton secant condition ensures superlinear convergence according to the Dennis-Moré theorem.
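
A scalar sketch of iteration (16), with an illustrative full model $$f(u)=-u-u^3$$ and its linear part as coarse model $$g(u)=-u$$; each iterate solves a problem that is implicit only in g, with the nonlinear corrector frozen at the previous iterate:

```python
import numpy as np

f = lambda u: -u - u**3        # full model
g = lambda u: -u               # cheap coarse model of f (its linear part)

nt, dt, u0 = 100, 0.01, 1.0
U = np.full(nt + 1, u0)        # initial guess: constant-in-time trajectory

for k in range(25):
    src = f(U) - g(U)          # corrector term, frozen at iterate (k)
    Unew = np.empty_like(U)
    Unew[0] = u0
    for m in range(nt):
        # implicit Euler, implicit only in the coarse predictor g(u_{(k+1)}):
        # (u^{m+1} - u^m)/dt = -u^{m+1} + src^{m+1}
        Unew[m + 1] = (Unew[m] + dt * src[m + 1]) / (1.0 + dt)
    U = Unew

# at convergence the scheme above is exactly implicit Euler on the full model;
# compare with that scheme solved step by step by a scalar Newton loop
ref = np.empty(nt + 1)
ref[0] = u0
for m in range(nt):
    v = ref[m]
    for _ in range(30):        # solve v + dt*(v + v^3) = ref[m]
        res = v + dt * (v + v**3) - ref[m]
        v = v - res / (1.0 + dt * (1.0 + 3.0 * v**2))
    ref[m + 1] = v
```

At the fixed point $$\varvec{u}_{(k+1)}=\varvec{u}_{(k)}$$, the corrector restores the full model exactly, which is what "retrieving the accuracy of $$\varvec{f}$$ at convergence" means here.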

## The PARAREAL method

The PARAREAL method, initially proposed by Lions et al.  in 2001, is nothing but a parallel-in-time version of the quasi-Newton method (16) above. In PARAREAL, the time domain is decomposed into p subdomains. Then we define a double-index sequence of approximate solutions $$\varvec{u}_{(k)}^j$$, where k still denotes the current index of the iterative process and j is the number of the time subdomain $$[T_{j-1},T_j]$$. In its regular current form (see [3, 4]), the PARAREAL algorithm is defined as follows:

1. Define a partition in time $$[T_{j-1},T_{j}]$$, $$0=T_0<T_1<...<T_p=T$$;

2. Define a cheap coarse propagator $$\mathscr {G}$$ and a fine propagator $$\mathscr {F}$$.

3. Initialization ($$k=0$$): $$\varvec{u}_{(0)}^0 = \varvec{u}^0$$, $$\varvec{u}_{(0)}^{j+1} = \mathscr {G}(\varvec{u}_{(0)}^j)$$;

4. Loop on the iterates k:

\begin{aligned} \varvec{u}_{(k+1)}^{j+1} = {\mathscr {G}(\varvec{u}_{(k+1)}^j)} \ + \ {\left( \mathscr {F}(\varvec{u}_{(k)}^j) - \mathscr {G}(\varvec{u}_{(k)}^j) \right) } \end{aligned}
(17)
5. Check convergence, test the stop criterion.

The PARAREAL algorithm is graphically represented in the schematics of Fig. 2. From (17) and the dependency graph of Fig. 2, one can understand that each corrector term on time slice j

\begin{aligned} \left( \mathscr {F}(\varvec{u}_{(k)}^j) - \mathscr {G}(\varvec{u}_{(k)}^j) \right) \end{aligned}

can be evaluated in parallel over the p processors. On the other hand, the coarse propagator term $$\mathscr {G}(\varvec{u}_{(k+1)}^j)$$ induces a persistent sequential part in the algorithm, but it is expected to be fast to evaluate. The trade-off is to design a fast, “accurate enough” coarse propagator that does not degrade the overall performance of the algorithm.
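
The PARAREAL loop can be sketched as follows on a scalar nonlinear example, with both propagators taken as explicit Euler (fine: many substeps; coarse: a single step); after p corrections the algorithm reproduces the sequential fine solution exactly, which the sketch checks:

```python
import numpy as np

f = lambda u: -u - u**3                  # illustrative nonlinear right-hand side

def propagate(u, dT, nsteps):
    """Explicit-Euler propagator over one time slice; nsteps sets fine vs coarse."""
    h = dT / nsteps
    for _ in range(nsteps):
        u = u + h * f(u)
    return u

F = lambda u, dT: propagate(u, dT, 100)  # fine propagator
G = lambda u, dT: propagate(u, dT, 1)    # cheap coarse propagator

p, T, u0 = 8, 1.0, 1.0
dT = T / p

# initialization (k = 0): sequential coarse sweep
U = np.empty(p + 1)
U[0] = u0
for j in range(p):
    U[j + 1] = G(U[j], dT)

for k in range(p):
    # corrector terms are independent: one per processor in practice
    Fj = [F(U[j], dT) for j in range(p)]
    Gj = [G(U[j], dT) for j in range(p)]
    Unew = np.empty_like(U)
    Unew[0] = u0
    for j in range(p):                   # sequential coarse sweep + correction (17)
        Unew[j + 1] = G(Unew[j], dT) + (Fj[j] - Gj[j])
    U = Unew

# sequential fine reference
seq = np.empty(p + 1)
seq[0] = u0
for j in range(p):
    seq[j + 1] = F(seq[j], dT)
```

The final check illustrates the finite-termination property: after k iterations the first k slices are exact, so after p iterations PARAREAL coincides with the sequential fine solution.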

One can imagine different choices of coarse solvers: low-order accurate time advance schemes, simplified equations, simplified models, discretizations on coarser meshes, etc. Reference papers like Bal and Maday  and Baffico et al.  show general convergence theorems for nonlinear ordinary differential systems using coarse time integrators as coarse solvers. Gander and Hairer  also show superlinear convergence of the PARAREAL algorithm.

## Putting all together

Actually, there are different ways to combine the strategies seen so far. As an example, let us again consider the nonlinear heat equation with a time-varying source term:

\begin{aligned} \partial _t u - \nabla \cdot (\kappa (u)\nabla u ) = f(t). \end{aligned}

In the spirit of IMEX and LATIN, let us define the following iterative approach:

\begin{aligned}&\partial _t u_{(k+1)} - \nabla \cdot (\overline{\kappa }\nabla u_{(k+1)}) = f(t) + \nabla \cdot ((\overline{\kappa }-\kappa _{(k)})\nabla u_{(k)}), \end{aligned}
(18)
\begin{aligned}&\kappa _{(k+1)} = \kappa (u_{(k+1)}). \end{aligned}
(19)

On the left-hand side of the equation, we have replaced the thermal conductivity $$\kappa (u)$$ by its supremum, as suggested by IMEX. In semi-discrete form, we get an equation of the form

\begin{aligned} {{\dot{\varvec{u}}}_{(k+1)} + \overline{A}\, \varvec{u}_{(k+1)}} = { \varvec{f}(t) + \varvec{r}_{(k)}}. \end{aligned}
(20)

We get a linear equation for the unknown $$\varvec{u}_{(k+1)}$$ with constant coefficient matrix $$\overline{A}$$, and the right-hand side only depends on time through f(t) and $$\varvec{u}_{(k)}(t)$$. Then the PARAEXP algorithm can be applied at each iterate k. The remaining nonlinear operations like (19) and the assembly of $$\varvec{r}_{(k)}$$ can be done in parallel (in time). In conclusion, we have replaced a nonlinear problem by a sequence of linear problems where some nonlinear evaluations have been sent to the right-hand side, and so can be computed in parallel.

## The Newton method to handle nonlinear terms with ROMs of dynamical systems

Reduced-order modeling is a general methodology to extract the principal information of a high-dimensional problem and then reduce the problem, for example by projection. Reduction is generally possible when the M-Kolmogorov width

\begin{aligned} \delta _M(U) = \inf _{\begin{array}{c} V^M \text {linear space},\, V^M\subset V\\ \text {dim}(V^M)=M \end{array}}\quad \sup _{x\in U}\quad \inf _{y^M\in V^M} \Vert x-y^M\Vert _V. \end{aligned}

of an admissible closed subset U of a Banach space V is rather small for a rather small integer M (the dimension of the approximation space). One of the main motivations is to strongly reduce the computational cost of the numerical solution. Even if there are recent advances in nonlinear reduced-order modeling, in particular with the empirical interpolation method (EIM) proposed by Maday et al. , or the discrete empirical interpolation method (DEIM) by Chaturantabut and Sorensen , there are still issues and open problems for nonlinear time-dependent problems. Dealing with general nonlinear terms and reduced-order modeling for dynamical systems may be a difficult task, because:

• reduced-order models are expected to reproduce the stability of the system (for instance in the sense of Lyapunov, see  on this subject);

• the local dynamics has to be reproduced, at least “at first order”, involving a compatibility of the spectral properties between full and reduced systems;

• the region of the state space visited by the trajectories may be defined over a nonlinear manifold rather than in a linear subspace. Thus nonlinear dimensionality reduction methods would be better candidates for reduction.

Balanced truncation [1, 22], for example, is a trade-off in the reduction process to provide sufficient accuracy for controllability and observability of dynamical systems. However, the theory mainly deals with linear time-invariant (LTI) systems.

For time-dependent problems, one can adopt a greedy incremental strategy in time by adapting/enriching the low-dimensional subspace when the principal components change over time. But the price to pay is the online evaluation of some (high-dimensional) nonlinear terms to control the error, which can penalize performance. If there is no other choice, parallel-in-time computing once again appears to be a complementary tool to keep the global performance of the method.

### Newton method and Galerkin projection method

Let us go back to the Newton method (13), which we rewrite here:

\begin{aligned} {\dot{\varvec{u}}}_{(k+1)}=D \varvec{f}(\varvec{u}_{(k)})\varvec{u}_{(k+1)} + \varvec{f}(\varvec{u}_{(k)})-D \varvec{f}(\varvec{u}_{(k)}) \varvec{u}_{(k)}. \end{aligned}

Let us consider a Galerkin approximation into the linear vector space

\begin{aligned} V^M_{(k)} = \text {span}(\varvec{w}^1_{(k)},\ldots ,\varvec{w}^M_{(k)}) \end{aligned}

and assume that $$(\varvec{w}^\ell _{(k)},\varvec{w}^m_{(k)})=\delta _{\ell m}$$, $$1\le \ell ,m\le M$$. We are looking for an approximate rank-M solution $$\varvec{u}_{(k+1)}^M(t)$$ in $$V^M_{(k)}$$, i.e.

\begin{aligned} \varvec{u}_{(k+1)}^M(t) = \sum _{m=1}^M a^m_{(k+1)}(t)\, \varvec{w}^m_{(k)} \end{aligned}
(21)

for some real coefficients $$a^m_{(k+1)}(t)$$, $$1\le m\le M$$, at time t. In order to get a reduced system, Eq. (13) is projected onto the vector space $$V_{(k)}^M$$. Multiplying (13) by any test function $$\varvec{v}^M\in V_{(k)}^M$$, we look for a low-order solution $$\varvec{u}_{(k+1)}^M(t)$$ in the form (21) such that

\begin{aligned} \left( {\dot{\varvec{u}}}_{(k+1)}^M, \varvec{v}^M \right) = \left( D \varvec{f}(\varvec{u}_{(k)})\varvec{u}_{(k+1)}^M,\varvec{v}^M \right) + \left( \varvec{f}(\varvec{u}_{(k)})-D \varvec{f}(\varvec{u}_{(k)}) \varvec{u}_{(k)},\varvec{v}^M \right) , \quad \forall \varvec{v}^M\in V_{(k)}^M. \end{aligned}

Taking $$\varvec{v}^M=\varvec{w}^m_{(k)}$$, $$1\le m\le M$$, by orthonormality of the basis vectors we get

\begin{aligned} {\dot{ a}}^m_{(k+1)} = \sum _{\ell =1}^M a^\ell _{(k+1)} (D\varvec{f}(\varvec{u}_{(k)})\varvec{w}^\ell _{(k)},\varvec{w}^m_{(k)}) + (\varvec{f}(\varvec{u}_{(k)})-D\varvec{f}(\varvec{u}_{(k)})\varvec{u}_{(k)},\varvec{w}^m_{(k)}). \end{aligned}

In vector form, one obtains a reduced system in the form

\begin{aligned} {\dot{\varvec{a}}}_{(k+1)}^M = \tilde{A}_{(k)}^M(t)\,\varvec{a}_{(k+1)}^M + \varvec{r}_{(k)}^M(t) \end{aligned}

with $$\varvec{a}_{(k+1)}^M(t)=(a_{(k+1)}^m(t))_m$$, $$(\tilde{A}_{(k)}^M)_{\ell m}(t)=(D\varvec{f}(\varvec{u}_{(k)}(t))\varvec{w}^\ell _{(k)},\varvec{w}^m_{(k)})$$ and $$(\varvec{r}_{(k)}^M(t))_m=(\varvec{f}(\varvec{u}_{(k)}(t))-D\varvec{f}(\varvec{u}_{(k)}(t))\varvec{u}_{(k)},\varvec{w}^m_{(k)})$$. Remark that when the initial system is linear, i.e. $$\varvec{f}(\varvec{u})=A \varvec{u}$$, we retrieve the classical Galerkin projection over the space $$V^M$$:

\begin{aligned} {\dot{ \varvec{a}}}_{(k+1)}^M = \tilde{A}_{(k)}^M\,\varvec{a}_{(k+1)}^M \end{aligned}

with a constant matrix $$\tilde{A}_{(k)}^M$$, $$(\tilde{A}_{(k)}^M)_{\ell m} = (A\varvec{w}^\ell _{(k)},\varvec{w}^m_{(k)})$$. The assembling of both $$\tilde{A}_{(k)}^M(t)$$ and $$\varvec{r}_{(k)}^M(t)$$ requires high-dimensional operations, but, fortunately, one can do this task in parallel (in time). Thus, one can nonetheless expect to get rather high performance. To summarize, at this stage of analysis, the algorithm of reduced-order modeling is the following:

1. Initialization: use a coarse solver and compute $$\varvec{u}_{(0)}$$. Then loop over (k):

2. Compute M principal components $$\varvec{w}_{(k)}^m$$, $$m=1,\ldots ,M$$, or a suitable reduced basis, from the knowledge of $$\varvec{u}_{(k)}$$.

3. Assemble and compute in parallel $$\tilde{A}_{(k)}^M(t)$$ and $$\varvec{r}_{(k)}^M(t)$$ at all the discrete times.

4. Solve the linear problem

\begin{aligned}&{\dot{ \varvec{a}}}_{(k+1)}^M = \tilde{A}_{(k)}^M(t)\,\varvec{a}_{(k+1)}^M + \varvec{r}_{(k)}^M(t), \quad t\in (0,T]\\&\varvec{a}_{(k+1)}^M(0) = \varvec{a}^0_{(k+1)}\in \mathbb {R}^M, \end{aligned}

and compute

\begin{aligned} \varvec{u}_{(k+1)}^M(t) = \sum _{m=1}^M a^m_{(k+1)}(t)\, \varvec{w}^m_{(k)}. \end{aligned}
5. Test convergence after iterate k.
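
A numpy sketch of steps 2-4 on an illustrative semilinear system $$\varvec{f}(\varvec{u})=-L\varvec{u}-\varvec{u}^3$$ (componentwise cube): the snapshots of the current iterate provide the POD basis, the reduced Jacobian and residual are assembled at every discrete time (a task that is parallel in time), and the small linear system is advanced by implicit Euler:

```python
import numpy as np

d, nt, dt, M = 40, 60, 0.01, 15
x = np.linspace(0.0, 1.0, d)
h = x[1] - x[0]
L = 0.01 * (2 * np.eye(d) - np.eye(d, k=1) - np.eye(d, k=-1)) / h**2
f = lambda u: -L @ u - u**3                    # illustrative nonlinear dynamics
Df = lambda u: -L - np.diag(3 * u**2)          # its Jacobian

# full-order trajectory u_(k): implicit Euler with an inner Newton loop
U = np.empty((nt + 1, d))
U[0] = np.sin(np.pi * x)
I = np.eye(d)
for m in range(nt):
    v = U[m]
    for _ in range(20):
        res = v - U[m] - dt * f(v)
        v = v - np.linalg.solve(I - dt * Df(v), res)
    U[m + 1] = v

# step 2: reduced basis from POD of the snapshots of u_(k)
W, s, _ = np.linalg.svd(U.T, full_matrices=False)
W = W[:, :M]

# steps 3-4: assemble the reduced operators along u_(k) (independent in time,
# hence parallelizable) and solve the small linear system
a = np.empty((nt + 1, M))
a[0] = W.T @ U[0]
for m in range(nt):
    J = Df(U[m + 1])
    At = W.T @ J @ W                            # reduced Jacobian at t^{m+1}
    r = W.T @ (f(U[m + 1]) - J @ U[m + 1])      # reduced residual at t^{m+1}
    a[m + 1] = np.linalg.solve(np.eye(M) - dt * At, a[m] + dt * r)

U_rom = a @ W.T                                 # lifted low-rank trajectory (21)
```

The lifted trajectory matches the full-order one up to the POD truncation error, while the time loop only solves $$M\times M$$ systems.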

### Remark 1

For the computation of the basis functions $$\varvec{w}_{(k)}^m$$, one can of course use Proper Orthogonal Decomposition (POD)  or any other dimensionality reduction method. The update of the reduced basis may also be done by incrementing the basis set within an adaptive learning algorithm.
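
A minimal POD sketch via the SVD of a snapshot matrix, with an energy-based truncation criterion; the snapshot field here is synthetic and of exact rank two, so the selected dimension should be $$M=2$$:

```python
import numpy as np

def pod_basis(S, tol=1e-6):
    """POD basis: left singular vectors of the snapshot matrix S (one snapshot
    per column), truncated by the relative 'energy' criterion on sum(s_m^2)."""
    W, s, _ = np.linalg.svd(S, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    M = int(np.searchsorted(energy, 1.0 - tol)) + 1
    return W[:, :M], s

# synthetic snapshots of a smooth decaying field: a rank-two space-time structure
x = np.linspace(0.0, 1.0, 200)[:, None]
t = np.linspace(0.0, 1.0, 80)[None, :]
S = np.exp(-t) * np.sin(np.pi * x) + 0.1 * np.exp(-4 * t) * np.sin(3 * np.pi * x)
W, s = pod_basis(S)
```

The retained columns of W are orthonormal, and the projection W Wᵀ S reconstructs the snapshots up to the discarded singular values.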

### Remark 2

In step 3, it is assumed that both $$\tilde{A}_{(k)}(t)$$ and $$\varvec{r}_{(k)}(t)$$ have to be assembled and computed at all the discrete times. Of course, this may appear too penalizing for achieving high performance. Actually, one can consider additional reduction strategies for approximating both the Jacobian matrix and the right-hand sides. This is the aim of the following “Discussion” section.

There are many options to improve the whole numerical complexity of the algorithm using some additional approximations or reduction strategies.

### Freezing up the Jacobian matrices

Let us go back to the Newton method

\begin{aligned} {\dot{\varvec{u}}}_{(k+1)} = \varvec{f}(\varvec{u}_{(k)}) + D \varvec{f}(\varvec{u}_{(k)})(\varvec{u}_{(k+1)}-\varvec{u}_{(k)}) \end{aligned}

where the correction term $$D\varvec{f}(\varvec{u}_{(k)})(\varvec{u}_{(k+1)}-\varvec{u}_{(k)})$$ ensures quadratic convergence when the iteration converges. As already discussed in “Newton and quasi-Newton approaches”, one can replace the Jacobian matrix by some approximation $$A_{(k)}(t)$$ that is cheaper to evaluate, leading to the quasi-Newton approach

\begin{aligned} {\dot{\varvec{u}}}_{(k+1)} = \varvec{f}(\varvec{u}_{(k)}) + A_{(k)}(t)\,(\varvec{u}_{(k+1)}-\varvec{u}_{(k)}). \end{aligned}

The matrices $$A_{(k)}$$ a priori still depend on time t. But one could consider frozen approximate Jacobian matrices $$A_{(k)}^j$$ on time slices $$[T_j,T_{j+1}]$$, further inviting a parallel-in-time strategy.
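A minimal sketch of this slice-frozen strategy (illustrative sizes and discretization, with the exact Jacobian standing in for the approximation $$A_{(k)}^j$$): within each slice the Jacobian is evaluated and factorized once, here through an explicit inverse for illustration only, and reused for every time step of the slice.

```python
import numpy as np

# Quasi-Newton sketch with Jacobian matrices frozen per time slice
# [T_j, T_{j+1}] on du/dt = f(u) = L u - u**3. Sizes illustrative.

N, n_slices, steps_per_slice, T = 40, 4, 50, 1.0
dt = T / (n_slices * steps_per_slice)

h = 1.0 / (N + 1)
L = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1)) / h**2

def f(u):
    return L @ u - u**3

def Df(u):
    return L - 3.0 * np.diag(u**2)

x = np.linspace(h, 1.0 - h, N)
u0 = np.sin(np.pi * x)
u = u0.copy()
I = np.eye(N)

for j in range(n_slices):
    A_j = Df(u)                         # frozen Jacobian for this slice
    M_inv = np.linalg.inv(I - dt * A_j)  # "factorize" once, reuse below
    for _ in range(steps_per_slice):
        # semi-implicit step: implicit in the frozen linear part A_j,
        # explicit in the remainder f(u) - A_j u
        u = M_inv @ (u + dt * (f(u) - A_j @ u))
```

In practice the explicit inverse would be replaced by a sparse LU or Cholesky factorization, and the slices could be distributed over processors in the parallel-in-time spirit discussed above.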

If we do not want to worry about Jacobian matrices, then the other option is to consider a coarse model $$\varvec{g}$$ of $$\varvec{f}$$ as mentioned in “Newton and quasi-Newton approaches” section. In this case, the quasi-Newton iteration reads

\begin{aligned} {\dot{\varvec{u}}}_{(k+1)} = \varvec{f}(\varvec{u}_{(k)}) + \left( \varvec{g}(\varvec{u}_{(k+1)})-\varvec{g}(\varvec{u}_{(k)})\right) . \end{aligned}

In order to achieve an efficient reduced-order model, one now has to deal with the nonlinear term $$\varvec{g}(\varvec{u}_{(k+1)})$$. An efficient and tractable way to proceed is to use an empirical interpolation method (EIM, ). In that case, we can even make $$\varvec{g}$$ depend on (k), according to some adaptive learning process (greedy algorithm, inflating basis, etc.). Remark finally that the iterative process can once again be set up in a parallel-in-time framework following ideas from the PARAREAL algorithm.
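As a hedged illustration of this coarse-model iteration, take the coarse model g to be the linear part of f, so that every sweep only solves linear systems with one fixed matrix; the discretization, sizes and initialization are our own illustrative choices.

```python
import numpy as np

# Sketch of the iteration du_(k+1)/dt = f(u_(k)) + g(u_(k+1)) - g(u_(k))
# with f(u) = L u - u**3 and the coarse model g(u) = L u (linear part),
# so each iterate is a linear problem. Choices are illustrative.

N, n_steps, T = 40, 200, 0.5
dt = T / n_steps
h = 1.0 / (N + 1)
L = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1)) / h**2

def f(u):
    return L @ u - u**3

def g(u):                      # coarse (linear) model of f
    return L @ u

x = np.linspace(h, 1.0 - h, N)
u0 = np.sin(np.pi * x)
I = np.eye(N)
M_inv = np.linalg.inv(I - dt * L)   # g is linear: one factorization for all k

# iterate k = 0: the coarse (purely linear) solution
U = np.empty((N, n_steps + 1))
U[:, 0] = u0
for n in range(n_steps):
    U[:, n + 1] = M_inv @ U[:, n]

for k in range(8):                  # fixed-point loop over iterates (k)
    U_new = np.empty_like(U)
    U_new[:, 0] = u0
    for n in range(n_steps):
        # implicit in g(u_(k+1)), explicit in f(u_(k)) - g(u_(k))
        rhs = U_new[:, n] + dt * (f(U[:, n + 1]) - g(U[:, n + 1]))
        U_new[:, n + 1] = M_inv @ rhs
    converged = np.linalg.norm(U_new - U) < 1e-10 * np.linalg.norm(U)
    U = U_new
    if converged:
        break
```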

### Achieving dimensionality reduction for $$\varvec{f}$$

If possible, one can also use a reduced-order approximation of $$\varvec{f}$$. If the iterative algorithm is expected to converge towards a solution with the same order of accuracy as the original one, one has to consider an accurate reduced-order model of $$\varvec{f}$$. Once again, the empirical interpolation method may help here. However, if a global-in-time reduction strategy is considered, the dimension M of the low-order vector space may become too large, leading to a degradation of the overall performance.

An alternative approach would be to consider a family of local-in-time empirical interpolation methods for $$\varvec{f}$$. In this case, we should also consider local models $$\varvec{f}_{(k)}^j$$ defined on the time slice $$[T_j, T_{j+1})$$, which can also be updated at each iteration k through a learning process.

## Approximate exponential integrators

In order to make the PARAEXP algorithm globally efficient, it is essential to compute fast and accurate approximate exponential integrators. In the case of the linear heat equation, we have to compute the exponential of a large-scale, sparse symmetric matrix A. More precisely, for the problem $${\dot{\varvec{u}}} = A\varvec{u}$$ with initial data $$\varvec{u}(0)=\varvec{u}^0$$, we have to compute the solution $$\varvec{u}(t)=\exp (tA)\varvec{u}^0$$ for any $$t\in [0,T]$$.
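As a concrete instance of such an integrator, the following sketch approximates exp(tA)u0 by an Arnoldi-Krylov projection, one of the classical techniques recalled below; the helper functions, sizes and the dense reference computation are illustrative assumptions.

```python
import numpy as np

def arnoldi(A, u0, M):
    """M-step Arnoldi: orthonormal V (n x M) and Hessenberg H = V^T A V."""
    n = len(u0)
    V = np.zeros((n, M))
    H = np.zeros((M, M))
    V[:, 0] = u0 / np.linalg.norm(u0)
    for j in range(M - 1):
        z = A @ V[:, j]
        for i in range(j + 1):           # modified Gram-Schmidt
            H[i, j] = V[:, i] @ z
            z = z - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(z)
        V[:, j + 1] = z / H[j + 1, j]
    z = A @ V[:, M - 1]                  # last column of H
    for i in range(M):
        H[i, M - 1] = V[:, i] @ z
    return V, H

def expm_sym(B, t):
    """exp(t B) for a small symmetric matrix via eigendecomposition."""
    lam, Q = np.linalg.eigh(B)
    return (Q * np.exp(t * lam)) @ Q.T

# illustrative test case: 1D Dirichlet Laplacian, generic initial bump
N, M, t = 40, 20, 1e-3
h = 1.0 / (N + 1)
A = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1)) / h**2
x = np.linspace(h, 1.0 - h, N)
u0 = np.exp(-50.0 * (x - 0.3)**2)

V, H = arnoldi(A, u0, M)
# exp(tA) u0 ~ ||u0|| * V exp(tH) e_1
u_kry = np.linalg.norm(u0) * (V @ expm_sym(H, t)[:, 0])

u_ref = expm_sym(A, t) @ u0          # exact exponential (A is symmetric)
```

For symmetric A the Arnoldi recurrence reduces to the Lanczos one, and H is tridiagonal, so the small exponential exp(tH) is very cheap compared to any operation on the full matrix.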

As mentioned in , there are numerous techniques to compute accurate approximations of matrix exponentials. Among them, one can for example mention Padé approximants, exponentially fitted integration methods, or approximations based on projections onto Krylov subspaces

\begin{aligned} K^M = \text {span}\left\{ \varvec{u}^0,\ A\varvec{u}^0,\ A^2 \varvec{u}^0,\ \ldots ,\ A^{M-1}\varvec{u}^0 \right\} \end{aligned}

through Arnoldi orthogonalization iterations. Actually, the Krylov-Galerkin projection can be seen as a reduced-order technique, with a suitable reduced basis that fits the action of the matrix exponential. But of course there are other choices of suitable basis functions, such as the first eigenvectors $$\phi ^m$$ of A:

\begin{aligned} A \phi ^m = \lambda ^m \phi ^m. \end{aligned}

For A symmetric positive definite with eigenvalues arranged in increasing order, that is $$0<\lambda ^1\le \lambda ^2 \le \ldots \lambda ^M \le \ldots$$, it is natural, from the approximation error point of view, to consider the M first eigenvectors of A as vectors spanning the reduced approximation subspace. We will denote by $$\tilde{A}$$ the projection of A onto this subspace; of course $$\text {rank}(\tilde{A})=M$$. Considering the iterative approach for linear problems

\begin{aligned}&{\dot{\varvec{u}}}_{(k+1)} = \tilde{A} \varvec{u}_{(k+1)} + (A-\tilde{A})\varvec{u}_{(k)}, \quad t\in [0,T], \end{aligned}
(22)
\begin{aligned}&\varvec{u}_{(k+1)}(0)=\varvec{u}^0, \end{aligned}
(23)

by the superposition principle, one can first consider the low-order homogeneous problem

\begin{aligned} {\dot{\varvec{v}}}_{(k+1)}= & {} \tilde{A}\, \varvec{v}_{(k+1)}, \\ \varvec{v}_{(k+1)}(0)= & {} \varvec{u}^0 \end{aligned}

for which we have an efficient low-order exponential solution, and, on the other hand, the high-dimensional problem with zero initial value

\begin{aligned} {\dot{\varvec{w}}}_{(k+1)}= & {} \tilde{A}\, \varvec{w}_{(k+1)} + (A-\tilde{A})\varvec{u}_{(k)}, \\ \varvec{w}_{(k+1)}(0)= & {} 0, \end{aligned}

then $$\varvec{u}_{(k+1)}(t)=\varvec{v}_{(k+1)}(t)+\varvec{w}_{(k+1)}(t)$$. In the spirit of the PARAEXP algorithm, one can set up the superposition principle within a parallel-in-time decomposition to deal separately with the low-order homogeneous exponential solution and the high-dimensional inhomogeneous problems.
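This superposition can be checked numerically. In the sketch below (implicit Euler steps, sizes and initial data of our own choosing, with the reduced operator built from the M slowest eigenvectors of the discrete Laplacian), stepping v and w separately and summing reproduces the unsplit step of the coupled iteration exactly, by linearity.

```python
import numpy as np

# Numerical check of the v + w superposition splitting for du/dt = A u,
# with At the Galerkin projection of A onto its M slowest eigenvectors.
# Discretization (implicit Euler) and sizes are illustrative.

N, M, n_steps, T = 30, 5, 100, 0.05
dt = T / n_steps
h = 1.0 / (N + 1)
A = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1)) / h**2

lam, Phi = np.linalg.eigh(A)           # eigenvalues in increasing order
idx = np.argsort(-lam)[:M]             # M eigenvalues closest to 0: slow modes
PhiM, lamM = Phi[:, idx], lam[idx]
At = PhiM @ np.diag(lamM) @ PhiM.T     # rank-M projected operator

x = np.linspace(h, 1.0 - h, N)
u0 = np.sin(np.pi * x) + 0.3 * np.sin(11 * np.pi * x)  # slow + fast content

# previous iterate u_(k): here simply the frozen initial state
Uk = np.tile(u0[:, None], (1, n_steps + 1))

I = np.eye(N)
M_inv = np.linalg.inv(I - dt * At)     # in practice: M x M reduced solves

v, w = u0.copy(), np.zeros(N)          # v(0) = u0, w(0) = 0
u = u0.copy()                          # direct stepping of the coupled equation
for n in range(n_steps):
    force = (A - At) @ Uk[:, n + 1]
    v = M_inv @ v                      # low-order homogeneous part
    w = M_inv @ (w + dt * force)       # high-dimensional zero-IC part
    u = M_inv @ (u + dt * force)       # unsplit reference step
```

Both substeps reuse the same factorization of $$I - \Delta t\,\tilde{A}$$, and the forcing terms at the different discrete times are independent, which is what the parallel-in-time decomposition exploits.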

## Closing discussion

From this review of efficient time-advance solvers, including the IMEX, LATIN, PARAEXP and PARAREAL algorithms, we have tried to show the different ways and tracks to deal with large-scale dynamical systems with linear and/or nonlinear terms. For the sake of an easy discussion, we have taken the example of the heat equation (linear or nonlinear). We are aware that this may be too restrictive; nonlinear computational mechanics, including for example irreversible thermodynamic problems, needs more effort and technical developments. Among the methods discussed above, some have been designed to address such problems. This is the case for the LATIN approach, for example.

Time parallelization appears to be a promising key element of speedup. For problems with a small Kolmogorov width, reduced-order modeling may be a supplementary methodology to accelerate the whole time-advance solution. For numerous reasons, it is interesting to cast a nonlinear problem into a sequence of linear problems within an iterative process. Linear problems are easier to deal with, and there are dedicated tools like the parallel-in-time PARAEXP method. On the other hand, an iterative process allows for multi-fidelity adaptive solvers, using incremental, greedy or learning algorithms. Of course, we have to keep in mind that iterative methods may not converge. So in the design process of the numerical approach, one has to answer the following questions: is the whole iterative process stable, and is it possible to prove convergence? If the method is convergent, what is the rate of convergence? Is it possible to accelerate the convergence? At convergence, does the iterative algorithm reach the solution obtained with the accuracy we paid for at the finest level? For parallel algorithms, what is the effective speedup?

Last but not least, managing multi-fidelity models and multi-level reduced-order models, as well as parallel-in-time algorithms and learning algorithms implemented on distributed-memory computer architectures, necessarily requires data management efforts and smart software engineering.

## Conclusions

The first aim of this paper is to review different efficient time-advance solvers (including IMEX, PARAEXP, LATIN, PARAREAL) and show the connections between them. We also try to show the links with quasi-Newton approaches and relaxation/projection methods to deal with nonlinear terms. Parallel-in-time algorithms appear to be a complementary and promising framework for the fast solution of time-dependent problems. Finally, reduced-order models (POD-based, principal eigenstructure, a priori reduced bases, ...) can possibly be included to achieve better performance. In a future paper, we will carry out numerical experiments on different hybrid approaches.

## References

1. Antoulas AC. An overview of approximation methods for large-scale dynamical systems. Annu Rev Control. 2005;29:181–90.

2. Audouze C, De Vuyst F, Nair PB. Nonintrusive reduced-order modeling of parametrized time-dependent partial differential equations. Numer Methods Partial Differ Equ. 2013;29(5):1587–628.

3. Baffico L, Bernard S, Maday Y, Turinici G, Zérah G. Parallel-in-time molecular dynamics simulations. Phys Rev E. 2002;66(5):057701.

4. Bal G, Maday Y. A “parareal” time discretization for nonlinear PDEs with application to the pricing of an American put. Recent developments in domain decomposition methods. Berlin: Springer; 2002. p. 189–202.

5. Chaturantabut S, Sorensen DC. Nonlinear model reduction via discrete empirical interpolation. SIAM J Sci Comput. 2010;32(5):2737–64.

6. Chinesta F, Ammar A, Lemarchand F, Beauchène P, Boust F. Parallel time integration and high resolution homogenization. Comput Methods Appl Mech Eng. 2008;197(5):400–13.

7. Cortial J, Farhat C, Guibas LJ, Rajashekhar M. Compressed sensing and time-parallel reduced-order modeling of structural health monitoring using DDDAS. Computational science-ICCS 2007. Berlin: Springer; 2007. p. 1171–9.

8. Filbet F, Negulescu C, Yang C. Numerical study of a nonlinear heat equation for plasma physics. Int J Comput Math. 2012;89(8):1060–82.

9. Gander MJ, Hairer E. Nonlinear convergence analysis for the PARAREAL algorithm. In: Domain decomposition methods in science and engineering. 2008. vol. 60, p. 45–56.

10. Gander MJ, Güttel S. PARAEXP: a parallel integrator for linear initial-value problems. SIAM J Sci Comput. 2013;35(2):C123–42.

11. Gander MJ. 50 years of time parallel time integration. Multiple shooting and time domain decomposition. Berlin: Springer; 2015.

12. Hochbruck M, Lubich C. On Krylov subspace approximations to the matrix exponential operator. SIAM J Numer Anal. 1997;34(5):1911–25.

13. Kalashnikova I, Barone MF. On the stability of a Galerkin reduced order model (ROM) of compressible flow with solid wall and far-field boundary treatment. Int J Numer Methods Eng. 2010;83:1345–75.

14. Kalashnikova I, Barone MF, Arunajatesan S, von Bloemen Waanders BG. Construction of energy-stable projection-based reduced order models. Appl Math Comput. 2014;249:569–96.

15. Ladevèze P. Nonlinear computational structural mechanics: new approaches and non-incremental methods of calculation. New York: Springer-Verlag; 1999.

16. Ladevèze P, Passieux JC, Néron D. The LATIN multiscale computational method and the proper generalized decomposition. Comput Methods Appl Mech Eng. 2010;199(21):1287–96.

17. Lions JL, Maday Y, Turinici G. A “parareal” in time discretization of PDE’s. Comptes Rendus de l’Académie des Sciences, Série I, Mathématique. 2001;332(7):661–8.

18. Maday Y, Nguyen NC, Patera AT, Pau GSH. A general multipurpose interpolation procedure: the magic points. Commun Pure Appl Anal. 2009;8(1):383–404.

19. Minion M. A hybrid parareal spectral deferred corrections method. Commun Appl Math Comput Sci. 2010;5(2):265–301.

20. Nievergelt J. Parallel methods for integrating ordinary differential equations. Commun ACM. 1964;7:731–3.

21. Prud’homme C, Rovas D, Veroy K, Maday Y, Patera AT, Turinici G. Reliable real time solution of parametrized partial differential equations: reduced-basis output bound methods. J Fluids Eng. 2002;124(1):70–80.

22. Willcox K, Peraire J. Balanced model reduction via the proper orthogonal decomposition. AIAA J. 2002;40(11):2323–30.