## 1 Introduction

In recent years the increasing availability of High Performance Computing (HPC) resources has strongly promoted the widespread adoption of Large Eddy Simulation (LES) turbulence modelling approaches. In particular, Implicit LES (ILES) based on discontinuous Galerkin (dG) spatial discretizations has shown very promising results due to the favourable dispersion and dissipation properties of the method Bassi.Botti.ea_LES_DNS:2015 . The high potential of dG approximations for the under-resolved simulation of turbulent flows has been demonstrated in the literature for those moderate Reynolds number conditions where Reynolds-averaged Navier–Stokes (RANS) approaches are known to fall short, *e.g.*, massively separated flows chapelier2014evaluation ; wiart2015implicit .

Research on this topic is growing fast and several efforts focused on devising efficient time integration strategies suited for massively parallel architectures. Indeed, the inherently unsteady nature of LES/ILES and the need to reduce time-to-results pose serious challenges to the achievement of cost effective scale-resolving computations and the ability to fruitfully exploit large computational facilities. In this context high-order implicit time integration schemes are attractive to overcome the strict stability limits of explicit methods when dealing with high-degree polynomial approximations Bassi20071529 ; uranga2011implicit ; bassi2015linearly . Nevertheless, implicit schemes require the solution of large non-linear/linear systems of equations. The sparsity pattern of the global system matrix imposes the use of iterative methods; indeed, the number of non-zeros grows quadratically with the number of degrees of freedom per element, and thus rapidly with the degree $k$ of the dG polynomial spaces. As a result, high-order accurate computations for industrially relevant applications turn out to be highly memory intensive and expensive in terms of CPU time, even when employing state-of-the-art iterative solvers and modern HPC facilities.
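To give a concrete feel for this growth, the sketch below counts the degrees of freedom per element and the non-zeros of one block row of the dG Jacobian. It assumes a modal basis of complete polynomials on simplices and a tetrahedral (4-face) stencil; both are illustrative assumptions, not data from the paper.

```python
from math import comb

def dofs_per_element(k: int, d: int = 3) -> int:
    """Modal basis functions for complete polynomials of degree k in d variables."""
    return comb(k + d, d)

def block_nnz(k: int, d: int = 3, n_faces: int = 4) -> int:
    """Non-zeros of one block row of the dG Jacobian: a dense diagonal block plus
    one dense off-diagonal block per face neighbour (tetrahedra assumed)."""
    n = dofs_per_element(k, d)
    return (1 + n_faces) * n * n

# memory per element grows rapidly with the polynomial degree k
for k in (1, 2, 3, 4, 5, 6):
    print(k, dofs_per_element(k), block_nnz(k))
```

Already at degree 6 a single tetrahedron carries 84 modes per variable, so each dense Jacobian block row holds tens of thousands of entries per unknown field.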

Previous studies considered the possibility of using memory-saving implementations of the iterative solver. In Crivellini201181 , a matrix-free GMRES solver was used to solve steady compressible flows. In matrix-free algorithms the coefficient matrix of a linear system is not stored explicitly, but accessed only through the evaluation of matrix-vector products. The strategy showed significant memory savings, while being at the same time competitive in terms of overall computational efficiency for sufficiently high-order polynomial approximations when using ILU(0) and Additive Schwarz preconditioners. In sarshar2017numerical a matrix-free approach is employed in the context of several time integration strategies with applications to unsteady, laminar two-dimensional problems. In ceze2016development the use of a matrix-free GMRES is proposed for the solution of both the primal and adjoint problems with applications to the compressible Navier–Stokes equations. Recently the use of a matrix-free implementation was explored and compared to a matrix-based approach in the context of incompressible unsteady turbulent flows, see Franciolini2017276 ; crivellini2017matrix . In particular the solution of the Rayleigh–Bénard convection problem and of turbulent channel flows at moderately high Reynolds numbers was considered, coupling the matrix-free solver with cheap element-wise Block-Jacobi (EWBJ) preconditioners. Unfortunately, when dealing with stiff problems, *e.g.*, stretched elements, low Mach number flows or large time steps, a severe convergence degradation is observed when using EWBJ preconditioners: the solution is achieved only after a considerable number of iterations.
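The matrix-vector products at the heart of a matrix-free Krylov solver can be sketched with a finite-difference approximation of the Fréchet derivative of the residual. This is a generic illustration, not the paper's implementation; the toy residual `R` and the perturbation scaling are assumptions for the example.

```python
import numpy as np

def matfree_matvec(residual, w, v, eps=None):
    """Approximate the Jacobian-vector product J(w) @ v without storing J,
    via a one-sided finite difference of the nonlinear residual."""
    if eps is None:
        # common scaling: perturbation proportional to machine precision and iterate size
        eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(w)) / max(np.linalg.norm(v), 1e-30)
    return (residual(w + eps * v) - residual(w)) / eps

# toy residual R(w) = A w + w**3 (componentwise), so J(w) = A + diag(3 w^2)
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
R = lambda w: A @ w + w**3
w0 = np.array([0.5, -0.25])
v = np.array([1.0, 2.0])
exact = (A + np.diag(3 * w0**2)) @ v
approx = matfree_matvec(R, w0, v)
```

Inside GMRES only such products are needed, which is why the coefficient matrix never has to be assembled, at the price of one extra residual evaluation per Krylov vector.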

Multilevel methods have been considered in the past as an efficient way to solve both linear and non-linear problems arising from high-order discontinuous Galerkin discretizations. Such methods were first proposed in a dG context by Helenbrook et al. helenbrook2003analysis , Bassi and Rebay bassi2003numerical , and Fidkowski et al. fidkowski2005p . Those authors focused on the analysis of a *p*-multigrid (*p*-MG) non-linear solver, proving convergence properties and performance in the context of compressible flows using element- or line-Jacobi smoothing. Several authors also considered multigrid operators built on agglomerated coarse grids, *i.e.*, *h*-multigrid, see for example prill2009smoothed ; wallraff2015multigrid ; antonietti2015multigrid .
The possibility of using multigrid operators as preconditioners was also explored in the context of steady compressible flows, see for example shahbazi2009multigrid ; diosady2009preconditioning . In these works, the algorithm is reported to be more efficient and scalable than single-grid preconditioners, and a large reduction in the number of iterations to reach convergence was achieved.
More recently, an *h*-multigrid preconditioner was proposed in botti2017h in the context of steady and unsteady incompressible flows. In this latter work a specific treatment for inherited dG discretizations of viscous terms on coarse levels was introduced, significantly improving the performance of the multigrid iteration.

The aim of the present paper is to extend the memory-saving implementation proposed in Franciolini2017276 ; Crivellini201181 to deal with stiff unsteady turbulent flow problems. The time-accurate numerical framework relies on linearly-implicit Runge–Kutta schemes of the Rosenbrock type. This class of schemes requires the solution of a linear system at each stage, while the matrix is assembled only once per time step. For the linear system solution we rely on a matrix-free implementation of the Flexible GMRES (FGMRES) method, coupled with a *p*-MG preconditioner. In particular, the *p*-multigrid iteration is built using a matrix-free GMRES smoother on the finest level, and standard matrix-based GMRES smoothers on coarser levels. The rescaled-inherited approach proposed in botti2017h for *h*-multigrid is here extended to the *p*-multigrid version of the preconditioner, and its impact on the convergence of the iterative method is investigated.
Target applications involve the high-order accurate ILES of complex flow configurations using unstructured meshes made of severely stretched and curved elements. In this paper we demonstrate the effectiveness of the proposed coupling between a matrix-free linear solver and a *p*-multigrid preconditioner showing large benefits both in terms of the iterative solver convergence rate and of the CPU time while maintaining a favourable memory footprint.

The paper is structured as follows. Section 2 describes the space and time discretization and presents the multigrid framework here employed, with particular attention to the coarse space assembly, the rescaled-inherited approach and the intergrid transfer operators. Sections 3.1 and 3.2 report a thorough assessment of the stabilization scaling on test cases of growing complexity, both steady and unsteady: the Poisson problem, the unsteady flow over a two-dimensional cylinder, and the unsteady flow over a sphere. Finally, Section 3.3 demonstrates the advantages of using the proposed solver for the solution of the T3L1 flow problem of the ERCOFTAC test case suite, *i.e.*, the incompressible turbulent flow over a rounded leading-edge flat plate with different levels of free-stream turbulence. After a brief discussion of the physical accuracy of the solution, we report significant memory savings as well as improvements in computational efficiency with respect to matrix-based methods.

## 2 The numerical framework

In this section the space and time discretizations of the incompressible Navier–Stokes (INS) equations are briefly introduced together with a detailed description of the main building blocks of the *p*-multigrid preconditioner.

### 2.1 The dG discretization

We consider the unsteady INS equations in conservation form with Dirichlet and Neumann boundary conditions in a fixed Cartesian reference frame,

$$\frac{\partial \mathbf{u}}{\partial t} + \nabla \cdot (\mathbf{u} \otimes \mathbf{u}) + \nabla p - \nabla \cdot \left( \nu \left( \nabla \mathbf{u} + \nabla \mathbf{u}^T \right) \right) = 0 \qquad \text{in } \Omega \times (0, T], \qquad (1a)$$

$$\nabla \cdot \mathbf{u} = 0 \qquad \text{in } \Omega \times (0, T], \qquad (1b)$$

$$\mathbf{u} = \mathbf{g}_D \qquad \text{on } \partial\Omega_D \times (0, T], \qquad (1c)$$

$$\left( \nu \left( \nabla \mathbf{u} + \nabla \mathbf{u}^T \right) - p\, \mathbf{I} \right) \mathbf{n} = \mathbf{g}_N \qquad \text{on } \partial\Omega_N \times (0, T], \qquad (1d)$$

$$\mathbf{u}(\cdot, 0) = \mathbf{u}_0 \qquad \text{in } \Omega, \qquad (1e)$$

where $\mathbf{u}$ is the velocity vector, $p$ is the pressure, $\nu$ denotes the (constant) viscosity, $\mathbf{u}_0$ is the initial condition, $T$ is the final simulation time, $\mathbf{n}$ is the unit outward normal to $\partial\Omega$ and $\mathbf{I} \in \mathbb{R}^{d \times d}$ is the identity matrix. The density has been assumed to be uniform and equal to one and the Stokes hypothesis is used for the definition of viscous stresses.

Regarding boundary conditions in (1c)-(1d), $\mathbf{g}_D$ and $\mathbf{g}_N$ are the boundary data to be imposed on the Dirichlet and Neumann boundaries $\partial\Omega_D$ and $\partial\Omega_N$, respectively, such that $\partial\Omega_D \cup \partial\Omega_N = \partial\Omega$. In the case that $\partial\Omega_N = \emptyset$, we also require $\langle p \rangle_\Omega = 0$, where $\langle \cdot \rangle_\Omega$ denotes the mean value over $\Omega$. Note that, while using the divergence free constraint (1b) we get $\nabla \cdot \left( \nu \nabla \mathbf{u}^T \right) = \nu\, \nabla (\nabla \cdot \mathbf{u}) = 0$, so that the viscous term reduces to $\nabla \cdot (\nu \nabla \mathbf{u})$, this simplification is unsuitable for Neumann boundaries.

Introducing the convective and viscous flux functions

$$\mathbf{F}_c(\mathbf{u}, p) = \mathbf{u} \otimes \mathbf{u} + p\, \mathbf{I}, \qquad \mathbf{F}_v(\nabla \mathbf{u}) = \nu \left( \nabla \mathbf{u} + \nabla \mathbf{u}^T \right), \qquad (2)$$

Eqs. (1a)-(1b) can be compactly rewritten in integral form as: for any volume $V \subseteq \Omega$,

$$\frac{d}{dt} \int_V \mathbf{u}\, dx + \oint_{\partial V} \left( \mathbf{F}_c - \mathbf{F}_v \right) \mathbf{n}\, ds = 0, \qquad \oint_{\partial V} \mathbf{u} \cdot \mathbf{n}\, ds = 0. \qquad (3)$$

In order to define the dG discretization we introduce a triangulation $\mathcal{T}_h = \{K\}$ of the computational domain, that is the collection of disjoint mesh elements $K$ such that $\overline{\Omega}_h = \bigcup_{K \in \mathcal{T}_h} \overline{K}$, where $\Omega_h$ is a suitable approximation of $\Omega$. The mesh skeleton $\mathcal{F}_h$ is the collection of mesh faces $F$. Internal faces are defined as the intersection of the boundaries of two neighboring elements: $F = \partial K^- \cap \partial K^+$ with $K^\mp \in \mathcal{T}_h$. Boundary faces read $F = \partial K \cap \partial \Omega_h$. Clearly $\mathcal{F}_h = \mathcal{F}_h^i \cup \mathcal{F}_h^b$, where $\mathcal{F}_h^i$ and $\mathcal{F}_h^b$ collect internal and boundary faces, respectively.

Each component of the velocity vector and the pressure is sought in the so-called *broken polynomial space* defined over $\mathcal{T}_h$

$$V_h^k = \left\{ v_h \in L^2(\Omega_h) \ : \ v_h|_K \in \mathbb{P}^k_d(K) \quad \forall K \in \mathcal{T}_h \right\}, \qquad (4)$$

where $\mathbb{P}^k_d(K)$ is the space of polynomial functions in $d$ variables and total degree at most $k$ defined over $K$. Since no continuity requirements are enforced at inter-element boundaries, $V_h^k$ admits two-valued traces on the partition of the mesh skeleton $\mathcal{F}_h$. Accordingly we introduce average and jump operators over internal faces

$$\{v\} = \frac{v^- + v^+}{2}, \qquad \llbracket v \rrbracket = v^- - v^+, \qquad (5)$$

where $v^\mp$ denote the traces taken from the two elements $K^\mp$ sharing the face.

Specific definitions of averages and jumps will be introduced over boundary faces to take into account Dirichlet and Neumann boundary conditions.

The dG discretization of the Navier–Stokes equations reads: find $(\mathbf{u}_h, p_h) \in [V_h^k]^d \times V_h^k$ such that, for all $(\mathbf{v}_h, q_h) \in [V_h^k]^d \times V_h^k$:

$$\sum_{K \in \mathcal{T}_h} \int_K \frac{\partial \mathbf{u}_h}{\partial t} \cdot \mathbf{v}_h\, dx - \int_{\Omega_h} \left( \mathbf{F}_c(\mathbf{u}_h, p_h) - \mathbf{F}_v(\nabla_h \mathbf{u}_h) \right) : \nabla_h \mathbf{v}_h\, dx + \sum_{F \in \mathcal{F}_h} \int_F \left( \widehat{\mathbf{F}}_c - \widehat{\mathbf{F}}_v \right) \mathbf{n}_F \cdot \llbracket \mathbf{v}_h \rrbracket\, ds - \int_{\Omega_h} \mathbf{u}_h \cdot \nabla_h q_h\, dx + \sum_{F \in \mathcal{F}_h} \int_F \widehat{\mathbf{u}} \cdot \mathbf{n}_F\, \llbracket q_h \rrbracket\, ds = 0, \qquad (6)$$

where $\nabla_h$ denotes the broken gradient and $\mathbf{n}_F$ is the normal vector with respect to $F$. While obtaining (6) from (3) follows the standard dG FE practice (element-by-element integration by parts after having multiplied by a suitable test function), the dG method hinges on the definition of suitable numerical viscous $\widehat{\mathbf{F}}_v$ and inviscid $\widehat{\mathbf{F}}_c$, $\widehat{\mathbf{u}}$ fluxes.

According to the BR1 scheme, proposed in BR-jcpns97 , $\widehat{\mathbf{F}}_v = \left\{ \nu \left( \nabla_h \mathbf{u}_h + \mathbf{R}_h(\llbracket \mathbf{u}_h \rrbracket) \right) \right\}$, where the consistent gradient $\nabla_h \mathbf{u}_h + \mathbf{R}_h(\llbracket \mathbf{u}_h \rrbracket)$ is such that

$$\int_{\Omega_h} \mathbf{r}_F(\boldsymbol{\varphi}) : \boldsymbol{\tau}_h\, dx = - \int_F \left\{ \boldsymbol{\tau}_h \right\} : \left( \boldsymbol{\varphi} \otimes \mathbf{n}_F \right) ds \quad \forall \boldsymbol{\tau}_h \in [V_h^k]^{d \times d}, \qquad \mathbf{R}_h|_K = \sum_{F \in \mathcal{F}_K} \mathbf{r}_F,$$

where $\mathbf{r}_F$ is the local lifting operator, $\mathbf{R}_h|_K$ is the elemental lifting operator and $\mathcal{F}_K$ is the set of faces belonging to $\partial K$. In this work we rely on the BR2 scheme, introduced to reduce the stencil of the BR1 discretization and analyzed in the context of the Poisson problem by bmmpr-nmpde and Arnold.Brezzi.ea:2002 . The BR2 viscous fluxes are functions of elemental spatial derivatives corrected by suitable lifting operator contributions

$$\widehat{\mathbf{F}}_v = \left\{ \nu \left( \nabla_h \mathbf{u}_h + \eta_F\, \mathbf{r}_F(\llbracket \mathbf{u}_h \rrbracket) \right) \right\}, \qquad (7)$$

where the stabilization parameter $\eta_F$ ensures consistency and stability of the scheme and guarantees the symmetry of the formulation. As proved by Brezzi et al. bmmpr-nmpde , coercivity for the BR2 discretization of the Laplace equation holds provided that $\eta_F$ is greater than the maximum number of faces of the elements sharing $F$. In order to impose boundary conditions on $\partial\Omega_h$, viscous fluxes read

$$\widehat{\mathbf{F}}_v = \nu \left( \nabla_h \mathbf{u}_h + \eta_F\, \mathbf{r}_F(\mathbf{u}_h - \mathbf{g}_D) \right) \ \text{on } \partial\Omega_D, \qquad \widehat{\mathbf{F}}_v\, \mathbf{n} = \mathbf{g}_N \ \text{on } \partial\Omega_N, \qquad (8)$$

that is, on Dirichlet boundary faces jumps are evaluated against the boundary datum and averages are replaced by internal traces.

The inviscid numerical fluxes of the dG discretization
result from the exact solution of local Riemann problems
based on an artificial compressibility perturbation of the Euler equations, as proposed in Bassi.Crivellini.eq:2006 .
Boundary conditions for inviscid fluxes are enforced weakly
by properly defining a ghost boundary state
having support on the interface with a ghost neighboring element.
The ghost boundary state is defined imposing the conservation of Riemann invariants
based on the hyperbolic nature of the artificial compressibility perturbation of the Euler equations.
Accordingly, both the internal state and the boundary data
($\mathbf{g}_D$ or $\mathbf{g}_N$, in case of Dirichlet or Neumann boundary conditions, respectively)
are involved in the definition of the ghost state.
We remark that, since the performance of the inherited *p*-multigrid strategy
here considered is mostly affected by projections of viscous flux operators,
it is not relevant to provide additional details regarding the inviscid flux treatment.

### 2.2 Implicit time accurate integration

The accurate time integration of the spatially dG discretized INS equations can be presented in compact form by collecting the velocity and pressure polynomial expansions in the vector $\mathbf{w}_h = (\mathbf{u}_h, p_h)$ and identifying the unknown vector at time $t^n$ with $\mathbf{w}_h^n$, that is $\mathbf{w}_h^n = \mathbf{w}_h(t^n)$. Moreover, we introduce the flux functions $\mathbf{F} = \mathbf{F}_c - \mathbf{F}_v$ and $\widehat{\mathbf{F}} = \widehat{\mathbf{F}}_c - \widehat{\mathbf{F}}_v$, collecting the inviscid and viscous flux contributions. For all $\mathbf{v} \in [V_h^k]^d \times V_h^k$, we define the residual of the dG spatial discretization in (6) as follows

$$R(\mathbf{w})(\mathbf{v}) = \int_{\Omega} \mathbf{F}(\mathbf{w}, \nabla \mathbf{w}) : \nabla \mathbf{v}\, dx - \sum_{F \in \mathcal{F}} \int_F \widehat{\mathbf{F}}(\mathbf{w}, \nabla \mathbf{w})\, \mathbf{n}_F \cdot \llbracket \mathbf{v} \rrbracket\, ds, \qquad (9)$$

where we dropped the mesh step size subscript $h$ when working in index notation. For all $\delta\mathbf{w}$, $\mathbf{v}$, the Jacobian of the residual reads $J(\mathbf{w})(\delta\mathbf{w}, \mathbf{v}) = \frac{d}{d\varepsilon} R(\mathbf{w} + \varepsilon\, \delta\mathbf{w})(\mathbf{v}) \big|_{\varepsilon = 0}$. In particular we distinguish the inviscid and the viscous contributions

$$J_c(\mathbf{w})(\delta\mathbf{w}, \mathbf{v}) = \frac{d}{d\varepsilon} R_c(\mathbf{w} + \varepsilon\, \delta\mathbf{w})(\mathbf{v}) \Big|_{\varepsilon = 0}, \qquad (10)$$

$$J_v(\mathbf{w})(\delta\mathbf{w}, \mathbf{v}) = \frac{d}{d\varepsilon} R_v(\mathbf{w} + \varepsilon\, \delta\mathbf{w})(\mathbf{v}) \Big|_{\varepsilon = 0}, \qquad (11)$$

where $R_c$ and $R_v$ collect the inviscid and viscous terms of the residual, respectively. Note that, since $\mathbf{F}_v$ is a linear function of $\nabla\mathbf{w}$, (11) is a bilinear form, while, by abuse of notation, (10) is bilinear (resp. trilinear) when $\mathbf{F}_c$ is a linear (resp. non-linear) function of $\mathbf{w}$.

In this work time accurate integration is performed via a multi-stage linearly implicit (Rosenbrock-type) Runge–Kutta method. As an appealing feature, the method requires the solution of a linear system at each stage $i = 1, \dots, s$, while the Jacobian matrix needs to be assembled only once per time step. Prior to introducing the formulation of the temporal discretization we define the following mass bilinear form: for all $\mathbf{w}$, $\mathbf{v}$,

$$m(\mathbf{w}, \mathbf{v}) = \int_{\Omega_h} \mathbf{u} \cdot \mathbf{v}_{\mathbf{u}}\, dx, \qquad (12)$$

acting on the velocity components $\mathbf{u}$ and $\mathbf{v}_{\mathbf{u}}$ of $\mathbf{w}$ and $\mathbf{v}$ only, since no time derivative acts on the pressure.

Given the initial condition $\mathbf{w}_h^0$, we define the sequence $\{\mathbf{w}_h^n\}$ iteratively by means of the Rosenbrock scheme: at each stage $i = 1, \dots, s$, find $\mathbf{w}^{n,i}$ such that, for all $\mathbf{v}$,

$$\frac{m(\mathbf{w}^{n,i}, \mathbf{v})}{\gamma\, \Delta t} - J(\mathbf{w}^n)(\mathbf{w}^{n,i}, \mathbf{v}) = R\Big( \mathbf{w}^n + \sum_{j=1}^{i-1} a_{ij}\, \mathbf{w}^{n,j} \Big)(\mathbf{v}) + \sum_{j=1}^{i-1} \frac{c_{ij}}{\Delta t}\, m(\mathbf{w}^{n,j}, \mathbf{v}), \qquad \mathbf{w}^{n+1} = \mathbf{w}^n + \sum_{i=1}^{s} m_i\, \mathbf{w}^{n,i}, \qquad (13)$$

where $\gamma$, $a_{ij}$, $c_{ij}$ and $m_i$ are real coefficients proper of the Rosenbrock scheme and $\mathbf{w}^{n,i}$, with $i = 1, \dots, s$, the stage solutions that are properly combined to compute the solution at the next time level. The Rosenbrock time marching strategy advances the solution in time by repeatedly solving the linearized system of equations (13), once for each stage of the Runge–Kutta method. Introducing the Jacobian and mass matrix operators

$$\mathbf{J} \leftrightarrow J(\mathbf{w}^n)(\cdot, \cdot), \qquad \mathbf{M} \leftrightarrow m(\cdot, \cdot), \qquad (14)$$

the equation system (13) can be compactly rewritten as follows:

$$\mathbf{A}\, \mathbf{x} = \mathbf{b}, \qquad \mathbf{A} = \frac{\mathbf{M}}{\gamma\, \Delta t} - \mathbf{J}, \qquad (15)$$

where $\mathbf{A}$ is the global matrix operator, and $\mathbf{x}$ and $\mathbf{b}$ are the unknown polynomial function and the right-hand side arising from the linearly-implicit Runge–Kutta time discretization, respectively. In this work the four-stage, order-three ROSI2PW scheme of Rang and Angermann Lang.Verwer:2001 was employed. This scheme preserves its formal accuracy when applied to the system of DAEs arising from the spatial discretization of the INS equations.
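The linearly implicit structure can be illustrated on a small ODE system with the classical two-stage, second-order ROS2 scheme (used here for brevity in place of the four-stage ROSI2PW of the paper): the Jacobian-dependent matrix is built and factorised once per step and reused at every stage.

```python
import numpy as np

GAMMA = 1.0 + 1.0 / np.sqrt(2.0)  # L-stable choice for the two-stage ROS2 scheme

def ros2_step(w, h, f, jac):
    """One step of the two-stage, second-order Rosenbrock (ROS2) scheme.
    The system matrix is assembled and inverted once per step and reused at
    both stages, mirroring the linearly implicit structure described above."""
    J = jac(w)
    M = np.eye(w.size) - GAMMA * h * J   # one system matrix per time step
    Minv = np.linalg.inv(M)              # stand-in for a stored factorisation
    k1 = Minv @ (h * f(w))
    k2 = Minv @ (h * f(w + k1) - 2.0 * k1)
    return w + 1.5 * k1 + 0.5 * k2

# convergence check on w' = -w**2 with exact solution w(t) = 1/(1+t)
f = lambda w: -w**2
jac = lambda w: np.diag(-2.0 * w)

def integrate(n_steps, t_end=1.0):
    w, h = np.array([1.0]), t_end / n_steps
    for _ in range(n_steps):
        w = ros2_step(w, h, f, jac)
    return w[0]

e40 = abs(integrate(40) - 0.5)
e80 = abs(integrate(80) - 0.5)  # halving the step should reduce the error ~4x
```

In a dG setting the linear solves would of course be performed iteratively (here, by the preconditioned FGMRES discussed below) rather than by a dense inverse.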

### 2.3 Multigrid preconditioners

In this work we investigate the possibility to solve the global equation system (15)
by means of a FGMRES iterative solver preconditioned with *p*-multigrid.
Since the linearized equations system needs to be repeatedly solved at each time step, the efficiency
of the linear solver strategy is of crucial importance for the effectiveness of the whole method.
The basic idea is to exploit iterative solvers to smooth out the high-frequency components of the error with respect to the unknown exact solution. Since iterative solvers are not effective at damping low-frequency error components, the iterative solution of coarser problems is exploited to circumvent this issue: on coarser levels, low-frequency modes are shifted towards the upper side of the spectrum. This simple and effective strategy allows satisfactory rates of convergence to be maintained all along the iterative process.

In *p*-multigrid the coarse problems are built based on lower-order dG discretizations with respect to the original problem of degree $k$. We consider $L$ coarse levels spanned by the index $l$ and indicate the fine and coarse levels with $l = 0$ and $l = 1, \dots, L$, respectively. The polynomial degree of level $l$ is $k_l$, with $k_0 = k$, and the polynomial degrees of the coarse levels are chosen such that $k_l < k_{l-1}$, $l = 1, \dots, L$. Accordingly the polynomial spaces associated to the coarse levels read $V_h^{k_l}$.
The coarse problems corresponding to (15) are in the form

$$\mathbf{A}_l\, \mathbf{x}_l = \mathbf{b}_l, \qquad (16)$$

where $\mathbf{A}_l$ is the global matrix operator on level $l$ and $\mathbf{x}_l$, $\mathbf{b}_l$ are the unknown function and the known right-hand side, respectively.

A crucial aspect for the efficiency of the *p*-multigrid iteration is related to the computational cost
of building the coarse grid operators $\mathbf{A}_l$.
While it is possible to assemble the bilinear and trilinear forms $m$, $J_c$ and $J_v$ of Section 2.2
on each level with the corresponding polynomial functions of degree $k_l$, significantly better performance is achievable by *restricting* the fine grid operator by means of so-called Galerkin projections.
The former and the latter strategies are named non-inherited and inherited *p*-multigrid, respectively.
As will be clear in what follows the construction of coarse operators is trivial when
polynomial expansions are based on hierarchical orthonormal modal basis functions.

#### 2.3.1 Restriction and prolongation operators

In this section we describe the prolongation and restriction operators required to map polynomial functions on finer and coarser levels, respectively.

Since $V_h^{k_l} \subset V_h^{k_{l-1}}$, the prolongation operator $\mathcal{I}_l^{l-1} : V_h^{k_l} \rightarrow V_h^{k_{l-1}}$ is the injection operator such that

$$\mathcal{I}_l^{l-1}\, v_l = v_l \qquad \forall v_l \in V_h^{k_l}.$$

The prolongation operator from level $l$ to level $0$ can be recursively defined by the composition of inter-grid prolongation operators: $\mathcal{I}_l^0 = \mathcal{I}_1^0 \circ \dots \circ \mathcal{I}_l^{l-1}$.

The ($L^2$ projection) restriction operator $\mathcal{I}_{l-1}^l : V_h^{k_{l-1}} \rightarrow V_h^{k_l}$ is such that

$$\int_{\Omega_h} \left( \mathcal{I}_{l-1}^l\, v_{l-1} - v_{l-1} \right) w_l\, dx = 0 \qquad \forall w_l \in V_h^{k_l}, \qquad (17)$$

and the restriction operator from level $0$ to level $l$ reads $\mathcal{I}_0^l = \mathcal{I}_{l-1}^l \circ \dots \circ \mathcal{I}_0^1$.

When applied to vector functions the interpolation operators act componentwise, *e.g.*, $\mathcal{I}_{l-1}^l\, \mathbf{v} = \left( \mathcal{I}_{l-1}^l\, v_1, \dots, \mathcal{I}_{l-1}^l\, v_d \right)$. It is interesting to remark that, using hierarchical orthonormal modal basis functions, restriction and prolongation operators are trivial; in particular, restriction from $V_h^{k_{l-1}}$ into $V_h^{k_l}$ is as simple as keeping the degrees of freedom of the modes of order up to $k_l$ and discarding the remaining high-frequency modes.

### 2.4 Fine and coarse grid Jacobian operators

The non-inherited and the inherited versions (the latter denoted with superscript $\mathcal{I}$) of the inviscid and viscous Jacobian operators introduced in (10)-(11) can be defined as follows for $l = 1, \dots, L$:

$$J_{c,l}^{\mathcal{I}}(\delta\mathbf{w}_l, \mathbf{v}_l) = J_{c,0}\big( \mathcal{I}_l^0\, \delta\mathbf{w}_l,\, \mathcal{I}_l^0\, \mathbf{v}_l \big), \qquad J_{v,l}^{\mathcal{I}}(\delta\mathbf{w}_l, \mathbf{v}_l) = J_{v,0}\big( \mathcal{I}_l^0\, \delta\mathbf{w}_l,\, \mathcal{I}_l^0\, \mathbf{v}_l \big), \qquad (18)$$

while the non-inherited operators $J_{c,l}$ and $J_{v,l}$ are assembled directly on each level with polynomial functions of degree $k_l$.

The main benefit of inherited algorithms is the possibility to efficiently compute coarse grid operators by means of the so-called Galerkin projection, avoiding the cost of assembling bilinear and trilinear forms. The procedure is described in what follows, focusing on the benefits of using hierarchical orthonormal basis functions.

The matrix counterpart $\mathsf{A}$ of the operator is a sparse block matrix whose block dimension equals the number of degrees of freedom per element and unknown variable. The matrix is composed of diagonal blocks $\mathsf{A}_{KK}$ and off-diagonal blocks $\mathsf{A}_{KK'}$, the latter taking care of the coupling between neighboring elements $K$ and $K'$ sharing a face $F$. Once the fine system matrix is assembled, the diagonal and off-diagonal blocks of the Jacobian matrix of coarse levels can be inherited recursively and matrix-free as follows

$$\mathsf{A}^{l}_{KK} = \left( \mathsf{P}^{l} \right)^{T} \mathsf{A}^{l-1}_{KK}\, \mathsf{P}^{l}, \qquad \mathsf{A}^{l}_{KK'} = \left( \mathsf{P}^{l} \right)^{T} \mathsf{A}^{l-1}_{KK'}\, \mathsf{P}^{l}. \qquad (19)$$

The projection matrices read

$$\left( \mathsf{P}^{l} \right)_{ij} = \int_{K} \varphi^{l-1}_{i}\, \varphi^{l}_{j}\, dx, \qquad (20)$$

where $\{\varphi^{l}\}$ represents the set of basis functions spanning the space $V_h^{k_l}(K)$. When using hierarchical orthonormal basis functions, the elemental mass matrix is the unit diagonal matrix and $\mathsf{P}^{l}$ is a unit diagonal rectangular matrix. Accordingly the Galerkin projection in (19) falls back to a trivial and inexpensive sub-block extraction.
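With hierarchical orthonormal modal bases, the Galerkin projection in (19) can be sketched as a plain sub-block extraction; the sizes below are illustrative.

```python
import numpy as np

def coarse_block(A_fine, n_coarse):
    """Galerkin projection of one Jacobian block onto a lower-degree space.
    With a hierarchical orthonormal modal basis the projection matrix is a
    rectangular slice of the identity, so P.T @ A @ P is exactly the leading
    sub-block of the fine matrix."""
    n_fine = A_fine.shape[0]
    P = np.eye(n_fine, n_coarse)   # prolongation: inject coarse modes into fine space
    return P.T @ A_fine @ P        # equals A_fine[:n_coarse, :n_coarse]

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))  # one elemental block (toy size)
Ac = coarse_block(A, 4)
```

No matrix products need to be performed in practice: the coarse block is read off the fine block at essentially zero cost, which is what makes the inherited approach attractive.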

Since $V_h^{k_l} \subset V_h^{k_{l-1}}$, it can be demonstrated that
inherited and non-inherited *p*-multigrid algorithms lead to the same inviscid Jacobian operators, that is $J_{c,l}^{\mathcal{I}} = J_{c,l}$.
In contrast, inherited and non-inherited viscous Jacobians differ because of the terms involving lifting operators.
Note that inherited lifting operators act on traces of polynomial functions mapped into $V_h^{k_0}$, accordingly

$$\mathbf{r}^{\,0}_F\big( \llbracket \mathcal{I}_l^0\, \mathbf{u}_l \rrbracket \big) \qquad \text{inherited } p\text{-multigrid lifting operators}, \qquad (21)$$

$$\mathbf{r}^{\,l}_F\big( \llbracket \mathbf{u}_l \rrbracket \big) \qquad \text{non-inherited } p\text{-multigrid lifting operators}, \qquad (22)$$

where $\mathbf{r}^{\,l}_F$ denotes the lifting operator defined on the degree-$k_l$ polynomial space.

Interestingly, using the definitions of the global and local lifting operators, the viscous bilinear form can be rewritten as follows

$$J_v(\delta\mathbf{u}, \mathbf{v}) = \int_{\Omega_h} \nu\, \nabla_h \delta\mathbf{u} : \nabla_h \mathbf{v}\, dx - \sum_{F \in \mathcal{F}_h} \int_F \{\nu\, \nabla_h \delta\mathbf{u}\}\, \mathbf{n}_F \cdot \llbracket \mathbf{v} \rrbracket\, ds - \sum_{F \in \mathcal{F}_h} \int_F \llbracket \delta\mathbf{u} \rrbracket \cdot \{\nu\, \nabla_h \mathbf{v}\}\, \mathbf{n}_F\, ds + \sum_{F \in \mathcal{F}_h} \eta_F \int_{\Omega_h} \nu\, \mathbf{r}_F(\llbracket \delta\mathbf{u} \rrbracket) : \mathbf{r}_F(\llbracket \mathbf{v} \rrbracket)\, dx, \qquad (23)$$

showing that only the last term, i.e., the stabilization term, cannot be reformulated lifting-free.
In particular, as will be demonstrated in the following section,
the inherited stabilization term
introduces an excessive amount of stabilization with respect to its non-inherited counterpart,
which is detrimental for multigrid algorithm performance.
The same behaviour was previously documented in the context of *h*-multigrid solution strategies,
see botti2017h where the authors consider dG discretizations of the INS equations and AntoniettiDiosBrenner , where preconditioners for
weakly over-penalized symmetric interior penalty dG discretization of elliptic problems are devised.

#### 2.4.1 Scaling of the stabilization term

Following the idea proposed in botti2017h we consider the possibility to introduce a rescaled Galerkin projection
of the stabilization term in order to recover the optimal performance of the non-inherited *p*-multigrid algorithm.

As a first point we recall the following bound on the local lifting operator: let $F \in \mathcal{F}_h$, then for all $\boldsymbol{\varphi} \in [L^2(F)]^d$

$$\left\| \mathbf{r}_F(\boldsymbol{\varphi}) \right\|_{[L^2(\Omega_h)]^{d \times d}} \le C_{tr}\, h_F^{-1/2}\, \left\| \boldsymbol{\varphi} \right\|_{[L^2(F)]^d}, \qquad (24)$$

where $h_F$ denotes the face diameter,
see *e.g.* (Brezzi.Manzini.ea:2000, Lemma 2), (Toselli03, Lemma 7.2) or (DiPiErn11, Lemma 4.33 and Lemma 5.18) for a proof.
The constant $C_{tr}$ depends on $k$, $d$ and the shape regularity of the elements sharing $F$ and
is inherited from the discrete trace inequality: for all $v_h \in \mathbb{P}^k_d(K)$,

$$\left\| v_h \right\|_{L^2(F)} \le C_{tr}\, h_F^{-1/2}\, \left\| v_h \right\|_{L^2(K)}. \qquad (25)$$

As remarked by Di Pietro and Ern (DiPiErn11, Lemma 1.46), the dependence of $C_{tr}$ on the polynomial degree $k$ is a delicate issue that has a precise answer only in specific cases. In this work we follow the estimates given by Hesthaven and Warburton Warburton03 , showing that for simplicial meshes $C_{tr}^2$ scales as $(k+1)(k+d)/d$ when using complete polynomials of maximum degree $k$. This choice turns out to be conservative regarding the dependence on $k$ with respect to the estimates derived by Schwab Schwab1998 , based on tensor product polynomials on mesh elements being affine images of the unit hypercube in $\mathbb{R}^d$. Using the Cauchy–Schwarz inequality, for all $\mathbf{u}, \mathbf{v}$ we get

$$\left| \eta_F \int_{\Omega_h} \nu\, \mathbf{r}_F(\llbracket \mathbf{u} \rrbracket) : \mathbf{r}_F(\llbracket \mathbf{v} \rrbracket)\, dx \right| \le C\, C_{tr}^2\, \eta_F\, \nu\, h_F^{-1}\, \left\| \llbracket \mathbf{u} \rrbracket \right\|_{[L^2(F)]^d} \left\| \llbracket \mathbf{v} \rrbracket \right\|_{[L^2(F)]^d},$$

where $C$ is independent from $h$ and $k$. As a result we are able to introduce the scaling factor $\alpha_l$, defined as the ratio of the squared trace constants of two consecutive levels, such that the inherited stabilization term reproduces the $k_l$-dependence of its non-inherited counterpart:

$$\alpha_l = \frac{C_{tr}^2(k_l)}{C_{tr}^2(k_{l-1})} = \frac{(k_l + 1)(k_l + d)}{(k_{l-1} + 1)(k_{l-1} + d)}. \qquad (26)$$

The viscous Jacobian stabilization operator reads

$$J_{v,s}(\delta\mathbf{u}, \mathbf{v}) = \sum_{F \in \mathcal{F}_h} \eta_F \int_{\Omega_h} \nu\, \mathbf{r}_F(\llbracket \delta\mathbf{u} \rrbracket) : \mathbf{r}_F(\llbracket \mathbf{v} \rrbracket)\, dx, \qquad (27)$$

and, accordingly, the Jacobian stabilization diagonal and off-diagonal block contributions $\mathsf{S}_{KK}$ and $\mathsf{S}_{KK'}$ can be computed recursively and matrix-free by means of a rescaled Galerkin projection. The rescaled-inherited blocks of the Jacobian matrix are computed as follows

$$\mathsf{A}^{l}_{KK} = \left( \mathsf{P}^{l} \right)^{T} \left( \tilde{\mathsf{A}}^{l-1}_{KK} + \alpha_l\, \mathsf{S}^{l-1}_{KK} \right) \mathsf{P}^{l}, \qquad (28)$$

$$\mathsf{A}^{l}_{KK'} = \left( \mathsf{P}^{l} \right)^{T} \left( \tilde{\mathsf{A}}^{l-1}_{KK'} + \alpha_l\, \mathsf{S}^{l-1}_{KK'} \right) \mathsf{P}^{l}, \qquad (29)$$

where $\tilde{\mathsf{A}} = \mathsf{A} - \mathsf{S}$. We remark that $\tilde{\mathsf{A}}_{KK}$ and $\tilde{\mathsf{A}}_{KK'}$ are the Jacobian blocks corresponding to inviscid contributions plus the viscous contributions without the stabilization terms.
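As an illustration, with an orthonormal hierarchical basis the rescaled-inherited projection amounts to extracting leading sub-blocks and scaling only the stabilization part; the block sizes and data below are toy values, and the scaling factor follows the $(k+1)(k+d)/d$ trace-constant growth discussed above.

```python
import numpy as np

def stab_scaling(k_coarse, k_fine, d=3):
    """Level-to-level scaling factor, assuming the stabilisation strength
    follows the (k+1)(k+d)/d growth of the discrete trace constant."""
    return ((k_coarse + 1) * (k_coarse + d)) / ((k_fine + 1) * (k_fine + d))

def rescaled_inherited_block(B_nostab, S_stab, n_coarse, alpha):
    """Sketch of the rescaled Galerkin projection: with an orthonormal
    hierarchical basis the projection is a leading sub-block extraction,
    and only the stabilisation contribution S is multiplied by alpha."""
    sub = slice(0, n_coarse)
    return B_nostab[sub, sub] + alpha * S_stab[sub, sub]

# plain inherited projection corresponds to alpha = 1
B = np.arange(36.0).reshape(6, 6)   # non-stabilised part of a block (toy data)
S = np.eye(6)                        # stabilisation part (toy data)
alpha = stab_scaling(1, 2, d=3)      # = (2*4)/(3*5) = 8/15 < 1
Ac = rescaled_inherited_block(B, S, 3, alpha)
```

Since `alpha < 1` whenever the coarse degree is lower, the rescaling always weakens the inherited stabilization, counteracting the over-penalization discussed above.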

#### 2.4.2 The *p*-multigrid iteration

In this section we provide an overview of the sequence of operations involved in the *p*-multigrid iteration: the recursive *p*-multigrid V-cycle and the full *p*-multigrid cycle applied to the problem on level $l$.
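A minimal sketch of the recursive V-cycle is given below. The `smooth`, `restrict` and `prolong` callables are hypothetical placeholders (the paper's smoothers are GMRES iterations; damped Jacobi is used in the toy check for simplicity), and the two-level setup only illustrates the mechanics.

```python
import numpy as np

def v_cycle(level, A, b, x, smooth, restrict, prolong, n_pre=2, n_post=2):
    """One p-multigrid V-cycle (sketch). A[l] applies the level-l operator,
    smooth[l](A_apply, rhs, x, n) runs n smoothing iterations, and
    restrict/prolong map residuals and corrections between polynomial degrees."""
    x = smooth[level](A[level], b, x, n_pre)            # pre-smoothing
    if level < len(A) - 1:                              # recurse until coarsest level
        r_c = restrict[level](b - A[level](x))          # restricted residual
        e_c = v_cycle(level + 1, A, r_c, np.zeros_like(r_c),
                      smooth, restrict, prolong, n_pre, n_post)
        x = x + prolong[level](e_c)                     # coarse-grid correction
    return smooth[level](A[level], b, x, n_post)        # post-smoothing

# toy two-level check: 1D Laplacian on the fine level, injected Galerkin coarse level
n, nc = 8, 4
Af = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
P = np.eye(n, nc)                  # modal injection (orthonormal hierarchical basis)
Ac = P.T @ Af @ P

def jacobi_for(M, omega=2.0 / 3.0):
    D = np.diag(M).copy()
    def s(A_apply, rhs, x, n_it):
        for _ in range(n_it):
            x = x + omega * (rhs - A_apply(x)) / D
        return x
    return s

x = v_cycle(0, [lambda v: Af @ v, lambda v: Ac @ v], np.ones(n), np.zeros(n),
            [jacobi_for(Af), jacobi_for(Ac)],
            [lambda r: P.T @ r], [lambda e: P @ e])
```

The full multigrid (FMG) variant bootstraps the initial guess from the coarsest level before starting the V-cycles; that refinement is omitted here for brevity.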
