Optimizing OpenFOAM GPU Solvers

The paper presents preliminary research on improving the performance of CFD simulations in OpenFOAM by offloading part of the computation (specifically, the solution of linear systems) to a graphics accelerator (GPU). We present a short review of the OpenFOAM package and describe porting conjugate gradient metho...
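
The abstract of this record describes a conjugate gradient solver combined with approximate inverse preconditioning. As a rough illustration only (not the paper's CUDA implementation), the sketch below shows why an explicit approximate inverse suits parallel hardware: the preconditioning step is a matrix-vector product rather than a sequential triangular solve. All function names are hypothetical, the matrices are dense Python lists for brevity, and the one-term Neumann approximation is a simple stand-in for the AINV preconditioner named in the abstract.

```python
import math


def matvec(matrix, vector):
    """Dense matrix-vector product; a GPU port would use a sparse kernel here."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def preconditioned_cg(A, b, M, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    M is an explicit approximation of inverse(A), so applying the
    preconditioner is a multiplication, not a triangular solve.
    Returns the solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual for the zero initial guess
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0
    z = matvec(M, r)  # preconditioning step
    p = list(z)
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        z = matvec(M, r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(n):
    """Tridiagonal (-1, 2, -1) test matrix; an assumed toy problem."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def neumann_approximate_inverse(A):
    """Stand-in approximate inverse: M = 2 D^-1 - D^-1 A D^-1, D = diag(A)."""
    n = len(A)
    d_inv = [1.0 / A[i][i] for i in range(n)]
    return [[(2.0 * d_inv[i] if i == j else 0.0) - d_inv[i] * A[i][j] * d_inv[j]
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = laplacian_1d(20)
    M = neumann_approximate_inverse(A)
    b = matvec(A, [1.0] * 20)
    solution, iterations = preconditioned_cg(A, b, M)
    print(iterations, max(abs(v - 1.0) for v in solution))
```

Because M is fixed once it is built, it could in principle be reused across several solves, which is the kind of amortization the abstract mentions; this sketch does not model the asynchronous CPU helper thread or the single-precision storage.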


Bibliographic Details
Main Author:	Alexander Monakov
Format:	Article
Language:	English
Published:	Ivannikov Institute for System Programming of the Russian Academy of Sciences 2018-10-01
Series:	Труды Института системного программирования РАН
Subjects:	openfoam; cuda; gpgpu; conjugate gradient method; preconditioning on GPU; optimization for GPU
Online Access:	https://ispranproceedings.elpub.ru/jour/article/view/1012

id	doaj-8bf04272efe7404b82249be27302c896
record_format	Article
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Alexander Monakov
title	Optimizing OpenFOAM GPU Solvers
publisher	Ivannikov Institute for System Programming of the Russian Academy of Sciences
series	Труды Института системного программирования РАН
issn	2079-8156 2220-6426
publishDate	2018-10-01
description	The paper presents preliminary research on improving the performance of CFD simulations in OpenFOAM by offloading part of the computation (specifically, the solution of linear systems) to a graphics accelerator (GPU). We present a short review of the OpenFOAM package and describe porting the conjugate gradient method to the GPU architecture using the CUDA programming model. Porting the basic algorithm is straightforward; however, care should be taken to avoid unnecessary copying over the PCI-Express bus. Efficient preconditioning on the GPU is then discussed. We use approximate inverse preconditioning, which can be implemented with good parallelism on the GPU. To amortize the cost of preparing the preconditioner, we allow reuse of preconditioners on the GPU and compute them on the CPU asynchronously in a helper thread. We mention several optimization opportunities: reordering the preconditioner to upper-left triangular form so that CUDA blocks multiplying by denser parts of the preconditioner factors are scheduled first; using single-precision storage for the preconditioner to save memory bandwidth; reordering the mesh with the nested dissection method from the Metis library; and using mixed-precision iteration for the conjugate gradient method. Preliminary performance testing shows improvement starting from 64000-cell meshes and reaching 2x for a 1-million-cell mesh in a non-parallel run. As future work we mention support for parallel runs with MPI, research into other solvers such as multigrid, BiCGStab and IDR, and automatic selection of the drop tolerance for the AINV preconditioner.
topic	openfoam; cuda; gpgpu; conjugate gradient method; preconditioning on GPU; optimization for GPU
url	https://ispranproceedings.elpub.ru/jour/article/view/1012
work_keys_str_mv	AT alexandermonakov optimizingopenfoamgpusolvers
_version_	1724934099285573632

