Effects of mesh loop modes on performance of unstructured finite volume GPU simulations

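Abstract
In the unstructured finite volume method, loops over different mesh entities (cells, faces, nodes, etc.) are widely used to traverse data. The mesh loop determines whether data are accessed directly or indirectly, which strongly affects data locality. During a mesh loop, many threads may access the same data, which creates data dependences. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of its hot spots under cell, face, and node loops is evaluated on Nvidia Tesla V100 and K80 GPUs. Numerical tests at different mesh scales show that the mesh loop modes affect data locality and data dependence differently. Specifically, face loops give the best data locality whenever a kernel accesses face data. When a kernel uses both cell and node data but no face data, cell loops incur the smallest overhead from non-coalesced data access. Cell loops also perform best when a kernel accesses cell data only indirectly. Atomic operations substantially reduce kernel performance on the K80, whereas this effect is not pronounced on the V100. Choosing a suitable mesh loop mode for every kernel increases the overall performance of the GPU simulations by 15%-20%. Finally, the program running on a single V100 GPU achieves a maximum speedup of 21.7 and an average speedup of 14.1 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.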
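The trade-off between face loops and cell loops described in the abstract can be illustrated with a small serial sketch. This is not the authors' code: the toy mesh (`FACE_CELLS`, `U`), the `flux` function, and all helper names are hypothetical, and each loop iteration merely stands in for one GPU thread. In the face loop, each face is read directly by its own thread, cell data are read indirectly through the face-to-cell map, and two faces that share a cell write to the same residual entry, which on a GPU would require `atomicAdd` or mesh colouring. In the cell loop, each thread writes only its own cell, so no atomics are needed, but face and neighbour data are read indirectly and, in this sketch, every interior face flux is evaluated twice.

```python
"""Serial sketch: face-loop (scatter) vs cell-loop (gather) residual assembly.

Hypothetical toy mesh and flux; each loop iteration stands in for one GPU thread.
"""

# Hypothetical toy mesh: 4 cells connected by 5 interior faces.
# FACE_CELLS[f] = (owner, neighbour) cell indices of face f.
FACE_CELLS = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
N_CELLS = 4
U = [1.0, 2.0, 4.0, 8.0]  # cell-centred solution values


def flux(f, u_left, u_right):
    """Hypothetical face flux: a difference weighted by (face index + 1)."""
    return (f + 1) * (u_right - u_left)


def residual_face_loop(face_cells, u, n_cells):
    """One 'thread' per face: compute the flux once and scatter it to both cells.

    Faces sharing a cell write the same res[] entry, so a GPU version needs
    atomics (e.g. CUDA atomicAdd) or a face colouring to avoid write races.
    """
    res = [0.0] * n_cells
    for f, (c0, c1) in enumerate(face_cells):  # direct access to face f
        F = flux(f, u[c0], u[c1])              # indirect reads of cell data
        res[c0] += F                           # conflicting write on a GPU
        res[c1] -= F                           # conflicting write on a GPU
    return res


def build_cell_faces(face_cells, n_cells):
    """Invert the face->cell map into per-cell face lists (a CSR array in practice)."""
    cell_faces = [[] for _ in range(n_cells)]
    for f, (c0, c1) in enumerate(face_cells):
        cell_faces[c0].append(f)
        cell_faces[c1].append(f)
    return cell_faces


def residual_cell_loop(face_cells, cell_faces, u, n_cells):
    """One 'thread' per cell: gather the fluxes of all faces of the cell.

    Each thread writes only res[c] (no atomics), but face and neighbour data
    are read indirectly and each shared face flux is computed twice.
    """
    res = [0.0] * n_cells
    for c in range(n_cells):
        acc = 0.0
        for f in cell_faces[c]:                # indirect access to face data
            c0, c1 = face_cells[f]
            F = flux(f, u[c0], u[c1])
            acc += F if c == c0 else -F        # sign depends on cell's side of f
        res[c] = acc                           # private write, no race
    return res


if __name__ == "__main__":
    cf = build_cell_faces(FACE_CELLS, N_CELLS)
    print(residual_face_loop(FACE_CELLS, U, N_CELLS))      # [13.0, 33.0, -4.0, -42.0]
    print(residual_cell_loop(FACE_CELLS, cf, U, N_CELLS))  # [13.0, 33.0, -4.0, -42.0]
```

Both loops produce the same residual, and its entries sum to zero because each face flux leaves one cell and enters another. In this sketch the face loop evaluates 5 fluxes with conflicting writes, while the cell loop evaluates 10 fluxes with race-free writes. This mirrors, in simplified form, why the relative cost of atomics (reported as large on the K80 and small on the V100) influences which loop mode is preferable.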


Bibliographic Details
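Main Authors: Yue Weng, Xi Zhang, Xiaohu Guo, Xianwei Zhang, Yutong Lu, Yang Liu
Affiliations: School of Computer Science and Engineering, Sun Yat-sen University (Yue Weng, Xi Zhang, Xianwei Zhang, Yutong Lu); Hartree Centre, STFC Daresbury Laboratory (Xiaohu Guo); China Aerodynamics Research and Development Center (Yang Liu)
Format: Article
Language: English
Published: SpringerOpen 2021-07-01
Series: Advances in Aerodynamics
ISSN: 2524-6992
Subjects:
GPU
CFD
Finite volume
Unstructured mesh
Mesh loop modes
Data locality
Online Access: https://doi.org/10.1186/s42774-021-00073-y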