A graphics processing unit-accelerated meshless method for two-dimensional compressible flows
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2017-01-01 |
Series: | Engineering Applications of Computational Fluid Mechanics |
Subjects: | |
Online Access: | http://dx.doi.org/10.1080/19942060.2017.1317027 |
Summary: | A graphics processing unit (GPU)-accelerated meshless method is presented for solving two-dimensional compressible flows over aerodynamic bodies. The Compute Unified Device Architecture (CUDA) Fortran programming model is employed to port the meshless method from the central processing unit (CPU) to the GPU to improve efficiency, which involves implementing CUDA kernels and managing the data storage structure and thread hierarchy. The CUDA kernel subroutines are designed to suit the point-based computing of the meshless method. The corresponding point-based data structure and thread hierarchy are constructed and manipulated in two specific GPU implementations of the meshless method, both developed for solving the Navier–Stokes equations. The Jameson–Schmidt–Turkel scheme is used to estimate the flux terms of the Navier–Stokes equations, and an explicit four-stage Runge–Kutta scheme is applied to advance the solution in time. After the performance of the two resulting GPU-accelerated meshless solvers is tuned by changing the number of threads in a block, a set of typical flows over aerodynamic bodies is simulated for validation. Numerical results are compared with available experimental data or computed values from the literature, together with an analysis of code performance. The comparison shows that the computing time of the presented test cases is significantly reduced for both solvers without loss of accuracy, and speedups of up to 64 times are achieved through careful management of memory access. |
---|---|
ISSN: | 1994-2060 1997-003X |
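
The summary mentions two ingredients that a short example may make more concrete: a one-thread-per-point launch configuration that is tuned through the number of threads in a block, and an explicit four-stage Runge–Kutta update. The following is a minimal CPU-side sketch in Python, not the authors' CUDA Fortran code. The stage coefficients (1/4, 1/3, 1/2, 1) are an assumption, a common choice alongside Jameson–Schmidt–Turkel-type schemes, and are not stated in this record. The names `blocks_for`, `rk4_step`, and `decay_residual` are hypothetical, and the residual is a toy model rather than a Navier–Stokes flux evaluation.

```python
import math

# Assumed stage coefficients; the record does not state the ones used in the paper.
RK4_ALPHAS = (0.25, 1.0 / 3.0, 0.5, 1.0)


def blocks_for(n_points, threads_per_block):
    """Number of thread blocks needed when each thread handles one point."""
    return math.ceil(n_points / threads_per_block)


def rk4_step(w, residual, dt):
    """Advance the point values w by one time step of dw/dt = -R(w).

    Each stage restarts from the values at the beginning of the step (w0)
    and uses the residual evaluated at the previous stage.
    """
    w0 = list(w)
    wk = list(w)
    for alpha in RK4_ALPHAS:
        r = residual(wk)
        wk = [w0_i - alpha * dt * r_i for w0_i, r_i in zip(w0, r)]
    return wk


def decay_residual(w):
    """Hypothetical model residual R(w) = w, giving dw/dt = -w at every point."""
    return list(w)


if __name__ == "__main__":
    # 1000 points with 128 threads per block need 8 blocks; the last is partly idle.
    print(blocks_for(1000, 128))
    # One step on the toy problem, compared with the exact solution exp(-dt).
    print(rk4_step([1.0], decay_residual, 0.1)[0], math.exp(-0.1))
```

In a GPU version of such a scheme, the loop over points inside each stage would typically become the body of a kernel, with `threads_per_block` as the tuning parameter mentioned in the summary.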