Optimization of Stencil Computations on GPUs
Main Author: | Rawat, Prashant Singh |
---|---|
Language: | English |
Published: | The Ohio State University / OhioLINK, 2018 |
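Subjects: | Computer Science; Stencil Computations; GPGPU; Register Pressure; Fusion; Tiling; Instruction Reordering |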
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=osu1523037713249436 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1523037713249436 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-osu1523037713249436 2021-08-03T07:05:53Z Optimization of Stencil Computations on GPUs Rawat, Prashant Singh Computer Science Stencil Computations GPGPU Register Pressure Fusion Tiling Instruction Reordering Stencil computations form the compute-intensive core of many scientific application domains, such as CT and MRI image processing, computational electromagnetics, seismic processing, and climate modeling. A stencil computation involves an element-wise update of an output domain based on a fixed set of neighboring points from the input domain. Such stencil computations are either time-iterated or require the successive application of multiple stencil operators on the input domains. Stencil optimization on multi- and many-core architectures has been an active research topic for the past two decades. Stencil computations traditionally have low arithmetic intensity, with only a few floating-point operations performed relative to the data transferred per output point, and are therefore memory bandwidth-bound. Since the data movement cost consistently dominates the computational cost in modern architectures, most of these research efforts focus on reducing the data movement in stencils to tackle the bandwidth bottleneck. Consequently, several tiling techniques have been proposed over the years to exploit spatial and temporal reuse across a sequence of stencils or across multiple time steps for time-iterated stencils. With the ever-increasing use of GPUs for general-purpose computing, application developers have started exploring the acceleration of data-parallel stencils on GPUs. GPUs have lower data movement costs than multi-core CPU architectures, and hence are an attractive target for accelerating memory bandwidth-bound stencil computations. 
At the same time, GPUs are compute-intensive with a significantly higher number of registers per thread, and therefore suitable for accelerating stencil computations with high arithmetic intensity as well. The arithmetic intensity of a stencil is proportional to its order, which is the number of input elements read from the center along each dimension. In many scientific applications, high-order stencils provide better computational accuracy with less data movement than their low-order counterparts. However, the main performance bottleneck for high-order stencils on GPUs is the high register pressure, which causes excessive register spills or a steep drop in achieved parallelism, resulting in a subsequent performance loss. This dissertation proposes novel GPU-centric optimization strategies that address the performance bottlenecks for stencils with different arithmetic intensities: tiling and fusion heuristics for bandwidth-bound stencils with low arithmetic intensity, and register optimizations for high-order stencils with high arithmetic intensity. The proposed optimizations have been implemented in a DSL-based stencil optimization framework, STENCILGEN, that can automatically generate high-performance CUDA code from an input DSL specification of the stencil computation. The efficacy of the proposed optimizations is demonstrated via empirical evaluation on a variety of 2D and 3D stencil kernels extracted from PDE solvers, image processing pipelines, and proxy DOE applications. 2018-08-10 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1523037713249436 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Computer Science Stencil Computations GPGPU Register Pressure Fusion Tiling Instruction Reordering |
author |
Rawat, Prashant Singh |
title |
Optimization of Stencil Computations on GPUs |
publisher |
The Ohio State University / OhioLINK |
publishDate |
2018 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=osu1523037713249436 |
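
The abstract defines a stencil as an element-wise update of an output domain from a fixed set of neighboring input points, and ties arithmetic intensity to the stencil's order. The minimal Python sketch below illustrates both ideas; it is not code from the dissertation or from STENCILGEN. The function names, the 1D domain, the star-shaped neighborhood, and the choice to copy boundary points unchanged are all assumptions made for illustration.

```python
def apply_stencil_1d(inp, order, weights):
    """Update each interior point from its 2*order + 1 input neighbors.

    `weights` holds one coefficient per neighbor, ordered from offset
    -order to +order. Boundary points are copied through unchanged,
    which is an assumption of this sketch.
    """
    n = len(inp)
    out = list(inp)
    for i in range(order, n - order):
        acc = 0.0
        for k in range(-order, order + 1):
            acc += weights[k + order] * inp[i + k]
        out[i] = acc
    return out


def star_points(order, dims):
    """Input points read by a star-shaped stencil (assumed shape):
    `order` neighbors on each side of the center along every dimension."""
    return 2 * order * dims + 1


def flops_per_output_point(order, dims):
    """One multiply per input point plus the additions that combine them."""
    points = star_points(order, dims)
    return 2 * points - 1
```

For example, `apply_stencil_1d([1, 2, 3, 4, 5], 1, [1.0, 1.0, 1.0])` returns `[1, 6.0, 9.0, 12.0, 5]`. Under the star-shape assumption, a 3D order-1 stencil reads 7 points and performs 13 floating-point operations per output point, while an order-4 stencil reads 25 points and performs 49. The operation count grows linearly with the order while each output point is still written once, which is consistent with the abstract's statement that arithmetic intensity is proportional to order.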