Optimization of Stencil Computations on GPUs

Bibliographic Details
Main Author:	Rawat, Prashant Singh
Language:	English
Published:	The Ohio State University / OhioLINK 2018
Subjects:	Computer Science; Stencil Computations; GPGPU; Register Pressure; Fusion; Tiling; Instruction Reordering
Online Access:	http://rave.ohiolink.edu/etdc/view?acc_num=osu1523037713249436

date	2018-08-10
rights	unrestricted. This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
abstract	Stencil computations form the compute-intensive core of many scientific application domains, such as CT and MRI image processing, computational electromagnetics, seismic processing, and climate modeling. A stencil computation involves an element-wise update of an output domain based on a fixed set of neighboring points from the input domain. Such stencil computations are either time-iterated or require successive application of multiple stencil operators on the input domains. Stencil optimization on multi- and many-core architectures has been an active research topic for the past two decades. Stencil computations traditionally have low arithmetic intensity, with only a few floating-point operations performed relative to the data transferred per output point, and are therefore memory bandwidth-bound. Since the data movement cost consistently dominates the computational cost in modern architectures, most of these research efforts focus on reducing the data movement in stencils to tackle the bandwidth bottleneck. Consequently, several tiling techniques have been proposed over the years to exploit spatial and temporal reuse across a sequence of stencils, or across multiple time steps for time-iterated stencils. With the ever-increasing use of GPUs for general-purpose computing, application developers have started exploring the acceleration of data-parallel stencils on GPUs. GPUs have lower data movement costs than multi-core CPU architectures, and hence are an attractive target for accelerating memory bandwidth-bound stencil computations.
At the same time, GPUs are compute-intensive with a significantly higher number of registers per thread, and are therefore suitable for accelerating stencil computations with high arithmetic intensity as well. The arithmetic intensity of a stencil is proportional to its order, which is the number of input elements read from the center along each dimension. In many scientific applications, high-order stencils provide better computational accuracy with less data movement than their low-order counterparts. However, the main performance bottleneck for high-order stencils on GPUs is high register pressure, which causes excessive register spills or a steep drop in achieved parallelism, resulting in performance loss.
This dissertation proposes novel GPU-centric optimization strategies that address the performance bottlenecks for stencils with different arithmetic intensities: tiling and fusion heuristics for bandwidth-bound stencils with low arithmetic intensity, and register optimizations for high-order stencils with high arithmetic intensity. The proposed optimizations have been implemented in a DSL-based stencil optimization framework, STENCILGEN, that can automatically generate high-performance CUDA code from an input DSL specification of the stencil computation. The efficacy of the proposed optimizations is demonstrated via empirical evaluation on a variety of 2D and 3D stencil kernels extracted from PDE solvers, image processing pipelines, and proxy DOE applications.
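As an illustration of the terms used in the abstract, the following is a minimal sketch of a time-iterated 2D "star" stencil of a given order, written in plain Python. It is not taken from the dissertation and is not STENCILGEN's DSL or its generated CUDA; the function names, the weights, and the choice to copy boundary points unchanged are all assumptions made for the example.

```python
def star_stencil_2d(grid, order, center_w, neighbor_w):
    """Apply one sweep of a 2D star stencil (hypothetical helper).

    Each interior output point is a weighted sum of the input point at the
    same position and the points up to `order` steps away along each axis.
    Points closer than `order` to the edge are copied unchanged (assumption).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # separate output domain
    for i in range(order, rows - order):
        for j in range(order, cols - order):
            acc = center_w * grid[i][j]
            for d in range(1, order + 1):
                acc += neighbor_w * (
                    grid[i - d][j] + grid[i + d][j]
                    + grid[i][j - d] + grid[i][j + d]
                )
            out[i][j] = acc
    return out


def iterate(grid, steps, order, center_w, neighbor_w):
    """Time-iterated stencil: feed each sweep's output into the next sweep."""
    for _ in range(steps):
        grid = star_stencil_2d(grid, order, center_w, neighbor_w)
    return grid


def points_read_per_output(order, dims=2):
    """Input points read per output point for a star stencil:
    `order` points on each side along every dimension, plus the center."""
    return 2 * dims * order + 1


# Usage: a 5x5 grid with a single non-zero point in the middle.
source = [[0.0] * 5 for _ in range(5)]
source[2][2] = 1.0
after_one_sweep = star_stencil_2d(source, order=1, center_w=0.5, neighbor_w=0.125)
```

The read count grows linearly with the order (5 points for an order-1 2D star, 13 for an order-2 3D star), which is consistent with the abstract's point that higher-order stencils do more arithmetic per output point and keep more values live at once.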
collection	NDLTD
language	English
sources	NDLTD
