Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures

Today, heterogeneous computing has reshaped the way scientists think about and approach high-performance computing (HPC). Hardware accelerators such as general-purpose graphics processing units (GPUs) and the Intel Many Integrated Core (MIC) architecture continue to make inroads in accelerating large-s...


Bibliographic Details
Main Author:	Panwar, Lokendra Singh
Other Authors:	Computer Science
Format:	Others
Published:	Virginia Tech 2014
Subjects:	Heterogeneous Computing; Graphics Processing Unit (GPU); GPU Emulation; Performance Modeling; Finite Difference Method; Seismology Modeling
Online Access:	http://hdl.handle.net/10919/50585

id	ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-50585
record_format	oai_dc
spelling	ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-50585 2020-09-29T05:45:59Z Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures Panwar, Lokendra Singh Computer Science Feng, Wu-Chun Athanas, Peter M. Cao, Yong Heterogeneous Computing Graphics Processing Unit (GPU) GPU Emulation Performance Modeling Finite Difference Method Seismology Modeling Master of Science 2014-10-22T08:00:34Z 2014-10-22T08:00:34Z 2014-10-21 Thesis vt_gsexam:2246 http://hdl.handle.net/10919/50585 In Copyright http://rightsstatements.org/vocab/InC/1.0/ ETD application/pdf Virginia Tech
collection	NDLTD
format	Others
sources	NDLTD
topic	Heterogeneous Computing; Graphics Processing Unit (GPU); GPU Emulation; Performance Modeling; Finite Difference Method; Seismology Modeling
description	Today, heterogeneous computing has reshaped the way scientists think about and approach high-performance computing (HPC). Hardware accelerators such as general-purpose graphics processing units (GPUs) and the Intel Many Integrated Core (MIC) architecture continue to make inroads in accelerating large-scale scientific applications. These advancements, however, introduce new challenges for the scientific community, such as selecting the best processor for an application, devising effective performance optimization strategies, and maintaining performance portability across architectures. In this thesis, we present techniques that address some of these issues. First, we present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which GPU would run a given kernel faster. Use cases of this technique include scheduling or migrating GPU workloads over a heterogeneous cluster with different types of GPUs. We then present our approach to accelerating a seismology modeling application based on the finite difference method (FDM), using MPI and CUDA over a hybrid CPU+GPU cluster. We describe the generic computational complexities involved in porting such applications to GPUs and present our strategy for efficient performance optimization and characterization. We also show how performance modeling can be used to reason about and drive hardware-specific optimizations on the GPU. In our performance evaluation, the approach delivers a maximum speedup of 23-fold with a single GPU and 33-fold with dual GPUs per node over the serial version of the application, which in turn results in a many-fold speedup when coupled with the MPI distribution of the computation across the cluster. We also study the efficacy of GPU-integrated MPI, with MPI-ACC as an example implementation, in a seismology modeling application and discuss the lessons learned. Degree: Master of Science
author2	Computer Science
author_facet	Computer Science Panwar, Lokendra Singh
author	Panwar, Lokendra Singh
author_sort	Panwar, Lokendra Singh
title	Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures
publisher	Virginia Tech
publishDate	2014
url	http://hdl.handle.net/10919/50585
