Automatic Empirical Performance Modeling of Parallel Programs
Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes or solving larger problems. Often, such performance bugs manifest themselves only when the code is put into production, a point where remediation can be difficult. Manuall...
Main Author: | Calotoiu, Alexandru |
---|---|
Format: | Others |
Language: | en |
Published: | 2018 |
Online Access: | https://tuprints.ulb.tu-darmstadt.de/7234/25/dissertation.pdf Calotoiu, Alexandru <http://tuprints.ulb.tu-darmstadt.de/view/person/Calotoiu=3AAlexandru=3A=3A.html> (2018): Automatic Empirical Performance Modeling of Parallel Programs. Darmstadt, Technische Universität, [Ph.D. Thesis] |
id |
ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-7234 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
en |
format |
Others |
sources |
NDLTD |
description |
Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes or solving larger problems. Often, such performance bugs manifest themselves only when the code is put into production, a point where remediation can be difficult. Manually creating analytical performance models provides insights into optimization opportunities but is extremely costly if done for applications of realistic size. The effort involved limits application developers to attempting it for at most a few selected kernels, at the risk of missing harmful bottlenecks. Furthermore, tuning large applications requires a clever exploration of the design and configuration space. Especially on supercomputers, this space is so large that its exhaustive traversal via performance experiments becomes too expensive, if not impossible.
If we have to consider multiple performance-relevant parameters and their possible interactions at the same time, a common requirement in many situations, this task becomes even more complex.
The initial contribution of this thesis is a method to substantially improve both coverage and speed of performance modeling and analysis. By automatically generating an empirical performance model for each part of a parallel program with respect to the variation of a relevant parameter such as process count or problem size, it becomes possible to easily identify those parts that will reduce performance at larger core counts or when solving a bigger problem.
In the next step, we extended the approach with a method capable of modeling any combination of multiple execution parameters simultaneously, provided sufficient performance measurements are available. Multi-parameter modeling has so far been outside the reach of automatic methods due to the exponential growth of the model search space. Specialized heuristics developed as part of this work traverse the search space rapidly and generate insightful performance models that enable a wide range of uses from performance predictions for balanced machine design to performance tuning.
Finally, we present a method that employs automated performance modeling to quickly predict application requirements for varying scales and problem sizes. Following this approach, it is possible to determine future requirements of major scientific applications, derive an optimization strategy, and illustrate system design tradeoffs in the light of their requirements. This supports the co-design process by informing hardware acquisition decisions with the actual needs of the software.
The methods described in this work are implemented in the performance analysis tool Extra-P. Extra-P has been released as open source and has been successfully used to gain insight into the performance of numerous scientific applications from a large range of fields.
Since its release, Extra-P has had an impact on the HPC community. Developers at both universities and research centers have used Extra-P to better understand the performance of their research codes.
Tutorials on the use of Extra-P have been offered at international conferences such as EuroMPI and Supercomputing, further demonstrating the effectiveness of this approach in making performance modeling available to developers without requiring expert knowledge of the topic.
This work simplifies and streamlines the performance modeling process, offering insights into application behavior quickly and automatically and allowing the developer to focus on transforming these insights into tangible performance improvements. |
author |
Calotoiu, Alexandru |
spellingShingle |
Calotoiu, Alexandru Automatic Empirical Performance Modeling of Parallel Programs |
author_facet |
Calotoiu, Alexandru |
author_sort |
Calotoiu, Alexandru |
title |
Automatic Empirical Performance Modeling of Parallel Programs |
publishDate |
2018 |
url |
https://tuprints.ulb.tu-darmstadt.de/7234/25/dissertation.pdf Calotoiu, Alexandru <http://tuprints.ulb.tu-darmstadt.de/view/person/Calotoiu=3AAlexandru=3A=3A.html> (2018): Automatic Empirical Performance Modeling of Parallel Programs. Darmstadt, Technische Universität, [Ph.D. Thesis] |
work_keys_str_mv |
AT calotoiualexandru automaticempiricalperformancemodelingofparallelprograms |
_version_ |
1719327490965504000 |
spelling |
ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-7234 2020-07-15T07:09:31Z http://tuprints.ulb.tu-darmstadt.de/7234/ Automatic Empirical Performance Modeling of Parallel Programs Calotoiu, Alexandru 2018 Ph.D. Thesis NonPeerReviewed text CC-BY-NC-ND 4.0 International - Creative Commons, Attribution Non-commercial, No-derivatives https://tuprints.ulb.tu-darmstadt.de/7234/25/dissertation.pdf Calotoiu, Alexandru <http://tuprints.ulb.tu-darmstadt.de/view/person/Calotoiu=3AAlexandru=3A=3A.html> (2018): Automatic Empirical Performance Modeling of Parallel Programs. Darmstadt, Technische Universität, [Ph.D. Thesis] en info:eu-repo/semantics/doctoralThesis info:eu-repo/semantics/openAccess |
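
The abstract describes generating an empirical performance model for one part of a program as a single parameter, such as process count, varies. The record does not give the algorithm, so the following is only a minimal sketch of the general idea and not Extra-P's actual implementation. It assumes, purely for illustration, that each candidate model has the form c0 + c1 * p^i * log2(p)^j, that the exponent sets `EXPONENTS` and `LOG_EXPONENTS` are small hand-picked lists, and that the candidate with the lowest residual sum of squares is chosen. The names `fit_linear`, `fit_model`, and `predict` are hypothetical.

```python
import math

# Hypothetical candidate exponents; a real tool may search a different set.
EXPONENTS = (0.5, 1.0, 1.5, 2.0)
LOG_EXPONENTS = (0, 1)


def fit_linear(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x. Returns (c0, c1, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx if sxx > 0 else 0.0
    c0 = mean_y - c1 * mean_x
    rss = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return c0, c1, rss


def fit_model(procs, times):
    """Try every candidate term p^i * log2(p)^j and keep the best fit."""
    best = None
    for i in EXPONENTS:
        for j in LOG_EXPONENTS:
            basis = [p ** i * math.log2(p) ** j for p in procs]
            c0, c1, rss = fit_linear(basis, times)
            if best is None or rss < best["rss"]:
                best = {"c0": c0, "c1": c1, "i": i, "j": j, "rss": rss}
    return best


def predict(model, p):
    """Evaluate the fitted model at process count p."""
    return model["c0"] + model["c1"] * p ** model["i"] * math.log2(p) ** model["j"]


if __name__ == "__main__":
    # Synthetic measurements that follow 3 + 0.5 * p * log2(p) exactly.
    procs = [2, 4, 8, 16, 32, 64]
    times = [3 + 0.5 * p * math.log2(p) for p in procs]
    model = fit_model(procs, times)
    print(model)
    print(predict(model, 1024))
```

Running one such fit per function or call path and sorting the results by growth rate is one way to surface the parts that, in the abstract's words, "will reduce performance at larger core counts". With noisy measurements, picking the lowest residual alone tends to overfit, so a practical tool would need a more careful selection criterion than this sketch uses.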