Tempest: A Framework for High Performance Thermal-Aware Distributed Computing

Bibliographic Details
Main Author: Pyla, Hari Krishna
Department: Computer Science
Format: Thesis
Published: Virginia Tech 2014
Subjects: parallel processing; thermal profiling
Online Access: http://hdl.handle.net/10919/33198
http://scholar.lib.vt.edu/theses/available/etd-05242007-220451/
Committee: Varadarajan, Srinidhi; Ramakrishnan, Naren; Ribbens, Calvin J.; Cameron, Kirk W.
Degree: Master of Science
Dates: 2007-05-18; 2007-05-24; 2007-06-08; 2014-03-14
Identifier: etd-05242007-220451
File: Thesis_new.pdf (application/pdf)
Rights: In Copyright (http://rightsstatements.org/vocab/InC/1.0/)

Abstract

Compute clusters are consuming more power at higher densities than ever before. This increases thermal dissipation, demands more powerful cooling systems, and ultimately reduces system reliability as temperatures rise. Over the past several years, the research community has responded by producing software tools such as HotSpot and Mercury to estimate system thermal characteristics and validate thermal-management techniques. While these tools are flexible and useful, they suffer from several limitations: for the average user, such simulation tools can be cumbersome to use, and they may take significant time and expertise to port to different systems. Further, they achieve their detail and accuracy at the cost of execution times long enough to prohibit iterative testing.

We propose Tempest (for temperature estimator), a fast, easy-to-use, accurate, and portable software framework that leverages emerging thermal sensors to let users profile, evaluate, and reduce the thermal characteristics of systems and applications. In this thesis, we illustrate the use of Tempest to analyze the thermal effects of various parallel benchmarks on clusters. We also show how users can analyze the effects of thermal optimizations on cluster applications.

Dynamic Voltage and Frequency Scaling (DVFS) reduces the power consumption of high-performance clusters by lowering processor voltage and frequency during periods of low utilization. We designed Tempest to measure the runtime effects of processor frequency on temperature. Our experiments indicate that HPC workload characteristics strongly influence how DVFS affects temperature.

We propose a thermal-aware DVFS scheduling approach that proactively controls processor voltage across a cluster by evaluating and predicting trends in processor temperature. We identify approaches that can maintain temperature thresholds and reduce temperature with minimal impact on performance. Our results indicate that proactive, temperature-aware DVFS scheduling can reduce cluster-wide processor temperatures by more than 10 degrees Celsius, the threshold associated with a 50% improvement in electronic reliability.
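The abstract describes the scheduler only at a high level: it "proactively controls processor voltage across a cluster by evaluating and predicting trends in processor temperature." As a minimal single-node sketch of that idea, assuming a least-squares linear trend over a sliding window of sensor readings, a fixed prediction horizon, and a hypothetical five-step frequency table, a governor might look like the following. The names `ThermalGovernor`, `predict_temperature`, `choose_frequency`, and `FREQS_MHZ` are illustrative assumptions, not Tempest's actual interface or algorithm.

```python
"""Sketch of proactive, temperature-aware DVFS selection for one node.

Assumptions (not taken from the thesis): temperatures are sampled at a fixed
interval, the trend is a least-squares line over a short window, and the
processor exposes a discrete list of frequencies. All names are hypothetical.
"""
from collections import deque
from typing import Deque, Sequence

# Hypothetical frequency ladder in MHz, lowest to highest.
FREQS_MHZ: Sequence[int] = (1000, 1400, 1800, 2200, 2600)


def predict_temperature(samples: Sequence[float], interval_s: float,
                        horizon_s: float) -> float:
    """Extrapolate a least-squares linear fit horizon_s seconds past the last sample."""
    n = len(samples)
    if n < 2:
        return samples[-1]  # not enough data for a trend: assume it stays flat
    times = [i * interval_s for i in range(n)]
    mean_t = sum(times) / n
    mean_y = sum(samples) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, samples))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den  # degrees Celsius per second
    # Fitted value at the last sample time, then project forward.
    fitted_last = mean_y + slope * (times[-1] - mean_t)
    return fitted_last + slope * horizon_s


def choose_frequency(level: int, predicted_c: float, limit_c: float,
                     hysteresis_c: float = 3.0) -> int:
    """Return a new index into FREQS_MHZ given a predicted temperature."""
    if predicted_c > limit_c and level > 0:
        return level - 1  # predicted to overshoot: step down before it happens
    if predicted_c < limit_c - hysteresis_c and level < len(FREQS_MHZ) - 1:
        return level + 1  # comfortably below the limit: recover performance
    return level  # inside the hysteresis band (or at a bound): hold


class ThermalGovernor:
    """Keeps a sliding window of samples and makes one DVFS decision per sample."""

    def __init__(self, limit_c: float, interval_s: float = 1.0,
                 horizon_s: float = 10.0, window: int = 8) -> None:
        self.limit_c = limit_c
        self.interval_s = interval_s
        self.horizon_s = horizon_s
        self.samples: Deque[float] = deque(maxlen=window)
        self.level = len(FREQS_MHZ) - 1  # start at the highest frequency

    def update(self, temp_c: float) -> int:
        """Record one reading and return the frequency (MHz) to apply next."""
        self.samples.append(temp_c)
        predicted = predict_temperature(list(self.samples), self.interval_s,
                                        self.horizon_s)
        self.level = choose_frequency(self.level, predicted, self.limit_c)
        return FREQS_MHZ[self.level]
```

Fed a steadily rising trace of 50, 52, and 54 degrees Celsius against a 60 degree limit, this sketch lowers the frequency from 2600 to 2200 to 1800 MHz before the limit is ever reached, which is what separates a proactive policy from a reactive one that waits for the threshold to be crossed. The hysteresis band keeps the governor from oscillating when the prediction hovers near the limit. On Linux, readings could come from `/sys/class/thermal/thermal_zone*/temp` (reported in millidegrees Celsius), but how Tempest itself samples sensors and applies frequencies is not described in this record.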