Qualitative Performance Analysis for Large-Scale Scientific Workflows

Today, large-scale scientific applications are both data driven and distributed. To support the scale and inherent distribution of these applications, significant heterogeneous and geographically distributed resources are required over long periods of time to ensure adequate performance. F...

Bibliographic Details
Main Author: Buneci, Emma
Other Authors: Reed, Daniel A
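Format: Others
Language: en_US
Published: 2008
Subjects: Computer Science; feature selection; time series analysis; performance analysis; signatures; scientific applications
Online Access: http://hdl.handle.net/10161/695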
id ndltd-DUKE-oai-dukespace.lib.duke.edu-10161-695
record_format oai_dc
spelling ndltd-DUKE-oai-dukespace.lib.duke.edu-10161-695
2013-01-07T20:07:07Z
Qualitative Performance Analysis for Large-Scale Scientific Workflows
Buneci, Emma
Computer Science
feature selection
time series analysis
performance analysis
signatures
scientific applications
Dissertation
Reed, Daniel A
2008-05-30
Dissertation
4788576 bytes
application/pdf
http://hdl.handle.net/10161/695
en_US
collection NDLTD
language en_US
format Others
sources NDLTD
topic Computer Science
feature selection
time series analysis
performance analysis
signatures
scientific applications
description <p>Today, large-scale scientific applications are both data driven and distributed. To support the scale and inherent distribution of these applications, significant heterogeneous and geographically distributed resources are required over long periods of time to ensure adequate performance. Furthermore, the behavior of these applications depends on a large number of factors related to the application, the system software, the underlying hardware, and other running applications, as well as potential interactions among these factors.</p>
<p>Most Grid application users are primarily concerned with obtaining the result of the application as fast as possible, without worrying about the details involved in monitoring and understanding factors affecting application performance. In this work, we aim to provide the application users with a simple and intuitive performance evaluation mechanism during the execution time of their long-running Grid applications or workflows. Our performance evaluation mechanism provides a qualitative and periodic assessment of the application's behavior by informing the user whether the application's performance is expected or unexpected. Furthermore, it can help improve overall application performance by informing and guiding fault-tolerance services when the application exhibits persistent unexpected performance behaviors.</p>
<p>This thesis addresses the hypotheses that in order to qualitatively assess application behavioral states in long-running scientific Grid applications: (1) it is necessary to extract temporal information in performance time series data, and that (2) it is sufficient to extract variance and pattern as specific examples of temporal information. Evidence supporting these hypotheses can lead to the ability to qualitatively assess the overall behavior of the application and, if needed, to offer a most likely diagnostic of the underlying problem.</p>
<p>To test the stated hypotheses, we develop and evaluate a general <em>qualitative performance analysis</em> framework that incorporates (a) techniques from time series analysis and machine learning to extract and learn from data, structural and temporal features associated with application performance in order to reach a qualitative interpretation of the application's behavior, and (b) mechanisms and policies to reason over time and across the distributed resource space about the behavior of the application.</p>
<p>Experiments with two scientific applications from meteorology and astronomy comparing signatures generated from instantaneous values of performance data versus those generated from temporal characteristics support the former hypothesis that temporal information is necessary to extract from performance time series data to be able to accurately interpret the behavior of these applications. Furthermore, temporal signatures incorporating variance and pattern information generated for these applications reveal signatures that have distinct characteristics during well-performing versus poor-performing executions. This leads to the framework's accurate classification of instances of similar behaviors, which represents supporting evidence for the latter hypothesis. The proposed framework's ability to generate a qualitative assessment of performance behavior for scientific applications using temporal information present in performance time series data represents a step towards simplifying and improving the quality of service for Grid applications.</p>
Dissertation
author2 Reed, Daniel A
author Buneci, Emma
title Qualitative Performance Analysis for Large-Scale Scientific Workflows
publishDate 2008
url http://hdl.handle.net/10161/695