Dependability analysis of fault-tolerant multiprocessor architectures through simulated fault injection

Bibliographic Details
Main Author: Clark, Jeffrey Alan
Language: English
Published: ScholarWorks@UMass Amherst 1993
Subjects: Electrical engineering; Computer science
Online Access: https://scholarworks.umass.edu/dissertations/AAI9408266
id ndltd-UMASS-oai-scholarworks.umass.edu-dissertations-2701
record_format oai_dc
spelling ndltd-UMASS-oai-scholarworks.umass.edu-dissertations-2701 2020-12-02T14:28:26Z 1993-01-01T08:00:00Z text Doctoral Dissertations Available from Proquest
collection NDLTD
language ENG
sources NDLTD
topic Electrical engineering|Computer science
description This dissertation develops a new approach for evaluating the dependability of fault-tolerant computer systems. Dependability has traditionally been evaluated through combinatorial and Markov modeling. These analytical techniques have several limitations that can restrict their applicability. Simulation avoids many of these limitations, allowing for a more precise representation of system attributes than is feasible with analytical modeling. However, the computational demands of simulating a system in detail, at a low abstraction level, currently prohibit evaluation of high-level dependability metrics such as reliability and availability. The new approach abstracts a system at the architectural level and employs life testing through simulated fault injection to measure dependability accurately and efficiently. The simulation models needed to implement this approach have been derived and integrated into a generalized software testbed called the REliable Architecture Characterization Tool (REACT). The effectiveness of REACT is demonstrated through the analysis of several alternative fault-tolerant multiprocessor architectures. Specifically, two dependability tradeoffs associated with triple-modular redundant (TMR) systems are investigated. The first explores the reliability-performance tradeoff made by voting unidirectionally, instead of bidirectionally, on either memory read or write accesses. The second examines the reliability-cost tradeoff made by duplicating, rather than triplicating, memory modules and comparing their outputs via error-detecting codes. Both studies show that, in many cases, acceptably little reliability is sacrificed for potentially large performance increases or cost reductions in comparison to the original TMR system design.
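
The abstract describes measuring reliability by life testing with simulated fault injection. The following is a minimal illustrative sketch of that general idea, not the REACT implementation, which this record does not describe in detail. It assumes a much simpler model than the dissertation's: three identical modules, one permanent fault per module with an exponentially distributed arrival time, a perfect voter, and no repair. All function names and parameter values are hypothetical. The Monte Carlo estimate is compared against the standard closed-form TMR reliability, R_TMR = 3R^2 - 2R^3, where R = exp(-lambda * t).

```python
import math
import random


def tmr_failure_time(rng, failure_rate):
    """Inject one permanent fault per module and return the system failure time.

    Assumption: a TMR system with a perfect voter fails when it loses its
    majority, i.e. at the time of the second module failure.
    """
    lifetimes = sorted(rng.expovariate(failure_rate) for _ in range(3))
    return lifetimes[1]


def estimate_reliability(mission_time, failure_rate, trials, seed=0):
    """Life test: fraction of simulated systems still working at mission_time."""
    rng = random.Random(seed)
    survived = sum(
        tmr_failure_time(rng, failure_rate) > mission_time
        for _ in range(trials)
    )
    return survived / trials


def analytic_tmr_reliability(mission_time, failure_rate):
    """Closed-form TMR reliability under the same assumptions."""
    r = math.exp(-failure_rate * mission_time)
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    lam, t = 1e-3, 500.0  # hypothetical failure rate (per hour) and mission time (hours)
    print("simulated:", estimate_reliability(t, lam, 100_000))
    print("analytic: ", analytic_tmr_reliability(t, lam))
```

In this toy setting the closed form is available, so simulation is only a cross-check. The approach becomes useful when architectural details, such as the voting direction or mixed duplicated and triplicated memory studied in the dissertation, make an analytical model impractical.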