Software Techniques For Dependable Execution

abstract: Advances in semiconductor technology have brought computer-based systems into virtually all aspects of human life. This unprecedented integration of semiconductor-based systems in our lives has significantly increased the domain and the number of safety-critical applications – application w...

Bibliographic Details
Other Authors: Didehban, Moslem (Author)
Format: Doctoral Thesis
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/2286/R.I.51604
id ndltd-asu.edu-item-51604
record_format oai_dc
spelling ndltd-asu.edu-item-51604 2019-02-02T03:01:06Z Software Techniques For Dependable Execution abstract: Advances in semiconductor technology have brought computer-based systems into virtually all aspects of human life. This unprecedented integration of semiconductor-based systems in our lives has significantly increased the domain and the number of safety-critical applications, i.e., applications with unacceptable consequences of failure. Software-level error resilience schemes are attractive because they can provide commercial-off-the-shelf microprocessors with adaptive and scalable reliability. Among all software-level error resilience solutions, in-application instruction-replication-based approaches have been widely used and are deemed to be the most effective. However, existing instruction-based replication schemes protect only part of the computation, i.e., arithmetic and logical instructions, and leave the rest unprotected. To improve the efficacy of instruction-level redundancy-based approaches, we developed several error detection and error correction schemes. nZDC (near Zero silent Data Corruption) is an instruction duplication scheme that protects the execution of the whole application. Rather than detecting errors on the register operands of memory and control flow operations, nZDC checks the results of such operations. nZDC ensures the correct execution of a memory write instruction by reloading the stored value and checking it against the redundantly computed value. nZDC also introduces a novel control flow checking mechanism that replicates compare and branch instructions and detects both wrong-direction branches and unwanted jumps. Fault injection experiments show that nZDC can improve the error coverage of state-of-the-art schemes by more than 10x without incurring any additional performance penalty. Furthermore, we introduced two error recovery solutions.
InCheck is our backward recovery solution, which makes lightweight error-free checkpoints at basic-block granularity. In the case of an error, InCheck reverts program execution to the beginning of the last executed basic block and resumes execution with the aid of the preserved information. NEMESIS is our forward recovery scheme, which runs three versions of the computation and detects errors by checking the results of all memory write and branch operations. In the case of a mismatch, the NEMESIS diagnosis routine decides whether the error is recoverable. If so, the NEMESIS recovery routine reverts the effect of the error from the program state and resumes normal program execution from the error detection point. Dissertation/Thesis Didehban, Moslem (Author) Shrivastava, Aviral (Advisor) Wu, Carole-Jean (Committee member) Clark, Lawrence (Committee member) Mahlke, Scott (Committee member) Arizona State University (Publisher) Computer engineering Computer science Compiler transformation Instruction Duplication Redundancy Reliability Silent Data Corruption Soft Error eng 129 pages Doctoral Dissertation Computer Engineering 2018 Doctoral Dissertation http://hdl.handle.net/2286/R.I.51604 http://rightsstatements.org/vocab/InC/1.0/ 2018
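The store check that the abstract attributes to nZDC (reload the stored value and compare it with the redundantly computed one) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the thesis's compiler transformation: the real scheme operates on machine instructions and registers, whereas here memory is modeled as a Python dict, and the names `checked_store`, `scaled_sum`, `DetectedError`, and the `fault` hook are hypothetical and introduced purely for illustration.

```python
class DetectedError(Exception):
    """Raised when the reloaded value disagrees with the redundant copy."""


def checked_store(memory, address, value, shadow_value):
    """Store `value`, reload it, and compare it with the redundant `shadow_value`.

    Hypothetical helper: `memory` is a plain dict standing in for RAM.
    """
    memory[address] = value          # the store performed by the main stream
    reloaded = memory[address]       # reload what was actually written
    if reloaded != shadow_value:     # check against the redundant computation
        raise DetectedError(
            f"mismatch at address {address}: {reloaded!r} != {shadow_value!r}"
        )


def scaled_sum(a, b, scale, memory, address, fault=None):
    """Compute (a + b) * scale twice and store the result with a check.

    `fault` is an assumed hook that corrupts the main stream's result,
    standing in for a soft error; it is not part of the source.
    """
    main = (a + b) * scale           # main computation stream
    shadow = (a + b) * scale         # duplicated (redundant) stream
    if fault is not None:
        main = fault(main)           # inject an error into the main stream only
    checked_store(memory, address, main, shadow)
    return memory[address]
```

A fault-free call stores and returns the result, while a call such as `scaled_sum(2, 3, 4, {}, 0, fault=lambda v: v ^ 1)` raises `DetectedError`, because the reloaded value no longer matches the redundant copy.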
collection NDLTD
language English
format Doctoral Thesis
sources NDLTD
topic Computer engineering
Computer science
Compiler transformation
Instruction Duplication
Redundancy
Reliability
Silent Data Corruption
Soft Error
author2 Didehban, Moslem (Author)
title Software Techniques For Dependable Execution
publishDate 2018
url http://hdl.handle.net/2286/R.I.51604
_version_ 1718970016902152192