Constructing fail-controlled nodes for distributed systems : a software approach

Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are J...


Bibliographic Details
Main Author: Brasileiro, Francisco Vilar
Published: University of Newcastle Upon Tyne 1995
Subjects:
005
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260827
id ndltd-bl.uk-oai-ethos.bl.uk-260827
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-260827 2015-03-19T03:42:30Z Constructing fail-controlled nodes for distributed systems : a software approach Brasileiro, Francisco Vilar 1995 Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are fail-controlled, i.e. present a well-defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infrastructure, that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well-known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by applying suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems; on the other hand, they introduce much larger performance overheads to the system.
In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes. In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based fail-silent node, along with its efficient design, implementation and performance evaluation. 005 Fault tolerance University of Newcastle Upon Tyne http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260827 http://hdl.handle.net/10443/1971 Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 005
Fault tolerance
spellingShingle 005
Fault tolerance
Brasileiro, Francisco Vilar
Constructing fail-controlled nodes for distributed systems : a software approach
description Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are fail-controlled, i.e. present a well-defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infrastructure, that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well-known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by applying suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems; on the other hand, they introduce much larger performance overheads to the system. In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes.
In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based fail-silent node, along with its efficient design, implementation and performance evaluation.
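The output validation mentioned in the description (majority voting, comparison) can be illustrated with a minimal Python sketch. This is not the thesis's design or its order protocols; the function names `majority_vote` and `compare_pair` are hypothetical, and using `None` to stand for "the node emits nothing" is an assumption made only for illustration (it would not work if `None` were itself a valid output).

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas.

    Returns None (illustrative stand-in for "no output") when no value
    is backed by more than half of the replicas.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def compare_pair(a: Hashable, b: Hashable) -> Optional[Hashable]:
    """Comparison-based validation for a two-replica node.

    The output is released only if both replicas agree; otherwise the
    node stays silent (None), approximating fail-silent behaviour.
    """
    return a if a == b else None
```

With three replicas, `majority_vote` masks one faulty output; with two replicas, `compare_pair` cannot mask a fault but can suppress the disagreeing output so that nothing incorrect reaches the application level.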
author Brasileiro, Francisco Vilar
author_facet Brasileiro, Francisco Vilar
author_sort Brasileiro, Francisco Vilar
title Constructing fail-controlled nodes for distributed systems : a software approach
title_short Constructing fail-controlled nodes for distributed systems : a software approach
title_full Constructing fail-controlled nodes for distributed systems : a software approach
title_fullStr Constructing fail-controlled nodes for distributed systems : a software approach
title_full_unstemmed Constructing fail-controlled nodes for distributed systems : a software approach
title_sort constructing fail-controlled nodes for distributed systems : a software approach
publisher University of Newcastle Upon Tyne
publishDate 1995
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260827
work_keys_str_mv AT brasileirofranciscovilar constructingfailcontrollednodesfordistributedsystemsasoftwareapproach
_version_ 1716733915875508224