Constructing fail-controlled nodes for distributed systems : a software approach

Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are J...

Bibliographic Details
Main Author:	Brasileiro, Francisco Vilar
Published:	University of Newcastle Upon Tyne 1995
Subjects:	005 Fault tolerance
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260827

id	ndltd-bl.uk-oai-ethos.bl.uk-260827
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-2608272015-03-19T03:42:30ZConstructing fail-controlled nodes for distributed systems : a software approachBrasileiro, Francisco Vilar1995Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are fail-controlled, i.e. present a well defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infra-structure that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by employing suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by a reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems, on the other hand, they introduce much bigger performance overheads to the system. 
In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes. In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based fail-silent node, along with its efficient design, implementation and performance evaluation.005Fault toleranceUniversity of Newcastle Upon Tynehttp://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260827http://hdl.handle.net/10443/1971Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	005 Fault tolerance
spellingShingle	005 Fault tolerance Brasileiro, Francisco Vilar Constructing fail-controlled nodes for distributed systems : a software approach
description	Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are fail-controlled, i.e. present a well defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infra-structure that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by employing suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by a reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems, on the other hand, they introduce much bigger performance overheads to the system. In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes. 
In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based fail-silent node, along with its efficient design, implementation and performance evaluation.
author	Brasileiro, Francisco Vilar
author_facet	Brasileiro, Francisco Vilar
author_sort	Brasileiro, Francisco Vilar
title	Constructing fail-controlled nodes for distributed systems : a software approach
title_short	Constructing fail-controlled nodes for distributed systems : a software approach
title_full	Constructing fail-controlled nodes for distributed systems : a software approach
title_fullStr	Constructing fail-controlled nodes for distributed systems : a software approach
title_full_unstemmed	Constructing fail-controlled nodes for distributed systems : a software approach
title_sort	constructing fail-controlled nodes for distributed systems : a software approach
publisher	University of Newcastle Upon Tyne
publishDate	1995
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260827
work_keys_str_mv	AT brasileirofranciscovilar constructingfailcontrollednodesfordistributedsystemsasoftwareapproach
_version_	1716733915875508224

Similar Items