A Fault-Tolerance Framework for Java-Based Distributed Computing System

Bibliographic Details
Main Authors: Chih-Lan Yang, 楊志郎
Other Authors: Liang-Teh Lee
Format: Others
Language: en_US
Published: 2001
Online Access: http://ndltd.ncl.edu.tw/handle/45561815753397525942
Description
Summary: Master's thesis, Tatung University, Graduate Institute of Computer Science and Engineering, ROC academic year 89. With advances in widespread networking, the public WWW environment, and platform-independent Java bytecode, millions of Java-capable computers can now be connected to share their computing power. These heterogeneous supercomputers, workstations, personal computers, and laptops can be merged into a pool of distributed Java virtual machines, and their large number of computing cycles can be exploited for CPU-intensive applications. To provide a robust distributed environment, this thesis proposes a Fault-Tolerance Framework for Java-Based Distributed Computing System (FJDCS). The most important advantage of the system is that it provides an enhanced, configurable fault-tolerance mechanism to all legacy Java applications. In a highly unreliable networking environment such as a public computing pool, the RMI mechanism still lacks a robust fault-tolerance mechanism to ensure that every computation is completed in an iteration. We extended the RMI API and combined it with a replication mechanism, categorized as active replication, to build the FJDCS API. Programmers can simply extend this API directly, without modifying their legacy applications, to obtain the fault-tolerance mechanism. In most cases, an application is composed of many cooperating tasks. In the proposed system, every task is replicated as two or more instances, which are dispatched concurrently to different computing nodes. As soon as any one of the nodes processing instances of the same task completes its operation, that task is complete. In a highly unreliable network, the number of clones per task can be configured to ensure that at least one computing node completes the task.
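The abstract describes the replication scheme only in prose, so the following is a hypothetical Java sketch of the general idea, not the FJDCS API. It assumes that threads in a local pool stand in for remote computing nodes (FJDCS dispatches clones over its extended RMI API), and the names `ReplicatedDispatch`, `clonesFor`, and `runReplicated` are invented for illustration. `runReplicated` dispatches every clone of a task at once and returns the first result that completes without an exception. `clonesFor` shows one possible way to "configure the number of clones," under the additional assumption (not stated in the thesis) that nodes fail independently with a known probability p. In that case all k clones fail with probability p^k, so the smallest k with 1 - p^k >= target is chosen.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of active replication; threads stand in for remote JVM nodes. */
final class ReplicatedDispatch {

    /**
     * Assumes independent node failures with probability p: returns the smallest
     * clone count k such that P(at least one clone succeeds) = 1 - p^k >= target.
     */
    static int clonesFor(double p, double target) {
        if (p < 0.0 || p >= 1.0 || target < 0.0 || target >= 1.0) {
            throw new IllegalArgumentException("need 0 <= p < 1 and 0 <= target < 1");
        }
        int k = 1;
        double allFail = p;               // P(all k clones fail) = p^k
        while (1.0 - allFail < target) {  // add clones until 1 - p^k >= target
            k++;
            allFail *= p;
        }
        return k;
    }

    /**
     * Dispatches `clones` copies of the same task concurrently and returns the
     * result of the first clone that finishes without throwing. Throws the last
     * clone's ExecutionException only if every clone fails.
     */
    static <T> T runReplicated(Callable<T> task, int clones)
            throws InterruptedException, ExecutionException {
        if (clones < 1) throw new IllegalArgumentException("clones must be >= 1");
        ExecutorService nodes = Executors.newFixedThreadPool(clones);
        try {
            CompletionService<T> done = new ExecutorCompletionService<>(nodes);
            for (int i = 0; i < clones; i++) {
                done.submit(task);        // one instance per (simulated) node
            }
            ExecutionException last = null;
            for (int i = 0; i < clones; i++) {
                Future<T> finished = done.take();  // next clone to finish
                try {
                    return finished.get();         // first success completes the task
                } catch (ExecutionException e) {
                    last = e;                      // that node failed; wait for another
                }
            }
            throw last;                   // every clone failed
        } finally {
            nodes.shutdownNow();          // interrupt clones that are still running
        }
    }
}
```

For example, with a per-node failure probability of 0.5 and a 99% target, `clonesFor(0.5, 0.99)` returns 7, because 1 - 0.5^7 ≈ 0.992 while 1 - 0.5^6 ≈ 0.984. Collecting results in completion order means a slow or crashed node never delays the task once any other clone has finished, which matches the "first finisher completes the task" rule in the abstract. The trade-off is that every clone consumes a node, even though only one result is used.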