A distributed snapshot protocol for virtual machines

Bibliographic Details
Main Author: Peng, Gang
Language: English
Published: University of British Columbia 2011
Online Access: http://hdl.handle.net/2429/32049
Description
Summary: Distributed snapshot protocols are a critical technology for disaster recovery and security in distributed systems, and a large number of projects have worked on this topic since the 1970s. Recently, with the growing popularity of parallel computing and disaster recovery, the topic has received increasing attention from both academic and industrial researchers. However, the existing protocols share several disadvantages. First, they require modifications to the target processes or their OS, which is usually error-prone and sometimes impractical. Second, they aim only at taking snapshots of processes, not of entire OS images, which constrains the areas in which they can be applied. This thesis introduces the design and implementation of our hypervisor-level, coordinated, non-blocking distributed snapshot protocol. Compared with the existing protocols, it provides a simpler and fully transparent snapshot platform for both the target processes and their OS images. Based on several observations of the target environment, we simplify the protocol by intentionally ignoring channel states. To hide the protocol from the target processes and their OS, we exploit VM technology to silently insert the protocol beneath the target OS, and we design and implement two kernel modules and a management daemon system in the control domain. We test the protocol with several popular benchmarks, and the experimental results demonstrate its correctness and efficiency.
Science, Faculty of; Computer Science, Department of; Graduate