Transparent checkpointing over RDMA-based networks

Fault tolerance for large-scale applications has long been an area of active research, as the scale of computation keeps growing. One component of a fault-tolerance strategy is checkpointing. However, no explicit checkpoint-restart solution has been available for applications running over...

Bibliographic Details
Online Access: http://hdl.handle.net/2047/D20290392