Transparent checkpointing over RDMA-based networks
Fault tolerance for large-scale applications has long been an area of active research, as the size of the computation keeps growing. One of the components of a fault-tolerance strategy is checkpointing. However, no explicit checkpoint-restart solution has been available for applications running over...
Published: | |
---|---|
Online Access: | http://hdl.handle.net/2047/D20290392 |