Compiler-assisted staggered checkpointing
To make progress in the face of failures, long-running parallel applications need to save their state, known as a checkpoint. Unfortunately, current checkpointing techniques are becoming untenable on large-scale supercomputers. Many applications checkpoint all processes simultaneously--a technique t...
Main Author: | |
---|---|
Format: | Others |
Language: | English |
Published: | 2010 |
Subjects: | |
Online Access: | http://hdl.handle.net/2152/ETD-UT-2010-08-1746 |