id |
ndltd-NEU--neu-cj82rk49b
|
record_format |
oai_dc
|
spelling |
ndltd-NEU--neu-cj82rk49b 2021-04-13T05:14:15Z Transparent checkpointing over RDMA-based networks Fault tolerance for large-scale applications has long been an area of active research, as computations keep growing in size. One component of a fault-tolerance strategy is checkpointing. However, no explicit checkpoint-restart solution has been available for applications running over RDMA-based networks. RDMA-based networks are the primary networks used in high-performance computing, and many researchers believe that they will be widely deployed in the Cloud as costs decrease. Existing approaches often rely on a solution specific to a particular MPI implementation or other parallel model in order to disconnect the network at checkpoint time and reconnect it at restart time. Such schemes are difficult to adapt to new parallel programming models, and they also incur higher checkpoint overhead. http://hdl.handle.net/2047/D20290419
|
collection |
NDLTD
|
sources |
NDLTD
|
description |
Fault tolerance for large-scale applications has long been an area of active research, as computations keep growing in size. One component of a fault-tolerance strategy is checkpointing. However, no explicit checkpoint-restart solution has been available for applications running over RDMA-based networks. RDMA-based networks are the primary networks used in high-performance computing, and many researchers believe that they will be widely deployed in the Cloud as costs decrease. Existing approaches often rely on a solution specific to a particular MPI implementation or other parallel model in order to disconnect the network at checkpoint time and reconnect it at restart time. Such schemes are difficult to adapt to new parallel programming models, and they also incur higher checkpoint overhead.
|
title |
Transparent checkpointing over RDMA-based networks
|
publishDate |
|
url |
http://hdl.handle.net/2047/D20290419
|
_version_ |
1719395779211165696
|