Master's thesis, Department of Electronic Engineering, National Taiwan University of Science and Technology, academic year 92 (ROC)
Abstract
Many scientific fields, including physics, aeronautics and astronautics, atmospheric science, and image processing, and above all biotechnology, currently one of the most popular research topics, require large amounts of computing power. Many researchers now use PVM (Parallel Virtual Machine) to build parallel computing environments that meet these requirements. However, such applications take a long time to finish in distributed environments, and if a failure such as a power outage occurs before they complete, all of the work done so far is lost. Checkpointing can be used to avoid this situation.
The main purpose of checkpointing is to support rollback recovery (also called fault tolerance). A checkpoint records the complete state of an executing program at a given instant. After the computer reboots, the program can be restarted from the last checkpoint instead of from the beginning. Checkpoints can also be used for task migration. Traditionally, checkpoints have been stored either on disk or on other hosts in the network. With the first method, disk access is too slow to achieve high performance; with the second, the cost is high.
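Checkpoint-and-rollback recovery can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a single process whose entire state fits in a small dictionary, stores checkpoints in an ordinary file (the disk-based approach described above), and simulates a power failure with a hypothetical `PowerFailure` exception. All function names are illustrative.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: a crash mid-write leaves the previous checkpoint intact."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the data to stable storage
    os.replace(tmp, path)     # atomic rename over the old checkpoint


def load_checkpoint(path):
    """Return the last saved state, or a fresh start if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"i": 0, "total": 0}


class PowerFailure(Exception):
    """Hypothetical stand-in for a crash such as a power outage."""


def run(path, n, interval, fail_at=None):
    """Sum 0..n-1, checkpointing every `interval` iterations; optionally crash at fail_at."""
    state = load_checkpoint(path)            # roll back to the last checkpoint
    i, total = state["i"], state["total"]
    while i < n:
        if fail_at is not None and i == fail_at:
            raise PowerFailure(f"crash at iteration {i}")
        total += i
        i += 1
        if i % interval == 0:
            save_checkpoint(path, {"i": i, "total": total})
    return total


# Usage: crash partway through, then restart from the last checkpoint.
ckpt_path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run(ckpt_path, n=1000, interval=100, fail_at=550)
except PowerFailure:
    pass
resumed_from = load_checkpoint(ckpt_path)["i"]  # 500: iterations 500..549 are redone
result = run(ckpt_path, n=1000, interval=100)
```

Only the work done since the last checkpoint is repeated, which is why the checkpoint interval and the cost of each write matter for performance.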
We employ an SRAM card and FLASH memory, both of which have recently improved significantly in capacity and speed. Together they form a two-level storage structure that overlaps checkpoint I/O with computation and thereby improves system performance. Our scheme has the following advantages: (1) a consistent checkpoint is guaranteed, so the domino effect cannot occur; (2) only two checkpoints need to be maintained; and (3) the implementation is simple and fast. Implementation results show that the method is effective for checkpointing.
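The two-level idea can be sketched as a double-buffered checkpointer. This is an assumed design for illustration, not the thesis's code: an in-memory buffer stands in for the SRAM card, a two-element list stands in for FLASH, `time.sleep` simulates the slower second-level write, and the class and method names are hypothetical. It shows two of the properties claimed above: the caller resumes computing while the slow write proceeds in a background thread, and only two checkpoint slots are kept, with the newer one published only after its write completes. It does not address coordinating consistent checkpoints across multiple PVM processes.

```python
import copy
import threading
import time


class TwoLevelCheckpointer:
    """Hypothetical sketch: a fast staging buffer (standing in for the SRAM card)
    is drained by a background thread into two alternating slots in slow storage
    (standing in for FLASH), so computation overlaps with the slow write."""

    def __init__(self, slow_write_delay=0.01):
        self.fast_buffer = None        # level 1: one staged checkpoint
        self.slots = [None, None]      # level 2: only two checkpoints are kept
        self.latest = None             # index of the newest *complete* slot
        self.delay = slow_write_delay  # simulated slow-storage latency
        self._writer = None

    def _drain(self, seq, state):
        time.sleep(self.delay)         # simulate the slow second-level write
        target = seq % 2               # alternate slots; the other slot keeps the previous checkpoint
        self.slots[target] = (seq, state)
        self.latest = target           # publish only after the write has finished

    def checkpoint(self, seq, state):
        if self._writer is not None:
            self._writer.join()        # allow at most one flush in flight
        self.fast_buffer = copy.deepcopy(state)  # quick copy into level 1
        self._writer = threading.Thread(target=self._drain, args=(seq, self.fast_buffer))
        self._writer.start()           # return immediately; caller keeps computing

    def recover(self):
        # For this demo we wait for any in-flight write; after a real crash that
        # write would be lost and the other slot would still hold a valid checkpoint.
        if self._writer is not None:
            self._writer.join()
        return None if self.latest is None else self.slots[self.latest]


# Usage: checkpoint every third step of a toy computation.
cp = TwoLevelCheckpointer()
total = 0
for step in range(1, 11):
    total += step
    if step % 3 == 0:
        cp.checkpoint(step, {"step": step, "total": total})
seq, saved = cp.recover()
```

Alternating between the two slots, rather than overwriting the newest one, is what makes two slots sufficient in this sketch: while one checkpoint is being written, the other slot still holds a complete checkpoint to roll back to.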