The Design of Fault Tolerance of Cluster Computing Platform
Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering === academic year 100 (ROC calendar) === If nodes fail in a distributed application service, handling the missing results not only incurs extra cost but also places additional load on the scheduler. To avoid recomputing the entire result when a fault occurs, it will be recalculate...
Main Authors: | Yu-tien Liao, 廖榆恬 |
---|---|
Other Authors: | Chun-Hung Lin, 林俊宏 |
Format: | Others |
Language: | zh-TW |
Published: | 2012 |
Online Access: | http://ndltd.ncl.edu.tw/handle/62638831726857133954 |
id | ndltd-TW-100NSYS5392057 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-100NSYS5392057 2015-10-13T21:22:19Z http://ndltd.ncl.edu.tw/handle/62638831726857133954 The Design of Fault Tolerance of Cluster Computing Platform 叢集計算之容錯設計 (Fault-Tolerance Design for Cluster Computing) Yu-tien Liao 廖榆恬 Master's thesis, National Sun Yat-sen University, Department of Computer Science and Engineering, academic year 100. Chun-Hung Lin 林俊宏 2012 thesis 70 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering === academic year 100 (ROC calendar) === If nodes fail in a distributed application service, handling the missing results not only incurs extra cost but also places additional load on the scheduler. To avoid recomputing the entire result when a fault occurs, the data of the failed nodes is recomputed on backup machines. This thesis therefore experiments with three methods (N + N nodes, N + 1 nodes, and N + 1 nodes with probability) and analyzes their pros and cons. The third method assigns a weight to each job before dispatching it, then converts the weight into a probability and a nice value (as defined by SLURM [1]) to influence the order in which the scheduler runs jobs. When a fault occurs, the results computed on the healthy nodes are returned to the control node, and the failed node's jobs are then either reassigned or not reassigned to a backup machine to obtain the complete result. Finally, the strengths and weaknesses of the three methods are analyzed. |
author2 | Chun-Hung Lin |
author_facet | Chun-Hung Lin; Yu-tien Liao 廖榆恬 |
author | Yu-tien Liao 廖榆恬 |
title | The Design of Fault Tolerance of Cluster Computing Platform |
publishDate | 2012 |
url | http://ndltd.ncl.edu.tw/handle/62638831726857133954 |
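The abstract describes the third method (N + 1 nodes with probability) only at a high level: jobs receive weights, the weights become a probability and a SLURM nice value that influence job order, and after a fault the failed node's jobs may be moved to a backup machine. The sketch below is one possible reading under stated assumptions, not the thesis's implementation. It assumes that weights are normalized into probabilities, that each probability is mapped linearly onto a hypothetical nice range `[0, NICE_MAX]` (a lower nice value means a higher priority, as with SLURM's `--nice` adjustment), and that on failure only jobs whose results have not yet reached the control node move to the spare node. `Job`, `NICE_MAX`, and every function name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical upper bound of the nice range; the abstract does not state one.
NICE_MAX = 1000


@dataclass
class Job:
    name: str
    weight: float  # hypothetical positive weight assigned before dispatch


def to_probabilities(jobs):
    """Normalize job weights into probabilities that sum to 1."""
    total = sum(job.weight for job in jobs)
    if total <= 0:
        raise ValueError("job weights must sum to a positive value")
    return {job.name: job.weight / total for job in jobs}


def to_nice_values(probs, nice_max=NICE_MAX):
    """Map each probability linearly onto [0, nice_max].

    A higher probability yields a lower nice value, i.e. a higher priority.
    """
    return {name: round((1.0 - p) * nice_max) for name, p in probs.items()}


def schedule_order(nice):
    """Order job names by ascending nice value, breaking ties by name."""
    return sorted(nice, key=lambda name: (nice[name], name))


def reassign_failed(assignment, failed_node, backup_node, completed):
    """Return a new job -> node map in which the failed node's unfinished
    jobs run on the backup node (assumed N + 1 behavior). Jobs listed in
    `completed`, whose results already reached the control node, keep
    their original assignment."""
    return {
        job: backup_node if node == failed_node and job not in completed else node
        for job, node in assignment.items()
    }


if __name__ == "__main__":
    jobs = [Job("a", 3.0), Job("b", 1.0)]
    probs = to_probabilities(jobs)
    nice = to_nice_values(probs)
    print(probs)                 # {'a': 0.75, 'b': 0.25}
    print(nice)                  # {'a': 250, 'b': 750}
    print(schedule_order(nice))  # ['a', 'b']
    plan = {"a": "node1", "b": "node2", "c": "node2"}
    print(reassign_failed(plan, "node2", "spare", completed={"b"}))
    # {'a': 'node1', 'b': 'node2', 'c': 'spare'}
```

In this example only the unfinished job `c` moves to `spare` when `node2` fails. The computed values could be passed per job through `sbatch --nice=<value>`; the abstract does not say whether the thesis did exactly this. The range is kept non-negative because, per the sbatch documentation, only privileged users may specify a negative nice adjustment.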