The Design of Fault Tolerance of Cluster Computing Platform

Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 100 === When nodes fail in a distributed application service, handling the missing results costs more and the scheduler takes on additional load. To avoid recomputing the entire result set when a fault occurs, it will be recalculate...

Bibliographic Details
Main Authors:	Yu-tien Liao, 廖榆恬
Other Authors:	Chun-Hung Lin
Format:	Others
Language:	zh-TW
Published:	2012
Online Access:	http://ndltd.ncl.edu.tw/handle/62638831726857133954

id	ndltd-TW-100NSYS5392057
record_format	oai_dc
spelling	ndltd-TW-100NSYS5392057 2015-10-13T21:22:19Z http://ndltd.ncl.edu.tw/handle/62638831726857133954 The Design of Fault Tolerance of Cluster Computing Platform 叢集計算之容錯設計 Yu-tien Liao 廖榆恬 Master's, National Sun Yat-sen University, Department of Computer Science and Engineering, 100. When nodes fail in a distributed application service, handling the missing results costs more and the scheduler takes on additional load. To avoid recomputing the entire result set when a fault occurs, only the data of the faulty nodes is recomputed, on backup machines. This thesis therefore experiments with three methods (N + N nodes, N + 1 nodes, and N + 1 nodes with probability) and analyzes their pros and cons. The third method assigns each job a weight before dispatch and converts that weight into a probability and a nice value (as defined by SLURM [1]) to influence the order in which the scheduler runs jobs. When a fault occurs, the results computed on the healthy nodes are returned to the control node, and the faulty node's jobs are then either reassigned to a backup machine or not, in order to obtain complete results. Finally, we analyze the strengths and weaknesses of the three methods. Chun-Hung Lin 林俊宏 2012 thesis 70 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 100 === When nodes fail in a distributed application service, handling the missing results costs more and the scheduler takes on additional load. To avoid recomputing the entire result set when a fault occurs, only the data of the faulty nodes is recomputed, on backup machines. This thesis therefore experiments with three methods (N + N nodes, N + 1 nodes, and N + 1 nodes with probability) and analyzes their pros and cons. The third method assigns each job a weight before dispatch and converts that weight into a probability and a nice value (as defined by SLURM [1]) to influence the order in which the scheduler runs jobs. When a fault occurs, the results computed on the healthy nodes are returned to the control node, and the faulty node's jobs are then either reassigned to a backup machine or not, in order to obtain complete results. Finally, we analyze the strengths and weaknesses of the three methods.
author2	Chun-Hung Lin
author_facet	Chun-Hung Lin Yu-tien Liao 廖榆恬
author	Yu-tien Liao 廖榆恬
spellingShingle	Yu-tien Liao 廖榆恬 The Design of Fault Tolerance of Cluster Computing Platform
author_sort	Yu-tien Liao
title	The Design of Fault Tolerance of Cluster Computing Platform
title_short	The Design of Fault Tolerance of Cluster Computing Platform
title_full	The Design of Fault Tolerance of Cluster Computing Platform
title_fullStr	The Design of Fault Tolerance of Cluster Computing Platform
title_full_unstemmed	The Design of Fault Tolerance of Cluster Computing Platform
title_sort	design of fault tolerance of cluster computing platform
publishDate	2012
url	http://ndltd.ncl.edu.tw/handle/62638831726857133954
work_keys_str_mv	AT yutienliao thedesignoffaulttoleranceofclustercomputingplatform AT liàoyútián thedesignoffaulttoleranceofclustercomputingplatform AT yutienliao cóngjíjìsuànzhīróngcuòshèjì AT liàoyútián cóngjíjìsuànzhīróngcuòshèjì AT yutienliao designoffaulttoleranceofclustercomputingplatform AT liàoyútián designoffaulttoleranceofclustercomputingplatform
_version_	1718060513240285184
