Using Multi-Task Queues to Improve Data Locality in Hadoop

Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 101 === Distributed computing systems such as cloud computing are becoming increasingly popular, and Hadoop is one of the most familiar of them. Because Hadoop's default task scheduling is FCFS, which is not efficient, the master may select tasks without data locality for slav...

Bibliographic Details
Main Authors:	Jhong-Yi Chen, 陳仲毅
Other Authors:	Sun-Yuan Hsieh
Format:	Others
Language:	en_US
Published:	2013
Online Access:	http://ndltd.ncl.edu.tw/handle/60951468672167042635

id	ndltd-TW-101NCKU5392044
record_format	oai_dc
spelling	ndltd-TW-101NCKU5392044 2015-10-13T22:51:34Z http://ndltd.ncl.edu.tw/handle/60951468672167042635 Using Multi-Task Queues to Improve Data Locality in Hadoop 使用多重任務佇列以改善Hadoop中之資料地域性 Jhong-Yi Chen 陳仲毅 Master's, National Cheng Kung University, Department of Computer Science and Information Engineering (Master's and Doctoral Program), 101 Sun-Yuan Hsieh 謝孫源 2013 學位論文 ; thesis 50 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 101 === Distributed computing systems such as cloud computing are becoming increasingly popular, and Hadoop is one of the most familiar of them. Because Hadoop's default task scheduling is FCFS, which is not efficient, the master may select tasks without data locality for slaves. This causes much unnecessary data transfer among slaves, which directly increases job execution times and makes racks the network bandwidth bottleneck. In this paper, we present a scheduling algorithm that considers data locality and load balance globally. First, we create multiple queues for every slave and put each task into the queue with the best data locality. Next, we compute the time balancing limit and shift tasks according to that limit to obtain a better task assignment. Compared with the default scheduling, the proposed method keeps data transfer lower and improves the performance of the computing system.
author2	Sun-Yuan Hsieh
author_facet	Sun-Yuan Hsieh Jhong-Yi Chen 陳仲毅
author	Jhong-Yi Chen 陳仲毅
title	Using Multi-Task Queues to Improve Data Locality in Hadoop
publishDate	2013
url	http://ndltd.ncl.edu.tw/handle/60951468672167042635
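
The scheduling idea summarized in the description field (queues per slave, locality-first placement, then shifting tasks under a time balancing limit) could look roughly like the Python sketch below. This is only an illustration under stated assumptions, not the thesis's algorithm: the record does not define the time balancing limit, so the function name `assign_tasks`, the per-task cost model, the limit taken as mean load plus a `slack` value, and the shifting rule are all hypothetical.

```python
def assign_tasks(tasks, slaves, slack=1.0):
    """Hypothetical locality-first scheduler sketch.

    tasks:  dict mapping task id -> (estimated cost, set of slaves holding its input)
    slaves: list of slave names
    Returns a dict mapping each slave to its queue (list of task ids).
    """
    queues = {s: [] for s in slaves}
    load = {s: 0.0 for s in slaves}

    # Step 1: put each task into the queue of the least-loaded slave that
    # holds its input data; fall back to any slave if no replica is listed.
    for tid, (cost, replicas) in tasks.items():
        candidates = [s for s in slaves if s in replicas] or slaves
        target = min(candidates, key=lambda s: load[s])
        queues[target].append(tid)
        load[target] += cost

    # Step 2: assumed stand-in for the "time balancing limit":
    # the mean load across slaves plus a tolerance.
    limit = sum(load.values()) / len(slaves) + slack

    # Step 3: shift tasks from queues above the limit to the currently
    # least-loaded slave, as long as the move keeps that slave within the limit.
    for s in slaves:
        while load[s] > limit and queues[s]:
            dest = min(slaves, key=lambda x: load[x])
            tid = queues[s][-1]
            cost = tasks[tid][0]
            if dest == s or load[dest] + cost > limit:
                break
            queues[s].pop()
            queues[dest].append(tid)
            load[s] -= cost
            load[dest] += cost
    return queues


# Three tasks have their data on slave A and one on slave B.
example_tasks = {
    "t1": (4, {"A"}),
    "t2": (4, {"A"}),
    "t3": (4, {"A"}),
    "t4": (4, {"B"}),
}
example_queues = assign_tasks(example_tasks, ["A", "B"], slack=0.0)
```

In this example, locality-first placement alone would leave slave A with three tasks and slave B with one; the shifting step moves one task to B, giving up locality for that task in exchange for balanced queues.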

Similar Items